riptable.rt_str

Classes

CatString

Provides access to FAString methods for Categoricals.

FAString

String accessor class for FastArray.

class riptable.rt_str.CatString(cat)

Provides access to FAString methods for Categoricals. All string methods are wrappers of the FAString equivalent, with categorical re-expansion and an option for how to fill filtered elements.
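
The idea of "categorical re-expansion" can be sketched in plain Python: the string function runs once per unique category, and the per-category results are then expanded through the integer codes. This is only an illustration, not riptable's implementation. The helper name `expand_over_codes`, the 1-based codes with 0 marking a filtered element, and the `fill` argument are all assumptions made for this sketch.

```python
def expand_over_codes(categories, codes, func, fill=""):
    """Apply func once per category, then expand the results by code.

    Assumption for this sketch: codes are 1-based and code 0 marks a
    filtered element, which receives the fill value instead.
    """
    per_category = [func(c) for c in categories]
    return [fill if code == 0 else per_category[code - 1] for code in codes]


categories = ["aapl", "msft"]
codes = [1, 2, 0, 1]  # the third element is filtered
result = expand_over_codes(categories, codes, str.upper, fill="")
print(result)  # ['AAPL', 'MSFT', '', 'AAPL']
```

The function is evaluated twice here (once per category) rather than four times (once per element), which is the point of wrapping the FAString methods this way.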

property _isfiltered
property substr
classmethod _build_method(method)

General purpose factory for FAString function wrappers.

classmethod _build_property(name)

General purpose factory for FAString property wrappers.

_convert_fastring_output(out)
extract(regex, expand=None, fillna='', names=None)
class riptable.rt_str.FAString(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)

Bases: riptable.rt_fastarray.FastArray

String accessor class for FastArray.

property backtostring

Convert back to a FastArray or np.ndarray of ‘S’ or ‘U’ strings (for example, ‘S12’ or ‘U40’).

property lower

Lowercase each string (bytes or unicode). Makes a copy.

Examples

>>> FAString(['THIS','THAT','TEST']).lower
FastArray(['this','that','test'], dtype='<U4')
property n_elements

The number of elements in the original string array

property reverse

Reverse each string (bytes or unicode). Returns a new array; see reverse_inplace for the in-place variant.

Examples

>>> FAString(['this','that','test']).reverse

property reverse_inplace

Reverse each string (bytes or unicode) in place. Does not make a copy.

Examples

>>> FAString(['this','that','test']).reverse_inplace

property str

Casts an array of byte strings or unicode as FAString.

Enables a variety of useful string manipulation methods.

Return type:

FAString

Raises:

TypeError – If the FastArray is of dtype other than byte string or unicode

See also

np.chararray, np.char, rt.FAString.apply

Examples

>>> s=FA(['this','that','test ']*100_000)
>>> s.str.upper
FastArray([b'THIS', b'THAT', b'TEST ', ..., b'THIS', b'THAT', b'TEST '],
          dtype='|S5')
>>> s.str.lower
FastArray([b'this', b'that', b'test ', ..., b'this', b'that', b'test '],
          dtype='|S5')
>>> s.str.removetrailing()
FastArray([b'this', b'that', b'test', ..., b'this', b'that', b'test'],
          dtype='|S5')
property strlen

Return the length of each string (bytes or unicode).

Examples

>>> FAString(['this  ','that ','test']).strlen
FastArray([6, 5, 4])
property substr
property upper

Uppercase each string (bytes or unicode). Makes a copy.

Examples

>>> FAString(['this','that','test']).upper
FastArray(['THIS','THAT','TEST'], dtype='<U4')
property upper_inplace

Uppercase each string (bytes or unicode) in place. Does not make a copy.

Examples

>>> FAString(['this','that','test']).upper_inplace

_APPLY_PARALLEL_THRESHOLD = 10000
nb_char
nb_char_par
nb_contains
nb_contains_par
nb_endswith
nb_endswith_par
nb_find
nb_index
nb_index_any_of
nb_index_any_of_par
nb_index_par
nb_lower
nb_lower_par
nb_removetrailing
nb_removetrailing_par
nb_replace
nb_replace_par
nb_reverse
nb_reverse_inplace
nb_reverse_inplace_par
nb_reverse_par
nb_startswith
nb_startswith_par
nb_strlen
nb_strlen_par
nb_substr
nb_substr_par
nb_upper
nb_upper_inplace
nb_upper_inplace_par
nb_upper_par
_apply_func(func, funcp, *args, dtype=None, input=None)
_find(str2)

Searches src for occurrences of str2 and builds a Boolean mask the same size as src indicating the starting point of each occurrence.

Parameters:

str2 (str) – A string with one or more characters to search for.

Examples

>>> FAString(['this','that','test']).find('t')
FastArray([
    [True, False, False, False],
    [True, False, False, True],
    [True, False, False, True]
])
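
The shape of this mask can be reproduced with a short pure-Python sketch: one row per string, one column per character position, True wherever an occurrence starts. The helper name `find_mask` is hypothetical, and marking overlapping occurrences is an assumption of this sketch rather than documented behavior.

```python
def find_mask(strings, sub):
    """Return one row of booleans per string, True where sub starts."""
    if not sub:
        raise ValueError("sub must contain at least one character")
    itemsize = max(len(s) for s in strings)  # all rows share one width
    mask = []
    for s in strings:
        row = [False] * itemsize
        pos = s.find(sub)
        while pos != -1:
            row[pos] = True
            # Assumption: continue one position later, so overlapping
            # occurrences are marked as well.
            pos = s.find(sub, pos + 1)
        mask.append(row)
    return mask


mask = find_mask(["this", "that", "test"], "t")
for row in mask:
    print(row)
```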
_nb_char(position, itemsize, strlen, out)
_nb_contains(itemsize, dest, str2)
_nb_endswith(itemsize, dest, str2)
_nb_find(itemsize, dest, str2)

Searches src for occurrences of str2 and builds a Boolean array with a row per string indicating the starting points of all such occurrences.

_nb_index(itemsize, dest, str2)
_nb_index_any_of(itemsize, dest, str2)
_nb_lower(itemsize, dest)
_nb_removetrailing(itemsize, dest, removechar)
_nb_replace(itemsize, dest, dest_itemsize, old, new, locations)
_nb_reverse(itemsize, dest)
_nb_reverse_inplace(itemsize)
_nb_startswith(itemsize, dest, str2)
_nb_strlen(itemsize, dest)
_nb_substr(out, itemsize, start, stop, strlen)
_nb_upper(itemsize, dest)
_nb_upper_inplace(itemsize)
_substr(start, stop=None)

Take a substring of each element using slice args. Behaves like slice, such that a single argument is treated as the stop. start, stop may be integers or arrays of integers aligned with self.

Examples

>>> a = rt.FA(['abc', 'xyzQ'])
>>> a.str.substr(2)
FastArray([b'ab', b'xy'], dtype='|S2')
>>> a.str.substr(0, 2)
FastArray([b'ab', b'xy'], dtype='|S2')
>>> a.str.substr(1, 2)
FastArray([b'b', b'y'], dtype='|S2')
>>> a.str.substr([1, 2])    # element-wise bounds
FastArray([b'a', b'xy'], dtype='|S2')
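
The slice-like argument handling described above can be modeled in plain Python. This is a sketch only: the helper operates on a list of `str` rather than a FastArray, and it relies on Python slicing, so its behavior for negative bounds may differ from riptable's.

```python
def substr(strings, start, stop=None):
    """Slice every string; a single argument is treated as the stop."""
    if stop is None:
        start, stop = 0, start

    def broadcast(bound):
        # A scalar bound applies to every element; a sequence must be
        # aligned with the strings, one bound per element.
        if isinstance(bound, int):
            return [bound] * len(strings)
        bound = list(bound)
        if len(bound) != len(strings):
            raise ValueError("bounds must align with the strings")
        return bound

    starts, stops = broadcast(start), broadcast(stop)
    return [s[a:b] for s, a, b in zip(strings, starts, stops)]


a = ["abc", "xyzQ"]
print(substr(a, 2))       # ['ab', 'xy']
print(substr(a, 1, 2))    # ['b', 'y']
print(substr(a, [1, 2]))  # ['a', 'xy']
```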
_validate_input(str2)
apply(func, *args, dtype=None)

Write your own string apply function.

NOTE: byte strings are passed as uint8.
NOTE: unicode strings are passed as uint32.

The function's signature must match the default:

@nb.njit(cache=get_global_settings().enable_numba_cache, nogil=True)
def nb_upper(src, itemsize, dest):

  • src – the input uint array
  • itemsize – how wide the string is per row
  • dest – the returned uint array

Parameters:
  • *args – Zero or more extra arguments to pass in (these arguments are always at the end).

  • dtype – Specify a different dtype.

Example

>>> import numba as nb
>>> @nb.njit(cache=get_global_settings().enable_numba_cache, nogil=True)
... def nb_upper(src, itemsize, dest):
...     # integer division: number of fixed-width rows in the flat buffer
...     for i in nb.prange(len(src) // itemsize):
...         rowpos = i * itemsize
...         for j in range(itemsize):
...             c = src[rowpos + j]
...             if c >= 97 and c <= 122:
...                 # convert to ASCII upper
...                 dest[rowpos + j] = c - 32
...             else:
...                 dest[rowpos + j] = c
>>> FAString(['this  ','that ','test']).apply(nb_upper)
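
To make the `src`, `itemsize`, `dest` contract easier to follow without numba installed, here is a pure-Python sketch of the same kernel running over a flat byte buffer. The `pack` and `unpack` helpers are hypothetical, and the null-padded back-to-back row layout is an assumption modeled on how fixed-width byte-string arrays are usually stored.

```python
def pack(strings, itemsize):
    """Lay byte strings out back to back, each null-padded to itemsize."""
    buf = bytearray()
    for s in strings:
        buf += s.ljust(itemsize, b"\x00")
    return buf


def upper_kernel(src, itemsize, dest):
    """Same loop structure as nb_upper: one fixed-width row per string."""
    for i in range(len(src) // itemsize):
        rowpos = i * itemsize
        for j in range(itemsize):
            c = src[rowpos + j]
            # ASCII 'a'..'z' is 97..122; subtracting 32 gives 'A'..'Z'
            dest[rowpos + j] = c - 32 if 97 <= c <= 122 else c


def unpack(buf, itemsize):
    """Split the flat buffer into rows and drop the null padding."""
    return [bytes(buf[i:i + itemsize]).rstrip(b"\x00")
            for i in range(0, len(buf), itemsize)]


strings = [b"this  ", b"that ", b"test"]
itemsize = max(len(s) for s in strings)
src = pack(strings, itemsize)
dest = bytearray(len(src))
upper_kernel(src, itemsize, dest)
upper = unpack(dest, itemsize)
print(upper)  # [b'THIS  ', b'THAT ', b'TEST']
```

The kernel never sees string objects, only integer character codes, which is why comparisons such as `97 <= c <= 122` appear instead of string methods.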
char(position)

Take a single character from each element.

Parameters:

position (int or list of int or np.ndarray) – The position of the character to be extracted. Negative values respect the length of the individual strings. If an array, the length must be equal to the number of strings. An error is raised if any positions are out of bounds (>= self._itemsize).
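
A plain-Python sketch of the position handling may help. It works on a list of `str`, and the helper is hypothetical. Returning an empty string when a position is inside the item size but past the end of a shorter string is an assumption of this sketch, not documented behavior.

```python
def char(strings, position):
    """Take one character per string; negative positions count from
    the end of each individual string."""
    itemsize = max(len(s) for s in strings)
    if isinstance(position, int):
        positions = [position] * len(strings)
    else:
        positions = list(position)
        if len(positions) != len(strings):
            raise ValueError("need exactly one position per string")
    out = []
    for s, p in zip(strings, positions):
        if p >= itemsize:
            raise IndexError("position out of bounds")
        idx = p if p >= 0 else len(s) + p
        # Assumption: in bounds for the array but past this string's
        # end yields an empty string.
        out.append(s[idx] if 0 <= idx < len(s) else "")
    return out


a = ["abc", "xyzQ"]
print(char(a, 0))       # ['a', 'x']
print(char(a, -1))      # ['c', 'Q']
print(char(a, [1, 3]))  # ['b', 'Q']
```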

contains(str2)

Return a boolean array that’s True for each string element that contains the given substring, otherwise False.

The entire substring must match.

Parameters:

str2 (str) – A string with one or more characters to search for. To search using regular expressions, use FAString.regex_match().

Returns:

A boolean array where the value is True if the string contains the entire substring specified in str2, otherwise False.

Return type:

FastArray

Examples

>>> FAString(['this  ','that ','test']).contains('at')
FastArray([False, True, False])

This can be called on a FastArray using .str.contains().

>>> a = rt.FastArray(['this  ','that ','test'])
>>> a.str.contains('at')
FastArray([False,  True, False])
endswith(str2)

Return a boolean array that’s True where the given substring matches the end of each string element, otherwise False.

The entire substring must match.

Parameters:

str2 (str) – A string with one or more characters to search for. To search using regular expressions, use FAString.regex_match().

Returns:

A boolean array where the value is True if the string ends with the entire substring specified in str2, otherwise False.

Return type:

FastArray

Examples

>>> FAString(['abab','ababa','abababb']).endswith('ab')
FastArray([True, False, False])

This can be called on a FastArray using .str.endswith().

>>> a = rt.FastArray(['abab','ababa','abababb'])
>>> a.str.endswith('ab')
FastArray([True, False, False])
extract(regex, expand=None, fillna='', names=None, apply_unique=True)

Extract one or more pattern groups from each element of an array into a FastArray or Dataset.

This is useful when you have pieces of data in a string that you want to split into separate elements.

For one capture group, the default is to return a FastArray, but this can be overridden by setting expand to True or by providing a name of a Dataset column to populate. For more than one capture group, a Dataset is returned.

Column names for the resulting Dataset can be specified within the regex using (?P<name>) in the capture group(s) or by passing the names argument, which may be more convenient.

Parameters:
  • regex (str) – The pattern(s) to search for. Define multiple capture groups using parentheses.

  • expand (bool, default False) – Set to True to return a Dataset for a single capture group. If False, a FastArray is returned.

  • fillna (str, default '' (empty string)) – For elements where there’s no match, this is the fill value for the resulting FastArray or Dataset column.

  • names (list of str, default None) – For more than one capture group, a Dataset is returned. Optionally, you can provide column names (keys) for the extracted data.

  • apply_unique (bool) – When True, the regex is applied to the unique values and then expanded using the reverse index (see riptable.unique()). This is optimal for repetitive data and benign for unique or highly non-repetitive data.

Returns:

For one capture group, a FastArray (or optionally a Dataset) is returned. For more than one capture group, a Dataset is returned.

Return type:

FastArray or Dataset

See also

FAString.regex_match

Return a boolean array that indicates whether given string or regular expression pattern is contained in each string element.

FAString.regex_replace

Replace each instance of a specified string or pattern.

Examples

These examples use a FastArray containing OSI symbols.

>>> osi = rt.FastArray(['SPX UO 12/15/23 C5700', 'SPXW UO 09/17/21 C3650'])

Extract one substring:

>>> osi.str.extract(r'\w+')
FastArray([b'SPX', b'SPXW'], dtype='|S4')

Provide a name for the resulting Dataset column:

>>> osi.str.extract(r'(?P<root>\w+)')
#   root
-   ----
0   SPX
1   SPXW

Define two capture groups and provide names for the resulting Dataset columns:

>>> osi.str.extract(r'(\w+).* (\d{2}/\d{2}/\d{2})', names = ['root', 'expiration'])
#   root   expiration
-   ----   ----------
0   SPX    12/15/23
1   SPXW   09/17/21

Extract one substring into a Dataset column using expand = True. Note that for the element with no match, an empty string is returned.

>>> osi.str.extract(r'\w+W', expand = True)
#   group_0
-   -------
0
1   SPXW
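
The capture-group and naming rules can be approximated with the standard `re` module. The sketch below returns a dict of columns in place of a Dataset and does not model the FastArray return or the `expand` argument. The helper is hypothetical, the use of `re.search` is an assumption, and the `group_0` style default name is taken from the output shown above.

```python
import re


def extract(strings, regex, names=None, fillna=""):
    """Return {column name: list of extracted values}."""
    pattern = re.compile(regex)
    n_groups = max(pattern.groups, 1)  # no groups: use the whole match
    if names is None:
        by_index = {i: n for n, i in pattern.groupindex.items()}
        names = [by_index.get(i + 1, f"group_{i}") for i in range(n_groups)]
    if len(names) != n_groups:
        raise ValueError("need one name per capture group")
    columns = {name: [] for name in names}
    for s in strings:
        m = pattern.search(s)
        for i, name in enumerate(names):
            if m is None:
                value = fillna  # no match: use the fill value
            elif pattern.groups == 0:
                value = m.group(0)
            else:
                value = m.group(i + 1)
                if value is None:
                    value = fillna  # optional group did not participate
            columns[name].append(value)
    return columns


osi = ["SPX UO 12/15/23 C5700", "SPXW UO 09/17/21 C3650"]
two = extract(osi, r"(\w+).* (\d{2}/\d{2}/\d{2})", names=["root", "expiration"])
print(two)
print(extract(osi, r"\w+W"))  # {'group_0': ['', 'SPXW']}
```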
index(str2)

Return the first index location of the entire substring specified in str2, or -1 if the substring does not exist.

Parameters:

str2 (str) – A string with one or more characters to search for.

Examples

>>> FAString(['this  ','that ','test']).index('at')
FastArray([-1, 2, -1])
index_any_of(str2)

Return the first index location of any of the characters that are part of str2, or -1 if none of the characters match.

Parameters:

str2 (str) – A string with one or more characters to search for.

Examples

>>> FAString(['this  ','that ','test']).index_any_of('ia')
FastArray([2, 2, -1])
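
The difference between index (whole substring) and index_any_of (any single character) is easy to see side by side. The following pure-Python sketch works on a list of `str`; both helpers are hypothetical stand-ins written for illustration.

```python
def index(strings, sub):
    """First position of the whole substring, or -1."""
    return [s.find(sub) for s in strings]


def index_any_of(strings, chars):
    """First position of any one of the characters, or -1."""
    out = []
    for s in strings:
        hits = [p for p in (s.find(c) for c in chars) if p != -1]
        out.append(min(hits) if hits else -1)
    return out


words = ["this  ", "that ", "test"]
print(index(words, "at"))         # [-1, 2, -1]
print(index_any_of(words, "ia"))  # [2, 2, -1]
```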
possibly_convert_tostr(arr)

Converts a list-like or an array to the same string type.

regex_match(regex, apply_unique=True)

Return a boolean array that’s True where the given substring or regular expression pattern is contained in each string element, otherwise False.

The entire substring or pattern must match.

Applies re.search() on each element with regex as the pattern.

Parameters:
  • regex (str) – String or regular expression pattern to search for.

  • apply_unique (bool, default True) – When True, the regex is applied to the unique values and then expanded using the reverse index (see riptable.unique()). This is optimal for repetitive data and benign for unique or highly non-repetitive data.

Returns:

A boolean array where the value is True if the string element contains the entire substring or regex pattern specified in regex, otherwise False.

Return type:

FastArray

See also

FAString.regex_replace

Replace each instance of a specified substring or pattern.

FAString.extract

Extract one or more pattern groups into a Dataset or FastArray.

FAString.contains, FAString.startswith, FAString.endswith

Examples

Find any instance of ‘ab’ that appears at the end of a string:

>>> FAString(['abab','ababa','abababb']).regex_match('ab$')
FastArray([True, False, False])

This can be called on a FastArray using .str.regex_match().

>>> a = rt.FastArray(['abab','ababa','abababb'])
>>> a.str.regex_match('ab$')
FastArray([True, False, False])
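
The apply_unique option can be illustrated with a small pure-Python sketch: the regex runs once per distinct value, and a reverse index expands the answers back to full length. This is not riptable's code. The helper is hypothetical, and it keeps unique values in first-seen order, which may differ from the ordering riptable.unique() uses.

```python
import re


def regex_match_unique(strings, regex):
    """Return (per-element matches, number of regex evaluations)."""
    pattern = re.compile(regex)
    uniques = []        # distinct values, in first-seen order
    reverse_index = []  # for each element, its position in uniques
    seen = {}
    for s in strings:
        if s not in seen:
            seen[s] = len(uniques)
            uniques.append(s)
        reverse_index.append(seen[s])
    # The regex is evaluated once per unique value only.
    unique_result = [pattern.search(u) is not None for u in uniques]
    return [unique_result[i] for i in reverse_index], len(uniques)


data = ["abab", "ababa", "abababb"] * 1000
matches, n_evaluations = regex_match_unique(data, "ab$")
print(matches[:3], n_evaluations)  # [True, False, False] 3
```

For 3,000 elements with 3 distinct values the pattern is evaluated 3 times. With fully unique data the only extra cost is building the reverse index, consistent with the "benign" description above.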
regex_replace(regex, repl, apply_unique=True)

Replace each instance of a specified substring or pattern.

The entire substring or pattern must match. If the substring or pattern isn’t found, the original string is returned unchanged.

The behavior is identical to that of re.sub(). In particular, the returned string is obtained by replacing the leftmost non-overlapping occurrences of the substring or pattern with the replacement string.

Parameters:
  • regex (str) – String or regular expression pattern to search for.

  • repl (str) – The replacement string.

  • apply_unique (bool, default True) – When True, the regex is applied to the unique values and then expanded using the reverse index (see riptable.unique()). This is optimal for repetitive data and benign for unique or highly non-repetitive data.

Returns:

An array with all occurrences of the substring or pattern replaced.

Return type:

FastArray

See also

FAString.regex_match

Return a boolean array that indicates whether given substring or regular expression pattern is contained in each string element.

FAString.extract

Extract one or more pattern groups into a Dataset or FastArray.

FAString.contains, FAString.startswith, FAString.endswith

Examples

Replace instances of ‘aa’ with ‘b’. All non-overlapping occurrences are replaced, starting from the left:

>>> FAString(['aaa', 'aaaa', 'aaaaa']).regex_replace('aa', 'b')
FastArray(['ba', 'bb', 'bba'], dtype='<U3')

Replace any instance of ‘ab’ that appears at the end of a string with ‘b’.

>>> FAString(['abab','ababa','abababb']).regex_replace('ab$', 'b')
FastArray(['abb', 'ababa', 'abababb'], dtype='<U7')

This can be called on a FastArray using .str.regex_replace(). The returned FastArray elements are byte strings.

>>> a = rt.FastArray(['abab','ababa','abababb'])
>>> a.str.regex_replace('ab$', 'b')
FastArray([b'abb', b'ababa', b'abababb'], dtype='|S7')
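
Because the behavior is described as identical to re.sub(), the "leftmost non-overlapping" rule can be checked directly against the standard library. This snippet uses only `re` on plain Python strings and does not call riptable.

```python
import re

words = ["aaa", "aaaa", "aaaaa"]
# Matching scans left to right and consumes each match, so 'aaa'
# yields one replacement plus a leftover 'a'.
replaced = [re.sub("aa", "b", w) for w in words]
print(replaced)  # ['ba', 'bb', 'bba']

# '$' anchors the pattern to the end, so only a trailing 'ab' changes.
anchored = [re.sub("ab$", "b", w) for w in ["abab", "ababa", "abababb"]]
print(anchored)  # ['abb', 'ababa', 'abababb']
```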
removetrailing(remove=32)

Removes trailing spaces from the end of each string (often used to fix up MATLAB strings). Makes a copy.

Parameters:

remove (int, default 32) – The character code to remove. Defaults to ASCII 32 (space).

Examples

>>> FAString(['this  ','that ','test']).removetrailing()
FastArray(['this','that','test'], dtype='<U6')
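
Note that remove is a character code, not a string. A minimal pure-Python sketch of that convention, using a hypothetical helper that works on a list of `str`:

```python
def removetrailing(strings, remove=32):
    """Strip every trailing occurrence of the character whose code is remove."""
    ch = chr(remove)  # 32 is the ASCII code for a space
    return [s.rstrip(ch) for s in strings]


trimmed = removetrailing(["this  ", "that ", "test"])
print(trimmed)  # ['this', 'that', 'test']

# A different character is passed by its code, for example ord('_') == 95.
print(removetrailing(["a__", "b_c_"], remove=ord("_")))  # ['a', 'b_c']
```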
replace(old, new)

Replace all occurrences of old with new.

startswith(str2)

Return a boolean array that’s True where the given substring matches the start of each string element, otherwise False.

The entire substring must match.

Parameters:

str2 (str) – A string with one or more characters to search for. To search using regular expressions, use FAString.regex_match().

Returns:

A boolean array where the value is True if the string starts with the entire substring specified in str2, otherwise False.

Return type:

FastArray

Examples

>>> FAString(['this  ','that ','test']).startswith('thi')
FastArray([True, False, False])

This can be called on a FastArray using .str.startswith().

>>> a = rt.FastArray(['this  ','that ','test'])
>>> a.str.startswith('thi')
FastArray([True, False, False])
strpbrk(str2)
strstr(str2)
strstrb(str2)
substr_char_stop(stop, inclusive=False)

Take a substring of each element using characters as bounds.

Parameters:
  • stop – A string used to determine the end of the substring. Excluded from the result by default. Where stop is not found in the corresponding element, the substring runs to the end of the string.

  • inclusive (bool) – If True, include the stopping string in the result

Examples

>>> s = FastArray(['ABC', 'A_B', 'AB_C', 'AB_C_DD'])
>>> s.str.substr_char_stop('_')
FastArray([b'ABC', b'A', b'AB', b'AB'], dtype='|S2')
>>> s.str.substr_char_stop('_', inclusive=True)
FastArray([b'ABC', b'A_', b'AB_', b'AB_'], dtype='|S2')
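
The stopping rule can be modeled in a few lines of plain Python. The helper below is a hypothetical stand-in that works on a list of `str`; stopping at the first occurrence of stop is inferred from the example output above.

```python
def substr_char_stop(strings, stop, inclusive=False):
    """Keep each string up to the first occurrence of stop."""
    out = []
    for s in strings:
        pos = s.find(stop)
        if pos == -1:
            out.append(s)  # stop not found: keep the whole string
        else:
            end = pos + len(stop) if inclusive else pos
            out.append(s[:end])
    return out


s = ["ABC", "A_B", "AB_C", "AB_C_DD"]
print(substr_char_stop(s, "_"))                  # ['ABC', 'A', 'AB', 'AB']
print(substr_char_stop(s, "_", inclusive=True))  # ['ABC', 'A_', 'AB_', 'AB_']
```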