riptable.rt_numpy

Classes

bool_

The Riptable equivalent of numpy.bool_, with the concept of an invalid added.

bytes_

The Riptable equivalent of numpy.bytes_, with the concept of an invalid added.

float32

The Riptable equivalent of numpy.float32, with the concept of an invalid added.

float64

The Riptable equivalent of numpy.float64, with the concept of an invalid added.

int0

The Riptable equivalent of numpy.int64, with the concept of an invalid added.

int16

The Riptable equivalent of numpy.int16, with the concept of an invalid added.

int32

The Riptable equivalent of numpy.int32, with the concept of an invalid added.

int64

The Riptable equivalent of numpy.int64, with the concept of an invalid added.

int8

The Riptable equivalent of numpy.int8, with the concept of an invalid added.

str_

The Riptable equivalent of numpy.str_, with the concept of an invalid added.

uint0

The Riptable equivalent of numpy.uint64, with the concept of an invalid added.

uint16

The Riptable equivalent of numpy.uint16, with the concept of an invalid added.

uint32

The Riptable equivalent of numpy.uint32, with the concept of an invalid added.

uint64

The Riptable equivalent of numpy.uint64, with the concept of an invalid added.

uint8

The Riptable equivalent of numpy.uint8, with the concept of an invalid added.

Functions

_searchsorted(array, v[, side, sorter])

abs(*args, **kwargs)

This will check for a numpy array first and call np.abs

absolute(*args, **kwargs)

all(*args, **kwargs)

any(*args, **kwargs)

arange(*args, **kwargs)

Return an array of evenly spaced values within a specified interval.

argsort(*args, **kwargs)

asanyarray(a[, dtype, order])

asarray(a[, dtype, order])

assoc_copy(key1, key2, arr)

param key1:

Numpy arrays to match against; all arrays must be same length.

assoc_index(key1, key2)

param key1:

Numpy arrays to match against; all arrays must be same length.

bincount(*args, **kwargs)

bitcount(a)

Count the number of set (True) bits in an integer or in each integer within an array of integers.

bool_to_fancy(arr[, both])

param arr:

A boolean array of True/False values

cat2keys(key1, key2[, filter, ordered, sort_gb, ...])

Create a Categorical from two keys or two Categorical objects with all possible unique combinations.

ceil(*args, **kwargs)

combine2keys(key1, key2, unique_count1, unique_count2)

param key1:

First index array (int8, int16, int32 or int64).

combine_accum1_filter(key1, unique_count1[, filter])

param key1:

index array (int8, int16, int32 or int64) [must be base 1 -- if base 0, increment by 1]

combine_accum2_filter(key1, key2, unique_count1, ...)

param key1:

First index array (int8, int16, int32 or int64).

combine_filter(key, filter)

param key:

index array (int8, int16, int32 or int64)

concatenate(*args, **kwargs)

crc32c(arr)

Calculate the 32-bit CRC of the data in an array using the Castagnoli polynomial (CRC32C).

crc64(arr)

cumsum(*args, **kwargs)

diff(*args, **kwargs)

double(a)

empty(shape[, dtype, order])

Return a new array of specified shape and type, without initializing entries.

empty_like(array[, dtype, order, subok, shape])

Return a new array with the same shape and type as the specified array, without initializing entries.

floor(*args, **kwargs)

full(shape, fill_value[, dtype, order])

Return a new array of a specified shape and type, filled with a specified value.

full_like(a, fill_value[, dtype, order, subok, shape])

Return a full array with the same shape and type as a given array.

get_common_dtype(x, y)

Return the dtype of two arrays, or two scalars, or a scalar and an array.

get_dtype(val)

Return the dtype of an array, list, or builtin int, float, bool, str, bytes.

groupby(list_arrays[, filter, cutoffs, base_index, ...])

Main routine used to groupby one or more keys.

groupbyhash(list_arrays[, hint_size, filter, ...])

Find unique values in an array using a linear hashing algorithm.

groupbylex(list_arrays[, filter, cutoffs, base_index, rec])

param list_arrays:

A list of numpy arrays to hash on (multikey). All arrays must be the same size.

groupbypack(ikey, ncountgroup[, unique_count, cutoffs])

A routine often called after groupbyhash or groupbylex.

hstack(tup[, dtype])

See numpy.hstack.

interp(x, xp, fp)

One-dimensional or two-dimensional linear interpolation with clipping.

interp_extrap(x, xp, fp)

One-dimensional or two-dimensional linear interpolation without clipping.

isfinite(*args, **kwargs)

Return True for each finite element, False otherwise.

isinf(*args, **kwargs)

Return True for each element that's positive or negative infinity, False otherwise.

ismember(a, b[, h, hint_size, base_index])

The ismember function is meant to mimic the ismember function in MATLAB. It takes two sets of data and returns a boolean array and an array of indices.

isnan(*args, **kwargs)

Return True for each element that's a NaN (Not a Number), False otherwise.

isnanorzero(*args, **kwargs)

Return True for each element that's a NaN (Not a Number) or zero, False otherwise.

isnotfinite(*args, **kwargs)

Return True for each non-finite element, False otherwise.

isnotinf(*args, **kwargs)

Return True for each element that's not positive or negative infinity, False otherwise.

isnotnan(*args, **kwargs)

Return True for each element that's not a NaN (Not a Number), False otherwise.

issorted(*args)

Return True if the array is sorted, False otherwise.

lexsort(*args, **kwargs)

log(*args, **kwargs)

log10(*args, **kwargs)

logical(a)

makeifirst(key, unique_count[, filter])

param key:

Index array (int8, int16, int32 or int64).

makeilast(key, unique_count[, filter])

param key:

Index array (int8, int16, int32 or int64).

makeinext(key, unique_count)

param key:

index array (int8, int16, int32 or int64)

makeiprev(key, unique_count)

param key:

index array (int8, int16, int32 or int64)

mask_and(*args, **kwargs)

pass in a tuple or list of boolean arrays to AND together

mask_andi(*args, **kwargs)

inplace version: pass in a tuple or list of boolean arrays to AND together

mask_andnot(*args, **kwargs)

pass in a tuple or list of boolean arrays to ANDNOT together

mask_andnoti(*args, **kwargs)

inplace version: pass in a tuple or list of boolean arrays to ANDNOT together

mask_or(*args, **kwargs)

pass in a tuple or list of boolean arrays to OR together

mask_ori(*args, **kwargs)

inplace version: pass in a tuple or list of boolean arrays to OR together

mask_xor(*args, **kwargs)

pass in a tuple or list of boolean arrays to XOR together

mask_xori(*args, **kwargs)

inplace version: pass in a tuple or list of boolean arrays to XOR together

max(*args, **kwargs)

maximum(x1, x2, *args, **kwargs)

mean(*args[, filter, dtype])

Compute the arithmetic mean of the values in the first argument.

median(*args, **kwargs)

min(*args, **kwargs)

minimum(x1, x2, *args, **kwargs)

multikeyhash(*args)

Returns 7 arrays to help navigate data.

nan_to_num(*args, **kwargs)

arg1: ndarray

nan_to_zero(a)

Replace the NaN or invalid values in an array with zeroes.

nanargmax(*args, **kwargs)

nanargmin(*args, **kwargs)

nanmax(*args, **kwargs)

nanmean(*args[, filter, dtype])

Compute the arithmetic mean of the values in the first argument, ignoring NaNs.

nanmedian(*args, **kwargs)

nanmin(*args, **kwargs)

nanpercentile(*args, **kwargs)

nanstd(*args[, filter, dtype])

Compute the standard deviation of the values in the first argument, ignoring NaNs.

nansum(*args[, filter, dtype])

Compute the sum of the values in the first argument, ignoring NaNs.

nanvar(*args[, filter, dtype])

Compute the variance of the values in the first argument, ignoring NaNs.

ones(shape[, dtype, order, like])

Return a new array of the specified shape and data type, filled with ones.

ones_like(a[, dtype, order, subok, shape])

Return an array of ones with the same shape and data type as the specified array.

percentile(*args, **kwargs)

putmask(a, mask, values)

This is roughly the equivalent of arr[mask] = arr2[mask].

reindex_fast(index, array)

reshape(*args, **kwargs)

round(*args, **kwargs)

This will check for a numpy array first and call np.round

searchsorted(a, v[, side, sorter])

see np.searchsorted

single(a)

sort(*args, **kwargs)

sortinplaceindirect(*args, **kwargs)

std(*args[, filter, dtype])

Compute the standard deviation of the values in the first argument.

sum(*args[, filter, dtype])

Compute the sum of the values in the first argument.

tile(arr, reps)

Construct an array by repeating a specified array a specified number of times.

transpose(*args, **kwargs)

trunc(*args, **kwargs)

unique32(list_keys[, hintSize, filter])

Returns the index location of the first occurrence of each key.

var(*args[, filter, dtype])

Compute the variance of the values in the first argument.

vstack(arrlist[, dtype, order])

param arrlist:

these arrays are considered the columns

where(condition[, x, y])

Return a new FastArray or Categorical with elements from x or y depending on whether condition is True.

zeros(*args, **kwargs)

Return a new array of the specified shape and data type, filled with zeros.

zeros_like(a[, dtype, order, subok, shape])

Return an array of zeros with the same shape and data type as the specified array.

Attributes

asanyarray(a[, dtype, order])

asarray(a[, dtype, order])

class riptable.rt_numpy.bool_(value)

Bases: numpy.bool_

The Riptable equivalent of numpy.bool_, with the concept of an invalid added.

Examples

>>> rt.bool_.inv
False
inv
class riptable.rt_numpy.bytes_

Bases: numpy.bytes_

The Riptable equivalent of numpy.bytes_, with the concept of an invalid added.

See also

np.bytes_, float32, float64, int8, uint8, int16, uint16, int32, uint32, int64, uint64, str_, bool_

Examples

>>> rt.bytes_.inv
b''
inv
class riptable.rt_numpy.float32(value)

Bases: numpy.float32

The Riptable equivalent of numpy.float32, with the concept of an invalid added.

Examples

>>> rt.float32.inv
nan
inv
class riptable.rt_numpy.float64(value)

Bases: numpy.float64

The Riptable equivalent of numpy.float64, with the concept of an invalid added.

Examples

>>> rt.float64.inv
nan
inv
class riptable.rt_numpy.int0(value)

Bases: int64

The Riptable equivalent of numpy.int64, with the concept of an invalid added.

Examples

>>> rt.int64.inv
-9223372036854775808
class riptable.rt_numpy.int16(value)

Bases: numpy.int16

The Riptable equivalent of numpy.int16, with the concept of an invalid added.

Examples

>>> rt.int16.inv
-32768
inv
class riptable.rt_numpy.int32(value)

Bases: numpy.int32

The Riptable equivalent of numpy.int32, with the concept of an invalid added.

Examples

>>> rt.int32.inv
-2147483648
inv
class riptable.rt_numpy.int64(value)

Bases: numpy.int64

The Riptable equivalent of numpy.int64, with the concept of an invalid added.

Examples

>>> rt.int64.inv
-9223372036854775808
inv
class riptable.rt_numpy.int8(value)

Bases: numpy.int8

The Riptable equivalent of numpy.int8, with the concept of an invalid added.

Examples

>>> rt.int8.inv
-128
inv
class riptable.rt_numpy.str_

Bases: numpy.str_

The Riptable equivalent of numpy.str_, with the concept of an invalid added.

Examples

>>> rt.str_.inv
''
inv
class riptable.rt_numpy.uint0(value)

Bases: uint64

The Riptable equivalent of numpy.uint64, with the concept of an invalid added.

Examples

>>> rt.uint64.inv
18446744073709551615
class riptable.rt_numpy.uint16(value)

Bases: numpy.uint16

The Riptable equivalent of numpy.uint16, with the concept of an invalid added.

Examples

>>> rt.uint16.inv
65535
inv
class riptable.rt_numpy.uint32(value)

Bases: numpy.uint32

The Riptable equivalent of numpy.uint32, with the concept of an invalid added.

Examples

>>> rt.uint32.inv
4294967295
inv
class riptable.rt_numpy.uint64(value)

Bases: numpy.uint64

The Riptable equivalent of numpy.uint64, with the concept of an invalid added.

Examples

>>> rt.uint64.inv
18446744073709551615
inv
class riptable.rt_numpy.uint8(value)

Bases: numpy.uint8

The Riptable equivalent of numpy.uint8, with the concept of an invalid added.

Examples

>>> rt.uint8.inv
255
inv
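The following is an illustrative sketch, not from the original documentation, showing how these invalid sentinels can be used. It assumes that assigning a type's inv value into a FastArray of the matching dtype marks that element as invalid, so that functions such as rt.isnan report it:

>>> fa = rt.FastArray([1, 2, 3], dtype=rt.int32)
>>> fa[0] = rt.int32.inv        # -2147483648, the int32 invalid sentinel
>>> rt.isnan(fa)                # expected: FastArray([ True, False, False])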
riptable.rt_numpy._searchsorted(array, v, side='left', sorter=None)
riptable.rt_numpy.abs(*args, **kwargs)

This will check for a numpy array first and call np.abs

riptable.rt_numpy.absolute(*args, **kwargs)
riptable.rt_numpy.all(*args, **kwargs)
riptable.rt_numpy.any(*args, **kwargs)
riptable.rt_numpy.arange(*args, **kwargs)

Return an array of evenly spaced values within a specified interval.

The half-open interval includes start but excludes stop: [start, stop).

For integer arguments the function is roughly equivalent to the Python built-in range, but returns a FastArray rather than a range instance.

When using a non-integer step, such as 0.1, it’s often better to use numpy.linspace().

For additional warnings, see numpy.arange().

Parameters:
  • start (int or float, default 0) – Start of interval. The interval includes this value.

  • stop (int or float) – End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of the output.

  • step (int or float, default 1) – Spacing between values. For any output out, this is the distance between two adjacent values: out[i+1] - out[i]. If step is specified as a positional argument, start must also be given.

  • dtype (str or NumPy dtype or Riptable dtype, optional) – The type of the output array. If dtype is not given, the data type is inferred from the other input arguments.

  • like (array_like, optional) – Reference object to allow the creation of arrays that are not NumPy arrays. If an array-like passed in as like supports the __array_function__ protocol, the result will be defined by it. In this case, it ensures the creation of an array object compatible with that passed in via this argument.

Returns:

A FastArray of evenly spaced numbers within the specified interval. For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point overflow, this rule may result in the last element of the output being greater than stop.

Return type:

FastArray

See also

numpy.arange, riptable.ones, riptable.ones_like, riptable.zeros, riptable.zeros_like, riptable.empty, riptable.empty_like, riptable.full, riptable.arange, Categorical.full

Examples

>>> rt.arange(3)
FastArray([0, 1, 2])
>>> rt.arange(3.0)
FastArray([ 0.,  1.,  2.])
>>> rt.arange(3, 7)
FastArray([3, 4, 5, 6])
>>> rt.arange(3, 7, 2)
FastArray([3, 5])
riptable.rt_numpy.argsort(*args, **kwargs)
riptable.rt_numpy.asanyarray(a, dtype=None, order=None)
riptable.rt_numpy.asarray(a, dtype=None, order=None)
riptable.rt_numpy.assoc_copy(key1, key2, arr)
Parameters:
  • key1 (ndarray / list thereof or a Dataset) – Numpy arrays to match against; all arrays must be same length.

  • key2 (ndarray / list thereof or a Dataset) – Numpy arrays that will be matched with key1; all arrays must be same length.

  • arr (ndarray / Dataset) – An array or Dataset the same length as the key2 arrays, which will be mapped to the size of key1. In the case of an array, the output will be cast to FastArray to accommodate support of fancy indexing with sentinel values.

Returns:

A new array the same length as the key1 arrays, which maps the input arr from key2 to key1. The array's dtype will match the dtype of the input array (third parameter). However, the output will be a FastArray when the input array is a numpy array, so that fancy indexing with sentinels works correctly.

Return type:

array_like

Examples

>>> np.random.seed(12345)
>>> ds=Dataset({'time': rt.arange(200_000_000.0)})
>>> ds.data = np.random.randint(7, size=200_000_000)
>>> ds.symbol = rt.Cat(1 + rt.arange(200_000_000) % 7, ['AAPL','AMZN', 'FB', 'GOOG', 'IBM','MSFT','UBER'])
>>> dsa = rt.Dataset({'data': rt.repeat(rt.arange(7), 7), 'symbol': rt.tile(rt.FastArray(['AAPL','AMZN', 'FB', 'GOOG', 'IBM','MSFT','UBER']), 7), 'time': 48 - rt.arange(49.0)})
>>> rt.assoc_copy([ds.symbol, ds.data], [dsa.symbol, dsa.data], dsa.time)
FastArray([13.,  5., 46., ...,  5., 11., 24.])
riptable.rt_numpy.assoc_index(key1, key2)
Parameters:
  • key1 (ndarray / list thereof or a Dataset) – Numpy arrays to match against; all arrays must be same length.

  • key2 (ndarray / list thereof or a Dataset) – Numpy arrays that will be matched with key1; all arrays must be same length.

Returns:

fancy_index – Fancy index where the index of key2 is matched against key1; if there was no match, the minimum integer (aka sentinel) is the index value.

Return type:

ndarray of ints

Examples

>>> np.random.seed(12345)
>>> ds = rt.Dataset({'time': rt.arange(200_000_000.0)})
>>> ds.data = np.random.randint(7, size=200_000_000)
>>> ds.symbol = rt.Cat(1 + rt.arange(200_000_000) % 7, ['AAPL','AMZN', 'FB', 'GOOG', 'IBM','MSFT','UBER'])
>>> dsa = rt.Dataset({'data': rt.repeat(rt.arange(7), 7), 'symbol': rt.tile(rt.FastArray(['AAPL','AMZN', 'FB', 'GOOG', 'IBM','MSFT','UBER']), 7)})
>>> rt.assoc_index([ds.symbol, ds.data], [dsa.symbol, dsa.data])
FastArray([35, 43,  2, ..., 43, 37, 24])
riptable.rt_numpy.bincount(*args, **kwargs)
riptable.rt_numpy.bitcount(a)

Count the number of set (True) bits in an integer or in each integer within an array of integers. This operation is also known as population count or Hamming weight.

Parameters:

a (int or sequence or numpy.array) – A Python integer or a sequence of integers or a numpy integer array.

Returns:

If the input is a Python int, the return is an int. If the input is a sequence or numpy array, the return is a numpy array with dtype int8.

Return type:

int or numpy.array

Examples

>>> arr = rt.FastArray([741858, 77285, 916765, 395393, 347556, 896425, 921598, 86398])
>>> rt.bitcount(arr)
FastArray([10, 10, 14,  5,  9, 12, 14, 10], dtype=int8)
riptable.rt_numpy.bool_to_fancy(arr, both=False)
Parameters:
  • arr (ndarray of bools) – A boolean array of True/False values

  • both (bool) – Controls whether to return the indices of both the True and False elements in arr. Defaults to False.

Returns:

  • fancy_index (ndarray of ints) – Fancy index array of where the True values are. If both is True, the returned array has two sections: the first slice holds the indices of the True values, and the second slice holds the indices of the False values. The True count is also returned.

  • true_count (int, optional) – When both is True, this value is returned to indicate how many True values were in arr; this is then used to slice fancy_index into two slices indicating where the True and False values are, respectively, within arr.

Notes

runs in parallel

Examples

>>> np.random.seed(12345)
>>> bools = np.random.randint(2, size=20, dtype=np.int8).astype(bool)
>>> rt.bool_to_fancy(bools)
FastArray([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 12, 15, 17, 18, 19])

Setting the both parameter to True causes the function to return an array containing the indices of the True values in arr followed by the indices of the False values, along with the number (count) of True values. This count can be used to slice the returned array if you want just the True indices and False indices.

>>> fancy_index, true_count = rt.bool_to_fancy(bools, both=True)
>>> fancy_index[:true_count], fancy_index[true_count:]
(FastArray([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 12, 15, 17, 18, 19]), FastArray([ 0, 11, 13, 14, 16]))
riptable.rt_numpy.cat2keys(key1, key2, filter=None, ordered=True, sort_gb=False, invalid=False, fuse=False)

Create a Categorical from two keys or two Categorical objects with all possible unique combinations.

Notes

Code assumes Categoricals are base 1.

Parameters:
  • key1 (Categorical, ndarray, or list of ndarray) – If a list of arrays is passed for this parameter, all arrays in the list must have the same length.

  • key2 (Categorical, ndarray, or list of ndarray) – If a list of arrays is passed for this parameter, all arrays in the list must have the same length.

  • filter (ndarray of bool, optional) – Only valid when invalid is set to True.

  • ordered (bool, default True) – Only applies when key1 or key2 is not a Categorical.

  • sort_gb (bool, default False) – Only applies when key1 or key2 is not a Categorical.

  • invalid (bool, default False) – Specifies whether or not to insert the invalid when creating the n x m unique matrix.

  • fuse (bool, default False) – When True, forces the resulting categorical to have 2 keys, one for rows, and one for columns.

Returns:

A multikey categorical that has at least 2 keys.

Return type:

Categorical

Examples

The following examples demonstrate using cat2keys on keys as lists and arrays, lists of arrays, and Categoricals. In each of the examples, you can determine the unique combinations by zipping the same position of each of the values of the category dictionary.

Creating a MultiKey Categorical from two lists of equal length.

>>> rt.cat2keys(list('abc'), list('xyz'))
Categorical([(a, x), (b, y), (c, z)]) Length: 3
  FastArray([1, 5, 9], dtype=int64) Base Index: 1
  {'key_0': FastArray([b'a', b'b', b'c', b'a', b'b', b'c', b'a', b'b', b'c'], dtype='|S1'), 'key_01': FastArray([b'x', b'x', b'x', b'y', b'y', b'y', b'z', b'z', b'z'], dtype='|S1')} Unique count: 9
>>> rt.cat2keys(np.array(list('abc')), np.array(list('xyz')))
Categorical([(a, x), (b, y), (c, z)]) Length: 3
  FastArray([1, 5, 9], dtype=int64) Base Index: 1
  {'key_0': FastArray([b'a', b'b', b'c', b'a', b'b', b'c', b'a', b'b', b'c'], dtype='|S1'), 'key_01': FastArray([b'x', b'x', b'x', b'y', b'y', b'y', b'z', b'z', b'z'], dtype='|S1')} Unique count: 9
>>> key1, key2 = [rt.FA(list('abc')), rt.FA(list('def'))], [rt.FA(list('uvw')), rt.FA(list('xyz'))]
>>> rt.cat2keys(key1, key2)
Categorical([(a, d, u, x), (b, e, v, y), (c, f, w, z)]) Length: 3
  FastArray([1, 5, 9], dtype=int64) Base Index: 1
  {'key_0': FastArray([b'a', b'b', b'c', b'a', b'b', b'c', b'a', b'b', b'c'], dtype='|S1'), 'key_1': FastArray([b'd', b'e', b'f', b'd', b'e', b'f', b'd', b'e', b'f'], dtype='|S1'), 'key_01': FastArray([b'u', b'u', b'u', b'v', b'v', b'v', b'w', b'w', b'w'], dtype='|S1'), 'key_11': FastArray([b'x', b'x', b'x', b'y', b'y', b'y', b'z', b'z', b'z'], dtype='|S1')} Unique count: 9
>>> cat = rt.cat2keys(key1, key2)
>>> cat.category_dict
{'key_0': FastArray([b'a', b'b', b'c', b'a', b'b', b'c', b'a', b'b', b'c'],
           dtype='|S1'),
 'key_1': FastArray([b'd', b'e', b'f', b'd', b'e', b'f', b'd', b'e', b'f'],
           dtype='|S1'),
 'key_01': FastArray([b'u', b'u', b'u', b'v', b'v', b'v', b'w', b'w', b'w'],
           dtype='|S1'),
 'key_11': FastArray([b'x', b'x', b'x', b'y', b'y', b'y', b'z', b'z', b'z'],
           dtype='|S1')}
riptable.rt_numpy.ceil(*args, **kwargs)
riptable.rt_numpy.combine2keys(key1, key2, unique_count1, unique_count2, filter=None)
Parameters:
  • key1 (ndarray of ints) – First index array (int8, int16, int32 or int64).

  • key2 (ndarray of ints) – Second index array (int8, int16, int32 or int64).

  • unique_count1 (int) – Number of unique values in key1 (often returned by groupbyhash/groupbylex).

  • unique_count2 (int) – Number of unique values in key2.

  • filter (ndarray of bools, optional) – Boolean array with same length as key1 array, defaults to None.

Returns:

  • Two arrays: iKey (for the 2 dimensions) and nCountGroup.

  • iKey (the bin array) is a 1-based index array in which each False filter value sets the index to 0.

  • nCountGroup is an INT32 array with size equal to (unique_count1 + 1) * (unique_count2 + 1).

riptable.rt_numpy.combine_accum1_filter(key1, unique_count1, filter=None)
Parameters:
  • key1 (ndarray of ints) – Index array (int8, int16, int32 or int64). Must be base-1 (if base-0, increment by 1). Often referred to as iKey or the bin array for Categoricals.

  • unique_count1 (int) – Maximum number of uniques in key1 array.

  • filter (ndarray of bool, optional) – Boolean array same length as key1 array, defaults to None.

Returns:

  • iKey – A new 1-based index array in which each False filter value sets the index to 0; its dtype will match the dtype of key1.

  • iFirstKey – An INT32 array; the fixup for iFirstKey, since some bins may have been removed.

  • unique_count – An INT32 scalar; the new unique_count1. It is the length of iFirstKey.

Example

>>> a = rt.arange(20) % 10
>>> b = a.astype('S')
>>> c = rt.Cat(b)
>>> rt.combine_accum1_filter(c, c.unique_count, rt.logical(rt.arange(20) % 2))
{'iKey': FastArray([0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5],
           dtype=int8),
 'iFirstKey': FastArray([1, 3, 5, 7, 9]),
 'unique_count': 5}
riptable.rt_numpy.combine_accum2_filter(key1, key2, unique_count1, unique_count2, filter=None)
Parameters:
  • key1 (ndarray of ints) – First index array (int8, int16, int32 or int64).

  • key2 (ndarray of ints) – Second index array (int8, int16, int32 or int64).

  • unique_count1 (int) – Maximum number of unique values in key1.

  • unique_count2 (int) – Maximum number of unique values in key2.

  • filter (ndarray of bools, optional) – Boolean array with same length as key1 array, defaults to None.

Returns:

  • Two arrays: iKey (for the 2 dimensions) and nCountGroup.

  • iKey (the bin array) is a 1-based index array in which each False filter value sets the index to 0.

  • nCountGroup is an INT32 array with size equal to (unique_count1 + 1) * (unique_count2 + 1).

riptable.rt_numpy.combine_filter(key, filter)
Parameters:
  • key (ndarray of ints) – index array (int8, int16, int32 or int64)

  • filter (ndarray of bools) – Boolean array same length as key.

Returns:

A 1-based index array in which each False filter value sets the index to 0. This is equivalent to index * filter or np.where(filter, index, 0).

Return type:

ndarray of ints

Notes

This routine can run in parallel.
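A small illustrative sketch, not from the original documentation; the expected output assumes the equivalence stated above (index * filter):

>>> key = rt.FastArray([1, 2, 3, 2, 1], dtype=rt.int32)   # base-1 bin index
>>> filt = rt.FastArray([True, False, True, True, False])
>>> rt.combine_filter(key, filt)   # expected to match key * filt: FastArray([1, 0, 3, 2, 0])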

riptable.rt_numpy.concatenate(*args, **kwargs)
riptable.rt_numpy.crc32c(arr)

Calculate the 32-bit CRC of the data in an array using the Castagnoli polynomial (CRC32C).

This function does not consider the array’s shape or strides when calculating the CRC; it simply calculates the CRC value over the entire buffer described by the array.

Parameters:

arr

Returns:

The 32-bit CRC value calculated from the array data.

Return type:

int

Notes

TODO: Warn when the array has non-default striding, as that is not currently respected by the implementation of this function.
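A brief usage sketch, not from the original documentation; the exact checksum value depends on the bytes in the array's buffer, so it is not shown:

>>> a = rt.arange(10)
>>> rt.crc32c(a)        # returns a 32-bit int; the value depends on the buffer contents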

riptable.rt_numpy.crc64(arr)
riptable.rt_numpy.cumsum(*args, **kwargs)
riptable.rt_numpy.diff(*args, **kwargs)
riptable.rt_numpy.double(a)
riptable.rt_numpy.empty(shape, dtype=float, order='C')

Return a new array of specified shape and type, without initializing entries.

Parameters:
  • shape (int or tuple of int) – Shape of the empty array, e.g., (2, 3) or 2. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • dtype (str or NumPy dtype or Riptable dtype, default numpy.float64) – The desired data type for the array.

  • order ({'C', 'F'}, default 'C') – Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory.

Returns:

A new FastArray of uninitialized (arbitrary) data of the specified shape and type.

Return type:

FastArray

See also

riptable.empty_like, riptable.ones, riptable.ones_like, riptable.zeros, riptable.zeros_like, riptable.empty, riptable.full, Categorical.full

Notes

Unlike zeros, empty doesn’t set the array values to zero, so it may be marginally faster. On the other hand, it requires the user to manually set all the values in the array, so it should be used with caution.

Examples

>>> rt.empty(5)
FastArray([0.  , 0.25, 0.5 , 0.75, 1.  ])  # uninitialized
>>> rt.empty(5, dtype = int)
FastArray([80288976,        0,        0,        0,        1])  # uninitialized
riptable.rt_numpy.empty_like(array, dtype=None, order='K', subok=True, shape=None)

Return a new array with the same shape and type as the specified array, without initializing entries.

Parameters:
  • array (array) – The shape and data type of array define the same attributes of the returned array. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • dtype (str or NumPy dtype or Riptable dtype, optional) – Overrides the data type of the result.

  • order ({'K', 'C', 'F', or 'A'}, default 'K') – Overrides the memory layout of the result. ‘K’ (the default) means match the layout of array as closely as possible. ‘C’ means row-major (C-style); ‘F’ means column-major (Fortran-style); ‘A’ means ‘F’ if array is Fortran-contiguous, ‘C’ otherwise.

  • subok (bool, default True) – If True (the default), then the newly created array will use the sub-class type of array, otherwise it will be a base-class array.

  • shape (int or sequence of ints, optional) – Overrides the shape of the result. If order=’K’ and the number of dimensions is unchanged, it will try to keep the same order; otherwise, order=’C’ is implied. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

Returns:

A new FastArray of uninitialized (arbitrary) data with the same shape and type as array.

Return type:

FastArray

See also

riptable.empty, riptable.ones, riptable.ones_like, riptable.zeros, riptable.zeros_like, riptable.full, Categorical.full

Examples

>>> a = rt.FastArray([1, 2, 3, 4])
>>> rt.empty_like(a)
FastArray([ 1814376192,  1668069856, -1994737310,   746250422])  # uninitialized
>>> rt.empty_like(a, dtype = float)
FastArray([0.25, 0.5 , 0.75, 1.  ])  # uninitialized
riptable.rt_numpy.floor(*args, **kwargs)
riptable.rt_numpy.full(shape, fill_value, dtype=None, order='C')

Return a new array of a specified shape and type, filled with a specified value.

Parameters:
  • shape (int or sequence of int) – Shape of the new array, e.g., (2, 3) or 2. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • fill_value (scalar or array) – Fill value. For 1-dimensional arrays, only scalar values are accepted.

  • dtype (str or NumPy dtype or Riptable dtype, optional) – The desired data type for the array. The default is the data type that would result from creating a FastArray with the specified fill_value: rt.FastArray(fill_value).dtype.

  • order ({'C', 'F'}, default 'C') – Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory.

Returns:

A new FastArray of the specified shape and type, filled with the specified value.

Return type:

FastArray

See also

Categorical.full, riptable.ones, riptable.ones_like, riptable.zeros, riptable.zeros_like, riptable.empty, riptable.empty_like

Examples

>>> rt.full(5, 2)
FastArray([2, 2, 2, 2, 2])
>>> rt.full(5, 2.0)
FastArray([2., 2., 2., 2., 2.])

Specify a data type:

>>> rt.full(5, 2, dtype = float)
FastArray([2., 2., 2., 2., 2.])
riptable.rt_numpy.full_like(a, fill_value, dtype=None, order='K', subok=True, shape=None)

Return a full array with the same shape and type as a given array.

Parameters:
  • a (array) – The shape and data type of a define the same attributes of the returned array. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • fill_value (scalar or array_like) – Fill value.

  • dtype (str or NumPy dtype or Riptable dtype, optional) – Overrides the data type of the result.

  • order ({'C', 'F', 'A', or 'K'}, default 'K') – Overrides the memory layout of the result. ‘C’ means row-major (C-style), ‘F’ means column-major (Fortran-style), ‘A’ means ‘F’ if a is Fortran-contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible.

  • subok (bool, default True) – If True (the default), then the newly created array will use the sub-class type of a, otherwise it will be a base-class array.

  • shape (int or sequence of int, optional) – Overrides the shape of the result. If order=’K’ and the number of dimensions is unchanged, it will try to keep the same order; otherwise, order=’C’ is implied. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

Returns:

A FastArray with the same shape and data type as the specified array, filled with fill_value.

Return type:

FastArray

See also

riptable.ones, riptable.zeros, riptable.zeros_like, riptable.empty, riptable.empty_like, riptable.full

Examples

>>> a = rt.FastArray([1, 2, 3, 4])
>>> rt.full_like(a, 9)
FastArray([9, 9, 9, 9])
>>> rt.full_like(a, 9, dtype = float)
FastArray([9., 9., 9., 9.])
riptable.rt_numpy.get_common_dtype(x, y)

Return the dtype of two arrays, or two scalars, or a scalar and an array.

Normal Python ints will be typed as int32 or int64 (not int8 or int16). Used in where, put, take, putmask.

Parameters:
  • x (scalar or array_like) – A scalar and/or array to find the common dtype of.

  • y (scalar or array_like) – A scalar and/or array to find the common dtype of.

Returns:

The data type (dtype) common to both x and y. If the objects don’t have exactly the same dtype, returns the dtype which both types could be implicitly coerced to.

Return type:

data-type

Examples

>>> get_common_dtype('test','hello')
dtype('<U5')
>>> get_common_dtype(14,'hello')
dtype('<U16')
>>> get_common_dtype(14,b'hello')
dtype('<S16')
>>> get_common_dtype(14, 17)
dtype('int32')
>>> get_common_dtype(arange(10), arange(10.0))
dtype('float64')
>>> get_common_dtype(arange(10).astype(bool), True)
dtype('bool')
riptable.rt_numpy.get_dtype(val)

Return the dtype of an array, list, or builtin int, float, bool, str, bytes.

Parameters:

val – An object to get the dtype of.

Returns:

The data-type (dtype) for val (if it has a dtype), or a dtype compatible with val.

Return type:

data-type

Notes

For a Python integer, int32 or int64 is used (never uint). For a Python float, float64 is always returned. For a string, a 'U' or 'S' dtype with its size is returned.

TODO: consider pushing down into C++

Examples

>>> get_dtype(10)
dtype('int32')
>>> get_dtype(123.45)
dtype('float64')
>>> get_dtype('hello')
dtype('<U5')
>>> get_dtype(b'hello')
dtype('S5')
riptable.rt_numpy.groupby(list_arrays, filter=None, cutoffs=None, base_index=1, lex=False, rec=False, pack=False, hint_size=0)

Main routine used to groupby one or more keys.

Parameters:
  • list_arrays (list of ndarray) – A list of numpy arrays to hash on (multikey). All arrays must be the same size.

  • filter (ndarray of bool, optional) – A boolean array the same length as the arrays in list_arrays used to pre-filter the input data before passing it to the grouping algorithm, defaults to None.

  • cutoffs (ndarray, optional) – INT64 array of cutoffs

  • base_index (int) –

  • lex (bool, default False) – If False, groupbyhash is called; if True, groupbylex is called.

  • rec (bool) – When set to true, a record array is created, and then the data is sorted. A record array is faster, but may not produce a true lexicographical sort. Defaults to False. Only applicable when lex is True.

  • pack (bool) – Set to True to return iGroup, iFirstGroup, nCountGroup also; defaults to False. This is only meaningful when using hash-based grouping – when lex is True, the sorting-based grouping always computes and returns this information.

  • hint_size (int) – An integer hint if the number of unique keys is known in advance, defaults to zero. Only applicable when using hash-based grouping (i.e. lex is False).

Notes

Ends up calling groupbyhash or groupbylex.
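A minimal usage sketch, not from the original documentation. It assumes groupby returns the same dictionary structure as groupbyhash, to which it delegates by default (lex=False):

>>> key = rt.FastArray([b'a', b'b', b'a', b'c'])
>>> result = rt.groupby([key])                 # hash-based grouping by default
>>> result['iKey'], result['unique_count']     # assumed keys, matching the groupbyhash output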

riptable.rt_numpy.groupbyhash(list_arrays, hint_size=0, filter=None, hash_mode=2, cutoffs=None, pack=False)

Find unique values in an array using a linear hashing algorithm.

Find unique values in an array using a linear hashing algorithm; it will then bin each group according to first appearance. The zero bin is reserved for anything filtered out.

Parameters:
  • list_arrays (ndarray or list of ndarray) – A single numpy array or a list of numpy arrays to hash on (multikey); all arrays must be the same size.

  • hint_size (int, optional) – An integer hint if the number of unique keys is known in advance, defaults to zero.

  • filter (ndarray of bool, optional) – A boolean filter to pre-filter the values on, defaults to None.

  • hash_mode (int) – Setting for controlling the hashing mode; defaults to 2. Users generally should not override the default value of this parameter.

  • cutoffs (ndarray, optional) – An int64 array of cutoffs, defaults to None.

  • pack (bool) – Set to True to return iGroup, iFirstGroup, nCountGroup also; defaults to False.

Returns:

  • A dictionary of 3 arrays

  • 'iKey' (array size is same as multikey, the unique key to which each row in multikey belongs)

  • 'iFirstKey' (array size is same as unique keys, index into the first row for that unique key)

  • 'unique_count' (number of uniques (not including the zero bin))

Examples

>>> np.random.seed(12345)
>>> c = np.random.randint(0, 8000, 2_000_000)
>>> rt.groupbyhash(c)
{'iKey': FastArray([   1,    2,    3, ..., 6061, 7889, 3002]),
 'iFirstKey': FastArray([    0,     1,     2, ..., 67072, 67697, 68250]),
 'unique_count': 8000,
 'iGroup': None,
 'iFirstGroup': None,
 'nCountGroup': None}

The ‘pack’ parameter can be overridden to True to calculate additional information about the relationship between elements in the input array and their group. Note this information is the same type of information groupbylex returns by default.

>>> rt.groupbyhash(c, pack=True)
{'iKey': FastArray([1, 2, 2, ..., 4, 6, 1]),
 'iFirstKey': FastArray([ 0,  1,  3,  4,  6, 14, 18, 20]),
 'unique_count': 8,
 'iGroup': FastArray([   0,    9,   21, ..., 9988, 9991, 9992]),
 'iFirstGroup': FastArray([   0,    0, 1213, 2465, 3761, 4987, 6239, 7522, 8797]),
 'nCountGroup': FastArray([   0, 1213, 1252, 1296, 1226, 1252, 1283, 1275, 1203])}

The output from groupbyhash is useful as an input to rc.BinCount:

>>> x = rt.groupbyhash(c)
>>> rc.BinCount(x['iKey'], x['unique_count'] + 1)
FastArray([  0, 251, 262, ..., 239, 217, 246])

A filter (boolean array) can be passed to groupbyhash; this causes groupbyhash to only operate on the elements of the input array where the filter has a corresponding True value.

>>> f = (c % 3).astype(bool)
>>> rt.groupbyhash(c, filter=f)
{'iKey': FastArray([   0,    1,    2, ...,    0, 5250, 1973]),
 'iFirstKey': FastArray([    1,     2,     3, ..., 54422, 58655, 68250]),
 'unique_count': 5333,
 'iGroup': None,
 'iFirstGroup': None,
 'nCountGroup': None}

The groupbyhash function can also operate on multikeys (tuple keys).

>>> d = np.random.randint(0, 8000, 2_000_000)
>>> rt.groupbyhash([c, d])
{'iKey': FastArray([      1,       2,       3, ..., 1968854, 1968855, 1968856]),
 'iFirstKey': FastArray([      0,       1,       2, ..., 1999997, 1999998, 1999999]),
 'unique_count': 1968856,
 'iGroup': None,
 'iFirstGroup': None,
 'nCountGroup': None}
riptable.rt_numpy.groupbylex(list_arrays, filter=None, cutoffs=None, base_index=1, rec=False)
Parameters:
  • list_arrays (ndarray or list of ndarray) – A list of numpy arrays to hash on (multikey). All arrays must be the same size.

  • filter (ndarray of bool, optional) – A boolean array of true/false filters, defaults to None.

  • cutoffs (ndarray, optional) – INT64 array of cutoffs

  • base_index (int) –

  • rec (bool) – When set to true, a record array is created, and then the data is sorted. A record array is faster, but may not produce a true lexicographical sort. Defaults to False.

Returns:

  • A dict of 6 numpy arrays

  • iKey (array size is same as multikey, the unique key to which each row in multikey belongs)

  • iFirstKey (array size is same as unique keys, index into the first row for that unique key)

  • unique_count (number of uniques)

  • iGroup (result from lexsort (fancy index sort of list_arrays))

  • iFirstGroup (array size is same as unique keys + 1: offset into iGroup)

  • nCountGroup (array size is same as unique keys + 1: length of slice in iGroup)

Examples

>>> a = rt.arange(100).astype('S')
>>> f = rt.logical(rt.arange(100) % 3)
>>> rt.groupbylex([a], filter=f)
{'iKey': FastArray([ 0,  1,  9,  0, 23, 31,  0, 45, 53,  0,  2,  3,  0,  4,  5,  0,
        6,  7,  0,  8, 10,  0, 11, 12,  0, 13, 14,  0, 15, 16,  0, 17,
       18,  0, 19, 20,  0, 21, 22,  0, 24, 25,  0, 26, 27,  0, 28, 29,
        0, 30, 32,  0, 33, 34,  0, 35, 36,  0, 37, 38,  0, 39, 40,  0,
       41, 42,  0, 43, 44,  0, 46, 47,  0, 48, 49,  0, 50, 51,  0, 52,
       54,  0, 55, 56,  0, 57, 58,  0, 59, 60,  0, 61, 62,  0, 63, 64,
        0, 65, 66,  0]),
 'iFirstKey': FastArray([ 1, 10, 11, 13, 14, 16, 17, 19,  2, 20, 22, 23, 25, 26, 28, 29,
       31, 32, 34, 35, 37, 38,  4, 40, 41, 43, 44, 46, 47, 49,  5, 50,
       52, 53, 55, 56, 58, 59, 61, 62, 64, 65, 67, 68,  7, 70, 71, 73,
       74, 76, 77, 79,  8, 80, 82, 83, 85, 86, 88, 89, 91, 92, 94, 95,
       97, 98]),
 'unique_count': 66,
 'iGroup': FastArray([ 1, 10, 11, 13, 14, 16, 17, 19,  2, 20, 22, 23, 25, 26, 28, 29,
       31, 32, 34, 35, 37, 38,  4, 40, 41, 43, 44, 46, 47, 49,  5, 50,
       52, 53, 55, 56, 58, 59, 61, 62, 64, 65, 67, 68,  7, 70, 71, 73,
       74, 76, 77, 79,  8, 80, 82, 83, 85, 86, 88, 89, 91, 92, 94, 95,
       97, 98]),
 'iFirstGroup': FastArray([66,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
       15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
       31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
       47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,
       63, 64, 65]),
 'nCountGroup': FastArray([34,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1])}
riptable.rt_numpy.groupbypack(ikey, ncountgroup, unique_count=None, cutoffs=None)

A routine often called after groupbyhash or groupbylex. Operates on binned integer arrays only (int8, int16, int32, or int64).

Parameters:
  • ikey (ndarray of ints) – iKey from groupbyhash or groupbylex

  • ncountgroup (ndarray of ints, optional) – From rc.BinCount or hash, if passed in it will be returned unchanged as part of this function’s output.

  • unique_count (int, optional) – Required if ncountgroup is None; otherwise ignored. Must include the 0 bin, so +1 is often added.

  • cutoffs (array_like, optional) – cutoff array for parallel processing

Returns:

  • 3 arrays in a dict

  • [‘iGroup’] (array size is same as ikey, unique keys are grouped together)

  • [‘iFirstGroup’] (array size is number of unique keys, indexes into iGroup)

  • [‘nCountGroup’] (array size is number of unique keys, how many in each group)

Examples

>>> np.random.seed(12345)
>>> c = np.random.randint(0, 8, 10_000)
>>> x = rt.groupbyhash(c)
>>> ncountgroup = rc.BinCount(x['iKey'], x['unique_count'] + 1)
>>> rt.groupbypack(x['iKey'], ncountgroup)
{'iGroup': FastArray([   0,    9,   21, ..., 9988, 9991, 9992]),
 'iFirstGroup': FastArray([   0,    0, 1213, 2465, 3761, 4987, 6239, 7522, 8797]),
 'nCountGroup': FastArray([   0, 1213, 1252, 1296, 1226, 1252, 1283, 1275, 1203])}

The sum of the entries in the nCountGroup array returned by groupbypack matches the length of the original array.

>>> rt.groupbypack(x['iKey'], ncountgroup)['nCountGroup'].sum()
10000
riptable.rt_numpy.hstack(tup, dtype=None, **kwargs)

See numpy.hstack. The riptable version can also take a dtype (it will convert all arrays to that dtype while stacking), preserves sentinels, and is multithreaded. For special classes like Categorical and Dataset, it will check whether the class has its own hstack and call that instead.
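A brief sketch, not from the original documentation, of the dtype keyword described above; the float output assumes the documented conversion while stacking:

>>> a = rt.FastArray([1, 2])
>>> b = rt.FastArray([3, 4])
>>> rt.hstack([a, b])                      # expected: FastArray([1, 2, 3, 4])
>>> rt.hstack([a, b], dtype=np.float64)    # converts while stacking; expected: FastArray([1., 2., 3., 4.])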

riptable.rt_numpy.interp(x, xp, fp)

One-dimensional or two-dimensional linear interpolation with clipping.

Returns the one-dimensional piecewise linear interpolant to a function with given discrete data points (xp, fp), evaluated at x.

Parameters:
  • x (array of float32 or float64) – The x-coordinates at which to evaluate the interpolated values.

  • xp (1-D or 2-D sequence of float32 or float64) – The x-coordinates of the data points, must be increasing if argument period is not specified. Otherwise, xp is internally sorted after normalizing the periodic boundaries with xp = xp % period.

  • fp (1-D or 2-D sequence of float32 or float64) – The y-coordinates of the data points, same length as xp.

Returns:

y – The interpolated values, same shape as x.

Return type:

float32 or float64 (corresponding to fp) or ndarray

See also

np.interp, rt.interp_extrap

Notes

  • The riptable version does not handle the left/right kwargs, whereas np does.

  • The riptable version handles floats or doubles, whereas np is always a double.

  • riptable will warn if the first parameter is a float32 but xp or fp is a double.
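An illustrative sketch, not from the original documentation, assuming the clipping behavior described above for points beyond the range of xp:

>>> xp = rt.FastArray([0.0, 1.0, 2.0])
>>> fp = rt.FastArray([0.0, 10.0, 20.0])
>>> rt.interp(rt.FastArray([0.5, 1.5, 5.0]), xp, fp)   # expected: FastArray([ 5., 15., 20.]); 5.0 clips to fp[-1]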

riptable.rt_numpy.interp_extrap(x, xp, fp)

One-dimensional or two-dimensional linear interpolation without clipping.

Returns the one-dimensional piecewise linear interpolant to a function with given discrete data points (xp, fp), evaluated at x.

See also

np.interp, rt.interp

Notes

  • riptable version handles floats or doubles, whereas np is always a double

  • 2d mode is auto-detected based on xp/fp
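By contrast with interp, a sketch of the non-clipping behavior (not from the original documentation); the value past xp[-1] is assumed to be linearly extrapolated from the last segment:

>>> xp = rt.FastArray([0.0, 1.0, 2.0])
>>> fp = rt.FastArray([0.0, 10.0, 20.0])
>>> rt.interp_extrap(rt.FastArray([3.0]), xp, fp)   # expected: FastArray([30.]), not clipped to 20.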

riptable.rt_numpy.isfinite(*args, **kwargs)

Return True for each finite element, False otherwise.

A value is considered to be finite if it’s not positive or negative infinity or a NaN (Not a Number).

Returns:

For array input, a FastArray of booleans is returned that’s True for each element that’s finite, False otherwise. For scalar input, a boolean is returned.

Return type:

FastArray or bool

See also

riptable.isnotfinite, riptable.isinf, riptable.isnotinf, FastArray.isfinite, FastArray.isnotfinite, FastArray.isinf, FastArray.isnotinf

Dataset.mask_or_isfinite

Return a boolean array that’s True for each Dataset row that has at least one finite value.

Dataset.mask_and_isfinite

Return a boolean array that’s True for each Dataset row that contains all finite values.

Dataset.mask_or_isinf

Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.

Dataset.mask_and_isinf

Return a boolean array that’s True for each Dataset row that contains all infinite values.

Examples

>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> rt.isfinite(a)
FastArray([False, False, False,  True])
>>> rt.isfinite(1)
True
riptable.rt_numpy.isinf(*args, **kwargs)

Return True for each element that’s positive or negative infinity, False otherwise.

Returns:

For array input, a FastArray of booleans is returned that’s True for each element that’s positive or negative infinity, False otherwise. For scalar input, a boolean is returned.

Return type:

FastArray or bool

See also

riptable.isnotinf, riptable.isfinite, riptable.isnotfinite, FastArray.isinf, FastArray.isnotinf, FastArray.isfinite, FastArray.isnotfinite

Dataset.mask_or_isfinite

Return a boolean array that’s True for each Dataset row that has at least one finite value.

Dataset.mask_and_isfinite

Return a boolean array that’s True for each Dataset row that contains all finite values.

Dataset.mask_or_isinf

Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.

Dataset.mask_and_isinf

Return a boolean array that’s True for each Dataset row that contains all infinite values.

Examples

>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> rt.isinf(a)
FastArray([ True,  True, False, False])
>>> rt.isinf(1)
False
riptable.rt_numpy.ismember(a, b, h=2, hint_size=0, base_index=0)

The ismember function is meant to mimic the ismember function in MATLAB. It takes two sets of data and returns two arrays: a boolean array indicating whether each element of a is found in b, and an array of indices of the first occurrence of each element of a in b (the invalid value where there is no match).

Parameters:
  • a (list, tuple, chararray, or ndarray) – A python list (strings), python tuple (strings), chararray, ndarray of unicode strings, or ndarray of int32, int64, float32, or float64.

  • b (list, tuple, chararray, or ndarray) – A list with the same constraints as a. Note: if a contains string data, b must also contain string data. If it contains different numerical data, casting will occur in either a or b.

  • h (int, default 2) – There are currently two different hashing functions that can be used to execute ismember. Depending on the size, type, and number of matches in the data, the hashes perform differently. Currently accepts 1 or 2: 1=PRIME number (might be faster for floats - uses less memory); 2=MASK using power of 2 (usually faster but uses more memory).

  • hint_size (int, default 0) – For large arrays with a low unique count, setting this value to 4*expected unique count may speed up hashing.

  • base_index (int, default 0) – When set to 1 the first return argument is no longer a boolean array but an integer that is 1 or 0. A return value of 1 indicates there exists values in b that do not exist in a.

Returns:

  • c (int or np.ndarray of bool) – A boolean array the same size as a indicating whether or not the element at the corresponding index in a was found in b.

  • d (np.ndarray of int) – An array of indices the same size as a, each of which indicates where an element in a first occurred in b, or NaN otherwise.

Raises:
  • TypeError – input must be ndarray, python list, or python tuple

  • ValueError – data must be int32, int64, float32, float64, chararray, or unicode strings. If a contains string data, b must also contain string data and vice versa.

Examples

>>> a = [1.0, 2.0, 3.0, 4.0]
>>> b = [1.0, 3.0, 4.0, 4.0]
>>> c,d = ismember(a,b)
>>> c
FastArray([ True, False,  True,  True])
>>> d
FastArray([   0, -128,    1,    2], dtype=int8)

NaN values do not behave the same way as other elements. A NaN in the first array will not register as existing in the second array. This is the expected behavior (to match MATLAB's NaN handling):

>>> a = FastArray([1.,2.,3.,np.nan])
>>> b = FastArray([2.,3.,np.nan])
>>> c,d = ismember(a,b)
>>> c
FastArray([False,  True,  True, False])
>>> d
FastArray([-128,    0,    1, -128], dtype=int8)
riptable.rt_numpy.isnan(*args, **kwargs)

Return True for each element that’s a NaN (Not a Number), False otherwise.

Returns:

For array input, a FastArray of booleans is returned that’s True for each element that’s a NaN, False otherwise. For scalar input, a boolean is returned.

Return type:

FastArray or bool

See also

riptable.isnotnan, riptable.isnanorzero, FastArray.isnan, FastArray.isnotnan, FastArray.notna, FastArray.isnanorzero, Categorical.isnan, Categorical.isnotnan, Categorical.notna, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan

Dataset.mask_or_isnan

Return a boolean array that’s True for each Dataset row that contains at least one NaN.

Dataset.mask_and_isnan

Return a boolean array that’s True for each all-NaN Dataset row.

Examples

>>> a = rt.FastArray([rt.nan, rt.inf, 2])
>>> rt.isnan(a)
FastArray([ True, False, False])
>>> rt.isnan(0)
False
riptable.rt_numpy.isnanorzero(*args, **kwargs)

Return True for each element that’s a NaN (Not a Number) or zero, False otherwise.

Returns:

For array input, a FastArray of booleans is returned that’s True for each element that’s a NaN or zero, False otherwise. For scalar input, a boolean is returned.

Return type:

FastArray or bool

See also

FastArray.isnanorzero, riptable.isnan, riptable.isnotnan, FastArray.isnan, FastArray.isnotnan, Categorical.isnan, Categorical.isnotnan, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan

Dataset.mask_or_isnan

Return a boolean array that’s True for each Dataset row that contains at least one NaN.

Dataset.mask_and_isnan

Return a boolean array that’s True for each all-NaN Dataset row.

Examples

>>> a = rt.FastArray([0, rt.nan, rt.inf, 3])
>>> rt.isnanorzero(a)
FastArray([ True,  True, False, False])
>>> rt.isnanorzero(0)
True
riptable.rt_numpy.isnotfinite(*args, **kwargs)

Return True for each non-finite element, False otherwise.

A value is considered to be finite if it’s not positive or negative infinity or a NaN (Not a Number).

Returns:

For array input, a FastArray of booleans is returned that’s True for each non-finite element, False otherwise. For scalar input, a boolean is returned.

Return type:

FastArray or bool

See also

riptable.isfinite, riptable.isinf, riptable.isnotinf, FastArray.isfinite, FastArray.isnotfinite, FastArray.isinf, FastArray.isnotinf

Dataset.mask_or_isfinite

Return a boolean array that’s True for each Dataset row that has at least one finite value.

Dataset.mask_and_isfinite

Return a boolean array that’s True for each Dataset row that contains all finite values.

Dataset.mask_or_isinf

Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.

Dataset.mask_and_isinf

Return a boolean array that’s True for each Dataset row that contains all infinite values.

Examples

>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> rt.isnotfinite(a)
FastArray([ True,  True,  True, False])
>>> rt.isnotfinite(1)
False
riptable.rt_numpy.isnotinf(*args, **kwargs)

Return True for each element that’s not positive or negative infinity, False otherwise.

Returns:

For array input, a FastArray of booleans is returned that’s True for each element that’s not positive or negative infinity, False otherwise. For scalar input, a boolean is returned.

Return type:

FastArray or bool

See also

riptable.isinf, FastArray.isnotinf, FastArray.isinf, riptable.isfinite, riptable.isnotfinite, FastArray.isfinite, FastArray.isnotfinite

Dataset.mask_or_isfinite

Return a boolean array that’s True for each Dataset row that has at least one finite value.

Dataset.mask_and_isfinite

Return a boolean array that’s True for each Dataset row that contains all finite values.

Dataset.mask_or_isinf

Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.

Dataset.mask_and_isinf

Return a boolean array that’s True for each Dataset row that contains all infinite values.

Examples

>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> rt.isnotinf(a)
FastArray([False, False,  True,  True])
>>> rt.isnotinf(1)
True
riptable.rt_numpy.isnotnan(*args, **kwargs)

Return True for each element that’s not a NaN (Not a Number), False otherwise.

Returns:

For array input, a FastArray of booleans is returned that’s True for each element that’s not a NaN, False otherwise. For scalar input, a boolean is returned.

Return type:

FastArray or bool

See also

riptable.isnan, riptable.isnanorzero, FastArray.isnan, FastArray.isnotnan, FastArray.notna, FastArray.isnanorzero, Categorical.isnan, Categorical.isnotnan, Categorical.notna, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan

Dataset.mask_or_isnan

Return a boolean array that’s True for each Dataset row that contains at least one NaN.

Dataset.mask_and_isnan

Return a boolean array that’s True for each all-NaN Dataset row.

Examples

>>> a = rt.FastArray([rt.nan, rt.inf, 2])
>>> rt.isnotnan(a)
FastArray([False,  True,  True])
>>> rt.isnotnan(0)
True
riptable.rt_numpy.issorted(*args)

Return True if the array is sorted, False otherwise.

NaNs at the end of an array are considered sorted.

Parameters:

*args (ndarray) – The array to check. It must be one-dimensional and contiguous.

Returns:

True if the array is sorted, False otherwise.

Return type:

bool

See also

FastArray.issorted

Examples

>>> a = rt.FastArray(['a', 'c', 'b'])
>>> rt.issorted(a)
False
>>> a = rt.FastArray([1.0, 2.0, 3.0, rt.nan])
>>> rt.issorted(a)
True
>>> cat = rt.Categorical(['a', 'a', 'a', 'b', 'b'])
>>> rt.issorted(cat)
True
>>> dt = rt.Date.range('20190201', '20190208')
>>> rt.issorted(dt)
True
>>> dtn = rt.DateTimeNano(['6/30/19', '1/30/19'], format='%m/%d/%y', from_tz='NYC')
>>> rt.issorted(dtn)
False
riptable.rt_numpy.lexsort(*args, **kwargs)
riptable.rt_numpy.log(*args, **kwargs)
riptable.rt_numpy.log10(*args, **kwargs)
riptable.rt_numpy.logical(a)
riptable.rt_numpy.makeifirst(key, unique_count, filter=None)
Parameters:
  • key (ndarray of ints) – Index array (int8, int16, int32 or int64).

  • unique_count (int) – Maximum number of unique values in key array.

  • filter (ndarray of bools, optional) – Boolean array same length as key array, defaults to None.

Returns:

index – An index array of the same dtype and length of the key passed in. The index array will have the invalid value for the array’s dtype set at any locations it could not find a first occurrence.

Return type:

ndarray of ints

Notes

makeifirst will NOT reduce the index/ikey unique size even when a filter is passed. Based on the integer dtype int8/16/32/64, all locations that have no first occurrence will be set to invalid. If an invalid is used as a riptable fancy index, it will pull in another invalid, for example '' (empty string).

riptable.rt_numpy.makeilast(key, unique_count, filter=None)
Parameters:
  • key (ndarray of ints) – Index array (int8, int16, int32 or int64).

  • unique_count (int) – Maximum number of unique values in key array.

  • filter (ndarray of bools, optional) – Boolean array same length as key array, defaults to None.

Returns:

index – An index array of the same dtype and length as the key passed in. The index array will have the invalid value for the array’s dtype set at any location where a last occurrence could not be found.

Return type:

ndarray of ints

Notes

makeilast will NOT reduce the index/ikey unique size, even when a filter is passed. Based on the integer dtype (int8/16/32/64), all locations that have no last occurrence are set to the invalid value. If an invalid is used as a Riptable fancy index, it pulls in another invalid value (for example, the empty string '').
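
The same pattern applies for last occurrences (again a hedged sketch, assuming the function is exported as rt.makeilast):

>>> key = rt.FastArray([1, 2, 1, 3, 2], dtype=rt.int32)
>>> ilast = rt.makeilast(key, 3)   # index of the last occurrence for each key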

riptable.rt_numpy.makeinext(key, unique_count)
Parameters:
  • key (ndarray of ints) – Index array (int8, int16, int32 or int64).

  • unique_count (int) – Maximum number of unique values in the key array.

Returns:

An index array of the same dtype and length as key, pointing to the next row with the same key. The index array will have -MAX_INT set at any location where a next occurrence could not be found.
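
A hedged usage sketch (assuming the function is exported as rt.makeinext; output is not shown because the sentinel display depends on the dtype):

>>> key = rt.FastArray([1, 2, 1, 2], dtype=rt.int32)
>>> rt.makeinext(key, 2)   # per-row index of the next row with the same key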

riptable.rt_numpy.makeiprev(key, unique_count)
Parameters:
  • key (ndarray of ints) – Index array (int8, int16, int32 or int64).

  • unique_count (int) – Maximum number of unique values in the key array.

Returns:

An index array of the same dtype and length as key, pointing to the previous row with the same key. The index array will have -MAX_INT set at any location where a previous occurrence could not be found.
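
Analogously, a hedged sketch for the previous direction (assuming the function is exported as rt.makeiprev):

>>> key = rt.FastArray([1, 2, 1, 2], dtype=rt.int32)
>>> rt.makeiprev(key, 2)   # per-row index of the previous row with the same key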

riptable.rt_numpy.mask_and(*args, **kwargs)

Pass in a tuple or list of boolean arrays to AND together.

riptable.rt_numpy.mask_andi(*args, **kwargs)

In-place version: pass in a tuple or list of boolean arrays to AND together.

riptable.rt_numpy.mask_andnot(*args, **kwargs)

Pass in a tuple or list of boolean arrays to ANDNOT together.

riptable.rt_numpy.mask_andnoti(*args, **kwargs)

In-place version: pass in a tuple or list of boolean arrays to ANDNOT together.

riptable.rt_numpy.mask_or(*args, **kwargs)

Pass in a tuple or list of boolean arrays to OR together.

riptable.rt_numpy.mask_ori(*args, **kwargs)

In-place version: pass in a tuple or list of boolean arrays to OR together.

riptable.rt_numpy.mask_xor(*args, **kwargs)

Pass in a tuple or list of boolean arrays to XOR together.

riptable.rt_numpy.mask_xori(*args, **kwargs)

In-place version: pass in a tuple or list of boolean arrays to XOR together.
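
A brief illustration of the mask helpers (a hedged sketch; the boolean repr spacing may differ slightly):

>>> a = rt.FastArray([True, True, False, False])
>>> b = rt.FastArray([True, False, True, False])
>>> rt.mask_and([a, b])
FastArray([ True, False, False, False])
>>> rt.mask_or([a, b])
FastArray([ True,  True,  True, False])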

riptable.rt_numpy.max(*args, **kwargs)
riptable.rt_numpy.maximum(x1, x2, *args, **kwargs)
riptable.rt_numpy.mean(*args, filter=None, dtype=None, **kwargs)

Compute the arithmetic mean of the values in the first argument.

When possible, rt.mean(x, *args) calls x.mean(*args); look there for documentation. In particular, note whether the called function accepts the keyword arguments listed below.

For example, FastArray.mean accepts the filter and dtype keyword arguments, but Dataset.mean does not.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the mean calculation. If the filter is uniformly False, rt.mean raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.mean(dtype = my_type) is equivalent to my_type(x.mean()).

Returns:

Scalar for FastArray input. For Dataset input, returns a Dataset consisting of a row with each numerical column’s mean.

Return type:

scalar or Dataset

See also

nanmean

Computes the mean, ignoring NaNs.

Dataset.mean

Computes the mean of numerical Dataset columns.

FastArray.mean

Computes the mean of FastArray values.

GroupByOps.mean

Computes the mean of each group. Used by Categorical objects.

Notes

The dtype keyword for rt.mean specifies the data type of the result. This differs from numpy.mean, where it specifies the data type used to compute the mean.

Examples

>>> a = rt.FastArray([1, 3, 5, 7])
>>> rt.mean(a)
4.0

With a dtype specified:

>>> a = rt.FastArray([1, 3, 5, 7])
>>> rt.mean(a, dtype = rt.int32)
4

With a filter:

>>> a = rt.FastArray([1, 3, 5, 7])
>>> b = rt.FastArray([False, True, False, True])
>>> rt.mean(a, filter = b)
5.0
riptable.rt_numpy.median(*args, **kwargs)
riptable.rt_numpy.min(*args, **kwargs)
riptable.rt_numpy.minimum(x1, x2, *args, **kwargs)
riptable.rt_numpy.multikeyhash(*args)

Returns 7 arrays to help navigate data.

Returns:
  • key – the unique occurrence

  • nth – the nth unique occurrence

  • bktsize – how many unique occurrences occur

  • next – index to the next occurrence of the same key

  • prev – index to the previous occurrence of the same key

  • first – index to the first occurrence of the same key

  • last – index to the last occurrence of the same key

Examples

>>> myarr = rt.arange(10) % 3
>>> myarr
FastArray([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
>>> mkgrp = rt.Dataset(rt.multikeyhash([myarr]).asdict())
>>> mkgrp.a = myarr
>>> mkgrp
#   key   nth   bktsize   next   prev   first   last   a
-   ---   ---   -------   ----   ----   -----   ----   -
0     1     1         4      3     -1       0      9   0
1     2     1         3      4     -1       1      7   1
2     3     1         3      5     -1       2      8   2
3     1     2         4      6      0       0      9   0
4     2     2         3      7      1       1      7   1
5     3     2         3      8      2       2      8   2
6     1     3         4      9      3       0      9   0
7     2     3         3     -1      4       1      7   1
8     3     3         3     -1      5       2      8   2
9     1     4         4     -1      6       0      9   0
riptable.rt_numpy.nan_to_num(*args, **kwargs)

Parameters:

arg1 (ndarray) – The input array.

Returns:

The array with NaN values replaced by numbers (as numpy.nan_to_num does).

Return type:

ndarray

Notes

If you want to do this in place, contact TJD.
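
A hedged example, assuming behavior matching numpy.nan_to_num:

>>> a = rt.FastArray([1.0, rt.nan, 3.0])
>>> rt.nan_to_num(a)
FastArray([1., 0., 3.])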

riptable.rt_numpy.nan_to_zero(a)

Replace the NaN or invalid values in an array with zeroes.

This is an in-place operation – the input array is returned after being modified.

Parameters:

a (ndarray) – The input array.

Returns:

The input array a (after it’s been modified).

Return type:

ndarray
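
A hedged example showing the in-place behavior:

>>> a = rt.FastArray([1.0, rt.nan, 3.0])
>>> rt.nan_to_zero(a)
FastArray([1., 0., 3.])
>>> a
FastArray([1., 0., 3.])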

riptable.rt_numpy.nanargmax(*args, **kwargs)
riptable.rt_numpy.nanargmin(*args, **kwargs)
riptable.rt_numpy.nanmax(*args, **kwargs)
riptable.rt_numpy.nanmean(*args, filter=None, dtype=None, **kwargs)

Compute the arithmetic mean of the values in the first argument, ignoring NaNs.

If all values in the first argument are NaNs, 0.0 is returned.

When possible, rt.nanmean(x, *args) calls x.nanmean(*args); look there for documentation. In particular, note whether the called function accepts the keyword arguments listed below.

For example, FastArray.nanmean accepts the filter and dtype keyword arguments, but Dataset.nanmean does not.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the mean calculation. If the filter is uniformly False, rt.nanmean raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanmean(dtype = my_type) is equivalent to my_type(x.nanmean()).

Returns:

Scalar for FastArray input. For Dataset input, returns a Dataset consisting of a row with each numerical column’s mean.

Return type:

scalar or Dataset

See also

mean

Computes the mean.

Dataset.nanmean

Computes the mean of numerical Dataset columns, ignoring NaNs.

FastArray.nanmean

Computes the mean of FastArray values, ignoring NaNs.

GroupByOps.nanmean

Computes the mean of each group, ignoring NaNs. Used by Categorical objects.

Notes

The dtype keyword for rt.nanmean specifies the data type of the result. This differs from numpy.nanmean, where it specifies the data type used to compute the mean.

Examples

>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> rt.nanmean(a)
3.0

With a dtype specified:

>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> rt.nanmean(a, dtype = rt.int32)
3

With a filter:

>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> rt.nanmean(a, filter = b)
4.0
riptable.rt_numpy.nanmedian(*args, **kwargs)
riptable.rt_numpy.nanmin(*args, **kwargs)
riptable.rt_numpy.nanpercentile(*args, **kwargs)
riptable.rt_numpy.nanstd(*args, filter=None, dtype=None, **kwargs)

Compute the standard deviation of the values in the first argument, ignoring NaNs.

If all values in the first argument are NaNs, NaN is returned.

Riptable uses the convention that ddof = 1, meaning the standard deviation of [x_1, ..., x_n] is defined by std = sqrt(1/(n - 1) * sum((x_i - mean)**2)) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.

When possible, rt.nanstd(x, *args) calls x.nanstd(*args); look there for documentation. In particular, note whether the called function accepts the keyword arguments listed below.

For example, FastArray.nanstd accepts the filter and dtype keyword arguments, but Dataset.nanstd does not.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the standard deviation calculation. If the filter is uniformly False, rt.nanstd raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanstd(dtype = my_type) is equivalent to my_type(x.nanstd()).

Returns:

Scalar for FastArray input. For Dataset input, returns a Dataset consisting of a row with each numerical column’s standard deviation.

Return type:

scalar or Dataset

See also

std

Computes the standard deviation.

FastArray.nanstd

Computes the standard deviation of FastArray values, ignoring NaNs.

Dataset.nanstd

Computes the standard deviation of numerical Dataset columns, ignoring NaNs.

GroupByOps.nanstd

Computes the standard deviation of each group, ignoring NaNs. Used by Categorical objects.

Notes

The dtype keyword for rt.nanstd specifies the data type of the result. This differs from numpy.nanstd, where it specifies the data type used to compute the standard deviation.

Examples

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> rt.nanstd(a)
1.0

With a dtype specified:

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> rt.nanstd(a, dtype = rt.int32)
1

With filter:

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> rt.nanstd(a, filter = b)
0.7071067811865476
riptable.rt_numpy.nansum(*args, filter=None, dtype=None, **kwargs)

Compute the sum of the values in the first argument, ignoring NaNs.

If all values in the first argument are NaNs, 0.0 is returned.

When possible, rt.nansum(x, *args) calls x.nansum(*args); look there for documentation. In particular, note whether the called function accepts the keyword arguments listed below.

For example, FastArray.nansum accepts the filter and dtype keyword arguments, but Dataset.nansum does not.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the sum calculation. If the filter is uniformly False, rt.nansum returns 0.0.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nansum(dtype = my_type) is equivalent to my_type(x.nansum()).

Returns:

Scalar for FastArray input. For Dataset input, returns a Dataset consisting of a row with each numerical column’s sum.

Return type:

scalar or Dataset

See also

sum

Sums the values of the input.

FastArray.nansum

Sums the values of a FastArray, ignoring NaNs.

Dataset.nansum

Sums the values of numerical Dataset columns, ignoring NaNs.

GroupByOps.nansum

Sums the values of each group, ignoring NaNs. Used by Categorical objects.

Notes

The dtype keyword for rt.nansum specifies the data type of the result. This differs from numpy.nansum, where it specifies the data type used to compute the sum.

Examples

>>> a = rt.FastArray([1, 3, 5, 7, rt.nan])
>>> rt.nansum(a)
16.0

With a dtype specified:

>>> a = rt.FastArray([1.0, 3.0, 5.0, 7.0, rt.nan])
>>> rt.nansum(a, dtype = rt.int32)
16

With a filter:

>>> a = rt.FastArray([1, 3, 5, 7, rt.nan])
>>> b = rt.FastArray([False, True, False, True, True])
>>> rt.nansum(a, filter = b)
10.0
riptable.rt_numpy.nanvar(*args, filter=None, dtype=None, **kwargs)

Compute the variance of the values in the first argument, ignoring NaNs.

If all values in the first argument are NaNs, NaN is returned.

Riptable uses the convention that ddof = 1, meaning the variance of [x_1, ..., x_n] is defined by var = 1/(n - 1) * sum((x_i - mean)**2) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.

When possible, rt.nanvar(x, *args) calls x.nanvar(*args); look there for documentation. In particular, note whether the called function accepts the keyword arguments listed below.

For example, FastArray.nanvar accepts the filter and dtype keyword arguments, but Dataset.nanvar does not.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the variance calculation. If the filter is uniformly False, rt.nanvar raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanvar(dtype = my_type) is equivalent to my_type(x.nanvar()).

Returns:

Scalar for FastArray input. For Dataset input, returns a Dataset consisting of a row with each numerical column’s variance.

Return type:

scalar or Dataset

See also

var

Computes the variance.

FastArray.nanvar

Computes the variance of FastArray values, ignoring NaNs.

Dataset.nanvar

Computes the variance of numerical Dataset columns, ignoring NaNs.

GroupByOps.nanvar

Computes the variance of each group, ignoring NaNs. Used by Categorical objects.

Notes

The dtype keyword for rt.nanvar specifies the data type of the result. This differs from numpy.nanvar, where it specifies the data type used to compute the variance.

Examples

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> rt.nanvar(a)
1.0

With a dtype specified:

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> rt.nanvar(a, dtype = rt.int32)
1

With a filter:

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> rt.nanvar(a, filter = b)
0.5
riptable.rt_numpy.ones(shape, dtype=None, order='C', *, like=None)

Return a new array of the specified shape and data type, filled with ones.

Parameters:
  • shape (int or sequence of int) – Shape of the new array, e.g., (2, 3) or 2. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • dtype (str or NumPy dtype or Riptable dtype, default numpy.float64) – The desired data type for the array.

  • order ({'C', 'F'}, default 'C') – Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory.

  • like (array_like, default None) – Reference object to allow the creation of arrays that are not NumPy arrays. If an array-like passed in as like supports the __array_function__ protocol, the result will be defined by it. In this case, it ensures the creation of an array object compatible with that passed in via this argument.

Returns:

A new FastArray of the specified shape and type, filled with ones.

Return type:

FastArray

See also

riptable.ones_like, riptable.zeros, riptable.zeros_like, riptable.empty, riptable.empty_like, riptable.full

Examples

>>> rt.ones(5)
FastArray([1., 1., 1., 1., 1.])
>>> rt.ones(5, dtype='int8')
FastArray([1, 1, 1, 1, 1], dtype=int8)
riptable.rt_numpy.ones_like(a, dtype=None, order='K', subok=True, shape=None)

Return an array of ones with the same shape and data type as the specified array.

Parameters:
  • a (array) – The shape and data type of a define the same attributes of the returned array. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • dtype (str or NumPy dtype or Riptable dtype, optional) – Overrides the data type of the result.

  • order ({'C', 'F', 'A', or 'K'}, default 'K') – Overrides the memory layout of the result. ‘C’ means row-major (C-style), ‘F’ means column-major (Fortran-style), ‘A’ means ‘F’ if a is Fortran-contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible.

  • subok (bool, default True) – If True (the default), then the newly created array will use the sub-class type of a, otherwise it will be a base-class array.

  • shape (int or sequence of int, optional) – Overrides the shape of the result. If order=’K’ and the number of dimensions is unchanged, it will try to keep the same order; otherwise, order=’C’ is implied. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

Returns:

A FastArray with the same shape and data type as the specified array, filled with ones.

Return type:

FastArray

See also

riptable.ones, riptable.zeros, riptable.zeros_like, riptable.empty, riptable.empty_like, riptable.full

Examples

>>> a = rt.FastArray([1, 2, 3, 4])
>>> rt.ones_like(a)
FastArray([1, 1, 1, 1])
>>> rt.ones_like(a, dtype = float)
FastArray([1., 1., 1., 1.])
riptable.rt_numpy.percentile(*args, **kwargs)
riptable.rt_numpy.putmask(a, mask, values)

This is roughly the equivalent of arr[mask] = arr2[mask].

Examples

>>> arr = rt.FastArray([10, 10, 10, 10])
>>> arr2 = rt.FastArray([1, 2, 3, 4])
>>> mask = rt.FastArray([False, True, True, False])
>>> rt.putmask(arr, mask, arr2)
>>> arr
FastArray([10,  2,  3, 10])

It’s important to note that the lengths of arr and arr2 are presumed to be the same; otherwise, the values in arr2 are repeated until they reach the same length.

It should NOT be used to replace this operation:

>>> arr = rt.FastArray([10, 10, 10, 10])
>>> arr2 = rt.FastArray([1, 2])
>>> mask = rt.FastArray([False, True, True, False])
>>> arr[mask] = arr2
>>> arr
FastArray([10,  1,  2, 10])

arr2 is repeated to create rt.FastArray([1, 2, 1, 2]) before performing the operation, hence the different result.

>>> arr = rt.FastArray([10, 10, 10, 10])
>>> arr2 = rt.FastArray([1, 2])
>>> mask = rt.FastArray([False, True, True, False])
>>> rt.putmask(arr, mask, arr2)
>>> arr
FastArray([10,  2,  1, 10])
riptable.rt_numpy.reindex_fast(index, array)
riptable.rt_numpy.reshape(*args, **kwargs)
riptable.rt_numpy.round(*args, **kwargs)

This will check for numpy array first and call np.round
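
A hedged example (since the call is forwarded to np.round, NumPy's round-half-to-even behavior is assumed, so 2.5 rounds to 2.0):

>>> rt.round(rt.FastArray([1.2, 2.5, 3.7]))
FastArray([1., 2., 4.])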

riptable.rt_numpy.searchsorted(a, v, side='left', sorter=None)

See numpy.searchsorted. side='leftplus' is a new option in Riptable, where values that compare greater get a 0.
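
A hedged example of the standard sides (mirroring numpy.searchsorted):

>>> a = rt.FastArray([1, 3, 5, 7])
>>> rt.searchsorted(a, 4)
2
>>> rt.searchsorted(a, 5, side='right')
3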

riptable.rt_numpy.single(a)
riptable.rt_numpy.sort(*args, **kwargs)
riptable.rt_numpy.sortinplaceindirect(*args, **kwargs)
riptable.rt_numpy.std(*args, filter=None, dtype=None, **kwargs)

Compute the standard deviation of the values in the first argument.

Riptable uses the convention that ddof = 1, meaning the standard deviation of [x_1, ..., x_n] is defined by std = sqrt(1/(n - 1) * sum((x_i - mean)**2)) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.

When possible, rt.std(x, *args) calls x.std(*args); look there for documentation. In particular, note whether the called function accepts the keyword arguments listed below.

For example, FastArray.std accepts the filter and dtype keyword arguments, but Dataset.std does not.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the standard deviation calculation. If the filter is uniformly False, rt.std raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.std(dtype = my_type) is equivalent to my_type(x.std()).

Returns:

Scalar for FastArray input. For Dataset input, returns a Dataset consisting of a row with each numerical column’s standard deviation.

Return type:

scalar or Dataset

See also

nanstd

Computes the standard deviation, ignoring NaNs.

FastArray.std

Computes the standard deviation of FastArray values.

Dataset.std

Computes the standard deviation of numerical Dataset columns.

GroupByOps.std

Computes the standard deviation of each group. Used by Categorical objects.

Notes

The dtype keyword for rt.std specifies the data type of the result. This differs from numpy.std, where it specifies the data type used to compute the standard deviation.

Examples

>>> a = rt.FastArray([1, 2, 3])
>>> rt.std(a)
1.0

With a dtype specified:

>>> a = rt.FastArray([1, 2, 3])
>>> rt.std(a, dtype = rt.int32)
1

With a filter:

>>> a = rt.FastArray([1, 2, 3])
>>> b = rt.FA([False, True, True])
>>> rt.std(a, filter = b)
0.7071067811865476
riptable.rt_numpy.sum(*args, filter=None, dtype=None, **kwargs)

Compute the sum of the values in the first argument.

When possible, rt.sum(x, *args) calls x.sum(*args); look there for documentation. In particular, note whether the called function accepts the keyword arguments listed below. For example, Dataset.sum() does not accept the filter or dtype keyword arguments.

For FastArray.sum, see numpy.sum for documentation but note the following:

  • Until a reported bug is fixed, the dtype keyword argument may not work as expected:

    • Riptable data types (for example, rt.float64) are ignored.

    • NumPy integer data types (for example, numpy.int32) are also ignored.

    • NumPy floating point data types are applied as numpy.float64.

  • If you include another NumPy parameter (for example, axis=0), the NumPy implementation of sum will be used and the dtype will be used to compute the sum.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the sum calculation.

  • dtype (rt.dtype or numpy.dtype, optional) – The data type of the result. By default, for integer input the result dtype is int64 and for floating point input the result dtype is float64. See the notes above about using this keyword argument with FastArray objects as input.

Returns:

Scalar for FastArray input. For Dataset input, returns a Dataset consisting of a row with each numerical column’s sum.

Return type:

scalar or Dataset

See also

numpy.sum

nansum

Sums the values, ignoring NaNs.

FastArray.sum

Sums the values of a FastArray.

Dataset.sum

Sums the values of numerical Dataset columns.

GroupByOps.sum

Sums the values of each group. Used by Categorical objects.

Examples

>>> a = rt.FastArray([1, 3, 5, 7])
>>> rt.sum(a)
16
>>> a = rt.FastArray([1.0, 3.0, 5.0, 7.0])
>>> rt.sum(a)
16.0
riptable.rt_numpy.tile(arr, reps)

Construct an array by repeating a specified array a specified number of times.

Parameters:
  • arr (array or scalar) – The input array or scalar.

  • reps (int or array of int) – The number of repetitions of arr along each axis. For examples of tile used with multi-dimensional arrays, see numpy.tile(). Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

Returns:

A new FastArray of the repeated input arrays.

Return type:

FastArray

See also

riptable.repeat

Construct an array by repeating each element of a specified array.

Examples

Tile a scalar:

>>> rt.tile(2, 5)
FastArray([2, 2, 2, 2, 2])

Tile an array:

>>> x = rt.FA([1, 2, 3, 4])
>>> rt.tile(x, 2)
FastArray([1, 2, 3, 4, 1, 2, 3, 4])
riptable.rt_numpy.transpose(*args, **kwargs)
riptable.rt_numpy.trunc(*args, **kwargs)
riptable.rt_numpy.unique32(list_keys, hintSize=0, filter=None)

Returns the index location of the first occurrence of each key.

Parameters:
  • list_keys (list of ndarray) – A list of numpy arrays to hash on (multikey); if there is just one item it still needs to be in a list such as [array1].

  • hintSize (int) – Integer hint if the number of unique keys (in list_keys) is known in advance, defaults to 0.

  • filter (ndarray of bools, optional) – Boolean array used to pre-filter the array(s) in list_keys prior to processing them, defaults to None.

Returns:

An array the size of the total number of unique values; the array contains the index of the first occurrence of each unique value. The second array contains the index of the last occurrence of each unique value.

Return type:

ndarray of ints
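
A hedged usage sketch (output not shown; with a single key, the result is typically the index of each value's first occurrence):

>>> a = rt.FastArray([b'a', b'b', b'a', b'c'])
>>> rt.unique32([a])   # indices of the first occurrence of each unique key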

riptable.rt_numpy.var(*args, filter=None, dtype=None, **kwargs)

Compute the variance of the values in the first argument.

Riptable uses the convention that ddof = 1, meaning the variance of [x_1, ..., x_n] is defined by var = 1/(n - 1) * sum((x_i - mean)**2) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.

When possible, rt.var(x, *args) calls x.var(*args); look there for documentation. In particular, note whether the called function accepts the keyword arguments listed below.

For example, FastArray.var accepts the filter and dtype keyword arguments, but Dataset.var does not.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the variance calculation. If the filter is uniformly False, rt.var raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.var(dtype = my_type) is equivalent to my_type(x.var()).

Returns:

Scalar for FastArray input. For Dataset input, returns a Dataset consisting of a row with each numerical column’s variance.

Return type:

scalar or Dataset

See also

nanvar

Computes the variance, ignoring NaNs.

FastArray.var

Computes the variance of FastArray values.

Dataset.var

Computes the variance of numerical Dataset columns.

GroupByOps.var

Computes the variance of each group. Used by Categorical objects.

Notes

The dtype keyword for rt.var specifies the data type of the result. This differs from numpy.var, where it specifies the data type used to compute the variance.

Examples

>>> a = rt.FastArray([1, 2, 3])
>>> rt.var(a)
1.0

With a dtype specified:

>>> a = rt.FastArray([1, 2, 3])
>>> rt.var(a, dtype = rt.int32)
1

With a filter:

>>> a = rt.FastArray([1, 2, 3])
>>> b = rt.FastArray([False, True, True])
>>> rt.var(a, filter = b)
0.5
riptable.rt_numpy.vstack(arrlist, dtype=None, order='C')
Parameters:
  • arrlist (list of 1d numpy arrays of the same length) – These arrays are considered the columns.

  • order ({'C', 'F'}, default 'C') – 'C' for row major; 'F' will be column major. If order='F' is not passed, order='C' is assumed.

  • dtype (optional, default None) – Can specify the final dtype, for order='F' only.

Returns:

A 2d array that is column major and can be inserted into a Dataset. Use v[:,0] then v[:,1] to access the columns, instead of v[0] and v[1], which would be the method with np.vstack.

Notes

WARNING: when order='F', riptable vstack will return a different shape from np.vstack, since it will try to keep the first dim the same length while keeping the data contiguous. If riptable fails, then normal np.vstack will be called. For large arrays, riptable can run in parallel while converting to the dtype on the fly.

See also

np.vstack, np.column_stack

Examples

>>> a = rt.arange(100)
>>> b = rt.arange(100.0)
>>> v = rt.vstack([a,b], order='F')
>>> v.strides
(8, 800)
>>> v.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
>>> v.shape
(100, 2)
riptable.rt_numpy.where(condition, x=None, y=None)

Return a new FastArray or Categorical with elements from x or y depending on whether condition is True.

For 1-dimensional arrays, this function is equivalent to:

[xv if c else yv
 for c, xv, yv in zip(condition, x, y)]

If only condition is provided, this function returns a tuple containing an integer FastArray with the indices where the condition is True. Note that this usage of where is not supported for FastArray objects of more than one dimension.

Note also that this case of where uses riptable.bool_to_fancy(). Using bool_to_fancy directly is preferred, as it behaves correctly for subclasses.

Parameters:
  • condition (bool or array of bool) – Where True, yield x, otherwise yield y.

  • x (scalar, array, or callable, optional) – The value to use where condition is True. If x is provided, y must also be provided, and x and y should be the same type. If x is an array, a callable that returns an array, or a Categorical, it must be the same length as condition. The value of x that corresponds to the True value is used.

  • y (scalar, array, or callable, optional) – The value to use where condition is False. If y is provided, x must also be provided, and x and y should be the same type. If y is an array, a callable that returns an array, or a Categorical, it must be the same length as condition. The value of y that corresponds to the False value is used.

Returns:

If x and y are Categorical objects, a Categorical is returned. Otherwise, if x and y are provided a FastArray is returned. When only condition is provided, a tuple is returned containing an integer FastArray with the indices where the condition is True.

Return type:

FastArray or Categorical or tuple

See also

FastArray.where

Replace values where a given condition is False.

riptable.bool_to_fancy

The function called when x and y are omitted.

Examples

condition is a comparison that creates an array of booleans, and x and y are scalars:

>>> a = rt.FastArray(rt.arange(5))
>>> a
FastArray([0, 1, 2, 3, 4])
>>> rt.where(a < 2, 100, 200)
FastArray([100, 100, 200, 200, 200])

condition and x are same-length arrays, and y is a scalar:

>>> condition = rt.FastArray([False, False, True, True, True])
>>> x = rt.FastArray([100, 101, 102, 103, 104])
>>> y = 200
>>> rt.where(condition, x, y)
FastArray([200, 200, 102, 103, 104])

When x and y are Categorical objects, a Categorical is returned:

>>> primary_traders = rt.Cat(['John', 'Mary', 'John', 'Mary', 'John', 'Mary'])
>>> secondary_traders = rt.Cat(['Chris', 'Duncan', 'Chris', 'Duncan', 'Duncan', 'Chris'])
>>> is_primary = rt.FA([True, True, False, True, False, True])
>>> rt.where(is_primary, primary_traders, secondary_traders)
Categorical([John, Mary, Chris, Mary, Duncan, Mary]) Length: 6
  FastArray([3, 4, 1, 4, 2, 4], dtype=int8) Base Index: 1
  FastArray([b'Chris', b'Duncan', b'John', b'Mary'], dtype='|S6') Unique count: 4

When x and y are Date objects, a FastArray of integers is returned that can be converted to a Date (other datetime objects are similar):

>>> x = rt.Date(['20230101', '20220101', '20210101'])
>>> y = rt.Date(['20190101', '20180101', '20170101'])
>>> condition = x > rt.Date(['20211231'])
>>> rt.where(condition, x, y)
FastArray([19358, 18993, 17167])
>>> rt.Date(_)
Date(['2023-01-01', '2022-01-01', '2017-01-01'])

When only a condition is provided, a tuple is returned containing a FastArray with the indices where the condition is True:

>>> a = rt.FastArray([10, 20, 30, 40, 50])
>>> rt.where(a < 40)
(FastArray([0, 1, 2]),)
riptable.rt_numpy.zeros(*args, **kwargs)

Return a new array of the specified shape and data type, filled with zeros.

Parameters:
  • shape (int or sequence of int) – Shape of the new array, e.g., (2, 3) or 2. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • dtype (str or NumPy dtype or Riptable dtype, default numpy.float64) – The desired data type for the array.

  • order ({'C', 'F'}, default 'C') – Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory.

  • like (array_like, default None) – Reference object to allow the creation of arrays that are not NumPy arrays. If an array-like passed in as like supports the __array_function__ protocol, the result will be defined by it. In this case, it ensures the creation of an array object compatible with that passed in via this argument.

Returns:

A new FastArray of the specified shape and type, filled with zeros.

Return type:

FastArray

See also

riptable.zeros_like, riptable.ones, riptable.ones_like, riptable.empty, riptable.empty_like, riptable.full

Examples

>>> rt.zeros(5)
FastArray([0., 0., 0., 0., 0.])
>>> rt.zeros(5, dtype = 'int8')
FastArray([0, 0, 0, 0, 0], dtype=int8)
riptable.rt_numpy.zeros_like(a, dtype=None, order='K', subok=True, shape=None)

Return an array of zeros with the same shape and data type as the specified array.

Parameters:
  • a (array) – The shape and data type of a define the same attributes of the returned array. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • dtype (str or NumPy dtype or Riptable dtype, optional) – Overrides the data type of the result.

  • order ({'C', 'F', 'A', or 'K'}, default 'K') – Overrides the memory layout of the result. ‘C’ means row-major (C-style), ‘F’ means column-major (Fortran-style), ‘A’ means ‘F’ if a is Fortran-contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible.

  • subok (bool, default True) – If True (the default), then the newly created array will use the sub-class type of a, otherwise it will be a base-class array.

  • shape (int or sequence of int, optional) – Overrides the shape of the result. If order=’K’ and the number of dimensions is unchanged, it will try to keep the same order; otherwise, order=’C’ is implied. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

Returns:

A FastArray with the same shape and data type as the specified array, filled with zeros.

Return type:

FastArray

See also

riptable.zeros, riptable.ones, riptable.ones_like, riptable.empty, riptable.empty_like, riptable.full

Examples

>>> a = rt.FastArray([1, 2, 3, 4])
>>> rt.zeros_like(a)
FastArray([0, 0, 0, 0])
>>> rt.zeros_like(a, dtype = float)
FastArray([0., 0., 0., 0.])
riptable.rt_numpy.asanyarray
riptable.rt_numpy.asarray