riptable.rt_fastarray

Classes

FastArray

A FastArray is a 1-dimensional array of items that are the same data type.

Ledger

Recycle

Threading

class riptable.rt_fastarray.FastArray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)

Bases: numpy.ndarray

A FastArray is a 1-dimensional array of items that are the same data type.

Because it’s a subclass of NumPy’s numpy.ndarray, all ndarray functions and attributes can be used with FastArray objects. However, Riptable optimizes many of NumPy’s functions to make them faster and more memory-efficient. Riptable has also added some methods.

FastArray objects with more than 1 dimension are not supported.

See NumPy’s docs for details on all ndarray methods and attributes.

Parameters:
  • arr (array, iterable, or scalar value) – Contains data to be stored in the FastArray.

  • **kwargs – Additional keyword arguments to be passed to the function.

Notes

To improve performance, FastArray objects take over some of NumPy’s universal functions (ufuncs), use array recycling and multiple threads, and pass certain method calls to Bottleneck.

Note that whenever Riptable has implemented its own version of an existing NumPy method, a call to the NumPy method results in a call to the optimized Riptable version instead. We encourage users to directly call the Riptable method in order to avoid any confusion as to what method is actually being called.

See the list of NumPy Methods Optimized by Riptable for FastArrays.

Examples

Construct a FastArray

Pass a list to the constructor:

>>> rt.FastArray([1, 2, 3, 4, 5])
FastArray([1, 2, 3, 4, 5])
>>> #NOTE: rt.FA also works.
>>> rt.FA([1.0, 2.0, 3.0, 4.0, 5.0])
FastArray([1., 2., 3., 4., 5.])

Or use a utility function:

>>> rt.full(10, 0.7)
FastArray([0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7])
>>> rt.arange(10)
FastArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can optionally specify a data type:

>>> x = rt.FastArray([3, 6, 10], dtype=rt.float64)
>>> x, x.dtype
(FastArray([ 3.,  6., 10.]), dtype('float64'))
>>> # Using a string shortcut:
>>> x = rt.FastArray([3, 6, 10], dtype='float64')
>>> x, x.dtype
(FastArray([ 3.,  6., 10.]), dtype('float64'))

By default, characters are stored as byte strings. When unicode=True, the FastArray allows Unicode characters.

>>> rt.FA(list('abc'), unicode=True)
FastArray(['a', 'b', 'c'], dtype='<U1')

To convert an existing NumPy array, use the FastArray constructor.

>>> np_arr = np.array([1, 2, 3])
>>> rt.FA(np_arr)
FastArray([1, 2, 3])

To view the NumPy array as a FastArray (which is slightly less expensive than using the constructor), use the view method.

>>> fa = np_arr.view(rt.FA)
>>> fa
FastArray([1, 2, 3])

To view it as a NumPy array again:

>>> fa.view(np.ndarray)
array([1, 2, 3])
>>> # Alternatively:
>>> fa._np
array([1, 2, 3])

Get a Subset of a FastArray

You can use standard Python slicing notation or fancy indexing to access a subset of a FastArray.

>>> # Create a FastArray:
>>> array = rt.arange(8)**2
>>> array
FastArray([0, 1, 4, 9, 16, 25, 36, 49])
>>> # Use Python slicing to get elements 2, 3, and 4:
>>> array[2:5]
FastArray([4, 9, 16])
>>> # Use fancy indexing to get elements 2, 4, and 1 (in that order):
>>> array[[2, 4, 1]]
FastArray([4, 16, 1])

For more details, see the examples for 1-dimensional arrays in NumPy’s docs: Indexing on ndarrays.

Note that slicing creates a view of the array and does not copy the underlying data; modifying the slice modifies the original array. Fancy indexing creates a copy of the extracted data; modifying this array does not modify the original array.

You can also pass a Boolean mask array.

>>> # Create a Boolean mask:
>>> evenMask = (array % 2 == 0)
>>> evenMask
FastArray([True, False, True, False, True, False, True, False])
>>> # Index using the Boolean mask:
>>> array[evenMask]
FastArray([0, 4, 16, 36])

How to Subclass FastArray

Include the required class definition:

>>> class TestSubclass(FastArray):
...     def __new__(cls, arr, **args):
...         # Before this call, arr needs to be a np.ndarray instance.
...         return arr.view(cls)
...     def __init__(self, arr, **args):
...         pass

If the subclass supports computation, you may need to define your own math operations, including which types the subclass can be combined with. For examples of new definitions, see the DateTimeNano class.

Common operations to hook are comparisons (__eq__(), __ne__(), __gt__(), __lt__(), __le__(), __ge__()) and basic math functions (__add__(), __sub__(), __mul__(), etc.).

Bracket indexing operations are very common. If the subclass needs to set or return a value other than that in the underlying array, you need to take over __getitem__() or __setitem__().

Indexing is also used in display. For regular console/notebook display, you need to take over:

  • __repr__()

  • __str__()

  • _repr_html_() (for JupyterLab and Jupyter notebooks)

If the array is being displayed in a Dataset and you require certain formatting, you need to define two more methods:

display_query_properties()

Returns an ItemFormat object (see rt.Utils.rt_display_properties)

display_convert_func()

The conversion function returned by display_query_properties() must return a string. Each item being displayed, the result of __getitem__() at a single index, will go through this function individually, accompanied by an ItemFormat object.

Many Riptable operations need to return arrays of the same class they received. To ensure that your subclass will retain its special properties, you need to take over newclassfrominstance(). Failure to take this over will often result in an object with uninitialized variables.

copy() is another method that is called generically in Riptable routines, and needs to be taken over to retain subclass properties.

For a view of the underlying FastArray, you can use the _fa property.

class _ArrayFunctionHelper

The array function helper is responsible for maintaining the array function protocol implementations through the following API:

  • get_array_function: given the Numpy function, returns overridden array function

  • get_array_function_type_compatibility_check: given the Numpy function, returns overridden array function type compatibility check

  • register_array_function: a function decorator whose argument is the Numpy function to override and the function that will override it

  • register_array_function_type_compatibility: similar to register_array_function, but guards against incompatible array function protocol type arguments for the given Numpy function

  • deregister: deregistration of the Numpy function and type compatibility override

  • deregister_array_function_type_compatibility: deregistration of Numpy function type compatibility override
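
The API listed above follows a common decorator-registry pattern. The following is a minimal pure-Python sketch of that pattern, not Riptable's implementation: the class name `RegistrySketch`, the stand-in function `fake_np_sum`, and all method bodies are assumptions made for illustration. Only the method and dictionary names mirror the ones documented for this class.

```python
from typing import Callable, Dict, Optional


class RegistrySketch:
    """Hypothetical stand-in for a helper such as _ArrayFunctionHelper."""

    HANDLED_FUNCTIONS: Dict[Callable, Callable] = {}
    HANDLED_TYPE_COMPATIBILITY_CHECK: Dict[Callable, Callable] = {}

    @classmethod
    def register_array_function(cls, np_function: Callable) -> Callable:
        # Return a decorator that maps np_function to the decorated override.
        def decorator(func: Callable) -> Callable:
            cls.HANDLED_FUNCTIONS[np_function] = func
            return func
        return decorator

    @classmethod
    def register_array_function_type_compatibility(cls, np_function: Callable) -> Callable:
        # Same idea, but for the type compatibility check.
        def decorator(func: Callable) -> Callable:
            cls.HANDLED_TYPE_COMPATIBILITY_CHECK[np_function] = func
            return func
        return decorator

    @classmethod
    def get_array_function(cls, np_function: Callable) -> Optional[Callable]:
        return cls.HANDLED_FUNCTIONS.get(np_function)

    @classmethod
    def get_array_function_type_compatibility_check(cls, np_function: Callable) -> Optional[Callable]:
        return cls.HANDLED_TYPE_COMPATIBILITY_CHECK.get(np_function)

    @classmethod
    def deregister_array_function(cls, np_function: Callable) -> None:
        cls.HANDLED_FUNCTIONS.pop(np_function, None)

    @classmethod
    def deregister_array_function_type_compatibility(cls, np_function: Callable) -> None:
        cls.HANDLED_TYPE_COMPATIBILITY_CHECK.pop(np_function, None)

    @classmethod
    def deregister(cls, np_function: Callable) -> None:
        # Remove both the override and its type compatibility check.
        cls.deregister_array_function(np_function)
        cls.deregister_array_function_type_compatibility(np_function)


def fake_np_sum(values):
    """Stand-in for a NumPy API function (hypothetical)."""
    return sum(values)


@RegistrySketch.register_array_function(fake_np_sum)
def fast_sum(values):
    return float(sum(values))


@RegistrySketch.register_array_function_type_compatibility(fake_np_sum)
def sum_types_ok(types):
    return all(t in (list, tuple) for t in types)


override = RegistrySketch.get_array_function(fake_np_sum)
check = RegistrySketch.get_array_function_type_compatibility_check(fake_np_sum)
```

In this sketch a lookup for an unregistered function returns None, which matches the documented behavior of the two getter methods.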

HANDLED_FUNCTIONS: Dict[callable, callable]

Dictionary mapping each overridden NumPy API function to the function that overrides it.

HANDLED_TYPE_COMPATIBILITY_CHECK: Dict[callable, callable]

Dictionary mapping each overridden NumPy API function to its type compatibility check function.

classmethod deregister(np_function)
classmethod deregister_array_function(np_function)

Deregistration of the Numpy function and type compatibility override.

Parameters:

np_function (callable) – The overridden Numpy array function.

classmethod deregister_array_function_type_compatibility(np_function)

Deregistration of the NumPy function type compatibility override.

Parameters:

np_function (callable) – The overridden Numpy array function.

classmethod get_array_function(np_function)

Given the Numpy function, returns overridden array function if implemented, otherwise None.

Parameters:

np_function (callable) – The overridden Numpy array function.

Returns:

The overridden function as a callable or None if it’s not implemented.

Return type:

callable, optional

classmethod get_array_function_type_compatibility_check(np_function)

Given the Numpy function, returns the corresponding array function type compatibility callable, otherwise None.

Parameters:

np_function (callable) – The overridden Numpy array function.

Returns:

The overridden type compatibility function as a callable or None if it’s not implemented.

Return type:

callable, optional

classmethod register_array_function(np_function)

A function decorator whose argument is the NumPy function to override. The decorated function is the one that overrides it; that is, this registers np_function with the function that it decorates.

Parameters:

np_function (callable) – The overridden Numpy array function.

Returns:

The decorator that registers np_function with the decorated function.

Return type:

callable

classmethod register_array_function_type_compatibility(np_function)

This registers the type compatibility check for the np_function with the function that it decorates.

Parameters:

np_function (callable) – The overridden Numpy array function.

Returns:

The decorator that registers the type compatibility check for the np_function with the decorated function.

Return type:

callable

property _np: numpy.ndarray

Return a NumPy array view of the input FastArray.

Returns:

A NumPy array view of the input FastArray.

Return type:

numpy.ndarray

See also

numpy.ndarray.view

Can be used to view a NumPy array as a FastArray.

Examples

Return a NumPy array view for an integer FastArray:

>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a._np
array([1, 2, 3, 4, 5])

Changes to the view are reflected in the original FastArray:

>>> npview = a._np
>>> npview[2] = 10
>>> a
FastArray([ 1,  2, 10,  4,  5])

To view a NumPy array as a FastArray, you can use numpy.ndarray.view:

>>> npview.view(rt.FastArray)
FastArray([1, 2, 10, 4, 5])
property crc: int

Calculate the 32-bit CRC of the data in this array using the Castagnoli polynomial (CRC32C).

This function does not consider the array’s shape or strides when calculating the CRC; it simply calculates the CRC value over the entire buffer described by the array.

Examples

Can be used to compare two arrays for structural equality:

>>> a = arange(100)
>>> b = arange(100.0)
>>> a.crc == b.crc
False
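
Because the checksum is computed over the raw buffer, arrays that hold the same numbers in different dtypes generally produce different CRC values. The following is a bit-by-bit pure-Python sketch of CRC-32C for illustration only. It assumes the conventional initial value and final XOR of 0xFFFFFFFF; whether Riptable's native implementation uses the same conventions is not stated here, so values from this sketch should not be expected to match `crc`.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF  # conventional initial value (an assumption here)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # conventional final XOR (an assumption here)


# The same numbers stored as 32-bit integers and as 64-bit floats
# occupy different bytes, so their checksums are computed over different data.
as_ints = struct.pack("<2i", 1, 2)
as_floats = struct.pack("<2d", 1.0, 2.0)
same_checksum = crc32c(as_ints) == crc32c(as_floats)
```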

property doc

Return the Doc object for the input FastArray.

If no Doc object exists, return None.

Returns:

The Doc object for the input FastArray. If no Doc object exists, return None.

Return type:

Doc

See also

FastArray.info

Return a description of the input array’s contents.

apply_schema

Set Doc object values.

Examples

No Doc object exists:

>>> a = rt.FA([1, 2, 3, 4, 5])
>>> print(a.doc)
None

Apply a schema and return the Doc object:

>>> schema = {"Description": "This is an array", "Steward": "Brian", "Type": "int32"}
>>> a.apply_schema(schema)
{}
>>> a.doc
Description: This is an array
Steward: Brian
Type: int32

Return specific Doc object information:

>>> a.doc._type
'int32'
>>> a.doc._descrip
'This is an array'
>>> a.doc._steward
'Brian'
>>> print(a.doc._detail)
None
property inv: Any

Return the invalid value for the input array’s data type.

Returns:

The invalid value for the input array’s dtype. For example, int8 returns -128, uint8 returns 255, and bool_ returns False.

Return type:

Any

See also

FastArray.copy_invalid

Return a copy of a FastArray filled with the invalid value for the array’s dtype.

FastArray.fill_invalid

Replace the values of a FastArray with the invalid value for the array’s dtype.

INVALID_DICT

A mapping of invalid values to dtypes.

Examples

Return the invalid value for an integer array:

>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a.inv
-2147483648

Return the invalid value for a floating-point array:

>>> a2 = rt.FA([0., 1., 2., 3., 4.])
>>> a2
FastArray([0., 1., 2., 3., 4.])
>>> a2.inv
nan

Return the invalid value for a string array:

>>> a3 = rt.FA(["AMZN", "IBM", "MSFT", "AAPL"])
>>> a3
FastArray([b'AMZN', b'IBM', b'MSFT', b'AAPL'], dtype='|S4')
>>> a3.inv
b''
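
The integer values quoted in this section (-128 for int8, 255 for uint8, and -2147483648 for the 32-bit array above) suggest a pattern: the lowest representable value for signed types and the highest for unsigned types. The sketch below encodes that pattern in plain Python. The helper `integer_sentinel` is hypothetical, and extending the pattern to other bit widths is an assumption rather than something this section states.

```python
import math


def integer_sentinel(bits: int, signed: bool) -> int:
    """Hypothetical helper: sentinel for an integer type of the given width.

    Assumes signed types use their minimum value and unsigned types use
    their maximum value, which matches the values quoted in this section.
    """
    if signed:
        return -(1 << (bits - 1))
    return (1 << bits) - 1


# Non-integer sentinels as shown in the examples above.
FLOAT_SENTINEL = math.nan
BYTES_SENTINEL = b""
BOOL_SENTINEL = False
```
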
property numbastring

Converts byte string and Unicode string arrays to a 2-dimensional array so that Numba can process them correctly.

Examples

>>> @numba.jit(nopython=True)
... def numba_str(txt):
...     x=0
...     for i in range(txt.shape[0]):
...         if (txt[i,0]==116 and  # 't'
...             txt[i,1]==101 and  # 'e'
...             txt[i,2]==120 and  # 'x'
...             txt[i,3]==116):    # 't'
...             x += 1
...     return x
>>>
>>> x=FastArray(['some','text','this','is'])
>>> numba_str(x.view(np.uint8).reshape((len(x), x.itemsize)))
>>> numba_str(x.numbastring)
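
To make the 2-dimensional layout concrete, here is a plain-Python sketch of the same idea: one row per string and one column per byte. The helper `to_byte_rows` is hypothetical, and the zero-byte padding is an assumption based on how fixed-width byte strings are stored.

```python
from typing import List


def to_byte_rows(strings: List[bytes]) -> List[List[int]]:
    """Hypothetical helper: one row per string, one column per byte."""
    width = max(len(s) for s in strings)
    # Pad shorter strings with zero bytes, as fixed-width byte arrays do.
    return [list(s.ljust(width, b"\x00")) for s in strings]


rows = to_byte_rows([b"some", b"text", b"this", b"is"])

# Count rows that spell 'text' (116, 101, 120, 116), as numba_str does above.
matches = sum(1 for row in rows if row[:4] == [116, 101, 120, 116])
```
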
CompressPickle = True
FasterUFunc = True
MAX_DISPLAY_LEN = 10
NEW_ARRAY_FUNCTION_ENABLED = False

Enable implementation of array function protocol (default False).

NoTolerance = False
Recycle = True
SafeConversions = True
Verbose = 1
WarningDict
WarningLevel = 1
_reduce_op_identity_value: Mapping[riptable.rt_enum.REDUCE_FUNCTIONS, Any]
add
div
floordiv
mod
mul
pow
sub
static _FastFunctionsOff()
static _FastFunctionsOn()
static _GCNOW(timeout=0)

Pass the garbage collector timeout value to cleanup. Passing 0 will force an immediate garbage collection.

Return type:

Dictionary of memory heuristics including ‘TotalDeleted’

static _GCSET(timeout=100)

Pass the garbage collector timeout value to expire. The timeout value is measured in units of roughly 2/5 of a second, so a value of 100 is usually about 40 seconds.

Return type:

Previous timespan

static _LCLEAR()

Clear all the entries in the math ledger

static _LDUMP(dataset=True)

Print out the math ledger

static _LDUMPF(filename)

Save the math ledger to a file

static _LOFF()

Turn the math ledger off

static _LON()

Turn the math ledger on to record all array math routines

static _OFF()

Disable interception of array ufuncs.

static _ON()

Enable interception of array ufuncs.

static _RDUMP()

Displays to server’s stdout

Return type:

Total size of items not in use

static _ROFF(quiet=False)

Turn off recycling.

Parameters:

quiet (bool, optional) –

Return type:

True if recycling was previously on, else False

static _RON(quiet=False)

Turn on recycling.

Parameters:

quiet (bool, optional) –

Return type:

True if recycling was previously on, else False

static _TOFF()
static _TON()
static _V0()
static _V1()
static _V2()
__array_finalize__(obj)

Finalizes self from other, called as part of ndarray.__new__()

__array_function__(func, types, args, kwargs)
__array_ufunc__(ufunc, method, *inputs, **kwargs)

The FastArray universal function (or ufunc) override offers a multithreaded C/C++ implementation at the RiptideCPP layer.

When FastArray receives a ufunc callable, it attempts to handle it in priority order:
  1. If FastArray FastFunction is enabled, the ufunc is handled by an explicit ufunc override; otherwise

  2. the ufunc is handled at the Riptable / NumPy API overrides level; otherwise

  3. the ufunc is handled at the NumPy API level.

If none of these cases supports the given combination of ufunc, inputs, and kwargs, a warning is emitted.

The following references to supported ufuncs are grouped by method type.
  • For method type reduce, see gReduceUFuncs.

  • For method type __call__, see gBinaryUFuncs, gBinaryLogicalUFuncs, gBinaryBitwiseUFuncs, and gUnaryUFuncs.

  • For method type at, None is returned.

If the out argument is specified, then an extra array copy is performed on the result of the ufunc computation.

If a dtype keyword is specified, all efforts are made to respect the dtype on the result of the computation.

Parameters:
  • ufunc (callable) – The ufunc object that was called.

  • method (str) – A string indicating which Ufunc method was called (one of “__call__”, “reduce”, “reduceat”, “accumulate”, “outer”, “inner”).

  • inputs – A tuple of the input arguments to the ufunc.

  • kwargs – A dictionary containing the optional input arguments of the ufunc. If given, any out arguments, both positional and keyword, are passed as a tuple in kwargs.

Return type:

The method should return either the result of the operation, or NotImplemented if the operation requested is not implemented.

Notes

The current implementation does not support the following keyword arguments: casting, sig, signature, and core_signature.

It has partial support for keyword arguments: where, axis, and axes, if they match the default values.

If FastArray’s WarningLevel is enabled, then warnings will be emitted if any unsupported or partially supported keyword arguments are passed.

TODO document custom up casting rules.

__arrow_array__(type=None)

Implementation of the __arrow_array__ protocol for conversion to a pyarrow array.

Parameters:

type (pyarrow.DataType, optional, defaults to None) –

Return type:

pyarrow.Array or pyarrow.ChunkedArray

Notes

https://arrow.apache.org/docs/python/extending_types.html#controlling-conversion-to-pyarrow-array-with-the-arrow-array-protocol

__eq__(other)

Return self==value.

__ge__(other)

Return self>=value.

__getitem__(fld)

Riptable has special routines to handle array input in the indexer. Everything else goes to NumPy’s __getitem__().

__gt__(other)

Return self>value.

__le__(other)

Return self<=value.

__lt__(other)

Return self<value.

__ne__(other)

Return self!=value.

__reduce__()

Used for pickling. For a plain FastArray, this passes back the np.ndarray view, which then knows how to pickle itself. NOTE: There may be a faster way, possibly returning a byte string.

__setitem__(fld, value)

Used on the left hand side of arr[fld] = value

This routine tries to convert invalid dtypes so that invalids are preserved when setting. The mbset portion of this is no written (which will not raise an IndexError on out of bounds).

Parameters:
  • fld (scalar, boolean, fancy index mask, slice, sequence, or list) –

  • value (scalar, sequence or dataset value as follows) – sequence can be list, tuple, np.ndarray, FastArray

Raises:

IndexError

static _argmax(a, axis=None, out=None)
static _argmin(a, axis=None, out=None)
classmethod _check_ndim(instance)

Iterates through the dimensions of an array, counting how many dimensions have values greater than 1. Problems may occur with multidimensional FastArrays, and the user will be warned.

_compare_check(func, other)
static _empty_like(array, dtype=None, order='K', subok=True, shape=None)
_fa_filter_wrapper(myFunc, filter=None, dtype=None)
_fa_keyword_wrapper(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)
_fill_invalid_internal(shape=None, dtype=None, inplace=True, fill_val=None)
static _from_arrow(arr, zero_copy_only=True, writable=False, auto_widen=False)

Convert a pyarrow Array to a riptable FastArray.

Parameters:
  • arr (pyarrow.Array or pyarrow.ChunkedArray) –

  • zero_copy_only (bool, default True) – If True, an exception will be raised if the conversion to a FastArray would require copying the underlying data (e.g. in presence of nulls, or for non-primitive types).

  • writable (bool, default False) – For a FastArray created with zero copy (view on the Arrow data), the resulting array is not writable (Arrow data is immutable). By setting this to True, a copy of the array is made to ensure it is writable.

  • auto_widen (bool, optional, default to False) – When False (the default), if an arrow array contains a value which would be considered the ‘invalid’/NA value for the equivalent dtype in a FastArray, raise an exception because direct conversion would be lossy / change the semantic meaning of the data. When True, the converted array will be widened (if possible) to the next-largest dtype to ensure the data will be interpreted in the same way.

Return type:

FastArray

_internal_self_compare(math_op, periods=1, fancy=False)

Internal routine used for differs and transitions.

_is_not_supported(arr)

Returns True if a NumPy array is not internally supported by FastArray.

_kwarg_check(*args, **kwargs)
_legacy_array_function(func, types, args, kwargs)

Called before __array_ufunc__(). It is not called for every function (np.isnan, np.trunc, and np.true_divide, for instance).

static _max(a, axis=None, out=None, keepdims=None, initial=None, where=None)
static _mean(a, axis=None, dtype=None, out=None, keepdims=None)
static _min(a, axis=None, out=None, keepdims=None, initial=None, where=None)
static _nanargmax(a, axis=None)
static _nanargmin(a, axis=None)
static _nanmax(a, axis=None, out=None, keepdims=None)
static _nanmean(a, axis=None, dtype=None, out=None, keepdims=None)
static _nanmin(a, axis=None, out=None, keepdims=None)
static _nanstd(a, axis=None, dtype=None, out=None, ddof=None, keepdims=None)
static _nansum(a, axis=None, dtype=None, out=None, keepdims=None)
static _nanvar(a, axis=None, dtype=None, out=None, ddof=None, keepdims=None)
_new_array_function(func, types, args, kwargs)

FastArray implementation of the array function protocol.

Parameters:
  • func (callable) – A callable exposed by NumPy’s public API, which was called in the form func(*args, **kwargs).

  • types (tuple) – A tuple of unique argument types from the original NumPy function call that implement __array_function__.

  • args (tuple) – The tuple of arguments that will be passed to func.

  • kwargs (dict) – The dictionary of keyword arguments that will be passed to func.

Raises:

TypeError – If func is not overridden by a corresponding riptable array function then a TypeError is raised.

Notes

This array function implementation requires each class, such as FastArray and any other derived class, to implement its own version of the NumPy array function API. If these array functions defer to the inheriting class, they need to either rewrap the results in the correct type or raise an exception when a particular operation is not well-defined or meaningful for the derived class. If an array function that is also a universal function is not overridden as an array function but is defined as a ufunc, it will not be called unless it is registered with the array function helper, because the array function protocol takes priority over the universal function protocol.

Reference: NEP 18 Array Function Protocol

classmethod _possibly_warn(warning_string)
classmethod _py_number_to_np_dtype(val, dtype)

Convert a python type to numpy dtype. Only handles integers.

_reduce_check(reduceFunc, npFunc, *args, **kwargs)

The second argument, npFunc, should be None if there is no NumPy equivalent function.

static _round_(a, decimals=None, out=None)
static _std(a, axis=None, dtype=None, out=None, ddof=None, keepdims=None)
static _sum(a, axis=None, dtype=None, out=None, keepdims=None, initial=None, where=None)
_unary_op(funcnum, fancy=False)
static _var(a, axis=None, dtype=None, out=None, ddof=None, keepdims=None)
_view_internal(type=None)

FastArray subclasses need to take this over if they want to make a shallow copy of a FastArray instead of viewing themselves as a FastArray (which drops their other properties). Taking over view directly may have a lot of unintended consequences.

abs(**kwargs)
apply(pyfunc, *args, otypes=None, doc=None, excluded=None, cache=False, signature=None)

Generalized function class. See np.vectorize.

Creates and then applies a vectorized function which takes a nested sequence of objects or NumPy arrays as inputs and returns a single NumPy array or a tuple of NumPy arrays as output. The vectorized function evaluates pyfunc over successive tuples of the input arrays like the Python map function, except it uses the broadcasting rules of NumPy.

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.

Parameters:
  • pyfunc (callable) – A python function or method.

  • otypes (str or list of dtypes, optional) – The output data type. It must be specified as either a string of typecode characters or a list of data type specifiers. There should be one data type specifier for each output.

  • doc (str, optional) – The docstring for the function. If None, the docstring will be the pyfunc.__doc__.

  • excluded (set, optional) –

    Set of strings or integers representing the positional or keyword arguments for which the function will not be vectorized. These will be passed directly to pyfunc unmodified.

    New in version 1.7.0.

  • cache (bool, optional) –

    If True, then cache the first function call that determines the number of outputs if otypes is not provided.

    New in version 1.7.0.

  • signature (string, optional) –

    Generalized universal function signature, e.g., (m,n),(n)->(m) for vectorized matrix-vector multiplication. If provided, pyfunc will be called with (and expected to return) arrays with shapes given by the size of corresponding core dimensions. By default, pyfunc is assumed to take scalars as input and output.

    New in version 1.12.0.

Returns:

vectorized – Vectorized function.

Return type:

callable

Examples

>>> def myfunc(a, b):
...     "Return a-b if a>b, otherwise return a+b"
...     if a > b:
...         return a - b
...     else:
...         return a + b
>>>
>>> a=arange(10)
>>> b=arange(10)+1
>>> a.apply(myfunc,b)
FastArray([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

Example with one input array

>>> def square(x):
...     return x**2
>>>
>>> a=arange(10)
>>> a.apply(square)
FastArray([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

Example with lambda

>>> a=arange(10)
>>> a.apply(lambda x: x**2)
FastArray([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

Example with numba

>>> from numba import jit
>>> @jit
... def squareit(x):
...     return x**2
>>> a.apply(squareit)
FastArray([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

Examples that use the built-in oct function and change the output type from byte string, to Unicode, to object:

>>> a=arange(10)
>>> a.apply(oct, otypes=['S'])
FastArray([b'0o0', b'0o1', b'0o2', b'0o3', b'0o4', b'0o5', b'0o6', b'0o7', b'0o10', b'0o11'], dtype='|S4')
>>> a=arange(10)
>>> a.apply(oct, otypes=['U'])
FastArray(['0o0', '0o1', '0o2', '0o3', '0o4', '0o5', '0o6', '0o7', '0o10', '0o11'], dtype='<U4')
>>> a=arange(10)
>>> a.apply(oct, otypes=['O'])
FastArray(['0o0', '0o1', '0o2', '0o3', '0o4', '0o5', '0o6', '0o7', '0o10', '0o11'], dtype=object)
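
The description above compares this method to Python's map function. For equal-length inputs where no broadcasting is needed, the first example corresponds to the plain-Python sketch below. This is only an analogy: the real call returns a FastArray and also applies NumPy's broadcasting rules, which map does not.

```python
def myfunc(a, b):
    """Return a-b if a>b, otherwise return a+b."""
    if a > b:
        return a - b
    return a + b


a = list(range(10))
b = [x + 1 for x in a]

# Evaluate myfunc over successive pairs of elements, as map() does.
result = list(map(myfunc, a, b))
print(result)  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```
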
apply_numba(*args, otype=None, myfunc='myfunc', name=None)

Print to screen an example numba signature for the array.

You can then copy this example to build your own numba function.

Parameters:
  • *args – Test arguments

  • otype (str, default None) – A different output data type

  • myfunc (str, default 'myfunc') – A string to call the function

  • name (str, default None) – A string to name the array

Examples

>>> import numba
>>> @numba.guvectorize(['void(int64[:], int64[:])'], '(n)->(n)')
... def squarev(x,out):
...     for i in range(len(x)):
...         out[i]=x[i]**2
...
>>> a=arange(1_000_000).astype(np.int64)
>>> squarev(a)
FastArray([           0,            1,            4, ..., 999994000009,
           999996000004, 999998000001], dtype=int64)
apply_pandas(func, convert_dtype=True, args=(), **kwds)

Invoke a function on the values of a FastArray. The function can be a ufunc (a NumPy function that applies to the entire FastArray) or a Python function that only works on single values.

Parameters:
  • func (function) –

  • convert_dtype (boolean, default True) – Try to find better dtype for elementwise function results. If False, leave as dtype=object

  • args (tuple) – Positional arguments to pass to function in addition to the value

  • **kwds – Additional keyword arguments will be passed as keywords to the function.

Returns:

y

Return type:

FastArray or Dataset if func returns a FastArray

See also

FastArray.map

For element-wise operations

FastArray.agg

only perform aggregating type operations

FastArray.transform

only perform transforming type operations

Examples

Create a FastArray with typical summer temperatures for each city.

>>> fa = rt.FastArray([20, 21, 12], index=['London', 'New York','Helsinki'])
>>> fa
London      20
New York    21
Helsinki    12
dtype: int64

Square the values by defining a function and passing it as an argument to apply().

>>> def square(x):
...     return x**2
>>> fa.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64

Square the values by passing an anonymous function as an argument to apply().

>>> fa.apply(lambda x: x**2)
London      400
New York    441
Helsinki    144
dtype: int64

Define a custom function that needs additional positional arguments and pass these additional arguments using the args keyword.

>>> def subtract_custom_value(x, custom_value):
...     return x-custom_value
>>> fa.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64

Define a custom function that takes keyword arguments and pass these arguments to apply.

>>> def add_custom_values(x, **kwargs):
...     for month in kwargs:
...         x+=kwargs[month]
...     return x
>>> fa.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64

Use a function from the Numpy library.

>>> fa.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
apply_schema(schema)

Apply a schema containing descriptive information to the FastArray.

Parameters:

schema (dict) –

Returns:

dictionary of deviations from the schema

argmax(**kwargs)
argmin(**kwargs)
argpartition2(*args, **kwargs)
astype(dtype, order='K', casting='unsafe', subok=True, copy=True)

Return a FastArray with values converted to the specified data type.

Check your results when you convert missing values. Sentinel values are preserved when Riptable handles the conversion. However, in some cases the array is sent to NumPy for conversion and results may not be what you expect.

For parameter descriptions, see numpy.ndarray.astype(). Note that until a reported bug is fixed, the casting parameter is ignored when Riptable handles the conversion.

Returns:

A FastArray with values converted to the specified data type.

Return type:

FastArray

See also

Dataset.astype

Examples

>>> a = rt.FastArray([1.7, 2.0, 3.0])
>>> a.astype(int)
FastArray([1, 2, 3])

Convert a NaN to an int sentinel and back:

>>> a = rt.FastArray([rt.nan, 1.0, 2.0])
>>> a_int = a.astype(int)
>>> a_int
FastArray([-2147483648,           1,           2])
>>> a_int.astype(float)
FastArray([nan,  1.,  2.])
between(low, high, include_low=True, include_high=False)

Return a boolean FastArray indicating which input values are in a specified interval.

Parameters:
  • low (scalar or array) – Lower bound for the interval. If an array, it must be the same size as self, and comparisons are done elementwise.

  • high (scalar or array) – Upper bound for the interval. If an array, it must be the same size as self, and comparisons are done elementwise.

  • include_low (bool, default True) – Specifies whether low is included when performing comparisons.

  • include_high (bool, default False) – Specifies whether high is included when performing comparisons.

Returns:

A boolean FastArray indicating which input values are in a specified interval.

Return type:

FastArray

Examples

Specify an interval using scalars:

>>> a = rt.FA([9, 2, 3, 5, 8, 9, 1, 4, 6])
>>> a.between(5, 9, include_low=False)  # Exclude 5 (left endpoint).
FastArray([False, False, False, False,  True, False, False, False,  True])

Specify an interval using arrays:

>>> a2 = rt.FA([1, 2, 3, 4, 5])
>>> a2.between([1, 3, 5, 5, 5], [2, 4, 6, 6, 6])
FastArray([ True, False, False, False,  True])

Specify an interval mixing scalar and array bounds:

>>> a3 = rt.FA([1, 2, 3, 4, 5])
>>> a3.between(2, [2, 4, 6, 6, 6])
FastArray([False,  True,  True,  True,  True])
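
The interval rules can be restated as a plain-Python reference sketch, which may help when checking how include_low and include_high interact with scalar and array bounds. The function `between_sketch` is hypothetical and works on lists; it is not how Riptable implements the method.

```python
def between_sketch(values, low, high, include_low=True, include_high=False):
    """Hypothetical list-based restatement of the interval test."""
    n = len(values)
    # Repeat scalar bounds so that every element has its own bound.
    lows = list(low) if isinstance(low, (list, tuple)) else [low] * n
    highs = list(high) if isinstance(high, (list, tuple)) else [high] * n
    if len(lows) != n or len(highs) != n:
        raise ValueError("array bounds must be the same size as values")

    result = []
    for x, lo, hi in zip(values, lows, highs):
        above = x >= lo if include_low else x > lo
        below = x <= hi if include_high else x < hi
        result.append(above and below)
    return result


# The three examples above, using lists instead of FastArrays.
r1 = between_sketch([9, 2, 3, 5, 8, 9, 1, 4, 6], 5, 9, include_low=False)
r2 = between_sketch([1, 2, 3, 4, 5], [1, 3, 5, 5, 5], [2, 4, 6, 6, 6])
r3 = between_sketch([1, 2, 3, 4, 5], 2, [2, 4, 6, 6, 6])
```
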
clip_lower(a_min, **kwargs)
clip_upper(a_max, **kwargs)
copy(order='K')

Return a copy of the input FastArray.

Parameters:

order ({'K', 'C', 'F', 'A'}, default 'K') – Controls the memory layout of the copy: ‘K’ means match the layout of the input array as closely as possible; ‘C’ means row-based (C-style) order; ‘F’ means column-based (Fortran-style) order; ‘A’ means ‘F’ if the input array is formatted as ‘F’, ‘C’ if not.

Returns:

A copy of the input FastArray.

Return type:

FastArray

See also

Categorical.copy

Return a copy of the input Categorical.

Dataset.copy

Return a copy of the input Dataset.

Struct.copy

Return a copy of the input Struct.

Examples

Copy a FastArray:

>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a2 = a.copy()
>>> a2
FastArray([1, 2, 3, 4, 5])
>>> a2 is a
False  # The copy is a separate object.
copy_invalid()

Return a copy of a FastArray filled with the invalid value for the array’s data type.

Returns:

A copy of the input array, filled with the invalid value for the array’s dtype.

Return type:

FastArray

See also

FastArray.inv

Return the invalid value for the input array’s dtype.

FastArray.fill_invalid

Replace the values of a FastArray with the invalid value for the array’s dtype.

Examples

Copy an integer array and replace with invalids:

>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a2 = a.copy_invalid()
>>> a2
FastArray([-2147483648, -2147483648, -2147483648, -2147483648,
           -2147483648])
>>> a
FastArray([1, 2, 3, 4, 5])  # a is unchanged.

Copy a floating-point array and replace with invalids:

>>> a3 = rt.FA([0., 1., 2., 3., 4.])
>>> a3
FastArray([0., 1., 2., 3., 4.])
>>> a3.copy_invalid()
FastArray([nan, nan, nan, nan, nan])

Copy a string array and replace with invalids:

>>> a4 = rt.FA(['AMZN', 'IBM', 'MSFT', 'AAPL'])
>>> a4
FastArray([b'AMZN', b'IBM', b'MSFT', b'AAPL'], dtype='|S4')
>>> a4.copy_invalid()
FastArray([b'', b'', b'', b''], dtype='|S4')  # Invalid string value is an empty string.
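
The invalid values shown in these examples can be mimicked with NumPy. The sketch below is an illustration only: `invalid_like` is a hypothetical helper, it covers just the dtype kinds that appear in this section (signed integer, float, byte string, bool), and using the minimum value for signed integer widths other than int32 is an assumption.

```python
import numpy as np

def invalid_like(arr):
    """Return a new array shaped like `arr`, filled with a sentinel (hypothetical helper)."""
    arr = np.asarray(arr)
    kind = arr.dtype.kind
    if kind == "i":
        fill = np.iinfo(arr.dtype).min   # int32 -> -2147483648, as in the examples
    elif kind == "f":
        fill = np.nan
    elif kind == "S":
        fill = b""                        # empty byte string
    elif kind == "b":
        fill = False
    else:
        raise TypeError(f"dtype {arr.dtype} is not covered by this sketch")
    return np.full_like(arr, fill)

a = np.array([1, 2, 3, 4, 5], dtype=np.int32)
a2 = invalid_like(a)
print(a2)   # every element is the int32 sentinel
print(a)    # the source array is unchanged
```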
count(sorted=True, filter=None)

The count of each unique value.

This returns the same information that .unique(return_counts = True) does, except in a Dataset instead of a tuple.

Parameters:
  • sorted (bool, default True) – When True (the default), unique values are returned in sorted order. Set to False to return them in order of first appearance.

  • filter (ndarray of bool, default None) – If provided, any False values will be ignored in the calculation.

Returns:

A Dataset containing the unique values and their counts.

Return type:

Dataset

See also

FastArray.unique

Examples

>>> a = rt.FastArray([0, 2, 1, 3, 3, 2, 2])
>>> a.count()
*Unique   Count
-------   -----
      0       1
      1       1
      2       3
      3       2

With sorted = False:

>>> a.count(sorted = False)
*Unique   Count
-------   -----
      0       1
      2       3
      1       1
      3       2
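
For comparison, the same counts can be produced in NumPy with `numpy.unique`. This is a sketch under assumptions: `count_sketch` and its `in_sorted_order` parameter are hypothetical names, and the result is a plain tuple rather than a Dataset.

```python
import numpy as np

def count_sketch(values, in_sorted_order=True):
    """Unique values and their counts, sorted or in order of first appearance."""
    arr = np.asarray(values)
    uniques, first_idx, counts = np.unique(arr, return_index=True, return_counts=True)
    if not in_sorted_order:
        order = np.argsort(first_idx)          # reorder by first occurrence
        uniques, counts = uniques[order], counts[order]
    return uniques, counts

values = [0, 2, 1, 3, 3, 2, 2]
print(count_sketch(values))
print(count_sketch(values, in_sorted_order=False))
```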
diff(periods=1)

Compute the differences between adjacent elements of a FastArray.

Spaces at either end are filled with invalid values based on the input array’s dtype. If a calculated difference isn’t supported by the dtype, it is displayed as a NaN or rollover value. For example, in a uint8 array a negative difference wraps around, so -1 is displayed as 255. To resolve this, you can explicitly upcast to the next larger signed int dtype before calculating the differences.

Parameters:

periods (int, default 1) – Number of element positions to shift right (if positive) or left (if negative) before subtracting. Raises an error if set to 0.

Returns:

An equivalent-length array containing the differences between input array elements that are adjacent or separated by a specified period. Spaces at either end are filled with invalids based on the input array’s dtype.

Return type:

FastArray

See also

FastArray.shift

Shift an array’s elements right or left.

Examples

Calculate differences using the periods=1 default (array elements one position to the right):

>>> a=rt.FA([0, 2, 4, 8, 16, 32])
>>> a
FastArray([ 0,  2,  4,  8, 16, 32])
>>> a.diff()
FastArray([-2147483648,           2,           2,           4,
                     8,          16])

Calculate differences using array elements two positions to the left:

>>> a.diff(-2)
FastArray([         -4,          -6,         -12,         -24,
           -2147483648, -2147483648])

Specify a periods value that is greater than the array length:

>>> a.diff(10)
FastArray([-2147483648, -2147483648, -2147483648, -2147483648,
           -2147483648, -2147483648])
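
The shift-and-subtract behavior shown above can be sketched in NumPy. This is not Riptable's implementation: `diff_sketch` is a hypothetical helper, it handles int32 input only, and it fills the uncovered positions with the int32 sentinel seen in the examples. The last lines illustrate the uint8 rollover mentioned earlier and the upcast that avoids it.

```python
import numpy as np

INT32_INVALID = np.iinfo(np.int32).min  # -2147483648, as displayed in the examples

def diff_sketch(values, periods=1):
    """Difference between each element and the one `periods` positions away."""
    if periods == 0:
        raise ValueError("periods must be nonzero")
    arr = np.asarray(values, dtype=np.int32)
    out = np.full(arr.shape, INT32_INVALID, dtype=np.int32)
    if abs(periods) >= arr.size:
        return out                      # nothing to compare against
    if periods > 0:
        out[periods:] = arr[periods:] - arr[:-periods]
    else:
        out[:periods] = arr[:periods] - arr[-periods:]
    return out

a = [0, 2, 4, 8, 16, 32]
print(diff_sketch(a))
print(diff_sketch(a, -2))

# Unsigned arithmetic wraps around; upcasting to a signed dtype first avoids it.
small = np.array([5, 3], dtype=np.uint8)
wrapped = small[1:] - small[:-1]
widened = small.astype(np.int16)
signed = widened[1:] - widened[:-1]
print(wrapped, signed)
```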
differs(periods=1, fancy=False)

Identify array values that are the same as adjacent values.

Returns either a boolean FastArray, where True indicates equivalent values, or a fancy index FastArray containing the indices of equivalent values.

Parameters:
  • periods (int, default 1) – The number of array element positions to look behind (positive number) or look ahead (negative number) for comparison.

  • fancy (bool, default False) – If False (the default), returns a boolean array. If True, returns a fancy index array.

Returns:

A boolean or fancy index array that identifies equivalent elements in the input array.

Return type:

FastArray

See also

FastArray.transitions

Identify nonequivalent items in the input array and return a boolean or fancy index array.

Examples

Return a boolean array using the periods=1 default value (look behind one element position for comparisons):

>>> a = rt.FA([1, 2, 2, 3, 2, 4, 5, 6, 2, 2, 5])
>>> a
FastArray([1, 2, 2, 3, 2, 4, 5, 6, 2, 2, 5])
>>> a.differs()
FastArray([False, False,  True, False, False, False, False, False, False,
            True, False])

Return a boolean array and look ahead three element positions for comparisons:

>>> a.differs(periods=-3)
FastArray([False,  True, False, False, False, False, False, False, False,
           False, False])

Return a fancy index array using the periods=1 default value (look behind one element position for comparisons):

>>> a.differs(fancy=True)
FastArray([2, 9])

Set periods to a number larger than the length of the input array:

>>> a.differs(periods=15)
FastArray([False, False, False, False, False, False, False, False, False,
           False, False])
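
The comparison against a shifted copy of the array can be sketched in NumPy as follows. `differs_sketch` is a hypothetical helper; raising on `periods=0` is an assumption of this sketch, because the text does not say how that case is handled.

```python
import numpy as np

def differs_sketch(values, periods=1, fancy=False):
    """Flag elements equal to the element `periods` positions behind (or ahead)."""
    if periods == 0:
        raise ValueError("periods=0 is not covered by this sketch")
    arr = np.asarray(values)
    mask = np.zeros(arr.size, dtype=bool)
    if abs(periods) < arr.size:
        if periods > 0:      # look behind
            mask[periods:] = arr[periods:] == arr[:-periods]
        else:                # look ahead
            mask[:periods] = arr[:periods] == arr[-periods:]
    return np.flatnonzero(mask) if fancy else mask

a = [1, 2, 2, 3, 2, 4, 5, 6, 2, 2, 5]
print(differs_sketch(a))
print(differs_sketch(a, fancy=True))
```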
display_query_properties()

Returns an ItemFormat object and a function for converting the FastArray’s items to strings. The basic types (Bool, Int, Float, Bytes, String) all have default formats and conversion functions (see Utils.rt_display_properties).

If a new type is a subclass of FastArray and needs to be displayed in format different from its underlying type, it will need to take over this routine.

duplicated(keep='first', high_unique=False)

Return a boolean FastArray indicating True for duplicate items in the input array.

Parameters:
  • keep ({'first', 'last', 'False'}, default 'first') –

    • ‘first’ : Mark each duplicate as True except for the first occurrence.

    • ‘last’ : Mark each duplicate as True except for the last occurrence.

    • ‘False’ : Mark all duplicates as True.

  • high_unique (bool, default False) – Controls whether the function uses hashing-based (False) or sorting-based (True) logic to find unique values in the input array. If your data has a high proportion of unique values, set to True for faster performance.

Returns:

A boolean FastArray indicating True for duplicate items in the input array.

Return type:

FastArray

See also

FastArray.nunique

Return the number of unique values in an array.

Dataset.duplicated

Return a boolean FastArray indicating True for duplicate rows.

Examples

Exclude the first occurrence of each duplicate (use the default keep value):

>>> a = rt.FA([1, 2, 3, 4, 2, 7, 8, 8, 3])
>>> a
FastArray([1, 2, 3, 4, 2, 7, 8, 8, 3])
>>> a.duplicated()
FastArray([False, False, False, False,  True, False, False,  True,  True])

Mark all duplicates:

>>> a.duplicated(keep=False)
FastArray([False,  True,  True, False,  True, False,  True,  True,  True])
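
The three keep modes can be sketched with `numpy.unique`. This is an illustration under assumptions, not Riptable's hashing or sorting logic: `duplicated_sketch` is a hypothetical helper, and it accepts the boolean False for the "mark all duplicates" mode, as the example above does.

```python
import numpy as np

def duplicated_sketch(values, keep="first"):
    """Boolean mask that is True for duplicate items."""
    arr = np.asarray(values)
    n = arr.size
    if keep is False:
        # Mark every occurrence of any value that appears more than once.
        uniques, counts = np.unique(arr, return_counts=True)
        return np.isin(arr, uniques[counts > 1])
    if keep == "first":
        _, kept = np.unique(arr, return_index=True)          # first occurrences
    elif keep == "last":
        _, kept_rev = np.unique(arr[::-1], return_index=True)
        kept = n - 1 - kept_rev                               # last occurrences
    else:
        raise ValueError(f"unsupported keep value: {keep!r}")
    mask = np.ones(n, dtype=bool)
    mask[kept] = False
    return mask

a = [1, 2, 3, 4, 2, 7, 8, 8, 3]
print(duplicated_sketch(a))
print(duplicated_sketch(a, keep="last"))
print(duplicated_sketch(a, keep=False))
```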
eq(other)
fill_invalid(shape=None, dtype=None, inplace=True)

Replace all values of the input FastArray with an invalid value.

The invalid value used is determined by the input array’s dtype or a user-specified dtype.

Warning: By default, this operation is in place.

Parameters:
  • shape (int or sequence of int, optional) – Shape of the new array, for example: (2, 3) or 2. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.

  • dtype (str, optional) – The desired dtype for the returned array.

  • inplace (bool, default True) – If True (the default), modify original data. If False, return a copy of the array.

Returns:

If inplace=False, a copy of the input FastArray is returned that has all values replaced with an invalid value. Otherwise, nothing is returned.

Return type:

FastArray, optional

See also

FastArray.inv

Return the invalid value for the input array’s dtype.

FastArray.copy_invalid

Return a copy of a FastArray filled with the invalid value for the array’s dtype.

Examples

Replace an integer array’s values with the invalid value for the array’s dtype. By default, the returned array is the same size and dtype as the input array, and the operation is performed in place:

>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a.fill_invalid()
>>> a
FastArray([-2147483648, -2147483648, -2147483648, -2147483648,
           -2147483648])

Replace a floating-point array’s values with the invalid value for the int32 dtype:

>>> a2 = rt.FA([0., 1., 2., 3., 4.])
>>> a2
FastArray([0., 1., 2., 3., 4.])
>>> a2.fill_invalid(dtype="int32", inplace=False)
FastArray([-2147483648, -2147483648, -2147483648, -2147483648,
           -2147483648])

Specify the size and dtype of the output array:

>>> a3 = rt.FA(["AMZN", "IBM", "MSFT", "AAPL"])
>>> a3
FastArray([b'AMZN', b'IBM', b'MSFT', b'AAPL'], dtype='|S4')
>>> a3.fill_invalid(2, dtype="bool", inplace=False)
FastArray([False, False])
fillna(value=None, method=None, inplace=False, limit=None)

Replace NaN and invalid values with a specified value or nearby data.

Optionally, you can modify the original FastArray if it’s not locked.

Parameters:
  • value (scalar or array, default None) – A value or an array of values to replace all NaN and invalid values. A value is required if method = None. An array can be used only when method = None. If an array is used, the number of values in the array must equal the number of NaN and invalid values.

  • method ({None, 'backfill', 'bfill', 'pad', 'ffill'}, default None) –

    Method to use to propagate valid values.

    • backfill/bfill: Propagates the next encountered valid value backward. Calls FastArray.fill_backward().

    • pad/ffill: Propagates the last encountered valid value forward. Calls FastArray.fill_forward().

    • None: A replacement value is required if method = None. Calls FastArray.replacena().

    If there’s not a valid value to propagate forward or backward, the NaN or invalid value is not replaced unless you also specify a value.

  • inplace (bool, default False) – If False, return a copy of the FastArray. If True, modify original data. This will modify any other views on this object. This fails if the FastArray is locked.

  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN or invalid values to fill. If there is a gap with more than this number of consecutive NaN or invalid values, the gap will be only partially filled.

Returns:

The FastArray will be the same size and dtype as the original array.

Return type:

FastArray

See also

riptable.rt_fastarraynumba.fill_forward

Replace NaN and invalid values with the last valid value.

riptable.rt_fastarraynumba.fill_backward

Replace NaN and invalid values with the next valid value.

riptable.fill_forward

Replace NaN and invalid values with the last valid value.

riptable.fill_backward

Replace NaN and invalid values with the next valid value.

Dataset.fillna

Replace NaN and invalid values with a specified value or nearby data.

FastArray.replacena

Replace NaN and invalid values with a specified value.

Categorical.fill_forward

Replace NaN and invalid values with the last valid group value.

Categorical.fill_backward

Replace NaN and invalid values with the next valid group value.

GroupBy.fill_forward

Replace NaN and invalid values with the last valid group value.

GroupBy.fill_backward

Replace NaN and invalid values with the next valid group value.

Examples

Replace all NaN values with 0s:

>>> a = rt.FastArray([rt.nan, 1.0, rt.nan, rt.nan, rt.nan, 5.0])
>>> a.fillna(0)
FastArray([0., 1., 0., 0., 0., 5.])

Replace all invalid values with 0s:

>>> b = rt.FastArray([0, 1, 2, 3, 4, 5])
>>> b[0:3] = b.inv
>>> b.fillna(0)
FastArray([0, 0, 0, 3, 4, 5])

Replace each instance of NaN with a different value:

>>> a.fillna([0, 2, 3, 4])
FastArray([0., 1., 2., 3., 4., 5.])

Propagate the last encountered valid value forward. Note that where there’s no valid value to propagate, the NaN or invalid value isn’t replaced.

>>> a.fillna(method = 'ffill')
FastArray([nan,  1.,  1.,  1.,  1.,  5.])

You can use the value parameter to specify a value to use where there’s no valid value to propagate.

>>> a.fillna(value = 0, method = 'ffill')
FastArray([0., 1., 1., 1., 1., 5.])

Fill at most one NaN or invalid value in each consecutive series of NaN or invalid values. With 'bfill', the value that gets filled is the one immediately before the next valid value.

>>> a.fillna(method = 'bfill', limit = 1)
FastArray([ 1.,  1., nan, nan,  5.,  5.])
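
The interaction of value, method, and limit can be sketched in NumPy for floating-point data. This is a rough approximation under assumptions: `fillna_sketch` and `_propagate` are hypothetical helpers, only NaN is treated as missing (integer sentinels are not handled), and a copy is always returned.

```python
import numpy as np

def _propagate(out, order, limit):
    """Carry the most recent valid value along `order`, filling NaN runs in place."""
    last_valid = None
    run_length = 0
    for i in order:
        if np.isnan(out[i]):
            run_length += 1
            if last_valid is not None and (limit is None or run_length <= limit):
                out[i] = last_valid
        else:
            last_valid = out[i]
            run_length = 0

def fillna_sketch(values, value=None, method=None, limit=None):
    out = np.array(values, dtype=np.float64)   # always a copy
    if method in ("ffill", "pad"):
        _propagate(out, range(out.size), limit)
    elif method in ("bfill", "backfill"):
        _propagate(out, range(out.size - 1, -1, -1), limit)
    elif method is not None:
        raise ValueError(f"unknown method: {method!r}")
    elif value is None:
        raise ValueError("a value is required when method is None")
    if value is not None:
        # Scalar: fills every remaining NaN. Array: one entry per remaining NaN.
        out[np.isnan(out)] = value
    return out

a = [np.nan, 1.0, np.nan, np.nan, np.nan, 5.0]
print(fillna_sketch(a, 0))
print(fillna_sketch(a, method="ffill"))
print(fillna_sketch(a, method="bfill", limit=1))
```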
filter(filter)

Return a copy of the FastArray containing only the elements that meet the specified condition.

Parameters:

filter (array: fancy index or Boolean mask) – A fancy index specifies both the desired elements and their order in the returned FastArray. When a Boolean mask is passed, only the elements at positions where the mask is True are in the returned FastArray.

Return type:

FastArray

Notes

If you want to perform an operation on a filtered FastArray, it’s more efficient to perform the operation using the filter keyword argument. For example, my_fa.sum(filter = boolean_mask).

Examples

Create a FastArray:

>>> fa = rt.FastArray(np.linspace(0, 1, 11))
>>> fa
FastArray([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

Filter using a fancy index:

>>> fa.filter([5, 0, 1])
FastArray([0.5, 0. , 0.1])

Filter using a condition that creates a Boolean mask array:

>>> fa.filter(fa > 0.75)
FastArray([0.8, 0.9, 1. ])
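
In plain NumPy, the two kinds of filter correspond to integer-array indexing and boolean-mask indexing. The sketch below also uses the `where` argument of `numpy.sum` as a stand-in for the point made in the Notes, that reducing with a mask avoids building the filtered copy first; treating it as analogous to Riptable's filter keyword is an assumption of this sketch.

```python
import numpy as np

fa = np.linspace(0, 1, 11)

by_index = fa[[5, 0, 1]]      # fancy index: picks elements in the given order
mask = fa > 0.75              # Boolean mask
by_mask = fa[mask]            # keeps elements where the mask is True

# Reduce over the masked elements without materializing the filtered array.
masked_sum = np.sum(fa, where=mask)

print(by_index, by_mask, masked_sum)
```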
static from_arrow(arr, zero_copy_only=True, writable=False, auto_widen=False)

Convert a pyarrow Array to a riptable FastArray.

Parameters:
  • arr (pyarrow.Array or pyarrow.ChunkedArray) –

  • zero_copy_only (bool, default True) – If True, an exception will be raised if the conversion to a FastArray would require copying the underlying data (e.g. in presence of nulls, or for non-primitive types).

  • writable (bool, default False) – For a FastArray created with zero copy (view on the Arrow data), the resulting array is not writable (Arrow data is immutable). By setting this to True, a copy of the array is made to ensure it is writable.

  • auto_widen (bool, default False) – When False (the default), if an arrow array contains a value which would be considered the ‘invalid’/NA value for the equivalent dtype in a FastArray, raise an exception because direct conversion would be lossy / change the semantic meaning of the data. When True, the converted array will be widened (if possible) to the next-largest dtype to ensure the data will be interpreted in the same way.

Return type:

FastArray

ge(other)
get_name()

Get the name that’s assigned to a FastArray.

When a FastArray object is created, it has no name. It can be assigned a name via set_name. For details, see FastArray.set_name().

Returns:

The assigned name, or None if the array has not been named.

Return type:

str or None

Examples

Assign the FastArray a name using FastArray.set_name():

>>> a = rt.arange(5)
>>> a.set_name('FA Name')
FastArray([0, 1, 2, 3, 4])

Get the name:

>>> a.get_name()
'FA Name'
gt(other)
info(**kwargs)

Return a description of the input array’s contents.

This information is set using FastArray.apply_schema and includes the steward and dtype.

Parameters:

**kwargs (optional) – Keyword arguments passed to rt_meta.info().

Returns:

A description of the input array’s contents.

Return type:

rt_meta.Info

See also

FastArray.doc

Return the Doc object for the input FastArray.

Categorical.info

Display a description of the input Categorical.

Struct.info

Return an object containing a description of the input structure’s contents.

Examples

Return the description of the input array’s contents:

>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a.info()
Description: <no description>
Steward: <no steward>
Type: int32

Apply a schema and return the description of the input array’s contents:

>>> schema = {"Description": "This is an array", "Steward": "Brian"}
>>> a.apply_schema(schema)
{}
>>> a.info()
Description: This is an array
Steward: Brian
Type: int32

Return the description of the input array’s contents with a title:

>>> a.info(title="Test")
Test
====
Description: This is an array
Steward: Brian
Type: int32
iscomputable()
isfinite(fancy=False)

Return a boolean array that’s True for each finite FastArray element, False otherwise.

A value is considered to be finite if it’s not positive or negative infinity or a NaN (Not a Number).

Parameters:

fancy (bool, default False) – Set to True to instead return the indices of the True (finite) values.

Returns:

An array of booleans or indices.

Return type:

FastArray

See also

FastArray.isnotfinite, riptable.isfinite, riptable.isnotfinite, riptable.isinf, riptable.isnotinf, FastArray.isinf, FastArray.isnotinf

Dataset.mask_or_isfinite

Return a boolean array that’s True for each Dataset row that has at least one finite value.

Dataset.mask_and_isfinite

Return a boolean array that’s True for each Dataset row that contains all finite values.

Dataset.mask_or_isinf

Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.

Dataset.mask_and_isinf

Return a boolean array that’s True for each Dataset row that contains all infinite values.

Examples

>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> a.isfinite()
FastArray([False, False, False,  True])

With fancy = True:

>>> a.isfinite(fancy = True)
FastArray([3])
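
For floating-point data, the finite and infinite checks and their complements line up with NumPy's `numpy.isfinite` and `numpy.isinf`. The following sketch uses the same sample values as the example above; using `numpy.flatnonzero` as the analogue of fancy=True is an assumption, and integer sentinel handling is not covered.

```python
import numpy as np

a = np.array([np.inf, -np.inf, np.nan, 0.0])

finite = np.isfinite(a)        # neither infinity nor NaN
infinite = np.isinf(a)         # positive or negative infinity
not_finite = ~finite           # complement of the finite mask
not_inf = ~infinite            # complement of the infinite mask

finite_idx = np.flatnonzero(finite)   # indices of the True values

print(finite, infinite, not_finite, not_inf, finite_idx)
```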
isin(test_elements, *, assume_unique=False, invert=False)

Test whether each element of self is in test_elements, broadcasting over self only. Returns a boolean array of the same shape as self that is True where an element of self is in test_elements and False otherwise.

Parameters:
  • test_elements (array_like) – The values against which to test each value of self. This argument is flattened if it is an array or array_like. See notes for behavior with non-array-like parameters.

  • assume_unique (bool, optional) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.

  • invert (bool, optional) – If True, the values in the returned array are inverted, as if calculating self not in test_elements. Default is False. np.isin(a, b, invert=True) is equivalent to (but faster than) np.invert(np.isin(a, b)).

Returns:

isin (ndarray, bool) – Has the same shape as self. The values self[isin] are in test_elements.

Notes

Behavior differs from pandas:

  • Riptable favors bytestrings, and will make conversions from unicode/bytes to match for operations as necessary.

  • Single scalars are also accepted for test_elements.

  • A pandas Series returns another Series. Riptable has no Series and returns a FastArray.

Examples

>>> from riptable import *
>>> a = FA(['a','b','c','d','e'], unicode=False)
>>> a.isin(['a','b'])
FastArray([ True,  True, False, False, False])
>>> a.isin('a')
FastArray([ True, False, False, False, False])
>>> a.isin({'b'})
FastArray([False,  True, False, False, False])
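
The scalar, set, and bytes/unicode conveniences shown above are not something `numpy.isin` does by itself, so a NumPy sketch has to normalize the inputs first. `isin_sketch` below is a hypothetical helper that assumes ASCII string data and converts both sides to byte strings before testing membership.

```python
import numpy as np

def isin_sketch(values, test_elements):
    """Membership test that accepts a list, a set, or a single string scalar."""
    arr = np.asarray(values, dtype="S")            # work in byte strings
    if isinstance(test_elements, (str, bytes)):
        test_elements = [test_elements]            # single scalar
    elif isinstance(test_elements, (set, frozenset)):
        test_elements = list(test_elements)        # NumPy does not unpack sets
    test = np.asarray(test_elements, dtype="S")
    return np.isin(arr, test)

a = ["a", "b", "c", "d", "e"]
print(isin_sketch(a, ["a", "b"]))
print(isin_sketch(a, "a"))
print(isin_sketch(a, {"b"}))
```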
isinf(fancy=False)

Return a boolean array that’s True for each FastArray element that’s positive or negative infinity, False otherwise.

Parameters:

fancy (bool, default False) – Set to True to instead return the indices of the True (infinite) values.

Returns:

An array of booleans or indices.

Return type:

FastArray

See also

FastArray.isnotinf, FastArray.isfinite, FastArray.isnotfinite, riptable.isinf, riptable.isnotinf, riptable.isfinite, riptable.isnotfinite

Dataset.mask_or_isfinite

Return a boolean array that’s True for each Dataset row that has at least one finite value.

Dataset.mask_and_isfinite

Return a boolean array that’s True for each Dataset row that contains all finite values.

Dataset.mask_or_isinf

Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.

Dataset.mask_and_isinf

Return a boolean array that’s True for each Dataset row that contains all infinite values.

Examples

>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> a.isinf()
FastArray([ True,  True, False, False])

With fancy = True:

>>> a.isinf(fancy = True)
FastArray([0, 1])
isna()

isna is mapped directly to isnan(). Categorical and DateTime classes take over isnan. FastArray handles sentinels.

>>> a=arange(100.0)
>>> a[5]=np.nan
>>> a[87]=np.nan
>>> sum(a.isna())
2
>>> sum(a.astype(np.int32).isna())
2
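
The sentinel handling in this example can be mimicked in NumPy. The sketch assumes that the int32 invalid value is the minimum int32, as displayed elsewhere in this reference; `to_int32_with_sentinel` and `isna_sketch` are hypothetical helpers that cover only float and int32 arrays.

```python
import numpy as np

INT32_INVALID = np.iinfo(np.int32).min   # -2147483648

def to_int32_with_sentinel(values):
    """Convert floats to int32, mapping NaN to the int32 sentinel."""
    values = np.asarray(values, dtype=np.float64)
    out = np.full(values.shape, INT32_INVALID, dtype=np.int32)
    valid = ~np.isnan(values)
    out[valid] = values[valid].astype(np.int32)
    return out

def isna_sketch(arr):
    """True where a float is NaN or an int32 equals the sentinel."""
    arr = np.asarray(arr)
    if arr.dtype.kind == "f":
        return np.isnan(arr)
    if arr.dtype == np.int32:
        return arr == INT32_INVALID
    raise TypeError(f"dtype {arr.dtype} is not covered by this sketch")

a = np.arange(100.0)
a[5] = np.nan
a[87] = np.nan
print(isna_sketch(a).sum(), isna_sketch(to_int32_with_sentinel(a)).sum())
```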
isnan(fancy=False)

Return a boolean array that’s True for each element that’s a NaN (Not a Number), False otherwise.

Parameters:

fancy (bool, default False) – Set to True to instead return the indices of the True (NaN) values.

Returns:

A FastArray of booleans or indices.

Return type:

FastArray

See also

FastArray.isnotnan, FastArray.notna, FastArray.isnanorzero, riptable.isnan, riptable.isnotnan, riptable.isnanorzero, Categorical.isnan, Categorical.isnotnan, Categorical.notna, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan

Dataset.mask_or_isnan

Return a boolean array that’s True for each Dataset row that contains at least one NaN.

Dataset.mask_and_isnan

Return a boolean array that’s True for each all-NaN Dataset row.

Examples

>>> a = rt.FastArray([rt.nan, rt.nan, rt.inf, 3])
>>> a.isnan()
FastArray([ True,  True, False, False])

With fancy = True:

>>> a.isnan(fancy = True)
FastArray([0, 1])
isnanorzero(fancy=False)

Return a boolean array that’s True for each element that’s a NaN (Not a Number) or zero, False otherwise.

Parameters:

fancy (bool, default False) – Set to True to instead return the indices of the True (NaN or zero) values.

Returns:

A FastArray of booleans or indices.

Return type:

FastArray

See also

riptable.isnanorzero, riptable.isnan, riptable.isnotnan, FastArray.isnan, FastArray.isnotnan, Categorical.isnan, Categorical.isnotnan, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan

Dataset.mask_or_isnan

Return a boolean array that’s True for each Dataset row that contains at least one NaN.

Dataset.mask_and_isnan

Return a boolean array that’s True for each all-NaN Dataset row.

Examples

>>> a = rt.FastArray([0, rt.nan, rt.inf, 3])
>>> a.isnanorzero()
FastArray([ True,  True, False, False])

With fancy = True:

>>> a.isnanorzero(fancy = True)
FastArray([0, 1])
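
For floating-point data, the combined test is simply the union of a NaN mask and a zero mask. This is a minimal NumPy sketch; `isnanorzero_sketch` is a hypothetical helper, and integer sentinels are not handled.

```python
import numpy as np

def isnanorzero_sketch(values, fancy=False):
    """True where an element is NaN or equal to zero."""
    arr = np.asarray(values, dtype=np.float64)
    mask = np.isnan(arr) | (arr == 0)
    return np.flatnonzero(mask) if fancy else mask

a = [0, np.nan, np.inf, 3]
print(isnanorzero_sketch(a))
print(isnanorzero_sketch(a, fancy=True))
```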
isnormal(fancy=False)
isnotfinite(fancy=False)

Return a boolean array that’s True for each non-finite FastArray element, False otherwise.

A value is considered to be finite if it’s not positive or negative infinity or a NaN (Not a Number).

Parameters:

fancy (bool, default False) – Set to True to instead return the indices of the True (non-finite) values.

Returns:

An array of booleans or indices.

Return type:

FastArray

See also

FastArray.isfinite, riptable.isfinite, riptable.isnotfinite, riptable.isinf, riptable.isnotinf, FastArray.isinf, FastArray.isnotinf

Dataset.mask_or_isfinite

Return a boolean array that’s True for each Dataset row that has at least one finite value.

Dataset.mask_and_isfinite

Return a boolean array that’s True for each Dataset row that contains all finite values.

Dataset.mask_or_isinf

Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.

Dataset.mask_and_isinf

Return a boolean array that’s True for each Dataset row that contains all infinite values.

Examples

>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> a.isnotfinite()
FastArray([ True,  True,  True, False])

With fancy = True:

>>> a.isnotfinite(fancy = True)
FastArray([0, 1, 2])
isnotinf(fancy=False)

Return a boolean array that’s True for each FastArray element that’s not positive or negative infinity, False otherwise.

Parameters:

fancy (bool, default False) – Set to True to instead return the indices of the True (non-infinite) values.

Returns:

An array of booleans or indices.

Return type:

FastArray

See also

FastArray.isinf, riptable.isnotinf, riptable.isinf, riptable.isfinite, riptable.isnotfinite, FastArray.isfinite, FastArray.isnotfinite

Dataset.mask_or_isfinite

Return a boolean array that’s True for each Dataset row that has at least one finite value.

Dataset.mask_and_isfinite

Return a boolean array that’s True for each Dataset row that contains all finite values.

Dataset.mask_or_isinf

Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.

Dataset.mask_and_isinf

Return a boolean array that’s True for each Dataset row that contains all infinite values.

Examples

>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> a.isnotinf()
FastArray([False, False,  True,  True])

With fancy = True:

>>> a.isnotinf(fancy = True)
FastArray([2, 3])
isnotnan(fancy=False)

Return a boolean array that’s True for each element that’s not a NaN (Not a Number), False otherwise.

Parameters:

fancy (bool, default False) – Set to True to instead return the indices of the True (non-NaN) values.

Returns:

A FastArray of booleans or indices.

Return type:

FastArray

See also

FastArray.isnan, FastArray.notna, FastArray.isnanorzero, riptable.isnan, riptable.isnotnan, riptable.isnanorzero, Categorical.isnan, Categorical.isnotnan, Categorical.notna, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan

Dataset.mask_or_isnan

Return a boolean array that’s True for each Dataset row that contains at least one NaN.

Dataset.mask_and_isnan

Return a boolean array that’s True for each all-NaN Dataset row.

Examples

>>> a = rt.FastArray([rt.nan, rt.inf, 2])
>>> a.isnotnan()
FastArray([False,  True,  True])

With fancy = True:

>>> a.isnotnan(fancy = True)
FastArray([1, 2])
isnotnormal(fancy=False)
issorted()

Return True if the array is sorted, False otherwise.

NaNs at the end of an array are considered sorted.

Calls riptable.issorted().

Returns:

True if the array is sorted, False otherwise.

Return type:

bool

See also

riptable.issorted

Examples

>>> a = rt.FastArray(['a', 'b', 'c'])
>>> a.issorted()
True
>>> a = rt.FastArray([1.0, 2.0, 3.0, rt.nan])
>>> rt.issorted(a)
True
>>> a = rt.FastArray(['a', 'c', 'b'])
>>> a.issorted()
False
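
A sortedness check with the trailing-NaN rule can be sketched in NumPy as below. `issorted_sketch` is a hypothetical helper. Treating a NaN that appears before a valid value as unsorted is an assumption of this sketch, since the text only says that NaNs at the end are considered sorted.

```python
import numpy as np

def issorted_sketch(values):
    """True if the values are in nondecreasing order, allowing trailing NaNs."""
    arr = np.asarray(values)
    if arr.dtype.kind == "f":
        nan_mask = np.isnan(arr)
        n_valid = arr.size - int(nan_mask.sum())
        if nan_mask[:n_valid].any():     # a NaN sits before a valid value
            return False
        arr = arr[:n_valid]              # drop the trailing NaNs
    return bool(np.all(arr[:-1] <= arr[1:]))

print(issorted_sketch(["a", "b", "c"]))
print(issorted_sketch([1.0, 2.0, 3.0, np.nan]))
print(issorted_sketch(["a", "c", "b"]))
```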
le(other)
lt(other)
map(npdict)

Notes

Uses ismember and can handle large dictionaries

Examples

>>> a=arange(3)
>>> a.map({1: 'a', 2:'b', 3:'c'})
FastArray(['', 'a', 'b'], dtype='<U1')
>>> a=arange(3)+1
>>> a.map({1: 'a', 2:'b', 3:'c'})
FastArray(['a', 'b', 'c'], dtype='<U1')
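
The lookup behavior in these examples, where a key missing from the dictionary produces an empty string, can be approximated with ordinary dictionary lookups. This sketch does not use ismember and will not scale the same way; `map_sketch` and its `default` parameter are hypothetical.

```python
import numpy as np

def map_sketch(values, mapping, default=""):
    """Replace each value with mapping[value], or `default` if the key is absent."""
    keys = np.asarray(values).tolist()           # plain Python scalars for dict lookup
    return np.array([mapping.get(key, default) for key in keys])

lookup = {1: "a", 2: "b", 3: "c"}
print(map_sketch(np.arange(3), lookup))
print(map_sketch(np.arange(3) + 1, lookup))
```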
map_old(npdict)

Example

>>> d = {1:10, 2:20}
>>> dat['c'] = dat.a.map(d)
>>> print(dat)
   a  b   cb   c
0  1  0  0.0  10
1  1  1  1.0  10
2  1  2  3.0  10
3  2  3  5.0  20
4  2  4  7.0  20
5  2  5  9.0  20
mean(filter=None, dtype=None, axis=None, keepdims=None, **kwargs)

Compute the arithmetic mean of the values in the first argument.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the mean calculation. If the filter is uniformly False, mean returns a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.mean(dtype = my_type) is equivalent to my_type(x.mean()).

Returns:

The mean of the values.

Return type:

scalar

See also

numpy.mean

FastArray.nanmean

Computes the mean of FastArray values, ignoring NaNs.

Dataset.mean

Computes the mean of numerical Dataset columns.

GroupByOps.mean

Computes the mean of each group. Used by Categorical objects.

Notes

The dtype keyword for FastArray.mean specifies the data type of the result. This differs from numpy.mean, where it specifies the data type used to compute the mean.

Notes on Using NumPy Parameters

Using either of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.mean). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the mean as it is in numpy.mean.

Also note that if you use either of the following NumPy parameters and also include a filter keyword argument (which numpy.mean does not accept), Riptable’s implementation of mean will be used with the filter argument and the NumPy parameters will be ignored.

  • axis (None or int or tuple of ints, optional) – Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.

If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray, however any non-default value will be. If the sub-class’s method does not implement keepdims, any exceptions will be raised.

Examples

>>> a = rt.FastArray([1, 3, 5, 7])
>>> a.mean()
4.0

With a dtype specified:

>>> a = rt.FastArray([1, 3, 5, 7])
>>> a.mean(dtype = rt.int32)
4

With a filter:

>>> a = rt.FastArray([1, 3, 5, 7])
>>> b = rt.FastArray([False, True, False, True])
>>> a.mean(filter = b)
5.0
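
The filter and dtype semantics described above can be sketched in NumPy. `mean_sketch` is a hypothetical helper, not Riptable's implementation. It applies dtype to the finished result, matching the stated equivalence with my_type(x.mean()), and it lets Python's own division raise ZeroDivisionError when the filter selects nothing, which is how this sketch interprets the behavior described for a uniformly False filter.

```python
import numpy as np

def mean_sketch(values, filter=None, dtype=np.float64):
    """Arithmetic mean of the selected elements, converted to `dtype` at the end."""
    arr = np.asarray(values, dtype=np.float64)
    if filter is not None:
        arr = arr[np.asarray(filter, dtype=bool)]
    result = float(arr.sum()) / arr.size          # ZeroDivisionError if nothing is selected
    return np.dtype(dtype).type(result)           # dtype is applied to the result only

a = [1, 3, 5, 7]
print(mean_sketch(a))
print(mean_sketch(a, dtype=np.int32))
print(mean_sketch(a, filter=[False, True, False, True]))
```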
median(**kwargs)
move_argmax(*args, **kwargs)
move_argmin(*args, **kwargs)
move_max(*args, **kwargs)
move_mean(*args, **kwargs)
move_median(*args, **kwargs)
move_min(*args, **kwargs)
move_rank(*args, **kwargs)
move_std(*args, **kwargs)
move_sum(*args, **kwargs)
move_var(*args, **kwargs)
nanargmax(**kwargs)
nanargmin(**kwargs)
nanmax(**kwargs)
nanmean(filter=None, dtype=None, axis=None, keepdims=None, **kwargs)

Compute the arithmetic mean of the values in the first argument, ignoring NaNs.

If all values in the first argument are NaNs, 0.0 is returned.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the mean calculation. If the filter is uniformly False, nanmean returns a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanmean(dtype = my_type) is equivalent to my_type(x.nanmean()).

Returns:

The mean of the values.

Return type:

scalar

See also

numpy.nanmean

FastArray.mean

Computes the mean of FastArray values.

Dataset.nanmean

Computes the mean of numerical Dataset columns, ignoring NaNs.

GroupByOps.nanmean

Computes the mean of each group, ignoring NaNs. Used by Categorical objects.

Notes

The dtype keyword for FastArray.nanmean specifies the data type of the result. This differs from numpy.nanmean, where it specifies the data type used to compute the mean.

Notes on Using NumPy Parameters

Using either of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.nanmean). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the mean as it is in numpy.nanmean.

Also note that if you use either of the following NumPy parameters and also include a filter keyword argument (which numpy.nanmean does not accept), Riptable’s implementation of nanmean will be used with the filter argument and the NumPy parameters will be ignored.

  • axis ({int, tuple of int, None}, optional) – Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.

If the value is anything but the default, then keepdims will be passed through to the mean or sum methods of sub-classes of ndarray. If the sub-classes’ methods do not implement keepdims, any exceptions will be raised.

Examples

>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> a.nanmean()
3.0

With a dtype specified:

>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> a.nanmean(dtype = rt.int32)
3

With a filter:

>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> a.nanmean(filter = b)
4.0
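
The filter and dtype behavior described for nanmean can be sketched in plain Python. This is an illustration only, not Riptable's implementation: nanmean_sketch is a hypothetical helper, Python's int stands in for a dtype such as rt.int32, and all-NaN input is not handled.

```python
import math

def nanmean_sketch(values, mask=None, result_type=float):
    """Mean of the non-NaN values selected by mask (hypothetical helper)."""
    if mask is None:
        mask = [True] * len(values)
    kept = [v for v, keep in zip(values, mask) if keep and not math.isnan(v)]
    # The mean is computed in floating point first.
    # An all-False mask leaves nothing to average and raises ZeroDivisionError.
    mean = sum(kept) / len(kept)
    # Only the final result is converted, mirroring my_type(x.nanmean()).
    return result_type(mean)

values = [1.0, 3.0, 5.0, math.nan]
print(nanmean_sketch(values))                                  # 3.0
print(nanmean_sketch(values, result_type=int))                 # 3
print(nanmean_sketch(values, mask=[False, True, True, True]))  # 4.0
```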
nanmedian(**kwargs)
nanmin(**kwargs)
nanpercentile(**kwargs)
nanquantile(**kwargs)
nanrankdata(*args, **kwargs)
nanstd(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)

Compute the standard deviation of the values in the first argument, ignoring NaNs.

If all values in the first argument are NaNs, NaN is returned.

Riptable uses the convention that ddof = 1, meaning the standard deviation of [x_1, ..., x_n] is defined by std = sqrt(1/(n - 1) * sum((x_i - mean)**2)) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the standard deviation calculation. If the filter is uniformly False, nanstd raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanstd(dtype = my_type) is equivalent to my_type(x.nanstd()).

Returns:

The standard deviation of the values.

Return type:

scalar

See also

numpy.nanstd

FastArray.std

Computes the standard deviation of FastArray values.

Dataset.nanstd

Computes the standard deviation of numerical Dataset columns, ignoring NaNs.

GroupByOps.nanstd

Computes the standard deviation of each group, ignoring NaNs. Used by Categorical objects.

Notes

The dtype keyword for FastArray.nanstd specifies the data type of the result. This differs from numpy.nanstd, where it specifies the data type used to compute the standard deviation.

Notes on Using NumPy Parameters

Using any of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.nanstd). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the standard deviation as it is in numpy.nanstd.

Also note that if you use any of the following NumPy parameters and also include a filter keyword argument (which numpy.nanstd does not accept), Riptable’s implementation of nanstd will be used with the filter argument and the NumPy parameters will be ignored.

axis ({int, tuple of int, None}, optional)

Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.

keepdims (bool, optional)

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.

If this value is anything but the default it is passed through as-is to the relevant functions of the sub-classes. If these functions do not have a keepdims kwarg, a RuntimeError will be raised.

ddof (int, optional)

“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of non-NaN elements. By default ddof is zero for the NumPy implementation, versus one for the Riptable implementation.

Examples

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> a.nanstd()
1.0

With a dtype specified:

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> a.nanstd(dtype = rt.int32)
1

With filter:

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> a.nanstd(filter = b)
0.7071067811865476
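
To make the ddof convention for nanstd concrete, here is a minimal pure-Python sketch. nanstd_sketch is a hypothetical helper, not part of Riptable; it only shows how the N - ddof divisor changes the result.

```python
import math

def nanstd_sketch(values, ddof=1):
    """Standard deviation of the non-NaN values, dividing by N - ddof."""
    kept = [v for v in values if not math.isnan(v)]
    n = len(kept)  # N counts only the non-NaN values
    mean = sum(kept) / n
    squared_dev = sum((v - mean) ** 2 for v in kept)
    return math.sqrt(squared_dev / (n - ddof))

values = [1.0, 2.0, 3.0, math.nan]
print(nanstd_sketch(values))          # 1.0 (ddof = 1, the Riptable default)
print(nanstd_sketch(values, ddof=0))  # about 0.8165 (ddof = 0, the NumPy default)
```

The two results differ only in the divisor: 2 / (3 - 1) versus 2 / 3 under the square root.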
nansum(filter=None, dtype=None, axis=None, keepdims=None, **kwargs)

Compute the sum of the values in the first argument, ignoring NaNs.

If all values in the first argument are NaNs, 0.0 is returned.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the sum calculation. If the filter is uniformly False, nansum returns 0.0.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nansum(dtype = my_type) is equivalent to my_type(x.nansum()).

Returns:

The sum of the values.

Return type:

scalar

See also

numpy.nansum

Dataset.nansum

Sums the values of numerical Dataset columns, ignoring NaNs.

GroupByOps.nansum

Sums the values of each group, ignoring NaNs. Used by Categorical objects.

Notes

The dtype keyword for FastArray.nansum specifies the data type of the result. This differs from numpy.nansum, where it specifies the data type used to compute the sum.

Notes on Using NumPy Parameters

Using either of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.nansum). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the sum as it is in numpy.nansum.

Also note that if you use either of the following NumPy parameters and also include a filter keyword argument (which numpy.nansum does not accept), Riptable’s implementation of nansum will be used with the filter argument and the NumPy parameters will be ignored.

axis ({int, tuple of int, None}, optional)

Axis or axes along which the sum is computed. The default is to compute the sum of the flattened array.

keepdims (bool, optional)

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.

If the value is anything but the default, then keepdims will be passed through to the mean or sum methods of sub-classes of ndarray. If the sub-classes’ methods do not implement keepdims, an exception will be raised.

Examples

>>> a = rt.FastArray([1, 3, 5, 7, rt.nan])
>>> a.nansum()
16.0

With a dtype specified:

>>> a = rt.FastArray([1.0, 3.0, 5.0, 7.0, rt.nan])
>>> a.nansum(dtype = rt.int32)
16

With a filter:

>>> a = rt.FastArray([1, 3, 5, 7, rt.nan])
>>> b = rt.FastArray([False, True, False, True, True])
>>> a.nansum(filter = b)
10.0
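
The way nansum combines a filter with NaN skipping can be sketched as follows. nansum_sketch is a hypothetical stand-in written in plain Python, not Riptable's implementation.

```python
import math

def nansum_sketch(values, mask=None):
    """Sum of the non-NaN values selected by mask (hypothetical helper)."""
    if mask is None:
        mask = [True] * len(values)
    # An empty selection sums to 0.0, matching the documented result for
    # an all-NaN input or a uniformly False filter.
    return float(sum(v for v, keep in zip(values, mask)
                     if keep and not math.isnan(v)))

values = [1.0, 3.0, 5.0, 7.0, math.nan]
print(nansum_sketch(values))                                     # 16.0
print(nansum_sketch(values, [False, True, False, True, True]))   # 10.0
print(nansum_sketch(values, [False] * 5))                        # 0.0
```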
nanvar(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)

Compute the variance of the values in the first argument, ignoring NaNs.

If all values in the first argument are NaNs, NaN is returned.

Riptable uses the convention that ddof = 1, meaning the variance of [x_1, ..., x_n] is defined by var = 1/(n - 1) * sum((x_i - mean)**2) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the variance calculation. If the filter is uniformly False, nanvar raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanvar(dtype = my_type) is equivalent to my_type(x.nanvar()).

Returns:

The variance of the values.

Return type:

scalar

See also

numpy.nanvar

FastArray.var

Computes the variance of FastArray values.

Dataset.nanvar

Computes the variance of numerical Dataset columns, ignoring NaNs.

GroupByOps.nanvar

Computes the variance of each group, ignoring NaNs. Used by Categorical objects.

Notes

The dtype keyword for FastArray.nanvar specifies the data type of the result. This differs from numpy.nanvar, where it specifies the data type used to compute the variance.

Notes on Using NumPy Parameters

Using any of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.nanvar). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the variance as it is in numpy.nanvar.

Also note that if you use any of the following NumPy parameters and also include a filter keyword argument (which numpy.nanvar does not accept), Riptable’s implementation of nanvar will be used with the filter argument and the NumPy parameters will be ignored.

axis ({int, tuple of int, None}, optional)

Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.

keepdims (bool, optional)

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.

ddof (int, optional)

“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of non-NaN elements. By default ddof is zero for the NumPy implementation, versus one for the Riptable implementation.

Examples

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> a.nanvar()
1.0

With a dtype specified:

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> a.nanvar(dtype = rt.int32)
1

With a filter:

>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> a.nanvar(filter = b)
0.5
ne(other)
normalize_minmax()
normalize_zscore()
notna()

notna maps directly to isnotnan(). Categorical and DateTime objects provide their own isnotnan. For a plain FastArray, sentinel values are also treated as missing.

>>> a = rt.arange(100.0)
>>> a[5] = np.nan
>>> a[87] = np.nan
>>> sum(a.notna())
98
>>> sum(a.astype(np.int32).notna())
98
nunique()

Return the number of unique values in the input FastArray.

Does not include NaN or sentinel values.

Returns:

Number of unique values in the input FastArray, excluding NaN and sentinel values.

Return type:

int

See also

FastArray.duplicated

Return a boolean FastArray indicating duplicate values.

Categorical.nunique

Return the number of unique values in the Categorical.

Examples

Retrieve the number of unique values in a floating-point FastArray:

>>> a = rt.FastArray([1., 2., 3., 1., 2., 3.])
>>> a
FastArray([1., 2., 3., 1., 2., 3.])
>>> a.nunique()
3

Retrieve the number of unique values in a floating-point FastArray with a NaN value:

>>> a2 = rt.FastArray([1., 2., 3., 1., 2., 3., rt.nan])
>>> a2
FastArray([ 1.,  2.,  3.,  1.,  2.,  3., nan])
>>> a2.nunique()  # The NaN value is not included.
3

Retrieve the number of unique values in an unsigned integer FastArray with a sentinel value:

>>> a3 = rt.FastArray([255, 2, 3, 2, 3], dtype="uint8")
>>> a3
FastArray([255,   2,   3,   2,   3], dtype=uint8)
>>> a3.nunique()  # The sentinel value is not included.
2
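
The counting rule for nunique (skip NaN and the sentinel value, then count what is left) can be sketched in plain Python. nunique_sketch is a hypothetical helper, and the sentinel is passed in explicitly here rather than derived from a dtype as Riptable does.

```python
import math

UINT8_SENTINEL = 255  # sentinel value shown in the uint8 example above

def nunique_sketch(values, sentinel=None):
    """Count distinct values, ignoring NaN and the given sentinel."""
    unique = set()
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            continue  # NaN is never counted
        if sentinel is not None and v == sentinel:
            continue  # the sentinel is never counted
        unique.add(v)
    return len(unique)

print(nunique_sketch([1., 2., 3., 1., 2., 3.]))                  # 3
print(nunique_sketch([1., 2., 3., 1., 2., 3., math.nan]))        # 3
print(nunique_sketch([255, 2, 3, 2, 3], sentinel=UINT8_SENTINEL))  # 2
```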
partition2(*args, **kwargs)
percentile(**kwargs)
push(*args, **kwargs)
quantile(**kwargs)
rankdata(*args, **kwargs)
classmethod register_function(name, func)

Register a function with FastArray. Used by rt_fastarraynumba.

repeat(repeats, axis=None)

See riptable.repeat.

replace(old, new)
replacena(value, inplace=False)

Return a FastArray with all NaN and invalid values set to the specified value.

Optionally, you can modify the original FastArray if it’s not locked.

Parameters:
  • value (scalar or array) – A value or an array of values to replace all NaN and invalid values. If an array, the number of values must equal the number of NaN and invalid values.

  • inplace (bool, default False) – If False, return a copy of the FastArray. If True, modify the original. This will modify any other views on this object. This fails if the FastArray is locked.

Returns:

A FastArray of the same size and dtype as the original array, or None if inplace = True.

Return type:

FastArray or None

See also

FastArray.fillna

Replace NaN and invalid values with a specified value or nearby data.

Dataset.fillna

Replace NaN and invalid values with a specified value or nearby data.

Categorical.fill_forward

Replace NaN and invalid values with the last valid group value.

Categorical.fill_backward

Replace NaN and invalid values with the next valid group value.

GroupBy.fill_forward

Replace NaN and invalid values with the last valid group value.

GroupBy.fill_backward

Replace NaN and invalid values with the next valid group value.

Examples

Replace all instances of NaN with a single value:

>>> a = rt.FastArray([rt.nan, 1.0, rt.nan, 3.0])
>>> a.replacena(0)
FastArray([0., 1., 0., 3.])

Replace all invalid values with 0s:

>>> b = rt.FastArray([0, 1, 2, 3, 4, 5])
>>> b[0:3] = b.inv
>>> b.replacena(0)
FastArray([0, 0, 0, 3, 4, 5])

Replace each instance of NaN with a different value:

>>> a.replacena([0, 2])
FastArray([0., 1., 2., 3.])
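
The scalar and array forms of replacena can be sketched in plain Python. replacena_sketch is a hypothetical helper that handles only NaN (not integer invalid values), and the ValueError on a length mismatch is an assumption of this sketch, not documented Riptable behavior.

```python
import math

def replacena_sketch(values, value):
    """Return a copy of values with each NaN replaced (hypothetical helper)."""
    result = list(values)  # work on a copy, like inplace=False
    nan_positions = [i for i, v in enumerate(result) if math.isnan(v)]
    if isinstance(value, (list, tuple)):
        # One replacement per NaN, applied in order of appearance.
        if len(value) != len(nan_positions):
            raise ValueError("need exactly one replacement per NaN")  # assumption
        replacements = list(value)
    else:
        replacements = [value] * len(nan_positions)
    for i, new in zip(nan_positions, replacements):
        result[i] = float(new)
    return result

a = [math.nan, 1.0, math.nan, 3.0]
print(replacena_sketch(a, 0))       # [0.0, 1.0, 0.0, 3.0]
print(replacena_sketch(a, [0, 2]))  # [0.0, 1.0, 2.0, 3.0]
```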
reshape(*args, **kwargs)
rolling_mean(window=3)
rolling_nanmean(window=3)
rolling_nanstd(window=3)
rolling_nansum(window=3)
rolling_nanvar(window=3)
rolling_quantile(q, window=3)
rolling_std(window=3)
rolling_sum(window=3)
rolling_var(window=3)
sample(N=10, filter=None, seed=None)

Return a given number of randomly selected values from a FastArray.

Parameters:
  • N (int, default 10) – Number of values to select. The entire array is returned if N is greater than the size of the array.

  • filter (array (bool or int), optional) – A boolean mask or index array to filter values before selection. A boolean mask must have the same length as the original FastArray.

  • seed (int or other types, optional) – A seed to initialize the random number generator. If one is not provided, the generator is initialized using random data from the OS. For details and other accepted types, see the seed parameter for numpy.random.default_rng.

Returns:

A new FastArray containing the randomly selected values.

Return type:

FastArray

See also

Dataset.sample

Return a specified number of randomly selected rows from a Dataset.

Examples

No sample size specified:

>>> a = rt.FA([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
>>> a.sample()  # 10 randomly selected values returned.
FastArray([ 1,  2,  3,  4,  5,  6,  7,  9, 10, 11])  # Random

Sample 3 values:

>>> a = rt.FA([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
>>> a.sample(3)
FastArray([1, 4, 9])  # Random

Specify a sample size larger than the array:

>>> a2 = rt.FA([1, 2, 3, 4, 5])
>>> a2.sample(100)  # The entire array is returned.
FastArray([1, 2, 3, 4, 5])

Specify an index array for filtering:

>>> a3 = rt.FA(['TSLA','AMZN','IBM', 'SPY', 'GME', 'AAPL', 'FB', 'GOOG',
...             'MSFT', 'UBER'])  # Create sample data.
>>> filter = rt.FA([0, 1, 3, 7])  # Specify indices of a3 to take the sample from.
>>> a3.sample(2, filter)
FastArray([b'TSLA', b'GOOG'], dtype='|S4')  # Random

Specify a boolean mask array for filtering:

>>> a3.sample(8, filter=rt.FA(a3 != 'SPY'))
FastArray([b'TSLA', b'IBM', b'GME', b'AAPL', b'FB', b'GOOG', b'MSFT',
           b'UBER'], dtype='|S4')  # Random
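
The selection logic of sample can be approximated with the standard library. This is a sketch under stated assumptions: sample_sketch is a hypothetical helper, it uses random.Random rather than numpy.random.default_rng, and returning the picks in their original order is inferred from the example output above rather than documented.

```python
import random

def sample_sketch(values, n=10, mask_or_index=None, seed=None):
    """Pick up to n values, optionally restricted by a mask or index list."""
    positions = list(range(len(values)))
    if mask_or_index is not None:
        if all(isinstance(f, bool) for f in mask_or_index):
            # Boolean mask: keep the positions marked True.
            positions = [i for i, keep in enumerate(mask_or_index) if keep]
        else:
            # Index array: sample only from these positions.
            positions = list(mask_or_index)
    if n >= len(positions):
        return [values[i] for i in positions]  # nothing to choose between
    rng = random.Random(seed)
    chosen = sorted(rng.sample(positions, n))  # assumption: keep original order
    return [values[i] for i in chosen]

tickers = ['TSLA', 'AMZN', 'IBM', 'SPY', 'GME', 'AAPL', 'FB', 'GOOG', 'MSFT', 'UBER']
print(sample_sketch(tickers, 2, [0, 1, 3, 7], seed=0))  # two of TSLA, AMZN, SPY, GOOG
print(sample_sketch([1, 2, 3, 4, 5], 100))              # [1, 2, 3, 4, 5]
```

Passing the same seed twice yields the same selection, which is the point of the seed parameter.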
save(filepath, share=None, compress=True, overwrite=True, name=None)

Save a FastArray to an .sds file.

Parameters:
  • filepath (str or os.PathLike) – Path for the .sds file. If there’s a trailing slash, filepath is treated as a path to a directory and you also need to specify name. Alternatively, you can include a file name (with or without the .sds extension) at the end of filepath (with no trailing slash), and an .sds file with that name is created. Directories that don’t yet exist are created.

  • share (str, optional) – If specified, the FastArray is saved to shared memory (NOT to disk) and path information from filepath is discarded. A name value must be provided. When shared memory is used, data is not compressed. Note that shared memory functions are not currently supported on Windows.

  • compress (bool, default True) – When True (the default), compression is used when writing to the .sds file. Otherwise, no compression is used. (If shared memory is used, data is always saved uncompressed.)

  • overwrite (bool, default True) – When True (the default), the user is not prompted to specify whether or not to overwrite an existing .sds file. When set to False, a prompt is displayed.

  • name (str, optional) – Name for the .sds file. The .sds extension is not required. Note that if name is provided, filepath is treated as a path to a directory, even if filepath has no trailing slash.

Return type:

An .sds file containing the FastArray.

See also

save_sds

Save Dataset objects and arrays into a single .sds file.

load_sds

Load an .sds file.

Examples

Include a file name in the path:

>>> import os
>>> a = rt.FA([0, 1, 2, 3, 4])
>>> a
FastArray([0, 1, 2, 3, 4])
>>> a.save("C://junk//saved_file")
>>> os.listdir("C://junk")
['saved_file.sds']

When name is specified, filepath is treated as a path to a directory:

>>> a.save("C://junk//saved_file", name="fa")
>>> os.listdir("C://junk//saved_file")
['fa.sds']

Display a prompt before overwriting an existing file:

>>> a.save("C://junk//saved_file", overwrite=False)
C://junk//saved_file.sds already exists. Overwrite? (y/n) n
No file was saved.
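
The rules for how filepath and name combine can be summarized in a small sketch. resolve_sds_path is a hypothetical helper written for illustration; it is not a Riptable function, it uses forward-slash paths only, and the ValueError for a trailing slash without a name is an assumption.

```python
import posixpath

def resolve_sds_path(filepath, name=None):
    """Hypothetical helper: where the .sds file lands under the rules above."""
    if name is not None:
        # With a name, filepath is always treated as a directory.
        target = posixpath.join(filepath, name)
    elif filepath.endswith("/"):
        # A trailing slash means a directory, so a name is required (assumption:
        # this sketch raises ValueError in that case).
        raise ValueError("name is required when filepath is a directory")
    else:
        target = filepath
    # The .sds extension is optional in both filepath and name.
    if not target.endswith(".sds"):
        target += ".sds"
    return target

print(resolve_sds_path("C:/junk/saved_file"))             # C:/junk/saved_file.sds
print(resolve_sds_path("C:/junk/saved_file", name="fa"))  # C:/junk/saved_file/fa.sds
```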
searchsorted(v, side='left', sorter=None)
set_name(name)

Assign a name to a FastArray.

A FastArray is a wrapper around a NumPy ndarray. When a FastArray is created, it has no name. You can assign it a name using set_name.

Interactions with Dataset Objects

When an unnamed FastArray is added to a Dataset:

  • The FastArray inherits the name of the Dataset column.

  • Calling fa.set_name or ds.col.set_name, or changing the displayed column name via ds.col_rename, changes the name assigned to the FastArray.

    • Note that calling fa.set_name or ds.col.set_name doesn’t change the displayed column name.

When a named FastArray is added to a Dataset:

  • A new FastArray instance is created that inherits the Dataset column name.

  • Calling ds.col.set_name or changing the displayed column name via ds.col_rename changes the new instance’s name.

  • Calling set_name on the original FastArray instance changes only that instance’s name.

In both cases, the NumPy array underlying the FastArray is shared – changes to its values appear in the Dataset column, and vice-versa.

Interactions with FastArray Objects

Parameters:

name (str) – The name to assign to the FastArray.

Returns:

The FastArray is returned. The name can be accessed using FastArray.get_name().

Return type:

FastArray

Examples

>>> a = rt.arange(5)
>>> a.set_name('FA Name')
FastArray([0, 1, 2, 3, 4])

You can get the name using FastArray.get_name():

>>> a.get_name()
'FA Name'

When an unnamed FastArray is added to a Dataset column, the FastArray inherits the name of the column.

>>> a = rt.FastArray([1, 2, 3])
>>> ds = rt.Dataset()
>>> ds.Column_Name = a
>>> a.get_name()
'Column_Name'

Calling ds.col.set_name changes the name assigned to the FastArray (but not the displayed column name).

>>> ds.Column_Name.set_name('New Name')
FastArray([1, 2, 3])
>>> a.get_name()
'New Name'
>>> ds
#   Column_Name
-   -----------
0             1
1             2
2             3

When a named FastArray is added to a Dataset column, a new FastArray instance is created that inherits the column name. The original instance is not renamed.

>>> a = rt.FastArray([1, 2, 3])
>>> a.set_name('FA Name')
FastArray([1, 2, 3])
>>> ds = rt.Dataset()
>>> ds.Column_Name = a
>>> ds.Column_Name.get_name()
'Column_Name'
>>> a.get_name()
'FA Name'

Changing the displayed column name affects the name of the new instance, but not the name of the original FastArray.

>>> ds.col_rename('Column_Name', 'New_Column')
>>> ds.New_Column.get_name()
'New_Column'
>>> a.get_name()
'FA Name'
shift(periods=1, invalid=None)

Shift an array’s elements right or left.

Newly empty elements at either end (resulting from the shift) are filled with the invalid value for the input array’s data type.

Parameters:

periods (int, default 1) – Number of element positions to shift right (if positive) or left (if negative).

Returns:

A shifted FastArray. Newly empty elements are filled with the invalid values for the input array’s data type.

Return type:

FastArray

See also

FastArray.diff

Return a FastArray containing the differences between adjacent input array values.

Categorical.shift

Shift values in the Categorical by a specified number of periods.

Examples

Shift array elements one position to the right:

>>> a = rt.FA([0, 2, 4, 8, 16, 32])
>>> a
FastArray([ 0,  2,  4,  8, 16, 32])
>>> a.shift()
FastArray([-2147483648,           0,           2,           4,
                     8,          16])

Shift array elements two positions to the left:

>>> a.shift(-2)
FastArray([          4,           8,          16,          32,
           -2147483648, -2147483648])

Specify a shift value greater than the array length:

>>> a.shift(10)
FastArray([-2147483648, -2147483648, -2147483648, -2147483648,
           -2147483648, -2147483648])
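
The shifting behavior shown in these examples can be reproduced with a short pure-Python sketch. shift_sketch is a hypothetical helper; its fill argument is just the fill value for this illustration (the int32 invalid value from the examples) and is not meant to describe the real invalid parameter.

```python
INT32_INVALID = -2147483648  # invalid value shown in the int32 examples above

def shift_sketch(values, periods=1, fill=INT32_INVALID):
    """Shift right (positive periods) or left (negative), padding with fill."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        return [fill] * n  # everything is shifted out
    if periods > 0:
        return [fill] * periods + list(values[:n - periods])
    return list(values[-periods:]) + [fill] * (-periods)

a = [0, 2, 4, 8, 16, 32]
print(shift_sketch(a))      # [-2147483648, 0, 2, 4, 8, 16]
print(shift_sketch(a, -2))  # [4, 8, 16, 32, -2147483648, -2147483648]
print(shift_sketch(a, 10))  # six copies of -2147483648
```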
sign(**kwargs)
squeeze(*args, **kwargs)
statx()
std(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)

Compute the standard deviation of the values in the first argument.

Riptable uses the convention that ddof = 1, meaning the standard deviation of [x_1, ..., x_n] is defined by std = sqrt(1/(n - 1) * sum((x_i - mean)**2)) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the standard deviation calculation. If the filter is uniformly False, std raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.std(dtype = my_type) is equivalent to my_type(x.std()).

Returns:

The standard deviation of the values.

Return type:

scalar

See also

numpy.std

FastArray.nanstd

Computes the standard deviation of FastArray values, ignoring NaNs.

Dataset.std

Computes the standard deviation of numerical Dataset columns.

GroupByOps.std

Computes the standard deviation of each group. Used by Categorical objects.

Notes

The dtype keyword for FastArray.std specifies the data type of the result. This differs from numpy.std, where it specifies the data type used to compute the standard deviation.

Notes on Using NumPy Parameters

Using any of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.std). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the standard deviation as it is in numpy.std.

Also note that if you use any of the following NumPy parameters and also include a filter keyword argument (which numpy.std does not accept), Riptable’s implementation of std will be used with the filter argument and the NumPy parameters will be ignored.

axis (None or int or tuple of ints, optional)

Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.

If this is a tuple of ints, the standard deviation is computed over multiple axes, instead of a single axis or all the axes (supported in NumPy 1.7.0 and later).

keepdims (bool, optional)

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

If the default value is passed, keepdims is not passed through to the std method of sub-classes of ndarray; any non-default value is. If the sub-class’s method does not implement keepdims, an exception will be raised.

ddof (int, optional)

“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero for the NumPy implementation, versus one for the Riptable implementation.

Examples

>>> a = rt.FastArray([1, 2, 3])
>>> a.std()
1.0

With a dtype specified:

>>> a = rt.FastArray([1, 2, 3])
>>> a.std(dtype = rt.int32)
1

With a filter:

>>> a = rt.FastArray([1, 2, 3])
>>> b = rt.FA([False, True, True])
>>> a.std(filter = b)
0.7071067811865476
str()

Casts an array of byte strings or unicode as FAString.

Enables a variety of useful string manipulation methods.

Return type:

FAString

Raises:

TypeError – If the FastArray is of dtype other than byte string or unicode

See also

np.chararray, np.char, rt.FAString.apply

Examples

>>> s = rt.FA(['this','that','test ']*100_000)
>>> s.str.upper
FastArray([b'THIS', b'THAT', b'TEST ', ..., b'THIS', b'THAT', b'TEST '],
          dtype='|S5')
>>> s.str.lower
FastArray([b'this', b'that', b'test ', ..., b'this', b'that', b'test '],
          dtype='|S5')
>>> s.str.removetrailing()
FastArray([b'this', b'that', b'test', ..., b'this', b'that', b'test'],
          dtype='|S5')
str_append(other)
tile(reps)

See riptable.tile.

timewindow_prod(time_array, time_dist)

Multiplies the values that fall within a given time window. The input array must be int64 and sorted in increasing order.

Parameters:
  • time_array (array of int) – Sorted array of timestamps.

  • time_dist (int) – Size of the time window.

Examples

>>> a=rt.arange(10, dtype=rt.int64)
>>> a.timewindow_prod(a,5)
FastArray([    0,     0,     0,     0,     0,     0,   720,  5040, 20160, 60480], dtype=int64)
timewindow_sum(time_array, time_dist)

Sums the values that fall within a given time window. The input array must be int64 and sorted in increasing order.

Parameters:
  • time_array (array of int) – Sorted array of timestamps.

  • time_dist (int) – Size of the time window.

Examples

>>> a=rt.arange(10, dtype=rt.int64)
>>> a.timewindow_sum(a,5)
FastArray([ 0,  1,  3,  6, 10, 15, 21, 27, 33, 39], dtype=int64)
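
The outputs shown for timewindow_sum and timewindow_prod are consistent with a trailing window that includes every earlier element whose timestamp is at most time_dist behind the current one. The sketch below reproduces both outputs under that assumption; timewindow_reduce_sketch is a hypothetical helper, and the inclusive boundary is inferred from the examples, not stated in the documentation.

```python
import math

def timewindow_reduce_sketch(values, time_array, time_dist, reduce):
    """Apply reduce to the values whose timestamps lie within the trailing window."""
    result = []
    start = 0
    for i, t in enumerate(time_array):
        # Drop elements that are more than time_dist behind the current timestamp.
        while t - time_array[start] > time_dist:
            start += 1
        result.append(reduce(values[start:i + 1]))
    return result

a = list(range(10))
print(timewindow_reduce_sketch(a, a, 5, sum))
# [0, 1, 3, 6, 10, 15, 21, 27, 33, 39]
print(timewindow_reduce_sketch(a, a, 5, math.prod))
# [0, 0, 0, 0, 0, 0, 720, 5040, 20160, 60480]
```

Because time_array is sorted, the start index only ever moves forward, so finding each window takes a single pass over the timestamps.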
to_arrow(type=None, *, preserve_fixed_bytes=False, empty_strings_to_null=True)

Convert this FastArray to a pyarrow.Array.

Parameters:
  • type (pyarrow.DataType, optional, defaults to None) –

  • preserve_fixed_bytes (bool, optional, defaults to False) – If this FastArray is an ASCII string array (dtype.kind == ‘S’), set this parameter to True to produce a fixed-length binary array instead of a variable-length string array.

  • empty_strings_to_null (bool, optional, defaults to True) – If this FastArray is an ASCII or Unicode string array, specify True for this parameter to convert empty strings to nulls in the output. Riptable inconsistently recognizes the empty string as an ‘invalid’, so this parameter allows the caller to specify which interpretation they want.

Return type:

pyarrow.Array or pyarrow.ChunkedArray

Notes

TODO: Add bool parameter which directs the conversion to choose the most-compact output type possible?

This would be relevant to indices of categorical/dictionary-encoded arrays, but could also make sense for regular FastArray types (e.g. to use an int8 instead of an int32 when it’d be a lossless conversion).

transitions(periods=1, fancy=False)

Return a boolean array that is True wherever an element differs from the element that is periods positions earlier. With periods=1 (the default), an element is True when it does not equal the previous element. Use periods=-1 to mark elements that do not equal the next element instead.

See also: differs

Parameters:
  • periods (int, default 1) – The number of elements to look behind (if positive) or ahead (if negative).

  • fancy (bool, default False) – If True, return a fancy index instead of a boolean array.

Returns:

A boolean FastArray, or a fancy index if fancy is True.

Examples

>>> a = rt.FastArray([0,1,2,3,3,3,4])
>>> a.transitions(periods=1)
FastArray([False, True, True, True, False, False, True])
>>> a.transitions(periods=2)
FastArray([False, False, True, True, True, False, True])
>>> a.transitions(periods=-1)
FastArray([ True, True, True, False, False, True, False])
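
The comparison that transitions performs can be written out in plain Python. transitions_sketch is a hypothetical helper covering only the boolean result (not the fancy option); positions with no partner to compare against stay False, as in the outputs above.

```python
def transitions_sketch(values, periods=1):
    """True where values[i] differs from values[i - periods]."""
    n = len(values)
    result = [False] * n
    for i in range(n):
        j = i - periods  # position of the element to compare against
        if 0 <= j < n:
            result[i] = values[i] != values[j]
    return result

a = [0, 1, 2, 3, 3, 3, 4]
print(transitions_sketch(a, periods=1))   # [False, True, True, True, False, False, True]
print(transitions_sketch(a, periods=2))   # [False, False, True, True, True, False, True]
print(transitions_sketch(a, periods=-1))  # [True, True, True, False, False, True, False]
```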
trunc(**kwargs)
unique(return_index=False, return_inverse=False, return_counts=False, sorted=True, lex=False, dtype=None, filter=None, **kwargs)

Find the unique elements of an array or the unique combinations of elements with corresponding indices in multiple arrays.

See riptable.unique() for full documentation.

var(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)

Compute the variance of the values in the first argument.

Riptable uses the convention that ddof = 1, meaning the variance of [x_1, ..., x_n] is defined by var = 1/(n - 1) * sum((x_i - mean)**2) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.

Parameters:
  • filter (array of bool, default None) – Specifies which elements to include in the variance calculation. If the filter is uniformly False, var raises a ZeroDivisionError.

  • dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.var(dtype = my_type) is equivalent to my_type(x.var()).

Returns:

The variance of the values.

Return type:

scalar

See also

numpy.var

FastArray.nanvar

Computes the variance of FastArray values, ignoring NaNs.

Dataset.var

Computes the variance of numerical Dataset columns.

GroupByOps.var

Computes the variance of each group. Used by Categorical objects.

Notes

The dtype keyword for FastArray.var specifies the data type of the result. This differs from numpy.var, where it specifies the data type used to compute the variance.

Notes on Using NumPy Parameters

Using any of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.var). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the variance as it is in numpy.var.

Also note that if you use any of the following NumPy parameters and also include a filter keyword argument (which numpy.var does not accept), Riptable’s implementation of var will be used with the filter argument and the NumPy parameters will be ignored.

axis (None or int or tuple of ints, optional)

Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.

keepdims (bool, optional)

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

If the default value is passed, keepdims is not passed through to the var method of sub-classes of ndarray; any non-default value is. If the sub-class’s method does not implement keepdims, an exception will be raised.

ddof (int, optional)

“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero for the NumPy implementation, versus one for the Riptable implementation.

Examples

>>> a = rt.FastArray([1, 2, 3])
>>> a.var()
1.0

With a dtype specified:

>>> a = rt.FastArray([1, 2, 3])
>>> a.var(dtype = rt.int32)
1

With a filter:

>>> a = rt.FastArray([1, 2, 3])
>>> b = rt.FastArray([False, True, True])
>>> a.var(filter = b)
0.5
where(condition, y=np.nan)

Return a new FastArray in which values are replaced where a given condition is False.

To also provide a value for where the condition is True, use riptable.where().

Parameters:
  • condition (bool or array of bool) – Where the condition is True, keep the original value. Where False, replace with y (if y is a scalar) or the corresponding value from y (if y is an array). If condition is an array or a comparison that returns an array, the array must be the same length as the calling FastArray.

  • y (scalar, array, or callable, default np.nan) – The value to use where condition is False. If y is an array or a callable that returns an array, it must be the same length as the calling FastArray. The value of y that corresponds to the False value is used.

Returns:

A new FastArray with values replaced where condition is False.

Return type:

FastArray

See also

riptable.where

Replace values depending on whether a given condition is True or False.

Examples

condition is a comparison that creates an array of booleans, and y is a scalar:

>>> a = rt.FastArray(rt.arange(5))
>>> a
FastArray([0, 1, 2, 3, 4])
>>> a.where(a > 2, 100)
FastArray([100, 100, 100,   3,   4])

condition and y are same-length arrays:

>>> condition = rt.FastArray([True, True, False, False, False])
>>> y = rt.FastArray([100, 200, 300, 400, 500])
>>> a.where(condition, y)
FastArray([  0,   1, 300, 400, 500])
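
The element-wise rule behind where (keep the original value where the condition is True, otherwise take the value from y) can be sketched in plain Python. where_sketch is a hypothetical helper that covers only scalar and same-length list values of y; the callable form is left out.

```python
import math

def where_sketch(values, condition, y=math.nan):
    """Keep values[i] where condition[i] is True, else use y (scalar or list)."""
    n = len(values)
    if isinstance(condition, bool):
        condition = [condition] * n  # a single bool applies to every element
    if not isinstance(y, (list, tuple)):
        y = [y] * n                  # a scalar y is used at every False position
    return [v if keep else other for v, keep, other in zip(values, condition, y)]

a = [0, 1, 2, 3, 4]
print(where_sketch(a, [v > 2 for v in a], 100))
# [100, 100, 100, 3, 4]
print(where_sketch(a, [True, True, False, False, False], [100, 200, 300, 400, 500]))
# [0, 1, 300, 400, 500]
```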
class riptable.rt_fastarray.Ledger
static clear()

Clear all the entries in the math ledger

static dump(dataset=True)

Print out the math ledger

static off()

Turn the math ledger off

static on()

Turn the math ledger on to record all array math routines

static to_file(filename)

Save the math ledger to a file

class riptable.rt_fastarray.Recycle
static now(timeout=0)

Pass the garbage collector timeout value to cleanup. Also calls the Python garbage collector.

Parameters:

timeout (default 0) – The timeout value. A value of 0 does not set a timeout.

Return type:

total arrays deleted

static off()
static on()

Turn riptable recycling on. Used only when riptable recycling was turned off.

Example

a = rt.arange(1_000_00)
Recycle.off()
%timeit a = a + 1
Recycle.on()
%timeit a = a + 1

static timeout(timeout=100)

Pass the garbage collector timeout value to expire. The timeout is measured in units of roughly 2/5 of a second, so a value of 100 is usually about 40 seconds. If an array has not been reused by the timeout, it is permanently deleted.

Return type:

previous timespan

class riptable.rt_fastarray.Threading
static off()

Turn riptable threading off. Useful for when the system has other processes using other threads or to limit threading resources.

Example

a = rt.arange(1_000_00)
Threading.off()
%time a += 1
Threading.on()
%time a += 1

Return type:

Whether threading was previously on: 1 if it was on, 0 if it was off.

static on()

Turn riptable threading on. Used only when riptable threading was turned off.

Example

a = rt.arange(1_000_00)
Threading.off()
%time a += 1
Threading.on()
%time a += 1

Return type:

Whether threading was previously on: 1 if it was on, 0 if it was off.

static threads(threadcount)

Set how many worker threads Riptable can use. The count often defaults to 12 and cannot be set below 1 or above 31.

To turn Riptable threading off completely, use Threading.off(). Limiting threads is useful when the system has other processes using other threads, or to limit threading resources.

Example

Threading.threads(8)

Return type:

number of threads previously used