riptable.rt_fastarray
Classes
- class riptable.rt_fastarray.FastArray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)
Bases: numpy.ndarray
A FastArray is a 1-dimensional array of items that are the same data type. Because it’s a subclass of NumPy’s numpy.ndarray, all ndarray functions and attributes can be used with FastArray objects. However, Riptable optimizes many of NumPy’s functions to make them faster and more memory-efficient. Riptable has also added some methods.
FastArray objects with more than 1 dimension are not supported.
See NumPy’s docs for details on all ndarray methods and attributes.
- Parameters:
arr (array, iterable, or scalar value) – Contains data to be stored in the FastArray.
**kwargs – Additional keyword arguments to be passed to the function.
Notes
To improve performance, FastArray objects take over some of NumPy’s universal functions (ufuncs), use array recycling and multiple threads, and pass certain method calls to Bottleneck.
Note that whenever Riptable has implemented its own version of an existing NumPy method, a call to the NumPy method results in a call to the optimized Riptable version instead. We encourage users to call the Riptable method directly to avoid any confusion about which method is actually being called.
See the list of NumPy Methods Optimized by Riptable for FastArrays.
Examples
Construct a FastArray
Pass a list to the constructor:
>>> rt.FastArray([1, 2, 3, 4, 5])
FastArray([1, 2, 3, 4, 5])

>>> # NOTE: rt.FA also works.
>>> rt.FA([1.0, 2.0, 3.0, 4.0, 5.0])
FastArray([1., 2., 3., 4., 5.])
Or use a utility function:
>>> rt.full(10, 0.7)
FastArray([0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7])

>>> rt.arange(10)
FastArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
You can optionally specify a data type:
>>> x = rt.FastArray([3, 6, 10], dtype=rt.float64)
>>> x, x.dtype
(FastArray([ 3., 6., 10.]), dtype('float64'))

>>> # Using a string shortcut:
>>> x = rt.FastArray([3, 6, 10], dtype='float64')
>>> x, x.dtype
(FastArray([ 3., 6., 10.]), dtype('float64'))
By default, characters are stored as byte strings. When unicode=True, the FastArray allows Unicode characters.
>>> rt.FA(list('abc'), unicode=True)
FastArray(['a', 'b', 'c'], dtype='<U1')
To convert an existing NumPy array, use the FastArray constructor.
>>> np_arr = np.array([1, 2, 3])
>>> rt.FA(np_arr)
FastArray([1, 2, 3])
To view the NumPy array as a FastArray (which is slightly less expensive than using the constructor), use the view method.
>>> fa = np_arr.view(rt.FA)
>>> fa
FastArray([1, 2, 3])
To view it as a NumPy array again:
>>> fa.view(np.ndarray)
array([1, 2, 3])

>>> # Alternatively:
>>> fa._np
array([1, 2, 3])
Get a Subset of a FastArray
You can use standard Python slicing notation or fancy indexing to access a subset of a FastArray.
>>> # Create a FastArray:
>>> array = rt.arange(8)**2
>>> array
FastArray([0, 1, 4, 9, 16, 25, 36, 49])
>>> # Use Python slicing to get elements 2, 3, and 4:
>>> array[2:5]
FastArray([4, 9, 16])

>>> # Use fancy indexing to get elements 2, 4, and 1 (in that order):
>>> array[[2, 4, 1]]
FastArray([4, 16, 1])
For more details, see the examples for 1-dimensional arrays in NumPy’s docs: Indexing on ndarrays.
Note that slicing creates a view of the array and does not copy the underlying data; modifying the slice modifies the original array. Fancy indexing creates a copy of the extracted data; modifying this array does not modify the original array.
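Because FastArray subclasses numpy.ndarray, the view-versus-copy behavior described above follows NumPy’s indexing rules. The sketch below demonstrates it with a plain NumPy array standing in for a FastArray (an assumption made so the example runs without Riptable); np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

# A plain ndarray stands in for a FastArray; the indexing rules are NumPy's.
array = np.arange(8) ** 2          # [0, 1, 4, 9, 16, 25, 36, 49]

# Slicing returns a view that shares memory with the original.
sliced = array[2:5]
sliced[0] = -1                     # this also changes array[2]

# Fancy indexing returns a copy, so changes do not propagate back.
fancy = array[[2, 4, 1]]
fancy[0] = 100                     # array[2] is unaffected

# A Boolean mask is also fancy indexing, so it copies too.
masked = array[array % 2 == 0]

print(np.shares_memory(array, sliced))  # True
print(np.shares_memory(array, fancy))   # False
print(array.tolist())                   # [0, 1, -1, 9, 16, 25, 36, 49]
print(masked.tolist())                  # [0, 16, 36]
```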
You can also pass a Boolean mask array.
>>> # Create a Boolean mask:
>>> evenMask = (array % 2 == 0)
>>> evenMask
FastArray([True, False, True, False, True, False, True, False])
>>> # Index using the Boolean mask:
>>> array[evenMask]
FastArray([0, 4, 16, 36])
How to Subclass FastArray
Include the required class definition:
>>> class TestSubclass(rt.FastArray):
...     def __new__(cls, arr, **args):
...         # Before this call, arr needs to be a np.ndarray instance.
...         return arr.view(cls)
...     def __init__(self, arr, **args):
...         pass
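The __new__ above follows NumPy’s general pattern for view-based subclasses. The sketch below shows that pattern on a plain numpy.ndarray (not FastArray, so it runs without Riptable) and adds NumPy’s __array_finalize__ hook so a custom attribute survives slicing and copying. The class name TaggedArray and the tag attribute are hypothetical; for Riptable subclasses, the hooks this section names later (newclassfrominstance() and copy()) are the ones to take over.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass that carries a 'tag' attribute."""

    def __new__(cls, arr, tag=None):
        # View the input data as this class instead of copying it.
        obj = np.asarray(arr).view(cls)
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        # NumPy calls this for views, slices, and copies; inherit the tag.
        if obj is None:
            return
        self.tag = getattr(obj, "tag", None)


t = TaggedArray([1, 2, 3, 4], tag="prices")
print(type(t[1:3]).__name__, t[1:3].tag)  # TaggedArray prices
print(t.copy().tag)                        # prices
```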
If the subclass is computable, you might define your own math operations. In these operations, you might define what the subclass can be computed with. For examples of new definitions, see the DateTimeNano class.
Common operations to hook are comparisons (__eq__(), __ne__(), __gt__(), __lt__(), __le__(), __ge__()) and basic math functions (__add__(), __sub__(), __mul__(), etc.).
Bracket indexing operations are very common. If the subclass needs to set or return a value other than that in the underlying array, you need to take over __getitem__() or __setitem__().
Indexing is also used in display. For regular console/notebook display, you need to take over:
__repr__()
__str__()
_repr_html_() (for JupyterLab and Jupyter notebooks)
If the array is being displayed in a Dataset and you require certain formatting, you need to define two more methods:
display_query_properties()
Returns an ItemFormat object (see rt.Utils.rt_display_properties).
display_convert_func()
The conversion function returned by display_query_properties() must return a string. Each item being displayed, the result of __getitem__() at a single index, will go through this function individually, accompanied by an ItemFormat object.
Many Riptable operations need to return arrays of the same class they received. To ensure that your subclass will retain its special properties, you need to take over newclassfrominstance(). Failure to take this over will often result in an object with uninitialized variables. copy() is another method that is called generically in Riptable routines, and it needs to be taken over to retain subclass properties.
For a view of the underlying FastArray, you can use the _fa property.
- class _ArrayFunctionHelper
The array function helper is responsible for maintaining the array function protocol implementations through the following API:
get_array_function: given the Numpy function, returns overridden array function
get_array_function_type_compatibility_check: given the Numpy function, returns overridden array function type compatibility check
register_array_function: a function decorator whose argument is the Numpy function to override and the function that will override it
register_array_function_type_compatibility: similar to register_array_function, but guards against incompatible array function protocol type arguments for the given Numpy function
deregister: deregistration of the Numpy function and type compatibility override
deregister_array_function_type_compatibility: deregistration of Numpy function type compatibility override
- HANDLED_FUNCTIONS: Dict[callable, callable]
Dictionary mapping NumPy API functions to their overriding functions.
- HANDLED_TYPE_COMPATIBILITY_CHECK: Dict[callable, callable]
Dictionary of type compatibility functions for each overridden NumPy API function.
- classmethod deregister(np_function)
- classmethod deregister_array_function(np_function)
Deregistration of the Numpy function and type compatibility override.
- Parameters:
np_function (callable) – The overridden Numpy array function.
- classmethod deregister_array_function_type_compatibility(np_function)
Deregistration of the Numpy function type compatibility override.
- Parameters:
np_function (callable) – The overridden Numpy array function.
- classmethod get_array_function(np_function)
Given the Numpy function, returns overridden array function if implemented, otherwise None.
- Parameters:
np_function (callable) – The overridden Numpy array function.
- Returns:
The overridden function as a callable or None if it’s not implemented.
- Return type:
callable, optional
- classmethod get_array_function_type_compatibility_check(np_function)
Given the Numpy function, returns the corresponding array function type compatibility callable, otherwise None.
- Parameters:
np_function (callable) – The overridden Numpy array function.
- Returns:
The overridden type compatibility function as a callable or None if it’s not implemented.
- Return type:
callable, optional
- classmethod register_array_function(np_function)
A function decorator factory: its argument is the NumPy function to override, and the function it decorates becomes the override. This registers np_function with the function that it decorates.
- Parameters:
np_function (callable) – The overridden Numpy array function.
- Returns:
The decorator that registers np_function with the decorated function.
- Return type:
callable
- classmethod register_array_function_type_compatibility(np_function)
This registers the type compatibility check for the np_function with the function that it decorates.
- Parameters:
np_function (callable) – The overridden Numpy array function.
- Returns:
The decorator that registers the type compatibility check for the np_function with the decorated function.
- Return type:
callable
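The register/get/deregister API above can be pictured as a dictionary keyed by NumPy functions that an __array_function__ implementation consults. The sketch below is a hypothetical, simplified stand-in for _ArrayFunctionHelper (SimpleArrayFunctionHelper and MyArray are illustrative names, not Riptable’s), wired into NumPy’s NEP 18 __array_function__ protocol on a plain ndarray subclass. It omits the type compatibility checks.

```python
import numpy as np


class SimpleArrayFunctionHelper:
    """Hypothetical, simplified registry in the spirit of _ArrayFunctionHelper."""

    HANDLED_FUNCTIONS = {}  # NumPy API function -> overriding function

    @classmethod
    def register_array_function(cls, np_function):
        # Decorator factory: the decorated function becomes the override.
        def decorator(func):
            cls.HANDLED_FUNCTIONS[np_function] = func
            return func
        return decorator

    @classmethod
    def get_array_function(cls, np_function):
        # Return the override, or None if nothing is registered.
        return cls.HANDLED_FUNCTIONS.get(np_function)

    @classmethod
    def deregister(cls, np_function):
        cls.HANDLED_FUNCTIONS.pop(np_function, None)


class MyArray(np.ndarray):
    def __array_function__(self, func, types, args, kwargs):
        override = SimpleArrayFunctionHelper.get_array_function(func)
        if override is None:
            # Returning NotImplemented makes NumPy raise TypeError (NEP 18).
            return NotImplemented
        return override(*args, **kwargs)


@SimpleArrayFunctionHelper.register_array_function(np.sum)
def _sum(a, axis=None, **kwargs):
    # Compute on a plain ndarray view so the call does not dispatch again.
    return np.sum(a.view(np.ndarray), axis=axis, **kwargs)


x = np.arange(5).view(MyArray)
print(np.sum(x))  # 10
try:
    np.mean(x)    # no override registered for np.mean
except TypeError:
    print("np.mean is not overridden")
```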
- property _np: numpy.ndarray
Return a NumPy array view of the input FastArray.
- Returns:
A NumPy array view of the input FastArray.
- Return type:
numpy.ndarray
See also
numpy.ndarray.view
Can be used to view a NumPy array as a FastArray.
Examples
Return a NumPy array view for an integer FastArray:
>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a._np
array([1, 2, 3, 4, 5])
Changes to the view are reflected in the original FastArray:
>>> npview = a._np
>>> npview[2] = 10
>>> a
FastArray([ 1, 2, 10, 4, 5])
To view a NumPy array as a FastArray, you can use numpy.ndarray.view:
>>> npview.view(rt.FastArray)
FastArray([1, 2, 10, 4, 5])
- property crc: int
Calculate the 32-bit CRC of the data in this array using the Castagnoli polynomial (CRC32C).
This function does not consider the array’s shape or strides when calculating the CRC; it simply calculates the CRC value over the entire buffer described by the array.
Examples
The CRC can be used to compare two arrays for structural equality:
>>> a = rt.arange(100)
>>> b = rt.arange(100.0)
>>> a.crc == b.crc
False
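To make “CRC over the entire buffer” concrete, the sketch below is a plain-Python, bit-by-bit CRC32C (Castagnoli, reflected polynomial 0x82F63B78, with the conventional initial value and final XOR of 0xFFFFFFFF). Whether Riptable’s crc uses exactly these conventions is an assumption; the point is that the checksum depends only on raw bytes, so an integer array and a float array with the same logical values hash differently. Stdlib array objects stand in for FastArrays.

```python
from array import array


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli); slow but easy to follow."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Reflected algorithm: shift right; XOR the polynomial if the low bit was set.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


ints = array("q", range(100))                # 64-bit integers 0..99
floats = array("d", map(float, range(100)))  # 64-bit floats 0.0..99.0

# Same logical values, different bytes, so the checksums differ.
print(crc32c(ints.tobytes()) == crc32c(floats.tobytes()))
print(hex(crc32c(b"123456789")))  # 0xe3069283 (standard CRC-32C check value)
```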
- property doc
Return the Doc object for the input FastArray.
If no Doc object exists, return None.
See also
FastArray.info
Return a description of the input array’s contents.
apply_schema
Set Doc object values.
Examples
No Doc object exists:
>>> a = rt.FA([1, 2, 3, 4, 5])
>>> print(a.doc)
None
Apply a schema and return the Doc object:
>>> schema = {"Description": "This is an array", "Steward": "Brian", "Type": "int32"}
>>> a.apply_schema(schema)
{}
>>> a.doc
Description: This is an array
Steward: Brian
Type: int32
Return specific Doc object information:
>>> a.doc._type
'int32'
>>> a.doc._descrip
'This is an array'
>>> a.doc._steward
'Brian'
>>> print(a.doc._detail)
None
- property inv: Any
Return the invalid value for the input array’s data type.
- Returns:
The invalid value for the input array’s dtype. For example, int8 returns -128, uint8 returns 255, and bool_ returns False.
- Return type:
Any
See also
FastArray.copy_invalid
Return a copy of a FastArray filled with the invalid value for the array’s dtype.
FastArray.fill_invalid
Replace the values of a FastArray with the invalid value for the array’s dtype.
INVALID_DICT
A mapping of dtypes to invalid values.
Examples
Return the invalid value for an integer array:
>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a.inv
-2147483648
Return the invalid value for a floating-point array:
>>> a2 = rt.FA([0., 1., 2., 3., 4.])
>>> a2
FastArray([0., 1., 2., 3., 4.])
>>> a2.inv
nan
Return the invalid value for a string array:
>>> a3 = rt.FA(["AMZN", "IBM", "MSFT", "AAPL"])
>>> a3
FastArray([b'AMZN', b'IBM', b'MSFT', b'AAPL'], dtype='|S4')
>>> a3.inv
b''
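The invalid values listed above follow a simple per-dtype pattern: signed integers use their minimum, unsigned integers their maximum, floats use NaN, booleans use False, and byte strings use an empty string. The hypothetical helper below reproduces that pattern with NumPy so it can be checked without Riptable; in real code, use FastArray.inv or INVALID_DICT rather than this sketch.

```python
import numpy as np


def invalid_for_dtype(dtype):
    """Hypothetical helper mirroring the invalid values described for FastArray.inv."""
    dtype = np.dtype(dtype)
    if dtype.kind == "b":
        return False
    if dtype.kind == "i":
        return np.iinfo(dtype).min   # signed ints: most negative value
    if dtype.kind == "u":
        return np.iinfo(dtype).max   # unsigned ints: largest value
    if dtype.kind == "f":
        return np.nan
    if dtype.kind == "S":
        return b""
    raise TypeError(f"no invalid value defined in this sketch for {dtype}")


for dt in ("int8", "uint8", "int32", "float64", "bool", "S4"):
    print(dt, invalid_for_dtype(dt))
```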
- property numbastring
Convert byte strings and Unicode strings to a 2-dimensional array so that Numba can process them correctly.
Examples
>>> @numba.jit(nopython=True)
... def numba_str(txt):
...     x = 0
...     for i in range(txt.shape[0]):
...         if (txt[i, 0] == 116 and  # 't'
...                 txt[i, 1] == 101 and  # 'e'
...                 txt[i, 2] == 120 and  # 'x'
...                 txt[i, 3] == 116):  # 't'
...             x += 1
...     return x
>>>
>>> x = FastArray(['some', 'text', 'this', 'is'])
>>> numba_str(x.view(np.uint8).reshape((len(x), x.itemsize)))
>>> numba_str(x.numbastring)
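The view-and-reshape in the example above is what makes string data usable in nopython Numba code: each fixed-width byte string becomes one row of uint8 codes. The NumPy-only sketch below performs the same reshape on a plain byte-string array (a stand-in for the FastArray, assuming the default 'S' dtype) and counts the rows equal to b"text" without Numba.

```python
import numpy as np

x = np.array([b"some", b"text", b"this", b"is"], dtype="S4")

# One row per string, one uint8 column per byte; short strings are zero-padded.
codes = x.view(np.uint8).reshape((len(x), x.itemsize))

# Compare each row with the bytes of b"text" and count full-row matches.
target = np.frombuffer(b"text", dtype=np.uint8)
matches = int((codes == target).all(axis=1).sum())

print(codes.shape)  # (4, 4)
print(matches)      # 1
```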
- CompressPickle = True
- FasterUFunc = True
- MAX_DISPLAY_LEN = 10
- NEW_ARRAY_FUNCTION_ENABLED = False
Enable implementation of array function protocol (default False).
- NoTolerance = False
- Recycle = True
- SafeConversions = True
- Verbose = 1
- WarningDict
- WarningLevel = 1
- _reduce_op_identity_value: Mapping[riptable.rt_enum.REDUCE_FUNCTIONS, Any]
- add
- div
- floordiv
- mod
- mul
- pow
- sub
- static _FastFunctionsOff()
- static _FastFunctionsOn()
- static _GCNOW(timeout=0)
Pass the garbage collector timeout value to clean up. Passing 0 forces an immediate garbage collection.
- Return type:
Dictionary of memory heuristics including ‘TotalDeleted’
- static _GCSET(timeout=100)
Pass the garbage collector timeout value to expire. The timeout value is roughly in units of 2/5 of a second, so a value of 100 is usually about 40 seconds.
- Return type:
Previous timespan
- static _LCLEAR()
Clear all the entries in the math ledger
- static _LDUMP(dataset=True)
Print out the math ledger
- static _LDUMPF(filename)
Save the math ledger to a file
- static _LOFF()
Turn the math ledger off
- static _LON()
Turn the math ledger on to record all array math routines
- static _OFF()
Disable interception of array ufuncs.
- static _ON()
Enable interception of array ufuncs.
- static _RDUMP()
Display to the server’s stdout.
- Return type:
Total size of items not in use
- static _ROFF(quiet=False)
Turn off recycling.
- Parameters:
quiet (bool, optional) –
- Return type:
True if recycling was previously on, else False
- static _RON(quiet=False)
Turn on recycling.
- Parameters:
quiet (bool, optional) –
- Return type:
True if recycling was previously on, else False
- static _TOFF()
- static _TON()
- static _V0()
- static _V1()
- static _V2()
- __array_finalize__(obj)
Finalizes self from other, called as part of ndarray.__new__()
- __array_function__(func, types, args, kwargs)
- __array_ufunc__(ufunc, method, *inputs, **kwargs)
The FastArray universal function (or ufunc) override offers a multithreaded C/C++ implementation at the RiptideCPP layer.
- When FastArray receives a ufunc callable, it will attempt to handle it in priority order: if FastArray’s FastFunction is enabled, the ufunc is handled by an explicit ufunc override; otherwise, the ufunc is handled at the Riptable / NumPy API overrides level; otherwise, the ufunc is handled at the NumPy API level.
Given a combination of ufunc, inputs, and kwargs, if none of the aforementioned cases supports it, a warning is emitted.
- The following references to supported ufuncs are grouped by method type.
For method type reduce, see gReduceUFuncs.
For method type __call__, see gBinaryUFuncs, gBinaryLogicalUFuncs, gBinaryBitwiseUFuncs, and gUnaryUFuncs.
For method type at, return None.
If the out argument is specified, an extra array copy is performed on the result of the ufunc computation.
If a dtype keyword is specified, all efforts are made to respect the dtype on the result of the computation.
- Parameters:
ufunc (callable) – The ufunc object that was called.
method (str) – A string indicating which Ufunc method was called (one of “__call__”, “reduce”, “reduceat”, “accumulate”, “outer”, “inner”).
inputs – A tuple of the input arguments to the ufunc.
kwargs – A dictionary containing the optional input arguments of the ufunc. If given, any out arguments, both positional and keyword, are passed as a tuple in kwargs.
- Return type:
The method should return either the result of the operation, or NotImplemented if the operation requested is not implemented.
Notes
The current implementation does not support the following keyword arguments: casting, sig, signature, and core_signature.
It has partial support for the keyword arguments where, axis, and axes, if they match the default values.
If FastArray’s WarningLevel is enabled, warnings will be emitted if any unsupported or partially supported keyword arguments are passed.
TODO: document custom upcasting rules.
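The priority order described above (an explicit override first, then a general fallback) is a common way to structure NumPy’s __array_ufunc__ hook. The sketch below shows that structure on a hypothetical plain ndarray subclass; the OVERRIDES table and fast_add function are illustrative stand-ins, not Riptable’s RiptideCPP dispatch, and the fallback simply strips the subclass and lets NumPy compute the result.

```python
import numpy as np


class DispatchingArray(np.ndarray):
    # Hypothetical table of explicit overrides, keyed by (ufunc, method).
    OVERRIDES = {}

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        override = self.OVERRIDES.get((ufunc, method))
        if override is not None:
            return override(*inputs, **kwargs)

        # Fallback: view subclass inputs/outputs as plain ndarrays so NumPy
        # handles the call without re-entering this method.
        def plain(x):
            return x.view(np.ndarray) if isinstance(x, DispatchingArray) else x

        inputs = tuple(plain(x) for x in inputs)
        if "out" in kwargs:
            kwargs["out"] = tuple(plain(o) for o in kwargs["out"])
        result = getattr(ufunc, method)(*inputs, **kwargs)
        # Re-wrap array results so the subclass is preserved.
        return result.view(DispatchingArray) if isinstance(result, np.ndarray) else result


calls = []


def fast_add(a, b, **kwargs):
    # Stand-in for an optimized kernel; records that it was used.
    calls.append("add")
    return np.add(np.asarray(a), np.asarray(b), **kwargs).view(DispatchingArray)


DispatchingArray.OVERRIDES[(np.add, "__call__")] = fast_add

x = np.arange(3).view(DispatchingArray)
y = x + x                 # handled by fast_add
z = x * 2                 # handled by the NumPy fallback
total = np.add.reduce(x)  # "reduce" is not overridden, so it falls back too
print(y.tolist(), z.tolist(), int(total), calls)  # [0, 2, 4] [0, 2, 4] 3 ['add']
```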
- __arrow_array__(type=None)
Implementation of the __arrow_array__ protocol for conversion to a pyarrow array.
- Parameters:
type (pyarrow.DataType, optional, defaults to None) –
- Return type:
Notes
- __eq__(other)
Return self==value.
- __ge__(other)
Return self>=value.
- __getitem__(fld)
Riptable has special routines to handle array input in the indexer. Everything else goes to NumPy’s __getitem__.
- __gt__(other)
Return self>value.
- __le__(other)
Return self<=value.
- __lt__(other)
Return self<value.
- __ne__(other)
Return self!=value.
- __reduce__()
Used for pickling. For a plain FastArray, this passes back the view of the np.ndarray, which then knows how to pickle itself. NOTE: There may be a faster way, possibly returning a byte string.
- __setitem__(fld, value)
Used on the left-hand side of arr[fld] = value.
This routine tries to convert dtypes so that invalid values are preserved when setting. The mbset portion of this is not written (mbset would not raise an IndexError on out-of-bounds indices).
- Parameters:
- Raises:
- static _argmax(a, axis=None, out=None)
- static _argmin(a, axis=None, out=None)
- classmethod _check_ndim(instance)
Iterates through the dimensions of an array, counting how many dimensions have lengths greater than 1. Problems may occur with multidimensional FastArrays, and the user will be warned.
- _compare_check(func, other)
- static _empty_like(array, dtype=None, order='K', subok=True, shape=None)
- _fa_filter_wrapper(myFunc, filter=None, dtype=None)
- _fa_keyword_wrapper(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)
- _fill_invalid_internal(shape=None, dtype=None, inplace=True, fill_val=None)
- static _from_arrow(arr, zero_copy_only=True, writable=False, auto_widen=False)
Convert a pyarrow Array to a riptable FastArray.
- Parameters:
arr (pyarrow.Array or pyarrow.ChunkedArray) –
zero_copy_only (bool, default True) – If True, an exception will be raised if the conversion to a FastArray would require copying the underlying data (e.g. in the presence of nulls, or for non-primitive types).
writable (bool, default False) – For a FastArray created with zero copy (a view on the Arrow data), the resulting array is not writable (Arrow data is immutable). By setting this to True, a copy of the array is made to ensure it is writable.
auto_widen (bool, optional, default False) – When False (the default), if an Arrow array contains a value which would be considered the ‘invalid’/NA value for the equivalent dtype in a FastArray, raise an exception because direct conversion would be lossy / change the semantic meaning of the data. When True, the converted array will be widened (if possible) to the next-largest dtype to ensure the data will be interpreted in the same way.
- Return type:
FastArray
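The auto_widen behavior can be illustrated without pyarrow. The hypothetical helper below takes a NumPy integer array (standing in for converted Arrow data), checks whether it contains the value Riptable would treat as invalid for that dtype (the minimum for signed integers, the maximum for unsigned integers), and if so widens it to the next-larger dtype; without auto_widen it raises, mirroring the default described above. The widening table is an assumption made for illustration.

```python
import numpy as np

# Hypothetical "next-largest dtype" table for this sketch.
_WIDER = {
    np.dtype(np.int8): np.int16, np.dtype(np.int16): np.int32,
    np.dtype(np.int32): np.int64,
    np.dtype(np.uint8): np.uint16, np.dtype(np.uint16): np.uint32,
    np.dtype(np.uint32): np.uint64,
}


def widen_if_sentinel_present(values, auto_widen=False):
    values = np.asarray(values)
    info = np.iinfo(values.dtype)
    sentinel = info.min if values.dtype.kind == "i" else info.max
    if not (values == sentinel).any():
        return values  # nothing would be misread as invalid
    if not auto_widen:
        raise ValueError(f"{values.dtype} data contains the invalid value {sentinel}")
    wider = _WIDER.get(values.dtype)
    if wider is None:
        raise ValueError(f"cannot widen {values.dtype}")
    return values.astype(wider)


data = np.array([-128, 0, 127], dtype=np.int8)  # -128 is int8's invalid value
print(widen_if_sentinel_present(data, auto_widen=True).dtype)  # int16
```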
- _internal_self_compare(math_op, periods=1, fancy=False)
Internal routine used for differs and transitions.
- _is_not_supported(arr)
Returns True if a NumPy array is not internally supported by FastArray.
- _kwarg_check(*args, **kwargs)
- _legacy_array_function(func, types, args, kwargs)
Called before __array_ufunc__. It does not get called for every function; for instance, it is not called for np.isnan, np.trunc, or np.true_divide.
- static _max(a, axis=None, out=None, keepdims=None, initial=None, where=None)
- static _mean(a, axis=None, dtype=None, out=None, keepdims=None)
- static _min(a, axis=None, out=None, keepdims=None, initial=None, where=None)
- static _nanargmax(a, axis=None)
- static _nanargmin(a, axis=None)
- static _nanmax(a, axis=None, out=None, keepdims=None)
- static _nanmean(a, axis=None, dtype=None, out=None, keepdims=None)
- static _nanmin(a, axis=None, out=None, keepdims=None)
- static _nanstd(a, axis=None, dtype=None, out=None, ddof=None, keepdims=None)
- static _nansum(a, axis=None, dtype=None, out=None, keepdims=None)
- static _nanvar(a, axis=None, dtype=None, out=None, ddof=None, keepdims=None)
- _new_array_function(func, types, args, kwargs)
FastArray implementation of the array function protocol.
- Parameters:
func (callable) – A callable exposed by NumPy’s public API, which was called in the form func(*args, **kwargs).
types (tuple) – A tuple of unique argument types from the original NumPy function call that implement __array_function__.
args (tuple) – The tuple of arguments that will be passed to func.
kwargs (dict) – The dictionary of keyword arguments that will be passed to func.
- Raises:
TypeError – If func is not overridden by a corresponding riptable array function, a TypeError is raised.
Notes
This array function implementation requires each class, such as FastArray and any other derived class, to implement its own version of the NumPy array function API. If these array functions defer to the inheriting class, they will need to either re-wrap the results in the correct type or raise an exception if a particular operation is neither well-defined nor meaningful for the derived class. If an array function that is also a universal function is not overridden as an array function but is defined as a ufunc, it will not be called unless it is registered with the array function helper, since the array function protocol takes priority over the universal function protocol.
Reference: NEP 18 Array Function Protocol
- classmethod _possibly_warn(warning_string)
- classmethod _py_number_to_np_dtype(val, dtype)
Convert a python type to numpy dtype. Only handles integers.
- _reduce_check(reduceFunc, npFunc, *args, **kwargs)
npFunc (second argument): pass in None if there is no equivalent NumPy function.
- static _round_(a, decimals=None, out=None)
- static _std(a, axis=None, dtype=None, out=None, ddof=None, keepdims=None)
- static _sum(a, axis=None, dtype=None, out=None, keepdims=None, initial=None, where=None)
- _unary_op(funcnum, fancy=False)
- static _var(a, axis=None, dtype=None, out=None, ddof=None, keepdims=None)
- _view_internal(type=None)
FastArray subclasses need to take this over if they want to make a shallow copy of a FastArray instead of viewing themselves as a FastArray (which drops their other properties). Taking over view directly may have many unintended consequences.
- abs(**kwargs)
- apply(pyfunc, *args, otypes=None, doc=None, excluded=None, cache=False, signature=None)
Generalized function class. See np.vectorize.
Creates and then applies a vectorized function which takes a nested sequence of objects or NumPy arrays as inputs and returns a single NumPy array or a tuple of NumPy arrays as output. The vectorized function evaluates pyfunc over successive tuples of the input arrays like the Python map function, except it uses the broadcasting rules of NumPy.
The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.
- Parameters:
pyfunc (callable) – A python function or method.
otypes (str or list of dtypes, optional) – The output data type. It must be specified as either a string of typecode characters or a list of data type specifiers. There should be one data type specifier for each output.
doc (str, optional) – The docstring for the function. If None, the docstring will be the pyfunc.__doc__.
excluded (set, optional) –
Set of strings or integers representing the positional or keyword arguments for which the function will not be vectorized. These will be passed directly to pyfunc unmodified.
New in version 1.7.0.
cache (bool, optional) –
If True, then cache the first function call that determines the number of outputs if otypes is not provided.
New in version 1.7.0.
signature (string, optional) –
Generalized universal function signature, e.g., (m,n),(n)->(m) for vectorized matrix-vector multiplication. If provided, pyfunc will be called with (and expected to return) arrays with shapes given by the size of corresponding core dimensions. By default, pyfunc is assumed to take scalars as input and output.
New in version 1.12.0.
- Returns:
vectorized – Vectorized function.
- Return type:
callable
See also
Examples
>>> def myfunc(a, b):
...     "Return a-b if a>b, otherwise return a+b"
...     if a > b:
...         return a - b
...     else:
...         return a + b
>>>
>>> a = rt.arange(10)
>>> b = rt.arange(10) + 1
>>> a.apply(myfunc, b)
FastArray([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
Example with one input array
>>> def square(x):
...     return x**2
>>>
>>> a = rt.arange(10)
>>> a.apply(square)
FastArray([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
Example with lambda
>>> a = rt.arange(10)
>>> a.apply(lambda x: x**2)
FastArray([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
Example with numba
>>> from numba import jit
>>> @jit
... def squareit(x):
...     return x**2
>>> a.apply(squareit)
FastArray([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
Examples using the built-in oct function, with the output converted to byte strings, Unicode strings, and objects:
>>> a = rt.arange(10)
>>> a.apply(oct, otypes=['S'])
FastArray([b'0o0', b'0o1', b'0o2', b'0o3', b'0o4', b'0o5', b'0o6', b'0o7', b'0o10', b'0o11'], dtype='|S4')

>>> a = rt.arange(10)
>>> a.apply(oct, otypes=['U'])
FastArray(['0o0', '0o1', '0o2', '0o3', '0o4', '0o5', '0o6', '0o7', '0o10', '0o11'], dtype='<U4')

>>> a = rt.arange(10)
>>> a.apply(oct, otypes=['O'])
FastArray(['0o0', '0o1', '0o2', '0o3', '0o4', '0o5', '0o6', '0o7', '0o10', '0o11'], dtype=object)
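Since the docstring describes apply as a generalized function class in the style of np.vectorize, the same results can be reproduced with NumPy alone. The sketch below assumes that equivalence and uses plain ndarrays in place of FastArrays; otypes plays the role described above, fixing the output type instead of inferring it from the first call.

```python
import numpy as np


def myfunc(a, b):
    "Return a-b if a>b, otherwise return a+b"
    return a - b if a > b else a + b


a = np.arange(10)
b = np.arange(10) + 1

# Element-wise application with broadcasting, analogous to a.apply(myfunc, b).
result = np.vectorize(myfunc)(a, b)

# otypes fixes the output type: here, Unicode strings from the built-in oct.
octal = np.vectorize(oct, otypes=["U"])(a)

print(result.tolist())      # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(octal[-2:].tolist())  # ['0o10', '0o11']
```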
- apply_numba(*args, otype=None, myfunc='myfunc', name=None)
Print to screen an example Numba signature for the array.
You can then copy this example to build your own numba function.
- Parameters:
Examples
>>> import numba
>>> @numba.guvectorize(['void(int64[:], int64[:])'], '(n)->(n)')
... def squarev(x, out):
...     for i in range(len(x)):
...         out[i] = x[i]**2
...
>>> a = rt.arange(1_000_000).astype(np.int64)
>>> squarev(a)
FastArray([ 0, 1, 4, ..., 999994000009, 999996000004, 999998000001], dtype=int64)
- apply_pandas(func, convert_dtype=True, args=(), **kwds)
Invoke a function on the values of the FastArray. The function can be a ufunc (a NumPy function that applies to the entire FastArray) or a Python function that only works on single values.
- Parameters:
func (function) –
convert_dtype (boolean, default True) – Try to find better dtype for elementwise function results. If False, leave as dtype=object
args (tuple) – Positional arguments to pass to function in addition to the value
**kwds – Additional keyword arguments will be passed as keywords to the function.
- Returns:
y
- Return type:
FastArray or Dataset if func returns a FastArray
See also
FastArray.map
For element-wise operations
FastArray.agg
only perform aggregating type operations
FastArray.transform
only perform transforming type operations
Examples
Create a FastArray with typical summer temperatures for each city.
>>> fa = rt.FastArray([20, 21, 12], index=['London', 'New York', 'Helsinki'])
>>> fa
London      20
New York    21
Helsinki    12
dtype: int64
Square the values by defining a function and passing it as an argument to apply().
>>> def square(x):
...     return x**2
>>> fa.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64
Square the values by passing an anonymous function as an argument to apply().
>>> fa.apply(lambda x: x**2)
London      400
New York    441
Helsinki    144
dtype: int64
Define a custom function that needs additional positional arguments and pass these additional arguments using the args keyword.
>>> def subtract_custom_value(x, custom_value):
...     return x - custom_value
>>> fa.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64
Define a custom function that takes keyword arguments and pass these arguments to apply.
>>> def add_custom_values(x, **kwargs):
...     for month in kwargs:
...         x += kwargs[month]
...     return x
>>> fa.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64
Use a function from the NumPy library.
>>> fa.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
- apply_schema(schema)
Apply a schema containing descriptive information to the FastArray.
- Parameters:
schema (dict) – The schema to apply.
- Returns:
A dictionary of deviations from the schema.
- argmax(**kwargs)
- argmin(**kwargs)
- argpartition2(*args, **kwargs)
- astype(dtype, order='K', casting='unsafe', subok=True, copy=True)
Return a FastArray with values converted to the specified data type.
Check your results when you convert missing values. Sentinel values are preserved when Riptable handles the conversion. However, in some cases the array is sent to NumPy for conversion and results may not be what you expect.
For parameter descriptions, see numpy.ndarray.astype(). Note that until a reported bug is fixed, the casting parameter is ignored when Riptable handles the conversion.
See also
Dataset.astype
Examples
>>> a = rt.FastArray([1.7, 2.0, 3.0])
>>> a.astype(int)
FastArray([1, 2, 3])
Convert a NaN to an int sentinel and back:
>>> a = rt.FastArray([rt.nan, 1.0, 2.0])
>>> a_int = a.astype(int)
>>> a_int
FastArray([-2147483648, 1, 2])
>>> a_int.astype(float)
FastArray([nan, 1., 2.])
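Sentinel preservation amounts to remapping missing values at each conversion instead of letting NumPy cast them. The hypothetical helpers below sketch that idea with plain NumPy for a float64-to-int32 round trip, using the int32 minimum as the integer sentinel to match the -2147483648 shown above; the int32 choice is an assumption, and this is not Riptable’s conversion code.

```python
import numpy as np

INT32_INVALID = np.iinfo(np.int32).min  # -2147483648


def float_to_int32_preserving_nan(values):
    values = np.asarray(values, dtype=np.float64)
    out = np.empty(values.shape, dtype=np.int32)
    missing = np.isnan(values)
    # Cast only the valid entries; NaN has no integer value of its own.
    out[~missing] = values[~missing].astype(np.int32)
    out[missing] = INT32_INVALID
    return out


def int32_to_float_preserving_invalid(values):
    values = np.asarray(values, dtype=np.int32)
    out = values.astype(np.float64)
    out[values == INT32_INVALID] = np.nan  # map the sentinel back to NaN
    return out


a_int = float_to_int32_preserving_nan([np.nan, 1.0, 2.0])
print(a_int.tolist())                                      # [-2147483648, 1, 2]
print(int32_to_float_preserving_invalid(a_int).tolist())  # [nan, 1.0, 2.0]
```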
- between(low, high, include_low=True, include_high=False)
Return a boolean FastArray indicating which input values are in a specified interval.
- Parameters:
low (scalar or array) – Lower bound for the interval. If an array, it must be the same size as self, and comparisons are done elementwise.
high (scalar or array) – Upper bound for the interval. If an array, it must be the same size as self, and comparisons are done elementwise.
include_low (bool, default True) – Specifies whether low is included when performing comparisons.
include_high (bool, default False) – Specifies whether high is included when performing comparisons.
- Returns:
A boolean FastArray indicating which input values are in a specified interval.
- Return type:
FastArray
Examples
Specify an interval using scalars:
>>> a = rt.FA([9, 2, 3, 5, 8, 9, 1, 4, 6])
>>> a.between(5, 9, include_low=False)  # Exclude 5 (left endpoint).
FastArray([False, False, False, False, True, False, False, False, True])
Specify an interval using arrays:
>>> a2 = rt.FA([1, 2, 3, 4, 5])
>>> a2.between([1, 3, 5, 5, 5], [2, 4, 6, 6, 6])
FastArray([ True, False, False, False, True])
Specify an interval mixing scalar and array bounds:
>>> a3 = rt.FA([1, 2, 3, 4, 5])
>>> a3.between(2, [2, 4, 6, 6, 6])
FastArray([False, True, True, True, True])
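The interval test reduces to two element-wise comparisons whose strictness depends on include_low and include_high. The sketch below is a hypothetical NumPy re-implementation (between_sketch is not a Riptable function) that reproduces the three examples above; broadcasting handles both scalar and array bounds.

```python
import numpy as np


def between_sketch(values, low, high, include_low=True, include_high=False):
    values = np.asarray(values)
    low, high = np.asarray(low), np.asarray(high)
    # Choose >= or > and <= or < according to the include_* flags.
    above = values >= low if include_low else values > low
    below = values <= high if include_high else values < high
    return above & below


print(between_sketch([9, 2, 3, 5, 8, 9, 1, 4, 6], 5, 9, include_low=False).tolist())
# [False, False, False, False, True, False, False, False, True]
print(between_sketch([1, 2, 3, 4, 5], [1, 3, 5, 5, 5], [2, 4, 6, 6, 6]).tolist())
# [True, False, False, False, True]
print(between_sketch([1, 2, 3, 4, 5], 2, [2, 4, 6, 6, 6]).tolist())
# [False, True, True, True, True]
```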
- clip_lower(a_min, **kwargs)
- clip_upper(a_max, **kwargs)
- copy(order='K')
Return a copy of the input FastArray.
- Parameters:
order ({'K', 'C', 'F', 'A'}, default 'K') – Controls the memory layout of the copy: ‘K’ means match the layout of the input array as closely as possible; ‘C’ means row-based (C-style) order; ‘F’ means column-based (Fortran-style) order; ‘A’ means ‘F’ if the input array is formatted as ‘F’, ‘C’ if not.
- Returns:
A copy of the input FastArray.
- Return type:
FastArray
See also
Categorical.copy
Return a copy of the input Categorical.
Dataset.copy
Return a copy of the input Dataset.
Struct.copy
Return a copy of the input Struct.
Examples
Copy a FastArray:
>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a2 = a.copy()
>>> a2
FastArray([1, 2, 3, 4, 5])
>>> a2 is a  # The copy is a separate object.
False
- copy_invalid()
Return a copy of a FastArray filled with the invalid value for the array’s data type.
- Returns:
A copy of the input array, filled with the invalid value for the array’s dtype.
- Return type:
FastArray
See also
FastArray.inv
Return the invalid value for the input array’s dtype.
FastArray.fill_invalid
Replace the values of a FastArray with the invalid value for the array’s dtype.
Examples
Copy an integer array and replace with invalids:
>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a2 = a.copy_invalid()
>>> a2
FastArray([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648])
>>> a  # a is unchanged.
FastArray([1, 2, 3, 4, 5])
Copy a floating-point array and replace with invalids:
>>> a3 = rt.FA([0., 1., 2., 3., 4.])
>>> a3
FastArray([0., 1., 2., 3., 4.])
>>> a3.copy_invalid()
FastArray([nan, nan, nan, nan, nan])
Copy a string array and replace with invalids:
>>> a4 = rt.FA(['AMZN', 'IBM', 'MSFT', 'AAPL'])
>>> a4
FastArray([b'AMZN', b'IBM', b'MSFT', b'AAPL'], dtype='|S4')
>>> a4.copy_invalid()  # The invalid string value is an empty string.
FastArray([b'', b'', b'', b''], dtype='|S4')
- count(sorted=True, filter=None)
Return the count of each unique value.
This returns the same information that .unique(return_counts = True) does, except in a Dataset instead of a tuple.
- Parameters:
- Returns:
A Dataset containing the unique values and their counts.
- Return type:
Dataset
See also
Examples
>>> a = rt.FastArray([0, 2, 1, 3, 3, 2, 2])
>>> a.count()
*Unique   Count
-------   -----
      0       1
      1       1
      2       3
      3       2
With sorted = False, the unique values appear in order of first appearance:
>>> a.count(sorted = False)
*Unique   Count
-------   -----
      0       1
      2       3
      1       1
      3       2
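The counting behind count() can be sketched in plain Python. This is a minimal illustration, not Riptable’s implementation: count_unique is a hypothetical helper that returns a tuple of lists instead of a Dataset, and it relies on collections.Counter keeping keys in first-appearance order to mimic sorted = False.

```python
from collections import Counter


def count_unique(values, sorted_output=True):
    """Hypothetical helper: return (unique values, counts) like the table count() shows."""
    counts = Counter(values)  # Counter keeps keys in first-appearance order
    keys = sorted(counts) if sorted_output else list(counts)
    return keys, [counts[k] for k in keys]


data = [0, 2, 1, 3, 3, 2, 2]
print(count_unique(data))                       # ([0, 1, 2, 3], [1, 1, 3, 2])
print(count_unique(data, sorted_output=False))  # ([0, 2, 1, 3], [1, 3, 1, 2])
```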
- diff(periods=1)
Compute the differences between adjacent elements of a FastArray.
Positions at either end that have no element to subtract are filled with the invalid value for the input array’s dtype. If a calculated difference isn’t supported by the dtype, it is displayed as a NaN or rollover value. For example, negative differences in a uint8 array wrap around (a difference of -1 is displayed as 255). To avoid this, you can explicitly upcast to the next larger signed int dtype before calculating the differences.
- Parameters:
periods (int, default 1) – Number of element positions to shift right (if positive) or left (if negative) before subtracting. Raises an error if set to 0.
- Returns:
An equivalent-length array containing the differences between input array elements that are adjacent or separated by a specified period. Positions at either end are filled with invalids based on the input array’s dtype.
- Return type:
FastArray
See also
FastArray.shift
Shift an array’s elements right or left.
Examples
Calculate differences using the periods=1 default (each element minus the element one position to its left):
>>> a = rt.FA([0, 2, 4, 8, 16, 32])
>>> a
FastArray([ 0, 2, 4, 8, 16, 32])
>>> a.diff()
FastArray([-2147483648, 2, 2, 4, 8, 16])
Calculate differences using array elements two positions to the right (each element minus the element two positions ahead):
>>> a.diff(-2)
FastArray([ -4, -6, -12, -24, -2147483648, -2147483648])
Specify a periods value that is greater than the array length:
>>> a.diff(10)
FastArray([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648])
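The indexing rule that produces these outputs can be modeled in plain Python. This is a sketch under stated assumptions, not Riptable’s code: diff here is a hypothetical function over a list of ints, INT32_INVALID is the int32 invalid value shown in the examples, integer rollover for small or unsigned dtypes is not modeled, and raising ValueError for periods=0 is an assumption about the error type.

```python
INT32_INVALID = -2**31  # the int32 invalid value shown in the examples above


def diff(values, periods=1, invalid=INT32_INVALID):
    """Hypothetical pure-Python model of FastArray.diff for a list of ints."""
    if periods == 0:
        raise ValueError("periods must be nonzero")  # error type is an assumption
    n = len(values)
    out = [invalid] * n  # positions with no partner element keep the invalid value
    for i in range(n):
        j = i - periods  # partner: to the left if periods > 0, to the right if < 0
        if 0 <= j < n:
            out[i] = values[i] - values[j]
    return out


a = [0, 2, 4, 8, 16, 32]
print(diff(a))      # [-2147483648, 2, 2, 4, 8, 16]
print(diff(a, -2))  # [-4, -6, -12, -24, -2147483648, -2147483648]
```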
- differs(periods=1, fancy=False)
Identify array values that are the same as adjacent values.
Returns either a boolean FastArray, where True indicates equivalent values, or a fancy index FastArray containing the indices of equivalent values.
- Parameters:
- Returns:
A boolean or fancy index array that identifies equivalent elements in the input array.
- Return type:
FastArray
See also
FastArray.transitions
Identify nonequivalent items in the input array and return a boolean or fancy index array.
Examples
Return a boolean array using the periods=1 default value (look behind one element position for comparisons):
>>> a = rt.FA([1, 2, 2, 3, 2, 4, 5, 6, 2, 2, 5])
>>> a
FastArray([1, 2, 2, 3, 2, 4, 5, 6, 2, 2, 5])
>>> a.differs()
FastArray([False, False, True, False, False, False, False, False, False, True, False])
Return a boolean array and look ahead three element positions for comparisons:
>>> a.differs(periods=-3)
FastArray([False, True, False, False, False, False, False, False, False, False, False])
Return a fancy index array using the periods=1 default value (look behind one element position for comparisons):
>>> a.differs(fancy=True)
FastArray([2, 9])
Set periods to a number larger than the length of the input array:
>>> a.differs(periods=15)
FastArray([False, False, False, False, False, False, False, False, False, False, False])
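The comparison rule can be sketched in plain Python to make the sign of periods concrete. This is a hypothetical model, not Riptable’s implementation: each element is compared with the element periods positions behind it (ahead of it when periods is negative), positions with no partner are False, and fancy=True returns the True indices.

```python
def differs(values, periods=1, fancy=False):
    """Hypothetical pure-Python model of FastArray.differs."""
    n = len(values)
    mask = [False] * n
    for i in range(n):
        j = i - periods  # look behind if periods > 0, ahead if periods < 0
        if 0 <= j < n and values[i] == values[j]:
            mask[i] = True
    if fancy:
        return [i for i, same in enumerate(mask) if same]
    return mask


a = [1, 2, 2, 3, 2, 4, 5, 6, 2, 2, 5]
print(differs(a, fancy=True))              # [2, 9]
print(differs(a, periods=-3, fancy=True))  # [1]
```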
- display_query_properties()
Return an ItemFormat object and a function for converting the FastArray’s items to strings. The basic types Bool, Int, Float, Bytes, and String all have default formats and conversion functions (see Utils.rt_display_properties).
If a new type is a subclass of FastArray and needs to be displayed in a format different from its underlying type, it needs to override this method.
- duplicated(keep='first', high_unique=False)
Return a boolean FastArray indicating True for duplicate items in the input array.
- Parameters:
- Returns:
A boolean FastArray indicating True for duplicate items in the input array.
- Return type:
FastArray
See also
FastArray.nunique
Return the number of unique values in an array.
Dataset.duplicated
Return a boolean FastArray indicating True for duplicate rows.
Examples
Exclude the first occurrence of each duplicate (use the default keep value):
>>> a = rt.FA([1, 2, 3, 4, 2, 7, 8, 8, 3])
>>> a
FastArray([1, 2, 3, 4, 2, 7, 8, 8, 3])
>>> a.duplicated()
FastArray([False, False, False, False, True, False, False, True, True])
Mark all duplicates, including first occurrences:
>>> a.duplicated(keep=False)
FastArray([False, True, True, False, True, False, True, True, True])
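The two keep modes shown above can be sketched in plain Python. This is a minimal, hypothetical model covering only keep='first' and keep=False (the modes in the examples); it says nothing about how Riptable uses high_unique or which other keep values it accepts.

```python
def duplicated(values, keep="first"):
    """Hypothetical model of FastArray.duplicated for keep='first' and keep=False."""
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            out.append(v in seen)  # True only after a value's first occurrence
            seen.add(v)
        return out
    if keep is False:
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        return [counts[v] > 1 for v in values]  # every member of a repeated group
    raise ValueError("this sketch only supports keep='first' or keep=False")


a = [1, 2, 3, 4, 2, 7, 8, 8, 3]
print(duplicated(a))              # first occurrences stay False
print(duplicated(a, keep=False))  # all repeated values are True
```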
- eq(other)
- fill_invalid(shape=None, dtype=None, inplace=True)
Replace all values of the input FastArray with an invalid value.
The invalid value used is determined by the input array’s dtype or a user-specified dtype.
Warning: By default, this operation is in place.
- Parameters:
shape (int or sequence of int, optional) – Shape of the new array, for example: (2, 3) or 2. Note that although multi-dimensional arrays are technically supported by Riptable, you may get unexpected results when working with them.
dtype (str, optional) – The desired dtype for the returned array.
inplace (bool, default True) – If True (the default), modify original data. If False, return a copy of the array.
- Returns:
If inplace=False, a copy of the input FastArray is returned that has all values replaced with an invalid value. Otherwise, nothing is returned.
- Return type:
FastArray, optional
See also
FastArray.inv
Return the invalid value for the input array’s dtype.
FastArray.copy_invalid
Return a copy of a FastArray filled with the invalid value for the array’s dtype.
Examples
Replace an integer array’s values with the invalid value for the array’s dtype. By default, the returned array is the same size and dtype as the input array, and the operation is performed in place:
>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a
FastArray([1, 2, 3, 4, 5])
>>> a.fill_invalid()
>>> a
FastArray([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648])
Replace a floating-point array’s values with the invalid value for the int32 dtype:
>>> a2 = rt.FA([0., 1., 2., 3., 4.])
>>> a2
FastArray([0., 1., 2., 3., 4.])
>>> a2.fill_invalid(dtype="int32", inplace=False)
FastArray([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648])
Specify the size and dtype of the output array:
>>> a3 = rt.FA(["AMZN", "IBM", "MSFT", "AAPL"])
>>> a3
FastArray([b'AMZN', b'IBM', b'MSFT', b'AAPL'], dtype='|S4')
>>> a3.fill_invalid(2, dtype="bool", inplace=False)
FastArray([False, False])
- fillna(value=None, method=None, inplace=False, limit=None)
Replace NaN and invalid values with a specified value or nearby data.
Optionally, you can modify the original FastArray if it’s not locked.
- Parameters:
value (scalar or array, default None) – A value or an array of values to replace all NaN and invalid values. A value is required if method = None. An array can be used only when method = None. If an array is used, the number of values in the array must equal the number of NaN and invalid values.
method ({None, 'backfill', 'bfill', 'pad', 'ffill'}, default None) – Method to use to propagate valid values.
backfill/bfill: Propagates the next encountered valid value backward. Calls FastArray.fill_backward().
pad/ffill: Propagates the last encountered valid value forward. Calls FastArray.fill_forward().
None: A replacement value is required if method = None. Calls FastArray.replacena().
If there’s not a valid value to propagate forward or backward, the NaN or invalid value is not replaced unless you also specify a value.
inplace (bool, default False) – If False, return a copy of the FastArray. If True, modify original data. This will modify any other views on this object. This fails if the FastArray is locked.
limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN or invalid values to fill. If there is a gap with more than this number of consecutive NaN or invalid values, the gap will be only partially filled.
- Returns:
The FastArray will be the same size and dtype as the original array.
- Return type:
FastArray
See also
riptable.rt_fastarraynumba.fill_forward
Replace NaN and invalid values with the last valid value.
riptable.rt_fastarraynumba.fill_backward
Replace NaN and invalid values with the next valid value.
riptable.fill_forward
Replace NaN and invalid values with the last valid value.
riptable.fill_backward
Replace NaN and invalid values with the next valid value.
Dataset.fillna
Replace NaN and invalid values with a specified value or nearby data.
FastArray.replacena
Replace NaN and invalid values with a specified value.
Categorical.fill_forward
Replace NaN and invalid values with the last valid group value.
Categorical.fill_backward
Replace NaN and invalid values with the next valid group value.
GroupBy.fill_forward
Replace NaN and invalid values with the last valid group value.
GroupBy.fill_backward
Replace NaN and invalid values with the next valid group value.
Examples
Replace all NaN values with 0s:
>>> a = rt.FastArray([rt.nan, 1.0, rt.nan, rt.nan, rt.nan, 5.0])
>>> a.fillna(0)
FastArray([0., 1., 0., 0., 0., 5.])
Replace all invalid values with 0s:
>>> b = rt.FastArray([0, 1, 2, 3, 4, 5])
>>> b[0:3] = b.inv
>>> b.fillna(0)
FastArray([0, 0, 0, 3, 4, 5])
Replace each instance of NaN with a different value (one replacement value per NaN):
>>> a.fillna([0, 2, 3, 4])
FastArray([0., 1., 2., 3., 4., 5.])
Propagate the last encountered valid value forward. Note that where there’s no valid value to propagate, the NaN or invalid value isn’t replaced.
>>> a.fillna(method = 'ffill')
FastArray([nan, 1., 1., 1., 1., 5.])
You can use the value parameter to specify a value to use where there’s no valid value to propagate.
>>> a.fillna(value = 0, method = 'ffill')
FastArray([0., 1., 1., 1., 1., 5.])
With method = 'bfill' and limit = 1, fill at most one NaN or invalid value in each consecutive run (the one closest to the next valid value):
>>> a.fillna(method = 'bfill', limit = 1)
FastArray([ 1., 1., nan, nan, 5., 5.])
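The propagation and limit rules can be sketched in plain Python on a list of floats. This is a hypothetical model, not Riptable’s implementation: it handles only NaN (not integer invalids), accepts only a scalar value, and applies value to any NaN still left after propagation; how Riptable combines value with limit is not modeled. Backward fill is written as forward fill on the reversed list.

```python
import math


def _missing(x):
    return isinstance(x, float) and math.isnan(x)


def fill_forward(values, limit=None):
    """Propagate the last valid value forward, filling at most `limit` per run."""
    out = list(values)
    last = None  # last valid value seen so far
    run = 0      # position within the current run of missing values
    for i, x in enumerate(out):
        if _missing(x):
            run += 1
            if last is not None and (limit is None or run <= limit):
                out[i] = last
        else:
            last, run = x, 0
    return out


def fill_backward(values, limit=None):
    """Propagate the next valid value backward (forward fill on the reversed list)."""
    return fill_forward(list(values)[::-1], limit)[::-1]


def fillna(values, value=None, method=None, limit=None):
    """Hypothetical model of FastArray.fillna for float lists and a scalar value."""
    if method in ("pad", "ffill"):
        out = fill_forward(values, limit)
    elif method in ("backfill", "bfill"):
        out = fill_backward(values, limit)
    elif method is None:
        if value is None:
            raise ValueError("value is required when method is None")
        out = list(values)
    else:
        raise ValueError(f"unknown method: {method!r}")
    if value is not None:
        # Anything still missing (nothing to propagate) gets the scalar value.
        out = [value if _missing(x) else x for x in out]
    return out


a = [math.nan, 1.0, math.nan, math.nan, math.nan, 5.0]
print(fillna(a, method="ffill"))           # [nan, 1.0, 1.0, 1.0, 1.0, 5.0]
print(fillna(a, method="bfill", limit=1))  # [1.0, 1.0, nan, nan, 5.0, 5.0]
```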
- filter(filter)
Return a copy of the FastArray containing only the elements that meet the specified condition.
- Parameters:
filter (array: fancy index or Boolean mask) – A fancy index specifies both the desired elements and their order in the returned FastArray. When a Boolean mask is passed, only elements that meet the specified condition are in the returned FastArray.
- Return type:
FastArray
Notes
If you want to perform an operation on a filtered FastArray, it’s more efficient to perform the operation using the filter keyword argument. For example, my_fa.sum(filter = boolean_mask).
Examples
Create a FastArray:
>>> fa = rt.FastArray(np.linspace(0, 1, 11))
>>> fa
FastArray([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
Filter using a fancy index:
>>> fa.filter([5, 0, 1])
FastArray([0.5, 0. , 0.1])
Filter using a condition that creates a Boolean mask array:
>>> fa.filter(fa > 0.75)
FastArray([0.8, 0.9, 1. ])
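The difference between the two kinds of filter argument can be shown with a small plain-Python helper. filter_values is hypothetical and only models the selection semantics described above: a Boolean mask keeps elements in their original order, while a fancy index decides both which elements appear and in what order.

```python
def filter_values(values, selector):
    """Hypothetical helper: select by Boolean mask or by fancy index."""
    if selector and all(isinstance(s, bool) for s in selector):
        if len(selector) != len(values):
            raise ValueError("a Boolean mask must be the same length as the array")
        return [v for v, keep in zip(values, selector) if keep]
    # Fancy index: output order (and any repeats) follow the index itself.
    return [values[i] for i in selector]


fa = [round(i / 10, 1) for i in range(11)]       # 0.0, 0.1, ..., 1.0
print(filter_values(fa, [5, 0, 1]))               # [0.5, 0.0, 0.1]
print(filter_values(fa, [x > 0.75 for x in fa]))  # [0.8, 0.9, 1.0]
```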
- static from_arrow(arr, zero_copy_only=True, writable=False, auto_widen=False)
Convert a pyarrow Array to a riptable FastArray.
- Parameters:
arr (pyarrow.Array or pyarrow.ChunkedArray) – The Arrow array to convert.
zero_copy_only (bool, default True) – If True, an exception is raised if the conversion to a FastArray would require copying the underlying data (e.g., in the presence of nulls, or for non-primitive types).
writable (bool, default False) – A FastArray created with zero copy (a view on the Arrow data) is not writable, because Arrow data is immutable. Set this to True to make a copy of the array so that it is writable.
auto_widen (bool, default False) – When False (the default), if an Arrow array contains a value that would be considered the invalid/NA value for the equivalent dtype in a FastArray, an exception is raised, because direct conversion would be lossy or change the semantic meaning of the data. When True, the converted array is widened (if possible) to the next-largest dtype so that the data is interpreted in the same way.
- Return type:
FastArray
- ge(other)
- get_name()
Get the name that’s assigned to a FastArray.
When a FastArray object is created, it has no name. It can be assigned a name via set_name. For details, see FastArray.set_name().
- Returns:
The assigned name, or None if the array has not been named.
- Return type:
str or None
See also
Examples
Assign the FastArray a name using FastArray.set_name():
>>> a = rt.arange(5)
>>> a.set_name('FA Name')
FastArray([0, 1, 2, 3, 4])
Get the name:
>>> a.get_name()
'FA Name'
- gt(other)
- info(**kwargs)
Return a description of the input array’s contents.
This information is set using FastArray.apply_schema and includes the description, steward, and dtype.
- Parameters:
**kwargs (optional) – Keyword arguments passed to rt_meta.info().
- Returns:
A description of the input array’s contents.
- Return type:
See also
FastArray.doc
Categorical.info
Display a description of the input Categorical.
Struct.info
Return an object containing a description of the input structure’s contents.
Examples
Return the description of the input array’s contents:
>>> a = rt.FA([1, 2, 3, 4, 5])
>>> a.info()
Description: <no description>
Steward: <no steward>
Type: int32
Apply a schema and return the description of the input array’s contents:
>>> schema = {"Description": "This is an array", "Steward": "Brian"}
>>> a.apply_schema(schema)
{}
>>> a.info()
Description: This is an array
Steward: Brian
Type: int32
Return the description of the input array’s contents with a title:
>>> a.info(title="Test")
Test
====
Description: This is an array
Steward: Brian
Type: int32
- iscomputable()
- isfinite(fancy=False)
Return a boolean array that’s True for each finite FastArray element, False otherwise.
A value is considered to be finite if it’s not positive or negative infinity or a NaN (Not a Number).
- Parameters:
fancy (bool, default False) – Set to True to instead return the indices of the True (finite) values.
- Returns:
An array of booleans or indices.
- Return type:
FastArray
See also
FastArray.isnotfinite, riptable.isfinite, riptable.isnotfinite, riptable.isinf, riptable.isnotinf, FastArray.isinf, FastArray.isnotinf
Dataset.mask_or_isfinite
Return a boolean array that’s True for each Dataset row that has at least one finite value.
Dataset.mask_and_isfinite
Return a boolean array that’s True for each Dataset row that contains all finite values.
Dataset.mask_or_isinf
Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.
Dataset.mask_and_isinf
Return a boolean array that’s True for each Dataset row that contains all infinite values.
Examples
>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> a.isfinite()
FastArray([False, False, False, True])
With fancy = True:
>>> a.isfinite(fancy = True)
FastArray([3])
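The isfinite/isinf family and the shared fancy option can be modeled with Python’s math module. This is a hypothetical sketch over plain float lists, not Riptable’s vectorized code: the mask is computed element by element, and fancy=True turns the mask into the indices of its True entries.

```python
import math


def _as_result(mask, fancy):
    # fancy=True returns the indices of the True entries instead of the mask
    return [i for i, flag in enumerate(mask) if flag] if fancy else mask


def isfinite(values, fancy=False):
    return _as_result([math.isfinite(v) for v in values], fancy)


def isnotfinite(values, fancy=False):
    return _as_result([not math.isfinite(v) for v in values], fancy)


def isinf(values, fancy=False):
    return _as_result([math.isinf(v) for v in values], fancy)


def isnotinf(values, fancy=False):
    return _as_result([not math.isinf(v) for v in values], fancy)


a = [math.inf, -math.inf, math.nan, 0.0]
print(isfinite(a))                 # [False, False, False, True]
print(isnotfinite(a, fancy=True))  # [0, 1, 2]
```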
- isin(test_elements, *, assume_unique=False, invert=False)
Calculates self in test_elements, broadcasting over self only. Returns a boolean array of the same shape as self that is True where an element of self is in test_elements and False otherwise.
- Parameters:
test_elements (array_like) – The values against which to test each value of self. This argument is flattened if it is an array or array_like. See notes for behavior with non-array-like parameters.
assume_unique (bool, optional) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
invert (bool, optional) – If True, the values in the returned array are inverted, as if calculating self not in test_elements. Default is False. np.isin(a, b, invert=True) is equivalent to (but faster than) np.invert(np.isin(a, b)).
- Returns:
isin (ndarray, bool) – Has the same shape as self. The values self[isin] are in test_elements.
Notes
Behavior differs from pandas:
- Riptable favors bytestrings, and converts between Unicode strings and bytes as necessary to match for operations.
- A single scalar is also accepted for test_elements.
- A pandas Series returns another Series. Riptable has no Series type, so a FastArray is returned.
Examples
>>> from riptable import *
>>> a = FA(['a','b','c','d','e'], unicode=False)
>>> a.isin(['a','b'])
FastArray([ True, True, False, False, False])
>>> a.isin('a')
FastArray([ True, False, False, False, False])
>>> a.isin({'b'})
FastArray([False, True, False, False, False])
- isinf(fancy=False)
Return a boolean array that’s True for each FastArray element that’s positive or negative infinity, False otherwise.
- Parameters:
fancy (bool, default False) – Set to True to instead return the indices of the True (infinite) values.
- Returns:
An array of booleans or indices.
- Return type:
FastArray
See also
FastArray.isnotinf, FastArray.isfinite, FastArray.isnotfinite, riptable.isinf, riptable.isnotinf, riptable.isfinite, riptable.isnotfinite
Dataset.mask_or_isfinite
Return a boolean array that’s True for each Dataset row that has at least one finite value.
Dataset.mask_and_isfinite
Return a boolean array that’s True for each Dataset row that contains all finite values.
Dataset.mask_or_isinf
Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.
Dataset.mask_and_isinf
Return a boolean array that’s True for each Dataset row that contains all infinite values.
Examples
>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> a.isinf()
FastArray([ True, True, False, False])
With fancy = True:
>>> a.isinf(fancy = True)
FastArray([0, 1])
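- isna()
Return a boolean array that’s True for each NaN or invalid (sentinel) value, False otherwise.
isna is mapped directly to isnan(). Categorical and DateTime classes override isnan, and FastArray handles sentinel values, so invalid integers are detected as well as floating-point NaNs.
>>> a = arange(100.0)
>>> a[5] = np.nan
>>> a[87] = np.nan
>>> sum(a.isna())
2
>>> sum(a.astype(np.int32).isna())
2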
- isnan(fancy=False)
Return a boolean array that’s True for each element that’s a NaN (Not a Number), False otherwise.
- Parameters:
fancy (bool, default False) – Set to True to instead return the indices of the True (NaN) values.
- Returns:
A FastArray of booleans or indices.
- Return type:
FastArray
See also
FastArray.isnotnan, FastArray.notna, FastArray.isnanorzero, riptable.isnan, riptable.isnotnan, riptable.isnanorzero, Categorical.isnan, Categorical.isnotnan, Categorical.notna, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan
Dataset.mask_or_isnan
Return a boolean array that’s True for each Dataset row that contains at least one NaN.
Dataset.mask_and_isnan
Return a boolean array that’s True for each all-NaN Dataset row.
Examples
>>> a = rt.FastArray([rt.nan, rt.nan, rt.inf, 3])
>>> a.isnan()
FastArray([ True, True, False, False])
With fancy = True:
>>> a.isnan(fancy = True)
FastArray([0, 1])
- isnanorzero(fancy=False)
Return a boolean array that’s True for each element that’s a NaN (Not a Number) or zero, False otherwise.
- Parameters:
fancy (bool, default False) – Set to True to instead return the indices of the True (NaN or zero) values.
- Returns:
A FastArray of booleans or indices.
- Return type:
FastArray
See also
riptable.isnanorzero, riptable.isnan, riptable.isnotnan, FastArray.isnan, FastArray.isnotnan, Categorical.isnan, Categorical.isnotnan, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan
Dataset.mask_or_isnan
Return a boolean array that’s True for each Dataset row that contains at least one NaN.
Dataset.mask_and_isnan
Return a boolean array that’s True for each all-NaN Dataset row.
Examples
>>> a = rt.FastArray([0, rt.nan, rt.inf, 3])
>>> a.isnanorzero()
FastArray([ True, True, False, False])
With fancy = True:
>>> a.isnanorzero(fancy = True)
FastArray([0, 1])
- isnormal(fancy=False)
- isnotfinite(fancy=False)
Return a boolean array that’s True for each non-finite FastArray element, False otherwise.
A value is considered to be finite if it’s not positive or negative infinity or a NaN (Not a Number).
- Parameters:
fancy (bool, default False) – Set to True to instead return the indices of the True (non-finite) values.
- Returns:
An array of booleans or indices.
- Return type:
FastArray
See also
FastArray.isfinite, riptable.isfinite, riptable.isnotfinite, riptable.isinf, riptable.isnotinf, FastArray.isinf, FastArray.isnotinf
Dataset.mask_or_isfinite
Return a boolean array that’s True for each Dataset row that has at least one finite value.
Dataset.mask_and_isfinite
Return a boolean array that’s True for each Dataset row that contains all finite values.
Dataset.mask_or_isinf
Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.
Dataset.mask_and_isinf
Return a boolean array that’s True for each Dataset row that contains all infinite values.
Examples
>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> a.isnotfinite()
FastArray([ True, True, True, False])
With fancy = True:
>>> a.isnotfinite(fancy = True)
FastArray([0, 1, 2])
- isnotinf(fancy=False)
Return a boolean array that’s True for each FastArray element that’s not positive or negative infinity, False otherwise.
- Parameters:
fancy (bool, default False) – Set to True to instead return the indices of the True (non-infinite) values.
- Returns:
An array of booleans or indices.
- Return type:
FastArray
See also
FastArray.isinf, riptable.isnotinf, riptable.isinf, riptable.isfinite, riptable.isnotfinite, FastArray.isfinite, FastArray.isnotfinite
Dataset.mask_or_isfinite
Return a boolean array that’s True for each Dataset row that has at least one finite value.
Dataset.mask_and_isfinite
Return a boolean array that’s True for each Dataset row that contains all finite values.
Dataset.mask_or_isinf
Return a boolean array that’s True for each Dataset row that has at least one value that’s positive or negative infinity.
Dataset.mask_and_isinf
Return a boolean array that’s True for each Dataset row that contains all infinite values.
Examples
>>> a = rt.FastArray([rt.inf, -rt.inf, rt.nan, 0])
>>> a.isnotinf()
FastArray([False, False, True, True])
With fancy = True:
>>> a.isnotinf(fancy = True)
FastArray([2, 3])
- isnotnan(fancy=False)
Return a boolean array that’s True for each element that’s not a NaN (Not a Number), False otherwise.
- Parameters:
fancy (bool, default False) – Set to True to instead return the indices of the True (non-NaN) values.
- Returns:
A FastArray of booleans or indices.
- Return type:
FastArray
See also
FastArray.isnan, FastArray.notna, FastArray.isnanorzero, riptable.isnan, riptable.isnotnan, riptable.isnanorzero, Categorical.isnan, Categorical.isnotnan, Categorical.notna, Date.isnan, Date.isnotnan, DateTimeNano.isnan, DateTimeNano.isnotnan
Dataset.mask_or_isnan
Return a boolean array that’s True for each Dataset row that contains at least one NaN.
Dataset.mask_and_isnan
Return a boolean array that’s True for each all-NaN Dataset row.
Examples
>>> a = rt.FastArray([rt.nan, rt.inf, 2])
>>> a.isnotnan()
FastArray([False, True, True])
With fancy = True:
>>> a.isnotnan(fancy = True)
FastArray([1, 2])
- isnotnormal(fancy=False)
- issorted()
Return True if the array is sorted, False otherwise.
NaNs at the end of an array are considered sorted.
Calls riptable.issorted().
- Returns:
True if the array is sorted, False otherwise.
- Return type:
bool
See also
riptable.issorted
Examples
>>> a = rt.FastArray(['a', 'b', 'c'])
>>> a.issorted()
True
>>> a = rt.FastArray([1.0, 2.0, 3.0, rt.nan])
>>> rt.issorted(a)
True
>>> a = rt.FastArray(['a', 'c', 'b'])
>>> a.issorted()
False
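The trailing-NaN rule can be sketched in plain Python. This is a hypothetical model, not Riptable’s implementation: trailing NaNs are stripped before checking for non-decreasing order, and a NaN anywhere else makes the check fail because NaN comparisons are always False (that mid-array behavior is this sketch’s choice, not something the text above specifies).

```python
import math


def issorted(values):
    """Hypothetical model: non-decreasing order, with trailing NaNs allowed."""
    end = len(values)
    # Trailing NaNs count as sorted, so strip them before comparing neighbors.
    while end > 0 and isinstance(values[end - 1], float) and math.isnan(values[end - 1]):
        end -= 1
    head = values[:end]
    return all(x <= y for x, y in zip(head, head[1:]))


print(issorted(["a", "b", "c"]))            # True
print(issorted([1.0, 2.0, 3.0, math.nan]))  # True
print(issorted(["a", "c", "b"]))            # False
```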
- le(other)
- lt(other)
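- map(npdict)
Map the values of the array using the dictionary npdict.
Notes
Uses ismember, so it can handle large dictionaries.
Examples
A value with no matching key in npdict (0 in the first example) maps to an empty string:
>>> a = arange(3)
>>> a.map({1: 'a', 2: 'b', 3: 'c'})
FastArray(['', 'a', 'b'], dtype='<U1')
>>> a = arange(3) + 1
>>> a.map({1: 'a', 2: 'b', 3: 'c'})
FastArray(['a', 'b', 'c'], dtype='<U1')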
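The lookup semantics of map can be sketched with a plain dictionary. This is a hypothetical helper, not Riptable’s approach: Riptable uses ismember rather than per-element dict lookups, and the default argument here is an illustrative stand-in that matches the empty string returned for unmatched keys in the string example.

```python
def map_values(values, mapping, default=""):
    """Hypothetical helper: look up each value; unmatched values get `default`."""
    return [mapping.get(v, default) for v in values]


lookup = {1: "a", 2: "b", 3: "c"}
print(map_values(range(3), lookup))     # ['', 'a', 'b']
print(map_values(range(1, 4), lookup))  # ['a', 'b', 'c']
```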
- map_old(npdict)
Example
>>> d = {1:10, 2:20}
>>> dat['c'] = dat.a.map(d)
>>> print(dat)
   a   b    cb    c
0  1   0   0.0   10
1  1   1   1.0   10
2  1   2   3.0   10
3  2   3   5.0   20
4  2   4   7.0   20
5  2   5   9.0   20
- mean(filter=None, dtype=None, axis=None, keepdims=None, **kwargs)
Compute the arithmetic mean of the values in the array.
- Parameters:
filter (array of bool, default None) – Specifies which elements to include in the mean calculation. If the filter is uniformly False, mean returns a ZeroDivisionError.
dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.mean(dtype = my_type) is equivalent to my_type(x.mean()).
- Returns:
The mean of the values.
- Return type:
scalar
See also
FastArray.nanmean
Computes the mean of FastArray values, ignoring NaNs.
Dataset.mean
Computes the mean of numerical Dataset columns.
GroupByOps.mean
Computes the mean of each group. Used by Categorical objects.
Notes
The dtype keyword for FastArray.mean specifies the data type of the result. This differs from numpy.mean, where it specifies the data type used to compute the mean.
Notes on Using NumPy Parameters
Using either of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.mean). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the mean as it is in numpy.mean.
Also note that if you use either of the following NumPy parameters and also include a filter keyword argument (which numpy.mean does not accept), Riptable’s implementation of mean will be used with the filter argument and the NumPy parameters will be ignored.
- axis (None or int or tuple of ints, optional)
Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.
- keepdims (bool, optional)
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.
If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray; however, any non-default value will be. If the sub-class’s method does not implement keepdims, any exceptions will be raised.
Examples
>>> a = rt.FastArray([1, 3, 5, 7])
>>> a.mean()
4.0
With a dtype specified:
>>> a = rt.FastArray([1, 3, 5, 7])
>>> a.mean(dtype = rt.int32)
4
With a filter:
>>> a = rt.FastArray([1, 3, 5, 7])
>>> b = rt.FastArray([False, True, False, True])
>>> a.mean(filter = b)
5.0
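How filter and dtype interact in these examples can be sketched in plain Python. This is a hypothetical model, not Riptable’s implementation: mask stands in for the filter keyword, the arithmetic always runs in Python floats, and dtype is applied only to the result, matching the equivalence my_type(x.mean()) stated above. An all-False mask raises ZeroDivisionError here because nothing is left to divide by.

```python
def mean(values, mask=None, dtype=float):
    """Hypothetical model of FastArray.mean(filter=..., dtype=...)."""
    if mask is not None:
        values = [v for v, keep in zip(values, mask) if keep]
    result = sum(values) / len(values)  # ZeroDivisionError if nothing is selected
    return dtype(result)  # dtype converts the result; it does not change the arithmetic


a = [1, 3, 5, 7]
print(mean(a))                                   # 4.0
print(mean(a, dtype=int))                        # 4
print(mean(a, mask=[False, True, False, True]))  # 5.0
```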
- median(**kwargs)
- move_argmax(*args, **kwargs)
- move_argmin(*args, **kwargs)
- move_max(*args, **kwargs)
- move_mean(*args, **kwargs)
- move_median(*args, **kwargs)
- move_min(*args, **kwargs)
- move_rank(*args, **kwargs)
- move_std(*args, **kwargs)
- move_sum(*args, **kwargs)
- move_var(*args, **kwargs)
- nanargmax(**kwargs)
- nanargmin(**kwargs)
- nanmax(**kwargs)
- nanmean(filter=None, dtype=None, axis=None, keepdims=None, **kwargs)
Compute the arithmetic mean of the values in the array, ignoring NaNs.
If all values are NaNs, 0.0 is returned.
- Parameters:
filter (array of bool, default None) – Specifies which elements to include in the mean calculation. If the filter is uniformly False, nanmean returns a ZeroDivisionError.
dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanmean(dtype = my_type) is equivalent to my_type(x.nanmean()).
- Returns:
The mean of the values.
- Return type:
scalar
See also
FastArray.mean
Computes the mean of FastArray values.
Dataset.nanmean
Computes the mean of numerical Dataset columns, ignoring NaNs.
GroupByOps.nanmean
Computes the mean of each group, ignoring NaNs. Used by Categorical objects.
Notes
The dtype keyword for FastArray.nanmean specifies the data type of the result. This differs from numpy.nanmean, where it specifies the data type used to compute the mean.
Notes on Using NumPy Parameters
Using either of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.nanmean). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the mean as it is in numpy.nanmean.
Also note that if you use either of the following NumPy parameters and also include a filter keyword argument (which numpy.nanmean does not accept), Riptable’s implementation of nanmean will be used with the filter argument and the NumPy parameters will be ignored.
- axis ({int, tuple of int, None}, optional)
Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.
- keepdims (bool, optional)
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.
If the value is anything but the default, then keepdims will be passed through to the mean or sum methods of sub-classes of ndarray. If the sub-classes’ methods do not implement keepdims, any exceptions will be raised.
Examples
>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> a.nanmean()
3.0
With a dtype specified:
>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> a.nanmean(dtype = rt.int32)
3
With a filter:
>>> a = rt.FastArray([1, 3, 5, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> a.nanmean(filter = b)
4.0
- nanmedian(**kwargs)
- nanmin(**kwargs)
- nanpercentile(**kwargs)
- nanquantile(**kwargs)
- nanrankdata(*args, **kwargs)
- nanstd(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)
Compute the standard deviation of the values in the array, ignoring NaNs.
If all values are NaNs, NaN is returned.
Riptable uses the convention that ddof = 1, meaning the standard deviation of [x_1, ..., x_n] is defined by std = sqrt(1/(n - 1) * sum((x_i - mean)**2)) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.
- Parameters:
filter (array of bool, default None) – Specifies which elements to include in the standard deviation calculation. If the filter is uniformly False, nanstd returns a ZeroDivisionError.
dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanstd(dtype = my_type) is equivalent to my_type(x.nanstd()).
- Returns:
The standard deviation of the values.
- Return type:
scalar
See also
FastArray.std
Computes the standard deviation of FastArray values.
Dataset.nanstd
Computes the standard deviation of numerical Dataset columns, ignoring NaNs.
GroupByOps.nanstd
Computes the standard deviation of each group, ignoring NaNs. Used by Categorical objects.
Notes
The dtype keyword for FastArray.nanstd specifies the data type of the result. This differs from numpy.nanstd, where it specifies the data type used to compute the standard deviation.
Notes on Using NumPy Parameters
Using any of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.nanstd). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the standard deviation as it is in numpy.nanstd.
Also note that if you use any of the following NumPy parameters and also include a filter keyword argument (which numpy.nanstd does not accept), Riptable’s implementation of nanstd will be used with the filter argument and the NumPy parameters will be ignored.
- axis ({int, tuple of int, None}, optional)
Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.
- keepdims (bool, optional)
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.
If this value is anything but the default, it is passed through as-is to the relevant functions of the sub-classes. If these functions do not have a keepdims kwarg, a RuntimeError will be raised.
- ddof (int, optional)
“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero for the NumPy implementation, versus one for the Riptable implementation.
Examples
>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> a.nanstd()
1.0
With a dtype specified:
>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> a.nanstd(dtype = rt.int32)
1
With a filter:
>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> a.nanstd(filter = b)
0.7071067811865476
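The ddof convention described above can be checked with a short plain-Python model. This is a hypothetical sketch, not Riptable’s implementation: mask stands in for the filter keyword, NaNs are dropped after filtering, and the sum of squared deviations is divided by n - ddof with ddof = 1 by default. Edge cases (all-NaN input, n <= ddof) are not modeled.

```python
import math


def nanstd(values, mask=None, ddof=1):
    """Hypothetical model of FastArray.nanstd: skip NaNs, divide by n - ddof."""
    if mask is not None:
        values = [v for v, keep in zip(values, mask) if keep]
    kept = [v for v in values if not math.isnan(v)]
    n = len(kept)
    mu = sum(kept) / n
    # Edge cases (all-NaN input, n <= ddof) are not modeled here.
    return math.sqrt(sum((x - mu) ** 2 for x in kept) / (n - ddof))


a = [1.0, 2.0, 3.0, math.nan]
print(nanstd(a))                                  # 1.0 (Riptable's ddof = 1)
print(nanstd(a, mask=[False, True, True, True]))  # about 0.70711
print(nanstd(a, ddof=0))                          # NumPy's default ddof = 0
```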
- nansum(filter=None, dtype=None, axis=None, keepdims=None, **kwargs)
Compute the sum of the values in the first argument, ignoring NaNs.
If all values in the first argument are NaNs, 0.0 is returned.
- Parameters:
filter (array of bool, default None) – Specifies which elements to include in the sum calculation. If the filter is uniformly False, nansum returns 0.0.
dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nansum(dtype = my_type) is equivalent to my_type(x.nansum()).
- Returns:
The sum of the values.
- Return type:
scalar
See also
Dataset.nansum
Sums the values of numerical Dataset columns, ignoring NaNs.
GroupByOps.nansum
Sums the values of each group, ignoring NaNs. Used by Categorical objects.
Notes
The dtype keyword for FastArray.nansum specifies the data type of the result. This differs from numpy.nansum, where it specifies the data type used to compute the sum.
Notes on Using NumPy Parameters
Using either of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.nansum). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the sum as it is in numpy.nansum.
Also note that if you use either of the following NumPy parameters and also include a filter keyword argument (which numpy.nansum does not accept), Riptable’s implementation of nansum will be used with the filter argument and the NumPy parameters will be ignored.
- axis : {int, tuple of int, None}, optional
Axis or axes along which the sum is computed. The default is to compute the sum of the flattened array.
- keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.
If the value is anything but the default, then keepdims will be passed through to the mean or sum methods of sub-classes of ndarray. If the sub-classes’ methods do not implement keepdims, any exceptions will be raised.
Examples
>>> a = rt.FastArray([1, 3, 5, 7, rt.nan])
>>> a.nansum()
16.0
With a dtype specified:
>>> a = rt.FastArray([1.0, 3.0, 5.0, 7.0, rt.nan])
>>> a.nansum(dtype = rt.int32)
16
With a filter:
>>> a = rt.FastArray([1, 3, 5, 7, rt.nan])
>>> b = rt.FastArray([False, True, False, True, True])
>>> a.nansum(filter = b)
10.0
- nanvar(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)
Compute the variance of the values in the first argument, ignoring NaNs.
If all values in the first argument are NaNs, NaN is returned.
Riptable uses the convention that ddof = 1, meaning the variance of [x_1, ..., x_n] is defined by var = 1/(n - 1) * sum((x_i - mean)**2) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.
- Parameters:
filter (array of bool, default None) – Specifies which elements to include in the variance calculation. If the filter is uniformly False, nanvar raises a ZeroDivisionError.
dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.nanvar(dtype = my_type) is equivalent to my_type(x.nanvar()).
- Returns:
The variance of the values.
- Return type:
scalar
See also
FastArray.var
Computes the variance of FastArray values.
Dataset.nanvar
Computes the variance of numerical Dataset columns, ignoring NaNs.
GroupByOps.nanvar
Computes the variance of each group, ignoring NaNs. Used by Categorical objects.
Notes
The dtype keyword for FastArray.nanvar specifies the data type of the result. This differs from numpy.nanvar, where it specifies the data type used to compute the variance.
Notes on Using NumPy Parameters
Using any of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.nanvar). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the variance as it is in numpy.nanvar.
Also note that if you use any of the following NumPy parameters and also include a filter keyword argument (which numpy.nanvar does not accept), Riptable’s implementation of nanvar will be used with the filter argument and the NumPy parameters will be ignored.
- axis : {int, tuple of int, None}, optional
Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.
- keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original input array.
- ddof : int, optional
“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of non-NaN elements. By default ddof is zero for the NumPy implementation, versus one for the Riptable implementation.
Examples
>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> a.nanvar()
1.0
With a dtype specified:
>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> a.nanvar(dtype = rt.int32)
1
With a filter:
>>> a = rt.FastArray([1, 2, 3, rt.nan])
>>> b = rt.FastArray([False, True, True, True])
>>> a.nanvar(filter = b)
0.5
- ne(other)
- normalize_minmax()
- normalize_zscore()
- notna()
notna is mapped directly to isnotnan(). Categorical and DateTime objects override isnotnan; FastArray handles sentinels.
>>> a = rt.arange(100.0)
>>> a[5] = np.nan
>>> a[87] = np.nan
>>> sum(a.notna())
98
>>> sum(a.astype(np.int32).notna())
98
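The example above relies on floats marking missing data with NaN while integer arrays use a sentinel. A plain-Python sketch of that distinction, assuming the int32 sentinel is -2147483648 (the invalid value that appears in the shift examples) and that casting to int32 turns each NaN into that sentinel, which is what the result of 98 implies. notna_sketch is a hypothetical name, not Riptable’s code.

```python
import math

# int32 invalid (sentinel) value, as shown in the FastArray.shift examples.
INT32_INVALID = -2**31

def notna_sketch(values, int32=False):
    """Hypothetical model of notna(): True where a value is valid."""
    if int32:
        # Integer arrays have no NaN; the sentinel marks missing values.
        return [v != INT32_INVALID for v in values]
    return [not math.isnan(v) for v in values]


a = [float(i) for i in range(100)]
a[5] = a[87] = math.nan
print(sum(notna_sketch(a)))  # 98

# Assumption: casting to int32 turns each NaN into the int32 sentinel.
a_int32 = [INT32_INVALID if math.isnan(v) else int(v) for v in a]
print(sum(notna_sketch(a_int32, int32=True)))  # 98
```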
- nunique()
Return the number of unique values in the input FastArray.
Does not include NaN or sentinel values.
- Returns:
Number of unique values in the input FastArray, excluding NaN and sentinel values.
- Return type:
int
See also
FastArray.duplicated
Return a boolean FastArray indicating duplicate values.
Categorical.nunique
Return the number of unique values in the Categorical.
Examples
Retrieve the number of unique values in a floating-point FastArray:
>>> a = rt.FastArray([1., 2., 3., 1., 2., 3.])
>>> a
FastArray([1., 2., 3., 1., 2., 3.])
>>> a.nunique()
3
Retrieve the number of unique values in a floating-point FastArray with a NaN value:
>>> a2 = rt.FastArray([1., 2., 3., 1., 2., 3., rt.nan])
>>> a2
FastArray([ 1., 2., 3., 1., 2., 3., nan])
>>> a2.nunique()  # The NaN value is not included.
3
Retrieve the number of unique values in an unsigned integer FastArray with a sentinel value:
>>> a3 = rt.FastArray([255, 2, 3, 2, 3], dtype="uint8")
>>> a3
FastArray([255, 2, 3, 2, 3], dtype=uint8)
>>> a3.nunique()  # The sentinel value is not included.
2
- partition2(*args, **kwargs)
- percentile(**kwargs)
- push(*args, **kwargs)
- quantile(**kwargs)
- rankdata(*args, **kwargs)
- classmethod register_function(name, func)
Used to register functions to FastArray. Used by rt_fastarraynumba.
- repeat(repeats, axis=None)
See riptable.repeat.
- replace(old, new)
- replacena(value, inplace=False)
Return a FastArray with all NaN and invalid values set to the specified value.
Optionally, you can modify the original FastArray if it’s not locked.
- Parameters:
value (scalar or array) – A value or an array of values to replace all NaN and invalid values. If an array, the number of values must equal the number of NaN and invalid values.
inplace (bool, default False) – If False, return a copy of the FastArray. If True, modify the original. This will modify any other views on this object. This fails if the FastArray is locked.
- Returns:
The returned FastArray is the same size and dtype as the original array. Returns None if inplace = True.
- Return type:
FastArray or None
See also
FastArray.fillna
Replace NaN and invalid values with a specified value or nearby data.
Dataset.fillna
Replace NaN and invalid values with a specified value or nearby data.
Categorical.fill_forward
Replace NaN and invalid values with the last valid group value.
Categorical.fill_backward
Replace NaN and invalid values with the next valid group value.
GroupBy.fill_forward
Replace NaN and invalid values with the last valid group value.
GroupBy.fill_backward
Replace NaN and invalid values with the next valid group value.
Examples
Replace all instances of NaN with a single value:
>>> a = rt.FastArray([rt.nan, 1.0, rt.nan, 3.0])
>>> a.replacena(0)
FastArray([0., 1., 0., 3.])
Replace all invalid values with 0s:
>>> b = rt.FastArray([0, 1, 2, 3, 4, 5])
>>> b[0:3] = b.inv
>>> b.replacena(0)
FastArray([0, 0, 0, 3, 4, 5])
Replace each instance of NaN with a different value:
>>> a.replacena([0, 2])
FastArray([0., 1., 2., 3.])
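For float data, the scalar and per-value forms of replacena can be sketched in plain Python. This is a hypothetical model covering only NaN (not integer sentinels) and only the copying behavior of inplace=False; raising ValueError on a length mismatch is this sketch’s own choice, since the exception Riptable raises is not documented here.

```python
import math

def replacena_sketch(values, value):
    """Hypothetical model of replacena() for float data (copies, like inplace=False)."""
    out = list(values)
    holes = [i for i, v in enumerate(out) if math.isnan(v)]
    if isinstance(value, (list, tuple)):
        # One replacement per NaN, applied in order.
        if len(value) != len(holes):
            raise ValueError("need exactly one replacement value per NaN")
        for i, new in zip(holes, value):
            out[i] = float(new)  # keep the float dtype
    else:
        for i in holes:
            out[i] = float(value)
    return out


nan = float("nan")
a = [nan, 1.0, nan, 3.0]
print(replacena_sketch(a, 0))       # [0.0, 1.0, 0.0, 3.0]
print(replacena_sketch(a, [0, 2]))  # [0.0, 1.0, 2.0, 3.0]
```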
- reshape(*args, **kwargs)
- rolling_mean(window=3)
- rolling_nanmean(window=3)
- rolling_nanstd(window=3)
- rolling_nansum(window=3)
- rolling_nanvar(window=3)
- rolling_quantile(q, window=3)
- rolling_std(window=3)
- rolling_sum(window=3)
- rolling_var(window=3)
- sample(N=10, filter=None, seed=None)
Return a given number of randomly selected values from a FastArray.
- Parameters:
N (int, default 10) – Number of values to select. The entire array is returned if N is greater than the size of the array.
filter (array (bool or int), optional) – A boolean mask or index array to filter values before selection. A boolean mask must have the same length as the original FastArray.
seed (int or other types, optional) – A seed to initialize the random number generator. If one is not provided, the generator is initialized using random data from the OS. For details and other accepted types, see the seed parameter for numpy.random.default_rng.
- Returns:
A new FastArray containing the randomly selected values.
- Return type:
FastArray
See also
Dataset.sample
Return a specified number of randomly selected rows from a Dataset.
Examples
No sample size specified:
>>> a = rt.FA([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
>>> a.sample()  # 10 randomly selected values returned.
FastArray([ 1, 2, 3, 4, 5, 6, 7, 9, 10, 11])  # Random
Sample 3 values:
>>> a = rt.FA([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
>>> a.sample(3)
FastArray([1, 4, 9])  # Random
Specify a sample size larger than the array:
>>> a2 = rt.FA([1, 2, 3, 4, 5])
>>> a2.sample(100)  # The entire array is returned.
FastArray([1, 2, 3, 4, 5])
Specify an index array for filtering:
>>> a3 = rt.FA(['TSLA', 'AMZN', 'IBM', 'SPY', 'GME', 'AAPL', 'FB', 'GOOG',
...             'MSFT', 'UBER'])  # Create sample data.
>>> filter = rt.FA([0, 1, 3, 7])  # Specify indices of a3 to take the sample from.
>>> a3.sample(2, filter)
FastArray([b'TSLA', b'GOOG'], dtype='|S4')  # Random
Specify a boolean mask array for filtering:
>>> a3.sample(8, filter=rt.FA(a3 != 'SPY'))
FastArray([b'TSLA', b'IBM', b'GME', b'AAPL', b'FB', b'GOOG', b'MSFT', b'UBER'], dtype='|S4')  # Random
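The selection rules above (boolean mask or index array, N capped at the number of candidates, optional seed) can be sketched in plain Python. This is a hypothetical model: it swaps in the standard library’s random.Random instead of numpy.random.default_rng, so seeded results will not match Riptable’s, and keeping the selected values in their original order is inferred from the examples rather than documented.

```python
import random

def sample_sketch(values, N=10, filter=None, seed=None):
    """Hypothetical model of FastArray.sample(); uses random.Random, not NumPy."""
    if filter is None:
        candidates = list(range(len(values)))
    elif all(isinstance(f, bool) for f in filter):
        # Boolean mask: must have the same length as the array.
        if len(filter) != len(values):
            raise ValueError("boolean mask must match the array length")
        candidates = [i for i, keep in enumerate(filter) if keep]
    else:
        # Index array: the positions eligible for selection.
        candidates = list(filter)
    rng = random.Random(seed)  # seed=None seeds from system randomness
    k = min(N, len(candidates))  # N larger than the pool returns everything
    # Sorting the chosen positions keeps the original order, as the examples show.
    chosen = sorted(rng.sample(candidates, k))
    return [values[i] for i in chosen]


a = list(range(1, 13))
print(sample_sketch(a))                     # 10 values, in original order
print(sample_sketch(a, 3, seed=42))         # 3 values; same seed, same result
print(sample_sketch([1, 2, 3, 4, 5], 100))  # [1, 2, 3, 4, 5]

tickers = ['TSLA', 'AMZN', 'IBM', 'SPY', 'GME', 'AAPL', 'FB', 'GOOG', 'MSFT', 'UBER']
print(sample_sketch(tickers, 2, [0, 1, 3, 7]))
print(sample_sketch(tickers, 8, filter=[t != 'SPY' for t in tickers]))
```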
- save(filepath, share=None, compress=True, overwrite=True, name=None)
Save a FastArray to an .sds file.
- Parameters:
filepath (str or os.PathLike) – Path for the .sds file. If there’s a trailing slash, filepath is treated as a path to a directory and you also need to specify name. Alternatively, you can include a file name (with or without the .sds extension) at the end of filepath (with no trailing slash), and an .sds file with that name is created. Directories that don’t yet exist are created.
share (str, optional) – If specified, the FastArray is saved to shared memory (NOT to disk) and path information from filepath is discarded. A name value must be provided. When shared memory is used, data is not compressed. Note that shared memory functions are not currently supported on Windows.
compress (bool, default True) – When True (the default), compression is used when writing to the .sds file. Otherwise, no compression is used. (If shared memory is used, data is always saved uncompressed.)
overwrite (bool, default True) – When True (the default), the user is not prompted to specify whether or not to overwrite an existing .sds file. When set to False, a prompt is displayed.
name (str, optional) – Name for the .sds file. The .sds extension is not required. Note that if name is provided, filepath is treated as a path to a directory, even if filepath has no trailing slash.
- Return type:
An .sds file containing the FastArray.
Examples
Include a file name in the path:
>>> import os
>>> a = rt.FA([0, 1, 2, 3, 4])
>>> a
FastArray([0, 1, 2, 3, 4])
>>> a.save("C://junk//saved_file")
>>> os.listdir("C://junk")
['saved_file.sds']
When name is specified, filepath is treated as a path to a directory:
>>> a.save("C://junk//saved_file", name="fa")
>>> os.listdir("C://junk//saved_file")
['fa.sds']
Display a prompt before overwriting an existing file:
>>> a.save("C://junk//saved_file", overwrite=False)
C://junk//saved_file.sds already exists. Overwrite? (y/n) n
No file was saved.
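The way filepath and name combine can be summarized as a small path-resolution function. resolve_sds_path is a hypothetical helper written only to restate the documented rules; it does not call Riptable, and raising ValueError for a trailing slash without a name is this sketch’s own choice.

```python
import os
from pathlib import Path

def resolve_sds_path(filepath, name=None):
    """Hypothetical helper restating the documented filepath/name rules of save()."""
    filepath = os.fspath(filepath)
    if name is not None:
        # When name is given, filepath is a directory even without a trailing slash.
        directory, stem = Path(filepath), name
    elif filepath.endswith(("/", os.sep)):
        # A trailing slash marks a directory, so a name is required.
        raise ValueError("filepath ends with a slash; a name must be provided")
    else:
        # Otherwise the last path component is the file name.
        directory, stem = Path(filepath).parent, Path(filepath).name
    # The .sds extension is optional in both filepath and name.
    if not stem.endswith(".sds"):
        stem += ".sds"
    return directory / stem


print(resolve_sds_path("junk/saved_file"))             # junk/saved_file.sds (POSIX)
print(resolve_sds_path("junk/saved_file", name="fa"))  # junk/saved_file/fa.sds (POSIX)
```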
- searchsorted(v, side='left', sorter=None)
- set_name(name)
Assign a name to a FastArray.
A FastArray is a wrapper around a NumPy ndarray. When a FastArray is created, it has no name. You can assign it a name using set_name.
Interactions with Dataset Objects
When an unnamed FastArray is added to a Dataset:
The FastArray inherits the name of the Dataset column.
Calling fa.set_name or ds.col.set_name, or changing the displayed column name via ds.col_rename, changes the name assigned to the FastArray.
Note that calling fa.set_name or ds.col.set_name doesn’t change the displayed column name.
When a named FastArray is added to a Dataset:
A new FastArray instance is created that inherits the Dataset column name.
Calling ds.col.set_name or changing the displayed column name via ds.col_rename changes the new instance’s name.
Calling set_name on the original FastArray instance changes only that instance’s name.
In both cases, the NumPy array underlying the FastArray is shared: changes to its values appear in the Dataset column, and vice versa.
Interactions with FastArray Objects
When a FastArray is created as a view of another, named FastArray, the new FastArray instance inherits the name from the original FastArray.
Whether the original FastArray is named or unnamed, calling set_name on either FastArray does not change the name of the other FastArray.
- Parameters:
name (str) – The name to assign to the FastArray.
- Returns:
The FastArray is returned. The name can be accessed using FastArray.get_name().
- Return type:
FastArray
Examples
>>> a = rt.arange(5)
>>> a.set_name('FA Name')
FastArray([0, 1, 2, 3, 4])
You can get the name using FastArray.get_name():
>>> a.get_name()
'FA Name'
When an unnamed FastArray is added to a Dataset column, the FastArray inherits the name of the column.
>>> a = rt.FastArray([1, 2, 3])
>>> ds = rt.Dataset()
>>> ds.Column_Name = a
>>> a.get_name()
'Column_Name'
Calling ds.col.set_name changes the name assigned to the FastArray (but not the displayed column name).
>>> ds.Column_Name.set_name('New Name')
FastArray([1, 2, 3])
>>> a.get_name()
'New Name'
>>> ds
#   Column_Name
-   -----------
0             1
1             2
2             3
When a named FastArray is added to a Dataset column, a new FastArray instance is created that inherits the column name. The original instance is not renamed.
>>> a = rt.FastArray([1, 2, 3])
>>> a.set_name('FA Name')
FastArray([1, 2, 3])
>>> ds = rt.Dataset()
>>> ds.Column_Name = a
>>> ds.Column_Name.get_name()
'Column_Name'
>>> a.get_name()
'FA Name'
Changing the displayed column name affects the name of the new instance, but not the name of the original FastArray.
>>> ds.col_rename('Column_Name', 'New_Column')
>>> ds.New_Column.get_name()
'New_Column'
>>> a.get_name()
'FA Name'
- shift(periods=1, invalid=None)
Shift an array’s elements right or left.
Newly empty elements at either end (resulting from the shift) are filled with the invalid value for the input array’s data type.
- Parameters:
periods (int, default 1) – Number of element positions to shift right (if positive) or left (if negative).
- Returns:
A shifted FastArray. Newly empty elements are filled with the invalid value for the input array’s data type.
- Return type:
FastArray
See also
FastArray.diff
Return a FastArray containing the differences between adjacent input array values.
Categorical.shift
Shift values in the Categorical by a specified number of periods.
Examples
Shift array elements one position to the right:
>>> a = rt.FA([0, 2, 4, 8, 16, 32])
>>> a
FastArray([ 0, 2, 4, 8, 16, 32])
>>> a.shift()
FastArray([-2147483648, 0, 2, 4, 8, 16])
Shift array elements two positions to the left:
>>> a.shift(-2)
FastArray([ 4, 8, 16, 32, -2147483648, -2147483648])
Specify a shift value greater than the array length:
>>> a.shift(10)
FastArray([-2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648])
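The shifting and fill rules in these examples can be reproduced with list slicing. This is a hypothetical plain-Python model for int32 data: shift_sketch and its fill parameter are illustrative names, and fill is not meant to describe Riptable’s undocumented invalid keyword.

```python
# int32 invalid value used for the fill in the examples above.
INT32_INVALID = -2**31

def shift_sketch(values, periods=1, fill=INT32_INVALID):
    """Hypothetical model of FastArray.shift() for int32 data (not riptable code)."""
    n = len(values)
    if abs(periods) >= n:
        # Shifting by the whole length or more leaves only fill values.
        return [fill] * n
    if periods >= 0:
        # Positive periods shift right: pad on the left, drop from the right.
        return [fill] * periods + values[:n - periods]
    # Negative periods shift left: drop from the left, pad on the right.
    return values[-periods:] + [fill] * (-periods)


a = [0, 2, 4, 8, 16, 32]
print(shift_sketch(a))      # [-2147483648, 0, 2, 4, 8, 16]
print(shift_sketch(a, -2))  # [4, 8, 16, 32, -2147483648, -2147483648]
print(shift_sketch(a, 10))  # six copies of -2147483648
```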
- sign(**kwargs)
- squeeze(*args, **kwargs)
- statx()
- std(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)
Compute the standard deviation of the values in the first argument.
Riptable uses the convention that ddof = 1, meaning the standard deviation of [x_1, ..., x_n] is defined by std = sqrt(1/(n - 1) * sum((x_i - mean)**2)) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.
- Parameters:
filter (array of bool, default None) – Specifies which elements to include in the standard deviation calculation. If the filter is uniformly False, std raises a ZeroDivisionError.
dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.std(dtype = my_type) is equivalent to my_type(x.std()).
- Returns:
The standard deviation of the values.
- Return type:
scalar
See also
FastArray.nanstd
Computes the standard deviation of FastArray values, ignoring NaNs.
Dataset.std
Computes the standard deviation of numerical Dataset columns.
GroupByOps.std
Computes the standard deviation of each group. Used by Categorical objects.
Notes
The dtype keyword for FastArray.std specifies the data type of the result. This differs from numpy.std, where it specifies the data type used to compute the standard deviation.
Notes on Using NumPy Parameters
Using any of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.std). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the standard deviation as it is in numpy.std.
Also note that if you use any of the following NumPy parameters and also include a filter keyword argument (which numpy.std does not accept), Riptable’s implementation of std will be used with the filter argument and the NumPy parameters will be ignored.
- axis : None or int or tuple of ints, optional
Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.
New in NumPy version 1.7.0.
If this is a tuple of ints, a standard deviation is performed over multiple axes, instead of a single axis or all the axes as before.
- keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be passed through to the std method of sub-classes of ndarray; however, any non-default value will be. If the sub-class’s method does not implement keepdims, any exceptions will be raised.
- ddof : int, optional
“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero for the NumPy implementation, versus one for the Riptable implementation.
Examples
>>> a = rt.FastArray([1, 2, 3])
>>> a.std()
1.0
With a dtype specified:
>>> a = rt.FastArray([1, 2, 3])
>>> a.std(dtype = rt.int32)
1
With a filter:
>>> a = rt.FastArray([1, 2, 3])
>>> b = rt.FA([False, True, True])
>>> a.std(filter = b)
0.7071067811865476
- str()
Casts an array of byte strings or Unicode strings as an FAString.
This enables a variety of useful string manipulation methods.
- Return type:
FAString
- Raises:
TypeError – If the FastArray has a dtype other than byte string or Unicode.
See also
np.chararray, np.char, rt.FAString.apply
Examples
>>> s = rt.FA(['this', 'that', 'test ']*100_000)
>>> s.str.upper
FastArray([b'THIS', b'THAT', b'TEST ', ..., b'THIS', b'THAT', b'TEST '], dtype='|S5')
>>> s.str.lower
FastArray([b'this', b'that', b'test ', ..., b'this', b'that', b'test '], dtype='|S5')
>>> s.str.removetrailing()
FastArray([b'this', b'that', b'test', ..., b'this', b'that', b'test'], dtype='|S5')
- str_append(other)
- tile(reps)
See riptable.tile.
- timewindow_prod(time_array, time_dist)
The input array must be int64 and sorted with ever-increasing values. Computes the product of the values within a given time window.
- Parameters:
time_array – Sorted integer array of timestamps.
time_dist – Integer size of the time window.
Examples
>>> a = rt.arange(10, dtype=rt.int64)
>>> a.timewindow_prod(a, 5)
FastArray([ 0, 0, 0, 0, 0, 0, 720, 5040, 20160, 60480], dtype=int64)
- timewindow_sum(time_array, time_dist)
The input array must be int64 and sorted with ever-increasing values. Sums the values within a given time window.
- Parameters:
time_array – Sorted integer array of timestamps.
time_dist – Integer size of the time window.
Examples
>>> a = rt.arange(10, dtype=rt.int64)
>>> a.timewindow_sum(a, 5)
FastArray([ 0, 1, 3, 6, 10, 15, 21, 27, 33, 39], dtype=int64)
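Both examples are consistent with an inclusive window: position i combines every value whose timestamp lies in [time_array[i] - time_dist, time_array[i]]. For instance, the sum at timestamp 6 is 1 + 2 + 3 + 4 + 5 + 6 = 21 and the product is 720. The plain-Python sketch below encodes that inferred rule; timewindow_reduce is a hypothetical name, and the rule is deduced from the documented outputs rather than from Riptable’s source.

```python
import operator

def timewindow_reduce(values, time_array, time_dist, op, start):
    """Hypothetical model of timewindow_sum / timewindow_prod (not riptable code).

    For each position i, combine values[j] for every j <= i whose timestamp
    satisfies time_array[i] - time_array[j] <= time_dist. This window rule is
    inferred from the documented examples.
    """
    out = []
    lo = 0  # left edge of the current window
    for i, t in enumerate(time_array):
        # time_array is sorted, so the left edge only ever moves forward.
        while t - time_array[lo] > time_dist:
            lo += 1
        acc = start
        for j in range(lo, i + 1):
            acc = op(acc, values[j])
        out.append(acc)
    return out


a = list(range(10))
print(timewindow_reduce(a, a, 5, operator.add, 0))
# [0, 1, 3, 6, 10, 15, 21, 27, 33, 39]
print(timewindow_reduce(a, a, 5, operator.mul, 1))
# [0, 0, 0, 0, 0, 0, 720, 5040, 20160, 60480]
```

The sketch recomputes each window from scratch. A running sum could instead subtract values as they leave the window, but a running product cannot simply divide out a zero, which is one reason to keep the straightforward version for illustration.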
- to_arrow(type=None, *, preserve_fixed_bytes=False, empty_strings_to_null=True)
Convert this FastArray to a pyarrow.Array.
- Parameters:
type (pyarrow.DataType, optional, defaults to None) –
preserve_fixed_bytes (bool, optional, defaults to False) – If this FastArray is an ASCII string array (dtype.kind == ‘S’), set this parameter to True to produce a fixed-length binary array instead of a variable-length string array.
empty_strings_to_null (bool, optional, defaults to True) – If this FastArray is an ASCII or Unicode string array, specify True for this parameter to convert empty strings to nulls in the output. Riptable inconsistently recognizes the empty string as ‘invalid’, so this parameter allows the caller to specify which interpretation they want.
- Return type:
pyarrow.Array
Notes
- TODO: Add bool parameter which directs the conversion to choose the most-compact output type possible?
This would be relevant to indices of categorical/dictionary-encoded arrays, but could also make sense for regular FastArray types (e.g. to use an int8 instead of an int32 when it’d be a lossless conversion).
- transitions(periods=1, fancy=False)
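Returns a boolean array that is True where an element does not equal the previous element. More generally, each element is compared with the element periods positions before it; use a negative value such as -1 to compare each element with the next element instead. See also: differs.
- Parameters:
periods (int, default 1) – Offset of the element to compare against: positive values compare with earlier elements, negative values with later ones. Elements with no counterpart are False.
fancy (bool, default False) – If True, return a fancy index instead of a boolean FastArray.
- Returns:
boolean FastArray, or fancy index (see the fancy kwarg)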
>>> a = rt.FastArray([0, 1, 2, 3, 3, 3, 4])
>>> a.transitions(periods=1)
FastArray([False, True, True, True, False, False, True])
>>> a.transitions(periods=2)
FastArray([False, False, True, True, True, False, True])
>>> a.transitions(periods=-1)
FastArray([ True, True, True, False, False, True, False])
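The comparison rule behind these outputs can be sketched in plain Python: element i is compared with element i - periods, and positions with no counterpart are False. transitions_sketch is a hypothetical name, and treating fancy=True as "return the indices where the mask is True" is an assumption based on the fancy-index wording above.

```python
def transitions_sketch(values, periods=1, fancy=False):
    """Hypothetical model of FastArray.transitions (not riptable code)."""
    n = len(values)
    mask = []
    for i in range(n):
        j = i - periods  # the element to compare against
        # Positions without a counterpart (start for periods > 0, end for < 0) are False.
        mask.append(0 <= j < n and values[i] != values[j])
    if fancy:
        # Assumption: fancy=True returns the indices where the mask is True.
        return [i for i, flag in enumerate(mask) if flag]
    return mask


a = [0, 1, 2, 3, 3, 3, 4]
print(transitions_sketch(a))              # [False, True, True, True, False, False, True]
print(transitions_sketch(a, periods=2))   # [False, False, True, True, True, False, True]
print(transitions_sketch(a, periods=-1))  # [True, True, True, False, False, True, False]
```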
- trunc(**kwargs)
- unique(return_index=False, return_inverse=False, return_counts=False, sorted=True, lex=False, dtype=None, filter=None, **kwargs)
Find the unique elements of an array or the unique combinations of elements with corresponding indices in multiple arrays.
See riptable.unique() for full documentation.
- var(filter=None, dtype=None, axis=None, keepdims=None, ddof=None, **kwargs)
Compute the variance of the values in the first argument.
Riptable uses the convention that ddof = 1, meaning the variance of [x_1, ..., x_n] is defined by var = 1/(n - 1) * sum((x_i - mean)**2) (note the n - 1 instead of n). This differs from NumPy, which uses ddof = 0 by default.
- Parameters:
filter (array of bool, default None) – Specifies which elements to include in the variance calculation. If the filter is uniformly False, var raises a ZeroDivisionError.
dtype (rt.dtype or numpy.dtype, default float64) – The data type of the result. For a FastArray x, x.var(dtype = my_type) is equivalent to my_type(x.var()).
- Returns:
The variance of the values.
- Return type:
scalar
See also
FastArray.nanvar
Computes the variance of FastArray values, ignoring NaNs.
Dataset.var
Computes the variance of numerical Dataset columns.
GroupByOps.var
Computes the variance of each group. Used by Categorical objects.
Notes
The dtype keyword for FastArray.var specifies the data type of the result. This differs from numpy.var, where it specifies the data type used to compute the variance.
Notes on Using NumPy Parameters
Using any of the following NumPy parameters will cause Riptable to switch to the NumPy implementation of this method (numpy.var). However, until a reported bug is fixed, if you also include the dtype parameter it will be applied to the result, not used to compute the variance as it is in numpy.var.
Also note that if you use any of the following NumPy parameters and also include a filter keyword argument (which numpy.var does not accept), Riptable’s implementation of var will be used with the filter argument and the NumPy parameters will be ignored.
- axis : None or int or tuple of ints, optional
Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.
- keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be passed through to the var method of sub-classes of ndarray; however, any non-default value will be. If the sub-class’s method does not implement keepdims, any exceptions will be raised.
- ddof : int, optional
“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero for the NumPy implementation, versus one for the Riptable implementation.
Examples
>>> a = rt.FastArray([1, 2, 3])
>>> a.var()
1.0
With a dtype specified:
>>> a = rt.FastArray([1, 2, 3])
>>> a.var(dtype = rt.int32)
1
With a filter:
>>> a = rt.FastArray([1, 2, 3])
>>> b = rt.FastArray([False, True, True])
>>> a.var(filter = b)
0.5
- where(condition, y=np.nan)
Return a new FastArray in which values are replaced where a given condition is False.
To also provide a value for where the condition is True, use riptable.where().
- Parameters:
condition (bool or array of bool) – Where the condition is True, keep the original value. Where False, replace with y (if y is a scalar) or the corresponding value from y (if y is an array). If condition is an array or a comparison that returns an array, the array must be the same length as the calling FastArray.
y (scalar, array, or callable, default np.nan) – The value to use where condition is False. If y is an array or a callable that returns an array, it must be the same length as the calling FastArray. The value of y that corresponds to the False value is used.
- Returns:
A new FastArray with values replaced where condition is False.
- Return type:
FastArray
See also
riptable.where
Replace values depending on whether a given condition is True or False.
Examples
condition is a comparison that creates an array of booleans, and y is a scalar:
>>> a = rt.FastArray(rt.arange(5))
>>> a
FastArray([0, 1, 2, 3, 4])
>>> a.where(a > 2, 100)
FastArray([100, 100, 100, 3, 4])
condition and y are same-length arrays:
>>> condition = rt.FastArray([True, True, False, False, False])
>>> y = rt.FastArray([100, 200, 300, 400, 500])
>>> a.where(condition, y)
FastArray([ 0, 1, 300, 400, 500])
- class riptable.rt_fastarray.Ledger
- static clear()
Clear all the entries in the math ledger
- static dump(dataset=True)
Print out the math ledger
- static off()
Turn the math ledger off
- static on()
Turn the math ledger on to record all array math routines
- static to_file(filename)
Save the math ledger to a file
- class riptable.rt_fastarray.Recycle
- static now(timeout=0)
Pass the garbage collector timeout value to clean up. Also calls the Python garbage collector.
- Parameters:
timeout (default 0) – A value of 0 does not set a timeout.
- Return type:
total arrays deleted
- static off()
- static on()
Turn riptable recycling on. Used only when riptable recycling was turned off.
Example
a = rt.arange(100_000)
Recycle.off()
%timeit a = a + 1
Recycle.on()
%timeit a = a + 1
- static timeout(timeout=100)
Set the garbage collector timeout used to expire recycled arrays. The timeout is measured in units of roughly 2/5 of a second, so a value of 100 is usually about 40 seconds. If an array has not been reused by the timeout, it is permanently deleted.
- Return type:
previous timespan
- class riptable.rt_fastarray.Threading
- static off()
Turn riptable threading off. Useful when the system has other processes using other threads, or to limit threading resources.
Example
a = rt.arange(100_000)
Threading.off()
%time a += 1
Threading.on()
%time a += 1
- Return type:
Whether threading was previously on (1) or off (0).
- static on()
Turn riptable threading on. Used only when riptable threading was turned off.
Example
a = rt.arange(100_000)
Threading.off()
%time a += 1
Threading.on()
%time a += 1
- Return type:
Whether threading was previously on (1) or off (0).
- static threads(threadcount)
Set how many worker threads riptable can use. This often defaults to 12 and cannot be set below 1 or above 31.
To turn riptable threading off completely, use Threading.off(). This is useful when the system has other processes using other threads, or to limit threading resources.
Example
Threading.threads(8)
- Return type:
number of threads previously used