`riptable.rt_compressedarray`

Classes

CompressedArray

A FastArray is a 1-dimensional array of items that are the same data type.

class riptable.rt_compressedarray.CompressedArray(arr)

Bases: riptable.rt_fastarray.FastArray

A FastArray is a 1-dimensional array of items that are the same data type.

Because FastArray is a subclass of NumPy’s numpy.ndarray, all ndarray functions and attributes can be used with FastArray objects. However, Riptable optimizes many of NumPy’s functions to make them faster and more memory-efficient, and it adds some methods of its own.

FastArray objects with more than 1 dimension are not supported.

See NumPy’s docs for details on all ndarray methods and attributes.
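Below is a minimal sketch of the subclassing relationship described above. It uses plain NumPy so it runs without Riptable installed. The class name OneDimArray is hypothetical, and its dimension check only imitates the documented 1-D restriction. It is not Riptable's implementation.

```python
import numpy as np


class OneDimArray(np.ndarray):
    """Hypothetical ndarray subclass that accepts only 1-D data."""

    def __new__(cls, arr):
        base = np.asarray(arr)
        if base.ndim > 1:
            # Imitate the documented 1-D restriction (illustrative only).
            raise ValueError("only 1-dimensional arrays are supported")
        # view() reinterprets the same memory as the subclass without copying.
        return base.view(cls)


a = OneDimArray([1, 2, 3, 4])
total = a.sum()   # inherited ndarray method; no extra code needed
doubled = a * 2   # arithmetic results keep the subclass type

try:
    OneDimArray([[1, 2], [3, 4]])
    rejected_2d = False
except ValueError:
    rejected_2d = True  # multidimensional input is refused
```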

Parameters:

arr (array, iterable, or scalar value) – Contains data to be stored in the FastArray.
**kwargs – Additional keyword arguments to be passed to the function.

Notes

To improve performance, FastArray objects take over some of NumPy’s universal functions (ufuncs), use array recycling and multiple threads, and pass certain method calls to Bottleneck.

Whenever Riptable has implemented its own version of an existing NumPy method, a call to the NumPy method results in a call to the optimized Riptable version instead. We encourage users to call the Riptable method directly to avoid any confusion about which method is actually being called.
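NumPy's __array_ufunc__ protocol is one standard way for an ndarray subclass to take over ufunc calls, so that a call such as np.sqrt(x) reaches the subclass's own code. The sketch below is hypothetical. RoutedArray is not a Riptable class, and the sketch does not claim this is how Riptable dispatches. It records each intercepted ufunc and then falls back to NumPy instead of running an optimized kernel.

```python
import numpy as np


class RoutedArray(np.ndarray):
    """Hypothetical subclass that intercepts every NumPy ufunc call."""

    calls = []  # class-level log of intercepted ufunc names (illustration only)

    def __new__(cls, arr):
        return np.asarray(arr).view(cls)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # An optimized implementation could be chosen here; this sketch
        # only records the call and then delegates to NumPy.
        RoutedArray.calls.append(ufunc.__name__)
        plain = [x.view(np.ndarray) if isinstance(x, RoutedArray) else x
                 for x in inputs]
        if "out" in kwargs:
            kwargs["out"] = tuple(
                o.view(np.ndarray) if isinstance(o, RoutedArray) else o
                for o in kwargs["out"]
            )
        result = getattr(ufunc, method)(*plain, **kwargs)
        # Re-wrap single array results so they keep the subclass type
        # (tuple results from multi-output ufuncs are returned as-is).
        if isinstance(result, np.ndarray):
            return result.view(RoutedArray)
        return result


x = RoutedArray([1.0, 4.0, 9.0])
y = np.sqrt(x)  # the NumPy call is routed through __array_ufunc__
z = x + 1       # operators dispatch to np.add, so they are routed too
print(RoutedArray.calls)  # ['sqrt', 'add']
```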

See the list of NumPy Methods Optimized by Riptable for FastArrays.

Examples

Construct a FastArray

Pass a list to the constructor:

>>> rt.FastArray([1, 2, 3, 4, 5])
FastArray([1, 2, 3, 4, 5])

>>> # NOTE: rt.FA also works.
>>> rt.FA([1.0, 2.0, 3.0, 4.0, 5.0])
FastArray([1., 2., 3., 4., 5.])

Or use a utility function:

>>> rt.full(10, 0.7)
FastArray([0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7])

>>> rt.arange(10)
FastArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can optionally specify a data type:

>>> x = rt.FastArray([3, 6, 10], dtype=rt.float64)
>>> x, x.dtype
(FastArray([ 3.,  6., 10.]), dtype('float64'))

>>> # Using a string shortcut:
>>> x = rt.FastArray([3, 6, 10], dtype='float64')
>>> x, x.dtype
(FastArray([ 3.,  6., 10.]), dtype('float64'))

By default, strings are stored as byte strings. Pass unicode=True to store them as Unicode strings instead.

>>> rt.FA(list('abc'), unicode=True)
FastArray(['a', 'b', 'c'], dtype='<U1')

To convert an existing NumPy array, use the FastArray constructor.

>>> np_arr = np.array([1, 2, 3])
>>> rt.FA(np_arr)
FastArray([1, 2, 3])

To view the NumPy array as a FastArray (which is slightly less expensive than using the constructor), use the view method.

>>> fa = np_arr.view(rt.FA)
>>> fa
FastArray([1, 2, 3])

To view it as a NumPy array again:

>>> fa.view(np.ndarray)
array([1, 2, 3])

>>> # Alternatively:
>>> fa._np
array([1, 2, 3])

Get a Subset of a FastArray

You can use standard Python slicing notation or fancy indexing to access a subset of a FastArray.

>>> # Create a FastArray:
>>> array = rt.arange(8)**2
>>> array
FastArray([0, 1, 4, 9, 16, 25, 36, 49])
>>> # Use Python slicing to get elements 2, 3, and 4:
>>> array[2:5]
FastArray([4, 9, 16])

>>> # Use fancy indexing to get elements 2, 4, and 1 (in that order):
>>> array[[2, 4, 1]]
FastArray([4, 16, 1])

For more details, see the examples for 1-dimensional arrays in NumPy’s docs: Indexing on ndarrays.

Note that slicing creates a view of the array and does not copy the underlying data; modifying the slice modifies the original array. Fancy indexing creates a copy of the extracted data; modifying this array does not modify the original array.
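You can check the view-versus-copy behavior with np.shares_memory. This sketch uses a plain NumPy array, which follows the slicing and fancy-indexing rules described above. Swapping in rt.arange(8) should give the same result, assuming FastArray keeps NumPy's semantics here.

```python
import numpy as np

arr = np.arange(8) ** 2   # [0, 1, 4, 9, 16, 25, 36, 49]

sliced = arr[2:5]         # basic slice: a view of arr's memory
fancy = arr[[2, 4, 1]]    # fancy index: an independent copy

sliced[0] = -1            # writes through to arr[2]
fancy[0] = -2             # leaves arr untouched

print(arr[2])                                                     # -1
print(np.shares_memory(arr, sliced), np.shares_memory(arr, fancy))  # True False
```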

You can also pass a Boolean mask array.

>>> # Create a Boolean mask:
>>> evenMask = (array % 2 == 0)
>>> evenMask
FastArray([True, False, True, False, True, False, True, False])
>>> # Index using the Boolean mask:
>>> array[evenMask]
FastArray([0, 4, 16, 36])

How to Subclass FastArray

Include the required class definition:

>>> class TestSubclass(rt.FastArray):
...     def __new__(cls, arr, **kwargs):
...         # Before this call, arr needs to be a np.ndarray instance.
...         return arr.view(cls)
...     def __init__(self, arr, **kwargs):
...         pass

If instances of the subclass take part in computations, you might define your own math operations, including which other types the subclass can be combined with. For examples of such definitions, see the DateTimeNano class.

Common operations to hook are comparisons (__eq__(), __ne__(), __gt__(), __lt__(), __le__(), __ge__()) and basic math functions (__add__(), __sub__(), __mul__(), etc.).
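Below is a minimal sketch of hooking __add__() and __eq__() on an ndarray subclass. Meters is a hypothetical class, not DateTimeNano or any other Riptable type. It shows one way to restrict which operands the subclass can be combined with, and to return a plain Boolean array from a comparison. It uses plain NumPy so it runs without Riptable.

```python
import numpy as np


class Meters(np.ndarray):
    """Hypothetical length subclass that hooks + and ==."""

    def __new__(cls, arr):
        return np.asarray(arr, dtype=np.float64).view(cls)

    def __add__(self, other):
        # Decide what this subclass may be combined with: Meters or scalars.
        if isinstance(other, np.ndarray) and not isinstance(other, Meters):
            raise TypeError("Meters can only be added to Meters or scalars")
        result = np.add(self.view(np.ndarray), np.asarray(other))
        return result.view(Meters)

    __radd__ = __add__  # addition is symmetric, so reuse the same check

    def __eq__(self, other):
        # Return a plain Boolean ndarray instead of a Meters array.
        return np.equal(self.view(np.ndarray), np.asarray(other))


a = Meters([1.0, 2.0])
b = a + Meters([0.5, 0.5])      # allowed: Meters + Meters
c = 1 + a                       # allowed: scalar + Meters (via __radd__)
mask = a == Meters([1.0, 0.0])  # plain Boolean ndarray

try:
    a + np.array([1.0, 2.0])    # rejected: plain ndarray operand
    rejected = False
except TypeError:
    rejected = True
```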

Bracket indexing operations are very common. If the subclass needs to set or return a value other than the one stored in the underlying array, you need to take over __getitem__() or __setitem__().
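The following sketch takes over __getitem__() and __setitem__() so that single items come back as something other than the stored value. CodedArray is hypothetical. It stores integer codes, returns the matching label for a single index, and accepts a label on assignment. __array_finalize__ is NumPy's hook for carrying the label list over to slices.

```python
import numpy as np


class CodedArray(np.ndarray):
    """Hypothetical subclass that stores integer codes but exposes labels."""

    def __new__(cls, codes, labels):
        obj = np.asarray(codes, dtype=np.int64).view(cls)
        obj.labels = list(labels)
        return obj

    def __array_finalize__(self, obj):
        # Carry the label list over to views, slices, and masked results.
        self.labels = getattr(obj, "labels", None)

    def __getitem__(self, key):
        result = super().__getitem__(key)
        if isinstance(result, np.ndarray):
            return result  # slices and masks stay CodedArray
        return self.labels[int(result)]  # single item: translate code to label

    def __setitem__(self, key, value):
        # Accept a label and store its integer code in the underlying array.
        if isinstance(value, str):
            value = self.labels.index(value)
        super().__setitem__(key, value)


arr = CodedArray([0, 1, 1, 0], labels=["low", "high"])
first = arr[0]     # "low" rather than 0
part = arr[1:3]    # still a CodedArray that shares the labels
arr[3] = "high"    # stored as code 1
```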

Indexing is also used in display. For regular console/notebook display, you need to take over:

__repr__()
__str__()
_repr_html_() (for JupyterLab and Jupyter notebooks)
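Here is a sketch of the three display methods on a hypothetical Percent subclass of numpy.ndarray. The formatting reads from a plain ndarray view, so display never goes back through any overridden indexing. The percentage formatting is purely illustrative.

```python
import html

import numpy as np


class Percent(np.ndarray):
    """Hypothetical subclass that displays fractions as percentages."""

    def __new__(cls, arr):
        return np.asarray(arr, dtype=np.float64).view(cls)

    def _formatted(self):
        # Format from a plain ndarray view so display avoids subclass hooks.
        return [f"{v * 100:.1f}%" for v in self.view(np.ndarray).tolist()]

    def __str__(self):
        return "[" + " ".join(self._formatted()) + "]"

    def __repr__(self):
        return f"Percent({self._formatted()})"

    def _repr_html_(self):
        # Jupyter calls this method, when present, for rich output.
        cells = "".join(f"<td>{html.escape(s)}</td>" for s in self._formatted())
        return f"<table><tr>{cells}</tr></table>"


p = Percent([0.25, 0.5])
```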

If the array is being displayed in a Dataset and you require certain formatting, you need to define two more methods:

display_query_properties(): Returns an ItemFormat object (see rt.Utils.rt_display_properties).
display_convert_func(): The conversion function returned by display_query_properties(); it must return a string. Each displayed item (the result of __getitem__() at a single index) is passed to this function individually, together with an ItemFormat object.
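The following sketch illustrates the per-item conversion contract described above, under several assumptions. ItemFormat here is a hypothetical stand-in dataclass, not the ItemFormat in rt.Utils.rt_display_properties, and its fields are invented for illustration. display_query_properties() is assumed to return the format together with the conversion function. render_column() is a hypothetical display loop, not Riptable's Dataset renderer.

```python
from dataclasses import dataclass


@dataclass
class ItemFormat:
    """Hypothetical stand-in for Riptable's ItemFormat; fields are assumptions."""
    justification: str = "right"
    suffix: str = ""


class Celsius:
    """Hypothetical array-like class that implements both display hooks."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def display_query_properties(self):
        # Assumed contract: return the format and the conversion function.
        return ItemFormat(justification="right", suffix=" C"), self.display_convert_func

    @staticmethod
    def display_convert_func(item, item_format):
        # Called once per displayed item; must return a string.
        return f"{item:.1f}{item_format.suffix}"


def render_column(arr):
    """Hypothetical display loop that formats one item at a time."""
    item_format, convert = arr.display_query_properties()
    cells = [convert(arr[i], item_format) for i in range(len(arr))]
    width = max(len(cell) for cell in cells)
    if item_format.justification == "right":
        return [cell.rjust(width) for cell in cells]
    return [cell.ljust(width) for cell in cells]


column = render_column(Celsius([21.5, 8.0, 100]))
```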

Many Riptable operations need to return arrays of the same class they received. To ensure that your subclass will retain its special properties, you need to take over newclassfrominstance(). Failure to take this over will often result in an object with uninitialized variables.

copy() is another method that is called generically in Riptable routines and needs to be taken over to retain subclass properties.
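In plain NumPy, the standard hook for keeping subclass attributes on derived arrays is __array_finalize__. The sketch below compares a hypothetical Tagged subclass that defines this hook with a hypothetical Bare subclass that does not. Bare shows the kind of uninitialized attribute mentioned above. The sketch does not implement Riptable's newclassfrominstance() or copy() overrides, whose signatures are not documented here.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical subclass that carries a 'unit' attribute."""

    def __new__(cls, arr, unit=None):
        obj = np.asarray(arr).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # NumPy calls this for views, slices, and copies; copying the
        # attribute here keeps new instances from being left uninitialized.
        self.unit = getattr(obj, "unit", None)


class Bare(np.ndarray):
    """Hypothetical subclass without the hook, for comparison."""


t = Tagged([1, 2, 3], unit="kg")
kept = (t[1:].unit, t.copy().unit)  # both keep "kg"

b = np.asarray([1, 2, 3]).view(Bare)
b.unit = "kg"
lost = hasattr(b[1:], "unit")       # False: the slice never received it
```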

For a view of the underlying FastArray, you can use the _fa property.

allowed_funcs = ['decompress', 'view']

__getattribute__(attr): Block all FastArray operations. See allowed_funcs class global.

__repr__(): Return repr(self).

__str__(): Return str(self).

_build_string()

decompress()
