riptable.rt_utils
Functions
| alignmk | Core routine for merge_asof. |
| bytes_to_str | |
| crc_match | Perform a CRC check on every array in list, returns True if they were all a match. |
| describe | Similar to pandas describe; columns remain stable, with extra column (Stats) added for names. |
| findTrueWidth | Find the length of a byte string without trailing zeros. Useful for optimizing string matching functions. |
| get_default_value | |
| ischararray | |
| islogical | |
| load_h5 | Load from h5 file and flip hdf5.io objects to riptable structures. |
| mbget | Provides fancy-indexing functionality similar to np.take. |
| merge_prebinned | merge_prebinned |
| normalize_keys | Helper function to make two different lists of keys the same itemsize. Handles categoricals. |
| str_to_bytes | |
| to_str | |
- riptable.rt_utils.alignmk(key1, key2, time1, time2, direction='backward', allow_exact_matches=True, verbose=False)
Core routine for merge_asof.
Takes a key1 on the left and a key2 on the right (multikey is allowed).
When going forward, it will check if time1 <= time2:
if so, it will hash on key1 and return the last row number for key2 or INVALID, and it will increment the index into time1;
else, it will return the last row number from key2, and it will increment the index into time2.
When going backward, it will start on the last time and check if time1 >= time2:
if so, it will hash on key1 and return the last row number for key2 or INVALID, and it will decrement the index into time1;
else, it will return the last row number from key2, and it will decrement the index into time2.
- Parameters:
key1 (a numpy array or a list/tuple of numpy arrays) – The left-hand key or keys.
key2 (a numpy array or a list/tuple of numpy arrays) – The right-hand key or keys.
time1 (a monotonic integer array often indicating time) – Must be the same length as key1.
time2 (a monotonic integer array often indicating time) – Must be the same length as key2.
direction ({'backward', 'forward', 'nearest'}) – The alignment direction. Defaults to 'backward'.
allow_exact_matches (bool) – Defaults to True.
verbose (bool) – When True, enables more-verbose logging output. Defaults to False.
- Returns:
A fancy index the same length as key1/time1 (may contain invalids).
Use the returned index to pull from the right-hand side, for example key2[return], to populate a dataset with the length of key1.
Examples
>>> time1 = rt.FA([0, 1, 4, 6, 8, 9, 11, 16, 19, 20, 22, 27])
>>> time2 = rt.FA([4, 5, 7, 8, 10, 12, 15, 16, 24])
>>> alignmk(rt.ones(time1.shape), rt.ones(time2.shape), time1, time2, direction='backward')
FastArray([-2147483648, -2147483648, 0, 1, 3, 3, 4, 7, 7, 7, 7, 8])
>>> alignmk(rt.ones(time1.shape), rt.ones(time2.shape), time1, time2, direction='forward')
FastArray([0, 0, 0, 2, 3, 4, 5, 7, 8, 8, 8, -2147483648])
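To make the backward and forward results above easier to follow, here is a minimal pure-Python sketch of the same lookup for the special case where every row shares one key, as in the example. It is not riptable's implementation: the helper name asof_index_single_key is hypothetical, it uses binary search from the standard bisect module instead of hashing, and it assumes both time sequences are sorted ascending with exact matches allowed.

```python
from bisect import bisect_left, bisect_right

# Sentinel matching the invalid value printed in the example output above.
INVALID_INT32 = -2147483648


def asof_index_single_key(time1, time2, direction="backward"):
    """Hypothetical helper: for each entry of time1, pick a row number in time2.

    Assumes a single shared key, ascending times, and exact matches allowed.
    """
    result = []
    for t in time1:
        if direction == "backward":
            # Last row of time2 whose time is <= t.
            row = bisect_right(time2, t) - 1
            result.append(row if row >= 0 else INVALID_INT32)
        elif direction == "forward":
            # First row of time2 whose time is >= t.
            row = bisect_left(time2, t)
            result.append(row if row < len(time2) else INVALID_INT32)
        else:
            raise ValueError("this sketch only covers 'backward' and 'forward'")
    return result


time1 = [0, 1, 4, 6, 8, 9, 11, 16, 19, 20, 22, 27]
time2 = [4, 5, 7, 8, 10, 12, 15, 16, 24]
backward = asof_index_single_key(time1, time2, "backward")
forward = asof_index_single_key(time1, time2, "forward")
```

With these inputs, backward and forward hold the same row numbers as the two FastArray results shown in the example.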
- riptable.rt_utils.bytes_to_str(b)
- riptable.rt_utils.crc_match(arrlist)
Perform a CRC check on every array in list, returns True if they were all a match.
- Parameters:
arrlist (list of numpy arrays) –
- Returns:
True if all arrays in arrlist are structurally equal; otherwise, False.
- Return type:
bool
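The idea of comparing buffers by checksum can be sketched with the standard library. This is only an illustration: the helper name crc_match_sketch is hypothetical, it works on raw bytes-like objects rather than arrays, it uses zlib's CRC-32 (which CRC variant riptable uses is not stated here), and treating an empty list as a match is an assumption.

```python
import zlib


def crc_match_sketch(buffers):
    """Hypothetical helper: True if every buffer has the same length and CRC-32."""
    if not buffers:
        return True  # assumption: nothing to compare counts as a match
    fingerprints = [(len(b), zlib.crc32(b)) for b in buffers]
    return all(fp == fingerprints[0] for fp in fingerprints)


same = crc_match_sketch([b"\x01\x02\x03", bytes([1, 2, 3]), bytearray([1, 2, 3])])
different = crc_match_sketch([b"\x01\x02\x03", b"\x01\x02\x04"])
```

Here same is True and different is False. A checksum match does not strictly prove equality, so this is a fast screen and not a byte-for-byte comparison.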
- riptable.rt_utils.describe(arr, q=None, fill_value=None)
Similar to pandas describe; columns remain stable, with extra column (Stats) added for names.
- Parameters:
- Return type:
Examples
>>> describe(arange(100) % 3)
*Stats     Col0
------   ------
Count    100.00
Valid    100.00
Nans       0.00
Mean       0.99
Std        0.82
Min        0.00
P10        0.00
P25        0.00
P50        1.00
P75        2.00
P90        2.00
Max        2.00
MeanM      0.99

[13 rows x 2 columns] total bytes: 169.0 B
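A few rows of the table above can be cross-checked by hand with plain Python. This sketch is not how describe computes its statistics; in particular, the population standard deviation is an assumption (the sample standard deviation also rounds to 0.82 for this data), and the percentile rows other than P50 are left out because their interpolation method is not stated.

```python
import math

values = [i % 3 for i in range(100)]  # same data as arange(100) % 3

count = len(values)
mean = sum(values) / count
# Population standard deviation; an assumption, see the note above.
std = math.sqrt(sum((x - mean) ** 2 for x in values) / count)
ordered = sorted(values)
# count is even here, so the median averages the two middle values.
median = (ordered[count // 2 - 1] + ordered[count // 2]) / 2

summary = {
    "Count": float(count),
    "Mean": round(mean, 2),
    "Std": round(std, 2),
    "Min": float(min(values)),
    "P50": median,
    "Max": float(max(values)),
}
```

The values in summary agree with the Count, Mean, Std, Min, P50, and Max rows of the table.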
- riptable.rt_utils.findTrueWidth(string)
Find the length of a byte string without trailing zeros. Useful for optimizing string matching functions.
- Parameters:
string (array of int8) – A byte string, viewed as an array of int8.
- Returns:
Number of bytes in string, not counting trailing zeros.
- Return type:
int
Examples
>>> a = np.chararray(1, itemsize=5)
>>> a[0] = b'abc'
>>> findTrueWidth(np.frombuffer(a, dtype=np.int8))
3
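The same idea can be shown on a plain bytes object. This is a sketch only: true_width is a hypothetical name, and it operates on bytes instead of an int8 array.

```python
def true_width(raw: bytes) -> int:
    """Hypothetical helper: length of raw once trailing zero bytes are dropped."""
    return len(raw.rstrip(b"\x00"))


# b'abc\x00\x00' mirrors a 5-byte item holding b'abc', as in the example above.
padded = b"abc".ljust(5, b"\x00")
width = true_width(padded)
```

Only trailing zeros are dropped, so a zero byte in the middle of the string still counts toward the width.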
- riptable.rt_utils.get_default_value(arr)
- riptable.rt_utils.ischararray(a)
- riptable.rt_utils.islogical(a)
- riptable.rt_utils.load_h5(filepath, name='/', columns='', format=None, fixblocks=False, drop_short=False, verbose=0, **kwargs)
Load from h5 file and flip hdf5.io objects to riptable structures.
In some h5 files, the arrays are saved as rows in “blocks”. If fixblocks is True, this routine will transpose the rows in the blocks.
- Parameters:
filepath (str or os.PathLike) – The path to the HDF5 file to load.
name (str) – Set to table name, defaults to ‘/’.
columns (sequence of str or re.Pattern or callable, defaults to '') – Return the given subset of columns, or those matching regex. If a function is passed, it will be called with column names, dtypes and shapes, and should return a subset of column names. Passing an empty string (the default) loads all columns.
format (hdf5.Format) – TODO, defaults to hdf5.Format.NDARRAY
fixblocks (bool) – If True, transpose the rows when the H5 file is stored as ???. Defaults to False.
drop_short (bool) – Set to True to drop short rows and never return a Struct, defaults to False.
verbose – TODO
- Returns:
A Dataset or Struct with all workspace contents.
- Return type:
Dataset or Struct
Notes
block<#>_items is a list of column names (bytes).
block<#>_values is a numpy array of numpy arrays (rows).
Columns (for riptable) can be generated by zipping names from the list with transposed columns.
axis0 appears to be all column names; not sure what to do with this. Also, what is axis1? Should it get added like the other columns?
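The zipping step described in these notes can be sketched with plain Python lists. The data here is made up for illustration: block0_items and block0_values are hypothetical stand-ins for one block read from a file, not output from load_h5.

```python
# Hypothetical stand-ins for one block read from an h5 file.
block0_items = [b"price", b"size"]  # column names, stored as bytes
block0_values = [[10.0, 1], [10.5, 2], [11.0, 3]]  # one inner list per row

# Transpose rows into columns, then pair each column with its decoded name.
columns = {
    name.decode(): list(column)
    for name, column in zip(block0_items, zip(*block0_values))
}
```

After this, columns maps "price" to [10.0, 10.5, 11.0] and "size" to [1, 2, 3].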
- riptable.rt_utils.mbget(aValues, aIndex, d=None)
Provides fancy-indexing functionality similar to np.take, but where out-of-bounds indices ‘retrieve’ a default value instead of e.g. raising an exception.
It returns an array the same size as the aIndex array, with aValues in place of the indices and delimiter values (used to customize) for invalid indices.
- Parameters:
aValues (np.ndarray) – A single dimension of array values (strings only accepted as chararray).
aIndex (np.ndarray) – A single dimension array of int64 indices.
d – An optional custom default for string operations to use when the index is out of range (currently the built-in default is always used). The default is the character byte b'' when aValues is a chararray, np.nan when aValues are floats, and INVALID_POINTER_32 or INVALID_POINTER_64 when aValues are ints.
- Returns:
vout – An array of values in aValues that have been looked up according to the indices in aIndex. The array will have the same shape as aIndex, and the same dtype and class as aValues.
- Return type:
np.ndarray
- Raises:
KeyError – When the dtype for aValues is not int32, int64, float32, or float64 and aValues is not a chararray.
Notes
- Tests Performed:
Large aValues size (28 million)
Large aValues typesize (50 for chararray)
Large aIndex size (28 million)
All indices valid for aIndex in aValues.
No indices valid for aIndex in aValues.
Empty input arrays.
Invalid types for aValues array.
Invalid types for aIndex array (not int64 or int32)
In the notation of the example below (v is aValues, p is aIndex), the return array vout is the same size as the p array. Suppose we have a position i. If the index stored at position i of p is a valid index for array v, vout at position i will contain the value of v at that index. If the index stored at position i of p is an invalid index, vout at position i will contain the default or custom delimiter value (d).
Match: 4 is at position 2 of the p array. 4 is a valid index in array v (within range). 50 is at position 4 of the v array. Therefore, position 2 of the result vout will contain 50.
Miss: -7 is at position 1 of the p array. -7 is an invalid index in array v (out of range). Therefore, position 1 of the result vout will contain the delimiter.
- Edge Case Tests:
(TODO)
Examples
Start with two arrays:
>>> v = np.array([10, 20, 30, 40, 50, 60, 70])  # MATLAB: v = [10 20 30 40 50 60 70];
>>> p = np.array([0, -7, 4, 3, 7, 1, 2])  # MATLAB: p = [1 -6 5 4 8 2 3];
>>> vout = mbget(v, p)  # MATLAB: vout = mbget(v,p);
>>> print(vout)  # MATLAB: vout
[10 -2147483648 50 40 -2147483648 20 30]  # MATLAB: [10.00 NaN 50.00 40.00 NaN 20.00 30.00]
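The match/miss rule from the notes can be written out in a few lines of plain Python. This is a sketch, not riptable's implementation: mbget_sketch is a hypothetical name, it works on lists, and it assumes that only indices in the range 0 to len(values) - 1 are valid, which is what reproduces the example above (where -7 is a miss).

```python
INVALID_INT32 = -2147483648  # sentinel printed for misses in the example above


def mbget_sketch(values, indices, default=INVALID_INT32):
    """Hypothetical helper: look up values[i], or default when i is out of range.

    Assumption: only 0 <= i < len(values) counts as a valid index.
    """
    n = len(values)
    return [values[i] if 0 <= i < n else default for i in indices]


v = [10, 20, 30, 40, 50, 60, 70]
p = [0, -7, 4, 3, 7, 1, 2]
vout = mbget_sketch(v, p)
```

With these inputs, vout contains the sentinel at positions 1 and 4 and the looked-up values everywhere else, matching the printed result in the example.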
- riptable.rt_utils.merge_prebinned(key1, key2, val1, val2, totalUniqueSize)
merge_prebinned TODO: Improve docs when working properly
- Parameters:
key1 (a numpy array already binned (like a categorical)) –
key2 (a numpy array already binned) –
val1 (int32/64 or float32/64) –
val2 (int32/64 or float32/64) –
Notes
key1 and key2 must be the same dtype.
val1 and val2 must be the same dtype.
- riptable.rt_utils.normalize_keys(key1, key2, verbose=False)
Helper function to make two different lists of keys the same itemsize. Handles categoricals.
- Parameters:
key1 (a numpy array or a list/tuple of numpy arrays) –
key2 (a numpy array or a list/tuple of numpy arrays) –
- Returns:
Two lists of arrays that are aligned (same itemsize). If the keys were passed in as single arrays, each is returned as a list of one array. Integers, floats, and strings may be upcast if necessary. Categoricals may be aligned if necessary.
- Return type:
(list, list)
Examples
>>> c1 = rt.Cat(['A','B','C'])
>>> c2 = rt.Cat(rt.arange(3) + 1, ['A','B','C'])
>>> [d1], [d2] = rt.normalize_keys(c1, c2)
Notes
TODO: integer, float and string upcasting can be done while rotating.
- riptable.rt_utils.str_to_bytes(s)
- riptable.rt_utils.to_str(s)