riptable.rt_struct
Classes
- class riptable.rt_struct.Struct(dictionary={})
The Struct class is at the root of much of the riptable class design; both Dataset and Multiset inherit from Struct.
Struct represents a collection of (mixed-type) data members, with standard attribute get/set behavior, as well as dictionary-style retrieval.
The Struct constructor takes a dictionary (dict, OrderedDict, etc.) as its argument; Struct() is equivalent to Struct({}). When Struct.UseFastArray is True (the default), any numpy arrays among the dictionary values will be cast into FastArray.
The constructor dictionary keys (or element/column names added later) must not conflict with any Struct member names. Additionally, if Struct.AllowAnyName is False (it is True by default), a column name must be a legal Python variable name, not starting with ‘_’.
- Parameters:
dictionary (dict) – A dictionary of named objects.
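The access rules described above (attribute get/set, dictionary-style retrieval, and name checks) can be illustrated with a small pure-Python stand-in. This is only a sketch under stated assumptions: `MiniStruct` and `_check_name` are hypothetical names, this is not riptable's implementation, and the name rule shown corresponds to the stricter case where AllowAnyName is False.

```python
import keyword


class MiniStruct:
    """Hypothetical, simplified stand-in for a Struct-like container (not riptable code)."""

    def __init__(self, dictionary=None):
        # Create the backing store directly so our own __setattr__ is bypassed.
        object.__setattr__(self, "_items", {})
        for name, value in (dictionary or {}).items():
            self[name] = value

    @staticmethod
    def _check_name(name):
        # Assumed rule: a legal identifier, not a keyword, not starting with '_'.
        if not name.isidentifier() or keyword.iskeyword(name) or name.startswith("_"):
            raise ValueError(f"invalid item name: {name!r}")
        # Assumed rule: an item must not shadow a member of the class.
        if hasattr(MiniStruct, name):
            raise ValueError(f"name conflicts with a class member: {name!r}")

    def keys(self):
        return list(self._items)

    def __setitem__(self, name, value):
        self._check_name(name)
        self._items[name] = value

    def __getitem__(self, name):
        return self._items[name]

    def __setattr__(self, name, value):
        # Attribute assignment is routed to the same store as item assignment.
        self[name] = value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        items = self.__dict__.get("_items", {})
        try:
            return items[name]
        except KeyError:
            raise AttributeError(name) from None

    def __contains__(self, name):
        return name in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


st = MiniStruct({"a": 1, "b": "fish"})
st.newcol = 5       # attribute-style set adds an item
st["newcol"] = 6    # dictionary-style set replaces it
```

Routing both `__setattr__` and `__setitem__` through one checked store is what keeps `st.a` and `st['a']` in agreement.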
Examples
>>> st = rt.Struct({'a': 1, 'b': 'fish', 'c': [5.6, 7.8], 'd': {'A': 'david', 'B': 'matthew'},
...                 'e': np.ones(7, dtype=np.int64)})
>>> print(st)
#   Name   Type    Size   0      1     2
-   ----   -----   ----   ----   ---   -
0   a      int     0      1
1   b      str     0      fish
2   c      list    2      5.6    7.8
3   d      dict    2      A      B
4   e      int64   7      1      1     1
>>> st.a
1
>>> st['a']
1
>>> print(st[3:])
#   Name   Type    Rows   0   1   2
-   ----   -----   ----   -   -   -
0   d      dict    2      A   B
1   e      int64   7      1   1   1
>>> st.newcol = 5        # okay, a new entry
>>> st.newcol = [5, 7]   # okay, replace the entry
>>> st['another'] = 6    # also works
>>> st['newcol'] = 6     # and this works as well
Indexing behavior
>>> st['b']          # get a 'column' (equiv. st.b)
>>> st[['a', 'e']]   # get some columns
>>> st[[0, 4]]       # get some columns (order is that of iterating st (== list(st)))
>>> st[1:5:2]        # standard slice notation, indexing corresponding to previous
>>> st[bool_vector]  # get 'True' columns
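The indexing forms listed above can be read as a mapping from an index expression to one or more item names. The following is a minimal sketch of that mapping over a plain list of names; `resolve_columns` is a hypothetical helper, not part of riptable, and it ignores row indexing entirely.

```python
def resolve_columns(names, index):
    """Map a column index expression to a name (str input) or a list of names."""
    if isinstance(index, str):
        return index                      # single 'column'
    if isinstance(index, slice):
        return names[index]               # standard slice over iteration order
    index = list(index)
    # A vector of bools selects the names in the 'True' positions.
    if index and all(isinstance(i, bool) for i in index):
        if len(index) != len(names):
            raise IndexError("boolean vector length must equal the number of items")
        return [name for name, keep in zip(names, index) if keep]
    # Otherwise a mix of names and integer positions.
    return [names[i] if isinstance(i, int) else i for i in index]


names = ["a", "b", "c", "d", "e"]
by_position = resolve_columns(names, [0, 4])
by_slice = resolve_columns(names, slice(1, 5, 2))
by_mask = resolve_columns(names, [True, False, False, False, True])
```

The bool check comes before the integer check because `bool` is a subclass of `int` in Python.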
Equivalents
>>> assert len(st) == st.get_ncols()
>>> for _k in st: print(_k, st[_k])
>>> for _k, _v in st.items(): print(_k, _v)
>>> for _k, _v in zip(st.keys(), st.values()): print(_k, _v)
>>> for _k, _v in zip(st, st.values()): print(_k, _v)
>>> if key in st:
...     assert getattr(st, key) is st[key]
Context manager
>>> with Struct({'a': 1, 'b': 'fish'}) as st:
...     st.a
>>> assert not hasattr(st, 'a')
- property _A
Display all columns, all rows (up to 10,000), and long strings of a Dataset or Struct.
Without this property, columns are elided when the maximum display width is reached, rows are elided when there are more than 30 to display, and strings are truncated after 15 characters.
See also
Examples
>>> ds = rt.Dataset({'col_'+str(i):rt.arange(31) for i in range(12)})
>>> ds[0] = 'long_string_long_string'
By default, columns are elided when the maximum display width is reached, rows are elided when there are more than 30 to display, and strings are truncated after 15 characters:
>>> ds
  #  col_0            col_1  col_2  col_3  col_4  col_5  ...  col_7  col_8  col_9  col_10  col_11
---  ---------------  -----  -----  -----  -----  -----  ---  -----  -----  -----  ------  ------
  0  long_string_lon      0      0      0      0      0  ...      0      0      0       0       0
  1  long_string_lon      1      1      1      1      1  ...      1      1      1       1       1
  2  long_string_lon      2      2      2      2      2  ...      2      2      2       2       2
  3  long_string_lon      3      3      3      3      3  ...      3      3      3       3       3
  4  long_string_lon      4      4      4      4      4  ...      4      4      4       4       4
  5  long_string_lon      5      5      5      5      5  ...      5      5      5       5       5
  6  long_string_lon      6      6      6      6      6  ...      6      6      6       6       6
  7  long_string_lon      7      7      7      7      7  ...      7      7      7       7       7
  8  long_string_lon      8      8      8      8      8  ...      8      8      8       8       8
  9  long_string_lon      9      9      9      9      9  ...      9      9      9       9       9
 10  long_string_lon     10     10     10     10     10  ...     10     10     10      10      10
 11  long_string_lon     11     11     11     11     11  ...     11     11     11      11      11
 12  long_string_lon     12     12     12     12     12  ...     12     12     12      12      12
 13  long_string_lon     13     13     13     13     13  ...     13     13     13      13      13
 14  long_string_lon     14     14     14     14     14  ...     14     14     14      14      14
...  ...                ...    ...    ...    ...    ...  ...    ...    ...    ...     ...     ...
 16  long_string_lon     16     16     16     16     16  ...     16     16     16      16      16
 17  long_string_lon     17     17     17     17     17  ...     17     17     17      17      17
 18  long_string_lon     18     18     18     18     18  ...     18     18     18      18      18
 19  long_string_lon     19     19     19     19     19  ...     19     19     19      19      19
 20  long_string_lon     20     20     20     20     20  ...     20     20     20      20      20
 21  long_string_lon     21     21     21     21     21  ...     21     21     21      21      21
 22  long_string_lon     22     22     22     22     22  ...     22     22     22      22      22
 23  long_string_lon     23     23     23     23     23  ...     23     23     23      23      23
 24  long_string_lon     24     24     24     24     24  ...     24     24     24      24      24
 25  long_string_lon     25     25     25     25     25  ...     25     25     25      25      25
 26  long_string_lon     26     26     26     26     26  ...     26     26     26      26      26
 27  long_string_lon     27     27     27     27     27  ...     27     27     27      27      27
 28  long_string_lon     28     28     28     28     28  ...     28     28     28      28      28
 29  long_string_lon     29     29     29     29     29  ...     29     29     29      29      29
 30  long_string_lon     30     30     30     30     30  ...     30     30     30      30      30
Display all columns, rows, and long strings:
>>> ds._A
  #  col_0                    col_1  col_2  col_3  col_4  col_5  col_6  col_7  col_8  col_9  col_10  col_11
---  -----------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  ------  ------
  0  long_string_long_string      0      0      0      0      0      0      0      0      0       0       0
  1  long_string_long_string      1      1      1      1      1      1      1      1      1       1       1
  2  long_string_long_string      2      2      2      2      2      2      2      2      2       2       2
  3  long_string_long_string      3      3      3      3      3      3      3      3      3       3       3
  4  long_string_long_string      4      4      4      4      4      4      4      4      4       4       4
  5  long_string_long_string      5      5      5      5      5      5      5      5      5       5       5
  6  long_string_long_string      6      6      6      6      6      6      6      6      6       6       6
  7  long_string_long_string      7      7      7      7      7      7      7      7      7       7       7
  8  long_string_long_string      8      8      8      8      8      8      8      8      8       8       8
  9  long_string_long_string      9      9      9      9      9      9      9      9      9       9       9
 10  long_string_long_string     10     10     10     10     10     10     10     10     10      10      10
 11  long_string_long_string     11     11     11     11     11     11     11     11     11      11      11
 12  long_string_long_string     12     12     12     12     12     12     12     12     12      12      12
 13  long_string_long_string     13     13     13     13     13     13     13     13     13      13      13
 14  long_string_long_string     14     14     14     14     14     14     14     14     14      14      14
 15  long_string_long_string     15     15     15     15     15     15     15     15     15      15      15
 16  long_string_long_string     16     16     16     16     16     16     16     16     16      16      16
 17  long_string_long_string     17     17     17     17     17     17     17     17     17      17      17
 18  long_string_long_string     18     18     18     18     18     18     18     18     18      18      18
 19  long_string_long_string     19     19     19     19     19     19     19     19     19      19      19
 20  long_string_long_string     20     20     20     20     20     20     20     20     20      20      20
 21  long_string_long_string     21     21     21     21     21     21     21     21     21      21      21
 22  long_string_long_string     22     22     22     22     22     22     22     22     22      22      22
 23  long_string_long_string     23     23     23     23     23     23     23     23     23      23      23
 24  long_string_long_string     24     24     24     24     24     24     24     24     24      24      24
 25  long_string_long_string     25     25     25     25     25     25     25     25     25      25      25
 26  long_string_long_string     26     26     26     26     26     26     26     26     26      26      26
 27  long_string_long_string     27     27     27     27     27     27     27     27     27      27      27
 28  long_string_long_string     28     28     28     28     28     28     28     28     28      28      28
 29  long_string_long_string     29     29     29     29     29     29     29     29     29      29      29
 30  long_string_long_string     30     30     30     30     30     30     30     30     30      30      30
- property _G
Display all columns of a Dataset or Struct, wrapping the table after the maximum display width is reached.
Note: The table is displayed as text, not HTML.
- Return type:
None
See also
Examples
>>> ds = rt.Dataset(
...     {key: rt.FA([i, 2 * i, 3 * i, 4 * i]) % 3 == 0 for i, key in enumerate('abcdefghijklmno')}
... )
Default behavior:
>>> ds
#     a      b      c     d      e      f     g  ...     j      k      l     m      n      o
-  ----  -----  -----  ----  -----  -----  ----  ---  ----  -----  -----  ----  -----  -----
0  True  False  False  True  False  False  True  ...  True  False  False  True  False  False
1  True  False  False  True  False  False  True  ...  True  False  False  True  False  False
2  True   True   True  True   True   True  True  ...  True   True   True  True   True   True
3  True  False  False  True  False  False  True  ...  True  False  False  True  False  False
Show all columns, wrapping the table as needed:
>>> ds._G
#     a      b      c     d      e      f     g      h      i     j      k      l     m
-  ----  -----  -----  ----  -----  -----  ----  -----  -----  ----  -----  -----  ----
0  True  False  False  True  False  False  True  False  False  True  False  False  True
1  True  False  False  True  False  False  True  False  False  True  False  False  True
2  True   True   True  True   True   True  True   True   True  True   True   True  True
3  True  False  False  True  False  False  True  False  False  True  False  False  True

#      n      o
-  -----  -----
0  False  False
1  False  False
2   True   True
3  False  False
- property _H
Display all columns and long strings of a Dataset or Struct.
Without this property, columns are elided when the maximum display width is reached, and strings are truncated after 15 characters.
See also
Examples
By default, columns are elided when the maximum display width is reached, and strings are truncated after 15 characters.
>>> ds = rt.Dataset({key : rt.FA([i, 2*i, 3*i, 4*i])%3 == 0 for i, key in enumerate('abcdefghijklm')})
>>> ds[0] = rt.FA('long_string_long_string')
>>> ds
#  a                    b      c     d      e      f  ...      h      i     j      k      l     m
-  ---------------  -----  -----  ----  -----  -----  ---  -----  -----  ----  -----  -----  ----
0  long_string_lon  False  False  True  False  False  ...  False  False  True  False  False  True
1  long_string_lon  False  False  True  False  False  ...  False  False  True  False  False  True
2  long_string_lon   True   True  True   True   True  ...   True   True  True   True   True  True
3  long_string_lon  False  False  True  False  False  ...  False  False  True  False  False  True
Display all columns and long strings:
>>> ds._H
#  a                            b      c     d      e      f     g      h      i     j      k      l     m
-  -----------------------  -----  -----  ----  -----  -----  ----  -----  -----  ----  -----  -----  ----
0  long_string_long_string  False  False  True  False  False  True  False  False  True  False  False  True
1  long_string_long_string  False  False  True  False  False  True  False  False  True  False  False  True
2  long_string_long_string   True   True  True   True   True  True   True   True  True   True   True  True
3  long_string_long_string  False  False  True  False  False  True  False  False  True  False  False  True
- property _T
Display a transposed view of the Dataset or Struct.
All columns are shown as rows and vice-versa. Strings up to 32 characters are fully displayed.
See also
Examples
>>> ds = rt.Dataset({'a': [1, 2, 3], 'b' : ['longstring_longstring_longstring_longstring',
...                  'fish', 'david']})
>>> ds
#   a   b
-   -   ---------------
0   1   longstring_long
1   2   fish
2   3   david
>>> ds._T
Fields:   0                                  1      2
a         1                                  2      3
b         longstring_longstring_longstring   fish   david
- property _V
Display all rows (up to 10,000) of a Dataset or Struct.
Without this property, rows are elided when there are more than 30 to display.
See also
Examples
By default, rows are elided when there are more than 30 to display.
>>> ds = rt.Dataset({'a' : rt.arange(31)})
>>> ds
  #    a
---  ---
  0    0
  1    1
  2    2
  3    3
  4    4
  5    5
  6    6
  7    7
  8    8
  9    9
 10   10
 11   11
 12   12
 13   13
 14   14
...  ...
 16   16
 17   17
 18   18
 19   19
 20   20
 21   21
 22   22
 23   23
 24   24
 25   25
 26   26
 27   27
 28   28
 29   29
 30   30
Display all rows:
>>> ds._V
  #    a
---  ---
  0    0
  1    1
  2    2
  3    3
  4    4
  5    5
  6    6
  7    7
  8    8
  9    9
 10   10
 11   11
 12   12
 13   13
 14   14
 15   15
 16   16
 17   17
 18   18
 19   19
 20   20
 21   21
 22   22
 23   23
 24   24
 25   25
 26   26
 27   27
 28   28
 29   29
 30   30
- property _row_numbers
Subclasses can define their own callback function to customize the left side of the table. If not defined, normal row numbers will be displayed.
- Parameters:
arr (array) – Fancy index array of row numbers.
style (ColumnStyle) – Default style object for final row numbers column.
- Returns:
header (string)
label_array (ndarray)
style (ColumnStyle)
- property _sort_columns
Subclasses can define their own callback function to return the columns they were sorted by, along with their styles. The callback function receives a trimmed fancy index (based on the sort index) and returns a dictionary of column headers -> (masked_array, ColumnStyle) tuples. These columns are moved to the left side of the table (but to the right of row labels, groupby keys, row numbers, etc.).
- property _styles
Subclasses can return a callback function that takes no arguments and returns a dictionary of column names -> ColumnStyle objects.
- property doc
rt_meta.Doc
The descriptive documentation object for the structure.
Returns the footer attributes.
For example, Accum2 and AccumTable objects can have footers.
- property shape
Return the number of rows and columns.
See also
riptable.reshape
Return an array containing the same data with a new shape.
FastArray.reshape
Return an array containing the same data with a new shape.
Examples
>>> ds = rt.Dataset({"one": rt.arange(3), "two": rt.arange(3) % 2})
>>> ds
#   one   two
-   ---   ---
0     0     0
1     1     1
2     2     0
>>> ds.shape
(3, 2)
- property total_sizes: Tuple[int, int]
The total physical and logical size of all (columnar) data in bytes within this Struct.
- Returns:
total_physical_size (int) – The total size, in bytes, of all columnar data in this instance, not counting any duplicate/alias object instances.
total_logical_size (int) – The total size, in bytes, of all columnar data in this instance, including duplicate/alias object instances. This value is always at least as large as total_physical_size.
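The difference between the two totals comes down to whether an object that appears under more than one name is counted once or once per name. A minimal sketch of that accounting, assuming columns are plain bytes-like objects and using object identity to detect aliases (the helper name `total_sizes` mirrors the property but this is not riptable's code):

```python
def total_sizes(columns):
    """Return (physical, logical) byte totals for a dict of bytes-like columns."""
    seen_ids = set()
    physical = 0
    logical = 0
    for value in columns.values():
        nbytes = len(value)
        logical += nbytes                 # every reference counts toward the logical size
        if id(value) not in seen_ids:     # each distinct object counts once physically
            seen_ids.add(id(value))
            physical += nbytes
    return physical, logical


shared = bytearray(80)
columns = {"a": shared, "b": shared, "c": bytearray(40)}  # 'a' and 'b' alias one buffer
sizes = total_sizes(columns)
```

Here the aliased 80-byte buffer is counted twice in the logical total but once in the physical total, which is why the logical size can never be smaller.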
- AllowAnyName = True
True if any name for a column name is permitted, but will be renamed.
- Type:
- _lastrepr = 0
- _lastreprhtml = 0
- _restricted_names
- _summary_len = 3
- col_delete
- __bool__()
- __contains__(item)
- __delattr__(name)
Implement delattr(self, name).
- __delitem__(name)
- __dir__()
Default dir() implementation.
- __enter__()
- __eq__(lhs)
Return self==value.
- __exit__(exc_type, exc_val, exc_tb)
- __ge__(lhs)
Return self>=value.
- __getattr__(name)
- __getitem__(index)
- Parameters:
index (colspec) –
- Returns:
The indexed item(s), that is, ‘column(s)’. If index resolves to multiple ‘cols’ then another ‘Struct’ will be returned with those items as a shallow copy.
- Return type:
result
- Raises:
- __gt__(lhs)
Return self>value.
- __iter__()
- __le__(lhs)
Return self<=value.
- __len__()
- __lt__(lhs)
Return self<value.
- __ne__(lhs)
Return self!=value.
- __repr__()
Return repr(self).
- __reversed__()
- __setattr__(name, value)
Implement setattr(self, name, value).
- __setitem__(index, value)
- Parameters:
index – colspec
value – May be any type
- Returns:
None
- Raises:
- __str__()
Return str(self).
- _addnewitem(name, value)
- _addnewitem_allnames(name, value)
- _aggregate_column_matches(items=None, like=None, regex=None, on_missing='raise', func=None)
Aggregate the matches returned by the methods defined for items, like, and regex, and return them in order.
At least one of items, like, or regex must be specified.
- Parameters:
items (str, int, or iterable of str or int, optional) – Names or indices of columns to match. An iterable can contain both string and int values.
like (str, optional) – Substring to match in column names.
regex (str, optional) – Regular expression string to match in column names.
on_missing ({"raise", "warn", "ignore"}, default "raise") – Governs how to handle a column in items that doesn’t exist:
"raise" (default): Raises an IndexError. No columns are returned.
"warn": Issues a warning. All columns in items that do exist are included in the match list.
"ignore": No error or warning. All columns in items that do exist are included in the match list.
func (callable, optional) – Method calling _aggregate_column_matches. Used to enrich exception / log messages.
- Returns:
A list of strings corresponding to column name matches.
- Return type:
- classmethod _align_array_info(allinfo, maxwidths)
- classmethod _array_info_list(arrinfo)
Build list of info for single array. Used for all arrays in a container or a single array stored in single SDS file.
returns [‘FA’, ‘shape’, ‘dtype name’, ‘i+itemsize’]
- classmethod _array_summary(data, name=None)
- Parameters:
data – Tuple of array info from CompressionType.Info:
tup1: (tuple) shape
tup2: (int) dtype.num
tup3: (int) bitmask for numpy flags
tup4: (int) itemsize
name – Optional name for top-level Struct.
Internal routine for tree from meta summary (info only, no arrays).
- Returns:
String of array info for a single struct.
- classmethod _array_summary_single(arrinfo)
- _as_dictionary(copy=False, rows=None, cols=None)
Return a dictionary of numpy arrays.
- _as_meta_data(name=None, nested=True)
- _autocomplete()
- _build_sds_meta_data(name=None, nesting=True, **kwargs)
Final SDS file will be laid out as follows:
header
meta data string (json, includes scalars)
arrays
special arrays
meta tuples [tuple(item name, SDSFlags) for all items]
Nested data structures will generate their own SDS files.
- _check_addtype(name, value)
override to check types
- _copy(deep=False, cls=None)
- Parameters:
deep (bool, default False) – If True, perform a deep copy, calling each object depth first with .copy(True). If False, a shallow .copy(False) is called, often just copying the container's dict. The first argument must be deep; it cannot be set to None and must be True or False.
cls (type, optional) – Class of return type, for subclass super() calls.
- _copy_base(from_Struct)
This copies the underlying special variables but does not copy _all_items or _uniqueid or any of the ‘columns’.
- Parameters:
from_Struct – the Struct being copied
- Returns:
is_locked() (must unlock/relock around rest of copy)
- _copy_from_dict(source_dict, copy=False, rows=None, cols=None)
- _deleteitem(name)
- _ensure_atomic(colnames, func)
Only proceed with certain operations if all columns exist in table. Pass in the function for a more informative error.
- _escape_invalid_file_chars(name)
Certain characters will cause problems in item names if a Struct needs to name an SDS file: ('\', ':', '<', '>', '!', '|', '*', '?').
- _extract_indexing(index)
Internal method common to get/set item.
- Parameters:
index – (rowspec, colspec) or colspec (=> rowspec of :)
- Returns:
col_idx
row_idx
ncols
nrows
row_arg – NB Any column names will be converted to str (from bytes or numpy.str_).
- classmethod _flatten_undo(sep, startpos, startname, obj_array, meta=None, cutoffs=None)
internal routine
- classmethod _from_meta_data(itemdict, itemflags, meta)
- classmethod _from_sds_onefile(arrs, meta_tups, meta=None, folders=None)
Special routine called after loading an SDS onefile to re-expand
- _get_count_for_slice(idx, for_rows)
- _get_final_display_mode(plain=False)
- static _get_seq(map, protected, start)
- _index_from_row_labels(fld)
Use this if row index was a string or tuple. Will only be applied to the Dataset’s label columns (if it has any).
- classmethod _info_tree(path, data)
Converts nested structure to tree view of file info for Struct and Dataset. Top level will be named based on single file or directory.
- _init_from_dict(dictionary)
- _ipython_key_completions_()
- _last_row_stats()
- classmethod _load_from_sds_meta_data(meta, arrays, meta_tups=[], file_header={}, include=None)
Iterates over sections of the meta data object to rebuild a data structure.
Arrays will be in the following order:
- Main arrays (or underlying FastArrays for subclasses)
- Secondary arrays for FastArray subclasses that require additional contiguous data (e.g. Categorical)
- Array of fancy indices to sort by
A dictionary will be constructed. All arrays will be inserted by name from ‘item_names’ in meta object. All (if any) meta data will be read from ‘item_meta’ in meta object, and FastArray subclasses will be constructed. The container object will be constructed from the dictionary. Any labels (gbkeys) will be set. If sorted column names exist, they will be set, and the sorted index will be added to the SortCache.
- classmethod _load_from_sds_meta_data_nested(name, meta, arrdict)
- classmethod _load_without_meta_data(meta, arrays, meta_tups, file_header=None)
Loads from meta tuples only (e.g. when no metadata is generated by Matlab)
- _lock()
- _mask_get_item(idx, by_col_arg=True)
_mask_get_item applies a mask to a row or a column
- Parameters:
idx – the argument from the get/set-item [] brackets
by_col_arg – is this a column mask (instead of row mask)
- Returns:
list of actual indexes or None
- _meta_dict(name=None)
- _post_init()
Call self._run_once() to clean up or initialize anything else; override _run_once() in subclasses if needed.
- Returns:
None
- _pre_init()
- _prepare_display_data()
Returns a list of lists (all column data) and a list of header tuples for display.
- Returns:
list(list), list(tuple)
- _replaceitem(name, value)
- _replaceitem_allnames(name, value)
- _repr_html_()
- _run_once()
Other classes may override _run_once to initialize data; see _post_init().
- Returns:
None
- _safe_reordering_of_renames(orig_dict)
- classmethod _scalar_summary(scalar_tup)
Scalars are stored as arrays in SDS, but a flag is set in the meta tuple. They will be labeled as scalar and their dtype will be displayed.
- classmethod _serialize_item(item, itemname)
Return a dict of {name: array}, a matching list of ints which are the array flags, and a metastring if it exists.
- static _sizeof_fmt(num, suffix='B')
- _sort_column_styles(style)
Callback to return sort-by columns.
style : default sort style from display
Returns dictionary of column name -> tuple( array, ColumnStyle ) These columns will be moved to the left of the table.
- _struct_compare_check(func_name, lhs)
Returns a Struct consisting of the union of key names with value self.X == self.Y. If a key is missing from one or the other, it will have value False. If any comparison fails (raises an exception), the value will be False. If the value Y of a comparison cannot be cast to bool, Y.all() and all(Y) will be attempted.
- Parameters:
func_name – comparison function name (e.g., ‘__eq__’)
lhs –
- Returns:
Struct of bools
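The comparison rules above (union of keys, False for missing keys or failed comparisons, and a fallback to `.all()` or `all()` when the result has no single truth value) can be sketched over plain dictionaries. `compare_items` is a hypothetical helper for illustration only and makes no claim about how riptable implements the check.

```python
import operator


def compare_items(left, right, op=operator.eq):
    """Compare two dicts key by key and return a dict of bools over the union of keys."""
    result = {}
    # Union of key names: left's keys first, then keys found only in right.
    for key in list(left) + [k for k in right if k not in left]:
        if key not in left or key not in right:
            result[key] = False           # missing on one side
            continue
        try:
            outcome = op(left[key], right[key])
            try:
                result[key] = bool(outcome)
            except (TypeError, ValueError):
                # No single truth value (e.g. an array-like): reduce it instead.
                reduced = outcome.all() if hasattr(outcome, "all") else all(outcome)
                result[key] = bool(reduced)
        except Exception:
            result[key] = False           # the comparison itself failed
    return result


left = {"a": 1, "b": [1, 2], "c": "x"}
right = {"a": 1, "b": [1, 3], "d": 0}
compared = compare_items(left, right)
```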
- _superadditem(name, value)
- _temp_display(option, value)
Temporarily modify a display option when generating dataset display. User configured option (or default) will be restored after display string is generated.
- classmethod _tree_from_sds_meta_data(meta, arrays, meta_tups, file_header)
SDS loads in info mode (no data loaded, just metadata + file header information)
- Returns:
Tree display of nested structures in SDS directory.
- Return type:
- _unlock()
- _update_sort(name)
Discard sort index if sortby item was removed or replaced.
- _validate_names(names)
- all()
For use in boolean contexts: Is it true that for all elements (val) either:
val casts to True, or
returns True for val.all() or all(val)
- Return type:
- any()
For use in boolean contexts: Does there exist an element (val) which either:
val casts to True, or
returns True for val.any() or any(val)
- Return type:
Examples
>>> s = rt.Struct()
>>> s.a = rt.Dataset()
>>> s.any()
False
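The per-element rule shared by all() and any() (the value casts to True, or its own reduction method returns True, or the built-in reduction returns True) can be written out as a short sketch. The helper names `element_ok`, `struct_all`, and `struct_any` are hypothetical, and plain dictionaries stand in for a Struct.

```python
def element_ok(val, method_name, builtin):
    """True if val is truthy, or val.<method_name>() is truthy, or builtin(val) is truthy."""
    try:
        if bool(val):
            return True
    except (TypeError, ValueError):
        pass  # truth value is ambiguous (as with some array-likes); try the reductions
    method = getattr(val, method_name, None)
    if callable(method):
        try:
            if bool(method()):
                return True
        except Exception:
            pass
    try:
        return bool(builtin(val))
    except TypeError:
        return False  # not iterable, so the built-in reduction does not apply


def struct_all(items):
    return all(element_ok(v, "all", all) for v in items.values())


def struct_any(items):
    return any(element_ok(v, "any", any) for v in items.values())


empty_container_any = struct_any({"a": []})      # mirrors the empty-Dataset example
mixed_all = struct_all({"a": 1, "b": 0})
```

An empty container is falsy and has nothing for `any()` to find, which matches the False result in the example above.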
- apply_schema(schema)
Apply a schema containing descriptive information recursively to the Struct.
- Parameters:
schema (dict) – A dictionary of schema information. See rt_meta.apply_schema() for more information on the format of the dictionary.
- Returns:
res – Dictionary of deviations from the schema
- Return type:
See also
- as_ordered_dictionary(sublist=None)
Returns contents of Struct as a collections.OrderedDict instance.
- asdict(sublist=None, copy=False)
Return contents of Struct as a dictionary.
- Parameters:
- Return type:
Examples
This is useful if, for whatever reason, a riptable Dataset needs to go into a pandas DataFrame:
>>> ds = rt.Dataset({'col_'+str(i): rt.arange(5) for i in range(5)})
>>> df = pd.DataFrame(ds.asdict())
>>> df
   col_0  col_1  col_2  col_3  col_4
0      0      0      0      0      0
1      1      1      1      1      1
2      2      2      2      2      2
3      3      3      3      3      3
4      4      4      4      4      4
Certain items can be requested with the sublist keyword:
>>> ds.asdict(sublist=['col_1','col_3'])
{'col_1': FastArray([0, 1, 2, 3, 4]), 'col_3': FastArray([0, 1, 2, 3, 4])}
- col_add_prefix(prefix)
Add the same prefix to all items in the Struct/Dataset.
Rather than renaming the columns in a col_rename loop - which would have to rebuild the underlying dictionary N times, this clears the original dictionary, and rebuilds a new one once. Label columns and sortby columns will also be fixed to match the new names.
- Parameters:
prefix (str) – String to add before each item name.
- Return type:
None
Examples
>>> #TODO Need to call np.random.seed(12345) first to ensure example runs deterministically
>>> ds = rt.Dataset({'col_'+str(i):np.random.rand(5) for i in range(5)})
>>> ds.col_add_prefix('NEW_')
>>> ds
#   NEW_col_0   NEW_col_1   NEW_col_2   NEW_col_3   NEW_col_4
-   ---------   ---------   ---------   ---------   ---------
0        0.70        0.52        0.07        0.81        0.26
1        0.13        0.43        0.01        0.46        0.45
2        0.34        0.24        0.87        0.81        0.80
3        0.63        0.22        0.85        0.60        0.91
4        0.46        0.70        0.02        0.49        0.34
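The point about rebuilding the dictionary once, rather than once per renamed column, can be shown with an ordinary dict. This is a sketch only: `add_prefix` and the `sort_keys` argument are hypothetical stand-ins for the container and its sortby bookkeeping.

```python
def add_prefix(columns, prefix, sort_keys=()):
    """Prefix every key of `columns` in place and return the updated sort keys."""
    # Build the renamed mapping in a single pass, preserving the original order.
    renamed = {prefix + name: value for name, value in columns.items()}
    # Any bookkeeping that refers to columns by name has to follow the rename.
    new_sort_keys = [prefix + name for name in sort_keys]
    columns.clear()
    columns.update(renamed)
    return new_sort_keys


columns = {"col_0": [1, 2], "col_1": [3, 4]}
sort_keys = add_prefix(columns, "NEW_", sort_keys=["col_1"])
```

The same shape works for a suffix by building `name + suffix` instead.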
- col_add_suffix(suffix)
Add the same suffix to all items in the Struct/Dataset.
Rather than renaming the columns in a col_rename loop - which would have to rebuild the underlying dictionary N times, this clears the original dictionary, and rebuilds a new one once. Label columns and sortby columns will also be fixed to match the new names.
- Parameters:
suffix (str) – String to add after each item name.
- Return type:
None
Examples
>>> #TODO Need to call np.random.seed(12345) first to ensure example runs deterministically
>>> ds = rt.Dataset({'col_'+str(i):np.random.rand(5) for i in range(5)})
>>> ds.col_add_suffix('_NEW')
>>> ds
#   col_0_NEW   col_1_NEW   col_2_NEW   col_3_NEW   col_4_NEW
-   ---------   ---------   ---------   ---------   ---------
0        0.70        0.52        0.07        0.81        0.26
1        0.13        0.43        0.01        0.46        0.45
2        0.34        0.24        0.87        0.81        0.80
3        0.63        0.22        0.85        0.60        0.91
4        0.46        0.70        0.02        0.49        0.34
- col_exists(name)
Return True if the column name already exists
- col_filter(items=None, like=None, regex=None, on_missing='raise')
Return the columns specified by indices or matches on column names.
Note that this method doesn’t filter a Dataset or Struct on its contents, only on the column index or name.
At least one of items, like, or regex must be specified.
- Parameters:
items (str, int, or iterable of str or int, optional) – Names or indices of columns to be selected. An iterable can contain both string and int values.
like (str, optional) – Substring to match in column names.
regex (str, optional) – Regular expression string to match in column names.
on_missing ({"raise", "warn", "ignore"}, default "raise") – Governs how to handle a column in items that doesn’t exist:
"raise" (default): Raises an IndexError. Nothing is returned.
"warn": Issues a warning. Any columns in items that do exist are returned.
"ignore": No error or warning. Any columns in items that do exist are returned.
- Returns:
Same type as the input object.
- Return type:
See also
Examples
Select columns by name:
>>> ds = rt.Dataset({"one": rt.arange(3), "two": rt.arange(3) % 2, "three": rt.arange(3) % 3})
>>> ds
#   one   two   three
-   ---   ---   -----
0     0     0       0
1     1     1       1
2     2     0       2
>>> ds.col_filter(items=["one", "three"])
#   one   three
-   ---   -----
0     0       0
1     1       1
2     2       2
Select columns by index:
>>> ds.col_filter(items=[0, 1])
#   one   two
-   ---   ---
0     0     0
1     1     1
2     2     0
Select columns by substring:
>>> ds.col_filter(like="thr")
#   three
-   -----
0       0
1       1
2       2
Select columns by regular expression:
>>> ds.col_filter(regex="e$")
#   one   three
-   ---   -----
0     0       0
1     1       1
2     2       2
Select Dataset and FastArray objects from a Struct:
>>> ds2 = rt.Dataset({"four": rt.arange(3), "five": rt.arange(3) % 2})
>>> fa = rt.FastArray([1, 2, 3])
>>> s = rt.Struct()
>>> s.ds = ds
>>> s.ds2 = ds2
>>> s.fa = fa
>>> s.col_filter([0])
#   Name   Type      Size              0   1   2
-   ----   -------   ---------------   -   -   -
0   ds     Dataset   3 rows x 3 cols
>>> s.col_filter("fa")
#   Name   Type    Size   0   1   2
-   ----   -----   ----   -   -   -
0   fa     int32   3      1   2   3
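The three selection modes and the on_missing policy can be sketched as a function over a list of column names. This is an illustration, not riptable's code: `match_columns` is a hypothetical helper, it assumes the matches from items, like, and regex are combined as a union returned in original column order, and it assumes search-style regex matching, which is consistent with the `regex="e$"` example above.

```python
import re
import warnings


def match_columns(names, items=None, like=None, regex=None, on_missing="raise"):
    """Return the names selected by position/name, substring, or regular expression."""
    if items is None and like is None and regex is None:
        raise ValueError("at least one of items, like, or regex must be specified")
    matched = set()
    if items is not None:
        if isinstance(items, (str, int)):
            items = [items]
        for item in items:
            if isinstance(item, int):
                name = names[item] if -len(names) <= item < len(names) else None
            else:
                name = item if item in names else None
            if name is None:
                if on_missing == "raise":
                    raise IndexError(f"column {item!r} does not exist")
                if on_missing == "warn":
                    warnings.warn(f"column {item!r} does not exist")
                continue  # "warn" and "ignore" keep going with the columns that exist
            matched.add(name)
    if like is not None:
        matched.update(n for n in names if like in n)
    if regex is not None:
        pattern = re.compile(regex)
        matched.update(n for n in names if pattern.search(n))
    # Assumed ordering: the original column order.
    return [n for n in names if n in matched]


names = ["one", "two", "three"]
by_name = match_columns(names, items=["one", "three"])
by_regex = match_columns(names, regex="e$")
```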
- col_get_attribute(name, attrib_name, default=None)
Gets the attribute of the specified column; attrib_name indicates which attribute.
- Parameters:
Examples
>>> ds.col_set_attribute('col1', 'TEST', 417)
>>> ds.col_get_attribute('col1', 'TEST')
417
>>> ds.col_get_attribute('col1', 'TEST', np.nan)
417
>>> ds.col_get_attribute('col1', 'DOESNOTEXIST', np.nan)
nan
- col_get_len()
Gets the number of columns (or items) in the Struct
- col_get_value(name)
Return a single item.
- Parameters:
name (string) – Item name.
- Returns:
Item from item container (no attribute).
- Return type:
obj
- Raises:
KeyError – Item not found with given
name
.
- col_map(rename_dict)
Rename columns and re-arrange names of columns based on the rules set forth in the supplied dictionary.
- Parameters:
rename_dict (dict) – Dictionary defining a remapping of (some/all) column names.
- Return type:
None
Examples
>>> #TODO Call np.random.seed(12345) here to make the example output deterministic
>>> ds = rt.Dataset({'col_'+str(i): np.random.rand(5) for i in range(5)})
>>> ds.col_map({'col_1':'AAA', 'col_2':'BBB'})
>>> ds
#   col_0    AAA    BBB   col_3   col_4
-   -----   ----   ----   -----   -----
0    0.55   0.21   0.27    0.85    0.03
1    0.77   0.75   0.65    0.97    0.24
2    0.09   0.07   0.40    0.81    0.62
3    0.50   0.93   0.98    0.99    0.99
4    0.40   0.45   0.53    0.49    0.76
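A rename mapping applied in a single pass keeps the column order and also handles names that swap with each other, which a sequence of one-at-a-time renames would not. A minimal sketch over a plain dict; `rename_columns` is a hypothetical helper, and the duplicate-name check is an assumption rather than documented behavior.

```python
def rename_columns(columns, rename_dict):
    """Rename keys of `columns` in place according to rename_dict, keeping their order."""
    # Names that are not in rename_dict map to themselves.
    renamed = {rename_dict.get(name, name): value for name, value in columns.items()}
    if len(renamed) != len(columns):
        raise ValueError("renaming would produce duplicate column names")
    columns.clear()
    columns.update(renamed)


columns = {"col_0": 0, "col_1": 1, "col_2": 2}
rename_columns(columns, {"col_1": "AAA", "col_2": "BBB"})

swapped = {"a": 1, "b": 2}
rename_columns(swapped, {"a": "b", "b": "a"})  # a swap is safe in a single pass
```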
- col_move(flist, blist)
Move the column or group of columns in flist to the front of the list, and those in blist to the back, for iteration/indexing/display. Values of columns will remain unchanged.
- col_move_to_back(cols)
Move single column or group of columns to back of list for iteration/indexing/display.
Values of columns will remain unchanged.
Examples
>>> #TODO Call np.random.seed(12345) here to make the example output deterministic
>>> ds = rt.Dataset({'col_'+str(i): np.random.rand(5) for i in range(5)})
>>> ds
#   col_0   col_1   col_2   col_3   col_4
-   -----   -----   -----   -----   -----
0    0.28    0.84    0.24    0.72    0.81
1    0.72    0.44    0.41    0.53    0.17
2    0.37    0.66    0.61    0.52    0.50
3    0.08    0.31    0.15    0.65    0.98
4    0.63    0.89    0.25    0.13    0.16
>>> ds.col_move_to_back(['col_2','col_0'])
>>> ds
#   col_1   col_3   col_4   col_2   col_0
-   -----   -----   -----   -----   -----
0    0.84    0.72    0.81    0.24    0.28
1    0.44    0.53    0.17    0.41    0.72
2    0.66    0.52    0.50    0.61    0.37
3    0.31    0.65    0.98    0.15    0.08
4    0.89    0.13    0.16    0.25    0.63
See also
- col_move_to_front(cols)
Move single column or group of columns to front of list for iteration/indexing/display. Values of columns will remain unchanged.
Examples
>>> #TODO Call np.random.seed(12345) here to make the example output deterministic
>>> ds = rt.Dataset({'col_'+str(i): np.random.rand(5) for i in range(5)})
>>> ds
#   col_0   col_1   col_2   col_3   col_4
-   -----   -----   -----   -----   -----
0    0.60    0.50    0.77    0.72    0.73
1    0.48    0.65    0.96    0.17    0.99
2    0.06    0.54    0.81    0.20    0.30
3    0.18    0.85    0.24    0.44    0.38
4    0.04    0.84    0.64    0.66    0.97
>>> ds.col_move_to_front(['col_4', 'col_2'])
>>> ds
#   col_4   col_2   col_0   col_1   col_3
-   -----   -----   -----   -----   -----
0    0.73    0.77    0.60    0.50    0.72
1    0.99    0.96    0.48    0.65    0.17
2    0.30    0.81    0.06    0.54    0.20
3    0.38    0.24    0.18    0.85    0.44
4    0.97    0.64    0.04    0.84    0.66
See also
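Moving columns to the front or back only changes iteration order; the values are untouched. With an insertion-ordered dict this amounts to rebuilding the key order, as in the following sketch (`move_to_front` and `move_to_back` are hypothetical helpers over plain dicts, not riptable methods):

```python
def _as_list(names):
    return [names] if isinstance(names, str) else list(names)


def move_to_front(columns, names):
    """Reorder `columns` in place so that `names` come first, in the order given."""
    front = {name: columns[name] for name in _as_list(names)}   # KeyError if missing
    rest = {k: v for k, v in columns.items() if k not in front}
    columns.clear()
    columns.update({**front, **rest})


def move_to_back(columns, names):
    """Reorder `columns` in place so that `names` come last, in the order given."""
    back = {name: columns[name] for name in _as_list(names)}    # KeyError if missing
    rest = {k: v for k, v in columns.items() if k not in back}
    columns.clear()
    columns.update({**rest, **back})


front_demo = {f"col_{i}": i for i in range(5)}
move_to_front(front_demo, ["col_4", "col_2"])

back_demo = {f"col_{i}": i for i in range(5)}
move_to_back(back_demo, ["col_2", "col_0"])
```

Both results reproduce the column orders shown in the col_move_to_front and col_move_to_back examples.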
- col_pop(colspec)
Remove and return the item(s) given by colspec, which is interpreted as for [] (getitem). List input returns a sub-Struct and removes those items from the current object. Single-column input (a string or a single integer) returns a single “column”.
- Parameters:
colspec (list, string, or integer) –
- Returns:
Single value or new (same-type) object containing the removed data.
- Return type:
obj
Examples
>>> ds = rt.Dataset({'col_'+str(i): rt.arange(5) for i in range(3)})
>>> ds
#   col_0   col_1   col_2
-   -----   -----   -----
0       0       0       0
1       1       1       1
2       2       2       2
3       3       3       3
4       4       4       4
>>> col = ds.col_pop('col_1')
>>> ds
#   col_0   col_2
-   -----   -----
0       0       0
1       1       1
2       2       2
3       3       3
4       4       4
>>> col
FastArray([0, 1, 2, 3, 4])
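The two return shapes (a single value for a single name or position, a container for a list) can be sketched with a plain dict. `pop_columns` is a hypothetical helper; a real Struct or Dataset would return an object of its own type rather than a dict.

```python
def pop_columns(columns, colspec):
    """Remove and return one value (str or int colspec) or a dict of values (list colspec)."""
    order = list(columns)
    if isinstance(colspec, (str, int)):
        name = order[colspec] if isinstance(colspec, int) else colspec
        return columns.pop(name)
    # Resolve every position to a name before removing anything,
    # so earlier removals cannot shift later positions.
    names = [order[c] if isinstance(c, int) else c for c in colspec]
    return {name: columns.pop(name) for name in names}


columns = {"col_0": [0, 1], "col_1": [2, 3], "col_2": [4, 5]}
single = pop_columns(columns, "col_1")       # a single 'column'
remaining_after_single = list(columns)
sub = pop_columns(columns, [0])              # a sub-container
```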
- col_remove(items=None, like=None, regex=None, on_missing='warn')
Remove the columns specified by indices or matches on column names.
This can be done only if the Dataset or Struct is unlocked.
At least one of items, like, or regex must be specified.
- Parameters:
items (str, int, or iterable of str or int, optional) – Names or indices of columns to be removed. An iterable can contain both string and int values.
like (str, optional) – Substring to match in column names.
regex (str, optional) – Regular expression string to match in column names.
on_missing ({"warn", "raise", "ignore"}, default "warn") – Governs how to handle a column in items that doesn’t exist:
"warn" (default): Issues a warning. All columns in items that do exist are removed. Changed in version 1.13.0: Previously, the default value was "raise".
"raise": Raises an IndexError. No columns are removed.
"ignore": No error or warning. All columns in items that do exist are removed.
- Return type:
None
Examples
>>> ds = rt.Dataset({"col_" + str(i): rt.arange(5) for i in range(5)})
>>> ds
#   col_0   col_1   col_2   col_3   col_4
-   -----   -----   -----   -----   -----
0       0       0       0       0       0
1       1       1       1       1       1
2       2       2       2       2       2
3       3       3       3       3       3
4       4       4       4       4       4
>>> ds.col_remove(["col_2", "col_0"])
>>> ds
#   col_1   col_3   col_4
-   -----   -----   -----
0       0       0       0
1       1       1       1
2       2       2       2
3       3       3       3
4       4       4       4
Try to remove a column that doesn’t exist with the default on_missing="warn". A warning is issued, and any columns that do exist are removed:
>>> ds.col_remove(["col_1", "col_2"])
UserWarning: Column col_2 doesn't exist and couldn't be removed.
>>> ds
#   col_3   col_4
-   -----   -----
0       0       0
1       1       1
2       2       2
3       3       3
4       4       4
Remove a column by its index:
>>> ds.col_remove([0])
>>> ds
#   col_4
-   -----
0       0
1       1
2       2
3       3
4       4
>>> ds2 = rt.Dataset({"aabb": rt.arange(3), "abab": rt.arange(3), "ccdd": rt.arange(3),
...                   "cdcd": rt.arange(3)})
>>> ds2
#   aabb   abab   ccdd   cdcd
-   ----   ----   ----   ----
0      0      0      0      0
1      1      1      1      1
2      2      2      2      2
Remove columns by substring:
>>> ds2.col_remove(like="cd") >>> ds2 # aabb abab - ---- ---- 0 0 0 1 1 1 2 2 2
Remove columns by regular expression:
>>> ds2.col_remove(regex="^ab") >>> ds2 # aabb - ---- 0 0 1 1 2 2
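The selection rules described for col_remove can be illustrated without riptable. The following is only a plain-Python sketch over a dict: remove_columns is a hypothetical helper, not riptable's implementation, and it assumes that regex may match anywhere in a name and that integer items refer to positions in the current column order.

```python
import re
import warnings


def remove_columns(columns, items=None, like=None, regex=None, on_missing="warn"):
    """Remove entries from the dict `columns` in place (hypothetical helper)."""
    if items is None and like is None and regex is None:
        raise ValueError("At least one of items, like, or regex must be specified.")
    names = list(columns)
    to_remove = []
    if items is not None:
        if isinstance(items, (str, int)):
            items = [items]
        missing = []
        for item in items:
            # Integers are treated as positions in the current column order.
            if isinstance(item, int):
                name = names[item] if -len(names) <= item < len(names) else None
            else:
                name = item if item in columns else None
            if name is None:
                missing.append(item)
            else:
                to_remove.append(name)
        if missing and on_missing == "raise":
            # Nothing has been removed yet, so the container is left untouched.
            raise IndexError(f"Columns {missing} don't exist.")
        if missing and on_missing == "warn":
            for item in missing:
                warnings.warn(f"Column {item} doesn't exist and couldn't be removed.")
    if like is not None:
        to_remove += [n for n in names if like in n]
    if regex is not None:
        # Assumption: the pattern may match anywhere in the name (re.search).
        to_remove += [n for n in names if re.search(regex, n)]
    for name in to_remove:
        columns.pop(name, None)


cols = {f"col_{i}": list(range(5)) for i in range(5)}
remove_columns(cols, ["col_2", "col_0"])
print(list(cols))  # ['col_1', 'col_3', 'col_4']
```

Collecting the names first and removing them at the end is what lets the "raise" mode leave the container unchanged when any requested column is missing.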
- col_rename(old, new)
Rename a single column.
The new name must be a valid column name; that is, it must not be a Python keyword or a Struct or Dataset class method name.
To check whether a name is valid, use is_valid_colname. To see a list of invalid column names, use get_restricted_names.
Note that column names that don’t meet Python’s rules for well-formed variable names can’t be accessed using attribute access. For example, a column named ‘my-column’ can’t be accessed with ds.my-column, but can be accessed with ds['my-column'].
See also
is_valid_colname
Check whether a string is a valid column name.
get_restricted_names
Get a list of invalid column names.
Examples
>>> ds = rt.Dataset({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]}) >>> ds # a b - - ---- 0 1 4.00 1 2 5.00 2 3 6.00 >>> ds.col_rename('a', 'new_a') >>> ds # new_a b - ----- ---- 0 1 4.00 1 2 5.00 2 3 6.00
- col_set_attribute(name, attrib_name, attrib_value)
Sets an attribute on the specified column. attrib_name identifies which attribute to set.
- Parameters:
Examples
>>> ds.col_set_attribute('col1', 'TEST', 417) >>> ds.col_get_attribute('col1', 'TEST') 417
- col_set_value(name, value)
Check whether the item name is allowed, possibly escaping it, then set the value portion of the item to value.
- col_str_match(expression, flags=0)
Create a boolean mask vector for columns whose names match the regex.
Uses re.match(), not re.search().
- Parameters:
expression (str) – Regular expression.
flags – Regex flags (from the re module).
- Returns:
Array of bools (len ncols) which is true for columns which match the regex.
- Return type:
Examples
>>> st = rt.Struct({ ... 'price' : rt.arange(5), ... 'trade_time' : rt.arange(5) * 1000, # expected to regex match `.*time.*` ... 'name' : rt.FA(['a','b','c','d','e']), ... 'other_trade_time' : rt.arange(5) * 1000, # expected to regex match `.*time.*` ... }) >>> st.col_str_match(r'.*time.*') FastArray([False, True, False, True])
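Because matching uses re.match() rather than re.search(), a pattern has to match from the start of the column name. The difference can be seen with the standard re module alone. This is a plain-Python illustration on a list of names, not a call into riptable.

```python
import re

names = ["price", "trade_time", "name", "other_trade_time"]

# re.match anchors at the start of the string, so a bare 'time' matches nothing.
match_mask = [re.match(r"time", n) is not None for n in names]
# re.search scans the whole string and finds 'time' in two of the names.
search_mask = [re.search(r"time", n) is not None for n in names]
# With re.match, a leading '.*' is needed to allow a prefix before 'time'.
prefixed_mask = [re.match(r".*time", n) is not None for n in names]

print(match_mask)     # [False, False, False, False]
print(search_mask)    # [False, True, False, True]
print(prefixed_mask)  # [False, True, False, True]
```

This is why the example above writes the pattern as `.*time.*` instead of `time`.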
- col_str_replace(old, new, max=-1)
If a column name contains the old string, the old string will be replaced with the new one. If replacing the string will conflict with an existing column name, an error will be raised. Labels / sortby columns will be fixed if their names are modified.
- Parameters:
Examples
Replace all occurrences in each name:
>>> ds = rt.Dataset({ ... 'aaa': rt.arange(5), ... 'a' : rt.arange(5), ... 'aab': rt.arange(5) ... }) >>> ds.col_str_replace('a', 'A') >>> ds # AAA A AAb - --- - --- 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4
Limit number of replacements per name:
>>> ds = rt.Dataset({ ... 'aaa': rt.arange(5), ... 'a' : rt.arange(5), ... 'aab': rt.arange(5) ... }) >>> ds.col_str_replace('a','A',max=1) >>> ds # Aaa A Aab - --- - --- 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4
Replacing will create a conflict:
>>> ds = rt.Dataset({'a': rt.arange(5), 'A': rt.arange(5)}) >>> ds.col_str_replace('a', 'A') ValueError: Item A already existed, cannot make replacement in item.
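The renaming rule (replace a substring, optionally a limited number of times, and fail on a name collision) can be sketched with str.replace. The helper replace_in_names below is hypothetical and works on a plain list of names. It is not riptable's implementation, and the order in which riptable detects conflicts is an assumption here.

```python
def replace_in_names(names, old, new, max=-1):
    """Return renamed copies of `names`; raise if a new name already exists.

    The parameter is called `max` only to mirror the documented signature.
    """
    result = list(names)
    for i, name in enumerate(result):
        if old not in name:
            continue
        # str.replace treats a negative count as "replace every occurrence".
        candidate = name.replace(old, new, max)
        if candidate != name and candidate in result:
            raise ValueError(
                f"Item {candidate} already existed, cannot make replacement in item {name}."
            )
        result[i] = candidate
    return result


print(replace_in_names(["aaa", "a", "aab"], "a", "A"))         # ['AAA', 'A', 'AAb']
print(replace_in_names(["aaa", "a", "aab"], "a", "A", max=1))  # ['Aaa', 'A', 'Aab']
```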
- col_swap(from_cols, to_cols)
Swaps column values; the names retain their current order.
- Parameters:
Examples
>>> st = rt.Struct({'a': 1, 'b': 'fish', 'c': [5.6, 7.8], 'd': {'A': 'david', 'B': 'matthew'}, ... 'e': np.ones(7, dtype=np.int64)}) >>> st # Name Type Rows 0 1 2 - ---- ----- ---- ---- --- - 0 a int 0 1 1 b str 0 fish 2 c list 2 5.6 7.8 3 d dict 2 A B 4 e int64 7 1 1 1
>>> st.col_swap(list('abc'), list('cba')) >>> st # Name Type Rows 0 1 2 - ---- ----- ---- ---- --- - 0 a list 2 5.6 7.8 1 b str 0 fish 2 c int 0 1 3 d dict 2 A B 4 e int64 7 1 1 1
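The effect of the swap above can be mimicked on an ordinary dict. The helper swap_values is hypothetical, and it assumes that the value stored under from_cols[i] ends up under to_cols[i]. For the symmetric swap shown in the example, either direction gives the same result.

```python
def swap_values(columns, from_cols, to_cols):
    """Move values between existing keys of `columns` without reordering keys."""
    # Read all source values first so overlapping swaps do not clobber each other.
    moved = {dst: columns[src] for src, dst in zip(from_cols, to_cols)}
    # Updating existing keys keeps the dict's key order unchanged.
    columns.update(moved)


st = {"a": 1, "b": "fish", "c": [5.6, 7.8], "d": {"A": "david"}}
swap_values(st, list("abc"), list("cba"))
print(st)  # {'a': [5.6, 7.8], 'b': 'fish', 'c': 1, 'd': {'A': 'david'}}
```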
- classmethod concat_structs(struct_list)
Merges data from multiple structs (useful for multiday loading).
Structs must have the same keys, and contain only Structs, Datasets, Categoricals, NumPy arrays, and riptable arrays.
See also
- copy(deep=True)
Returns a shallow or deep copy of the Struct. Defaults to a deep copy.
- Parameters:
deep (bool, default True) – If True, perform a deep copy, calling .copy(True) on each object depth first. If False, perform a shallow copy by calling .copy(False), which often just copies the container’s dict.
Examples
>>> ds=rt.Dataset({'somenans': [0., 1., 2., np.nan, 4., 5.], 'morestuff': ['A','B','C','D','E','F']}) >>> ds2=rt.Dataset({'somenans': [0., 1., np.nan, 3., 4., 5.], 'morestuff':['H','I','J','K','L','M']}) >>> st=rt.Struct({'test':ds, 'test2': rt.Struct({'ds2':ds2}), 'arr': rt.arange(10)}) >>> st.copy() # Name Type Size 0 1 2 - ----- ------- --------------- --- - - 0 test Dataset 6 rows x 2 cols 1 test2 Struct 1 ds2 2 arr int32 10 0 1 2
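The practical difference between a shallow and a deep copy of a container can be shown with the standard copy module on a nested dict. This is a generic Python illustration of the two copy modes and does not call riptable.

```python
import copy

original = {"arr": [0, 1, 2], "nested": {"ds2": [3, 4, 5]}}

shallow = copy.copy(original)   # new outer dict, same inner objects
deep = copy.deepcopy(original)  # new outer dict and new inner objects

# Mutate data held by the original container.
original["arr"].append(99)

print(shallow["arr"])  # [0, 1, 2, 99]  (shared with the original)
print(deep["arr"])     # [0, 1, 2]      (independent)
print(shallow["nested"] is original["nested"])  # True
print(deep["nested"] is original["nested"])     # False
```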
- display_attributes()
Returns a dict of display attributes, currently consisting of NumberOfFooterRows and a list of MarginColumns.
- Returns:
d – A dictionary of display attributes
- Return type:
- dtranspose(plain=False)
For display only. Return a transposed version of the container’s string representation.
- Parameters:
plain (bool, default False) – If True, the output is not colored.
- Returns:
Formatted, transposed version of this instance; intended for display.
- Return type:
string
Examples
>>> st = rt.Struct({'a': 1, 'b': 'fish', 'c': [5.6, 7.8], 'd': {'A': 'david', 'B': 'matthew'}, ... 'e': np.ones(7, dtype=np.int64)}) >>> st # Name Type Size 0 1 2 - ---- ----- ---- ---- --- - 0 a int 0 1 1 b str 0 fish 2 c list 2 5.6 7.8 3 d dict 2 A B 4 e int64 7 1 1 1 [5 columns] >>> st.dtranspose() Fields: 0 1 2 3 4 ------- --- ---- ---- ---- ----- Name a b c d e Type int str list dict int64 Size 0 0 2 2 7 0 1 fish 5.6 A 1 1 7.8 B 1 2 1 [5 columns]
- equals(other)
Test whether two Structs contain the same elements in each column. NaNs in the same location are considered equal.
- Parameters:
other (Struct or dict) – Another Struct or dict to compare to.
- Return type:
See also
Dataset.crc, ==, >=, <=, >, <
Examples
>>> s1 = rt.Struct({'t': 54, 'test': np.int64(34), 'test2': rt.arange(200)}) >>> s2 = rt.Struct({'t': 54, 'test': np.int64(34), 'test2': rt.arange(200)}) >>> s1.equals(s2) True
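Treating NaNs in the same location as equal differs from ordinary floating-point comparison, where NaN is never equal to itself. The sketch below shows the idea on plain Python lists. values_equal and columns_equal are hypothetical helpers, not part of riptable.

```python
import math


def values_equal(a, b):
    """Compare two scalars, counting a pair of NaNs as equal."""
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def columns_equal(left, right):
    """Element-wise comparison of two sequences with NaN-aware equality."""
    return len(left) == len(right) and all(values_equal(a, b) for a, b in zip(left, right))


x = [0.0, 1.0, float("nan"), 3.0]
y = [0.0, 1.0, float("nan"), 3.0]

print(x[2] == y[2])         # False: NaN != NaN under IEEE 754 rules
print(columns_equal(x, y))  # True: NaNs in the same location count as equal
```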
- flatten(sep='/', level=0)
Flattens (collapses) a Struct. Called recursively on nested containers.
- Parameters:
sep (str, default '/') – The separating string to use. Please note that some characters are not allowed and will be replaced with _.
- Return type:
New Struct with collapsed names (separated by the specified string), which can then be saved.
Note
_sep is stored in the __dict__ to help with undo or saving to file. arrayflags and metastring are now exposed.
See also
- flatten_undo(sep=None, startname='', obj_array=None)
Restores a Struct to its original form, as it was before Struct.flatten().
- Parameters:
sep (default None) – The user may pass in the separating string to use, such as '/'.
- Return type:
New Struct that is back to its original form, as it was before Struct.flatten().
See also
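Flattening and its inverse can be pictured with nested dicts: nested names are joined with the separator, and splitting on the separator rebuilds the nesting. The two functions below are a minimal sketch under that assumption. They reuse the documented method names for readability but are not riptable's implementation, and they do not replace disallowed characters.

```python
def flatten(nested, sep="/", prefix=""):
    """Collapse a nested dict into one level, joining keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested containers, carrying the joined name along.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


def flatten_undo(flat, sep="/"):
    """Rebuild the nested dict from names produced by flatten()."""
    nested = {}
    for name, value in flat.items():
        *parents, leaf = name.split(sep)
        target = nested
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = value
    return nested


data = {"test": {"x": 1, "y": 2}, "test2": {"ds2": {"z": 3}}, "arr": [0, 1, 2]}
flat = flatten(data)
print(flat)  # {'test/x': 1, 'test/y': 2, 'test2/ds2/z': 3, 'arr': [0, 1, 2]}
print(flatten_undo(flat) == data)  # True
```

The round trip only works when no original key contains the separator, which is one reason a real implementation needs to record the separator it used.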
- get_attribute(attrib_name, default=None)
Get an attribute that applies to all items/columns.
- Parameters:
attrib_name – name of the attribute
default – return value if attrib_name is not a valid attribute
- Returns:
val
- Return type:
attribute value or None
See also
- get_ncols()
Return the number of items in the Struct.
- Returns:
ncols – The number of items in the Struct
- Return type:
- get_nrows()
Returns 0, as a Struct has no rows.
- Returns:
0
- Return type:
Note
Subclasses need to define this explicitly.
- get_restricted_names()
Return a list of invalid column names.
Invalid column names are Python keywords and Struct or Dataset class method names.
This method generates the result only once. Afterward, it is stored as a class variable.
- Returns:
A set of strings that are invalid column names.
- Return type:
See also
is_valid_colname
Check whether a string is a valid column name.
Examples
>>> # Limit and format the output. >>> print("Some of the restricted names include: ") >>> print(", ".join(list(ds.get_restricted_names())[::10])) Some of the restricted names include: mask_or_isinf, __reduce_ex__, imatrix_xy, __weakref__, dtypes, _get_columns, from_arrow, elif, __imul__, _deleteitem, __rsub__, _index_from_row_labels, as_matrix, putmask, _as_meta_data, shape, cat, __invert__, try, _init_columns_as_dict, label_as_dict, col_str_replace, _replaceitem, label_set_names, __contains__, __floordiv__, _row_numbers, filter, __init__, sorts_on, flatten_undo, col_str_match, __dict__, size, __rand__, info, col_remove, as, or
- classmethod hstack(struct_list)
Merges data from multiple structs. Structs must have the same keys, and contain only Structs, Datasets, arrays, and riptable arrays.
- Parameters:
struct_list (list of Struct) – A struct utility for merging data from multiple structs (useful for multiday loading). Structs must have the same keys, and contain only Structs, Datasets, Categoricals, and Numpy Arrays.
- Returns:
obj
- Return type:
See also
riptable.hstack
- info(**kwargs)
Return an object containing a description of the structure’s contents.
- Parameters:
kwargs (dict) – Optional keyword arguments passed to rt_meta.info().
- Returns:
info – A description of the structure’s contents.
- Return type:
- is_locked()
Returns True if object is locked (unable to add/remove/rename elements).
NB: Currently behaves like a tuple: the contained data will still be mutable when possible.
- Returns:
True if object is locked
- Return type:
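The tuple analogy for locking can be made concrete with a plain Python tuple: the container refuses to have its elements replaced, yet a mutable element can still be changed in place. This is an illustration of the analogy only and says nothing further about how riptable implements locking.

```python
locked = ([1, 2, 3], "fish")

try:
    # Replacing an element of the container is rejected.
    locked[1] = "cat"
except TypeError as exc:
    print("cannot replace element:", exc)

# The contained list is still mutable.
locked[0].append(4)
print(locked)  # ([1, 2, 3, 4], 'fish')
```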
- is_valid_colname(name)
Check whether a string is a valid column name.
Python keywords and Struct or Dataset class method names are not valid column names.
To see a list of invalid column names, use get_restricted_names.
- Parameters:
name (str) – The string to be checked.
- Returns:
True if name is valid, otherwise False.
- Return type:
See also
get_restricted_names
Get a list of invalid column names.
Examples
>>> ds.is_valid_colname('yield') # Python keyword False >>> ds.is_valid_colname('sample') # Dataset method False >>> ds.is_valid_colname('Yield') # OK because keywords are case-sensitive True >>> ds.is_valid_colname('Sample') # Method names are also case-sensitive True
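The two rules behind these results (no Python keywords, no class method names) can be approximated with the standard keyword module and dir(). The Container class and both helper functions below are hypothetical stand-ins; riptable's actual restricted set is the one reported by get_restricted_names.

```python
import keyword


class Container:
    """Hypothetical stand-in for a class whose method names are reserved."""

    def sample(self):
        return None


def is_valid_colname(name, cls=Container):
    # Invalid if the name is a Python keyword or an attribute of the class.
    return not keyword.iskeyword(name) and name not in dir(cls)


def attribute_accessible(name):
    # Only well-formed, non-keyword identifiers can be reached as obj.name.
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_colname("yield"))          # False: Python keyword
print(is_valid_colname("sample"))         # False: method of Container
print(is_valid_colname("Yield"))          # True: keywords are case-sensitive
print(is_valid_colname("my-column"))      # True under these two rules
print(attribute_accessible("my-column"))  # False: needs obj['my-column']
```

The last two lines show why a name can be accepted as a column name and still be unreachable through attribute access.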
- items()
Dictionary-iterator access to Struct items.
- Returns:
dict_items – An iterator over (name, item) pairs, that is, column keys and values.
- key_search(regex, case_sensitive=False, recursive=True, path='')
- label_as_dict()
Gets the column names used as labels in display
- label_filter(items=None, like=None, regex=None, axis=None)
Subset rows of dataset according to value in its label column.
TODO: how should multikey be handled?
- Parameters:
items (list-like) – List of specific values to match in label column.
like (string) – Keep items where ‘like’ occurs in label column.
regex (string (regular expression)) – Keep axis with re.search(regex, col) == True.
Examples
>>> ds # col_7 col_8 col_9 keycol -- ----- ----- ----- -------------- 0 0.53 0.52 0.47 paul 1 0.10 0.78 0.09 ray 2 0.50 0.79 0.50 paul 3 0.81 0.68 0.72 ray 4 0.08 0.71 0.02 john 5 0.38 0.19 0.90 ray 6 0.53 0.33 0.46 mary katherine 7 0.75 0.48 0.94 john 8 1.00 0.70 0.79 mary ann 9 0.47 0.64 0.16 ray 10 0.80 0.43 0.08 mary ann 11 0.54 0.19 0.43 joe 12 0.89 0.08 0.81 mary katherine 13 0.96 0.91 0.33 paul 14 0.18 0.55 0.44 ray 15 0.42 0.49 0.66 mary ann 16 0.05 0.53 0.66 paul 17 0.60 0.56 0.03 joe 18 0.62 0.42 0.56 mary ann 19 0.63 0.33 0.95 paul
>>> gb = ds.gb('keycol').sum() >>> gb.label_filter(items='john') *keycol col_7 col_8 col_9 ------- ----- ----- ----- john 0.82 1.19 0.96
>>> gb.label_filter(like=['ar', 'p']) *keycol col_7 col_8 col_9 -------------- ----- ----- ----- mary ann 2.85 2.05 2.08 mary katherine 1.43 0.41 1.27 paul 2.66 3.08 2.92
>>> gb.label_filter(regex='n$') *keycol col_7 col_8 col_9 -------- ----- ----- ----- john 0.82 1.19 0.96 mary ann 2.85 2.05 2.08
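The three selection modes used above (exact values, substrings, and a regular expression applied with re.search) can be sketched on a plain list of label strings. The helper label_mask is hypothetical and handles one mode per call. How riptable combines several modes in a single call is not covered here.

```python
import re


def label_mask(labels, items=None, like=None, regex=None):
    """Return one bool per label (hypothetical helper, one filter at a time)."""

    def as_list(value):
        # Accept either a single string or an iterable of strings.
        return [value] if isinstance(value, str) else list(value)

    if items is not None:
        wanted = set(as_list(items))
        return [label in wanted for label in labels]
    if like is not None:
        parts = as_list(like)
        return [any(part in label for part in parts) for label in labels]
    if regex is not None:
        return [re.search(regex, label) is not None for label in labels]
    raise ValueError("Specify one of items, like, or regex.")


labels = ["joe", "john", "mary ann", "mary katherine", "paul", "ray"]
keep = label_mask(labels, like=["ar", "p"])
print([label for label, flag in zip(labels, keep) if flag])
# ['mary ann', 'mary katherine', 'paul']
```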
- label_get_names()
Gets the column names used as labels in display
- label_remove()
Removes any labels used in display
- label_set_names(listnames)
Set which column names can be used as labels in display
- classmethod load(path='', name=None, share=None, info=False, columns=None, include_all_sds=False, include=None, threads=None, folders=None)
Load a Struct from a directory or single SDS file.
- Parameters:
path (str or os.PathLike) – Full path to directory or single SDS file with Struct data.
name (str, optional, default None) – Name of a nested container to search for in the root directory. Multiple tiers can be separated by ‘//’.
info (bool, optional, default False) – If True, no array data will be loaded; instead, a display tree of information about nested structures and their contents will be returned.
columns (list, optional, default None) – Not implemented.
include_all_sds (bool, optional, default False) – If False, when additional files are found in a directory and they are not in the root struct’s metadata, the user will be prompted to load them. If True, all files will be loaded automatically.
include (list of str, optional, default None) – A list of specific items to load. This list will only be applied to the root Struct - not to nested containers.
threads (int, optional, default None) – Number of threads to use during the SDS load. The previous number of threads is restored after the load, or if the load fails. See also riptide_cpp.SetThreadWakeUp.
- Returns:
Loaded data with possibly nested containers and riptable classes restored.
- Return type:
See also
riptable.load_sds
- make_categoricals(columnlist=None, dtype=None)
Converts specified string/bytes columns or all string/bytes columns to Categorical. Will also crawl through nested structs/datasets and convert their strings to categoricals.
- Parameters:
columnlist (str or list, optional) – Single name, or list of names of items to convert to categoricals.
dtype (numpy.dtype, optional) – Integer dtype for the categoricals’ underlying arrays.
- Raises:
TypeError – If the dtype was set to a non-dtype object.
ValueError – If a requested item could not be found in the container.
Notes
Error checking will complete in the root structure before any conversion begins.
- make_matlab_categoricals(xtra, remove_trailing=True, dtype=None, prefix='p', keep_prefix=True)
Turn matlab categorical indices and corresponding unique arrays into riptable categoricals.
- Parameters:
xtra (Struct) – Container holding unique arrays.
remove_trailing (bool, optional, default True) – If True, remove trailing spaces from Matlab strings.
dtype (numpy.dtype, optional, default None) – Integer dtype for underlying array of constructed categoricals.
prefix (str, optional, default ‘p’) – Prefix for integer arrays in calling dataset - columns that will be looked for in the struct.
keep_prefix (bool, default True) – If True, drop the prefix after flipping the column to categorical in the dataset. If a column exists with that name, the user will be warned.
- make_matlab_datetimes(dtcols=None, gmt=False, auto=True)
Convert datetime columns from Matlab to DateTimeNano and TimeSpan arrays.
- make_struct_from_categories(prefix=None, keep_prefix=False)
Build a struct of unique arrays from all categoricals in the container, or those with a specified prefix.
- Parameters:
Examples
TODO - sanitize - add example that makes a struct from categoricals and prints its representation See the version history for structure of older examples.
- Returns:
cats
- Return type:
Notes
This is a partial inverse operation of Struct.make_matlab_categoricals
- make_table(display_type)
Pretty-print code used by infrastructure.
- Parameters:
display_type (rt.rt_enum.DS_DISPLAY_TYPES) –
- Returns:
Display object or string.
- Return type:
obj or str
- save(path='', name=None, share=None, overwrite=True, compress=True, onefile=False, bandsize=None)
Save a struct to a directory. If the struct contains only arrays, it will be saved as a single .SDS file.
- Parameters:
path (str or os.PathLike) – Full path to save. Directory will be created automatically if it doesn’t exist. .SDS extension will be appended if a single file is being saved and is necessary.
name (str, optional) – Name for the root structure if it’s being appended to an existing struct’s directory. The existing _root.sds does not get overwritten, and structs can be combined without a full load.
share (str, optional) –
overwrite (bool, optional, default True) – If True, user will not be prompted on whether or not to overwrite existing .SDS files. Otherwise, prompt will appear if directory exists.
compress (bool, optional, default True) – If True, ZStandard compression will be used when writing to SDS, otherwise, no compression will be used.
onefile (bool, optional, default False) – If True, all nested Structs will be collapsed.
bandsize (int, optional, default None) –
- set_attribute(attrib_name, attrib_value)
Set an attribute that applies to all items/columns.
- Parameters:
attrib_name – name of the attribute
attrib_value – value of the attribute
See also
- set_display_callback(userfunc, scope=None)
Set the user display callback for styling text.
- Parameters:
userfunc (function) – This function must take two arguments, userfunc(cols, **kwargs).
scope (default None) – The callback for just this Dataset, or all Datasets. Can be None, Dataset, or Struct.
Examples
>>> from riptable.rt_display import DisplayColumnColors >>> def make_red(cols, **kwargs): ... location = kwargs['location'] # could be left, right, or main ... if location == 'main': ... for col in cols: ... for cell in col: ... if cell.string.startswith('-'): cell.string = '(' + cell.string[1:] + ')'; cell.color=DisplayColumnColors.Red >>> ds=rt.Dataset({'test':rt.arange(5)-3, 'another':rt.arange(5.0)-2}) >>> ds.set_display_callback(make_red) >>> ds
- static set_fast_array(val)
Set to true to force the casting of numpy arrays to FastArray when constructing a Struct or adding a new column.
- Parameters:
val (bool) – True or False
- summary_as_dict()
Gets the column names used as rights in display
- summary_get_names()
Gets the column names used as rights in display
- summary_remove()
Removes any rights used in display
- summary_set_names(listnames)
Set which column names can be used as rights in display
- tolist()
Returns data values in a list. Output equivalent to list(st.values()).
- Returns:
list
- tree(name=None, showpaths=False, info=False)
Returns a hierarchical view of the Struct.
- Parameters:
name (str) – Optional name for the top of the tree
showpaths – TODO purpose unknown, may raise error if true
info – TODO purpose unknown
- Returns:
tree – A hierarchical view of the Struct
- Return type:
Examples
>>> st1 = rt.Struct({'A': rt.FA([1, 2, 3]), 'B': rt.FA([4, 5])}) >>> st2 = rt.Struct({'C': st1, 'D': rt.FA([6, 7, 8])}) >>> st2.tree() Struct ├──── C (Struct) │ ├──── A int32 (3,) 4 │ └──── B int32 (2,) 4 └──── D int32 (3,) 4 >>> st2.tree(name='foo') foo ├──── C (Struct) │ ├──── A int32 (3,) 4 │ └──── B int32 (2,) 4 └──── D int32 (3,) 4
- values()
Values are individual items from the struct (no attribute from item container).
- Returns:
Items.
- Return type:
dict_values