riptable.rt_struct

Classes

Struct

The Struct class is at the root of much of the riptable class design; both Dataset and Multiset inherit from Struct.

class riptable.rt_struct.Struct(dictionary={})[source]

The Struct class is at the root of much of the riptable class design; both Dataset and Multiset inherit from Struct.

Struct represents a collection of (mixed-type) data members, with standard attribute get/set behavior, as well as dictionary-style retrieval.

The Struct constructor takes a dictionary (dict, OrderedDict, etc…) as its required argument. When Struct.UseFastArray is True (the default), any numpy arrays among the dictionary values will be cast into FastArray. Struct() := Struct({}).

The constructor dictionary keys (or element/column names added later) must not conflict with any Struct member names. Additionally, if Struct.AllowAnyName is False (it is True by default), a column name must be a legal Python variable name, not starting with ‘_’.

Parameters:

dictionary (dict) – A dictionary of named objects.
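The naming rule described above for the case where Struct.AllowAnyName is False can be illustrated with a small standalone check. This is only a sketch of the rule as stated, not riptable's implementation: the helper `is_legal_column_name` and its `reserved` argument (standing in for Struct member names) are hypothetical, and rejecting Python keywords is an assumption about what "legal Python variable name" means.

```python
import keyword


def is_legal_column_name(name, reserved=frozenset()):
    """Hypothetical helper (not riptable API) mirroring the stated naming rule."""
    if not isinstance(name, str):
        return False
    # Names must not collide with existing member names (assumed set).
    if name in reserved:
        return False
    # Names must not start with an underscore.
    if name.startswith("_"):
        return False
    # Names must be legal Python identifiers; keywords are rejected by assumption.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["price", "_hidden", "2fast", "my col", "keys"]:
    print(candidate, is_legal_column_name(candidate, reserved={"keys"}))
```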

Examples

>>> st = rt.Struct({'a': 1, 'b': 'fish', 'c': [5.6, 7.8], 'd': {'A': 'david', 'B': 'matthew'},
... 'e': np.ones(7, dtype=np.int64)})
>>> print(st)
#   Name   Type    Size   0      1     2
-   ----   -----   ----   ----   ---   -
0   a      int     0      1
1   b      str     0      fish
2   c      list    2      5.6    7.8
3   d      dict    2      A      B
4   e      int64   7      1      1     1
>>> st.a
1
>>> st['a']
1
>>> print(st[3:])
#   Name   Type    Rows   0   1   2
-   ----   -----   ----   -   -   -
0   d      dict    2      A   B
1   e      int64   7      1   1   1
>>> st.newcol = 5  # okay, a new entry
>>> st.newcol = [5, 7]  # okay, replace the entry
>>> st['another'] = 6  # also works
>>> st['newcol'] = 6  # and this works as well

Indexing behavior

>>> st['b'] # get a 'column' (equiv. st.b)
>>> st[['a', 'e']] # get some columns
>>> st[[0, 4]] # get some columns (order is that of iterating st, i.e. list(st))
>>> st[1:5:2] # standard slice notation, indexing corresponding to previous
>>> st[bool_vector] # get 'True' columns

Equivalents

>>> assert len(st) == st.get_ncols()
>>> for _k in st: print(_k, st[_k])
>>> for _k, _v in st.items(): print(_k, _v)
>>> for _k, _v in zip(st.keys(), st.values()): print(_k, _v)
>>> for _k, _v in zip(st, st.values()): print(_k, _v)
>>> if key in st:
...     assert getattr(st, key) is st[key]

Context manager

>>> with rt.Struct({'a': 1, 'b': 'fish'}) as st:
...     print(st.a)
1
>>> assert not hasattr(st, 'a')
property _A

Display all columns, all rows (up to 10,000), and long strings of a Dataset or Struct.

Without this property, columns are elided when the maximum display width is reached, rows are elided when there are more than 30 to display, and strings are truncated after 15 characters.

Returns:

A wrapper for display operations that don’t return a Dataset or Struct.

Return type:

DisplayString

See also

Struct._V

Display all rows of a Dataset or Struct.

Struct._H

Display all columns and long strings of a Dataset or Struct.

Struct._G

Display all columns of a Dataset or Struct, wrapping the table as needed.

Struct._T

Display a transposed view of a Dataset or Struct.

Examples

>>> ds = rt.Dataset({'col_'+str(i):rt.arange(31) for i in range(12)})
>>> ds[0] = 'long_string_long_string'

By default, columns are elided when the maximum display width is reached, rows are elided when there are more than 30 to display, and strings are truncated after 15 characters:

>>> ds
  #   col_0             col_1   col_2   col_3   col_4   col_5   ...   col_7   col_8   col_9   col_10   col_11
---   ---------------   -----   -----   -----   -----   -----   ---   -----   -----   -----   ------   ------
  0   long_string_lon       0       0       0       0       0   ...       0       0       0        0        0
  1   long_string_lon       1       1       1       1       1   ...       1       1       1        1        1
  2   long_string_lon       2       2       2       2       2   ...       2       2       2        2        2
  3   long_string_lon       3       3       3       3       3   ...       3       3       3        3        3
  4   long_string_lon       4       4       4       4       4   ...       4       4       4        4        4
  5   long_string_lon       5       5       5       5       5   ...       5       5       5        5        5
  6   long_string_lon       6       6       6       6       6   ...       6       6       6        6        6
  7   long_string_lon       7       7       7       7       7   ...       7       7       7        7        7
  8   long_string_lon       8       8       8       8       8   ...       8       8       8        8        8
  9   long_string_lon       9       9       9       9       9   ...       9       9       9        9        9
 10   long_string_lon      10      10      10      10      10   ...      10      10      10       10       10
 11   long_string_lon      11      11      11      11      11   ...      11      11      11       11       11
 12   long_string_lon      12      12      12      12      12   ...      12      12      12       12       12
 13   long_string_lon      13      13      13      13      13   ...      13      13      13       13       13
 14   long_string_lon      14      14      14      14      14   ...      14      14      14       14       14
...   ...                 ...     ...     ...     ...     ...   ...     ...     ...     ...      ...      ...
 16   long_string_lon      16      16      16      16      16   ...      16      16      16       16       16
 17   long_string_lon      17      17      17      17      17   ...      17      17      17       17       17
 18   long_string_lon      18      18      18      18      18   ...      18      18      18       18       18
 19   long_string_lon      19      19      19      19      19   ...      19      19      19       19       19
 20   long_string_lon      20      20      20      20      20   ...      20      20      20       20       20
 21   long_string_lon      21      21      21      21      21   ...      21      21      21       21       21
 22   long_string_lon      22      22      22      22      22   ...      22      22      22       22       22
 23   long_string_lon      23      23      23      23      23   ...      23      23      23       23       23
 24   long_string_lon      24      24      24      24      24   ...      24      24      24       24       24
 25   long_string_lon      25      25      25      25      25   ...      25      25      25       25       25
 26   long_string_lon      26      26      26      26      26   ...      26      26      26       26       26
 27   long_string_lon      27      27      27      27      27   ...      27      27      27       27       27
 28   long_string_lon      28      28      28      28      28   ...      28      28      28       28       28
 29   long_string_lon      29      29      29      29      29   ...      29      29      29       29       29
 30   long_string_lon      30      30      30      30      30   ...      30      30      30       30       30

Display all columns, rows, and long strings:

>>> ds._A
  #   col_0                     col_1   col_2   col_3   col_4   col_5   col_6   col_7   col_8   col_9   col_10   col_11
---   -----------------------   -----   -----   -----   -----   -----   -----   -----   -----   -----   ------   ------
  0   long_string_long_string       0       0       0       0       0       0       0       0       0        0        0
  1   long_string_long_string       1       1       1       1       1       1       1       1       1        1        1
  2   long_string_long_string       2       2       2       2       2       2       2       2       2        2        2
  3   long_string_long_string       3       3       3       3       3       3       3       3       3        3        3
  4   long_string_long_string       4       4       4       4       4       4       4       4       4        4        4
  5   long_string_long_string       5       5       5       5       5       5       5       5       5        5        5
  6   long_string_long_string       6       6       6       6       6       6       6       6       6        6        6
  7   long_string_long_string       7       7       7       7       7       7       7       7       7        7        7
  8   long_string_long_string       8       8       8       8       8       8       8       8       8        8        8
  9   long_string_long_string       9       9       9       9       9       9       9       9       9        9        9
 10   long_string_long_string      10      10      10      10      10      10      10      10      10       10       10
 11   long_string_long_string      11      11      11      11      11      11      11      11      11       11       11
 12   long_string_long_string      12      12      12      12      12      12      12      12      12       12       12
 13   long_string_long_string      13      13      13      13      13      13      13      13      13       13       13
 14   long_string_long_string      14      14      14      14      14      14      14      14      14       14       14
 15   long_string_long_string      15      15      15      15      15      15      15      15      15       15       15
 16   long_string_long_string      16      16      16      16      16      16      16      16      16       16       16
 17   long_string_long_string      17      17      17      17      17      17      17      17      17       17       17
 18   long_string_long_string      18      18      18      18      18      18      18      18      18       18       18
 19   long_string_long_string      19      19      19      19      19      19      19      19      19       19       19
 20   long_string_long_string      20      20      20      20      20      20      20      20      20       20       20
 21   long_string_long_string      21      21      21      21      21      21      21      21      21       21       21
 22   long_string_long_string      22      22      22      22      22      22      22      22      22       22       22
 23   long_string_long_string      23      23      23      23      23      23      23      23      23       23       23
 24   long_string_long_string      24      24      24      24      24      24      24      24      24       24       24
 25   long_string_long_string      25      25      25      25      25      25      25      25      25       25       25
 26   long_string_long_string      26      26      26      26      26      26      26      26      26       26       26
 27   long_string_long_string      27      27      27      27      27      27      27      27      27       27       27
 28   long_string_long_string      28      28      28      28      28      28      28      28      28       28       28
 29   long_string_long_string      29      29      29      29      29      29      29      29      29       29       29
 30   long_string_long_string      30      30      30      30      30      30      30      30      30       30       30
property _G

Display all columns of a Dataset or Struct, wrapping the table after the maximum display width is reached.

Note: The table is displayed as text, not HTML.

Return type:

None

See also

Struct._V

Display all rows of a Dataset or Struct.

Struct._H

Display all columns and long strings of a Dataset or Struct.

Struct._A

Display all columns, rows, and long strings of a Dataset or Struct.

Struct._T

Display a transposed view of a Dataset or Struct.

Examples

>>> ds = rt.Dataset(
...     {key: rt.FA([i, 2 * i, 3 * i, 4 * i]) % 3 == 0 for i, key in enumerate('abcdefghijklmno')}
... )

Default behavior:

>>> ds
#      a       b       c      d       e       f      g   ...      j       k       l      m       n       o
-   ----   -----   -----   ----   -----   -----   ----   ---   ----   -----   -----   ----   -----   -----
0   True   False   False   True   False   False   True   ...   True   False   False   True   False   False
1   True   False   False   True   False   False   True   ...   True   False   False   True   False   False
2   True    True    True   True    True    True   True   ...   True    True    True   True    True    True
3   True   False   False   True   False   False   True   ...   True   False   False   True   False   False

Show all columns, wrapping the table as needed:

>>> ds._G
#      a       b       c      d       e       f      g       h       i      j       k       l      m
-   ----   -----   -----   ----   -----   -----   ----   -----   -----   ----   -----   -----   ----
0   True   False   False   True   False   False   True   False   False   True   False   False   True
1   True   False   False   True   False   False   True   False   False   True   False   False   True
2   True    True    True   True    True    True   True    True    True   True    True    True   True
3   True   False   False   True   False   False   True   False   False   True   False   False   True

#       n       o
-   -----   -----
0   False   False
1   False   False
2    True    True
3   False   False
property _H

Display all columns and long strings of a Dataset or Struct.

Without this property, columns are elided when the maximum display width is reached, and strings are truncated after 15 characters.

Returns:

A wrapper for display operations that don’t return a Dataset or Struct.

Return type:

DisplayString

See also

Struct._V

Display all rows of a Dataset or Struct.

Struct._A

Display all columns, rows, and long strings of a Dataset or Struct.

Struct._G

Display all columns of a Dataset or Struct, wrapping the table as needed.

Struct._T

Display a transposed view of a Dataset or Struct.

Examples

By default, columns are elided when the maximum display width is reached, and strings are truncated after 15 characters.

>>> ds = rt.Dataset({key : rt.FA([i, 2*i, 3*i, 4*i])%3 == 0 for i, key in enumerate('abcdefghijklm')})
>>> ds[0] = rt.FA('long_string_long_string')
>>> ds
#   a                     b       c      d       e       f   ...       h       i      j       k       l      m
-   ---------------   -----   -----   ----   -----   -----   ---   -----   -----   ----   -----   -----   ----
0   long_string_lon   False   False   True   False   False   ...   False   False   True   False   False   True
1   long_string_lon   False   False   True   False   False   ...   False   False   True   False   False   True
2   long_string_lon    True    True   True    True    True   ...    True    True   True    True    True   True
3   long_string_lon   False   False   True   False   False   ...   False   False   True   False   False   True

Display all columns and long strings:

>>> ds._H
#   a                             b       c      d       e       f      g       h       i      j       k       l      m
-   -----------------------   -----   -----   ----   -----   -----   ----   -----   -----   ----   -----   -----   ----
0   long_string_long_string   False   False   True   False   False   True   False   False   True   False   False   True
1   long_string_long_string   False   False   True   False   False   True   False   False   True   False   False   True
2   long_string_long_string    True    True   True    True    True   True    True    True   True    True    True   True
3   long_string_long_string   False   False   True   False   False   True   False   False   True   False   False   True
property _T

Display a transposed view of the Dataset or Struct.

All columns are shown as rows and vice versa. Strings up to 32 characters are fully displayed.

Returns:

A wrapper for display operations that don’t return a Dataset or Struct.

Return type:

DisplayString

See also

Struct.dtranspose

Called by this method.

Struct._V

Show all rows of a Dataset or Struct.

Struct._H

Show all columns and long strings of a Dataset or Struct.

Struct._A

Show all columns, rows, and long strings of a Dataset or Struct.

Struct._G

Show all columns of a Dataset or Struct, wrapping the table as needed.

Examples

>>> ds = rt.Dataset({'a': [1, 2, 3], 'b' : ['longstring_longstring_longstring_longstring',
...                  'fish', 'david']})
>>> ds
#   a   b
-   -   ---------------
0   1   longstring_long
1   2   fish
2   3   david
>>> ds._T
Fields:                                 0     1      2
      a                                 1     2      3
      b  longstring_longstring_longstring  fish  david
property _V

Display all rows (up to 10,000) of a Dataset or Struct.

Without this property, rows are elided when there are more than 30 to display.

Returns:

A wrapper for display operations that don’t return a Dataset or Struct.

Return type:

DisplayString

See also

Struct._H

Display all columns and long strings of a Dataset or Struct.

Struct._A

Display all columns, rows, and long strings of a Dataset or Struct.

Struct._G

Display all columns of a Dataset or Struct, wrapping the table as needed.

Struct._T

Display a transposed view of a Dataset or Struct.

Examples

By default, rows are elided when there are more than 30 to display.

>>> ds = rt.Dataset({'a' : rt.arange(31)})
>>> ds
  #     a
---   ---
  0     0
  1     1
  2     2
  3     3
  4     4
  5     5
  6     6
  7     7
  8     8
  9     9
 10    10
 11    11
 12    12
 13    13
 14    14
...   ...
 16    16
 17    17
 18    18
 19    19
 20    20
 21    21
 22    22
 23    23
 24    24
 25    25
 26    26
 27    27
 28    28
 29    29
 30    30

Display all rows:

>>> ds._V
  #     a
---   ---
  0     0
  1     1
  2     2
  3     3
  4     4
  5     5
  6     6
  7     7
  8     8
  9     9
 10    10
 11    11
 12    12
 13    13
 14    14
 15    15
 16    16
 17    17
 18    18
 19    19
 20    20
 21    21
 22    22
 23    23
 24    24
 25    25
 26    26
 27    27
 28    28
 29    29
 30    30
property _row_numbers

Subclasses can define their own callback function to customize the left side of the table. If not defined, normal row numbers will be displayed.

Parameters:
  • arr (array) – Fancy index array of row numbers

  • style (ColumnStyle) – Default style object for final row numbers column.

Returns:

  • header (string)

  • label_array (ndarray)

  • style (ColumnStyle)

property _sort_columns

Subclasses can define their own callback function to return the columns they were sorted by, along with styles. The callback function will receive a trimmed fancy index (based on the sort index) and return a dictionary of column headers -> (masked_array, ColumnStyle objects). These columns will be moved to the left side of the table (but to the right of row labels, groupby keys, row numbers, etc.).

property _styles

Subclasses can return a callback function that takes no arguments. Returns a dictionary of column names -> ColumnStyle objects.

property doc

The descriptive documentation object for the structure.

Type:

rt_meta.Doc

property footers

Returns the footer attributes.

For example, Accum2 and AccumTable objects can have footers.

property has_nested_containers: bool

True if the Struct contains other Struct-like objects.

Type:

bool

property shape

Return the number of rows and columns.

Returns:

Number of rows, columns.

Return type:

tuple of int

See also

riptable.reshape

Return an array containing the same data with a new shape.

FastArray.reshape

Return an array containing the same data with a new shape.

Examples

>>> ds = rt.Dataset({"one": rt.arange(3), "two": rt.arange(3) % 2})
>>> ds
#   one   two
-   ---   ---
0     0     0
1     1     1
2     2     0
>>> ds.shape
(3, 2)
property total_sizes: Tuple[int, int]

The total physical and logical size of all (columnar) data in bytes within this Struct.

Returns:

  • total_physical_size (int) – The total size, in bytes, of all columnar data in this instance, not counting any duplicate/alias object instances.

  • total_logical_size (int) – The total size, in bytes, of all columnar data in this instance, including duplicate/alias object instances. This value is always at least as large as total_physical_size.
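The difference between the two totals can be sketched without riptable. The example below is an illustration under stated assumptions, not the library's code: the function `total_sizes` is a hypothetical stand-in, columns are modeled with the standard-library `array` type, and aliases are detected by object identity.

```python
from array import array


def total_sizes(columns):
    """Hypothetical sketch: return (physical, logical) byte totals for a dict of columns."""
    seen_ids = set()
    physical = 0
    logical = 0
    for col in columns.values():
        nbytes = len(col) * col.itemsize
        # The logical size counts every column, including aliases.
        logical += nbytes
        # The physical size counts each distinct object only once.
        if id(col) not in seen_ids:
            seen_ids.add(id(col))
            physical += nbytes
    return physical, logical


prices = array("d", [1.0, 2.0, 3.0])
# "bid" and "ask" refer to the same underlying object; "qty" is separate.
cols = {"bid": prices, "ask": prices, "qty": array("d", [5.0, 6.0])}
physical, logical = total_sizes(cols)
print(physical, logical)
```

Because aliased columns are counted once in the physical total and every time in the logical total, the logical total can never be smaller.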

AllNames = False

True if any column name is permitted without renaming.

Type:

bool

AllowAnyName = True

True if any column name is permitted, but it will be renamed.

Type:

bool

UseFastArray = True

True if np.ndarray values are converted to FastArray on initialization.

Type:

bool

WarnOnInvalidNames = False

True if a warning is generated on invalid names.

Type:

bool

_lastrepr = 0
_lastreprhtml = 0
_restricted_names
_summary_len = 3
col_delete
__bool__()[source]
__contains__(item)[source]
__delattr__(name)[source]

Implement delattr(self, name).

__delitem__(name)[source]
__dir__()[source]

Default dir() implementation.

__enter__()[source]
__eq__(lhs)[source]

Return self==value.

__exit__(exc_type, exc_val, exc_tb)[source]
__ge__(lhs)[source]

Return self>=value.

__getattr__(name)[source]
__getitem__(index)[source]
Parameters:

index (colspec) – A column name, a list of column names or integer positions, a slice, or a Boolean vector.

Returns:

The indexed item(s), that is, ‘column(s)’. If index resolves to multiple ‘cols’ then another ‘Struct’ will be returned with those items as a shallow copy.

Return type:

result

__gt__(lhs)[source]

Return self>value.

__iter__()[source]
__le__(lhs)[source]

Return self<=value.

__len__()[source]
__lt__(lhs)[source]

Return self<value.

__ne__(lhs)[source]

Return self!=value.

__repr__()[source]

Return repr(self).

__reversed__()[source]
__setattr__(name, value)[source]

Implement setattr(self, name, value).

__setitem__(index, value)[source]
Parameters:
  • index – colspec

  • value – May be any type

Returns:

None

__str__()[source]

Return str(self).

_addnewitem(name, value)[source]
_addnewitem_allnames(name, value)[source]
_aggregate_column_matches(items=None, like=None, regex=None, on_missing='raise', func=None)[source]

Aggregate the matches returned by the methods defined for items, like, and regex, and return them in order.

At least one of items, like, or regex must be specified.

Parameters:
  • items (str, int, or iterable of str or int, optional) – Names or indices of columns to be removed. An iterable can contain both string and int values.

  • like (str, optional) – Substring to match in column names.

  • regex (str, optional) – Regular expression string to match in column names.

  • on_missing ({"raise", "warn", "ignore"}, default "raise") –

    Governs how to handle a column in items that doesn’t exist:

    • "raise" (default): Raises an IndexError. No columns are returned.

    • "warn": Issues a warning. All columns in items that do exist are included in match list.

    • "ignore": No error or warning. All columns in items that do exist are included in match list.

  • func (callable, optional) – Method calling _aggregate_column_matches. Used to enrich exception / log messages.

Returns:

A list of strings corresponding to column name matches.

Return type:

list of str
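The matching behavior described above can be sketched in plain Python. This is a hedged illustration rather than riptable's implementation: the free function `aggregate_column_matches` and its `columns` argument are hypothetical, "in order" is assumed to mean the order of the columns in the container, and the regular expression is assumed to be applied with `re.search`.

```python
import re
import warnings


def aggregate_column_matches(columns, items=None, like=None, regex=None, on_missing="raise"):
    """Hypothetical sketch: collect matching names from `columns`, in column order."""
    if items is None and like is None and regex is None:
        raise ValueError("At least one of items, like, or regex must be specified.")
    if on_missing not in ("raise", "warn", "ignore"):
        raise ValueError(f"Invalid on_missing value: {on_missing!r}")

    matched = set()
    if items is not None:
        if isinstance(items, (str, int)):
            items = [items]
        for item in items:
            if isinstance(item, int):
                # Integer items are treated as positions in the column list.
                name = columns[item] if -len(columns) <= item < len(columns) else None
            else:
                name = item if item in columns else None
            if name is None:
                if on_missing == "raise":
                    raise IndexError(f"Column {item!r} does not exist.")
                if on_missing == "warn":
                    warnings.warn(f"Column {item!r} does not exist.")
                continue
            matched.add(name)
    if like is not None:
        matched.update(name for name in columns if like in name)
    if regex is not None:
        pattern = re.compile(regex)
        matched.update(name for name in columns if pattern.search(name))

    # Report matches in the order the columns appear (assumed meaning of "in order").
    return [name for name in columns if name in matched]


column_names = ["bid", "ask", "bid_size", "ask_size", "ts"]
print(aggregate_column_matches(column_names, items=["ts", 0], like="size"))
```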

classmethod _align_array_info(allinfo, maxwidths)[source]
classmethod _array_info_list(arrinfo)[source]

Build list of info for single array. Used for all arrays in a container or a single array stored in single SDS file.

Returns [‘FA’, ‘shape’, ‘dtype name’, ‘i+itemsize’].

classmethod _array_summary(data, name=None)[source]
Parameters:
  • data – Tuple of array info from CompressionType.Info:

    • tup1 (tuple): shape

    • tup2 (int): dtype.num

    • tup3 (int): bitmask for numpy flags

    • tup4 (int): itemsize

  • name – Optional name for top-level Struct.

Internal routine for building a tree from a meta summary (info only, no arrays).

Returns:

String of array info for a single struct.

classmethod _array_summary_single(arrinfo)[source]
_as_dictionary(copy=False, rows=None, cols=None)[source]

Return a dictionary of numpy arrays.

_as_meta_data(name=None, nested=True)[source]
_autocomplete()[source]
_build_sds_meta_data(name=None, nesting=True, **kwargs)[source]

Final SDS file will be laid out as follows:

  • header

  • meta data string (json, includes scalars)

  • arrays

  • special arrays

  • meta tuples [tuple(item name, SDSFlags) for all items]

Nested data structures will generate their own SDS files.

_check_addtype(name, value)[source]

Override to check types.

_copy(deep=False, cls=None)[source]
Parameters:
  • deep (bool, default False) – If True, perform a deep copy, calling each object depth first with .copy(True). If False, a shallow .copy(False) is called, often just copying the container's dict. The first argument must be deep. It cannot be set to None; it must be True or False.

  • cls (type, optional) – Class of return type, for subclass super() calls.

_copy_base(from_Struct)[source]

This copies the underlying special variables but does not copy _all_items or _uniqueid or any of the ‘columns’.

Parameters:

from_Struct – the Struct being copied

Returns:

is_locked() (must unlock/relock around rest of copy)

_copy_from_dict(source_dict, copy=False, rows=None, cols=None)[source]
_deleteitem(name)[source]
_ensure_atomic(colnames, func)[source]

Only proceed with certain operations if all columns exist in table. Pass in the function for a more informative error.
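The all-or-nothing check described above might look like the following in plain Python. This is a sketch under assumptions: `ensure_atomic` and `drop_columns` are hypothetical free functions operating on a dict, and the use of KeyError is a choice made for the illustration, not a statement about which exception riptable raises.

```python
def ensure_atomic(table, colnames, func):
    """Hypothetical sketch: refuse to proceed unless every requested column exists."""
    missing = [name for name in colnames if name not in table]
    if missing:
        # Naming the calling function makes the error more informative.
        raise KeyError(f"{func.__name__}: columns not found: {missing}")


def drop_columns(table, colnames):
    """Hypothetical caller: validates first, so a failure leaves the table untouched."""
    ensure_atomic(table, colnames, drop_columns)
    for name in colnames:
        del table[name]


table = {"a": 1, "b": 2, "c": 3}
try:
    drop_columns(table, ["a", "zzz"])
except KeyError as exc:
    print("rejected:", exc)
print(sorted(table))
```

Validating before mutating means a partially valid request changes nothing, which is the point of the check.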

_escape_invalid_file_chars(name)[source]

Certain characters will cause problems in item names if a Struct needs to name an SDS file: (‘\’, ‘:’, ‘<’, ‘>’, ‘!’, ‘|’, ‘*’, ‘?’).

_extract_indexing(index)[source]

Internal method common to get/set item.

Parameters:

index – (rowspec, colspec) or colspec (=> rowspec of :)

Returns:

  • col_idx

  • row_idx

  • ncols

  • nrows

  • row_arg

Note: Any column names will be converted to str (from bytes or numpy.str_).

classmethod _flatten_undo(sep, startpos, startname, obj_array, meta=None, cutoffs=None)[source]

Internal routine.

classmethod _from_meta_data(itemdict, itemflags, meta)[source]
classmethod _from_sds_onefile(arrs, meta_tups, meta=None, folders=None)[source]

Special routine called after loading an SDS onefile to re-expand

_get_count_for_slice(idx, for_rows)[source]
_get_final_display_mode(plain=False)[source]
static _get_seq(map, protected, start)[source]
_index_from_row_labels(fld)[source]

Use this if row index was a string or tuple. Will only be applied to the Dataset’s label columns (if it has any).

classmethod _info_tree(path, data)[source]

Converts nested structure to tree view of file info for Struct and Dataset. Top level will be named based on single file or directory.

_init_from_dict(dictionary)[source]
_ipython_key_completions_()[source]
_last_row_stats()[source]
classmethod _load_from_sds_meta_data(meta, arrays, meta_tups=[], file_header={}, include=None)[source]

Iterates over sections of the meta data object to rebuild a data structure.

Arrays will be in the following order:

  • Main arrays (or underlying FastArrays for subclasses)

  • Secondary arrays for FastArray subclasses that require additional contiguous data (e.g. Categorical)

  • Array of fancy indices to sort by

A dictionary will be constructed. All arrays will be inserted by name from ‘item_names’ in meta object. All (if any) meta data will be read from ‘item_meta’ in meta object, and FastArray subclasses will be constructed. The container object will be constructed from the dictionary. Any labels (gbkeys) will be set. If sorted column names exist, they will be set, and the sorted index will be added to the SortCache.

Parameters:
  • meta – riptable MetaData object (see Utils/rt_metadata.py).

  • arrays – List of numpy arrays from an expanded SDS file.

Returns:

For now, Struct, Dataset, and Multiset all use this parent method.

Return type:

Struct, Dataset, or Multiset

classmethod _load_from_sds_meta_data_nested(name, meta, arrdict)[source]
classmethod _load_without_meta_data(meta, arrays, meta_tups, file_header=None)[source]

Loads from meta tuples only (e.g. when no metadata is generated by Matlab)

_lock()[source]
_mask_get_item(idx, by_col_arg=True)[source]

_mask_get_item applies a mask to a row or a column

Parameters:
  • idx – the argument from the get/set-item [] brackets

  • by_col_arg – is this a column mask (instead of row mask)

Returns:

list of actual indexes or None
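Turning a Boolean mask into a list of positions can be sketched as follows. This is an assumption-heavy illustration: `mask_to_indices` is a hypothetical free function, and returning None when the argument is not a Boolean mask of the expected length is a guess at when the "or None" case applies.

```python
def mask_to_indices(idx, length):
    """Hypothetical sketch: convert a Boolean mask into a list of selected positions."""
    is_mask = (
        isinstance(idx, (list, tuple))
        and len(idx) == length
        and all(isinstance(flag, bool) for flag in idx)
    )
    if not is_mask:
        # Assumed meaning of the None return: the argument is not a usable mask.
        return None
    return [pos for pos, flag in enumerate(idx) if flag]


print(mask_to_indices([True, False, True], 3))
print(mask_to_indices([0, 2], 3))
```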

_meta_dict(name=None)[source]
_post_init()[source]

Call self._run_once() to clean up or initialize anything else. Override _run_once() in subclasses if needed.

Returns:

None

_pre_init()[source]
_prepare_display_data()[source]

Returns a list of lists (all column data) and a list of header tuples for display.

Returns:

list(list), list(tuple)

_replaceitem(name, value)[source]
_replaceitem_allnames(name, value)[source]
_repr_html_()[source]
_run_once()[source]

Other classes may override _run_once to initialize data; see _post_init().

Returns:

None

_safe_reordering_of_renames(orig_dict)[source]
classmethod _scalar_summary(scalar_tup)[source]

Scalars are stored as arrays in SDS, but a flag is set in the meta tuple. They will be labeled as scalar and their dtype will be displayed.

classmethod _serialize_item(item, itemname)[source]

Returns:

  • a dict of {name: array}

  • a matching list of ints, which are the array flags

  • a metastring, if it exists

static _sizeof_fmt(num, suffix='B')[source]
_sort_column_styles(style)[source]

Callback to return sort-by columns.

Parameters:

style – Default sort style from display.

Returns:

Dictionary of column name -> tuple(array, ColumnStyle). These columns will be moved to the left of the table.

_struct_compare_check(func_name, lhs)[source]

Returns a Struct consisting of the union of key names with value self.X == self.Y. If a key is missing from one or the other, it will have value False. If any comparison fails (raises an exception), the value will be False. If a comparison's value Y cannot be cast to bool, Y.all() and all(Y) will be attempted.

Parameters:
  • func_name – comparison function name (e.g., ‘__eq__’)

  • lhs

Returns:

Struct of bools
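The comparison rules described above can be illustrated with ordinary dictionaries. The sketch below is not riptable's code: `struct_compare` and `_as_bool` are hypothetical helpers, dicts stand in for Structs, and the key order of the result (left keys first, then keys found only on the right) is an assumption.

```python
import operator


def _as_bool(outcome):
    """Cast a comparison result to bool, falling back to .all() or all()."""
    try:
        return bool(outcome)
    except (TypeError, ValueError):
        reducer = getattr(outcome, "all", None)
        return bool(reducer()) if callable(reducer) else bool(all(outcome))


def struct_compare(left, right, op=operator.eq):
    """Hypothetical sketch: compare two mappings key by key over the union of keys."""
    result = {}
    keys = list(left) + [key for key in right if key not in left]
    for key in keys:
        if key not in left or key not in right:
            # A key missing from either side compares as False.
            result[key] = False
            continue
        try:
            result[key] = _as_bool(op(left[key], right[key]))
        except Exception:
            # Any failed comparison also yields False.
            result[key] = False
    return result


left = {"a": 1, "b": [1, 2], "c": "x"}
right = {"a": 1, "b": [1, 3], "d": "y"}
print(struct_compare(left, right))
```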

_superadditem(name, value)[source]
_temp_display(option, value)[source]

Temporarily modify a display option when generating dataset display. User configured option (or default) will be restored after display string is generated.
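The set-then-restore pattern described above is commonly written as a context manager. The sketch below assumes that shape purely for illustration; whether riptable's method is a context manager is not stated here. The `display_options` dictionary and its option names are hypothetical, with default values taken from the display limits mentioned earlier (30 rows, 15 characters).

```python
from contextlib import contextmanager

# Hypothetical option store; the names are illustrative, not riptable's.
display_options = {"max_rows": 30, "max_string_width": 15}


@contextmanager
def temp_display(option, value):
    """Hypothetical sketch: override one display option, then restore it."""
    previous = display_options[option]
    display_options[option] = value
    try:
        yield
    finally:
        # Restore the previous setting even if rendering raised an error.
        display_options[option] = previous


with temp_display("max_rows", 10_000):
    inside = display_options["max_rows"]
after = display_options["max_rows"]
print(inside, after)
```

Restoring in a `finally` clause keeps the user's configuration intact when generating the display string fails.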

classmethod _tree_from_sds_meta_data(meta, arrays, meta_tups, file_header)[source]

SDS loads in info mode (no data loaded, just metadata + file header information)

Returns:

Tree display of nested structures in SDS directory.

Return type:

str

_unlock()[source]
_update_sort(name)[source]

Discard sort index if sortby item was removed or replaced.

_validate_names(names)[source]
all()[source]

For use in boolean contexts: Is it true that for all elements (val) either:

  1. val casts to True, or

  2. returns True for val.all() or all(val)

Return type:

bool

any()[source]

For use in boolean contexts: Does there exist an element (val) which either:

  1. val casts to True, or

  2. returns True for val.any() or any(val)

Return type:

bool

Examples

>>> s=rt.Struct()
>>> s.a=rt.Dataset()
>>> s.any()
False
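The element rules given for all() and any() can be read literally as in the sketch below. It is an illustration only: `struct_all`, `struct_any`, and `_truthy` are hypothetical helpers, a plain dict stands in for the Struct, and the order in which the checks are tried is an assumption.

```python
def _truthy(val, method_name, builtin):
    """Apply the rule: val casts to True, or val.<method>() / builtin(val) is True."""
    try:
        if bool(val):
            return True
    except (TypeError, ValueError):
        pass  # Truth value could not be determined directly.
    method = getattr(val, method_name, None)
    if callable(method):
        return bool(method())
    try:
        return bool(builtin(val))
    except TypeError:
        # Not iterable, so the builtin reduction does not apply.
        return False


def struct_all(items):
    """Hypothetical sketch of all(): every element satisfies the rule."""
    return all(_truthy(val, "all", all) for val in items.values())


def struct_any(items):
    """Hypothetical sketch of any(): at least one element satisfies the rule."""
    return any(_truthy(val, "any", any) for val in items.values())


print(struct_all({"a": 1, "b": "fish"}))
print(struct_any({"a": 0, "b": []}))
```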
apply_schema(schema)[source]

Apply a schema containing descriptive information recursively to the Struct.

Parameters:

schema (dict) – A dictionary of schema information. See rt_meta.apply_schema() for more information on the format of the dictionary.

Returns:

res – Dictionary of deviations from the schema

Return type:

dict

as_ordered_dictionary(sublist=None)[source]

Returns contents of Struct as a collections.OrderedDict instance.

Parameters:

sublist (list of str) – Optional list restricting columns to return.

Return type:

OrderedDict

asdict(sublist=None, copy=False)[source]

Return contents of Struct as a dictionary.

Parameters:
  • sublist (list of str) – Optional list restricting columns to return.

  • copy (bool) – If set to True then copy() will be called on columns where appropriate.

Return type:

dict

Examples

This is useful if, for whatever reason, a riptable Dataset needs to go into a pandas DataFrame:

>>> ds = rt.Dataset({'col_'+str(i): rt.arange(5) for i in range(5)})
>>> df = pd.DataFrame(ds.asdict())
>>> df
   col_0  col_1  col_2  col_3  col_4
0      0      0      0      0      0
1      1      1      1      1      1
2      2      2      2      2      2
3      3      3      3      3      3
4      4      4      4      4      4

Certain items can be requested with the sublist keyword:

>>> ds.asdict(sublist=['col_1','col_3'])
{'col_1': FastArray([0, 1, 2, 3, 4]), 'col_3': FastArray([0, 1, 2, 3, 4])}
col_add_prefix(prefix)[source]

Add the same prefix to all items in the Struct/Dataset.

Rather than renaming the columns in a col_rename loop - which would have to rebuild the underlying dictionary N times, this clears the original dictionary, and rebuilds a new one once. Label columns and sortby columns will also be fixed to match the new names.

Parameters:

prefix (str) – String to add before each item name.

Return type:

None

Examples

>>> #TODO Need to call np.random.seed(12345) first to ensure example runs deterministically
>>> ds = rt.Dataset({'col_'+str(i):np.random.rand(5) for i in range(5)})
>>> ds.col_add_prefix('NEW_')
>>> ds
#   NEW_col_0   NEW_col_1   NEW_col_2   NEW_col_3   NEW_col_4
-   ---------   ---------   ---------   ---------   ---------
0        0.70        0.52        0.07        0.81        0.26
1        0.13        0.43        0.01        0.46        0.45
2        0.34        0.24        0.87        0.81        0.80
3        0.63        0.22        0.85        0.60        0.91
4        0.46        0.70        0.02        0.49        0.34
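
The single-rebuild idea described for col_add_prefix can be illustrated with a plain dict. This is a minimal sketch of the approach, not riptable's implementation; add_prefix_once is a hypothetical helper name, and label/sortby bookkeeping is left out.

```python
def add_prefix_once(columns, prefix):
    """Return a new dict whose keys all carry the prefix, built in one pass."""
    # A single comprehension rebuilds the mapping once and keeps insertion
    # order, instead of renaming N keys one at a time.
    return {prefix + name: value for name, value in columns.items()}


columns = {"col_0": [1, 2], "col_1": [3, 4], "col_2": [5, 6]}
renamed = add_prefix_once(columns, "NEW_")
print(list(renamed))
```

The same pattern with name + suffix covers the suffix case.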
col_add_suffix(suffix)[source]

Add the same suffix to all items in the Struct/Dataset.

Rather than renaming the columns in a col_rename loop - which would have to rebuild the underlying dictionary N times, this clears the original dictionary, and rebuilds a new one once. Label columns and sortby columns will also be fixed to match the new names.

Parameters:

suffix (str) – String to add after each item name.

Return type:

None

Examples

>>> #TODO Need to call np.random.seed(12345) first to ensure example runs deterministically
>>> ds = rt.Dataset({'col_'+str(i):np.random.rand(5) for i in range(5)})
>>> ds.col_add_suffix('_NEW')
>>> ds
#   col_0_NEW   col_1_NEW   col_2_NEW   col_3_NEW   col_4_NEW
-   ---------   ---------   ---------   ---------   ---------
0        0.70        0.52        0.07        0.81        0.26
1        0.13        0.43        0.01        0.46        0.45
2        0.34        0.24        0.87        0.81        0.80
3        0.63        0.22        0.85        0.60        0.91
4        0.46        0.70        0.02        0.49        0.34
col_exists(name)[source]

Return True if the column name already exists

col_filter(items=None, like=None, regex=None, on_missing='raise')[source]

Return the columns specified by indices or matches on column names.

Note that this method doesn’t filter a Dataset or Struct on its contents, only on the column index or name.

At least one of items, like, or regex must be specified.

Parameters:
  • items (str, int, or iterable of str or int, optional) – Names or indices of columns to be selected. An iterable can contain both string and int values.

  • like (str, optional) – Substring to match in column names.

  • regex (str, optional) – Regular expression string to match in column names.

  • on_missing ({"raise", "warn", "ignore"}, default "raise") –

    Governs how to handle a column in items that doesn’t exist:

    • "raise" (default): Raises an IndexError. Nothing is returned.

    • "warn": Issues a warning. Any columns in items that do exist are returned.

    • "ignore": No error or warning. Any columns in items that do exist are returned.

Returns:

Same type as the input object.

Return type:

Dataset or Struct

Examples

Select columns by name:

>>> ds = rt.Dataset({"one": rt.arange(3), "two": rt.arange(3) % 2, "three": rt.arange(3) % 3})
>>> ds
#   one   two   three
-   ---   ---   -----
0     0     0       0
1     1     1       1
2     2     0       2
>>> ds.col_filter(items=["one", "three"])
#   one   three
-   ---   -----
0     0       0
1     1       1
2     2       2

Select columns by index:

>>> ds.col_filter(items=[0, 1])
#   one   two
-   ---   ---
0     0     0
1     1     1
2     2     0

Select columns by substring:

>>> ds.col_filter(like="thr")
#   three
-   -----
0       0
1       1
2       2

Select columns by regular expression:

>>> ds.col_filter(regex="e$")
#   one   three
-   ---   -----
0     0       0
1     1       1
2     2       2

Select Dataset and FastArray objects from a Struct:

>>> ds2 = rt.Dataset({"four": rt.arange(3), "five": rt.arange(3) % 2})
>>> fa = rt.FastArray([1, 2, 3])
>>> s = rt.Struct()
>>> s.ds = ds
>>> s.ds2 = ds2
>>> s.fa = fa
>>> s.col_filter([0])
#   Name   Type      Size              0   1   2
-   ----   -------   ---------------   -   -   -
0   ds     Dataset   3 rows x 3 cols
>>> s.col_filter("fa")
#   Name   Type    Size   0   1   2
-   ----   -----   ----   -   -   -
0   fa     int32   3      1   2   3
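
The name-selection rules of col_filter can be sketched on a plain list of column names. This is an illustration under assumptions, not riptable's code: select_names is a hypothetical helper, only one selector is handled per call, and regex is assumed to use search semantics (consistent with the regex="e$" example above).

```python
import re
import warnings


def select_names(names, items=None, like=None, regex=None, on_missing="raise"):
    """Return the column names picked by items, like, or regex."""
    if items is None and like is None and regex is None:
        raise ValueError("At least one of items, like, or regex must be specified.")
    if items is not None:
        if isinstance(items, (str, int)):
            items = [items]
        selected = []
        for item in items:
            if isinstance(item, int) and -len(names) <= item < len(names):
                selected.append(names[item])  # positional index
            elif item in names:
                selected.append(item)  # column name
            elif on_missing == "raise":
                raise IndexError(f"Column {item!r} does not exist.")
            elif on_missing == "warn":
                warnings.warn(f"Column {item!r} does not exist.")
            # "ignore": skip the missing item silently
        return selected
    if like is not None:
        return [name for name in names if like in name]
    return [name for name in names if re.search(regex, name)]


names = ["one", "two", "three"]
by_name = select_names(names, items=["one", "three"])
by_index = select_names(names, items=[0, 1])
by_like = select_names(names, like="thr")
by_regex = select_names(names, regex="e$")
ignored = select_names(names, items=["one", "missing"], on_missing="ignore")
```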
col_get_attribute(name, attrib_name, default=None)[source]

Gets an attribute of the specified column; attrib_name indicates which attribute to get.

Parameters:
  • name (str) – The name of the column

  • attrib_name (str) – The name of the attribute

  • default – Default value returned when attribute not found.

Examples

>>> ds.col_set_attribute('col1', 'TEST', 417)
>>> ds.col_get_attribute('col1', 'TEST')
417
>>> ds.col_get_attribute('col1', 'TEST', np.nan)
417
>>> ds.col_get_attribute('col1', 'DOESNOTEXIST', np.nan)
nan
col_get_len()[source]

Gets the number of columns (or items) in the Struct

col_get_value(name)[source]

Return a single item.

Parameters:

name (string) – Item name.

Returns:

Item from item container (no attribute).

Return type:

obj

Raises:

KeyError – Item not found with given name.

col_map(rename_dict)[source]

Rename columns and re-arrange names of columns based on the rules set forth in the supplied dictionary.

Parameters:

rename_dict (dict) – Dictionary defining a remapping of (some/all) column names.

Return type:

None

Examples

>>> #TODO Call np.random.seed(12345) here to make the example output deterministic
>>> ds = rt.Dataset({'col_'+str(i): np.random.rand(5) for i in range(5)})
>>> ds.col_map({'col_1':'AAA', 'col_2':'BBB'})
>>> ds
#   col_0    AAA    BBB   col_3   col_4
-   -----   ----   ----   -----   -----
0    0.55   0.21   0.27    0.85    0.03
1    0.77   0.75   0.65    0.97    0.24
2    0.09   0.07   0.40    0.81    0.62
3    0.50   0.93   0.98    0.99    0.99
4    0.40   0.45   0.53    0.49    0.76
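
A rename map of this kind can be sketched on a plain dict. This is a hypothetical helper, not riptable's implementation; in particular, rejecting duplicate result names with a ValueError is an assumption of the sketch.

```python
def remap_names(columns, rename_dict):
    """Return a new dict with keys renamed per rename_dict, keeping column order."""
    # Names absent from rename_dict are kept unchanged.
    new_names = [rename_dict.get(name, name) for name in columns]
    if len(set(new_names)) != len(new_names):
        raise ValueError("Renaming would produce duplicate column names.")
    return dict(zip(new_names, columns.values()))


columns = {"col_0": 0, "col_1": 1, "col_2": 2}
remapped = remap_names(columns, {"col_1": "AAA", "col_2": "BBB"})
# Because all new names are computed before anything is rebuilt,
# two names can trade places in a single call.
swapped = remap_names({"a": 1, "b": 2}, {"a": "b", "b": "a"})
```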
col_move(flist, blist)[source]

Move single columns or groups of columns to the front and back of the list for iteration/indexing/display. Values of columns will remain unchanged.

Parameters:
  • flist (list of str) – Item names to move to front.

  • blist (list of str) – Item names to move to back.
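
The resulting column order can be sketched on a plain list of names. The helper reorder is hypothetical and only models the ordering; it is not how riptable stores or moves its columns.

```python
def reorder(names, front=(), back=()):
    """Return names with `front` items first, `back` items last, the rest in place."""
    front, back = list(front), list(back)
    moved = set(front) | set(back)
    # Columns that are not moved keep their relative order.
    middle = [name for name in names if name not in moved]
    return front + middle + back


names = ["col_0", "col_1", "col_2", "col_3", "col_4"]
to_back = reorder(names, back=["col_2", "col_0"])
to_front = reorder(names, front=["col_4", "col_2"])
```

The two results agree with the orders shown in the col_move_to_back and col_move_to_front examples.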

col_move_to_back(cols)[source]

Move single column or group of columns to back of list for iteration/indexing/display.

Values of columns will remain unchanged.

Parameters:

cols (list of str) – Item names to move to back.

Examples

>>> #TODO Call np.random.seed(12345) here to make the example output deterministic
>>> ds = rt.Dataset({'col_'+str(i): np.random.rand(5) for i in range(5)})
>>> ds
#   col_0   col_1   col_2   col_3   col_4
-   -----   -----   -----   -----   -----
0    0.28    0.84    0.24    0.72    0.81
1    0.72    0.44    0.41    0.53    0.17
2    0.37    0.66    0.61    0.52    0.50
3    0.08    0.31    0.15    0.65    0.98
4    0.63    0.89    0.25    0.13    0.16
>>> ds.col_move_to_back(['col_2','col_0'])
>>> ds
#   col_1   col_3   col_4   col_2   col_0
-   -----   -----   -----   -----   -----
0    0.84    0.72    0.81    0.24    0.28
1    0.44    0.53    0.17    0.41    0.72
2    0.66    0.52    0.50    0.61    0.37
3    0.31    0.65    0.98    0.15    0.08
4    0.89    0.13    0.16    0.25    0.63
col_move_to_front(cols)[source]

Move single column or group of columns to front of list for iteration/indexing/display. Values of columns will remain unchanged.

Parameters:

cols (list of str) – Item names to move to front.

Examples

>>> #TODO Call np.random.seed(12345) here to make the example output deterministic
>>> ds = rt.Dataset({'col_'+str(i): np.random.rand(5) for i in range(5)})
>>> ds
#   col_0   col_1   col_2   col_3   col_4
-   -----   -----   -----   -----   -----
0    0.60    0.50    0.77    0.72    0.73
1    0.48    0.65    0.96    0.17    0.99
2    0.06    0.54    0.81    0.20    0.30
3    0.18    0.85    0.24    0.44    0.38
4    0.04    0.84    0.64    0.66    0.97
>>> ds.col_move_to_front(['col_4', 'col_2'])
>>> ds
#   col_4   col_2   col_0   col_1   col_3
-   -----   -----   -----   -----   -----
0    0.73    0.77    0.60    0.50    0.72
1    0.99    0.96    0.48    0.65    0.17
2    0.30    0.81    0.06    0.54    0.20
3    0.38    0.24    0.18    0.85    0.44
4    0.97    0.64    0.04    0.84    0.66
col_pop(colspec)[source]

Remove and return the item(s) selected by colspec, which is interpreted as for [] (getitem). A list input returns a sub-Struct and removes its items from the current object. A single-column input (a string or a single integer) returns a single "column".

Parameters:

colspec (list, string, or integer) –

Returns:

Single value or new (same-type) object containing the removed data.

Return type:

obj

Examples

>>> ds = rt.Dataset({'col_'+str(i): rt.arange(5) for i in range(3)})
>>> ds
#   col_0   col_1   col_2
-   -----   -----   -----
0       0       0       0
1       1       1       1
2       2       2       2
3       3       3       3
4       4       4       4
>>> col = ds.col_pop('col_1')
>>> ds
#   col_0   col_2
-   -----   -----
0       0       0
1       1       1
2       2       2
3       3       3
4       4       4
>>> col
FastArray([0, 1, 2, 3, 4])
col_remove(items=None, like=None, regex=None, on_missing='warn')[source]

Remove the columns specified by indices or matches on column names.

This can be done only if the Dataset or Struct is unlocked.

At least one of items, like, or regex must be specified.

Parameters:
  • items (str, int, or iterable of str or int, optional) – Names or indices of columns to be removed. An iterable can contain both string and int values.

  • like (str, optional) – Substring to match in column names.

  • regex (str, optional) – Regular expression string to match in column names.

  • on_missing ({"warn", "raise", "ignore"}, default "warn") –

    Governs how to handle a column in items that doesn’t exist:

    • "warn" (default): Issues a warning. All columns in items that do exist are removed.

      Changed in version 1.13.0: Previously, the default value was "raise".

    • "raise": Raises an IndexError. No columns are removed.

    • "ignore": No error or warning. All columns in items that do exist are removed.

Return type:

None

Examples

>>> ds = rt.Dataset({"col_" + str(i): rt.arange(5) for i in range(5)})
>>> ds
#   col_0   col_1   col_2   col_3   col_4
-   -----   -----   -----   -----   -----
0       0       0       0       0       0
1       1       1       1       1       1
2       2       2       2       2       2
3       3       3       3       3       3
4       4       4       4       4       4
>>> ds.col_remove(["col_2", "col_0"])
>>> ds
#   col_1   col_3   col_4
-   -----   -----   -----
0       0       0       0
1       1       1       1
2       2       2       2
3       3       3       3
4       4       4       4

Try to remove a column that doesn’t exist with the default on_missing="warn". A warning is issued, and any columns that do exist are removed:

>>> ds.col_remove(["col_1", "col_2"])
UserWarning: Column col_2 doesn't exist and couldn't be removed.
>>> ds
#   col_3   col_4
-   -----   -----
0       0       0
1       1       1
2       2       2
3       3       3
4       4       4

Remove a column by its index:

>>> ds.col_remove([0])
>>> ds
#   col_4
-   -----
0       0
1       1
2       2
3       3
4       4
>>> ds2 = rt.Dataset({"aabb": rt.arange(3), "abab": rt.arange(3), "ccdd": rt.arange(3),
...                   "cdcd": rt.arange(3)})
>>> ds2
#   aabb   abab   ccdd   cdcd
-   ----   ----   ----   ----
0      0      0      0      0
1      1      1      1      1
2      2      2      2      2

Remove columns by substring:

>>> ds2.col_remove(like="cd")
>>> ds2
#   aabb   abab
-   ----   ----
0      0      0
1      1      1
2      2      2

Remove columns by regular expression:

>>> ds2.col_remove(regex="^ab")
>>> ds2
#   aabb
-   ----
0      0
1      1
2      2
col_rename(old, new)[source]

Rename a single column.

The new name must be a valid column name; that is, it must not be a Python keyword or a Struct or Dataset class method name.

To check whether a name is valid, use is_valid_colname. To see a list of invalid column names, use get_restricted_names.

Note that column names that don’t meet Python’s rules for well-formed variable names can’t be accessed using attribute access. For example, a column named ‘my-column’ can’t be accessed with ds.my-column, but can be accessed with ds['my-column'].

Parameters:
  • old (str) – Current column name.

  • new (str) – New column name.

Return type:

None

See also

is_valid_colname

Check whether a string is a valid column name.

get_restricted_names

Get a list of invalid column names.

Examples

>>> ds = rt.Dataset({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
>>> ds
#   a      b
-   -   ----
0   1   4.00
1   2   5.00
2   3   6.00
>>> ds.col_rename('a', 'new_a')
>>> ds
#   new_a      b
-   -----   ----
0       1   4.00
1       2   5.00
2       3   6.00
col_set_attribute(name, attrib_name, attrib_value)[source]

Sets an attribute of the specified column; attrib_name indicates which attribute to set.

Parameters:
  • name (str) – The name of the column

  • attrib_name (str) – The name of the attribute

  • attrib_value – The value of the attribute

Examples

>>> ds.col_set_attribute('col1', 'TEST', 417)
>>> ds.col_get_attribute('col1', 'TEST')
417
col_set_value(name, value)[source]

Check if item name is allowed, possibly escape. Set the value portion of the item to value.

Parameters:
  • name (str) – Item name.

  • value (object) – For structs, nearly anything. For datasets, array.

col_str_match(expression, flags=0)[source]

Create a boolean mask vector for columns whose names match the regex.

Uses re.match(), not re.search().

Parameters:
  • expression (str) – regular expression

  • flags – regex flags (from re module).

Returns:

Array of bools (len ncols) which is true for columns which match the regex.

Return type:

FastArray

Examples

>>> st = rt.Struct({
... 'price' : rt.arange(5),
... 'trade_time' : rt.arange(5) * 1000,        # expected to regex match `.*time.*`
... 'name' : rt.FA(['a','b','c','d','e']),
... 'other_trade_time' : rt.arange(5) * 1000,  # expected to regex match `.*time.*`
... })
>>> st.col_str_match(r'.*time.*')
FastArray([False,  True, False,  True])
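
The difference between re.match() and re.search() matters for the pattern you pass. The sketch below uses only the standard re module on the column names from the example above; it illustrates the matching rule and does not call riptable.

```python
import re

names = ["price", "trade_time", "name", "other_trade_time"]

# re.match anchors at the start of the string, so a bare "time"
# matches none of these names.
match_bare = [bool(re.match(r"time", name)) for name in names]

# re.search scans the whole string and finds "time" anywhere.
search_bare = [bool(re.search(r"time", name)) for name in names]

# With re.match, a leading ".*" is needed to match in the middle of a name.
match_wild = [bool(re.match(r".*time.*", name)) for name in names]
```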
col_str_replace(old, new, max=-1)[source]

If a column name contains the old string, the old string will be replaced with the new one. If replacing the string will conflict with an existing column name, an error will be raised. Labels / sortby columns will be fixed if their names are modified.

Parameters:
  • old (str) – String to look for within individual names of columns.

  • new (str) – String to replace old string in column names.

  • max (int) – Optionally limit the number of occurrences per column name to replace; defaults to -1 which will replace all.

Examples

Replace all occurrences in each name:

>>> ds = rt.Dataset({
... 'aaa': rt.arange(5),
... 'a' : rt.arange(5),
... 'aab': rt.arange(5)
... })
>>> ds.col_str_replace('a', 'A')
>>> ds
#   AAA   A   AAb
-   ---   -   ---
0     0   0     0
1     1   1     1
2     2   2     2
3     3   3     3
4     4   4     4

Limit number of replacements per name:

>>> ds = rt.Dataset({
... 'aaa': rt.arange(5),
... 'a' : rt.arange(5),
... 'aab': rt.arange(5)
... })
>>> ds.col_str_replace('a','A',max=1)
>>> ds
#   Aaa   A   Aab
-   ---   -   ---
0     0   0     0
1     1   1     1
2     2   2     2
3     3   3     3
4     4   4     4

Replacing will create a conflict:

>>> ds = rt.Dataset({'a': rt.arange(5), 'A': rt.arange(5)})
>>> ds.col_str_replace('a', 'A')
ValueError: Item A already existed, cannot make replacement in item.
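
The renaming rule, including the max limit and the conflict check, can be sketched with str.replace on a list of names. The helper replace_in_names and its max_count parameter are hypothetical, and the exact conflict test riptable applies is an assumption here.

```python
def replace_in_names(names, old, new, max_count=-1):
    """Return names with `old` replaced by `new`, at most max_count times per name."""
    new_names = []
    for name in names:
        # str.replace treats a count of -1 as "replace all occurrences".
        candidate = name.replace(old, new, max_count)
        if candidate != name and (candidate in names or candidate in new_names):
            raise ValueError(
                f"Item {candidate} already existed, cannot make replacement in item."
            )
        new_names.append(candidate)
    return new_names


names = ["aaa", "a", "aab"]
replace_all = replace_in_names(names, "a", "A")
replace_one = replace_in_names(names, "a", "A", max_count=1)

try:
    replace_in_names(["a", "A"], "a", "A")
    conflict = False
except ValueError:
    conflict = True
```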
col_swap(from_cols, to_cols)[source]

Swaps column values, names retain current order.

Parameters:
  • from_cols (list) – a list of unique extant column names

  • to_cols (list) – a list of unique extant column names

Examples

>>> st = rt.Struct({'a': 1, 'b': 'fish', 'c': [5.6, 7.8], 'd': {'A': 'david', 'B': 'matthew'},
... 'e': np.ones(7, dtype=np.int64)})
>>> st
#   Name   Type    Rows   0      1     2
-   ----   -----   ----   ----   ---   -
0   a      int     0      1
1   b      str     0      fish
2   c      list    2      5.6    7.8
3   d      dict    2      A      B
4   e      int64   7      1      1     1
>>> st.col_swap(list('abc'), list('cba'))
>>> st
#   Name   Type    Rows   0      1     2
-   ----   -----   ----   ----   ---   -
0   a      list    2      5.6    7.8
1   b      str     0      fish
2   c      int     0      1
3   d      dict    2      A      B
4   e      int64   7      1      1     1
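
Swapping values while the names keep their positions can be sketched on a plain dict. The helper swap_values is hypothetical; that each name in to_cols receives the value of the paired name in from_cols is an assumption (for a symmetric swap like the one above, the direction makes no difference).

```python
def swap_values(columns, from_cols, to_cols):
    """Reassign values in place; key order in `columns` is unchanged."""
    # Snapshot first so later assignments do not read already-overwritten values.
    snapshot = dict(columns)
    for source, target in zip(from_cols, to_cols):
        columns[target] = snapshot[source]


columns = {"a": 1, "b": "fish", "c": [5.6, 7.8]}
swap_values(columns, list("abc"), list("cba"))
```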
classmethod concat_structs(struct_list)[source]

Merges data from multiple structs (useful for multiday loading).

Structs must have the same keys, and contain only Structs, Datasets, Categoricals, NumPy arrays, and riptable arrays.

Parameters:

struct_list (list of Struct) –

Returns:

obj

Return type:

Struct

See also

hstack()

copy(deep=True)[source]

Returns a shallow or deep copy of the Struct. Defaults to a deep copy.

Parameters:

deep (bool, default True) – If True, perform a deep copy, calling .copy(True) on each object depth first. If False, a shallow .copy(False) is called, often just copying the container's dict.

Examples

>>> ds = rt.Dataset({'somenans': [0., 1., 2., np.nan, 4., 5.], 'morestuff': ['A','B','C','D','E','F']})
>>> ds2 = rt.Dataset({'somenans': [0., 1., np.nan, 3., 4., 5.], 'morestuff':['H','I','J','K','L','M']})
>>> st = rt.Struct({'test': ds, 'test2': rt.Struct({'ds2': ds2}), 'arr': rt.arange(10)})
>>> st.copy()
#   Name    Type      Size              0     1   2
-   -----   -------   ---------------   ---   -   -
0   test    Dataset   6 rows x 2 cols
1   test2   Struct    1                 ds2
2   arr     int32     10                0     1   2
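
The practical difference between a shallow and a deep copy can be shown with the standard copy module on a nested dict. This is a general Python analogy, not riptable's copy logic: a shallow copy duplicates only the outer container, so the contained objects stay shared.

```python
import copy

original = {"arr": [0, 1, 2], "nested": {"values": [3, 4]}}

shallow = copy.copy(original)   # new outer dict, same inner objects
deep = copy.deepcopy(original)  # inner objects are duplicated recursively

# Mutating a contained object is visible through the shallow copy only.
original["arr"].append(99)
```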
display_attributes()[source]

Returns a dict of display attributes, currently consisting of NumberOfFooterRows and a list of MarginColumns.

Returns:

d – A dictionary of display attributes

Return type:

dict

dtranspose(plain=False)[source]

For display only. Return a transposed version of the container’s string representation.

Parameters:

plain (bool, default False) – If True, the output is not colored.

Returns:

Formatted, transposed version of this instance; intended for display.

Return type:

string

Examples

>>> st = rt.Struct({'a': 1, 'b': 'fish', 'c': [5.6, 7.8], 'd': {'A': 'david', 'B': 'matthew'},
... 'e': np.ones(7, dtype=np.int64)})
>>> st
#   Name    Type   Size      0     1   2
-   ----   -----   ----   ----   ---   -
0      a     int      0      1
1      b     str      0   fish
2      c    list      2    5.6   7.8
3      d    dict      2      A     B
4      e   int64      7      1     1   1
[5 columns]
>>> st.dtranspose()
Fields:     0      1      2      3       4
-------   ---   ----   ----   ----   -----
   Name     a      b      c      d       e
   Type   int    str   list   dict   int64
   Size     0      0      2      2       7
      0     1   fish    5.6      A       1
      1                 7.8      B       1
      2                                  1
[5 columns]
equals(other)[source]

Test whether two Structs contain the same elements in each column. NaNs in the same location are considered equal.

Parameters:

other (another Struct or dict to compare to) –

Return type:

bool

See also

Dataset.crc, ==, >=, <=, >, <

Examples

>>> s1 = rt.Struct({'t': 54, 'test': np.int64(34), 'test2': rt.arange(200)})
>>> s2 = rt.Struct({'t': 54, 'test': np.int64(34), 'test2': rt.arange(200)})
>>> s1.equals(s2)
True
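
Treating NaNs in the same location as equal differs from ordinary comparison, where NaN never equals NaN. A minimal sketch in plain Python follows; values_equal and columns_equal are hypothetical helpers, not riptable functions.

```python
import math


def values_equal(a, b):
    """Compare two scalars, treating a pair of NaNs as equal."""
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def columns_equal(left, right):
    """Element-wise comparison of two sequences with NaN-aware equality."""
    return len(left) == len(right) and all(
        values_equal(a, b) for a, b in zip(left, right)
    )


x = [0.0, float("nan"), 2.0]
y = [0.0, float("nan"), 2.0]

plain_equal = x == y                # ordinary comparison: the NaNs differ
nan_aware = columns_equal(x, y)     # NaNs in the same position count as equal
shifted = columns_equal(x, [0.0, 2.0, float("nan")])  # NaN in a different position
```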
flatten(sep='/', level=0)[source]

Flattens (collapses) a Struct. The method is called recursively on nested Structs.

Parameters:

sep (str, default '/') – The separating string to use. Note that some characters are not allowed and will be replaced with _.

Returns:

New Struct with collapsed names (separated by the specified string), which can then be saved.

Note

_sep is stored in the __dict__ to help with undo or saving to file. arrayflags and metastring are now exposed.

See also

flatten_undo

flatten_undo(sep=None, startname='', obj_array=None)[source]

Restores a Struct to original form before Struct.flatten()

Parameters:

sep (str, default None) – The separating string to use, such as '/'.

Returns:

New Struct restored to its original form before Struct.flatten().

See also

flatten
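
The flatten / flatten_undo round trip can be sketched with nested dicts standing in for nested Structs. Both helpers below are hypothetical and simplified: they do not replace disallowed characters, and they do not store the separator the way the note above describes.

```python
def flatten(nested, sep="/", prefix=""):
    """Collapse nested dicts into one level, joining names with sep."""
    flat = {}
    for name, value in nested.items():
        key = prefix + name
        if isinstance(value, dict):
            # Recurse, carrying the path so far as the prefix.
            flat.update(flatten(value, sep=sep, prefix=key + sep))
        else:
            flat[key] = value
    return flat


def unflatten(flat, sep="/"):
    """Rebuild the nested form by splitting each name on sep."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


nested = {"a": 1, "inner": {"b": 2, "deeper": {"c": 3}}}
flat = flatten(nested)
restored = unflatten(flat)
```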

get_attribute(attrib_name, default=None)[source]

Get an attribute that applies to all items/columns.

Parameters:
  • attrib_name – name of the attribute

  • default – return value if attrib_name is not a valid attribute

Returns:

val

Return type:

attribute value or None

get_ncols()[source]

Return the number of items in the Struct.

Returns:

ncols – The number of items in the Struct

Return type:

int

get_nrows()[source]

Returns 0, as a Struct has no rows.

Returns:

0

Return type:

int

Note

Subclasses need to define this explicitly.

get_restricted_names()[source]

Return a list of invalid column names.

Invalid column names are Python keywords and Struct or Dataset class method names.

This method generates the result only once. Afterward, it is stored as a class variable.

Returns:

A set of strings that are invalid column names.

Return type:

set

See also

is_valid_colname

Check whether a string is a valid column name.

Examples

>>> # Limit and format the output.
>>> print("Some of the restricted names include: ")
>>> print(", ".join(list(ds.get_restricted_names())[::10]))
Some of the restricted names include: mask_or_isinf, __reduce_ex__,
imatrix_xy, __weakref__, dtypes, _get_columns, from_arrow, elif,
__imul__, _deleteitem, __rsub__, _index_from_row_labels, as_matrix,
putmask, _as_meta_data, shape, cat, __invert__, try, _init_columns_as_dict,
label_as_dict, col_str_replace, _replaceitem, label_set_names, __contains__,
__floordiv__, _row_numbers, filter, __init__, sorts_on, flatten_undo,
col_str_match, __dict__, size, __rand__, info, col_remove, as, or
classmethod hstack(struct_list)[source]

Merges data from multiple structs (useful for multiday loading). Structs must have the same keys, and contain only Structs, Datasets, Categoricals, NumPy arrays, and riptable arrays.

Parameters:

struct_list (list of Struct) – The Structs to merge.

Returns:

obj

Return type:

Struct

See also

riptable.hstack

info(**kwargs)[source]

Return an object containing a description of the structure’s contents.

Parameters:

kwargs (dict) – Optional keyword arguments passed to rt_meta.info()

Returns:

info – A description of the structure’s contents.

Return type:

rt_meta.Info

is_locked()[source]

Returns True if object is locked (unable to add/remove/rename elements).

NB: Currently behaves as does tuple: the contained data will still be mutable when possible.

Returns:

True if object is locked

Return type:

bool
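
The tuple comparison can be made concrete with a built-in tuple: its slots cannot be rebound, but a mutable object it holds can still change. This is plain Python shown as an analogy for the locked behavior, not riptable code.

```python
locked = ("name", [1, 2, 3])

try:
    locked[0] = "other"  # rebinding a slot is rejected
    rebind_failed = False
except TypeError:
    rebind_failed = True

# The list held inside the tuple is still mutable.
locked[1].append(4)
```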

is_valid_colname(name)[source]

Check whether a string is a valid column name.

Python keywords and Struct or Dataset class method names are not valid column names.

To see a list of invalid column names, use get_restricted_names.

Parameters:

name (str) – The string to be checked.

Returns:

True if name is valid, otherwise False.

Return type:

bool

See also

get_restricted_names

Get a list of invalid column names.

Examples

>>> ds.is_valid_colname('yield')  # Python keyword
False
>>> ds.is_valid_colname('sample')  # Dataset method
False
>>> ds.is_valid_colname('Yield')  # OK because keywords are case-sensitive
True
>>> ds.is_valid_colname('Sample')  # Method names are also case-sensitive
True
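
The validity rule can be approximated with the standard keyword module. In this sketch, RESERVED_METHOD_NAMES is a small hypothetical stand-in for the full set returned by get_restricted_names, and both helper functions are illustrative names rather than riptable APIs.

```python
import keyword

# Hypothetical stand-in for the Struct/Dataset method names.
RESERVED_METHOD_NAMES = {"sample", "col_remove", "keys"}


def is_valid_name(name):
    """A name is valid if it is neither a Python keyword nor a reserved method name."""
    return not keyword.iskeyword(name) and name not in RESERVED_METHOD_NAMES


def attribute_accessible(name):
    """Attribute access (ds.name) also needs a well-formed Python identifier."""
    return is_valid_name(name) and name.isidentifier()


checks = {name: is_valid_name(name) for name in ["yield", "sample", "Yield", "Sample"]}
dashed_ok = is_valid_name("my-column")          # not a keyword or method name
dashed_attr = attribute_accessible("my-column")  # but not usable as ds.my-column
```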
items()[source]

Dictionary-iterator access to Struct items.

Returns:

dict_items – Name, item pairs; an iterator over column keys and values.

keys()[source]
Returns:

Item names.

Return type:

list

label_as_dict()[source]

Gets the column names used as labels in display

label_filter(items=None, like=None, regex=None, axis=None)[source]

Subset rows of dataset according to value in its label column.

TODO: how should multikey be handled?

Parameters:
  • items (list-like) – List of specific values to match in label column.

  • like (string) – Keep items where ‘like’ occurs in label column.

  • regex (string (regular expression)) – Keep axis with re.search(regex, col) == True.

Examples

>>> ds
 #   col_7   col_8   col_9   keycol
--   -----   -----   -----   --------------
 0    0.53    0.52    0.47   paul
 1    0.10    0.78    0.09   ray
 2    0.50    0.79    0.50   paul
 3    0.81    0.68    0.72   ray
 4    0.08    0.71    0.02   john
 5    0.38    0.19    0.90   ray
 6    0.53    0.33    0.46   mary katherine
 7    0.75    0.48    0.94   john
 8    1.00    0.70    0.79   mary ann
 9    0.47    0.64    0.16   ray
10    0.80    0.43    0.08   mary ann
11    0.54    0.19    0.43   joe
12    0.89    0.08    0.81   mary katherine
13    0.96    0.91    0.33   paul
14    0.18    0.55    0.44   ray
15    0.42    0.49    0.66   mary ann
16    0.05    0.53    0.66   paul
17    0.60    0.56    0.03   joe
18    0.62    0.42    0.56   mary ann
19    0.63    0.33    0.95   paul
>>> gb = ds.gb('keycol').sum()
>>> gb.label_filter(items='john')
*keycol   col_7   col_8   col_9
-------   -----   -----   -----
john       0.82    1.19    0.96
>>> gb.label_filter(like=['ar', 'p'])
*keycol          col_7   col_8   col_9
--------------   -----   -----   -----
mary ann          2.85    2.05    2.08
mary katherine    1.43    0.41    1.27
paul              2.66    3.08    2.92
>>> gb.label_filter(regex='n$')
*keycol    col_7   col_8   col_9
--------   -----   -----   -----
john        0.82    1.19    0.96
mary ann    2.85    2.05    2.08
label_get_names()[source]

Gets the column names used as labels in display

label_remove()[source]

Removes any labels used in display

label_set_names(listnames)[source]

Set which column names can be used as labels in display

classmethod load(path='', name=None, share=None, info=False, columns=None, include_all_sds=False, include=None, threads=None, folders=None)[source]

Load a Struct from a directory or single SDS file.

Parameters:
  • path (str or os.PathLike) – Full path to directory or single SDS file with Struct data.

  • name (str, optional, default None) – Name of a nested container to search for in the root directory. Multiple tiers can be separated by ‘//’

  • info (bool, optional, default False) – If True, no array data will be loaded, instead a display tree of information about nested structures and their contents will be returned.

  • columns (list, optional, default None) – Not implemented

  • include_all_sds (bool, optional, default False) – If False, when additional files were found in a directory, and they were not in the root structs meta data, the user will be prompted to load them. If True, all files will be automatically loaded.

  • include (list of str, optional, default None) – A list of specific items to load. This list will only be applied to the root Struct - not to nested containers.

  • threads (int, optional, default None) – Number of threads to use during the SDS load. The previous thread count is restored after the load, or if the load fails. See also riptide_cpp.SetThreadWakeUp.

Returns:

Loaded data with possibly nested containers and riptable classes restored.

Return type:

Struct

See also

riptable.load_sds

make_categoricals(columnlist=None, dtype=None)[source]

Converts specified string/bytes columns or all string/bytes columns to Categorical. Will also crawl through nested structs/datasets and convert their strings to categoricals.

Parameters:
  • columnlist (str or list, optional) – Single name, or list of names of items to convert to categoricals.

  • dtype (numpy.dtype, optional) – Integer dtype for the categoricals’ underlying arrays.

Raises:
  • TypeError – If the dtype was set to a non-dtype object.

  • ValueError – If a requested item could not be found in the container.

Notes

Error checking will complete in the root structure before any conversion begins.

make_matlab_categoricals(xtra, remove_trailing=True, dtype=None, prefix='p', keep_prefix=True)[source]

Turn matlab categorical indices and corresponding unique arrays into riptable categoricals.

Parameters:
  • xtra (Struct) – Container holding unique arrays.

  • remove_trailing (bool, optional, default True) – If True, remove trailing spaces from Matlab strings.

  • dtype (numpy.dtype, optional, default None) – Integer dtype for underlying array of constructed categoricals.

  • prefix (str, optional, default ‘p’) – Prefix for integer arrays in calling dataset - columns that will be looked for in the struct.

  • keep_prefix (bool, default True) – If True, drop the prefix after flipping the column to categorical in the dataset. If a column already exists with that name, the user will be warned.

make_matlab_datetimes(dtcols=None, gmt=False, auto=True)[source]

Convert datetime columns from Matlab to DateTimeNano and TimeSpan arrays.

Parameters:
  • dtcols (str or list) – Name or list of names of columns to convert to DateTimeNano arrays.

  • gmt (bool, optional, default False) – Not implemented.

  • auto (bool, optional, default True) – If True, look for ‘MS’ in the names of all columns, and flip them to TimeSpan objects.

make_struct_from_categories(prefix=None, keep_prefix=False)[source]

Build a struct of unique arrays from all categoricals in the container, or those with a specified prefix.

Parameters:
  • prefix (str, optional) – Only include columns with names that begin with this string.

  • keep_prefix (bool, default False) – If True, keep the prefix when naming the item in the new structure.

Examples

TODO - sanitize - add example that makes a struct from categoricals and prints its representation See the version history for structure of older examples.

Returns:

cats

Return type:

Struct

Notes

This is a partial inverse operation of Struct.make_matlab_categoricals

make_table(display_type)[source]

Pretty-print code used by infrastructure.

Parameters:

display_type (rt.rt_enum.DS_DISPLAY_TYPES) –

Returns:

Display object or string.

Return type:

obj or str

save(path='', name=None, share=None, overwrite=True, compress=True, onefile=False, bandsize=None)[source]

Save a struct to a directory. If the struct contains only arrays, it will be saved as a single .SDS file.

Parameters:
  • path (str or os.PathLike) – Full path to save. Directory will be created automatically if it doesn’t exist. .SDS extension will be appended if a single file is being saved and is necessary.

  • name (str, optional) – Name for the root structure if it’s being appended to an existing struct’s directory. The existing _root.sds does not get overwritten, and structs can be combined without a full load.

  • share (str, optional) –

  • overwrite (bool, optional, default True) – If True, user will not be prompted on whether or not to overwrite existing .SDS files. Otherwise, prompt will appear if directory exists.

  • compress (bool, optional, default True) – If True, ZStandard compression will be used when writing to SDS, otherwise, no compression will be used.

  • onefile (bool, optional, default False) – If True, all nested Structs are collapsed.

  • bandsize (int, optional, default None) –

set_attribute(attrib_name, attrib_value)[source]

Set an attribute that applies to all items/columns.

Parameters:
  • attrib_name – name of the attribute

  • attrib_value – value of the attribute

set_display_callback(userfunc, scope=None)[source]

Set the user display callback for styling text.

Parameters:
  • userfunc (function) – A callback that takes two arguments, with the signature userfunc(cols, **kwargs).

  • scope (default None) – Whether the callback applies to just this Dataset or to all Datasets. Can be None, Dataset, or Struct.

Examples

>>> from riptable.rt_display import DisplayColumnColors
>>> def make_red(cols, **kwargs):
...     location = kwargs['location']  # could be 'left', 'right', or 'main'
...     if location == 'main':
...         for col in cols:
...             for cell in col:
...                 if cell.string.startswith('-'):
...                     # show negative values in parentheses and color them red
...                     cell.string = '(' + cell.string[1:] + ')'
...                     cell.color = DisplayColumnColors.Red
>>> ds = rt.Dataset({'test': rt.arange(5) - 3, 'another': rt.arange(5.0) - 2})
>>> ds.set_display_callback(make_red)
>>> ds
static set_fast_array(val)[source]

Set to True to force the casting of numpy arrays to FastArray when constructing a Struct or adding a new column.

Parameters:

val (bool) – True or False

summary_as_dict()[source]

Gets the column names used as rights in display

summary_get_names()[source]

Gets the column names used as rights in display

summary_remove()[source]

Removes any rights used in display.

summary_set_names(listnames)[source]

Set which column names can be used as rights in display

tolist()[source]

Returns data values in a list. Output equivalent to list(st.values()).

Returns:

list

tree(name=None, showpaths=False, info=False)[source]

Returns a hierarchical view of the Struct.

Parameters:
  • name (str) – Optional name for the top of the tree

  • showpaths – TODO purpose unknown, may raise error if true

  • info – TODO purpose unknown

Returns:

tree – A hierarchical view of the Struct

Return type:

str

Examples

>>> st1 = rt.Struct({'A': rt.FA([1, 2, 3]), 'B': rt.FA([4, 5])})
>>> st2 = rt.Struct({'C': st1, 'D': rt.FA([6, 7, 8])})
>>> st2.tree()
Struct
 ├──── C (Struct)
 │     ├──── A int32 (3,) 4
 │     └──── B int32 (2,) 4
 └──── D int32 (3,) 4
>>> st2.tree(name='foo')
foo
 ├──── C (Struct)
 │     ├──── A int32 (3,) 4
 │     └──── B int32 (2,) 4
 └──── D int32 (3,) 4
values()[source]

Returns the individual items stored in the struct, without the attributes held by the item container.

Returns:

Items.

Return type:

dict_values