`riptable.rt_multiset`

Classes

Multiset

Multisets contain datasets and/or multisets where all contained datasets have the same number of rows.

class riptable.rt_multiset.Multiset(input_value=None)

Bases: riptable.rt_struct.Struct

Multisets contain datasets and/or multisets where all contained datasets have the same number of rows. Multisets can provide a convenient namespace for closely related datasets, such as those loaded from a single HDF5 file or generated by an aggregation applied to a GroupBy object.

The columns within contained datasets may be displayed in an interleaved way. Example: Assume Jan and Feb are two datasets with 3 columns each:

Jan: Run1, Run2, Run3
Feb: Run1, Run2, Run3

A Multiset containing these datasets would display with a multi-line header:

  Run1       Run2       Run3
Jan  Feb   Jan  Feb   Jan  Feb

One can access the Run1 column in the Jan dataset with the syntax: ms.Jan.Run1
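The interleaved header order described above can be sketched in plain Python. This is only an illustration of the display order, not riptable's rendering code: `interleaved_header` and the dict-of-lists input are hypothetical stand-ins for a Multiset and its datasets.

```python
def interleaved_header(datasets):
    """Return (column, dataset) pairs in interleaved display order.

    `datasets` maps a dataset name to its list of column names
    (a hypothetical stand-in for a Multiset of Datasets).
    """
    # Collect column names in first-seen order across all datasets.
    columns = []
    for cols in datasets.values():
        for col in cols:
            if col not in columns:
                columns.append(col)
    # For each column, list every dataset that contains it.
    return [
        (col, name)
        for col in columns
        for name, cols in datasets.items()
        if col in cols
    ]


header = interleaved_header({
    "Jan": ["Run1", "Run2", "Run3"],
    "Feb": ["Run1", "Run2", "Run3"],
})
print(header[:2])  # [('Run1', 'Jan'), ('Run1', 'Feb')]
```

Each column name appears once in the top header row, with the dataset names repeated beneath it.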

Examples

>>> ds = rt.Dataset({'somenans': [0., 1., 2., rt.nan, 4., 5.], 'morestuff': ['A', 'B', 'C', 'D', 'E', 'F']})
>>> ds2 = rt.Dataset({'somenans': [0., 1., rt.nan, 3., 4., 5.], 'morestuff': ['H', 'I', 'J', 'K', 'L', 'M']})
>>> ms = rt.Multiset({'test': ds, 'test2': ds2})
>>> ms
      somenans      morestuff
#   test   test2   test   test2
-   ----   -----   ----   -----
0   0.00    0.00   A      H
1   1.00    1.00   B      I
2   2.00     nan   C      J
3    nan    3.00   D      K
4   4.00    4.00   E      L
5   5.00    5.00   F      M

>>> ms['morestuff']
     morestuff
#   test   test2
-   ----   -----
0   A      H
1   B      I
2   C      J
3   D      K
4   E      L
5   F      M

>>> ms['test']
#   somenans   morestuff
-   --------   ---------
0       0.00   A
1       1.00   B
2       2.00   C
3        nan   D
4       4.00   E
5       5.00   F

>>> ms[[2,3],'somenans']
      somenans
#   test   test2
-   ----   -----
0   2.00     nan
1    nan    3.00

>>> ms[[2,3],'morestuff']
     morestuff
#   test   test2
-   ----   -----
0   C      J
1   D      K

>>> ms[[2,3],['morestuff','somenans']]
     morestuff       somenans
#   test   test2   test   test2
-   ----   -----   ----   -----
0   C      J       2.00     nan
1   D      K        nan    3.00

property dtypes

Returns a dictionary of dtypes, one for each column.

Returns: Dictionary containing the dtype for each column in the Multiset.
Return type: dict

__getitem__(index)

Parameters: index ((rowspec, colspec) or colspec) –
Return type: the indexed row(s), col(s), sub-dataset or single value

Examples

>>> ds = rt.Dataset({'somenans': [0., 1., 2., rt.nan, 4., 5.]})
>>> ds2 = rt.Dataset({'somenans': [0., 1., rt.nan, 3., 4., 5.]})
>>> ms = rt.Multiset({'test': ds, 'test2': ds2})
>>> ms[2,:]
      somenans
#   test   test2
-   ----   -----
0   2.00     nan

Raises:

IndexError – When an invalid column name is supplied.
TypeError –
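The two accepted index forms can be told apart as in the sketch below. It is an assumption about how a bracket index might be split, not riptable's actual dispatch code, and `split_index` is a hypothetical helper name.

```python
def split_index(index):
    """Split a bracket index into (rowspec, colspec).

    Hypothetical helper: a 2-tuple is read as (rowspec, colspec);
    anything else is treated as a colspec that selects all rows.
    """
    if isinstance(index, tuple) and len(index) == 2:
        rowspec, colspec = index
        return rowspec, colspec
    return slice(None), index


# ms['morestuff'] passes a bare colspec.
print(split_index("morestuff"))
# ms[[2, 3], 'somenans'] passes the tuple ([2, 3], 'somenans').
print(split_index(([2, 3], "somenans")))
# ms[2, :] passes the tuple (2, slice(None)).
print(split_index((2, slice(None))))
```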

__len__()

__repr__(): Return repr(self).

__setitem__(index, value)

Parameters:

index (colspec) –
value (A Dataset or Multiset) –

Return type:

None

Raises:

IndexError, TypeError, ValueError –

__str__(): Return str(self).

_autocomplete()

static _build_col_headers(rootobject, rootdict): return a list of lists of ColHeaders

_build_footers(): Still testing. TODO: speed up this python loop

_check_addtype(name, value): called from subclassed Struct when a new item is added

_copy(deep=False, rows=None, cols=None, base_index=0, cls=None)

Bracket indexing that returns a multiset will funnel into this routine.

Parameters:

deep (bool) – If True, perform a deep copy of the column arrays. This must be the first argument and must be True or False; it cannot be None.
rows – Row mask.
cols – Column mask.
base_index – Used for head/tail slicing.
cls – Class of the return type, for subclass super() calls.

static _depth_first(curobject, curdict, level, returnlist): returns the max depth, list of dictionaries

_init_from_dict(dictionary)

_last_row_stats()

_repr_html_()

abs(*args, **kwargs)

all(*args, **kwargs)

For use in boolean contexts: Is it true that for all elements (val) either:

val casts to True, or
returns True for val.all() or all(val)

Return type: bool

any(*args, **kwargs)

For use in boolean contexts: Does there exist an element (val) which either:

val casts to True, or
returns True for val.any() or any(val)

Return type: bool

Examples

>>> s=rt.Struct()
>>> s.a=rt.Dataset()
>>> s.any()
False

apply(*args, **kwargs)

apply_cols(*args, **kwargs)

apply_rows(*args, **kwargs)

astype(*args, **kwargs)

cascade(funcname, *args, **kwargs)

Depth-first calling of functions, often into a Dataset. For each Dataset in the Multiset, the function is called with the given args and kwargs. The result of each call is expected to be a Dataset, which is added to a new Multiset that is returned to the caller.

Parameters: funcname (string or callable function) –
Return type: Multiset
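The depth-first traversal can be sketched with nested dicts standing in for Multisets and lists standing in for Datasets. This is an illustration under those assumptions, not riptable's implementation; in particular, passing each leaf as the first argument of a callable is an assumed calling convention.

```python
def cascade(container, funcname, *args, **kwargs):
    """Apply a function to every leaf, depth first, and rebuild the nesting."""
    result = {}
    for name, item in container.items():
        if isinstance(item, dict):
            # Nested container: recurse before moving to the next sibling.
            result[name] = cascade(item, funcname, *args, **kwargs)
        elif callable(funcname):
            # Assumed convention: the leaf is the first argument.
            result[name] = funcname(item, *args, **kwargs)
        else:
            # A string names a method to look up on each leaf.
            result[name] = getattr(item, funcname)(*args, **kwargs)
    return result


nested = {"jan": [3, 1, 2], "q1": {"feb": [9, 8]}}
by_callable = cascade(nested, sorted)
by_name = cascade(nested, "copy")
print(by_callable)  # {'jan': [1, 2, 3], 'q1': {'feb': [8, 9]}}
```

The returned container mirrors the nesting of the input, with each leaf replaced by the result of the call.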

copy(deep=True)

Returns a shallow or deep copy of the multiset. Defaults to a deep copy.

Parameters: deep (bool, default True) – Set to False for a shallow copy.
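The difference between the two kinds of copy is shown below using Python's standard `copy` module on a nested dict. This illustrates general shallow versus deep copy semantics only; that `Multiset.copy` behaves analogously for its column arrays is an assumption, not something verified against riptable.

```python
import copy

# A nested dict standing in for a multiset holding one dataset.
columns = {"test": {"somenans": [0.0, 1.0, 2.0]}}

shallow = copy.copy(columns)   # new outer container, shared column data
deep = copy.deepcopy(columns)  # column data duplicated as well

# Mutate the original column data in place.
columns["test"]["somenans"][0] = 99.0

print(shallow["test"]["somenans"][0])  # 99.0 (sees the change)
print(deep["test"]["somenans"][0])     # 0.0 (unaffected)
```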

describe(*args, **kwargs)

fillna(*args, **kwargs)

flatten(horizontal=True, delimiter='_', dset_col_name='Column')

Return a single dataset constructed by concatenating all of the datasets and flattened multisets contained within the multiset. Horizontal flattening will concatenate the datasets horizontally, prepending the dataset name to each dataset’s column names. Vertical flattening requires the names and order of columns in each dataset to be identical, adding a single column to the returned dataset containing the name of the dataset from which each row derives.

Parameters:

horizontal (bool) – If True, concatenate the Datasets horizontally, otherwise vertically.
delimiter (char) – The character used when joining dataset and column names to create unique names.
dset_col_name (string) – For vertical flattening, the name for the column containing dataset names.

Return type:

Dataset

Raises:

ValueError –
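The two flattening modes can be sketched with dicts of lists standing in for Datasets. This is a simplified illustration, not riptable's implementation: it ignores nested multisets, and placing the dataset-name column first in the vertical result is an arbitrary choice made here.

```python
def flatten(datasets, horizontal=True, delimiter="_", dset_col_name="Column"):
    """Flatten {dataset name: {column name: values}} into one dict of columns."""
    if not datasets:
        return {}
    if horizontal:
        # Prefix each column name with its dataset name to keep names unique.
        flat = {}
        for name, cols in datasets.items():
            for col, values in cols.items():
                flat[f"{name}{delimiter}{col}"] = list(values)
        return flat
    # Vertical: every dataset must have the same columns in the same order.
    layouts = [list(cols) for cols in datasets.values()]
    if any(layout != layouts[0] for layout in layouts):
        raise ValueError("vertical flattening needs identical column names and order")
    flat = {dset_col_name: []}
    for col in layouts[0]:
        flat[col] = []
    for name, cols in datasets.items():
        nrows = len(next(iter(cols.values()))) if cols else 0
        # Record which dataset each stacked row came from.
        flat[dset_col_name].extend([name] * nrows)
        for col, values in cols.items():
            flat[col].extend(values)
    return flat


ms = {
    "test": {"somenans": [0.0, 1.0], "morestuff": ["A", "B"]},
    "test2": {"somenans": [2.0, 3.0], "morestuff": ["H", "I"]},
}
wide = flatten(ms)
tall = flatten(ms, horizontal=False)
print(list(wide))      # ['test_somenans', 'test_morestuff', 'test2_somenans', 'test2_morestuff']
print(tall["Column"])  # ['test', 'test', 'test2', 'test2']
```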

keep(*args, **kwargs)

label_fixup(): Auto scan for which column names can be used as labels in display

label_set_names(listnames): Set which column names can be used as labels in display

make_table(display_type)

Pretty-print code used by infrastructure.

Parameters: display_type – See rt.rt_enum.DS_DISPLAY_TYPES.
Returns: Display object or string.

max(*args, **kwargs)

mean(*args, **kwargs)

min(*args, **kwargs)

multiget(index, deep=False)

Returns a new Multiset representing a one-level sub-sampling of the original.

Parameters:

index – An index specification.
deep (bool, default False) – If True, make deep copies.

Return type:

A new Multiset.

nanmax(*args, **kwargs)

nanmean(*args, **kwargs)

nanmin(*args, **kwargs)

nanstd(*args, **kwargs)

nansum(*args, **kwargs)

nanvar(*args, **kwargs)

pivot(*args, **kwargs)

quantile(*args, **kwargs)

sort_copy(*args, **kwargs)

sort_inplace(*args, **kwargs)

std(*args, **kwargs)

sum(*args, **kwargs)

trim(*args, **kwargs)

var(*args, **kwargs)
