riptable.rt_groupbykeys

Classes

GroupByKeys

Handles masking, appending invalid, and sorting of key columns for a groupby operation.

class riptable.rt_groupbykeys.GroupByKeys(grouping_dict, ifirstkey=None, isortrows=None, sort_display=False, pre_sorted=False, prebinned=False)

Handles masking, appending invalid, and sorting of key columns for a groupby operation.

Parameters:
  • grouping_dict (dict) – Non-unique or unique key columns.

  • ifirstkey (array, optional) – If set, the index of first occurrences used to generate unique values from the non-unique keys. This index is sometimes lazily evaluated and may not correspond to the grouping dict held. See the prebinned keyword.

  • isortrows (array, optional) – A sorted index for the unique array. May be calculated later on, after grouping_dict is reduced to unique values.

  • sort_display (bool, default False) – If True, the unique keys in the result of the operation will be sorted. Otherwise, they will appear unsorted.

  • pre_sorted (bool, default False) – If True, the unique grouping_dict is already in sorted order, so isortrows is neither calculated nor applied, even if sort_display is True.

  • prebinned (bool, default False) – If True, grouping_dict contains unique values. ifirstkey will not be stored. If False, grouping_dict contains non-unique values, the default if constructed by a GroupBy object.

Notes

Constructor

GroupByKeys has two main ways of initialization:

  1. From a non-unique grouping_dict and ifirstkey (fancy index to unique values). The object will hold on to both, and lazily generate the groupby keys as necessary.

  2. From already binned gbkeys (unique values). Most Categoricals will initialize GroupByKeys this way. Because Categoricals are sometimes naturally sorted, they may set the pre_sorted keyword to True.
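
As a rough illustration of the first initialization path, the sketch below computes a first-occurrence index with plain NumPy and uses it as a fancy index to reduce each key column to unique values. The helper first_occurrence_index and the sample columns are hypothetical and are not part of riptable, which computes this index internally by its own means.

```python
import numpy as np

def first_occurrence_index(grouping_dict):
    """Return the row index at which each unique key tuple first appears.

    Hypothetical helper for illustration only; not a riptable function.
    """
    seen = {}
    # Walk the key columns row by row and remember where each tuple first occurs.
    for row, key in enumerate(zip(*grouping_dict.values())):
        seen.setdefault(key, row)
    return np.array(list(seen.values()), dtype=np.int64)

# Non-unique key columns (sample data, assumed for the example).
grouping_dict = {
    "symbol": np.array(["b", "a", "b", "c", "a"]),
    "side": np.array([1, 1, 2, 1, 1]),
}

ifirstkey = first_occurrence_index(grouping_dict)

# Fancy-index every column to get the unique keys in first-seen (unsorted) order.
unique_keys = {name: col[ifirstkey] for name, col in grouping_dict.items()}
```

Only the last row repeats an earlier key pair, so four unique bins remain, in the order they were first seen.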

If sort_display is True and the keys are not already sorted, the gbkeys will be sorted in-place the first time a groupby calculation is made. After being sorted, the internal _sort_applied flag will be set. Despite the keys being sorted, the sort might still need to be applied to the data columns of the groupby calculation’s result.

Lazy Evaluation

  • If isortrows is not provided and the gbkeys are not pre-sorted, a lexsort will be performed, and the keys will be sorted in-place.

  • If gbkeys are requested with a filter bin, a new bin will be permanently prepended to each key array. After the filtered bin is added, the gbkeys will still default to return arrays without the filter (a view of the held arrays offset by 1).

  • Multikey labels are a list of strings of the tuples that will appear when a multikey is displayed. They will also add a filtered tuple as necessary, and default to a reduced view after the addition, just like the gbkeys. Multikey labels will not be generated until they are requested for display, because they are constructed in a Python loop and are expensive to generate.
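
The filter-bin behavior described in the list above can be sketched with plain NumPy: one extra bin is prepended to the held key array, and the default accessor returns a view offset by 1 so the filter bin stays hidden. The name "Filtered" is a placeholder chosen for this example, not necessarily the label riptable uses.

```python
import numpy as np

unique_keys = np.array(["a", "b", "c"])

# Prepend one extra bin for filtered-out rows ("Filtered" is an assumed name).
with_filter = np.concatenate([np.array(["Filtered"]), unique_keys])

# Default access skips the filter bin: a view of the held array offset by 1.
without_filter = with_filter[1:]

# The slice is a view, so no key data is copied.
is_view = without_filter.base is with_filter
```

Because the slice shares memory with the held array, returning unfiltered keys after the bin has been added costs no extra copy.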

property gbkeys

Generates groupby keys if necessary. Returns groupby keys.

property gbkeys_filtered

Adds a filter to the gbkeys, or returns the already filtered gbkeys.

property isortrows

Generates isortrows (index to sort groupby keys). Possibly performs a lexsort.
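
A minimal sketch of how a sort index over multiple key columns could be produced with NumPy's lexsort. This only illustrates the idea; the sample columns are assumed, and riptable's own implementation may differ.

```python
import numpy as np

# Unique multikey columns in unsorted (first-seen) order; sample data.
gbkeys = {
    "symbol": np.array(["b", "a", "b", "a"]),
    "side": np.array([2, 1, 1, 2]),
}

# np.lexsort treats the LAST key as the primary sort key,
# so the columns are passed in reverse order.
isortrows = np.lexsort(tuple(reversed(list(gbkeys.values()))))

# Applying the index to every column yields the keys in sorted order.
sorted_keys = {name: col[isortrows] for name, col in gbkeys.items()}
```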

property multikey

Returns True if the GroupByKeys object is holding multiple columns in _gbkeys.

property multikey_labels
property multikey_labels_filtered
property singlekey

Returns True if the GroupByKeys object is holding a single column in _gbkeys.

property sort_gb_data

If a sort has already been applied to the gbkeys, they do not need to be sorted again. However, the data resulting from a groupby calculation is naturally unsorted and will still need the sort applied.
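
To make the distinction concrete, the sketch below uses plain NumPy and made-up values: the keys are sorted once, but a result column computed per bin is still in the original bin order and needs the same index applied to line up with the sorted keys.

```python
import numpy as np

unsorted_keys = np.array(["c", "a", "b"])   # unique keys in first-seen order
group_sums = np.array([30.0, 10.0, 20.0])   # one result per bin, same order

# Index that sorts the keys (a single-key stand-in for isortrows).
isortrows = np.argsort(unsorted_keys, kind="stable")
sorted_keys = unsorted_keys[isortrows]

# The result column is still in bin order, so it needs the same index.
sorted_sums = group_sums[isortrows]
```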

property unique_count

Returns number of unique groupby keys - lazily evaluated and stored.

__getitem__(index)
__repr__()

Return repr(self).

__str__()

Return str(self).

_build_string()
_get_filter_bin_name(arr)
_get_index_from_tuple(tup)

If the GroupByKeys object is holding a multikey dictionary, it can be indexed by a tuple. This internal routine (called by get_index_from_bin/__getitem__) will return the bin index of matching multikey entries or -1 if not found. Any string/bytes values will be fixed to match the string/bytes column.
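
The lookup described above might be sketched as follows. The function index_from_tuple is a hypothetical stand-in written with plain NumPy, not riptable's implementation; it shows the str/bytes coercion and the -1 result for a missing key.

```python
import numpy as np

def index_from_tuple(gbkeys, tup):
    """Return the bin index whose keys match ``tup``, or -1 if none match.

    Hypothetical stand-in for illustration; not riptable's implementation.
    """
    columns = list(gbkeys.values())
    if len(tup) != len(columns):
        return -1
    mask = np.ones(len(columns[0]), dtype=bool)
    for col, value in zip(columns, tup):
        # Coerce str/bytes so the value matches the column's string kind.
        if col.dtype.kind == "S" and isinstance(value, str):
            value = value.encode()
        elif col.dtype.kind == "U" and isinstance(value, bytes):
            value = value.decode()
        mask &= col == value
    hits = np.flatnonzero(mask)
    return int(hits[0]) if hits.size else -1

# Unique multikey columns; the first column holds bytes strings.
gbkeys = {
    "symbol": np.array([b"a", b"a", b"b"]),
    "side": np.array([1, 2, 1]),
}

found = index_from_tuple(gbkeys, ("a", 2))    # str is encoded to match bytes
missing = index_from_tuple(gbkeys, ("z", 1))  # no such key pair
```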

_insert_filter_bin()
_insert_filter_label()
_make_isortrows()
_pull_from_ifirstkey()
_trim_keys(keys)

Return a trimmed view of the keys so the filtered bin is not included. Also trims the list of multikey labels.

copy(deep=False)

Creates a deep or shallow copy of the grouping.

get_bin(index)
Parameters:

index – int or list of integers

Return result_bins:

matching bins for provided indices or an empty list

get_bin_from_index(index)
Parameters:

index – int or list of integers

Return result_bins:

matching bins for provided indices or an empty list

get_index_from_bin(bin)
Parameters:

bin – a tuple of multiple keys or a single key (will be converted to tuple)

Return index:

the bin index, or -1 if not found.

keys(sort=None, showfilter=False)

Return unique keys, possibly apply a sort and add a filter bin.

labels(showfilter=False)

Generates list of tuples from multikey columns.
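
A minimal sketch of building one label tuple per unique key row from multikey columns, with an optional leading entry for the filter bin. The helper multikey_labels and the "Filtered" placeholder are assumptions made for this example; the exact form of riptable's filtered tuple is not specified here.

```python
import numpy as np

def multikey_labels(gbkeys, filter_label=None):
    """Build one tuple per unique key row.

    Hypothetical helper for illustration; not riptable's implementation.
    """
    columns = [col.tolist() for col in gbkeys.values()]
    labels = []
    # Built in a plain Python loop, which is why doing this eagerly is costly.
    for row in zip(*columns):
        labels.append(tuple(row))
    if filter_label is not None:
        # Assumed shape of the filtered tuple: the label repeated per column.
        labels.insert(0, (filter_label,) * len(columns))
    return labels

gbkeys = {
    "symbol": np.array(["a", "a", "b"]),
    "side": np.array([1, 2, 1]),
}

labels = multikey_labels(gbkeys)
labels_filtered = multikey_labels(gbkeys, filter_label="Filtered")
```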

unique_unsorted()

Pull the unique keys unsorted, using ifirstkey or the prebinned uniques.

unsort()

Sets the internal _sort_display flag to False. Will warn the user if the groupby keys are already sorted or were pre-sorted when the GroupByKeys object was constructed.