riptable.rt_accum2

Classes

Accum2

The Accum2 object is very similar to a GroupBy object that has been initialized with a multikey Categorical.

class riptable.rt_accum2.Accum2(cat_rows, cat_cols, filter=None, showfilter=False, ordered=None, sort_gb=False, totals=True, ylabel=None)[source]

Bases: riptable.rt_groupbyops.GroupByOps, riptable.rt_fastarray.FastArray

The Accum2 object is very similar to a GroupBy object that has been initialized with a multikey Categorical. Because it also inherits from GroupByOps, all calculations are sent to _calculate_all in a Grouping object. Accum2 generates a single array of data and splits it into multiple columns, one for each x-axis bin. There is always an invalid bin, but it is omitted by default when the single array is split into columns. Datasets resulting from an Accum2 groupby calculation are displayed with a footer row of column totals and an additional column of row totals.

In addition to inheriting from GroupByOps, Accum2 also inherits from FastArray. This way, it can exist as a column in a Dataset. Its cell data will appear as a tuple of values from its X and Y axes.
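The single-array layout described above can be sketched in plain Python. This is only an illustration, not riptable's implementation: the helper name `crosstab_counts`, the row-major flat layout, and the choice to reserve bin 0 for invalid items are all assumptions made for the example. It uses the same keys as the example further down and reproduces that example's count table.

```python
def crosstab_counts(row_keys, col_keys):
    """Count co-occurrences of two key sequences (illustrative only)."""
    row_uniques = sorted(set(row_keys))
    col_uniques = sorted(set(col_keys))
    # Assumption: bin 0 is reserved for invalid/filtered items, real bins start at 1.
    row_bin = {k: i + 1 for i, k in enumerate(row_uniques)}
    col_bin = {k: i + 1 for i, k in enumerate(col_uniques)}
    n_rows, n_cols = len(row_uniques) + 1, len(col_uniques) + 1

    # One flat array holds every (row, col) bin, including the invalid ones.
    flat = [0] * (n_rows * n_cols)
    for r, c in zip(row_keys, col_keys):
        flat[row_bin[r] * n_cols + col_bin[c]] += 1

    # Split the flat array into a table, dropping the invalid row and column.
    table = [flat[r * n_cols + 1:(r + 1) * n_cols] for r in range(1, n_rows)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_uniques, col_uniques, table, row_totals, col_totals, len(flat)


ints = [1, 2, 3, 4] * 4
strs = list("abcdbcdacdbadabc")
rows, cols, table, row_totals, col_totals, n_bins = crosstab_counts(ints, strs)
for label, row, total in zip(rows, table, row_totals):
    print(label, row, total)
```

With four unique keys on each axis plus one invalid bin per axis, the flat array has 5 x 5 = 25 slots, which matches the "Bins:25" line in the example output below.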

Parameters:
  • cat_rows (Categorical) – Categorical for the rows axis, or an array which will be converted to a Categorical.

  • cat_cols (Categorical) – Categorical for the column axis, or an array which will be converted to a Categorical.

Keywords:
  • invalid – Defaults to False. Set to True to show filtered columns. This appears to correspond to showfilter in the signature above.

  • ordered – Defaults to None. See Categorical.

  • sort_gb – Defaults to False. See Categorical.

  • ylabel – Defaults to None. Set to a string to override the name of the left column.

  • totals – Defaults to True.

There is no sort_display option.

Returns:

An Accum2 object which can be used to perform calculations. Because Accum2 subclasses FastArray, it can also be added to a Dataset.

Operations are then supported as methods on the object, for example Accum2(catx, caty).min(array1). See GroupByOps.

Examples

>>> import numpy as np
>>> from riptable import Accum2, Categorical, FastArray
>>> int_fa = FastArray([1,2,3,4]*4)
>>> str_fa = FastArray(['a','b','c','d','b','c','d','a','c','d','b','a','d','a','b','c'])
>>> data_col = np.random.rand(16)*10
>>> data_col
array([6.7337479 , 1.69561884, 8.20657899, 6.12821287, 3.95380641,
        1.06706672, 9.51679965, 3.57184704, 7.86268264, 9.0136061 ,
        2.12355667, 3.64954958, 8.40952542, 0.06431684, 9.52872172,
        3.94938333])   #random
>>> c_x = Categorical(str_fa)
>>> c_y = Categorical(int_fa)
>>> ac = Accum2(c_x, c_y)
>>> ac
Accum2 Keys
 X:[b'a' b'b' b'c' b'd']
 Y:{'key_0': FastArray([1, 2, 3, 4])}
 Bins:25   Rows:16

*YLabel   a   b   c   d   Total
-------   -   -   -   -   -----
      1   1   1   1   1       4
      2   1   1   1   1       4
      3   0   2   1   1       4
      4   2   0   1   1       4
-------   -   -   -   -   -----
  Total   4   4   4   4      16
>>> ac.sum(data_col)
*YLabel       a       b       c       d   Total
-------   -----   -----   -----   -----   -----
      1    6.73    3.95    7.86    8.41   26.96
      2    0.06    1.70    1.07    9.01   11.84
      3    0.00   11.65    8.21    9.52   29.38
      4    7.22    0.00    3.95    6.13   17.30
-------   -----   -----   -----   -----   -----
  Total   14.02   17.30   21.09   33.07   85.48
property gb_keychain

Request a GroupByKeys from the y-axis categorical.

This provides unique keys, a possible sorted index, and the ability to add a filtered bin to the final table from groupby calculations.

property gbkeys
property ikey
property isortrows
property ncountgroup

See Grouping.ncountgroup.

property ncountkey

See Grouping.ncountkey.

property size
ACCUM_X_MAX: int = 10000
DebugMode: bool = False
__del__()[source]

Called when a Categorical is deleted.

__getitem__(fld)[source]

Bracket indexing for Accum2.

__len__()[source]
__repr__()[source]

Return repr(self).

__str__()[source]

Return str(self).

classmethod _accum1_pass(cat, origarr, funcNum, showfilter=False, filter=None, func_param=0, **kwargs)[source]

Internal call to calculate the Y or X summary axis. The filter must be passed correctly. Returns an array with the result of the operation; the size of the array is the number of uniques.

classmethod _add_totals(cat_rows, newds, name, totalsX, totalsY, totalOfTotals)[source]

Adds a summary column on the right (totalsY) and a footer row on the bottom (totalsX).

classmethod _apply_2d_operation(func, imatrix, showfilter=True, filter_rows=None, filter_cols=None)[source]

Called from routines like sum or min where we can make one pass.

If there are bad rows, filter_rows is set to the row indexes that are bad. If there are bad columns, filter_cols is set to the column indexes that are bad. filter_rows is a fancy index or None.

_build_sds_meta_data(name, **kwargs)[source]
_build_string()[source]
classmethod _calc_badslots(cat, badslots, filter, wantfancy)[source]

Internal routine that combines badslots (a row or column filter) with the common filter.

If there are no badslots, the common filter is returned; otherwise a new filter is returned. The filter is negative (badslots locations are False).

If wantfancy is True, returns a fancy index to the columns or rows; otherwise returns a full boolean mask combined with the existing filter (if one exists).

classmethod _calc_multipass(cat_cols, cat_rows, newds, origarr, funcNum, func, imatrix, name=None, showfilter=False, filter=None, badrows=None, badcols=None, badcalc=True, **kwargs)[source]

For functions that require multiple passes to get the proper result, such as mean or median.

If the grid is 7 x 11, there will be 77 + 11 + 7 + 1 = 96 passes.
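The pass count quoted above presumably breaks down as one pass per cell, one per total along each axis, and one for the total of totals. A small check of that arithmetic, using a hypothetical helper `multipass_count` that is not part of riptable:

```python
def multipass_count(n_a, n_b):
    """Passes for an n_a x n_b grid: cells + both sets of margins + grand total."""
    cells = n_a * n_b
    margins = n_a + n_b   # one pass per row total and per column total
    grand_total = 1
    return cells + margins + grand_total


passes = multipass_count(7, 11)
print(passes)  # 96
```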

Parameters:
  • func – User function to call for the calculation.

  • name – Optional column name (otherwise the function name is used).

  • badrows – Optional list of bad row keys; will be combined with filter.

  • badcols – Optional list of bad column keys; will be combined with filter.

badrows/badcols hold just the keys that are bad (not a boolean filter), for example badrows=['AAPL', 'GOOG'].

Need a new algorithm to take bad bins + ikey + the existing boolean filter and create a new boolean filter: walk ikey and check whether each bin is bad in a lookup table; if so, set the filter to False, otherwise copy the existing filter value.
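The filter-building walk described in the last note can be sketched as follows. This is a plain-Python illustration under stated assumptions, not riptable code: the function name `filter_from_bad_bins` is hypothetical, and the bad keys are assumed to have already been translated into bin numbers.

```python
def filter_from_bad_bins(ikey, bad_bins, n_bins, existing_filter=None):
    """Build a boolean filter that is False wherever an item falls in a bad bin."""
    # Lookup table: is_bad[b] is True when bin b should be excluded.
    is_bad = [False] * n_bins
    for b in bad_bins:
        is_bad[b] = True

    new_filter = []
    for i, b in enumerate(ikey):
        if is_bad[b]:
            new_filter.append(False)
        else:
            # No existing filter means every remaining item is kept.
            new_filter.append(True if existing_filter is None else existing_filter[i])
    return new_filter


ikey = [1, 2, 3, 1, 2, 3]                          # bin number of each item (made-up data)
existing = [True, True, True, False, True, True]   # a pre-existing boolean filter
mask = filter_from_bad_bins(ikey, bad_bins=[2], n_bins=4, existing_filter=existing)
print(mask)  # [True, False, True, False, False, True]
```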

classmethod _calc_onepass(cat_cols, cat_rows, newds, origarr, funcNum, func, imatrix, name=None, showfilter=False, filter=None, badrows=None, badcols=None, badcalc=True, **kwargs)[source]

For functions such as sum or min that require one pass to get the proper result.

The first pass calculates all the cells. Once the cells are calculated, an imatrix is made. Since functions like sum or min can calculate proper values for horizontal or vertical operations without making another pass, we use the imatrix to calculate the rest.

The user may also pass in badrows or badcols, or both. When badrows is passed, the cells for those rows are still calculated normally. However, the totalOfTotals will not include the badrows or badcols.
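The idea of deriving the totals from already-computed cells can be illustrated with a short plain-Python sketch. The helper `totals_from_cells` and the sample matrix are hypothetical and only show why sum and min need no second pass over the original data.

```python
def totals_from_cells(cells, reduce_func):
    """Derive the margins from a computed cell matrix without revisiting the data."""
    row_totals = [reduce_func(row) for row in cells]
    col_totals = [reduce_func(col) for col in zip(*cells)]
    total_of_totals = reduce_func(row_totals)
    return row_totals, col_totals, total_of_totals


cells = [[6, 3, 7],
         [1, 2, 9]]
sum_rows, sum_cols, sum_all = totals_from_cells(cells, sum)
min_rows, min_cols, min_all = totals_from_cells(cells, min)
print(sum_rows, sum_cols, sum_all)  # [16, 12] [7, 5, 16] 28
print(min_rows, min_cols, min_all)  # [3, 1] [1, 2, 7] 1
```

The same shortcut does not work for a mean or median, because a mean of cell means weights every cell equally regardless of how many items it holds.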

_calculate_all(funcNum, *args, func_param=0, **kwargs)[source]

Can be called from apply_reduce

_finish_calculate_all(origdict, accum_dict, funcNum, func_param=0, tups=0, transform=False, **kwargs)[source]
Parameters:
  • origdict – Original dataset input.

  • accum_dict – Input data we can calculate on.

  • funcNum – Internal riptable groupby function number, or a callable reduce function.

  • func_param – Optional parameters for the function.

_get_gbkeyname()[source]
_get_gbkeys(showfilter=False)[source]
_internal_getitem(matrix_index)[source]
classmethod _load_from_sds_meta_data(name, arr, cols, meta)[source]
_make_imatrix(input_arr, col_keys, row_keys, showfilter=False)[source]

Return a Fortran-ordered 2d matrix.

If showfilter is False, the first column is removed and the shape is (row_keys.unique_count + 1, col_keys.unique_count). If showfilter is True, the shape is (row_keys.unique_count + 1, col_keys.unique_count + 1).
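To make the shapes above concrete, here is a plain-Python sketch of a column-major (Fortran-order) reshape that optionally drops the first column. It is not the riptable routine: the name `reshape_column_major` is hypothetical, and treating column 0 as the filtered bin is an assumption drawn from the description.

```python
def reshape_column_major(flat, n_row_bins, n_col_bins, showfilter=False):
    """Reshape a flat column-major list into rows; optionally drop the first column."""
    # Column-major order: element (r, c) lives at flat[r + c * n_row_bins].
    matrix = [[flat[r + c * n_row_bins] for c in range(n_col_bins)]
              for r in range(n_row_bins)]
    if not showfilter:
        # Assumption: column 0 holds the filtered/invalid bin.
        matrix = [row[1:] for row in matrix]
    return matrix


# 2 row keys and 3 column keys, plus one extra bin on each axis: 3 x 4 = 12 slots.
flat = list(range(12))
hidden = reshape_column_major(flat, n_row_bins=3, n_col_bins=4)
shown = reshape_column_major(flat, n_row_bins=3, n_col_bins=4, showfilter=True)
print(hidden)  # [[3, 6, 9], [4, 7, 10], [5, 8, 11]]
print(shown)   # [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
```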
_stack_dataset(arr, origarr, funcNum, showfilter=False, tups=0, **kwargs)[source]

Accum2 uses a single array but returns a dataset that is stacked. The long column is unrolled into columns.

Parameters:
  • arr

  • origarr

  • funcNum

  • showfilter (bool) –

  • kwargs (dict-like) – Keyword args to pass to the function specified by funcNum.

apply_reduce(userfunc, *args, dataset=None, label_keys=None, func_param=None, dtype=None, transform=False, **kwargs)[source]

Accum2.apply_reduce calls Grouping.apply_helper.

Parameters:
  • userfunc (callable) – A callable that takes a contiguous array as its first argument and returns a scalar. In addition, the callable may take positional and keyword arguments.

  • args – Used to pass in columnar data from other datasets

  • dataset (None) – User may pass in an entire dataset to compute.

  • label_keys (None) – Not supported, will use the existing groupby keys as labels.

  • func_param (tuple, optional) – Set to a tuple to pass as arguments to the routine.

  • dtype (str or np.dtype, optional) – Change to a numpy dtype to return an array with that dtype. Defaults to None.

  • transform (bool) – Set to True to re-expand the results of the calculation. Defaults to False.

  • filter

  • kwargs – Optional positional and keyword arguments to pass to userfunc

Notes

See Grouping.apply_reduce
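The contract for userfunc (an array of values in, one scalar out) can be illustrated without riptable. The sketch below is only a model of the semantics, not how Accum2 or Grouping performs the reduction: `reduce_by_cell` is a hypothetical helper, and forwarding extra positional arguments to the callable is an assumption loosely modelled on func_param.

```python
from collections import defaultdict
from statistics import median


def reduce_by_cell(userfunc, row_keys, col_keys, values, *func_args):
    """Group values by (row, col) key pair and reduce each group to one scalar."""
    groups = defaultdict(list)
    for r, c, v in zip(row_keys, col_keys, values):
        groups[(r, c)].append(v)
    # userfunc receives all values of one cell and must return a scalar.
    return {cell: userfunc(vals, *func_args) for cell, vals in groups.items()}


rows = ["x", "x", "y", "y", "x"]
cols = [1, 1, 1, 2, 1]
vals = [4.0, 10.0, 3.0, 8.0, 7.0]
result = reduce_by_cell(median, rows, cols, vals)
print(result[("x", 1)])  # 7.0
```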

count(**kwargs)[source]

Compute the count of each group.

display_convert_func(index, itemformat)[source]
display_query_properties()[source]

Take over display query properties from parent class FastArray.

When displayed in a Dataset, Accum2 data will be displayed as a tuple composite of its categorical (x,y) bin values.

make_dataset(arr, showfilter=False)[source]
Parameters:

  • arr – Input array of data.

Returns:

  • ds

  • col_keys

  • row_keys