riptable.rt_groupbyops

Classes

GroupByOps

Holds all the functions for groupby

class riptable.rt_groupbyops.GroupByOps

Bases: abc.ABC

Holds all the functions for groupby

Only used when inherited

Child classes must set self.grouping and self._dataset. They must also override the methods count and _calculate_all, and the property gb_keychain.

property first_bool

Return a boolean mask of the first occurrence.

Examples

>>> c = rt.Cat(['this','this','that','that','this'])
>>> c.first_bool
FastArray([ True, False,  True, False, False])
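To make the meaning of the mask concrete, here is a minimal pure-Python sketch that does not use riptable. The helper name `first_bool_sketch` is hypothetical and only mirrors the behavior shown in the example above; it is not the library's implementation.

```python
def first_bool_sketch(values):
    """Return a mask that is True only at the first occurrence of each value."""
    seen = set()
    mask = []
    for v in values:
        mask.append(v not in seen)  # True the first time a value shows up
        seen.add(v)
    return mask


mask = first_bool_sketch(['this', 'this', 'that', 'that', 'this'])
print(mask)  # [True, False, True, False, False]
```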
property first_fancy

Return a fancy index mask of the first occurrence

Notes

NOTE: not optimized for groupby, which has grouping.ikey always set.

NOTE: categorical needs to lazy evaluate ikey.

Examples

>>> c = rt.Cat(['b','b','a','a','b'])
>>> c.first_fancy
FastArray([0, 2])
>>> c = rt.Cat(['b','b','a','a','b'], ordered=False)
>>> c.first_fancy
FastArray([2, 0])
abstract property gb_keychain: riptable.rt_groupbykeys.GroupByKeys
property groups

Returns a dictionary of unique key values -> their fancy indices of occurrence in the original data.
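The shape of the returned mapping can be sketched in plain Python. The helper `groups_sketch` is hypothetical, and the key order it produces (order of first appearance) is an assumption of the sketch; riptable may order the keys differently.

```python
def groups_sketch(keys):
    """Map each unique key to the list of row indices where it occurs."""
    groups = {}
    for i, k in enumerate(keys):
        groups.setdefault(k, []).append(i)
    return groups


g = groups_sketch(['b', 'b', 'a', 'a', 'b'])
print(g)  # {'b': [0, 1, 4], 'a': [2, 3]}
```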

property last_bool

Return a boolean mask of the last occurrence.

Examples

>>> c = rt.Cat(['this','this','that','that','this'])
>>> c.last_bool
FastArray([False, False, False,  True,  True])
property last_fancy

Return a fancy index mask of the last occurrence

Notes

NOTE: not optimized for groupby, which has grouping.ikey always set.

NOTE: categorical needs to lazy evaluate ikey.

Examples

>>> c = rt.Cat(['b','b','a','a','b'])
>>> c.last_fancy
FastArray([3, 4])
>>> c = rt.Cat(['b','b','a','a','b'], ordered=False)
>>> c.last_fancy
FastArray([4, 3])
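The two results above differ only in group order. A pure-Python sketch, assuming that an ordered categorical reports groups in sorted key order and an unordered one in order of first appearance (the helper name `last_fancy_sketch` is hypothetical):

```python
def last_fancy_sketch(values, ordered=True):
    """Return the index of the last occurrence of each unique value."""
    last_seen = {}  # insertion order records the order of first appearance
    for i, v in enumerate(values):
        last_seen[v] = i  # later occurrences overwrite earlier ones
    keys = sorted(last_seen) if ordered else list(last_seen)
    return [last_seen[k] for k in keys]


data = ['b', 'b', 'a', 'a', 'b']
ordered_result = last_fancy_sketch(data)                   # [3, 4]
unordered_result = last_fancy_sketch(data, ordered=False)  # [4, 3]
```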
AggNames
DebugMode = False
NumpyAggNames
QUANTILE_MULTIPLIER = 1000000000.0
_USE_FAST_COUNT_UNIQUES = True
_dataset: riptable.rt_dataset.Dataset | None
grouping: riptable.rt_grouping.Grouping
abstract _calculate_all(funcNum, *args, func_param=0, gbkeys=None, isortrows=None, **kwargs)
_dict_val_at_index(index)

Returns the value of the group label for a given index. A single-key grouping will return a single value. A multi-key grouping will return a tuple of values.

_ema_op(function, *args, time=None, decay_rate=1.0, filter=None, reset_filter=None, **kwargs)

Ema base function for time based ema functions

Formula:

grp loops over each item in a groupby group
i loops over each item in the original dataset

Output[i] = <some formula>

Parameters:
  • time – Float or int array used to calculate the time difference.

  • decay_rate – See formula (used as a half-life).

  • filter (array of bool, optional) – Boolean mask of elements to include.

  • reset_filter (array of bool, optional) – Boolean mask array.

Return type:

Dataset with the same number of rows as the original dataset

_gb_keyword_wrapper(filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)
static _gb_quantile_name(q, is_nan_function)

Returns a correct name of a quantile function given q and nan-flag

_get_agg_func(item)

Translates user input into name and method for groupby aggregations.

Parameters:

item (str or function) – String or supported numpy math function. See GroupByOps.AggNames.

Returns:

  • name (str) – Lowercase name for aggregation function.

  • func (function) – GroupByOps method.

_iter_internal(dataset=None)

Generates pairs of labels and the stored dataset sliced by their fancy indices. Right now, this is only called by categorical. Groupby has a faster way of returning dataset slices.

_iter_internal_contiguous()

Sorts the data by group to create contiguous memory. Returns key + dataset view of key’s rows for each group.

_keys_as_list()
_nth(*args, n=1, **kwargs)
_pop_gb_data(calledfrom, userfunc, *args, **kwargs)

Pop the groupby data from the args and keyword args, possibly combining. Avoid repeating this step when the data doesn’t change.

Parameters:
  • calledfrom ({'apply_reduce', 'apply_nonreduce', 'apply', 'agg'}) –

  • userfunc (callable or int (function number)) –

Returns:

  • 4 return values

  • any user arguments

  • the kwargs (with ‘dataset’ removed)

  • the dictionary of numpy arrays to operate on

  • tups (0 or 1 or 2 depending on whether the first argument was a tuple of arrays)

See also

GroupByOps.agg

_possibly_transform(gb_ds, label_keys=None, **kwargs)

Called after a reduce operation to possibly re-expand back. Check transform flag.

_prepare_gb_data(calledfrom, userfunc, *args, dataset=None, **kwargs)

Internal function to parse arguments and search for numpy arrays.

This routine normalizes input from Grouping, Accum2, and Categorical. GroupBy defaults to using the _dataset variable that it sets after being constructed from a dataset. Accum2 and Categorical can also set _dataset just like GroupBy; see Dataset.accum2 and Dataset.cat for examples. If a _dataset has been set, no input data is required for the calculation methods.

Parameters:
  • calledfrom ({'Accum2', 'Categorical', 'GroupBy', 'apply_reduce', 'apply_nonreduce', 'apply', 'agg'}) –

  • userfunc (callable or int (function number)) –

  • args, dataset – Either args or dataset must be present (both are also allowed):

    if just args: make a dictionary from that
    if just dataset: make a dictionary
    if both: make a new dataset, then make a dictionary from that
    if neither: error

    From Grouping, normally just a dataset. From Categorical, normally just args (but the user can use the kwarg 'dataset' to supply one).

Returns:

  • a dictionary of arrays to be used as input to many groupby algorithms

  • user_args if any (the first argument might be removed)

  • tups (0, 1, or 2; set to a value > 0 if the first argument is a tuple)

Raises:

ValueError

Raises:

ValueError

_quantile(*args, q=None, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, is_nan_function=None, is_percentile=None, **kwargs)

Internal function for all (nan)quantile/percentile/median operations.

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • q (float, list of floats) – Quantile(s) or percentile(s) to compute

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • is_nan_function (bool) – Indicates if this was called as a nan-version of a function.

  • is_percentile (bool) – Indicates if this was called as a (nan)percentile.

static _quantile_funcParam_from_q(q, is_nan_function)

Returns a funcParam to be passed to the C++ level. A multiplier is needed because the functions only take integer funcParams. See the GroupByBase::AccumQuantile1e9Mult function in riptide_cpp/src/GroupBy.cpp.

static _quantile_q_from_funcParam(funcParam)

Decodes a quantile q and a nan-flag from funcParam used for cpp level.
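The general idea of passing a fractional q through an integer-only parameter can be sketched as follows. This is only an illustration of scaling by a large multiplier: the helper names `encode_q` and `decode_q` are hypothetical, and the way riptable actually packs the nan-flag into funcParam is not shown here and may differ.

```python
QUANTILE_MULTIPLIER = 1e9  # same value as the class attribute of that name


def encode_q(q):
    """Scale a quantile in [0, 1] to an integer (hypothetical encoding)."""
    return int(round(q * QUANTILE_MULTIPLIER))


def decode_q(func_param):
    """Recover the quantile from the scaled integer."""
    return func_param / QUANTILE_MULTIPLIER


param = encode_q(0.25)     # 250000000
q_back = decode_q(param)   # 0.25
```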

agg(func=None, *args, dataset=None, **kwargs)
Parameters:

func (callable, string, dictionary, or list of string/callables) –

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. For a DataFrame, can pass a dict, if the keys are DataFrame column names.

Accepted Combinations are:

  • string function name

  • function

  • list of functions

  • dict of column names -> functions (or list of functions)

Returns:

aggregated

Return type:

Multiset

Notes

Numpy functions mean/median/prod/sum/std/var are special cased so the default behavior is applying the function along axis=0

Examples

Aggregate these functions across all columns

>>> gb.agg(['sum', 'min'])
            A         B         C
sum -0.182253 -0.614014 -2.909534
min -1.916563 -1.460076 -1.568297

Different aggregations per column

>>> gb.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
            A         B
max       NaN  1.514318
min -1.916563 -1.460076
sum -0.182253       NaN
>>> gb.agg({'C': np.sum, 'D': lambda x: np.std(x,ddof=1)})
aggregate(func)
apply(userfunc, *args, dataset=None, label_keys=None, **kwargs)

GroupByOps:apply calls Grouping:apply

Parameters:
  • userfunc (callable) – userfunction to call

  • dataset (None) –

  • label_keys (None) –

apply_nonreduce(userfunc, *args, dataset=None, label_keys=None, func_param=None, dtype=None, **kwargs)

GroupByOps:apply_nonreduce calls Grouping:apply_reduce

Parameters:
  • userfunc (callable) – A callable that takes a contiguous array as its first argument, and returns a scalar. In addition the callable may take positional and keyword arguments.

  • args – used to pass in columnar data from other datasets

  • dataset (None) – User may pass in an entire dataset to compute.

  • label_keys (None) – Not supported, will use the existing groupby keys as labels.

  • dtype (str or np.dtype, optional) – Change to a numpy dtype to return an array with that dtype. Defaults to None.

  • kwargs – Optional positional and keyword arguments to pass to userfunc

Notes

Grouping apply_reduce (for Categorical, groupby, accum2)

For every column of data to be computed, the userfunc will be called back per group as a single array. The order of the groups is either:

  • Order of first appearance (when coming from a hash)

  • Lexicographical order (when lex=True or a Categorical with ordered=True)

The function passed to apply must take an array as its first argument and return back a single scalar value.

Examples

From a Dataset groupby:

>>> ds.gb(['Symbol'])['TradeSize'].apply_reduce(np.sum)

From an existing categorical:

>>> ds.Symbol.apply_reduce(np.sum, ds.TradeSize)

Create your own with forced dtype:

>>> def mycumprodsum(arr):
...     return arr.cumprod().sum()
>>> ds.Symbol.apply_reduce(mycumprodsum, ds.TradeSize, dtype=np.float32)
apply_reduce(userfunc, *args, dataset=None, label_keys=None, nokeys=False, func_param=None, dtype=None, transform=False, **kwargs)

GroupByOps:apply_reduce calls Grouping:apply_reduce

Parameters:
  • userfunc (callable) – A callable that takes a contiguous array as its first argument, and returns a scalar. In addition the callable may take positional and keyword arguments.

  • args – Used to pass in columnar data from other datasets

  • dataset (None) – User may pass in an entire dataset to compute.

  • label_keys (None) – Not supported, will use the existing groupby keys as labels.

  • func_param (tuple, optional) – Set to a tuple to pass as arguments to the routine.

  • dtype (str or np.dtype, optional) – Change to a numpy dtype to return an array with that dtype. Defaults to None.

  • transform (bool) – Set to True to re-expand the results of the calculation. Defaults to False.

  • filter

  • kwargs – Optional positional and keyword arguments to pass to userfunc

Notes

Grouping apply_reduce (for Categorical, groupby, accum2)

For every column of data to be computed, the userfunc will be called back per group as a single array. The order of the groups is either:

  • Order of first appearance (when coming from a hash)

  • Lexicographical order (when lex=True or a Categorical with ordered=True)

The function passed to apply must take an array as its first argument and return back a single scalar value.

Examples

From a Dataset groupby:

>>> ds.gb(['Symbol'])['TradeSize'].apply_reduce(np.sum)

From an existing categorical:

>>> ds.Symbol.apply_reduce(np.sum, ds.TradeSize)

Create your own with forced dtype:

>>> def mycumprodsum(arr):
...     return arr.cumprod().sum()
>>> ds.Symbol.apply_reduce(mycumprodsum, ds.TradeSize, dtype=np.float32)
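The callback contract described in the notes (one call per group, one scalar back, groups visited either in order of first appearance or in lexicographical order) can be sketched in plain Python. The helper `apply_reduce_sketch` and its `lex` flag are hypothetical stand-ins for illustration, not the riptable implementation.

```python
def apply_reduce_sketch(userfunc, keys, values, lex=False):
    """Call userfunc once per group and collect one scalar per group."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)  # gather each group's values together
    # Order of first appearance by default, sorted key order when lex=True
    order = sorted(groups) if lex else list(groups)
    return {k: userfunc(groups[k]) for k in order}


symbols = ['SPY', 'IBM', 'SPY', 'IBM', 'SPY']
sizes = [10, 5, 1, 5, 10]
by_hash = apply_reduce_sketch(sum, symbols, sizes)           # {'SPY': 21, 'IBM': 10}
by_lex = apply_reduce_sketch(sum, symbols, sizes, lex=True)  # {'IBM': 10, 'SPY': 21}
```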
as_filter(index)

Return an index filter for a given unique key.

static contains_np_arrays(container)

Check to see if all items in a list-like container are numpy arrays.

abstract count()

Compute count of group

count_uniques(*args, **kwargs)

Compute unique count of group

Return type:

Dataset with grouped key plus the unique count for each column by group.

Examples

>>> N = 17; np.random.seed(1)
>>> ds = Dataset(
...     dict(
...         Symbol = Cat(np.random.choice(['SPY','IBM'], N)),
...         Exchange = Cat(np.random.choice(['AMEX','NYSE'], N)),
...         TradeSize = np.random.choice([1,5,10], N),
...         TradePrice = np.random.choice([1.1,2.2,3.3], N),
...     ))
>>> ds.cat(['Symbol','Exchange']).count_uniques()
*Symbol   *Exchange   TradeSize   TradePrice
-------   ---------   ---------   ----------
IBM       NYSE                2            2
.         AMEX                2            3
SPY       AMEX                3            2
.         NYSE                1            2
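What the table reports, the number of distinct values per column within each group, can be sketched for a single column in plain Python. The helper `count_uniques_sketch` is hypothetical; multi-key groups are represented here as tuples, which is a choice of the sketch.

```python
def count_uniques_sketch(keys, values):
    """Count the distinct values seen in each group."""
    uniques = {}
    for k, v in zip(keys, values):
        uniques.setdefault(k, set()).add(v)
    return {k: len(s) for k, s in uniques.items()}


keys = [('IBM', 'NYSE'), ('IBM', 'NYSE'), ('SPY', 'AMEX'), ('IBM', 'NYSE')]
trade_size = [1, 5, 10, 1]
counts = count_uniques_sketch(keys, trade_size)
print(counts)  # {('IBM', 'NYSE'): 2, ('SPY', 'AMEX'): 1}
```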
cumcount(*args, ascending=True, **kwargs)

Rolling count for each group. Numbers each item in each group from 0 to the length of that group - 1.

Parameters:

ascending (bool, default True) –

Returns:

  • A single array, same size as the original grouping dict/categorical.

  • If a filter was applied, integer sentinels will appear in those slots.
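A pure-Python sketch of the numbering follows. The helper `cumcount_sketch` is hypothetical, filters are ignored, and the meaning of ascending=False (counting down from the group length - 1 to 0, as in pandas) is an assumption, since it is not described above.

```python
def cumcount_sketch(keys, ascending=True):
    """Number the items of each group 0..n-1 (or n-1..0 when descending)."""
    totals = {}
    for k in keys:
        totals[k] = totals.get(k, 0) + 1  # size of each group
    seen = {}
    out = []
    for k in keys:
        n = seen.get(k, 0)  # how many items of this group came before
        out.append(n if ascending else totals[k] - 1 - n)
        seen[k] = n + 1
    return out


keys = ['a', 'b', 'a', 'a', 'b']
up = cumcount_sketch(keys)                     # [0, 0, 1, 2, 1]
down = cumcount_sketch(keys, ascending=False)  # [2, 1, 1, 0, 0]
```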

cummax(*args, filter=None, reset_filter=None, skipna=True, **kwargs)

Cumulative nanmax for each group

Parameters:
  • filter (array of bool, optional) – Boolean mask of elements to include.

  • reset_filter (array of bool, optional) – Boolean mask array.

  • skipna (bool, default True) – Exclude nan/invalid values.

Return type:

Dataset with the same number of rows as the original dataset

cummin(*args, filter=None, reset_filter=None, skipna=True, **kwargs)

Cumulative nanmin for each group

Parameters:
  • filter (array of bool, optional) – Boolean mask of elements to include.

  • reset_filter (array of bool, optional) – Boolean mask array.

  • skipna (bool, default True) – Exclude nan/invalid values.

Return type:

Dataset with the same number of rows as the original dataset

cumprod(*args, filter=None, reset_filter=None, **kwargs)

Cumulative product for each group

Parameters:
  • filter (array of bool, optional) – Boolean mask of elements to include.

  • reset_filter (array of bool, optional) – Boolean mask array.

Return type:

Dataset with the same number of rows as the original dataset

cumsum(*args, filter=None, reset_filter=None, **kwargs)

Cumulative sum for each group

Parameters:
  • filter (array of bool, optional) – Boolean mask of elements to include.

  • reset_filter (array of bool, optional) – Boolean mask array.

Return type:

Dataset with the same number of rows as the original dataset
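A per-group cumulative sum keeps one running total per group and emits a result for every input row, which is why the output has as many rows as the input. A minimal sketch, ignoring filter and reset_filter (the helper name `group_cumsum_sketch` is hypothetical):

```python
def group_cumsum_sketch(keys, values):
    """Running sum within each group, one output per input row."""
    running = {}
    out = []
    for k, v in zip(keys, values):
        running[k] = running.get(k, 0) + v
        out.append(running[k])
    return out


result = group_cumsum_sketch(['a', 'b', 'a', 'b', 'a'], [1, 2, 3, 4, 5])
print(result)  # [1, 2, 4, 6, 9]
```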

abstract describe(**kwargs)
diff(period=1, **kwargs)

Rolling diff for each group

Parameters:

period (int, optional) – Period size. Defaults to 1.

Return type:

Dataset with the same number of rows as the original dataset

ema_decay(*args, time=None, decay_rate=None, filter=None, reset_filter=None, **kwargs)

Ema decay for each group

Formula:

grp loops over each item in a groupby group
i loops over each item in the original dataset

Output[i] = Column[i] + LastEma[grp] * exp(-decay_rate * (Time[i] - LastTime[grp]))
LastEma[grp] = Output[i]
LastTime[grp] = Time[i]

Parameters:
  • time – Float or int array used to calculate the time difference.

  • decay_rate – See formula (used as a half-life).

  • filter (array of bool, optional) – Boolean mask of elements to include.

  • reset_filter (array of bool, optional) – Boolean mask array.

Return type:

Dataset with the same number of rows as the original dataset

Example

>>> aapl
#    delta     sym       org    time
-   ------     ----   ------   -----
0    -3.11     AAPL    -3.11   25.65
1   210.54     AAPL   210.54   38.37
2    49.97     AAPL    42.11   41.66
>>> np.log(2)/(1e3*100)
6.9314718055994526e-06
>>> aapl.groupby('sym')['delta'].ema_decay(time=aapl.time, decay_rate=np.log(2)/(1e3*100))[0]
FastArray([ -3.11271882, 207.42784495, 257.39155897])
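The formula can be traced by hand with a short pure-Python sketch. The helper `ema_decay_sketch` is hypothetical and ignores filter and reset_filter; it assumes a group's first row simply outputs its own value. Because the inputs below are the rounded values printed in the table, the results agree with the output above only to about two decimal places.

```python
import math


def ema_decay_sketch(keys, column, time, decay_rate):
    """Apply the ema_decay formula row by row, tracking state per group."""
    last_ema = {}
    last_time = {}
    out = []
    for grp, x, t in zip(keys, column, time):
        if grp in last_ema:
            # Add the previous output, decayed by the elapsed time
            x = x + last_ema[grp] * math.exp(-decay_rate * (t - last_time[grp]))
        last_ema[grp] = x
        last_time[grp] = t
        out.append(x)
    return out


result = ema_decay_sketch(
    ['AAPL', 'AAPL', 'AAPL'],
    [-3.11, 210.54, 49.97],
    [25.65, 38.37, 41.66],
    math.log(2) / (1e3 * 100),
)
print([round(v, 2) for v in result])
```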
ema_normal(*args, time=None, decay_rate=None, filter=None, reset_filter=None, **kwargs)

Ema decay for each group

Formula:

grp loops over each item in a groupby group
i loops over each item in the original dataset

decayedWeight = exp(-decay_rate * (Time[i] - LastTime[grp]))
LastEma[grp] = Column[i] * (1 - decayedWeight) + LastEma[grp] * decayedWeight
Output[i] = LastEma[grp]
LastTime[grp] = Time[i]

Parameters:
  • time – Float or int array used to calculate the time difference.

  • decay_rate – See formula (used as a half-life). Defaults to 1.0.

  • filter (array of bool, optional) – Boolean mask of elements to include.

  • reset_filter (array of bool, optional) – Boolean mask array.

Return type:

Dataset with the same number of rows as the original dataset

Example

>>> ds = rt.Dataset({'test': rt.arange(10), 'group2': rt.arange(10) % 3})
>>> ds.normal = ds.gb('group2')['test'].ema_normal(decay_rate=1.0, time = rt.arange(10))['test']
>>> ds.weighted = ds.gb('group2')['test'].ema_weighted(decay_rate=0.5)['test']
>>> ds
#   test   group2   normal   weighted
-   ----   ------   ------   --------
0      0        0     0.00       0.00
1      1        1     1.00       1.00
2      2        2     2.00       2.00
3      3        0     2.85       1.50
4      4        1     3.85       2.50
5      5        2     4.85       3.50
6      6        0     5.84       3.75
7      7        1     6.84       4.75
8      8        2     7.84       5.75
9      9        0     8.84       6.38
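The `normal` column can be reproduced with a pure-Python sketch of the formula. The helper `ema_normal_sketch` is hypothetical and ignores filter and reset_filter. Seeding each group's state with its first value is an assumption inferred from the table, where the first row of every group equals its input.

```python
import math


def ema_normal_sketch(keys, column, time, decay_rate=1.0):
    """Apply the ema_normal formula row by row, tracking state per group."""
    last_ema, last_time, out = {}, {}, []
    for grp, x, t in zip(keys, column, time):
        if grp in last_ema:
            w = math.exp(-decay_rate * (t - last_time[grp]))  # decayedWeight
            last_ema[grp] = x * (1 - w) + last_ema[grp] * w
        else:
            last_ema[grp] = float(x)  # assumption: seed with the first value
        last_time[grp] = t
        out.append(last_ema[grp])
    return out


test = list(range(10))
group2 = [i % 3 for i in test]
normal = ema_normal_sketch(group2, test, time=test, decay_rate=1.0)
print([round(v, 2) for v in normal])
```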
ema_weighted(*args, decay_rate=None, filter=None, reset_filter=None, **kwargs)

Ema decay for each group with constant decay value (no time parameter)

Formula:

grp loops over each item in a groupby group
i loops over each item in the original dataset

LastEma[grp] = Column[i] * (1 - decay_rate) + LastEma[grp] * decay_rate
Output[i] = LastEma[grp]

Parameters:
  • time – Not used.

  • decay_rate – See formula (used as a half-life).

  • filter (array of bool, optional) – Boolean mask of elements to include.

  • reset_filter (array of bool, optional) – Boolean mask array.

Return type:

Dataset with the same number of rows as the original dataset

Example

>>> ds = rt.Dataset({'test': rt.arange(10), 'group2': rt.arange(10) % 3})
>>> ds.normal = ds.gb('group2')['test'].ema_normal(decay_rate=1.0, time=rt.arange(10))['test']
>>> ds.weighted = ds.gb('group2')['test'].ema_weighted(decay_rate=0.5)['test']
>>> ds
#   test   group2   normal   weighted
-   ----   ------   ------   --------
0      0        0     0.00       0.00
1      1        1     1.00       1.00
2      2        2     2.00       2.00
3      3        0     2.85       1.50
4      4        1     3.85       2.50
5      5        2     4.85       3.50
6      6        0     5.84       3.75
7      7        1     6.84       4.75
8      8        2     7.84       5.75
9      9        0     8.84       6.38

See also

ema_normal, ema_decay
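The `weighted` column in the example can be reproduced with a pure-Python sketch of the formula. The helper `ema_weighted_sketch` is hypothetical and ignores filter and reset_filter. Seeding each group's state with its first value is an assumption inferred from the table, where the first row of every group equals its input.

```python
def ema_weighted_sketch(keys, column, decay_rate):
    """Apply the ema_weighted formula row by row, tracking state per group."""
    last_ema, out = {}, []
    for grp, x in zip(keys, column):
        if grp in last_ema:
            last_ema[grp] = x * (1 - decay_rate) + last_ema[grp] * decay_rate
        else:
            last_ema[grp] = float(x)  # assumption: seed with the first value
        out.append(last_ema[grp])
    return out


test = list(range(10))
group2 = [i % 3 for i in test]
weighted = ema_weighted_sketch(group2, test, decay_rate=0.5)
print(weighted)  # [0.0, 1.0, 2.0, 1.5, 2.5, 3.5, 3.75, 4.75, 5.75, 6.375]
```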

findnth(*args, filter=None, **kwargs)

FindNth. Takes no arguments; operates on bin.

Parameters:

filter (array of bool, optional) – Boolean mask of elements to include.

Return type:

Dataset with the same number of rows as the original dataset

first(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

First value in the group

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.
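The effect of the transform flag can be illustrated with a pure-Python sketch. The helper `group_first_sketch` is hypothetical: with transform=False it returns one value per group, and with transform=True it re-expands that value to every row of the group so the output lines up with the input.

```python
def group_first_sketch(keys, values, transform=False):
    """First value per group, optionally broadcast back to every row."""
    firsts = {}
    for k, v in zip(keys, values):
        firsts.setdefault(k, v)  # keep only the first value seen per group
    if transform:
        return [firsts[k] for k in keys]  # same length as the input
    return firsts


keys = ['a', 'b', 'a', 'b', 'a']
values = [10, 20, 30, 40, 50]
reduced = group_first_sketch(keys, values)                   # {'a': 10, 'b': 20}
expanded = group_first_sketch(keys, values, transform=True)  # [10, 20, 10, 20, 10]
```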

get_groupings(filter=None)
Parameters:

filter (ndarray of bools, optional) – pass in a boolean filter

Returns:

  • iGroup – the fancy indices for all groups, sorted by group. See iFirstGroup and nCountGroup for how to walk this.

  • iFirstGroup – first index for each group in the iGroup array. The first index is invalid.

  • nCountGroup – count for each unique group. The first count in this array is the invalid count.

Return type:

dict containing ndarrays calculated in pack_by_group().
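How the three arrays fit together can be sketched in plain Python. The sketch assumes that each row carries a 1-based bin number with 0 reserved for invalid or filtered rows, so slot 0 of iFirstGroup and nCountGroup describes the invalid bin. The helpers `pack_by_group_sketch` and `rows_for_bin` are hypothetical, and the exact layout produced by riptable may differ.

```python
def pack_by_group_sketch(ikey, unique_count):
    """Build iGroup / iFirstGroup / nCountGroup from 1-based bin numbers."""
    n_count_group = [0] * (unique_count + 1)
    for b in ikey:
        n_count_group[b] += 1  # slot 0 counts the invalid/filtered rows
    i_first_group = [0] * (unique_count + 1)
    for b in range(1, unique_count + 1):
        i_first_group[b] = i_first_group[b - 1] + n_count_group[b - 1]
    # Stable sort of row indices by bin keeps original order within a group
    i_group = sorted(range(len(ikey)), key=lambda i: ikey[i])
    return {'iGroup': i_group, 'iFirstGroup': i_first_group, 'nCountGroup': n_count_group}


def rows_for_bin(packed, b):
    """Walk iGroup using iFirstGroup and nCountGroup to get one bin's rows."""
    start = packed['iFirstGroup'][b]
    return packed['iGroup'][start:start + packed['nCountGroup'][b]]


packed = pack_by_group_sketch([1, 2, 1, 0, 2, 1], unique_count=2)
print(rows_for_bin(packed, 1))  # [0, 2, 5]
print(rows_for_bin(packed, 2))  # [1, 4]
```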

classmethod get_header_names(columns, default='col_')
abstract head(n=5, **kwargs)

Returns first n rows of each group.

Essentially equivalent to .apply(lambda x: x.head(n)), except ignores as_index flag.

Examples

>>> df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
>>> df.groupby('A', as_index=False).head(1)
   A  B
0  1  2
2  5  6
>>> df.groupby('A').head(1)
   A  B
0  1  2
2  5  6
iter_groups()

Very similar to the ‘groups’ property, but uses a generator instead of building the entire dictionary. Returned pairs will be group label value (or tuple of multikey group label values) –> fancy index for that group (base-0).

key_from_bin(bin)

Returns the value of the group label for a given index (zero-based). A single-key grouping will return a single value. A multi-key grouping will return a tuple of values.

last(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Last value in the group

max(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute max of group

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

mean(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute mean of groups

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

median(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, **kwargs)

Compute median of groups. For multiple groupings, the result will be a MultiSet.

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

min(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute min of group

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

mode(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute mode of groups (auto handles nan)

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

nanmax(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute max of group, excluding missing values

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

nanmean(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute mean of group, excluding missing values

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.
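
The interaction of the nan-excluding reduction with filter and transform can be sketched in plain Python. This is only an illustration of the semantics described above, not riptable's implementation: group_nanmean is a hypothetical helper, and in this simplified version filtered-out rows under transform=True still receive their group's mean.

```python
import math
from collections import defaultdict

def group_nanmean(keys, values, filter=None, transform=False):
    """Hypothetical helper: per-group mean that skips NaN values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, (k, v) in enumerate(zip(keys, values)):
        if filter is not None and not filter[i]:
            continue  # element excluded by the boolean filter
        if math.isnan(v):
            continue  # missing value, ignored by the nan-variant
        sums[k] += v
        counts[k] += 1
    means = {k: sums[k] / counts[k] for k in counts}
    if transform:
        # Broadcast each group's result back to the shape of the input.
        return [means.get(k, math.nan) for k in keys]
    return means

keys = ["a", "a", "b", "b", "a"]
values = [1.0, math.nan, 4.0, 6.0, 3.0]
per_group = group_nanmean(keys, values)                   # one value per group
same_shape = group_nanmean(keys, values, transform=True)  # one value per input row
```

Here per_group holds one entry per unique key, while same_shape has the same length as values.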

nanmedian(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, **kwargs)

Compute median of group, excluding missing values. For multiple groupings, the result will be a MultiSet.

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

nanmin(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute min of group, excluding missing values

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

nanpercentile(*args, q, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, **kwargs)

Compute percentile of groups, excluding missing values. For multiple groupings, the result will be a MultiSet.

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • q (float/int or list of floats/ints) – Percentile(s) to compute. Must be value(s) between 0 and 100

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

nanquantile(*args, q=None, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, **kwargs)

Compute quantile of groups, excluding missing values. For multiple groupings, the result will be a MultiSet.

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • q (float, list of floats) – Quantile(s) to compute. Must be value(s) between 0. and 1.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • is_nan_function (bool) – Not recommended for use. Indicates if this is a nan-version of a function.
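
The relationship between the two q scales (0 to 100 for percentiles, 0 to 1 for quantiles) can be sketched for a single group as follows. Both helpers are hypothetical, and linear interpolation between order statistics is an assumption; the interpolation rule riptable actually uses is not stated here.

```python
import math

def nan_quantile(values, q):
    """Hypothetical helper: quantile of the non-NaN values, q in [0, 1]."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return math.nan
    pos = q * (len(data) - 1)  # fractional position in the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    # Assumption: interpolate linearly between the two neighbouring values.
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def nan_percentile(values, q):
    """Hypothetical helper: same computation with q in [0, 100]."""
    return nan_quantile(values, q / 100.0)

group = [4.0, math.nan, 1.0, 3.0, 2.0]
median = nan_quantile(group, 0.5)
p25 = nan_percentile(group, 25)
```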

nanstd(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute standard deviation of groups, excluding missing values

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

nansum(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute sum of group, excluding missing values

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

nanvar(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute variance of groups, excluding missing values

For multiple groupings, the result will be a MultiSet

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

abstract ngroup(ascending=True, **kwargs)

Number each group from 0 to the number of groups - 1. This is the enumerative complement of cumcount. Note that the numbers given to the groups match the order in which the groups would be seen when iterating over the groupby object, not the order they are first observed.

Parameters:

ascending (bool, default True) – If False, number in reverse, from number of groups - 1 to 0.

Examples

>>> df = pd.DataFrame({"A": list("aaabba")})
>>> df
   A
0  a
1  a
2  a
3  b
4  b
5  a
>>> df.groupby('A').ngroup()
0    0
1    0
2    0
3    1
4    1
5    0
dtype: int64
>>> df.groupby('A').ngroup(ascending=False)
0    1
1    1
2    1
3    0
4    0
5    1
dtype: int64
>>> df.groupby(["A", [1,1,2,3,2,1]]).ngroup()
0    0
1    0
2    1
3    3
4    2
5    0
dtype: int64

See also

cumcount

Number the rows in each group.
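
The numbering rule can be sketched without pandas. The helper below is hypothetical and assumes that groups are iterated in sorted key order, as in the pandas example above.

```python
def ngroup(keys, ascending=True):
    """Hypothetical helper: number each row's group in sorted-key order."""
    ordered = sorted(set(keys))
    if not ascending:
        ordered.reverse()  # number from (number of groups - 1) down to 0
    number = {k: i for i, k in enumerate(ordered)}
    return [number[k] for k in keys]

labels = list("aaabba")
forward = ngroup(labels)
backward = ngroup(labels, ascending=False)
```

With these labels, forward and backward reproduce the values shown in the first two pandas outputs above.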

static np_quantile_mult(a, funcParam)

Applies the correct numpy function for aggregation, used in accum2. Takes funcParam as an argument.

abstract nth(*args, **kwargs)

null(showfilter=False)

Performs a reduced no-op. No operation is performed.

Parameters:

showfilter (bool, default False) –

Return type:

Dataset with grouping keys. No operation is performed.

Examples

>>> rt.Cat(np.random.choice(['SPY','IBM'], 100)).null(showfilter=True)

abstract ohlc(**kwargs)

Compute open, high, low, and close values of each group, excluding missing values. For multiple groupings, the result index will be a MultiIndex.

percentile(*args, q, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, **kwargs)

Compute percentile of groups. Returns nan for data that contains nans. For multiple groupings, the result will be a MultiSet

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • q (float/int or list of floats/ints) – Percentile(s) to compute. Must be value(s) between 0 and 100

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

quantile(*args, q, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, **kwargs)

Compute quantile of groups. Returns nan for data that contains nans. For multiple groupings, the result will be a MultiSet

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • q (float or list of floats) – Quantile(s) to compute. Must be value(s) between 0. and 1.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

static quantile_name_from_param(funcParam)

Returns the correct name of a quantile function given funcParam, used in accum2.

abstract rank(method='average', ascending=True, na_option='keep', pct=False, axis=0, **kwargs)

Provides the rank of values within each group

Parameters:
  • method ({'average', 'min', 'max', 'first', 'dense'}, default 'average') –

    • average: average rank of group

    • min: lowest rank in group

    • max: highest rank in group

    • first: ranks assigned in order they appear in the array

    • dense: like ‘min’, but rank always increases by 1 between groups

  • na_option ({'keep', 'top', 'bottom'}, default 'keep') –

    • keep: leave NA values where they are

    • top: smallest rank if ascending

    • bottom: smallest rank if descending

  • ascending (boolean, default True) – False for ranks by high (1) to low (N)

  • pct (boolean, default False) – Compute percentage rank of data within each group

Return type:

DataFrame with ranking of values within each group
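
How the ranking methods differ on ties can be sketched in plain Python. The helper group_rank is hypothetical; it ranks in ascending order only and ignores na_option and pct.

```python
from collections import defaultdict

def group_rank(keys, values, method="average"):
    """Hypothetical helper: rank values within each group (ascending only)."""
    rows_by_group = defaultdict(list)
    for i, k in enumerate(keys):
        rows_by_group[k].append(i)
    ranks = [None] * len(values)
    for rows in rows_by_group.values():
        # The sort is stable, so ties keep their original order ('first' relies on this).
        order = sorted(rows, key=lambda i: values[i])
        dense = 0
        start = 0
        while start < len(order):
            end = start  # find the last position of the current run of ties
            while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
                end += 1
            dense += 1
            for pos in range(start, end + 1):
                if method == "average":
                    r = (start + end) / 2 + 1
                elif method == "min":
                    r = start + 1
                elif method == "max":
                    r = end + 1
                elif method == "first":
                    r = pos + 1
                elif method == "dense":
                    r = dense
                else:
                    raise ValueError(f"unknown method: {method}")
                ranks[order[pos]] = r
            start = end + 1
    return ranks

keys = ["a", "a", "a", "b", "b"]
values = [10, 20, 10, 5, 7]
avg_ranks = group_rank(keys, values)
dense_ranks = group_rank(keys, values, method="dense")
first_ranks = group_rank(keys, values, method="first")
```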

classmethod register_functions(functable)

Registration should follow the NUMBA_REVERSE_TABLE layout at the bottom of rt_groupbynumba.py. If we register again, the last to register will be executed.

NUMBA_REVERSE_TABLE[i + GB_FUNC_NUMBA] = {'name': k, 'packing': v[0], 'func_front': v[1], 'func_back': v[2], 'func_gb': v[3], 'func_dtype': v[4], 'return_full': v[5]}

abstract resample(rule, *args, **kwargs)

Provide resampling when using a TimeGrouper. Returns a new grouper with our resampler appended.

rolling_count(*args, window=3, **kwargs)

Rolling count for each group.

Parameters:

window (optional) – Window size. Defaults to 3.

Return type:

Dataset with the same number of rows as the original dataset.

rolling_diff(*args, window=1, **kwargs)

Rolling diff for each group.

Parameters:

window (optional) – Window size. Defaults to 1.

Return type:

Dataset with the same number of rows as the original dataset.
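
A per-group difference can be sketched in plain Python. The helper is hypothetical and assumes a positive window; None stands in for rows that have no earlier value in their group (riptable would use its own invalid value there).

```python
def rolling_diff(keys, values, window=1):
    """Hypothetical helper: value minus the value `window` rows earlier in the same group."""
    history = {}
    out = []
    for k, v in zip(keys, values):
        seen = history.setdefault(k, [])
        # Only rows with at least `window` predecessors in the group get a difference.
        out.append(v - seen[-window] if len(seen) >= window else None)
        seen.append(v)
    return out

keys = ["a", "b", "a", "b", "a"]
values = [1, 10, 4, 15, 9]
diffs = rolling_diff(keys, values)
```

The output has one entry per input row, in the original row order.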

rolling_mean(*args, window=3, **kwargs)

Rolling mean for each group.

Parameters:

window (optional) – Window size. Defaults to 3.

Return type:

Dataset with the same number of rows as the original dataset.
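
A trailing window evaluated separately for each group can be sketched as below. The helper is hypothetical, and how partial windows at the start of a group are handled is an assumption (here they average whatever values are available).

```python
from collections import defaultdict, deque

def rolling_mean(keys, values, window=3):
    """Hypothetical helper: trailing mean over the last `window` values of each group."""
    recent = defaultdict(lambda: deque(maxlen=window))
    out = []
    for k, v in zip(keys, values):
        buf = recent[k]
        buf.append(v)  # deque(maxlen=window) drops the oldest value automatically
        # Assumption: a partial window averages only the values seen so far.
        out.append(sum(buf) / len(buf))
    return out

keys = ["a", "a", "b", "a", "a", "b"]
values = [1.0, 3.0, 10.0, 5.0, 7.0, 20.0]
means = rolling_mean(keys, values)
```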

rolling_median(*args, window=3, **kwargs)

Rolling nan median for each group.

Parameters:

window (optional) – Window size. Defaults to 3.

Return type:

Dataset with the same number of rows as the original dataset.

rolling_nanmean(*args, window=3, **kwargs)

Rolling nan mean for each group.

Parameters:

window (optional) – Window size. Defaults to 3.

Return type:

Dataset with the same number of rows as the original dataset.

rolling_nansum(*args, window=3, **kwargs)

Rolling nan sum for each group.

Parameters:

window (optional) – Window size. Defaults to 3.

Return type:

Dataset with the same number of rows as the original dataset.

rolling_quantile(*args, q, window=3, **kwargs)

Rolling nan quantile for each group.

Parameters:
  • q (float) – Quantile to compute.

  • window (optional) – Window size. Defaults to 3.

Return type:

Dataset with the same number of rows as the original dataset.

rolling_shift(*args, window=1, **kwargs)

Rolling shift for each group.

Parameters:

window (optional) – Window size. Defaults to 1. Windows can be negative.

Return type:

Dataset with the same number of rows as the original dataset.
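
The effect of positive and negative windows can be sketched in plain Python. The helper is hypothetical; None marks rows with no source value inside their group, where riptable would use its own invalid value.

```python
from collections import defaultdict

def rolling_shift(keys, values, window=1):
    """Hypothetical helper: shift values by `window` positions within each group."""
    rows_by_group = defaultdict(list)
    for i, k in enumerate(keys):
        rows_by_group[k].append(i)
    out = [None] * len(values)
    for rows in rows_by_group.values():
        for pos, row in enumerate(rows):
            src = pos - window  # positive window looks back, negative looks ahead
            if 0 <= src < len(rows):
                out[row] = values[rows[src]]
    return out

keys = ["a", "b", "a", "b", "a"]
values = [1, 10, 2, 20, 3]
lagged = rolling_shift(keys, values, window=1)
led = rolling_shift(keys, values, window=-1)
```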

rolling_sum(*args, window=3, **kwargs)

Rolling sum for each group.

Parameters:

window (optional) – Window size. Defaults to 3.

Return type:

Dataset with the same number of rows as the original dataset.

abstract sem(**kwargs)

Compute standard error of the mean of groups. For multiple groupings, the result index will be a MultiIndex.

Parameters:

ddof (integer, default 1) – degrees of freedom
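
The quantity itself can be written out with the standard library: the sample standard deviation (ddof = 1) divided by the square root of the group size. The helper below is a hypothetical sketch of that definition, not an implementation of this abstract method.

```python
import math
import statistics
from collections import defaultdict

def group_sem(keys, values):
    """Hypothetical helper: standard error of the mean per group, with ddof = 1."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    # statistics.stdev is the sample standard deviation (divides by n - 1),
    # so every group needs at least two values.
    return {k: statistics.stdev(v) / math.sqrt(len(v)) for k, v in groups.items()}

sem = group_sem(["a", "a", "a", "a", "b", "b"], [2.0, 4.0, 4.0, 6.0, 1.0, 3.0])
```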

shift(window=1, **kwargs)

Shift each group by periods observations.

Parameters:
  • window (integer, default 1) – Number of periods to shift.

  • periods – Optional support, same as window.

std(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute standard deviation of groups

For multiple groupings, the result will be a MultiSet

Parameters:
  • ddof (integer, default 1) – degrees of freedom

  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

sum(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute sum of group

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.
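
The combined effect of filter and showfilter can be sketched in plain Python. The helper and the "Filtered" label are hypothetical; the sketch only illustrates the extra row that collects the filtered-out elements.

```python
from collections import defaultdict

FILTERED = "Filtered"  # hypothetical label for the extra row

def group_sum(keys, values, filter=None, showfilter=False):
    """Hypothetical helper: per-group sum with an optional row for filtered-out elements."""
    totals = defaultdict(float)
    for i, (k, v) in enumerate(zip(keys, values)):
        included = filter is None or filter[i]
        if included:
            totals[k] += v
        elif showfilter:
            totals[FILTERED] += v  # everything that was filtered out lands here
    return dict(totals)

keys = ["a", "b", "a", "b"]
values = [1.0, 2.0, 3.0, 4.0]
mask = [True, True, False, True]
plain = group_sum(keys, values, filter=mask)
with_row = group_sum(keys, values, filter=mask, showfilter=True)
```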

abstract tail(n=5, **kwargs)

Returns the last n rows of each group. Essentially equivalent to .apply(lambda x: x.tail(n)), except that it ignores the as_index flag.

Examples

>>> df = pd.DataFrame([['a', 1], ['a', 2], ['b', 1], ['b', 2]], columns=['A', 'B'])
>>> df.groupby('A').tail(1)
   A  B
1  a  2
3  b  2
>>> df.groupby('A').head(1)
   A  B
0  a  1
2  b  1
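
The row selection behind tail can be sketched without pandas. The helper is hypothetical and returns row indices in their original order, which matches the index values 1 and 3 in the pandas output above.

```python
from collections import defaultdict

def group_tail(keys, n=5):
    """Hypothetical helper: indices of the last n rows of each group, in original order."""
    rows_by_group = defaultdict(list)
    for i, k in enumerate(keys):
        rows_by_group[k].append(i)
    keep = set()
    for rows in rows_by_group.values():
        # Guard n <= 0, because rows[-0:] would select every row.
        keep.update(rows[-n:] if n > 0 else [])
    return sorted(keep)

rows = group_tail(["a", "a", "b", "b"], n=1)
```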

trimbr(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute trimmed mean br of groups (auto handles nan)

Parameters:
  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.

var(*args, filter=None, transform=False, showfilter=False, col_idx=None, dataset=None, return_all=False, computable=True, accum2=False, func_param=0, **kwargs)

Compute variance of groups

For multiple groupings, the result will be a MultiSet

Parameters:
  • ddof (integer, default 1) – degrees of freedom

  • *args – Elements to apply the GroupBy Operation to. Typically a FastArray or Dataset.

  • filter (array of bool, optional) – Elements to include in the GroupBy Operation.

  • transform (bool) – If transform = True, the output will have the same shape as args. If transform = False, the output will typically have the same shape as the categorical.

  • showfilter (bool) – If showfilter is True, there will be an extra row in the output representing the GroupBy Operation applied to all those elements that were filtered out.

  • col_idx (str, list of str, optional) – If the input is a Dataset, col_idx specifies which columns to keep.

  • dataset (Dataset, optional) – If a dataset is specified, the GroupBy Operation will also be applied to the dataset. If there is an args argument and dataset is specified then the result will be appended to the dataset.

  • return_all (bool) – If return_all is True, will return all columns, even those where the GroupBy Operation does not make sense. If return_all is False, it will not return columns it cannot apply the GroupBy to. Does not work with Accum2.

  • computable (bool) – If computable is True, will not try to apply the GroupBy Operation to non-computable datatypes.

  • accum2 (bool) – Not recommended for use. If accum2 is True, the result is returned as a dictionary.

  • func_param – Not recommended for use.
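
The role of ddof can be sketched in plain Python: the sum of squared deviations in each group is divided by n - ddof. The helper is hypothetical, and the standard deviation is the square root of each result.

```python
from collections import defaultdict

def group_var(keys, values, ddof=1):
    """Hypothetical helper: per-group variance with `ddof` degrees of freedom removed."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    out = {}
    for k, vals in groups.items():
        mean = sum(vals) / len(vals)
        squared_dev = sum((v - mean) ** 2 for v in vals)
        # Each group needs more than `ddof` values, otherwise the divisor is zero.
        out[k] = squared_dev / (len(vals) - ddof)
    return out

keys = ["a", "a", "a", "b", "b"]
values = [1.0, 2.0, 3.0, 4.0, 8.0]
sample_var = group_var(keys, values)              # ddof=1, sample variance
population_var = group_var(keys, values, ddof=0)  # ddof=0, population variance
```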