riptable.rt_groupby
Classes
- class riptable.rt_groupby.GroupBy(dataset, keys=None, filter=None, ordered=None, sort_display=None, return_all=False, hint_size=0, lex=None, rec=False, totals=False, copy=False, cutoffs=None, verbose=False, **kwargs)
Bases:
riptable.rt_groupbyops.GroupByOps
- Parameters:
dataset (Dataset) – The dataset object
keys (list) – List of column names to group by.
filter (array, default None) – Boolean mask array applied as a filter before grouping.
return_all (bool) – Defaults to False. When True, returns all the dataset columns for every operation.
hint_size (int, optional) – Hint size for the hash.
sort_display (bool) – Defaults to True.
lex (bool) – Defaults to False. When True, uses a lexsort to find the groups; otherwise uses a hash.
totals (bool) –
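The difference between hash-based and sort-based group discovery mentioned for lex can be illustrated with a minimal pure-Python sketch. This is not riptable's implementation; the function names groups_by_hash and groups_by_sort are hypothetical, and the ordering of group ids shown here is an assumption of the sketch.

```python
def groups_by_hash(keys):
    """Assign group ids with a dict (hash table); ids follow first appearance."""
    ids = {}
    return [ids.setdefault(k, len(ids)) for k in keys]


def groups_by_sort(keys):
    """Assign group ids by sorting row indices by key; ids follow sorted key order."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # stable sort
    group_ids = [0] * len(keys)
    current = -1
    previous = None
    for position, i in enumerate(order):
        # A new group starts whenever the sorted key changes.
        if position == 0 or keys[i] != previous:
            current += 1
            previous = keys[i]
        group_ids[i] = current
    return group_ids


keycol = ["b", "a", "b", "c", "a"]
hash_ids = groups_by_hash(keycol)
sort_ids = groups_by_sort(keycol)
print(hash_ids)  # [0, 1, 0, 2, 1]
print(sort_ids)  # [1, 0, 1, 2, 0]
```

Both approaches partition the rows into the same groups; only the numbering of the groups differs.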
- property gb_keychain
- property gbkeys
dictionary of numpy arrays binned from
- property ifirstkey
- property ilastkey
- property isortrows
sorted index or None
- property transform
Sets a flag so that the next reduce function called after transform repopulates the original array with the reduced value, rather than returning one row per group.
Example
>>> ds.groupby(['side', 'venue']).transform.sum()
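As a rough illustration of what "repopulating the original array" means, the following pure-Python sketch broadcasts each group's sum back to every row of that group. It does not call riptable; transform_sum and the sample data are hypothetical names used only for this example.

```python
from collections import defaultdict


def transform_sum(keys, values):
    """Return a list as long as `values` in which every element is
    replaced by the sum of its group."""
    totals = defaultdict(float)
    for k, v in zip(keys, values):
        totals[k] += v
    # One output element per input row, not one per group.
    return [totals[k] for k in keys]


side = ["buy", "sell", "buy", "sell", "buy"]
qty = [10.0, 5.0, 20.0, 15.0, 30.0]
result = transform_sum(side, qty)
print(result)  # [60.0, 20.0, 60.0, 20.0, 60.0]
```

The output has the same length as the input, which is the property that distinguishes a transform from a plain reduction.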
- DebugMode = False
- TestCatGb = True
- __getattr__(name)
__getattr__ is called when attribute access ('.') is used to select a single column.
Examples
>>> ds = Dataset({'col_'+str(i): np.random.rand(5) for i in range(5)})
>>> ds.keycol = FA(['a','a','b','c','a'])
>>> ds.gb('keycol').col_4.mean()
*keycol   col_4
-------   -----
a          0.73
b          0.03
c          0.76
- __getitem__(fld)
- __iter__()
Generates tuples of key, value pairs. Keys are key values for single key, or tuples of key values for multikey. Values are datasets containing all rows from data in group for that key.
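The shape of the pairs described above can be sketched in pure Python. This is an illustration under stated assumptions, not riptable's code: iter_groups is a hypothetical helper, plain dicts of lists stand in for datasets, and groups are yielded in first-appearance order, which may differ from the order riptable uses.

```python
def iter_groups(columns, key_names):
    """Yield (key, group) pairs, where group holds every row for that key.

    The key is a scalar for a single key column and a tuple for several.
    """
    n_rows = len(columns[key_names[0]])
    row_indices = {}
    for i in range(n_rows):
        key = tuple(columns[name][i] for name in key_names)
        if len(key_names) == 1:
            key = key[0]
        row_indices.setdefault(key, []).append(i)
    for key, idx in row_indices.items():
        yield key, {name: [col[i] for i in idx] for name, col in columns.items()}


trades = {
    "side": ["B", "S", "B"],
    "venue": ["X", "X", "Y"],
    "qty": [1, 2, 3],
}
single_key = list(iter_groups(trades, ["side"]))
multi_key = dict(iter_groups(trades, ["side", "venue"]))
for key, group in single_key:
    print(key, group["qty"])
```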
- __repr__()
Return repr(self).
- __str__()
Return str(self).
- _build_string()
- _calculate_all(funcNum, *args, func_param=0, **kwargs)
Generate a GroupByKeys object if necessary and ask for the result of a calculation from the grouping object.
- Returns:
A grouped dataset with the result of the calculation.
- _getitem(fld)
Called by __getitem__ and __getattr__. Uses the field to index into the stored dataset. Often used to limit the data the groupby operation is being performed on. Returns a shallow copy of the groupby object.
This routine gets hit during the following common code pattern:
>>> ds = Dataset({'col_'+str(i): np.random.rand(5) for i in range(5)})
>>> ds.keycol = FA(['a','a','b','c','a'])
>>> ds.gb('keycol')[['col_1', 'col_2']].sum()
*keycol   col_1   col_2
-------   -----   -----
a          1.92    0.89
b          0.70    0.46
c          0.07    0.42
>>> ds.gb('keycol').col_4.mean()
*keycol   col_4
-------   -----
a          0.73
b          0.03
c          0.76
- _grouping_data_as_dict(ds)
- _pop_gb_data(calledfrom, userfunc, *args, **kwargs)
GroupBy holds on to its dataset. There may be no additional data provided.
- add_totals(gb_ds)
- as_categorical()
Returns a categorical using the same binning information as the GroupBy object (no additional hash required). The new categorical will not share a grouping object with this GroupBy object, but will share a reference to the iKey. Categorical operation results will be sorted or unsorted depending on whether 'gb' or 'gbu' created this object.
- backfill(limit=0, fill_val=None, inplace=False)
Backward fill the values
- Parameters:
limit (int, optional) – Limit on how many values to fill.
See also
fill_forward, fill_backward, fill_invalid
- copy(deep=True)
Called from getitem when user follows gb with []
- count(**kwargs)
Compute count of group
- abstract expanding(**kwargs)
- fill_backward(limit=0, fill_val=None, inplace=False)
Replace NaN and invalid array values by propagating the next encountered valid group value backward.
- Parameters:
limit (int, default 0) – The maximum number of consecutive NaN or invalid values to fill. If there is a gap with more than this number of consecutive NaN or invalid values, the gap will be only partially filled. If no limit is specified, all consecutive NaN and invalid values are replaced.
fill_val (scalar, default None) – The value to use where there is no valid group value to propagate backward. If fill_val is not specified, NaN and invalid values aren’t replaced where there is no valid group value to propagate backward.
**kwargs – Additional keyword arguments.
- Returns:
The returned Dataset contains the input Dataset object’s numerical columns.
- Return type:
Dataset
See also
GroupBy.fill_forward
Replace NaN and invalid array values with the last valid group value.
Categorical.fill_backward
Replace NaN and invalid array values with the next valid group value.
riptable.fill_backward
Replace NaN and invalid values with the next valid value.
Dataset.fillna
Replace NaN and invalid values with a specified value or nearby data.
FastArray.fillna
Replace NaN and invalid values with a specified value or nearby data.
FastArray.replacena
Replace NaN and invalid values with a specified value.
Examples
>>> ds = rt.Dataset({'Key_col' : ['A', 'B', 'A', 'B', 'A', 'B'],
...                  'Vals' : [rt.nan, rt.nan, 2, 3, 4, 5]})
>>> ds.gb('Key_col').fill_backward()
#   Vals
-   ----
0   2.00
1   3.00
2   2.00
3   3.00
4   4.00
5   5.00
Use a fill_val to replace values where there’s no valid group value to propagate backward:
>>> ds.Vals = rt.FastArray([0, 1, 2, 3, rt.nan, rt.nan])
>>> ds.gb('Key_col').fill_backward(fill_val = 0)
#   Vals
-   ----
0   0.00
1   1.00
2   2.00
3   3.00
4   0.00
5   0.00
Replace only the first NaN or invalid value in any consecutive series of NaN or invalid values in a group:
>>> ds.Vals = rt.FastArray([rt.nan, rt.nan, rt.nan, rt.nan, 4, 5])
>>> ds.gb('Key_col').fill_backward(limit = 1)
#   Vals
-   ----
0    nan
1    nan
2   4.00
3   5.00
4   4.00
5   5.00
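To make the interaction of limit and fill_val concrete, here is a minimal pure-Python sketch of a group-wise backward fill. It is not riptable's implementation: fill_backward_by_group is a hypothetical name, only NaN (not other invalid sentinels) is treated as missing, and leaving values beyond the limit as NaN even when fill_val is given is an assumption of the sketch. The three calls mirror the three documented examples for fill_backward.

```python
import math


def fill_backward_by_group(keys, values, limit=0, fill_val=None):
    """Group-wise backward fill over plain lists (illustrative only)."""
    out = list(values)
    next_valid = {}  # key -> nearest valid value found later in the group
    gap = {}         # key -> NaNs seen since that valid value
    for i in range(len(out) - 1, -1, -1):  # scan from the last row to the first
        k = keys[i]
        if not math.isnan(out[i]):
            next_valid[k] = out[i]
            gap[k] = 0
        elif k in next_valid:
            gap[k] += 1
            if limit == 0 or gap[k] <= limit:  # limit=0 means "no limit"
                out[i] = next_valid[k]
        elif fill_val is not None:
            out[i] = fill_val  # nothing valid to propagate backward
    return out


nan = float("nan")
keys = ["A", "B", "A", "B", "A", "B"]
filled = fill_backward_by_group(keys, [nan, nan, 2, 3, 4, 5])
with_default = fill_backward_by_group(keys, [0, 1, 2, 3, nan, nan], fill_val=0)
limited = fill_backward_by_group(keys, [nan, nan, nan, nan, 4, 5], limit=1)
print(filled)
print(with_default)
print(limited)
```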
- fill_forward(limit=0, fill_val=None, inplace=False)
Replace NaN and invalid array values by propagating the last encountered valid group value forward.
- Parameters:
limit (int, default 0) – The maximum number of consecutive NaN or invalid values to fill. If there is a gap with more than this number of consecutive NaN or invalid values, the gap will be only partially filled. If no limit is specified, all consecutive NaN and invalid values are replaced.
fill_val (scalar, default None) – The value to use where there is no valid group value to propagate forward. If fill_val is not specified, NaN and invalid values aren’t replaced where there is no valid group value to propagate forward.
inplace (bool, default False) – If False, return a copy of the array. If True, modify the original data. This will modify any other views on this object. This fails if the array is locked.
- Returns:
The returned Dataset contains the input Dataset object’s numerical columns.
- Return type:
Dataset
See also
GroupBy.fill_backward
Replace NaN and invalid array values with the next valid group value.
Categorical.fill_forward
Replace NaN and invalid array values with the last valid group value.
riptable.fill_forward
Replace NaN and invalid values with the last valid value.
Dataset.fillna
Replace NaN and invalid values with a specified value or nearby data.
FastArray.fillna
Replace NaN and invalid values with a specified value or nearby data.
FastArray.replacena
Replace NaN and invalid values with a specified value.
Examples
>>> ds = rt.Dataset({'Key_col' : ['A', 'B', 'A', 'B', 'A', 'B'],
...                  'Vals' : [0, 1, 2, 3, rt.nan, rt.nan]})
>>> ds.gb('Key_col').fill_forward()
#   Vals
-   ----
0   0.00
1   1.00
2   2.00
3   3.00
4   2.00
5   3.00
Use a fill_val to replace values where there’s no valid group value to propagate forward:
>>> ds.Vals = rt.FastArray([rt.nan, rt.nan, 2, 3, 4, 5])
>>> ds.gb('Key_col').fill_forward(fill_val = 0)
#   Vals
-   ----
0   0.00
1   0.00
2   2.00
3   3.00
4   4.00
5   5.00
Replace only the first NaN or invalid value in any consecutive series of NaN or invalid values in a group:
>>> ds.Vals = rt.FastArray([0, 1, rt.nan, rt.nan, rt.nan, rt.nan])
>>> ds.gb('Key_col').fill_forward(limit = 1)
#   Vals
-   ----
0   0.00
1   1.00
2   0.00
3   1.00
4    nan
5    nan
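The forward direction can be sketched the same way in pure Python. Again, this is only an illustration of the documented semantics, not riptable's implementation: fill_forward_by_group is a hypothetical name, only NaN is treated as missing, and the behaviour when both limit and fill_val apply to the same value is an assumption. The three calls mirror the three documented examples for fill_forward.

```python
import math


def fill_forward_by_group(keys, values, limit=0, fill_val=None):
    """Group-wise forward fill over plain lists (illustrative only)."""
    out = list(values)
    last_valid = {}  # key -> most recent valid value seen in the group
    gap = {}         # key -> NaNs seen since that valid value
    for i, k in enumerate(keys):  # scan from the first row to the last
        if not math.isnan(out[i]):
            last_valid[k] = out[i]
            gap[k] = 0
        elif k in last_valid:
            gap[k] += 1
            if limit == 0 or gap[k] <= limit:  # limit=0 means "no limit"
                out[i] = last_valid[k]
        elif fill_val is not None:
            out[i] = fill_val  # nothing valid to propagate forward
    return out


nan = float("nan")
keys = ["A", "B", "A", "B", "A", "B"]
filled = fill_forward_by_group(keys, [0, 1, 2, 3, nan, nan])
with_default = fill_forward_by_group(keys, [nan, nan, 2, 3, 4, 5], fill_val=0)
limited = fill_forward_by_group(keys, [0, 1, nan, nan, nan, nan], limit=1)
print(filled)
print(with_default)
print(limited)
```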
- get_group(category, **kwargs)
Get the named group as a Dataset.
- Parameters:
category (string or tuple) – A value from the column used to construct the GroupBy, or, if multiple columns were used, a tuple containing one value from each of those columns.
- Return type:
Example
>>> ds.groupby('symbol').get_group('AAPL')
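The single-key and multi-key forms of category can be illustrated with a small pure-Python sketch. It does not use riptable: the get_group function below is a hypothetical stand-in operating on a dict of lists, and the sample data is invented for the example.

```python
def get_group(columns, key_names, category):
    """Return the rows whose key values equal `category` (illustrative only)."""
    # A scalar category is treated as a one-element tuple.
    wanted = category if isinstance(category, tuple) else (category,)
    n_rows = len(columns[key_names[0]])
    keep = [
        i for i in range(n_rows)
        if tuple(columns[name][i] for name in key_names) == wanted
    ]
    return {name: [col[i] for i in keep] for name, col in columns.items()}


trades = {
    "symbol": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "side": ["buy", "buy", "sell", "sell"],
    "qty": [100, 200, 300, 400],
}
aapl = get_group(trades, ["symbol"], "AAPL")
aapl_sell = get_group(trades, ["symbol", "side"], ("AAPL", "sell"))
print(aapl["qty"])       # [100, 300]
print(aapl_sell["qty"])  # [300]
```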
- nth(n=1)
Select the nth row from each group.
- Parameters:
n (int) – The position of the row to select within each group. As the examples show, 0 selects the first row and -1 selects the last.
Examples
>>> ds = rt.Dataset({'A': [1, 1, 2, 1, 2],
...                  'B': [np.nan, 2, 3, 4, 5]})
>>> g = ds.groupby('A')
>>> g.nth(0)
*A      B
--   ----
 1    nan
 2   3.00

[2 rows x 2 columns] total bytes: 32.0 B
>>> g.nth(1)
*A      B
--   ----
 1   2.00
 2   5.00

[2 rows x 2 columns] total bytes: 32.0 B
>>> g.nth(-1)
*A      B
--   ----
 1   4.00
 2   5.00

[2 rows x 2 columns] total bytes: 32.0 B
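The selection shown in these examples can be reproduced with a short pure-Python sketch. This is not riptable's implementation: nth_per_group is a hypothetical helper, and omitting groups that have too few rows is an assumption of the sketch, since the examples above do not cover that case.

```python
import math


def nth_per_group(keys, values, n):
    """Pick the row at position `n` in each group (illustrative only).

    Negative `n` counts from the end of the group, as with list indexing.
    """
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    result = {}
    for k, vals in groups.items():
        if -len(vals) <= n < len(vals):  # skip groups that are too short
            result[k] = vals[n]
    return result


a = [1, 1, 2, 1, 2]
b = [float("nan"), 2.0, 3.0, 4.0, 5.0]
first = nth_per_group(a, b, 0)
second = nth_per_group(a, b, 1)
last = nth_per_group(a, b, -1)
print(first, second, last)
```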
- pad(limit=0, fill_val=None, inplace=False)
Forward fill the values
- Parameters:
limit (int, optional) – Limit on how many values to fill.
See also
fill_forward, fill_backward, fill_invalid
- abstract stack(**kwargs)
- abstract unstack(**kwargs)