Riptable Categoricals – Accessing Parts of the Categorical
Use Categorical methods and properties to access the stored data.
Get the array of Categorical values with expand_array.
Note that because the expansion constructs the complete list of values from the list of
unique categories, it is an expensive operation:
>>> import riptable as rt
>>> c = rt.Categorical(["b", "a", "b", "c", "a", "c", "b"])
>>> c
Categorical([b, a, b, c, a, c, b]) Length: 7
FastArray([2, 1, 2, 3, 1, 3, 2], dtype=int8) Base Index: 1
FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3
>>> c.expand_array
FastArray([b'b', b'a', b'b', b'c', b'a', b'c', b'b'], dtype='|S8')
>>> c2 = rt.Categorical([10, 0, 0, 5, 5, 10, 0, 0, 5, 0])
>>> c2
Categorical([10, 0, 0, 5, 5, 10, 0, 0, 5, 0]) Length: 10
FastArray([3, 1, 1, 2, 2, 3, 1, 1, 2, 1], dtype=int8) Base Index: 1
FastArray([ 0, 5, 10]) Unique count: 3
>>> c2.expand_array
FastArray([10, 0, 0, 5, 5, 10, 0, 0, 5, 0])
Note that in the following base-1 Categorical, created from an integer mapping array and a list of unique categories, 0 is mapped to Filtered, 1 is mapped to “b”, and 2 is mapped to “a”. The mapping array contains no 3, so “c” doesn’t appear in the expanded array.
>>> c3 = rt.Categorical([0, 1, 1, 2, 2, 0, 1, 1, 2, 1], categories=["b", "a", "c"])
>>> c3
Categorical([Filtered, b, b, a, a, Filtered, b, b, a, b]) Length: 10
FastArray([0, 1, 1, 2, 2, 0, 1, 1, 2, 1]) Base Index: 1
FastArray([b'b', b'a', b'c'], dtype='|S1') Unique count: 3
>>> c3.expand_array
FastArray([b'Filtered', b'b', b'b', b'a', b'a', b'Filtered', b'b', b'b', b'a', b'b'], dtype='|S8')
Get the integer mapping array with _fa:
>>> c._fa
FastArray([2, 1, 2, 3, 1, 3, 2], dtype=int8)
>>> c2._fa
FastArray([3, 1, 1, 2, 2, 3, 1, 1, 2, 1], dtype=int8)
>>> c3._fa
FastArray([0, 1, 1, 2, 2, 0, 1, 1, 2, 1])
Get the array of unique categories with category_array:
>>> c.category_array
FastArray([b'a', b'b', b'c'], dtype='|S1')
>>> c2.category_array
FastArray([ 0, 5, 10])
>>> c3.category_array
FastArray([b'b', b'a', b'c'], dtype='|S1')
Note that if you want to use _fa to index into category_array, you’ll need to subtract 1:
>>> c.category_array[c._fa[0]-1]
b'b'
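The subtract-1 rule can be applied to a whole mapping array, as long as code 0 (Filtered) is handled separately. The following is a minimal sketch of that lookup, not Riptable's implementation: plain Python lists stand in for c3.category_array and c3._fa so it runs without riptable, and the helper name expand is hypothetical.

```python
# Plain lists stand in for c3.category_array and c3._fa from the example above.
category_array = [b"b", b"a", b"c"]
codes = [0, 1, 1, 2, 2, 0, 1, 1, 2, 1]

FILTERED = b"Filtered"  # label shown for code 0 in a base-1 Categorical


def expand(codes, categories, filtered=FILTERED):
    """Map base-1 codes to category values; code 0 becomes the filtered label.

    Hypothetical helper for illustration only.
    """
    # Guard code 0 explicitly: categories[0 - 1] would silently
    # return the LAST category instead of marking the row as filtered.
    return [filtered if code == 0 else categories[code - 1] for code in codes]


expanded = expand(codes, category_array)
print(expanded)
```

The result has the same values as c3.expand_array above. The explicit check for 0 matters because a negative index in Python (and in NumPy-style arrays) wraps around rather than raising an error.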
For multi-key Categoricals, use category_dict to get a dictionary of the two category arrays:
>>> strs = rt.FastArray(["a", "b", "b", "a", "b", "a"])
>>> ints = rt.FastArray([2, 1, 1, 2, 1, 3])
>>> c = rt.Categorical([strs, ints])
>>> c
Categorical([(a, 2), (b, 1), (b, 1), (a, 2), (b, 1), (a, 3)]) Length: 6
FastArray([1, 2, 2, 1, 2, 3], dtype=int8) Base Index: 1
{'key_0': FastArray([b'a', b'b', b'a'], dtype='|S1'), 'key_1': FastArray([2, 1, 3])} Unique count: 3
>>> c.category_dict
{'key_0': FastArray([b'a', b'b', b'a'], dtype='|S1'),
'key_1': FastArray([2, 1, 3])}
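To see how the arrays in category_dict line up, note that position i in each array together forms the i-th unique key, and the base-1 mapping array points at those positions. The sketch below illustrates this with plain Python lists in place of FastArrays. It is an assumption-labeled illustration rather than Riptable code, and the helper names unique_keys and expand_multikey are hypothetical.

```python
# Plain-Python stand-ins for c.category_dict and c._fa from the example above.
category_dict = {"key_0": [b"a", b"b", b"a"], "key_1": [2, 1, 3]}
codes = [1, 2, 2, 1, 2, 3]


def unique_keys(cat_dict):
    """Zip the per-key arrays so element i is the i-th unique key tuple."""
    return list(zip(*cat_dict.values()))


def expand_multikey(codes, cat_dict):
    """Look up each base-1 code in the zipped unique keys (hypothetical helper)."""
    keys = unique_keys(cat_dict)
    return [keys[code - 1] for code in codes]


uniques = unique_keys(category_dict)
rows = expand_multikey(codes, category_dict)
print(uniques)
print(rows)
```

The rows reproduce the (key_0, key_1) pairs shown in the Categorical's display above. This sketch assumes no filtered (0) codes are present.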
Use category_mapping to get the mapping dictionary from a Categorical created with an IntEnum or mapping dictionary:
>>> d = {"StronglyAgree": 44, "Agree": 133, "Disagree": 75, "StronglyDisagree": 1, "NeitherAgreeNorDisagree": 144 }
>>> codes = [1, 44, 44, 133, 75]
>>> c = rt.Categorical(codes, categories=d)
>>> c
Categorical([StronglyDisagree, StronglyAgree, StronglyAgree, Agree, Disagree]) Length: 5
FastArray([ 1, 44, 44, 133, 75]) Base Index: None
{44:'StronglyAgree', 133:'Agree', 75:'Disagree', 1:'StronglyDisagree', 144:'NeitherAgreeNorDisagree'} Unique count: 4
>>> c.category_mapping
{44: 'StronglyAgree',
133: 'Agree',
75: 'Disagree',
1: 'StronglyDisagree',
144: 'NeitherAgreeNorDisagree'}
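The mapping returned here is keyed by integer code, the inverse of the name-to-code dictionary passed in. Because a mapping-based Categorical has no base index, the stored codes are looked up directly with no offset. A minimal plain-Python sketch of that relationship, using an ordinary dict and list in place of Riptable objects (the variable names code_to_name, labels, and unused are illustrative only):

```python
# The same name -> code dictionary and codes used in the example above.
d = {
    "StronglyAgree": 44,
    "Agree": 133,
    "Disagree": 75,
    "StronglyDisagree": 1,
    "NeitherAgreeNorDisagree": 144,
}
codes = [1, 44, 44, 133, 75]

# Invert the dictionary to get code -> name, the shape category_mapping returns.
code_to_name = {code: name for name, code in d.items()}

# Codes are used as-is: no "subtract 1" step, since there is no base index.
labels = [code_to_name[code] for code in codes]

# Codes defined in the mapping that never occur in the data.
unused = sorted(set(code_to_name) - set(codes))

print(labels)
print(unused)
```

The labels match the values displayed for the Categorical above, and the only mapped code that does not occur in the data is 144.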