Riptable Categoricals – Accessing Parts of the Categorical

Use Categorical methods and properties to access the stored data.

Get the array of Categorical values with expand_array. Because the expansion builds a full-length array by looking up every stored value in the list of unique categories, it can be expensive in time and memory for large Categoricals:

>>> c = rt.Categorical(["b", "a", "b", "c", "a", "c", "b"])
>>> c
Categorical([b, a, b, c, a, c, b]) Length: 7
  FastArray([2, 1, 2, 3, 1, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

>>> c.expand_array
FastArray([b'b', b'a', b'b', b'c', b'a', b'c', b'b'], dtype='|S8')
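To show where that cost comes from, here is a plain-Python sketch of the lookup an expansion performs. It is not riptable's implementation. It assumes a base-1 Categorical whose integer codes are in a list `codes` and whose unique categories are in a list `categories`; these names and the helper `expand_base1` are hypothetical.

```python
# Plain-Python sketch of base-1 expansion (hypothetical helper, not riptable's code).
codes = [2, 1, 2, 3, 1, 3, 2]   # integer mapping array, like c._fa
categories = ["a", "b", "c"]    # unique categories, like c.category_array

def expand_base1(codes, categories):
    """Return one value per code; code k refers to categories[k - 1].

    This sketch ignores code 0 (Filtered); a 0 here would wrongly wrap
    around to the last category.
    """
    return [categories[k - 1] for k in codes]

expanded = expand_base1(codes, categories)
print(expanded)  # ['b', 'a', 'b', 'c', 'a', 'c', 'b']
```

The output has one element per stored value rather than one per unique category, so both the work and the memory grow with the length of the Categorical.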


>>> c2 = rt.Categorical([10, 0, 0, 5, 5, 10, 0, 0, 5, 0])
>>> c2
Categorical([10, 0, 0, 5, 5, 10, 0, 0, 5, 0]) Length: 10
  FastArray([3, 1, 1, 2, 2, 3, 1, 1, 2, 1], dtype=int8) Base Index: 1
  FastArray([ 0,  5, 10]) Unique count: 3

>>> c2.expand_array
FastArray([10,  0,  0,  5,  5, 10,  0,  0,  5,  0])

In the following base-1 Categorical, created from an integer mapping array and a list of unique categories, 0 is mapped to Filtered, 1 is mapped to “b”, and 2 is mapped to “a”. Because the mapping array contains no 3, “c” never appears in the expanded array, although it is still listed among the categories.

>>> c3 = rt.Categorical([0, 1, 1, 2, 2, 0, 1, 1, 2, 1], categories=["b", "a", "c"])
>>> c3
Categorical([Filtered, b, b, a, a, Filtered, b, b, a, b]) Length: 10
  FastArray([0, 1, 1, 2, 2, 0, 1, 1, 2, 1]) Base Index: 1
  FastArray([b'b', b'a', b'c'], dtype='|S1') Unique count: 3

>>> c3.expand_array
FastArray([b'Filtered', b'b', b'b', b'a', b'a', b'Filtered', b'b', b'b', b'a', b'b'], dtype='|S8')
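The role of code 0 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not riptable's code: the helper `expand_with_filter` and the `FILTERED` label are hypothetical, and the label simply mirrors how riptable displays filtered entries above.

```python
# Plain-Python sketch of base-1 expansion with a Filtered bucket (hypothetical helper).
FILTERED = "Filtered"  # display label assumed to match riptable's output above

def expand_with_filter(codes, categories, filtered=FILTERED):
    """Code 0 means Filtered; any other code k refers to categories[k - 1]."""
    return [filtered if k == 0 else categories[k - 1] for k in codes]

codes = [0, 1, 1, 2, 2, 0, 1, 1, 2, 1]  # like c3._fa
categories = ["b", "a", "c"]             # like c3.category_array

expanded = expand_with_filter(codes, categories)
print(expanded)
# ['Filtered', 'b', 'b', 'a', 'a', 'Filtered', 'b', 'b', 'a', 'b']
print("c" in expanded)  # False, because no element has code 3
```

The unused category “c” stays in `categories` but never reaches the output, matching the behavior described above.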

Get the integer mapping array with _fa:

>>> c._fa
FastArray([2, 1, 2, 3, 1, 3, 2], dtype=int8)

>>> c2._fa
FastArray([3, 1, 1, 2, 2, 3, 1, 1, 2, 1], dtype=int8)

>>> c3._fa
FastArray([0, 1, 1, 2, 2, 0, 1, 1, 2, 1])

Get the array of unique categories with category_array:

>>> c.category_array
FastArray([b'a', b'b', b'c'], dtype='|S1')

>>> c2.category_array
FastArray([ 0,  5, 10])

>>> c3.category_array
FastArray([b'b', b'a', b'c'], dtype='|S1')

Because these Categoricals use base index 1, with 0 reserved for Filtered, subtract 1 from a _fa value before using it to index into category_array:

>>> c.category_array[c._fa[0]-1]
b'b'

For multi-key Categoricals, use category_dict to get a dictionary that holds one category array per key:

>>> strs = rt.FastArray(["a", "b", "b", "a", "b", "a"])
>>> ints = rt.FastArray([2, 1, 1, 2, 1, 3])
>>> c = rt.Categorical([strs, ints])
>>> c
Categorical([(a, 2), (b, 1), (b, 1), (a, 2), (b, 1), (a, 3)]) Length: 6
  FastArray([1, 2, 2, 1, 2, 3], dtype=int8) Base Index: 1
  {'key_0': FastArray([b'a', b'b', b'a'], dtype='|S1'), 'key_1': FastArray([2, 1, 3])} Unique count: 3

>>> c.category_dict
{'key_0': FastArray([b'a', b'b', b'a'], dtype='|S1'),
'key_1': FastArray([2, 1, 3])}
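One way to read category_dict is as columns of a table of unique keys: row k - 1 across all key arrays is the key tuple for code k. The plain-Python sketch below rebuilds the labels shown above from that reading. It uses ordinary lists in place of FastArrays and is illustrative only, not riptable's implementation.

```python
# Plain-Python sketch: rebuild multi-key labels from a category dict (illustrative only).
category_dict = {
    "key_0": ["a", "b", "a"],  # like c.category_dict["key_0"]
    "key_1": [2, 1, 3],        # like c.category_dict["key_1"]
}
codes = [1, 2, 2, 1, 2, 3]     # base-1 integer mapping array, like c._fa

# Each unique category is one row across the key arrays.
unique_keys = list(zip(*category_dict.values()))
print(unique_keys)  # [('a', 2), ('b', 1), ('a', 3)]

# Code k selects row k - 1 of that table.
labels = [unique_keys[k - 1] for k in codes]
print(labels)  # [('a', 2), ('b', 1), ('b', 1), ('a', 2), ('b', 1), ('a', 3)]
```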

Use category_mapping to get the mapping dictionary from a Categorical created from an IntEnum or a mapping dictionary:

>>> d = {"StronglyAgree": 44, "Agree": 133, "Disagree": 75, "StronglyDisagree": 1, "NeitherAgreeNorDisagree": 144}
>>> codes = [1, 44, 44, 133, 75]
>>> c = rt.Categorical(codes, categories=d)
>>> c
Categorical([StronglyDisagree, StronglyAgree, StronglyAgree, Agree, Disagree]) Length: 5
  FastArray([  1,  44,  44, 133,  75]) Base Index: None
  {44:'StronglyAgree', 133:'Agree', 75:'Disagree', 1:'StronglyDisagree', 144:'NeitherAgreeNorDisagree'} Unique count: 4
>>> c.category_mapping
{44: 'StronglyAgree',
 133: 'Agree',
 75: 'Disagree',
 1: 'StronglyDisagree',
 144: 'NeitherAgreeNorDisagree'}
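Note that category_mapping is keyed by code, the reverse of the name-to-code dictionary passed in, and that the Categorical above reports Base Index: None, so stored values are looked up directly rather than offset by 1. The plain-Python sketch below mimics that lookup; it is illustrative only and does not call riptable.

```python
# Plain-Python sketch of a mapping-dictionary Categorical (illustrative only).
d = {"StronglyAgree": 44, "Agree": 133, "Disagree": 75,
     "StronglyDisagree": 1, "NeitherAgreeNorDisagree": 144}
codes = [1, 44, 44, 133, 75]

# Invert name -> code into code -> name, the orientation category_mapping displays.
mapping = {code: name for name, code in d.items()}

# With no base index, each stored value is used as a key directly (no "- 1" offset).
labels = [mapping[code] for code in codes]
print(labels)
# ['StronglyDisagree', 'StronglyAgree', 'StronglyAgree', 'Agree', 'Disagree']
```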