Riptable Categoricals – Indexing

Bracket indexing traverses the FastArray of indices/codes and returns the corresponding category.

When a Categorical is indexed with a single integer, the corresponding category is returned as a unicode string.

When multiple integers or a boolean array are used, a copy of the Categorical is returned that has the same categories as the original Categorical but with an index/code array limited to the selected elements. If you modify the returned subset, it won’t affect the original Categorical.

When a slice is used, the returned Categorical is a view, not a copy. If you modify the view, the original Categorical is also modified.

To assign a value to an element, the value must already be one of the Categorical's existing categories.

The following examples use this Categorical:

>>> c = rt.Categorical(["a", "a", "b", "a", "c", "c", "b"])
>>> c
Categorical([a, a, b, a, c, c, b]) Length: 7
  FastArray([1, 1, 2, 1, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3
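
The two arrays in the output above are enough to reproduce what single-integer indexing returns. The following is a minimal pure-Python sketch of that lookup, not Riptable's implementation; `codes`, `categories`, `BASE_INDEX`, and `lookup` are hypothetical names chosen for illustration.

```python
# Toy model of the Categorical shown above (not Riptable's implementation).
codes = [1, 1, 2, 1, 3, 3, 2]   # index/code array, as displayed in the repr
categories = ["a", "b", "c"]    # unique categories
BASE_INDEX = 1                  # codes start at 1 in this Categorical


def lookup(i):
    """Return the category for element i, mirroring c[i] for one integer."""
    return categories[codes[i] - BASE_INDEX]


print(lookup(0))   # a
print(lookup(2))   # b
print(lookup(-1))  # b  (negative indices count from the end)
```

Each element stores only a small integer code, and the string is recovered from the categories array on access.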

Single integer

Use bracket indexing to get a single value:

>>> c[0]
'a'

>>> c[1]
'a'

>>> c[2]
'b'

You can also index from the end of the array with negative indices:

>>> c[-1]
'b'

>>> c[-2]
'c'

Set a value:

>>> c[0] = "c"
>>> c
Categorical([c, a, b, a, c, c, b]) Length: 7
  FastArray([3, 1, 2, 1, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

The value must already be present in the existing categories array (adding categories with auto_add_categories isn’t working correctly at the time of this writing):

>>> try:
...     c[0] = "d"
... except ValueError as e:
...     print("ValueError:", e)
ValueError: Cannot automatically add categories [b'd'] while auto_add_categories is
set to False. Set flag to True in Categorical init.
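
One way to picture why the value must already exist: assignment stores a code, not a string, so there has to be a category for the code to point to. Here is a rough pure-Python sketch of that check. The names `set_value`, `codes`, and `categories` are hypothetical, and the sketch makes no claim to match Riptable's internals or its error message.

```python
# Toy model of assignment into a Categorical (not Riptable's implementation).
codes = [1, 1, 2, 1, 3, 3, 2]
categories = ["a", "b", "c"]
BASE_INDEX = 1


def set_value(i, value):
    """Store the code for `value` at position i; reject unknown categories."""
    if value not in categories:
        raise ValueError(f"{value!r} is not an existing category")
    codes[i] = categories.index(value) + BASE_INDEX


set_value(0, "c")
print(codes)  # [3, 1, 2, 1, 3, 3, 2]

try:
    set_value(0, "d")  # "d" has no code, so nothing is written
except ValueError as e:
    print("ValueError:", e)
```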

Multiple integers

>>> c
Categorical([c, a, b, a, c, c, b]) Length: 7
  FastArray([3, 1, 2, 1, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Pass a list of indices (a fancy index, which also specifies ordering). The returned Categorical is a copy of the original Categorical:

>>> c[[0, 2]]
Categorical([c, b]) Length: 2
  FastArray([3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

>>> c[[2, 0]]
Categorical([b, c]) Length: 2
  FastArray([2, 3], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

>>> c[[-1, 1]]
Categorical([b, a]) Length: 2
  FastArray([2, 1], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Or pass an array:

>>> c[rt.arange(1, 3)]  # Indices 1 and 2.
Categorical([a, b]) Length: 2
  FastArray([1, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Set values:

>>> c[[0, 2]] = "a"
>>> c
Categorical([a, a, a, a, c, c, b]) Length: 7
  FastArray([1, 1, 1, 1, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

>>> c[rt.arange(1, 3)] = "b"
>>> c
Categorical([a, b, b, a, c, c, b]) Length: 7
  FastArray([1, 2, 2, 1, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Boolean mask array

>>> c
Categorical([a, b, b, a, c, c, b]) Length: 7
  FastArray([1, 2, 2, 1, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Pass a boolean array with the same length as the Categorical. The returned Categorical is a copy that contains only the elements where the mask is True:

>>> mask = rt.FA([False, True, True, True, True, True, False])
>>> c[mask]
Categorical([a, b, a, c, c]) Length: 5
  FastArray([1, 2, 1, 3, 3], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Set values:

>>> c[mask] = "c"
>>> c
Categorical([a, c, c, c, c, c, b]) Length: 7
  FastArray([1, 3, 3, 3, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Slice

>>> c
Categorical([a, c, c, c, c, c, b]) Length: 7
  FastArray([1, 3, 3, 3, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

The returned Categorical is a view of the original Categorical. Any changes to the view also modify the original (see below):

>>> c[:3]  # Indices 0-2.
Categorical([a, c, c]) Length: 3
  FastArray([1, 3, 3], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

>>> c[1:6]  # Indices 1-5.
Categorical([c, c, c, c, c]) Length: 5
  FastArray([3, 3, 3, 3, 3], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Set values:

>>> c[1:6] = "a"
>>> c
Categorical([a, a, a, a, a, a, b]) Length: 7
  FastArray([1, 1, 1, 1, 1, 1, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

Because slicing returns a view rather than a copy, setting values in the returned subset also modifies the original Categorical:

>>> c2 = c[1:6]
>>> c2
Categorical([a, a, a, a, a]) Length: 5
  FastArray([1, 1, 1, 1, 1], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

>>> c2[1:5] = "c"  # Modify the returned view.
>>> c2
Categorical([a, c, c, c, c]) Length: 5
  FastArray([1, 3, 3, 3, 3], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3

>>> c  # The original is also modified.
Categorical([a, a, c, c, c, c, b]) Length: 7
  FastArray([1, 1, 3, 3, 3, 3, 2], dtype=int8) Base Index: 1
  FastArray([b'a', b'b', b'c'], dtype='|S1') Unique count: 3
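
The copy-versus-view behavior described in this section matches NumPy's indexing rules, and FastArray is a subclass of NumPy's ndarray. The sketch below uses a plain NumPy array as a stand-in for the index/code array. It demonstrates the NumPy behavior only and assumes, rather than shows, that Riptable relies on it internally.

```python
import numpy as np

# Stand-in for a Categorical's index/code array (plain NumPy, not Riptable).
codes = np.array([1, 1, 1, 1, 1, 1, 2], dtype=np.int8)

view = codes[1:6]            # basic slice: shares memory with `codes`
fancy = codes[[0, 2]]        # list of integers: independent copy
masked = codes[codes == 1]   # boolean mask: independent copy

print(np.shares_memory(codes, view))    # True
print(np.shares_memory(codes, fancy))   # False
print(np.shares_memory(codes, masked))  # False

view[1:5] = 3                # writing through the view changes `codes`
fancy[:] = 2                 # writing to the copy does not
print(codes)                 # [1 1 3 3 3 3 2]
```

If you need an independent subset from a slice, copying the result explicitly is the usual NumPy approach.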