A Useful Way to Instantiate a Categorical
It can sometimes be useful to instantiate a Categorical with only one category, then fill it in as needed.
For example, let’s say we have a Dataset with a column that has a lot of categories, and we want to create a new Categorical column that keeps two of those categories, properly aligned with the rest of the data in the Dataset, and lumps the other categories into a category called ‘Other.’
Our Dataset, with a column of many categories:
>>> rng = np.random.default_rng(seed=42)
>>> N = 50
>>> ds_buildcat = rt.Dataset({'big_cat': rng.choice(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], N)})
>>> ds_buildcat
# big_cat
--- -------
0 D
1 I
2 A
3 I
4 F
5 B
6 D
7 F
8 D
9 B
10 G
11 G
12 B
13 C
14 C
... ...
35 I
36 J
37 D
38 C
39 J
40 G
41 C
42 G
43 F
44 J
45 C
46 J
47 J
48 B
49 B
We create our ‘small’ Categorical instantiated with 3s, which fills the column with the ‘Other’ category:
>>> ds_buildcat.small_cat = rt.Cat(rt.full(ds_buildcat.shape[0], 3), categories=['B', 'D', 'Other'])
>>> ds_buildcat.small_cat
>>> ds_buildcat
# big_cat small_cat
--- ------- ---------
0 D Other
1 I Other
2 A Other
3 I Other
4 F Other
5 B Other
6 D Other
7 F Other
8 D Other
9 B Other
10 G Other
11 G Other
12 B Other
13 C Other
14 C Other
... ... ...
35 I Other
36 J Other
37 D Other
38 C Other
39 J Other
40 G Other
41 C Other
42 G Other
43 F Other
44 J Other
45 C Other
46 J Other
47 J Other
48 B Other
49 B Other
Now we can fill in the aligned ‘B’ and ‘D’ categories:
>>> ds_buildcat.small_cat[ds_buildcat.big_cat == 'B'] = 'B'
>>> ds_buildcat.small_cat[ds_buildcat.big_cat == 'D'] = 'D'
>>> ds_buildcat
# big_cat small_cat
--- ------- ---------
0 D D
1 I Other
2 A Other
3 I Other
4 F Other
5 B B
6 D D
7 F Other
8 D D
9 B B
10 G Other
11 G Other
12 B B
13 C Other
14 C Other
... ... ...
35 I Other
36 J Other
37 D D
38 C Other
39 J Other
40 G Other
41 C Other
42 G Other
43 F Other
44 J Other
45 C Other
46 J Other
47 J Other
48 B B
49 B B