Solutions to Riptable Exercises

This notebook contains the solutions to the Riptable Exercises.

Your solutions may be implemented slightly differently, but they should get the same essential results.

If you have any questions or comments, email RiptableDocumentation@sig.com.

[1]:
import riptable as rt
import numpy as np

Introduction to the Riptable Dataset

Datasets are the core class of riptable.

They are tables of data, consisting of a series of columns of the same length (sometimes referred to as fields).

Structurally, they behave like python dictionaries, and can be created directly from one.

We’ll familiarize ourselves with Datasets by constructing one manually, generating fake sample data using np.random.default_rng().choice(...) or similar.

In real life they will essentially always be generated from real-world data.

First, create a python dictionary with two fields of the same length (>1000); one column of stock prices and one of symbols.

Make sure the symbols have duplicates, for later aggregation exercises.

[2]:
rng = np.random.default_rng()
dset_length = 5_000
[3]:
my_dict = {'Price': rng.uniform(0, 1000, dset_length), 'Symbol': rng.choice(['GME', 'AMZN', 'TSLA', 'SPY'], dset_length)}

Create a riptable dataset from this, using rt.Dataset(my_dict).

[4]:
my_dset = rt.Dataset(my_dict)

You can easily append more columns to a dataset.

Add a new column of integer trade size, using my_dset.Size =.

[5]:
my_dset.Size = rng.integers(1, 1000, dset_length)

Columns can also be referred to with brackets around a string name. This is typically used when the column name comes from a variable.

Add a new column of booleans indicating whether each trade was yours, using my_dset['MyTrade'] =.

[6]:
my_dset['MyTrade'] = rng.choice([True, False], dset_length)
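
For example (a small illustration, not one of the exercises), when the column name is held in a variable:

col_name = 'MyTrade'   # hypothetical variable holding a column name
my_dset[col_name]      # refers to the same column as my_dset.MyTrade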

Add a new column of strings, “Buy” or “Sell”, indicating the customer direction.

[7]:
my_dset.CustDirection = rng.choice(['Buy', 'Sell'], dset_length)

Riptable will convert these lists to the riptable FastArray container and cast the data to an appropriate numpy datatype.

View the datatypes with my_dset.dtypes.

[8]:
my_dset.dtypes
[8]:
{'Price': dtype('float64'),
 'Symbol': dtype('S4'),
 'Size': dtype('int64'),
 'MyTrade': dtype('bool'),
 'CustDirection': dtype('S4')}

View some sample rows of the dataset using .sample().

You should use this instead of .head() because the initial rows of a dataset are often unrepresentative.

[9]:
my_dset.sample()
[9]:
#   Price   Symbol  Size  MyTrade  CustDirection
0   294.84  SPY     18    False    Buy
1   80.15   TSLA    939   True     Buy
2   189.83  AMZN    919   True     Buy
3   795.83  SPY     324   True     Sell
4   111.57  AMZN    53    True     Buy
5   695.09  AMZN    173   True     Sell
6   109.21  AMZN    810   True     Buy
7   83.23   SPY     674   True     Sell
8   388.80  AMZN    872   False    Sell
9   744.19  SPY     164   False    Buy
[10 rows x 5 columns] total bytes: 250.0 B

View distributional stats of the numerical fields of your dataset with .describe().

You can call this on a single column as well.

[10]:
my_dset.describe()
[10]:
*Stats  Price    Size     MyTrade
Count   5000.00  5000.00  5000.00
Valid   5000.00  5000.00  5000.00
Nans    0.00     0.00     0.00
Mean    495.85   496.97   0.50
Std     292.47   288.05   0.50
Min     0.24     1.00     0.00
P10     92.07    97.00    0.00
P25     240.33   247.00   0.00
P50     493.73   499.00   0.00
P75     750.31   747.00   1.00
P90     902.58   895.00   1.00
Max     999.94   999.00   1.00
MeanM   494.98   497.04   0.50
[13 rows x 4 columns] total bytes: 377.0 B
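
For instance (a small illustration), the same stats for just the Price column:

my_dset.Price.describe()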

Manipulating data

You can perform simple operations on riptable columns with normal python syntax. Riptable will apply them to the whole column at once, efficiently.

Create a new column by performing scalar arithmetic on one of your numeric columns.

[11]:
my_dset.SharesOfStock = 100 * my_dset.Size
[12]:
my_dset.sample()
[12]:
#   Price   Symbol  Size  MyTrade  CustDirection  SharesOfStock
0   594.55  GME     428   False    Sell           42800
1   575.25  SPY     218   False    Buy            21800
2   88.07   TSLA    655   False    Sell           65500
3   616.62  SPY     941   True     Buy            94100
4   589.26  TSLA    235   False    Buy            23500
5   560.63  SPY     99    True     Sell           9900
6   461.22  SPY     982   True     Buy            98200
7   636.12  AMZN    96    False    Sell           9600
8   632.51  GME     253   True     Sell           25300
9   293.35  AMZN    104   False    Buy            10400
[10 rows x 6 columns] total bytes: 330.0 B

As long as the columns are the same size (as is guaranteed if they’re in the same dataset) you can perform combining operations the same way.

Create a new column of total price paid for the trade by multiplying two existing columns together.

Riptable will automatically upcast types as necessary to preserve information.

[13]:
my_dset.TotalCash = my_dset.Price * my_dset.Size
[14]:
my_dset.sample()
[14]:
#   Price   Symbol  Size  MyTrade  CustDirection  SharesOfStock  TotalCash
0   263.41  AMZN    533   False    Buy            53300          140395.27
1   6.94    SPY     111   True     Buy            11100          770.02
2   238.68  SPY     409   False    Sell           40900          97620.54
3   420.92  SPY     463   False    Buy            46300          194887.29
4   435.00  TSLA    676   False    Buy            67600          294057.25
5   947.84  TSLA    334   False    Buy            33400          316579.05
6   439.43  AMZN    364   False    Buy            36400          159952.95
7   44.15   AMZN    222   False    Buy            22200          9800.34
8   972.57  GME     243   False    Sell           24300          236334.91
9   873.75  TSLA    599   False    Sell           59900          523377.81
[10 rows x 7 columns] total bytes: 410.0 B

There are many built-in functions as well, which you call with either my_dset.field.function() or rt.function(my_dset.field) syntax.

Find the unique Symbols in your dataset.

[15]:
my_dset.Symbol.unique()
[15]:
FastArray([b'AMZN', b'GME', b'SPY', b'TSLA'], dtype='|S4')
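
Equivalently, using the rt.function(...) syntax mentioned above (this sketch assumes the module-level rt.unique, which mirrors the method form):

rt.unique(my_dset.Symbol)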

Date/Time

Riptable has three main date/time types: Date, DateTimeNano, and TimeSpan.

Give each row of your dataset an rt.Date.

Make sure they’re not all different, but still include days from multiple months.

Note that due to Riptable idiosyncrasies you need to generate a list of yyyymmdd strings and pass it into the rt.Date(...) constructor, rather than constructing Dates individually.

[16]:
my_dset.Date = rt.Date(rng.choice(rt.Date.range('20220201', '20220430'), dset_length))
[17]:
my_dset.sample()
[17]:
#   Price   Symbol  Size  MyTrade  CustDirection  SharesOfStock  TotalCash  Date
0   28.77   SPY     452   False    Sell           45200          13004.17   2022-02-18
1   705.58  SPY     892   False    Sell           89200          629378.88  2022-02-17
2   225.44  SPY     221   True     Buy            22100          49821.42   2022-04-18
3   647.78  TSLA    488   True     Sell           48800          316118.91  2022-03-10
4   926.05  GME     426   False    Sell           42600          394497.53  2022-04-14
5   226.91  SPY     705   True     Sell           70500          159972.93  2022-04-27
6   816.29  SPY     400   True     Sell           40000          326516.95  2022-03-12
7   177.41  GME     257   True     Buy            25700          45594.88   2022-03-10
8   414.71  AMZN    232   False    Sell           23200          96213.22   2022-04-19
9   471.17  AMZN    576   False    Buy            57600          271393.67  2022-04-25
[10 rows x 8 columns] total bytes: 450.0 B

Give each row a unique(ish) TimeSpan as a trade time.

You can instantiate them using rt.TimeSpan(hours_var, unit='h').

[18]:
my_dset.TradeTime = rt.TimeSpan(rng.uniform(9.5, 16, dset_length), unit='h')
[19]:
my_dset.sample()
[19]:
#   Price   Symbol  Size  MyTrade  ...  SharesOfStock  TotalCash  Date        TradeTime
0   189.25  SPY     779   False    ...  77900          147423.20  2022-02-19  09:47:31.067694548
1   405.30  TSLA    575   True     ...  57500          233044.97  2022-02-28  13:47:22.412012377
2   306.29  SPY     12    True     ...  1200           3675.43    2022-03-26  12:56:47.572327046
3   740.90  SPY     872   False    ...  87200          646063.08  2022-02-20  11:16:31.322328844
4   148.72  AMZN    469   True     ...  46900          69751.19   2022-02-22  13:45:17.960767118
5   687.63  AMZN    900   False    ...  90000          618863.96  2022-02-16  13:03:38.246123166
6   395.53  AMZN    171   False    ...  17100          67635.54   2022-04-23  15:47:05.167722885
7   620.26  TSLA    777   False    ...  77700          481941.71  2022-04-14  11:40:32.776661126
8   266.59  GME     765   True     ...  76500          203941.39  2022-04-12  14:37:57.503811705
9   286.83  AMZN    613   False    ...  61300          175824.15  2022-03-03  14:45:09.378948295
[10 rows x 9 columns] total bytes: 530.0 B

Create a DateTimeNano of the combined TradeTime + Date by simple addition. Riptable knows how to add the two types.

Be careful here: by default you’ll get a GMT timezone; you can force NYC with rt.DateTimeNano(..., from_tz='NYC').

[20]:
my_dset.TradeDateTime = rt.DateTimeNano(my_dset.Date + my_dset.TradeTime, from_tz='NYC')
[21]:
my_dset.sample()
[21]:
#   Price   Symbol  Size  MyTrade  ...  TotalCash  Date        TradeTime           TradeDateTime
0   455.87  TSLA    397   True     ...  180979.30  2022-03-19  13:27:13.903568326  20220319 13:27:13.903568326
1   73.24   TSLA    887   False    ...  64965.26   2022-03-06  11:30:56.002432889  20220306 11:30:56.002432889
2   263.36  SPY     807   True     ...  212535.23  2022-02-19  13:34:55.264039702  20220219 13:34:55.264039702
3   88.49   GME     382   True     ...  33802.09   2022-03-01  14:31:17.048883242  20220301 14:31:17.048883242
4   889.71  AMZN    554   False    ...  492899.07  2022-03-08  11:12:06.558229570  20220308 11:12:06.558229570
5   466.92  TSLA    688   True     ...  321237.96  2022-04-26  09:59:53.558187626  20220426 09:59:53.558187626
6   165.87  AMZN    272   False    ...  45116.56   2022-03-10  13:41:53.830767087  20220310 13:41:53.830767087
7   740.90  SPY     872   False    ...  646063.08  2022-02-20  11:16:31.322328844  20220220 11:16:31.322328844
8   146.13  TSLA    896   True     ...  130931.03  2022-02-24  12:18:33.660870308  20220224 12:18:33.660870308
9   240.26  AMZN    587   True     ...  141032.79  2022-03-02  10:43:44.856459110  20220302 10:43:44.856459110
[10 rows x 10 columns] total bytes: 610.0 B

To reverse this operation and get out separate dates and times from a DateTimeNano, you can call rt.Date(my_DateTimeNano) and my_DateTimeNano.time_since_midnight().
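
As a minimal sketch of that reversal (using the two calls just mentioned, applied to the columns built above):

recovered_date = rt.Date(my_dset.TradeDateTime)                # the Date portion
recovered_time = my_dset.TradeDateTime.time_since_midnight()   # the time of day as a TimeSpan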

Create a new month name column by using the .strftime function.

[22]:
my_dset.month_name = my_dset.Date.strftime('%b%y')
[23]:
my_dset.sample()
[23]:
#   Price   Symbol  Size  MyTrade  ...  TradeTime           TradeDateTime                month_name
0   731.57  TSLA    687   True     ...  13:18:00.786792038  20220426 13:18:00.786792038  Apr22
1   491.23  TSLA    472   True     ...  12:13:02.001061236  20220313 12:13:02.001061236  Mar22
2   321.02  SPY     739   True     ...  10:05:47.468199835  20220428 10:05:47.468199835  Apr22
3   728.75  TSLA    191   True     ...  13:33:27.072606771  20220204 13:33:27.072606771  Feb22
4   414.23  AMZN    955   False    ...  12:21:10.042617283  20220209 12:21:10.042617283  Feb22
5   399.06  TSLA    608   True     ...  10:22:12.201005950  20220430 10:22:12.201005950  Apr22
6   918.26  AMZN    93    False    ...  12:32:03.015948824  20220227 12:32:03.015948824  Feb22
7   787.75  GME     408   True     ...  15:00:34.274163745  20220420 15:00:34.274163745  Apr22
8   478.57  TSLA    595   False    ...  13:17:26.624255585  20220211 13:17:26.624255585  Feb22
9   488.22  SPY     807   False    ...  13:45:28.888911507  20220228 13:45:28.888911507  Feb22
[10 rows x 11 columns] total bytes: 660.0 B

Create another new month column by using the .start_of_month attribute.

This is nice for grouping because it will automatically sort correctly.

[24]:
my_dset.month = my_dset.Date.start_of_month
[25]:
my_dset.sample()
[25]:
#   Price   Symbol  Size  MyTrade  ...  TradeDateTime                month_name  month
0   272.34  SPY     593   False    ...  20220421 15:56:06.447740285  Apr22       2022-04-01
1   758.55  SPY     210   True     ...  20220425 10:01:29.981701403  Apr22       2022-04-01
2   654.72  SPY     613   True     ...  20220330 09:38:12.459630380  Mar22       2022-03-01
3   417.83  TSLA    737   True     ...  20220426 15:29:01.508398022  Apr22       2022-04-01
4   280.31  AMZN    958   True     ...  20220327 11:43:36.813136697  Mar22       2022-03-01
5   139.19  AMZN    52    False    ...  20220418 13:50:10.245021957  Apr22       2022-04-01
6   915.93  TSLA    18    True     ...  20220420 13:55:37.175809652  Apr22       2022-04-01
7   877.00  TSLA    928   False    ...  20220223 11:51:05.923929754  Feb22       2022-02-01
8   61.40   AMZN    472   True     ...  20220311 09:40:49.323285896  Mar22       2022-03-01
9   807.30  SPY     863   True     ...  20220325 11:48:40.251197776  Mar22       2022-03-01
[10 rows x 12 columns] total bytes: 700.0 B

Sorting

Riptable has two sorts: sort_copy (which preserves the original dataset) and sort_inplace (which is faster and more memory-efficient if you don’t need the original data order).

Sort your dataset by TradeDateTime.

This is the natural ordering of a list of trades, so do it in-place.

[26]:
my_dset = my_dset.sort_inplace('TradeDateTime')
[27]:
my_dset.sample()
[27]:
#   Price   Symbol  Size  MyTrade  ...  TradeDateTime                month_name  month
0   379.21  AMZN    385   False    ...  20220203 10:58:19.673425054  Feb22       2022-02-01
1   617.50  SPY     413   True     ...  20220214 14:02:42.426523660  Feb22       2022-02-01
2   555.63  SPY     617   False    ...  20220308 13:25:17.282540250  Mar22       2022-03-01
3   718.59  AMZN    34    True     ...  20220324 11:57:46.925665938  Mar22       2022-03-01
4   838.81  GME     958   False    ...  20220330 10:09:37.013952547  Mar22       2022-03-01
5   498.17  AMZN    135   False    ...  20220403 12:36:29.924505469  Apr22       2022-04-01
6   123.59  AMZN    927   True     ...  20220414 12:51:26.312197854  Apr22       2022-04-01
7   339.86  AMZN    805   True     ...  20220417 13:05:59.168005100  Apr22       2022-04-01
8   158.91  SPY     211   False    ...  20220428 09:30:22.411907771  Apr22       2022-04-01
9   936.81  TSLA    81    False    ...  20220430 15:06:53.166066606  Apr22       2022-04-01
[10 rows x 12 columns] total bytes: 700.0 B
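
If you did need to preserve the original order, a sort_copy version might look like this (a sketch; it returns a new, sorted dataset and leaves my_dset untouched):

dset_by_price = my_dset.sort_copy('Price')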

Filtering

Filtering is the principal way to work with a subset of your data in riptable. It is commonly used for looking at a restricted set of trades matching some criterion you care about.

Except in rare instances, though, you should maintain your dataset in its full size, and only apply a filter when performing a final computation.

This will avoid unnecessary data duplication and improve speed & memory usage.

Construct a filter of only your sales. (A filter is a column of Booleans which is true only for the rows you’re interested in.)

You can combine filters using & or |. Be careful to always wrap expressions in parentheses to avoid an extremely slow call into native python followed by a crash.

Always (my_dset.field1 > 10) & (my_dset.field2 < 5), never my_dset.field1 > 10 & my_dset.field2 < 5.

[28]:
f_my_sales = my_dset.MyTrade & (my_dset.CustDirection == 'Buy')  # you traded, and the customer bought, so you sold
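
As an illustration of combining filters (not part of the exercise), a filter for trades that were either yours or customer sells could look like:

f_mine_or_cust_sells = my_dset.MyTrade | (my_dset.CustDirection == 'Sell')   # note the parentheses around the comparison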

Compute the total Trade Size, filtered for only your sales.

For this and many other instances, you can & should pass your filter into the filter kwarg of the .nansum(...) call.

This allows riptable to perform the filtering during the nansum computation, rather than instantiating a new column and then summing it.

[29]:
my_dset.Size.nansum(filter=f_my_sales)
[29]:
621241
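
For comparison only, the equivalent without the filter kwarg materializes a temporary filtered array first, which is slower and uses more memory:

my_dset.Size[f_my_sales].nansum()   # same result, but builds an intermediate array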

Count how many times you sold each symbol.

Here the .count() function doesn’t accept a filter kwarg, so you must fall back to explicitly filtering the Symbol field before counting.

Be careful that you only filter down the Symbol field, not the entire dataset; otherwise you are wasting a lot of compute.

[30]:
my_dset.Symbol[f_my_sales].count()
[30]:
*Unique  Count
AMZN     301
GME      306
SPY      282
TSLA     340
[4 rows x 2 columns] total bytes: 32.0 B

Categoricals

So far, we’ve been operating on your symbol column as a column of strings.

However, it’s far more efficient when you have a large column with many repeats to use a categorical, which assigns each unique value a number, and stores the labels & numbers separately.

This is memory-efficient, and also computationally efficient, as riptable can perform operations on the unique values, then expand out to the full vector appropriately.

Make a new column of your string column converted to a categorical, using rt.Cat(column).

[31]:
my_dset.Symbol_cat = rt.Cat(my_dset.Symbol)
my_dset.Symbol_cat
[31]:
Categorical([AMZN, SPY, SPY, SPY, SPY, ..., TSLA, GME, SPY, AMZN, SPY]) Length: 5000
  FastArray([1, 3, 3, 3, 3, ..., 4, 2, 3, 1, 3], dtype=int8) Base Index: 1
  FastArray([b'AMZN', b'GME', b'SPY', b'TSLA'], dtype='|S4') Unique count: 4

Perform the same filtered count from above, on the categorical.

The categorical .count() admits a filter kwarg, which makes it simpler.

[32]:
my_dset.Symbol_cat.count(filter=f_my_sales)
[32]:
*Symbol_cat  Count
AMZN         301
GME          306
SPY          282
TSLA         340
[4 rows x 2 columns] total bytes: 32.0 B

Categoricals can be used as groupings. When you call a numeric function on a categorical and pass numeric columns in, riptable knows to do the calculation per-group.

Compute the total number of contracts sold by customers in each symbol.

[33]:
my_dset.Symbol_cat.sum(my_dset.Size, filter=my_dset.CustDirection == 'Sell')
[33]:
*Symbol_cat  Size
AMZN         303513
GME          290964
SPY          337699
TSLA         304961
[4 rows x 2 columns] total bytes: 48.0 B

The transform=True kwarg in a categorical operation performs the aggregation, then transforms it back up to the original shape of the categorical, giving each row the appropriate value from its group.

Make a new column which is the average trade price, per symbol.

[34]:
my_dset.average_trade_price = my_dset.Symbol_cat.mean(my_dset.Price, transform=True)

Inspect with .sample() to confirm that this value is consistent for rows with matching symbol.

[35]:
my_dset.sample()
[35]:
#   Price   Symbol  Size  MyTrade  ...  month_name  month       Symbol_cat  average_trade_price
0   612.27  AMZN    356   False    ...  Feb22       2022-02-01  AMZN        497.66
1   5.42    AMZN    610   False    ...  Feb22       2022-02-01  AMZN        497.66
2   877.96  AMZN    58    False    ...  Feb22       2022-02-01  AMZN        497.66
3   340.75  AMZN    802   True     ...  Mar22       2022-03-01  AMZN        497.66
4   564.53  GME     486   True     ...  Apr22       2022-04-01  GME         495.29
5   46.86   TSLA    414   True     ...  Apr22       2022-04-01  TSLA        499.91
6   850.28  SPY     723   True     ...  Apr22       2022-04-01  SPY         490.44
7   895.93  SPY     967   False    ...  Apr22       2022-04-01  SPY         490.44
8   267.21  AMZN    98    True     ...  Apr22       2022-04-01  AMZN        497.66
9   279.97  GME     887   True     ...  Apr22       2022-04-01  GME         495.29
[10 rows x 14 columns] total bytes: 806.0 B

If you need to perform a custom operation on each categorical, you can pass in a function with .apply_reduce (which aggregates) or .apply_nonreduce (which is like transform=True).

Note that the custom function you pass needs to expect a FastArray, and output a scalar (apply_reduce) or same-length FastArray (apply_nonreduce).

Find, for each symbol, the trade size of the second trade occurring in the dataset.

[36]:
my_dset.Symbol_cat.apply_reduce(lambda x: x[1], my_dset.Size)
[36]:
*Symbol_cat  Size
AMZN         700
GME          42
SPY          492
TSLA         536
[4 rows x 2 columns] total bytes: 48.0 B
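
As a sketch of the non-reducing variant (not required by the exercise; it assumes a per-symbol running total is the statistic you want), apply_nonreduce returns a full-length column aligned to the original rows:

my_dset.SymbolCumSize = my_dset.Symbol_cat.apply_nonreduce(lambda x: x.cumsum(), my_dset.Size)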

Sometimes you want to aggregate based on multiple values. In these cases we use multi-key categoricals.

Use a multi-key categorical to compute the average size per symbol-month pair.

[37]:
my_dset.Symbol_month_cat = rt.Cat([my_dset.Symbol, my_dset.month])
[38]:
my_dset.Symbol_month_cat.nanmean(my_dset.Size).sort_inplace('Symbol')
[38]:
*Symbol  *month      Size
AMZN     2022-02-01  473.99
.        2022-03-01  495.32
.        2022-04-01  493.56
GME      2022-02-01  508.95
.        2022-03-01  506.99
.        2022-04-01  479.18
SPY      2022-02-01  509.76
.        2022-03-01  529.28
.        2022-04-01  479.58
TSLA     2022-02-01  501.33
.        2022-03-01  469.43
.        2022-04-01  517.81
[12 rows x 3 columns] total bytes: 192.0 B

Accumulating

Aggregating over two values for human viewing is often most conveniently done with an accum.

Use Accum2 to compute the average size per symbol-month pair.

[39]:
rt.Accum2(my_dset.Symbol, my_dset.month).nanmean(my_dset.Size)
[39]:
*Symbol  2022-02-01  2022-03-01  2022-04-01  Nanmean
AMZN     473.99      495.32      493.56      487.57
GME      508.95      506.99      479.18      498.34
SPY      509.76      529.28      479.58      506.63
TSLA     501.33      469.43      517.81      495.67
Nanmean  497.85      499.98      492.81      496.97
[4 rows x 5 columns] total bytes: 144.0 B

Average numbers can be meaningless. It is often better to consider relative percentages instead.

Use accum_ratiop to compute the fraction of total volume done by each symbol-month pair.

[40]:
rt.accum_ratiop(my_dset.Symbol, my_dset.month, my_dset.Size, norm_by='R')
[40]:
*Symbol     2022-02-01  2022-03-01  2022-04-01  TotalRatio  Total
AMZN        32.93       36.70       30.37       100.00      628959
GME         31.96       36.02       32.02       100.00      594021
SPY         32.93       36.10       30.98       100.00      636328
TSLA        31.90       33.17       34.93       100.00      625533
TotalRatio  32.44       35.49       32.07       100.00
Total       806012      881963      796866                  2484841
[4 rows x 6 columns] total bytes: 176.0 B

Merging

There are two main types of merges.

First is merge_lookup. This is used for enriching one (typically large) dataset with information from another (typically small) dataset.

Create a new dataset with one row per symbol from your dataset, and a second column of who trades each symbol.

[41]:
symbol_trader = rt.Dataset({'UnderlyingSymbol': ['GME', 'TSLA', 'SPY', 'AMZN'],
                           'Trader': ['Nate', 'Elon', 'Josh', 'Dan']})
[42]:
symbol_trader
[42]:
#  UnderlyingSymbol  Trader
0  GME               Nate
1  TSLA              Elon
2  SPY               Josh
3  AMZN              Dan
[4 rows x 2 columns] total bytes: 32.0 B

Enrich the main dataset by putting the correct trader into each row.

[43]:
my_dset.Trader = my_dset.merge_lookup(symbol_trader, on=('Symbol', 'UnderlyingSymbol'), columns_left=[])['Trader']
[44]:
my_dset.sample()
[44]:
#   Price   Symbol  Size  MyTrade  ...  Symbol_cat  average_trade_price  Symbol_month_cat    Trader
0   702.44  SPY     739   True     ...  SPY         490.44               (SPY, 2022-02-01)   Josh
1   926.83  TSLA    591   True     ...  TSLA        499.91               (TSLA, 2022-02-01)  Elon
2   450.66  SPY     459   False    ...  SPY         490.44               (SPY, 2022-02-01)   Josh
3   664.47  SPY     846   True     ...  SPY         490.44               (SPY, 2022-03-01)   Josh
4   464.46  AMZN    379   True     ...  AMZN        497.66               (AMZN, 2022-03-01)  Dan
5   508.80  GME     907   False    ...  GME         495.29               (GME, 2022-03-01)   Nate
6   145.61  TSLA    289   True     ...  TSLA        499.91               (TSLA, 2022-04-01)  Elon
7   729.66  GME     148   False    ...  GME         495.29               (GME, 2022-04-01)   Nate
8   20.50   TSLA    769   True     ...  TSLA        499.91               (TSLA, 2022-04-01)  Elon
9   768.59  GME     957   True     ...  GME         495.29               (GME, 2022-04-01)   Nate
[10 rows x 16 columns] total bytes: 952.0 B

The second type of merge is merge_asof, which is used for fuzzy alignment between two datasets, typically by time (though sometimes by other variables).

Create a new index price dataset with one price per minute, which covers all the Dates in your dataset.

The index price doesn’t need to be reasonable.

Each row should have a DateTimeNano as the datetime.

[45]:
num_minutes = int((my_dset.TradeDateTime.max() - my_dset.TradeDateTime.min()).minutes[0])
start_datetime = rt.Date(my_dset.TradeDateTime.min())
[46]:
index_price = rt.Dataset({'DateTime': start_datetime + rt.TimeSpan(range(num_minutes), unit='m'),
                          'IndexPrice': rng.uniform(3500, 4500, num_minutes)})
[47]:
index_price.sample()
[47]:
#   DateTime                     IndexPrice
0   20220217 07:25:00.000000000  3742.56
1   20220218 12:24:00.000000000  4439.16
2   20220225 16:41:00.000000000  3833.25
3   20220303 13:44:00.000000000  4341.40
4   20220326 08:00:00.000000000  4356.62
5   20220402 02:58:00.000000000  3796.68
6   20220403 15:55:00.000000000  3645.95
7   20220416 10:01:00.000000000  4469.10
8   20220423 03:30:00.000000000  4284.35
9   20220427 08:09:00.000000000  4347.81
[10 rows x 2 columns] total bytes: 160.0 B

Use merge_asof to get the most recent Index Price associated with each trade in your main dataset.

Note both datasets need to be sorted for merge_asof.

The on kwarg names the numeric/time field used to find close matches.

The by kwarg is not necessary here, but could constrain the match to a subset if, for example, you had multiple indices and a column of which one each row is associated with.

Use direction='backward' to ensure you’re not biasing your data by looking into the future!

[48]:
my_dset.IndexPrice = my_dset.merge_asof(index_price, on=('TradeDateTime', 'DateTime'), direction='backward', columns_left=[])['IndexPrice']

Saving/Loading

The native riptable filetype is .sds. It’s the fastest way to save & load your data.

Save out your dataset to file using rt.save_sds.

[49]:
rt.save_sds('my_dset.sds', my_dset)

Delete your dataset to free up memory using the native python del my_dset.

Note that if there are references to the dataset in other objects you may not actually free up memory.

[50]:
del my_dset

Reload your saved dataset from disk with rt.load_sds.

[51]:
my_dset = rt.load_sds('my_dset.sds')
[52]:
my_dset.sample()
[52]:
#   Price   Symbol  Size  MyTrade  ...  average_trade_price  Symbol_month_cat    Trader  IndexPrice
0   569.10  TSLA    719   True     ...  499.91               (TSLA, 2022-02-01)  Elon    3945.79
1   915.31  GME     4     False    ...  495.29               (GME, 2022-02-01)   Nate    3756.71
2   173.42  AMZN    166   False    ...  497.66               (AMZN, 2022-03-01)  Dan     3972.01
3   410.27  GME     722   True     ...  495.29               (GME, 2022-03-01)   Nate    4458.17
4   606.42  SPY     995   True     ...  490.44               (SPY, 2022-03-01)   Josh    3954.01
5   910.27  AMZN    219   True     ...  497.66               (AMZN, 2022-03-01)  Dan     4108.60
6   609.56  TSLA    459   False    ...  499.91               (TSLA, 2022-04-01)  Elon    4384.97
7   466.54  TSLA    400   False    ...  499.91               (TSLA, 2022-04-01)  Elon    4221.02
8   225.37  AMZN    150   False    ...  497.66               (AMZN, 2022-04-01)  Dan     3688.96
9   912.44  AMZN    615   True     ...  497.66               (AMZN, 2022-04-01)  Dan     3527.44
[10 rows x 17 columns] total bytes: 1.0 KB

To load from h5 files (a common file type at SIG), use rt.load_h5(file).

To load from csv files, use the slow but robust pandas loader, with rt.Dataset.from_pandas(pd.read_csv(file)).
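
A minimal sketch of both loaders (the file names here are hypothetical):

# import pandas as pd
# dset_from_h5 = rt.load_h5('my_data.h5')                              # hypothetical .h5 file
# dset_from_csv = rt.Dataset.from_pandas(pd.read_csv('my_data.csv'))   # hypothetical .csv file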