Visualize Data

Riptable can work with any of the standard plotting tools, including Matplotlib, to create visualizations of your data. You can also take advantage of the plotting and HTML styling tools offered by Pandas.

In this section we’ll look at a couple of simple examples using Matplotlib, Pandas, and Playa.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import riptable as rt

We’ll use this Dataset in the examples:

>>> rng = np.random.default_rng(seed=42)
>>> N = 1_000
>>> symbols = ['IBM', 'AAPL', 'F', 'CSCO', 'SPY']
>>> start_time = rt.DateTimeNano('20191112 09:30:00', from_tz='NYC')
>>> ds_list = []
>>> for symbol in symbols:
...     temp_ds = rt.Dataset({'Symbol': rt.full(N, symbol),
...                      'Price': 100.0 + 10 * rng.standard_normal(1) + rng.standard_normal(N).cumsum(),
...                      'Size': rng.integers(1, 50, 1) + rng.integers(1, 50, N),
...                      'Time': start_time + rt.TimeSpan(rng.integers(1, 100, N).cumsum(), 's'),
...                     })
...     ds_list.append(temp_ds)
>>> ds = rt.hstack(ds_list)
>>> ds = ds.sort_inplace('Time')
>>> ds.sample()
#   Symbol    Price   Size                          Time
-   ------   ------   ----   ---------------------------
0   SPY      114.70     36   20191112 10:53:02.000000000
1   CSCO      88.83     74   20191112 16:24:38.000000000
2   SPY      140.37     66   20191112 16:42:23.000000000
3   SPY      147.62     53   20191112 17:40:02.000000000
4   AAPL     125.24     68   20191112 18:02:39.000000000
5   SPY      141.60     67   20191112 18:19:48.000000000
6   SPY      139.88     61   20191112 18:41:54.000000000
7   SPY      139.81     69   20191112 19:32:47.000000000
8   SPY      143.03     50   20191112 20:40:38.000000000
9   AAPL     130.09     87   20191112 22:22:09.000000000

Matplotlib Plotting

Example of a basic plot of IBM’s share price:

>>> f = ds.Symbol=='IBM'
>>> plt.figure(figsize=(10,8))
>>> plt.plot(ds.Time[f], ds.Price[f])
>>> plt.show()
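
The filter f above is a Boolean mask: one True/False value per row, used to select the matching rows from each column. A minimal sketch of the same idea, using plain NumPy arrays as stand-ins for the Dataset columns (the array names and values here are made up for illustration):

```python
import numpy as np

# Hypothetical stand-ins for ds.Symbol and ds.Price.
symbol = np.array(['IBM', 'AAPL', 'IBM', 'SPY', 'IBM'])
price = np.array([101.5, 125.2, 102.1, 140.4, 100.9])

# Comparing an array to a scalar yields one Boolean per element.
mask = symbol == 'IBM'

# Indexing with the mask keeps only the elements where it is True.
ibm_prices = price[mask]

print(mask)
print(ibm_prices)
```

Because the same mask is applied to both the x and y columns, the selected values stay aligned row by row.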

And a histogram:

>>> plt.figure(figsize=(10,8))
>>> plt.hist(ds.Price[f])
>>> plt.show()
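
To inspect the numbers behind a histogram, NumPy's np.histogram returns the per-bin counts and the bin edges without drawing anything. A minimal sketch on synthetic data (the prices array is a made-up stand-in for ds.Price[f]); 10 bins are used here to match Matplotlib's default bin count:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Synthetic random-walk prices, standing in for ds.Price[f].
prices = 100.0 + rng.standard_normal(1_000).cumsum()

# counts[i] is the number of values in [edges[i], edges[i+1]);
# the last bin also includes its right edge.
counts, edges = np.histogram(prices, bins=10)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f'{lo:8.2f} to {hi:8.2f}: {n}')
```

There is always one more edge than there are bins, and the counts sum to the number of input values.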

And a scatter plot:

>>> plt.figure(figsize=(10,8))
>>> for symbol in symbols:
...     f = ds.Symbol==symbol
...     plt.scatter(ds.Time[f], ds.Price[f], label=symbol)
>>> plt.grid()
>>> plt.legend()
>>> plt.title('Stock Price by Time')
>>> plt.show()

Pandas HTML Styling

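To use the Pandas Styler methods, call to_pandas() on your Dataset and style the resulting DataFrame. (In Pandas 2.1 and later, Styler.applymap() is deprecated in favor of Styler.map(), which takes the same arguments.)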

>>> def color_smaller_red(val):
...     # Red for float values below 100; gray for everything else.
...     color = 'red' if isinstance(val, float) and val < 100 else 'gray'
...     return 'color: %s' % color
>>> ds.to_pandas().head(10).style.applymap(color_smaller_red)
  Symbol       Price  Size                       Time
0   AAPL  103.281775    63  2019-11-12 09:30:30-05:00
1    SPY  110.168266    35  2019-11-12 09:30:43-05:00
2    SPY  109.627368    37  2019-11-12 09:30:46-05:00
3      F   84.582351    58  2019-11-12 09:30:58-05:00
4    IBM  102.007187    37  2019-11-12 09:31:18-05:00
5   CSCO   77.963601    73  2019-11-12 09:31:35-05:00
6    SPY  109.972200    46  2019-11-12 09:31:36-05:00
7   CSCO   76.155438    73  2019-11-12 09:31:40-05:00
8      F   84.816947    64  2019-11-12 09:31:55-05:00
9      F   86.517740    59  2019-11-12 09:31:56-05:00
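
The styling function is called once per cell and returns a CSS string for that cell. A quick way to check such a function before handing it to the Styler is to call it directly on a few sample values. The sketch below uses a parameterized variant of the function above (the name color_below_threshold and the threshold parameter are illustrative, not part of Pandas or Riptable):

```python
def color_below_threshold(val, threshold=100.0):
    """Return a CSS color rule: red for floats below threshold, else gray."""
    is_small_float = isinstance(val, float) and val < threshold
    return f"color: {'red' if is_small_float else 'gray'}"

# Floats below the threshold come out red; everything else is gray,
# including integers and strings.
for sample in [84.58, 103.28, 63, 'AAPL']:
    print(repr(sample), '->', color_below_threshold(sample))
```

Since threshold has a default value, the function can be passed to the Styler unchanged, in the same way as color_smaller_red.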

GroupScatter Plots with Playa

Playa’s GroupScatter class groups data into buckets based on x-values. Its plot() method then draws a Matplotlib plot summarizing the data in each bucket.

from playa.plot import GroupScatter

First, make a noisier price signal:

>>> ds.NoisyPrice = ds.Price + rng.normal(0, 10, ds.shape[0])

A regular Matplotlib scatter plot, for comparison:

>>> num_rows = int(rt.ceil(len(symbols)/2))
>>> fig, axes = plt.subplots(num_rows, 2, figsize=(20, 5 * num_rows))
>>> for (ax, symbol) in zip(axes.flatten(), symbols):
...     f = ds.Symbol==symbol
...     ax.scatter(ds.Time[f], ds.NoisyPrice[f])
...     ax.grid()
...     ax.set_xlabel('Time')
...     ax.set_ylabel('Price')
...     ax.set_title(f'{symbol} Noisy Stock Price by Time')
>>> plt.show()
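
The grid arithmetic above rounds the number of rows up, so an odd number of symbols leaves one panel empty. A small sketch of that calculation (the names n_cols, n_panels, and n_unused are illustrative, and integer panel ids stand in for the Matplotlib axes):

```python
import math

symbols = ['IBM', 'AAPL', 'F', 'CSCO', 'SPY']
n_cols = 2

# Round up so that every symbol gets a panel.
n_rows = math.ceil(len(symbols) / n_cols)
n_panels = n_rows * n_cols
n_unused = n_panels - len(symbols)

print(f'{n_rows} x {n_cols} grid, {n_unused} unused panel(s)')

# zip() stops at the shorter input, so pairing panels with symbols
# silently skips the leftover panel.
pairs = list(zip(range(n_panels), symbols))
print(pairs[-1])
```

If the empty panel is distracting, one option is to hide it by calling set_visible(False) on the unused axes.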

Now plot a GroupScatter for each symbol. You can see how it clarifies the point cloud and reveals the shape of the data.

>>> fig, axes = plt.subplots(num_rows, 2, figsize=(20, 5 * num_rows))
>>> for (ax, symbol) in zip(axes.flatten(), symbols):
...     f = ds.Symbol==symbol
...     gs = GroupScatter(ds.Time[f].hour, ds.NoisyPrice[f])
...     gs.plot(title=f'{symbol} Noisy Stock Price Over Time', x_label='Hour of the Day', y_label='Price', ax=ax)
>>> plt.show()
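
To make the bucketing idea concrete, the sketch below groups y-values by integer x-value (such as the hour of the day) and computes a mean and standard error for each bucket with NumPy, which is one plausible way to summarize a bucket. It only illustrates the general technique under assumed details; it is not Playa's implementation, and the function name bucket_means and the synthetic data are hypothetical:

```python
import numpy as np

def bucket_means(x, y):
    """Group y by the distinct values of x; return (keys, means, std_errs)."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    # keys: sorted distinct x-values; inverse: bucket index of each point.
    keys, inverse = np.unique(x, return_inverse=True)
    counts = np.bincount(inverse)
    means = np.bincount(inverse, weights=y) / counts
    # Sum of squared deviations per bucket, then standard error of the mean.
    sq_dev = np.bincount(inverse, weights=(y - means[inverse]) ** 2)
    std_errs = np.sqrt(sq_dev / counts) / np.sqrt(counts)
    return keys, means, std_errs

rng = np.random.default_rng(seed=0)
hours = rng.integers(9, 17, 5_000)                      # synthetic hour-of-day values
noisy = 100.0 + 2.0 * hours + rng.normal(0, 10, 5_000)  # linear trend plus noise

keys, means, std_errs = bucket_means(hours, noisy)
for k, m, se in zip(keys, means, std_errs):
    print(f'hour {k:2d}: mean {m:7.2f} +/- {se:.2f}')
```

Averaging within each bucket shrinks the noise roughly by the square root of the bucket size, which is why the underlying trend becomes visible where the raw scatter plot shows only a cloud.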

This was just a brief introduction. Check out the Matplotlib, Pandas, and Playa documentation for more details and possibilities.

Next we cover useful tools for working with NaNs and other missing values: Working with Missing Data.


Questions or comments about this guide? Email RiptableDocumentation@sig.com.