Introduction

What Is Riptable?

Riptable is an open source library built for high-performance data analysis. It’s similar to Pandas by design, but it’s been optimized to meet the needs of Riptable’s core users: quantitative analysts interacting live with large volumes of trading data.

Riptable is based on NumPy, so it shares many core NumPy methods for array-based operations. Riptable has also implemented its own Pandas-like functions for grouping and aggregation. For users who work with large datasets, Riptable improves on NumPy and Pandas by using multi-threading and efficient memory management, much of it implemented at the C++ level.

Riptable’s APIs are designed to be more feature-rich and easier to work with than those provided by Pandas and other existing libraries.

NumPy and Pandas users will find it easy to convert their data to Riptable (and back again if need be). It’s also possible to convert data from CSV or SQL files. Similarly, h5 files can be converted to Riptable’s format. Matlab users, who will generally find similar syntax and functionality in Riptable, can use special keyword arguments to convert Matlab data to Riptable’s format. See Work with Riptable Files and Other File Formats for details.

For data visualization, any of the standard plotting tools (for example, matplotlib.pyplot) will work out of the box. To see a few basic examples, check out the Visualize Data section.

Who This Tutorial Is For

If you’re new to Riptable, this tutorial is for you. It introduces Riptable’s basic functionality and syntax.

Some experience with Python will be helpful, especially familiarity with dictionary syntax, sequences (lists, tuples, etc.), and basic functions and arguments.

A Note to Pandas Users

If you’ve used Pandas, you’ll notice many similarities in Riptable, but some of the differences are subtle and easy to miss. This tutorial doesn’t call out those differences; see the API Reference for how specific methods, functions, and attributes differ.

Install and Import Riptable

To install Riptable on Windows or Linux, create a Conda environment and type:

conda install riptable

To use Riptable and its functions in your Python code, add these imports:

import riptable as rt
import numpy as np

Display Options

You can modify Riptable’s default display options using the attributes offered in rt.Display.options. Here are a few you might find useful.

General Display Options

Some general options you can set for a session:

# Display all Dataset columns -- the default max is 9.
rt.Display.options.COL_ALL = True

# Render numbers up to 100 million (100MM) before switching to scientific notation.
rt.Display.options.E_MAX = 100_000_000

# Truncate small decimals, rather than showing infinitesimal scientific notation.
rt.Display.options.P_THRESHOLD = 0

# Put commas in numbers.
rt.Display.options.NUMBER_SEPARATOR = True

# Turn on Riptable autocomplete (start typing, then press Tab to see options).
rt.autocomplete()

Contextual Help

The rt.autocomplete() option listed above can be used as an alternative to Python’s built-in dir() function, which shows various attributes and methods associated with an object.

For example, to see the attributes and methods of Riptable’s Date object, you can use dir():

>>> # dir() returns a list of names; print every 10th one to keep the output short.
>>> dir_date = dir(rt.Date)
>>> print("Some of the attributes and methods include...\n")
>>> print(", ".join(dir_date[::10]))
Some of the attributes and methods include...

CompressPickle, T, _LDUMP, _TON, __array_function__, __class__, __doc__, __hash__, __init_subclass__, __le__, __new__, __rfloordiv__, __rsub__, __truediv__, _check_mathops, _fa_keyword_wrapper, _max, _nanstd, _reduce_op_identity_value, _yearday_splits_leap, argpartition, clip_upper, cummin, differs, ema_decay, format_date_num, is_leapyear, isnormal, map_old, move_mean, nanmean, nonzero, push, reshape, round, sign, strides, tolist, year

Note: The resulting list may not be complete. For details, see Python’s documentation for dir() in the section on built-in functions.

Alternatively, you can use Riptable’s autocomplete interface. With rt.autocomplete() turned on, type rt.Date.<TAB> where <TAB> is the Tab key. You’ll see a pop-up list of attributes and methods. Keep typing to narrow down the list.

Note that private/internal attributes and methods (those whose names begin with an underscore) are omitted by default, but you can see them by typing the underscore. For example: rt.Date._fa<TAB>.
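
If autocomplete isn’t available (for example, in a plain script), a similar public/private split can be approximated by filtering the output of dir(). The following is a minimal sketch that uses only the standard library; datetime.date is just a stand-in so the snippet runs without Riptable, and the variable names are illustrative. With Riptable installed, you could presumably pass rt.Date to dir() in the same way.

```python
import datetime

# datetime.date is only a stand-in object for this sketch.
all_names = dir(datetime.date)

# Names that do not start with an underscore (the "public" ones).
public_names = [name for name in all_names if not name.startswith("_")]

# Names that start with an underscore (private/internal and dunder names).
private_names = [name for name in all_names if name.startswith("_")]

# Narrow the list by prefix, much like typing a few letters before pressing Tab.
matches = [name for name in all_names if name.startswith("to")]

print(", ".join(public_names))
print(len(private_names), "names start with an underscore")
print(matches)
```

This only imitates the filtering; it does not reproduce anything else that rt.autocomplete() might do.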

You can view the docstring of any documented function or object with the following syntax:

IPython prompt: my_func?
Python prompt: help(my_obj)

For example:

>>> rt.sum?
Signature: rt.sum(*args, filter=None, dtype=None, **kwargs)
Docstring:
Compute the sum of the values in the first argument.

When possible, ``rt.sum(x, *args)`` calls ``x.sum(*args)``; look there for
documentation. In particular, note whether the called function accepts the
keyword arguments listed below. For example, `Dataset.sum()` does not accept
the `filter` or `dtype` keyword arguments.

For ``FastArray.sum``, see `numpy.sum` for documentation but note the following:

* The `dtype` keyword argument may not work as expected:

    * Riptable data types (for example, `rt.float64`) are ignored.
    * NumPy integer data types (for example, `numpy.int32`) are also ignored.
    * NumPy floating point data types are applied as `numpy.float64`.

* If you include another NumPy parameter (for example, ``axis=0``), the NumPy
implementation of ``sum`` will be used and the ``dtype`` will be used to
compute the sum.

Parameters
----------
filter : array of bool, default None
    Specifies which elements to include in the sum calculation.
dtype : rt.dtype or numpy.dtype, optional
    The data type of the result. By default, for integer input the result `dtype` is
    ``int64`` and for floating point input the result `dtype` is ``float64``. See
    the notes above about using this keyword argument with `FastArray` objects
    as input.

See Also
--------
numpy.sum
nansum : Sums the values, ignoring NaNs.
FastArray.sum : Sums the values of a `FastArray`.
Dataset.sum : Sums the values of numerical `Dataset` columns.
GroupByOps.sum : Sums the values of each group. Used by `Categorical` objects.

Examples
--------
>>> a = rt.FastArray([1, 3, 5, 7])
>>> rt.sum(a)
16

>>> a = rt.FastArray([1.0, 3.0, 5.0, 7.0])
>>> rt.sum(a)
16.0
File:      c:\\riptable\\rt_numpy.py
Type:      function

You can access the source code with ??:

>>> rt.sum??
Signature: rt.sum(*args, filter=None, dtype=None, **kwargs)
Docstring:
Compute the sum of the values in the first argument.

When possible, ``rt.sum(x, *args)`` calls ``x.sum(*args)``; look there for
documentation. In particular, note whether the called function accepts the
keyword arguments listed below. For example, `Dataset.sum()` does not accept
the `filter` or `dtype` keyword arguments.

For ``FastArray.sum``, see `numpy.sum` for documentation but note the following:

* The `dtype` keyword argument may not work as expected:

    * Riptable data types (for example, `rt.float64`) are ignored.
    * NumPy integer data types (for example, `numpy.int32`) are also ignored.
    * NumPy floating point data types are applied as `numpy.float64`.

* If you include another NumPy parameter (for example, ``axis=0``), the NumPy
implementation of ``sum`` will be used and the ``dtype`` will be used to
compute the sum.

Parameters
----------
filter : array of bool, default None
    Specifies which elements to include in the sum calculation.
dtype : rt.dtype or numpy.dtype, optional
    The data type of the result. By default, for integer input the result `dtype` is
    ``int64`` and for floating point input the result `dtype` is ``float64``. See
    the notes above about using this keyword argument with `FastArray` objects
    as input.

See Also
--------
numpy.sum
nansum : Sums the values, ignoring NaNs.
FastArray.sum : Sums the values of a `FastArray`.
Dataset.sum : Sums the values of numerical `Dataset` columns.
GroupByOps.sum : Sums the values of each group. Used by `Categorical` objects.

Examples
--------
>>> a = rt.FastArray([1, 3, 5, 7])
>>> rt.sum(a)
16

>>> a = rt.FastArray([1.0, 3.0, 5.0, 7.0])
>>> rt.sum(a)
16.0
Source:
def sum(*args,filter = None, dtype = None,**kwargs):
    '''
    Compute the sum of the values in the first argument.

    When possible, ``rt.sum(x, *args)`` calls ``x.sum(*args)``; look there for
    documentation. In particular, note whether the called function accepts the
    keyword arguments listed below. For example, `Dataset.sum()` does not accept
    the `filter` or `dtype` keyword arguments.

    For ``FastArray.sum``, see `numpy.sum` for documentation but note the following:

    * The `dtype` keyword argument may not work as expected:

        * Riptable data types (for example, `rt.float64`) are ignored.
        * NumPy integer data types (for example, `numpy.int32`) are also ignored.
        * NumPy floating point data types are applied as `numpy.float64`.

    * If you include another NumPy parameter (for example, ``axis=0``), the NumPy
    implementation of ``sum`` will be used and the ``dtype`` will be used to
    compute the sum.

    Parameters
    ----------
    filter : array of bool, default None
        Specifies which elements to include in the sum calculation.
    dtype : rt.dtype or numpy.dtype, optional
        The data type of the result. By default, for integer input the result `dtype` is
        ``int64`` and for floating point input the result `dtype` is ``float64``. See
        the notes above about using this keyword argument with `FastArray` objects
        as input.

    See Also
    --------
    numpy.sum
    nansum : Sums the values, ignoring NaNs.
    FastArray.sum : Sums the values of a `FastArray`.
    Dataset.sum : Sums the values of numerical `Dataset` columns.
    GroupByOps.sum : Sums the values of each group. Used by `Categorical` objects.

    Examples
    --------
    >>> a = rt.FastArray([1, 3, 5, 7])
    >>> rt.sum(a)
    16

    >>> a = rt.FastArray([1.0, 3.0, 5.0, 7.0])
    >>> rt.sum(a)
    16.0
    '''
    kwargs = _np_keyword_wrapper(filter=filter, dtype=dtype, **kwargs)
    args = _convert_cat_args(args)
    if hasattr(args[0], 'sum'):
        return args[0].sum(*args[1:], **kwargs)
    return builtins.sum(*args,**kwargs)
File:      c:\\riptable\\rt_numpy.py
Type:      function
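
The ? and ?? shortcuts work only at an IPython prompt. In a regular Python script, roughly the same information can be retrieved with the standard library’s inspect module. The sketch below uses textwrap.dedent as a stand-in function so that it runs without Riptable; applying it to rt.sum is an assumption that should hold for any function written in Python, but inspect.getsource() raises an error for functions implemented in C.

```python
import inspect
import textwrap

# Stand-in target for this sketch; any function written in Python works.
func = textwrap.dedent

signature = inspect.signature(func)        # comparable to the "Signature" line
docstring = inspect.getdoc(func)           # comparable to the "Docstring" section
source = inspect.getsource(func)           # comparable to the "Source" section of ??
source_file = inspect.getsourcefile(func)  # comparable to the "File" line

print(f"Signature: {func.__name__}{signature}")
print(docstring)
print(source)
print(f"File: {source_file}")
```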

Dataset Display Options

When you view a Dataset, some data might be elided or truncated. By default:

Up to 9 columns are shown. If the Dataset has more than 9 columns, the middle columns are elided (with a “…” column displayed).
Up to 30 rows are shown. If the Dataset has more than 30 rows, the middle rows are elided (with a “…” row displayed).
Strings are displayed up to 15 characters, with additional characters truncated.
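
To make these defaults concrete, here is a small pure-Python sketch of middle elision and string truncation. It is not Riptable’s implementation: the helper names elide_middle and truncate are hypothetical, and the way the kept entries are split between the start and the end is an assumption made for illustration.

```python
def elide_middle(items, max_items=9, marker="..."):
    """Keep entries from the start and end; replace the middle with a marker."""
    items = list(items)
    if len(items) <= max_items:
        return items
    head = (max_items + 1) // 2  # entries kept from the start (assumed split)
    tail = max_items - head      # entries kept from the end
    return items[:head] + [marker] + items[len(items) - tail:]


def truncate(text, width=15):
    """Cut a string down to at most `width` characters."""
    return text[:width]


# Twelve hypothetical column names: more than the default maximum of 9.
columns = [f"col_{i}" for i in range(12)]
shown = elide_middle(columns)
short = truncate("a_rather_long_string_value")

print(shown)
print(short)
```

The same helper with max_items=30 illustrates the row limit described above.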

The following internal/private methods override the defaults on a per-display basis:

Show all columns and rows (up to 10,000 rows), as well as long strings: ds._A
Show all columns and long strings: ds._H
Show all columns with wrapping, and long strings: ds._G
Show all rows (up to 10,000): ds._V
Transpose columns and rows: ds._T

Now that we’re all set up, we’re ready to look at Riptable’s foundational data structures: Riptable Datasets, FastArrays, and Structs.

Questions or comments about this guide? Email RiptableDocumentation@sig.com.