riptable.rt_merge_asof

Time/ordering-based merge implementations.

Functions

merge_asof2(left, right[, on, left_on, right_on, by, ...])

Perform an as-of merge. This is similar to a left-join except that we match on nearest key rather than equal keys.

riptable.rt_merge_asof.merge_asof2(left, right, on=None, left_on=None, right_on=None, by=None, left_by=None, right_by=None, suffixes=None, copy=True, columns_left=None, columns_right=None, *, tolerance=None, allow_exact_matches=True, direction='backward', matched_on=False, **kwargs)

Perform an as-of merge. This is similar to a left-join except that we match on nearest key rather than equal keys.

Both Datasets must be sorted (ascending) by the ‘on’ column. When ‘by’ columns are specified, the ‘on’ column for each Dataset only needs to be sorted (ascending) within each unique ‘key’ of the ‘by’ columns. Sorting the entire Dataset by the ‘on’ column also meets this requirement, but some Datasets may have an ‘on’ column which is already pre-sorted within each ‘by’ key, in which case no additional sorting is required.

For each row in the left Dataset:
  • A “backward” search selects the last row in the right Dataset whose ‘on’ key is less than or equal to the left’s key.

  • A “forward” search selects the first row in the right Dataset whose ‘on’ key is greater than or equal to the left’s key.

  • A “nearest” search selects the row in the right Dataset whose ‘on’ key is closest in absolute distance to the left’s key.

Optionally match on equivalent keys with ‘by’ before searching with ‘on’.

Parameters:
  • left (Dataset) – Left Dataset

  • right (Dataset) – Right Dataset

  • on (str) – Column name to join on. Must be found in both the left and right Datasets. This column in both left and right Datasets MUST be ordered. Furthermore this must be a numeric column, such as datetimelike, integer, or float. Either on or left_on/right_on must be specified.

  • left_on (str or list of str, optional) – Column name to join on in left Dataset.

  • right_on (str or list of str, optional) – Column name to join on in right Dataset.

  • by (str or (str, str) or list of str or list of (str, str), optional) – Column name or list of column names. Match on these columns before performing merge operation.

  • left_by (str or list of str, optional) – Column names to match on in the left Dataset.

  • right_by (str or list of str, optional) – Column names to match on in the right Dataset.

  • suffixes ((str, str), optional, default None) – Suffix to apply to overlapping column names in the left and right side, respectively.

  • copy (bool, default True) – If False, avoid copying data when possible; this can reduce memory usage but users must be aware that data can be shared between left and/or right and the Dataset returned by this function.

  • columns_left (str or list of str, optional) – Column names to include in the merge from left, defaults to None which causes all columns to be included.

  • columns_right (str or list of str, optional) – Column names to include in the merge from right, defaults to None which causes all columns to be included.

  • tolerance (integer or float or Timedelta, optional, default None) – Tolerance allowed when performing the ‘asof’ part of the merge; whenever a row from left doesn’t have a key in right within this distance or less, that row will have a null/missing/NA value for any columns from the right Dataset which appear in the merged result.

  • allow_exact_matches (boolean, default True) –

    • If True, allow matching with the same ‘on’ value (i.e. less-than-or-equal-to / greater-than-or-equal-to)

    • If False, don’t match the same ‘on’ value (i.e., strictly less-than / strictly greater-than)

  • direction ({'backward', 'forward', or 'nearest'}, default 'backward') – Whether to search for prior, subsequent, or closest matches.

  • matched_on (bool or str, default False) – If set to True or a string, an additional column is added to the result; for each row, it contains the value from the on column in right that was matched. When True, the column will use the default name ‘matched_on’; specify a string to explicitly name the column.

Returns:

merged

Return type:

Dataset

Raises:

ValueError – The on, left_on, or right_on columns are not sorted in ascending order within one or more keys of the by columns.

See also

riptable.merge_asof

Notes

TODO: Consider allowing the tolerance parameter to also be a 1D array/sequence whose length is the same

as the number of groups (or keys?) in the left dataset. That allows a per-group tolerance to be specified if needed.

Examples

>>> left = rt.Dataset({'a': [1, 5, 10], 'left_val': ['a', 'b', 'c']})
>>> left
#    a   left_val
-   --   --------
0    1   a
1    5   b
2   10   c
>>> right = rt.Dataset({'a': [1, 2, 3, 6, 7],
...                       'right_val': [1, 2, 3, 6, 7]})
>>> right
#   a   right_val
-   -   ---------
0   1           1
1   2           2
2   3           3
3   6           6
4   7           7
>>> rt.merge_asof(left, right, on='a')
#   a_x   left_val   a_y   right_val
-   ---   --------   ---   ---------
0     1   a            1           1
1     5   b            3           3
2    10   c            7           7
>>> rt.merge_asof(left, right, on='a', allow_exact_matches=False)
 #   a_x   left_val   a_y   right_val
-   ---   --------   ---   ---------
0     1   a          Inv         Inv
1     5   b            3           3
2    10   c            7           7
>>> rt.merge_asof(left, right, on='a', direction='forward')
#   a_x   left_val   a_y   right_val
-   ---   --------   ---   ---------
0     1   a            1           1
1     5   b            6           6
2    10   c          Inv         Inv

Here is a real-world time-series example

>>> quotes
#                   time  ticker   Bid   Ask
-   --------------------  ------  ----  ----
0   20191015 09:45:57.09    AAPL  3.40  3.50
1   20191015 11:35:09.76    AAPL  3.45  3.55
2   20191015 12:02:27.11    AAPL  3.50  3.60
3   20191015 12:43:13.73    MSFT  2.85  2.95
4   20191015 14:32:11.18    MSFT  2.90  3.00
>>> trades
#                   time  ticker  TradePrice  TradeSize
-   --------------------  ------  ----------  ---------
0   20191015 10:03:24.73    AAPL        3.45       1.00
1   20191015 10:41:22.79    MSFT        2.85       1.00
2   20191015 10:41:35.69    MSFT        2.86       1.00
3   20191015 11:04:32.55    AAPL        3.47       1.00
4   20191015 11:44:35.63    MSFT        2.91       1.00
5   20191015 12:26:17.68    AAPL        3.55       1.00
6   20191015 14:24:10.93    MSFT        2.98       1.00
7   20191015 15:45:13.41    AAPL        3.60       7.00
8   20191015 15:50:42.53    AAPL        3.58       1.00
9   20191015 15:53:59.60    AAPL        3.56       5.00
>>> rt.merge_asof(trades, quotes, on='time', by='ticker')
#                 time_x   ticker_x  TradePrice   TradeSize                 time_y   ticker_y    Bid    Ask
-   --------------------  ---------  ----------   ---------   --------------------   --------   ----   ----
0   20191015 10:03:24.73       AAPL        3.45        1.00   20191015 09:45:57.09       AAPL   3.40   3.50
1   20191015 10:41:22.79       MSFT        2.85        1.00                    Inv   Filtered    nan    nan
2   20191015 10:41:35.69       MSFT        2.86        1.00                    Inv   Filtered    nan    nan
3   20191015 11:04:32.55       AAPL        3.47        1.00   20191015 09:45:57.09       AAPL   3.40   3.50
4   20191015 11:44:35.63       MSFT        2.91        1.00                    Inv   Filtered    nan    nan
5   20191015 12:26:17.68       AAPL        3.55        1.00   20191015 12:02:27.11       AAPL   3.50   3.60
6   20191015 14:24:10.93       MSFT        2.98        1.00   20191015 12:43:13.73       MSFT   2.85   2.95
7   20191015 15:45:13.41       AAPL        3.60        7.00   20191015 12:02:27.11       AAPL   3.50   3.60
8   20191015 15:50:42.53       AAPL        3.58        1.00   20191015 12:02:27.11       AAPL   3.50   3.60
9   20191015 15:53:59.60       AAPL        3.56        5.00   20191015 12:02:27.11       AAPL   3.50   3.60

only merge with the forward quotes

>>> rt.merge_asof(trades, quotes, on='time', by='ticker', direction='forward')
#                 time_x  ticker_x  TradePrice  TradeSize                time_y  ticker_y   Bid   Ask
-   --------------------  --------  ----------  ---------  --------------------  --------  ----  ----
0   20191015 10:03:24.73      AAPL        3.45       1.00  20191015 11:35:09.76      AAPL  3.45  3.55
1   20191015 10:41:22.79      MSFT        2.85       1.00  20191015 12:43:13.73      MSFT  2.85  2.95
2   20191015 10:41:35.69      MSFT        2.86       1.00  20191015 12:43:13.73      MSFT  2.85  2.95
3   20191015 11:04:32.55      AAPL        3.47       1.00  20191015 11:35:09.76      AAPL  3.45  3.55
4   20191015 11:44:35.63      MSFT        2.91       1.00  20191015 12:43:13.73      MSFT  2.85  2.95
6   20191015 14:24:10.93      MSFT        2.98       1.00  20191015 14:32:11.18      MSFT  2.90  3.00
7   20191015 15:45:13.41      AAPL        3.60       7.00                   Inv  Filtered   nan   nan
8   20191015 15:50:42.53      AAPL        3.58       1.00                   Inv  Filtered   nan   nan
9   20191015 15:53:59.60      AAPL        3.56       5.00                   Inv  Filtered   nan   nan
5   20191015 12:26:17.68      AAPL        3.55       1.00                   Inv  Filtered   nan   nan