Getting data

Interactive Brokers historic data

Interactive Brokers provides free historic data to its clients, going back up to 6 months, with a maximum resolution of 1-second bars. To prevent abuse, IB imposes a number of download limitations. For example, you may not make more than 60 data requests per 10 minutes, and the number of bars per request is limited.
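To illustrate what respecting the request limit involves, here is a minimal sketch of a sliding-window pacer. It is not the code used by tools/getData.py; the class name RequestPacer and its parameters are hypothetical, and only the 60-requests-per-10-minutes figure comes from the text above.

```python
import time
from collections import deque


class RequestPacer:
    """Allow at most `max_requests` calls within any `window` seconds (hypothetical helper)."""

    def __init__(self, max_requests=60, window=600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock    # injectable so the pacer can be tested without waiting
        self._sleep = sleep
        self._sent = deque()   # timestamps of the requests still inside the window

    def wait_time(self):
        """Return the number of seconds to wait before the next request is allowed."""
        now = self._clock()
        # Forget requests that have already left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def acquire(self):
        """Block until a request may be sent, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._sent.append(self._clock())
```

Calling pacer.acquire() before every data request keeps a download loop within the limit; the limit on bars per request has to be handled separately by splitting the date range into smaller chunks.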

The script tools/getData.py is designed to download large datasets while respecting the rules.

This script downloads historic data for single or multiple securities through the Interactive Brokers API.

  • the tool is run from the command line (see parameters below)
  • configuration is done in settings.yml, which holds a list of security definitions
    together with the data destination folder
  • the data is saved as .csv files in a directory structure.

Running script

Download historic data

usage: getData [-h] [--symbols SYMBOLS] [--end END]

Named Arguments

--symbols

symbols separated by comma: SPY,VXX

Default: “all”

--end

timestamp from where to start download. Defaults to last trading date
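The command-line interface described above could be modeled with argparse roughly as follows. This is a sketch for illustration only, not the script's actual source: the helper names build_parser and parse_symbols are hypothetical, and treating "all" as "every security listed in settings.yml" (returned here as None) is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented getData options."""
    parser = argparse.ArgumentParser(prog="getData",
                                     description="Download historic data")
    parser.add_argument("--symbols", default="all",
                        help="symbols separated by comma: SPY,VXX")
    parser.add_argument("--end", default=None,
                        help="timestamp from where to start download")
    return parser


def parse_symbols(value):
    """Split a comma-separated symbol string into a list.

    Assumption: the default "all" means "use every configured security",
    which this sketch represents as None.
    """
    if value == "all":
        return None
    return [s.strip() for s in value.split(",") if s.strip()]
```

With this parser, the arguments --symbols SPY,VXX yield the list ['SPY', 'VXX'], while omitting --symbols leaves the default "all".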

Interactive Brokers tick data

tickLogger.py is a script to log tick events to a file.

  • symbols to log and data location are stored in a yml config file
  • the default configuration is read from settings.yml; you can provide a different file through a command line parameter.
  • ticks are logged to a rotating csv file; a new file is started at midnight
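A file that rolls over at midnight can be approximated with the standard library's logging.handlers.TimedRotatingFileHandler. The following is a minimal sketch, not the actual implementation of tickLogger.py; the function names make_tick_logger and log_tick, the logger name, and the row layout (timestamp, symbol, field, value) are assumptions made for illustration.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def make_tick_logger(path):
    """Return a logger that appends CSV rows to `path` and rolls over at midnight."""
    handler = TimedRotatingFileHandler(path, when="midnight")
    # An explicit datefmt avoids the default ",milliseconds" suffix,
    # which would add an extra comma to every CSV row.
    handler.setFormatter(logging.Formatter("%(asctime)s,%(message)s",
                                           datefmt="%Y-%m-%d %H:%M:%S"))
    logger = logging.getLogger("tick_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep tick rows out of the root logger
    logger.addHandler(handler)
    return logger


def log_tick(logger, symbol, field, value):
    """Write one row: timestamp,symbol,field,value."""
    logger.info("%s,%s,%s", symbol, field, value)
```

At rollover the handler renames the previous day's file with a date suffix and continues writing to the original path.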

Note

If you need to run this program for longer time periods, it is advisable to use IB Gateway instead of IB TWS. The latter will automatically log off at the end of each day.

Running script

Log ticks for a set of stocks

usage: tickLogger [-h] [--settings SETTINGS]

Named Arguments

--settings

settings file (YAML) containing the configuration

Default: “settings.yml”

Yahoo Finance data

Yahoo Finance

This module enables easy access to data provided by Yahoo Finance.

Note

This service may stop without notice, as Yahoo does not seem to like people accessing their data automatically. The service already broke once in early 2017. This module includes a workaround that works … for now.

Getting historic data

The module is usually imported as follows:

In [1]: from tradingWithPython import yahooFinance as yf

Single symbol

Then, to get raw Yahoo Finance data for a symbol, use getSymbolData()

In [2]: df = yf.getSymbolData("SPY")
Got 4510 days of data

In [3]: df.head()
Out[3]: 
              open    high     low   close  adj_close    volume
Date                                                           
1999-12-31  146.84  147.50  146.25  146.88     105.37   3172700
2000-01-03  148.25  148.25  143.88  145.44     104.34   8164300
2000-01-04  143.53  144.06  139.64  139.75     100.26   8089800
2000-01-05  139.94  141.53  137.25  140.00     100.44  12177900
2000-01-06  139.62  141.50  137.75  137.75      98.82   6227200

We can also normalize the OHLC columns using the adj_close column. After normalization, the close column is equal to adj_close, so the latter is omitted from the result.

In [4]: df = yf.getSymbolData("SPY",adjust=True)
Got 4510 days of data

In [5]: df.head()
Out[5]: 
             close    volume    open    high     low
Date                                                
1999-12-31  105.37   3172700  105.35  105.82  104.92
2000-01-03  104.34   8164300  106.36  106.36  103.22
2000-01-04  100.26   8089800  102.97  103.35  100.18
2000-01-05  100.44  12177900  100.39  101.54   98.46
2000-01-06   98.82   6227200  100.17  101.51   98.82
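The normalization can be understood as scaling each price column by the ratio adj_close / close. The sketch below illustrates that idea with pandas on the first two rows shown above; it is an assumption about how the adjustment works, not the library's source, and adjust_ohlc is a hypothetical helper name. Because the inputs here are the rounded values printed above, results may differ from the output above in the last digit.

```python
import pandas as pd


def adjust_ohlc(df):
    """Scale open/high/low by adj_close/close and use adj_close as the new close."""
    ratio = df["adj_close"] / df["close"]
    out = pd.DataFrame(index=df.index)
    out["close"] = df["adj_close"]
    out["volume"] = df["volume"]
    for col in ("open", "high", "low"):
        out[col] = df[col] * ratio
    return out


# First two rows of the raw SPY data shown earlier.
raw = pd.DataFrame(
    {
        "open": [146.84, 148.25],
        "high": [147.50, 148.25],
        "low": [146.25, 143.88],
        "close": [146.88, 145.44],
        "adj_close": [105.37, 104.34],
        "volume": [3172700, 8164300],
    },
    index=pd.to_datetime(["1999-12-31", "2000-01-03"]),
)

adjusted = adjust_ohlc(raw).round(2)
```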

Multiple symbols

getHistoricData() accepts one or more symbols and downloads them while displaying a progress bar.

In [6]: symbols = ['XLE','USO','SPY']

In [7]: data = yf.getHistoricData(symbols)
Downloading data:

 [                       0%                       ]
 [**********************67%*******                ]  2 of 3 complete
 [*********************100%***********************]  3 of 3 complete

The data will be a multi-index DataFrame:

In [8]: data.columns
Out[8]: 
MultiIndex(levels=[['SPY', 'USO', 'XLE'], ['open', 'high', 'low', 'close', 'adj_close', 'volume']],
           labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2], [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]],
           names=['symbol', 'ohlcv'])

To select a symbol, simply use

In [9]: data['SPY']
Out[9]: 
ohlcv         open    high     low   close  adj_close     volume
Date                                                            
1999-12-31  146.84  147.50  146.25  146.88     105.37    3172700
2000-01-03  148.25  148.25  143.88  145.44     104.34    8164300
2000-01-04  143.53  144.06  139.64  139.75     100.26    8089800
...            ...     ...     ...     ...        ...        ...
2017-11-29  263.02  263.63  262.20  262.71     262.71   77512100
2017-11-30  263.76  266.05  263.67  265.01     265.01  122980900
2017-12-01  264.76  265.31  260.79  264.46     264.46  147858445

[4510 rows x 6 columns]

Or with cross-section (see Advanced indexing)

In [10]: data.xs('close',level=1,axis=1)
Out[10]: 
symbol         SPY    USO    XLE
Date                            
1999-12-31  146.88    NaN  27.09
2000-01-03  145.44    NaN  26.56
2000-01-04  139.75    NaN  26.06
...            ...    ...    ...
2017-11-29  262.71  11.47  68.08
2017-11-30  265.01  11.47  69.10
2017-12-01  264.46  11.67  69.68

[4510 rows x 3 columns]
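To experiment with this column layout without downloading anything, a frame with the same two-level structure can be built by hand. This is a small sketch using toy placeholder numbers and made-up symbol names (AAA, BBB), not market data, and it does not claim to reproduce how getHistoricData() assembles its result. It selects the level by name rather than by position.

```python
import pandas as pd

dates = pd.to_datetime(["2017-11-30", "2017-12-01"])

# Toy placeholder values; "AAA" and "BBB" are made-up symbols.
per_symbol = {
    "AAA": pd.DataFrame({"open": [1.0, 2.0], "close": [1.5, 2.5]}, index=dates),
    "BBB": pd.DataFrame({"open": [10.0, 20.0], "close": [15.0, 25.0]}, index=dates),
}

# Concatenating along the columns with dict keys builds the two-level column index.
data = pd.concat(per_symbol, axis=1, names=["symbol", "ohlcv"])

one_symbol = data["AAA"]                           # all fields of one symbol
closes = data.xs("close", level="ohlcv", axis=1)   # one field across all symbols
```

Here closes has one column per symbol, which is a convenient shape for computing returns or correlations across instruments.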

Functions

tradingWithPython.lib.yahooFinance.getSymbolData(symbol, sDate=(2000, 1, 1), adjust=False, verbose=True, dumpDest=None)

Get data from Yahoo Finance and return a pandas DataFrame.

Parameters:

symbol : str

Yahoo Finance symbol

sDate : tuple , default (2000,1,1)

start date (y,m,d)

adjust : bool , default False

use adjusted close values to correct OHLC. adj_close will be omitted

verbose : bool , default True

print output

dumpDest : str, default None

dump raw data for debugging

Returns:

DataFrame

tradingWithPython.lib.yahooFinance.getHistoricData(symbols, **options)

Get data from Yahoo Finance and return a pandas DataFrame. If a single symbol is provided, an OHLCV DataFrame is returned. If several symbols are provided, a wide DataFrame with multi-index columns is returned.

Parameters:

symbols : str or list

Yahoo Finance symbol or a list of symbols

sDate : tuple (optional)

start date (y,m,d)

adjust : bool

adjust data based on adj_close (True or False, default False)

Returns:

DataFrame, multi-index