Read historical market data

There are 2 ways to get historical time series:

  1. readMkT function. It is a convenient wrapper around the MkTreader class.

  2. MkTreader class. It provides additional facilities for tracking missing data and errors.

In both cases the user can:

  • save the collected historical time series to a local file repository for later use,

  • read directly from the providers,

  • read and update an existing local repository.

The following market data providers can be accessed:

  • yahoo - free, as-is service,

  • eodhistoricaldata - needs a premium account from eodhistoricaldata.com (some tests may be accomplished with a free key),

  • alphavantage - needs a premium account from alphavantage.co,

  • marketstack - needs a premium account from marketstack.com (some tests may be accomplished with a free key),

  • eodhistoricaldata_yahoo - hybrid service where the raw historical prices are collected from the eodhistoricaldata provider while the splits and dividends are collected from yahoo. It may work with an eodhistoricaldata free key for limited testing purposes.

  • alphavantage_yahoo - hybrid service where the raw historical prices are collected from the alphavantage provider while the splits and dividends are collected from yahoo. It may work with an alphavantage free key for testing purposes.

Regardless of where the historical data is collected, the open, high, low, close, volume and divd (dividends) values are split adjusted, while the adjusted close (for short, adjusted prices) is both split and dividend adjusted, relative to the most recent date in the returned time series.
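The adjustment rule can be illustrated with a minimal, self-contained Python sketch. It assumes a common multiplicative back-adjustment convention: values before a split are divided (volume multiplied) by the split ratio, and the adjusted close is further scaled by (1 - dividend / previous close) for every later ex-dividend date, so the most recent row is left unchanged. The function adjust_prices and its list-based inputs are hypothetical, and whether azapy uses exactly this formula is an assumption.

```python
def adjust_prices(close, volume, divd, split):
    """Hypothetical sketch: back-adjust raw daily data relative to the last row.

    Inputs are equal-length lists sorted by date (oldest first):
    close and volume are raw values, divd is the raw cash dividend on its
    ex-date (0.0 otherwise) and split is the split ratio on its effective
    date (e.g. 2.0 for a 2-for-1 split, 1.0 otherwise).
    """
    n = len(close)
    close_s, volume_s, divd_s = [0.0] * n, [0.0] * n, [0.0] * n

    # Pass 1: split adjustment. `factor` is the product of the split
    # ratios that take effect after row i.
    factor = 1.0
    for i in range(n - 1, -1, -1):
        close_s[i] = close[i] / factor
        volume_s[i] = volume[i] * factor
        divd_s[i] = divd[i] / factor
        factor *= split[i]  # a split on day i affects all earlier rows

    # Pass 2: dividend adjustment of the split-adjusted close. An ex-date
    # at row i scales every earlier row by (1 - dividend / previous close).
    adjusted = [0.0] * n
    factor = 1.0
    for i in range(n - 1, -1, -1):
        adjusted[i] = close_s[i] * factor
        if i > 0 and divd_s[i] > 0:
            factor *= 1.0 - divd_s[i] / close_s[i - 1]

    return close_s, volume_s, divd_s, adjusted


# Hypothetical 4-day series: a 2-for-1 split on day 3, a 1.0 dividend on day 4.
close_s, volume_s, divd_s, adjusted = adjust_prices(
    close=[100.0, 102.0, 51.0, 50.0],
    volume=[10.0, 10.0, 20.0, 20.0],
    divd=[0.0, 0.0, 0.0, 1.0],
    split=[1.0, 1.0, 2.0, 1.0],
)
print(close_s)  # [50.0, 51.0, 51.0, 50.0]
```

Because both passes run backwards from the most recent row, the last adjusted close always equals the last close, which matches the "relative to the most recent date" wording above.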

The following file formats are supported for saving data:

  • csv - comma separated values format

  • json - JavaScript object notation format

  • feather - portable binary format for storing Arrow tables and data frames in Python and R

In addition to the market data, the source and the acquisition date and time are also saved.


readMkT function

azapy.MkT.readMkT.readMkT(symbol=[], sdate='2012-01-01', edate='today', calendar=None, output_format='frame', source=None, force=False, save=True, file_dir='outDir', file_format='csv', api_key=None, param=None, verbose=True)

Retrieves market data for a set of stock symbols.

It is a wrapper around the MkTreader class that directly returns the requested historical time series. Its call variables are the same as those of the ‘MkTreader’ member function ‘get’.

Parameters:
symbol : str or list of str, optional

Stock symbols to be retrieved. The default is [].

sdate : date-like, optional

The start date of historical time series. The default is “2012-01-01”.

edate : date-like, optional

The end date of historical time series (must satisfy edate >= sdate). The default is ‘today’.

calendar : numpy.busdaycalendar, optional

Exchange business day calendar. If set to None, it defaults to the New York Stock Exchange business calendar (provided by the azapy function NYSEgen). The default is None.

output_format : str, optional
The function output format. It can be:
  • ‘frame’ - pandas.DataFrame

  • ‘dict’ - dict of pandas.DataFrame. The symbols are the keys.

The default is ‘frame’.

source : str or dict, optional

If it is a str, then it represents the market data provider for all historical price requests. Possible values are: ‘yahoo’, ‘alphavantage’, ‘alphavantage_yahoo’, ‘eodhistoricaldata’, ‘eodhistoricaldata_yahoo’ and ‘marketstack’. If set to None, it defaults to ‘yahoo’.

It can also be set to a dict containing specific instructions for each stock symbol. The dict keys are the symbols and the values are dicts of instructions specific to each symbol. Valid keys for an instruction dict are the names of this function's call variables except ‘sdate’, ‘edate’, ‘calendar’ and ‘output_format’. The actual set of stock symbols is the union of the variable ‘symbol’ and the keys of the dict ‘source’. Missing values in a symbol's instruction dict are filled with the values of the function call variables, which act as generic values in the absence of specific instructions in the ‘source’ dict. The default is None.

Example of dict ‘source’:

source = {'AAPL': {'source': 'eodhistoricaldata', 'verbose': True}, 'SPY': {'source': 'yahoo', 'force': True}}

In this case, 2 symbols are added (union) to the set of symbols defined by the ‘symbol’ variable. For symbol ‘AAPL’, the provider is eodhistoricaldata and the ‘verbose’ instruction is set to True. The remaining instructions, ‘force’, ‘save’, ‘file_dir’, ‘file_format’, ‘api_key’ and ‘param’, are set to the values of the corresponding function call variables. The same applies to symbol ‘SPY’. The instructions for any other symbols specified in the ‘symbol’ variable are set according to the values of the function call variables.
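The merging rule for a dict ‘source’ can be sketched as follows. The helper resolve_instructions is hypothetical (it is not part of azapy), and it assumes that a symbol without an explicit provider falls back to ‘yahoo’, as it does when source is None.

```python
def resolve_instructions(symbol, source, defaults):
    """Hypothetical helper: build one instruction dict per symbol.

    `defaults` holds the generic call variables (force, save, file_dir,
    file_format, api_key, param, verbose). Assumption: any symbol without
    an explicit provider falls back to 'yahoo'.
    """
    if isinstance(symbol, str):
        symbol = [symbol]

    if not isinstance(source, dict):
        # A str (or None) source applies to every symbol.
        generic = dict(defaults, source=source or "yahoo")
        return {s: dict(generic) for s in symbol}

    # Union of `symbol` and the keys of `source`, keeping first-seen order.
    symbols = list(dict.fromkeys(list(symbol) + list(source)))
    generic = dict(defaults, source="yahoo")
    # Per-symbol instructions override the generic values.
    return {s: {**generic, **source.get(s, {})} for s in symbols}


defaults = {"force": False, "save": True, "file_dir": "outDir",
            "file_format": "csv", "api_key": None, "param": None,
            "verbose": False}
source = {"AAPL": {"source": "eodhistoricaldata", "verbose": True},
          "SPY": {"source": "yahoo", "force": True}}
plan = resolve_instructions(["VGT"], source, defaults)
print(list(plan))             # ['VGT', 'AAPL', 'SPY']
print(plan["AAPL"]["force"])  # False (taken from defaults)
```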

force : Boolean, optional
  • True: it will try to collect historical prices exclusively from the market data providers.

  • False: it will first try to load the historical prices from a locally saved file. If such a file does not exist, the market data provider will be accessed.

If the file exists but the saved historical data is too short, then only the missing values will be collected from the market data provider. The default is False.

save : Boolean, optional
  • True: it will try to save the historical prices collected from the providers to a local file.

  • False: No attempt to save the data is made.

The default is True.
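The interplay of force and save can be sketched as a small caching routine. This is a minimal sketch under stated assumptions, not azapy's implementation: the fetch callback, the one-JSON-file-per-symbol layout and its keys (source, acquired, data) are hypothetical. Dates are compared as calendar dates rather than exchange business days, so a weekend edate may trigger a tail request that returns no new rows.

```python
import json
import os
from datetime import date, datetime, timedelta


def load_or_fetch(symbol, sdate, edate, fetch, force=False, save=True,
                  file_dir="outDir", source="yahoo"):
    """Hypothetical sketch of the force/save logic (not azapy's code).

    `fetch(symbol, start, end)` is an assumed callback that returns a list
    of {"date": "YYYY-MM-DD", ...} records sorted by date. Dates are ISO
    strings, so plain string comparison orders them correctly.
    """
    path = os.path.join(file_dir, f"{symbol}.json")
    records = []
    if not force and os.path.exists(path):
        with open(path) as fh:
            records = json.load(fh)["data"]

    fetched = False
    if not records:
        # force=True, or no usable local file: ask the provider for everything
        records = fetch(symbol, sdate, edate)
        fetched = True
    else:
        first, last = records[0]["date"], records[-1]["date"]
        if first > sdate:  # local series starts too late: fetch the head
            stop = date.fromisoformat(first) - timedelta(days=1)
            records = fetch(symbol, sdate, stop.isoformat()) + records
            fetched = True
        if last < edate:  # local series ends too early: fetch the tail
            start = date.fromisoformat(last) + timedelta(days=1)
            records = records + fetch(symbol, start.isoformat(), edate)
            fetched = True

    if save and fetched:
        os.makedirs(file_dir, exist_ok=True)  # create the directory if needed
        with open(path, "w") as fh:
            # store the provider and acquisition time next to the prices
            json.dump({"source": source,
                       "acquired": datetime.now().isoformat(),
                       "data": records}, fh)

    return [r for r in records if sdate <= r["date"] <= edate]
```

With force=True the local file is ignored (and fully rewritten when save=True); with force=False only the missing head or tail is requested from the provider.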

file_dir : str, optional

Directory where the historical market data are stored (or will be saved). If it does not exist, it will be created. The default is “outDir”.

file_format : str, optional

The file format used to save the historical prices. The following formats are supported: csv, json and feather. The default is ‘csv’.

api_key : str, optional

Provider API key (where required). If set to None, the API key is taken from the environment variables

  • APLPHAVANTAGE_API_KEY for alphavantage,

  • EODHISTORICALDATA_API_KEY for eodhistoricaldata,

  • MARKETSTACK_API_KEY for marketstack.

The default is None.

param : dict, optional

Additional information needed to access the market data provider. Currently, only the alphavantage provider requires an additional parameter, specifying the maximum number of API requests (symbols) per minute. It varies with the level of access of the API key: the minimum value is 5 for a free key, and it starts at 75 for premium keys. This value is passed under the key max_req_per_min.

Example: param = {'max_req_per_min': 5}

This is also the default value for alphavantage if param is set to None. The default is None.
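One simple way to honor max_req_per_min is to space the provider calls evenly. The sketch below is hypothetical (throttled_requests and the request callback are not azapy names), and azapy may pace its requests differently; the sleep function is injectable so the pacing can be checked without waiting.

```python
import time


def throttled_requests(symbols, request, max_req_per_min=5, sleep=time.sleep):
    """Hypothetical sketch: call request(symbol) for each symbol while
    staying under max_req_per_min calls per minute.

    The calls are spaced evenly, 60 / max_req_per_min seconds apart. Since
    the time spent inside request() is added on top of the pause, the
    actual rate can only be lower than the limit.
    """
    interval = 60.0 / max_req_per_min
    results = {}
    for k, sym in enumerate(symbols):
        if k > 0:
            sleep(interval)  # pause before every request except the first
        results[sym] = request(sym)
    return results
```

With the free-key value of 5, consecutive requests are 12 seconds apart; with 75 they are 0.8 seconds apart.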

verbose : Boolean, optional

If set to True, then additional information will be printed during the loading of historical prices. The default is True.

Returns:
`pandas.DataFrame` or `dict` of `pandas.DataFrame` : Historical market data.

The output format is determined by the value of the input parameter output_format.


Example readMkT function call

import azapy as az

sdate = "2012-01-01"
edate = 'today'
symb = ['GLD', 'TLT', 'XLV', 'VGT', 'VHT']

mktdir = "../../MkTdata"

# simple calls (most often used) 
# returns a pd.DataFrame
mktdata = az.readMkT(symb, sdate=sdate, edate=edate, file_dir=mktdir)

# returns a dict of pd.DataFrame
mktdata_dict = az.readMkT(symb, sdate=sdate, edate=edate, file_dir=mktdir,
                          output_format='dict')

# complex call (extremely customized)
# Note: 'alphavantage' and 'eodhistoricaldata' may require valid API keys
source = {'GLD': {'force': True,
                  'save': True,
                  'file_dir': '../../MkTdata_yahoo',
                  'file_format': 'feather'
                 },
          'TLT': {'source': 'alphavantage',
                  'force': False,
                  'save': True,
                  'file_dir': '../../MkTdata_av',
                  'file_format': 'json',
                  'param': {'max_req_per_min': 75}
                 },
          'XLV': {'source': 'eodhistoricaldata'}
         }

symb = ['VGT', 'VHT']

mktdata = az.readMkT(symb, sdate=sdate, edate=edate, source=source,
                     file_dir=mktdir, file_format='csv')


MkTreader class

class azapy.MkT.MkTreader.MkTreader(verbose=True)

Bases: object

Collects historical market prices from market data providers such as ‘yahoo’, ‘eodhistoricaldata’, ‘alphavantage’ and ‘marketstack’.

Attributes
  • dsource : dict of request instructions per symbol

  • delta_time : execution time of the request in seconds

  • rout : pandas.DataFrame containing historical prices for all symbols. It is created during the call of the get function.

  • rout_status : request status information. It is created during the call of the get_request_status function, or during the call of the get function with option verbose=True.

  • error_log : contains lists of missing historical observation dates. It is created together with rout_status.

Methods

get([symbol, sdate, edate, calendar, ...])

Retrieves market data for a set of stock symbols.

get_error_log()

Returns lists of missing historical observation dates per symbol

get_request_status([verbose])

Reports abbreviated information about request status.

__init__(verbose=True)

Constructor

Parameters:
verbose : Boolean, optional

If set to True, additional information will be printed during the loading of historical prices. The default value is True.

Returns:
The MkTreader object.

get(symbol=[], sdate='2012-01-01', edate='today', calendar=None, output_format='frame', source=None, force=False, save=True, file_dir='outDir', file_format='csv', api_key=None, param=None, verbose=None)

Retrieves market data for a set of stock symbols.

Parameters:
symbol : str or list of str, optional

Stock symbols to be retrieved. The default is [].

sdate : date-like, optional

The start date of historical time series. The default is “2012-01-01”.

edate : date-like, optional

The end date of historical time series (must satisfy edate >= sdate). The default is ‘today’.

calendar : numpy.busdaycalendar, optional

Exchange business day calendar. If set to None, it defaults to the New York Stock Exchange business calendar (provided by the azapy function NYSEgen). The default is None.

output_format : str, optional
The function output format. It can be:
  • ‘frame’ - pandas.DataFrame

  • ‘dict’ - dict of pandas.DataFrame. The symbols are the keys.

The default is ‘frame’.

source : str or dict, optional

If it is a str, then it represents the market data provider for all historical price requests. Possible values are: ‘yahoo’, ‘alphavantage’, ‘alphavantage_yahoo’, ‘eodhistoricaldata’, ‘eodhistoricaldata_yahoo’ and ‘marketstack’. If set to None, it defaults to ‘yahoo’.

It can also be set to a dict containing specific instructions for each stock symbol. The dict keys are the symbols and the values are dicts of instructions specific to each symbol. Valid keys for an instruction dict are the names of this function's call variables except ‘sdate’, ‘edate’, ‘calendar’ and ‘output_format’. The actual set of stock symbols is the union of the variable ‘symbol’ and the keys of the dict ‘source’. Missing values in a symbol's instruction dict are filled with the values of the function call variables, which act as generic values in the absence of specific instructions in the ‘source’ dict. The default is None.

Example of dict ‘source’:

source = {'AAPL': {'source': 'eodhistoricaldata', 'verbose': True}, 'SPY': {'source': 'yahoo', 'force': True}}

In this case, 2 symbols are added (union) to the set of symbols defined by the ‘symbol’ variable. For symbol ‘AAPL’, the provider is eodhistoricaldata and the ‘verbose’ instruction is set to True. The remaining instructions, ‘force’, ‘save’, ‘file_dir’, ‘file_format’, ‘api_key’ and ‘param’, are set to the values of the corresponding function call variables. The same applies to symbol ‘SPY’. The instructions for any other symbols specified in the ‘symbol’ variable are set according to the values of the function call variables.

force : Boolean, optional
  • True: it will try to collect historical prices exclusively from the market data providers.

  • False: it will first try to load the historical prices from a locally saved file. If such a file does not exist, the market data provider will be accessed.

If the file exists but the saved historical data is too short, then only the missing values will be collected from the market data provider. The default is False.

save : Boolean, optional
  • True: it will try to save the historical prices collected from the providers to a local file.

  • False: No attempt to save the data is made.

The default is True.

file_dir : str, optional

Directory where the historical market data are stored (or will be saved). If it does not exist, it will be created. The default is “outDir”.

file_format : str, optional

The file format used to save the historical prices. The following formats are supported: csv, json and feather. The default is ‘csv’.

api_key : str, optional

Provider API key (where required). If set to None, the API key is taken from the environment variables

  • APLPHAVANTAGE_API_KEY for alphavantage,

  • EODHISTORICALDATA_API_KEY for eodhistoricaldata,

  • MARKETSTACK_API_KEY for marketstack.

The default is None.

param : dict, optional

Additional information needed to access the market data provider. Currently, only the alphavantage provider requires an additional parameter, specifying the maximum number of API requests (symbols) per minute. It varies with the level of access of the API key: the minimum value is 5 for a free key, and it starts at 75 for premium keys. This value is passed under the key max_req_per_min.

Example: param = {'max_req_per_min': 5}

This is also the default value for alphavantage if param is set to None. The default is None.

verbose : Boolean, optional

If set to True, additional information will be printed during the loading of historical prices. If None, it is ignored; otherwise it overrides the value set by the constructor. The default value is None.

Returns:
`pandas.DataFrame` or `dict` of `pandas.DataFrame` : Historical market data.

The output format is determined by the value of the input parameter output_format.

get_error_log()

Returns lists of missing historical observation dates per symbol

Returns:
`dict` : The error log.

If it is an empty dict, then there are no missing dates in the collected historical time series. Otherwise, the keys of the dict are the symbols that have missing dates. The values for these keys are also dicts with the following fields:
  • ‘back’ : a list of missing dates at the tail of the time series

  • ‘front’ : a list of missing dates at the head of the time series

  • ‘mid’ : a list of missing dates in the middle of the time series

Fields with empty lists of dates are omitted.

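How missing dates fall into the three fields can be sketched by comparing the observed dates with the expected business days between sdate and edate. In azapy the expected days come from a numpy.busdaycalendar; to stay dependency-free, this hypothetical sketch approximates the calendar as weekdays minus a given set of holidays, and it assumes at least one observed date.

```python
from datetime import date, timedelta


def business_days(sdate, edate, holidays=()):
    """Weekdays in [sdate, edate] that are not holidays (a stand-in for
    numpy.busdaycalendar)."""
    day, out = sdate, []
    while day <= edate:
        if day.weekday() < 5 and day not in holidays:
            out.append(day)
        day += timedelta(days=1)
    return out


def classify_missing(observed, sdate, edate, holidays=()):
    """Hypothetical sketch: split missing business days into the
    'front' / 'mid' / 'back' groups described for get_error_log.
    Assumes `observed` is non-empty."""
    seen = set(observed)
    first, last = min(seen), max(seen)
    log = {"back": [], "front": [], "mid": []}
    for day in business_days(sdate, edate, holidays):
        if day in seen:
            continue
        if day < first:
            log["front"].append(day)  # missing before the first observation
        elif day > last:
            log["back"].append(day)   # missing after the last observation
        else:
            log["mid"].append(day)    # gap inside the series
    # as documented, fields with no missing dates are omitted
    return {k: v for k, v in log.items() if v}


# Hypothetical data: Jan 1, 2024 (a Monday) is treated as a holiday.
observed = [date(2024, 1, 3), date(2024, 1, 5), date(2024, 1, 8), date(2024, 1, 9)]
log = classify_missing(observed, date(2024, 1, 1), date(2024, 1, 12),
                       holidays={date(2024, 1, 1)})
# front: 2024-01-02, mid: 2024-01-04, back: 2024-01-10 to 2024-01-12
```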
get_request_status(verbose=None)

Reports abbreviated information about request status.

Parameters:
verbose : Boolean, optional

If set to True, additional information will be printed during the function execution. If set to None, it is ignored; otherwise it overrides the value set by the constructor. The default value is None.

Returns:
`pandas.DataFrame` : The status report.

The column names are the symbols for which the data was requested. The rows contain the actual input parameters per symbol, as well as:
  • ‘nrow’ : the length of historical time series.

  • ‘sdate’ : first date in the time series.

  • ‘edate’ : end date of the time series.

  • ‘error’ : whether there are missing data. If its value is ‘Yes’, then the actual list of missing dates per symbol can be obtained by calling get_error_log.


Example MkTreader class usage

import azapy as az

sdate = "2012-01-01"
edate = 'today'
symb = ['GLD', 'TLT', 'XLV', 'VGT', 'IHI']

mktdir = "../../MkTdata"

# build MkTreader object
mkt = az.MkTreader()

# read historical mkt data
hdata = mkt.get(symb, sdate=sdate, edate=edate, file_dir=mktdir, verbose=False, force=False, output_format='dict')
print(f"MkT data\n{hdata}")

# request status
req_status = mkt.get_request_status()
print(f"Status per symbol\n{req_status}")


# missing observation dates
error_date = mkt.get_error_log()
print(f"Error log per symbol\n{error_date}")
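The dict returned by get_error_log can be condensed for a quick overview. Both helpers below are hypothetical (they are not part of azapy); stale_symbols flags symbols whose only gap is at the tail, which may simply mean that the saved data needs updating.

```python
def summarize_error_log(error_log):
    """Hypothetical helper: count the missing dates per symbol and field
    ('front', 'mid', 'back') in a dict shaped like get_error_log's output."""
    return {symbol: {field: len(dates) for field, dates in fields.items()}
            for symbol, fields in error_log.items()}


def stale_symbols(error_log):
    """Hypothetical helper: symbols whose only missing dates are at the
    tail ('back') of the time series."""
    return [symbol for symbol, fields in error_log.items()
            if set(fields) == {"back"}]
```

For example, summarize_error_log(mkt.get_error_log()) could be applied to the mkt object built above; an empty result means no missing dates were reported.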
