Read historical market data
There are 2 ways to get historical time series:
readMkT function. It is a convenient wrapper around the MkTreader class.
MkTreader class. Provides additional facilities for tracking missing data and errors.
In both cases the user can:
save the collected historical time series to a local file repository for later use,
read directly from the providers,
read and update an existing local repository.
The following market data providers can be accessed:
yahoo - free service, provided as is,
eodhistoricaldata - needs a premium account from eodhistoricaldata.com (some tests may be accomplished with a free key),
alphavantage - needs a premium account from alphavantage.co,
marketstack - needs a premium account from marketstack.com (some tests may be accomplished with a free key),
eodhistoricaldata_yahoo - hybrid service where the historical raw prices are collected from the eodhistoricaldata provider while the splits and dividends are collected from yahoo. It may work with an eodhistoricaldata free key for limited testing purposes.
alphavantage_yahoo - hybrid service where the historical raw prices are collected from the alphavantage provider while the splits and dividends are collected from yahoo. It may work with an alphavantage free key for testing purposes.
Regardless of where the historical data is collected, the open, high, low, close, volume and divd (dividends) values are split adjusted, while the close adjusted (adjusted for short) prices are split and dividend adjusted relative to the most recent date in the returned time series.
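To illustrate how an adjusted price relates to a split-adjusted close, the sketch below applies a common back-adjustment scheme for dividends, anchored at the most recent date. It is an assumption about the general technique, not a statement of how azapy or the data providers compute their adjusted series. The function name back_adjust and the sample numbers are hypothetical.

```python
def back_adjust(close, divd):
    """Return dividend-adjusted prices relative to the most recent date.

    close : split-adjusted closing prices, oldest first
    divd  : split-adjusted dividends aligned with close; a non-zero
            entry marks an ex-dividend date
    (hypothetical helper, not part of azapy)
    """
    n = len(close)
    adjusted = [0.0] * n
    factor = 1.0  # the most recent price is left unchanged
    for i in range(n - 1, -1, -1):
        adjusted[i] = close[i] * factor
        # a dividend with ex-date i scales all earlier prices
        if divd[i] != 0 and i > 0:
            factor *= 1.0 - divd[i] / close[i - 1]
    return adjusted


close = [100.0, 102.0, 101.0, 103.0]
divd = [0.0, 0.0, 1.0, 0.0]  # dividend of 1.0, ex-date on the third day
adjusted = back_adjust(close, divd)
print(adjusted)
```

Prices on and after the ex-dividend date are unchanged, while earlier prices are scaled down by the factor 1 - dividend / previous close.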
The following file formats are supported to save data:
csv - comma separated values format
json - JavaScript object notation format
feather - portable binary format for storing Arrow tables and data frames in Python and R
In addition to the market information, the source and the acquisition date and time are also saved.
readMkT function
- azapy.MkT.readMkT.readMkT(symbol=[], sdate='2012-01-01', edate='today', calendar=None, output_format='frame', source=None, force=False, save=True, file_dir='outDir', file_format='csv', api_key=None, param=None, verbose=True)
Retrieves market data for a set of stock symbols.
It is a wrapper for MkTreader class returning directly the requested historical time series. The function call variables are the same as for ‘MkTreader’ member function ‘get’.
- Parameters:
- symbolstr or list of str, optional
Stock symbols to be uploaded. The default is [].
- sdatedate like, optional
The start date of historical time series. The default is “2012-01-01”.
- edatedate like, optional
The end date of historical time series (must: sdate <= edate). The default is ‘today’.
- calendarnumpy.busdaycalendar, optional
Exchange business day calendar. If set to None it will default to the NY stock exchange business calendar (provided by the azapy function NYSEgen). The default is None.
- output_formatstr, optional
- The function output format. It can be:
‘frame’ - pandas.DataFrame
‘dict’ - dict of pandas.DataFrame. The symbols are the keys.
The default is ‘frame’.
- sourcestr or dict, optional
If it is a str, then it represents the market data provider for all historical prices request. Possible values are: ‘yahoo’, ‘alphavantage’, ‘alphavantage_yahoo’, ‘eodhistoricaldata’, ‘eodhistoricaldata_yahoo’ and ‘marketstack’. If set to None it will default to ‘yahoo’.
It can also be set to a dict containing specific instructions for each stock symbol. The dict keys are the symbols and the values are dicts of instructions specific to each symbol. Valid keys for the instruction dicts are the names of this function’s call variables, except ‘sdate’, ‘edate’, ‘calendar’ and ‘output_format’. The actual set of stock symbols is given by the union of the variable ‘symbol’ and the keys of the dict ‘source’. Missing values in a symbol’s instruction dict are filled with the values of the function call variables, which act as generic values to be used in the absence of specific instructions in the ‘source’ dict. The default is None.
Example of dict ‘source’:
source = {‘AAPL’: {‘source’: ‘eodhistoricaldata’, ‘verbose’: True}, ‘SPY’: {‘source’: ‘yahoo’, ‘force’: True}}
In this case there are 2 symbols that will be added (union) to the set of symbols defined by the ‘symbol’ variable. For symbol ‘AAPL’ the provider source is eodhistoricaldata and the ‘verbose’ instruction is set to True. The rest of the instructions, ‘force’, ‘save’, ‘file_dir’, ‘file_format’, ‘api_key’ and ‘param’, are set to the values of the corresponding function call variables. The same applies to symbol ‘SPY’. The instructions for the remaining symbols specified in the ‘symbol’ variable are set according to the values of the function call variables.
- forceBoolean, optional
True: will try to collect historical prices exclusively from the market data providers.
False: first it will try to load the historical prices from a local saved file. If such a file does not exist the market data provider will be accessed.
If the file exists but the saved historical data is too short then it will try to collect the missing values only from the market data provider. The default is False.
- saveBoolean, optional
True: It will try to save the historical prices collected from the providers to a local file.
False: No attempt to save the data is made.
The default is True.
- file_dirstr, optional
Directory with (to save) historical market data. If it does not exist then it will be created. The default is “outDir”.
- file_formatstr, optional
The saved file format for the historical prices. The following file formats are supported: csv, json and feather. The default is ‘csv’.
- api_keystr, optional
Provider API key (where required). If set to None then the API key is set to the value of the global environment variables
ALPHAVANTAGE_API_KEY for alphavantage,
EODHISTORICALDATA_API_KEY for eodhistoricaldata,
MARKETSTACK_API_KEY for marketstack.
The default is None.
- paramdict, optional
Additional information needed to access the market data provider. At this point in time, only the alphavantage provider requires an additional parameter, specifying the maximum number of API requests (symbols) per minute. It varies with the level of access corresponding to the API key. The minimum value is 5 for a free key and starts at 75 for premium keys. This value is stored in the max_req_per_min variable.
Example: param = {‘max_req_per_min’: 5}
This is also the default value for alphavantage if param is set to None. The default is None.
- verboseBoolean, optional
If set to True, then additional information will be printed during the loading of historical prices. The default is True.
- Returns:
- `pandas.DataFrame` or dict of `pandas.DataFrame` : Historical market data.
The output format is designated by the value of the input parameter output_format.
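The way a dict-valued source is combined with the generic call variables, as described for the source parameter above, can be sketched in plain Python. This is only an illustration of the documented rules (union of symbols, per-symbol values overriding generic ones), not the azapy implementation. The helper name resolve_requests is hypothetical, and the fallback to ‘yahoo’ for symbols without an explicit provider is an assumption.

```python
def resolve_requests(symbol, source, **defaults):
    """Build one instruction dict per symbol (hypothetical helper)."""
    if isinstance(symbol, str):
        symbol = [symbol]
    if source is None:
        source = "yahoo"
    if isinstance(source, str):
        # a single provider for all symbols
        return {s: {"source": source, **defaults} for s in symbol}

    # assumption: symbols without an explicit provider fall back to yahoo
    generic = {"source": "yahoo", **defaults}
    # union of 'symbol' and the keys of the 'source' dict, order preserved
    all_symbols = list(dict.fromkeys(list(symbol) + list(source)))
    # per-symbol instructions override the generic values
    return {s: {**generic, **source.get(s, {})} for s in all_symbols}


instructions = resolve_requests(
    ["VGT", "PSJ"],
    {"AAPL": {"source": "eodhistoricaldata", "verbose": True},
     "SPY": {"source": "yahoo", "force": True}},
    force=False, save=True, file_dir="outDir", file_format="csv",
    api_key=None, param=None, verbose=True,
)
for symb, instr in instructions.items():
    print(symb, instr["source"], instr["force"], instr["verbose"])
```

In this sketch ‘AAPL’ and ‘SPY’ are added to the requested symbols, and every key they do not set is taken from the generic values.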
Example readMkT function call
import azapy as az
sdate = "2012-01-01"
edate = 'today'
symb = ['GLD', 'TLT', 'XLV', 'VGT', 'PSJ']
mktdir = "../../MkTdata"
# simple calls (most often used)
# returns a pd.DataFrame
mktdata = az.readMkT(symb, sdate=sdate, edate=edate, file_dir=mktdir)
# returns a dict of pd.DataFrame
mktdata_dict = az.readMkT(symb, sdate=sdate, edate=edate, file_dir=mktdir,
                          output_format='dict')
# complex call (extremely customized)
# Note: 'alphavantage' and 'eodhistoricaldata' may require valid API keys
source = {'GLD': {'force': True,
                  'save': True,
                  'file_dir': '../../MkTdata_yahoo',
                  'file_format': 'feather'
                  },
          'TLT': {'source': 'alphavantage',
                  'force': False,
                  'save': True,
                  'file_dir': '../../MkTdata_av',
                  'file_format': 'json',
                  'param': {'max_req_per_min': 75}
                  },
          'XLV': {'source': 'eodhistoricaldata'}
          }
symb = ['VGT', 'PSJ']
mktdata = az.readMkT(symb, sdate=sdate, edate=edate, source=source,
                     file_dir=mktdir, file_format='csv')
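When api_key is left as None, the key is read from an environment variable, as noted in the api_key parameter description. The sketch below shows one way such a lookup could work. It is not the azapy code. The helper name lookup_api_key is hypothetical, and it assumes the variable name is the upper-cased provider name followed by _API_KEY, which matches the names listed for eodhistoricaldata and marketstack.

```python
import os


def lookup_api_key(provider, api_key=None):
    """Return the explicit key if given, else the environment value (or None).

    Hypothetical helper; assumed naming pattern: <PROVIDER>_API_KEY.
    """
    if api_key is not None:
        return api_key
    return os.environ.get(f"{provider.upper()}_API_KEY")


# placeholder value, set here only for the demonstration
os.environ["MARKETSTACK_API_KEY"] = "demo-key"
print(lookup_api_key("marketstack"))
print(lookup_api_key("marketstack", api_key="explicit-key"))
```

In practice the variable would be set in the shell or the system environment before starting Python, so the key never appears in the script.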
MkTreader class
- class azapy.MkT.MkTreader.MkTreader
Bases: object
Collects historical market prices from market data providers such as ‘yahoo’, ‘eodhistoricaldata’, ‘alphavantage’ and ‘marketstack’.
- Attributes
dsource : dict of request instructions per symbol
delta_time : execution time of the request in seconds
rout : pandas.DataFrame containing historical prices for all symbols. It is created during the call of get function.
rout_status : request status information. It is created during the call of get_request_status function or during the call of function get with option verbose=True.
error_log : contains lists of missing historical observation dates. It is created together with rout_status.
Methods
get([symbol, sdate, edate, calendar, ...]) : Retrieves market data for a set of stock symbols.
get_error_log() : Returns lists of missing historical observation dates per symbol.
get_request_status() : Reports abbreviated information about request status.
- __init__()
Constructor
- get(symbol=[], sdate='2012-01-01', edate='today', calendar=None, output_format='frame', source=None, force=False, save=True, file_dir='outDir', file_format='csv', api_key=None, param=None, verbose=True)
Retrieves market data for a set of stock symbols.
It returns the requested historical time series. The same call variables are accepted by the readMkT wrapper function.
- Parameters:
- symbolstr or list of str, optional
Stock symbols to be uploaded. The default is [].
- sdatedate like, optional
The start date of historical time series. The default is “2012-01-01”.
- edatedate like, optional
The end date of historical time series (must: sdate <= edate). The default is ‘today’.
- calendarnumpy.busdaycalendar, optional
Exchange business day calendar. If set to None it will default to the NY stock exchange business calendar (provided by the azapy function NYSEgen). The default is None.
- output_formatstr, optional
- The function output format. It can be:
‘frame’ - pandas.DataFrame
‘dict’ - dict of pandas.DataFrame. The symbols are the keys.
The default is ‘frame’
- sourcestr or dict, optional
If it is a str, then it represents the market data provider for all historical prices request. Possible values are: ‘yahoo’, ‘alphavantage’, ‘alphavantage_yahoo’, ‘eodhistoricaldata’, ‘eodhistoricaldata_yahoo’ and ‘marketstack’. If set to None it will default to ‘yahoo’.
It can also be set to a dict containing specific instructions for each stock symbol. The dict keys are the symbols and the values are dicts of instructions specific to each symbol. Valid keys for the instruction dicts are the names of this function’s call variables, except ‘sdate’, ‘edate’, ‘calendar’ and ‘output_format’. The actual set of stock symbols is given by the union of the variable ‘symbol’ and the keys of the dict ‘source’. Missing values in a symbol’s instruction dict are filled with the values of the function call variables, which act as generic values to be used in the absence of specific instructions in the ‘source’ dict. The default is None.
Example of dict ‘source’:
source = {‘AAPL’: {‘source’: ‘eodhistoricaldata’, ‘verbose’: True}, ‘SPY’: {‘source’: ‘yahoo’, ‘force’: True}}
In this case there are 2 symbols that will be added (union) to the set of symbols defined by the ‘symbol’ variable. For symbol ‘AAPL’ the provider source is eodhistoricaldata and the ‘verbose’ instruction is set to True. The rest of the instructions, ‘force’, ‘save’, ‘file_dir’, ‘file_format’, ‘api_key’ and ‘param’, are set to the values of the corresponding function call variables. The same applies to symbol ‘SPY’. The instructions for the remaining symbols specified in the ‘symbol’ variable are set according to the values of the function call variables.
- forceBoolean, optional
True: will try to collect historical prices exclusively from the market data providers.
False: first it will try to load the historical prices from a local saved file. If such a file does not exist the market data provider will be accessed.
If the file exists but the saved historical data is too short then it will try to collect the missing values only from the market data provider. The default is False.
- saveBoolean, optional
True: It will try to save the historical prices collected from the providers to a local file.
False: No attempt to save the data is made.
The default is True.
- file_dirstr, optional
Directory with (to save) historical market data. If it does not exist then it will be created. The default is “outDir”.
- file_formatstr, optional
The saved file format for the historical prices. The following file formats are supported: csv, json and feather. The default is ‘csv’.
- api_keystr, optional
Provider API key (where required). If set to None then the API key is set to the value of the global environment variables
ALPHAVANTAGE_API_KEY for alphavantage,
EODHISTORICALDATA_API_KEY for eodhistoricaldata,
MARKETSTACK_API_KEY for marketstack.
The default is None.
- paramdict, optional
Additional information needed to access the market data provider. At this point in time, only the alphavantage provider requires an additional parameter, specifying the maximum number of API requests (symbols) per minute. It varies with the level of access corresponding to the API key. The minimum value is 5 for a free key and starts at 75 for premium keys. This value is stored in the max_req_per_min variable.
Example: param = {‘max_req_per_min’: 5}
This is also the default value for alphavantage if param is set to None. The default is None.
- verboseBoolean, optional
If set to True, then additional information will be printed during the loading of historical prices. The default is True.
- Returns:
- `pandas.DataFrame` or dict of `pandas.DataFrame` : Historical market data.
The output format is designated by the value of the input parameter output_format.
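The max_req_per_min value in param caps how many requests are sent to the provider per minute. A minimal sketch of such a limiter is shown below, assuming a sliding 60-second window. It illustrates the idea only and is not how azapy enforces the limit. The class name RequestThrottle and its clock and sleep arguments are hypothetical.

```python
import time


class RequestThrottle:
    """Allow at most max_req_per_min calls in any 60-second window.

    Hypothetical helper, not part of azapy.
    """

    def __init__(self, max_req_per_min=5, clock=time.monotonic,
                 sleep=time.sleep):
        self.max_req = max_req_per_min
        self.clock = clock
        self.sleep = sleep
        self.stamps = []  # times of the requests in the current window

    def wait(self):
        """Block until a new request is allowed, then record it."""
        while True:
            now = self.clock()
            # keep only the requests made during the last 60 seconds
            self.stamps = [t for t in self.stamps if now - t < 60.0]
            if len(self.stamps) < self.max_req:
                break
            # wait until the oldest request leaves the window
            self.sleep(60.0 - (now - self.stamps[0]))
        self.stamps.append(now)


throttle = RequestThrottle(max_req_per_min=5)
for symb in ["GLD", "TLT", "XLV"]:
    throttle.wait()  # returns at once while under the limit
    print(f"request for {symb} may be sent")
```

The clock and sleep arguments exist only so the limiter can be exercised without real waiting. With the defaults it uses time.monotonic and time.sleep.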
- get_error_log()
Returns lists of missing historical observation dates per symbol.
- Returns:
- `dict` : The error log.
If it is an empty dict then there are no missing dates in the collected historical time series. Otherwise, the keys of the dict are the symbols that have missing dates. The values for these keys are also dicts with the following fields:
‘back’ : a list of missing dates at the tail of the time series,
‘front’ : a list of missing dates at the head of the time series,
‘mid’ : a list of missing dates in the middle of the time series.
Fields with an empty list of dates are omitted.
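Because empty fields are omitted, code that reads the error log has to allow for missing keys. The sketch below counts the missing dates per symbol and position for a dict in the layout described above. The helper name summarize_error_log is hypothetical, and the sample log is hand-made: its symbols and date strings are illustrative only, and the type of the date entries returned by azapy is not assumed here.

```python
def summarize_error_log(error_log):
    """Count missing dates per symbol and position (hypothetical helper)."""
    summary = {}
    for symb, fields in error_log.items():
        # an omitted field means no missing dates at that position
        summary[symb] = {pos: len(fields.get(pos, []))
                         for pos in ("front", "mid", "back")}
    return summary


# hand-made example in the documented layout
sample_log = {
    "PSJ": {"front": ["2012-01-03", "2012-01-04"]},
    "XLV": {"mid": ["2015-06-10"], "back": ["2024-01-05"]},
}
for symb, counts in summarize_error_log(sample_log).items():
    print(symb, counts)
```

An empty error log yields an empty summary, which matches the case of no missing dates.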
- get_request_status()
Reports abbreviated information about request status.
- Returns:
- `pandas.DataFrame` : The status report.
The column names are the symbols for which the data was requested. The rows contain the actual input parameters per symbol as well as:
‘nrow’ : the length of the historical time series,
‘sdate’ : first date in the time series,
‘edate’ : last date in the time series,
‘error’ : whether there are missing data. If its value is ‘Yes’ then the actual list of missing dates per symbol can be obtained by calling get_error_log.
Example MkTreader class usage
import azapy as az
sdate = "2012-01-01"
edate = 'today'
symb = ['GLD', 'TLT', 'XLV', 'VGT', 'IHI']
mktdir = "../../MkTdata"
# build MkTreader object
mkt = az.MkTreader()
# read historical mkt data
hdata = mkt.get(symb, sdate=sdate, edate=edate, file_dir=mktdir)
print(f"MkT data\n{hdata}")
# request status
req_status = mkt.get_request_status()
print(f"Status per symbol\n{req_status}")
# missing observation dates
error_date = mkt.get_error_log()
print(f"Error log per symbol\n{error_date}")