Correlation Clustering Selector
The correlation clustering selector aims to produce a set of weakly correlated assets. The general idea is to partition the original universe of assets into clusters of highly correlated elements. Then, for each cluster of 2 or more assets, a single representative is selected according to a performance measure, while a single-asset cluster keeps its only member. As a result, the size of the selection equals the number of clusters.
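The representative-selection step can be illustrated with a minimal Python sketch. It assumes that the cluster labels and a per-symbol performance score (higher is better) are already available; the function name pick_representatives and the tickers AAA to EEE are hypothetical and not part of azapy.

```python
def pick_representatives(symbols, labels, scores):
    """Return one symbol per cluster: the member with the highest score.

    labels[i] is the cluster id of symbols[i] and scores[i] is its
    performance measure (higher is better). A single-member cluster
    keeps its only symbol, so the selection size equals the cluster count.
    """
    best = {}  # cluster id -> (symbol, score) of the current leader
    for sym, lab, sc in zip(symbols, labels, scores):
        if lab not in best or sc > best[lab][1]:
            best[lab] = (sym, sc)
    return sorted(sym for sym, _ in best.values())


# Hypothetical inputs: 5 symbols in 3 clusters.
symbols = ["AAA", "BBB", "CCC", "DDD", "EEE"]
labels = [1, 1, 2, 2, 3]
scores = [0.10, 0.25, 0.05, -0.02, 0.01]

selection = pick_representatives(symbols, labels, scores)
print(selection)  # ['BBB', 'CCC', 'EEE']
```

Five symbols collapse to three, one per cluster.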
The CorrClusterSelector uses a hierarchical clustering algorithm with Ward linkage and the correlation distance \(d_\rho(A, B) = 1 - \rho(A, B)\) between assets. The hierarchical tree is cut at \(1 - \rho_{\rm th}\), where \(\rho_{\rm th}\) is a user-defined correlation threshold; a typical value is \(\rho_{\rm th}=0.95\). The f13612w filter is used to designate the best representative of each cluster. The f13612w filter is a momentum measure defined as the weighted average of the most recent annualized 1-, 3-, 6-, and 12-month rates of return. The typical setup is an equal-weighted average; however, the azapy implementation allows arbitrary non-negative weights (not all zero), e.g., [1, 2, 1, 1].
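A minimal sketch of the clustering step, using SciPy's hierarchical clustering rather than azapy's own code (whose internals may differ): Ward linkage on the distance \(1 - \rho\), with the tree cut at \(1 - \rho_{\rm th}\). The function correlation_clusters and the synthetic return series are hypothetical illustrations.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_clusters(returns, corr_threshold=0.95):
    """Cluster the columns of `returns` (rows = periods, columns = assets).

    Ward linkage on the correlation distance d = 1 - rho; the tree is cut
    at height 1 - corr_threshold. Returns one integer label per asset.
    """
    corr = np.corrcoef(returns, rowvar=False)
    dist = 1.0 - corr
    # Remove floating-point noise so the matrix is a valid distance matrix.
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="ward")
    return fcluster(tree, t=1.0 - corr_threshold, criterion="distance")


# Synthetic returns: assets 0/1 track factor a, 2/3 track factor b, 4 is on its own.
rng = np.random.default_rng(0)
n = 40
a, b, c = rng.normal(size=(3, n))
returns = np.column_stack([
    a, a + 0.01 * rng.normal(size=n),
    b, b + 0.01 * rng.normal(size=n),
    c,
])

labels = correlation_clusters(returns, corr_threshold=0.95)
print(labels)  # assets 0/1 share a label, as do assets 2/3
```

SciPy applies the Ward update formula to any precomputed distance matrix, although Ward's method is formally defined only for Euclidean distances; the sketch follows the correlation distance stated above.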
CorrClusterSelector class
- class azapy.Selectors.CorrClusterSelector.CorrClusterSelector(pname='CorrCluster', corr_threshold=0.95, freq='Q', ftype='f13612w', fw=None, col_price='adjusted', hlength=1)
Bases: NullSelector
Selects symbols with lower inter-correlation.
- Attributes
pname : str - portfolio name
mkt : pandas.DataFrame - selection’s market data
symb : list - selected symbols
symb_omitted : list - unselected symbols
capital : float - always set to 1
Methods
getSelection(mktdata, **params) - computes the selection.
- __init__(pname='CorrCluster', corr_threshold=0.95, freq='Q', ftype='f13612w', fw=None, col_price='adjusted', hlength=1)
Constructor
- Parameters:
- pname : str, optional
Selector name. The default is ‘CorrCluster’.
- corr_threshold : float, optional
Cluster correlation threshold (i.e., a cluster contains only symbols with inter-correlation higher than corr_threshold). The default is 0.95.
- freq : str, optional
The frequency of the rates of return used to estimate the correlations. It can be either ‘M’ for monthly or ‘Q’ for quarterly rates. The default is ‘Q’.
- ftype : str, optional
Intra-cluster filter (i.e., the criterion used to designate the representative of a cluster with more than one symbol). Currently only ‘f13612w’ is implemented. The default is ‘f13612w’.
- fw : list, optional
List of filter weights. For ‘f13612w’ it must be a list of 4 non-negative numbers, not all zero. A value of None indicates equal weights. Note: the weights are normalized internally. The default is None.
- col_price : str, optional
The name of the price column used in the computations. The default is ‘adjusted’.
- hlength : float, optional
History length, in years, used for calibration. A fractional value is rounded to an integer number of months. The default is 1 year.
- Returns:
- The object.
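The f13612w filter and the fw and hlength conventions described above can be sketched as follows. This is not azapy's implementation: the helper names are hypothetical, prices are assumed to be month-end values, and each k-month return is annualized by simple scaling (multiplying by 12/k), which may differ from the convention azapy actually uses.

```python
import numpy as np


def normalize_weights(fw=None):
    """Validate the 4 f13612w weights and scale them to sum to 1."""
    w = np.ones(4) if fw is None else np.asarray(fw, dtype=float)
    if w.shape != (4,) or np.any(w < 0) or w.sum() == 0:
        raise ValueError("fw must be 4 non-negative numbers, not all zero")
    return w / w.sum()


def hlength_to_months(hlength):
    """Convert a history length in years to a whole number of months."""
    return int(round(hlength * 12))


def f13612w(prices, fw=None):
    """Weighted average of annualized 1-, 3-, 6- and 12-month returns.

    `prices` are month-end prices, oldest first (at least 13 values).
    Annualization is simple scaling by 12 / k (an assumption of this sketch).
    """
    p = np.asarray(prices, dtype=float)
    horizons = np.array([1, 3, 6, 12])
    rets = np.array([p[-1] / p[-1 - k] - 1.0 for k in horizons])
    return float(normalize_weights(fw) @ (rets * 12.0 / horizons))


prices = 100 * 1.01 ** np.arange(13)    # steady +1% per month
print(normalize_weights([1, 2, 1, 1]))  # [0.2 0.4 0.2 0.2]
print(hlength_to_months(3.25))          # 39
print(round(f13612w(prices), 4))        # 0.1228
```

With a steady 1% monthly gain, the longer horizons score slightly higher because compounding accumulates over more months.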
- getSelection(mktdata, **params)
Computes the selection.
- Parameters:
- mktdata : pandas.DataFrame
MkT data in the format produced by the azapy function readMkT.
- **params : dict, optional
- Other optional parameters:
- verbose : Boolean, optional
If set to True, the selected symbols are printed. The default is False.
- view : Boolean, optional
If set to True, the dendrogram of the hierarchical clustering is displayed. The default is False. Note: the tree is cut at the 1 - corr_threshold level.
- Returns:
- (capital, mkt) : tuple
- capital : float
Fraction of capital allocated to the selection. For this selector it is always 1.
- mkt : pandas.DataFrame
Selection MkT data in the format produced by the azapy function readMkT.
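A rough sketch of the data handling around getSelection, assuming a long-format frame indexed by date with a symbol column and an adjusted price column (consistent with the use of mkt.symbol in the example). The helpers period_returns and select_mktdata and the synthetic prices are hypothetical; the sketch aggregates prices to monthly or quarterly periods, as the freq parameter suggests, and returns (capital, mkt) with capital fixed at 1.

```python
import numpy as np
import pandas as pd


def period_returns(mktdata, freq="Q", col_price="adjusted"):
    """Wide table of period returns, one column per symbol.

    `mktdata` is long-format: a DatetimeIndex, a 'symbol' column and a
    price column `col_price`. `freq` is 'M' (monthly) or 'Q' (quarterly).
    """
    wide = mktdata.set_index("symbol", append=True)[col_price].unstack("symbol")
    # Last available price in each calendar month/quarter.
    last = wide.groupby(wide.index.to_period(freq)).last()
    return (last / last.shift(1) - 1.0).dropna()


def select_mktdata(mktdata, selected):
    """Return (capital, mkt) for the selected symbols; capital is always 1."""
    return 1.0, mktdata[mktdata["symbol"].isin(selected)]


# Hypothetical daily prices for three symbols over two years.
dates = pd.bdate_range("2020-01-01", "2021-12-31", name="date")
rng = np.random.default_rng(1)
mktdata = pd.concat([
    pd.DataFrame(
        {"symbol": sym,
         "adjusted": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates))))},
        index=dates,
    )
    for sym in ["AAA", "BBB", "CCC"]
])

rets = period_returns(mktdata, freq="Q")
print(rets.shape)  # (7, 3): 8 quarters give 7 quarterly returns
print(rets.corr().round(2))
capital, mkt = select_mktdata(mktdata, ["AAA", "CCC"])
print(capital, sorted(mkt["symbol"].unique()))  # 1.0 ['AAA', 'CCC']
```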
Example CorrClusterSelector
# Examples
import numpy as np
import azapy as az
print(f"azapy version {az.version()}", flush=True)
#==============================================================================
# collect market data
mktdir = '../../MkTdata'
sdate = '2012-01-01'
edate = '2021-07-27'
symb = ['GLD', 'TLT', 'IHI', 'VGT', 'OIH',
'XAR', 'XBI', 'XHE', 'XHS', 'XLB',
'XLE', 'XLF', 'XLI', 'XLK', 'XLU',
'XLV', 'XLY', 'XRT', 'SPY', 'ONEQ',
'QQQ', 'DIA', 'ILF', 'XSW', 'PGF',
'IDV', 'JNK', 'HYG', 'SDIV', 'VIG',
'SLV', 'AAPL', 'MSFT', 'AMZN', 'GOOG',
'IYT', 'VGI', 'IWM', 'BRK-B', 'ITA' ]
mktdata = az.readMkT(symb, sdate=sdate, edate=edate, file_dir=mktdir,
verbose=False)
#==============================================================================
# CorrClusterSelector
selector = az.CorrClusterSelector()
capital, mkt = selector.getSelection(mktdata)
print(f"As of {edate}\n"
f"capital at risk: {capital}\n"
f"selected symbols: {mkt.symbol.unique()}\n"
f"selected {len(mkt.symbol.unique())} out of {len(symb)} symbols\n"
f"symbols omitted: {list(np.setdiff1d(symb, mkt.symbol.unique()))}")