Correlation Clustering Selector

The correlation clustering selector aims to produce a set of weakly correlated assets. The general idea is to partition the original universe of assets into clusters of highly correlated elements. Then, for each cluster of two or more assets, a single representative is selected according to a performance measure (a cluster with one asset is represented by that asset). As a result, the size of the selection equals the number of clusters.
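
The idea can be sketched end to end on a toy correlation matrix. This is only an illustration, not the azapy implementation: it assumes SciPy's hierarchical clustering with Ward linkage on the distance 1 - ρ, and the function name `select_representatives`, the toy matrix, and the `score` values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def select_representatives(symbols, corr, score, corr_threshold=0.95):
    """Cluster symbols by correlation, then keep the best-scoring symbol per cluster."""
    dist = 1.0 - np.asarray(corr, dtype=float)   # correlation distance, 1 - rho
    np.fill_diagonal(dist, 0.0)                  # exact zeros on the diagonal
    tree = linkage(squareform(dist), method="ward")
    # cut the tree at 1 - corr_threshold: one integer cluster label per symbol
    labels = fcluster(tree, t=1.0 - corr_threshold, criterion="distance")
    selected = []
    for lab in np.unique(labels):
        members = [s for s, l in zip(symbols, labels) if l == lab]
        # a single-member cluster is represented by its only member
        selected.append(max(members, key=lambda s: score[s]))
    return selected


# toy inputs, made up for illustration: two tight pairs and one loner
symbols = ["A", "B", "C", "D", "E"]
corr = np.array([
    [1.00, 0.98, 0.20, 0.20, 0.10],
    [0.98, 1.00, 0.20, 0.20, 0.10],
    [0.20, 0.20, 1.00, 0.97, 0.10],
    [0.20, 0.20, 0.97, 1.00, 0.10],
    [0.10, 0.10, 0.10, 0.10, 1.00],
])
score = {"A": 0.05, "B": 0.12, "C": 0.30, "D": 0.10, "E": -0.02}

selection = select_representatives(symbols, corr, score)
print(sorted(selection))  # ['B', 'C', 'E']
```

The three clusters {A, B}, {C, D}, and {E} yield three selected symbols, one per cluster.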

The CorrClusterSelector uses a hierarchical clustering algorithm with Ward linkage and the correlation distance, \(d_\rho(A, B) = 1 - \rho(A, B)\), between assets. The hierarchical tree is cut at \(1 - \rho_{\rm th}\), where \(\rho_{\rm th}\) is a user-defined correlation threshold. A typical value is \(\rho_{\rm th}=0.95\). The f13612w filter is used to choose the best representative of each cluster. It is a momentum measure defined as the weighted average of the most recent annualized 1-, 3-, 6-, and 12-month rates of return. The typical setup is an equally weighted average. However, the azapy implementation allows arbitrary non-negative weights (not all zero), e.g., [1, 2, 1, 1].
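
The momentum measure can be sketched as follows. This is a minimal illustration rather than the azapy code: the function name `f13612w_score` is hypothetical, the input is assumed to be a series of month-end prices, and the annualization is assumed to be linear (a rate over n months is multiplied by 12/n).

```python
import numpy as np


def f13612w_score(monthly_prices, fw=None):
    """Weighted average of annualized 1-, 3-, 6-, and 12-month rates of return.

    monthly_prices: month-end prices, oldest first, at least 13 values.
    fw: 4 non-negative weights (not all zero); None means equal weights.
    """
    p = np.asarray(monthly_prices, dtype=float)
    if p.size < 13:
        raise ValueError("at least 13 month-end prices are required")
    w = np.ones(4) if fw is None else np.asarray(fw, dtype=float)
    if w.shape != (4,) or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("fw must hold 4 non-negative weights, not all zero")
    w = w / w.sum()                            # weights are normalized
    horizons = np.array([1, 3, 6, 12])         # look-back horizons in months
    rates = p[-1] / p[-1 - horizons] - 1.0     # trailing rates of return
    annualized = rates * 12.0 / horizons       # assumption: linear annualization
    return float(w @ annualized)


# hypothetical price history: flat at 100 for 12 months, then a jump to 112
prices = [100.0] * 12 + [112.0]
score_equal = f13612w_score(prices)
score_weighted = f13612w_score(prices, fw=[1, 2, 1, 1])
print(round(score_equal, 4))     # 0.57
print(round(score_weighted, 4))  # 0.552
```

Because the weights are normalized, `fw=[1, 2, 1, 1]` and `fw=[2, 4, 2, 2]` give the same score.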


CorrClusterSelector class

class azapy.Selectors.CorrClusterSelector.CorrClusterSelector(pname='CorrCluster', corr_threshold=0.95, freq='Q', ftype='f13612w', fw=None, col_price='adjusted', hlength=1)

Bases: NullSelector

Selects symbols with lower inter-correlation.

  • pname : str - portfolio name

  • mkt : pandas.DataFrame - selection’s market data

  • symb : list - selected symbols

  • symb_omitted : list - unselected symbols

  • capital : float - always set to 1


getSelection(mktdata, **params)

Computes the selection.

__init__(pname='CorrCluster', corr_threshold=0.95, freq='Q', ftype='f13612w', fw=None, col_price='adjusted', hlength=1)


pname : str, optional

Selector name. The default is ‘CorrCluster’.

corr_threshold : float, optional

Cluster correlation threshold (i.e., a cluster contains only symbols with inter-correlation higher than corr_threshold). The default is 0.95.

freq : str, optional

The horizon of the rates of return used to estimate correlations. It can be either ‘M’ for monthly or ‘Q’ for quarterly rates. The default is ‘Q’.

ftype : str, optional

Inner-cluster filter (i.e., the criterion used to designate the representative of a cluster with more than one symbol). At this point only ‘f13612w’ is implemented. The default is ‘f13612w’.

fw : list, optional

List of filter weights. For ‘f13612w’ it must be a list of 4 non-negative (not all zero) numbers. A value of None indicates equal weights. Note: the weights are normalized internally. The default is None.

col_price : str, optional

The name of the pricing column to be considered in computations. The default is ‘adjusted’.

hlength : float, optional

History length, in years, used for calibration. A fractional number will be rounded to an integer number of months. The default is 1 year.
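
How freq and hlength could interact during calibration is sketched below. This is a rough illustration under stated assumptions, not the azapy implementation: the function name `calibration_rates` is hypothetical, the prices are assumed to be a pandas DataFrame indexed by date with one column per symbol, and the exact rounding and window-boundary rules are guesses.

```python
import numpy as np
import pandas as pd


def calibration_rates(prices, hlength=1, freq="Q"):
    """Rates of return over the trailing `hlength` years, monthly or quarterly."""
    if freq not in ("M", "Q"):
        raise ValueError("freq must be 'M' or 'Q'")
    months = int(round(hlength * 12))            # e.g. 3.25 years -> 39 months
    start = prices.index[-1] - pd.DateOffset(months=months)
    window = prices.loc[prices.index > start]    # keep the trailing history only
    # last available price in each calendar month or quarter
    period_end = window.groupby(window.index.to_period(freq)).last()
    return (period_end / period_end.shift(1) - 1.0).dropna()


# synthetic daily prices for two hypothetical symbols
dates = pd.bdate_range("2018-01-01", "2021-12-31")
rng = np.random.default_rng(0)
log_steps = rng.normal(0.0, 0.01, size=(len(dates), 2))
prices = pd.DataFrame(100.0 * np.exp(log_steps.cumsum(axis=0)),
                      index=dates, columns=["AAA", "BBB"])

rates = calibration_rates(prices, hlength=3.25, freq="Q")
corr = rates.corr()      # correlation matrix fed to the clustering step
print(rates.shape)       # (12, 2)
```

A shorter history or a quarterly horizon leaves fewer observations per symbol, so the correlation estimates become noisier.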

Returns: The object.

getSelection(mktdata, **params)

Computes the selection.


mktdata : pandas.DataFrame

MkT data in the format produced by the azapy function readMkT.

**params : dict, optional

Other optional parameters:

verbose : Boolean, optional

When set to True, the selected symbols are printed. The default is False.

view : Boolean, optional

If set to True, the dendrogram of the hierarchical clustering is displayed. The default is False. Note: the tree cutoff is at the 1 - corr_threshold level.

(capital, mkt) : tuple

capital : float - fraction of capital allocated to the selection. For this selector it is always 1.

mkt : pandas.DataFrame - selection MkT data in the format produced by the azapy function readMkT.


Example CorrClusterSelector

# Examples
import numpy as np

import azapy as az
print(f"azapy version {az.version()}", flush=True)

# collect market data
mktdir = '../../MkTdata'
sdate = '2012-01-01'
edate = '2021-07-27'

symb = ['GLD', 'TLT', 'IHI', 'VGT', 'OIH',
        'XAR', 'XBI', 'XHE', 'XHS', 'XLB',
        'XLE', 'XLF', 'XLI', 'XLK', 'XLU', 
        'XLV', 'XLY', 'XRT', 'SPY', 'ONEQ', 
        'QQQ', 'DIA', 'ILF', 'XSW', 'PGF', 
        'IDV', 'JNK', 'HYG', 'SDIV', 'VIG', 
        'SLV', 'AAPL', 'MSFT', 'AMZN', 'GOOG', 
        'IYT', 'VGI', 'IWM', 'BRK-B', 'ITA' ]

mktdata = az.readMkT(symb, sdate=sdate, edate=edate, file_dir=mktdir)

# CorrClusterSelector

selector = az.CorrClusterSelector()

capital, mkt = selector.getSelection(mktdata)

print(f"As of {edate}\n"
      f"capital at risk: {capital}\n"
      f"selected symbols: {mkt.symbol.unique()}\n"
      f"selected {len(mkt.symbol.unique())} out of {len(symb)} symbols\n"
      f"symbols omitted: {list(np.setdiff1d(symb, mkt.symbol.unique()))}")