Machine learning methods for dimension reduction are well suited to detecting latent factors in a broad set of asset prices. These factors can then be used to improve estimates of the covariance structure of price changes and – by extension – to improve the construction of a well-diversified minimum variance portfolio. Methods for dimension reduction include sparse principal components analysis, sparse partial least squares, and autoencoders. Both static and dynamic factor models can be built. Hyperparameter tuning can proceed in rolling training and validation samples. Empirical analysis suggests that machine learning adds value to factor-based asset allocation in the equity market. Investors with moderate or conservative risk preferences would realize significant utility gains.

Below are quotes from the paper and from some other sources, which are linked next to the respective quote. Emphasis, headings, and text in brackets have been added for clarity.

This post ties in with this site’s summary on “Quantitative methods for macro information efficiency”, particularly the section on dimension reduction.

### Using factors for reduction of portfolio variance

“The presence of __factor structure in asset returns has been widely accepted in the economic literature__…We examine the characteristics and benefits of latent factors generated from machine learning dimensionality reduction techniques for asset allocation. The analysis is conducted under the framework of factor-based covariance matrices used to construct minimum-variance portfolios.”

“__We focus…on minimum-variance portfolios…It requires only estimates of the covariance matrix__, which are often considered to be more accurate than the estimates of the means [that are used in] the mean-variance criterion of Markowitz [and] that have been found to be the principal source of estimation risk.”
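To illustrate why only the covariance matrix is needed, here is a minimal sketch of the textbook unconstrained global minimum-variance solution, w = Σ⁻¹1 / (1'Σ⁻¹1). The function name and the 3-asset covariance matrix are hypothetical, and the paper's actual optimization may include additional constraints (for example on short selling) that are not shown here.

```python
import numpy as np

def min_variance_weights(cov):
    """Unconstrained global minimum-variance weights: S^-1 1 / (1' S^-1 1)."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve S x = 1 instead of inverting S explicitly (numerically safer).
    x = np.linalg.solve(cov, ones)
    # Normalize so that the weights sum to one.
    return x / x.sum()

# Hypothetical 3-asset covariance matrix (illustrative numbers only).
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

w = min_variance_weights(cov)
port_var = w @ cov @ w                      # variance of the optimized portfolio
equal_w = np.ones(3) / 3
equal_var = equal_w @ cov @ equal_w         # variance of the equal-weighted benchmark
```

No expected returns enter the calculation, so any estimation error in the means cannot affect the weights. The quality of `cov` is the only input that matters.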

“Although the minimum-variance framework avoids the problem of estimation error associated with expected returns, its performance remains crucially dependent on the quality of the estimated covariance matrix. __To lessen the impact of covariance misspecification on the optimal weights, we impose a factor structure on the covariance matrix__, which reduces the number of parameters to be estimated…It has been shown [in previous academic work] that introducing factor structure to the covariance matrix can improve portfolio performance.”
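The reduction in the number of parameters can be made concrete with simple arithmetic. The sketch below compares an unrestricted covariance matrix with a factor-based one, assuming a diagonal residual covariance matrix. The function names and the universe of 500 assets with 5 factors are hypothetical choices for illustration, not figures from the paper.

```python
def n_params_sample_cov(n_assets):
    """Free parameters of an unrestricted symmetric covariance matrix."""
    return n_assets * (n_assets + 1) // 2

def n_params_factor_cov(n_assets, n_factors):
    """Free parameters of a factor covariance with diagonal residual covariance."""
    loadings = n_assets * n_factors                 # one loading per asset and factor
    factor_cov = n_factors * (n_factors + 1) // 2   # symmetric factor covariance
    residual_vars = n_assets                        # one residual variance per asset
    return loadings + factor_cov + residual_vars

# Hypothetical universe: 500 assets, 5 latent factors.
full = n_params_sample_cov(500)        # 125250
factor = n_params_factor_cov(500, 5)   # 3015
```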

“In addition to having observed or latent factors, factor models can be static, such as in the arbitrage pricing theory…or dynamic.”

### Using machine learning to construct factors

“We __examine the economic value of latent factors generated using a variety of supervised and unsupervised dimensionality reduction methods__…In addition to classical approaches, such as principal component analysis (PCA) and partial least squares (PLS), their respective regularized versions that induce sparsity through a penalty in the objective function are also considered. We also investigate the performance of factors generated by autoencoders; a type of unsupervised neural network used for dimensionality reduction.”

“We describe classical dimensionality reduction techniques used to generate the latent factors, along with their extensions from the machine learning literature, which rely on regularization and neural networks. __The alternative methods we consider are similar in that the dimensionality of the data is reduced by mapping the set of predictors to a smaller set of combinations of the original variables__…

- **Principal component analysis** (PCA) __derives the latent factors in an unsupervised way__, based only on information from the predictors. PCA produces the weight matrix [based on] the covariance structure between predictors…The first principal component of the predictor set…has the largest sample variance amongst all linear combinations of the columns of the predictors.
- **Sparse principal component analysis** (SPCA)…is based on the regression/reconstruction property of PCA and produces __modified principal components with sparse weights, such that each principal component is a linear combination of only a few of the original predictors__…PCA can be viewed in terms of a ridge regression problem and by adding the L1 penalty (penalty that increases linearly with coefficient size) they convert it to an elastic net regression, which allows for the estimation of sparse principal components.
- In **partial least squares** (PLS) the __factors are constructed in a supervised way, by using information from both the predictors and the response__…constructing linear combinations based on both sets…PLS computes weights that account for the covariation between the predictors and the response.
- **Sparse partial least squares** (SPLS) is an extension of PLS that __imposes the L1 penalty to promote sparsity onto a surrogate weight vector__ instead of the original weight vector while keeping [the two vectors] close to each other.
- **Autoencoders**…are a type of __unsupervised neural network that can be used for dimensionality reduction__. Autoencoders have a similar structure to feed-forward neural networks, which have been shown to be universal approximators for any continuous function. However, an autoencoder differs in that the number of inputs is the same as the number of outputs and that it is used in an unsupervised context. __Autoencoders have also been shown to be nonlinear generalizations of principal component analysis__. The goal of…autoencoders is to learn a parsimonious representation of the original input data through a bottleneck structure…Autoencoders use non-linear activation functions to discover non-linear representations of the data.

The encoder __creates a compressed representation of the set of predictor data when the input variables pass through the units in the hidden layers__, which are then decompressed to the output layer through the decoder. By placing constraints on the network, such as limiting the number of hidden units, it is forced to learn a compressed representation of the input, potentially uncovering an interesting structure of the data. Most often the encoding and decoding parts of an autoencoder are symmetrical, in that they both feature the same number of hidden layers with the same number of hidden units per layer. The output of the decoder is most commonly used to validate information loss, while the smallest hidden layer of the encoder (or code, at the bottleneck of the network) corresponds to the dimension-reduced data representation.”
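The difference between unsupervised and supervised factor construction can be sketched with the two classical methods. The example below derives PCA factors from the predictors alone and the first PLS direction from the covariation between predictors and response. The simulated data, dimensions, and variable names are hypothetical, and the sparse variants and autoencoders described above are not covered by this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 200, 10, 3                      # hypothetical sample size, predictors, factors

# Simulated correlated predictors and a response (illustrative data only).
X = rng.standard_normal((T, N)) @ rng.standard_normal((N, N))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(T)

Xc = X - X.mean(axis=0)                   # center predictors
yc = y - y.mean()                         # center response

# PCA (unsupervised): weights are the top-K right singular vectors of Xc,
# so they depend only on the covariance structure of the predictors.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W_pca = Vt[:K].T                          # (N, K) weight matrix
F_pca = Xc @ W_pca                        # (T, K) latent factors

# PLS, first component (supervised): the unit-norm weight vector that
# maximizes the sample covariance between Xc @ w and the response.
w_pls = Xc.T @ yc
w_pls /= np.linalg.norm(w_pls)
f_pls = Xc @ w_pls                        # (T,) first PLS factor
```

By construction, the first PCA factor has the largest variance of any unit-norm combination of the predictors, while the first PLS factor has the largest covariance with the response.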

“After the factor model is estimated…the __covariance matrix of returns is obtained by its decomposition into two components: the first is based on the factor loadings and the factor covariance matrix, while the second is the covariance matrix of the errors__…We focus on exact factor models where the covariance matrix of the residuals is diagonal by assuming cross-sectional independence.”
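The decomposition described in the quote can be written as Σ = B Σ_f B' + D, where B holds the factor loadings, Σ_f is the factor covariance matrix, and D is the diagonal residual covariance matrix. Below is a minimal sketch that assumes the loadings are estimated by ordinary least squares of returns on the factors. The estimation method, the function name, and the simulated data are assumptions for illustration, not the paper's stated procedure.

```python
import numpy as np

def factor_covariance(returns, factors):
    """Exact factor model covariance: B Sigma_f B' + diag(residual variances)."""
    R = np.asarray(returns, dtype=float)          # (T, N) asset returns
    F = np.asarray(factors, dtype=float)          # (T, K) factor realizations
    T = R.shape[0]
    Z = np.column_stack([np.ones(T), F])          # intercept plus factors
    coef, *_ = np.linalg.lstsq(Z, R, rcond=None)  # (K+1, N) OLS coefficients
    B = coef[1:].T                                # (N, K) factor loadings
    resid = R - Z @ coef                          # (T, N) residuals
    sigma_f = np.atleast_2d(np.cov(F, rowvar=False))   # (K, K) factor covariance
    resid_var = resid.var(axis=0, ddof=Z.shape[1])     # residual variance per asset
    # Off-diagonal residual covariances are set to zero (cross-sectional independence).
    return B @ sigma_f @ B.T + np.diag(resid_var)

# Hypothetical simulated data: 8 assets driven by 2 factors.
rng = np.random.default_rng(1)
T, N, K = 250, 8, 2
F = rng.standard_normal((T, K))
B_true = rng.standard_normal((N, K))
R = F @ B_true.T + 0.5 * rng.standard_normal((T, N))

sigma = factor_covariance(R, F)
```

Because the first term is positive semi-definite and the diagonal term is strictly positive, the resulting matrix is invertible even when the number of assets is large relative to the number of observations.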

“[We] __introduce dynamic factor models as an extension__…A dynamic factor model is one in which at least one of the following three generalizations holds true: (i) the intercept and factor loadings are time-varying, (ii) the covariance matrix of the factors is time-varying or (iii) the covariance matrix of the errors is time-varying…There are various definitions of dynamic factor models, the one we follow in this study is a model that allows the factor loadings to be time-varying.”

“The __machine learning models used to derive the latent factors rely on hyperparameter tuning__. The choice of hyperparameters controls the amount of model complexity and is critical for the performance of the model. Specifically, we adopt the validation sample approach, in which the optimal set of values for the tuning parameters is selected in the validation sample…we maintain the temporal ordering of the data…Specifically, __in each iteration of the rolling window, the in-sample is split into two disjointed periods, the training subsample, consisting of 80% of the observations [and] the validation subsample__. In the training subsample the model is estimated for several sets of values of the tuning parameters. The [validation] subsample is used to select the optimal set of tuning parameters, by using the latent factor weight and loading estimates for each set of hyperparameters from the training sample. Forecasts are constructed for the observations in the validation sample.”
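The rolling split can be sketched as follows. Only the 80% training share comes from the quote. The function name, the window length of 120 observations, the step size of one, and the total of 130 observations are hypothetical values chosen for illustration.

```python
def rolling_train_val_splits(n_obs, window, step=1, train_frac=0.8):
    """Yield (train, val, oos) per rolling window, preserving temporal order.

    train and val are ranges of observation indices inside the in-sample
    window; oos is the index of the first out-of-sample observation.
    """
    n_train = round(window * train_frac)
    start = 0
    while start + window < n_obs:
        train = range(start, start + n_train)            # earlier 80% of the window
        val = range(start + n_train, start + window)     # later 20% of the window
        oos = start + window                             # first observation after the window
        yield train, val, oos
        start += step

# Hypothetical setup: 130 observations and a 120-observation in-sample window.
splits = list(rolling_train_val_splits(n_obs=130, window=120))
first_train, first_val, first_oos = splits[0]
```

In each window, a model would be fitted on `train` for every candidate set of hyperparameters, and the set with the lowest forecast error on `val` would be carried forward. Validation observations always follow training observations, so no future information leaks into the fit.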

### Key empirical findings

“We explore the impact that the proposed latent factors have on the structure of factor-based covariance matrices and to the composition and performance of minimum-variance portfolios.”

“We evaluate the different factor and covariance specifications by constructing minimum-variance portfolios based on individual stock return data for a sample period spanning 60 years. Overall, our findings suggest that __machine learning adds value to factor-based asset allocation__. In the baseline case, machine learning leads to portfolios that significantly outperform the equal-weighted benchmark… __Investors with moderate or conservative risk preferences would realize statistically significant utility gains__…The best-performing methods to generate the covariance matrix are autoencoders and sparse principal component analysis.”

“In addition, machine learning can improve factor-based portfolio optimization when performance is measured using alternative risk metrics. Covariance matrices based on autoencoders and sparse PCA outperform the equal-weighted portfolio by up to 2.9%, 1.26% and 1.57% per annum, in terms of mean absolute deviation, Value-at-Risk and Conditional Value-at-Risk, respectively.”
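For readers less familiar with the alternative risk metrics named in the quote, here is a minimal sketch of their common historical (empirical) definitions. The function name, the 5% tail probability, and the return series are assumptions for illustration. The paper's exact definitions and confidence level are not given in the quotes above.

```python
import numpy as np

def risk_metrics(returns, alpha=0.05):
    """Return (mean absolute deviation, VaR, CVaR) from a return series.

    VaR and CVaR are reported as positive loss numbers at tail probability alpha.
    """
    r = np.asarray(returns, dtype=float)
    mad = np.mean(np.abs(r - r.mean()))      # average absolute deviation from the mean
    q = np.quantile(r, alpha)                # alpha-quantile of returns
    var = -q                                 # Value-at-Risk as a positive loss
    cvar = -r[r <= q].mean()                 # average loss in the tail beyond VaR
    return mad, var, cvar

# Hypothetical simulated monthly returns (illustrative data only).
rng = np.random.default_rng(2)
sample = 0.01 + 0.04 * rng.standard_normal(600)
mad, var, cvar = risk_metrics(sample)
```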

“The improved performance can be attributed to two aspects.

- First, __factor-based covariance matrices tend to significantly reduce the risk of a portfolio consisting of individual stocks__. This finding remains robust in an out-of-sample setting, using different risk measures, across covariance and factor specifications, for a varying number of assets, alternative portfolio objective formulations and when transaction costs are taken into account.
- Second, we demonstrate that using machine learning can __lead to significant economic gains__. For example, using a factor-implied covariance based on machine learning, can lead to a decrease in out-of-sample portfolio standard deviation of up to 29% and an increase in the Sharpe ratio of over 25%.”

“The results show that __machine learning yields factors that cause the covariances and portfolio weights to diverge from those based on commonly used estimators__. Latent factors produced by PCA and PLS-type methods exhibit a stronger connection with well-known factors (such as those from the Fama and French five-factor model) throughout the out-of-sample period, compared to factors based on autoencoders. Furthermore, the covariance matrices whose structure deviates most from the sample estimator are based on unsupervised methods or allow the residual covariance matrix to be time-varying. Portfolios based on machine learning also have weights that are smaller, vary less over time and are more diversified, than models based on observed factors. Covariance matrices based on unsupervised methods also lead to portfolios with lower turnover and thus reduced sensitivity to transaction costs.”

“__Shallow learning outperforms deeper learning__, which can be attributed to the small size of the data set and the low signal-to-noise ratio…Additionally, __unsupervised methods tend to perform better than supervised methods__.”