Need help with large-scale portfolio optimisation

Hi, I am currently working on a portfolio optimisation project where the investible universe contains roughly 6000 to 10000 assets (depending on the date). The data set is large, with roughly 1000 data points per asset. The objective function is x'Vx, where x is the weight vector and V is the covariance matrix. Since there are more assets than data points, V is singular, not invertible and not positive definite. I tried the fmincon and quadprog solvers in MATLAB, but the results are unsatisfactory: the solvers stop essentially because the maximum number of iterations is reached. Raising this limit (to 30000) gives more accurate, but still not correct, results in both solvers, and it also easily causes out-of-memory problems.
I am wondering if there are any alternative solvers I can use to get around this. Any suggestions will be greatly appreciated.
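For concreteness, the setup I am using looks roughly like the sketch below (variable names are illustrative; `V` is assumed to be the n-by-n covariance matrix already in the workspace, and quadprog minimizes 0.5*x'*H*x + f'*x, so H = 2*V corresponds to x'Vx):

```matlab
% Minimum-variance problem (sketch): min x'Vx  s.t.  sum(x) = 1, x >= 0
n   = size(V, 1);
H   = 2*V;               % quadprog minimizes 0.5*x'*H*x + f'*x
f   = zeros(n, 1);
Aeq = ones(1, n);        % weights sum to one
beq = 1;
lb  = zeros(n, 1);       % long-only
x0  = repmat(1/n, n, 1); % equal-weight starting point
[x, fval, exitflag] = quadprog(H, f, [], [], Aeq, beq, lb, [], x0);
```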

1 Answer

For a quadratic objective function you would want to use quadprog, so you are right in that sense; fmincon will just take longer, but it is necessary if you have non-linear constraints. In what sense is the answer "not correct"?

4 Comments

Hi Brendan, thanks for your reply.
By "not correct", I mean that fmincon stops because the maximum number of iterations is reached, not because a (local) minimum was found. I also set the options to display each iteration and found that the objective function values are not converging, resulting in large fluctuations in simple returns.
The constraints are simply that the weights sum to 1 and the portfolio is long-only.
I have also used another much smaller dataset (10 stocks and 2000 data points for each stock) to test this optimisation framework, and the solvers are able to find a minimum.
This may be due to a singular covariance matrix, or an ill-conditioned covariance matrix.
You can check these via:
rank(V)
cond(V)
This would not be a surprise given the large number of variables to samples. There are some things you can do to mitigate this problem.
1. Add a small regularization parameter to the main diagonal (I use a covariance matrix computed from 252 days of stock prices in 2006 for 450 assets on the S&P).
rank(c)
ans =
249
rank(c+diag(repmat(10*eps,length(c),1)))
ans =
450
10*eps
ans =
2.2204e-15
Notice how small this regularization parameter is. It will have negligible effect on the solution.
The condition number has also changed drastically:
cond(c)
ans =
9.9006e+19
cond(c+diag(repmat(10*eps,length(c),1)))
ans =
1.3600e+13
2. If you have missing data points, you must be careful how you calculate the covariance matrix. With pairwise estimation, each entry is computed from a different number of observations, and there is no guarantee of a symmetric positive definite matrix. You are better off omitting any observations which contain NaNs.
3. You could always use some sort of shrinkage estimator with a regularized condition number. For instance, see: "Condition Number Regularized Covariance Estimation".
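The complete-case estimation in point 2 can be sketched as follows (assuming a T-by-n matrix `returns` of observations, with NaNs marking missing values):

```matlab
% Keep only complete observations, then estimate the covariance.
% The resulting sample covariance is symmetric positive semidefinite.
complete = ~any(isnan(returns), 2);   % rows with no missing values
V = cov(returns(complete, :));
```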
I hope this helps!
Thanks for your suggestions, Brendan. Here are some results based on the three pieces of advice.
Firstly, I added a small number to the diagonal of the covariance matrix, and it indeed became full rank. However, fmincon still stops because the maximum number of iterations is reached.
Secondly, I was indeed using MATLAB's nancov function with the 'pairwise' option. I estimated the covariance matrix again without it, which means rows containing NaNs are deleted from the calculation. Comparing the two estimates, the one without the 'pairwise' option has larger variances and covariances. I am using weekly data and the dataset is not long, so deleting data is the last thing I want to do.
Thirdly, I have not yet tried the method in the paper you mentioned, but I did try the shrinkage estimator proposed by Ledoit and Wolf (2004) to estimate the covariance matrix, and ran the optimisation with quadratic programming. It is able to find minima, but the portfolio underperforms over the whole time period, which makes me think something is still going wrong.
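For reference, a generic linear shrinkage toward the diagonal looks like the sketch below. Note this is illustrative only: the actual Ledoit-Wolf estimator chooses the shrinkage intensity from the data, whereas here `delta` is a hypothetical fixed value:

```matlab
% Linear shrinkage toward a diagonal target (illustrative only)
delta    = 0.2;              % shrinkage intensity (hypothetical; Ledoit-Wolf estimates this)
F        = diag(diag(V));    % target: variances kept, covariances shrunk toward zero
V_shrunk = (1 - delta)*V + delta*F;   % full rank whenever all variances are positive
```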
It may be that your constraints are not set appropriately, so I would double-check those first.
With the number of assets in question, it is likely a good idea to increase the maximum number of iterations (MaxIter) and perhaps also loosen the termination tolerances (TolX and TolFun). Additionally, it will help to get diagnostic information from the solver to ensure that convergence is occurring. With that in mind, consider passing in options such as:
opt = optimoptions('quadprog','TolFun',...
1e-6,'TolX',1e-6,'Display','iter-detailed')
Depending on what your constraints are, you may be able to switch the algorithm to the trust-region-reflective method and provide Hessian information. There is a good example of this here:
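As a hedged sketch: the trust-region-reflective algorithm in quadprog accepts either bound constraints alone or linear equality constraints alone (not both), so if the long-only bounds were dropped, the budget-constrained problem could be run as something like:

```matlab
% Equality-only minimum-variance problem (no long-only bounds) with the
% trust-region-reflective algorithm; H = 2*V supplies the Hessian directly.
n   = size(V, 1);
opt = optimoptions('quadprog', 'Algorithm', 'trust-region-reflective', ...
                   'Display', 'iter-detailed');
x0  = repmat(1/n, n, 1);   % equal-weight starting point (feasible: sums to 1)
x   = quadprog(2*V, zeros(n,1), [], [], ones(1,n), 1, [], [], x0, opt);
```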



Asked: 2015-8-12
