Classification with a huge dataset
I'm trying to perform classification with a huge dataset containing 6 persons for training, and from just 1 person's dataset I'm already getting this error: "Requested 248376x39305 (9.1GB) array exceeds maximum array size preference." I'm trying Bagged Tree and Neural Network classifiers first, and I want to ask how I can do this. Is it possible to train these classifiers on portions of the dataset (i.e., continue training a saved classification model)?
9 comments
Greg Heath
2016-11-7
Please explain how 248376 x 39305 constitutes a 1 person data set
[ I N ] = size(input)
[ O N ] = size(target)
Thanks,
Greg
Mindaugas Vaiciunas
2016-11-7
Edited: Walter Roberson
2016-11-7
Walter Roberson
2016-11-7
Please show your Tree Bagging code. https://www.mathworks.com/help/stats/treebagger.html does not return matrices.
Mindaugas Vaiciunas
2016-11-7
Walter Roberson
2016-11-7
Have you considered reducing the number of trees?
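For illustration, TreeBagger takes the number of trees as its first argument, so memory use can be traded against accuracy by lowering it. This is only a sketch: the variables X and Y stand in for your predictor matrix and class labels, and 20 trees is an arbitrary starting point, not a recommendation.

```matlab
% Train a bagged ensemble with fewer trees to reduce memory use.
% X is the predictor matrix (observations x features), Y holds the class labels.
nTrees = 20;                           % start small, increase if accuracy suffers
model = TreeBagger(nTrees, X, Y, ...
    'Method', 'classification', ...
    'OOBPrediction', 'on');            % track out-of-bag error as trees are added

% Plot out-of-bag error vs. number of trees to judge whether
% more trees would actually improve the recognition rate.
plot(oobError(model))
xlabel('Number of grown trees')
ylabel('Out-of-bag classification error')
```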
Mindaugas Vaiciunas
2016-11-8
Greg Heath
2016-11-9
Edited: Greg Heath
2016-11-9
I still don't get it
39305/765
ans =
51.3791
Regardless, I think you should use dimensionality reduction via feature extraction.
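As one possible sketch of such dimensionality reduction, MATLAB's pca function can project the 39305 features down to a much smaller number of components; the classifier is then trained on the scores instead of the raw data. The choice of 100 components below is an assumption for illustration; how many you actually need has to be validated against the loss of accuracy.

```matlab
% Reduce 39305 features to a small number of principal components.
% data is observations x features; train the classifier on 'score'
% rather than on the raw data.
nComp = 100;                                          % assumed, must be validated
[coeff, score, ~, ~, explained] = pca(data, 'NumComponents', nComp);
fprintf('Variance explained by %d components: %.1f%%\n', ...
        nComp, sum(explained(1:nComp)));

% Test data must be projected with the same coefficients:
% testScore = (testData - mean(data)) * coeff;
```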
Hope this helps,
Greg
Mindaugas Vaiciunas
2016-11-9
Greg Heath
2016-11-10
Of course it will affect it. However, the way to choose is to set a limit on the loss of accuracy.
Answers (1)
Walter Roberson
2016-11-7
0 votes
Add more memory (RAM) to your computer. Then check or adjust Preferences -> MATLAB -> Workspace -> MATLAB array size limit.
Or, you could set the division ratios so that a much smaller fraction is used for training and validation, with most of it left for test. This effectively uses only a small subset of the data, but a different small subset each time it trains.
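For a network from the Neural Network Toolbox, those division ratios are set through the divideParam properties before training; the following is a sketch, where the 10-hidden-unit patternnet and the 5/5/90 split are example values, not recommendations.

```matlab
% Use only a small fraction of the data for training and validation,
% leaving most of it as test, to cut memory use during training.
net = patternnet(10);                  % example architecture
net.divideFcn = 'dividerand';          % a different random division each run
net.divideParam.trainRatio = 0.05;     % 5% for training
net.divideParam.valRatio   = 0.05;     % 5% for validation
net.divideParam.testRatio  = 0.90;     % 90% held out
[net, tr] = train(net, inputs, targets);
```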
6 comments
Mindaugas Vaiciunas
2016-11-7
Walter Roberson
2016-11-7
Amazon Web Services, among other providers, makes available machines with more than 36 GB of RAM. If you had that much RAM, your program would run; therefore adding RAM is a solution to the problem.
Mindaugas Vaiciunas
2016-11-8
Walter Roberson
2016-11-8
https://www.mathworks.com/products/parallel-computing/matlab-parallel-cloud/ 16 workers, 60 GB, US$4.32 per hour at educational pricing, including compute services.
Or, if you provide your own EC2 instance, https://www.mathworks.com/products/parallel-computing/parallel-computing-on-the-cloud/distriben-ec2.html charges $0.07 per worker per hour for the software licensing from MathWorks. For example, you could use an m4.4xlarge from https://aws.amazon.com/ec2/pricing/on-demand/ with 16 cores and 64 GB at US$0.958 per hour for the EC2 service. Between that and the $0.07 per worker from MathWorks, it would come to less than US$2.50 per hour, about the price of a Starbucks "Grande" coffee.
Remember, your time is not really "free". At the very least you need to take into account "opportunity costs": an hour spent fighting a memory issue is an hour you could have spent working, even at a minimum-wage job.
Mindaugas Vaiciunas
2016-11-9
Walter Roberson
2016-11-9
Let me put it this way:
- You do not wish to reduce the number of trees or the data, because doing so might decrease the recognition rate.
- We do not have a magic low-memory implementation of TreeBagger available.
- You do not have enough memory on your system to run the classification using the existing software.
Your choices would seem to be:
- write the classifier yourself, somehow not using as much memory; or
- obtain more memory for your own system; or
- obtain use of a system with more memory
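Since TreeBagger has no built-in incremental mode, one ad hoc workaround in the spirit of the first option is to train separate ensembles on chunks of the data and combine their class scores at prediction time. This is purely a sketch: chunkIdx, X, Y, and Xtest are assumed variables, and averaging scores is an improvised combination rule, not a documented feature.

```matlab
% Train one TreeBagger per data chunk, then combine predictions by
% averaging class scores -- an ad hoc substitute for incremental learning.
nChunks = 6;
models = cell(nChunks, 1);
for k = 1:nChunks
    rows = chunkIdx{k};                % row indices for chunk k (user-defined)
    models{k} = TreeBagger(20, X(rows,:), Y(rows), 'Method', 'classification');
end

% Predict by averaging the per-chunk class scores.
scores = 0;
for k = 1:nChunks
    [~, s] = predict(models{k}, Xtest);
    scores = scores + s / nChunks;
end
[~, idx] = max(scores, [], 2);
predicted = models{1}.ClassNames(idx);
```

Whether this matches the accuracy of a single ensemble trained on all the data would have to be checked against a held-out test set.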