Use Parallel Processing for Regression TreeBagger
Workflow
This example shows you how to:
Use an ensemble of bagged regression trees to estimate feature importance.
Improve computation speed by using parallel computing.
The sample data is a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical. The symboling index takes integer values from -3 to 3.
Load the sample data and separate it into predictor and response arrays.
load imports-85;
Y = X(:,1);
X = X(:,2:end);
Set up the parallel environment to use the default number of workers. The computer that created this example has six cores.
mypool = parpool
Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6). mypool = ProcessPool with properties: Connected: true NumWorkers: 6 Cluster: local AttachedFiles: {} AutoAddClientPath: true IdleTimeout: 30 minutes (30 minutes remaining) SpmdEnabled: true
Set the options to use parallel processing.
paroptions = statset('UseParallel',true);
Estimate feature importance using leaf size 1
and 5000
trees in parallel. Time the function for comparison purposes.
tic b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ... 'cat',16:25,'MinLeafSize',1,'Options',paroptions); toc
Elapsed time is 9.873065 seconds.
Perform the same computation in serial for timing comparison.
tic b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ... 'cat',16:25,'MinLeafSize',1); toc
Elapsed time is 28.092654 seconds.
The results show that computing in parallel takes a fraction of the time it takes to compute serially. Note that the elapsed time can vary depending on your operating system.
See Also
parpool
(Parallel Computing Toolbox) | statset
| TreeBagger