Use Parallel Processing for Regression TreeBagger Workflow
This example shows you how to:
- Use an ensemble of bagged regression trees to estimate feature importance. 
- Improve computation speed by using parallel computing. 
The sample data is a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical. The symboling index takes integer values from -3 to 3.
Load the sample data and separate it into predictor and response arrays.
load imports-85;
Y = X(:,1);
X = X(:,2:end);
Set up the parallel environment to use the default number of workers. The computer that created this example has six cores.
mypool = parpool
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
mypool = 
 ProcessPool with properties: 
            Connected: true
           NumWorkers: 6
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true
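If a pool might already be open in your session, calling parpool again raises an error. The following is a minimal sketch, not part of the original example, that reuses a running pool and starts one only when none exists. It assumes Parallel Computing Toolbox is installed and uses the default profile.

```matlab
% Reuse an existing parallel pool if one is open; otherwise start one
% with the default profile. gcp('nocreate') returns an empty array
% when no pool is running, so it never starts a pool by itself.
mypool = gcp('nocreate');
if isempty(mypool)
    mypool = parpool;
end

% Report how many workers are available for the parallel computation.
fprintf('Pool has %d workers.\n', mypool.NumWorkers);
```

When you finish the example, delete(mypool) shuts the pool down rather than waiting for the idle timeout.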
Set the options to use parallel processing.
paroptions = statset('UseParallel',true);
Estimate feature importance using leaf size 1 and 5000 trees in parallel. Time the function for comparison purposes.
tic
b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1,'Options',paroptions);
toc
Elapsed time is 9.873065 seconds.
Perform the same computation in serial for timing comparison.
tic
b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1);
toc
Elapsed time is 28.092654 seconds.
The results show that the parallel computation takes about one third of the time of the serial computation (9.9 seconds versus 28.1 seconds, a speedup of roughly 2.8 times on six workers). Note that the elapsed time can vary depending on your operating system.
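The timing runs above train the ensemble but do not display the feature importance estimates. One possible way to inspect them is sketched below. It assumes b is the TreeBagger model trained above with out-of-bag importance turned on, and it reads the OOBPermutedPredictorDeltaError property, which is the property name in recent releases (older releases may expose it as OOBPermutedVarDeltaError). The variable names imp and idx are illustrative only.

```matlab
% Assumes b is the TreeBagger model trained above with 'OOBVarImp','on'.
% Each element measures how much the out-of-bag error increases when
% the values of that predictor are permuted.
imp = b.OOBPermutedPredictorDeltaError;   % one value per predictor

% Rank predictor indices from most to least important.
[~,idx] = sort(imp,'descend');

% Plot the importance estimate for every predictor.
figure
bar(imp)
xlabel('Predictor index')
ylabel('Out-of-bag permuted predictor importance')
title('Feature importance from bagged regression trees')

% Display the indices of the five highest-ranked predictors.
disp(idx(1:5))
```

Because bagging draws random bootstrap samples, the exact ranking can differ slightly between runs unless you fix the random number generator state before training.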
See Also
parpool (Parallel Computing Toolbox) | statset | TreeBagger