predict

Predict responses for new observations from kernel incremental learning model

Since R2022a

    Description

    label = predict(Mdl,X) returns the predicted responses (or labels) label of the observations in the predictor data X from the incremental learning model Mdl.

    [label,score] = predict(Mdl,X) also returns classification scores for all classes when Mdl is an incremental learning model for classification.

    Examples

    Create an incremental learning model by converting a traditionally trained kernel model, and predict responses using both models.

    Load the 2015 NYC housing data set. For more details on the data, see NYC Open Data.

    load NYCHousing2015

    Extract the response variable SALEPRICE from the table. For numerical stability, scale SALEPRICE by 1e6.

    Y = NYCHousing2015.SALEPRICE/1e6;
    NYCHousing2015.SALEPRICE = [];

    To reduce computational cost for this example, remove the NEIGHBORHOOD column, which contains a categorical variable with 254 categories.

    NYCHousing2015.NEIGHBORHOOD = [];

    Create dummy variable matrices from the other categorical predictors.

    catvars = ["BOROUGH","BUILDINGCLASSCATEGORY"];
    dumvarstbl = varfun(@(x)dummyvar(categorical(x)),NYCHousing2015, ...
        InputVariables=catvars);
    dumvarmat = table2array(dumvarstbl);
    NYCHousing2015(:,catvars) = [];

    Treat all other numeric variables in the table as predictors of sales price. Concatenate the matrix of dummy variables to the rest of the predictor data.

    idxnum = varfun(@isnumeric,NYCHousing2015,OutputFormat="uniform");
    X = [dumvarmat NYCHousing2015{:,idxnum}];

    Fit a kernel regression model to the entire data set.

    Mdl = fitrkernel(X,Y)
    Mdl = 
      RegressionKernel
                  ResponseName: 'Y'
                       Learner: 'svm'
        NumExpansionDimensions: 2048
                   KernelScale: 1
                        Lambda: 1.0935e-05
                 BoxConstraint: 1
                       Epsilon: 0.0549
    
    
    

    Mdl is a RegressionKernel model object representing a traditionally trained kernel regression model.

    Convert the traditionally trained kernel regression model to a model for incremental learning.

    IncrementalMdl = incrementalLearner(Mdl)
    IncrementalMdl = 
      incrementalRegressionKernel
    
                        IsWarm: 1
                       Metrics: [1x2 table]
             ResponseTransform: 'none'
        NumExpansionDimensions: 2048
                   KernelScale: 1
    
    
    

    IncrementalMdl is an incrementalRegressionKernel model object prepared for incremental learning.

    The incrementalLearner function initializes the incremental learner by passing model parameters to it, along with other information Mdl extracted from the training data. IncrementalMdl is warm (IsWarm is 1), which means that incremental learning functions can start tracking performance metrics.

    An incremental learner created from converting a traditionally trained model can generate predictions without further processing.

    Predict sales prices for all observations using both models.

    ttyfit = predict(Mdl,X);
    ilyfit = predict(IncrementalMdl,X);
    compareyfit = norm(ttyfit - ilyfit)
    compareyfit = 0
    

    The difference between the fitted values generated by the models is 0.

    To compute posterior class probabilities, specify a logistic regression incremental learner.

    Load the human activity data set. Randomly shuffle the data.

    load humanactivity
    n = numel(actid);
    rng(10) % For reproducibility
    idx = randsample(n,n);
    X = feat(idx,:);
    Y = actid(idx);

    For details on the data set, enter Description at the command line.

    Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by identifying whether the subject is moving (actid > 2).

    Y = Y > 2;

    Create an incremental logistic regression model for binary classification. Prepare it for predict by fitting the model to the first 10 observations.

    Mdl = incrementalClassificationKernel(Learner="logistic");
    initobs = 10;
    Mdl = fit(Mdl,X(1:initobs,:),Y(1:initobs));

    Mdl is an incrementalClassificationKernel model. All its properties are read-only.

    Simulate a data stream, and perform the following actions on each incoming chunk of 50 observations:

    1. Call predict to predict classification scores for the observations in the incoming chunk of data. The classification scores are posterior class probabilities for logistic regression learners.

    2. Call rocmetrics to compute the area under the ROC curve (AUC) using the classification scores, and store the result.

    3. Call fit to fit the model to the incoming chunk. Overwrite the previous incremental model with a new one fitted to the incoming observations.

    numObsPerChunk = 50;
    nchunk = floor((n - initobs)/numObsPerChunk);
    auc = zeros(nchunk,1);
    
    % Incremental learning
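    for j = 1:nchunk
        % Row indices of chunk j, offset past the initial observations
        ibegin = min(n,numObsPerChunk*(j-1) + 1 + initobs);
        iend   = min(n,numObsPerChunk*j + initobs);
        idx = ibegin:iend;
        % Predict before fitting, so the model has not yet seen this chunk
        [~,posteriorProb] = predict(Mdl,X(idx,:));
        mdlROC = rocmetrics(Y(idx),posteriorProb,Mdl.ClassNames);
        auc(j) = mdlROC.AUC(2); % AUC for the second class in Mdl.ClassNames
        Mdl = fit(Mdl,X(idx,:),Y(idx));
    end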

    Mdl is an incrementalClassificationKernel model object trained on all the data in the stream.

    Plot the AUC for the incoming chunks of data.

    plot(auc)
    xlim([0 nchunk])
    ylabel("AUC")
    xlabel("Iteration")

    The plot suggests that the classifier predicts moving subjects well during incremental learning.

    Input Arguments

    Mdl is the incremental learning model, specified as an incrementalClassificationKernel or incrementalRegressionKernel model object. You can create Mdl directly or by converting a supported, traditionally trained machine learning model using the incrementalLearner function. For more details, see the corresponding reference page.

    You must configure Mdl to predict labels for a batch of observations.

    • If Mdl is a converted, traditionally trained model, you can predict labels without any modifications.

    • Otherwise, you must fit Mdl to data using fit or updateMetricsAndFit.
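    The second case can be sketched as follows. The synthetic predictors, labels, and chunk sizes are illustrative assumptions and are not part of the examples on this page. A model created directly with incrementalClassificationKernel has not seen any data, so the sketch fits it to a first chunk before calling predict.

```matlab
rng(1) % For reproducibility
X = randn(200,3);          % synthetic predictor data (assumption)
Y = X(:,1) + X(:,2) > 0;   % synthetic logical labels (assumption)

% Create the model directly, then fit it to an initial chunk
Mdl = incrementalClassificationKernel();
Mdl = fit(Mdl,X(1:100,:),Y(1:100));

% The fitted model can now predict labels and scores for a new batch
[label,score] = predict(Mdl,X(101:200,:));
```

    Here label has one row per observation in the batch, and score has one column per class in Mdl.ClassNames.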

    X is the batch of predictor data, specified as a floating-point matrix of n observations and Mdl.NumPredictors predictor variables.

    Note

    predict supports only floating-point input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.
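    A minimal sketch of the encoding described in this note, using a small made-up data set (the variable names color and area and their values are hypothetical):

```matlab
% Hypothetical raw data: one categorical predictor and one numeric predictor
color = categorical(["red";"blue";"red";"green"]);
area  = [10.5; 7.2; 3.1; 8.8];

% One dummy-variable column per category, in the order given by categories(color)
D = dummyvar(color);

% Concatenate the dummy variables with the numeric predictor
X = [D area];
```

    The resulting X is a double matrix with one row per observation, which is the form predict expects.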

    Data Types: single | double

    Output Arguments

    label contains the predicted responses (labels), returned as a categorical or character array; floating-point, logical, or string vector; or cell array of character vectors with n rows. n is the number of observations in X, and label(j) is the predicted response for observation j.

    • For regression problems, label is a floating-point vector.

    • For classification problems, label has the same data type as the class names stored in Mdl.ClassNames. (The software treats string arrays as cell arrays of character vectors.)

      The predict function classifies an observation into the class yielding the highest score. For an observation with NaN scores, the function classifies the observation into the majority class, which makes up the largest proportion of the training labels.

    score contains the classification scores, returned as an n-by-2 floating-point matrix when Mdl is an incrementalClassificationKernel model. n is the number of observations in X. score(j,k) is the score for classifying observation j into class k. Mdl.ClassNames specifies the order of the classes.

    If Mdl.Learner is 'svm', predict returns raw classification scores. If Mdl.Learner is 'logistic', classification scores are posterior probabilities.

    More About

    Classification Score

    For kernel incremental learning models for binary classification, the raw classification score for classifying the observation x, a row vector, into the positive class (second class in Mdl.ClassNames) is

    f(x) = β0 + T(x)β,

    where

    • T(·) is a transformation of an observation for feature expansion.

    • β0 is the scalar bias.

    • β is the column vector of coefficients.

    The raw classification score for classifying x into the negative class (first class in Mdl.ClassNames) is -f(x). The software classifies observations into the class that yields the positive score.
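    The arithmetic above can be illustrated with made-up numbers. In this sketch, Tx stands for an already expanded observation T(x); its values, the coefficients beta and beta0, and the class names are all hypothetical and are not taken from a trained model.

```matlab
Tx    = [0.2 -0.5 0.1];     % hypothetical expanded observation T(x), a row vector
beta  = [1.5; 0.4; -2.0];   % hypothetical coefficient column vector
beta0 = 0.3;                % hypothetical scalar bias

f = beta0 + Tx*beta;        % raw score for the positive (second) class
rawScore = [-f f];          % columns ordered as [negative positive]

% Pick the class with the positive (largest) raw score
classNames = [false true];  % hypothetical class order
[~,k] = max(rawScore,[],2);
label = classNames(k);
```

    With these numbers f is about 0.2, so the positive class yields the positive score and is selected.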

    If the kernel classification model consists of logistic regression learners, then the software applies the "logit" score transformation to the raw classification scores.
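    As a rough sketch of this step, assume that the "logit" score transformation maps a raw score x to 1/(1 + exp(-x)). The raw scores below are made-up values. Because the two raw scores for an observation are -f(x) and f(x), the transformed scores in each row sum to 1 and can be read as posterior probabilities.

```matlab
f = [-2; 0; 3];                       % hypothetical raw positive-class scores
rawScore = [-f f];                    % columns ordered as [negative positive]

% Assumed form of the "logit" score transformation
posterior = 1./(1 + exp(-rawScore));

rowTotal = sum(posterior,2);          % each row sums to 1
```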

    Version History

    Introduced in R2022a