Reduce Dimensionality

Reduce dimensionality using Principal Component Analysis (PCA) in Live Editor

Since R2022b

Description

The Reduce Dimensionality Live Editor task enables you to interactively perform Principal Component Analysis (PCA). The task generates MATLAB® code for your live script and returns the resulting transformed data to the MATLAB workspace.

Using the Reduce Dimensionality Live Editor task, you can:

  • Determine the number of components required to explain a fixed percentage of the variance in the data, such as 95% or 99%.

  • Create a scree plot of explained variances of the principal components.

  • Create a scatter plot of two principal components.

  • Create a biplot of two principal components.

  • Obtain the transformed data.

For general information about Live Editor tasks, see Add Interactive Tasks to a Live Script.

Reduce Dimensionality task in Live Editor

Open the Task

To add the Reduce Dimensionality task to a live script, perform one of these actions:

  • On the Live Editor tab, select Task > Reduce Dimensionality; or on the Insert tab, select Task > Reduce Dimensionality.

  • In a code block in the live script, type a relevant keyword, such as pca or reduce. Select Reduce Dimensionality from the suggested command completions.

Examples

Load the cities data set.

load cities

In the File section of the Home tab, click New Live Script.

New Live Script button

In the Code section of the Live Editor tab, click Task to open the task gallery. Under Statistics and Machine Learning, click Reduce Dimensionality.

Select Input data > ratings.

Select ratings as the input data

Run the task by clicking the diagonal striped bar on the left of the Live Editor window, or by pressing Ctrl+Enter. By default, the task creates three plots.

Scree plot with four components to explain 95% of the variance

Scatter plot of two principal components

Biplot of two principal components

The software returns the transformed data to the workspace as a variable named transformedData (by default). You can edit this name.

Default name is transformedData
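For comparison, the same steps can be approximated from the command line. This is a minimal sketch, not the code that the task generates: it assumes the pca function from Statistics and Machine Learning Toolbox with its default settings, and the 95% threshold and the variable name numComponents are illustrative choices.

```matlab
% Sketch only: approximates the task workflow from the command line.
load cities                       % provides the numeric matrix ratings

% pca centers the columns by default. The fifth output is the percentage
% of the total variance explained by each principal component.
[coeff, score, ~, ~, explained] = pca(ratings);

% Smallest number of components whose cumulative explained variance
% reaches the threshold (95 is an assumed setting).
threshold = 95;
numComponents = find(cumsum(explained) >= threshold, 1);

% Keep only the leading columns of the scores as the reduced data.
transformedData = score(:, 1:numComponents);
```

Here transformedData has one row per city and numComponents columns. Whether the task applies additional options, such as standardizing the variables, is not stated on this page, so the component count from this sketch may differ from the one in the task's scree plot.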

Load the moore data set.

load moore

Convert the data into a table.

tbl = array2table(moore);

In the File section of the Home tab, click New Live Script.

New Live Script button

In the Code section of the Live Editor tab, click Task to open the task gallery. Under Statistics and Machine Learning, click Reduce Dimensionality.

Select Input data > tbl.

Select tbl as the input data

Run the task by clicking the diagonal striped bar on the left of the Live Editor window, or by pressing Ctrl+Enter. By default, the task creates three plots.

Scree plot requires only 2 components to explain almost all of the variance

Scatter plot of two principal components: component 1 from -4000 to 5000, and component 2 from -1500 to 2500

Biplot of two principal components showing three visible lines, two large lines at the southeast and northeast, and one small line near the x-axis.
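A rough command-line counterpart for table input is sketched below. It assumes that the table contains only numeric variables, converts it with table2array before calling pca, and keeps two components to match the scree plot described above. The output variable names PC1 and PC2 are hypothetical, and the format in which the task itself returns data for table input is not documented here.

```matlab
% Sketch only: PCA on table data from the command line.
load moore
tbl = array2table(moore);

% Convert the all-numeric table to a matrix for pca.
X = table2array(tbl);
[~, score, ~, ~, explained] = pca(X);

% Keep two components (an assumed choice based on the scree plot).
numComponents = 2;

% Return the reduced data as a table; PC1 and PC2 are hypothetical names.
transformedData = array2table(score(:, 1:numComponents), ...
    'VariableNames', {'PC1', 'PC2'});
```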

Parameters

Specify the data to reduce by selecting a variable from the available workspace variables. The variable can be a numeric matrix or a table.

Specify the criterion for reducing the dimensionality of the data.

  • Explained variance (%): Specify the percentage of variance to explain, a nonnegative scalar from 0 through 100. If you specify 100, then the result retains all principal components.

  • Number of components: Specify the number of components to retain, an integer from 1 through the number of columns in the data. If you specify the number of columns in the data, then the result retains all principal components.

Regardless of the criterion you specify, you can plot all the principal components. The reduction criterion changes only the number of columns in the returned, transformed data; the plots can use all the transformed data before reduction.
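The two criteria can be illustrated with a short sketch. It uses synthetic data and the pca function. The variable names (targetPercent, kVariance, kCount) are illustrative, and the logic is an assumption about how such a criterion could be applied, not the task's own implementation.

```matlab
% Sketch only: two ways to choose how many score columns to keep.
rng(0)                                       % reproducible synthetic data
X = randn(100, 5) * diag([5 3 1 0.5 0.1]);   % columns with unequal variance

[~, score, ~, ~, explained] = pca(X);

% Criterion 1: explained variance (%).
targetPercent = 95;
kVariance = find(cumsum(explained) >= targetPercent, 1);
if isempty(kVariance)
    % Rounding can leave the cumulative sum just below 100,
    % so fall back to keeping every component.
    kVariance = numel(explained);
end
reducedByVariance = score(:, 1:kVariance);

% Criterion 2: fixed number of components.
kCount = 2;
reducedByCount = score(:, 1:kCount);

% The full score matrix is unchanged, so plots can still use every
% component regardless of the criterion.
```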

To display plots of the principal components, select from the available options:

  • Select Scree plot to display the percentage of the variance explained by each principal component as a bar chart. The cumulative percentages appear as a line plot above the bars. The task uses the bar function to create the bar chart and the plot function to plot the cumulative percentages.

  • Select 2D scatter plot to display the principal components of the data in a 2D scatter plot. The task uses either the scatter function or the gscatter function to create the scatter plot, depending on whether you specify a grouping variable.

  • Select 2D biplot to plot the data as a 2D biplot. The task uses the biplot function to create the biplot.
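The three plot types can be reproduced approximately with the functions named above. The following sketch assumes the cities data set and no grouping variable, so it uses scatter rather than gscatter. The axis labels and line style are illustrative and may not match the task's output.

```matlab
% Sketch only: scree plot, 2D scatter plot, and 2D biplot.
load cities                       % provides ratings and categories
[coeff, score, ~, ~, explained] = pca(ratings);

% Scree plot: bars for each component, line for the cumulative percentage.
figure
bar(explained)
hold on
plot(cumsum(explained), '-o')
hold off
xlabel('Principal component')
ylabel('Variance explained (%)')

% 2D scatter plot of the first two principal components.
figure
scatter(score(:, 1), score(:, 2))
xlabel('Component 1')
ylabel('Component 2')

% 2D biplot: variable loadings as vectors, observations as points.
figure
biplot(coeff(:, 1:2), 'Scores', score(:, 1:2), ...
    'VarLabels', cellstr(categories))
```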

Tips

  • By default, the Reduce Dimensionality task does not run automatically when you modify the task parameters. To have the task run automatically after any change, select the Autorun button at the top right of the task. If your data set is large, enabling this option can cause the task to run slowly.

Version History

Introduced in R2022b
