What Is Statistics and Machine Learning Toolbox?
Statistics and Machine Learning Toolbox™ provides tools for accessing, preprocessing, and visualizing data; extracting features; training and optimizing models; and preparing models for deployment.
The typical workflow begins with accessing, cleaning, and preprocessing your data in preparation for extracting features. The toolbox supports all widely used classification, regression, and clustering algorithms, and it makes the challenging parts of model building easier with:
• Point-and-click apps for training and comparing models
• Automatic hyperparameter tuning and feature selection for optimizing model performance
• Scaling processing to big data and clusters using the same code
• Faster execution than popular open-source tools
With MATLAB Coder™, you can automatically generate C/C++ code from machine learning models for use in embedded and high-performance applications.
Published: 1 May 2021
Statistics and Machine Learning Toolbox provides tools for discovering patterns and selecting features, training classification or regression models with apps, and deploying to enterprise and embedded systems. In this example, a regression model predicts future loads in electric grids using multiple sources of data, including timestamped historical electric load data and weather data. You can start exploring with descriptive statistics and visualizations, including box plots to compare means and variances and dendrograms to reveal clustering and structure.
After preprocessing your data in MATLAB, you can identify which variables to select as features based on high correlations between predictors and the response. Use principal component analysis to identify transformed features that account for most of the variability in the data, or use automated feature selection methods.
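To make the correlation-based selection idea concrete, here is a minimal sketch in plain Python (not MATLAB, and not the toolbox's API). It ranks candidate predictors by the absolute Pearson correlation with the response. The predictor names, the sample values, and the helper functions `pearson` and `rank_features` are all hypothetical and exist only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def rank_features(predictors, response):
    """Return (name, |correlation|) pairs, strongest linear association first."""
    scores = {
        name: abs(pearson(values, response))
        for name, values in predictors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up values standing in for weather predictors and electric load.
predictors = {
    "temperature": [10.0, 15.0, 20.0, 25.0, 30.0],
    "humidity": [60.0, 55.0, 65.0, 50.0, 62.0],
    "wind_speed": [3.0, 1.0, 4.0, 1.0, 5.0],
}
load = [100.0, 120.0, 140.0, 160.0, 180.0]

ranking = rank_features(predictors, load)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A ranking like this only captures linear, one-variable-at-a-time relationships, which is one reason to also consider principal component analysis or automated feature selection methods.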
With the Classification Learner and Regression Learner apps, you can interactively build predictive classification or regression models, including nearest neighbor, decision trees, and shallow neural networks. Optimize hyperparameters, compare results from multiple models, validate with cross-validation or a separate test data set, and visualize performance with confusion matrices or ROC curves. Many of the toolbox algorithms work with out-of-memory data without requiring any code changes. Once you've settled on a machine learning model, you can deploy it to IT systems using MATLAB Compiler or generate standalone C code for embedded devices with MATLAB Coder.
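For readers unfamiliar with the confusion matrices mentioned above, the following is a small sketch in plain Python (again, not the toolbox's API) of how one is tallied from true and predicted class labels. The label values and the `confusion_matrix` helper are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Made-up labels for a two-class problem.
true_labels = ["high", "low", "low", "high", "low", "high"]
predicted = ["high", "low", "high", "high", "low", "low"]

labels, matrix = confusion_matrix(true_labels, predicted)

# Correct predictions sit on the diagonal of the matrix.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(true_labels)

print(labels)
for row in matrix:
    print(row)
```

Off-diagonal entries show which classes the model confuses with each other, which a single accuracy figure hides.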
You can incrementally update linear models with new data and also update embedded models without regenerating the prediction code. Statistics and Machine Learning Toolbox offers a variety of statistical functions, including hypothesis tests, ANOVA, and industrial statistics. To get started, refer to an example or the information on the product page, or download a free trial below.