MATLAB's sequentialfs.m provides a fast, but arguably sub-optimal, feature selection algorithm for linear or quadratic discriminant models. This submission provides a generally slower, but better-optimized, forward selection algorithm. It sequentially selects the predictors/features that improve cross-validated classification accuracy, using a cross-validation method of the user's choosing. The function provides the same cross-validation options as sequentialfs.m (Holdout, KFold, Leaveout), plus an additional customizable option, 'sets' (see the help section within the function).
If two or more candidate features improve the model's classification accuracy to the same degree (i.e., a "tie"), the algorithm proceeds to the next "depth" of candidate features, separately for each of the tied features. It keeps descending until one of the tied features, in combination with the features added at the greater depths, unambiguously yields the best accuracy. The user can specify a maximum depth to search for "tie-breakers"; by default the depth is unlimited, although in practice the search usually goes no deeper than 3 to 4 levels. If the specified maximum depth is reached while the candidates are still tied, the algorithm greedily selects the first of the tied features in order of feature entry. If, at any point, additional features add no improvement to the model's classification accuracy, optimization ceases. If a tie persists after optimization ends, the first of the tied features in order of feature entry is selected.
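The selection and tie-breaking logic described above can be sketched roughly as follows. This is an illustrative Python sketch, not the submission's MATLAB code: `score` is a hypothetical callable standing in for cross-validated accuracy, `TOY_ACCURACY` is made-up data, "order of feature entry" is assumed to mean the lowest feature index, ties met inside the look-ahead are assumed to be resolved by taking the first best candidate, and the depth is finite here, whereas the submission's default is unlimited.

```python
def break_tie(score, selected, tied, n_features, max_depth):
    """Pick one of the tied features by looking ahead up to max_depth steps."""
    # One candidate path per tied feature (assumption: paths are extended greedily).
    paths = {f: selected + [f] for f in tied}
    for _ in range(max_depth):
        look = {}
        for f, path in paths.items():
            rest = [c for c in range(n_features) if c not in path]
            if not rest:
                look[f] = (score(path), path)
                continue
            # max() returns the first best candidate, so deeper ties fall to the lowest index.
            nxt = max(rest, key=lambda c: score(path + [c]))
            look[f] = (score(path + [nxt]), path + [nxt])
        top = max(s for s, _ in look.values())
        paths = {f: p for f, (s, p) in look.items() if s == top}
        if len(paths) == 1:
            break
    # Still tied (or depth exhausted): take the earliest feature.
    return min(paths)


def forward_select(score, n_features, max_depth=3):
    """Forward selection that stops as soon as no candidate improves the score."""
    selected = []
    best = score(selected)
    while True:
        remaining = [f for f in range(n_features) if f not in selected]
        if not remaining:
            break
        scores = {f: score(selected + [f]) for f in remaining}
        top = max(scores.values())
        if top <= best:
            break  # no improvement, so optimization ceases
        tied = [f for f in remaining if scores[f] == top]
        if len(tied) > 1:
            choice = break_tie(score, selected, tied, n_features, max_depth)
        else:
            choice = tied[0]
        selected.append(choice)
        best = top
    return selected, best


# Made-up accuracies for every subset of three features (illustration only).
TOY_ACCURACY = {
    frozenset(): 0.5,
    frozenset({0}): 0.7, frozenset({1}): 0.7, frozenset({2}): 0.6,
    frozenset({0, 1}): 0.75, frozenset({0, 2}): 0.72, frozenset({1, 2}): 0.8,
    frozenset({0, 1, 2}): 0.8,
}


def toy_score(features):
    return TOY_ACCURACY[frozenset(features)]
```

In this toy table, features 0 and 1 tie at the first step. With look-ahead, feature 1 wins because its best continuation reaches 0.8 rather than 0.75, so the model stops at two features; with `max_depth=0` the tie falls to feature 0 and all three features are needed to reach the same accuracy.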
Bootstrapping is now available to check each selected feature for significance and to generate confidence intervals for feature coefficients. Currently, this option can only be used for two-category classification problems. A specified number of bootstrapped samples (resamples with replacement) are generated, and a discriminant model is fitted to each using the selected features. The 95% confidence interval for each feature is taken as the 2.5th and 97.5th percentiles of its sorted bootstrap distribution of coefficients. P-values are calculated for the two-tailed test that each feature's bootstrap distribution differs significantly from 0. Note: features are z-scored within each bootstrap sample so that coefficients are more comparable across features.
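The percentile interval and two-tailed p-value can be illustrated with a small Python sketch. Several pieces here are assumptions rather than the submission's method: the "coefficient" is a stand-in (the difference in class means of the z-scored feature) instead of a fitted discriminant coefficient, resampling is stratified by class so that both classes always appear, the p-value is computed as twice the smaller tail fraction on either side of 0, and the names `bootstrap_feature`, `x_demo` and `y_demo` are hypothetical.

```python
import random
import statistics


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def mean_difference(x, y):
    """Stand-in coefficient: class-1 mean minus class-0 mean of the z-scored feature."""
    z = zscore(x)
    ones = [v for v, c in zip(z, y) if c == 1]
    zeros = [v for v, c in zip(z, y) if c == 0]
    return statistics.fmean(ones) - statistics.fmean(zeros)


def bootstrap_feature(x, y, n_boot=2000, seed=0):
    """Return (lower, upper, p) from a percentile bootstrap of the stand-in coefficient."""
    rng = random.Random(seed)
    idx0 = [i for i, c in enumerate(y) if c == 0]
    idx1 = [i for i, c in enumerate(y) if c == 1]
    stats = []
    for _ in range(n_boot):
        # Assumption: resample with replacement within each class (stratified).
        sample = rng.choices(idx0, k=len(idx0)) + rng.choices(idx1, k=len(idx1))
        # z-scoring happens inside mean_difference, i.e. within each bootstrap sample.
        stats.append(mean_difference([x[i] for i in sample], [y[i] for i in sample]))
    stats.sort()
    lower = stats[int(0.025 * (n_boot - 1))]  # approximately the 2.5th percentile
    upper = stats[int(0.975 * (n_boot - 1))]  # approximately the 97.5th percentile
    frac_nonpos = sum(s <= 0 for s in stats) / n_boot
    frac_nonneg = sum(s >= 0 for s in stats) / n_boot
    p = min(1.0, 2 * min(frac_nonpos, frac_nonneg))
    return lower, upper, p


# Made-up, clearly separated two-class data (illustration only).
x_demo = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 3.0, 3.2, 2.8, 3.1, 2.9, 3.3]
y_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
lower, upper, p = bootstrap_feature(x_demo, y_demo)
```

Because every class-1 value in the toy data exceeds every class-0 value, each bootstrap coefficient is positive, so the interval excludes 0 and the p-value is small.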
Elliot Layden (2023). selectFeatures (https://www.mathworks.com/matlabcentral/fileexchange/65716-selectfeatures), MATLAB Central File Exchange.
Platform Compatibility: Windows, macOS, Linux
Fixed bootstrapping waitbar issue
Added a bootstrapping option to calculate 95% confidence intervals and p-values of selected features. The median or mean coefficient from each feature's bootstrap distribution could be taken as a more robust estimate of effect size.
Updated help info
Fixed history output
Corrected verbose output