subset
Create new ensemble datastore from subset of existing ensemble datastore
Since R2021a
Syntax
Description
The subset function allows you to extract a representative ensemble data set from a large ensemble datastore. subset is especially useful when your source data is too large to process and extract features from easily, or to import and experiment with in Diagnostic Feature Designer.
subset provides the following options, which you can combine to create the reduced data set:
By index — Specify an index vector to extract the specific ensemble members you want.
By number of members in class or ensemble — Specify the number of members to select from each condition class or from the entire ensemble. You can also specify the number of members based on the size of the smallest or largest class. This option allows you to not only reduce the size of the ensemble, but to balance the classes in the ensemble for more effective model development.
By order — Specify the order in which members are selected, such as from the start of the original data or randomly.
By holdout — Partition selected data into training and test ensembles.
Specify Subset by Index
sens = subset(ens,idx) creates a new ensemble datastore sens from a subset of the existing ensemble datastore ens by extracting the ensemble members that correspond to the indices in idx.
Use this syntax when you want to perform ensemble operations on a specific ensemble member or group of ensemble members. For example, you can use this syntax to:
Extract only ensemble members with a specific fault condition.
Extract a single ensemble member with specific characteristics to isolate and explore member behavior.
Specify which members you want to extract using the index vector idx. You can then operate on your extracted ensemble using the same techniques that you use for any data ensemble.
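As a rough sketch of how idx might be built, the following assumes a hypothetical label vector faultCode with one entry per ensemble member, in member order. It also assumes that an ensemble datastore named ens may already exist in the workspace. Both names are illustrative and are not part of the subset interface.

```matlab
% Hypothetical labels, one per ensemble member, in member order.
faultCode = [0 1 0 2 1 0 1 0];

% Indices of the members whose label matches the fault condition of interest.
idx = find(faultCode == 1);

% Assumption: ens is an existing ensemble datastore with numel(faultCode) members.
% The call is skipped when no such variable exists.
if exist("ens","var")
    sens = subset(ens,idx);   % new ensemble that contains only the selected members
end
```

Passing a scalar idx in the same way would extract a single member for closer inspection.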
Specify Subset by Class
sens = subset(ens,ConditionVariable=cvName,NumMembers=numMembers) creates a subset that contains numMembers members in each class of the condition variable cvName.
sens = subset(ens,ConditionVariable=cvName,ImbalancedClass="largest",SampleSize=sampleSize) reduces the size of only the largest class. sampleSize specifies the reduced size of the largest class by decimal percentage or number of members. This syntax is particularly useful when you have much more data representing healthy equipment than faulty equipment.
sens = subset(___,SelectionOrder=selectionOrder) specifies which members subset retains when reducing ensemble size. You can use this syntax with any of the input-argument combinations in the Specify Subset by Class syntax group.
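To make the class-based syntaxes more concrete, this sketch first counts the members per class for a hypothetical label vector, then shows the two class-based calls. The label values, the condition-variable name "Health", and the sizes 3 and 0.5 are assumptions chosen for illustration. The subset calls run only if an ensemble datastore ens with a matching condition variable exists in the workspace.

```matlab
% Hypothetical condition labels: many healthy members, few faulty ones.
health = categorical(["healthy" "healthy" "healthy" "healthy" "healthy" ...
                      "healthy" "faulty" "faulty" "faulty"]);

classNames  = categories(health);   % class names, sorted alphabetically
classCounts = countcats(health);    % members per class, in the order of classNames

% Assumption: ens exists and has a condition variable named "Health".
if exist("ens","var")
    % Keep 3 members in every class.
    sensEqual = subset(ens,ConditionVariable="Health",NumMembers=3);

    % Reduce only the largest class; 0.5 is read here as "half of that class",
    % following the description of sampleSize as a decimal percentage.
    sensTrim = subset(ens,ConditionVariable="Health", ...
        ImbalancedClass="largest",SampleSize=0.5);
end
```

Counting the classes first is a cheap way to choose a sensible numMembers or sampleSize before reducing the ensemble.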
Specify Subset for Unlabeled Data
sens = subset(ens,NumMembers=numMembers) extracts a subset that contains numMembers members. Use this syntax when the data contains no labels, or classes, that can be used as condition values, or when you want to operate on the ensemble as a whole without considering class distribution.
sens = subset(ens,NumMembers=numMembers,SelectionOrder=selectionOrder) specifies which members subset retains when reducing the size of the ensemble.
Return Subset and Remainder Ensembles and Indices
Partition Ensemble into Training and Test Sets
[trainsub,trainidx,testsub,testidx] = subset(ens,Holdout=holdout) uses the function cvpartition to create a random partition for holdout validation. holdout can be an integer or a percentage expressed as a fraction. When holdout is an integer, cvpartition randomly selects holdout observations for the test set. When holdout is a value in the range (0,1), cvpartition randomly selects approximately holdout*n observations, where n is the number of members in ens.
This syntax returns the training set and indices in trainsub and trainidx, respectively, and the test set and indices in testsub and testidx.
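The two forms of holdout can be explored with cvpartition directly. The following is a minimal sketch that assumes Statistics and Machine Learning Toolbox is available and uses a hypothetical member count n. It illustrates the partition behavior described above and is not necessarily how subset calls cvpartition internally.

```matlab
n = 10;      % hypothetical number of ensemble members
rng(0);      % fix the seed so the random split is repeatable

% Integer holdout: exactly 3 members go to the test set.
cInt = cvpartition(n,"Holdout",3);
testidxInt  = find(test(cInt));       % indices of the test members
trainidxInt = find(training(cInt));   % indices of the remaining training members

% Fractional holdout: about 0.2*n members go to the test set.
cFrac = cvpartition(n,"Holdout",0.2);
numTestFrac = cFrac.TestSize;

% Assumption: ens is an existing ensemble datastore.
% The call is skipped when no such variable exists.
if exist("ens","var")
    [trainsub,trainidx,testsub,testidx] = subset(ens,Holdout=0.2);
end
```

Because the partition is random, setting the random seed before the call is one way to make the training and test split reproducible.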