describe
Description
describe(Transformer) prints the description of the features generated by Transformer. Create the FeatureTransformer object Transformer by using the gencfeatures or genrfeatures function.

describe(Transformer,Index) prints the description of the features identified by Index.
Examples
Generate features from a table of predictor data by using gencfeatures. Inspect the generated features by using the describe object function.
Read power outage data into the workspace as a table. Remove observations with missing values, and display the first few rows of the table.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
head(Tbl)
    Region          OutageTime          Loss      Customers     RestorationTime     Cause
    _____________   ________________    ______    __________    ________________    ___________________
    {'SouthWest'}   2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    {'winter storm'   }
    {'SouthEast'}   2003-02-07 21:15    289.4     1.4294e+05    2003-02-17 08:14    {'winter storm'   }
    {'West'     }   2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    {'equipment fault'}
    {'MidWest'  }   2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    {'severe storm'   }
    {'West'     }   2003-06-18 02:49    0         0             2003-06-18 10:54    {'attack'         }
    {'NorthEast'}   2003-07-16 16:23    239.93    49434         2003-07-17 01:12    {'fire'           }
    {'MidWest'  }   2004-09-27 11:09    286.72    66104         2004-09-27 16:37    {'equipment fault'}
    {'SouthEast'}   2004-09-05 17:48    73.387    36073         2004-09-05 20:46    {'equipment fault'}
Some of the variables, such as OutageTime and RestorationTime, have data types that are not supported by classifier training functions like fitcensemble.

Generate 25 features from the predictors in Tbl that can be used to train a bagged ensemble. Specify the Region table variable as the response.
Transformer = gencfeatures(Tbl,"Region",25,TargetLearner="bag")
Transformer = 
  FeatureTransformer with properties:

                     Type: 'classification'
            TargetLearner: 'bag'
    NumEngineeredFeatures: 22
      NumOriginalFeatures: 3
         TotalNumFeatures: 25
The Transformer object contains information about the generated features and the transformations used to create them.

To better understand the generated features, use the describe object function.
Info = describe(Transformer)
Info=25×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ___________________________ _________________________________________________________________________________________________________________
Loss Numeric true Loss ""
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
sdn(OutageTime) Numeric false OutageTime "Serial date number from 01-Feb-2002 12:18:00"
woe3(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Weight of Evidence (positive class = SouthEast)"
doy(OutageTime) Numeric false OutageTime "Day of the year"
year(OutageTime) Numeric false OutageTime "Year"
kmd1 Numeric false Loss, Customers "Euclidean distance to centroid 1 (kmeans clustering with k = 10)"
kmd5 Numeric false Loss, Customers "Euclidean distance to centroid 5 (kmeans clustering with k = 10)"
quarter(OutageTime) Numeric false OutageTime "Quarter of the year"
woe2(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Weight of Evidence (positive class = NorthEast)"
year(RestorationTime) Numeric false RestorationTime "Year"
month(OutageTime) Numeric false OutageTime "Month of the year"
Loss.*Customers Numeric false Loss, Customers "Loss .* Customers"
tods(OutageTime) Numeric false OutageTime "Time of the day in seconds"
⋮
The Info table indicates the following:

- The first three generated features are original to Tbl, although the software converts the original Cause variable to a categorical variable c(Cause).
- The OutageTime and RestorationTime variables are not included as generated features because they are datetime variables, which cannot be used to train a bagged ensemble model. However, the software derives many of the generated features from these variables, such as the fourth feature RestorationTime-OutageTime.
- Some generated features are a combination of multiple transformations. For example, the software generates the sixth feature woe3(c(Cause)) by converting the Cause variable to a categorical variable and then calculating the Weight of Evidence values for the resulting variable.
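To make the idea behind woe3(c(Cause)) concrete, the following is a minimal sketch of a one-vs-rest Weight of Evidence encoding on the first eight rows of head(Tbl). It uses a common textbook definition, WoE(category) = ln(P(category | positive class) / P(category | other classes)), with SouthEast as the positive class. The smoothing constant alpha is a hypothetical choice added only to avoid log(0); the software's exact procedure may differ, so these values are not guaranteed to match the woe3(c(Cause)) column.

```matlab
% Region and Cause values copied from the first eight rows of head(Tbl)
region = categorical({'SouthWest';'SouthEast';'West';'MidWest';'West';'NorthEast';'MidWest';'SouthEast'});
cause  = categorical({'winter storm';'winter storm';'equipment fault';'severe storm'; ...
                      'attack';'fire';'equipment fault';'equipment fault'});

isPos = region == "SouthEast";      % one-vs-rest: SouthEast is the positive class
alpha = 0.5;                        % hypothetical smoothing constant (assumption, avoids log(0))
K = numel(categories(cause));       % number of Cause categories

nPos = countcats(cause(isPos));     % per-category counts among positive observations
nNeg = countcats(cause(~isPos));    % per-category counts among all other observations

% Smoothed share of each category within the positive and the negative group
pPos = (nPos + alpha) / (sum(isPos)  + alpha*K);
pNeg = (nNeg + alpha) / (sum(~isPos) + alpha*K);
woeByCategory = log(pPos ./ pNeg);  % one WoE value per category

% Replace each observation's category with that category's WoE value
woe = woeByCategory(double(cause));
```

Categories that occur relatively more often with SouthEast outages (here, winter storm) get positive values, and categories seen only with other regions (here, attack) get negative values.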
Generate features from a table of predictor data by using genrfeatures. Inspect the generated features by using the describe object function.
Read power outage data into the workspace as a table. Remove observations with missing values, and display the first few rows of the table.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
head(Tbl)
    Region          OutageTime          Loss      Customers     RestorationTime     Cause
    _____________   ________________    ______    __________    ________________    ___________________
    {'SouthWest'}   2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    {'winter storm'   }
    {'SouthEast'}   2003-02-07 21:15    289.4     1.4294e+05    2003-02-17 08:14    {'winter storm'   }
    {'West'     }   2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    {'equipment fault'}
    {'MidWest'  }   2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    {'severe storm'   }
    {'West'     }   2003-06-18 02:49    0         0             2003-06-18 10:54    {'attack'         }
    {'NorthEast'}   2003-07-16 16:23    239.93    49434         2003-07-17 01:12    {'fire'           }
    {'MidWest'  }   2004-09-27 11:09    286.72    66104         2004-09-27 16:37    {'equipment fault'}
    {'SouthEast'}   2004-09-05 17:48    73.387    36073         2004-09-05 20:46    {'equipment fault'}
Some of the variables, such as OutageTime and RestorationTime, have data types that are not supported by regression model training functions like fitrensemble.

Generate 25 features from the predictors in Tbl that can be used to train a bagged ensemble. Specify the Loss table variable as the response.
rng("default") % For reproducibility
Transformer = genrfeatures(Tbl,"Loss",25,TargetLearner="bag")
Transformer = 
  FeatureTransformer with properties:

                     Type: 'regression'
            TargetLearner: 'bag'
    NumEngineeredFeatures: 22
      NumOriginalFeatures: 3
         TotalNumFeatures: 25
The Transformer object contains information about the generated features and the transformations used to create them.

To better understand the generated features, use the describe object function.
Info = describe(Transformer)
Info=25×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ___________________________ ___________________________________________________________________
c(Region) Categorical true Region "Variable of type categorical converted from a cell data type"
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
kmd2 Numeric false Customers "Euclidean distance to centroid 2 (kmeans clustering with k = 10)"
kmd1 Numeric false Customers "Euclidean distance to centroid 1 (kmeans clustering with k = 10)"
kmd4 Numeric false Customers "Euclidean distance to centroid 4 (kmeans clustering with k = 10)"
kmd5 Numeric false Customers "Euclidean distance to centroid 5 (kmeans clustering with k = 10)"
kmd9 Numeric false Customers "Euclidean distance to centroid 9 (kmeans clustering with k = 10)"
cos(Customers) Numeric false Customers "cos( )"
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
kmd6 Numeric false Customers "Euclidean distance to centroid 6 (kmeans clustering with k = 10)"
kmi Categorical false Customers "Cluster index encoding (kmeans clustering with k = 10)"
kmd7 Numeric false Customers "Euclidean distance to centroid 7 (kmeans clustering with k = 10)"
kmd3 Numeric false Customers "Euclidean distance to centroid 3 (kmeans clustering with k = 10)"
kmd10 Numeric false Customers "Euclidean distance to centroid 10 (kmeans clustering with k = 10)"
hour(RestorationTime) Numeric false RestorationTime "Hour of the day"
⋮
The first three generated features are original to Tbl, although the software converts the original Region and Cause variables to categorical variables.
Info(1:3,:) % describe(Transformer,1:3)
ans=3×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ______________ ______________________________________________________________
c(Region) Categorical true Region "Variable of type categorical converted from a cell data type"
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
The OutageTime and RestorationTime variables are not included as generated features because they are datetime variables, which cannot be used to train a bagged ensemble model. However, the software derives some generated features from these variables, such as the tenth feature RestorationTime-OutageTime.
Info(10,:) % describe(Transformer,10)
ans=1×4 table
Type IsOriginal InputVariables Transformations
_______ __________ ___________________________ ________________________________________________________________
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
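The datetime-derived features listed in the Info tables of both examples (elapsed time, year, quarter, month, day of year, time of day, and hour) correspond to standard datetime functions. The following sketch recomputes several of them by hand for the first three rows of head(Tbl). It illustrates what the transformation descriptions mean and is not the software's internal code; features such as sdn(OutageTime) are left out because their exact reference point is not spelled out here.

```matlab
% First three OutageTime and RestorationTime values copied from head(Tbl)
outageTime = datetime({'2002-02-01 12:18';'2003-02-07 21:15';'2004-04-06 05:44'}, ...
    InputFormat="yyyy-MM-dd HH:mm");
restorationTime = datetime({'2002-02-07 16:50';'2003-02-17 08:14';'2004-04-06 06:10'}, ...
    InputFormat="yyyy-MM-dd HH:mm");

elapsedSec = seconds(restorationTime - outageTime); % RestorationTime-OutageTime, in seconds
yr   = year(outageTime);                            % year(OutageTime)
qtr  = quarter(outageTime);                         % quarter(OutageTime)
mon  = month(outageTime);                           % month(OutageTime)
doy  = day(outageTime,"dayofyear");                 % doy(OutageTime)
tods = seconds(timeofday(outageTime));              % tods(OutageTime), seconds since midnight
hr   = hour(restorationTime);                       % hour(RestorationTime)
```

For example, the first outage lasts 6 days, 4 hours, and 32 minutes, so its elapsed time is 534720 seconds.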
Some generated features are a combination of multiple transformations. For example, the software generates the nineteenth feature fenc(c(Cause)) by converting the Cause variable to a categorical variable with 10 categories and then calculating the frequency of each category.
Info(19,:) % describe(Transformer,19)
ans=1×4 table
Type IsOriginal InputVariables Transformations
_______ __________ ______________ ____________________________________________________________________________________________________________
fenc(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Frequency encoding (number of levels = 10)"
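A minimal sketch of frequency encoding follows, applied to the Cause values from the first eight rows of head(Tbl) (5 categories here, rather than the 10 in the full data). It assumes that "frequency" means the raw count of each category; the software might normalize the counts differently, so treat this as an illustration of the idea rather than a reproduction of fenc(c(Cause)).

```matlab
% Cause values copied from the first eight rows of head(Tbl)
cause = {'winter storm';'winter storm';'equipment fault';'severe storm'; ...
         'attack';'fire';'equipment fault';'equipment fault'};

c = categorical(cause);            % cell array -> categorical, like c(Cause)
counts = countcats(c);             % number of observations in each category
fencSketch = counts(double(c));    % replace each value with its category's count
```

Because categories are sorted alphabetically, double(c) gives each observation's category index, which selects the matching count.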
Input Arguments
Transformer
Feature transformer, specified as a FeatureTransformer object.

Index
Features to describe, specified as a numeric or logical vector indicating the positions of the features, or as a string array or cell array of character vectors indicating the names of the features.

Example: 1:12

Data Types: single | double | logical | string | cell
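The following sketch shows the three ways of specifying Index, reusing the genrfeatures example above. It assumes the same random seed reproduces the feature order shown in that example (so that positions 10 and 19 and the two names exist), and it assumes a logical Index has one element per generated feature.

```matlab
% Recreate the Transformer from the genrfeatures example
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
rng("default") % For reproducibility
Transformer = genrfeatures(Tbl,"Loss",25,TargetLearner="bag");

% Index as numeric positions
byPosition = describe(Transformer,[1 10 19]);

% Index as feature names (string array)
byName = describe(Transformer,["RestorationTime-OutageTime","fenc(c(Cause))"]);

% Index as a logical mask selecting the engineered features
Info = describe(Transformer);
byMask = describe(Transformer,~Info.IsOriginal);
```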
Output Arguments
Info
Feature descriptions, returned as a table. Each row corresponds to a generated feature, and each column provides the following information.

Column Name | Description |
---|---|
Type | Indicates the data type of the feature, either numeric or categorical |
IsOriginal | Indicates whether the feature is an original feature (true) or an engineered feature (false) |
InputVariables | Indicates the original features used to generate the feature |
Transformations | Describes the transformations used to generate the feature, in the order they are applied. For more information, see Feature Transformations. |
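Because Info is an ordinary table, its columns can be used to filter features. The following sketch builds a small stand-in for Info from rows of the gencfeatures example so that it runs on its own; storing Type as categorical is an assumption, which is why the code converts columns with string() before comparing. With a real transformer, use Info = describe(Transformer) instead.

```matlab
% Hypothetical stand-in for Info = describe(Transformer), rows copied from the gencfeatures example
featureNames = {'Loss'; 'c(Cause)'; 'RestorationTime-OutageTime'; 'woe3(c(Cause))'; 'kmd1'};
Type = categorical(["Numeric"; "Categorical"; "Numeric"; "Numeric"; "Numeric"]);
IsOriginal = [true; true; false; false; false];
InputVariables = ["Loss"; "Cause"; "OutageTime, RestorationTime"; "Cause"; "Loss, Customers"];
Transformations = [""; ...
    "Variable of type categorical converted from a cell data type"; ...
    "Elapsed time in seconds between OutageTime and RestorationTime"; ...
    "Variable of type categorical converted from a cell data type -> Weight of Evidence (positive class = SouthEast)"; ...
    "Euclidean distance to centroid 1 (kmeans clustering with k = 10)"];
Info = table(Type,IsOriginal,InputVariables,Transformations,RowNames=featureNames);

% Engineered (non-original) features only
engineered = Info(~Info.IsOriginal,:);

% Names of numeric features; string() works for categorical, string, or cellstr columns
numericNames = Info.Properties.RowNames(string(Info.Type) == "Numeric");

% Features derived from the original Cause variable
fromCause = Info(contains(string(Info.InputVariables),"Cause"),:);

% Features built from a chain of transformations (steps are joined by "->")
chained = Info(contains(string(Info.Transformations),"->"),:);
```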
Algorithms
This table provides additional information on some of the more complex feature transformation descriptions in Info.Transformations.
Sample Feature Name | Sample Transformation Description in Info | Additional Information |
---|---|---|
eb4(Variable) | Equal-width binning (number of bins = 4) | The software splits the Variable values into 4 bins of equal width. The resulting feature is a categorical variable. |
fenc(Variable) | Frequency encoding (number of levels = 10) | The software calculates the frequency of the 10 categories (or levels) in Variable. In the resulting feature, the software replaces each categorical value with the corresponding category frequency, creating a numeric variable. |
kmc1 | Centroid encoding (component #1) (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature corresponds to an observation and is the 1st component of the cluster centroid associated with that observation. The resulting feature is a numeric variable. |
kmd4 | Euclidean distance to centroid 4 (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature is the Euclidean distance from the corresponding observation to the centroid of the 4th cluster. The resulting feature is a numeric variable. |
kmi | Cluster index encoding (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature is the cluster index for the corresponding observation. The resulting feature is a categorical variable. |
q50(Variable) | Equiprobable binning (number of bins = 50) | The software splits the Variable values into 50 bins of equal probability. The resulting feature is a categorical variable. |
woe5(Variable) | Weight of Evidence (positive class = Class5) | This transformation is available for classification problems only. The software performs the following steps to create the resulting feature: |
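Several rows of the table above map onto standard MATLAB operations. The following sketch reproduces the idea behind eb4, q50 (with 4 bins instead of 50), kmi, kmc1, and kmd<j> on hypothetical random data, using discretize, quantile, and kmeans with k = 3 instead of 10. It is not the software's internal implementation, so bin edges and cluster numbering can differ from what gencfeatures and genrfeatures produce.

```matlab
rng("default")                          % reproducible toy data
X = [randn(50,2); randn(50,2) + 5];     % hypothetical two-column numeric predictors
x = X(:,1);

% eb4-style: 4 bins of equal width spanning the range of x
ebEdges = linspace(min(x),max(x),5);
eb4 = categorical(discretize(x,ebEdges));

% q50-style with 4 bins: quantiles of x as edges, so each bin holds about 25% of the data
qEdges = quantile(x,linspace(0,1,5));
q4 = categorical(discretize(x,qEdges));

% k-means based features
k = 3;
[idx,C,~,D] = kmeans(X,k);              % D holds squared Euclidean distances (default distance)
kmi  = categorical(idx);                % cluster index encoding
kmc1 = C(idx,1);                        % 1st component of each observation's centroid
kmd  = sqrt(D);                         % Euclidean distance to every centroid; column j is kmd<j>
```

Note the square root: with the default distance metric, kmeans returns squared Euclidean distances in D, while the kmd features are described as Euclidean distances.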
Version History
Introduced in R2021a