standardizeMissing

Insert standard missing values

Description

B = standardizeMissing(A,indicator) replaces the values in A that are specified in indicator with standard missing values and returns a standardized array or table.

Missing values are defined according to the data type of A:

  • NaN for double, single, duration, and calendarDuration

  • NaT for datetime

  • <missing> for string

  • <undefined> for categorical

  • {''} for cell of character vectors

If A is a table, then the data type of each variable defines the missing value for that variable.
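As a minimal sketch of how the data type determines the standard missing value, the following applies standardizeMissing to a string array and a datetime array. The sentinel values "N/A" and January 1, 1900 are assumptions chosen only for illustration.

```matlab
% String data: the sentinel "N/A" (an assumed placeholder) becomes <missing>.
s = ["alpha" "N/A" "charlie"];
sStd = standardizeMissing(s,"N/A");

% Datetime data: an assumed sentinel date becomes NaT.
sentinel = datetime(1900,1,1);
d = [datetime(2020,5,1) sentinel datetime(2021,7,15)];
dStd = standardizeMissing(d,sentinel);

% ismissing reports which elements now hold the standard missing value.
sMask = ismissing(sStd);
dMask = ismissing(dStd);
```

In both cases only the second element should be flagged by ismissing, since it is the only one that matches the indicator.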

In addition to standardizing missing values, you can interactively find, fill, or remove missing data by adding the Clean Missing Data task to a live script.

B = standardizeMissing(___,Name,Value) specifies additional parameters for standardizing missing values using one or more name-value arguments. For example, standardizeMissing(A,indicator,'DataVariables',datavars) standardizes missing values in the variables specified by datavars when A is a table or timetable.

Examples

Create a row vector and replace all instances of -99 with the standard missing value for double data types, NaN.

A = [0 1 5 -99 8 3 4 -99 16];
B = standardizeMissing(A,-99)
B = 1×9

     0     1     5   NaN     8     3     4   NaN    16

Create a table containing Inf and 'N/A' to represent missing values.

dblVar = [NaN;3;Inf;7;9];
cellstrVar = {'one';'three';'';'N/A';'nine'};
charVar = ['A';'C';'E';' ';'I'];
categoryVar = categorical({'red';'yellow';'blue';'violet';''});

A = table(dblVar,cellstrVar,charVar,categoryVar)
A=5×4 table
    dblVar    cellstrVar    charVar    categoryVar
    ______    __________    _______    ___________

     NaN      {'one'   }       A       red        
       3      {'three' }       C       yellow     
     Inf      {0x0 char}       E       blue       
       7      {'N/A'   }               violet     
       9      {'nine'  }       I       <undefined>

Replace all instances of Inf with NaN and replace all instances of 'N/A' with the empty character vector, ''.

B = standardizeMissing(A,{Inf,'N/A'})
B=5×4 table
    dblVar    cellstrVar    charVar    categoryVar
    ______    __________    _______    ___________

     NaN      {'one'   }       A       red        
       3      {'three' }       C       yellow     
     NaN      {0x0 char}       E       blue       
       7      {0x0 char}               violet     
       9      {'nine'  }       I       <undefined>

Replace instances of Inf and 'N/A' occurring in specified variables of a table with the standard missing value indicators.

Create a table containing Inf and 'N/A' to represent missing values.

a = {'alpha';'bravo';'charlie';'';'N/A'};
x = [1;NaN;3;Inf;5];
y = [57;732;93;1398;Inf];

A = table(a,x,y)
A=5×3 table
         a          x      y  
    ___________    ___    ____

    {'alpha'  }      1      57
    {'bravo'  }    NaN     732
    {'charlie'}      3      93
    {0x0 char }    Inf    1398
    {'N/A'    }      5     Inf

For the variables a and x, replace instances of Inf with NaN and 'N/A' with the empty character vector, ''.

B = standardizeMissing(A,{Inf,'N/A'},'DataVariables',{'a','x'})
B=5×3 table
         a          x      y  
    ___________    ___    ____

    {'alpha'  }      1      57
    {'bravo'  }    NaN     732
    {'charlie'}      3      93
    {0x0 char }    NaN    1398
    {0x0 char }      5     Inf

Inf in the variable y remains unchanged because y is not included in the DataVariables name-value argument.

Input Arguments

Input data, specified as a vector, matrix, multidimensional array, table, or timetable. If A is a timetable, then standardizeMissing operates on the table data only and ignores NaT and NaN values in the vector of row times.

Data Types: double | single | char | string | cell | table | timetable | categorical | datetime | duration

Nonstandard missing value indicator, specified as a scalar, vector, or cell array. The elements of indicator define the values that standardizeMissing treats as missing. If A is an array, then indicator must be a vector. If A is a table or timetable, then indicator can also be a cell array with entries of multiple data types.

The data types specified in indicator match data types in the corresponding entries of A. The following are additional data type matches between the elements of indicator and elements of A:

  • double indicators match double, single, integer, and logical entries of A.

  • string and char indicators match categorical entries of A.

Example: B = standardizeMissing(A,'N/A') replaces the character vector 'N/A' with the empty character vector, ''.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | cell | datetime | duration
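The matching rules above can be sketched with two small cases: a vector indicator that lists several sentinel values at once, and a char indicator applied to categorical data. The codes -99 and -999 and the label 'N/A' are assumed placeholders, not values the function treats specially.

```matlab
% A vector indicator lists several sentinel values (-99 and -999 are assumed codes).
A = [4 -99 7 -999 2];
B = standardizeMissing(A,[-99 -999]);

% A char indicator matches entries of a categorical array.
C = categorical({'red';'N/A';'blue'});
D = standardizeMissing(C,'N/A');

% isundefined flags the categorical elements that are now <undefined>.
undefMask = isundefined(D);
```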

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: standardizeMissing(T,indicator,'ReplaceValues',false)

Table variables to operate on (the DataVariables argument), specified as one of the options in this table. The DataVariables value indicates which variables of the input table to standardize.

Other variables in the table not specified by DataVariables pass through to the output without being standardized.

Indexing scheme: variable names

Values to specify:

  • A string scalar or character vector

  • A string array or cell array of character vectors

  • A pattern object

Examples:

  • "A" or 'A': a variable named A

  • ["A" "B"] or {'A','B'}: two variables named A and B

  • "Var"+digitsPattern(1): variables named "Var" followed by a single digit

Indexing scheme: variable index

Values to specify:

  • An index number that refers to the location of a variable in the table

  • A vector of numbers

  • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 (false) values.

Examples:

  • 3: the third variable from the table

  • [2 3]: the second and third variables from the table

  • [false false true]: the third variable

Indexing scheme: function handle

Values to specify:

  • A function handle that takes a table variable as input and returns a logical scalar

Examples:

  • @isnumeric: all the variables containing numeric values

Indexing scheme: variable type

Values to specify:

  • A vartype subscript that selects variables of a specified type

Examples:

  • vartype("numeric"): all the variables containing numeric values

Example: standardizeMissing(T,indicator,'DataVariables',["Var1" "Var2" "Var4"])
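To illustrate that different DataVariables indexing schemes can select the same variables, this sketch reuses the table values from the earlier example and selects the numeric variables once with a function handle and once by position. The variable names B1 and B2 are introduced here only for illustration.

```matlab
a = {'alpha';'bravo';'charlie';'';'N/A'};
x = [1;NaN;3;Inf;5];
y = [57;732;93;1398;Inf];
A = table(a,x,y);

% Select every numeric variable with a function handle.
B1 = standardizeMissing(A,Inf,'DataVariables',@isnumeric);

% Select the same variables (x and y) by their positions in the table.
B2 = standardizeMissing(A,Inf,'DataVariables',[2 3]);
```

Both calls should replace Inf in x and y with NaN and leave the variable a untouched, because a is not among the selected variables and the indicator contains no text value.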

Replace values indicator, specified as one of these values when A is a table or timetable:

  • true or 1 — Replace input table variables containing missing entries with standardized table variables.

  • false or 0 — Append the input table with all table variables that were checked for missing entries. The missing entries in the appended variables are standardized.

For vector, matrix, or multidimensional array input data, ReplaceValues is not supported.

B is the same size as A unless the value of ReplaceValues is false. If the value of ReplaceValues is false, then the width of B is the sum of the input data width and the number of data variables specified.

Example: standardizeMissing(T,indicator,'ReplaceValues',false)
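The size rule for ReplaceValues can be checked with a short sketch. It assumes a MATLAB release that supports the ReplaceValues argument and uses a small table made up for illustration.

```matlab
x = [1;NaN;3;Inf;5];
y = [57;732;93;1398;Inf];
A = table(x,y);

% Keep the original variables and append standardized copies of x and y.
B = standardizeMissing(A,Inf,'ReplaceValues',false);

% By the size rule above, B should hold the 2 original variables
% plus 2 appended, standardized variables.
nVars = width(B);
nRows = height(B);
```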

Algorithms

standardizeMissing treats leading and trailing white space differently for cell arrays of character vectors, character arrays, and categorical arrays.

  • For cell arrays of character vectors, standardizeMissing does not ignore white space. All character vectors must match exactly a character vector specified in indicator.

  • For character arrays, standardizeMissing ignores trailing white space.

  • For categorical arrays, standardizeMissing ignores leading and trailing white space.
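The exact-match rule for cell arrays of character vectors can be seen in a minimal sketch. The entries below, including the one with a trailing space, are assumed sample data.

```matlab
% The first entry has a trailing space, so it does not exactly match 'N/A'.
A = {'N/A ';'N/A';'ok'};
B = standardizeMissing(A,'N/A');

% Only the exact match is replaced with the empty character vector.
replaced = cellfun(@isempty,B);
```

If trailing white space should not matter for a cell array, one option is to apply strtrim to the cell array before calling standardizeMissing.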

Alternative Functionality

Live Editor Task

In addition to standardizing missing values, you can interactively find, fill, or remove missing data by adding the Clean Missing Data task to a live script.

Version History

Introduced in R2013b
