Preprocessing Time Series Data with MATLAB
This reference shows common use cases, but is by no means comprehensive.
Timetable
MATLAB datatype designed to organize and work with time series data.
Components of a Timetable
tt = timetable(times, var1, var2, ... ,varN);
(All variables must have the same number of rows.)
tt = table2timetable(t);
(The first datetime or duration variable in “t” becomes the row times.)
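As a minimal sketch of both creation routes, assuming hypothetical hourly readings named temp and hum (these names and values are illustrative, not from the reference):

```matlab
% Hypothetical sample data: three hourly readings
times = datetime(2024,1,1,0,0,0) + hours(0:2)';
temp  = [20.1; 20.4; 19.8];
hum   = [55; 57; 60];

% Build the timetable directly from row times and variables
tt = timetable(times, temp, hum);

% Or convert an existing table; "times" is its first datetime variable,
% so it becomes the row times
t   = table(times, temp, hum);
tt2 = table2timetable(t);
```

In both cases the variable names are taken from the workspace variable names.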
Timetable Manipulation
Access Data
Dot notation (tt.var1) and brace indexing (tt{:,'var1'}) return the same array.
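A short sketch of equivalent ways to pull one variable out of a timetable, assuming a hypothetical variable named Temp:

```matlab
% Hypothetical timetable with a single variable
times = datetime(2024,1,1) + days(0:2)';
Temp  = [20.1; 20.4; 19.8];
tt    = timetable(times, Temp);

a = tt.Temp;          % dot notation
b = tt{:,'Temp'};     % brace indexing by variable name
c = tt{:,1};          % brace indexing by variable position

% The row times are available separately
rt = tt.Properties.RowTimes;
```

Parentheses, as in tt(:,'Temp'), return a timetable rather than an array.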
Add a New Variable
tt.newVar = zeros(height(tt),1);
Change Variable Names
tt.Properties.VariableNames = newNames;
(Names must be valid MATLAB identifiers)
Tip: Use matlab.lang.makeValidName to create valid names from potentially invalid names.
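A sketch of the renaming step, assuming two hypothetical raw labels that contain spaces or start with a digit:

```matlab
% Hypothetical labels, e.g. read from a file header
rawNames = {'air temp', '2nd sensor'};

% Convert them to valid MATLAB identifiers
newNames = matlab.lang.makeValidName(rawNames);

% Timetable with two unnamed variables (default names Var1 and Var2)
tt = timetable(datetime(2024,1,1) + days(0:1)', [1; 2], [3; 4]);

% Property names are case-sensitive: "Properties", not "properties"
tt.Properties.VariableNames = newNames;
```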
Resample Data Using Retime
tt = retime(tt,newtimes,method);
method specifies how data is filled, interpolated, or aggregated at the new row times, and has the same options as synchronize (see “Merge Timetables”).
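A sketch of retiming in both directions, assuming hypothetical half-hourly samples. The first call passes a time step ('hourly') in place of an explicit time vector, and the second passes a time vector:

```matlab
% Hypothetical data sampled every 30 minutes
times = datetime(2024,1,1,0,0,0) + minutes([0 30 60 90 120])';
temp  = [20; 21; 22; 23; 24];
tt    = timetable(times, temp);

% Downsample to hourly row times, averaging the samples in each hour
ttHourly = retime(tt, 'hourly', 'mean');

% Upsample onto an explicit 15-minute time vector using linear interpolation
newtimes = (times(1):minutes(15):times(end))';
ttFine   = retime(tt, newtimes, 'linear');
```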
Data Cleaning
Smooth Data
B = smoothdata(A,method);
Smooth noisy data with methods:
'movmean','movmedian','gaussian','lowess','loess','rlowess','rloess','sgolay'
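A sketch of smoothing a noisy signal, assuming a hypothetical sine wave with added Gaussian noise. The window length of 15 samples is an arbitrary choice for illustration:

```matlab
rng(0);                               % reproducible noise
x = linspace(0, 2*pi, 200)';
A = sin(x) + 0.2*randn(size(x));      % hypothetical noisy signal

% Moving mean with a window length chosen heuristically by smoothdata
B = smoothdata(A, 'movmean');

% Gaussian-weighted moving average with an explicit 15-sample window
B2 = smoothdata(A, 'gaussian', 15);
```

Larger windows smooth more aggressively but can flatten genuine features in the signal.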
Detect Outliers
TF = isoutlier(A,method);
Identify outliers with methods:
'median','mean','quartiles','grubbs','gesd'
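A sketch of outlier detection on hypothetical data with one obvious spike. With 'median', a value is flagged when it lies more than three scaled median absolute deviations from the median:

```matlab
% Hypothetical measurements with a single spike
A = [10 11 10 12 11 100 10 11 12 10]';

% Logical vector, true where an element is an outlier
TF = isoutlier(A, 'median');

% Indices and values of the flagged elements (here only the spike of 100)
idx      = find(TF);
outliers = A(TF);
```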
Detect Change Points
TF = ischange(A,method);
Find abrupt changes with methods: 'mean','variance','linear'
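A sketch of change-point detection, assuming a hypothetical noisy step signal. The 'MaxNumChanges' option is used here to limit the result to the single most significant change:

```matlab
rng(0);                                        % reproducible noise
A = [zeros(50,1); 5*ones(50,1)];               % hypothetical step at sample 51
A = A + 0.1*randn(size(A));

% Flag at most one abrupt change in the mean
TF = ischange(A, 'mean', 'MaxNumChanges', 1);

% Index of the detected change point
idx = find(TF);
```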
Merge Timetables
Synchronize multiple timetables to a common time vector.
tt = synchronize(tt1,tt2,...,ttN);
Synchronizing often results in missing data points (times at which a variable was not measured). synchronize supports several methods for adjusting data to fill in gaps:
Fill: 'fillwithmissing','fillwithconstant'
Interpolation: 'linear','spline','pchip'
Nearest Neighbor: 'previous', 'next','nearest'
Aggregation: 'mean','min','max',@func,...
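A sketch of merging two timetables sampled at different times, assuming hypothetical temp and hum variables. The second call names the time basis ('union', the default) so that a method can follow it:

```matlab
% Hypothetical timetables with different row times
t1  = datetime(2024,1,1,0,0,0) + hours([0 1 2])';
t2  = datetime(2024,1,1,0,0,0) + hours([0 2])';
tt1 = timetable(t1, [20; 21; 22], 'VariableNames', {'temp'});
tt2 = timetable(t2, [55; 59],     'VariableNames', {'hum'});

% Default: union of row times, gaps filled with missing values (NaN)
ttU = synchronize(tt1, tt2);

% Same time basis, but gaps filled by linear interpolation
ttL = synchronize(tt1, tt2, 'union', 'linear');
```

Here hum has no sample at 01:00, so it is NaN in ttU and 57 in ttL.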
Missing Data
Find Missing Values
TF = ismissing(tt);
Fill Missing Values
tt = fillmissing(tt,method);
Replace missing values with values calculated from nearby points with methods:
'previous','next','nearest','linear','spline','pchip'
Remove Rows Containing Missing Values
tt = rmmissing(tt);
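A sketch of the three missing-data functions applied to one hypothetical timetable with two NaN entries:

```matlab
% Hypothetical daily readings with two missing values
times = datetime(2024,1,1) + days(0:4)';
temp  = [20; NaN; 22; NaN; 24];
tt    = timetable(times, temp);

% Logical array, one column per variable, true where data is missing
TF = ismissing(tt);

% Fill gaps by linear interpolation against the row times
ttFilled = fillmissing(tt, 'linear');

% Alternatively, drop every row that contains a missing value
ttClean = rmmissing(tt);
```

Filling keeps the original row times intact, while removing rows leaves an irregularly sampled timetable.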
Big Data
Tall arrays extend MATLAB functions to work on data too big to load into memory.
Create a “tall” timetable:
% Create a datastore that points to the data
ds = datastore('*.csv');
% Create a tall table from the datastore
t = tall(ds);
% Convert to a timetable
tt = table2timetable(t);