distanceProfile

Compute distance profile between query subsequence and all other subsequences of a single-variable or multivariable time series

Since R2024b

Syntax

DP = distanceProfile(X,len,loc)

[DP,DPI] = distanceProfile(___)

[___] = distanceProfile(___,Name=Value)

distanceProfile(___)

Description

Return Distance Profile

DP = distanceProfile(X,len,loc) returns the distance profile (vector of z-normalized Euclidean distances) between a query subsequence of the time series X and every subsequence in X with the same length len.

If X is a vector, then the software treats it as a single channel.
If X is a matrix, then you can control whether the software computes the distance profile for each time series channel individually or cumulatively using the Type name-value argument.

The query begins at the time series position loc. The form of the query subsequence depends on whether X is a vector or a matrix, where k denotes the channel (column) index from 1 to K:

Vector — X(loc:loc+len-1)
Matrix with K columns — X(loc:loc+len-1,k).
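To make the definition concrete, the following is a minimal brute-force sketch of a z-normalized Euclidean distance profile for a single channel, written with base MATLAB functions only. It is an illustration, not the toolbox implementation. The synthetic series, the helper znorm, the variable dpSketch, and the use of the population standard deviation are all assumptions made for this sketch.

```matlab
% Brute-force sketch of a z-normalized Euclidean distance profile.
% Illustration only; this is not how distanceProfile is implemented.
rng(0);                                % reproducible synthetic data
X   = cumsum(randn(200,1));            % hypothetical single-channel series
len = 20;                              % subsequence length
loc = 50;                              % query starting position

% z-normalize a segment: zero mean, unit standard deviation
% (population normalization is an assumption of this sketch)
znorm = @(s) (s - mean(s)) ./ std(s,1);

q = znorm(X(loc:loc+len-1));           % normalized query subsequence
m = numel(X) - len + 1;                % number of candidate subsequences
dpSketch = zeros(m,1);
for i = 1:m
    c = znorm(X(i:i+len-1));           % normalized candidate subsequence
    dpSketch(i) = norm(q - c);         % Euclidean distance to the query
end
```

In this sketch, dpSketch(loc) is zero because the query matches itself exactly. This trivial match is what the ExcludeTrivialMatch option is meant to suppress.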

[DP,DPI] = distanceProfile(___) also returns the vector DPI of the starting indices of the subsequences that best match the query subsequence.

[___] = distanceProfile(___,Name=Value) specifies options using one or more name-value arguments in addition to the arguments in previous syntaxes. For example, to exclude matches near the query starting position, set ExcludeTrivialMatch to true.

Plot Distance Profile

distanceProfile(___) creates an interactive plot of the distance profile, with overlays for the query, the motif (best match to the query), and the discord (worst match to the query). You can move the vertical selection lines in the plot to find the top motif and discord for any other data segment in the time series.

You can use this syntax with any of the previous input-argument combinations.

Examples

Compute and Plot Distance Profile

Load the data, which consists of the timetable T1. T1 contains armature current measurements on a degrading DC motor.

load matrix_profile_data T1

T1 is known to have an anomalous segment with length 100, starting at location 9797. Use this segment as the query segment.

X = T1.MotorCurrent;
len = 100;
loc = 9797;

Calculate the distance profile.

[DP,DPI] = distanceProfile(X,len,loc);

Display the first two elements of index vector DPI and the corresponding distances in DP.

DPI(1:2)

DP(DPI(1)),DP(DPI(2))

ans = 
8.3894

ans = 
8.4532

For comparison, display the value of the largest distance.

max(DP)

ans = 
18.4828

Plot the distance profile.

distanceProfile(X,len,loc);

The top plot shows the time series. The query appears at location 9797. A motif, or match to the query, occurs at location 2617.
The middle plot shows the distance profile with an exclusion zone around the query location.
The bottom plot shows the query subsequence, the motif subsequence (best match) and the discord subsequence (worst match).

Move the vertical selection lines to find the top motif and discord of any other data segments in the time series.

The distanceProfile plot displays only the top match. If you are interested in viewing more matches, you can extract, plot, and compare subsegments using the values in X and DPI.
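One way to compare several matches is sketched below. So that the sketch runs on its own, the first section builds stand-in data with base MATLAB functions (a synthetic signal, a brute-force profile, and a crude exclusion zone, all hypothetical). When following the example, skip that section and use X, len, loc, and DPI from the workspace. The variable numMatches and the helper znorm are also names introduced only for this illustration.

```matlab
% --- Stand-in data (hypothetical). When following the example, skip this
% --- section and use X, len, loc, and DPI from the workspace.
t   = (1:400)';
X   = sin(2*pi*t/50) + 0.05*cos(2*pi*t/7);
len = 50;
loc = 101;
znorm = @(s) (s - mean(s)) ./ std(s,1);
q  = znorm(X(loc:loc+len-1));
DP = arrayfun(@(i) norm(q - znorm(X(i:i+len-1))), (1:numel(X)-len+1)');
DP(max(1,loc-25):min(numel(DP),loc+25)) = NaN;   % crude exclusion zone
[~,DPI] = sort(DP);                              % ascending; NaN values last

% --- Extract and overlay the query and the top three matches.
numMatches = 3;
figure
plot(znorm(X(loc:loc+len-1)),"k",LineWidth=2)    % query subsequence
hold on
for k = 1:numMatches
    idx = DPI(k):DPI(k)+len-1;                   % samples of the k-th best match
    plot(znorm(X(idx)))
end
hold off
legend(["Query","Match " + (1:numMatches)])
```

Consecutive entries of DPI can point to neighboring starting positions whose windows overlap heavily. If you want visibly distinct matches, consider skipping candidates that start close to a match you have already selected.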

Input Arguments

`X` — Time series to evaluate
numeric vector | numeric matrix

Time series to evaluate, specified as a numeric vector of length n or a numeric matrix containing multiple columns of length n. X must not have any missing data.

`len` — Length of query subsequence
integer

Length of the query subsequence, specified as an integer. len must be less than the length n of the time series.

`loc` — Starting position of query subsequence
integer

Starting position of the query subsequence, specified as an integer. loc must be less than the length n of the time series.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: DP = distanceProfile(X,10,20,ExcludeTrivialMatch=true) excludes subsequence matches near the query subsequence starting position of 20.

`ExcludeTrivialMatch` — Option to set exclusion zone around query sequence
`true` (default) | `false`

Option to set an exclusion zone around the starting position loc of the query subsequence, specified as true or false. Setting this option to true excludes matches of the query subsequence with itself.

`ExclusionZoneLength` — Length of exclusion zone
`ceil(len/2)` (default) | integer

Length of the exclusion zone on either side of the query starting position loc, specified as the number of data points to exclude. When ExcludeTrivialMatch is true, the software sets the values of DP within the exclusion zone to NaN.
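The effect of the exclusion zone can be sketched with placeholder values. The exact extent of the zone (here, loc-ez through loc+ez inclusive, clipped to the valid index range) is an assumption of this sketch, as are the variable names ez and zone and the random placeholder distances.

```matlab
% Sketch of applying an exclusion zone to a distance profile.
len = 10;                              % subsequence length
loc = 20;                              % query starting position
n   = 60;                              % length of the time series
m   = n - len + 1;                     % profile length when endpoints are discarded
ez  = ceil(len/2);                     % documented default zone length
DP  = rand(m,1);                       % placeholder distances

% Assumed extent: ez points on each side of loc, clipped to 1..m
zone = max(1,loc-ez):min(m,loc+ez);
DP(zone) = NaN;                        % excluded positions cannot be matches
```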

`EndPoints` — Method for handling query windows near endpoints
`"discard"` (default) | `"fill"`

Method for handling query windows near the endpoints of X, specified as one of these options:

"discard" — Truncate the length of the output vectors DP and DPI to n – len + 1, where n is the length of X.
"fill" — Extend the length of DP and DPI to n by padding DP with len – 1 NaNs. The software sets the last len – 1 elements of the vector DPI to the sequence n-len+2:n.
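The two options differ only in the output length, which the following sketch illustrates with placeholder values. It assumes that the NaN padding is appended at the end of DP. The variable names are hypothetical, and the "discard" outputs are simulated rather than computed by distanceProfile.

```matlab
% Sketch of the output sizes for EndPoints="discard" and EndPoints="fill".
n   = 12;                                 % length of the time series
len = 4;                                  % subsequence length
m   = n - len + 1;                        % 9 valid starting positions

dpDiscard = rand(m,1);                    % placeholder "discard" profile
[~,dpiDiscard] = sort(dpDiscard);         % placeholder sorted starting indices

% "fill": pad DP with len-1 NaN values (assumed here to go at the end)
dpFill  = [dpDiscard; NaN(len-1,1)];
% "fill": the last len-1 elements of DPI are the sequence n-len+2:n
dpiFill = [dpiDiscard; (n-len+2:n)'];
```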

`Type` — Computation options for matrix-based input
`"individual"` (default) | `"cumulative"`

Computation options when X is a matrix, specified as one of the following approaches:

"individual" — Compute the distance profile of each channel separately.
"cumulative" — Combine the distance profiles of each channel using the cumulative average of sorted distance profile values.
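One plausible reading of "cumulative average of sorted distance profile values" is sketched below: at each position, sort the per-channel distances and then average the k smallest for k = 1,...,K. This interpretation is an assumption and may differ from what the software computes. The matrix D of per-channel distances is a random placeholder.

```matlab
% Sketch of one possible "cumulative" combination across channels.
rng(1);
D  = rand(6,3);                 % placeholder: 6 positions, K = 3 channel distances
K  = size(D,2);

Ds = sort(D,2);                 % sort the channel distances at each position
C  = cumsum(Ds,2) ./ (1:K);     % C(:,k) is the mean of the k smallest distances
```

Under this reading, the first column of C is the smallest channel distance at each position, and the last column is the plain average over all channels.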

Output Arguments

`DP` — Distance profile
numeric vector | numeric matrix

Distance profile containing the z-normalized Euclidean distances between the query subsequence of time series X and each subsequence of X with the same length len, returned as a numeric vector or a numeric matrix.

The length of DP is equal to n or n – len + 1, depending on the setting for EndPoints. Here, n is the length of X.

When ExcludeTrivialMatch is true, elements of DP near the query starting location loc are set to NaN, with the number of elements determined by the value of ExclusionZoneLength.

`DPI` — Starting indices for best matching subsequences
positive integer vector | positive integer matrix

Starting indices for subsequences X(DPI(k):DPI(k)+len-1) of X that best match the query subsequence X(loc:loc+len-1), returned as a vector or matrix of positive integers.

DPI orders the elements of DP so that DP(DPI) is in ascending order of distance, that is, from the best match (smallest distance) to the worst match (largest distance). The best match therefore starts at DPI(1) and has distance DP(DPI(1)), and the worst match starts at DPI(n-len+1) and has distance DP(DPI(n-len+1)).
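The relationship between DP and DPI resembles the index output of the base MATLAB sort function, as the following sketch with placeholder values shows. Whether distanceProfile orders excluded (NaN) entries the same way as sort is an assumption. The variable names bestStart, bestDist, lastValid, and worstStart are hypothetical.

```matlab
% Sketch of how a sorted index vector relates to a distance profile.
DP = [2.1; NaN; 0.4; 3.7; 1.2];       % placeholder profile with one excluded value
[~,DPI] = sort(DP);                   % ascending order; sort places NaN last

bestStart = DPI(1);                   % starting index of the best match
bestDist  = DP(DPI(1));               % smallest distance

% Worst match among the positions that were not excluded
lastValid  = find(~isnan(DP(DPI)),1,"last");
worstStart = DPI(lastValid);          % starting index of the largest valid distance
```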

References

[1] Yeh, Chin-Chia Michael, et al. “Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets.” 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 1317–22. https://doi.org/10.1109/ICDM.2016.0179.

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The distanceProfile function fully supports GPU arrays. To run the function on a GPU, specify the input data as a gpuArray (Parallel Computing Toolbox). For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
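A minimal sketch of the GPU workflow follows. It requires Parallel Computing Toolbox and a supported GPU to take the GPU branch, and it assumes that the output is returned as a gpuArray when the input is one. The synthetic series and the variable name Xg are hypothetical.

```matlab
% Sketch of running distanceProfile on a GPU when one is available.
X   = cumsum(randn(5000,1));          % hypothetical single-channel series
len = 100;                            % subsequence length
loc = 1000;                           % query starting position

if canUseGPU
    Xg = gpuArray(X);                 % move the data to GPU memory
    DP = distanceProfile(Xg,len,loc); % computation runs on the GPU
    DP = gather(DP);                  % bring the result back to the host
else
    DP = distanceProfile(X,len,loc);  % CPU fallback
end
```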

Version History

Introduced in R2024b
