distanceProfile

Compute distance profile between query subsequence and all other subsequences of a time series

Since R2024b

Description

Return Distance Profile

DP = distanceProfile(X,len,loc) returns the distance profile (a vector of z-normalized Euclidean distances) between a query subsequence of the time series X and every subsequence in X that has the same length len. The query begins at the time series position loc. The query subsequence is therefore X(loc:loc+len-1).

[DP,I] = distanceProfile(___) also returns the vector I of the starting indices of the subsequences that best match the query subsequence.

[___] = distanceProfile(___,Name=Value) specifies options using one or more name-value arguments in addition to the arguments in previous syntaxes. For example, to exclude matches near the query starting position, set ExcludeTrivialMatch to true.

Plot Distance Profile

distanceProfile(___) creates an interactive plot of the distance profile, with overlays for the query, the motif (best match to the query), and the discord (worst match to the query). You can move the vertical selection lines in the plot to find the top motif and discord of any other data segment in the time series.

You can use this syntax with any of the previous input-argument combinations.

Examples

Load the data, which consists of T1. T1 is a timetable containing armature current measurements on a degrading DC motor.

load matrix_profile_data T1

T1 is known to have an anomalous segment with length 100, starting at location 9797. Use this segment as the query segment.

X = T1.MotorCurrent;
len = 100;
loc = 9797;

Calculate the distance profile.

[D,I] = distanceProfile(X,len,loc);

Display the first two elements of index vector I and the corresponding distances in D.

I(1:2)
ans = 2×1

        2617
        9368

D(I(1)),D(I(2))
ans = 
8.3894
ans = 
8.4532

For comparison, display the value of the largest distance.

max(D)
ans = 
18.4828

Plot the distance profile.

distanceProfile(X,len,loc);

Figure: three plots. Time Series (Data versus Time) shows the data with the Query (i=9797), Motif (i=2617), and Discord (i=567) marked. Distance Profile (Distance versus Time) shows the distance and the exclusion zone. Subsequences (Data versus Time) shows the Query (i=9797), Motif (i=2617), and Discord (i=567).

  • The top plot shows the time series. The query appears at location 9797. A motif, or match to the query, occurs at location 2617.

  • The middle plot shows the distance profile with an exclusion zone around the query location.

  • The bottom plot shows the query subsequence, the motif subsequence (best match) and the discord subsequence (worst match).

Move the vertical selection lines to find the top motif and discord of any other data segments in the time series.

The distanceProfile plot displays only the top match. If you are interested in viewing more matches, you can extract, plot, and compare subsegments using the values in X and I.
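One way to view more than the top match is sketched below. It reuses the data and query from this example and overlays the query with the first few subsequences listed in I. The number of matches (numMatches) and the helper znorm are illustrative choices, not part of distanceProfile. The sketch z-normalizes each subsequence before plotting so that the shapes are comparable, and it assumes the default unbiased standard deviation, which might differ from the normalization the function uses internally. Nearby entries of I can refer to overlapping subsequences, so consecutive matches might look nearly identical.

```matlab
% Reload the example data and recompute the distance profile.
load matrix_profile_data T1
X = T1.MotorCurrent;
len = 100;
loc = 9797;
[D,I] = distanceProfile(X,len,loc);

numMatches = 3;                            % illustrative choice
znorm = @(s) (s - mean(s)) / std(s);       % hypothetical helper for plotting only

% Overlay the z-normalized query and the best-matching subsequences.
query = X(loc:loc+len-1);
figure
plot(znorm(query),LineWidth=2)
hold on
for j = 1:numMatches
    idx = I(j);                            % starting index of the jth best match
    plot(znorm(X(idx:idx+len-1)))
end
hold off
legend(["Query","Match " + (1:numMatches)])
xlabel("Sample within subsequence")
ylabel("z-normalized data")
```

Increase numMatches to compare more candidates, or index D with the same entries of I to list the corresponding distances.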

Input Arguments

X

Time series to evaluate, specified as a numeric vector of length N. X must not have any missing data.

len

Length of the query subsequence, specified as an integer. len must be less than the time series length N.

loc

Starting position of the query subsequence, specified as an integer. loc must be less than the time series length N.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: DP = distanceProfile(X,10,20,ExcludeTrivialMatch=true) excludes subsequence matches near the query subsequence starting position of 20.

ExcludeTrivialMatch

Option to set an exclusion zone around the starting position loc of the query subsequence, specified as true or false. Setting this option to true excludes matches of the query subsequence with itself.

ExclusionZoneLength

Length of the exclusion zone on either side of the query starting position loc, specified as the number of data points to exclude. When ExcludeTrivialMatch is true, the software sets the values of DP within the exclusion zone to NaN.

EndPoints

Option for controlling the output length when X ends with a partial subsequence, specified as one of the following options:

  • "discard" — Truncate the length of the output vectors DP and I to N-len+1, where N is the length of X.

  • "fill" — Extend the length of DP and I to N by padding DP with len-1 NaN values. The software sets the last len-1 elements of the vector I to the sequence N-len+2:N.
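The following sketch combines the three name-value arguments described above in a single call. The synthetic random-walk series and the specific values of len, loc, and ExclusionZoneLength are arbitrary choices for illustration only.

```matlab
% Synthetic data for illustration only.
rng(0)                                   % reproducible random numbers
X = cumsum(randn(1000,1));               % hypothetical random-walk time series
len = 50;                                % query length (arbitrary)
loc = 300;                               % query starting position (arbitrary)

% Exclude trivial matches within 25 points of loc and pad the outputs.
[DP,I] = distanceProfile(X,len,loc, ...
    ExcludeTrivialMatch=true, ...
    ExclusionZoneLength=25, ...
    EndPoints="fill");

% With "fill", the description above indicates that DP has length N.
numel(DP)

% Count the NaN entries, which come from the exclusion zone and the padding.
nnz(isnan(DP))
```

Switching EndPoints to "discard" should instead return outputs of length N-len+1, as described in the list above.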

Output Arguments

DP

Distance profile containing the z-normalized Euclidean distances between a query subsequence of the time series X and each subsequence of X that has the same length len, returned as a numeric vector.

The length of DP is equal to the length of X when X ends with a complete subsequence with respect to len. When X ends with a partial subsequence, the value of EndPoints further modifies the length of DP by truncation or fill.

When ExcludeTrivialMatch is true, elements of DP near the query starting location loc are set to NaN, with the number of elements determined by the value of ExclusionZoneLength.
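To make the definition of DP concrete, the following brute-force sketch computes z-normalized Euclidean distances using only basic MATLAB functions. It is an illustration of the definition, not the algorithm that distanceProfile uses, and it omits the exclusion zone. The synthetic data, the variable name DPbrute, and the choice of the population standard deviation (std(s,1)) are assumptions. The function might use a different normalization convention, which would scale the distances.

```matlab
% Brute-force z-normalized Euclidean distance profile (illustrative only).
rng(0)                                % reproducible synthetic data
X = cumsum(randn(500,1));             % hypothetical random-walk time series
len = 20;                             % subsequence length (arbitrary)
loc = 100;                            % query starting position (arbitrary)

N = numel(X);
numSub = N - len + 1;                 % number of complete subsequences

% z-normalize the query: zero mean and unit standard deviation.
q = X(loc:loc+len-1);
q = (q - mean(q)) / std(q,1);

DPbrute = zeros(numSub,1);
for k = 1:numSub
    s = X(k:k+len-1);
    s = (s - mean(s)) / std(s,1);     % z-normalize each candidate subsequence
    DPbrute(k) = norm(q - s);         % Euclidean distance to the query
end

% The query compared with itself gives a distance of zero at loc.
disp(DPbrute(loc))
```

The zero distance at loc, and the small distances at neighboring positions, are the trivial matches that ExcludeTrivialMatch is designed to remove.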

I

Starting indices for the subsequences X(I(k):I(k)+len-1) of X that best match the query subsequence X(loc:loc+len-1), returned as an integer vector.

I is ordered so that DP(I) is in ascending order of distance, that is, from the best match (smallest distance) to the worst match (largest distance). The best match therefore has the starting location I(1) and the distance DP(I(1)), and the worst match has the starting location I(N-len+1) and the distance DP(I(N-len+1)).
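The ordering described above resembles the index output of the sort function. The following sketch uses a small made-up distance vector to show the relationship. It assumes that I behaves like the second output of sort, including the placement of NaN values last, which this page does not state explicitly.

```matlab
% Hypothetical distance values; NaN marks an excluded position.
DP = [3.2; 0.7; NaN; 5.1; 1.9];

% Ascending sort returns the ordering indices and places NaN values last.
[~,I] = sort(DP);

bestStart = I(1);          % starting index of the smallest distance
bestDist  = DP(I(1));      % the smallest distance itself
sortedDP  = DP(I);         % distances from best match to worst match
```

Here I is [2; 5; 1; 4; 3], so the best match starts at position 2 with a distance of 0.7.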

References

[1] Yeh, Chin-Chia Michael, et al. “Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets.” 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 1317–22. https://doi.org/10.1109/ICDM.2016.0179.

Version History

Introduced in R2024b