Find locations of repeated values?

Question

Jacqueline 2013-7-15


https://ww2.mathworks.cn/matlabcentral/answers/82088-find-locations-of-repeated-values

So, I have this function that takes a set of data and finds if there are values that repeat for more than 300 seconds in that data set...

function FindRepetition(TruckVariableName)

setpref('Internet','SMTP_Server','lamb.corning.com');

data1 = (TruckVariableName);
x = length(TruckVariableName);
data = reshape(data1, 1, x); 
datarep = ~diff(data) & data(2:x) ~= 0; %binary data -- 1 means repeats, 0 means different, excludes repetitive zeros
%if the difference in the data at each point is zero, and if the data at
%that point isn't itself zero, return true. 2:x means difference array is equal to the length of the data array, matrix dimensions must be the same or &
%cannot be used
datarepstr = num2str(datarep); %convert to string
s = regexprep(datarepstr,' ',''); %remove spaces
[startindex,runs] = regexp(s,'1+','start','match'); %find all runs and the point where they start
l = cellfun('length',runs); %find the length of each run
y = l > 300;
if any(y) %if any run is longer than 5 minutes, display message
  %sendmail('johnsonlj2@corning.com', '2011 KENWORTH ISX15','A data fault has been detected - Prolonged data repetition');
  disp('--An error has occurred - Prolonged data repetition.');
  disp('Errors occurred at'); 
end
end

I want to find WHERE those repeated values start in that set of data. I tried disp(find(y)), but that returns positions within y (the list of runs), not positions in the original data set. Does anyone know how I can find the locations in data1 where the data repeats for more than 300 seconds?

2 Comments

Cedric 2013-7-15

Edited: Cedric 2013-7-15

Could you provide a sample dataset or the content of this TruckVariableName that you pass to your function?

Jacqueline 2013-7-15

One of my variables is engine speed, and the data is collected for over 95,000 seconds. A chunk of the data may look like this...

1055.25000000000 777.250000000000 771.750000000000 1112.37500000000 1151.37500000000 1447 1447 1447 1447 1447 1447 1447 1447 668.625000000000 803.750000000000 850.250000000000 693.625000000000 1069.37500000000 868.500000000000 985.875000000000 1085.87500000000 1148 1065.62500000000 978.250000000000 885.750000000000 723.125000000000 638.125000000000 678.500000000000 807.500000000000 692.750000000000 814.875000000000

See how 1447 is repeated? Say that was repeating for more than 300 seconds. My script uses ~diff to replace the non-repeating numbers with 0s and the repeating numbers with 1s. Then it finds where the 1s repeat for more than 300 seconds. When I use find(y), though, it finds locations, but they don't correspond to the original data set.


Answer 1

Cedric 2013-7-15


https://ww2.mathworks.cn/matlabcentral/answers/82088-find-locations-of-repeated-values#answer_91801

Edited: Cedric 2013-7-15

I think that you can use two approaches. I'll illustrate with a simple example: say we have the following data

>> data = [7 8 8 8 8 6 6 7 8 7 7 7] ;

and we want to get blocks of repeating values with at least 3 elements.

1. Based on your REGEXP method, you would indeed look for the positions of runs of 1s longer than a given length.

 >> rep = ~diff(data)                            % Add other components if needed.
 rep =
     0     1     1     1     0     1     0     0     0     1     1
 >> repStr = sprintf('%d', rep)
 repStr =
     01110100011
 >> start = regexp(repStr, '1{2,}', 'start')     % 3 similar values -> 2 
 start =                                         % repetitions.
     2    10

2. Without conversion to string and REGEXP:

 >> buffer = [true, diff(data)~=0]
 buffer =
     1     1     0     0     0     1     0     1     1     1     0     0
 >> groupStart = find(buffer)
 groupStart =
     1     2     6     8     9    10
 >> groupId = cumsum(buffer)
 groupId =
     1     2     2     2     2     3     3     4     5     6     6     6
 >> groupSize = accumarray(groupId.', ones(size(groupId))).'
 groupSize =
     1     4     2     1     1     3
 >> start = groupStart(groupSize > 2)
 start =
     2    10

EDIT: note that the 2nd method is more than 5 times faster than the 1st on large datasets.
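As a rough sketch of how the 2nd method might be adapted to the original question, the script below keeps only runs longer than a threshold and skips runs of zeros, as the question's code does. The sample data and the names minLen, runStart and runLen are made up for illustration, and the run length is computed with diff on the start positions rather than with accumarray. It assumes a row vector sampled once per second, so that 300 seconds corresponds to 300 samples.

```matlab
% Sketch: runs of identical, nonzero values longer than minLen samples.
data   = [7 8 8 8 8 6 6 0 0 0 0 7 7 7];   % made-up sample, row vector
minLen = 2;                                % assumption: use 300 for 1 Hz data

isNewGroup = [true, diff(data) ~= 0];      % true where a new run begins
groupStart = find(isNewGroup);             % index in data of each run's first element
groupSize  = diff([groupStart, numel(data) + 1]);    % length of each run
keep = groupSize > minLen & data(groupStart) ~= 0;   % long runs only, zeros excluded

runStart = groupStart(keep);               % -> [2 12], positions in the original data
runLen   = groupSize(keep);                % -> [4 3], equal samples in each run
```

Note that groupSize counts equal samples, while the string of 1s in the 1st method counts repetitions, which is one less per run, so the threshold may need adjusting by one depending on which convention is wanted.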

3 Comments (1 older comment hidden)

Cedric 2013-7-15

Edited: Cedric 2013-7-15

In your command window, type

doc sprintf

then, in the SPRINTF documentation, look up formatSpec, which describes all the format conversion specifiers. %d is for integer, which means that elements of rep are interpreted as integers and converted to string as such.
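A small illustration of the point above, using the same 0/1 pattern as rep in the answer: with a vector argument, sprintf reuses the format for each element, so '%d' produces one character per logical value and position k in the string lines up with rep(k). The variable multi is only there to show the format being recycled and is not part of the thread.

```matlab
rep    = logical([0 1 1 1 0 1 0 0 0 1 1]);   % same pattern as rep in the answer
repStr = sprintf('%d', rep);                  % -> '01110100011'
sameLength = numel(repStr) == numel(rep);     % -> true, one character per element
multi  = sprintf('%d,', [3 14 15]);           % -> '3,14,15,' (format reused per element)
```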

Jacqueline 2013-7-15

Thank you!


Answer 2

Muthu Annamalai 2013-7-15


https://ww2.mathworks.cn/matlabcentral/answers/82088-find-locations-of-repeated-values#answer_91795


Guessing from reading the code, and the comments in the code itself, you are looking for the variable startindex:

[startindex,runs] = regexp(s,'1+','start','match'); %find all runs and the point where they start

So just add this to your return value from the function, and you should be all set.

1 Comment

Jacqueline 2013-7-15

That finds the starting points of runs of one or more 1s in a string of 1s and 0s. The length of that string is different from that of my original data, which is where I need to find the locations of the repeating values.
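One way to look at the length mismatch, sketched here with made-up data: the string of 1s and 0s is one element shorter than the data, but element k of it compares data(k) with data(k+1), so a run of 1s starting at position k corresponds to a repeated block starting at data(k). Under that reading, indexing startindex with the logical vector of long runs should give positions in the original data. The names minReps, longRun, faultStart and faultValue are illustrative only.

```matlab
data    = [5 9 9 9 9 0 0 0 4 4 6];          % made-up sample
minReps = 2;                                 % assumption: the question uses 300

datarep = ~diff(data) & data(2:end) ~= 0;    % one element shorter than data
s = sprintf('%d', datarep);                  % -> '0111000010'
[startindex, runs] = regexp(s, '1+', 'start', 'match');
runLength = cellfun('length', runs);         % repetitions in each run
longRun   = runLength > minReps;             % logical, one entry per run

faultStart = startindex(longRun);            % -> 2, an index into data
faultValue = data(faultStart);               % -> 9, the value that repeats
```

Here find(longRun) would return 1, the position of the run in the list of runs, which is why disp(find(y)) in the question does not line up with the data.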

