Find locations of repeated values?
I have a function that takes a set of data and checks whether any value repeats for more than 300 seconds in that data set:
function FindRepetition(TruckVariableName)
setpref('Internet','SMTP_Server','lamb.corning.com');
data1 = (TruckVariableName);
x = length(TruckVariableName);
data = reshape(data1, 1, x);
datarep = ~diff(data) & data(2:x) ~= 0; % logical: 1 means the value repeats the previous one, 0 means it differs; repeated zeros are excluded
% True where the difference between consecutive points is zero and the value
% at that point is nonzero. diff(data) has x-1 elements, so data(2:x) is used
% to match its length; & requires both operands to be the same size.
datarepstr = num2str(datarep); %convert to string
s = regexprep(datarepstr,' ',''); %remove spaces
[startindex,runs] = regexp(s,'1+','start','match'); %find all runs and the point where they start
l = cellfun('length',runs); %find the length of each run
y = l > 300;
if any(y) %if any run is longer than 5 minutes, display message
%sendmail('johnsonlj2@corning.com', '2011 KENWORTH ISX15','A data fault has been detected - Prolonged data repetition');
disp('--An error has occurred - Prolonged data repetition.');
disp('Errors occurred at');
end
end
I want to find WHERE those repeated values start in that set of data. I tried disp(find(y));, but that returns positions within y (one entry per run), not positions in the original data set. Does anyone know how I can find the locations in data1 where the data repeats for more than 300 seconds?
Accepted Answer
Cedric
2013-7-15
Edited: Cedric
2013-7-15
I think that you can use two approaches. I'll illustrate with a simple example: say we have the following data
>> data = [7 8 8 8 8 6 6 7 8 7 7 7] ;
and we want to get blocks of repeating values with at least 3 elements.
1. Based on your REGEXP method, you would indeed look for the positions of runs of 1's longer than a given length.
>> rep = ~diff(data) % Add other components if needed.
rep =
0 1 1 1 0 1 0 0 0 1 1
>> repStr = sprintf('%d', rep)
repStr =
01110100011
>> start = regexp(repStr, '1{2,}', 'start') % 3 similar values -> 2 repetitions.
start =
2 10
2. Without conversion to string and REGEXP:
>> buffer = [true, diff(data)~=0]
buffer =
1 1 0 0 0 1 0 1 1 1 0 0
>> groupStart = find(buffer)
groupStart =
1 2 6 8 9 10
>> groupId = cumsum(buffer)
groupId =
1 2 2 2 2 3 3 4 5 6 6 6
>> groupSize = accumarray(groupId.', ones(size(groupId))).'
groupSize =
1 4 2 1 1 3
>> start = groupStart(groupSize > 2)
start =
2 10
EDIT: note that the 2nd method is more than 5 times faster than the 1st on large datasets.
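Applied to the original question, the second method might look like the sketch below. It is only an illustration under a few assumptions: the data is a row vector, there is one sample per second (as the 300-second threshold in the question implies), and runs of zeros should be ignored as in the original function. The names minLen, groupValue and keep are made up for this example, and the group sizes are computed from the distance between consecutive group starts rather than with accumarray.

```matlab
% Toy data; in the question this would be the reshaped truck variable.
data   = [7 8 8 8 8 0 0 0 0 7 7 7];
minLen = 2;                                      % hypothetical threshold; the question would use about 300

isNewGroup = [true, diff(data) ~= 0];            % true where a block of equal values begins
groupStart = find(isNewGroup);                   % index in data where each block begins
groupSize  = diff([groupStart, numel(data) + 1]);% number of samples in each block
groupValue = data(groupStart);                   % value held throughout each block

keep  = groupSize > minLen & groupValue ~= 0;    % long enough, and not a run of zeros
start = groupStart(keep)                         % start indices in the original data
```

Note that groupSize counts samples, whereas the run lengths in the REGEXP version count repetitions, so a run of n ones in datarep corresponds to a block of n+1 samples here.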
More Answers (1)
Muthu Annamalai
2013-7-15
Judging from the code and its comments, you are looking for the variable startindex:
[startindex,runs] = regexp(s,'1+','start','match'); %find all runs and the point where they start
So just add this to your return value from the function, and you should be all set.
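One caveat worth sketching: startindex holds the start of every run, including the short ones, so it probably needs to be filtered with the logical mask y from the question before it is displayed or returned. The snippet below is a small illustration on toy data, not the poster's actual function; faultStart is a made-up name, and the threshold is lowered so the example fits on one line of data.

```matlab
data    = [7 8 8 8 8 6 6 7 8 7 7 7];             % toy data instead of the truck variable
datarep = ~diff(data) & data(2:end) ~= 0;        % same test as in the question
s       = sprintf('%d', datarep);                % string of 0s and 1s without spaces

[startindex, runs] = regexp(s, '1+', 'start', 'match');
l = cellfun('length', runs);                     % length of each run of 1s
y = l >= 2;                                      % toy threshold; the question uses l > 300

faultStart = startindex(y)                       % start of each long run
```

Because datarep(k) compares data(k) with data(k+1), an index into datarep is also the index in the data where the repeated block begins.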