count the change between a row and the next, in a column

1 次查看(过去 30 天)
I have a series of data in a column, and I want to count how many times the data, between one row and the next, change its content (my column is a categorical column). Below, I put an example:
Here, I have 4 changes. The dimension of my timetable is 78397x5. How can I do? Thanks in advance for the help.

采纳的回答

Paolo
Paolo 2018-6-28
编辑:Paolo 2018-6-28
For categorical array c
chararr = cellstr(c);
num = cellfun(@(x) regexp(x,'(\d)*$','match','once'), chararr,'un',0);
nchange = nnz(diff(str2double(num)));
  7 个评论
Erica Corradi
Erica Corradi 2018-6-29
编辑:Erica Corradi 2018-6-29
I want to see how many times my patient gets out of the bed room at night (whose PIR sensor corresponding to the bedroom is the dbuid-20) and walks around the home. Once counted the times when it comes out of the bedroom, I want to see how many times it goes to the bathroom (which is the dbuid-17) and the average time of stay. For this count, I can ignore rows that do not contain dbuid-XX. the PIR sensor generates a row of my table every time it detects a movement, every 3 seconds. To calculate the average time spent in the bathroom, I have to consider the consecutive rows in which dbuid-17 is present. You consider that I have a timetable, so the time for each line is expressed as date and HH: min: sec. I have to do this calculation every time I have successive rows of dbuid-17
Adam Danz
Adam Danz 2018-6-29
编辑:Adam Danz 2018-6-29
Erica, what didn't work with my solution? Do you have access to grp2idx()?
I updated my solution to use your tab_night data as input rather than my fake data (the algorithm didn't change). I also added the solution to count the number of changes between _20 and _17. Note that this solution should work on any and all category types and is not dependent on the naming if the category.
For you data, there were 12468 total changes in category and there were 927 changes from _20 to _17 (which is a lot of trips to the bathroom).

请先登录,再进行评论。

更多回答(2 个)

Adam Danz
Adam Danz 2018-6-28
编辑:Adam Danz 2018-6-29
A solution is to convert your categorical data to numerical using gr2idx() if you have the statistics toolbox. Then you can differentiate the numerical array and count how many times the differentiation isn't zero (ie, count the number of times there is a change). Note that this solution will work with any category names. Using the data you attached, here's how to count the number of changes:
% Copy list of all categories
cats = tab_night{:,1};
% Convert to numerical values
n = grp2idx(cats);
% Alternatively, as @Guillaume pointed out: n = double(c);
% now differentiate and count the number of changes
count = sum(diff(n)~=0);
If you want to know the index value where the change occurs, these are the row numbers:
find(diff(n)~=0)+1
If you want to count the number of times the category changes from ..._20 to ..._17 here's how to do that:
% Count changes from dbuid-20 to dbuid-17
% First we need to find which values of 'n' correspond to the 2 categories requested
dbuid17 = n(find(cats == 'dbuid-17',1)); %=4
dbuid20 = n(find(cats == 'dbuid-20',1)); %=7
% now we need to count the number of times n changes from 7 to 4
% 4 minus 7 is -3 so if we differentate n we just need to count the number of '-3's
% along with the number of times it changed to category _17.
nDiff = diff(n);
changIdx = nDiff==(dbuid17-dbuid20) & n(2:end)==dbuid17;
nChanges = sum(changIdx);
Finally, to confirm that everything worked well, use the 'changIdx' to look at which categories were flagged as a change from _20 to _17:
cats(find(changIdx)) %Should all be dbuid-20
cats(find(changIdx)+1) %should all be dbuid-17
If you don't have access to the grp2idx function this won't work. Otherwise, it works for all category types.
  4 个评论
Erica Corradi
Erica Corradi 2018-6-29
The PIR sensors detect the movement of the person and send the signal every 3 seconds: this means that every row of my table represents the moment when the PIR sensor has detected movement; it will send a new signal after 3 seconds (even if the person keeps moving in the room within 3 seconds). For the average time calculation, column1type data does not help.
Adam Danz
Adam Danz 2018-6-29
编辑:Adam Danz 2018-6-29
Hi Erica, you've got a nice chunk of work to do on this and it's an interesting project so I hope you succeed. I'll break down the steps you need to do but there will inevitably be additional bumps along the way.
  1. You need to determine when someone is in room X. You should be able to use my code to get you started on that. For example, you can identify any time someone is in the room 'dbuid20' by the logical vector created by n==dbuid20. This will give you a string of [0 0 0 1 1 1 1 1 0 0 0 0 ....] where the 1s indicated presence in the room.
  2. Using that logical vector, you need to figure out when each string of 1s start and end. For example, [0 0 0 1 1 1 1 0 0 0 1 1 0 0] in that string the ones start at [4,11] and end at [7, 12]; Hint: if the logical vector is named 'inRoomX', you can use the output of diff(inRoomX) which will have a 1 or -1 for entering or leaving the room. However, you may need to clean up the data in cases where there is only 1 sample within the room (ie, the visit was <=3 seconds).
  3. What you ultimately want is two logical vectors: one that marks room entry and one that marks room exit. Once you have that, you can pull the corresponding time stamps from your data and calculate the amount of time in the room. Follow this example below.
% Create fake exitRoom and enterRoom logical vectors. In each of them, 1 indicates enter or exit.
exitRoom = false(size(tab_night.t));
exitRoom(1:1000:50001) = true;
enterRoom = false(size(tab_night.t));
enterRoom(400:1000:50400) = true;
% Get enter times
EnterTime = tab_night.t(enterRoom);
% Get exit times
ExitTime = tab_night.t(exitRoom);
% Subtract the two, assuming that all exit times come after enter times
% 'timeInRoom' should all be positive!! If not you have problems.
timeInRoom = ExitTime - EnterTime;
%How much total time in room?
sum(timeInRoom)

请先登录,再进行评论。


Guillaume
Guillaume 2018-6-29
编辑:Guillaume 2018-6-29
Hum, unless I'm very mistaken, wouldn't the simplest solution be to convert the array to double, then nnz(diff()) that?
%build demo array
c = categorical({'dbuid-20'; 'dbuid-17'});
c = c(repelem([1 2 1 2 1], [7 3 5 3 1]))
%result
ndiff = nnz(diff(double(c))) %all done!
No need for the statistical toolbox and works with all categorical arrays.

类别

Help CenterFile Exchange 中查找有关 Data Preprocessing 的更多信息

标签

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by