Cleaning string and reaction time data

I've got data that is in two columns.
The first column is string data that houses animal names. The second column is numeric data that houses reaction times to produce said animal names.
However, there are instances where a participant provides an out-of-bound response (a non-animal, for instance). All out-of-bound responses have been recoded to read OTHER.
I've been working on code to eliminate the OTHERs from the free responses and then recalculate the reaction times from the participant's last in-bound item to their next in-bound item (essentially, the RTs of all intervening out-of-bound items get summed into the next in-bound item).
For instance, example data could look something like this: https://imgur.com/czO5gub
Eliminating the OTHERs in the first column was pretty straightforward with
DAT_STR = DAT_STR(cellfun('isempty',strfind(DAT_STR,'OTHER')));
However, I am struggling to clean the reaction time data.
My first attempt was to create a logical array to determine whether 'OTHER' was present in a given row using:
logic_array = strcmp(DAT_STR,'OTHER');
From here I can find the corresponding row numbers:
row_nos = find(logic_array);
And then I was going to loop through them in a for loop that looked something like this:
for k = 1:length(row_nos)
    DAT_RT((row_nos(k))+1) = DAT_RT(row_nos(k)) + DAT_RT((row_nos(k))+1);
end
This loop basically takes the RT for the OTHER response and adds it to the RT of the next in-bound item, which is intended. The loop works really well for one-off 'OTHER' responses; however, it does a terrible job if there are consecutive 'OTHER' values in a row.
I've been beating my head against the wall trying to figure this out lol. My next attempt was to create 'start' and 'stop' values when there are consecutive 'OTHER' responses. Below is my attempt at that (warning: it doesn't work lol, the logic is off)
for k = 1:length(row_nos)
    if DAT_STR((row_nos(k))+1) == 'OTHER'
        logconsec = diff(row_nos)==1;
        D = diff([0,logconsec',0]);
        first1 = row_nos(D>0);
        last1 = row_nos(D<0);
        for j = 1:length(first1)
            DAT_RT((last1(j))+1) = sum(DAT_RT((first1(j)):(last1(j))));
        end
    else
        DAT_RT((row_nos(k))+1) = DAT_RT(row_nos(k)) + DAT_RT((row_nos(k))+1);
    end
end
The thought behind this section was to look ahead one row and if the next row == 'OTHER', then treat it as consecutive OTHERS and use the first/last values. Else, it should do the typical addition that works well in the one-off cases.
I feel like I'm spinning my wheels and overcomplicating things without really making any progress, so any guidance or insight is greatly appreciated!!

4 Comments

It's always helpful if you attach the data file, or at least show a sample of the data that illustrates the issue and also shows what the expected result should be.
That aside, you'll be a whole lot better off if you keep the numeric data as numeric; turn the missing values into NaN instead of stuffing a string into a numeric variable. It's ok to build a corollary indicator variable if you want, but it's probably not really needed here although it might be useful as a grouping variable depending on just what it is you're wanting to do.
It's not clear to me what you want the result to be -- if you don't have a valid number for the missing response, what are you summing as substitute?
Thank you for the reply! Sorry if I wasn't clear, my brain was fried from trying to work through this.
The data is in two separate variables; I combined them for the imgur link posted above for simplicity.
Essentially, I have the string array of free responses stored in DAT_STR. Those may look something like this:
The reaction time data is stored in DAT_RT which is a double array and looks like this:
I'll make up some data in excel to show what the goal is, here is the made-up sample data:
Columns A and B would be the raw data. Columns E and F show the goal for accumulating RTs across OTHERs; notice that the RT for dog on row 8 includes all the previous OTHER RTs. Columns I and J show the final goal for the cleaned data set.
Essentially, the problem is with accumulating consecutive RTs for OTHER responses and storing them in the next in-bound response.
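For what it's worth, here is a minimal sketch of that accumulation. It assumes DAT_STR is a non-empty column cell array of character vectors and DAT_RT is a numeric vector of the same length. The sample values, and the names CLEAN_STR and CLEAN_RT, are made up for illustration and are not the data from the post. Each row is assigned the group number of the next in-bound row, and accumarray sums the RTs per group. A trailing run of OTHERs with no in-bound item after it is simply dropped here, which is an assumption, since the post does not say what should happen in that case.

```matlab
% Made-up sample data (not from the original post)
DAT_STR = {'cat';'OTHER';'dog';'OTHER';'OTHER';'horse';'OTHER'};
DAT_RT  = [1.0; 0.5; 2.0; 0.25; 0.75; 3.0; 0.5];

isOther = strcmp(DAT_STR,'OTHER');        % true where the response is out of bound

% A new group starts right after every in-bound row, so each OTHER row
% shares a group with the next in-bound row that follows it.
grp  = cumsum([1; ~isOther(1:end-1)]);
sums = accumarray(grp, DAT_RT(:));        % total RT per group

CLEAN_STR = DAT_STR(~isOther);            % in-bound responses only
CLEAN_RT  = sums(grp(~isOther));          % RT including preceding OTHER RTs
```

With this sample, the RT kept for 'dog' is its own RT plus the single OTHER before it, and the RT kept for 'horse' includes both consecutive OTHERs before it.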
We can do nothing with images and aren't going elsewhere to look for stuff...post in the forum itself; use the toolset provided.
Sorry I couldn't figure out how to embed the images directly in the post. Thanks anyway.


Accepted Answer

I suspect that the standardizeMissing and/or fillmissing functions will be of interest to you.
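As a rough sketch of how standardizeMissing might fit in here: it can turn the 'OTHER' entries into the standard missing value for a cell array of character vectors (an empty character vector), and ismissing then gives a mask of the out-of-bound rows. The summing itself is not something these functions do, so the single forward pass with a running carry below is an added assumption about how to finish the job, not part of the answer above. The sample data and the names marked, carry, CLEAN_STR and CLEAN_RT are made up for illustration.

```matlab
% Made-up sample data (not from the original post)
DAT_STR = {'cat';'OTHER';'OTHER';'dog';'OTHER'};
DAT_RT  = [1.0; 0.5; 0.25; 2.0; 0.75];

marked  = standardizeMissing(DAT_STR,'OTHER');  % 'OTHER' becomes ''
isOther = ismissing(marked);                    % true for the out-of-bound rows

carry = 0;                                      % RT accumulated over the current run of OTHERs
for k = 1:numel(DAT_RT)
    if isOther(k)
        carry = carry + DAT_RT(k);              % hold the RT until an in-bound row appears
    else
        DAT_RT(k) = DAT_RT(k) + carry;          % hand the accumulated RT to the in-bound row
        carry = 0;
    end
end

CLEAN_STR = DAT_STR(~isOther);
CLEAN_RT  = DAT_RT(~isOther);                   % a trailing OTHER with no successor is dropped
```

Because the carry is only reset when an in-bound row is reached, a run of any length is handled the same way as a single OTHER.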
