Rearranging rows side by side based on a column value
1 次查看(过去 30 天)
显示 更早的评论
Hi,
I have an excel file with 300k samples (rows) and 40 columns. The first column is ID which has duplicate values and last column is about status and has binary values either 0 or 1.
I am looking to scan through this file and if the status column of first row is 0 it should copy the next row columns from 2 to 39 (excluising ID and status) and paste it where first row ends first row if the belong to same ID. This should happen for every other row with status 0 and it should copy only data related to same ID. Please see example below. From the expected output you can obvserve for ID 35 we didn't append anyt value for first sample as the status is 1 and for ID 35 the third sample is also not appended even if status is 0 as its the last row related to 35 and we cannot append ID 45 values,
ID Col1 Col2 Col3 Col4 Status
35 993 65 130 0 1
35 993 65 24 1 0
35 993 65 7 1 0
45 993 65 9 1 0
45 993 65 19 1 0
45 993 65 58 0 0
Expected Output:
ID Col1 Col2 Col3 Col 4 Status Col1 Col2 Col3 Col4
35 993 65 130 0 1
35 993 65 24 1 0 993 65 7 1
35 993 65 7 1 0
45 993 65 9 1 0 993 65 19 1
45 993 65 19 1 0 993 65 58 0
45 993 65 58 0 0
Thanks
4 个评论
Guillaume
2019-2-8
Oh, I didn't realise you wanted the filtered columns to be appended to the right of the same file. While it's perfectly doable, are you really sure you want this? I wouldn't think that repeated data and data with gaps in the rows is very practical? Wouldn't you rather have it as a separate file (with no gaps)?
As to the headers, it's up to you if you want them or not. It's a slightly different approach (table vs matrix) but the same amount of code either way.
采纳的回答
Guillaume
2019-2-11
I'm assuming that status 1 only happens once per ID. I'm also assuming that all rows of an ID are together.
t = readtable('Input_File.xlsx'); %read input data
[~, ~, subs] = unique(t.ID); %assign unique ID from 1 to n to each ID of table
hasstatus1 = accumarray(subs, t.Status, [], @any); %find IDs that have a status of 1
endrows = accumarray(subs, (1:height(t))', [], @max); %find last row of each ID
rowstodelete = t.Status == 1; %mark rows with status 1 for deletion
rowstodelete(endrows(hasstatus1)) = true; %and last row of ID which have a status of 1
From there, you can create a new table with only the rows and columns you want:
newtable = t(~rowstodelete, 2:end-1); %2:end-1 as you want to get rid of 1st and last column
writetable(newtable, 'NewFile.xlsx');
Or append to the existing table with gaps in row. This forces all appended columns to be cell arrays, which is more awkward:
newcontent = num2cell(t{:, 2:end-1});
newcontent(rowstodelete, :) = {[]};
newtable = [t, cell2table(newcontent, 'VariableNames', compose('%s_1', string(t.Properties.VariableNames(2:end-1))))];
writetable(newtable, 'NewFile2.xlsx');
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Startup and Shutdown 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!