Rearranging rows side by side based on a column value

Question

Hi,

I have an Excel file with 300k samples (rows) and 40 columns. The first column is ID, which has duplicate values, and the last column is the status, which is binary (either 0 or 1).

I want to scan through this file and, if the status of a row is 0, copy columns 2 to 39 of the next row (excluding ID and status) and paste them at the end of the current row, provided both rows belong to the same ID. This should happen for every row with status 0, and only data from the same ID should be copied. Please see the example below. From the expected output you can observe that for ID 35 nothing is appended to the first sample because its status is 1, and nothing is appended to the third sample either, even though its status is 0, because it is the last row for ID 35 and we cannot append values from ID 45.

  ID Col1 Col2 Col3 Col4 Status
993  65  130    0     1 
993  65  24     1     0 
993  65  7      1     0 
993  65  9      1     0 
993  65  19     1     0 
993  65  58     0     0 

Expected Output:

 ID Col1 Col2 Col3 Col4 Status    Col1 Col2 Col3 Col4 
993  65  130   0     1 
993  65  24    1     0        993  65   7     1
993  65  7     1     0         
993  65  9     1     0        993  65  19     1 
993  65  19    1     0        993  65  58     0     
993  65  58    0     0 

Thanks

4 Comments

Guillaume 2019-2-8

Oh, I didn't realise you wanted the filtered columns to be appended to the right of the same file. While it's perfectly doable, are you really sure you want this? I wouldn't think that repeated data and data with gaps in the rows is very practical? Wouldn't you rather have it as a separate file (with no gaps)?

As to the headers, it's up to you if you want them or not. It's a slightly different approach (table vs matrix) but the same amount of code either way.

Sunny 2019-2-8

Edited: Sunny 2019-2-9

Guillaume, thanks. I can have it as a separate file. Your suggestion is correct, as I would have deleted the rows with gaps anyway.

Answer 1

Guillaume 2019-2-11

I'm assuming that status 1 only happens once per ID. I'm also assuming that all rows of an ID are together.
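
Both assumptions can be checked before going further. The following is only a minimal sketch on made-up data: the ID and Status values are hypothetical stand-ins for the spreadsheet contents, and the variable names onespergroup, atmostone, numruns and contiguous are introduced here for illustration.

```matlab
% Hypothetical sample data standing in for the real spreadsheet
ID = [35; 35; 35; 45; 45];
Status = [1; 0; 0; 0; 0];
t = table(ID, Status);

[~, ~, subs] = unique(t.ID);  %group number (1 to n) for each row

% Assumption 1: status 1 occurs at most once per ID
onespergroup = accumarray(subs, double(t.Status == 1));  %count of status 1 rows per ID
atmostone = all(onespergroup <= 1);

% Assumption 2: all rows of an ID are together, i.e. the number of runs
% of identical IDs equals the number of distinct IDs
numruns = 1 + nnz(diff(subs) ~= 0);
contiguous = numruns == max(subs);
```

If contiguous comes out false, sorting the table by ID (for example with sortrows(t, 'ID')) is one possible way to bring the rows of each ID together, assuming the original row order across different IDs does not matter.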

t = readtable('Input_File.xlsx');  %read input data
[~, ~, subs] = unique(t.ID);  %assign unique ID from 1 to n to each ID of table
hasstatus1 = accumarray(subs, t.Status, [], @any);  %find IDs that have a status of 1
endrows = accumarray(subs, (1:height(t))', [], @max);  %find last row of each ID
rowstodelete = t.Status == 1;  %mark rows with status 1 for deletion
rowstodelete(endrows(hasstatus1)) = true;  %and last row of ID which have a status of 1

From there, you can create a new table with only the rows and columns you want:

newtable = t(~rowstodelete, 2:end-1);  %2:end-1 as you want to get rid of 1st and last column
writetable(newtable, 'NewFile.xlsx');

Or append to the existing table, leaving gaps in the rows marked for deletion. This forces all appended columns to be cell arrays, which is more awkward:

newcontent = num2cell(t{:, 2:end-1});
newcontent(rowstodelete, :) = {[]};
newtable = [t, cell2table(newcontent, 'VariableNames', compose('%s_1', string(t.Properties.VariableNames(2:end-1))))];
writetable(newtable, 'NewFile2.xlsx');
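
To see which rows the marking step selects, here is a small trace of the same logic on made-up data. This is a sketch under assumptions: the IDs, the Col1 values and the variable name kept are hypothetical, and no file is read or written.

```matlab
% Hypothetical data: ID 35 has a status 1 row, ID 45 does not
ID = [35; 35; 35; 45; 45];
Col1 = [10; 20; 30; 40; 50];
Status = [1; 0; 0; 0; 0];
t = table(ID, Col1, Status);

[~, ~, subs] = unique(t.ID);  %subs is [1; 1; 1; 2; 2]
hasstatus1 = accumarray(subs, t.Status, [], @any);  %true for ID 35, false for ID 45
endrows = accumarray(subs, (1:height(t))', [], @max);  %last row of each ID: [3; 5]
rowstodelete = t.Status == 1;  %marks row 1
rowstodelete(endrows(hasstatus1)) = true;  %also marks row 3, the last row of ID 35

kept = t(~rowstodelete, 2:end-1);  %rows 2, 4 and 5, with ID and Status dropped
```

Here rowstodelete ends up true for rows 1 and 3 only, so kept.Col1 is [20; 40; 50]. Note that the last row of ID 45 is kept, because that ID has no row with status 1.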

1 Comment

Sunny 2019-2-12

Thanks, it worked perfectly.
