How to remove table data based on comparison between different columns in another table in MATLAB?

3 次查看(过去 30 天)
I have 2 tables in MATLAB- table A and table B, each having different dimensions (different no. of rows and columns). The first column of table A has the Date and time in format like 2018-11-01 12:00:00 (DateTime data format).
Now, in Table B, the Third and Fourth Column also consist of Date and Time in format like 2018-11-01 01:11:12:173000. What I would like to achieve is to remove all the rows (which are the data instances) from Table A, in case of which the Datetime for Table A falls in the range between the Date and Time in Table B. (To be more precise, suppose Table B has an entry of DateTime in third column for the first row/first data instance as 2018-11-10 12:30:00:173 and in fourth column as 2018-11-10 12:40:00:145, I would like to remove all data entries/rows from Table A, in case of which the DateTime Column value for Table A falls in the range of 2018-11-10 12:30:00:173 to 2018-11-10 12:40:00:145, as an example). This means that basically I would be removing the data in the aforesaid range from the Table A.
To approach this, the first thing which comes to my mind is to use inner join(), but, it is evident from Mathworks Community guidance that innerjoin() only matches the exact column value which I specify the Key as, but in this case, I would be looking at a range of DateTime values in 2 columns of table B, so perhaps this might not be the best approach. Using a for loop for this purpose might as well work, but would be quite complex and redundant with huge computational time on the large data in the tables. Any help in this regard would be highly appreciated.

采纳的回答

Guillaume
Guillaume 2018-11-28
编辑:Guillaume 2018-11-28
It's fairly simple to do. Annoyingly implicit expansion and bsxfun don't work with datetime so you have to use repmat to do the comparison.
%inputs: TableA with a column named date, TableB with a column named datestart and dateend
%replace by actual table and variable names.
datetocheck = repmat(TableA.date, 1, height(TableB)); %replicate in as many columns as there are rows in B
datestart = repmat(TableB.datestart', height(TableA), 1); %tranpose and replicate in as many rows as in A
dateend = repmat(TableB.dateend', height(TableA), 1);
toremove = any(datetocheck >= datestart & datetocheck <= dateend, 2);
TableA(toremove, :) = [];
If implicit expansion was implemented for datetime, you could have replaced the first 4 lines by:
toremove = any(TableA.date >= TableB.datestart' & TableA.date <= TableB.dateend', 2);
  11 个评论
Joyjit Chatterjee
Joyjit Chatterjee 2018-12-4
In the code, till dateend, everything executes fine. but issue occurs when the matchinA line in code is run, showing error "All input arguments must be tables", but wierdly, table A and table B are tables (that is why it worked perfectly when I removed the rows bsed on your first time answer). Could you please guide me as to why this is happening, has it got anything to do with find() function or is it some other problem? Thanks again!
Guillaume
Guillaume 2018-12-5
Yes, I made a silly mistake
tableC = [tableA(matchinA, :), tableB(whereinB, 'columntocopy')];
or
tableC = tableA(matchinA, :);
tableC.columntocopy = tableB.columntocopy(whereinB);

请先登录,再进行评论。

更多回答(1 个)

Peter Perkins
Peter Perkins 2018-11-28
From your description it sounds like you want to remove rows from A whose timestamp falls in any of the time intervals defined by any of the timestamp pairs in B. As opposed to removing rows from A whose timestamp falls in the time interval defined by the timestamp pair in the corresponding row of B. Right?
Is there any structure to the intervals defined in B? Are they disjoint? Are they in some order?
Assuming they are disjoint and in increasing order, interleave the two vectors of interval endpoints in B. This defines edges for time bins, and anything in A that falls into an even bin you throw out, in an odd bin you keep. I think that's consistent with what you described.
Use discretize on A's timestamps to determine which bins each row of A falls in. Then use logical subscripting to remove the rows that fall in even bins.
If the intervals defined by B are not disjoint, you'll have to do something to merge overlapping intervals. Perhaps sort by the left endpoints, then look to see whose right endpoint is larger than the succeeding left endpoint.

类别

Help CenterFile Exchange 中查找有关 Logical 的更多信息

产品


版本

R2018b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by