Outerjoin isn't matching the same values in two different tables
8 次查看(过去 30 天)
显示 更早的评论
Hi everyone! I'm having a problem with some datetime tables, and using outerjoin.
So I have this one matrix, that has all my data. The first column is the datetime, and the next columns are data matching each time (I haven't included the data in the attached files, just the first column). The problem I'm having is that sometimes the datetime skips time - for instance, one row will be July 10 2023 18:00:00, and then the next row will be July 12 2023 0:00:00. This is just due to the nature of the data collection, but since I want to run some statistical analyses on my data, I want the timestep for the date and time to be even, even if that means between those two time steps I will have NaNs in all the data columns (since there was no data collected during that time).
So to do this I made a time vector that had an even time step from the beginning and the end of my data collection. I then tried to use outerjoin to join this dataset with the datetime column of my original dataset. However, I'm finding that they're not matching up even when the date and time are the same in each dataset!
Here's what I mean:
This is the output table from using the outerjoin function. On the left is my time vector with even time steps, and on the right is my data table that is messier with skips in date and time. As you can see, there are times where they match up (for example, row 574), but there are also times where they don't (rows 577 and 578, where the values are the SAME but they're not recognized as such).
How can I fix this? I made sure there are no repeats in either table I'm working with. I've also tried using outerjoin when the datetime is in the matlab number format, but I still have the same problem. I'm open to any solution, even if it doesn't use outerjoin as a function. I just want an even timestep for my data.
I've attached my datasets: timevecTable is the time vector with even time steps, and v2Table is my dataset that skips datetimes. The resulting table using outerjoin is also included, and that's named timeJoined.
Thank you!
0 个评论
采纳的回答
Voss
2024-7-3
编辑:Voss
2024-7-3
As for why outerjoin() doesn't appear to combine data when the datetimes are the same, the fact is that some datetimes that appear to be the same are actually different. There are small fractions of a second in one that are not in the other, which are not apparent due to the Format showing only the seconds (no fractions).
For example the first two datetimes in timevecTable appear to be the same as the first two in v2Table:
timevecTable = load('timevecTable.mat').timevecTable;
v2Table = load('v2Table.mat').v2Table;
timevecTable.datetime([1 2])
v2Table.datetime([1 2])
but they are not equal:
isequal(timevecTable.datetime([1 2]),v2Table.datetime([1 2]))
In fact v2Table.datetime(2) has a small fraction of a second, which timevecTable.datetime(2) does not. You can see that by adjusting the Format of each:
fmt1 = timevecTable.datetime.Format;
fmt2 = v2Table.datetime.Format;
timevecTable.datetime.Format = [fmt1 '.SSSSSSSSS'];
v2Table.datetime.Format = [fmt2 '.SSSSSSSSS'];
And check the values again:
timevecTable.datetime([1 2])
v2Table.datetime([1 2])
You can see there that v2Table.datetime(2) has an additional 0.000006835 (or so) seconds relative to timevecTable.datetime(2).
To fix this, you can shift each datetime to the start of the nearest second:
timevecTable.datetime = dateshift(timevecTable.datetime,'start','second','nearest');
v2Table.datetime = dateshift(v2Table.datetime,'start','second','nearest');
[and you can change the Formats back (they are just for display/illustration anyway and don't change the underlying data)]
timevecTable.datetime.Format = fmt1;
v2Table.datetime.Format = fmt2;
and then the outerjoin() result should be more like what you expect:
timeJoined = outerjoin(timevecTable,v2Table)
Notice it's ~22000 rows instead of the ~40000 rows that you had in your outerjoined table (in timeJoined.mat) originally.
更多回答(2 个)
Cris LaPierre
2024-7-3
Not sure how you created the time steps, but datetimes apply a display format to the actual data. Perhaps the actual values do differ by some small amount.
Since you have a timetable, I'd try using retime. Your data was not in time order, so I use sortrows to put it in ascending time order.
load v2Table.mat v2Table
v2Table.Data = rand(height(v2Table),1);
v2Table = table2timetable(sortrows(v2Table))
v2Table_retimed = retime(v2Table,'regular','fillwithmissing','TimeStep',minutes(2))
Taylor
2024-7-3
MATLAB seems to be recognizing those datetimes (indexed at 577 and 578) as unique values
load('timevecTable.mat')
load('v2Table.mat')
t = outerjoin(timevecTable, v2Table, "MergeKeys",true)
t = table2array(t);
isequal(t(577), t(578))
You can just convert everything to a datetime array, then a string array, then back to a datetime array and it will recognize them as equal.
t = string(t);
t = datetime(t)
isequal(t(577), t(578))
Then use unique to filter out duplicate values.
t = unique(t);
size(t)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Data Distribution Plots 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!