- You are talking about serial dates but showing only date strings. You might have lost some precision by displaying at most seconds when the data were recorded at, e.g., millisecond resolution.
- Is the join performed on Time and not on the serial dates?
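To illustrate the first point, here is a minimal sketch showing that two serial date numbers can print as the same date string while still being different values. The 1 ms offset is a made-up example, not taken from the poster's data.

```matlab
% Two serial date numbers that differ by 1 ms (hypothetical offset).
t1 = datenum(2012, 1, 1, 0, 7, 22);   % 01-Jan-2012 00:07:22
t2 = t1 + 0.001 / 86400;              % one millisecond later

% Formatting to whole seconds hides the difference.
s1 = datestr(t1, 'dd-mmm-yyyy HH:MM:SS');
s2 = datestr(t2, 'dd-mmm-yyyy HH:MM:SS');

sameText  = strcmp(s1, s2);   % compares the displayed strings
sameValue = (t1 == t2);       % compares the underlying doubles

% A format with milliseconds makes the difference visible.
disp(datestr(t2, 'dd-mmm-yyyy HH:MM:SS.FFF'));
```

If the strings match but the values do not, any key comparison that works on the doubles will treat the two times as different.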
Inner join() Producing Duplicate Entries
I have several large time series of meteorological variables from the same measurement tower. I wanted to compare data values at exactly the same measurement points in time. I set up serial date numbers and values in dataset() arrays similar to the following post:
I followed Message 5 to code something like C = join(Dataset1, Dataset2, 'Type', 'inner'). The results looked good at first, with input dates like the following:
[DS1.Time DS2.Time] =
01-Jan-2012 00:07:22 01-Jan-2012 00:07:22
01-Jan-2012 00:17:22 01-Jan-2012 00:17:22
01-Jan-2012 00:37:22 01-Jan-2012 00:37:22
01-Jan-2012 00:47:22 01-Jan-2012 00:47:22
01-Jan-2012 00:57:22 01-Jan-2012 00:57:22
01-Jan-2012 01:47:22 01-Jan-2012 01:07:22
01-Jan-2012 01:57:22 01-Jan-2012 01:27:22
01-Jan-2012 02:07:22 01-Jan-2012 01:47:22
01-Jan-2012 02:17:22 01-Jan-2012 01:57:22
01-Jan-2012 02:27:22 01-Jan-2012 02:07:22 ...
so that the resulting dates (with data) using C = join(DS1,DS2,'Type','inner') would be:
C.Time =
01-Jan-2012 00:07:22
01-Jan-2012 00:17:22
01-Jan-2012 00:37:22
01-Jan-2012 00:47:22
01-Jan-2012 00:57:22
01-Jan-2012 01:47:22
01-Jan-2012 01:57:22
01-Jan-2012 02:07:22
01-Jan-2012 02:17:22
01-Jan-2012 02:27:22 ...
The problems started when I took the output C and used it for further time series merging. Since an inner join is like an intersection of the times in the two datasets, it stands to reason that length(C) <= length(DS1) and length(C) <= length(DS2). This was no longer the case when using Cnew = join(C,DS4,'Type','inner'). The times at both ends looked fine, but I finally discovered repeated data rows in the middle of the resulting dataset, like:
Cnew.Time =
...
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11 ...
After much investigation, the only way I found to fix this problem after an inner join was to use the unique() function in the following way:
Cnew = join( C, DS4, 'key', 'Time', 'Type', 'inner', 'MergeKeys', true ) ;
CnewUnique = unique(Cnew , 'Time') ;
This would finally produce the output I was looking for:
CnewUnique.Time = ...
02-Nov-2012 09:00:11
02-Nov-2012 09:10:11
02-Nov-2012 09:20:11
02-Nov-2012 09:30:11
02-Nov-2012 09:40:11
02-Nov-2012 09:50:11
02-Nov-2012 10:00:11
02-Nov-2012 10:10:11
02-Nov-2012 10:20:11 ...
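One situation in which an inner join returns more rows than either input is when the same key value is repeated in both inputs: every matching pair is kept, so the counts multiply. Whether that is what happened here is an assumption, not something established in this post. The sketch below uses made-up key vectors tA and tB and only base functions to count how many rows an inner join would produce.

```matlab
% Hypothetical key columns (serial date numbers); the values are made up.
tA = datenum(2012, 11, 2, 9, [40; 50; 50; 50], 11);     % 09:50:11 appears 3 times
tB = datenum(2012, 11, 2, 9, [ 0; 50; 50; 50; 50], 11); % 09:50:11 appears 4 times

% Count how often each distinct key occurs in each input.
[uA, ~, iA] = unique(tA);
nA = accumarray(iA(:), 1);
[uB, ~, iB] = unique(tB);
nB = accumarray(iB(:), 1);

% Keys present in both inputs, and the row count an inner join would give:
% each shared key contributes (occurrences in A) * (occurrences in B) rows.
[common, ka, kb] = intersect(uA, uB);
expectedRows = sum(nA(ka) .* nB(kb));

% Keys that are repeated within one input are worth inspecting before joining.
dupKeysA = uA(nA > 1);
dupKeysB = uB(nB > 1);
fprintf('rows in A: %d, rows in B: %d, inner-join rows: %d\n', ...
        numel(tA), numel(tB), expectedRows);
```

Running a check like this on each input before joining shows whether duplicates are already present in the source data, in which case removing them up front may be cleaner than calling unique() on the result.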
This took many hours to figure out so I wanted to ask the following question(s):
- Why was the join(...,'inner',...) not working the way I expected, as it did before?
- Is there a better way to match up the times from several time series? (I did not have success with the synchronize function either for an "intersection" of the times.)
- Has anyone else had a similar problem? Is MATLAB possibly exhibiting bug-like behavior here?
Any insights are appreciated. Thank you for contributing to this post.
4 Comments
per isakson
2014-9-2
Edited: per isakson
2014-9-2
Disclaimer: I have not worked with the dataset class of the Statistics Toolbox. But I have worked with time series, meteorological and others.
I looked at the code of join. It uses the function unique for comparison. unique does not handle double values well. So why isn't there a test in the code? At least a warning would have been appropriate.
The documentation of join says: C = join(A,B,keys) performs the merge using the variables specified by keys as the key variables in both A and B. keys is a positive integer, a vector of positive integers, a variable name, a cell array of variable names, or a logical vector.
My conclusion is that serial date numbers (double) cannot be used as keys in join.
I have ended up using a "serial second number" stored as uint32 to avoid problems like this.
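A minimal sketch of that idea follows, assuming an epoch of 01-Jan-2012 (a choice made for this example, since seconds counted from year 0 would not fit in uint32) and made-up timestamps with a tiny amount of floating-point noise added to one series.

```matlab
% Assumed reference time; uint32 seconds cover roughly 136 years from here.
epoch = datenum(2012, 1, 1);

% Made-up serial date numbers: t2 holds three of the same instants as t1,
% perturbed by 1e-9 days so the doubles are no longer bit-identical.
t1 = datenum(2012, 1, 1, 0, [7; 17; 37; 47], 22);
t2 = t1([1 3 4]) + 1e-9;

% Exact comparison of the doubles finds nothing in common.
noExactMatch = isempty(intersect(t1, t2));

% Convert to whole seconds since the epoch and compare integer keys instead.
k1 = uint32(round((t1 - epoch) * 86400));
k2 = uint32(round((t2 - epoch) * 86400));
[kCommon, i1, i2] = intersect(k1, k2);

% i1 and i2 index the matching rows in the first and second series.
disp([i1(:) i2(:)]);
```

Rounding to whole seconds only makes sense when the sampling interval is much longer than one second, as it is for the ten-minute data in this thread.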
Oleg Komarov
2014-9-3
Edited: Oleg Komarov
2014-9-3
What if you try to use table() instead of dataset? The table.join() has no restriction on the type of variable that you can use as keys.
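A minimal sketch of the table route, assuming R2013b or later and made-up variable names (Time, Temp, Wind). The keys here are bit-identical by construction, so this only shows the call pattern, not how near-equal doubles are treated.

```matlab
% Made-up data: T2 shares the first and third timestamps of T1.
Time = datenum(2012, 1, 1, 0, [7; 17; 37], 22);
T1 = table(Time, [1.1; 1.2; 1.3], 'VariableNames', {'Time', 'Temp'});
T2 = table(Time([1 3]), [5; 6], 'VariableNames', {'Time', 'Wind'});

% Inner join on the double-valued key variable.
C = innerjoin(T1, T2, 'Keys', 'Time');
disp(C);
```

An existing dataset array can be converted with dataset2table before joining.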
Answers (0)