Unique function is not working as expected

8 次查看(过去 30 天)
I have a problem with the unique-function and I can't find the reason why.
I import a table vom excel and afterwards I want to use [table,~] = unique(table,'rows');
It worked until recently. The only change I remember I did, was that I adjusted the timeformat.
Now I have doubled rows in my table, even after using [table,~] = unique(table,'rows');.
table = [];
for i=1:length(files_table_data)
path_table = fullfile(files_table_data(i).name);
opts = detectImportOptions(path_table);%excel
opts.VariableNames(1) = {'Name'};
opts.VariableNames(2) = {'Date'};
opts.VariableNames(3) = {'Number'};
opts = setvartype(opts,{'Date'},'datetime');%datetime
opts = setvaropts(opts,{'Date'},'InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,{'Date'},'DatetimeFormat','yyyy-MM-dd HH:mm');
opts = setvartype(opts,{'Name'},'string');
opts = setvartype(opts,{'Number'},'double');
opts.SelectedVariableNames = {'Name','Date','Number'};
table = [table; readtable(path_table,opts)];
end
[table,~] = unique(table,'rows');
  4 个评论
the cyclist
the cyclist 2021-4-3
The last two rows -- as you have written them -- are obviously identical.
If I create a table using exactly what you have written, they will be identical, and the unique command will get rid of one of them. (Unless there is a bug in the unique command, which I doubt.)
This is why uploading the actual data, not just typing what it looks like, is so important to replicating the problem.
Can you create a very small segment of the data that is not sensitive to upload? Perhaps just a few rows, and only the relevant columns? Then test that segment to make sure it still exhibits the bug? Then upload everything we need to replicate that bug.
Ariane Fidelia Wettig
I think with the Code and the data from the excel sheet it should be visible... You might need to adjust the data_path, so the code can find the Test.xlsx.
I needed to change the original Time-Format, because I compare this data afterwards with different data from other sources, that have different time-formats.
Thank you!
function test = import_test()
files_test = dir('../data/not_anonymized/Test*');
%% read table
table = [];
for i=1:length(files_test)
path_table = fullfile('..','data','not_anonymized', files_test(i).name);
%fprintf('Starting file %d/%d: %s\n',i,length(files_test),files_test(i).name)
opts = detectImportOptions(path_table);%excel
opts.VariableNames(1) = {'Name'};
opts.VariableNames(2) = {'Date'};
opts.VariableNames(3) = {'Number'};
opts = setvartype(opts,{'Date'},'datetime');%datetime
opts = setvaropts(opts,{'Date'},'InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,{'Date'},'DatetimeFormat','yyyy-MM-dd HH:mm');
opts = setvartype(opts,{'Name'},'string');
opts = setvartype(opts,{'Number'},'double');
opts.SelectedVariableNames = {'Name','Date','Number'};
table = [table; readtable(path_table,opts)];
end
[table,~] = unique(table,'rows');
end

请先登录,再进行评论。

采纳的回答

Cris LaPierre
Cris LaPierre 2021-4-3
First, do not name your table variable table. That overwrites MATLAB's function for constructing tables.
When I look at your excel file, I see that times record milliseconds to three decimal places. With times, you can modify how it is displayed without losing this level of detail. You have set your display format to 'yyyy-MM-dd HH:mm' but the actual data is still saved as 'yyyy-MM-dd HH:mm:ss.SSS XXX'.
Therefore, the unique command is working just fine. The times are not identical when looking at the actual data.
If you only want to compare your times to the minute, then you must actually modify the data to only be minutes. Perhaps something like this.
% Create a time with seconds and milliseconds, but set display format to
% HH:mm
d=datetime(2021,01,02,3,4,5,123,'Format','yyyy-MM-dd HH:mm')
d = datetime
2021-01-02 03:04
% Change display format so seconds and milliseconds are shown
d.Format = 'dd-MMM-yyyy HH:mm:ss.SSS'
d = datetime
02-Jan-2021 03:04:05.123
% set second and milliseconds to 0.
d.Second = 0
d = datetime
02-Jan-2021 03:04:00.000
For a datetime variable in a table, use dot notation: tbl.Date.Second = 0
  5 个评论
Cris LaPierre
Cris LaPierre 2021-4-5
Here's a simplified version using the test data set you shared.
path_table = "Ariane_Test.xlsx";
opts = detectImportOptions(path_table);%excel
opts = setvartype(opts,'Date','datetime');%datetime
opts = setvaropts(opts,'Date','InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,'Date','DatetimeFormat','yyyy-MM-dd HH:mm.ss.SSS');
opts = setvartype(opts,'Name','string');
test = readtable(path_table,opts);
test.Date.Second = 0
test = 63×3 table
Name Date Number ______________ _______________________ ______ "338778418867" 2020-10-05 18:34.00.000 3 "338778418867" 2020-10-05 18:34.00.000 NaN "338778419063" 2020-10-05 18:34.00.000 1 "338778419063" 2020-10-05 18:34.00.000 NaN "338778419063" 2020-10-05 18:34.00.000 3 "338778419063" 2020-10-05 18:34.00.000 3 "338778419204" 2020-10-05 18:34.00.000 1 "338778419204" 2020-10-05 18:34.00.000 1 "338778419204" 2020-10-05 18:34.00.000 3 "338778419204" 2020-10-05 18:34.00.000 4 "338045763939" 2020-10-05 17:50.00.000 1 "378045659745" 2020-10-05 05:00.00.000 2 "378045659745" 2020-10-05 05:00.00.000 3 "378045659745" 2020-10-05 05:00.00.000 4 "378045659760" 2020-10-05 05:00.00.000 1 "378045659760" 2020-10-05 05:00.00.000 2
testU = unique(test,'rows');
tail(testU)
ans = 8×3 table
Name Date Number ______________ _______________________ ______ "378045659760" 2020-10-05 05:00.00.000 3 "378045659760" 2020-10-05 05:00.00.000 4 "378049751076" 2020-10-04 23:44.00.000 1 "378049751076" 2020-10-04 23:44.00.000 2 "378049751076" 2020-10-04 23:44.00.000 3 "378049751076" 2020-10-04 23:44.00.000 4 "378049751076" 2020-10-04 23:44.00.000 5 "378049751076" 2020-10-04 23:44.00.000 6
The last 6 have the same Name and Date, but Number is different.
Ariane Fidelia Wettig
Thank you very much!
I checked my data again, googled and found out, that unique is indeed seeing NaN as NaN~=NaN. Therefore I added:
test.Number(isnan(test.Number),:)=inf(size(test.Number(isnan(test.Number))));
[test,~] = unique(test,'rows');
test.Number(isinf(test.Number),:)=NaN(size(test.Number(isinf(test.Number))));
to my code and now it works! :)
Thank you very much for your help!

请先登录,再进行评论。

更多回答(0 个)

产品


版本

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by