concatenating and comparing two datsets

1 次查看(过去 30 天)
Hi,
I have two datasets in excel of cell arrays which have columns containing charaters and numbers. The first dataset has 300 rows and 10 cols, and the second dataset 250 rows and 9 columns.
Column 10th and 9th of dataset1 and datset 2 are numbers , while all other columns are characters.
  1. for the first dataset I want to concatenate the first 8 columns.
  2. For the second data sets I want to do the same as step.1
  3. If the concatenated description of 1st dataset is same as second then I want to add 10th column of first data to the 9th colmn of send datasets.
I am new here but is there a method in matlab to do is?. I was doing this in excel in the past and it has been time-consuming and I was making erros doing this manually.
Any help greatly appreciated.
Thanks.
SSR

采纳的回答

Jon
Jon 2019-8-16
编辑:Jon 2019-8-16
Assuming that the 9th column of both data sets is the same, I think you can do this just with an innerjoin
tbl1 = readtable('data1.xlsx')
tbl2 = readtable('data2.xlsx')
% join them to make third table which adds additional column to
% second table where character columns match
tbl3 = innerjoin(tbl1,tbl2)
There are probably some use cases that Guillaume is covering in his answer that may be important, but maybe this simple approach works for what you are doing.
  21 个评论
Jon
Jon 2019-8-20
Sorry Guilaaume. I read your response too quickly, and just saw the assert, and didn't notice that the later code handles multiple matching rows.
Guillaumes approach is much cleaner than my suggestion of putting it in a loop. Definitely do it that way!
I think you should be all set now.
Sheetal Shirsat
Sheetal Shirsat 2019-8-21
This is Perfect !.
Thank you so very much Guillaume and Jon for your time and help.. ! Much appreciated !.
Best regards,
Shital

请先登录,再进行评论。

更多回答(1 个)

Guillaume
Guillaume 2019-8-16
编辑:Guillaume 2019-8-16
Something like this should work:
dataset1 = readtable('C:\somewhere\your1stexcelfile.xlsx'); %may need extra options, depending on the excel file
dataset2 = readtable('C:\somewhere\your2ndexcelfile.xlsx'); %may need extra options, depending on the excel file
assert(width(dataset1) == 10 & width(dataset2) == 9, 'Dataset content does not match your description')
dataset1 = [rowfun(@(varargin) strjoin(varargin, ''), dataset1, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset1(:, 9:10)];
dataset2 = [rowfun(@(varargin) strjoin(varargin, ''), dataset2, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset2(:, 9)];
joined = outerjoin(dataset1, dataset2, 'Keys', 1, 'MergeKeys', true, 'Type', 'right')
edit: rereading the description, it sounds like a right outerloin, not a left one
  3 个评论
Guillaume
Guillaume 2019-8-16
编辑:Guillaume 2019-8-16
Yes, I make sure that dataset1 has indeed 10 columns and dataset2 has indeed 9 columns as you've described. Clearly, if matlab throws the error, this is not the case. The number of rows is unimportant for the code, so I never check that.
As I said, it is trivial to convert cell arrays into table. At this point, you can use the join functions that do all the hard work for you:
%inputs
%raw1: a MxN cell array
%raw2: a Px(N-1) cell array
%note that columns 1:N-2 are used as keys in order to add column N of raw1 as new column N in raw2
traw1 = cell2table(raw1);
traw2 = cell2table(raw2);
tmerged = outerjoin(traw1, traw2, 'Keys', 1:size(raw1, 2)-2, 'MergeKeys', true, 'Type', 'right');
%if result is desired as a cell array
merged = table2cell(merged);
The same can be achieved the old fashioned way, with ismember indeed. However, you can't use ismember will 2D cell arrays of char vectors, so you'd have to merge each row of text as you initially requested, or assign a unique numeric id to each char vector. I'm choosing the latter option here:
%generation of numeric id for each cell array
uniquetext = unique([raw1(:, 1:end-2); raw2(:, 1:end-1)]);
[~, raw1key] = ismember(raw1, uniquetext);
[~, raw2key] = ismember(raw2, uniquetext);
%now we can use ismember to find which keys are present in both sets
[found, where] = ismember(raw2key, raw1key, 'rows');
merged = [raw2, num2cell(NaN(size(raw2, 1), 1))]; %prepare destination by adding a column of NaN
merged(found, end) = raw2(where(found), end); %and copy relevant elements
Guillaume
Guillaume 2019-8-19
编辑:Guillaume 2019-8-19
I tried Guillaume's method and its not working either because, the unique function he uses in his example isn't working for my datasets
"doesn't work" is a useless statement if you don't provide more details. What happens? Or doesn't happen? What if the full error message if there is one?
We're guessing what your inputs are and keep guessing wrong as you don't give us the full information. A simple way to resolve that and get an answer that works for you is to attach a mat file with example inputs.

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Matrices and Arrays 的更多信息

标签

产品


版本

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by