- separate table into two tables, each with aligned columns
- use outerjoin to merge these tables based on the shared key
Data Cleansing in Tables
    9 次查看(过去 30 天)
  
       显示 更早的评论
    
I want to clean a data table and create a new table/overwrite the incorrect one. To create a dummy case run following code to create a table
test = table();
test.index = ['11'; '12'; '13'; '14'; '15';'16'];
test.id = ['A'; 'B'; 'C'; 'D';'E';'Z'];
test.id2 = ['B'; 'C'; 'D'; 'E'; 'F'; 'Z'];
test.price = [1;2;3;4;5;6];
test.price1 = [10;20;30;40;50;60];
The output of the table is 
index    id    id2    price    price1
    _____    __    ___    _____    ______
     11      A      B       1        10  
     12      B      C       2        20  
     13      C      D       3        30  
     14      D      E       4        40  
     15      E      F       5        50  
     16      Z      Z       6        60  
In above table index of table is properly aligned with id2 and price, and id is properly aligned with price1. Based on this knowledge I want to create a new table with correct data. Desired Output:
index     id    price    price1
    _______    __    _____    ______
    {[NaN]}    A      NaN       10  
    {'11' }    B        1       20  
    {'12' }    C        2       30  
    {'13' }    D        3       40  
    {'14' }    E        4       50  
    {'15' }    F        5      NaN  
    {'16' }    Z        6       60 
Since for id A we do not index so we will assign it NaN Similarly price is also not avaliable so we can keep that NaN but we do have price1 so we keep that. For id B we have index and price from 1st row of faulty table and we can extract price1 from the next row of faulty table to create 2nd row of desired output. What is optimal way to achieve this using MATLAB code. Also note that distinct values from id and id2 should all be in id of desired output.
0 个评论
采纳的回答
  Sindar
      
 2020-9-24
        
      编辑:Sindar
      
 2020-9-25
  
      test_rejoined = outerjoin(test(:,[1 3 4]),test(:,[2 5]),'LeftKeys','id2','RightKeys','id','MergeKeys',true);
edited: strings > chars since older versions of Matlab often only accepts chars (single quotes) not strings (double quotes). In 2017b, table variable names must be passed as chars
6 个评论
更多回答(0 个)
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

