Creating a new matrix based on matching two columns

Question

0 个投票

I have generated two matrices from two different csv files. Matrix 1 contains two columns of information (matrix contain middle names and first name). Matrix 2 only contains one Column of information (middle name). I want to create a loop where I take the middle name from matrix 2 and match it to the middle names in matrix 1. Once the names have been matched I want to read out each matches corresponding first name in a new matrix

3 个评论
显示 1更早的评论隐藏 1更早的评论

User 2023-10-26

Example data attached below!

Walter Roberson 2023-10-26

See https://www.mathworks.com/matlabcentral/answers/2034499-loading-in-a-table-that-has-multiple-values-in-a-single-cell-seperated-by-a-comma#comment_2938846 for code.

请先登录，再进行评论。

请先登录，再回答此问题。

Follow Question

Answer 1

Voss 2023-10-26

在 MATLAB Online 中打开

0 个投票

T1 = readtable('example1.xlsx');
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T2 = readtable('example2.xlsx');
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
disp(T1);
     firstName           lastName      
    ___________    ____________________

    {'ben'    }    {'smith'           }
    {'sarah'  }    {'doe'             }
    {'rebecca'}    {'martinez, garcia'}
    {'lucy'   }    {'griffin'         }
    {'grace'  }    {'lockwood'        }
    {'sam'    }    {'sanchez'         }
    {'brian'  }    {'pina'            }
    {'stef'   }    {'ngyugen'         }
    {'maria'  }    {'lee'             }
    {'emily'  }    {'grady'           }
    {'giselle'}    {'curry'           }
    {'andrea' }    {'green, martinez' }
disp(T2);
      lastName  
    ____________

    {'smith'   }
    {'green'   }
    {'miller'  }
    {'wilson'  }
    {'davis'   }
    {'allen'   }
    {'moore'   }
    {'cooper'  }
    {'adams'   }
    {'thompson'}
    {'lopez'   }
    {'hill'    }
    {'adams'   }
    {'bailey'  }
    {'thatcher'}
    {'raven'   }
    {'elsher'  }
    {'levine'  }
    {'brown'   }
    {'williams'}
    {'anderson'}
    {'wilson'  }
    {'gonzales'}
    {'garcia'  }

"I want to [...] take the [last] name from [array] 2 and match it to the [last] names in [array] 1. Once the names have been matched I want to read out each [match's] corresponding first name in a new [array]"

If you want exact matches:

result = T1{ismember(T1{:,2},T2{:,1}),1}
result = 1×1 cell array
    {'ben'}

Or, if you want to split the cells where there are more than one last name separated by commas into multiple separate entries, then something like this:

% make a new table with only one first and last name per row:
C = regexp(T1{:,2},',\s*','split');
idx = arrayfun(@(x,c)x(ones(1,numel(c{1}))),1:numel(C),C.','un',0);
T1_new = table(T1{[idx{:}],1},[C{:}].')
T1_new = 14×2 table
       Var1            Var2    
    ___________    ____________

    {'ben'    }    {'smith'   }
    {'sarah'  }    {'doe'     }
    {'rebecca'}    {'martinez'}
    {'rebecca'}    {'garcia'  }
    {'lucy'   }    {'griffin' }
    {'grace'  }    {'lockwood'}
    {'sam'    }    {'sanchez' }
    {'brian'  }    {'pina'    }
    {'stef'   }    {'ngyugen' }
    {'maria'  }    {'lee'     }
    {'emily'  }    {'grady'   }
    {'giselle'}    {'curry'   }
    {'andrea' }    {'green'   }
    {'andrea' }    {'martinez'}
% this part is the same as before with T1 but now using T1_new:
result = T1_new{ismember(T1_new{:,2},T2{:,1}),1}
result = 3×1 cell array
    {'ben'    }
    {'rebecca'}
    {'andrea' }

5 个评论
显示 3更早的评论隐藏 3更早的评论

Voss 2023-11-1

编辑：Voss 2023-11-1

在 MATLAB Online 中打开

T1 = readtable('example1.xlsx');
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T2 = readtable('example2.xlsx');
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
% exact matches only:
T_out = T1(ismember(T1{:,2},T2{:,1}),:)
T_out = 1×2 table
    firstName    lastName 
    _________    _________

     {'ben'}     {'smith'}
result = join(T_out{:,:},' ')
result = 1×1 cell array
    {'ben smith'}
% matches any last name in given row of T1:
C = regexp(T1{:,2},',\s*','split');
idx = arrayfun(@(x,c)x(ones(1,numel(c{1}))),1:numel(C),C.','un',0);
T1_new = table(T1{[idx{:}],1},[C{:}].');
T_out = T1_new(ismember(T1_new{:,2},T2{:,1}),:)
T_out = 3×2 table
       Var1           Var2   
    ___________    __________

    {'ben'    }    {'smith' }
    {'rebecca'}    {'garcia'}
    {'andrea' }    {'green' }
result = join(T_out{:,:},' ')
result = 3×1 cell array
    {'ben smith'     }
    {'rebecca garcia'}
    {'andrea green'  }

Walter Roberson 2023-11-1

With the original last name is something I posted code for several days ago at https://www.mathworks.com/matlabcentral/answers/2034499-loading-in-a-table-that-has-multiple-values-in-a-single-cell-seperated-by-a-comma#comment_2938846

Voss 2023-11-1

在 MATLAB Online 中打开

@User: Does this produce the expected result? The only change is changing the comma to a semicolon in the regexp() call, since the IDs are separated by semicolons in the real data file (it was commas in the example files).

T1 = readtable('allPharm1.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T1 = 5258×2 table
                                 Name                                                                                                                                                                                                                                                                                                                                                                                                            DrugIDs                                                                                                                                                                                                                                                                                                                                                                          
    _______________________________________________________________    ___________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

    {'Peptidoglycan synthase FtsI'                                }    {'DB00303'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                }
    {'Histidine decarboxylase'                                    }    {'DB00114; DB00117'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       }
    {'Glutaminase liver isoform, mitochondrial'                   }    {'DB00142'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                }
    {'Coagulation factor XIII A chain'                            }    {'DB02340; DB11300; DB11311; DB11571; DB13151'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            }
    {'Nitric oxide synthase, inducible'                           }    {'DB00125; DB00155; DB01017; DB01110; DB01234; DB01686; DB01835; DB01997; DB02044; DB02207; DB02234; DB02462; DB02644; DB03100; DB03144; DB03366; DB03449; DB03953; DB04400; DB04534; DB05214; DB05252; DB05383; DB06879; DB06916; DB07002; DB07003; DB07007; DB07008; DB07011; DB07029; DB07306; DB07318; DB07388; DB07389; DB07405; DB08214; DB08750; DB08814; DB09237; DB11327; DB14649'                                                                                                                                                                                                                                                                                                                                                               }
    {'Estradiol 17-beta-dehydrogenase 2'                          }    {'DB00157; DB13952; DB13953; DB13954; DB13955; DB13956'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   }
    {'NAD(P) transhydrogenase, mitochondrial'                     }    {'DB00157; DB01763; DB03461; DB09092'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     }
    {'Alcohol dehydrogenase class-3'                              }    {'DB00157; DB03017; DB03704; DB04153'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     }
    {'Aminomethyltransferase, mitochondrial'                      }    {'DB00116; DB00157; DB04789'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              }
    {'Isocitrate dehydrogenase [NAD] subunit alpha, mitochondrial'}    {'DB00157; DB06757; DB09092; DB09130'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     }
    {'Voltage-dependent T-type calcium channel subunit alpha-1I'  }    {'DB00381; DB00568; DB00617; DB00909; DB01118; DB01388; DB04841; DB06152; DB09061; DB09235; DB14009; DB14011'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             }
    {'Adenosine receptor A1'                                      }    {'DB00193; DB00201; DB00277; DB00555; DB00640; DB00651; DB00806; DB00824; DB00996; DB01223; DB01303; DB01412; DB04932; DB04954; DB06471; DB09061; DB11757; DB12569; DB12670'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              }
    {'Tyrosine-protein kinase ABL1'                               }    {'DB00171; DB00619; DB01254; DB03878; DB04868; DB05184; DB06616; DB07831; DB08043; DB08231; DB08339; DB08350; DB08583; DB08896; DB08901; DB12010; DB12267; DB12323'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       }
    {'High affinity immunoglobulin epsilon receptor subunit alpha'}    {'DB00043; DB00895; DB05797'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              }
    {'Coagulation factor VIII'                                    }    {'DB00055; DB00100; DB06050; DB11300; DB11312; DB11571; DB11572; DB12872; DB13133; DB13151; DB13152; DB13933; DB14700'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    }
    {'Prostaglandin G/H synthase 1'                               }    {'DB00154; DB00159; DB00244; DB00316; DB00328; DB00350; DB00461; DB00465; DB00469; DB00500; DB00554; DB00573; DB00586; DB00605; DB00711; DB00712; DB00749; DB00784; DB00788; DB00795; DB00812; DB00814; DB00821; DB00861; DB00870; DB00936; DB00939; DB00945; DB00963; DB00991; DB01009; DB01014; DB01050; DB01283; DB01397; DB01399; DB01401; DB01419; DB01435; DB01600; DB01837; DB01892; DB02047; DB02110; DB02198; DB02266; DB02379; DB02709; DB02773; DB03667; DB03752; DB03753; DB03783; DB04552; DB04557; DB04817; DB06725; DB06736; DB06802; DB07981; DB07983; DB07984; DB08814; DB09061; DB09212; DB09213; DB09214; DB09215; DB09216; DB09288; DB09295; DB11071; DB11079; DB11201; DB11323; DB12445; DB13346; DB13501; DB13783; DB14009; DB14011'}
T2 = readtable('drugID_names_matched_CCM.csv')
T2 = 188×2 table
       Drug                         Target                 
    ___________    ________________________________________

    {'DB00112'}    {'Vascular endothelial growth factor A'}
    {'DB01017'}    {'Vascular endothelial growth factor A'}
    {'DB01120'}    {'Vascular endothelial growth factor A'}
    {'DB01136'}    {'Vascular endothelial growth factor A'}
    {'DB01270'}    {'Vascular endothelial growth factor A'}
    {'DB03088'}    {'Vascular endothelial growth factor A'}
    {'DB05294'}    {'Vascular endothelial growth factor A'}
    {'DB05434'}    {'Vascular endothelial growth factor A'}
    {'DB05890'}    {'Vascular endothelial growth factor A'}
    {'DB05932'}    {'Vascular endothelial growth factor A'}
    {'DB05969'}    {'Vascular endothelial growth factor A'}
    {'DB06642'}    {'Vascular endothelial growth factor A'}
    {'DB06779'}    {'Vascular endothelial growth factor A'}
    {'DB08885'}    {'Vascular endothelial growth factor A'}
    {'DB09301'}    {'Vascular endothelial growth factor A'}
    {'DB10772'}    {'Vascular endothelial growth factor A'}
C = regexp(T1{:,2},';\s*','split');
idx = arrayfun(@(x,c)x(ones(1,numel(c{1}))),1:numel(C),C.','un',0);
T1_new = table(T1{[idx{:}],1},[C{:}].');
T_out = T1_new(ismember(T1_new{:,2},T2{:,1}),:)
T_out = 2099×2 table
                                 Var1                                     Var2    
    _______________________________________________________________    ___________

    {'Nitric oxide synthase, inducible'                           }    {'DB01017'}
    {'Nitric oxide synthase, inducible'                           }    {'DB03144'}
    {'Alcohol dehydrogenase class-3'                              }    {'DB03017'}
    {'Isocitrate dehydrogenase [NAD] subunit alpha, mitochondrial'}    {'DB06757'}
    {'Isocitrate dehydrogenase [NAD] subunit alpha, mitochondrial'}    {'DB09130'}
    {'Tyrosine-protein kinase ABL1'                               }    {'DB01254'}
    {'Tyrosine-protein kinase ABL1'                               }    {'DB05184'}
    {'Tyrosine-protein kinase ABL1'                               }    {'DB06616'}
    {'Tyrosine-protein kinase ABL1'                               }    {'DB08231'}
    {'Tyrosine-protein kinase ABL1'                               }    {'DB08896'}
    {'Tyrosine-protein kinase ABL1'                               }    {'DB08901'}
    {'Tyrosine-protein kinase ABL1'                               }    {'DB12010'}
    {'30S ribosomal protein S4'                                   }    {'DB01017'}
    {'Vascular endothelial growth factor receptor 3'              }    {'DB00398'}
    {'Vascular endothelial growth factor receptor 3'              }    {'DB05932'}
    {'Vascular endothelial growth factor receptor 3'              }    {'DB08896'}
result = join(T_out{:,:},' ')
result = 2099×1 cell array
    {'Nitric oxide synthase, inducible DB01017'                           }
    {'Nitric oxide synthase, inducible DB03144'                           }
    {'Alcohol dehydrogenase class-3 DB03017'                              }
    {'Isocitrate dehydrogenase [NAD] subunit alpha, mitochondrial DB06757'}
    {'Isocitrate dehydrogenase [NAD] subunit alpha, mitochondrial DB09130'}
    {'Tyrosine-protein kinase ABL1 DB01254'                               }
    {'Tyrosine-protein kinase ABL1 DB05184'                               }
    {'Tyrosine-protein kinase ABL1 DB06616'                               }
    {'Tyrosine-protein kinase ABL1 DB08231'                               }
    {'Tyrosine-protein kinase ABL1 DB08896'                               }
    {'Tyrosine-protein kinase ABL1 DB08901'                               }
    {'Tyrosine-protein kinase ABL1 DB12010'                               }
    {'30S ribosomal protein S4 DB01017'                                   }
    {'Vascular endothelial growth factor receptor 3 DB00398'              }
    {'Vascular endothelial growth factor receptor 3 DB05932'              }
    {'Vascular endothelial growth factor receptor 3 DB08896'              }
    {'Vascular endothelial growth factor receptor 3 DB09079'              }
    {'Vascular endothelial growth factor receptor 3 DB12010'              }
    {'Vascular endothelial growth factor receptor 1 DB00398'              }
    {'Vascular endothelial growth factor receptor 1 DB05932'              }
    {'Vascular endothelial growth factor receptor 1 DB08896'              }
    {'Vascular endothelial growth factor receptor 1 DB09079'              }
    {'Vascular endothelial growth factor receptor 1 DB09221'              }
    {'Vascular endothelial growth factor receptor 1 DB12010'              }
    {'Insulin receptor DB12010'                                           }
    {'RAF proto-oncogene serine/threonine-protein kinase DB00398'         }
    {'RAF proto-oncogene serine/threonine-protein kinase DB04973'         }
    {'RAF proto-oncogene serine/threonine-protein kinase DB05190'         }
    {'RAF proto-oncogene serine/threonine-protein kinase DB05268'         }
    {'RAF proto-oncogene serine/threonine-protein kinase DB08862'         }

请先登录，再进行评论。

Creating a new matrix based on matching two columns

3 个评论
显示 1更早的评论隐藏 1更早的评论

回答（1 个）

5 个评论
显示 3更早的评论隐藏 3更早的评论

类别

标签

Community Treasure Hunt

Creating a new matrix based on matching two columns

3 个评论 显示 1更早的评论 隐藏 1更早的评论

回答（1 个）

5 个评论 显示 3更早的评论 隐藏 3更早的评论

类别

标签

另请参阅

Community Treasure Hunt

3 个评论
显示 1更早的评论隐藏 1更早的评论

5 个评论
显示 3更早的评论隐藏 3更早的评论