How to arrange misplaced data elements

Hi all.
I have data in Excel format in which values sit under the wrong headers, in different ways from row to row, for example:
Name Age Sex Weight Address Height
John M 20 House 2, collins street 120 55
This is not the actual data, because I am not permitted to share it.
Is there a way to rearrange the data in MATLAB so that the values end up under the right headers?
Thank you for your help.
2 comments
Guillaume 2020-1-21
Edited: Guillaume 2020-1-22
Really, the best thing is to go back to whatever wrote these incorrect files and fix the issue at the source.
Fixing the issue in MATLAB may be possible, but there is certainly no built-in tool for it. You could possibly come up with rules that identify which entry should go where, but you would have to write them yourself.
With your example, sex should be easy to identify: it is either 'M' or 'F'. You could possibly differentiate Name and Address by saying that an address contains both numbers and text, whereas a name never contains a number. Age, weight and height are a lot more ambiguous (particularly since we don't know the units). Is 120 the height in cm, the weight in kg, or the age of a very old person? Is 20 the weight in stone or the age? Is 55 a height of 5'5'', the age, or the weight in kg?
If you can come up with rules for each variable, we can help you write the code but I suspect there would always be some manual clean up required.
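As a starting point, the rules suggested above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the row is assumed to be available as a cell array of char vectors, the variable names are hypothetical, and the purely numeric values are only collected, not assigned to Age, Weight or Height, because that decision needs further rules that depend on the units.

```matlab
% Hypothetical row, taken from the example in the question.
row = {'John', 'M', '20', 'House 2, collins street', '120', '55'};

name = ''; sex = ''; address = ''; numbers = [];
for k = 1:numel(row)
    v = strtrim(row{k});
    if any(strcmp(v, {'M', 'F'}))
        sex = v;                               % rule: sex is 'M' or 'F'
    elseif ~isnan(str2double(v))
        numbers(end+1) = str2double(v);        % purely numeric: still ambiguous
    elseif any(isstrprop(v, 'digit'))
        address = v;                           % rule: text containing digits
    else
        name = v;                              % rule: text without any digit
    end
end
```

Rows where a rule matches more than one entry, or none at all, would still have to be flagged and cleaned up by hand.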
Temitope 2020-1-22
Hi Guillaume,
Thank you for your comment. The file is from a backup source because the original system crashed.
Let me see what I can do about the rules and get back to you.
Thanks.


Answers (1)

Bhaskar R 2020-1-21
Edited: Bhaskar R 2020-1-21
Read your Excel data as
T = readtable('< your excel file>'); % read the file into a table
Reassign the headers to get what you want (here, swapping the Weight and Address headers) as
T.Properties.VariableNames = {'Name', 'Age','Sex', 'Address', 'Weight', 'Height'} ;
Write the modified table back to a file
writetable(T, '<your file name>');
2 comments
Temitope 2020-1-21
Thank you Bhaskar, but the issue is that there are about 1000 rows and the order of the misplacement is not the same in each. The value that belongs under the "Age" header, for instance, could be under "Address" in one row but under "Sex" in another.
Image Analyst 2020-1-21
Attach the file. Chances are that it's a CSV file with missing or extra delimiters. But we'd need to see it.
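In the meantime, one quick way to test that guess is to count the fields on each line and compare against the header. The following is a rough sketch only: it assumes the data can be exported as comma-separated text, the sample lines are hypothetical (the "Jane" row is made up for illustration), and the naive comma count does not handle quoted fields.

```matlab
% Hypothetical lines of a comma-separated export; the first is the header.
rawLines = {'Name,Age,Sex,Weight,Address,Height', ...
            'Jane,31,F,60,12 High Street,165', ...
            'John,M,20,House 2, collins street,120,55'};

% Fields per line = number of commas + 1 (naive: ignores quoting).
nFields = cellfun(@(s) sum(s == ',') + 1, rawLines);

% Lines whose field count differs from the header's are suspect.
suspect = find(nFields ~= nFields(1));
```

Here the third line is flagged, because the unquoted comma inside the address adds an extra field. A check like this can only catch rows with missing or extra delimiters, not rows whose values are merely in the wrong order.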

