readtable sometimes changes class of input

Hi,
I use the readtable command to import .csv data. The .csv files are all created by Fiji using the same procedure, and when I open them in Excel they all appear to have the same format. However, a certain column that is filled with regular numbers is sometimes imported as numbers (e.g. 0.041) and sometimes as text (e.g. '0.041'). A given .csv file is always imported the same way. When the column is imported as '0.041', applying str2double to it does not produce an error, so I think the .csv files are fine and contain nothing other than numbers.
In the '0.041' case, the following warning is also shown:
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'PreserveVariableNames' to true to use the original column headers as table variable names.
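The warning's own suggestion can be tried as below. This is a minimal sketch assuming R2019b (the release listed for this question), where readtable accepts the 'PreserveVariableNames' name-value pair named in the warning; the temporary file and its three columns are hypothetical stand-ins for the Fiji headers. It only changes how the variables are named; it does not affect whether a column is imported as numbers or as text.

```matlab
% Hypothetical test file with Fiji-style headers that are not valid MATLAB identifiers
fn = [tempname '.csv'];
fid = fopen(fn, 'w');
fprintf(fid, 'Perim.,Kurt,%%Area\n');   % %% writes a literal percent sign
fprintf(fid, '62.569,0.041,100\n');
fprintf(fid, '30.870,-0.212,100\n');
fclose(fid);

% Keep the original headers instead of converting them to valid identifiers
T = readtable(fn, 'PreserveVariableNames', true);
delete(fn);

disp(T.Properties.VariableNames)

% Names that are not valid identifiers must be accessed with dynamic field syntax
areaPct = T.('%Area');
```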
Unfortunately, I can't attach the files because one of them exceeds 5 MB. The problem shows up in column 26, 'Kurt'. As an example:
First two lines of a file in which Kurt is imported as text ('0.041'):
,Label,Area,Mean,StdDev,Mode,Min,Max,X,Y,XM,YM,Perim.,BX,BY,Width,Height,Major,Minor,Angle,Circ.,Feret,IntDen,Median,Skew,Kurt,%Area,RawIntDen,Slice,FeretX,FeretY,FeretAngle,MinFeret,AR,Round,Solidity
1,126 bearbeitet.tif:puls no_2558,72,31662.639,2391.397,32512,24576,34561,87.389,8.139,87.393,8.139,62.569,81,1,13,14,11.652,7.867,57.668,0.231,15.556,2279710,32512,-0.863,0.041,100,2279710,1,81,12,45.000,10.419,1.481,0.675,0.585
First two lines of a file in which Kurt is imported as a number (-0.212):
,Label,Area,Mean,StdDev,Mode,Min,Max,X,Y,XM,YM,Perim.,BX,BY,Width,Height,Major,Minor,Angle,Circ.,Feret,IntDen,Median,Skew,Kurt,%Area,RawIntDen,Slice,FeretX,FeretY,FeretAngle,MinFeret,AR,Round,Solidity
1,127 bearbeitet.tif:puls no_2608,30,32154.100,2016.839,33281,26880,34561,357.700,4.833,357.714,4.833,30.870,353,1,9,7,6.915,5.524,157.564,0.396,9.220,964623.000,32769,-0.756,-0.212,100,964623.000,1,353,4,167.471,6.957,1.252,0.799,0.638
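One way to make the import independent of readtable's type detection is to fix the type of the column before reading. The sketch below assumes the detectImportOptions and setvartype functions available in R2019b and uses a small hypothetical file (three columns, with a made-up stray entry 'abc') rather than the real Fiji output. With the column forced to double, an entry that cannot be converted is filled with NaN (the default ImportErrorRule is 'fill') instead of turning the whole column into text.

```matlab
% Hypothetical file: the Kurt column holds numbers plus one stray text entry
fn = [tempname '.csv'];
fid = fopen(fn, 'w');
fprintf(fid, 'Label,Skew,Kurt\n');
fprintf(fid, 'A,-0.863,0.041\n');
fprintf(fid, 'B,-0.756,-0.212\n');
fprintf(fid, 'C,0.100,abc\n');
fclose(fid);

% Detect the file layout, then override whatever type was detected for Kurt
opts = detectImportOptions(fn);
opts = setvartype(opts, 'Kurt', 'double');

T = readtable(fn, opts);
delete(fn);

% Kurt is now numeric; the unconvertible entry was filled with NaN
disp(class(T.Kurt))
disp(T.Kurt)
```

Forcing the type trades a text column for silent NaN values, so it is worth checking find(isnan(T.Kurt)) afterwards to see which rows caused the problem.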
dpb 2022-7-13
As @Steven Lord suggests, is the symptom still there if the file contains only the first N records? If it is, you can attach a smaller file that still shows the symptom. Alternatively, is the error still present if you read only the offending column? You could then save just that column, and it would undoubtedly be small enough to attach.
If the second of the above suggestions doesn't work, and the files are so big that finding the proverbial needle by manual perusal is tough, I'd consider writing a test case that increases the number of data lines read by a sizable N on each pass and sees where it first breaks. A binary search after that should narrow down the area pretty quickly.
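The second suggestion above (read only the offending column, then save it) could look like this minimal sketch. It assumes the column keeps the name Kurt after import, and the input file, its contents, and the output name are hypothetical. Note that detectImportOptions chooses the types from the whole file, so the class of the single imported column shows whether the symptom is present.

```matlab
% Hypothetical input standing in for the large Fiji export
fn = [tempname '.csv'];
fid = fopen(fn, 'w');
fprintf(fid, 'Label,Skew,Kurt\n');
fprintf(fid, 'A,-0.863,0.041\n');
fprintf(fid, 'B,-0.756,-0.212\n');
fprintf(fid, 'C,0.100,abc\n');
fclose(fid);

% Import only the Kurt column, keeping whatever type detection chose for it
opts = detectImportOptions(fn);
opts.SelectedVariableNames = {'Kurt'};
T = readtable(fn, opts);

% 'double' if Kurt was detected as numeric, 'cell' if it was detected as text
disp(class(T.Kurt))

% Save just this column to a much smaller file (hypothetical output name)
outFn = [tempname '_kurt.csv'];
writetable(T, outFn);
delete(fn);
```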


Accepted Answer

Steven Lord 2022-7-13
When I try to break up your lines along the comma delimiters, I see that both seem to line up.
s1 = [",Label,Area,Mean,StdDev,Mode,Min,Max,X,Y,XM,YM,Perim.,BX,BY,Width,Height,Major,Minor,Angle,Circ.,Feret,IntDen,Median,Skew,Kurt,%Area,RawIntDen,Slice,FeretX,FeretY,FeretAngle,MinFeret,AR,Round,Solidity";
"1,126 bearbeitet.tif:puls no_2558,72,31662.639,2391.397,32512,24576,34561,87.389,8.139,87.393,8.139,62.569,81,1,13,14,11.652,7.867,57.668,0.231,15.556,2279710,32512,-0.863,0.041,100,2279710,1,81,12,45.000,10.419,1.481,0.675,0.585 "];
t1 = split(s1, ",");
t1(:, 26)
ans = 2×1 string array
    "Kurt"
    "0.041"
s2 = [",Label,Area,Mean,StdDev,Mode,Min,Max,X,Y,XM,YM,Perim.,BX,BY,Width,Height,Major,Minor,Angle,Circ.,Feret,IntDen,Median,Skew,Kurt,%Area,RawIntDen,Slice,FeretX,FeretY,FeretAngle,MinFeret,AR,Round,Solidity";
"1,127 bearbeitet.tif:puls no_2608,30,32154.100,2016.839,33281,26880,34561,357.700,4.833,357.714,4.833,30.870,353,1,9,7,6.915,5.524,157.564,0.396,9.220,964623.000,32769,-0.756,-0.212,100,964623.000,1,353,4,167.471,6.957,1.252,0.799,0.638 "];
t2 = split(s2, ",");
t2(:, 26)
ans = 2×1 string array
    "Kurt"
    "-0.212"
Do you see the behavior you described if you read in just those two lines, or do you need to read in a larger section of your file? If the latter, what do you see if you do this same type of experiment for a larger chunk of your data set? My suspicion is that a later line in your file has something non-numeric in it that forces MATLAB to read the column in as text rather than numbers.
If I convert those sections of the array to numbers using str2double, I don't receive an error despite the presence of the non-number "Kurt" in row 1; str2double simply returns NaN for it. So the 26th column of your data could contain non-numeric information even though str2double does not complain.
str2double(t1(:, 26))
ans = 2×1
       NaN
    0.0410
Steven Lord 2022-7-13
Rather than bisecting the data set, you could have called str2double on the string vector and then looked for the NaN value(s). Let's take some sample data, convert it to a string vector, and inject something unexpected at a random location.
x = rand(10, 1);
s = string(x);
s(randi(10)) = 'infinity';
Converting s back to double with str2double results in a NaN where the unexpected data was injected, as you can see from the result table, which shows all three variables side by side. [Note that x and d don't have exactly the same values, since s only records the values in x to about five significant digits.]
d = str2double(s);
result = table(x, s, d)
result = 10×3 table
       x            s             d    
    ________    __________    ________
    0.10645     "infinity"         NaN
    0.74351     "0.74351"      0.74351
    0.16059     "0.16059"      0.16059
    0.22516     "0.22516"      0.22516
    0.57255     "0.57255"      0.57255
    0.38014     "0.38014"      0.38014
    0.18311     "0.18311"      0.18311
    0.56394     "0.56394"      0.56394
    0.008611    "0.008611"    0.008611
    0.87057     "0.87057"      0.87057
fprintf("Element %d of s contains the value that cannot " + ...
"be converted to double.\n", find(isnan(d)))
Element 1 of s contains the value that cannot be converted to double.
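Applied to the original problem, the same technique could look like this minimal sketch: read Kurt as text regardless of what type detection would choose, convert it with str2double, and list the entries that came back as NaN. The temporary file and its contents are hypothetical, and the printed numbers are table rows, not file lines (they differ by the number of header lines).

```matlab
% Hypothetical file with one unexpected entry in the Kurt column
fn = [tempname '.csv'];
fid = fopen(fn, 'w');
fprintf(fid, 'Label,Skew,Kurt\n');
fprintf(fid, 'A,-0.863,0.041\n');
fprintf(fid, 'B,-0.756,-0.212\n');
fprintf(fid, 'C,0.100,abc\n');
fclose(fid);

% Force Kurt to be imported as text so every raw entry is kept as written
opts = detectImportOptions(fn);
opts = setvartype(opts, 'Kurt', 'string');
T = readtable(fn, opts);
delete(fn);

% str2double returns NaN (no error) for anything it cannot parse;
% empty fields also come back as NaN
d = str2double(T.Kurt);
bad = find(isnan(d));

for k = bad.'
    fprintf('Table row %d: "%s"\n', k, T.Kurt(k));
end
```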
Malte Römer-Stumm 2022-7-14
Yes, you are right, that would have been the smarter way to do it. Thanks again.



Release

R2019b
