How to internally change decimal separator while importing data from a csv?

320 次查看(过去 30 天)
Hello,
I am currently working with very large data files (csv) with approximately 15000 rows and a couple hundred columns. Since the data in the cells is derived by european software, the decimal separator turns out to be a comma and I am replacing it with a dot by using a small script. This script works very well but the corrected data has to be stored in a new file since the original file must not be manipulated. What I want to do is to import the data from the file, generate a workspace variable that contains all the data but has the commas replaced by dots. This would be very useful because for processing the data any further I always have to generate a new file and load that in manually. Since that is quite time-consuming, I would like to avoid this step.
Current Process:
%% Small script that replaces comma by dot and saves corrected data to new csv-file with name [...]_dots
[filenames,path]=uigetfile('*.csv');
cd(path);
disp(filenames)
oldfile=filenames(1:end-4);
NewFileName = sprintf('%s_Dots.csv',oldfile);
Data = fileread(filenames);
Data = strrep(Data, ',', '.');
% Save corrected data to new file
FID = fopen(NewFileName, 'w');
fwrite(FID, Data, 'char');
fclose(FID);
When having the dot as a decimal separator, the import is very easy and generates a handy structure array that is used in the further evaluation of the data as seen below. The structure array is used for further examination of the data
i_start_import=1;
datastruct=importdata(filenames,';',i_start_import);
However, the importdata function does not generate a structure array when the comma is used as a decimal separator.
Experiments:
What I have tried so far to receive the same struct but with dots as a separator is to read in the original data with importdata to further convert the comma to dot and then split the data string to receive tabular stored data. Unfortunately, this is very time consuming and takes too long for all the thousands of cells in my csv (especially the splitting process).
The data after applying importdata looks like this:
And is then split to this:
data=importdata(filenames,';',100000);
% Create structure array
datastruct=struct
datastruct.data=[];
% Replace comma by dot
for i=1:length(data)
a=strrep(data{i},',','.');
data{i}=a;
end
% Split data string
for i=1:length(data) %index rows
for j=1:516 %since there are 516 columns, index columns
str=strsplit(data{i},';');
datastruct.textdata{i,j}=str(1,j);
if j==516
break
else
continue
end
end
end
Ideally, the data should be stored in a structure array with field textdata having the values of the first column (TimeStamp) and the first row (variables) and a field data having all the data.
I am now looking for a way to read in the original data with comma as decimal separator and convert the comma to dot internally and save it as a variable (struct) that I can use in my further examination in the script.
  2 个评论
Jan
Jan 2021-5-6
Some pitfalls:
  • Do not use the important function "path" as name of a variable. This can cause extremely strange effects during debugging.
  • Avoid to use cd() to change the current directory. The callbacks of GUIs or TIMERs could do this also and the assumes files are not found anymore. This is a frequent source of bugs. Use absolute path names instead using fullfile(folder, filename).
  • In:
for j=1:516 %since there are 516 columns, index columns
str=strsplit(data{i},';');
datastruct.textdata{i,j}=str(1,j);
if j==516 % from here
break %
else %
continue %
end % to here
end
the marked block is useless. Simply omit it.
Dennis B
Dennis B 2021-5-6
Hello Jan,
Thank you for your hints, I will change that! Do you have any other ideas for changing the decimal separator internally without having to create a new file?

请先登录,再进行评论。

采纳的回答

Stephen23
Stephen23 2021-5-6
编辑:Stephen23 2021-5-6
Do NOT use CD to access data files: it is more efficient to use absolute/relative filenames (with FULLFILE).
To import a CSV (actually you appear to have a semi-colon-delimited fields) file with decimal comma simply select the appropriate options with READTABLE or READMATRIX:
[F,P] = uigetfile('*.csv');
T = readtable(fullfile(P,F), 'Delimiter',';', 'DecimalSeparator',',', 'VariableNamingRule','preserve')
Adapt to suit your file. If you had uploaded an actual data file (instead of screenshots) by clicking the paperclip button then I would have tested this as well.
  17 个评论
Louis-Marie
Louis-Marie 2022-10-20
Thanks Stephen for your useful tip. I have one additional question: i also need to change the readtable options using data=readtable(filename, opts) AND to read comma separated numbers using data=readtable(filename, 'DecimalSeparator',',')
But i didn't succed in mixing both strategies like: data=readtable(filename, opts,'DecimalSeparator',',') .
Is there a way to do so?
Stephen23
Stephen23 2022-10-20
@Louis-Marie: presumably the OPTS variable in your code is the object returned by DETECTIMPORTOPTIONS, in which case you can specify the 'DecimalSeparator' option there. It does not really make sense to specify the 'DecimalSeparator' option in READTABLE after calling DETECTIMPORTOPTIONS, because DETECTIMPORTOPTIONS will also need to know the decimal separator to detect e.g. which data are numeric.

请先登录,再进行评论。

更多回答(0 个)

产品


版本

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by