Import data with double and strings

20 次查看(过去 30 天)
Hello everybody,
I have a file (*.txt as attached). This file contains data perfilated with an equipment, displayed in a first column of date/time and four columns of numeric values (depth and water quality). Each profile is preceeded by a 6 lines heading (text data) and a given file may contain many profiles (always preceeded by a 6 lines heading).
Example:
Sensor Data Columns
Sensor Statistics (sensor #, Description, Statistic)
1 Depth Mean
2 OBS-1 (NTU) Mean
4 Temperature (deg C) Mean
5 Conductivity (mS/cm) Mean
16:15:35.0 07/15/2019 9.92 2422.9 28.77 6.73
16:15:35.5 07/15/2019 9.92 2422.9 28.77 6.73
16:15:36.0 07/15/2019 9.95 2422.9 28.77 6.72
16:15:36.5 07/15/2019 9.98 2422.9 28.77 6.73
16:15:37.0 07/15/2019 10.01 2422.9 28.77 6.72
16:15:37.5 07/15/2019 10.03 2422.9 28.77 6.74
16:15:38.0 07/15/2019 10.02 2422.9 28.77 6.74
The problem is that there are 5 spaces separating date/time and the column showing the depths, but only four spaces for depths greater than 9.9m. When I use:
OBS = importdata(AM2_15_1a.txt,' ',6);
It creates a strcut separating text and numeric data, which is exactly what I want, except that for depths > 9.9m, the lines are read as 0.1 (instead of 10.1) and the remaining "1" is put together with the date column which bugs the analysis.
As I have many files with many lines, is there any solution for fixing this problem? I tried pasting data to excel and then back to the log file; changing to csv and other types in order to standardise the columns separators, but when I do that, "importdata" doesn't work (it doesn't create a struct, but a cell containing only the heading.
Any sugestion? Thanks in advance
Best regards!
  2 个评论
Bob Thompson
Bob Thompson 2019-9-24
编辑:Bob Thompson 2019-9-24
Have you tried readtable, dlmread, or textscan?
I'm not sure if csvread will work well with the mixed format data, as it's really intended to open numeric data files.
Diego Fonseca
Diego Fonseca 2019-9-24
I tried those functions, but unfortunately they don't work, as the first column conatins a time string which I need to keep. But thank you anyway for the help! ;)

请先登录,再进行评论。

采纳的回答

Adam Danz
Adam Danz 2019-9-24
编辑:Adam Danz 2019-9-24
This solution uses fileread() to read in the entire file and then splits the text by line using strsplit().
It then identifies the start of each profile by looking for the string 'Sensor Data Columns' and then marks the start and end of each segment of data based on those line numbers and the 6 total header lines per profile.
It then loops through each profile and read in the datetime values and the 4 measurements and stores them in tables saved in the cell array "T".
See comments for alternative approaches.
% set parameters
file = 'AM2_15_1a.txt'; %file (better if it includes full path)
profileStartKey = 'Sensor Data Columns'; %text that indicates start of key
nHeaderRows = 6; %number of header rows for each profile
% Get start/end line number for each profile
text = strtrim(strsplit(fileread(file),'\n')'); %separate entire file by line
text(cellfun(@isempty,text)) = []; % remove empty lines
profileStartLineNum = find(~cellfun(@isempty,regexp(text,sprintf('^%s',profileStartKey)))); %lines where header starts
startLineNum = profileStartLineNum + nHeaderRows; %lines where data starts
endLineNum = [profileStartLineNum(2:end)-1;numel(text)]; %lines where data ends
% Loop through each profile and create a table for each profile, stored in "T"
T = cell(numel(startLineNum),1);
for i = 1:numel(startLineNum)
profile = text(startLineNum(i):endLineNum(i)); %block of profile data (cell of chars)
dataStr = cellfun(@strsplit,profile,'UniformOutput',false); %char array broken up into sub-cells
% Extract datetime and measurements for each line within the profile
[dtTemp,dataTemp] = cellfun(@(x)deal(datetime([x{1},' ',x{2}],'InputFormat','HH:mm:ss.s MM/dd/yyyy'),...
str2double(x(3:end))),dataStr,'UniformOutput',false);
T{i} = [cell2table(dtTemp,'VariableNames',{'Date'}),array2table(cell2mat(dataTemp),'VariableNames',{'m1','m2','m3','m4'})];
end
T{n} is the table from profile #n.
  5 个评论
Adam Danz
Adam Danz 2019-9-24
编辑:Adam Danz 2019-9-24
Glad I could help! In case you started implementing the answer before my final edit, I recommend using the answer in its current form where tables are produced and saved within a cell array. It's by far the cleanest and simplest approach.
Strtrim() is used to get rid of the carriage return at the end of each line which is added by strsplit().
Diego Fonseca
Diego Fonseca 2019-9-24
Ok! I'll get this edited code.
Thanks a lot again!

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Text Data Preparation 的更多信息

产品


版本

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by