Extracting specific data from multiple excel files and create a single matrix from those
53 次查看(过去 30 天)
显示 更早的评论
Hi, I have a file on my computer with close to 1000 excel files and I don't want to manually extract the second row from every excel file manually and combine into a single excel file.
I was wondering how I can do this on Matlab through reading every Excel file in that specific file and taking out all of the columns from the second rows and then combining all that into a single matrix?
Thanks for the help
采纳的回答
Mathieu NOE
2021-8-31
hello
this is one example if you want to work out the entire folder
I assumed it would be numeric data so I used importdata (faster)
I also sorted the files names in natural order in case it might be relevant
It works even if your files have different size (number of columns)
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(cd,'*.xlsx')); % get list of all excel files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order
M= length (fileNames_sorted);
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_row{f} = raw(2,:); % extract the second row
end
% write all second row lines into a matrix and store it in excel file
writecell(second_row',fullfile(cd,outfile));
14 个评论
Jonas Freiheit
2021-8-31
Hi Mathieu, I am getting this error reading
Unrecognized function or
variable 'second_row'.
Error in output (line 26)
writecell(second_row',fullfile(cd,outfile));
>>
What should I do?
Thanks
Mathieu NOE
2021-8-31
hello Jonas
can you check if the for loop is working ok ? do you get an output from this line :
what is displayed in your workspace when you type raw ?
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
Jonas Freiheit
2021-8-31
When I type raw, I get >> raw
Unrecognized function or
variable 'raw'.
yep my raw is exactly the same as yours.
Mathieu NOE
2021-8-31
so this means this line of code could not be executed ....
if you can share a couple of xls files I would like to test on my side...
Jonas Freiheit
2021-8-31
- 18136a AS ABS BC.CSV
- 18136a Deep Ocean control AS ABS BC(auto).CSV
- 18136a Deep Ocean control.CSV
- 18136b AS ABS BC.CSV
- 18136b Deep Ocean 2%.CSV
- 18136c AS ABS BC.CSV
- 18136c Deep Ccean 2.5%.CSV
- 18136d AS ABS BC.CSV
- 18136d Deep Ocean 3%.CSV
- 18136e AS ABS BC.CSV
- 18136e Deep Ocean 3.5%.CSV
- 18136f AS ABS BC.CSV
- 18136f Deep Ocean 4%.CSV
- 18136g AS ABS BC.CSV
- 18136g Deep Ocean 4.5%.CSV
- 18136h AS ABS BC.CSV
- 18136h Deep Ocean 5%.CSV
- 18136i AS ABS BC.CSV
- 18136i Deep Ocean 6%.CSV
- 18136j AS ABS BC.CSV
- 18136j Deep Ocean 8%.CSV
- 18136k AS ABS BC.CSV
- 18136k Deep Ocean 10%.CSV
- ARTEMIS_SET_control_10%.xlsx
Yeah sure, I'll send you a small folder of samples. I've tried using this folder and it returned an excel file called OUT that only had two zeroes in the excel file.
From the previous test I tried using a folder that contained 1000 excel files and Matlab somehow printed that error possibly because its too hard to process?
Also if this is the case I could create multiple smaller folders and then combine that into the Output matrix that was previously obtained and how would I do that?
Thanks
Mathieu NOE
2021-9-1
hello again
I simply modified one of your data file (ARTEMIS_SET_control_10%.xlsx) from xlsx to CSV format so they all have the same format. If you need to dig with a mix of CSV and XLSX files , I could update my code.
After that , I had not much to do and this code works like a charm, at least for this batch of 24 files;
see also at the end of my code , if you wish to have the filenames also stored in the OUT file, attached FYI
slightly updated code :
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(fileDir,'*.csv')); % get list of files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order (https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
M= length (fileNames_sorted);
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_row{f} = raw(2,:); % extract the second row
end
% write all second row lines into a matrix and store it in excel file
writecell(second_row',fullfile(cd,outfile)); % without filenames
% writecell([fileNames_sorted' second_row'],fullfile(cd,outfile)); % with filenames stored in column 1
(seems to me a lot of files had the same data inside , just the file names are different)
Jonas Freiheit
2021-9-1
Sorry this is not working for me, all the files are different but its only printing out 401, 0 for every column.
Its supposed to do (2,:) copy everything in the 2nd row from each excel file and then every row thats copied to copy over into a single excel file or a matrix on matlab. Since there are 24 excel files in this batch there are supposed to be 24 rows and 936 columns for the OUT matrix
Jonas Freiheit
2021-9-1
Since this is to be used for principal components analysis. I only am interested in the second row since its got the absorption of the spectra and the first row simply is the wavenumbers which is the same for all the excel files and is useless information
Mathieu NOE
2021-9-1
ok I believe I guess what you really want- which is not what I understood so far
so you want the 2nd column (and not the second row ! ) of each data file
then ok the output size will be 24 rows and 936 columns
as all input files have same dimensions, I could make the code simpler and use writematrix instead of writecell
here code :
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(fileDir,'*.csv')); % get list of files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order (https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
M= length (fileNames_sorted);
second_col= [];
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_col= [second_col raw(:,2)]; % extract the second column
end
% write all second columns lines into a matrix and store it in excel file
writematrix(second_col,fullfile(cd,outfile));
Jonas Freiheit
2021-9-1
Sorry I have another question, I need to create a { } cell which contains group names within it.
I need to create one thats a 5046x1 cell. containing 1682 entries saying Spot1 then 1682 entries saying Spot 2 then 1682 entries saying Spot 3. This is to group off my spectra, Do you want me to repost this so you can get more votes?
Thanks
Mathieu NOE
2021-9-1
Hi Jonas
it's not just a question of votes but yes indeed each question / topic should be addressed in a separate post
this way you can also get ore answers because it's not burried in the original post .
更多回答(1 个)
Ive J
2021-8-31
编辑:Ive J
2021-8-31
You can use readmatrix (assuming all values are numeric, otherwise use readtable) or fileDatastore to read those files. Something like this should work:
myfiles = ["file1.xlsx", "file2.xlsx"]; % file names: use dir to generate file names within the target directory
data = [];
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
13 个评论
Jonas Freiheit
2021-8-31
Hi Ive J, Thanks for that, I was wondering how can I generate all the file names into myfiles without writing all the files out?
Thanks
Ive J
2021-8-31
This would work:
targetDir = pwd; % only if files are within the current directory
filenames = string({dir(fullfile(targetDir, '*.xlsx')).name}); % converted to string to be used in my example above
Jonas Freiheit
2021-8-31
Hi Ive, should I plug filenames into myfiles as myfiles=[filenames]
Because doing that gives me only 'Output' from the code as the output?
Ive J
2021-8-31
filenames and myfiles are the same. You can use my example as:
targetDir = pwd; % only if files are within the current directory
myfiles = string({dir(fullfile(targetDir, '*.xlsx')).name});data = [];
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
% data is the matrix you're trying to generate from your excel files.
Ive J
2021-9-1
What's output? There is no variable called output in my snippet. I used some of your files and it works just fine:
targetDir = pwd; % only if files are within the current directory
myfiles = string({dir(fullfile(targetDir, '*.csv')).name});data = []; % you mentioned you have Excel files, which was wrong
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
myfiles =
"18136a AS ABS BC.CSV"
"18136c AS ABS BC.CSV"
"18136d Deep Ocean 3%.CSV"
"18136f AS ABS BC.CSV"
"18136g Deep Ocean 4.5%.CSV"
data =
401.1380 0
401.1380 0
401.1380 0
401.1380 0
401.1380 0
Jonas Freiheit
2021-9-1
Sorry Ive J, That works but I can't write each file out individually since its 1000 files.
Would you know how to make it work without manually inputting the file names?
Cheers
Ive J
2021-9-1
I didn't manually read files. The second line gets the names of all CSV files within the directory.
Jonas Freiheit
2021-9-1
Sorry, I mean its only printing out a 2x5 matrix with 401.1380 and 0 like shown in the data.
Its supposed to be for example a 25x936 matrix if there are 25 excel files containing infrared spectra with 2 rows and 936 columns each. The 1st row only contains the wavenumber data which needs to be ignored and the 2nd row contains the absorption which is different for every excel file and needs to be extracted and then combined into the final matrix.
I'm using this for principal components analysis.
Ive J
2021-9-1
It's 5X2 matrix because first, I only used 5 CSV files, and secondly, your sample CSV files contained only 2 columns, so it would be simply a 5X2 matrix.
In case you have 1000 files in the folder, myfiles would be a string array of 1000 CSV file names. If each of those CSV files have also 936 columns, then the resulting matrix would be of size 1000X936.
Note that this line
myfiles = string({dir(fullfile(targetDir, '*.csv')).name});
extracts all CSV files within the target directory (path to the folder your CSV files are in), so doesn't matter if there are 5, 1000 or even more CSV files there, myfiles would still contain all those files which then will be looped over to extract the values exist in 2nd line.
Jonas Freiheit
2021-9-1
Sorry there are 936 columns in each excel file it just doesn't work for some reason..
Ive J
2021-9-1
Please attach some of these 1000 files (with 936 columns) you're trying to work with.
Also, please be more specific with ...doesn't work for some reason.. What exact error do you get in command window when running my snippet?
Jonas Freiheit
2021-9-1
Sorry, the error was that it was printing out only 401 and 0. The problem has been solved now I really appreciate the help.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Spreadsheets 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!发生错误
由于页面发生更改,无法完成操作。请重新加载页面以查看其更新后的状态。
您也可以从以下列表中选择网站:
如何获得最佳网站性能
选择中国网站(中文或英文)以获得最佳网站性能。其他 MathWorks 国家/地区网站并未针对您所在位置的访问进行优化。
美洲
- América Latina (Español)
- Canada (English)
- United States (English)
欧洲
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
亚太
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)