Problem averaging cells of a cell array

Dear experts,
I have the following problem. I have a cell array with 4420 cells, and each cell holds 90 double values (one time series with 90 points). I need to take cells 1 to 85 (each cell represents one brain region) and average them with cells 2211 to 2295 (the same 85 regions of the same brain, but from a different run, so with different values). In other words, I need to average two time series of the same brain region from two different runs. After that I need to take cells 86 to 170 and average them with cells 2296 to 2380, and so on, until I reach the end of the 4420-cell array (I have 26 different brains for run 1 and the same 26 brains for run 2, so 26 x 85 x 2 = 4420). Below is the code that generates this cell array (AllResult).
%% Root path
pathroot = 'C:\Temporal_series';
%% First-level folders
MyExamDir = [30852 22061 20769 21734 21735 21977 20856 21976 20086 30697 30630 19993 30018 28832 19725 22440 28333 22439 22587 22586 21403 30944 21405 30943 22337 30948];
% Convert to strings: easier to treat as folder names.
MyStringDir = cellfun(@num2str, num2cell(MyExamDir), 'UniformOutput', false);
% Initialize the output, which will contain all the results.
% Each file is assumed to contain 90 values; each series is stored as a
% 1x90 row vector in its own cell.
AllResult = {};
%% Loop over every exam folder
for i = 1:length(MyExamDir)
    %% Get all "*gz.txt" files in the run1 folder
    CurrentDir = fullfile(pathroot, MyStringDir{i}, 'run1');
    AllFile = dir(fullfile(CurrentDir, '*gz.txt'));
    % Loop over each file
    for j = 1:size(AllFile, 1)
        % Current file
        CurrentFile = fullfile(CurrentDir, AllFile(j).name);
        % Try to open it (read-only is enough here)
        [fid, errormsg] = fopen(CurrentFile, 'r');
        if ~isempty(errormsg)
            warning('failed to open %s due to %s', CurrentFile, errormsg);
        else
            A1 = fscanf(fid, '%f %f', [90 1]);
            A1 = A1';
            AllResult{end+1,1} = A1;
            fclose(fid);
        end
    end
    %% Same operation for run2
    CurrentDir = fullfile(pathroot, MyStringDir{i}, 'run2');
    AllFile = dir(fullfile(CurrentDir, '*gz.txt'));
    % Loop over each file
    for j = 1:size(AllFile, 1)
        % Current file
        CurrentFile = fullfile(CurrentDir, AllFile(j).name);
        % Try to open it (read-only is enough here)
        [fid, errormsg] = fopen(CurrentFile, 'r');
        if ~isempty(errormsg)
            warning('failed to open %s due to %s', CurrentFile, errormsg);
        else
            A1 = fscanf(fid, '%f %f', [90 1]);
            A1 = A1';
            AllResult{end+1,1} = A1;
            fclose(fid);
        end
    end
end
Could someone help me? Thanks in advance for your attention.
Lorenzo

Answers (1)

There is no reason to store your data in a cell array in the first place if every cell contains a matrix of the same size.
Anyway, is this what you want?
regioncount = 85;
braincount = 26;
allbrainsregionsruns = cell2mat(AllResult); %convert cell to matrix, each row is a region
endrun1 = braincount * regioncount; %last row of run 1
brainsregionsrun1 = allbrainsregionsruns(1:endrun1, :); %all brains and regions of first run
brainsregionsrun2 = allbrainsregionsruns(endrun1+1:end, :); %all brains and regions of second run
brainregaverage = mean(cat(3, brainsregionsrun1, brainsregionsrun2), 3); %mean of both runs, taken along the third dimension
brainregaverage is an (85*26) x 90 matrix, where the first 85 rows are the two-run average of the first brain, the next 85 rows the two-run average of the second brain, etc.
You could then divide that into a cell array of brains with:
brainaverage = mat2cell(brainregaverage, ones(1, braincount) * regioncount, size(brainregaverage, 2));
Each cell of brainaverage is the 85 x 90 matrix of one brain.
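The same steps can be checked on a small synthetic array before running them on the real data. This is only a sketch: it assumes the layout described in the question (all run-1 rows first, then all run-2 rows, in the same brain and region order), and the toy sizes and the variable names `demo`, `avg` and `perbrain` are made up for illustration.

```matlab
% Toy sizes (hypothetical): 2 brains, 3 regions, 4 time points
regioncount = 3;
braincount  = 2;
npoints     = 4;

% Synthetic cell array: every run-1 series is all 1s, every run-2 series all 3s
nrows = braincount * regioncount;
demo  = [repmat({ones(1, npoints)}, nrows, 1); ...
         repmat({3 * ones(1, npoints)}, nrows, 1)];

allrows = cell2mat(demo);              % (2*nrows) x npoints matrix
run1 = allrows(1:nrows, :);            % first half: run 1
run2 = allrows(nrows+1:end, :);        % second half: run 2
avg  = mean(cat(3, run1, run2), 3);    % average along the third dimension

% One cell per brain, each regioncount x npoints
perbrain = mat2cell(avg, ones(1, braincount) * regioncount, npoints);
```

If the split and the averaging dimension are right, every entry of `avg` is the midpoint of the two runs and `perbrain` has one cell per brain.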

22 Comments

This is the error message, using your code:
Error using cat
Dimensions of matrices being concatenated are not consistent.
What are the sizes of the matrices then?
The workspace does not contain brainsregionsrun1 or brainsregionsrun2.
If I run only the first part of your code,
regioncount = 85;
braincount = 26;
allbrainsregionsruns = cell2mat(AllResult); %convert cell to matrix, each row is a region
I still get the same error message as before, so the error comes from cell2mat.
Any idea why?
When reporting an error, copy the entire error message, including the part that shows the line on which it occurs.
Since you didn't do that, I assumed the error was on the last line, the one with cat, rather than the one with cell2mat.
Anyway, if cell2mat does not work, it's because AllResult is not as you have stated. "in every cell I have one time series with 90 points" is not the case.
What are the values of sizeelem and diffcell when you do the following:
sizeelem = size(AllResult{1})
diffcell = find(cellfun(@(e) ~isequal(sizeelem, size(e)), AllResult))
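A variation on the same check, sketched on made-up data, compares every cell against the most common length rather than against the first cell, which also works when the first cell happens to be the odd one out. The names `demo`, `lens`, `expected` and `badcells` are hypothetical.

```matlab
% Hypothetical toy data: cell 3 is one sample short
demo = {ones(1, 5); ones(1, 5); ones(1, 4); ones(1, 5)};

lens     = cellfun(@numel, demo);     % number of values in each cell
expected = mode(lens);                % most common length
badcells = find(lens ~= expected);    % indices of the cells that differ
```

Here `badcells` points at the short cell only; on the real array the same three lines would be applied to AllResult.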
Sorry, you are right, I was a little confused.
The cell array AllResult has 4420 cells, and each one holds 90 double values as a row vector (I can't explain it any better).
sizeelem = [1 90]
diffcell = < 85x1 double >
Have a look at the cells whose indices are in diffcell. These are all cells that do not have 90 double values.
Since there are 85 of them, which is your region count, it looks like one of your brain runs isn't right.
OK, I found the error. You're right once again. There is one brain that has 85 cells, but each of those cells holds only 89 double values.
I must fix this, and then maybe the code will work.
I'll let you know. Thank you for your great help.
But, just in case, is there another way to do what you described if these particular cells have only 89 double values?
I suspect I can't recover the missing values (they come from a long analysis that I can't repeat).
Since only one of the runs is missing a value, how do you expect to average it with the 90 values of the other run?
The simplest thing would be for you to work out the position of the missing value and insert a NaN there:
missingcolumn = ???; %position of the missing value
for c = diffcell' %diffcell is a column vector, so transpose it to iterate over its elements
    AllResult{c} = [AllResult{c}(1:missingcolumn-1) NaN AllResult{c}(missingcolumn:end)];
end
OK, thank you, I'll try this solution.
Hi Guillaume,
I managed to get the missing data. I used your code as you wrote it.
The entire error message is:
??? Error using ==> cat
CAT arguments dimensions are not consistent.
Error in ==> cell2mat at 89
m{n} = cat(1,c{:,n});
Error in ==> load_def_Orione at 74
allbrainsregionsruns = cell2mat(AllResult); %convert cell to matrix, each row is a region
Do you know why?
There are still one or more cells in your array that are not the same size as the others. So, again:
sizeelem = size(AllResult{1})
diffcell = find(cellfun(@(e) ~isequal(sizeelem, size(e)), AllResult))
will tell you which one(s).
Note that my code is fairly simple. It converts your cell array into a matrix (which is why all cells need to be the same size), splits that matrix in two (one half for each run), rejoins the halves along the third dimension, then calculates the average along that third dimension.
I understand your code, but now my problem has changed again (sorry).
I have 26 exams (or brains), each with 2 runs; every run has 85 regions, and every region has 90 double values. One of these 26 brains has 85 regions, but every region has only 89 values. I want to average the two runs of each brain, and for this brain both runs have 89 values per region, so in theory there is no problem averaging them: within a brain, the two runs have the same number of regions and the same number of values (89 for this one, 90 for the other 25 brains). Do you understand?
In the end I could not obtain the 90th value (I wanted to, but I couldn't), so instead I removed the corresponding value from the other run.
So now, can I still use your code if one brain has 89 values per region in both runs? I think I can't put it in the same matrix as the other brains. Any ideas?
Thank you again for your great help.
The problem is that your storage structure doesn't reflect your data structure. You would have been better off with a cell array of brains, each of these cells holding a cell array of regions.
I think the simplest thing to do now is to add a dummy value to your shorter sequences so they're all the same size:
shorterbrain = ? %index of shorter brain
regioncount = 85;
braincount = 26;
startrun1 = (shorterbrain-1) * regioncount + 1; %first row of the shorter brain in run 1
startrun2 = (shorterbrain+braincount-1) * regioncount + 1; %first row of the shorter brain in run 2
for row = [startrun1:startrun1+regioncount-1 startrun2:startrun2+regioncount-1]
    AllResult{row} = [AllResult{row} NaN]; %pad the 89-value series to 90 values
end
You can then apply my initial algorithm.
The alternative is to use loops.
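A loop-based version could look like the sketch below. It is not the code from the answer above, just one possible reading of "use loops": each run-1 cell is averaged with its run-2 partner directly, so brains with different series lengths never have to share a matrix. It assumes the layout described in the question (all run-1 cells first, then all run-2 cells); the toy data and the names `demo`, `offset` and `averaged` are made up.

```matlab
% Toy data (hypothetical): 2 brains x 2 regions; brain 2 has 3 points, not 4
regioncount = 2;
braincount  = 2;
run1 = {[1 2 3 4]; [2 2 2 2]; [1 1 1]; [0 0 0]};
run2 = {[3 4 5 6]; [4 4 4 4]; [3 3 3]; [2 2 2]};
demo = [run1; run2];                    % all run-1 cells, then all run-2 cells

offset   = braincount * regioncount;    % distance between paired cells
averaged = cell(offset, 1);
for k = 1:offset
    % Both runs of a given region must have the same length
    assert(numel(demo{k}) == numel(demo{k + offset}), ...
        'Runs differ in length at row %d', k);
    averaged{k} = (demo{k} + demo{k + offset}) / 2;
end
```

With the real data, `demo` would be AllResult and `offset` would be 26 * 85 = 2210. The result stays a cell array, so no padding with NaN is needed.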
Ok, your idea is good.
The problem is that I can't add a dummy value (NaN) to my data, because after averaging my runs (remember that my 90 values are time series) I will do a correlation analysis between all pairs of time series. So I can't use a NaN, because I don't know what I would get that way.
(What should I expect from a Pearson correlation between two time series if one of them contains a NaN?)
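The effect of a NaN on a Pearson correlation is easy to try on a toy pair of series. In the sketch below (made-up data and variable names), `corrcoef` is called once as is and once with its 'rows', 'complete' option, which drops the time points where either series is NaN before correlating.

```matlab
x = [1 2 3 4 5]';
y = [2 4 NaN 8 10]';

r_all = corrcoef(x, y);                      % the NaN propagates into the coefficient
r_ok  = corrcoef(x, y, 'rows', 'complete');  % time points containing NaN are skipped
```

The off-diagonal entry of `r_all` is NaN, while `r_ok` is computed from the four complete pairs only. Whether dropping a time point is acceptable for the analysis is a separate question that this sketch does not settle.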
At the end of my original post, I convert the matrix of averages back to a cell array where each cell is a brain. At that point you just remove the extra element from the relevant brain:
brainaverage = mat2cell(brainregaverage, ones(1, braincount) * regioncount, size(brainregaverage, 2));
brainaverage{shorterbrain} = brainaverage{shorterbrain}(:, 1:end-1);
Hi Guillaume,
I have finally obtained my cell array with 25 cells, each containing an 85x90 matrix.
But I excluded the exam with only 89 values per time series.
Do you know if I can add this single matrix to my cell array? Can I have cells of different dimensions (25 cells with 85x90 matrices and only one with an 85x89 matrix) in the same cell array?
The whole purpose of cell arrays is to be able to store matrices of different sizes, so yes it will work. That's just what my last answer did. Why didn't you try it?
Sorry, I thought I had not sent the message. I did try it. But, unfortunately, I found a strange problem:
In my first post I gave the code that I used to read the data from my files. I found that only the first time series of each run is correct (the first of 85), while the other 84 time series are not the same as in my files. I don't understand why.
So now in my array AllResults (4420x90), only the 1st, 86th, 171st, etc. rows are correct; the others don't correspond to my original data. I must fix this error.
+1 to Guillaume for pointing out that "The problem is that your storage structure doesn't reflect your data structure. You would have been better off with a cell array of brains, each of these cell arrays a cell array of regions."
Good design of the data structures goes a long way to helping solve many data-processing problems...
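As a rough illustration of that point, the sketch below uses a hypothetical nested layout, `data{brain}{run}`, where each innermost entry is a regions x time-points matrix. This is a variation on the layout suggested above, not the original poster's code, and all sizes and names are made up.

```matlab
% Hypothetical layout: data{brain}{run} is a regioncount x npoints matrix
braincount  = 2;
regioncount = 3;
npoints     = [4 3];                 % brain 2 has a shorter time series
data = cell(braincount, 1);
for b = 1:braincount
    data{b} = {ones(regioncount, npoints(b)), 3 * ones(regioncount, npoints(b))};
end

% Averaging the two runs is then one expression per brain
brainaverage = cellfun(@(runs) (runs{1} + runs{2}) / 2, data, ...
                       'UniformOutput', false);
```

Because each brain keeps its own matrices, a brain with 89 points instead of 90 needs no padding and no index arithmetic to find its partner run.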
Hello Lorenzo,
This is not how this forum works. You ask a question, and when the question is answered, you accept the answer so that the answerer gets reputation points.
If you then have another question, as is the case here, you start a new question. It gives the new question more visibility and gives other people a chance to answer and get the reputation points.


Asked: 2014-11-21

Commented: 2014-12-5
