Run this code for every certain columns of tables in a cell and saving the result in new cell

Question

BN 2020-2-5

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/503801-run-this-code-for-every-certain-columns-of-tables-in-a-cell-and-saving-the-result-in-new-cell

评论： BN 2020-2-10

C.mat

Hi all,

I have a cell named C. In the C I have 71 tables. Each table has a column named rrr24. I want to run this code below for each rrr24 in the tables and save the results in another cell named C_new.

Here this code which in the first line just uses the first table in the C.

I want the name of each table at the new cell (C_new) to be the name of each table highlighted in the station_name column of each table. Or copy station_name column from each table and paste it as a first column of the same table in the C_new.

I attached C.

Thank you

td = C{1, 1}.rrr24; % example for C{1, 1}
n=length(td);
sc_set= [3,6,12,24]; 
SI=zeros(n,length(sc_set)); 
for q = 1:length(sc_set)
  sc = sc_set(k);
  if length(td(td>=0))/length(td)~=1
      SI(n,q)=nan;
  else
      SI(1:sc-1,q)=nan;
      A1=[];
      for i=1:sc,
          A1=[A1,td(i:length(td)-sc+i)];
      end
      Y=sum(A1,2);
      nn=length(Y);
      SI1=zeros(nn,1);
      for k=1:12
          d=Y(k:12:nn);
          nnn=length(d);
          bp=zeros(nnn,1);
          for i=1:nnn
              bp(i,1)=sum(d(:,1)<=d(i,1));
          end
          y=(bp-0.44)./(nnn+0.12);
          SI1(k:12:nn,1)=y;
      end
      SI1(:,1)=norminv(SI1(:,1));
      %output
      SI(sc:end,q)=SI1;
  end
end

2 个评论
显示无隐藏无

Guillaume 2020-2-5

I'm sure I've answered a question of yours before, where I advised that having several tables is not a good idea. Most likely, it would be simpler to do whatever you want if you all your data in just one table.

Indeed, here it is: Counting all NaNs..., where the 1st thing I did is merge back all these tables to make the processing much easier.

And I see that Stephen gave you the same advice more recently.

With regards to the code you've posted above, I'm sorry it's unreadable. Good code should have comments and variable names than describe the content of the variable.

BN 2020-2-5

编辑：BN 2020-2-5

C.mat

Dear Guillaume,

You are truly right, I'm sorry I should mention about:

C = vertcat(C{:});

When I have one integrated table it's mean I have one column of stations_name and corresponding rrr24, So then I should use the same rrr24 of stations_name in the code. I'm sorry for my ignorant. Now I have a merged table named C, I want to recognize the rrr24 column of each station_name and use it as an input to this code below, and after that save separated results in a cell.

This code gains SPI in 3, 6, 12, and 24 months intervals which is a Drought Index,

About the output, I want the output to get me for every 71 stations: station_name column and calculated SPI for 3, 6,12, 24, and 48 columns.

If I describe it more clear:

I want to use each rrr24 of station_name as an input to this code against td. and save results for each station_name.

I use this code from Matlab file exchange here is the commented code, sorry this process is complicated for me and I don't know how to handle them. maybe I should use groupsummaty again but haven't any idea about how to do it.

% Source code: https://www.mathworks.com/matlabcentral/fileexchange/51080-nonparametric-standardized-precipitation-index-spi
% this code calculates SPI 1, 3, 6, 12, 24, 48 months for a given station.
%if I had splited C
td = Abadan.rrr24; % get rrr24 of first station_name I want do it for all station_name's rrr24
Date = Abadan.date % date is constant for all station_name
sc_vect=[1 3 6 12 24 48]; % a set including all values for iterations
% sc: scale of the index (>1, e.g., 3-month SPI or SSI)
n=length(td);
SI=zeros(n,numel(sc_vect)); % columns correspond to values in sc_vect
% iterate over sc_vect 
for p=1:numel(sc_vect)
    sc = sc_vect(p);
    
    %For some grid, no observation exist.
    if length(td(td>=0))/length(td)~=1
        SI(n,p)=nan;
    else
        % Obtain the prcp and smc for the specified time scale and
        % compute the standarized drought index (for SPI and SSI)
        SI(1:sc-1,p)=nan;
        
        A1=[];
        for i=1:sc,
            A1=[A1,td(i:length(td)-sc+i)];
        end
        Y=sum(A1,2);
        
        % Compute the SPI or SSI
        
        nn=length(Y);
        SI1=zeros(nn,1);
        
        for k=1:12
      
            d=Y(k:12:nn);
            %compute the empirical probability
            nnn=length(d);
            bp=zeros(nnn,1);
            
            for i=1:nnn
                bp(i,1)=sum(d(:,1)<=d(i,1));
            end
            
            y=(bp-0.44)./(nnn+0.12);
            
            SI1(k:12:nn,1)=y;
        end
        
        SI1(:,1)=norminv(SI1(:,1));
        %output
        SI(sc:end,p)=SI1;
    end
end
SI_table = array2table(SI);
SI_table.Properties.VariableNames = {'SPI_1month' 'SPI_3month' 'SPI_6month'...
    'SPI_12month' 'SPI_24month' 'SPI_48month'};

Thank you so much

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

fred ssemwogerere 2020-2-5

1
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/503801-run-this-code-for-every-certain-columns-of-tables-in-a-cell-and-saving-the-result-in-new-cell#answer_413974

Hello, I have taken a look at your workspace variable ("C"), and i think you should consider writing that program as a function file, so that you can use it for all tables in the cell array. You can use the code below and save it as your function file in the working (current) folder.

function SI = myfunction(Cn,rnum,colnum,tcol) % save this as a functionfile with filename: "myfunction.m"
%Summary of "myfunction"
% SI = output
% Cn = Cell array storing all tables
% rnum = row index of table in cell array
% colnum = column index of table in cell array
% tcol = column number of table variable ("rrr24") on which you are doing the computations
td = table2array(Cn{rnum, colnum}(:,tcol)); % example for C{1, 1}
n=numel(td);
sc_set= [3,6,12,24]; 
SI=zeros(n,length(sc_set)); 
for q = 1:length(sc_set)
  sc = sc_set(q); % I have replaced "k" with "q" on this line
  if numel(td(td>=0))/numel(td)~=1
      SI(n,q)=nan;
  else
      SI(1:sc-1,q)=nan;
      A1=[];
      for i=1:sc
          A1=[A1,td(i:numel(td)-sc+i)]; %#ok<AGROW>
      end
      Y=sum(A1,2);
      nn=numel(Y);
      SI1=zeros(nn,1);
      for k=1:12
          d=Y(k:12:nn);
          nnn=numel(d);
          bp=zeros(nnn,1);
          for i=1:nnn
              bp(i,1)=sum(d(:,1)<=d(i,1));
          end
          y=(bp-0.44)./(nnn+0.12);
          SI1(k:12:nn,1)=y;
      end
      SI1(:,1)=norminv(SI1(:,1));
      %output
      SI(sc:end,q)=SI1;
  end
end
end

After that, consider using the code below. It should do nicely:

Cnew=cell(1,size(C,2)); % pre-allocating your new array
for r=1:size(C,1)
    for m=1:size(C,2)
        SI{r,m}= myfunction(C,r,m,11);% On this line, "11" is the column index of "rrr24" (your table variable)
        Cnew{r,m}=cell2table(C{r,m}.station_name); % input names under the "station_name" variable of your old tables into the new cell array based on their indexed locations
        Cnew{r,m}.Output=SI{r,m}; % Add column variable, "Output" holding computed values to each table in the new cell array based on their indexed positions
        Cnew{r,m}.Properties.VariableNames{1}='Station Name'; % variable name for first column of each table
    end
end

3 个评论
显示 1更早的评论隐藏 1更早的评论

fred ssemwogerere 2020-2-5

Hello, something like this for example, should do nicely;

varNames={'Station_Name','SPI_3month','SPI_6month','SPI_12month','SPI_24month'};
for k=1:size(Cnew,1)
    for y=1:size(Cnew,2)
        CEdit{k,y} =[(Cnew{k,y}.Station_Name),array2table(Cnew{k,y}.Output)]; %#ok<*SAGROW>
        CEdit{k,y}.Properties.VariableNames=varNames;
    end
end

BN 2020-2-10

Dear fred ssemwogerere,

Hello and Respect

Thank you for your answer. This code worked perfectly and accurately. Today I want to run this code using new input (same as C but with little change in values and number of rows and columns, named Cgrid). I tried to adapt this code (just code, not function) and use it for my new input. Everything is good, the code even runs successfully, But I saw that in the output all tables have similar results, all is 2.02! At the first, I think this is my data issue so I calculate for one table manually and saw there is a difference between manually results and code output. I check everything I cant recognize what is the problem everything seems similar.

Again I want to tell that this code works very well with C. My problem is when I want to use Cgrid as input against C.

As I know this is a new question I created a new one here and wrote edited code based on the name of Cgrid. I would be so grateful if you can look at it and maybe help my problem because I am so sad about it and check everything I knew.

https://www.mathworks.com/matlabcentral/answers/504645-a-problem-with-using-generalized-code

Thank you so much.

请先登录，再进行评论。

Answer 2

Guillaume 2020-2-5

2
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/503801-run-this-code-for-every-certain-columns-of-tables-in-a-cell-and-saving-the-result-in-new-cell#answer_413987

Whatever the calculation is, yes it can be done with groupsummary. As Fred wrote, you will first have to write a function to perform that calculation. However, this function should take as input: the precipitation for a station and that index scale, something like:

function spi = NonParametricStandardPrecipitationIndex(precip, indexscale)
    %???
end

Then it's a straighforward groupsummary

results = groupsummary(bigtable, 'GroupingVariables', 'station_name', ...
    {@(p) NonParametricStandardPrecipitationIndex(p, 1), ...
     @(p) NonParametricStandardPrecipitationIndex(p, 3), ...
      ..etc}, 'rrr24');

Or you can make it neater by building the list of processing function for your desired scale indices beforehand:

  npspi = arrayfun(@(sc) @(p) NonParametricStandardPrecipitationIndex(p, sc), [1 3 6 12 24 48], 'UniformOutput', false);
  results = groupsummary(bigtable, 'GroupingVariables', 'station_name', npspi, 'rrr24');

Now with regards to the code for that function, first are you sure that the code in that fileexchange entry does what it's supposed to? Having looked at it a bit, it's very badly written! For example I was puzzling over

if length(td(td>=0))/length(td)~=1

which is simply

if any(td < 0)

why such a convoluted expression? Why write if A/B ~= 1 when if A ~= B is a lot clearer. Why calculate length(A(A>=0)) when sum(A>=0) does the same faster?

You may find that starting from scratch may be more efficient than adapting poorly written code.