Count the Number of Times a Specific String Occurs in a given Column

Question

Midimistro 2016-12-23

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/318028-count-the-number-of-times-a-specific-string-occurs-in-a-given-column

Commented: Midimistro 2017-1-5

I have the following code, which is meant to find the best time offset between TVATime and TVBTime. Each time stamp is part of a "block" that must overlap at some point with a "block" from the other vector, with the blocks in the correct order. Basically, it works like a comb through hair. For example, TVBTime is made up of two subvectors, TVBTimeStart and TVBTimeEnd [not shown], which together define each block. Each TVBTime block also has a string identifier, either "IDA" or "IDB", and IDA greatly outnumbers IDB (about 1 IDB for every 330 IDA).

The current problem is that tmpVecD, the stuck-in-the-middle case, can produce a column in which every string is "IDA". Given the nature of the data, that result is impossible, so it is a false offset. The correct offset must instead have as many "IDB" strings as possible, although IDA strings may still outnumber IDB strings. Is there a better way to write this code, or a way to fix the existing code below, specifically the line containing numel(toothType(:,toothColumn)=='IDA')/TVAsize==1?

TVAsize=numel(TVATime);
TVBsize=numel(TVBTime);
Offsets=UserDefinedStart:.001:UserDefinedEnd;
for b = Offsets
        tic;
        TestVectorA = TVAStartTime+b;   %Start (typical value is TVATime+.003[user defined: missFixer/2])
        TestVectorB = TVAFinishTime+b;  %Finish(typical value is TVATime-.003[user defined: missFixer/2])
        TestVectorC = TVATime+b;        %Center
        for i = 1:TVBsize
            tmpVecA=logical.empty;       %Memory Management
            tmpVecB=logical.empty;
            tmpVecC=logical.empty;
            tmpVecD=logical.empty;
            tmpTypeVec=strings;
            tmpVecA=(TestVectorA >= TVBTimeStart(i)) & (TestVectorA <= TVBTimeEnd(i));%StartVector
            tmpVecB=(TestVectorB >= TVBTimeStart(i)) & (TestVectorB <= TVBTimeEnd(i));%FinishVector
            tmpVecC=(TestVectorC >= TVBTimeStart(i)) & (TestVectorC <= TVBTimeEnd(i));%CenterVector
            tmpVecD=(TestVectorA <= TVBTimeStart(i)) & (TestVectorB >= TVBTimeEnd(i)) & (round(abs(TestVectorA-TestVectorC),6)==round(missFixer/2,6));%Stuck-in-Middle Case 1
            tmpTypeVec=transmissionType(i); 
            tmpVec=(tmpVecA|tmpVecB|tmpVecC|tmpVecD);
            if (  any(find(tmpVec==1))  )
            %see if a value of TVA falls within TVB and its life (or vice versa)
            %if it does:
            %1) add a tick to be used as a percentage later
            %2) add the corresponding TVB identifier (either "IDA" or "IDB")
                toothCount(tmpVec==1, toothColumn) = toothCount(tmpVec == 1, toothColumn) + 1;
                toothType(tmpVec==1,toothColumn)=toothType(tmpVec==1,toothColumn)+tmpTypeVec;
            end
        end
        %The following line is where the code fails
        %(always seems to go to this if, even if the column is not 100% filled with "IDA"):
        if(numel(toothType(:,toothColumn)=='IDA')/TVAsize==1)
            %"IDA" count must be less than 100% in any given column.
            %If it is equal to 100%, do the following:
            disp(['Bad Match']);
            quality(toothColumn)='bad';
            %Reset toothCount for that column to 0 since it is providing an impossible match.
        else
            quality(toothColumn)='good';
        end;
        tick=(numel(find(toothCount(:,toothColumn)~=0)));
        disp(['ticks = ', num2str(tick)]);
        tickcount(end+1) = tick;
        %tickcount = [tickcount; tick];
        percentCalc = tick/CPsize*100.0;
        %calculate the percentage
        disp(['percent = ', num2str(percentCalc)]);
        offset(end+1)= b; %appends the current offset "b" to the growing vector "offset" for later comparison
        %offset = [offset; b] % Legacy version, column vector format
        percent(end+1)= percentCalc; %does same thing as previous line.
        %percent = [percent; percentCalc] %Legacy version, column vector format
        percentCalc = 0; %reset percentCalc
        disp(['percent reset = ', num2str(percentCalc)]);        
        toothColumn=toothColumn+1;
        toc;
    end
  %Find the column with the maximum "IDB" count
  IDBPerCol=sum((toothType=='IDB'),1);
  maxIDBIndex=find(max(IDBPerCol));
  %show the value closest to true offset
  [bestPercent] = percent(maxIDBIndex);
  bestOffset = offset(maxIDBIndex);
  bestTick = tickcount(maxIDBIndex);

Here's a small sample data set:

TVA=[1.002; 1.017; 32.006; 32.027; 33.100; 60.003; 60.028; 60.051]; %significantly different size than TVB
TVBStart=1:.0157:75;
TVBEnd=1.000256:.0157:75.000256; %Same size as TVBStart
TVBID=???; %Can be randomly generated; Must be where IDA is the primary, and IDB is sporadic; same size as TVBStart and TVBEnd;
missFixer=.006; %not included is the code that divides missFixer by 2 and uses it to create TVAStartTime and TVAEndTime
UserDefinedStart=-10; %or user defined value, works in seconds
UserDefinedEnd=10; %or user defined value, works in seconds
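
TVBID is left unspecified above. As a rough sketch only (not from the original post), one way to generate it with IDA dominant and IDB sporadic is shown below. The 1-in-330 rate comes from the description earlier in the question; the variable isIDB and the use of a cell array of char vectors (rather than a string array) are assumptions made for illustration.

```matlab
% Hypothetical generator for TVBID; isIDB and the cell-array layout are illustrative.
TVBStart = 1:.0157:75;              % same definition as in the sample above
n = numel(TVBStart);
isIDB = rand(1, n) < 1/330;         % true for roughly 1 in 330 entries
TVBID = repmat({'IDA'}, 1, n);      % start with every block labelled IDA
TVBID(isIDB) = {'IDB'};             % overwrite the sporadic entries with IDB
fprintf('%d IDA, %d IDB\n', sum(strcmp(TVBID, 'IDA')), sum(strcmp(TVBID, 'IDB')));
```

Because the draw is random, the exact number of IDB entries changes from run to run.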

Let me know if you need any additional details or have questions. Without tmpVecA, tmpVecB, tmpVecD (which implement the "block" ability for TVATime) and without the string comparison implementation, the code runs fine and returns the expected offsets (let me know if you need the code for this). The problem is when I give TVATime a width like TVB, but I need this width for TVA for closer investigation of the data.
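
One possible alternative to the four tmpVec cases, offered as a sketch rather than a drop-in replacement: two closed blocks [aStart, aEnd] and [bStart, bEnd] overlap exactly when aStart <= bEnd and aEnd >= bStart. That single test covers the start-inside, finish-inside, center-inside, and stuck-in-the-middle cases, but it ignores the rounding condition in tmpVecD. The names aStart, aEnd, overlap, and hit are hypothetical, the offset b = 0 is chosen only for illustration, and the half-width missFixer/2 is assumed from the description of TVAStartTime and TVAEndTime.

```matlab
% Sample values from the question; the half-width missFixer/2 is an assumption.
TVA = [1.002; 1.017; 32.006; 32.027; 33.100; 60.003; 60.028; 60.051];
missFixer = .006;
TVBStart = 1:.0157:75;
TVBEnd   = 1.000256:.0157:75.000256;
b = 0;                                % one candidate offset, for illustration only

aStart = TVA - missFixer/2 + b;       % start of each TVA block (column vector)
aEnd   = TVA + missFixer/2 + b;       % end of each TVA block

% overlap(r, c) is true when TVA block r overlaps TVB block c.
% bsxfun expands the column and row vectors against each other.
overlap = bsxfun(@le, aStart, TVBEnd) & bsxfun(@ge, aEnd, TVBStart);

hit  = any(overlap, 2);               % TVA blocks that touch at least one TVB block
tick = nnz(hit);                      % number of TVA blocks with a match
disp(['ticks = ', num2str(tick)]);
```

This replaces the inner loop over TVB with one matrix comparison per offset, at the cost of building a numel(TVA)-by-numel(TVBStart) logical matrix.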

The output needs to be the best offset and best percent. The best offset is the one where the "IDB" string count in a given column of toothType is at its highest and tickcount is at its highest. In other words, all elements in TVA have a match in TVB (maximum tick count), and that same column in toothType has the highest possible count of "IDB" strings.
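
A minimal sketch of that selection rule, under one assumed reading of it: first keep the columns with the highest tick count, then pick the one with the most "IDB" strings among them. The numbers are made up for illustration, and candidates, k, and bestCol are hypothetical names that do not appear in the code above.

```matlab
% Illustrative values only (not from the thread): one entry per tested offset.
offset    = [-0.002 -0.001 0 0.001 0.002];
tickcount = [6 8 8 8 5];
IDBPerCol = [1 0 2 1 3];

candidates = find(tickcount == max(tickcount)); % columns with the highest tick count
[~, k]     = max(IDBPerCol(candidates));        % most IDB among those columns
bestCol    = candidates(k);                     % column 3 for these values
bestOffset = offset(bestCol);
bestTick   = tickcount(bestCol);
```

If the IDB count should take priority over the tick count, the two steps would be swapped.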

I'll understand if this is extremely hard for anyone to grasp.

2 Comments

John BG 2016-12-23

And TVBTime, or a sample of it, was not supplied because ...?

Midimistro 2017-1-5

Edited: Midimistro 2017-1-5

because TVBTime is the struct that consists of TVBStart and TVBEnd. As mentioned before, TVATime and TVBTime are nothing more than "blocks" that consist of 2 arrays of the same size that contain a starting time and an end time for the respective test vector.

Answer 1

Greg 2016-12-24

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/318028-count-the-number-of-times-a-specific-string-occurs-in-a-given-column#answer_248358

Edited: Greg 2016-12-24

Replace "numel" with "sum" (or "nnz" if you like...)

numel(toothType(:,toothColumn)=='IDA') --> sum(toothType(:,toothColumn)=='IDA')

Also, I recommend using strcmp instead of ==, but that's not part of the original question.
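
To illustrate why the swap matters, here is a small sketch on a made-up column. numel counts every element of the logical comparison result, true or false, so it always equals the column height, while sum and nnz count only the true entries. The column contents and the variable names are assumptions for the example, and strcmp is used on a cell array of char vectors (it accepts string arrays as well).

```matlab
% A made-up column with three 'IDA' entries and one 'IDB' entry.
col   = {'IDA'; 'IDB'; 'IDA'; 'IDA'};
isIDA = strcmp(col, 'IDA');   % logical column: [1; 0; 1; 1]

nAll  = numel(isIDA);         % 4: counts every element, matches or not
nIDA  = sum(isIDA);           % 3: counts only the true entries
nIDA2 = nnz(isIDA);           % 3: same count via nnz
```

With numel, the ratio to the column height is always 1, which is why the original if branch was always taken.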

2 Comments

Greg 2016-12-24

I further recommend comparing the two sizes directly, rather than comparing their quotient to 1. I.e., sum(...) == TVAsize
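
A short sketch of the direct comparison, again on a made-up column. It assumes TVAsize equals the column height, and also shows all(...) as an equivalent test that needs no size variable. The names allIDAByCount and allIDAByAll are hypothetical.

```matlab
% Made-up column; TVAsize is assumed to equal the column height.
col     = {'IDA'; 'IDA'; 'IDB'};
TVAsize = numel(col);
isIDA   = strcmp(col, 'IDA');

allIDAByCount = (sum(isIDA) == TVAsize);  % false here: only 2 of 3 entries match
allIDAByAll   = all(isIDA);               % same answer without needing TVAsize
```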

Midimistro 2017-1-5

Your answer is correct; however, both of us missed the following additional correction:

Original:

maxIDBIndex=find(max(IDBPerCol));

Fix:

maxIDBIndex=(max(IDBPerCol)==IDBPerCol);

The original did not return the index I needed: max(IDBPerCol) is a single value, and find applied to a nonzero scalar just returns 1. The fix produces a logical mask marking the locations where the max value occurs. Now the code works flawlessly. Thank you! I didn't expect anyone to solve, or even understand, half of what I was trying to accomplish, so I give you credit for that :)
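
For reference, a small sketch comparing the original line, the mask-based fix, and the second output of max, which is another common way to get a position. The counts are made up, and wrongIdx, mask, allIdx, and firstIdx are hypothetical names.

```matlab
IDBPerCol = [0 2 5 1 5];                   % made-up IDB counts, one per offset column

wrongIdx = find(max(IDBPerCol));           % max(...) is the scalar 5, so find returns 1
mask     = (max(IDBPerCol) == IDBPerCol);  % logical mask: [0 0 1 0 1]
allIdx   = find(mask);                     % [3 5]: every column tied for the maximum
[~, firstIdx] = max(IDBPerCol);            % 3: the first column holding the maximum
```

When several columns tie, indexing with the mask (for example offset(mask)) returns more than one value, whereas the second output of max returns only the first position.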
