Count the Number of Times a Specific String Occurs in a given Column
I have the following code that is meant to find the best time offset between TVATime and TVBTime. Each time stamp is part of a "block" that must overlap at some point with another "block", with the blocks in the correct order. Basically, it acts like a comb in hair. For example, TVBTime creates two subvectors, TVBTimeStart and TVBTimeEnd [not shown], that make up a block. Each TVBTime block also has a string identifier, either "IDA" or "IDB", where IDA greatly outnumbers IDB [about 1 IDB for every 330 IDA].
The current problem is that tmpVecD, the stuck-in-the-middle case, can produce a column in which every matched string is "IDA". Given the nature of the data, that outcome is impossible, so it indicates a false offset. The correct offset should instead have as many "IDB" strings as possible, although IDA strings may still outnumber IDB strings. Is there a better way to write this code, or a way to fix the existing code below, specifically the numel(toothType(:,toothColumn)=='Transmission')/CPsize==1) line?
TVAsize=numel(TVATime);
TVBsize=numel(TVBTime);
Offsets=UserDefinedStart:.001:UserDefinedEnd;
for b = Offsets
tic;
TestVectorA = TVAStartTime+b; %Start (typical value is TVATime+.003[user defined: missfixer/2])
TestVectorB = TVAFinishTime+b; %Finish(typical value is TVATime-.003[user defined: missfixer/2])
TestVectorC = TVATime+b; %Center
for i = 1:TVBsize
tmpVecA=logical.empty; %Memory Management
tmpVecB=logical.empty;
tmpVecC=logical.empty;
tmpVecD=logical.empty;
tmpTypeVec=strings;
tmpVecA=(TestVectorA >= TVBTimeStart(i)) & (TestVectorA <= TVBTimeEnd(i));%StartVector
tmpVecB=(TestVectorB >= TVBTimeStart(i)) & (TestVectorB <= TVBTimeEnd(i));%FinishVector
tmpVecC=(TestVectorC >= TVBTimeStart(i)) & (TestVectorC <= TVBTimeEnd(i));%CenterVector
tmpVecD=(TestVectorA <= TVBTimeStart(i)) & (TestVectorB >= TVBTimeEnd(i)) & (round(abs(TestVectorA-TestVectorC),6)==round(missFixer/2,6));%Stuck-in-Middle Case 1
tmpTypeVec=transmissionType(i);
tmpVec=(tmpVecA|tmpVecB|tmpVecC|tmpVecD);
if ( any(find(tmpVec==1)) )
%see if a value of TVA falls within TVB and its life (or vice versa)
%if it does:
%1) add a tick to be used as a percentage later
%2) add the corresponding TVB identifier (either "IDA" or "IDB")
toothCount(tmpVec==1, toothColumn) = toothCount(tmpVec == 1, toothColumn) + 1;
toothType(tmpVec==1,toothColumn)=toothType(tmpVec==1,toothColumn)+tmpTypeVec;
end
end
%The following line is where the code fails
%(always seems to go to this if, even if the column not 100% filled with "IDA"):
if(numel(toothType(:,toothColumn)=='IDA')/TVAsize==1)
%"IDA" count must be less than 100% in any given column.
%If it is equal to 100%, do the following:
disp(['Bad Match']);
quality(toothColumn)='bad';
%Reset toothCount for that column to 0 since it is providing an impossible match.
else
quality(toothColumn)='good';
end;
tick=(numel(find(toothCount(:,toothColumn)~=0)));
disp(['ticks = ', num2str(tick)]);
tickcount(end+1) = tick;
%tickcount = [tickcount; tick];
percentCalc = tick/CPsize*100.0;
%calculate the percentage
disp(['percent = ', num2str(percentCalc)]);
offset(end+1)= b; %appends the current offset "b" to the growing vector "offset" for later comparison
%offset = [offset; b] % Legacy version, column vector format
percent(end+1)= percentCalc; %does same thing as previous line.
%percent = [percent; percentCalc] %Legacy version, column vector format
percentCalc = 0; %reset percentCalc
disp(['percent reset = ', num2str(percentCalc)]);
toothColumn=toothColumn+1;
toc;
end
%Find max Column with maximum "IDB" Count
IDBPerCol=sum((toothType=='IDB'),1);
[~,maxIDBIndex]=max(IDBPerCol);
%show the value closest to true offset
[bestPercent] = percent(maxIDBIndex);
bestOffset = offset(maxIDBIndex);
bestTick = tickcount(maxIDBIndex);
Here's a small sample data set:
- TVA=[1.002; 1.017; 32.006; 32.027; 33.100; 60.003; 60.028; 60.051]; %significantly different size than TVB
- TVBStart=1:.0157:75;
- TVBEnd=1.000256:.0157:75.000256; %Same size as TVBStart
- TVBID=???; %Can be randomly generated; Must be where IDA is the primary, and IDB is sporadic; same size as TVBStart and TVBEnd;
- missFixer=.006; %not included is the code that divides missFixer by 2 and uses it to create TVAStartTime and TVAEndTime
- UserDefinedStart=-10; %or user defined value, works in seconds
- UserDefinedEnd=10; %or user defined value, works in seconds
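Since TVBID can be randomly generated, one hypothetical way to build it is sketched below. It assumes a cell array of character vectors as the identifier container (strcmp accepts these as well as string arrays) and draws "IDB" independently with probability 1/330 per block. The names isIDB, nIDA and nIDB are illustrative and are not part of the original code.

```matlab
% Hypothetical sample identifiers: roughly 1 "IDB" for every 330 "IDA".
TVBStart = 1:.0157:75;                     % from the sample data above
TVBID = repmat({'IDA'}, size(TVBStart));   % start with every block labelled "IDA"
isIDB = rand(size(TVBStart)) < 1/330;      % sporadic, random positions for "IDB"
TVBID(isIDB) = {'IDB'};                    % overwrite those positions

% Count each identifier to check the mix
nIDA = sum(strcmp(TVBID, 'IDA'));
nIDB = sum(strcmp(TVBID, 'IDB'));
fprintf('%d IDA, %d IDB\n', nIDA, nIDB);
```

Because the positions are random, the exact counts differ from run to run.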
Let me know if you need any additional details or have questions. Without tmpVecA, tmpVecB, tmpVecD (which implement the "block" ability for TVATime) and without the string comparison implementation, the code runs fine and returns the expected offsets (let me know if you need the code for this). The problem is when I give TVATime a width like TVB, but I need this width for TVA for closer investigation of the data.
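The separate start, finish, center and enclosing checks can arguably be collapsed into the standard closed-interval overlap test. The sketch below is a minimal illustration under two assumptions: each block is a closed interval, and the extra width check used in tmpVecD is ignored. The names aStart, aEnd, bStart, bEnd and all values are hypothetical.

```matlab
% Hypothetical TVA blocks, already shifted by one candidate offset
aStart = [0.999; 1.014; 2.000];
aEnd   = [1.005; 1.020; 2.006];

% One hypothetical TVB block
bStart = 1.0157;
bEnd   = 1.015956;

% Two closed intervals overlap when each one starts before the other ends.
% This covers "start inside", "end inside" and "TVA block encloses TVB block".
overlaps = (aStart <= bEnd) & (aEnd >= bStart);
disp(overlaps.');
```

Only the second TVA block overlaps the TVB block here: the first ends before the TVB block starts, and the third starts after it ends.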
The output needs to be the best offset and best percent, where the best offset is the one at which both the "IDB" string count in a given column of toothType and tickcount are at their highest (in other words, all elements in TVA have an equal in TVB (maximum tick count), and this same column in toothType has the highest possible count of "IDB" strings).
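One possible reading of this requirement is sketched below. It assumes the tick count takes priority: keep only the columns that reach the maximum tick count, then choose the one with the most "IDB" entries. The priority order, the name bestIndex and the sample values are all assumptions made for illustration.

```matlab
% Hypothetical per-offset results (one entry per column of toothType)
offset    = [-0.002 -0.001 0 0.001 0.002];
tickcount = [6 8 8 8 5];
IDBPerCol = [0 1 3 2 4];

% Keep only the columns that reach the maximum tick count
candidates = find(tickcount == max(tickcount));

% Among those, take the column with the most "IDB" entries.
% The second output of max is the position of the first maximum.
[~, k] = max(IDBPerCol(candidates));
bestIndex  = candidates(k);
bestOffset = offset(bestIndex);
bestTick   = tickcount(bestIndex);
fprintf('best column = %d, offset = %g, ticks = %d\n', bestIndex, bestOffset, bestTick);
```

With these values, column 5 has the most "IDB" entries overall but is excluded because it does not reach the maximum tick count.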
I'll understand if this is extremely hard for anyone to grasp.
Accepted Answer
Greg
2016-12-24
Edited: Greg
2016-12-24
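Replace "numel" with "sum" (or "nnz" if you like). numel returns the number of elements in the logical array whether they are true or false, while sum counts only the true ones.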
numel(toothType(:,toothColumn)=='IDA') --> sum(toothType(:,toothColumn)=='IDA')
Also, I recommend using strcmp instead of ==, but that's not part of the original question.
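A small sketch of the difference, using a hypothetical four-element column in place of toothType(:,toothColumn). A cell array of character vectors is assumed here; strcmp treats a string array the same way.

```matlab
% Hypothetical column: three "IDA" entries and one "IDB"
col = {'IDA'; 'IDA'; 'IDB'; 'IDA'};

isIDA = strcmp(col, 'IDA');   % logical column vector, true where the entry is 'IDA'

nAll = numel(isIDA);          % counts every element, true or false
nSum = sum(isIDA);            % counts only the true elements
nNnz = nnz(isIDA);            % same count, as the number of nonzero elements

fprintf('numel = %d, sum = %d, nnz = %d\n', nAll, nSum, nNnz);
% prints: numel = 4, sum = 3, nnz = 3
```

Because numel always equals the column length, dividing it by that length gives 1 no matter how many entries match.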
2 Comments
Greg
2016-12-24
I further recommend comparing the 2 sizes directly, rather than the dividend to 1. I.e., sum(...) == TVAsize
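A minimal sketch of that direct comparison, again with a hypothetical column and assuming one row per TVA element. The variable names allIDA and allIDA2 are illustrative only.

```matlab
% Hypothetical column with one entry per TVA element
col = {'IDA'; 'IDA'; 'IDB'; 'IDA'};
TVAsize = numel(col);

isIDA = strcmp(col, 'IDA');

% Compare the two counts directly instead of testing a ratio against 1
allIDA = (sum(isIDA) == TVAsize);

% all() expresses the same "every entry is IDA" condition
allIDA2 = all(isIDA);

if allIDA
    quality = 'bad';    % 100% "IDA" is treated as an impossible match
else
    quality = 'good';
end
disp(quality);
% prints: good
```

Comparing integer counts avoids a division and keeps the intent of the condition visible.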