probability distribution from a simple vector
Assume a vector like... 1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 4
How can I calculate the likelihood that a 2 follows a 3, or that a 1 follows a 2? Ideally I would like to display this relationship for all pairs of numbers as a probability distribution.
Accepted Answer
Rik
2018-8-17
Edited: Rik
2018-8-20
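Use meshgrid to generate all combinations of values and loop through them to count the occurrences of each consecutive pair. To convert the counts to probabilities, divide by the total number of consecutive pairs, which is numel(data)-1.
data=[1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 5];
[first,second]=meshgrid(unique(data));%first varies along columns, second along rows
out=zeros(size(first));
for k=1:numel(first)
%count the positions where first(k) is immediately followed by second(k)
out(k)=sum(...
data(1:(end-1))==first(k) &...
data(2:end)==second(k));
end
P=out/(numel(data)-1);%a vector of n elements has n-1 consecutive pairs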
figure(1),clf(1)
x=1:size(out,1);
y=size(out,2):-1:1;%flip y-direction
y_label=arrayfun(@num2str,y,'UniformOutput',false);
image(x,y,P,'CDataMapping','scaled')
colormap(gray)
set(gca,'XTick',x)
set(gca,'YTick',y(end:-1:1))
set(gca,'YTickLabel',y_label)
xlabel('First value')
ylabel('Second value')
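To read a single number out of this layout, it helps to remember that meshgrid places the first value along the columns and the second value along the rows. The following is only a minimal sketch of that lookup for the pair "3 followed by 2"; the names u, r, c, pairAtRC, n32 and p32 are illustrative and not part of the code above.

```matlab
data = [1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 5];   % vector used in the answer above
u = unique(data);                              % distinct values, sorted
[first, second] = meshgrid(u);                 % first varies along columns, second along rows

r = find(u == 2);                              % row index belongs to the second value
c = find(u == 3);                              % column index belongs to the first value
pairAtRC = [first(r,c), second(r,c)];          % the pair represented by grid entry (r,c)

% Direct count of "3 immediately followed by 2" and its share of all pairs
n32 = sum(data(1:end-1) == 3 & data(2:end) == 2);
p32 = n32 / (numel(data) - 1);
```

Here pairAtRC is [3 2], so the matching entry of the count matrix is out(r,c), not out(c,r).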
3 comments
Rik
2018-8-20
To more easily add a scale, I've changed the previous code from imshow to image. I've also flipped the y-direction to have the (0,0) position in the lower left corner.
More Answers (2)
John D'Errico
2022-6-8
Note that the use of meshgrid scales very poorly if all you want is to count how often one number follows another, because it builds a matrix entry for every pair of distinct values. For example, suppose the vector contained 1e7 distinct values. Then meshgrid would generate matrices of size 1e7 by 1e7. Do you really want that? Do you have enough memory?
M = 1e7*1e7; % number of elements in one such matrix
% each double takes 8 bytes, so one matrix needs M*8 bytes
disp("At a minimum, approximately " + M*8/1e9 + " gigabytes of RAM will be required to perform your computation.")
That seems like a lot, so unless you have god's computer on your desktop, you might consider alternatives. :)
For example:
n = 1e7;
datavector = randi(7,[1,n]);
% counts(i,j) gives the number of events where i fell directly before j in the vector
ind = (1:n-1);
counts = accumarray([datavector(ind);datavector(ind+1)]',1)
And for the actual frequency of those events in this sample, we have:
freq = counts/(n-1) % a vector of length n has n-1 consecutive pairs
So we see a remarkably uniform distribution, as would be expected in this specific case, since randi is a uniform random generator of integers.
Our expectation for the true frequency (as the sample size approaches infinity) is of course:
format long
1/(7*7)
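The question can also be read as asking for a conditional probability, for example the chance that the next value is 2 given that the current value is 3. As a rough sketch, assuming that reading, the accumarray counts can be normalised per row. The names vals, idx, Pjoint and Pcond are illustrative, the vector is the one from the question, and the row-wise division relies on implicit expansion (R2016b or later).

```matlab
data = [1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 4];   % vector from the question
[vals, ~, idx] = unique(data);                 % idx maps each element to 1..numel(vals)
idx = idx(:);                                  % force a column vector
nVals = numel(vals);

% counts(i,j) = number of times vals(i) is immediately followed by vals(j)
counts = accumarray([idx(1:end-1), idx(2:end)], 1, [nVals nVals]);

% Joint probability of each consecutive pair (all entries sum to 1)
Pjoint = counts / sum(counts(:));

% Conditional probability P(next = vals(j) | current = vals(i)): normalise each row.
% max(...,1) avoids 0/0 for a value that is never followed by anything.
Pcond = counts ./ max(sum(counts, 2), 1);
```

For this vector, Pcond(3,2) is 0.2 (a 3 is followed by a 2 in one of five cases) and Pcond(2,1) is 0.25. Mapping through unique also keeps the matrix small when the values are not the integers 1 to N.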
Steven Lord
2022-6-8
If you only have a small number of potential states (and they're all integer values) you could try histcounts2.
rng default
A = randi(6, 100, 1);
histcounts2(A(1:end-1), A(2:end), 'BinMethod', 'integers')
Let's validate the 5 that is in element (5, 2).
locationOfFirst5 = find(A(1:end-1) == 5 & A(2:end) == 2)
There are five (5, 2) pairs.
A([locationOfFirst5, locationOfFirst5+1])
The other 5s are followed by other values.
other5s = find(A(1:end-1) == 5 & A(2:end) ~= 2);
A([other5s, other5s+1])
any(A(other5s+1) == 2) % false
If you want probabilities instead of counts, set the 'Normalization' option to 'probability'.
histcounts2(A(1:end-1), A(2:end), 'BinMethod', 'integers', 'Normalization', 'probability')
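With 'BinMethod','integers' the bin range follows the data, so the row and column numbers only coincide with the values themselves when the smallest value is 1. One possible way around that, shown here only as a sketch on the vector from the question, is to pass explicit bin edges. The names edges, offset and p32 are illustrative.

```matlab
data = [1 3 2 3 1 4 2 3 1 3 4 2 1 1 2 3 4];   % vector from the question
edges = (min(data) - 0.5):(max(data) + 0.5);    % one unit-wide bin per integer value

% Rows of P correspond to the first value, columns to the value that follows it
P = histcounts2(data(1:end-1), data(2:end), edges, edges, ...
    'Normalization', 'probability');

% Look up the joint probability of "3 followed by 2"
offset = 1 - min(data);                         % shifts a value to its bin index
p32 = P(3 + offset, 2 + offset);
```

Note that this orientation (first value along the rows) is the transpose of the meshgrid layout in the accepted answer.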