The best approach to avoid the Kullback–Leibler divergence equal to infinite
85 次查看(过去 30 天)
显示 更早的评论
Given two discrete probability distributions P and Q, containing zero values in some bins, what is the best approach to avoid the Kullback–Leibler divergence equal to infinite (and therefore getting some finite value, between zero and one)? Is there any function in Matlab that could calculate the Kullback–Leibler divergence correctly, i.e. by solving this issue?
Here below an example of calculation of the Kullback–Leibler divergence between P and Q, which gives an infinite value. I am tempted to manually remove "NaNs" and "Infs" from "log2( P./Q )", but I am afraid this is not correct. In addition, I am not sure that smoothing the PDFs could solve the issue...
% Input
A =[ 0.444643925792938 0.258402203856749
0.224416517055655 0.309641873278237
0.0730101735487732 0.148209366391185
0.0825852782764812 0.0848484848484849
0.0867743865948534 0.0727272727272727
0.0550568521843208 0.0440771349862259
0.00718132854578097 0.0121212121212121
0.00418910831837223 0.0336088154269972
0.00478755236385398 0.0269972451790634
0.00359066427289048 0.00110192837465565
0.00538599640933573 0.00220385674931129
0.000598444045481747 0
0.00299222022740874 0.00165289256198347
0 0
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0 0.000550964187327824
0.000598444045481747 0
0.000598444045481747 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0.00119688809096349 0.000550964187327824];
P = A(:,1); % sum(P) = 0.999999999
Q = A(:,2); % sum(Q) = 1
% Calculation of the Kullback–Leibler divergence
M = numel(P);
P = reshape(P,[M,1]);
Q = reshape(Q,[M,1]);
KLD = nansum( P .* log2( P./Q ) )
log2( P./Q )
0 个评论
采纳的回答
Matt J
2023-6-27
% Input
A =[ 0.444643925792938 0.258402203856749
0.224416517055655 0.309641873278237
0.0730101735487732 0.148209366391185
0.0825852782764812 0.0848484848484849
0.0867743865948534 0.0727272727272727
0.0550568521843208 0.0440771349862259
0.00718132854578097 0.0121212121212121
0.00418910831837223 0.0336088154269972
0.00478755236385398 0.0269972451790634
0.00359066427289048 0.00110192837465565
0.00538599640933573 0.00220385674931129
0.000598444045481747 0
0.00299222022740874 0.00165289256198347
0 0
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0 0.000550964187327824
0.000598444045481747 0
0.000598444045481747 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0.00119688809096349 0.000550964187327824];
P = A(:,1); % sum(P) = 0.999999999
Q = A(:,2); % sum(Q) = 1
% Calculation of the Kullback–Leibler divergence
M = numel(P);
P = reshape(P,[M,1]);
Q = reshape(Q,[M,1]);
tf=P~=0 & Q~=0;
KLD = nansum( P(tf) .* log2( P(tf)./Q(tf) ) )
2 个评论
更多回答(2 个)
the cyclist
2023-6-27
Your question is not really a MATLAB question, but a math/stats questions. (At least, it seems you already understand how to remove the NaNs and infinities, but are concerned about the theoretical validity of that.)
I don't know if it will help you or not, but the best explanation I found online about this is in this mathoverflow answer. I don't think I could explain the issue any more clearly myself, so I'll just point you there.
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!