Possible bug? Or how to check if overflow happened?
I have a matrix of size 6472908 x 67, all single values. Different columns have different max/min values (they are different variables).
So I calculate the mean of each column using
avgData=mean(Data);
I am expecting the first value in avgData to be the mean of the first column. However, when I issue
avgData(1) - mean(Data(:,1))
ans =
-100.9785
As you can see, the output is not zero. So what has changed? The same thing happens if I convert everything to double.
If I do this with sum() the difference is even larger. So I was wondering whether overflow is happening, and how I should check whether overflow has occurred. lastwarn() returns nothing.
I am afraid the data is about 800 MB and I can't upload it here.
0 Comments
Accepted Answer
John D'Errico
2015-3-31
No. It is not a bug, but an artifact of operations that may be done in a different order due to the BLAS, or whatever scheme is used internally. NEVER assume that two distinct operations will do a given computation in the same order. It might conceivably reflect an issue of whether a double precision accumulator might be employed for vector input to sum (again, a choice probably made in the BLAS), but not for array input.
If the min and max values vary by such a large amount, that difference is trivial, essentially down in the least significant bits of the result, especially when you are summing millions of such elements.
You have not yet said what the total mean was either, so we cannot know how significant the difference is.
As for this being an overflow, that is not at all reasonable to assume. The numbers you have described are simply not large enough to cause overflow, AND if they did overflow, overflows in floating point result in inf, NOT a loss of precision.
realmax('single')
ans =
3.4028e+38
realmax('single')*2
ans =
Inf
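Given that, a direct check for overflow (a minimal sketch, assuming your matrix is still named Data as in the question) is simply to look for Inf in the column sums:
% Overflow would surface as Inf, so isinf finds it directly.
any(isinf(sum(Data)))
If that returns false, nothing overflowed.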
The problem here is clearly an issue of bits lost at the low end, due to variation in the sequence of adds in these numbers.
You can test that claim by computing the mean in different sequences of your vector. For example, try this test several times:
mean(Data(randperm(6472908),1))
then look at the differences.
As well, compare those differences to the size of the actual mean. How does that difference compare to eps for that same number?
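A sketch of that test, assuming the matrix is named Data as in the question:
% Compare means over a few random orderings against the original ordering,
% and measure the spread against eps at the magnitude of the mean.
m0 = mean(Data(:,1));
for k = 1:5
    m = mean(Data(randperm(size(Data,1)),1));
    fprintf('diff = %g,  eps at the mean = %g\n', m - m0, eps(m0))
end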
2 Comments
per isakson
2015-3-31
Edited: per isakson
2015-3-31
R2013b, 64-bit, Win7
With 'single' I get identical mean values, i.e. the order doesn't affect the value:
>> R = randn([6472908,67],'single') + 12054;
>> format hex
>> mean(R(randperm(6472908),1))
ans = 463f2105
>> mean(R(randperm(6472908),1))
ans = 463f2105
>> mean(R(randperm(6472908),1))
ans = 463f2105
>> mean(R(randperm(6472908),1))
ans = 463f2105
and with 'double' there is a "rounding error" in the last two hex positions, i.e. the order does affect the value:
>> R = randn([6472908,67],'double') + 12054;
>> mean(R(randperm(6472908),1))
ans = 40c78afff205ede5
>> mean(R(randperm(6472908),1))
ans = 40c78afff205edd3
>> mean(R(randperm(6472908),1))
ans = 40c78afff205eda7
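To gauge that spread, the hex strings can be converted back to doubles with hex2num (a sketch; the values are taken from the runs above):
>> a = hex2num('40c78afff205ede5');
>> b = hex2num('40c78afff205eda7');
>> a - b      % on the order of 1e-10, for a mean near 12054
>> eps(a)     % spacing of doubles at that magnitude, about 1.8e-12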
More Answers (2)
James Tursa
2015-3-31
Edited: James Tursa
2015-3-31
How large is the average value? Is 100 close to eps of this average value? E.g., is the average value near 1e17? If so, this could just be rounding differences in the methods used. I.e., maybe the problem is split up differently for multi-threading if there are several columns of a matrix involved vs just one column.
"...The same thing happened if I convert everything to double..."
Do you get the exact same result? Or a similar result (i.e., something not "close" to zero)? It would not surprise me if, in the background, MATLAB uses double accumulators even when the input is single, which might explain getting the same result in single vs double.
EDIT:
Probably the better comparison would be eps of the sum, not eps of the average.
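A sketch of that comparison, again assuming Data from the question:
s = sum(double(Data(:,1)));  % the column sum, accumulated in double
eps(s)                       % rounding granularity at the size of the sum
eps(single(s))               % much coarser if the accumulation is done in single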
EDIT:
You might also consider using this FEX contribution from Jan Simon:
Image Analyst
2015-3-31
Yes, this is a known issue. We've converted arrays from double to single to save on memory, yet when calling mean(), the means may not be correct. Essentially the later elements get ignored as you add them: once the running sum is huge, a tiny value added onto a gigantic value does not actually get added because it is so small by comparison (loosely speaking, underflow). We contacted The MathWorks to ask them about it. The MathWorks knows about this and does not consider it a bug, just a normal precision issue comparable to this issue.
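A minimal illustration of that swamping effect in single precision:
% single has a 24-bit significand, so at 2^24 the spacing between adjacent
% representable values is already 2, and adding 1 changes nothing.
s = single(2^24);      % 16777216
s + single(1) == s     % true: the 1 is absorbed into the big sum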
3 Comments
Roger Stafford
2015-4-1
"I expect the same results" <-- This is an assumption you should not make. Strictly speaking, even the associative and distributive laws of arithmetic are violated when computation is subject to round-off errors. When a series of numbers is added, the results can depend on how they are grouped together in the addition process. I like to give the following as an example which will usually give differing results even on a decimal calculator:
3/14+(3/14+15/14)
(3/14+3/14)+15/14
In computing mean(Data(:,1)) one cannot assume that the algorithm used is identical to that used for the first column of mean(Data). It all depends on just how the programmers who did the coding decided to handle the two different situations. Perhaps it depends on what they ate for breakfast? The basic assumption is that if different results agree within round-off error with perfect results, then either is acceptable.
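A two-line MATLAB version of the same point:
a = 0.1; b = 0.2; c = 0.3;
(a + b) + c == a + (b + c)    % false: grouping changes the rounded result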