Forgive me if you are already aware of this, but I found it quite alarming. I know that computers store numbers in binary while we enter them in decimal, so problems can arise in the conversion and with floating point. But the example below is so simple that it really surprised me.
I was converting a function from R into MATLAB so that a colleague could use it. I tested it out on the same data and got slightly different results. Digging into the problem, the difference was due to the fact that R was rounding 4.5 to 4 and MATLAB was rounding it to 5. I thought the "4.5" must have really been "4.49999...". But that was not so.
For example, this is the result of the round function for a few numbers.
> round(0.5,0)
[1] 0
> round(1.5,0)
[1] 2
> round(2.5,0)
[1] 2
> round(3.5,0)
[1] 4
> round(4.5,0)
[1] 4
> round(5.5,0)
[1] 6
> round(6.5,0)
[1] 6
Do you see a pattern?
I tried this on versions 2.13.1 and 2.14.0. I ran the same thing in MATLAB and it gave the expected results. I am not any kind of expert in computer science, so I was not sure why this was happening. Any decimal number that ends in .5 has a finite binary expansion; for example, 4.5 is 100.1 in binary. Because of this, I wouldn't think the difference is due to floating-point representation, but I really don't know.
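The claim that a decimal ending in .5 is stored exactly can be checked directly. This is a minimal sketch; `to_binary` is a hypothetical helper written for this illustration, not a base R function. `sprintf` with many digits then shows the value R actually holds.

```r
# Hypothetical helper (not part of base R): binary expansion of a
# non-negative number, stopping after max_bits fractional bits.
to_binary <- function(x, max_bits = 20) {
  int_part <- floor(x)
  frac <- x - int_part
  # Integer part: repeated division by 2
  int_bits <- "0"
  if (int_part > 0) {
    bits <- character(0)
    while (int_part > 0) {
      bits <- c(as.character(int_part %% 2), bits)
      int_part <- int_part %/% 2
    }
    int_bits <- paste(bits, collapse = "")
  }
  # Fractional part: repeated doubling (exact for binary doubles)
  frac_bits <- character(0)
  while (frac > 0 && length(frac_bits) < max_bits) {
    frac <- frac * 2
    bit <- floor(frac)
    frac_bits <- c(frac_bits, as.character(bit))
    frac <- frac - bit
  }
  if (length(frac_bits) == 0) return(int_bits)
  paste0(int_bits, ".", paste(frac_bits, collapse = ""))
}

to_binary(4.5)          # "100.1": terminates, so 4.5 is stored exactly
to_binary(0.1)          # never terminates; cut off after 20 bits
sprintf("%.20f", 4.5)   # "4.50000000000000000000"
```

So R really does receive 4.5; the result of 4 comes from the tie-breaking rule rather than from a representation error.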
Looking at the documentation for round, I found the reason. It states in the notes, "Note that for rounding off a 5, the IEC 60559 standard is expected to be used, ‘go to the even digit’." It is a little comforting to know that there is a logic behind it and that R is abiding by some standard. But why isn't MATLAB abiding by the same standard? Also, I think most people expect numbers ending in .5 to round up, not to the nearest even digit.
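To make the documented rule concrete, here is a hand-written sketch of "go to the even digit" for rounding to whole numbers. `round_half_even` is a hypothetical name, and the sketch only covers the `digits = 0` case; on the values from the question it agrees with R's built-in `round()`.

```r
# Hypothetical helper: round to the nearest integer, breaking exact
# ties (x.5) toward the even neighbour ("go to the even digit").
round_half_even <- function(x) {
  lower <- floor(x)
  frac  <- x - lower                        # always in [0, 1)
  ifelse(frac > 0.5, lower + 1,             # above the midpoint: go up
  ifelse(frac < 0.5, lower,                 # below the midpoint: go down
         # exact tie: keep lower if it is even, else go up.
         # R's %% returns a non-negative result for a positive divisor,
         # so this also works for negative x (-2.5 -> -2).
         ifelse(lower %% 2 == 0, lower, lower + 1)))
}

x <- seq(0.5, 6.5, 1)
round_half_even(x)                        # 0 2 2 4 4 6 6
identical(round_half_even(x), round(x))   # TRUE
```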
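# Definition of a function for "rounding in commerce":
# rounds halves away from zero (2.5 -> 3, -2.5 -> -3).
# For n > 0 the result can still be affected by how x is stored in binary.
cround = function(x, n = 0) {
  vorz = sign(x)       # sign of x ("Vorzeichen")
  z = abs(x) * 10^n    # shift the digit to round into the units place
  z = z + 0.5
  z = trunc(z)         # drop the fraction, so halves go up in magnitude
  z = z / 10^n         # shift back
  z * vorz             # restore the sign
}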
# Example
> round(seq(0.5,6.5,1),0)
[1] 0 2 2 4 4 6 6
> cround(seq(0.5,6.5,1),0)
[1] 1 2 3 4 5 6 7
Sadly, the flame wars over "round to even" vs. "round up" continue, rather like the way people argue about "0.999... != 1".
PS: @a Tom: I'm highly skeptical of your claim about 2.46 --> 3. Do you have a citation?
I understand that in many Middle Eastern countries they start with the far-right digit and round up or down one digit at a time, so 2.46 is rounded to 3!
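For anyone weighing that claim, here is a sketch of the procedure as described (round one digit at a time starting from the right, halves up) next to ordinary one-step rounding. `cascade_round` is a hypothetical helper; it assumes non-negative input and scales to integers first so that each step works on exact digits.

```r
# Hypothetical helper: "cascade" rounding as described above, for x >= 0.
# x is scaled to an integer count of 10^-decimals units first, so the
# digit-by-digit steps operate on exact integers.
cascade_round <- function(x, decimals) {
  v <- round(x * 10^decimals)   # e.g. 2.46 -> 246
  for (i in seq_len(decimals)) {
    v <- (v + 5) %/% 10         # drop the last digit, rounding half up
  }
  v
}

cascade_round(2.46, 2)   # 246 -> 25 -> 3
round(2.46)              # 2: rounded once, directly to an integer
```

The two answers differ because the first step turns .46 into exactly .5, which the second step then rounds up. This "double rounding" effect is the usual argument for rounding in a single step.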
Can you imagine trying to teach a 10- or 12-year-old the IEC 60559 standard? Unfortunately, rounding halves up is the method most adults are used to...
I agree: it is a little troubling that MATLAB doesn't abide by the standard. Yet another reason to stick with R!
This method also treats positive and negative values symmetrically, and therefore is free of overall bias if the original numbers are positive or negative with equal probability. In addition, for most reasonable distributions of y values, the expected (average) value of the rounded numbers is essentially the same as that of the original numbers, even if the latter are all positive (or all negative). However, this rule will still introduce a positive bias for even numbers (including zero), and a negative bias for the odd ones.
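A quick numerical check of the bias argument, assuming every value is an exact tie (x.5), which is the worst case for a biased rule. `round_half_away` is a hypothetical helper for "round ties away from zero" (the rule MATLAB's `round` uses); R's built-in `round()` supplies round half to even.

```r
# Hypothetical helper: round half away from zero (2.5 -> 3, -2.5 -> -3)
round_half_away <- function(x) sign(x) * floor(abs(x) + 0.5)

ties_pos <- seq(0.5, 9.5, by = 1)    # 0.5, 1.5, ..., 9.5 (all exact ties)
ties_sym <- c(-ties_pos, ties_pos)   # the same ties with both signs

# Mean rounding error (rounded minus original)
mean(round_half_away(ties_pos) - ties_pos)  # 0.5: every tie pushed up
mean(round(ties_pos) - ties_pos)            # 0: ups and downs cancel
mean(round_half_away(ties_sym) - ties_sym)  # 0: sign symmetry cancels
mean(round(ties_sym) - ties_sym)            # 0
```

Both rules are symmetric in sign, so both are unbiased on `ties_sym`; only round half to even stays unbiased when all the values are positive.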
So round-to-even seems to have *slightly* better numerical properties than "round ties away from zero", which is what is (I think) most often taught, because it's easier to understand. http://www.mathworks.com/matlabcentral/fileexchange/6752 gives a MATLAB function for "round to even".
If I had to guess I would predict that in borderline cases (which this certainly is) MATLAB would favor "do what will lead to happier users" and R would favor "do what is thought to be the best numerical practice".
I'm not sure I understand what you mean by "expected results".
Regarding rounding, I was taught to round numbers ending in "1, 2, 3, and 4" *down*, and numbers ending in "6, 7, 8, 9" *up*. Then, specifically regarding "5": round up if the preceding digit is odd, and round down if the preceding digit is even.
As you can see, this results in 50% of the numbers being rounded up and 50% rounded down. If you round *down* on "1, 2, 3, 4" and round up on "5, 6, 7, 8, 9", you are rounding up 5/9 of the time, and so introducing a bias.
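The counting argument can be checked by brute force. This sketch assumes one-decimal values 0.0 to 9.9, stored as integer tenths to sidestep binary representation, and compares "round half up" with the rule described above (a 5 rounds toward whichever neighbour leaves an even digit).

```r
v <- 0:99                  # 0.0 ... 9.9 as integer tenths
d <- v %% 10               # digit being dropped
p <- (v %/% 10) %% 10      # preceding (kept) digit
down <- v %/% 10           # result if rounded down

half_up <- (v + 5) %/% 10  # 1-4 go down, 5-9 go up
taught  <- ifelse(d < 5, down,
           ifelse(d > 5, down + 1,
                  down + (p %% 2 == 1)))  # a 5 goes up only if p is odd

# Average error, converted back from tenths
mean(half_up * 10 - v) / 10   # 0.05: systematic upward bias
mean(taught * 10 - v) / 10    # 0

# Share of inexact values (d != 0) that get rounded up
inexact <- d != 0
mean(half_up[inexact] > down[inexact])  # 5/9, about 0.556
mean(taught[inexact] > down[inexact])   # 0.5
```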
It sounds like R is handling it the way I would. Is that what you were wondering about?
B. D. McCullough and H. D. Vinod, "The Numerical Reliability of Econometric Software," Journal of Economic Literature 37(2), 633-665, 1999.
A temporary link is available here:
http://www.pages.drexel.edu/~bdm25/jel.pdf