The same behavior can be observed using `numpy` directly. However, this only appears when using `dtype=np.float32` rather than `dtype=np.float64`. Change your `dtype` to `np.float64` to correct the problem.

In order to understand why, you must understand how floating-point numbers are stored in memory. Let us consider `a` when represented with single precision and with double precision:

```
import numpy as np
a = -1.55786165e+14
a_single = np.array([a], dtype=np.float32)
a_double = np.array([a], dtype=np.float64)
a_single[0], a_double[0], a
# The line above prints:
# (-155786160000000.0, -155786165000000.0, -155786165000000.0)
```

As you can see, `a` is truncated when using single precision. But why is that?

The base-2 logarithm of `abs(a)` is between 47 and 48. Hence, `a` can be written as `-1 * 2^47 * 1.x`. When representing floating-point numbers, one has to encode the exponent (47) and the fraction (x). In our case, `.x` would be approximately equal to:

```
-a / pow(2, 47) - 1
```

which is equal to `0.1069272787267437`. Now, what we want is to write this number as a sum of negative powers of 2, starting from `2^-1`. This means that if we use `N` bits to represent it, we will store in memory the integer part of `0.1069272787267437 * pow(2, N)`.
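As a quick sanity check, this integer part can be computed directly (a small sketch; `frac` is the fraction derived above):

```
frac = 0.1069272787267437  # fractional part of abs(a) / 2**47, from above

# The stored fraction bits form the integer part of frac * 2**N,
# where N is the number of fraction bits of the format.
single_mantissa = int(frac * 2**23)  # float32 has N = 23
double_mantissa = int(frac * 2**52)  # float64 has N = 52

print(single_mantissa)  # 896971
```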

In single precision, we use `N = 23` bits to represent this number. The integer part of `0.1069272787267437 * pow(2, 23)` is 896971, whose binary expansion is `11011010111111001011`, which is 20 bits long. Padded to 23 bits, the number stored in memory is therefore `00011011010111111001011`.
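You can verify this by inspecting the actual bit pattern. Here is a small sketch using the standard `struct` module, assuming the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits biased by 127, 23 fraction bits):

```
import struct

a = -1.55786165e+14

# Reinterpret the float32 bit pattern as a 32-bit unsigned integer.
(bits,) = struct.unpack(">I", struct.pack(">f", a))

sign = bits >> 31                       # 1 bit
exponent = ((bits >> 23) & 0xFF) - 127  # 8 bits, biased by 127
fraction = bits & ((1 << 23) - 1)       # 23 bits

print(sign, exponent, format(fraction, "023b"))
# 1 47 00011011010111111001011
```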

When using double precision however, the number stored in memory is `0001101101011111100101100000110100101110100000000000`. Note that the large number of trailing zeroes may indicate that the exact value of `a` is stored (since we don’t need more precision to represent it), which is the case here.

That said, this explains why `a` is truncated when represented as a single-precision float. The same reasoning works for adding `b` to `a`. Since the exponent of the resulting float will be `47`, the spacing between adjacent representable values (one unit in the last place) is `2^47 * 2^-23 = 2^24` in single precision, while it is `2^47 * 2^-52 = 2^-5` in double precision. Since you are working with integers, this explains why you get an exact result with double precision and an incorrect one with single precision.
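The spacing at a given magnitude can also be queried directly with `np.spacing`, which returns the distance from its argument to the next representable value (one ULP):

```
import numpy as np

m = 1.55786165e+14  # magnitude of a (and of the result of a + b)

# One ULP at this magnitude, in each precision:
print(np.spacing(np.float32(m)))  # 16777216.0 == 2**24
print(np.spacing(np.float64(m)))  # 0.03125    == 2**-5
```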
