LT4335 float description.
Version of 9 July 2011.
Treatment of floating-point numbers in the LT4335 is unconventional.
Every float consists of two integers, here termed the mantissa and the exponent. The number of bits in each can be selected separately; more bits in the mantissa means more precision, and more bits in the exponent means more range. The value of the float is the product of two quantities, the mantissa and two raised to the power of the exponent:

value = 2 ** exponent * mantissa
As a detailed example, the table below shows in decimal notation each of the floats that can be represented with a 3-bit exponent and 4-bit mantissa. Aside from the nulls, these are the model numbers. Note that many of them are listed more than once.
Table one. Rows are exponents; columns are mantissas.

exponent \ mantissa | −7 | −6 | −5 | −4 | −3 | −2 | −1 | 0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | null |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
−3 | −0.875 | −0.75 | −0.625 | −0.5 | −0.375 | −0.25 | −0.125 | 0.0 | +0.125 | +0.25 | +0.375 | +0.5 | +0.625 | +0.75 | +0.875 | null |
−2 | −1.75 | −1.5 | −1.25 | −1.0 | −0.75 | −0.5 | −0.25 | 0.0 | +0.25 | +0.5 | +0.75 | +1.0 | +1.25 | +1.5 | +1.75 | null |
−1 | −3.5 | −3.0 | −2.5 | −2.0 | −1.5 | −1.0 | −0.5 | 0.0 | +0.5 | +1.0 | +1.5 | +2.0 | +2.5 | +3.0 | +3.5 | null |
0 | −7.0 | −6.0 | −5.0 | −4.0 | −3.0 | −2.0 | −1.0 | 0.0 | +1.0 | +2.0 | +3.0 | +4.0 | +5.0 | +6.0 | +7.0 | null |
+1 | −14.0 | −12.0 | −10.0 | −8.0 | −6.0 | −4.0 | −2.0 | 0.0 | +2.0 | +4.0 | +6.0 | +8.0 | +10.0 | +12.0 | +14.0 | null |
+2 | −28.0 | −24.0 | −20.0 | −16.0 | −12.0 | −8.0 | −4.0 | 0.0 | +4.0 | +8.0 | +12.0 | +16.0 | +20.0 | +24.0 | +28.0 | null |
+3 | −56.0 | −48.0 | −40.0 | −32.0 | −24.0 | −16.0 | −8.0 | 0.0 | +8.0 | +16.0 | +24.0 | +32.0 | +40.0 | +48.0 | +56.0 | null |
null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
The table above, although large enough to show how floats work, describes numbers of very small storage size. A model number table for floats of a more practical size, such as those with a 17-bit exponent and a 51-bit mantissa, would have on the order of 2 ** 68 entries, far too many to list.
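The entries of table one follow directly from the product rule, and a short sketch can regenerate them. The function names here are hypothetical, and using Python's `None` to stand in for a null is an assumption made only for illustration; nothing below describes how the LT4335 stores its bits.

```python
def float_value(mantissa, exponent):
    """Value of a float as 2 ** exponent * mantissa.

    None stands in for a null mantissa or exponent (an assumption of
    this sketch); a null in either part makes the whole float null.
    """
    if mantissa is None or exponent is None:
        return None
    return 2.0 ** exponent * mantissa


def model_table(exp_bit, man_bit):
    """Map every (exponent, mantissa) pair to its value, nulls included."""
    exp_max = 2 ** (exp_bit - 1) - 1
    man_max = 2 ** (man_bit - 1) - 1
    exponents = list(range(-exp_max, exp_max + 1)) + [None]
    mantissas = list(range(-man_max, man_max + 1)) + [None]
    return {(e, m): float_value(m, e) for e in exponents for m in mantissas}


table = model_table(3, 4)
print(table[(-3, 7)])   # 0.875
print(table[(2, 4)])    # 16.0
print(len(table))       # 128 cells, matching the 8 x 16 layout of table one
```

The same `model_table` call with larger bit counts shows quickly why a printed table is impractical: the number of cells doubles with every added bit.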
Unlike some floating-point systems (such as IEEE 754), this one does not distinguish positive zero from negative zero, and makes no allowance for infinity. A quietly propagated null float does correspond roughly to that standard's quiet NaN, while throwing an exception resembles what happens with the signaling NaN. Note, however, that with the LT4335 all nulls have the same meaning, even when rendered in different bit patterns.
Table two extracts the nonnegative numbers of table one, and arranges them in columns according to the difference between one model number and the next. To complete the pattern, a few numbers are repeated, and the non-model number 64.0 is included. The table reveals how the step size between model numbers is roughly proportional to their magnitude, and how there is a change of behavior near zero. Gradual underflow occurs automatically, and there is not the loss of precision associated with the denormal numbers of most floating point systems. The nonpositive numbers, naturally enough, work the same way.
Table two.

Δ = 0.125 | Δ = 0.25 | Δ = 0.5 | Δ = 1 | Δ = 2 | Δ = 4 | Δ = 8 |
---|---|---|---|---|---|---|
1.000 | 2.00 | 4.0 | 8.0 | 16.0 | 32.0 | [64.0] |
0.875 | 1.75 | 3.5 | 7.0 | 14.0 | 28.0 | 56.0 |
0.750 | 1.50 | 3.0 | 6.0 | 12.0 | 24.0 | 48.0 |
0.625 | 1.25 | 2.5 | 5.0 | 10.0 | 20.0 | 40.0 |
0.500 | 1.00 | 2.0 | 4.0 | 8.0 | 16.0 | 32.0 |
0.375 | | | | | | |
0.250 | | | | | | |
0.125 | | | | | | |
0.000 | | | | | | |
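The step pattern in table two can be checked by listing the distinct nonnegative model numbers and taking successive differences. This is a minimal sketch for the 3-bit exponent, 4-bit mantissa case; the helper name is hypothetical.

```python
def nonnegative_model_numbers(exp_bit, man_bit):
    """Sorted distinct nonnegative values of 2 ** exponent * mantissa."""
    exp_max = 2 ** (exp_bit - 1) - 1
    man_max = 2 ** (man_bit - 1) - 1
    values = {
        2.0 ** e * m
        for e in range(-exp_max, exp_max + 1)
        for m in range(0, man_max + 1)
    }
    return sorted(values)


numbers = nonnegative_model_numbers(3, 4)

# Pair each model number with the distance to the next larger one.
steps = [(lo, hi - lo) for lo, hi in zip(numbers, numbers[1:])]
for lo, step in steps:
    print(f"{lo:7.3f}  next model number is {step} higher")
```

Below 1.0 every step is 0.125, which is the uniform spacing near zero that the text calls gradual underflow; above that, the step doubles each time the magnitude doubles.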
Define some terms:
Table three.

item | in general | in table two |
---|---|---|
number of bits in exponent | exp_bit | 3 |
number of bits in mantissa | man_bit | 4 |
maximum exponent | exp_max = 2 ** (exp_bit − 1) − 1 | 3 |
maximum mantissa | man_max = 2 ** (man_bit − 1) − 1 | 7 |
minimum positive number | min_pos = 0.5 ** exp_max | 0.125 |
maximum positive number | max_pos = 2 ** exp_max * man_max | 56 |
ultra positive number | ult_pos = 2 ** exp_max * (man_max + 1) | 64 |
positive boundary | pos_bound = (max_pos + ult_pos) ÷ 2 | |
maximum negative number | max_neg = −min_pos | |
minimum negative number | min_neg = −max_pos | |
ultra negative number | ult_neg = −ult_pos | |
negative boundary | neg_bound = −pos_bound | |
Although the ultra numbers aid in explaining rounding, they are not model numbers; that is why 64.0 is shown in square brackets in the table above.
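The definitions in table three translate almost literally into code. The sketch below evaluates them for any pair of bit counts; the function name and the use of a dictionary are illustrative choices, not part of the LT4335 description.

```python
def float_limits(exp_bit, man_bit):
    """Evaluate the table three definitions for the given bit counts."""
    exp_max = 2 ** (exp_bit - 1) - 1
    man_max = 2 ** (man_bit - 1) - 1
    min_pos = 0.5 ** exp_max
    max_pos = 2 ** exp_max * man_max
    ult_pos = 2 ** exp_max * (man_max + 1)   # not a model number
    pos_bound = (max_pos + ult_pos) / 2
    return {
        "exp_max": exp_max,
        "man_max": man_max,
        "min_pos": min_pos,
        "max_pos": max_pos,
        "ult_pos": ult_pos,
        "pos_bound": pos_bound,
        "max_neg": -min_pos,
        "min_neg": -max_pos,
        "ult_neg": -ult_pos,
        "neg_bound": -pos_bound,
    }


limits = float_limits(3, 4)
for name, value in limits.items():
    print(f"{name:10s} {value}")
```

For the 3-bit exponent and 4-bit mantissa used throughout, this gives a positive boundary of 60.0, halfway between the largest model number 56 and the ultra number 64.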
Rounding is handled as follows in four special cases. If the exact answer is…
Besides those, there is the usual case, where the exact answer lies somewhere between min_neg and max_pos:
This scheme is termed round-nearest-half-even. Table four gives an example where seven mantissa bits (written least significant first) are being rounded to four.
Table four. Example with an exponent of typical magnitude.

seven bits → four bits | action |
---|---|
000_0000 → 0000 | exact |
100_0000 → 0000 | round down to nearest |
010_0000 → 0000 | round down to nearest |
110_0000 → 0000 | round down to nearest |
001_0000 → 0000 | round down to even |
101_0000 → 1000 | round up to nearest |
011_0000 → 1000 | round up to nearest |
111_0000 → 1000 | round up to nearest |
000_1000 → 1000 | exact |
100_1000 → 1000 | round down to nearest |
010_1000 → 1000 | round down to nearest |
110_1000 → 1000 | round down to nearest |
001_1000 → 0100 | round up to even |
101_1000 → 0100 | round up to nearest |
011_1000 → 0100 | round up to nearest |
111_1000 → 0100 | round up to nearest |
000_0100 → 0100 | exact |
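The rows of table four can be reproduced with a small round-nearest-half-even routine. This is a sketch under stated assumptions: it handles only nonnegative mantissas, it ignores the case where rounding up overflows the kept bits, and the helper names and the string parsing of the least-significant-first notation are hypothetical conveniences.

```python
def round_half_even(mantissa, drop):
    """Drop the `drop` least significant bits (drop >= 1) of a
    nonnegative integer, rounding to nearest with ties going to even."""
    kept, lost = divmod(mantissa, 1 << drop)
    half = 1 << (drop - 1)
    if lost > half or (lost == half and kept % 2 == 1):
        kept += 1
    return kept


def parse_lsb_first(text):
    """Read a bit string written least significant bit first."""
    return int(text.replace("_", "")[::-1], 2)


def format_lsb_first(value, width):
    """Write an integer as `width` bits, least significant bit first."""
    return format(value, f"0{width}b")[::-1]


for bits in ["001_0000", "101_0000", "100_1000", "001_1000", "111_1000"]:
    rounded = round_half_even(parse_lsb_first(bits), 3)
    print(bits, "->", format_lsb_first(rounded, 4))
# 001_0000 -> 0000
# 101_0000 -> 1000
# 100_1000 -> 1000
# 001_1000 -> 0100
# 111_1000 -> 0100
```

The two tie cases, `001_0000` and `001_1000`, show the half-even rule: the first rounds down because the kept bits are already even, and the second rounds up to reach an even result.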
Whenever rounding reduces the number of mantissa bits, the exponent would normally be increased by the number of mantissa bits eliminated. However, if the exponent is already so close to exp_max that the full increase cannot take place (the very large exponent situation), then the mantissa is padded with zeroes on the left. Table five is a variant of table four, adjusted for the case where the exponent before rounding equals exp_max − 1. Again, seven digits are rounded to four, but now two padding digits are required because the exponent can increase by only one. Importantly, the amount of information lost to rounding in table five is the same as in table four.
Table five. Example with a very large exponent.

seven bits → two padding bits and four bits | action |
---|---|
000_0000 → 00_0000 | exact |
100_0000 → 00_0000 | round down to nearest |
010_0000 → 00_0000 | round down to nearest |
110_0000 → 00_0000 | round down to nearest |
001_0000 → 00_0000 | round down to even |
101_0000 → 00_1000 | round up to nearest |
011_0000 → 00_1000 | round up to nearest |
111_0000 → 00_1000 | round up to nearest |
000_1000 → 00_1000 | exact |
100_1000 → 00_1000 | round down to nearest |
010_1000 → 00_1000 | round down to nearest |
110_1000 → 00_1000 | round down to nearest |
001_1000 → 00_0100 | round up to even |
101_1000 → 00_0100 | round up to nearest |
011_1000 → 00_0100 | round up to nearest |
111_1000 → 00_0100 | round up to nearest |
000_0100 → 00_0100 | exact |
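The padding behavior of table five can be sketched by letting the exponent absorb as much of the shift as exp_max allows and putting the remainder back into the mantissa as low-order zeroes. As before, this assumes nonnegative mantissas and does not handle overflow of the kept bits; the function names are hypothetical.

```python
def round_half_even(mantissa, drop):
    """Drop `drop` low bits (drop >= 1), ties to even; nonnegative input."""
    kept, lost = divmod(mantissa, 1 << drop)
    half = 1 << (drop - 1)
    if lost > half or (lost == half and kept % 2 == 1):
        kept += 1
    return kept


def round_with_exponent_limit(mantissa, exponent, drop, exp_max):
    """Round away `drop` bits, raising the exponent by at most what
    exp_max permits and padding the mantissa with zeroes for the rest."""
    kept = round_half_even(mantissa, drop)
    increase = min(drop, exp_max - exponent)
    padding = drop - increase
    return kept << padding, exponent + increase


exp_max = 3
# Table four situation: the exponent has room to grow by all three bits.
print(round_with_exponent_limit(0b0001100, 0, 3, exp_max))            # (2, 3)
# Table five situation: exponent starts at exp_max - 1, so it grows by one.
print(round_with_exponent_limit(0b0001100, exp_max - 1, 3, exp_max))  # (8, 3)
```

In the second call the result mantissa 8 is binary 001000, which reads `00_0100` when written least significant bit first, matching the `001_1000` row of table five. In both calls the same three bits of information are discarded; only the split between exponent and mantissa differs.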
Miscellaneous points: