LT4335 float description

Version of 9 July 2011.

Treatment of floating-point numbers in the LT4335 is unconventional.

Every float consists of two integers, here termed the mantissa and the exponent. The number of bits in each can be selected separately; more bits in the mantissa means more precision, and more bits in the exponent means more range. The value of the float is the product of these two quantities:

the mantissa
2 raised to the power of the exponent

If either the mantissa or exponent is null, however, the float is null.
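As a minimal sketch of this definition in Python, a float can be modelled as a pair of integers, with `None` standing for null. The helper name `float_value` and this pair representation are illustrative assumptions, not the LT4335's storage format. Exact `Fraction` arithmetic keeps the sketch from introducing any binary rounding of its own.

```python
from fractions import Fraction
from typing import Optional


def float_value(mantissa: Optional[int], exponent: Optional[int]) -> Optional[Fraction]:
    """Value of a (mantissa, exponent) pair; None models a null float.

    Hypothetical helper: it illustrates the definition only, not the
    LT4335's actual bit layout.
    """
    # A null in either field makes the whole float null.
    if mantissa is None or exponent is None:
        return None
    # value = mantissa * 2 ** exponent, computed exactly.
    return mantissa * Fraction(2) ** exponent


print(float_value(5, -3))    # 5/8
print(float_value(3, 2))     # 12
print(float_value(None, 2))  # None
```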

As a detailed example, the table below shows in decimal notation each of the floats that can be represented with a 3-bit exponent and 4-bit mantissa. Aside from the nulls, these are the model numbers. Note that many of them are listed more than once.

Table one.
exponent \ mantissa	−7	−6	−5	−4	−3	−2	−1	0	+1	+2	+3	+4	+5	+6	+7	null
−3	−0.875	−0.75	−0.625	−0.5	−0.375	−0.25	−0.125	0.0	+0.125	+0.25	+0.375	+0.5	+0.625	+0.75	+0.875	null
−2	−1.75	−1.5	−1.25	−1.0	−0.75	−0.5	−0.25	0.0	+0.25	+0.5	+0.75	+1.0	+1.25	+1.5	+1.75	null
−1	−3.5	−3.0	−2.5	−2.0	−1.5	−1.0	−0.5	0.0	+0.5	+1.0	+1.5	+2.0	+2.5	+3.0	+3.5	null
0	−7.0	−6.0	−5.0	−4.0	−3.0	−2.0	−1.0	0.0	+1.0	+2.0	+3.0	+4.0	+5.0	+6.0	+7.0	null
+1	−14.0	−12.0	−10.0	−8.0	−6.0	−4.0	−2.0	0.0	+2.0	+4.0	+6.0	+8.0	+10.0	+12.0	+14.0	null
+2	−28.0	−24.0	−20.0	−16.0	−12.0	−8.0	−4.0	0.0	+4.0	+8.0	+12.0	+16.0	+20.0	+24.0	+28.0	null
+3	−56.0	−48.0	−40.0	−32.0	−24.0	−16.0	−8.0	0.0	+8.0	+16.0	+24.0	+32.0	+40.0	+48.0	+56.0	null
null	null	null	null	null	null	null	null	null	null	null	null	null	null	null	null	null
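The table can be regenerated with a few lines of Python. This sketch assumes, as table one shows, that an n-bit field holds the integers from −(2 ** (n − 1) − 1) to +(2 ** (n − 1) − 1); null cells are simply left out. `model_numbers` is a hypothetical helper. Counting the distinct values shows how much duplication the table contains.

```python
from fractions import Fraction


def model_numbers(exp_bit: int, man_bit: int) -> list:
    """Every non-null cell m * 2**e of a table like table one.

    Assumes symmetric ranges -(2**(bits-1) - 1) .. +(2**(bits-1) - 1)
    for both fields, as table one shows for 3 and 4 bits.
    """
    exp_max = 2 ** (exp_bit - 1) - 1
    man_max = 2 ** (man_bit - 1) - 1
    return [
        m * Fraction(2) ** e
        for e in range(-exp_max, exp_max + 1)
        for m in range(-man_max, man_max + 1)
    ]


entries = model_numbers(3, 4)
print(len(entries))       # 105 non-null cells (7 exponents x 15 mantissas)
print(len(set(entries)))  # 63 distinct model numbers
```

Zero alone accounts for seven of the repeated cells, one per exponent.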

The table above, although large enough to show how floats work, describes floats of very small storage size. A model number table for floats of a more practical size, such as those with a 17-bit exponent and a 51-bit mantissa, would have 2 ** (17 + 51) = 2 ** 68 cells, roughly 3 × 10 ** 20.

Unlike some floating-point systems (such as IEEE 754), this one does not distinguish positive zero from negative zero, and makes no allowance for infinity. A quietly propagated null float corresponds roughly to that standard's quiet NaN, while throwing an exception resembles what happens with a signaling NaN. Note, however, that in the LT4335 all nulls have the same meaning, even when rendered in different bit patterns.

Table two extracts the nonnegative numbers of table one and arranges them in columns according to the difference Δ between one model number and the next. To complete the pattern, a few numbers are repeated, and the non-model number 64.0 is included. The table shows that the step size between model numbers is roughly proportional to their magnitude, except near zero, where the step stays constant at its smallest value (0.125 here) all the way down to 0.000. Gradual underflow thus occurs automatically, without the loss of precision associated with the denormal numbers of most floating-point systems. The nonpositive numbers, naturally enough, work the same way.

Table two.
Δ = 0.125	Δ = 0.25	Δ = 0.5	Δ = 1	Δ = 2	Δ = 4	Δ = 8
1.000	2.00	4.0	8.0	16.0	32.0	[64.0]
0.875	1.75	3.5	7.0	14.0	28.0	56.0
0.750	1.50	3.0	6.0	12.0	24.0	48.0
0.625	1.25	2.5	5.0	10.0	20.0	40.0
0.500	1.00	2.0	4.0	8.0	16.0	32.0
0.375
0.250
0.125
0.000
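Table two can likewise be rebuilt by sorting the distinct nonnegative model numbers and grouping each one by the gap to its successor. This is an illustrative Python sketch under the same symmetric-range assumption as table one; the function names are hypothetical.

```python
from fractions import Fraction


def nonnegative_model_numbers(exp_bit: int, man_bit: int) -> list:
    """Sorted distinct values m * 2**e with 0 <= m <= man_max."""
    exp_max = 2 ** (exp_bit - 1) - 1
    man_max = 2 ** (man_bit - 1) - 1
    values = {m * Fraction(2) ** e
              for e in range(-exp_max, exp_max + 1)
              for m in range(0, man_max + 1)}
    return sorted(values)


def gaps_by_step(values: list) -> dict:
    """Map each step size to the model numbers it leads away from."""
    groups = {}
    for lo, hi in zip(values, values[1:]):
        groups.setdefault(hi - lo, []).append(lo)
    return groups


for step, starts in gaps_by_step(nonnegative_model_numbers(3, 4)).items():
    print(float(step), [float(x) for x in starts])
```

The groups come out with steps 1/8, 1/4, 1/2, 1, 2, 4 and 8, matching the columns of table two. The 1/8 group runs down to zero with no change of step, which is the gradual underflow described above; the step-8 group lists only 32, 40 and 48, because 56 is the largest model number and has no successor.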

Define some terms:

Table three.
item	in general	in tables one and two
number of bits in exponent	exp_bit	3
number of bits in mantissa	man_bit	4
maximum exponent	exp_max = 2 ** (exp_bit − 1) − 1	3
maximum mantissa	man_max = 2 ** (man_bit − 1) − 1	7
minimum positive number	min_pos = 0.5 ** exp_max	0.125
maximum positive number	max_pos = 2 ** exp_max * man_max	56
ultra positive number	ult_pos = 2 ** exp_max * (man_max + 1)	64
positive boundary	pos_bound = (max_pos + ult_pos) ÷ 2	60
maximum negative number	max_neg = −min_pos	−0.125
minimum negative number	min_neg = −max_pos	−56
ultra negative number	ult_neg = −ult_pos	−64
negative boundary	neg_bound = −pos_bound	−60
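The formulas of table three translate directly into code. This Python sketch (the function name `derived_limits` is hypothetical) evaluates them with exact fractions, so it also works for wide fields such as the 17-bit exponent and 51-bit mantissa mentioned earlier.

```python
from fractions import Fraction


def derived_limits(exp_bit: int, man_bit: int) -> dict:
    """Evaluate the formulas of table three exactly; keys follow the table."""
    exp_max = 2 ** (exp_bit - 1) - 1
    man_max = 2 ** (man_bit - 1) - 1
    min_pos = Fraction(1, 2) ** exp_max
    max_pos = Fraction(2) ** exp_max * man_max
    ult_pos = Fraction(2) ** exp_max * (man_max + 1)
    pos_bound = (max_pos + ult_pos) / 2
    return {
        "exp_max": Fraction(exp_max),
        "man_max": Fraction(man_max),
        "min_pos": min_pos,
        "max_pos": max_pos,
        "ult_pos": ult_pos,
        "pos_bound": pos_bound,
        "max_neg": -min_pos,   # the negative side mirrors the positive side
        "min_neg": -max_pos,
        "ult_neg": -ult_pos,
        "neg_bound": -pos_bound,
    }


for name, value in derived_limits(3, 4).items():
    print(name, value)
```

For the example widths this yields exp_max 3, man_max 7, min_pos 1/8, max_pos 56, ult_pos 64 and pos_bound 60, with the negative quantities mirroring them.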

Although the ultra numbers aid in explaining rounding, they are not model numbers; that is why 64.0 is shown in square brackets in the table above.

Rounding is handled as follows in four special cases. If the exact answer is…

…pos_bound or greater, it becomes null on account of overflow.
…greater than max_pos but less than pos_bound, it rounds to max_pos.
…neg_bound or less, it becomes null on account of overflow.
…less than min_neg but greater than neg_bound, it rounds to min_neg.

Besides those, there is the usual case, where the exact answer is somewhere between min_neg and max_pos, inclusive:

If the exact answer equals a model number, that is the result.
If the exact answer is not halfway between two model numbers, it is rounded to the nearest.
If the exact answer is halfway between two model numbers, rounding is up or down, according to whichever will cause the least significant of the retained bits in the mantissa to be zero.
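The four special cases and the usual case can be checked against a brute-force Python sketch for small widths. It enumerates every model number and adds the two ultra numbers as sentinels, so that anything rounding to an ultra number becomes null. The tie rule is expressed as choosing the neighbour that is an even multiple of the local step; this is our reading of "the least significant retained bit is zero", not wording from the source, and `lt_round` is a hypothetical name.

```python
from fractions import Fraction
from typing import Optional


def lt_round(x: Fraction, exp_bit: int = 3, man_bit: int = 4) -> Optional[Fraction]:
    """Round an exact value to a model number; None models a null (overflow).

    Reference sketch for tiny widths only: it enumerates every model number.
    """
    exp_max = 2 ** (exp_bit - 1) - 1
    man_max = 2 ** (man_bit - 1) - 1
    ult_pos = Fraction(2) ** exp_max * (man_max + 1)
    model = {m * Fraction(2) ** e
             for e in range(-exp_max, exp_max + 1)
             for m in range(-man_max, man_max + 1)}
    grid = sorted(model | {ult_pos, -ult_pos})   # ultra numbers as sentinels
    if x >= ult_pos or x <= -ult_pos:
        return None                              # beyond a boundary: overflow
    hi = next(v for v in grid if v >= x)         # neighbours lo <= x <= hi
    lo = max(v for v in grid if v <= x)
    if lo == hi:
        result = lo                              # exact: already a model number
    elif x - lo != hi - x:
        result = lo if x - lo < hi - x else hi   # round to nearest
    else:
        # Halfway: lo and hi are consecutive multiples of step = hi - lo;
        # keep the even multiple (retained least significant bit zero).
        step = hi - lo
        result = lo if (lo / step).numerator % 2 == 0 else hi
    # Rounding to an ultra number means overflow, hence null.
    return None if abs(result) == ult_pos else result


print(lt_round(Fraction(58)))  # 56
print(lt_round(Fraction(60)))  # None
```

Here 58 rounds to max_pos = 56, while pos_bound = 60 lies exactly halfway between 56 (7 × 8) and the ultra number 64 (8 × 8); the tie goes to the even multiple, 64, so the result is null, which is why pos_bound itself overflows. Because the tie rule depends only on magnitudes, −x always rounds to the negation of what x rounds to.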

This scheme is termed round-nearest-half-even. Table four gives an example where seven mantissa bits (written least significant first) are being rounded to four.

Table four.
example with an exponent of typical magnitude
000_0000 → 0000 exact
100_0000 → 0000 round down to nearest
010_0000 → 0000
110_0000 → 0000
001_0000 → 0000 round down to even
101_0000 → 1000 round up to nearest
011_0000 → 1000
111_0000 → 1000
000_1000 → 1000 exact
100_1000 → 1000 round down to nearest
010_1000 → 1000
110_1000 → 1000
001_1000 → 0100 round up to even
101_1000 → 0100 round up to nearest
011_1000 → 0100
111_1000 → 0100
000_0100 → 0100 exact

Whenever rounding reduces the number of mantissa bits, the exponent would ordinarily be increased by the number of mantissa bits eliminated. However, if the exponent is already so close to exp_max that the full increase cannot take place (the very large exponent situation), the mantissa is instead padded with zeroes on the left, that is, at its least significant end, since the bits are written least significant first. Table five is a variant of table four, adjusted for the case where the exponent before rounding equals exp_max − 1. Again, seven bits are rounded to four, but now two padding bits are required because the exponent can increase by only one. Importantly, the amount of information lost to rounding in table five is the same as in table four.

Table five.
example with a very large exponent
000_0000 → 00_0000 exact
100_0000 → 00_0000 round down to nearest
010_0000 → 00_0000
110_0000 → 00_0000
001_0000 → 00_0000 round down to even
101_0000 → 00_1000 round up to nearest
011_0000 → 00_1000
111_0000 → 00_1000
000_1000 → 00_1000 exact
100_1000 → 00_1000 round down to nearest
010_1000 → 00_1000
110_1000 → 00_1000
001_1000 → 00_0100 round up to even
101_1000 → 00_0100 round up to nearest
011_1000 → 00_0100
111_1000 → 00_0100
000_0100 → 00_0100 exact
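The bit-level rule behind tables four and five can be sketched in Python, assuming the mantissa is handled as an unsigned bit string written least significant first, as in the tables. `round_mantissa` and its `headroom` parameter (how far the exponent may still rise) are hypothetical names. The handling of a carry out of the top bit is an assumption, since neither table shows one.

```python
from typing import Optional, Tuple


def round_mantissa(bits: str, keep: int,
                   headroom: Optional[int] = None) -> Optional[Tuple[str, int]]:
    """Round-nearest-half-even on an unsigned mantissa written LSB first.

    bits: e.g. "101_1000"; underscores are visual separators only.
    keep: number of significant mantissa bits to retain.
    headroom: how far the exponent may still rise (None means unlimited).
    Returns (rounded bits LSB first, exponent increase), or None for null.
    """
    digits = bits.replace("_", "")
    if keep == 0:
        return None                      # a zero-digit mantissa gives a null
    if not 0 < keep <= len(digits):
        raise ValueError("keep must be between 1 and the current width")
    drop = len(digits) - keep
    value = int(digits[::-1], 2)         # reverse to MSB first for int()
    kept, rest = divmod(value, 2 ** drop)
    # Round up above halfway, or exactly at halfway when the kept LSB is 1.
    if 2 * rest > 2 ** drop or (2 * rest == 2 ** drop and kept % 2 == 1):
        kept += 1
    if kept == 2 ** keep:                # assumed: renormalize a carry-out
        kept //= 2
        drop += 1
    rise = drop if headroom is None else min(drop, headroom)
    pad = drop - rise                    # bits the exponent could not absorb
    body = format(kept, f"0{keep}b")[::-1]
    return ("0" * pad + "_" + body if pad else body), rise


print(round_mantissa("101_1000", 4))              # ('0100', 3)
print(round_mantissa("101_1000", 4, headroom=1))  # ('00_0100', 1)
```

With unlimited headroom the function reproduces table four, with an exponent rise of three; with `headroom=1` it reproduces table five, adding two padding zeroes and raising the exponent by one. The four retained bits are identical in both cases, which is the equal loss of information noted above.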

Miscellaneous points:

If a float containing a correctly rounded mantissa is subsequently normalized, its value will not change, but the mantissa's least significant digit will probably no longer be zero.
It is legal to round to zero digits of precision, but a zero-digit mantissa gives a null.
A rounding request that would increase the number of mantissa bits does not change the value of the float.
If x rounds to y, then −x rounds to −y.