**FLOAT**

**Floating-point arithmetic**

The floating-point library functions emulate a math coprocessor using single-precision calculations. To keep the code size down, only the most basic mathematical operations are provided; trigonometric functions, exponentials and logarithms are missing. Another reason for their absence is that the internal single-precision calculations would introduce too large an error into their results. If a math coprocessor is present, it is used for the calculations. This can cause small deviations in the last significant digits compared with the emulated calculation.

Floating-point numbers in single-precision conform to the IEEE 754-1985 format. Operand size is 4 bytes, of which:

- bit 31: sign bit, 1=negative number, 0=positive number

- bits 23..30: exponent (8 bits), biased by 127 (i.e. a zero exponent, as in the number 1.0, has the biased value 127)

- bits 0..22: mantissa (23 bits), stored without the implicit most significant bit "1"

The exponent is in the range -126 .. +127 (biased value 1 .. 254). Exponent -127 (biased value 0) represents zero. Unlike a physical coprocessor, the library does not calculate with subnormal values (i.e. values with exponent -127 and a non-zero mantissa). Exponent +128 (biased value 255), which IEEE 754 reserves for NaN and infinity, represents infinity here (result overflow). Negative infinity is not used.
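As a rough illustration of the bit layout and exponent bias described above, the following Python sketch splits a 32-bit pattern into its three fields and rebuilds the value of a normalized number. The names `decode_single` and `value_of` are made up for this example and are not part of the library.

```python
def decode_single(bits):
    """Split a 32-bit pattern into (sign, biased exponent, mantissa)."""
    sign = (bits >> 31) & 1
    biased_exp = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, biased_exp, mantissa


def value_of(bits):
    """Value of a normalized number (biased exponent 1..254 only)."""
    sign, biased_exp, mantissa = decode_single(bits)
    significand = 1 + mantissa / 2**23      # restore the implicit leading "1"
    magnitude = significand * 2.0 ** (biased_exp - 127)
    return -magnitude if sign else magnitude


print(decode_single(0x3F800000))  # (0, 127, 0)
print(value_of(0x3F800000))       # 1.0
print(value_of(0xC0400000))       # -3.0
```

Zero and infinity (biased exponents 0 and 255) are deliberately left out of `value_of`, since they do not follow the normalized formula.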

Numbers are in range 1.1754944e-38 (=00800000h) to 1.7014118e+38 (=7F000000h), positive or negative.
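The two limits quoted above can be checked by reinterpreting the hexadecimal patterns as IEEE single-precision values. This is only a verification sketch using Python's standard `struct` module; `bits_to_float` is a helper name chosen for the example.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


smallest = bits_to_float(0x00800000)   # biased exponent 1, mantissa 0
largest = bits_to_float(0x7F000000)    # biased exponent 254, mantissa 0

print(f"{smallest:.7e}")  # 1.1754944e-38
print(f"{largest:.7e}")   # 1.7014118e+38
```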

**FloatZero** - floating-point zero constant

OUTPUT:

DX:AX = 0.0

**FloatOne** - floating-point one constant

OUTPUT:

DX:AX = 1.0
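All functions pass floating-point values in a register pair. Assuming the usual 16-bit convention that DX holds the high word and AX the low word (which the FloatNeg note about DH suggests), the register contents for a given value can be sketched as follows. `float_to_dx_ax` is a hypothetical helper for illustration, not a library routine.

```python
import struct


def float_to_dx_ax(x):
    """Return the (DX, AX) word pair for the single-precision value x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16, bits & 0xFFFF   # assumed: DX = high word, AX = low word


dx, ax = float_to_dx_ax(1.0)
print(f"DX={dx:04X}h AX={ax:04X}h")  # DX=3F80h AX=0000h
```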

**FloatInf** - floating-point infinity constant

OUTPUT:

DX:AX = 1.#INF000

**FloatPi** - floating-point PI constant

OUTPUT:

DX:AX = PI constant (3.14159265)
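Single precision cannot hold all the digits of PI shown above. Assuming the library returns the single-precision value nearest to PI (the source does not give the bit pattern), the stored constant and its deviation can be estimated like this:

```python
import math
import struct

# Round PI to single precision and read back the resulting bit pattern.
pi_bits = struct.unpack("<I", struct.pack("<f", math.pi))[0]
pi_single = struct.unpack("<f", struct.pack("<I", pi_bits))[0]

print(f"{pi_bits:08X}h")            # 40490FDBh
print(pi_single)                    # 3.1415927410125732
print(abs(pi_single - math.pi) < 1e-7)  # True
```

The value agrees with PI to about seven significant digits, which is the precision to expect from every result of this library.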

**FloatNeg** - floating-point negate number

INPUT:

DX:AX = number

OUTPUT:

DX:AX = result

NOTES: Only the highest byte of the number (DH) is needed.
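The note follows from the format: the sign is bit 31, which is bit 7 of DH. A minimal sketch of negation, assuming the routine simply toggles that bit (how the library treats zero or infinity is not stated in the source), could look like this. `float_neg` here is a Python stand-in, not the library code.

```python
def float_neg(dx, ax):
    """Negate the value in DX:AX by toggling the sign bit in DH."""
    dh = (dx >> 8) ^ 0x80              # bit 7 of DH is bit 31 of the number
    dx = (dh << 8) | (dx & 0x00FF)     # DL and AX are left untouched
    return dx, ax


print([hex(w) for w in float_neg(0x3F80, 0x0000)])  # ['0xbf80', '0x0']
```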

**FloatAbs** - floating-point absolute value

INPUT:

DX:AX = number

OUTPUT:

DX:AX = result

NOTES: Only the highest byte of the number (DH) is needed.
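The absolute value likewise touches only the sign bit in DH. The sketch below assumes the routine just clears bit 7 of DH; `float_abs` is an illustrative Python stand-in rather than the actual implementation.

```python
def float_abs(dx, ax):
    """Clear the sign bit (bit 7 of DH) of the value in DX:AX."""
    dh = (dx >> 8) & 0x7F              # drop the sign, keep exponent bits
    dx = (dh << 8) | (dx & 0x00FF)     # DL and AX are left untouched
    return dx, ax


print([hex(w) for w in float_abs(0xC049, 0x0FDB)])  # ['0x4049', '0xfdb']
```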

**FloatCmp** - floating-point comparison

INPUT:

DX:AX = first operand

CX:BX = second operand

OUTPUT:

AL = 1 if first > second, 0 if first = second, -1 if first < second

SF, ZF = as for "signed CMP first,second", use JL, JG, JLE,...
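One common way to compare two values in this format is to treat each bit pattern as a sign plus a magnitude and compare the signed results as integers. The sketch below shows that technique and returns the same -1/0/1 code as AL; it is an assumption about how such a comparison can be done, not a description of the library's internals, and it ignores NaN patterns.

```python
def float_cmp(first_bits, second_bits):
    """Return 1, 0 or -1 for first > second, first = second, first < second."""
    def signed_key(bits):
        magnitude = bits & 0x7FFFFFFF          # exponent and mantissa
        return -magnitude if bits & 0x80000000 else magnitude

    a, b = signed_key(first_bits), signed_key(second_bits)
    return (a > b) - (a < b)


print(float_cmp(0x3F800000, 0x40000000))  # -1  (1.0 < 2.0)
print(float_cmp(0xBF800000, 0xC0000000))  # 1   (-1.0 > -2.0)
print(float_cmp(0x3F800000, 0x3F800000))  # 0
```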

**FloatCmpZ** - floating-point comparison with zero

INPUT:

DX:AX = operand

OUTPUT:

AL = 1 if operand > 0, 0 if operand = 0, -1 if operand < 0

SF, ZF = as for "signed CMP operand,0", use JL, JG, JLE,...

**FloatFact** - floating-point factorial

INPUT:

AL = integer number 0..34

OUTPUT:

DX:AX = floating-point number n!
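The upper limit of 34 matches what IEEE single precision can hold: 34! is about 2.95e38, still below the largest finite single-precision value (about 3.40e38), while 35! is about 1.03e40. The following check uses Python's `struct` module, which raises `OverflowError` when a value does not fit; `fits_single` is a helper name invented for the example.

```python
import math
import struct


def fits_single(x):
    """True if x can be stored as a finite IEEE single-precision value."""
    try:
        struct.pack("<f", x)
        return True
    except OverflowError:
        return False


print(fits_single(float(math.factorial(34))))  # True
print(fits_single(float(math.factorial(35))))  # False
```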

**FloatInvFact** - floating-point inverse (reciprocal) factorial

INPUT:

AL = integer number 0..34

OUTPUT:

DX:AX = floating-point number 1/n!

**FloatFromWord** - import floating-point from unsigned word

INPUT:

AX = integer unsigned number

OUTPUT:

DX:AX = floating-point number
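A 16-bit integer always fits exactly into the 24 significant bits of the format, so the conversion needs no rounding. The sketch below builds the bit pattern by hand: the position of the highest set bit gives the exponent, and the remaining bits are shifted into the mantissa. This is one possible way to do it, written in Python for illustration; the library's actual algorithm is not documented here.

```python
import struct


def float_from_word(n):
    """Build the single-precision bit pattern for an unsigned 16-bit n."""
    if n == 0:
        return 0                               # zero has an all-zero pattern
    exponent = n.bit_length() - 1              # position of the leading "1"
    mantissa = (n << (23 - exponent)) & 0x7FFFFF   # drop the implicit "1"
    return ((exponent + 127) << 23) | mantissa


print(f"{float_from_word(1):08X}h")      # 3F800000h
print(f"{float_from_word(65535):08X}h")  # 477FFF00h
```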

**FloatFromDWord** - import floating-point from unsigned dword

INPUT:

DX:AX = integer unsigned number

OUTPUT:

DX:AX = floating-point number

**FloatFromSWord** - import floating-point from signed word

INPUT:

AX = integer signed number

OUTPUT:

DX:AX = floating-point number

**FloatFromSDWord** - import floating-point from signed dword

INPUT:

DX:AX = integer signed number

OUTPUT:

DX:AX = floating-point number

**FloatToWord** - export floating-point to unsigned word

INPUT:

DX:AX = floating-point number

OUTPUT:

AX = integer unsigned number

**FloatToDWord** - export floating-point to unsigned dword

INPUT:

DX:AX = floating-point number

OUTPUT:

DX:AX = integer unsigned number

**FloatToSWord** - export floating-point to signed word

INPUT:

DX:AX = floating-point number

OUTPUT:

AX = integer signed number

**FloatToSDWord** - export floating-point to signed dword

INPUT:

DX:AX = floating-point number

OUTPUT:

DX:AX = integer signed number

**FloatAdd** - floating-point addition

INPUT:

DX:AX = first operand

CX:BX = second operand

OUTPUT:

DX:AX = result
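Because the result is rounded back to single precision, small addends can be lost entirely. The sketch below imitates this in Python by rounding to single precision after the addition. It uses IEEE round-to-nearest as performed by `struct`; the rounding used by the library itself is not specified in the source, so the last bit of a real result may differ.

```python
import struct


def to_single(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def single_add(a, b):
    """Add two values and round the sum to single precision."""
    return to_single(to_single(a) + to_single(b))


print(single_add(0.5, 0.25))        # 0.75
print(single_add(16777216.0, 1.0))  # 16777216.0 (the 1.0 is lost)
```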

**FloatSub** - floating-point subtraction

INPUT:

DX:AX = first operand

CX:BX = second operand

OUTPUT:

DX:AX = result (= first - second)

**FloatMul** - floating-point multiplication

INPUT:

DX:AX = first operand

CX:BX = second operand

OUTPUT:

DX:AX = result

**FloatDiv** - floating-point division

INPUT:

DX:AX = first operand (dividend)

CX:BX = second operand (divisor)

OUTPUT:

DX:AX = result (quotient)

**FloatInv** - floating-point inverse (reciprocal) value

INPUT:

DX:AX = operand

OUTPUT:

DX:AX = result (1/operand)

Constants:

FLOAT_INF: (=7F800000h) floating-point infinity value

FLOAT_ZERO: (=0) floating-point zero value

FLOAT_ONE: (=3F800000h) floating-point one value

FACT_MAX: (=34) max. valid factorial index
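The constant values listed above can be cross-checked against the IEEE single-precision format with a few lines of Python. The names mirror the library constants; `bits_to_float` is a helper introduced only for this check.

```python
import math
import struct

FLOAT_INF = 0x7F800000
FLOAT_ZERO = 0x00000000
FLOAT_ONE = 0x3F800000


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(bits_to_float(FLOAT_INF))   # inf
print(bits_to_float(FLOAT_ZERO))  # 0.0
print(bits_to_float(FLOAT_ONE))   # 1.0
```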