61C-P2-number
binary
basics
- A system of storing data using just two digits: 1 and 0.
- Everything in a computer is ultimately stored in binary (high voltage wire = 1, low voltage wire = 0)
- Generally rooted in the mathematical concept of binary (as a base 2 system of representing numbers)
- Since computers tend to “think” in binary, it is often useful to work with values in binary directly. By convention we prefix any binary value with “0b”, e.g. 0b1011 for eleven.
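A small sketch of the 0b convention. Python is used here only for illustration, because it happens to accept the same prefix for binary literals; the variable name `x` is arbitrary.

```python
# Binary literals use the same 0b prefix described above.
x = 0b1011                   # 1*8 + 0*4 + 1*2 + 1*1
as_text = bin(11)            # back to a 0b-prefixed string
parsed = int("1011", 2)      # parse a string of base-2 digits

print(x, as_text, parsed)    # 11 0b1011 11
```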
operations
&, |, ~, ^: convert every operand to binary, then apply the operation (AND, OR, NOT, XOR) to each bit position independently
<<, >>:
- Left shift: Convert to binary, then move all bits left, appending 0s as needed (equivalent to multiplying by a power of 2, as long as no 1 bits are shifted off the top)
- Right shift (logical): Convert to binary, then move all bits right, prepending 0s as needed (equivalent to dividing an unsigned value by a power of 2, rounding down)
for example:
![[Pasted image 20240523104246.png]]
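The operations above can be tried on 4-bit values. This is only a sketch: the 4-bit width and the `MASK` helper are assumptions made here, needed because Python integers are unbounded, so results are masked to mimic a fixed-width register. On non-negative Python integers, `>>` behaves like the logical right shift described above.

```python
MASK = 0xF                    # assumed 4-bit width for this example

a = 0b1100
b = 0b1010

and_result = a & b            # 0b1000: 1 only where both bits are 1
or_result = a | b             # 0b1110: 1 where either bit is 1
xor_result = a ^ b            # 0b0110: 1 where the bits differ
not_result = ~a & MASK        # 0b0011: every bit flipped, kept to 4 bits

left = (0b0011 << 2) & MASK   # 0b1100: 3 * 2**2 = 12
right = 0b1100 >> 2           # 0b0011: 12 // 2**2 = 3
lost = (0b1100 << 1) & MASK   # 0b1000: the top bit fell off, so this is not 24

for name, value in [("and", and_result), ("or", or_result),
                    ("xor", xor_result), ("not", not_result),
                    ("left", left), ("right", right), ("lost", lost)]:
    print(f"{name:5s} 0b{value:04b}")
```

The last line shows why "multiply by a power of 2" only holds while no 1 bits are shifted out.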
signed numbers
![[Pasted image 20240523110950.png]]
Formally:
- Define a “bias”
- To interpret stored binary: Read the data as an unsigned number, then add the bias
- To store a data value: Subtract the bias, then store the resulting number as an unsigned number
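A minimal sketch of the two bias rules. The width of 8 bits and the bias of -127 are assumptions chosen for illustration (they match the 127 that appears in the floating point formula of these notes); `encode` and `decode` are hypothetical helper names.

```python
BITS = 8
BIAS = -127   # assumed bias for an 8-bit field

def decode(stored: int) -> int:
    """Interpret stored bits: read as unsigned, then add the bias."""
    return stored + BIAS

def encode(value: int) -> int:
    """Store a value: subtract the bias, then keep the unsigned result."""
    stored = value - BIAS
    if not 0 <= stored < 2 ** BITS:
        raise ValueError(f"{value} does not fit in {BITS} biased bits")
    return stored

print(f"0b{encode(0):08b}")   # 0b01111111
print(decode(0b00000000))     # -127
print(decode(0b11111111))     # 128
```

With this bias the representable range is -127 to 128, and the stored patterns sort in the same order as the values they represent.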
float
fixed point representation
![[Pasted image 20240523113405.png]]
but what about other numbers?
- very large numbers: 31,556,926,000 ($3.1556926 \times 10^{10}$)
- very small numbers: 0.0000000000529177 ($5.29177 \times 10^{-11}$)
floating point
The IEEE 754 standard!
Take scientific notation as an example:
![[Pasted image 20240526155108.png]]
Similarly, floating point stores a number in the form $A \times 2^B$, using 32 bits:
- 1 bit: sign bit
- 8 bits: exponent (B)
- 23 bits: significand (A)
$(-1)^{s} \times (1 + \text{significand}) \times 2^{\text{exponent} - 127}$
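The formula can be checked by pulling the three fields out of a 32-bit pattern by hand. This is a sketch for normalized values only (exponent field 1 to 254); `decode_float32` and `float_to_bits` are hypothetical helper names, and Python's standard `struct` module is used only to get the real bit pattern for comparison.

```python
import struct

def decode_float32(bits: int) -> float:
    """Apply (-1)^s * (1 + significand) * 2^(exponent - 127) to a normalized pattern."""
    sign = (bits >> 31) & 0x1          # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, stored with bias
    fraction = bits & 0x7FFFFF         # 23 bits
    significand = fraction / 2 ** 23   # fraction field read as 0.xxx in binary
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - 127)

def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

bits = float_to_bits(-5.0)
print(f"{bits:032b}")          # sign 1, exponent 10000001, significand 0100...0
print(decode_float32(bits))    # -5.0
```

Here -5.0 is $-1.25 \times 2^{2}$, so the stored exponent is 2 + 127 = 129 and the significand field encodes 0.25.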
special cases
0
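Zero has no normalized representation, because of the implied leading 1. It is encoded as all zeros in the exponent and significand; the sign bit gives +0 and -0.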
large & small numbers
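Exponent fields 0 and 255 are reserved for special cases, so normalized magnitudes range from about $1.2 \times 10^{-38}$ to about $3.4 \times 10^{38}$. A result with a larger magnitude overflows; a nonzero result with a smaller magnitude underflows.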
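The two limits can be recomputed from the field widths, assuming the usual single precision layout with exponent fields 1 to 254 for normalized numbers. A sketch; `round_to_float32` is a hypothetical helper that uses Python's `struct` module to squeeze a Python float (a double) into 32 bits.

```python
import struct

# Largest normalized value: exponent field 254, significand all ones.
largest = (2 - 2 ** -23) * 2.0 ** 127
# Smallest positive normalized value: exponent field 1, significand zero.
smallest_normal = 2.0 ** -126

def round_to_float32(x: float) -> float:
    """Round a Python float to single precision and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(largest)                    # about 3.4e38
print(smallest_normal)            # about 1.2e-38
print(round_to_float32(1e-50))    # 0.0, too small for single precision

try:
    round_to_float32(largest * 2)
    overflowed = False
except OverflowError:             # struct refuses values too large for 32 bits
    overflowed = True
print(overflowed)                 # True
```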
$\pm \infty$
IEEE standard: exponent 1111 1111, significand zero for $\pm \infty$
not a number (NaN)
exponent 1111 1111, significand nonzero.
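These special patterns can be built by hand and read back. A sketch: `bits_to_float32` is a hypothetical helper, and the particular NaN pattern (0x7FC00000) is just one of the many patterns with an all-ones exponent and a nonzero significand.

```python
import math
import struct

def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_zero = bits_to_float32(0x00000000)   # everything zero
neg_zero = bits_to_float32(0x80000000)   # only the sign bit set
pos_inf = bits_to_float32(0x7F800000)    # exponent 1111 1111, significand 0
neg_inf = bits_to_float32(0xFF800000)    # same, with the sign bit set
nan = bits_to_float32(0x7FC00000)        # exponent 1111 1111, significand nonzero

print(pos_zero, neg_zero, pos_inf, neg_inf, nan)
print(nan == nan)                        # False: NaN compares unequal to everything
```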
Another problem: there is a gap between FP numbers and zero!
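- smallest positive normalized number: $2^{-126}$, so the gap between it and zero is $2^{-126}$
- gap between the two smallest normalized numbers: only $2^{-149}$
Solution: denormalized numbers (exponent field all zeros, significand nonzero). They have no implied leading 1, and the implicit exponent for all denorms is -126, so the value is $(-1)^{s} \times \text{significand} \times 2^{-126}$.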
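A sketch of how denorms fill the gap, assuming the single precision layout used in these notes. `bits_to_float32` and `decode_denorm` are hypothetical helper names; the first asks Python's `struct` module for the real value, the second applies the denorm rule (no leading 1, exponent fixed at -126) by hand.

```python
import struct

def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def decode_denorm(bits: int) -> float:
    """Decode a pattern whose exponent field is 0: no implied 1, exponent -126."""
    sign = (bits >> 31) & 0x1
    fraction = bits & 0x7FFFFF
    return (-1) ** sign * (fraction / 2 ** 23) * 2.0 ** -126

smallest_normal = bits_to_float32(0x00800000)   # exponent field 1, significand 0
smallest_denorm = bits_to_float32(0x00000001)   # exponent field 0, lowest bit set
largest_denorm = bits_to_float32(0x007FFFFF)    # exponent field 0, significand all ones

print(smallest_normal == 2.0 ** -126)                         # True
print(smallest_denorm == 2.0 ** -149)                         # True
print(largest_denorm + smallest_denorm == smallest_normal)    # True
```

With denorms, the representable values step evenly by $2^{-149}$ from zero up to the smallest normalized number.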
To experiment with IEEE 754 float conversions, see the IEEE-754 Floating Point Converter at h-schmidt.net.
other floating point representations
double precision floating point
extend the 32 bits to 64 bits!
sign: 1 bit; exponent: 11 bits (bias 1023); significand: 52 bits (20 in the first 32-bit word plus all 32 of the second)!
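A sketch of the same field decoding for double precision, assuming the standard IEEE 754 layout of 1 sign bit, 11 exponent bits with bias 1023, and 52 significand bits (64 bits in total). It handles normalized values only; `decode_float64` and `double_to_bits` are hypothetical helper names. Python's own `float` is a double, so the round trip should be exact.

```python
import struct

def double_to_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def decode_float64(bits: int) -> float:
    """Apply (-1)^s * (1 + significand) * 2^(exponent - 1023) to a normalized pattern."""
    sign = (bits >> 63) & 0x1
    exponent = (bits >> 52) & 0x7FF        # 11 bits
    fraction = bits & ((1 << 52) - 1)      # 52 bits
    return (-1) ** sign * (1 + fraction / 2 ** 52) * 2.0 ** (exponent - 1023)

bits = double_to_bits(0.1)
print(f"{bits:064b}")
print(decode_float64(bits) == 0.1)         # True
```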
