## Binary and Hexadecimal: A Quick Java Primer

Many Java developers learn, quickly forget, and never think again about the ins and outs of the data types they use every day. This post is a quick refresher for anyone who suddenly finds themselves working with raw data and has been sent the message `0x0000FEAD` rather than the regular-looking `65197` they were expecting. Hopefully this all feels incredibly familiar and easy.

Note: The byte-array example at the end of this post puts the most significant byte first, which is big-endian order. Everything here holds for little-endian data too; the bytes are just arranged in the opposite order. Also note that Java's `ByteBuffer` reads big-endian by default.

# Binary, Octal and Hex

Binary, Octal, Decimal and Hex are all ways of representing numbers. We often have to consider these different ways of writing the same numbers because of how computers work with numbers, and the effect this has on our numbers.

## Binary

Binary numbers are numbers that are represented by only 2 symbols (which is why binary is also called base `2`), usually `0` and `1`. We call these binary digits bits. When writing a number in binary we either give it the prefix `0b` or add a bracketed subscript with the base, like this: \((11)_2\). So this number is not `11` but `3` when converted to decimal. As in decimal, we count up at each digit until we hit our maximum, then add a new digit as required. This gives us the following table:

| Decimal | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` | `10` | ... | `100` |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Binary | `1` | `10` | `11` | `100` | `101` | `110` | `111` | `1000` | `1001` | `1010` | ... | `1100100` |

That is, each digit in binary represents a power of `2`, starting with \(2^0\) from the right. So a conversion from binary to decimal can be done by taking the individual digits, multiplying each by its corresponding power of 2 and adding them all together. For example, if we wish to convert \((11011010)_2\) to decimal we may calculate \(1 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = (218)_{10}\). We can compare this to how a decimal number, for example 335, is \(3 \times 10^2 + 3 \times 10^1 + 5 \times 10^0\).
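The place-value sum above can be sketched in Java. The class `BinaryToDecimal` and its method `toDecimal` are illustrative names made up for this example; `Integer.parseInt(String, int)` and `Integer.toBinaryString(int)` are the standard-library equivalents you would normally reach for.

```java
class BinaryToDecimal {
    // Hypothetical helper: builds the value digit by digit, left to right.
    // Assumes the input contains only the characters '0' and '1'.
    static int toDecimal(String bits) {
        int value = 0;
        for (char c : bits.toCharArray()) {
            // Doubling the running total moves every earlier digit up one power of 2.
            value = value * 2 + (c - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("11011010"));            // 218
        System.out.println(Integer.parseInt("11011010", 2));  // 218
        System.out.println(Integer.toBinaryString(218));      // 11011010
    }
}
```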

Binary is what computers fundamentally use for all their calculations. A 32-bit machine natively works with 32-bit numbers, and a 64-bit machine with 64-bit numbers, so the largest unsigned value an n-bit machine can hold in a single native word is \(2^n - 1\). We can still work with larger numbers with some tricks if we need to (this is how `BigInteger` and `BigDecimal` work). It is rare to see a 32-bit machine these days, largely because of memory limitations: a 32-bit address can only distinguish \(2^{32}\) different locations, which with one byte per location comes to around 4GB of "addressable" memory. If we tried to use more than this, we could not recall the data, because we could not represent a number big enough to find its position in memory.

## Octal

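Octal is sometimes used and is often written with the prefix `0o`; in Java source code an octal literal instead starts with a leading `0`, so `0144` is `100` in decimal. It works exactly as binary does, except we now use `8` symbols (`0` to `7`), so each digit represents a power of `8`. Our table now is: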

| Decimal | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` | `10` | ... | `100` |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Binary | `1` | `10` | `11` | `100` | `101` | `110` | `111` | `1000` | `1001` | `1010` | ... | `1100100` |
| Octal | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `10` | `11` | `12` | ... | `144` |
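As a quick check of the octal column, here is a minimal Java sketch. Note that Java source marks an octal literal with a leading `0` rather than `0o`. The class name `OctalDemo` and the constant `ONE_HUNDRED` are made up for this example.

```java
class OctalDemo {
    // A leading 0 makes this an octal literal: 1*64 + 4*8 + 4*1.
    static final int ONE_HUNDRED = 0144;

    public static void main(String[] args) {
        System.out.println(ONE_HUNDRED);                  // 100
        System.out.println(Integer.parseInt("144", 8));   // 100
        System.out.println(Integer.toOctalString(8));     // 10
        System.out.println(Integer.toOctalString(100));   // 144
    }
}
```

The leading-zero rule is worth remembering when lining up columns of numbers in source code: padding a decimal literal with zeros silently changes its value.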

## Hexadecimal

Hexadecimal numbers are numbers where each digit represents a power of `16`. This means we need `16` symbols, and since decimal only gives us ten, we use A, B, C, D, E and F for the extra six, where `A` is equivalent to `10` in decimal and `F` is equivalent to `15`. Our prefix for hex is `0x`. Hexadecimal is often used because writing memory locations or other large numbers in binary takes a lot of characters. We don't use decimal for this because `10` is not a power of `2`, so decimal digits do not line up with groups of bits, whereas each hex digit corresponds to exactly 4 bits.

Bits are stored in groups of 8, which we call a byte. If we stored the decimal number `218` in a single byte, the computer would hold the value `0b11011010`. With hex we can instead write this as `0xDA`. Note that the maximum number represented by 8 bits (`0b11111111`) is also the maximum number represented by `2` digits in hex (`0xFF`), which is `255` in decimal. When we include `0`, either form gives `256` values in total.

Our table now is:

| Decimal | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` | `10` | ... | `100` |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Binary | `1` | `10` | `11` | `100` | `101` | `110` | `111` | `1000` | `1001` | `1010` | ... | `1100100` |
| Octal | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `10` | `11` | `12` | ... | `144` |
| Hex | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` | `A` | ... | `64` |
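The hex values in this section can be checked with a short Java sketch. The class name `HexDemo` and the constant `BYTE_MAX` are illustrative only; the conversions themselves use standard `Integer` methods.

```java
class HexDemo {
    // Two hex digits with every bit of one byte set.
    static final int BYTE_MAX = 0xFF;

    public static void main(String[] args) {
        System.out.println(0xDA);                          // 218
        System.out.println(0b11011010);                    // 218
        System.out.println(Integer.toHexString(218));      // da
        System.out.println(Integer.toHexString(100));      // 64
        System.out.println(Integer.parseInt("FEAD", 16));  // 65197
        System.out.println(BYTE_MAX == 0b11111111);        // true
    }
}
```

Note that `Integer.toHexString` returns lowercase digits and no `0x` prefix.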

# Numeric Java Data Types

For the sake of speeding things along, I'm only considering *primitive* types here, starting with the most familiar.

**int**s are `32` bits or `4` bytes in size. Since the first bit of an integer determines whether it is positive or negative, the maximum value is \(2^{31}-1\), where the `-1` accounts for the value `0`. In total there are \(2^{32}\) possible values when we include 0 and all the negative numbers. In Java we can specify a number in a different base, such as binary or hex, by using its prefix. For example, we can write `int a = 0xFE;` to give the integer the value `254`.

**long**s are `64` bits or `8` bytes in size. They are exactly like **int**s, but can store much larger values since they use `8` bytes instead of `4`.

**short**s are also just like **int**s, but only use 2 bytes. They are signed too, giving a maximum value of `0x7FFF`, or `32767`; the 16 bits could hold up to `0xFFFF` (`65535`) only if they were read as unsigned.

**byte**s are the smallest number types you can have in Java, and are only a single byte as the name suggests. As we described in the previous section, the raw 8 bits can take `256` different values, up to `255` if read as unsigned. Java's **byte** is signed, however, so the first bit determines whether the stored number is negative and the range is `-128` to `127`. To read a **byte** as raw unsigned data we mask it with `& 0xFF`, which gives an **int** between `0` and `255`.
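Java exposes these limits as constants, so the ranges can be printed rather than memorised. The sketch below also shows how a single byte of raw data looks when read as Java's signed **byte** and when masked to an unsigned value. The class `PrimitiveRanges` and the helper `unsigned` are hypothetical names for this example.

```java
class PrimitiveRanges {
    // Hypothetical helper: reads a signed byte as an unsigned value in 0..255.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);   // 2147483647
        System.out.println(Long.MAX_VALUE);      // 9223372036854775807
        System.out.println(Short.MAX_VALUE);     // 32767
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);  // -128 to 127

        byte raw = (byte) 0xEF;                  // bit pattern 11101111
        System.out.println(raw);                 // -17
        System.out.println(unsigned(raw));       // 239
        System.out.println(Byte.toUnsignedInt(raw));  // 239
    }
}
```

`Byte.toUnsignedInt` (available since Java 8) does the same job as the mask and may read more clearly in application code.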

Lastly, **float**s and **double**s can be used to represent floating point numbers (such as `15.23`). They are 4 and 8 bytes in size respectively. Since these have little to do with integral numbers, see here for how they represent floating point numbers: https://www.doc.ic.ac.uk/~eedwards/compsys/float/

Now when we say that a type has a size of n bytes, we mean that regardless of what number is represented we use the same amount of memory to store that number. For example, if we are calculating with the number 7 of type **int**, the bits stored in memory are actually `0b00000000000000000000000000000111`. We could also write this in hex as `0x00000007`.
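To see the full fixed-width form, we have to pad the output ourselves, because `Integer.toBinaryString` and `Integer.toHexString` drop leading zeros. A minimal sketch, where `FixedWidth` and `toBits` are made-up names:

```java
class FixedWidth {
    // Hypothetical helper: pads Integer.toBinaryString output to all 32 bits.
    static String toBits(int value) {
        String bits = Integer.toBinaryString(value);
        // Right-align in a 32-character field, then turn the padding into zeros.
        return String.format("%32s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(Integer.SIZE);                // 32
        System.out.println(Integer.BYTES);               // 4
        System.out.println(toBits(7));                   // 00000000000000000000000000000111
        System.out.println(String.format("0x%08X", 7));  // 0x00000007
    }
}
```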

# A Hex Example

Suppose we have received a message from a socket that we know is a 4 byte integer, and we have the 4 bytes in a `byte[]` array. If we were to inspect the array we'd see 4 entries, each of the form `0x??`. We could use `ByteBuffer.wrap(myArray).getInt();` to retrieve the integer, and normally this is how we'd do it: `wrap` uses the existing array as the buffer's backing store instead of copying it, and the buffer reads big-endian by default. However, this is a refresher, so we could also consider...
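Before turning to the manual approach, here is a minimal sketch of the `ByteBuffer` route. The array contents are sample values chosen for illustration, and `ByteBufferDemo` and `readInt` are hypothetical names. The second read shows how the result changes if the same bytes are interpreted as little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteBufferDemo {
    // Hypothetical helper: reads the first four bytes as a big-endian int,
    // which is ByteBuffer's default byte order.
    static int readInt(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).getInt();
    }

    public static void main(String[] args) {
        byte[] myArray = {0x0E, 0x45, (byte) 0x86, (byte) 0xEF};
        System.out.println(readInt(myArray));              // 239437551

        // The same bytes read least significant byte first give a different value.
        int little = ByteBuffer.wrap(myArray).order(ByteOrder.LITTLE_ENDIAN).getInt();
        System.out.println(Integer.toHexString(little));   // ef86450e
    }
}
```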

## Bitwise Operations

We know we have 4 bytes, each of the form `0x??`. As an example we will use `(0x0E, 0x45, 0x86, 0xEF)`. Putting these bytes together, we can see that `int a = 0x0e4586ef;` would give an integer with decimal value `239437551`. Unfortunately we can't just concatenate the bytes this way. However, using bitwise operations we can get the integer from these 4 bytes. Here are the bitwise operations:

`&` is bitwise AND, e.g. `0b10101000 & 0b11111001 = 0b10101000`

`^` is exclusive OR, e.g. `0b10101000 ^ 0b11111001 = 0b01010001`

`|` is inclusive OR, e.g. `0b10101000 | 0b11111001 = 0b11111001`

`<<` and `>>` shift bits to the left and right respectively, e.g. `0b10101000 << 8 = 0b1010100000000000` (shifts to the left by 8 places). `>>` is an arithmetic shift: it fills the vacated bits on the left with copies of the sign bit, so the sign is preserved. `>>>` is a logical right shift that fills with zeros and does not preserve the sign. There is no `<<<` in Java, because a left shift always fills with zeros on the right.
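These operators can be tried out directly on the example operands. In the sketch below, `BitwiseDemo`, `A` and `B` are illustrative names; the last two lines contrast the arithmetic and logical right shifts on a negative number.

```java
class BitwiseDemo {
    static final int A = 0b10101000;
    static final int B = 0b11111001;

    public static void main(String[] args) {
        // toBinaryString drops leading zeros, so 01010001 prints as 1010001.
        System.out.println(Integer.toBinaryString(A & B));   // 10101000
        System.out.println(Integer.toBinaryString(A ^ B));   // 1010001
        System.out.println(Integer.toBinaryString(A | B));   // 11111001
        System.out.println(Integer.toBinaryString(A << 8));  // 1010100000000000

        System.out.println(-16 >> 2);    // -4 (sign bit copied in from the left)
        System.out.println(-16 >>> 2);   // 1073741820 (zeros shifted in from the left)
    }
}
```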

So we can convert our byte array as such:

```
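byte[] myArray = {0x0E, 0x45, (byte) 0x86, (byte) 0xEF};
// Shift each byte into position, mask off everything else, then OR the parts together.
int i = (myArray[0] << 24) & 0xff000000 |
        (myArray[1] << 16) & 0x00ff0000 |
        (myArray[2] <<  8) & 0x0000ff00 |
        (myArray[3] <<  0) & 0x000000ff;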
```

That is, we shift each byte into the correct position, then use a bitwise AND to keep only that byte's 8 bits of the 4 byte integer (this also clears the sign bits that appear when a negative **byte** is widened to an **int**), then use inclusive OR to put all the bytes together.
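The same idea can be wrapped in a small method. The sketch below is one possible variant, not the only way to write it: it masks each byte with `0xFF` before shifting instead of masking afterwards, which gives the same result. `BytesToInt` and `toInt` are hypothetical names, and the array is assumed to hold exactly four bytes with the most significant first.

```java
class BytesToInt {
    // Hypothetical helper: combines four bytes, most significant first, into an int.
    static int toInt(byte[] b) {
        // Masking with 0xFF strips the sign extension that happens
        // when a negative byte is promoted to int.
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             |  (b[3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] myArray = {0x0E, 0x45, (byte) 0x86, (byte) 0xEF};
        int i = toInt(myArray);
        System.out.println(i);                       // 239437551
        System.out.println(Integer.toHexString(i));  // e4586ef

        // Without any mask, the sign-extended 0x86 and 0xEF overwrite the higher bytes.
        int unmasked = (myArray[0] << 24) | (myArray[1] << 16) | (myArray[2] << 8) | myArray[3];
        System.out.println(unmasked == i);           // false
    }
}
```

The unmasked line is there to show why the AND step matters: any byte of `0x80` or above is negative in Java, and widening it to an **int** fills the upper 24 bits with ones.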