Binary and Hexadecimal: A Quick Java Refresher
Many Java developers learn the ins and outs of the data types they use every day, quickly forget them, and never think about them again. Consider this a quick refresher for anyone who suddenly discovers they're working with raw data, and the message they've been sent is 0x0000FEAD and not the regular-looking 65197 they were expecting. Hopefully this all feels incredibly familiar and easy.
Note: The byte array example in this post reads the most significant byte first (Big Endian), which is also the default byte order of Java's ByteBuffer. It's all true for Little Endian too, just with the bytes organised in the opposite order. Click here once you've finished reading this post.
Binary, Octal and Hex
Binary, Octal, Decimal and Hex are all ways of representing numbers. We often have to consider these different ways of writing the same numbers because of how computers work with numbers, and the effect this has on our numbers.
Binary
Binary numbers are numbers that are represented by only 2 symbols (which is why binary is also called base 2), usually 0 and 1 (we call these bits). If we're writing a number in binary we either give it a prefix of \(0b\) or a bracketed subscript with the base number, like this: \((11)_2\). So this number is not 11, but 3 when converted to decimal. As in decimal, we count up at each digit until we hit our max, then add a new digit as required. This gives us the following table:
Decimal | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 100 |
Binary | 1 | 10 | 11 | 100 | 101 | 110 | 111 | 1000 | 1001 | 1010 | ... | 1100100 |
That is, each digit in binary represents a power of 2, starting with \(2^0\) from the right. So a conversion from binary to decimal can be done by taking the individual digits, multiplying each by its corresponding power of 2 and adding them all together. For example, if we wish to convert \((11011010)_2\) to decimal we may calculate \(1 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = (218)_{10}\). We can compare this to how a decimal number, for example 335, is \(3 \times 10^2 + 3 \times 10^1 + 5 \times 10^0\).
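The conversion described above can be sketched in a few lines of Java. This is only an illustration: the class and method names (`BinaryToDecimal`, `toDecimal`) are made up for this post, and in real code `Integer.parseInt(text, 2)` already does the job.

```java
public class BinaryToDecimal {
    // Sums digit * 2^position, with position 0 at the rightmost digit.
    // Assumes the input contains only '0' and '1' and has at most 31 digits.
    static int toDecimal(String bits) {
        int value = 0;
        for (int i = 0; i < bits.length(); i++) {
            int digit = bits.charAt(bits.length() - 1 - i) - '0';
            value += digit * (1 << i); // 1 << i is 2^i
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("11011010"));           // 218
        System.out.println(Integer.parseInt("11011010", 2)); // 218, the built-in equivalent
    }
}
```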
This is what computers use fundamentally when doing all their calculations. A typical native number on a 32-bit computer is 32 bits in size, and similarly on a 64-bit machine a 64-bit number is the default. This means that the largest number an n-bit machine can represent natively is n bits wide, although we can work with larger numbers with some tricks if we need to (this is how BigInteger and BigDecimal work). It is rare to see a 32-bit machine these days because of memory limitations: a 32-bit address can only identify \(2^{32}\) different places in memory, which equates to around 4GB of "addressable" memory. If we were to try to use any more than this limit, we wouldn't be able to recall the data from memory, because we cannot represent a number big enough to find its position in memory.
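As a rough illustration of hitting a fixed-width limit and stepping past it with BigInteger, here is a small sketch. The class and method names (`IntLimits`, `nextAfterMaxInt`) are made up for this example. Note that a Java int is always 32 bits wide, whatever machine the JVM runs on.

```java
import java.math.BigInteger;

public class IntLimits {
    // One more than the largest int, computed without overflowing.
    static BigInteger nextAfterMaxInt() {
        return BigInteger.valueOf(Integer.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);     // 2147483647
        System.out.println(Integer.MAX_VALUE + 1); // -2147483648, the 32-bit value wraps around
        System.out.println(nextAfterMaxInt());     // 2147483648
        System.out.println(1L << 32);              // 4294967296 distinct 32-bit addresses
    }
}
```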
Octal
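Octal is sometimes used and is often written with the prefix 0o, although Java itself writes octal literals with just a leading 0 (so 0144 is decimal 100). It works exactly as binary does, except we now use 8 symbols (0 to 7), so each digit represents a power of 8. Our table now is: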
Decimal | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 100 |
Binary | 1 | 10 | 11 | 100 | 101 | 110 | 111 | 1000 | 1001 | 1010 | ... | 1100100 |
Octal | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 10 | 11 | 12 | ... | 144 |
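One Java-specific wrinkle worth a quick sketch: Java has no 0o prefix, and any integer literal with a leading 0 is read as octal. The class name `OctalDemo` and the constant `FROM_LITERAL` are made up for this example.

```java
public class OctalDemo {
    // A leading 0 makes an integer literal octal in Java.
    static final int FROM_LITERAL = 0144;

    public static void main(String[] args) {
        System.out.println(FROM_LITERAL);               // 100
        System.out.println(Integer.toOctalString(100)); // 144
        System.out.println(Integer.parseInt("144", 8)); // 100
        System.out.println(010 == 8);                   // true, an easy mistake when padding numbers with zeros
    }
}
```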
Hexadecimal
Hexadecimal numbers are numbers where each digit represents a power of 16. This means we need 16 symbols, and since decimal only gives us ten, we use A, B, C, D, E and F for the extra six, where A is equivalent to 10 in decimal and F is equivalent to 15. Our prefix for hex is 0x. Hexadecimal is often used because writing memory locations or other large numbers in binary takes a lot of characters, and hex is far more compact. We don't use decimal for this because 10 is not a power of 2, so decimal digits don't line up with groups of bits, whereas each hex digit corresponds to exactly 4 bits.
Bits are grouped into sets of 8, which we call a byte. If we had stored the decimal number 218 in a computer, it would store the value 0b11011010. With hex we can instead write this as 0xDA. You should note that the maximum number represented by 8 bits (0b11111111) is also the maximum number represented by 2 digits in hex (0xFF), which is 255 in decimal. When we include 0, we have 256 total values from either of these.
Our table now is:
Decimal | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 100 |
Binary | 1 | 10 | 11 | 100 | 101 | 110 | 111 | 1000 | 1001 | 1010 | ... | 1100100 |
Octal | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 10 | 11 | 12 | ... | 144 |
Hex | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | ... | 64 |
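If you'd rather not build these tables by hand, the standard Integer helpers will do it. The following is a minimal sketch; the class and method names (`BaseTable`, `row`) are made up here, and it prints one number per row rather than one base per row.

```java
public class BaseTable {
    // Formats one number as: decimal | binary | octal | hex.
    static String row(int n) {
        return n + " | " + Integer.toBinaryString(n)
                 + " | " + Integer.toOctalString(n)
                 + " | " + Integer.toHexString(n).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println("Decimal | Binary | Octal | Hex");
        for (int n = 1; n <= 10; n++) {
            System.out.println(row(n));
        }
        System.out.println(row(100));                          // 100 | 1100100 | 144 | 64
        System.out.println(0xDA == 0b11011010 && 0xDA == 218); // true, three spellings of one value
    }
}
```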
Numeric Java Data Types
For the sake of speeding things along, I'm only considering primitive types here, starting with the most familiar.
- ints are 32 bits or 4 bytes in size. Since the first bit of an integer determines whether it is positive or negative, the maximum value is \(2^{31}-1\), where the -1 accounts for the value 0. In total there are \(2^{32}\) possible values when we include 0 and all the negative numbers. In Java we can specify a number in a different base, such as binary or hex, by using their prefixes. For example, we can write int a = 0xFE; to give the integer the value 254. All of the integral types in this list are signed (char, which is also an integral type, is the one unsigned exception).
- longs are 64 bits or 8 bytes in size. They are exactly like ints, but can store much larger values since they use 8 bytes instead of 4.
- shorts are also just like ints, but only use 2 bytes, giving a maximum value of \(2^{15}-1\).
- bytes are the smallest number types you can have in Java, and are only a single byte as the name suggests. They have a maximum value of 127 and a minimum value of -128.
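The sizes and ranges in the list above can be checked against the constants on the wrapper classes. This is a small sketch with made-up names (`IntegralRanges`, `range`); it also shows what happens when 0xFE is squeezed into a signed byte.

```java
public class IntegralRanges {
    // Builds one summary line for an integral type.
    static String range(String name, int bits, long min, long max) {
        return name + ": " + bits + " bits, " + min + " to " + max;
    }

    public static void main(String[] args) {
        System.out.println(range("byte", Byte.SIZE, Byte.MIN_VALUE, Byte.MAX_VALUE));
        System.out.println(range("short", Short.SIZE, Short.MIN_VALUE, Short.MAX_VALUE));
        System.out.println(range("int", Integer.SIZE, Integer.MIN_VALUE, Integer.MAX_VALUE));
        System.out.println(range("long", Long.SIZE, Long.MIN_VALUE, Long.MAX_VALUE));

        int a = 0xFE;          // hex literal, value 254
        byte b = (byte) 0xFE;  // 254 does not fit in a signed byte; the cast keeps the low 8 bits
        System.out.println(a); // 254
        System.out.println(b); // -2
    }
}
```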
Lastly, floats and doubles can be used to represent floating point numbers (such as 15.23). They are 4 and 8 bytes in size respectively. Since these have little to do with integral numbers, see here for how these represent floating point numbers: https://www.doc.ic.ac.uk/~eedwards/compsys/float/
Now when we say that a type has a size of n bytes, we mean that regardless of what number is represented we use the same amount of memory to store that number. For example, if we are calculating with the number 7 of type int, the bits stored in memory are actually 0b00000000000000000000000000000111. We could also write this in hex as 0x00000007.
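Java's own helpers drop leading zeros, so to see the full fixed-width form you have to pad it back out yourself. Here is one way that might look; the class and method names (`PaddedBits`, `toBits32`, `toHex32`) are made up for this sketch.

```java
public class PaddedBits {
    // Integer.toBinaryString drops leading zeros, so left-pad to 32 digits.
    static String toBits32(int value) {
        return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    // %08X pads the hex form to 8 digits (4 bytes, 2 hex digits per byte).
    static String toHex32(int value) {
        return String.format("0x%08X", value);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(7)); // 111
        System.out.println(toBits32(7));               // 32 characters, ending in 111
        System.out.println(toHex32(7));                // 0x00000007
    }
}
```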
A Hex Example
Suppose we have received a message from a socket that we know is a 4 byte integer, and we have the 4 bytes in a byte[] array. If we were to inspect the array we'd see 4 entries, each of the form 0x??. We could use ByteBuffer.wrap(myArray).getInt(); to retrieve the integer, and normally this is how we'd do this for performance reasons (because ByteBuffer.wrap reads from the existing array directly, instead of copying things to different memory locations). However, this is a refresher, so we could also consider...
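As a quick sketch of the ByteBuffer route, here is the 0x0000FEAD message from the start of this post read back as an int. The class and method names (`ReadIntFromBytes`, `readBigEndian`, `readLittleEndian`) are made up for this example; `ByteBuffer.wrap`, `order` and `getInt` are the standard java.nio calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReadIntFromBytes {
    // ByteBuffer reads Big Endian (most significant byte first) by default.
    static int readBigEndian(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).getInt();
    }

    // The same bytes interpreted least significant byte first.
    static int readLittleEndian(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // Values above 0x7F need a cast because Java bytes are signed.
        byte[] message = {0x00, 0x00, (byte) 0xFE, (byte) 0xAD};
        System.out.println(readBigEndian(message));                         // 65197
        System.out.println(Integer.toHexString(readLittleEndian(message))); // adfe0000
    }
}
```

Getting the byte order wrong gives a very different number, so it is worth confirming which order the sender uses.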
Bitwise Operations
We know we have 4 bytes each of the form 0x??; as an example we will use (0x0E, 0x45, 0x86, 0xEF). When we put all these bytes together we can see that if we were to use int a = 0x0e4586ef; we would have an integer with decimal value 239437551. Unfortunately we can't just concatenate the bytes this way. However, using bitwise operations we can get the integer from these 4 bytes. Here are the bitwise operations:
- & is bitwise AND, e.g. 0b10101000 & 0b11111001 = 0b10101000
- ^ is exclusive OR, e.g. 0b10101000 ^ 0b11111001 = 0b01010001
- | is inclusive OR, e.g. 0b10101000 | 0b11111001 = 0b11111001
- << and >> shift bits to the left and right respectively, e.g. 0b10101000 << 8 = 0b1010100000000000 (shifts to the left by 8 places). The right shift >> is an arithmetic shift, so it preserves the sign by copying the first bit into the vacated places. Java also has >>>, a logical right shift that fills with zeros instead.
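These operators are easy to try out directly. The sketch below prints each result in binary; the class and helper names (`BitwiseDemo`, `bits`) are made up for this example.

```java
public class BitwiseDemo {
    // Binary text of a value (leading zeros are not printed).
    static String bits(int v) {
        return Integer.toBinaryString(v);
    }

    public static void main(String[] args) {
        int a = 0b10101000;
        int b = 0b11111001;
        System.out.println(bits(a & b));  // 10101000, bits set in both
        System.out.println(bits(a ^ b));  // 1010001, bits set in exactly one (leading 0 dropped)
        System.out.println(bits(a | b));  // 11111001, bits set in either
        System.out.println(bits(a << 8)); // 1010100000000000
        System.out.println(-16 >> 2);     // -4, the sign bit is copied in
        System.out.println(-16 >>> 2);    // 1073741820, zeros are shifted in
    }
}
```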
So we can convert our byte array as such:
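// Each byte is promoted to a (sign-extended) int before the shift;
// the mask then keeps only the 8 bits that belong in that position.
int i = (myArray[0]<<24) & 0xff000000 |
        (myArray[1]<<16) & 0x00ff0000 |
        (myArray[2]<< 8) & 0x0000ff00 |
        (myArray[3]<< 0) & 0x000000ff;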
That is, we shift each byte into the correct position, then use bitwise AND to keep only that byte's 8 bits within the 4 byte integer (discarding any sign-extension bits that come from negative bytes), then use inclusive OR to put all the bytes together.
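Put together as a runnable sketch with the example bytes from this section, the conversion might look like the following. The class and method names (`BytesToInt`, `toInt`) are made up here, and the extra parentheses are only for readability, since & already binds tighter than | in Java.

```java
import java.nio.ByteBuffer;

public class BytesToInt {
    // Assembles 4 bytes, most significant first, into an int.
    static int toInt(byte[] b) {
        return ((b[0] << 24) & 0xff000000)
             | ((b[1] << 16) & 0x00ff0000)
             | ((b[2] <<  8) & 0x0000ff00)
             | ( b[3]        & 0x000000ff);
    }

    public static void main(String[] args) {
        // 0x86 and 0xEF are above 0x7F, so they need a cast to fit a signed byte.
        byte[] myArray = {0x0E, 0x45, (byte) 0x86, (byte) 0xEF};
        int i = toInt(myArray);
        System.out.println(i);                                      // 239437551
        System.out.println(i == 0x0e4586ef);                        // true
        System.out.println(i == ByteBuffer.wrap(myArray).getInt()); // true
    }
}
```

The masks matter because 0x86 and 0xEF are negative as Java bytes: without them, the sign-extended 1 bits would overwrite the higher bytes when everything is ORed together.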