Binary and Hexadecimal: A Quick Java Refresher

7pm on Wednesday 7th February, 2018

Many Java developers learn the ins and outs of the data types they use every day, quickly forget them, and never think about them again. This post is a quick refresher for anyone who suddenly discovers they're working with raw data, and that the message they've been sent is 0x0000FEAD rather than the regular-looking 65197 they were expecting. Hopefully it all feels familiar and easy.

Note: the byte-array example at the end of this post reads bytes in Big Endian order (most significant byte first), which is also the default for Java's ByteBuffer and DataInputStream. Little Endian data works exactly the same way; the bytes are just organised in the opposite order.

Binary, Octal and Hex

Binary, Octal, Decimal and Hex are all ways of writing the same numbers. We have to consider these different representations because computers store every number in binary, and that affects both how big our numbers can get and how they look when we inspect raw data.

Binary

Binary numbers are numbers represented by only 2 symbols, usually 0 and 1, which is why binary is also called base 2; each binary digit is called a bit. When writing a number in binary we either give it a prefix of \(0b\) or a bracketed subscript with the base, like this: \((11)_2\). So this number is not 11, but 3 when converted to decimal. As in decimal, we count up in each digit until we hit its maximum, then carry into a new digit as required. This gives us the following table:

Decimal 1   2   3    4    5    6    7     8     9    10  ...      100
Binary  1  10  11  100  101  110  111  1000  1001  1010  ...  1100100

That is, each digit in binary represents a power of 2, starting with \(2^0\) on the right. So we can convert from binary to decimal by multiplying each digit by its corresponding power of 2 and adding the results together. For example, to convert \((11011010)_2\) to decimal we calculate \(1 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = (218)_{10}\). Compare this with how a decimal number, for example 335, is \(3 \times 10^2 + 3 \times 10^1 + 5 \times 10^0\).
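The positional method above can be sketched in Java as follows. This is a minimal illustration, assuming the input is a string of 0s and 1s with at most 63 digits so the result fits in a long; the class and method names are hypothetical, and in practice Integer.parseInt(s, 2) or Long.parseLong(s, 2) already does this conversion.

```java
// Hypothetical helper illustrating binary-to-decimal conversion by hand.
class BinaryToDecimal {

    // Multiplies each digit by its power of 2 and sums the results.
    // Assumes at most 63 digits, so the total fits in a long.
    static long toDecimal(String binary) {
        long total = 0;
        int power = 0;
        // Walk from the rightmost digit (2^0) towards the left.
        for (int i = binary.length() - 1; i >= 0; i--) {
            char digit = binary.charAt(i);
            if (digit == '1') {
                total += 1L << power; // 1L << power is 2^power
            } else if (digit != '0') {
                throw new IllegalArgumentException("Not a binary digit: " + digit);
            }
            power++;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("11011010"));           // 218
        System.out.println(Integer.parseInt("11011010", 2)); // 218, using the built-in parser
    }
}
```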

This is what computers use fundamentally for all their calculations. A typical number on a 32-bit computer is 32 bits in size, and similarly a 64-bit number is the default on a 64-bit machine. This means the widest number an n-bit machine can handle natively in a single operation is n bits wide. We can still work with larger numbers using software techniques if we need to (this is how BigInteger and BigDecimal work). 32-bit desktop and server machines are rare these days because of memory limitations: a 32-bit machine can only address \(2^{32}\) different memory locations, which equates to around 4GB of "addressable" memory. This literally means that if we tried to use more memory than that, we couldn't get the data back, because we can't represent a number big enough to describe its position in memory.
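A small sketch of these limits in Java. One caveat: Java's own types have fixed sizes on every platform (an int is always 32 bits, even on a 64-bit machine), so the wrap-around below happens regardless of the hardware. The class and method names are made up for illustration.

```java
import java.math.BigInteger;

// Illustrative only: fixed-width arithmetic versus BigInteger.
class FixedWidthLimits {

    // Number of distinct byte addresses an address of the given width can name.
    // Assumes addressBits is between 0 and 62 so the result fits in a long.
    static long addressableBytes(int addressBits) {
        return 1L << addressBits;
    }

    // One more than the largest long, which only BigInteger can hold.
    static BigInteger pastLongMax() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        // An int has no 33rd bit to carry into, so it wraps around to the minimum.
        int biggest = Integer.MAX_VALUE;
        System.out.println(biggest + 1);                   // -2147483648

        long bytes = addressableBytes(32);
        System.out.println(bytes);                         // 4294967296
        System.out.println(bytes / (1024L * 1024 * 1024)); // 4 (GiB)

        System.out.println(pastLongMax());                 // 9223372036854775808
    }
}
```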

Octal

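Octal is sometimes used too, and is usually written with a prefix of 0o (in Java source code, though, an octal literal is marked by just a leading 0). It works exactly like binary, except we now use 8 symbols, 0 to 7, so each digit represents a power of 8. Our table now is: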

Decimal 1   2   3    4    5    6    7     8     9    10  ...      100
Binary  1  10  11  100  101  110  111  1000  1001  1010  ...  1100100
Octal   1   2   3    4    5    6    7    10    11    12  ...      144
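A quick sketch of octal in Java, with made-up constant names. The leading-zero form is worth watching for: zero-padding a decimal number in source code silently turns it into an octal literal with a different value.

```java
// Illustrative constants showing Java's octal literal syntax.
class OctalLiterals {
    static final int EIGHT = 010;    // octal 10 is decimal 8
    static final int HUNDRED = 0144; // octal 144 is decimal 100

    public static void main(String[] args) {
        System.out.println(EIGHT);                      // 8
        System.out.println(HUNDRED);                    // 100
        System.out.println(Integer.toOctalString(100)); // 144
        System.out.println(Integer.parseInt("144", 8)); // 100
    }
}
```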

Hexadecimal

Hexadecimal numbers are numbers where each digit represents a power of 16. This means we need 16 symbols, but decimal only gives us ten, so for the extra six we use A, B, C, D, E and F, where A is equivalent to 10 in decimal and F is equivalent to 15. Our prefix for hex is 0x. Hex is often used for memory locations and large numbers, because writing them in binary would take a lot of extra characters. We don't use decimal for this because 10 is not a power of 2, whereas 16 is \(2^4\): every hex digit corresponds to exactly 4 bits, so converting between hex and binary is a digit-by-digit lookup.

Bits are grouped into sets of 8, which we call a byte. If we stored the decimal number 218 in a byte, it would hold the value 0b11011010. In hex we can instead write this as 0xDA (0b1101 is D and 0b1010 is A). Note that the maximum number represented by 8 bits (0b11111111) is also the maximum number represented by 2 hex digits (0xFF), which is 255 in decimal. Including 0, that gives 256 possible values for either.
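These equivalences can be checked directly in Java source, since Java accepts both the 0b prefix (from Java 7) and the 0x prefix on integer literals. A minimal sketch; the constant names are invented for the example.

```java
// Illustrative: one 8-bit value written three ways.
class ByteValues {
    static final int AS_BINARY = 0b11011010;
    static final int AS_HEX = 0xDA;
    static final int AS_DECIMAL = 218;

    public static void main(String[] args) {
        System.out.println(AS_BINARY == AS_HEX && AS_HEX == AS_DECIMAL); // true
        // Integer's conversion methods drop the prefix and use lowercase hex digits.
        System.out.println(Integer.toHexString(AS_DECIMAL));    // da
        System.out.println(Integer.toBinaryString(AS_DECIMAL)); // 11011010
        // The largest 8-bit value is the same in both notations.
        System.out.println(0b11111111 == 0xFF);                 // true
        System.out.println(0xFF);                               // 255
    }
}
```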

Our table now is:

Decimal 1   2   3    4    5    6    7     8     9    10  ...      100
Binary  1  10  11  100  101  110  111  1000  1001  1010  ...  1100100
Octal   1   2   3    4    5    6    7    10    11    12  ...      144
Hex     1   2   3    4    5    6    7     8     9     A  ...       64
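One way to regenerate the rows of this table is Integer.toString(value, radix), which writes an int in any base from 2 to 36. A small sketch with hypothetical names; it skips the "..." column, and it upper-cases the output because Java writes digits above 9 as lowercase letters.

```java
// Illustrative: rebuild the table rows with Integer.toString(value, radix).
class BaseTable {
    static final int[] VALUES = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100};

    // Builds one row: the label, then each value written in the given radix.
    static String row(String label, int radix) {
        StringBuilder sb = new StringBuilder(String.format("%-8s", label));
        for (int v : VALUES) {
            sb.append(' ').append(Integer.toString(v, radix).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(row("Decimal", 10));
        System.out.println(row("Binary", 2));
        System.out.println(row("Octal", 8));
        System.out.println(row("Hex", 16));
    }
}
```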

Numeric Java Data Types

For the sake of speeding things along, I'm only considering primitive types here, starting with the most familiar.

  1. ints are 32 bits, or 4 bytes, in size. Since the first (most significant) bit of an int determines whether it is positive or negative, only 31 bits are left for the value itself, so the maximum value is \(2^{31}-1\) (2,147,483,647), where the -1 accounts for the value 0. The minimum value is \(-2^{31}\), and in total there are \(2^{32}\) possible values when we include 0 and all the negative numbers. In Java we can write a number in a different base, such as binary or hex, by using its prefix. For example, int a = 0xFE; gives the integer the value 254. All of Java's integer types (byte, short, int and long) are signed; char is the only unsigned integral type.
  2. longs are 64 bits, or 8 bytes, in size. They are exactly like ints, but can store much larger values (up to \(2^{63}-1\)) since they use 8 bytes instead of 4.
  3. shorts are also just like ints, but only use 2 bytes, giving a maximum value of \(2^{15}-1\) (32,767).
  4. bytes are the smallest numeric type you can have in Java and, as the name suggests, are a single byte in size. They have a maximum value of 127 and a minimum value of -128.
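Rather than memorising these limits, we can read them from constants on the wrapper classes. A minimal sketch (Integer.BYTES and friends need Java 8 or later; the class name is made up):

```java
// Illustrative: the ranges and sizes of Java's signed integer types.
class IntegralRanges {
    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);       // -128 to 127
        System.out.println(Short.MIN_VALUE + " to " + Short.MAX_VALUE);     // -32768 to 32767
        System.out.println(Integer.MIN_VALUE + " to " + Integer.MAX_VALUE); // -2147483648 to 2147483647
        System.out.println(Long.MIN_VALUE + " to " + Long.MAX_VALUE);       // -9223372036854775808 to 9223372036854775807

        // SIZE is measured in bits, BYTES in bytes.
        System.out.println(Integer.SIZE + " bits, " + Integer.BYTES + " bytes"); // 32 bits, 4 bytes
        System.out.println(Long.SIZE + " bits, " + Long.BYTES + " bytes");       // 64 bits, 8 bytes

        int a = 0xFE; // the hex literal from item 1
        System.out.println(a); // 254
    }
}
```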

Lastly, floats and doubles can be used to represent floating point numbers (such as 15.23). They are 4 and 8 bytes in size respectively. Since these have little to do with integral numbers, see here for how they represent floating point numbers: https://www.doc.ic.ac.uk/~eedwards/compsys/float/
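Floats and doubles are fixed-size as well, and Java can show their raw bit patterns. A minimal sketch using Float.floatToIntBits and Double.doubleToLongBits, which return the IEEE 754 bits of a value as an int or long; the helper names are hypothetical.

```java
// Illustrative: view the raw bits of floating point values in hex.
class FloatBits {
    static String floatHex(float f) {
        return Integer.toHexString(Float.floatToIntBits(f));
    }

    static String doubleHex(double d) {
        return Long.toHexString(Double.doubleToLongBits(d));
    }

    public static void main(String[] args) {
        System.out.println(Float.BYTES + " and " + Double.BYTES + " bytes"); // 4 and 8 bytes
        System.out.println(floatHex(1.0f)); // 3f800000
        System.out.println(doubleHex(1.0)); // 3ff0000000000000
    }
}
```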

When we say that a type has a size of n bytes, we mean that it uses the same amount of memory whatever number it holds. For example, if we are calculating with the number 7 of type int, the bits stored in memory are actually 0b00000000000000000000000000000111. We could also write this in hex as 0x00000007.
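Java's Integer.toBinaryString and Integer.toHexString drop leading zeros, so to see all 32 bits of an int we have to pad the output ourselves. A small sketch with hypothetical helper names:

```java
// Illustrative: print every bit (or hex digit) of an int, leading zeros included.
class PaddedBits {

    // All 32 bits of value, most significant first.
    static String bits32(int value) {
        return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    // All 8 hex digits of value. For negative ints, %X prints the two's complement bits.
    static String hex32(int value) {
        return String.format("%08X", value);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(7)); // 111
        System.out.println("0b" + bits32(7));          // 0b followed by 29 zeros and 111
        System.out.println("0x" + hex32(7));           // 0x00000007
        System.out.println("0x" + hex32(-1));          // 0xFFFFFFFF
    }
}
```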

A Hex Example

Suppose we have received a message from a socket that we know is a 4-byte integer, and we have the 4 bytes in a byte[] array. If we were to inspect the array we'd see 4 entries, each of the form 0x??. We could use ByteBuffer.wrap(myArray).getInt(); to retrieve the integer, and normally this is how we'd do it: wrap doesn't copy the array (the buffer reads straight from it), and the buffer handles the byte order for us. However, this is a refresher, so let's also work out how to do the conversion by hand.
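A minimal sketch of the ByteBuffer route, using the four example bytes from the next section. ByteBuffer reads in Big Endian order by default; if the sender wrote the bytes Little Endian, order(ByteOrder.LITTLE_ENDIAN) switches it. The class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative: read a 4-byte int from a byte array with ByteBuffer.
class ReadIntWithByteBuffer {

    // Treats data[0] as the most significant byte (ByteBuffer's default order).
    static int readBigEndian(byte[] data) {
        return ByteBuffer.wrap(data).getInt();
    }

    // Treats data[0] as the least significant byte.
    static int readLittleEndian(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // Values above 0x7F need a cast because Java's byte is signed.
        byte[] myArray = {0x0E, 0x45, (byte) 0x86, (byte) 0xEF};
        System.out.println(readBigEndian(myArray));                         // 239437551
        System.out.println(Integer.toHexString(readLittleEndian(myArray))); // ef86450e
    }
}
```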

Bitwise Operations

We know we have 4 bytes, each of the form 0x??; for example, we will use (0x0E, 0x45, 0x86, 0xEF). Putting these bytes together, we can see that int a = 0x0e4586ef; would give us an integer with decimal value 239437551. Unfortunately Java has no operator that simply concatenates bytes, and because Java's byte type is signed, 0x86 and 0xEF are actually stored as -122 and -17. However, using bitwise operations we can build the integer from these 4 bytes. Here are the bitwise operations:

  1. & is bitwise AND, e.g. 0b10101000 & 0b11111001 = 0b10101000
  2. ^ is exclusive OR, e.g. 0b10101000 ^ 0b11111001 = 0b01010001
  3. | is inclusive OR, e.g. 0b10101000 | 0b11111001 = 0b11111001
  4. << and >> shift bits to the left and right respectively, e.g. 0b10101000 << 8 = 0b1010100000000000 (shifts to the left by 8 places, filling with zeros). >> is an arithmetic shift: it fills the vacated bits with copies of the sign bit, so a negative number stays negative. Java also has >>>, which shifts right and fills with zeros instead.
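The list's examples can be checked in Java by printing the results in binary. A sketch; bits8 is a hypothetical helper that pads Integer.toBinaryString's output to 8 digits, and the last two lines contrast >> with >>> on a negative number.

```java
// Illustrative: the bitwise examples from the list, printed in binary.
class BitwiseDemo {

    // Pads the binary string of a non-negative value below 256 to 8 digits.
    static String bits8(int value) {
        return String.format("%8s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int x = 0b10101000;
        int y = 0b11111001;
        System.out.println(bits8(x & y));                   // 10101000
        System.out.println(bits8(x ^ y));                   // 01010001
        System.out.println(bits8(x | y));                   // 11111001
        System.out.println(Integer.toBinaryString(x << 8)); // 1010100000000000

        int negative = -16;                  // 0xFFFFFFF0
        System.out.println(negative >> 2);   // -4: copies of the sign bit shifted in
        System.out.println(negative >>> 28); // 15: zeros shifted in
    }
}
```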

So we can convert our byte array like this:

// myArray is the 4-byte array from the socket, most significant byte first.
int i = ((myArray[0] << 24) & 0xff000000) |
        ((myArray[1] << 16) & 0x00ff0000) |
        ((myArray[2] <<  8) & 0x0000ff00) |
        ((myArray[3] <<  0) & 0x000000ff);

That is, we shift each byte into its position, use bitwise AND to clear the extra bits that Java adds when it promotes a signed byte to an int (for example, 0x86 becomes 0xFFFFFF86), then use inclusive OR to combine all the bytes.
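The same shift, mask and OR steps can also be written as a loop, masking each byte with 0xFF before shifting rather than after, and reading the array in reverse gives the Little Endian version. A minimal sketch under those assumptions (hypothetical names, exactly 4 bytes expected), checked against ByteBuffer:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative: build an int from 4 bytes with shifts, masks and OR.
class BytesToInt {

    // data[0] is the most significant byte (Big Endian).
    static int fromBigEndian(byte[] data) {
        checkLength(data);
        int result = 0;
        for (int k = 0; k < 4; k++) {
            // & 0xFF undoes sign extension; << 8 makes room for the next byte.
            result = (result << 8) | (data[k] & 0xFF);
        }
        return result;
    }

    // data[0] is the least significant byte (Little Endian).
    static int fromLittleEndian(byte[] data) {
        checkLength(data);
        int result = 0;
        for (int k = 3; k >= 0; k--) {
            result = (result << 8) | (data[k] & 0xFF);
        }
        return result;
    }

    private static void checkLength(byte[] data) {
        if (data.length != 4) {
            throw new IllegalArgumentException("Expected 4 bytes, got " + data.length);
        }
    }

    public static void main(String[] args) {
        byte[] myArray = {0x0E, 0x45, (byte) 0x86, (byte) 0xEF};
        System.out.println(fromBigEndian(myArray)); // 239437551
        System.out.println(fromBigEndian(myArray) == ByteBuffer.wrap(myArray).getInt()); // true
        System.out.println(fromLittleEndian(myArray)
                == ByteBuffer.wrap(myArray).order(ByteOrder.LITTLE_ENDIAN).getInt());     // true
    }
}
```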
