One little, two little, three little endians
Written 2011-03-13
Tags: MIPS ARM endianness PDP-11
I've done a little work on two different endians; so far I've missed out on PDP-endian. The question of which one is best came up again, so I'll give it a go at explaining the three and their differences. Endianness is the pattern used to store a multi-byte word into multiple bytes of memory. In practice, the bytes are stored either most-significant or least-significant byte first. You might wonder why anyone would store objects backwards (indeed, little-endian is the only 'language' I know where the 'sentences' are written in ascending order, but the words are written in reverse order). However, there are some distinct advantages to storing words in little-endian order (union alignment and addition/subtraction/multiplication pipelining), just as there are advantages to storing words in big-endian order (natural treatment of multi-byte objects, easier comparison). Please keep in mind that with modern processors, some of these reasons mean far less than they used to.
Union Alignment
The following section goes into why an odd feature of C works on little-endian but not on big-endian. You are warned not to use unions in this way.
union {
    uint8_t  one;
    uint16_t two;
    uint32_t four;
} example_union;
In a union, the address of the union is the address of each member. In little-endian, one is located at the same address as the least significant byte of two, and two occupies the least significant bytes of four. This means we can read and write member one as if casting two or four to uint8_t. In big-endian, however, one is located at the most significant byte of two and of four, and two shares the most significant bytes of four. So we cannot access different members as if they were casts of each other. Since doing so is not portable, C code should not be written this way. Now you know why.
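As a minimal (and deliberately non-portable) sketch of the behavior described above, the following program stores a 32-bit pattern through four and reads it back through one; the value printed depends entirely on the host's byte order:

#include <stdint.h>
#include <stdio.h>

union {
    uint8_t  one;
    uint16_t two;
    uint32_t four;
} example_union;

int main(void)
{
    example_union.four = 0x11223344;
    /* Little-endian: one overlays the least significant byte, prints 44.
     * Big-endian:    one overlays the most significant byte,  prints 11. */
    printf("%02x\n", example_union.one);
    return 0;
}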
Addition/Multiplication/Subtraction Pipelining
When performing multi-digit arithmetic (except division), you always start with the rightmost (least significant) digit. Similarly, digital adders start with the least significant bit/nibble/byte/word, because carries move upward toward more significant digits; as soon as the first (lowest) byte is loaded, there is already enough information to begin computation. With little-endian storage, that lowest byte is also the first one in memory. However, this optimization is rarely exploited in hardware design today: most memory subsystems deliver data to the processor in processor-word-sized amounts, or larger chunks such as cache lines, so the impact of this optimization is rarely felt.
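To make the least-significant-first idea concrete, here is a hypothetical multi-word addition sketch (the limb layout and function name are my own, purely for illustration): limb 0 is the least significant word, so with little-endian storage the loop consumes the operands in the same order they sit in memory.

#include <stdint.h>
#include <stddef.h>

/* Sketch only: add two multi-word integers, least significant limb first. */
void add_multiword(uint32_t *sum, const uint32_t *a, const uint32_t *b, size_t limbs)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < limbs; i++) {           /* index 0 = least significant */
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        sum[i] = (uint32_t)t;                      /* low 32 bits of the partial sum */
        carry  = (uint32_t)(t >> 32);              /* carry ripples upward */
    }
}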
Division Pipelining
When performing long division, you always start with the most significant ends of both numerator and denominator. Similarly, in digital division, carries move from more significant digits toward less significant ones, so having the most significant byte ready first can be leveraged for a small improvement in pipelining. But again, contemporary hardware designs are rarely affected by this.
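A rough software analogue, assuming a big-endian byte string and a single-byte divisor (both the layout and the function name are made up for illustration), shows why the most significant byte is the one you want first:

#include <stdint.h>
#include <stddef.h>

/* Sketch only: schoolbook division of a big-endian byte string by a small
 * divisor, consuming the most significant byte first. Quotient replaces num,
 * remainder is returned. */
uint8_t divide_be(uint8_t *num, size_t len, uint8_t divisor)
{
    uint16_t rem = 0;
    for (size_t i = 0; i < len; i++) {            /* most significant byte first */
        uint16_t cur = (uint16_t)((rem << 8) | num[i]);
        num[i] = (uint8_t)(cur / divisor);        /* quotient digit */
        rem    = cur % divisor;                   /* carries flow toward lower digits */
    }
    return (uint8_t)rem;
}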
Natural Treatment of Object Byte Order
I'm going to call big-endian the natural byte order, because by keeping words in big-endian, things like memcmp() can get a tiny bit more efficient. For example, when comparing large objects with memcmp() on a big-endian system, it becomes possible to aggregate compares into full word sizes, which can be significantly more efficient than comparing byte by byte. (You can potentially implement something similar with little-endian, but it incurs an overhead whenever the two areas of memory differ, or a per-word overhead.) Also, most network protocols are written in big-endian. This usually isn't much overhead compared to, say, generating the data to ship over the network, but for small applications it can add up.
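As a sketch of what "aggregating compares" might look like (the function name, the alignment assumption, and the requirement that the length be a multiple of four are all mine), a big-endian machine can decide the ordering from the first differing word:

#include <stdint.h>
#include <stddef.h>

/* Sketch only: memcmp-style compare that walks word-sized chunks.
 * On a big-endian machine the word comparison agrees with byte order,
 * so the first differing word settles the result without looking at
 * its individual bytes. Assumes aligned buffers and len % 4 == 0. */
int compare_words_be(const void *pa, const void *pb, size_t len)
{
    const uint32_t *a = pa;
    const uint32_t *b = pb;
    for (size_t i = 0; i < len / 4; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;   /* only valid on big-endian */
    }
    return 0;
}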
Comparison and Sign Information
By storing the most significant information first in memory, operations requiring sign information or comparison can be implemented more easily. Because the sign information is stored earliest in the word, a big-endian compare can complete as soon as there is a difference in the word. This becomes even more important when dealing with objects larger than the native word size.
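For objects larger than a native word, a hypothetical comparison of 128-bit signed values stored big-endian as four 32-bit limbs (this layout is an assumption for illustration, not a standard type) can bail out at the first differing limb, and only the first limb needs sign treatment:

#include <stdint.h>

/* Sketch only: compare two 128-bit signed integers stored big-endian as
 * four 32-bit limbs, limb[0] most significant. */
int cmp_int128_be(const uint32_t a[4], const uint32_t b[4])
{
    /* The sign lives in the most significant limb: compare it as signed. */
    if ((int32_t)a[0] != (int32_t)b[0])
        return (int32_t)a[0] < (int32_t)b[0] ? -1 : 1;
    /* Remaining limbs are plain magnitude: compare them unsigned. */
    for (int i = 1; i < 4; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}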
PDP-endian
I haven't mentioned PDP-endian yet, because it is such an oddball; it seems to combine the worst of both worlds. The PDP-11 is a 16-bit architecture that stores 16-bit words in little-endian format, but compilers on the PDP-11 would store 32-bit values as two consecutive 16-bit little-endian words, most significant word first. This makes it extra confusing. Potentially compilers could be updated to store whole words in little-endian order, but the architecture is almost entirely dead, and all the other software running on it expects PDP-endian already. Sometimes I put a compiler assert in for this case. Nobody has ever complained.
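A true compile-time assert for this is awkward in portable C, but as a hedged runtime sketch (the function name and classification approach are just for illustration), writing a known 32-bit pattern and inspecting how its bytes land in memory distinguishes the three orderings:

#include <stdint.h>
#include <string.h>

/* Sketch only: classify the host byte order at runtime. */
const char *host_endianness(void)
{
    uint32_t word = 0x01020304;
    uint8_t bytes[4];
    memcpy(bytes, &word, sizeof bytes);

    if (bytes[0] == 0x04 && bytes[1] == 0x03) return "little-endian"; /* 04 03 02 01 */
    if (bytes[0] == 0x01 && bytes[1] == 0x02) return "big-endian";    /* 01 02 03 04 */
    if (bytes[0] == 0x02 && bytes[1] == 0x01) return "PDP-endian";    /* 02 01 04 03 */
    return "unknown";
}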