Let's Talk About IC Interconnect Buses
Written 2017-07-23
Tags: Bus AMBA
Simple busses
In the beginning, CPUs had data buses for connecting to RAM and ROM. These were often as simple as address and data pins, and a bit of external logic to decode an address and select a chip was all that was needed to integrate a system. This architecture is still used in simple CPUs produced today, like the 8051 and AVR.
Multi-master simple busses
Pretty early on, someone needed to hook up another bus master, often a DMA controller, to a simple CPU. And thus bus arbitration was born, attempting to answer the question: what happens when two masters want to talk to the same bus slave? One simple approach is to stall the CPU while the DMA accesses the bus. TI MSP430 DMAs work this way, as did Intel 8086/8088 CPUs, where the Intel 8237 DMA controller would assert the HOLD pin on the CPU to stall it. Once a CPU gets pipelining and caches, it sometimes makes sense to stall the DMA while the CPU uses the bus instead. Slightly more complicated is letting the CPU program a priority, so that, for example, DMA0 can stall the CPU, and DMA1 only runs when DMA0 and the CPU are both idle.
Multi-master, multi-slave, crossbar switch
A CPU fetching instructions from ROM and a DMA copying into RAM perform a lot better if both can happen at the same time. Borrowing a hint from 1915 telephone equipment, the crossbar switch connects a number of bus masters to a number of bus slaves through a grid of connections, so masters talking to different slaves do not have to wait for each other.
Distributed crossbar switch
As chips gained more peripherals, connecting every bus master (CPU, a few DMA controllers, a GPU, maybe a few other things) to every bus slave (SPI, I2C, FlexBus, GPIO, USB, ...) gets excessive and ends up being a mess in the silicon, as it requires routing all the internal buses corresponding to the masters and slaves to a single location. Additionally, a big crossbar has a large fan-out, where one output signal has to drive a large number of input signals, most of them unused at any given time.
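To make the crossbar idea a bit more concrete, here is a minimal Python sketch of one bus cycle on a full crossbar. It assumes a fixed-priority arbiter per slave port, which is only one possible scheme (round-robin is another), and the names `crossbar_cycle` and `crosspoints` are made up for illustration. It also counts the master-to-slave paths that make a big crossbar awkward to route.

```python
def crossbar_cycle(requests, priority):
    """Resolve one bus cycle on a full crossbar.

    requests: dict mapping master name -> slave name it wants this cycle.
    priority: list of master names, highest priority first. This is an
              assumed fixed-priority scheme; masters not listed are ignored.
    Returns (granted, stalled): granted maps slave -> winning master,
    stalled lists the masters that have to wait.
    """
    granted = {}
    stalled = []
    for master in priority:
        slave = requests.get(master)
        if slave is None:
            continue  # this master is idle in this cycle
        if slave in granted:
            stalled.append(master)  # slave already claimed by a higher priority
        else:
            granted[slave] = master
    return granted, stalled


def crosspoints(masters, slaves):
    """Number of master-to-slave paths a full crossbar has to route."""
    return masters * slaves


# Different slaves: the CPU fetch and the DMA write proceed in parallel.
parallel = crossbar_cycle({"CPU": "ROM", "DMA0": "RAM"}, ["DMA0", "CPU"])

# Same slave: only the higher-priority master gets through.
conflict = crossbar_cycle({"CPU": "RAM", "DMA0": "RAM"}, ["DMA0", "CPU"])
```

With 5 masters and 12 slaves (numbers picked arbitrarily), `crosspoints(5, 12)` is already 60 paths converging on one spot in the layout.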
This led to systems with a fast crossbar for the CPU, DMA, RAM, and ROM, and another smaller, slower switch for things like I2C, SPI, and GPIO, all of which are then routed through one or two ports on the big crossbar. This architecture is common on ARM platforms, with a fast AHB and a slower APB for peripherals.
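The two-level arrangement can be sketched as a two-step address decode: the fast crossbar only knows about one bridge window, and the slow bus decodes the offset inside it. This is a rough Python sketch, not a model of real AHB or APB signalling, and the memory map, the `PERIPH_BRIDGE` name, and the `decode` and `route` helpers are all hypothetical.

```python
# Hypothetical memory map; the addresses and sizes are made up for
# illustration and do not describe any particular chip.
FAST_MAP = [
    # (base address, size, target on the fast crossbar)
    (0x0000_0000, 0x0008_0000, "ROM"),
    (0x2000_0000, 0x0002_0000, "RAM"),
    (0x4000_0000, 0x0001_0000, "PERIPH_BRIDGE"),
]

SLOW_MAP = [
    # (offset inside the bridge window, size, peripheral)
    (0x0000, 0x1000, "I2C"),
    (0x1000, 0x1000, "SPI"),
    (0x2000, 0x1000, "GPIO"),
]


def decode(address, table):
    """Return (target, offset) for the first region containing address."""
    for base, size, target in table:
        if base <= address < base + size:
            return target, address - base
    return None, None


def route(address):
    """Walk an address through the fast crossbar, then the slow bus.

    Returns the list of hops taken; an empty list means nothing is mapped.
    """
    target, offset = decode(address, FAST_MAP)
    if target is None:
        return []
    if target != "PERIPH_BRIDGE":
        return [target]
    peripheral, _ = decode(offset, SLOW_MAP)
    if peripheral is None:
        return ["PERIPH_BRIDGE"]  # inside the window, but no peripheral there
    return ["PERIPH_BRIDGE", peripheral]
```

In this sketch `route(0x2000_0010)` stays on the fast crossbar and lands in RAM, while `route(0x4000_1004)` takes the extra hop through the bridge to reach SPI. Adding another slow peripheral only touches `SLOW_MAP`; the fast crossbar still sees a single port.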
Things get a little more interesting with multicore microcontrollers. It's certainly possible to connect two microcontroller cores to the same main crossbar, but another approach is to give each core its own crossbar, route a slave port on each crossbar to a master port on the other, give each core some of its own RAM, and tie the slow peripheral bus into both crossbars. Both approaches work, each with its own upsides and downsides, and are an evolutionary step to...