Reverse engineering the ARM1, ancestor of the iPhone’s processor

The demonstration program

When you run the simulator, it executes a short hardcoded program that performs shifts of increasing amounts. You don’t need to understand the code, but if you’re curious it is:

0000  E1A0100F mov     r1, pc        @ Some setup
0004  E3A0200C mov     r2, #12
0008  E1B0F002 movs    pc, r2
000C  E1A00000 nop
0010  E1A00000 nop
0014  E3A02001 mov     r2, #1        @ Load register r2 with 1
0018  E3A0100F mov     r1, #15       @ Load r1 with value to shift
001C  E59F300C ldr     r3, pointer
    loop:
0020  E1A00271 ror     r0, r1, r2    @ Rotate r1 by r2 bits, store in r0
0024  E2822001 add     r2, r2, #1    @ Add 1 to r2
0028  E4830004 str     r0, [r3], #4  @ Write result to memory
002C  EAFFFFFB b       loop          @ Branch to loop

Inside the loop, register r1 (0x000f) is rotated to the right by r2 bit positions and the result is stored in register r0.
Then r2 is incremented and the shift result written to memory.
As the simulator runs, watch r2 increment and r0 step through the successive rotations of the four set bits. The A and D values show the address and data pins as instructions are read from memory.
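If you'd like to predict the values before watching them appear, the loop can be approximated in a few lines of Python. This is only a rough sketch of the listing above, not anything taken from the simulator: the names `ror32` and `run_demo` are made up for illustration, memory is modeled as a plain list, and only the rotate result is modeled (not the carry flag).

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bit positions."""
    value &= 0xFFFFFFFF
    amount %= 32
    if amount == 0:
        return value
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def run_demo(iterations):
    """Mimic the demo loop: rotate r1 by r2, bump r2, store the result."""
    r1 = 15          # mov r1, #15
    r2 = 1           # mov r2, #1
    memory = []      # stands in for the words written through r3
    for _ in range(iterations):
        r0 = ror32(r1, r2)   # ror r0, r1, r2
        r2 += 1              # add r2, r2, #1
        memory.append(r0)    # str r0, [r3], #4
    return memory


results = run_demo(6)
for value in results:
    print(f"{value:08X}")
```

The first few values printed are 80000007, C0000003, E0000001 and F0000000: the four set bits wrap around from the bottom of the word to the top, one position per pass through the loop.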

The changing shift values are clearly visible in the barrel shifter, as the diagonal line shifts position. If you zoom in on the register file, you can read out the values of the registers, as described earlier.

Conclusion

The ARM1 processor led to the amazingly successful ARM processor architecture that powers your smartphone. The simple RISC architecture of the ARM1 makes the circuitry of the processor easy to understand, at least compared to a chip such as the 386.[12]
The ARM1 simulator provides a fascinating look at what happens inside a processor, and hopefully this article has helped explain what you see in the simulator.

P.S. If you want to read more about ARM1 internals, see Dave Mugridge’s series of posts:

Inside the armv1 Register Bank

Inside the armv1 Register Bank – register selection

Inside the armv1 Read Bus

Inside the ALU of the armv1 – the first ARM microprocessor

Notes and references

[1]

I should make it clear that I am not part of the Visual 6502 team that built the ARM1 simulator.
More information on the simulator is in the Visual 6502 team’s blog post
The Visual ARM1.

[2]

The block diagram below shows the components of the chip in more detail.
See the ARM Evaluation System manual for an explanation of each part.

Floorplan of the ARM1 chip, from ARM Evaluation System manual. (Bus labels are corrected from original.)

[3]

You may have noticed that the ARM architecture describes 16 registers, but the chip has 25 physical registers.
There are 9 “extra” registers because there are extra copies of some registers for use while handling interrupts.

Another interesting thing about the register file is the PC register is missing a few bits. Since the ARM1 uses 26-bit addresses, the top 6 bits are not used. Because all instructions are aligned on a 32-bit boundary, the bottom two address bits in the PC are always zero. These 8 bits are not only unused, they are omitted from the chip entirely.
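The bit counting can be checked with a little arithmetic. The Python below is just an illustration of the numbers in the paragraph above, not a model of the actual register circuitry; the function names are hypothetical.

```python
WORD_BITS = 32
ADDRESS_BITS = 26        # the ARM1 uses 26-bit addresses
ALIGNMENT_BITS = 2       # instructions sit on 32-bit (4-byte) boundaries

# Bits that actually need to be stored for the PC.
STORED_PC_BITS = ADDRESS_BITS - ALIGNMENT_BITS
OMITTED_BITS = WORD_BITS - STORED_PC_BITS


def stored_pc_bits(address):
    """Return the bits of an instruction address that need storing."""
    if address >> ADDRESS_BITS:
        raise ValueError("address does not fit in 26 bits")
    if address & 0b11:
        raise ValueError("instruction address is not word-aligned")
    return address >> ALIGNMENT_BITS


def address_from_stored(bits):
    """Rebuild the full address by restoring the two zero low bits."""
    return bits << ALIGNMENT_BITS


print(STORED_PC_BITS, OMITTED_BITS)     # 24 stored, 8 omitted
print(hex(stored_pc_bits(0x20)))        # the 'loop' label from the demo
```

So the address of the loop label, 0x20, needs only the value 0x8 to be stored, and the largest instruction address (0x3FFFFFC) fits in 24 bits.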

[4]

The ALU doesn’t support multiplication (added in ARM 2) or division (added in ARMv7).

[5]

A bit more detail on the decode circuitry.
Instruction decoding is done through three separate PLAs.
The ALU decode PLA generates control signals for the ALU based on the four operation bits in the instruction. The shift decode PLA generates control signals for the barrel shifter. The instruction decode PLA performs the overall decoding of the instruction.
The register decode block consists of three layers. Each layer takes a 4-bit register id and activates the corresponding register. There are three layers because ARM operations use two registers for inputs and a third register for output.
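To make the decoding a bit more concrete, here is a small Python sketch that pulls the four ALU operation bits and the register ids out of two instructions from the demonstration program. The field positions are the standard ARM data-processing encoding (operation in bits 24-21, Rn in 19-16, Rd in 15-12, Rm in 3-0), which this article doesn't spell out. The `one_hot` helper is a simplified stand-in for "activating" a register and ignores the extra interrupt-mode copies; all function names are made up for illustration.

```python
def decode_data_processing(instruction):
    """Split a 32-bit data-processing instruction into a few fields."""
    return {
        "alu_op": (instruction >> 21) & 0xF,  # four ALU operation bits
        "rn": (instruction >> 16) & 0xF,      # first input register id
        "rd": (instruction >> 12) & 0xF,      # output register id
        "rm": instruction & 0xF,              # second input register id
                                              # (register-operand form only)
    }


def one_hot(register_id):
    """Turn a 4-bit register id into a 16-bit select word with one bit set."""
    if not 0 <= register_id <= 15:
        raise ValueError("register id must fit in 4 bits")
    return 1 << register_id


ror_fields = decode_data_processing(0xE1A00271)   # ror r0, r1, r2
add_fields = decode_data_processing(0xE2822001)   # add r2, r2, #1

print(ror_fields)
print(add_fields["alu_op"], add_fields["rn"], add_fields["rd"])
print(f"{one_hot(add_fields['rd']):016b}")
```

For the `ror` instruction the second input register id comes out as 1 (r1) and the output as 0 (r0). For the `add`, both the first input and the output are register 2, so two of the three decode layers would select the same register.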

[6]

In a RISC computer, the instruction set is restricted to the most-used instructions, which are optimized for high performance and can typically execute in a single clock cycle.
Instructions are a fixed size, simplifying the instruction decoding logic.
A RISC processor requires much less circuitry for control and instruction decoding, leaving more space on the chip for registers. Most instructions operate on registers, and only load and store instructions access memory.
For more information on RISC vs CISC,
see RISC architecture.

[7]

For details on the history of the ARM1, see Conversation with Steve Furber: The designer of the ARM chip shares lessons on energy-efficient computing.

[8]

The 386 and the ARM1 instruction sets are different in many interesting ways.
The 386 has instructions from 1 byte to 15 bytes, while all ARM1 instructions are 32 bits long.
The 386 has 15 registers – all with special purposes, while the ARM1 has 25 registers, mostly general-purpose.
386 instructions can usually operate on memory, while ARM1 instructions operate on registers except for load and store.
The 386 has about 140 different instructions, compared to a couple dozen in the ARM1 (depending on how you count).
Take a look at the 386 opcode map to see how complex decoding a 386 instruction is.
ARM1 instructions fall into 5 categories and can be simply decoded.
(I’m not criticizing the 386’s architecture, just pointing out the major architectural differences.)

See the Intel 80386 Programmer’s Reference Manual
and 80386 Hardware Reference Manual
for more details on the 386 architecture.

[9]

Interestingly, the ARM company doesn’t manufacture chips. Instead, the ARM intellectual property is licensed to hundreds of different companies that build chips that use the ARM architecture.
See The ARM Diaries: How ARM’s business model works for information on how ARM makes money from licensing the chip to other companies.

[10]

The first metal layer in the chip runs largely top-to-bottom, while the second metal layer runs predominantly horizontally.
Having two layers of metal makes the layout much simpler than single-layer processors such as the 6502 or Z-80.

[11]

In the register file, alternating bits are mirrored to simplify the layout.
This allows neighboring bits to share power and ground lines.
The ARM1’s register file is triple-ported, so two registers can be read and one register written at the same time. This is in contrast to chips such as the 6502 or Z-80, which can only access registers one at a time.

[12]

For more information on the ARM1 internals, the book
VLSI RISC Architecture and Organization by
ARM chip designer Stephen Furber has a hundred pages of information on the ARM chip internals.
An interesting slide deck is A Brief History of ARM by Lee Smith, ARM Fellow.
