Introduction


  • CPU for use in mobile devices, automotive, etc.
  • Licensed to Intel (XScale), which later sold the line to Marvell
  • Approx 75% of all 32-bit embedded CPUs
  • 350-500 MHz clock
  • 16 Registers
  • Pipelined
  • Predictive Branching
  • Instruction Set


General Architecture


  • High performance, Low cost, Low power
  • 8-stage pipeline (Sorta-scalar)
  • 64-bit compatibility: 64-bit data buses between the processor integer unit and the instruction and data caches, and between the coprocessors and the integer unit.
  • Out-of-order completion
  • Different feature sets for different applications: THUMB, VFP, Jazelle


The Pipeline


  • Uses pipeline forwarding
  • Scalar initially, changes to parallel pipelines after decoding
  • ALU/MAC/LS - Arithmetic Logic Unit/Multiplier-Accumulator/Load and Store
  • Out-of-order exec and parallel pipelines
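
A toy model of why pipeline forwarding matters. This is only a sketch: the in-order, one-issue-per-cycle model and the two latency constants are illustrative assumptions, not figures for this core.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def total_cycles(program, result_latency):
    """Cycles needed to issue every instruction in order.

    result_latency is the number of cycles after issue before a result
    can be read by a later instruction (hypothetical parameter).
    """
    ready_at = {}  # register -> first cycle in which its new value is readable
    cycle = 0
    for ins in program:
        # Stall until every source operand is readable.
        start = max([cycle] + [ready_at.get(reg, 0) for reg in ins.srcs])
        ready_at[ins.dest] = start + result_latency
        cycle = start + 1
    return cycle


# Hypothetical latencies, chosen only to show the effect.
WITH_FORWARDING = 1      # result is forwarded straight to the next instruction
WITHOUT_FORWARDING = 3   # consumer waits until the result is written back

# A chain of dependent instructions: each one reads the previous result.
program = [
    Instr("r1", ("r0",)),
    Instr("r2", ("r1",)),
    Instr("r3", ("r2",)),
]

if __name__ == "__main__":
    print(total_cycles(program, WITH_FORWARDING))
    print(total_cycles(program, WITHOUT_FORWARDING))
```

Under these assumptions the dependent chain issues back to back with forwarding and stalls on every instruction without it.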


Predictive Branching


  • What is it?
  • 64-entry, 4-state branch target address cache (BTAC)
  • States: Strongly Taken/Weakly Taken/Weakly Not Taken/Strongly Not Taken
  • Branch folding
  • ~85% of branches are correctly predicted, resulting in saving five clock cycles for every correct prediction
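
The four states behave like a 2-bit saturating counter. The sketch below is a minimal model of that idea, not the actual hardware: the indexing scheme, the allocate-on-taken policy, and the not-taken guess on a miss are all assumptions made for illustration. Taking the figures above at face value, 85% correct at five cycles each works out to roughly 4.25 cycles saved per branch on average.

```python
STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = range(4)
CYCLES_SAVED_PER_HIT = 5  # figure quoted in the notes above


class BranchPredictor:
    """Toy 64-entry branch target address cache with 4-state counters."""

    def __init__(self, entries=64):
        self.entries = entries
        self.table = {}  # index -> {"pc", "state", "target"}

    def _index(self, pc):
        # Hypothetical indexing: drop the two alignment bits, wrap at table size.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        entry = self.table.get(self._index(pc))
        if entry is None or entry["pc"] != pc:
            return False, None  # miss: this sketch guesses not taken
        return entry["state"] >= WEAKLY_TAKEN, entry["target"]

    def update(self, pc, taken, target):
        index = self._index(pc)
        entry = self.table.get(index)
        if entry is None or entry["pc"] != pc:
            if taken:  # assumption: allocate an entry only for taken branches
                self.table[index] = {"pc": pc, "state": WEAKLY_TAKEN, "target": target}
            return
        if taken:
            entry["state"] = min(entry["state"] + 1, STRONGLY_TAKEN)
            entry["target"] = target
        else:
            entry["state"] = max(entry["state"] - 1, STRONGLY_NOT_TAKEN)


def run(predictor, pc, target, outcomes):
    """Feed a sequence of actual outcomes for one branch; count correct guesses."""
    correct = 0
    for taken in outcomes:
        predicted_taken, _ = predictor.predict(pc)
        if predicted_taken == taken:
            correct += 1
        predictor.update(pc, taken, target)
    return correct


if __name__ == "__main__":
    loop_branch = [True] * 9 + [False]  # loop taken nine times, then falls through
    hits = run(BranchPredictor(), 0x8000, 0x7F00, loop_branch)
    print(hits, "correct,", hits * CYCLES_SAVED_PER_HIT, "cycles saved")
```

For a loop branch taken nine times and then not taken, this sketch mispredicts only the first and last iterations, which is the usual behavior of a 2-bit counter on loops.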


ARM Registers


  • R0 - 1st Argument, return value (temporary register)
  • R1 - 2nd argument, second 32 bits of a 64-bit double/int, return value (temporary register)
  • R2-R3 - Arguments (temporary register)
  • R4-R10 - R7 is the THUMB frame pointer, otherwise variable registers (permanent register)
  • R11 - ARM Frame Pointer (permanent)
  • R12 - General Use (temporary register)
  • R13 - Stack Pointer
  • R14 - Link Register
  • R15 - Program Counter
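
A small sketch of how the register roles above play out at a call. It assumes word-sized arguments, a little-endian layout with the low word of a 64-bit value in R0, and that arguments beyond the fourth go on the stack; the helper names are made up for illustration.

```python
ARG_REGS = ("R0", "R1", "R2", "R3")
MASK32 = 0xFFFFFFFF


def place_arguments(words):
    """Assign word-sized arguments to R0-R3; the remainder spill to the stack."""
    regs = dict(zip(ARG_REGS, words))
    stack = list(words[len(ARG_REGS):])
    return regs, stack


def split_64(value):
    """Split a 64-bit value into (R0, R1), low word first (assumed layout)."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value & MASK32, (value >> 32) & MASK32


def join_64(r0, r1):
    """Rebuild the 64-bit value a caller would read back from R0 and R1."""
    return ((r1 & MASK32) << 32) | (r0 & MASK32)


if __name__ == "__main__":
    print(place_arguments([10, 20, 30, 40, 50]))
    r0, r1 = split_64(0x1122334455667788)
    print(hex(r0), hex(r1))
```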


Instruction Set


  • Usual suspects: ADD, AND, BL, CMP, DVF, MOV, SWI, NOP
  • Interesting instructions: FDV, SWP (swap), FML (fast multiply), LDM (load multiple), SIN/COS/TAN, SETEND (set endianness)
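
To make two of the interesting instructions concrete, here is a rough model of what LDM and SETEND do to data. It is a sketch only: memory is a plain bytes object, ldmia mimics the increment-after form of LDM, and the big_endian flag stands in for the state that SETEND would change.

```python
def load_word(memory, address, big_endian=False):
    """Read one 32-bit word; big_endian models the mode selected by SETEND."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 4], order)


def ldmia(memory, base, registers, big_endian=False):
    """Model of LDM (increment after): lowest register gets the lowest address.

    Returns the loaded registers and the address following the last word.
    """
    loaded = {}
    address = base
    for reg in sorted(registers):
        loaded[f"R{reg}"] = load_word(memory, address, big_endian)
        address += 4
    return loaded, address


memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])

if __name__ == "__main__":
    print(ldmia(memory, 0, [0, 1]))                   # little-endian view
    print(ldmia(memory, 0, [0, 1], big_endian=True))  # same bytes after a SETEND-style switch
```

The same eight bytes produce different register values depending on the endianness setting, which is the reason a single instruction for switching it is notable.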