assembly-arm (Safety: 95)

AArch64 and ARM assembly skill for reading and writing ARM assembly code. Use when reading GCC/Clang output for AArch64 or ARM Thumb targets, writing inline asm in C/C++, understanding the ARM ABI (AAPCS64/AAPCS), or debugging register and stack state on ARM hardware or QEMU. Activates on queries about AArch64 assembly, ARM Thumb, NEON/SVE SIMD, ARM calling convention, inline asm for ARM, or reading ARM disassembly.

17 stars
1.2k downloads
Updated 2/20/2026

SKILL.md

ARM / AArch64 Assembly

Purpose

Guide agents through AArch64 (64-bit) and ARM (32-bit Thumb) assembly: registers, calling conventions, inline asm, and NEON/SVE SIMD patterns.

Triggers

  • "How do I read ARM64 assembly output?"
  • "What are the AArch64 registers and calling convention?"
  • "How do I write inline asm for ARM?"
  • "What is the difference between AArch64 and ARM Thumb?"
  • "How do I use NEON intrinsics?"

Workflow

1. Generate ARM assembly

# AArch64 (native or cross-compile)
aarch64-linux-gnu-gcc -S -O2 foo.c -o foo.s

# 32-bit ARM Thumb
arm-linux-gnueabihf-gcc -S -O2 -mthumb foo.c -o foo.s

# From objdump
aarch64-linux-gnu-objdump -d -S prog

# From GDB on target
(gdb) disassemble /s main
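
To have something concrete to feed into the commands above, a small foo.c along these lines could serve as input. The function name sum_array is hypothetical and not part of the original skill. A simple array loop is used because it tends to produce loads, adds, and a compare-and-branch that are easy to spot; the exact instructions depend on compiler version and flags.

```c
/* foo.c: hypothetical input for the commands above (an assumption, not from the skill). */
#include <stddef.h>

/* Sum n 64-bit integers. On an AAPCS64 target, `a` arrives in x0 and `n`
   in x1, and the result is returned in x0. */
long long sum_array(const long long *a, size_t n) {
    long long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Compiling this with the -S commands above should give a short listing in which the argument and return registers can be matched against the register table in the next step.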

2. AArch64 registers (AAPCS64)

| Register | Alias | Role |
|---|---|---|
| x0–x7 | | Arguments 1–8 and return values |
| x8 | xr | Indirect result location (struct return) |
| x9–x15 | | Caller-saved temporaries |
| x16–x17 | ip0, ip1 | Intra-procedure-call temporaries (used by linker) |
| x18 | pr | Platform register (reserved on some OS) |
| x19–x28 | | Callee-saved |
| x29 | fp | Frame pointer (callee-saved) |
| x30 | lr | Link register (return address) |
| sp | | Stack pointer (must be 16-byte aligned at call) |
| pc | | Program counter (not directly accessible) |
| xzr | wzr | Zero register (reads as 0, writes discarded) |
| v0–v7 | q0–q7 | FP/SIMD args and return |
| v8–v15 | | Callee-saved SIMD (lower 64 bits only) |
| v16–v31 | | Caller-saved temporaries |

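Width variants: general-purpose registers have two views, x0 (64-bit) and w0 (32-bit; a write to w0 zeroes the upper 32 bits of x0). FP/SIMD registers have separate views of v0: q0 (128-bit), d0 (64), s0 (32), h0 (16), b0 (8).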
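
The zero-extension behavior of w-register writes can be observed with a short sketch. The helper name low32_zero_extended is hypothetical, and the non-ARM branch is only a portable model added so the example builds on other hosts.

```c
#include <stdint.h>

/* Writing a w register clears bits 63:32 of the matching x register.
   On AArch64 this is shown with inline asm; elsewhere a C cast models
   the same result (an assumption for non-ARM hosts). */
static inline uint64_t low32_zero_extended(uint64_t x) {
#if defined(__aarch64__)
    uint64_t out;
    /* %w0 and %w1 select the 32-bit views of the operand registers. */
    __asm__ ("mov %w0, %w1" : "=r"(out) : "r"(x));
    return out;
#else
    return (uint64_t)(uint32_t)x;
#endif
}
```

This is why compilers targeting AArch64 rarely need an explicit instruction to widen an unsigned 32-bit value to 64 bits.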

3. AAPCS64 calling convention

Integer/pointer args: x0–x7
Float/SIMD args: v0–v7
Return: x0 (int), x0+x1 (128-bit), v0 (float/SIMD)
Callee-saved: x19–x28, x29 (fp), x30 (lr), v8–v15 (lower 64 bits)
Caller-saved: everything else

Stack must be 16-byte aligned at any bl or blr instruction.
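
As a rough illustration of these rules, the following C functions are annotated with where their arguments and results would live on an AAPCS64 target. All function and type names are hypothetical; the register comments describe AAPCS64 only and do not apply if the code is built for another architecture.

```c
#include <stdint.h>

/* Nine integer arguments: a..h travel in x0..x7, and i is passed on the stack. */
long sum9(long a, long b, long c, long d, long e,
          long f, long g, long h, long i) {
    return a + b + c + d + e + f + g + h + i;
}

/* Integer and FP arguments are numbered separately:
   n -> w0, x -> d0 (low half of v0), y -> d1; the result returns in d0. */
double scale_add(int n, double x, double y) {
    return n * x + y;
}

/* A 16-byte struct of two integers is returned in x0 and x1. */
typedef struct { uint64_t lo, hi; } pair128;
pair128 make_pair(uint64_t lo, uint64_t hi) {
    pair128 p = { lo, hi };
    return p;
}

/* A struct larger than 16 bytes is returned in memory: the caller passes
   the destination address in x8 (the indirect result register). */
typedef struct { uint64_t w[4]; } block256;
block256 make_block(uint64_t seed) {
    block256 b = { { seed, seed + 1, seed + 2, seed + 3 } };
    return b;
}
```

Compiling these with -S and comparing the output against the comments is a quick way to check one's reading of the convention.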

4. Common AArch64 instructions

| Instruction | Effect |
|---|---|
| mov x0, x1 | Copy register |
| mov x0, #42 | Load immediate |
| movz x0, #0x1234, lsl #16 | Move wide with zero: x0 = 0x1234 << 16, all other bits cleared |
| movk x0, #0xabcd | Move wide with keep: replace bits 15:0, leave the rest unchanged |
| ldr x0, [x1] | Load 64-bit from address in x1 |
| ldr x0, [x1, #8] | Load from x1+8 |
| str x0, [x1, #8] | Store x0 to x1+8 |
| ldp x0, x1, [sp, #16] | Load pair (two regs at once) |
| stp x29, x30, [sp, #-16]! | Store pair with pre-index writeback (sp -= 16, then store) |
| add x0, x1, x2 | x0 = x1 + x2 |
| add x0, x1, #8 | x0 = x1 + 8 |
| sub x0, x1, x2 | x0 = x1 - x2 |
| mul x0, x1, x2 | x0 = x1 * x2 |
| sdiv x0, x1, x2 | Signed divide |
| udiv x0, x1, x2 | Unsigned divide |
| cmp x0, x1 | Set flags for x0 - x1 |
| cbz x0, label | Branch if x0 == 0 |
| cbnz x0, label | Branch if x0 != 0 |
| bl func | Branch with link (call) |
| blr x0 | Branch with link to address in x0 |
| ret | Return (branch to x30) |
| ret x0 | Return to address in x0 |
| adrp x0, symbol | PC-relative address of the 4 KiB page containing symbol |
| add x0, x0, :lo12:symbol | Add the low 12 bits of the symbol address (completes the adrp pair) |
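
The movz/movk pair is how wider constants are typically assembled 16 bits at a time. A minimal sketch, using a hypothetical helper build_constant and a plain C model on non-ARM hosts (both are assumptions for illustration):

```c
#include <stdint.h>

/* Build 0x1234ABCD in two steps, the way a compiler might for a constant
   that does not fit a single mov immediate. */
static inline uint64_t build_constant(void) {
#if defined(__aarch64__)
    uint64_t v;
    __asm__ ("movz %0, #0x1234, lsl #16\n\t"  /* v = 0x12340000, other bits zero */
             "movk %0, #0xabcd"                /* replace bits 15:0, keep the rest */
             : "=r"(v));
    return v;
#else
    /* Portable model of the same two steps (assumption for non-ARM hosts). */
    uint64_t v = (uint64_t)0x1234 << 16;
    v = (v & ~(uint64_t)0xFFFF) | 0xABCD;
    return v;
#endif
}
```

A full 64-bit constant would need one movz followed by up to three movk instructions, each with a different lsl of 16, 32, or 48.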

5. Typical function prologue/epilogue

// Non-leaf function
stp  x29, x30, [sp, #-32]!   // save fp, lr; allocate 32 bytes
mov  x29, sp                  // set frame pointer
stp  x19, x20, [sp, #16]     // save callee-saved registers
// ... body ...
ldp  x19, x20, [sp, #16]     // restore
ldp  x29, x30, [sp], #32     // restore fp, lr; deallocate
ret

// Leaf function (no calls, no callee-saved regs needed)
// AAPCS64 defines no red zone, so sp must be adjusted before stack locals are used
sub  sp, sp, #16             // allocate locals
// ... body ...
add  sp, sp, #16
ret
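
To see these two shapes come out of a compiler, a pair of C functions along the following lines could be compiled with -S. The names add_one and apply_twice are hypothetical, and the comments describe what GCC and Clang commonly emit at -O2 rather than guaranteed output.

```c
/* Leaf: makes no calls and needs no callee-saved registers, so compilers
   usually emit no prologue at all (just the arithmetic and ret). */
long add_one(long x) {
    return x + 1;
}

/* Non-leaf: `f` must survive the first call, so a compiler will typically
   save x29/x30 with stp, keep `f` in a callee-saved register such as x19,
   and restore everything with ldp before leaving. */
long apply_twice(long (*f)(long), long x) {
    long once = f(x);   /* blr: overwrites x30, may clobber caller-saved regs */
    return f(once);     /* often becomes a tail call (br) at -O2 */
}
```

Comparing the two listings side by side shows which parts of the prologue exist only because of the call.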

6. Inline assembly (GCC/Clang)

// Barrier
__asm__ volatile ("dmb ish" ::: "memory");

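// Load acquire ("memory" clobber stops the compiler hoisting later accesses above it)
static inline int load_acquire(volatile int *p) {
    int val;
    __asm__ volatile ("ldar %w0, %1" : "=r"(val) : "Q"(*p) : "memory");
    return val;
}

// Store release ("memory" clobber stops the compiler sinking earlier accesses below it)
static inline void store_release(volatile int *p, int val) {
    __asm__ volatile ("stlr %w1, %0" : "=Q"(*p) : "r"(val) : "memory");
}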

// Read the virtual counter (uint64_t requires #include <stdint.h>)
static inline uint64_t read_cntvct(void) {
    uint64_t val;
    __asm__ volatile ("mrs %0, cntvct_el0" : "=r"(val));
    return val;
}

Constraints commonly used on AArch64:

  • "Q": memory operand addressed by a single base register with no offset, as ldar, stlr, ldxr, and stxr require
  • "r": any general-purpose register (a generic GCC constraint, not AArch64-specific)
  • "w": any FP/SIMD register
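
The examples above use "r" and "Q" but not "w". A minimal sketch of the "w" constraint, with a hypothetical helper fadd_f32 and a plain C fallback for non-ARM hosts (both assumptions for illustration):

```c
/* Single-precision add using the "w" constraint (FP/SIMD register).
   The %s operand modifier prints the 32-bit scalar view (s0..s31). */
static inline float fadd_f32(float a, float b) {
#if defined(__aarch64__)
    float r;
    __asm__ ("fadd %s0, %s1, %s2" : "=w"(r) : "w"(a), "w"(b));
    return r;
#else
    return a + b;  /* portable fallback (assumption for non-ARM hosts) */
#endif
}
```

In real code the compiler generates this instruction on its own; the sketch only shows how an FP operand is bound and how the operand modifier selects the register view.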

7. NEON SIMD intrinsics

#include <arm_neon.h>

// Add 4 floats at once
float32x4_t a = vld1q_f32(arr_a);   // load 4 floats
float32x4_t b = vld1q_f32(arr_b);
float32x4_t c = vaddq_f32(a, b);
vst1q_f32(result, c);

// Horizontal sum (vpaddq_f32 is AArch64-only; vaddvq_f32(c) gives the same total in one call)
float32x4_t sum = vpaddq_f32(c, c);
sum = vpaddq_f32(sum, sum);
float total = vgetq_lane_f32(sum, 0);

Naming convention: v<op><q>_<type>

  • q suffix: 128-bit (quad) vector; without it the intrinsic operates on a 64-bit vector
  • _f32: float32, _s32: int32, _u8: uint8, etc.
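
The fragment above assumes the array length is a multiple of four. A fuller sketch of the same pattern, with a scalar loop for the leftover elements, might look like this. The function name add_f32 is hypothetical, and the __ARM_NEON guard is there so the code also builds on hosts without NEON.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(float *out, const float *a, const float *b, size_t n) {
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Main loop: four floats per iteration in 128-bit q registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail: remaining 0..3 elements (or everything without NEON). */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Modern compilers often auto-vectorize a plain loop like the tail into similar code, so intrinsics are mainly worth writing when the generated assembly shows that they did not.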

For a register reference, see references/reference.md.

Related skills

  • Use skills/low-level-programming/assembly-x86 for x86-64 assembly
  • Use skills/compilers/cross-gcc for cross-compilation toolchain
  • Use skills/debuggers/gdb for debugging ARM code with gdbserver

Install

Requires askill CLI v1.0+

AI Quality Score

85/100 (analyzed 2/24/2026)

High-quality technical skill for ARM/AArch64 assembly with comprehensive register tables, calling conventions, instruction reference, inline asm examples, and NEON SIMD patterns. Well-structured with clear triggers and workflow. Somewhat deeply nested path but content is broadly applicable to ARM development. Tags seem mismatched but don't diminish the technical value.

Metadata

License: unknown
Version: -
Updated: 2/20/2026
Publisher: mohitmishra786

Tags

github-actions, prompting