GNU Multiple Precision (GMP) includes a LibMPN subsystem, with Assembly implementations for a wide variety of CPUs. Normally I'd discuss the x86_64 implementation since that's what I'm running, but instead I'll study the ARM64 implementation since it's simpler!
This includes routines for:
* Inverting a digit using bitwise logic, adds, multiplies, & subtracts.
* Hamming distance, which *mostly* consists of bitwise logic & an accumulator, with extensive control flow.
* Digit squaring with low, high, & mid branches.
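For the digit inversion, it helps to know what value comes out. As far as I understand GMP's convention, the inverse of a normalized 64-bit digit `d` (top bit set) is `floor((B^2 - 1) / d) - B` with `B = 2^64`. The sketch below (hypothetical name `invert_limb_ref`, relying on the GCC/Clang `unsigned __int128` extension) just computes that value with one 128-by-64 division. The routine described above gets there from bitwise logic, adds, multiplies & subtracts instead, so treat this as a reference for the result, not a model of the Assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Reference value of a digit ("limb") inverse, assuming GMP's convention:
   floor((2^128 - 1) / d) - 2^64 for a normalized d (top bit set).
   Hypothetical helper; uses the GCC/Clang unsigned __int128 extension. */
uint64_t invert_limb_ref(uint64_t d)
{
    assert(d >> 63);                                   /* d must be normalized */
    unsigned __int128 all_ones = ~(unsigned __int128)0; /* 2^128 - 1 */
    unsigned __int128 q = all_ones / d;                 /* lies in [2^64, 2^65) */
    return (uint64_t)(q - ((unsigned __int128)1 << 64));
}
```

For example, `invert_limb_ref(0xC000000000000000)` gives `0x5555555555555555`, and the largest digit `0xFFFFFFFFFFFFFFFF` gives `1`.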
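Stripped of its control flow, Hamming distance between two multi-digit numbers is the population count of their XOR, summed into an accumulator. A minimal sketch, assuming 64-bit digits and the GCC/Clang `__builtin_popcountll` builtin (`hamdist_ref` is a hypothetical name); it has none of the unrolling or tail handling an optimized version would need:

```c
#include <stddef.h>
#include <stdint.h>

/* Hamming distance between two n-digit numbers: count the differing bits
   digit by digit (XOR, then popcount) and sum them in an accumulator.
   Hypothetical helper; __builtin_popcountll is a GCC/Clang builtin. */
uint64_t hamdist_ref(const uint64_t *up, const uint64_t *vp, size_t n)
{
    uint64_t acc = 0; /* running count of differing bits */
    for (size_t i = 0; i < n; i++)
        acc += (uint64_t)__builtin_popcountll(up[i] ^ vp[i]);
    return acc;
}
```

With `a = {0xFF, 0}` and `b = {0x0F, ~0}`, the digits differ in 4 + 64 = 68 bits.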
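A low/high/mid split shows up naturally with two digits: (a1·B + a0)^2 = a1^2·B^2 + 2·a0·a1·B + a0^2, i.e. a low square, a high square, and a doubled middle cross term. Below is a plain schoolbook sketch of that idea for n digits (hypothetical `sqr_ref`, 64-bit digits, GCC/Clang `unsigned __int128`): it computes each cross product once, doubles them with a one-bit shift, then adds the diagonal squares. I'm not claiming this mirrors the branch structure of the ARM64 code; it only shows the arithmetic a squaring routine has to get right.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension */

/* Square the n-digit number a (little-endian 64-bit digits) into r[0..2n).
   r must not overlap a. Hypothetical helper for illustration only. */
void sqr_ref(uint64_t *r, const uint64_t *a, size_t n)
{
    /* 1. Cross ("mid") products a[i]*a[j] for i < j, each pair once. */
    for (size_t k = 0; k < 2 * n; k++)
        r[k] = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = i + 1; j < n; j++) {
            u128 t = (u128)a[i] * a[j] + r[i + j] + carry;
            r[i + j] = (uint64_t)t;
            carry = (uint64_t)(t >> 64);
        }
        r[i + n] = carry; /* no earlier row has written this digit yet */
    }

    /* 2. Double the cross terms by shifting r left one bit. */
    uint64_t hi_bit = 0;
    for (size_t k = 0; k < 2 * n; k++) {
        uint64_t next = r[k] >> 63;
        r[k] = (r[k] << 1) | hi_bit;
        hi_bit = next;
    }

    /* 3. Add the diagonal squares: low half at 2i, high half at 2i+1. */
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        u128 sq = (u128)a[i] * a[i];
        u128 t = (u128)r[2 * i] + (uint64_t)sq + carry;
        r[2 * i] = (uint64_t)t;
        t = (u128)r[2 * i + 1] + (uint64_t)(sq >> 64) + (uint64_t)(t >> 64);
        r[2 * i + 1] = (uint64_t)t;
        carry = (uint64_t)(t >> 64);
    }
}
```

Computing each cross product once and doubling is the usual reason squaring is cheaper than a general multiply: roughly half the digit products.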
1/