Last month, I learned how to program in x86-64 assembly. Assembly is rarely written by hand these days, but it still matters for optimizing programs and doing hardware-related work.
Assembly is all about registers and the instruction set. There are several mainstream instruction sets, and x86-64 is the dominant one on PCs.
For starters, assembly is about moving stuff into and out of registers, so we have to specify how to do those two things.

Above is the x86-64 register layout. Each register is 8 bytes (64 bits) wide, and several have conventional roles: for instance, %rax holds the return value of a function, %rsp holds the address of the top element of the stack, and %rdi holds the first argument of a function.
Now we can dig into how to move data between memory and registers. The instruction mov src, dest copies data from a source to a destination.
movq %rax, %rbx #move the content from %rax to %rbx
movq (%rsi), %rax #move the content from the memory that is referenced by %rsi to %rax. The content in %rsi is an address of the memory
movq %rax, (%rsi) #move the content from %rax to the memory location referenced by %rsi
movq 4(%rax), %rsi #move the content from the memory at address (%rax + 4) to %rsi
movq 3(%rax, %rdi, 2), %rsi #move the content in the memory at address (%rax + 2 * %rdi + 3) to %rsi
movq (%rax), (%rdi) #this is illegal. Cannot perform memory to memory operations
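The memory operands in the listing above all follow one pattern, disp(base, index, scale), which names the address disp + base + index * scale. Here is a rough Python model of that calculation. The function names, the toy memory dictionary, and the register values are all made up for illustration, and the model ignores details such as operand size.

```python
MASK64 = (1 << 64) - 1  # addresses are 64 bits wide


def effective_address(disp=0, base=0, index=0, scale=1):
    """Address named by the AT&T operand disp(base, index, scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (disp + base + index * scale) & MASK64


def movq_load(memory, address):
    """Toy model of `movq (address), reg`: fetch the value stored at address."""
    return memory[address]


# Hypothetical register contents and a toy memory keyed by address.
rax, rdi = 0x1000, 5
memory = {0x1004: 111, 0x100D: 222}

# movq 4(%rax), %rsi reads from %rax + 4
rsi = movq_load(memory, effective_address(disp=4, base=rax))

# movq 3(%rax, %rdi, 2), %rsi reads from %rax + 2 * %rdi + 3
rsi2 = movq_load(memory, effective_address(disp=3, base=rax, index=rdi, scale=2))
```

Any of the parts can be left out: (%rsi) is just a base, and 4(%rax) is a base plus a displacement.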
The mov instructions above are the basic ways of moving data around. Notice that the suffix ‘q’ at the end of mov means that the operation works on 8 bytes (a quad word). If you are dealing with other sizes, there are letters for those as well:
q is a quad word for 8 bytes
l is a long (double word) for 4 bytes
w is a word for 2 bytes --Notice that other architectures may define a word differently
b is a byte for 1 byte
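To make the suffixes concrete, here is a small Python sketch of which bytes end up in memory when the same register value is stored with each suffix. The helper name and the register value are hypothetical. The sketch relies on the fact that x86-64 stores multi-byte values little-endian, that is, least significant byte first.

```python
SUFFIX_BYTES = {"q": 8, "l": 4, "w": 2, "b": 1}


def stored_bytes(value, suffix):
    """Bytes that a mov with this suffix would write for a register value.

    Only the low-order bytes of the register are kept, and they are
    laid out least significant byte first.
    """
    size = SUFFIX_BYTES[suffix]
    low = value & ((1 << (8 * size)) - 1)
    return low.to_bytes(size, "little")


rax = 0x1122334455667788  # hypothetical register contents

for suffix in "qlwb":
    print(suffix, stored_bytes(rax, suffix).hex())
```

Each smaller suffix keeps only the low-order part of the value, which is also how the smaller register names relate to the full 64-bit register.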
Below is the diagram of how to reference registers in other sizes:

Now we introduce some basic instructions that can perform logic and arithmetic operations.
add src, dest #dest = dest + src
sub src, dest #dest = dest - src
imul src, dest #dest = dest * src
sal k, dest #dest = dest << k
sar k, dest #dest = dest >> k (arithmetic shift: the sign bit is preserved)
jmp dest #%rip = dest, %rip is the instruction pointer
cmp src, dest #set the flags based on dest - src, without storing the result. To learn more about the flags, visit the x86-64 reference sheet
...
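Here is a rough Python model of what these instructions compute on 64-bit registers, mostly to show the wraparound and what cmp does with the flags. The function names mirror the mnemonics and take their arguments in the same src, dest order, but they are my own sketch rather than any real API. I am assuming shift counts between 0 and 63 and modeling only four of the flags.

```python
BITS = 64
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << BITS) if x & SIGN else x


def add(src, dest):
    return (dest + src) & MASK  # results wrap around at 64 bits


def sub(src, dest):
    return (dest - src) & MASK


def imul(src, dest):
    return (dest * src) & MASK  # only the low 64 bits of the product are kept


def sal(k, dest):
    return (dest << k) & MASK


def sar(k, dest):
    return (to_signed(dest) >> k) & MASK  # the sign bit is copied in from the left


def cmp(src, dest):
    """Flags for cmp src, dest: computed from dest - src, result thrown away."""
    a, b = dest & MASK, src & MASK
    result = (a - b) & MASK
    return {
        "ZF": result == 0,                             # operands are equal
        "SF": bool(result & SIGN),                     # result looks negative
        "CF": a < b,                                   # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & SIGN),     # signed overflow
    }


def signed_less(flags):
    """Condition a signed 'jump if less' checks after cmp: dest < src."""
    return flags["SF"] != flags["OF"]
```

For example, signed_less(cmp(5, 3)) is true because dest (3) is less than src (5), which is the check a conditional jump would make after the cmp.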
There are many other instructions in x86-64, but the central idea is always moving data around and changing it.
Now, I want to talk about some of the optimizations that compilers perform when generating x86-64 code. As a rough rule, the more instructions a program executes, the longer it takes to run, although some instructions are much slower than others. To optimize code, a compiler tries to reduce the number and cost of the instructions it emits when translating from higher-level languages such as C or Java.
For instance, multiplication and division by powers of two can be done by shifting left and right:
salq $1, %rax #this is optimized version of %rax * 2
sarq $2, %rax #this is optimized version of %rax / 4 (for negative values the result rounds toward negative infinity)
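One catch with the right shift is worth spelling out: for negative numbers, an arithmetic shift rounds toward negative infinity, while C's / rounds toward zero. Compilers typically work around this by adding a small bias to negative values before shifting. The Python sketch below checks that idea on small numbers. The function names are my own, and it ignores overflow at the 64-bit boundary.

```python
def c_div(x, d):
    """Integer division as C defines it: the quotient truncates toward zero."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def sar_signed(x, k):
    """Arithmetic right shift; Python's >> on ints already keeps the sign."""
    return x >> k


def div_pow2(x, k):
    """Signed x / 2**k done with a shift, biased so it truncates toward zero."""
    bias = (1 << k) - 1 if x < 0 else 0
    return (x + bias) >> k


# The plain shift and C division agree for non-negative values,
# but can differ by one for negative values.
mismatches = [x for x in range(-8, 9) if sar_signed(x, 2) != c_div(x, 4)]
print(mismatches)
```

The left shift has no such problem: shifting left by 1 always matches multiplying by 2, as long as the result still fits in the register.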
Here’s another example: there is an instruction called “load effective address”, or lea. Here is an example:
leaq 3(%rdi, %rsi, 4), %rax #load the result of (%rdi + 4 * %rsi + 3) into %rax
As its name suggests, lea is meant for computing an address offset by some amount, but because it can add two registers, multiply one of them by 1, 2, 4, or 8, and add a constant, all in one instruction and without touching memory, compilers also use it for ordinary arithmetic. For instance, we can multiply %rdi by 5 using:
leaq (%rdi, %rdi, 4), %rdi
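To see why this trick works, and which multipliers it can reach, here is a short Python sketch of the value lea produces. The names lea and times5 are just for illustration. The only hard rule modeled is that the scale has to be 1, 2, 4, or 8.

```python
def lea(disp=0, base=0, index=0, scale=1):
    """Value that leaq disp(base, index, scale) writes to its destination."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (disp + base + index * scale) % (1 << 64)


def times5(x):
    # leaq (%rdi, %rdi, 4), %rdi computes %rdi + 4 * %rdi
    return lea(base=x, index=x, scale=4)


# Multipliers reachable with a single lea when base and index
# are the same register: 1 + scale for each legal scale.
single_lea_multipliers = sorted(1 + scale for scale in (1, 2, 4, 8))
print(single_lea_multipliers)
```

So a single lea covers multiplying by 2, 3, 5, or 9. Other constants need a different approach, such as combining several instructions or a real multiply.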
Also, to set a register to zero, we can use xor instead of mov, which has a shorter encoding than moving a constant 0:
xor %rdi, %rdi #To set %rdi to 0
There are many other optimizations that a compiler does to make programs faster; these are just a few.
Learning assembly isn’t that hard, and it adds another skill to your resume. If you want to learn more details, read this book, which covers the topic in much more depth.