PROWAREtech
x86-64 Assembly: Tutorial - A Quick Guide to the Changes in 64-bit Assembly - Page 2
Be familiar with x86 assembly as the differences between x86 and x64 assembly are minor.
This code was compiled and linked using the Microsoft Macro Assembler for Visual Studio 2022.
Instructions Compared to x86
Just like in x86, instruction operands must be the same size and at least one must be a register.
cmp rbx, ecx ; cannot do this
cmp rdx, rbx ; okay
mov rbx, ecx ; cannot do this
mov ebx, rax ; cannot do this
mov ebx, QWORD PTR [rax] ; cannot do this
mov rbx, 12h ; okay
mov rdx, rbx ; okay
mov rbx, QWORD PTR [rax] ; okay
mov ebx, DWORD PTR [eax] ; okay
add rbx, ecx ; cannot do this
add rbx, rcx ; okay
xchg rbx, ecx ; cannot do this
xchg rbx, rcx ; okay
div 10h ; cannot do this
mov [rbx], QWORD PTR [rax] ; cannot do this
Microsoft's x64 Calling Convention
Passing function/procedure parameters in x64 uses the "fastcall" calling convention. This means the first four QWORD parameters are passed in the rcx
, rdx
, r8
and r9
registers. More than four parameters and the stack is used to pass them to the function. The same follows for DWORD parameters except ecx
, edx
, r8d
and r9d
are used. See the following diagram:
Param | QWORD | DWORD | WORD | BYTE |
---|---|---|---|---|
1 | rcx | ecx | cx | cl |
2 | rdx | edx | dx | dl |
3 | r8 | r8d | r8w | r8b |
4 | r9 | r9d | r9w | r9b |
5+ | uses the stack | |||
*x64 fastcall diagram |
Microsoft adds something called shadow space to the stack. This is done by subtracting 32 from the RSP
register. This shadow space represents the four 64-bit variables (32 bytes) that would have been passed on the stack if not for passing them in registers. This space is not wasted, it can be used to save ECX
, EDX
, r8
and r9
in the event a sub-procedure needs to be called, particularly a recursive one. So every time the stack is used, it must be done with 32 (or 20h) added to it. This is done in many of the examples below.
Consider the following C/C++ code:
// delcare the assembly function
extern "C" long long addx64(long long a, long long b, long long c, long long d, long long* num1, long long* num2, int *num3, int num4, int num5);
int main()
{
long long num1 = 0x1000, num2 = 0x2000;
int num3 = 0x4000;
long long x = addx64(0x1, 0x2, 0x3, 0x4, &num1, &num2, &num3, 0x8000, 0x10000);
return 0;
}
Parameters 1-4 are passed in 64-bit registers; parameters 5-8 are passed on the stack:
_TEXT SEGMENT
addx64 PROC
cmp rcx, 1h ; parameter 1 (a - passed in rcx)
jne exit
mov rax, rcx
cmp rdx, 2h ; parameter 2 (b - passed in rdx)
jne exit
add rax, rdx
cmp r8, 3h ; parameter 3 (c - passed in r8)
jne exit
add rax, r8
cmp r9, 4h ; parameter 4 (d - passed in r9)
jne exit
add rax, r9
mov r10, [rsp+(1+0)*8+32] ; parameter 5 (&num1 - passed on the stack)
add rax, [r10]
mov r10, [rsp+(2+0)*8+32] ; parameter 6 (&num2 - passed on the stack)
add rax, [r10]
mov r10, [rsp+(3+0)*8+32] ; parameter 7 (&num3 - passed on the stack)
mov r10d, [r10]
add rax, r10
mov r10d, [rsp+(4+0)*8+32] ; parameter 7 (num4 - passed on the stack)
add rax, r10
mov r10d, [rsp+(5+0)*8+32] ; parameter 8 (num5 - passed on the stack)
add rax, r10
exit:
ret
addx64 ENDP
_TEXT ENDS
END
Consider the following C/C++ code:
// delcare the assembly function using six 32-bit parameters this time and three 64-bit values (pointers)
extern "C" int addx64(int a, int b, int c, int d, int* num1, int* num2, int *num3, int num4, int num5);
int main()
{
int num1 = 0x1000, num2 = 0x2000;
int num3 = 0x4000;
int x = addx64(0x1, 0x2, 0x3, 0x4, &num1, &num2, &num3, 0x8000, 0x10000);
return 0;
}
Parameters 1-4 are passed in 32-bit registers; parameters 5-8 are passed on the stack:
_TEXT SEGMENT
addx64 PROC
cmp ecx, 1h ; parameter 1 (a - passed in ecx)
jne exit
mov eax, ecx
cmp edx, 2h ; parameter 2 (b - passed in edx)
jne exit
add eax, edx
cmp r8d, 3h ; parameter 3 (c - passed in r8d)
jne exit
add eax, r8d
cmp r9d, 4h ; parameter 4 (d - passed in r9d)
jne exit
add eax, r9d
mov r10, QWORD PTR [rsp+(1+0)*8+32] ; parameter 5 (&num1 - passed on the stack)
mov r10d, DWORD PTR [r10]
add eax, r10d
mov r10, QWORD PTR [rsp+(2+0)*8+32] ; parameter 6 (&num2 - passed on the stack)
mov r10d, DWORD PTR [r10]
add eax, r10d
mov r10, QWORD PTR [rsp+(3+0)*8+32] ; parameter 7 (&num3 - passed on the stack)
mov r10d, DWORD PTR [r10]
add eax, r10d
mov r10d, DWORD PTR [rsp+(4+0)*8+32] ; parameter 7 (num4 - passed on the stack)
add eax, r10d
mov r10d, DWORD PTR [rsp+(5+0)*8+32] ; parameter 8 (num5 - passed on the stack)
add eax, r10d
exit:
ret
addx64 ENDP
_TEXT ENDS
END
*Notice that the code is more easily understood using QWORD PTR and DWORD PTR but, as with x86 assembly, the x64 assembler knows which value is being copied based on the first operand.
The same holds true for passing WORD and BYTE values; that is the first four parameters are passed in 16- and 8-bit registers, respectively, and then the rest are passed on the stack.
Consider the following C/C++ code which passes four different sized parameters:
extern "C" long long addx64(char a, short b, int c, long long d);
int main()
{
long long val = addx64(1, 2, 4, 8);
return 0;
}
It is as simple as this to handle an 8-, 16-, 32- and 64-bit values.
_TEXT SEGMENT
addx64 PROC
xor rax, rax ; zero-out rax
add al, cl ; add bytes
add ax, dx ; add words
add eax, r8d ; add dwords
add rax, r9 ; add qwords
ret
addx64 ENDP
_TEXT ENDS
END