Introduction to writing x64 assembly in Visual Studio

This article complements the previous article about writing x86 assembly code in Visual Studio. Here, I will show you how to extend that knowledge and give you a head start on writing x64 assembler code instead.

The Setup

Configuring your project

If you did not yet create a new project, please follow the “Setup” steps described in the x86 article.

After you have created a new project, let us get started by configuring it for two things:

  • Assembling and linking ASM files from the IDE
  • Building with a new x64 configuration

Note: If you don’t want to start from scratch, I suggest you download the source code used in this article from here and modify the project / sources as you see fit.

Adding ASM files compilation support to your project

In order to instruct Visual Studio’s build process to assemble and link ASM files found in your project with ml or ml64, you need to do the following steps:

Step 1

Right click on your project’s name and choose “Build Dependencies” then “Build Customizations…”

image

Step 2

Add the MASM files customization support by checking the “masm (.targets, .props)” option:

image

That’s it!

Now, if your project has .asm files, building the project will not only compile the C/C++ files but also assemble the .asm files into object files (.obj), ready to be linked into your project. That is very convenient because it saves you the trouble of manually invoking ml64.exe or setting up pre- and post-build steps.

Adding the x64 project configuration

In Visual Studio, choose the “Build” menu and then select “Configuration Manager”. The following dialog will appear:

image

Now press on the “Active solution platform” combo and select “<New>”:

image

Now, select the “x64” platform and then choose to copy settings from the “Win32” platform.

Press OK and confirm that now you have “x64” listed under the “Platform” column:

image

Now press “Close” to finish!

Selecting the x64 configuration

Having created the configuration in the previous step, make sure that it is selected. One easy way is to navigate to the toolbar and select the “x64” configuration:

image

The other way to achieve that is to select it from the “Build/Configuration Manager” dialog:

image

Set up the debugging environments

It will be very handy to have the proper debug windows when intermixing ASM and C/C++ code together. Press F10 to start the debugging sessions and then enable the following windows from the “Debug/Windows” menu:

  • Registers
  • Memory
  • Disassembly

image

Also note that you can press F9 to put breakpoints anywhere in the executable code of the ASM source.

Now that we are done with the setup, let me proceed and give you some background information about writing x64 code.

Basic Background

Here are some things that you need to keep in mind when writing x64 assembler code:

  • The Visual C++ compiler does not support inline assembler code when targeting x64. Instead, you have to write the ASM code in a separate ASM file, assemble it (with ML64) and link the resulting object file with your project.
  • In many cases, compiler intrinsics can be used to issue specific machine instructions without writing any ASM code
  • There are more general purpose registers in x64. Each register is 8 bytes long or one QWORD.
    • The regular 32bit general purpose registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP become RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP in x64.
    • In addition to those 8 registers, there are 8 more registers: R8, R9, R10, R11, R12, R13, R14, R15.
    • Technically speaking, the registers RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI occupy register numbers 0 to 7 in the instruction encoding, so you can think of them as R0 to R7.
  • Each stack operation (say a PUSH/POP/CALL/RET/etc…) consumes a QWORD.
  • There is only one calling convention supported by default, and that is the “fast calling convention”
    • Registers RCX, RDX, R8 and R9 are used to pass the first four arguments
    • The remainder of the arguments must be pushed/passed on the stack
    • Even though the first four arguments are passed via the RCX, RDX, R8 and R9 registers, the stack pointer should still be decreased by 8 * 4 = 32 bytes prior to a function call. That area is called the register shadow area.
    • The caller is responsible for allocating a shadow area for the 4 registers prior to calling a function, even if the callee takes no arguments.
  • 16-byte alignment is essential for various XMM instructions. Therefore, if you are going to use the Windows APIs or other foreign libraries, make sure that the stack pointer is 16-byte aligned at the point of the CALL instruction. Upon the called function’s entry, RSP is then 8 bytes off a 16-byte boundary because of the pushed return address.
  • Do not mess up the register values. The compiler expects certain non-volatile registers to be preserved across function calls
    • The non-volatile registers are: RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15. A function that modifies them must restore them before returning
    • The volatile registers are: RAX, RCX, RDX, R8, R9, R10, R11
  • Returning values:
    • The RAX register is used to return integer values up to 64bits in size
    • The XMM0 register is used to return floating-point and vector types (float, double, __m128, …)
  • SEH can no longer be installed like in x86 by adding an exception record into the FS:0 list
    • Special consideration for unwinding must be taken
    • There will be an example in this article on how to write an ASM routine whose exceptions can be caught inside your C code’s try/except
    • Check the “Exception Handling (x64)” topic on MSDN for more information.

For more information, check the “x64 Software Conventions” article on MSDN.
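The shadow area and alignment rules above boil down to a bit of arithmetic. The following C++ sketch is only an illustration of that arithmetic: the helper names (round_up_16, stack_arg_bytes, frame_bytes) and their parameters are my own and not part of any toolchain. It assumes that RSP is 8 bytes off a 16-byte boundary on function entry (because of the pushed return address) and computes how many bytes a prolog could subtract from RSP so that RSP is 16-byte aligned at every CALL.

```cpp
#include <cstddef>

// Hypothetical helpers, for illustration only.

// Round n up to the next multiple of 16.
constexpr std::size_t round_up_16(std::size_t n)
{
    return (n + 15) & ~static_cast<std::size_t>(15);
}

// Arguments beyond the fourth are passed on the stack, one QWORD each.
constexpr std::size_t stack_arg_bytes(std::size_t max_call_args)
{
    return max_call_args > 4 ? (max_call_args - 4) * 8 : 0;
}

// Bytes to subtract from RSP in the prolog.
//   max_call_args : largest argument count among the functions we call
//   local_bytes   : extra local storage we want
//   pushed_regs   : registers pushed before the SUB (for example RBP)
// "8 + pushed_regs * 8" is what is already on the stack below the caller's
// 16-byte boundary: the return address plus the pushed registers.
constexpr std::size_t frame_bytes(std::size_t max_call_args,
                                  std::size_t local_bytes,
                                  std::size_t pushed_regs)
{
    return round_up_16(8 + pushed_regs * 8      // already on the stack
                       + 4 * 8                  // shadow area, always present
                       + stack_arg_bytes(max_call_args)
                       + local_bytes)
           - (8 + pushed_regs * 8);
}

static_assert(frame_bytes(4, 0, 0) == 40, "no pushes: sub rsp, 28h");
static_assert(frame_bytes(4, 0, 1) == 32, "after push rbp: sub rsp, 20h");
static_assert(frame_bytes(6, 0, 0) == 56, "two stack arguments");
static_assert(frame_bytes(4, 16, 1) == 48, "push rbp plus two spare QWORDs");
```

The last case, one pushed register plus two spare QWORDs, gives 48 bytes, which is the same amount as the 8 * (4 + 2) allocated by the hello world function later in this article.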

Writing x64 assembly code

Let’s get started by creating an ASM file called “asm64code.asm” with the following template body:

_DATA SEGMENT

_DATA ENDS

_TEXT SEGMENT

_TEXT ENDS

END

We will use that template to add our code slowly as we progress in this article.

Defining data

To define read/write data, you have to define the data within a read/write data segment. Here’s how to do it:

_DATA SEGMENT
hello_msg db "Hello world", 0
info_msg  db "Info", 0
_DATA ENDS

To define read-only and executable data, do that inside the text segment:

_TEXT    SEGMENT
hello_msg db "Hello world", 0
info_msg  db "Info", 0
_TEXT    ENDS

Referencing external data and functions

In the assembler code we are going to write, we will need to refer to data and functions residing in other modules (libraries or object files). For that, we can use the EXTERN directive.

In the following C code, we define a few variables and declare a function that we will be referring to from the ASM code:

extern "C"
{
char c_ext_byte = 1;
unsigned short c_ext_word = 2;
long c_ext_dword = 3;
__int64 c_ext_qword = 4;
void *c_ext_ptr = (void *)(5);
void c_ext_my_function();
}

Note that I wrap all those definitions inside an extern "C" block just because I don’t want the names to be decorated (or mangled).

Now, to make those variables and the function accessible to our ASM code, we have to declare the proper EXTERN directives like this:

EXTERN c_ext_byte: byte
EXTERN c_ext_word: word
EXTERN c_ext_dword: dword
EXTERN c_ext_qword: qword
EXTERN c_ext_ptr: qword
EXTERN c_ext_my_function: PROC

EXTERN MessageBoxA: PROC
EXTERN GetForegroundWindow: PROC
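The size written after each EXTERN name has to match the size of the C object. The C types used above have those sizes under Visual C++ (in particular, long is 4 bytes on Windows x64, while it is 8 bytes on most 64-bit Unix-like systems). As a sketch, and assuming you are free to change the declarations, fixed-width types make the mapping explicit and let the compiler check it. The sketch_ names are hypothetical and only mirror the c_ext_ variables:

```cpp
#include <cstdint>

extern "C"
{
    // Hypothetical counterparts of the c_ext_* variables, with fixed-width types.
    std::int8_t   sketch_byte  = 1;        // EXTERN sketch_byte:  byte
    std::uint16_t sketch_word  = 2;        // EXTERN sketch_word:  word
    std::int32_t  sketch_dword = 3;        // EXTERN sketch_dword: dword
    std::int64_t  sketch_qword = 4;        // EXTERN sketch_qword: qword
    void*         sketch_ptr   = nullptr;  // EXTERN sketch_ptr:   qword
}

// Compile-time checks: the build fails if a size ever stops matching.
static_assert(sizeof(sketch_byte)  == 1, "byte is 1 byte");
static_assert(sizeof(sketch_word)  == 2, "word is 2 bytes");
static_assert(sizeof(sketch_dword) == 4, "dword is 4 bytes");
static_assert(sizeof(sketch_qword) == 8, "qword is 8 bytes");
static_assert(sizeof(sketch_ptr)   == 8, "a pointer is a QWORD only on a 64-bit target");
```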

To put it into perspective, we can write the following ASM code to access this data. Like all code, it has to be placed between “_TEXT SEGMENT” and “_TEXT ENDS” in the template:

PUBLIC access_extern_data

access_extern_data PROC
; Dereference all the data according to each data's size
mov al, byte ptr [c_ext_byte]
mov ax, word ptr [c_ext_word]
mov eax, dword ptr [c_ext_dword]
mov rax, qword ptr [c_ext_qword]

; Remember, a pointer is just a QWORD
mov rax, qword ptr [c_ext_ptr]

; Similarly, a function's address is also a QWORD
mov rax, offset c_ext_my_function
sub rsp, 4 * 8 + 8 ; Register shadow area + 8 bytes to keep RSP 16-byte aligned
call rax ; call the C function
add rsp, 4 * 8 + 8 ; Restore the stack

ret
access_extern_data ENDP

Hello world in x64

Okay, now we are ready to write our hello world function in x64 ASM:

EXTERN MessageBoxA: PROC
EXTERN GetForegroundWindow: PROC

PUBLIC hello_world_asm

hello_world_asm PROC

push rbp ; save frame pointer
mov rbp, rsp ; set frame pointer
sub rsp, 8 * (4 + 2) ; allocate shadow register area + 2 QWORDs for stack alignment

; Get a window handle
call GetForegroundWindow
mov rcx, rax

; WINUSERAPI int WINAPI MessageBoxA(
;  RCX =>  _In_opt_ HWND hWnd,
;  RDX =>  _In_opt_ LPCSTR lpText,
;  R8  =>  _In_opt_ LPCSTR lpCaption,
;  R9  =>  _In_ UINT uType);

mov rdx, offset hello_msg
mov r8, offset info_msg
mov r9, 0 ; MB_OK

and rsp, not 8 ; align stack to 16 bytes prior to API call
call MessageBoxA

; epilog. restore stack pointer
mov rsp, rbp
pop rbp

ret
hello_world_asm ENDP

Things to note:

  • Make sure you declare the proper EXTERNs so we can use the Windows APIs
  • Create a prolog so we can modify the stack pointer and allocate local stack storage area
  • Since this is handwritten assembler code, it is hard to allocate the proper amount of stack to achieve the right stack pointer alignment. Allocate a couple more QWORDs so we can adjust/align the stack pointer prior to foreign API calls that have the potential to invoke XMM instructions
  • Align the stack pointer to 16 bytes prior to calling a foreign API
  • Restore the stack pointer
  • Return to the caller

Now to call this function from the C/C++ code, we have to make sure that it is defined as PUBLIC in the ASM code, and also declare its prototype in the C/C++ code:

extern "C" void hello_world_asm(); // extern "C" is needed when this file is compiled as C++
int main(int argc, char *argv[])
{
  hello_world_asm();
  return 0;
}

image

Now, had we not aligned the stack pointer to 16 bytes prior to calling “MessageBoxA”, we would risk an access violation exception deep down in user32.dll if and when an XMM instruction that requires aligned memory (such as MOVAPS) is used:

image
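To see what the “and rsp, not 8” instruction from the hello world function does to the stack pointer, here is the same bit operation as a small C++ sketch. The function names are mine, for illustration only. Note that clearing bit 3 yields a 16-byte aligned value only if RSP is already a multiple of 8, which holds as long as the stack is only moved in QWORD steps; the second helper shows the more general “round down to 16” form.

```cpp
#include <cstdint>

// What "and rsp, not 8" computes: clear bit 3 of the stack pointer.
constexpr std::uint64_t and_not_8(std::uint64_t rsp)
{
    return rsp & ~static_cast<std::uint64_t>(8);
}

// The more general form: round down to a 16-byte boundary
// (the equivalent of "and rsp, -16").
constexpr std::uint64_t align_down_16(std::uint64_t rsp)
{
    return rsp & ~static_cast<std::uint64_t>(15);
}

static_assert(and_not_8(0x1008) == 0x1000, "off by 8: moved down to the boundary");
static_assert(and_not_8(0x1000) == 0x1000, "already aligned: unchanged");
static_assert(and_not_8(0x1004) == 0x1004, "not a multiple of 8: still misaligned");
static_assert(align_down_16(0x1004) == 0x1000, "general form handles any value");
```

Both forms only ever lower RSP, which is why the function keeps the original value in RBP and restores it in the epilog.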

A simple function with two inputs and one out argument

We are going to write a very simple function that adds two numbers. It is the logical equivalent of the following C function:

__int64 c_add_by_ref(int a, int b, __int64 *r)
{
  *r = a + b;
  return *r;
}

In ASM x64, one way to write it is like this:

; __int64 add_by_ref(int a, int b, __int64 *r)
add_by_ref PROC
movsxd rax, ecx
movsxd rdx, edx
add    rax, rdx ; result in RAX
mov qword ptr [r8], rax ; store / deref
ret
add_by_ref ENDP
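One subtle point: the C version adds two ints and only then widens the result, whereas the assembly sign-extends both inputs with MOVSXD before adding, so its sum is computed in 64 bits. A C++ rendering that mirrors the assembly more closely might look like the following sketch. The names widened_sum and add_by_ref_widened are mine and are not part of the article's source code.

```cpp
#include <cstdint>

// Sign-extend both 32-bit inputs to 64 bits, then add.
// This is what the MOVSXD + ADD sequence does.
constexpr std::int64_t widened_sum(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int64_t>(a) + static_cast<std::int64_t>(b);
}

// Same shape as add_by_ref: store the sum through r and also return it.
inline std::int64_t add_by_ref_widened(std::int32_t a, std::int32_t b, std::int64_t* r)
{
    *r = widened_sum(a, b);
    return *r;
}

// A 64-bit sum of two 32-bit values cannot overflow.
static_assert(widened_sum(INT32_MAX, 1) == 2147483648LL, "no wrap-around at INT32_MAX");
static_assert(widened_sum(INT32_MIN, -1) == -2147483649LL, "no wrap-around at INT32_MIN");
```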

Working with structures

A common need when writing x64 assembler code is working with structures that are common to C/C++ and the ASM code.

Suppose the following C structure:

#pragma pack(push, 1)
struct MyStruct
{
char b;
void *buf;
void (WINAPI *fptr)(int a);
__int64 sig;
};
#pragma pack(pop)

Let us now assume there’s an assembler function called “init_struct()” that initializes this structure and populates it so that the C code can use it.

Its C prototype is:

void init_struct(MyStruct *st);

Now, here’s an example syntax on how to define the equivalent structure in ASM:

MyStruct struct
b  db ?
buf  dq ?
fptr  dq ?
sig dq ?
MyStruct ends

…followed by the implementation of “init_struct()” in assembler:

; void init_struct(MyStruct *st);
init_struct PROC
; st->b = 1
mov byte ptr MyStruct.b[rcx], 1

; st->buf = &buf
mov rax, offset buf
mov qword ptr MyStruct.buf[rcx], rax

; st->fptr = fptr
lea rax, non_public_func
mov qword ptr MyStruct.fptr[rcx], rax

; st->sig = value
mov rax, 0badbeefbabeh
mov qword ptr MyStruct.sig[rcx], rax

ret
init_struct ENDP
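Since the C and the ASM definitions of MyStruct have to agree byte for byte, it can be worth letting the C++ compiler verify the offsets that the ASM side relies on. Here is a minimal sketch, assuming a 64-bit target and that the MASM structure is laid out without padding, exactly as its db/dq fields are listed. The name MyStructSketch is hypothetical, and WINAPI is left out so that the sketch also builds outside Windows.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct MyStructSketch       // hypothetical copy of MyStruct
{
    char b;                 // b    db ?
    void* buf;              // buf  dq ?
    void (*fptr)(int a);    // fptr dq ?
    std::int64_t sig;       // sig  dq ?
};
#pragma pack(pop)

static_assert(sizeof(void*) == 8, "this sketch assumes a 64-bit target");

// With 1-byte packing there is no padding after the leading char.
static_assert(offsetof(MyStructSketch, b)    == 0,  "b at offset 0");
static_assert(offsetof(MyStructSketch, buf)  == 1,  "buf at offset 1");
static_assert(offsetof(MyStructSketch, fptr) == 9,  "fptr at offset 9");
static_assert(offsetof(MyStructSketch, sig)  == 17, "sig at offset 17");
static_assert(sizeof(MyStructSketch)         == 25, "1 + 8 + 8 + 8 bytes");
```

If the #pragma pack lines were removed, the compiler would insert 7 bytes of padding after b and the offsets would no longer match the ASM definition.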

Unwindable ASM function

In this last section, I will illustrate how to write an ASM function that is unwindable in the event of an exception occurring.

First, let me write a non-unwindable function and illustrate what the problem could be.

Assume the following ASM function that generates an access violation exception:

; AV generating function
cause_av_bad PROC
push rbp
mov rbp, rsp

push 1
push 2
push 3
sub rsp, 0400h

xor rax, rax
mov rax, [rax] ; cause A/V

add rsp, (8 * 3) + 0400h
mov rsp, rbp
pop rbp

ret
cause_av_bad ENDP

Note that this dummy function modifies the stack pointer. A function without unwind information is treated as a leaf function that never touches RSP, so doing that without properly generating unwind information makes this function non-unwindable, thus making the exception non-resumable.

Now, let’s try to invoke this function from our C code:

void call_cause_av_bad()
{
  __try
  {
    printf("Calling faulty ASM function...\n");
    cause_av_bad();
  }
  __except(EXCEPTION_EXECUTE_HANDLER)
  {
    printf("caught exception...\n");
  }
}

And we get this exception that is not resumable. It will keep on triggering and the “__except” block will not be reached:

image

Now I will illustrate how to write an ASM function with proper unwind information:

cause_av_good PROC FRAME
push rbp ; save previous frame pointer
.pushreg rbp ; encode unwind info
mov rbp, rsp ; set new frame pointer
.setframe rbp, 0 ; encode frame pointer
.endprolog

; feel free to modify the stack now
push 1
push 2
push 3
sub rsp, 0400h

xor rax, rax
mov rax, [rax] ; cause AV

add rsp, (8 * 3) + 0400h
mov rsp, rbp
pop rbp

ret
cause_av_good ENDP

The difference here is in the ASM procedure definition (note the additional FRAME keyword after PROC) and in the unwind information directives (.pushreg, .setframe, and .endprolog).

The unwind directives encode which register is the frame pointer and which stack operations took place before the frame pointer was set. This information is vital when an exception occurs and the exception handler is dispatched.

Here’s another example showing how to encode other stack pointer manipulation prior to setting the frame pointer:

cause_av_good2 PROC FRAME
sub rsp, 020h ; allocate stack space
.allocstack 020h ; encode that change
push rbp ; save old frame pointer
.pushreg rbp ; encode stack operation
mov rbp, rsp ; set new frame pointer
.setframe rbp, 0 ; encode frame pointer
.endprolog

; any stack pointer modifications here on are okay...
sub rsp, 080h

; we can unwind from the following AV because of the frame pointer
xor rax, rax
mov rax, [rax] ; cause AV

; properly restore the stack pointer (in case exception did not happen or the handler corrected the situation)
mov rsp, rbp
pop rbp
add rsp, 020h


ret
cause_av_good2 ENDP
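To get a feeling for what the unwinder does with this information, here is a toy model in C++. It is a simplification of my own making (the real unwind codes are a packed binary format, and all the names below are hypothetical). It starts from the frame pointer, undoes the recorded prolog steps newest first, and reports how far above RBP the return address sits.

```cpp
#include <cstddef>

// Toy model only: not the real unwind data format.
enum class PrologOp { PushReg, AllocStack };

struct PrologStep
{
    PrologOp    op;     // kind of step, kept for documentation
    std::size_t bytes;  // how far the step moved RSP down
};

// cause_av_good: push rbp / .pushreg rbp
constexpr PrologStep kGoodProlog[] = {
    { PrologOp::PushReg, 8 },
};

// cause_av_good2: sub rsp, 020h / .allocstack 020h, then push rbp / .pushreg rbp
constexpr PrologStep kGood2Prolog[] = {
    { PrologOp::AllocStack, 0x20 },
    { PrologOp::PushReg,    8    },
};

// Undo the first `count` recorded steps in reverse order and return the
// distance from the frame pointer (set with ".setframe rbp, 0") to the
// stack slot that holds the return address.
constexpr std::size_t return_address_offset(const PrologStep* steps, std::size_t count)
{
    return count == 0
        ? 0
        : steps[count - 1].bytes + return_address_offset(steps, count - 1);
}

static_assert(return_address_offset(kGoodProlog, 1) == 0x08, "return address at [rbp + 8]");
static_assert(return_address_offset(kGood2Prolog, 2) == 0x28, "return address at [rbp + 28h]");
```

Whatever happens to RSP after .endprolog does not enter the calculation at all, which is why the body of the function is free to push values and allocate stack space.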

Now, if we call either of the previous two ASM functions, the C exception handler will know how to unwind gracefully:

void call_cause_av_good()
{
  __try
  {
    printf("Calling faulty ASM function that has unwind info...\n");
    cause_av_good();
  }
  __except (EXCEPTION_EXECUTE_HANDLER)
  {
    printf("caught exception...\n");
  }
  printf("function returned!\n");
}

Observe how the function executes all the way to the end:

image

Conclusion

In conclusion, I hope that this article gave you a head start as promised. Please make sure you go to my GitHub repository and download the source code for this project.

13 Replies to “Introduction to writing x64 assembly in Visual Studio”

  1. It would help if the author clarifies what content goes into each file.

    For instance, what exactly is the full content of asm64.asm? Where does the extern “C”{} code go? In the asm file?

    As of now, for the reader, it is completely unclear how many files are needed, C file, asm file, etc., how many of these are to be loaded into the project in order to recreate the “Hello World”.

    1. Hello user,

      I am the author of this article.

      Try downloading the source code mentioned in the article from GitHub to see the big picture.

      Let me know if that helps please.

      1. Yes, actually, I hadn’t looked down in the page where the link was provided. Thank you for this.

        I have been able to compile the code and step through it in Visual studio 2013.

        BTW, can you suggest any good book for 64 bit visual studio programming?

        I am working my way through Dantemann, Pappas and Abrash’s series of 3 books, but while they are good from a conceptual POV, the code provided there does not compile in VS.

  2. No, the git didn’t work. I could not find a way to get the code. All I can see is a thing about accessing arrays.
    The assembler says “error A2034: must be in segment block : access_extern_data” followed by 10 more of the same error and “fatal error A1010: unmatched block nesting : access_extern_data” at the end. I am on Win10 using VS2017 in VS2015 target. As the previous User suggested a bit of editing, saying what goes where, would turn this into a useful tutorial.

  3. On normal return the last example (using .allocstack) would have restored a bogus value for rbp (because the restore order does not match the save order). The save order on the stack is:

    – decrement RSP by 20h
    – save RBP
    – decrement RSP by 80h

    The restoration order should be the opposite of that, but it instead does:

    – increment RSP by (20h+80h)
    – restore RBP

  4. Many thx for the details!

    Was a strong job to convert some image processing asm code’s to 64bit , many are realized in c++ but the long asm passage’s are much much faster 🙂
