This article complements the previous article about writing x86 assembly code in Visual Studio. In it, I will show you how to extend that knowledge and give you a head start on writing x64 assembler code instead.
The Setup
Configuring your project
If you did not yet create a new project, please follow the “Setup” steps described in the x86 article.
Once you have created a new project, let us get started by configuring it for two things:
- The ability to assemble and link ASM files from the IDE
- A new x64 configuration for your project
Note: If you don’t want to start from scratch, I suggest you download the source code used in this article from here and modify the project / sources as you see fit.
Adding ASM files compilation support to your project
In order to instruct Visual Studio’s build process to assemble and link ASM files found in your project with ml or ml64, you need to do the following steps:
Step 1
Right click on your project’s name and choose “Build Dependencies” then “Build Customizations…”
Step 2
Add the MASM files customization support by checking the “masm (.targets, .props)” option:
That’s it!
Now, if your project contains .asm files, building it will not only compile the C/C++ files but also assemble the .asm files into object files (.obj), ready to be linked into your project. This saves you the trouble of manually invoking ml64.exe or setting up pre- and post-build steps.
Adding the x64 project configuration
In Visual Studio, choose the “Build” menu and then select “Configuration Manager”. The following dialog will appear:
Now press on the “Active solution platform” combo and select “<New>”:
Now, select the “x64” platform and then choose to copy settings from the “Win32” platform.
Press OK and confirm that now you have “x64” listed under the “Platform” column:
Now press “Close” to finish!
Selecting the x64 configuration
Having created the configuration in the previous step, make sure that it is selected. One easy way is to navigate to the toolbar and select the “x64” configuration:
The other way to achieve that is to select it from the “Build/Configuration Manager” dialog:
Set up the debugging environments
It will be very handy to have the proper debug windows when intermixing ASM and C/C++ code together. Press F10 to start the debugging sessions and then enable the following windows from the “Debug/Windows” menu:
- Registers
- Memory
- Disassembly
Also note that you can press F9 to put breakpoints anywhere in executable code in the ASM source code.
Now that we are done with the set up, let me proceed and give you some background information about writing x64 code.
Basic Background
Here are some things that you need to keep in mind when writing x64 assembler code:
- You are no longer permitted to write inline assembler code. Instead, you must write the ASM code in a separate ASM file, assemble it (with ML64), and link the resulting object file with your project.
  - In many cases, compiler intrinsics can be used to issue specific machine instructions
- There are more general purpose registers in x64. Each register is 8 bytes long, or one QWORD.
  - The regular 32-bit general purpose registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP become RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP in x64.
  - In addition to those 8 registers, there are 8 more registers: R8, R9, R10, R11, R12, R13, R14, R15.
  - Technically speaking, the registers RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI are aliases for registers R0 to R7.
- Each stack operation (say a PUSH/POP/CALL/RET, etc.) consumes a QWORD.
- There is only one calling convention supported and that is the “fast calling convention”
  - Registers RCX, RDX, R8 and R9 are used to pass the first four integer or pointer arguments
  - The remainder of the arguments must be passed on the stack
  - Even though the first four arguments are passed in the RCX, RDX, R8 and R9 registers, the stack pointer must still be decreased by 8 * 4 = 32 bytes prior to a function call. This space is called the register shadow area.
  - The caller is responsible for allocating a shadow area for the 4 registers prior to calling a function, even if the callee takes no arguments.
  - 16-byte alignment is essential for various XMM instructions. Therefore, if you are going to use the Windows APIs or other foreign libraries, make sure that the stack pointer is 16-byte aligned at the point of the call. On entry to the called function it is then offset by 8 bytes, because the CALL instruction pushes the return address.
- Do not mess up the register values. The compiler expects certain non-volatile registers to be preserved between function calls
  - The non-volatile registers are: RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15. They must be preserved between function calls
  - The volatile registers are: RAX, RCX, RDX, R8, R9, R10, R11
- Returning values:
  - The RAX register is used to return integer values up to 64 bits in size
  - The XMM0 register is used to return floating-point and vector types (float, double, __m128, …)
- SEH can no longer be installed like in x86 by adding an exception record into the FS:0 list
  - Special consideration for unwinding must be taken
  - There will be an example in this article on how to write an ASM routine whose exceptions can be caught inside your C code’s try/except
  - Check the “Exception Handling (x64)” topic on MSDN for more information.
For more information, check the “x64 Software Conventions” article on MSDN.
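To make the shadow area and alignment rules above a bit more concrete, here is a small C++ sketch of the arithmetic I do in my head when choosing the operand of "sub rsp, N" in a prolog. The helper name frame_bytes and its two parameters are my own invention for illustration; they are not part of any SDK or of the article's source code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (illustration only): how many bytes to subtract from RSP
// in a prolog so that RSP is 16-byte aligned at every CALL in the body.
//   max_call_args : largest argument count of any function this routine calls
//   pushes        : number of 8-byte PUSHes executed before the SUB RSP
constexpr std::uint64_t frame_bytes(unsigned max_call_args, unsigned pushes)
{
    // The shadow area always covers 4 register arguments; extra arguments
    // need one more QWORD each.
    return (max_call_args > 4 ? max_call_args : 4u) * 8ull
           // On entry RSP is 8 mod 16 (CALL pushed the return address) and each
           // PUSH moves it by 8 more. Pad with 0 or 8 bytes to land on 16.
           + (8ull + pushes * 8ull + (max_call_args > 4 ? max_call_args : 4u) * 8ull) % 16;
}

// Usage: the familiar "sub rsp, 28h" (40 bytes) falls out of the formula for a
// routine that does not push anything before allocating its frame.
inline void frame_bytes_examples()
{
    assert(frame_bytes(4, 0) == 40); // no PUSH: 32 bytes shadow + 8 padding
    assert(frame_bytes(4, 1) == 32); // after "push rbp": shadow area only
}
```

Any larger value that keeps the same remainder modulo 16 works as well; the sketch only gives the minimum.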
Writing x64 assembly code
Let’s get started by creating an ASM file called “asm64code.asm” with the following template body:
_DATA SEGMENT
_DATA ENDS

_TEXT SEGMENT
_TEXT ENDS

END
We will use that template to add our code slowly as we progress in this article.
Defining data
To define read/write data, you have to define the data within a read/write data segment. Here’s how to do it:
_DATA SEGMENT
    hello_msg db "Hello world", 0
    info_msg  db "Info", 0
_DATA ENDS
To define read-only and executable data, do that inside the text segment:
_TEXT SEGMENT
    hello_msg db "Hello world", 0
    info_msg  db "Info", 0
_TEXT ENDS
Referencing external data and functions
In the assembler code we are going to write, we will need to refer to data and functions residing in other modules (libraries or object files). We can use the EXTERN directive for that.
In the following C code, we define a few variables and declare a function that we will be referring to from the ASM code:
extern "C"
{
    char c_ext_byte = 1;
    unsigned short c_ext_word = 2;
    long c_ext_dword = 3;
    __int64 c_ext_qword = 4;
    void *c_ext_ptr = (void *)(5);
    void c_ext_my_function();
}
Note that I wrap all those definitions inside an extern "C" block just because I don’t want the names to be decorated (or mangled).
Now, to make those symbols accessible to our ASM code, we have to declare the proper EXTERN directives like this:
EXTERN c_ext_byte: byte
EXTERN c_ext_word: word
EXTERN c_ext_dword: dword
EXTERN c_ext_qword: qword
EXTERN c_ext_ptr: qword
EXTERN c_ext_my_function: PROC
EXTERN MessageBoxA: PROC
EXTERN GetForegroundWindow: PROC
To put it into perspective, we can write the following ASM code to access this data:
; Note: procedures must be placed inside the _TEXT segment
PUBLIC access_extern_data
access_extern_data PROC
    ; Dereference all the data according to each item's size
    mov al, byte ptr [c_ext_byte]
    mov ax, word ptr [c_ext_word]
    mov eax, dword ptr [c_ext_dword]
    mov rax, qword ptr [c_ext_qword]

    ; Remember, a pointer is just a QWORD
    mov rax, qword ptr [c_ext_ptr]

    ; Similarly, a function pointer is also a QWORD
    mov rax, offset c_ext_my_function
    sub rsp, 4 * 8 + 8 ; Register shadow area + 8 bytes to keep RSP 16-byte aligned
    call rax ; call the C function
    add rsp, 4 * 8 + 8 ; Restore the stack
    ret
access_extern_data ENDP
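Picking the right size keyword in each EXTERN matters, because the assembler trusts whatever you tell it. The following C++ sketch is one way to double-check the sizes from the C side. The MasmType enum and the matches helper are hypothetical names of mine, and I use fixed-width types as stand-ins so the checks do not depend on a particular compiler. Keep in mind that "long" is 32 bits with Visual C++ on x64, which is why it pairs with "dword" above, while some other 64-bit platforms make it 64 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names (illustration only): MASM size keywords and their widths.
enum class MasmType : std::size_t { Byte = 1, Word = 2, Dword = 4, Qword = 8 };

// True when the C/C++ type T occupies exactly as many bytes as the MASM type.
template <typename T>
constexpr bool matches(MasmType t)
{
    return sizeof(T) == static_cast<std::size_t>(t);
}

// Compile-time checks mirroring the EXTERN declarations.
static_assert(matches<char>(MasmType::Byte), "char <-> byte");
static_assert(matches<unsigned short>(MasmType::Word), "unsigned short <-> word");
static_assert(matches<std::int32_t>(MasmType::Dword), "32-bit integer <-> dword");
static_assert(matches<std::int64_t>(MasmType::Qword), "64-bit integer <-> qword");
static_assert(matches<void *>(MasmType::Qword), "pointer <-> qword (64-bit builds)");

// Usage: the same helper also works at run time.
inline void masm_size_examples()
{
    assert(matches<std::uint16_t>(MasmType::Word));
    assert(!matches<std::int32_t>(MasmType::Qword));
}
```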
Hello world in x64
Okay, now we are ready to write our hello world function in x64 ASM:
EXTERN MessageBoxA: PROC
EXTERN GetForegroundWindow: PROC

PUBLIC hello_world_asm
hello_world_asm PROC
    push rbp ; save frame pointer
    mov rbp, rsp ; fix stack pointer
    sub rsp, 8 * (4 + 2) ; allocate shadow register area + 2 QWORDs for stack alignment

    ; Get a window handle
    call GetForegroundWindow
    mov rcx, rax

    ; WINUSERAPI int WINAPI MessageBoxA(
    ;   RCX => _In_opt_ HWND hWnd,
    ;   RDX => _In_opt_ LPCSTR lpText,
    ;   R8  => _In_opt_ LPCSTR lpCaption,
    ;   R9  => _In_ UINT uType);
    mov rdx, offset hello_msg
    mov r8, offset info_msg
    mov r9, 0 ; MB_OK

    and rsp, not 8 ; align stack to 16 bytes prior to API call
    call MessageBoxA

    ; epilog. restore stack pointer
    mov rsp, rbp
    pop rbp
    ret
hello_world_asm ENDP
Things to note:
- Make sure you declare the proper EXTERNs so we can use the Windows APIs
- Create a prolog so we can modify the stack pointer and allocate local stack storage area
- Since this is handwritten assembler code, it is hard to allocate the proper amount of stack to achieve the right stack pointer alignment. Allocate a couple more QWORDs so we can adjust/align the stack pointer prior to foreign API calls that have the potential to invoke XMM instructions
- Align the stack pointer to 16 bytes prior to calling a foreign API
- Restore the stack pointer
- Return to the caller
Now, to call this function from the C/C++ code, we have to make sure that it is defined as PUBLIC in the ASM code, and also declare its prototype in the C/C++ code (inside extern "C" when compiling as C++, so that the name is not mangled):
extern "C" void hello_world_asm();

int main(int argc, char *argv[])
{
    hello_world_asm();
    return 0;
}
Now, had we not aligned the stack pointer to 16 bytes prior to calling “MessageBoxA”, we would risk an access violation exception deep down in user32.dll when and if an XMM instruction that requires aligned memory is used.
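If the "and rsp, not 8" trick looks odd, the following C++ sketch models it next to the more common "and rsp, -16" form. The two function names are made up for this illustration. The point it tries to show is that clearing bit 3 alone is enough only because the stack pointer is already a multiple of 8; for an arbitrary value the two masks differ.

```cpp
#include <cassert>
#include <cstdint>

// Model of "and rsp, -16": clear the low four bits.
constexpr std::uint64_t align_down_16(std::uint64_t rsp)
{
    return rsp & ~std::uint64_t{15};
}

// Model of "and rsp, not 8": clear bit 3 only.
constexpr std::uint64_t clear_bit3(std::uint64_t rsp)
{
    return rsp & ~std::uint64_t{8};
}

// Usage: for 8-byte aligned values (the only ones RSP normally holds) both
// forms give the same result.
inline void alignment_examples()
{
    assert(clear_bit3(0x1008) == align_down_16(0x1008)); // both 0x1000
    assert(clear_bit3(0x1000) == align_down_16(0x1000)); // already aligned
    assert(clear_bit3(0x1004) != align_down_16(0x1004)); // differs if not 8-aligned
}
```

Either way, the adjustment can lower RSP by up to 8 bytes, which is why the prolog allocates a couple of QWORDs more than the shadow area strictly needs.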
A simple function with two inputs and one out argument
We are going to write a very simple function that adds two numbers. It is the logical equivalent of the following C function:
__int64 c_add_by_ref(int a, int b, __int64 *r)
{
    *r = a + b;
    return *r;
}
In ASM x64, one way to write it is like this:
; __int64 add_by_ref(int a, int b, __int64 *r)
add_by_ref PROC
    movsxd rax, ecx
    movsxd rdx, edx
    add rax, rdx ; result in RAX
    mov qword ptr [r8], rax ; store / deref
    ret
add_by_ref ENDP
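One subtle difference is worth spelling out: the ASM version sign-extends both 32-bit inputs to 64 bits before adding, whereas the C version adds two ints first and only then widens the result. Here is a C++ sketch of what the assembly actually computes. The name add_by_ref_model is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ model of the ASM routine, one statement per instruction.
std::int64_t add_by_ref_model(std::int32_t a, std::int32_t b, std::int64_t *r)
{
    const std::int64_t wide_a = a; // movsxd rax, ecx
    const std::int64_t wide_b = b; // movsxd rdx, edx
    *r = wide_a + wide_b;          // add rax, rdx / mov qword ptr [r8], rax
    return *r;                     // the result is left in RAX
}

// Usage: the 64-bit addition does not wrap where a 32-bit one would overflow.
inline void add_by_ref_examples()
{
    std::int64_t out = 0;
    assert(add_by_ref_model(2147483647, 1, &out) == 2147483648LL);
    assert(out == 2147483648LL);
}
```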
Working with structures
A common need when writing x64 assembler code is working with structures that are common to C/C++ and the ASM code.
Suppose the following C structure:
#pragma pack(push, 1)
struct MyStruct
{
    char b;
    void *buf;
    void (WINAPI *fptr)(int a);
    __int64 sig;
};
#pragma pack(pop)
Let us now assume there’s an assembler function called “init_struct()” that initializes this structure and populates it so that the C code can use it.
Its C prototype is:
void init_struct(MyStruct *st);
Now, here’s an example syntax on how to define the equivalent structure in ASM:
MyStruct struct
    b    db ?
    buf  dq ?
    fptr dq ?
    sig  dq ?
MyStruct ends
…followed by the implementation of “init_struct()” in assembler:
; void init_struct(MyStruct *st);
init_struct PROC
    ; st->b = 1
    mov byte ptr MyStruct.b[rcx], 1

    ; st->buf = &buf
    mov rax, offset buf
    mov qword ptr MyStruct.buf[rcx], rax

    ; st->fptr = fptr
    lea rax, non_public_func
    mov qword ptr MyStruct.fptr[rcx], rax

    ; st->sig = value
    mov rax, 0badbeefbabeh
    mov qword ptr MyStruct.sig[rcx], rax

    ret
init_struct ENDP
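The two definitions only agree if the field offsets match on both sides, which is what the "#pragma pack(push, 1)" is for: as far as I know, MASM does not pad structure fields unless you ask for an alignment. A cheap safety net is to let the C++ compiler verify the offsets. In the sketch below, MyStructModel is a stand-in of mine for the article's MyStruct with the WINAPI decoration left out (it has no effect on the layout), and the expected offsets assume a 64-bit build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct MyStructModel          // stand-in for MyStruct (illustration only)
{
    char b;                   // b    db ?   offset 0
    void *buf;                // buf  dq ?   offset 1
    void (*fptr)(int a);      // fptr dq ?   offset 9
    std::int64_t sig;         // sig  dq ?   offset 17
};
#pragma pack(pop)

// Compile-time layout checks: these fail loudly if the packing is removed.
static_assert(offsetof(MyStructModel, b) == 0, "b must be at offset 0");
static_assert(offsetof(MyStructModel, buf) == 1, "buf must be at offset 1");
static_assert(offsetof(MyStructModel, fptr) == 9, "fptr must be at offset 9");
static_assert(offsetof(MyStructModel, sig) == 17, "sig must be at offset 17");
static_assert(sizeof(MyStructModel) == 25, "1 + 8 + 8 + 8 bytes, no padding");

// Usage: the same facts, checked at run time.
inline void layout_examples()
{
    assert(sizeof(MyStructModel) == 25);
    assert(offsetof(MyStructModel, sig) == 17);
}
```

Without the pragma, the compiler would typically place buf at offset 8, and the ASM code would write to the wrong locations.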
Unwindable ASM function
In this last section, I will illustrate how to write an ASM function that is unwindable in the event of an exception occurring.
First, let me write a non-unwindable function and illustrate what the problem could be.
Assume the following ASM function that generates an access violation exception:
; AV generating function
cause_av_bad PROC
    push rbp
    mov rbp, rsp
    push 1
    push 2
    push 3
    sub rsp, 0400h
    xor rax, rax
    mov rax, [rax] ; cause A/V
    add rsp, (8 * 3) + 0400h
    mov rsp, rbp
    pop rbp
    ret
cause_av_bad ENDP
Note that this dummy function modifies the stack pointer. Doing that without generating proper unwind information makes this function non-unwindable, so an exception raised inside it cannot be dispatched to a handler further up the call stack.
Now, let’s try to invoke this function from our C code:
void call_cause_av_bad()
{
    __try
    {
        printf("Calling faulty ASM function...\n");
        cause_av_bad();
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        printf("caught exception...\n");
    }
}
And we get an exception that is not resumable. It will keep on triggering and the “__except” block will not be reached.
Now I will illustrate how to write an ASM function with proper unwind information:
cause_av_good PROC FRAME
    push rbp ; save previous frame pointer
    .pushreg rbp ; encode unwind info
    mov rbp, rsp ; set new frame pointer
    .setframe rbp, 0 ; encode frame pointer
    .endprolog

    ; feel free to modify the stack now
    push 1
    push 2
    push 3
    sub rsp, 0400h
    xor rax, rax
    mov rax, [rax] ; cause AV
    add rsp, (8 * 3) + 0400h
    mov rsp, rbp
    pop rbp
    ret
cause_av_good ENDP
The difference here is in how the ASM procedure is defined (note the additional FRAME keyword after PROC) and in the unwind information directives (.pushreg, .setframe, and .endprolog).
The unwind directives encode which register is the frame pointer and which stack operations took place before the frame pointer was set. This information is vital when an exception occurs and the exception handler is dispatched.
Here’s another example showing how to encode other stack pointer manipulation prior to setting the frame pointer:
cause_av_good2 PROC FRAME
    sub rsp, 020h ; allocate stack space
    .allocstack 020h ; encode that change
    push rbp ; save old frame pointer
    .pushreg rbp ; encode stack operation
    mov rbp, rsp ; set new frame pointer
    .setframe rbp, 0 ; encode frame pointer
    .endprolog

    ; any stack pointer modifications here on are okay...
    sub rsp, 080h

    ; we can unwind from the following AV because of the frame pointer
    xor rax, rax
    mov rax, [rax] ; cause AV

    ; properly restore the stack pointer (in case exception did not happen or the handler corrected the situation)
    mov rsp, rbp
    pop rbp
    add rsp, 020h
    ret
cause_av_good2 ENDP
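To get a feel for why the recorded prolog operations are enough, here is a deliberately simplified C++ model of what an unwinder can do with them: start from the frame pointer, then undo the recorded prolog steps in reverse order until it reaches the slot that holds the return address. The PrologOp and UnwindOp types and the return_address_slot function are hypothetical names of mine and do not reflect the real unwind data layout used by Windows.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified, hypothetical model of recorded prolog operations.
enum class PrologOp { PushReg, AllocStack };

struct UnwindOp
{
    PrologOp op;
    std::uint64_t bytes; // only meaningful for AllocStack
};

// Given the frame pointer (as set by "mov rbp, rsp" / ".setframe rbp, 0") and
// the prolog operations in program order, undo them in reverse to locate the
// stack slot that holds the return address.
std::uint64_t return_address_slot(std::uint64_t frame_pointer,
                                  const std::vector<UnwindOp> &prolog)
{
    // Starting from the frame pointer discards whatever the body did to RSP
    // after .endprolog.
    std::uint64_t rsp = frame_pointer;
    for (auto it = prolog.rbegin(); it != prolog.rend(); ++it)
    {
        rsp += (it->op == PrologOp::PushReg) ? 8 : it->bytes;
    }
    return rsp;
}

// Usage: model of cause_av_good2 entered with RSP = 0x1000.
inline void unwind_examples()
{
    const std::vector<UnwindOp> prolog = {
        {PrologOp::AllocStack, 0x20}, // sub rsp, 020h / .allocstack 020h
        {PrologOp::PushReg, 0},       // push rbp      / .pushreg rbp
    };
    // After the prolog, RBP = 0x1000 - 0x20 - 8.
    assert(return_address_slot(0x1000 - 0x28, prolog) == 0x1000);
}
```

In cause_av_bad no such record exists, so there is nothing to tell the unwinder how far the stack pointer has moved.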
Now, in both cases, if we call either of the previous two ASM functions, then the C exception handler will know how to unwind gracefully:
void call_cause_av_good()
{
    __try
    {
        printf("Calling faulty ASM function that has unwind info...\n");
        cause_av_good();
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        printf("caught exception...\n");
    }
    printf("function returned!\n");
}
This time the exception is caught and the function executes all the way to the end.
Reference and useful links
- Annotated x64 Disassembly
- x64 Architecture
- How to make proper prologue and epilogue
- Simple exception handler
- Programming against the x64 exception handling support, part 1: Definitions for x64 versions of exception handling support
- win64 Structured Exception Handling
- .SETFRAME reference
- Exceptional Behavior – x64 Structured Exception Handling
- x64 Exception Handling Port
Conclusion
In conclusion, I hope that this article gave you a head start as promised. Please make sure you go to my GitHub repository and download the source code for this project.
It would help if the author clarifies what content goes into each file.
For instance, what exactly is the full content of asm64.asm? Where does the extern “C”{} code go? In the asm file?
As of now, for the reader, it is completely unclear how many files are needed, C file, asm file, etc., how many of these are to be loaded into the project in order to recreate the “Hello World”.
Hello user,
I am the author of this article.
Try downloading the source code mentioned in the article from GitHub to see the big picture.
Let me know if that helps please.
Yes, actually, I hadn’t looked down in the page where the link was provided. Thank you for this.
I have been able to compile the code and step through it in Visual studio 2013.
BTW, can you suggest any good book for 64 bit visual studio programming?
I am working my way through Duntemann, Pappas and Abrash’s series of 3 books, but while they are good from a conceptual POV, the code provided there does not compile in VS.
64 bit programming in general? I am not sure if there is anything specific you have to worry about. If you use C/C++ then you should be set.
OTOH, you might find this article useful: https://www.viva64.com/en/l/
No, the git didn’t work. I could not find a way to get the code. All I can see is a thing about accessing arrays.
The assembler says “error A2034: must be in segment block : access_extern_data” followed by 10 more of the same error and “fatal error A1010: unmatched block nesting : access_extern_data” at the end. I am on Win10 using VS2017 in VS2015 target. As the previous User suggested a bit of editing, saying what goes where, would turn this into a useful tutorial.
I don’t know what happened with git from your end. Anyway, that’s the code: https://github.com/lallousx86/AsmInVs/archive/master.zip
On normal return the last example (using .allocstack) would have restored a bogus value for rbp (because the restore order does not match the save order). The save order on the stack is:
– decrement RSP by 20h
– save RBP
– decrement RSP by 80h
The restoration order should be the opposite of that, but it instead does:
– increment RSP by (20h+80h)
– restore RBP
Thank you, fixed!
This was so helpful – exactly what I needed. Thanks!
Very helpful. Thanks for taking the time to put this together.
Many thx for the details!
Was a strong job to convert some image processing asm code’s to 64bit , many are realized in c++ but the long asm passage’s are much much faster 🙂