In this technical blog post, I am going to give you a head start on how to write assembler code and compile it directly from the Visual Studio IDE.
If you are interested in x64, please check this article.
You are expected to be familiar with:
- The Intel x86 assembly language and writing basic assembly code
- The various calling conventions (stdcall, cdecl, and fastcall)
Okay, let’s get started!
- Run Visual Studio (or press Windows-Key+R, then type “devenv”)
- Choose “File / New / New project”
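- Choose “Templates / C++” then “Win32 console application”. Keep the x86 (Win32) platform, because the Microsoft C/C++ compiler does not support inline assembly when targeting x64 or ARM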
When you debug the code, it is useful to show the registers, disassembly (when needed), and memory windows.
To show those windows, go to the “Debug / Windows” submenu.
This is my window layout when I debug the sample code in this article:
- The memory window: useful to inspect the contents of memory pointed to by registers or pointers
- The registers window: useful to see the values of the general-purpose registers
- The source code window
- The disassembly window, docked next to the source code window: when debugging, it shows the disassembly equivalent of your C code
- The “Auto” window: it will automatically show relevant variables around the currently executing statement
- The call stack window
Here are some things to keep in mind when writing assembler code:
- Do not mess up the register values. The compiler expects certain non-volatile registers to be preserved across function calls
- EAX, ECX, and EDX are volatile and may be freely modified. The other general-purpose registers (EBX, ESI, EDI, and EBP) must be preserved, and ESP must be back to its entry value by the time you return
- On x86, the EDX:EAX register pair is used to return 64-bit integer values (EDX holds the high 32 bits and EAX the low 32 bits). EAX alone is used to return a 32-bit integer value
- Do not corrupt the stack pointer (ESP) or the frame pointer (EBP). You can modify those registers, but make sure you restore them
- Respect the calling conventions, both when you call functions and when your code is called
- When calling cdecl functions, make sure you purge the arguments from the stack after the call returns (the caller cleans up)
- When your function is called as stdcall, make sure it purges its own arguments before returning (the callee cleans up, typically with “ret N”)
For more information, check the “x86 Architecture” article on MSDN.
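To make the EDX:EAX convention concrete, here is a small portable C model of how a 64-bit return value is split across the two registers. The helper names are hypothetical; they only mirror what the hardware convention does.

```c
#include <stdint.h>

/* Hypothetical helpers modeling how a 64-bit integer return value
   is split across the EDX:EAX register pair on x86. */

/* EAX receives the low 32 bits. */
static uint32_t low_eax(uint64_t value)
{
    return (uint32_t)(value & 0xFFFFFFFFu);
}

/* EDX receives the high 32 bits. */
static uint32_t high_edx(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

/* The caller recombines EDX:EAX into the 64-bit result. */
static uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```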
Writing x86 assembly code
The main focus of this post will be how to write inline assembler code inside your C/C++ functions. I will cover:
- Inline assembler code, mixed with C code in the same function
- Naked functions containing purely assembler code
I will give a handful of examples, starting with easy ones and ending with harder ones. Please remember that the presented code is for illustration purposes only and is not meant for production.
Mixing C and Assembler
To write inline assembler code, use the __asm keyword followed by a block in braces; the closing brace terminates the inline assembler code.
It is also possible to use “__asm ASM instruction” to emit one instruction at a time (without opening a block).
Another way to emit machine code is to use “__asm __emit byte_value”, which emits an arbitrary byte of machine code. If the emitted bytes do not correspond to valid instructions, your code may cause exceptions.
Now, let me dive in directly by writing examples and explaining them.
Example 1 – Hello world from Win32 inline assembler
In the following example, we write the assembler equivalent of the following C code:
MessageBoxA(GetForegroundWindow(), msg, info, MB_OK);
- Lines 1, 2, 17 – Normal function declaration constructs
- Lines 3, 4 – Declaring a couple of constant C strings
- Lines 6, 7, 16 – The inline assembler block
- Lines 8, 9, 10 – Push the last three arguments of MessageBoxA in reverse order (uType, lpCaption, and lpText)
- Line 12 – Call GetForegroundWindow(). The resulting HWND is returned in EAX
- Line 13 – Push the remaining argument of MessageBoxA (hWnd)
- Line 15 – Invoke MessageBoxA
This is the simplest example. Since MessageBoxA uses the stdcall convention, it purges its own arguments, so no stack cleanup is needed after the call.
Example 2 – Calculate function
In this example, a function called “calculate” is implemented in order to illustrate the following:
- How to mix C and assembler code
- How to access C variables in the assembler code block
- How to define assembler labels and branch to them
- How to preserve and restore registers
Let’s get started:
- Lines 4-6: validate the ‘count’ input
- Lines 8-32: declare the inline assembly block
- Lines 10-12: save registers
- Line 29: move the result from the ‘edx’ register to the local variable ‘result’
- Lines 30-33: restore the saved registers. Note that we don’t really have to save or restore EAX/EDX since they are volatile.
- Lines 17, 23, 26: declare some labels. The “L_” prefix is not mandatory; it is just a convention I use. Make sure you don’t use label names that coincide with reserved keywords or assembler mnemonics
- Line 16: load ‘count’ into ECX
- Line 27: use the “loop” instruction to repeat the iteration. LOOP decrements ECX and jumps back to the label as long as ECX is not zero
- Lines 18-25: check the count variable (which was loaded into ‘ecx’) to see whether it is odd or even, and take a different branch accordingly
We use the “test_calculate” function to check that the code works and outputs the same values for the same input.
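Because LOOP is easy to misuse, here is a rough C model of its semantics (loop_iterations is a made-up name for illustration). It also shows one reason to validate a count before entering such a loop: if ECX is zero when the body first runs, LOOP decrements it to 0xFFFFFFFF and the body ends up running 2^32 times.

```c
#include <stdint.h>

/* C model of the x86 LOOP instruction: the body runs, then LOOP
   decrements ECX and jumps back while ECX is non-zero.
   Returns how many times the body runs for a given starting ECX. */
static uint64_t loop_iterations(uint32_t ecx)
{
    uint64_t iterations = 0;
    do {
        iterations++;       /* loop body */
        ecx--;              /* LOOP: ECX = ECX - 1 ...          */
    } while (ecx != 0);     /* ... then jump back if ECX != 0    */
    return iterations;
}
```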
Example 3 – String scrambling
In this example, I will illustrate how to manipulate an input string and return a newly allocated string. I will cover:
- How to do basic string manipulation from the assembler code
- Calling standard C library functions (strlen and malloc)
- How to walk a string character by character and do a single character XOR operation
- Lines 1-4: the function declaration
- Lines 8-10: save registers
- Lines 43-45: restore registers
- Lines 12-17: check the input string length and bail out if it is zero
- Lines 20-25: allocate a buffer as long as the input string plus one byte (for the null terminator). Bail out if malloc() returns NULL
- Line 26: Load the output buffer (the newly allocated buffer) into ‘edi’
- Line 27: Store that allocated buffer into ‘result’. We return that buffer to the caller
- Line 30: Load into ‘esi’ the input buffer
- Line 31: Load the scrambling key into ‘edx’ (zero-extended)
- Line 33: Use the ‘lodsb’ instruction to load the byte at [esi] into the ‘al’ register and then increment ‘esi’ (with the direction flag clear, as it normally is)
- Lines 34-35: If the character is the null terminator, we have reached the end of the string, so exit the loop
- Line 36: XOR the string character with the key
- Line 37: Use ‘stosb’ to store the new ‘al’ value into memory at [edi], then increment ‘edi’
- Lines 39-41: At the end, store the null terminator at the end of the output string
The ‘test_scramble’ function is used to scramble “hello world”, then descramble the result back into the original value. This works because XORing a character twice with the same key yields the original character. Since the ‘scramble’ function returns a new buffer allocated with ‘malloc’, we have to free the buffer with the ‘free’ function.
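If you want a reference to compare the assembly against, the following is a plain C sketch of the same algorithm. It is my own approximation, not the article’s code: scramble_ref is a hypothetical name, and I assume that “bailing out” means returning NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Plain C sketch of the scramble algorithm: allocate strlen(input) + 1
   bytes, XOR every character with 'key', and null-terminate the result.
   Returns NULL for an empty input or if malloc fails.
   The caller must free() the returned buffer. */
char *scramble_ref(const char *input, unsigned char key)
{
    size_t len = strlen(input);
    if (len == 0)
        return NULL;

    char *result = malloc(len + 1);
    if (result == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        /* Same effect as lodsb / xor al, dl / stosb */
        result[i] = (char)((unsigned char)input[i] ^ key);
    }
    result[len] = '\0';
    return result;
}
```

Scrambling twice with the same key restores the input. One caveat applies to any version of this algorithm: if a character equals the key, the XOR produces a zero byte, and a second strlen-based pass would stop early.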
Example 4 – Generating code dynamically
This is a long but comprehensive example which demonstrates how to generate code dynamically and then call it.
I will illustrate the concept of binding arguments to a function and generating a new function that takes no arguments but knows how to run with the previously bound arguments.
The following points will be covered:
- How to write assembler code for use as a template, then relocate/patch that code
- How to emit byte codes (using __asm __emit)
- How to allocate executable memory and put function code in that executable memory
- How to implement basic closures in C/C++
The code generator is implemented in the cf1bind_t class. This class binds a one-argument C function to an argument of your choice. The result is a function pointer that you can call at any time.
First, let me start by showing you the example code:
In this example code, we defined a function called print_msg that takes one argument and prints it to the console.
Then the test_func_bind function binds print_msg and generates two functions, say_hi and say_bye.
Each of those generated functions is bound to the argument passed to the cf1bind_t::make function.
Later, we can invoke the newly generated functions. When they are called, they will invoke print_msg (the bound function) with the corresponding bound argument.
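Stripped of the code generation, the idea behind cf1bind_t is to keep a function pointer together with its argument. The following portable C sketch shows that idea; bound_fn1_t, bind1, call1, and record_msg are hypothetical names, with record_msg standing in for print_msg. The difference is that call1 still needs a pointer to the struct, whereas the generated code gives you a plain function pointer that takes no arguments at all.

```c
#include <stddef.h>

/* Signature of the function being bound: one argument, no return value. */
typedef void (*fn1_t)(const char *arg);

/* A basic "closure": the function to call plus the argument to pass. */
typedef struct {
    fn1_t func;
    const char *arg;
} bound_fn1_t;

/* Bind 'func' to 'arg' (roughly the role of make(), minus code generation). */
static bound_fn1_t bind1(fn1_t func, const char *arg)
{
    bound_fn1_t b = { func, arg };
    return b;
}

/* Invoke the bound function with its bound argument. */
static void call1(const bound_fn1_t *b)
{
    b->func(b->arg);
}

/* Hypothetical stand-in for print_msg: records the last message instead of printing. */
static const char *last_msg = NULL;

static void record_msg(const char *msg)
{
    last_msg = msg;
}
```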
Now, let us look at the implementation of the cf1bind_t class:
- Lines 1-8: Define the cf1bind_t class with two member variables.
- func_body: points to the newly generated function body
- func_size: the size of the generated function
- Line 10: Declare the dynamic_code function. This function generates the bound function. It takes three arguments:
- in: function to bind
- in: function argument to bind with the function
- out: the generated function size
- Lines 19-23: Declare a few static variables that remember characteristics of the inline ASM code. I chose static because I want to initialize those variables only once during the program’s lifetime.
- Lines 23 and 61: one-time initialization of the inline ASM code template
- Line 31: skip over the inlined ASM template
- Lines 33-40: Define the ASM code template
- Declare some labels where we can later patch in the proper opcodes and values
- Declare a proper dynamic function body. This body will be the actual bound function body
- The locations at all the labels will later be updated with the correct values
- Line 35: Issue a relative call to another location that will perform a pseudo far/absolute call
- Lines 39-40: A pseudo absolute call by using the “push ADDRESS/ret” construct
- Lines 40-50: Emit some bytes. This is just for demonstration purposes
- Lines 53-56: Take the offsets of all the needed ASM code locations and store them in local variables. Sometimes we take the location of the label itself (the address of the instruction) and sometimes the location of the instruction’s operand
- Line 65: Allocate Read/Write memory. This will hold the code of the bound function
- Line 75: Copy the code template to the allocated memory location
- Lines 81 and 82: Patch in both the address of the function to call and its bound parameter. This specializes the ASM code template so that it always calls the bound function with the bound parameter
- Line 85: Make the memory executable only (no longer writable). We should not keep R/W/X memory pages if we don’t have to.
- Line 100: Define the free_func() function that frees the bound function
- Line 116: Define the bind() function. This function calls dynamic_code() to generate a new bound function and stores the function’s size and body address in the cf1bind_t member variables (func_size and func_body).
- Line 127: Define the call() function. This function casts the func_body member variable to a pointer to a void function that takes no arguments and then calls it. When called, execution goes to the specialized ASM code template, which passes the previously bound argument to the bound function
- Line 138: Define the static class function make(), which returns a new instance of the cf1bind_t class, ready to be used with the newly constructed bound function.
In short, the cf1bind_t class does the following:
- Declares an ASM code template that PUSHes an argument and CALLs a function
- Creates a new memory page, copies the ASM code template into it, and patches in the correct argument and call address
- Locks the memory by making it executable only
- That’s it! Now you can call the newly generated function body
Note: For production code, create a heap with HeapCreate() and the HEAP_CREATE_ENABLE_EXECUTE flag, then allocate from it with HeapAlloc(). This avoids allocating a whole page (normally 4 KB) per bound function.
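To make the template-patching idea concrete without needing Windows, here is a portable C sketch that only builds the bytes of a 32-bit template like the one described above. It is my own reconstruction, not the article’s code: the byte layout, the offsets, and the names code_template, patch_u32, and specialize are all hypothetical, and I assume the bound function is cdecl (hence the “add esp, 4” after the call). Actually running these bytes would still require a 32-bit process and executable memory (for example VirtualAlloc followed by VirtualProtect on Windows).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit x86 template:
     0:  68 <arg32>       push arg          ; bound argument (patched)
     5:  E8 04 00 00 00   call +4 (-> 14)   ; relative call to the stub below
    10:  83 C4 04         add esp, 4        ; cdecl: caller purges the argument
    13:  C3               ret
    14:  68 <func32>      push func         ; bound function address (patched)
    19:  C3               ret               ; "push ADDRESS / ret" acts as an absolute jump */
enum { ARG_OFFSET = 1, FUNC_OFFSET = 15, TEMPLATE_SIZE = 20 };

static const uint8_t code_template[TEMPLATE_SIZE] = {
    0x68, 0x00, 0x00, 0x00, 0x00,   /* push imm32 (arg)    */
    0xE8, 0x04, 0x00, 0x00, 0x00,   /* call rel32 (+4)     */
    0x83, 0xC4, 0x04,               /* add esp, 4          */
    0xC3,                           /* ret                 */
    0x68, 0x00, 0x00, 0x00, 0x00,   /* push imm32 (func)   */
    0xC3,                           /* ret                 */
};

/* Write a 32-bit value in little-endian byte order, as x86 expects. */
static void patch_u32(uint8_t *code, size_t offset, uint32_t value)
{
    code[offset + 0] = (uint8_t)(value);
    code[offset + 1] = (uint8_t)(value >> 8);
    code[offset + 2] = (uint8_t)(value >> 16);
    code[offset + 3] = (uint8_t)(value >> 24);
}

/* Copy the template into 'out' (TEMPLATE_SIZE bytes) and specialize it. */
static void specialize(uint8_t *out, uint32_t func_addr, uint32_t arg)
{
    memcpy(out, code_template, TEMPLATE_SIZE);
    patch_u32(out, ARG_OFFSET, arg);
    patch_u32(out, FUNC_OFFSET, func_addr);
}
```

Because the relative call at offset 5 targets offset 14 inside the same template, it keeps working wherever the template is copied; only the two absolute immediates need patching. A stdcall target would purge its own argument, so the “add esp, 4” would have to go.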
Naked function with assembler code only
I wrote just one example in this section. To declare a pure assembler function, you have to specify that this function is “naked” like this:
return_type __declspec(naked) calling_convention Function_Name(Parameters)
Inside the naked function, you can write as many “__asm” blocks as you want, and you can also declare variables.
- Lines 2-4: Declare the naked function sum_buf. Notice the use of the __declspec(naked) keyword
- Lines 8-10: Write the function prolog. We need to do that ourselves because the compiler does not generate a prolog or epilog for naked functions
- Lines 12-14: Save work registers. We need to restore them later
- Lines 16-24: If the passed count argument is -1, call strlen to compute the actual buffer length, treating the buffer as a null-terminated C string
- Line 26: Store the counter value in ECX. We will be using the LOOP instruction to loop
- Lines 27-29: Before the loop, clear EAX, which will be used by LODSB, and clear EDX, which will accumulate the checksum of the bytes in the buffer
- Lines 30-33: Use LODSB to load a single byte into AL (the upper bits of EAX stay zero because EAX was cleared), add the byte value to EDX, and then loop again
- Line 36: Move EDX into EAX. EAX holds the function’s return value (per the __stdcall convention).
- Lines 39-43: Restore the work registers
- Lines 44-46: Function epilog
- Line 48: Purge the stack (per the __stdcall convention). Since the function takes two arguments, we need to purge sizeof(DWORD) * 2 = 8 bytes, for example with “ret 8”
- Line 52: Declare a function to test our naked ASM function
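To check the naked function’s output, it helps to have a plain C reference. The sketch below is my own approximation: the name sum_buf_ref and the parameter types are assumptions, since the original signature is not reproduced here. The 32-bit accumulator wraps around just like EDX would.

```c
#include <stdint.h>
#include <string.h>

/* Plain C sketch of sum_buf: sum all bytes of 'buf'. A count of -1 means
   "treat buf as a null-terminated C string and use strlen()". */
uint32_t sum_buf_ref(const unsigned char *buf, int count)
{
    if (count == -1)
        count = (int)strlen((const char *)buf);

    uint32_t sum = 0;              /* plays the role of EDX */
    for (int i = 0; i < count; i++)
        sum += buf[i];             /* LODSB into AL, then add to EDX */
    return sum;                    /* returned in EAX */
}
```

One thing worth guarding against in an assembly version built around LOOP: if the count (or the strlen result) is zero, ECX starts at zero and LOOP would run 2^32 times, whereas the C loop above simply does nothing.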
I hope you learned something new from this post. It is by no means meant to cover all the tricks, but hopefully it is enough to get you started. Please download the source code associated with this article from my GitHub repository.