Introduction to writing x86 assembly code in Visual Studio

Hello,

In this technical blog post, I am going to give you a head start on how to write assembler code and compile it directly from the Visual Studio IDE.

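If you are interested in x64, please check this article. Note that the Visual C++ compiler does not support inline assembly when targeting x64, so everything below applies to 32-bit x86 builds.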


You are expected to be familiar with:

  • The Intel x86 assembly language and writing basic assembly code
  • Understanding of various calling conventions (stdcall, cdecl and fastcall)

Okay, let’s get started!


The setup

First, let us create a new console application:

  1. Run Visual Studio (or press Windows-Key+R, then type “devenv”)
  2. Choose “File / New / New project”
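  3. Choose “Templates / C++” then “Win32 console application”
  4. Make sure the solution platform is x86 (Win32): the __asm keyword is not supported when compiling for x64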


When you debug the code, it is useful to show the registers, the disassembly (when needed), and the memory windows.

To show those windows, use the “Debug / Windows” submenu.


When I debug the sample code in this article, I arrange the windows as follows:


  1. The memory window: useful for inspecting the contents of memory pointed to by registers or pointers
  2. The registers window: useful to see the values of general purpose registers
  3. The source code window
  4. The disassembly window, docked next to the source code window. While debugging, it shows the disassembly corresponding to your C code
  5. The “Auto” window: it will automatically show relevant variables around the currently executing statement
  6. The call stack window

Basic background

Here are some things to keep in mind when writing assembler code:

  1. Do not clobber register values. The compiler expects certain non-volatile registers to be preserved across function calls
  2. All registers must be preserved except for the volatile registers EAX, ECX, and EDX.
  3. On x86, the EDX:EAX register pair is used to return 64-bit integer values. EAX alone is used to return a 32-bit integer value
  4. Do not corrupt the stack pointer (ESP) or the frame pointer (EBP). You can modify those registers, but make sure you restore them
  5. Respect the calling conventions, both when you call functions and when you are called
    1. When calling a cdecl function, purge its arguments from the stack after it returns (the caller cleans up)
    2. When your function is called as stdcall, purge your own arguments from the stack before returning, for example with “ret N” (the callee cleans up)
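To make item 3 concrete, here is a small portable C sketch (no inline assembly, so it compiles anywhere) of how a 64-bit return value is split across the EDX:EAX pair: EDX receives the high 32 bits and EAX the low 32 bits. The names split_edx_eax and join_edx_eax are hypothetical helpers for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: model how a 32-bit x86 function returns a
   64-bit integer. EDX holds the high half, EAX holds the low half. */
static void split_edx_eax(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    *edx = (uint32_t)(value >> 32);         /* high 32 bits */
    *eax = (uint32_t)(value & 0xFFFFFFFFu); /* low 32 bits  */
}

/* Hypothetical helper: rebuild the 64-bit value the way the caller
   sees it after the call returns. */
static uint64_t join_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```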

For more information, check the “x86 Architecture” article on MSDN.

Writing x86 assembly code

The main focus of this post will be how to write inline assembler code inside your C/C++ functions. I will cover:

  • Inline assembler code, mixed with C code in the same function
  • Naked functions containing purely assembler code

I will give a handful of examples, starting with easy ones and ending with harder ones. Please remember that the presented code is for illustration purposes only and is not meant for production.

Mixing C and Assembler

To write inline assembler code, use the __asm keyword followed by a code block in braces; the closing brace ends the inline assembler code.

It is also possible to use “__asm x86_instruction” to emit one instruction at a time (without opening a block).

You can also use “__asm __emit byte_value” to emit an arbitrary byte directly into the code. If the emitted bytes do not form valid instructions and execution reaches them, your program may crash with an exception.
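As a small illustration of the one-instruction-per-statement form, the sketch below uses three single-line __asm statements to increment a value. It assumes the Microsoft compiler targeting 32-bit x86 (where _MSC_VER and _M_IX86 are defined); on any other compiler or target it falls back to plain C so it still builds. The function name add_one is hypothetical.

```c
/* Increment x using single-line __asm statements (MSVC, 32-bit x86 only).
   Each __asm keyword introduces exactly one instruction; no braces needed. */
int add_one(int x)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    __asm mov eax, x   // load the parameter into EAX
    __asm inc eax      // EAX = EAX + 1
    __asm mov x, eax   // store the result back into the parameter
#else
    x += 1;            /* portable fallback for other compilers/targets */
#endif
    return x;
}
```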

Now, let me dive in directly by writing examples and explaining them.

Example 1 – Hello world from Win32 inline assembler

In the following example, we write the assembler equivalent of the following C code:

MessageBoxA(GetForegroundWindow(), msg, info, MB_OK);

void hello_world_asm()
{
  static const char* msg = "Hello world from x86 assembly language!";
  static const char* info ="Hello";

  __asm
  {
    push MB_OK
    push info
    push msg
    
    call GetForegroundWindow
    push eax

    call MessageBoxA
  }
}
  • Lines 1, 2, 17 – Normal function declaration constructs
  • Lines 3, 4 – Declare a couple of constant C strings
  • Lines 6, 7, 16 – The inline assembler block
  • Lines 8, 9, 10 – Push the last three arguments of MessageBoxA (namely: uType, lpCaption, and lpText). Arguments are pushed from right to left, so the last argument goes first
  • Line 12 – Call GetForegroundWindow(). The result is an HWND returned in EAX
  • Line 13 – Push the remaining first argument of MessageBoxA (hWnd)
  • Line 15 – Invoke MessageBoxA. It is a stdcall function, so it removes its own arguments from the stack and no cleanup is needed afterwards
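The arguments are pushed in reverse order because the x86 stack grows downward: the argument pushed last (hWnd) ends up at the lowest address, which is where the callee expects its first parameter. The portable C model below simulates this with an array; sim_stack_t, sim_init, sim_push, sim_arg and sim_message_box_call are hypothetical names, and the pushed values are stand-ins for the real pointers and window handle.

```c
#include <stdint.h>

#define SIM_STACK_WORDS 16

/* A toy model of a 32-bit x86 stack. 'esp' is an index into 'mem'
   and, like the real ESP, moves toward lower indices on every push. */
typedef struct {
    uint32_t mem[SIM_STACK_WORDS];
    int esp;
} sim_stack_t;

static void sim_init(sim_stack_t *s) { s->esp = SIM_STACK_WORDS; }

/* PUSH: decrement ESP first, then store the value at the new top. */
static void sim_push(sim_stack_t *s, uint32_t value)
{
    s->esp -= 1;
    s->mem[s->esp] = value;
}

/* Read the n-th argument (0 = first) as the callee sees it. A real CALL
   also pushes a return address, shifting every argument up by one slot;
   that detail is left out here. */
static uint32_t sim_arg(const sim_stack_t *s, int n)
{
    return s->mem[s->esp + n];
}

/* Mirror the push sequence from Example 1 using placeholder values. */
static void sim_message_box_call(sim_stack_t *s)
{
    sim_init(s);
    sim_push(s, 0x0);     /* uType     = MB_OK (0) */
    sim_push(s, 0x2000);  /* lpCaption = info (placeholder address) */
    sim_push(s, 0x1000);  /* lpText    = msg  (placeholder address) */
    sim_push(s, 0xABCD);  /* hWnd      = value returned in EAX */
}
```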

This is the simplest example.

Example 2 – The calculate function

In this example, a function called “calculate” is implemented in order to illustrate the following:

  • How to mix C and assembler code
  • How to access C variables in the assembler code block
  • How to define assembler labels and branch to them
  • How to preserve and restore registers

Let’s get started:

int calculate(int count)
{
  int result;
  if (count <= 0)
    count = 1;

  __asm
  {
      push eax
      push ebx
      push edx
      mov eax, 0x11223344
      mov ebx, 0x55667788
      xor edx, edx
      mov ecx, count
    L_REPEAT:
      test ecx, 1
      jz L_EVEN
      add edx, eax
      ror eax, 1
      jmp L_CONTINUE
    L_EVEN:
      add edx, ebx
      rol edx, 1
    L_CONTINUE:
      loop L_REPEAT

      mov result, edx
      pop edx
      pop ebx
      pop eax
  }
  return result;
}

void test_calculate()
{
  for (int i = 1; i < 10; i++)
    printf("calculate(%d)=%x\n", i, calculate(i));
}
  • Lines 4-5: validate the ‘count’ input
  • Lines 7-32: declare the inline assembly block
  • Lines 9-11: save registers
  • Line 28: move the result from the ‘edx’ register to the local variable ‘result’
  • Lines 29-31: restore the saved registers. Note that we don’t really have to save or restore EAX/EDX since they are volatile.
  • Lines 16, 22, 25 – Declare some labels. The “L_” prefix is not mandatory; it is just a convention I use. Just make sure your label names don’t coincide with reserved keywords or assembler mnemonics
  • Line 15: load ‘count’ into ECX
  • Line 26: use the “loop” instruction, which decrements ECX and jumps back to L_REPEAT while ECX is not zero
  • Line 17: test the lowest bit of the loop counter in ECX to see whether it is odd or even, and take a different branch accordingly

We use the “test_calculate” function to check that the code works and outputs the same values for the same input.
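If you want to check the output independently, the following portable C sketch reproduces what the assembler block computes, so its results can be compared with calculate(). It assumes 32-bit unsigned arithmetic to match the 32-bit registers; calculate_ref, rotl32_1 and rotr32_1 are hypothetical names. The LOOP instruction runs the body with ECX equal to count, count - 1, ..., 1, which the for loop mirrors.

```c
#include <stdint.h>

/* 32-bit rotates by one bit, matching ROL reg, 1 and ROR reg, 1. */
static uint32_t rotl32_1(uint32_t v) { return (v << 1) | (v >> 31); }
static uint32_t rotr32_1(uint32_t v) { return (v >> 1) | (v << 31); }

/* Hypothetical C model of the inline assembler in calculate(). */
uint32_t calculate_ref(int count)
{
    if (count <= 0)
        count = 1;

    uint32_t eax = 0x11223344u;
    uint32_t ebx = 0x55667788u;
    uint32_t edx = 0;

    /* LOOP decrements ECX and repeats while it is non-zero,
       so the body sees ECX = count, count - 1, ..., 1. */
    for (uint32_t ecx = (uint32_t)count; ecx != 0; --ecx) {
        if (ecx & 1) {          /* low bit set: JZ not taken */
            edx += eax;
            eax = rotr32_1(eax);
        } else {                /* low bit clear: JZ taken */
            edx += ebx;
            edx = rotl32_1(edx);
        }
    }
    return edx;
}
```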

Example 3 – String scrambling

In this example, I will illustrate how to manipulate an input string and return a newly allocated string. I will cover:

  1. How to do basic string manipulation from the assembler code
  2. Calling standard C library functions (strlen and malloc)
  3. How to walk a string character by character and do a single character XOR operation

The code snippet:

char *scramble_string(
  const char *src, 
  unsigned char k)
{
  char *result = NULL;
  __asm
  {
      push edi
      push esi
      push edx

      // get string length and bail out if zero
      push src
      call strlen
      add esp, 4 // cdecl, clean the stack
      test eax, eax
      jz L_EXIT

      // malloc(strlen + 1)
      inc eax
      push eax
      call malloc
      add esp, 4 // cdecl, clean the stack
      test eax, eax // NULL pointer? exit
      jz L_EXIT

      mov edi, eax
      mov result, eax

      mov esi, src
      movzx edx, k
    L_NEXT:
      lodsb
      test al, al
      jz L_DONE
      xor al, dl
      stosb
      jmp L_NEXT
    L_DONE:
      mov al, 0
      stosb
    L_EXIT:
      pop edx
      pop esi
      pop edi
  }
  return result;
}

//--------------------------------------------------------------------------
void test_scramble()
{
  char *enc = scramble_string("hello world", 0xeb);
  if (enc != NULL)
  {
    char *dec = scramble_string(enc, 0xeb);
    free(enc);
    if (dec != NULL)
      free(dec);
  }
}
  • Lines 1-3: the function declaration
  • Lines 8-10: save registers
  • Lines 43-45: restore registers
  • Lines 13-17: compute the input string length with strlen() and bail out if it is zero
  • Lines 20-25: allocate a string as long as the input string length plus one (for the zero termination character). Bail out if malloc() returns NULL
  • Line 27: Load into ‘edi’ the output buffer. This is the newly allocated buffer
  • Line 28: Store that allocated buffer into ‘result’. We return that buffer to the caller
  • Line 30: Load into ‘esi’ the input buffer
  • Line 31: Load the scrambling key into ‘edx’ (zero extend)
  • Line 33: Use the ‘lodsb’ instruction to load a byte at ‘esi’ to the ‘al’ register and then increase the ‘esi’ register
  • Lines 34-35: if the character is the null character, this is the end of the string, so jump to L_DONE
  • Line 36: XOR the string character with the key
  • Line 37: Use the ‘stosb’ to store the new ‘al’ value into the memory at ‘edi’ then increment the ‘edi’ register
  • Lines 39-41: at the end, store the null terminator at the end of the output string

The ‘test_scramble’ function scrambles “hello world” and then descrambles the result back into the original value. Since scramble_string() returns a new buffer allocated with malloc(), each result has to be released with free(). Note that if a character of the input equals the key, the XOR produces a null byte and the scrambled string is cut short at that point.
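For comparison, here is a portable C sketch of the same algorithm; scramble_string_ref is a hypothetical name. Like the assembler version, it returns NULL for an empty input or a failed allocation, and the caller frees the result. Because XOR is its own inverse, scrambling twice with the same key restores the original string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C model of scramble_string(): XOR every character with
   key k into a newly malloc'ed, null-terminated buffer. */
char *scramble_string_ref(const char *src, unsigned char k)
{
    size_t len = strlen(src);
    if (len == 0)
        return NULL;                 /* same early exit as the asm code */

    char *result = malloc(len + 1);  /* +1 for the terminating null */
    if (result == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++)
        result[i] = (char)((unsigned char)src[i] ^ k);
    result[len] = '\0';
    return result;
}
```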

Example 4 – Generating code dynamically

This is a long but comprehensive example which demonstrates how to generate code dynamically and then call it.

I will illustrate the concept of binding arguments to a function and generating a new function that takes no arguments but knows how to run with the previously bound arguments.

The following points will be covered:

  • How to write assembler code for use as a template code and then relocate / patch that code
  • How to emit byte codes (using the __asm __emit)
  • How to allocate executable memory and put function code in that executable memory
  • How to implement basic closures in C/C++

The code generator is written in the cf1bind_t class. This class binds a C function with one argument of your choice. The result is a function pointer that you can call any time.

First, let me start by showing you the example code:

void __cdecl print_msg(char *msg)
{
  printf("%s", msg);
}

//--------------------------------------------------------------------------
void test_func_bind()
{
  cf1bind_t *say_hi = cf1bind_t::make(print_msg, "hi!\n");
  cf1bind_t *say_bye = cf1bind_t::make(print_msg, "bye!\n");

  say_hi->call();
  say_bye->call();

  delete say_bye;
  delete say_hi;
}

 

In this example code, we defined a function called print_msg that takes one argument and prints it to the console.

The test_func_bind function then binds print_msg to two different arguments, generating two functions: say_hi and say_bye.

Each generated function is bound to the argument that was passed to cf1bind_t::make.

Later, we can invoke the newly generated functions. When they are called, they will invoke print_msg (the bound function) with the corresponding bound argument.
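Conceptually, each generated function behaves like the small struct-based closure below, which stores a function pointer together with its bound argument. This is the conventional portable way to get the same effect without generating machine code; the cf1bind_t class instead bakes the pair into executable code so that the result can be called as a plain, argument-less function. The names closure1_t, closure1_make, closure1_call and increment are hypothetical.

```c
/* A one-argument closure: the function to call plus its bound argument. */
typedef void (*func1_t)(void *arg);

typedef struct {
    func1_t func;
    void *arg;
} closure1_t;

static closure1_t closure1_make(func1_t func, void *arg)
{
    closure1_t c;
    c.func = func;
    c.arg = arg;
    return c;
}

/* Invoke the bound function with its bound argument. */
static void closure1_call(const closure1_t *c)
{
    c->func(c->arg);
}

/* Example target: adds one to the int that the argument points to. */
static void increment(void *arg)
{
    int *counter = (int *)arg;
    *counter += 1;
}
```

Unlike the generated code, the caller has to pass the closure object around; the benefit is that no executable memory is needed.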

Now, let us look at the implementation of the cf1bind_t class:

 

//--------------------------------------------------------------------------
// Generates a new function that takes no arguments but calls the bound
// function with its bound argument
class cf1bind_t
{
private:
  void *func_body;
  size_t func_size;

  // Generate a function that calls c_func1 with the passed parameter
  static void *dynamic_code(
    void *c_func1,
    void *param1,
    size_t *func_size)
  {
    //
    // Use static variables so we initialize them once
    //

    // The beginning of the asm code
    static PBYTE begin_addr = NULL;

    // Some offsets into the code and the code size
    static size_t code_size, param_offset, xfer_offset;

    // Grab the assembler code once
    if (begin_addr == NULL)
    {
      PBYTE end, param, xfero;
      __asm
      {
        jmp L_END
      L_BEGIN:
      L_PARAM_PLACEHOLDER:
        push 0x19810417 // 0x68 XXXXXXXX (5 bytes)
        call L_XFER_PLACEHOLDER
        add esp, 4 // purge the stack
        ret
      L_XFER_PLACEHOLDER:
        push 0xaabbccdd
        ret
      }
      // Emit some free style bytes.
      // Use the __asm __emit to generate opcodes not supported by the
      // assembler or perhaps data
      __asm __emit 0x44 // 'D'
      __asm __emit 0x55 // 'U'
      __asm __emit 0x4D // 'M'
      __asm __emit 0x4D // 'M'
      __asm __emit 0x59 // 'Y'
      __asm
      {
      L_END:
        mov begin_addr, offset L_BEGIN
        mov param, offset L_PARAM_PLACEHOLDER
        mov xfero, offset L_XFER_PLACEHOLDER
        mov end, offset L_END
      }
      code_size = end - begin_addr;
      param_offset = param - begin_addr + 1;
      xfer_offset = xfero - begin_addr + 1;
    }

    // Allocate memory for the function body: writable but not executable (yet)
    // Note: (demo code) allocating a whole page per bound function is wasteful
    PBYTE func_memory = (PBYTE)VirtualAlloc(
      NULL,
      code_size,
      MEM_RESERVE | MEM_COMMIT,
      PAGE_READWRITE);

    if (func_memory == NULL)
      return NULL;

    // Copy the function
    memcpy(
      func_memory,
      begin_addr,
      code_size);

    // Customize the function body
    *((DWORD_PTR *)(func_memory + xfer_offset)) = (DWORD_PTR)c_func1;
    *((DWORD_PTR *)(func_memory + param_offset)) = (DWORD_PTR)param1;

    // Lock the memory: make it executable only (no longer writable)
    DWORD old_protect;
    if (VirtualProtect(
        func_memory, 
        code_size, 
        PAGE_EXECUTE, 
        &old_protect) == FALSE)
    {
      VirtualFree(func_memory, 0, MEM_RELEASE); // size must be 0 with MEM_RELEASE
      return NULL;
    }

    *func_size = code_size;
    return func_memory;
  }

  void free_func()
  {
    if (func_body == NULL)
      return;

    VirtualFree(
      func_body, 
      0, // dwSize must be 0 when using MEM_RELEASE
      MEM_RELEASE);
    func_body = NULL;
  }
public:
  cf1bind_t() : func_body(NULL), func_size(0)
  {
  }

  bool bind(void *c_func1, void *arg1)
  {
    free_func();
    func_body = dynamic_code(
      c_func1, 
      arg1, 
      &func_size);
    
    return func_body != NULL;
  }

  void call()
  {
    typedef void(__cdecl *callfunc_t)(void);
    ((callfunc_t)func_body)();
  }

  ~cf1bind_t()
  {
    free_func();
  }

  static cf1bind_t *make(
    void *c_func1, 
    void *arg1)
  {
    cf1bind_t *f = new cf1bind_t();
    if (!f->bind(c_func1, arg1))
    {
      delete f;
      return NULL;
    }
    return f;
  }
};

 

Explanation:

  • Lines 1-8: Define the cf1bind_t class with two member variables.
    • func_body: points to the newly generated function body
    • func_size: the size of the generated function
  • Line 10: Declare the dynamic_code() function. This function generates the bound function. It takes three arguments:
    • in: function to bind
    • in: function argument to bind with the function
    • out: the generated function size
  • Lines 19-25: Declare a few static variables that remember characteristics of the inlined ASM code. I chose static because I want to initialize those variables only once during the program’s lifetime.
  • Lines 27 to 62: one time initialization for the inlined ASM code template
  • Line 32: skip over the inlined ASM template
  • Lines 33-41: Define the ASM code template
    • Declare labels that mark where proper opcodes and values will later be patched in
    • Declare a proper dynamic function body. This body will be the actual bound function body
    • All the locations at the labels will be later updated with correct values
  • Line 36: Issue a relative call to another location that performs a pseudo far/absolute call
  • Lines 40-41: A pseudo absolute call using the “push ADDRESS/ret” construct
  • Lines 46-50: emit some bytes. This is just for demonstration purpose
  • Lines 54-57: Take the offsets of all the needed ASM code location and store into local variables. Sometimes we take the location at the label (the address of the instruction itself) and sometimes we take the location of the operand of the assembler code
  • Line 66: Allocate Read/Write memory. This will hold the code of the bound function
  • Line 76: Copy the code template to the allocated memory location
  • Lines 82 and 83: Patch in both the address of the function to call and its bound parameter. This specializes the ASM code template so that it always calls the bound function with the bound parameter
  • Line 85: Make memory executable only. We should not keep R/W/X memory pages if we don’t have to.
  • Line 101: Define the free_func() function that frees the bound function
  • Line 117: Define the bind() function. This function calls dynamic_code() to generate a new bound function and remember the function’s size and body address into the cf1bind_t class member variables (func_body and func_size).
  • Line 128: Define the call() function. This function casts the func_body member variable into a void function that takes no argument and then calls it. When called, execution will go to the specialized ASM code template that will pass the previously bound argument to the bound function
  • Line 139: Define the static class function make(), which returns a new instance of the cf1bind_t class that is ready to be used with the newly constructed bound function.
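To see the patching step (lines 82 and 83) in isolation, the portable C sketch below builds the template bytes by hand and patches the two 32-bit immediates. The byte values are an assumed encoding of the template (PUSH imm32 is 0x68 followed by a little-endian 32-bit value, CALL rel32 is 0xE8, ADD ESP, 4 is 83 C4 04, RET is 0xC3), and the five emitted “DUMMY” bytes are left out. Because the CALL is relative to the next instruction and its target lies inside the copied block, it stays correct wherever the copy lands; only the absolute values need patching. The names template_bytes, put_le32 and patch_template are hypothetical, and nothing here is executed.

```c
#include <stdint.h>
#include <string.h>

/* Assumed encoding of the template between L_BEGIN and the DUMMY bytes. */
static const uint8_t template_bytes[20] = {
    0x68, 0x17, 0x04, 0x81, 0x19, /* 0:  push 0x19810417 (param placeholder) */
    0xE8, 0x04, 0x00, 0x00, 0x00, /* 5:  call rel32 = 4 -> offset 14 */
    0x83, 0xC4, 0x04,             /* 10: add esp, 4 */
    0xC3,                         /* 13: ret */
    0x68, 0xDD, 0xCC, 0xBB, 0xAA, /* 14: push 0xaabbccdd (xfer placeholder) */
    0xC3                          /* 19: ret -> jumps to the pushed address */
};

/* Operand offsets: label offset + 1 skips the 0x68 opcode byte. */
enum { PARAM_OFFSET = 0 + 1, XFER_OFFSET = 14 + 1 };

/* Store a 32-bit value in little-endian order, as x86 expects. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Copy the template into 'out' and specialize it for one binding. */
static void patch_template(uint8_t out[20], uint32_t func_addr, uint32_t param)
{
    memcpy(out, template_bytes, sizeof template_bytes);
    put_le32(out + XFER_OFFSET, func_addr);
    put_le32(out + PARAM_OFFSET, param);
}
```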

In short, the cf1bind_t class does the following:

  • Declares an ASM code template that PUSHes an argument and CALLs a function
  • Allocates a new memory page, copies the ASM code template into it, and patches in the correct argument and call address
  • Locks the memory by making it execute-only
  • That’s it! Now you can call the newly generated function body

Note: for production code, consider using HeapCreate() with HEAP_CREATE_ENABLE_EXECUTE and then HeapAlloc(). This avoids allocating one page (normally 4 KB) per bound function.

Naked function with assembler code only

This section has just one example. To declare a pure assembler function, you have to mark it as “naked” like this:

return_type __declspec(naked) calling_convention Function_Name(Parameters)

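Inside a naked function, the compiler generates no prolog or epilog code, so you must write them yourself. You may use as many “__asm” blocks as you want, and you may declare local variables, although they cannot be initialized at declaration and you are responsible for reserving their stack space (MSVC provides the __LOCAL_SIZE symbol for that).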

size_t __declspec(naked) __stdcall sum_buf(
  const void *buf, 
  size_t count = size_t(-1))
{
  __asm
  {
    // Prolog <
    push ebp
    mov ebp, esp // >

    push esi
    push ecx
    push ebx

    mov esi, buf
    mov ecx, count
    cmp ecx, -1
    jnz L_START
    
    // strlen(buf) <
    push esi
    call strlen
    add esp, 4 // >

    mov ecx, eax
  L_START:
    xor eax, eax
    xor edx, edx
  L_LOOP:
    lodsb
    add edx, eax
    loop L_LOOP

    // Return value <
    mov eax, edx
    xor edx, edx // >

    // Restore saved registers <
    pop ebx
    pop ecx
    pop esi // >
    
    // Epilog <
    mov esp, ebp 
    pop ebp // >
    
    ret 4*2
  }
}

//--------------------------------------------------------------------------
void test_sum_buf()
{
  if (sum_buf("\x01\x02\x03", 3) != 6)
  {
    printf("bad function!\n");
    return;
  }

  const char *buf = "calling a naked asm x86 function";
  printf("the sum of characters in '%s' is %u\n", buf, (unsigned)sum_buf(buf));
}

 

Explanation:

  • Lines 1-3: Declare the naked function sum_buf. Notice the use of the __declspec(naked) keyword
  • Lines 8 and 9: Set up the function prolog. A naked function gets no compiler-generated prolog, so we must write it ourselves
  • Lines 11-13: Save work registers. We need to restore them later
  • Lines 15-18: If the passed count argument is -1, treat the buffer as a null-terminated C string and call strlen to compute its length
  • Line 25: Store the counter value in ECX. We will be using the LOOP instruction to loop
  • Lines 27-28: Start of loop. Clear EAX which will be used by LODSB, and clear EDX that will hold the checksum of each byte in the buffer
  • Lines 30-32: Use LODSB to load a single byte into AL (the rest of EAX stays zero), add EAX to EDX and then loop again. LOOP decrements ECX before testing it, so if ECX is 0 on entry (an empty string or a count of 0) the loop would run 2^32 times; production code should guard against that, for example with JECXZ
  • Line 35: Move EDX into EAX. EAX holds the function’s return value (as in all the standard x86 calling conventions)
  • Lines 39-41: Restore the saved work registers
  • Lines 44-45: Function epilog
  • Line 47: Purge the stack (per the __stdcall convention). Since the function takes two arguments, we need to purge sizeof(DWORD) * 2
  • Line 52: declare a function to test our naked ASM function
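A portable C sketch of what sum_buf computes may help when testing it; sum_buf_ref is a hypothetical name, and since C has no default arguments the count must always be passed. Unlike the assembler version, it handles a zero length safely, and it treats bytes as unsigned, just as LODSB does when the upper bits of EAX are cleared.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of sum_buf(): add up 'count' bytes of 'buf' as
   unsigned values. (size_t)-1 means "treat buf as a C string". */
size_t sum_buf_ref(const void *buf, size_t count)
{
    const unsigned char *p = (const unsigned char *)buf;
    if (count == (size_t)-1)
        count = strlen((const char *)buf);

    size_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += p[i];   /* zero-extended byte, like LODSB with EAX cleared */
    return sum;
}
```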

 

Final notes

I hope you learned something new from this post. It is by no means exhaustive, but hopefully it is enough to get you started. Please download the source code associated with this article from my GitHub repository.
