Introduction to writing x86 assembly code in Visual Studio

Hello,

In this technical blog post, I am going to give you a head start on how to write assembler code and compile it directly from the Visual Studio IDE.

If you are interested in x64, please check this article.

You are expected to be familiar with:

  • The Intel x86 assembly language and writing basic assembly code
  • The common calling conventions (stdcall, cdecl and fastcall)

Okay, let’s get started!

The setup

First, let us create a new console application:

  1. Run Visual Studio (or press Windows-Key+R, then type “devenv”)
  2. Choose “File / New / New project”
  3. Choose “Templates / C++” then “Win32 console application”

When you debug the code, it helps to keep the registers, disassembly (when needed) and memory windows visible.

To show those windows, go to the “Debug / Windows” submenu:

This is the window layout I use when debugging the sample code in this article:

  1. The memory window: it is useful for inspecting the contents of memory pointed to by registers or pointers
  2. The registers window: useful to see the values of general purpose registers
  3. The source code window
  4. The disassembly window, docked next to the source code window. When debugging you get to see the disassembly code equivalent of your C code
  5. The “Auto” window: it will automatically show relevant variables around the currently executing statement
  6. The call stack window

Basic background

Here are some things to keep in mind when writing assembler code:

  1. Do not clobber register values. The compiler expects the non-volatile registers to be preserved across function calls
  2. All registers must be preserved except for the volatile registers EAX, ECX and EDX. ESP changes as you push and pop, but it must be balanced again by the time your code finishes
  3. On x86, the EDX:EAX register pair is used to return 64-bit integer values (EDX holds the high 32 bits, EAX the low 32 bits). The register EAX alone is used to return a 32-bit integer value
  4. Do not corrupt the stack or frame pointer. You can modify ESP and EBP, but make sure you restore them
  5. Respect calling conventions if you call functions or if you are called
    1. When calling cdecl functions, make sure you purge the stack after the call returns
    2. When being called as an stdcall function, make sure you purge the stack before returning
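
To make the EDX:EAX convention concrete, here is a small portable sketch (not taken from the original sample code) that splits a 64-bit value into the two halves a 32-bit function would return and joins them back. The helper names eax_part, edx_part and from_edx_eax are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of a 64-bit value: the half a 32-bit x86 function leaves in EAX.
inline std::uint32_t eax_part(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
}

// High 32 bits of a 64-bit value: the half the function leaves in EDX.
inline std::uint32_t edx_part(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value >> 32);
}

// Rebuild the 64-bit result the way the caller interprets EDX:EAX.
inline std::uint64_t from_edx_eax(std::uint32_t edx, std::uint32_t eax)
{
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}

// Self-check helper: splitting and rejoining must give back the same value.
inline void check_round_trip(std::uint64_t value)
{
    assert(from_edx_eax(edx_part(value), eax_part(value)) == value);
}
```

For example, the value 0x1122334455667788 would come back with EDX = 0x11223344 and EAX = 0x55667788.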

For more information, check the “x86 Architecture” article on MSDN.

Writing x86 assembly code

The main focus of this post will be how to write inline assembler code inside your C/C++ functions. I will cover:

  • Inline assembler code, mixed with C code in the same function
  • Naked functions containing purely assembler code

I will be giving a handful of examples. I will start with easy ones and end with hard ones. Please remember that the presented code is for illustration purposes only and is not meant for production.

Mixing C and Assembler

To write inline assembler code, use the __asm keyword, open a code block, write your instructions, then close the block.

It is also possible to use “__asm ASM instruction” to emit one instruction at a time (without opening a block).

Another way to emit machine code is to use “__asm __emit byte_value” to emit an arbitrary byte. If the emitted bytes do not form valid instructions, your code may raise exceptions.
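
The three forms can be seen side by side in the sketch below. It is only an illustration: add_two is a made-up function, the inline assembler path is compiled only by 32-bit MSVC (guarded by the _MSC_VER and _M_IX86 macros), and a plain C++ fallback is included so the snippet still builds with other compilers.

```cpp
#include <cassert>
#include <climits>

// Returns value + 2. On 32-bit MSVC the work is done with the three __asm
// forms; elsewhere a plain C++ fallback keeps the sketch buildable.
int add_two(int value)
{
    assert(value <= INT_MAX - 2); // avoid signed overflow in the demo

#if defined(_MSC_VER) && defined(_M_IX86)
    // Form 1: a block of instructions
    __asm
    {
        mov eax, value
        inc eax
        mov value, eax
    }

    // Form 2: a single instruction, no block
    __asm inc value

    // Form 3: a raw byte; 0x90 is the one-byte NOP instruction
    __asm __emit 0x90
#else
    value += 2; // fallback for compilers without MSVC inline assembler
#endif

    return value;
}
```

Stepping through add_two in the disassembly window is an easy way to see how each form ends up in the generated code.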

Now, let me dive in directly by writing examples and explaining them.

Example 1 – Hello world from Win32 inline assembler

In the following example, we write the assembler equivalent of the following C code:

MessageBoxA(GetForegroundWindow(), msg, info, MB_OK);

 1  void hello_world_asm()
 2  {
 3    static const char *msg = "Hello world from x86 assembly language!";
 4    static const char *info = "Hello";
 5
 6    __asm
 7    {
 8      push MB_OK
 9      push info
10      push msg
11
12      call GetForegroundWindow
13      push eax
14
15      call MessageBoxA
16    }
17  }


  • Lines 1, 2, 17: Normal function declaration constructs
  • Lines 3, 4: Declare a couple of constant C strings
  • Lines 6, 7, 16: The inline assembler block
  • Lines 8, 9, 10: Push the last 3 arguments of MessageBoxA (namely: uType, lpCaption, and lpText). Arguments are pushed from right to left
  • Line 12: Call GetForegroundWindow(). The result is an HWND stored in EAX
  • Line 13: Pass the last remaining argument to MessageBoxA (the hWnd)
  • Line 15: Invoke MessageBoxA. It is an stdcall function, so it purges its own arguments from the stack

This is the simplest example.

Example 2 – Calculate function

In this example, a function called “calculate” is implemented in order to illustrate the following:

  • How to mix C and assembler code
  • How to access C variables in the assembler code block
  • How to define assembler labels and branch to them
  • How to preserve and restore registers

Let’s get started:

 1  int calculate(int count)
 2  {
 3    int result;
 4
 5    if (count <= 0)
 6      count = 1;
 7
 8    __asm
 9    {
10      push eax
11      push ebx
12      push edx
13      mov eax, 0x11223344
14      mov ebx, 0x55667788
15      xor edx, edx
16      mov ecx, count
17    L_REPEAT:
18      test ecx, 1
19      jz L_ODD
20      add edx, eax
21      ror eax, 1
22      jmp L_CONTINUE
23    L_ODD:
24      add edx, ebx
25      rol edx, 1
26    L_CONTINUE:
27      loop L_REPEAT
28
29      mov result, edx
30      pop edx
31      pop ebx
32      pop eax
33    }
34    return result;
35  }
36
37  void test_calculate()
38  {
39    for (int i = 1; i < 10; i++)
40      printf("calculate(%d)=%x\n", i, calculate(i));
41  }


  • Lines 4-6: validate the ‘count’ input
  • Lines 8-33: the inline assembly block
  • Lines 10-12: save registers
  • Line 16: load ‘count’ into ECX
  • Lines 17, 23, 26: declare some labels. The “L_” prefix is not mandatory. It is a convention I use. Just make sure you don’t use label names that coincide with reserved keywords or assembler mnemonics
  • Lines 18-25: the code checks the low bit of the counter (which was loaded into ‘ecx’) to see if it is odd or even and takes a different branch accordingly. Note that “jz” is taken when the bit is clear, so the L_ODD branch actually runs when ECX is even
  • Line 27: use the “loop” instruction to decrement ECX and repeat the iteration until ECX reaches zero
  • Line 29: move the result from the ‘edx’ register to the local variable ‘result’
  • Lines 30-32: restore the saved registers. Note that we don’t really have to save or restore EAX/EDX since they are volatile

We use the “test_calculate” function to check that the code works and outputs the same values for the same input.
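
One way to check the assembler version is to compare it against a plain C++ rendering of the same loop. The sketch below is my reading of the listing above, not code from the original sample: calculate_ref, rotl32 and rotr32 are hypothetical names, and the local variables are named after the registers they stand in for.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by 'bits' (1..31), like the ROL instruction.
inline std::uint32_t rotl32(std::uint32_t x, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (x << bits) | (x >> (32 - bits));
}

// Rotate a 32-bit value right by 'bits' (1..31), like the ROR instruction.
inline std::uint32_t rotr32(std::uint32_t x, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (x >> bits) | (x << (32 - bits));
}

// Portable rendering of the calculate() loop.
std::uint32_t calculate_ref(int count)
{
    if (count <= 0)
        count = 1;

    std::uint32_t eax = 0x11223344u;
    std::uint32_t ebx = 0x55667788u;
    std::uint32_t edx = 0;

    // LOOP decrements ECX and repeats while it is non-zero.
    for (std::uint32_t ecx = static_cast<std::uint32_t>(count); ecx != 0; --ecx)
    {
        if (ecx & 1)
        {
            // "test ecx, 1" leaves ZF clear: fall through past "jz L_ODD"
            edx += eax;
            eax = rotr32(eax, 1);
        }
        else
        {
            // ZF set: the "L_ODD" branch is taken
            edx += ebx;
            edx = rotl32(edx, 1);
        }
    }
    return edx;
}
```

Printing calculate(i) next to calculate_ref(i) in test_calculate would then show whether the two implementations agree.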

Example 3 – String scrambling

In this example, I will illustrate how to manipulate an input string and return a newly allocated string. I will cover:

  • How to do basic string manipulation from the assembler code
  • Calling standard C library functions (strlen and malloc)
  • How to walk a string character by character and do a single character XOR operation

 1  char *scramble_string(
 2    const char *src,
 3    unsigned char k)
 4  {
 5    char *result = NULL;
 6    __asm
 7    {
 8      push edi
 9      push esi
10      push edx
11
12      // get string length and bail out if zero
13      push src
14      call strlen
15      add  esp, 4
16      test eax, eax
17      jz L_EXIT
18
19      // malloc(strlen + 1)
20      inc eax
21      push eax
22      call malloc
23      add esp, 4 // cdecl, clean the stack
24      test eax, eax // NULL pointer? exit
25      jz L_EXIT
26      mov edi, eax
27      mov result, eax
28
29      // source pointer and scrambling key
30      mov esi, src
31      movzx edx, k
32    L_NEXT:
33      lodsb
34      test al, al
35      jz L_DONE
36      xor al, dl
37      stosb
38      jmp L_NEXT
39    L_DONE:
40      mov al, 0
41      stosb
42    L_EXIT:
43      pop edx
44      pop esi
45      pop edi
46    }
47    return result;
48  }
49
50  void test_scramble()
51  {
52    char *enc = scramble_string("hello world", 0xeb);
53    if (enc != NULL)
54    {
55      char *dec = scramble_string(enc, 0xeb);
56      free(enc);
57      if (dec != NULL)
58        free(dec);
59    }
60  }

  • Lines 1-4: the function declaration
  • Lines 8-10: save registers
  • Lines 43-45: restore registers
  • Lines 12-17: check the input string length and bail out if it is zero
  • Lines 20-25: allocate a buffer as long as the input string plus one byte (for the zero termination character). Bail out if malloc() returns NULL
  • Line 26: Load into ‘edi’ the output buffer. This is the newly allocated buffer
  • Line 27: Store that allocated buffer into ‘result’. We return that buffer to the caller
  • Line 30: Load into ‘esi’ the input buffer
  • Line 31: Load the scrambling key into ‘edx’ (zero extended)
  • Line 33: Use the ‘lodsb’ instruction to load the byte at ‘esi’ into the ‘al’ register and then increment the ‘esi’ register
  • Lines 34-35: if the character is the null character then this is the end of the string, so bail out
  • Line 36: XOR the string character with the key
  • Line 37: Use the ‘stosb’ instruction to store the new ‘al’ value into the memory at ‘edi’ and then increment the ‘edi’ register
  • Lines 39-41: At the end, store the null terminator at the end of the string

The ‘test_scramble’ function is used to scramble “hello world”, then descramble the result back into the original value. Since the ‘scramble_string’ function returns a new buffer allocated with ‘malloc’, we have to release that buffer with the ‘free’ function.
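
For comparison, the same logic can be written in portable C++. This is a sketch rather than part of the original sample; scramble_string_ref is a hypothetical name, and the function is meant to mirror the assembler version, including its choice to return NULL for an empty input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Portable counterpart of scramble_string(): XOR every character with the key
// and return a newly allocated, NUL-terminated copy (release it with free()).
char *scramble_string_ref(const char *src, unsigned char k)
{
    assert(src != nullptr);

    // Same early exit as the assembler code: an empty string yields NULL
    std::size_t len = std::strlen(src);
    if (len == 0)
        return nullptr;

    char *result = static_cast<char *>(std::malloc(len + 1));
    if (result == nullptr)
        return nullptr;

    // The lodsb / xor / stosb loop, one character at a time
    for (std::size_t i = 0; i < len; ++i)
        result[i] = static_cast<char>(static_cast<unsigned char>(src[i]) ^ k);

    result[len] = '\0';
    return result;
}
```

One limitation shared by both versions: if a character of the input equals the key, the XOR produces a zero byte, and a later strlen() on the scrambled string will stop there.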

Example 4 – Generating code dynamically

This is a long but comprehensive example which demonstrates how to generate code dynamically and then call it.

I will illustrate the concept of binding arguments to a function and generating a new function that takes no arguments but knows how to run with the previously bound arguments.

The following points will be covered:

  • How to write assembler code for use as a template code and then relocate / patch that code
  • How to emit byte codes (using the __asm __emit)
  • How to allocate executable memory and put function code in that executable memory
  • How to implement basic closures in C/C++

The code generator is written in the cf1bind_t class. This class binds a C function with one argument of your choice. The result is a function pointer that you can call any time.

First, let me start by showing you the example code:

void __cdecl print_msg(char *msg)
{
  printf("%s", msg);
}

void test_func_bind()
{
  cf1bind_t *say_hi = cf1bind_t::make(print_msg, "hi!\n");
  cf1bind_t *say_bye = cf1bind_t::make(print_msg, "bye!\n");

  say_hi->call();
  say_bye->call();

  delete say_bye;
  delete say_hi;
}

In this example code, we define a function called print_msg that takes one argument and prints it to the console.

Then the test_func_bind function binds print_msg twice and generates two functions: say_hi and say_bye.

Each of those generated functions is bound to the argument passed to the cf1bind_t::make function.

Later, we can invoke the newly generated functions. When they are called, they will invoke print_msg (the bound function) with the corresponding bound argument.
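
Before looking at the code generator, it may help to see the same binding idea without any generated machine code. The sketch below is an assumption-laden simplification, not the article's class: bound_call_t, bind1 and log_msg are hypothetical names, and the bound function appends to a string instead of printing so that the effect is easy to inspect.

```cpp
#include <cassert>
#include <string>

// Signature of the functions we can bind: one string argument, no result.
typedef void (*func1_t)(const char *);

// Plain-data counterpart of a bound function: it remembers the function and
// its argument instead of generating code that has them baked in.
struct bound_call_t
{
    func1_t func;
    const char *arg;

    // Invoke the bound function with the bound argument.
    void call() const
    {
        assert(func != nullptr);
        func(arg);
    }
};

// Bind a function to one argument.
inline bound_call_t bind1(func1_t func, const char *arg)
{
    bound_call_t b = { func, arg };
    return b;
}

// Demo target: records messages in a log instead of printing them.
static std::string g_log;
static void log_msg(const char *msg)
{
    g_log += msg;
}
```

The difference with cf1bind_t is that a bound_call_t must be invoked through its call() method, whereas the generated code is a real function that needs no object to run.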

Now, let us look at the implementation of the cf1bind_t class:

  1  // Generates a new function that takes no arguments
  2  // but calls the bound function with its bound argument
  3  class cf1bind_t
  4  {
  5  private:
  6    void *func_body;
  7    size_t func_size;
  8
  9    // Generate a function that calls c_func1 with the passed parameter
 10    static void *dynamic_code(
 11      void *c_func1,
 12      void *param1,
 13      size_t *func_size)
 14    {
 15      //
 16      // Use static variables so we initialize them once
 17      //
 18
 19      // The beginning of the asm code
 20      static PBYTE begin_addr = NULL;
 21
 22      // Some offsets into the code and the code size
 23      static size_t code_size, param_offset, xfer_offset;
 24
 25      // Grab the assembler code once
 26      if (begin_addr == NULL)
 27      {
 28        PBYTE end, param, xfero;
 29        __asm
 30        {
 31          jmp L_END
 32        L_BEGIN:
 33        L_PARAM_PLACEHOLDER:
 34          push 0x19810417 // 0x68 XXXXXXXX (5 bytes)
 35          call L_XFER_PLACEHOLDER
 36          add  esp, 4 // purge the stack
 37          ret
 38        L_XFER_PLACEHOLDER:
 39          push 0xaabbccdd
 40          ret
 41        }
 42        // Emit some free style bytes.
 43        // Use the __asm __emit to generate opcodes not supported by the
 44        // assembler or perhaps data
 45        __asm __emit 0x44 // 'D'
 46        __asm __emit 0x55 // 'U'
 47        __asm __emit 0x4D // 'M'
 48        __asm __emit 0x4D // 'M'
 49        __asm __emit 0x59 // 'Y'
 50        __asm
 51        {
 52        L_END:
 53          mov begin_addr, offset L_BEGIN
 54          mov param, offset L_PARAM_PLACEHOLDER
 55          mov xfero, offset L_XFER_PLACEHOLDER
 56          mov end, offset L_END
 57        }
 58        code_size = end - begin_addr;
 59        param_offset = param - begin_addr + 1;
 60        xfer_offset = xfero - begin_addr + 1;
 61      }
 62
 63      // Allocate memory for the function body. Make writable but not executable (not yet)
 64      // Note: (demo code) it is not optimal to allocate a page per bound function
 65      PBYTE func_memory = (PBYTE)VirtualAlloc(
 66        NULL,
 67        code_size,
 68        MEM_RESERVE | MEM_COMMIT,
 69        PAGE_READWRITE);
 70
 71      if (func_memory == NULL)
 72        return NULL;
 73
 74      // Copy the function
 75      memcpy(
 76        func_memory,
 77        begin_addr,
 78        code_size);
 79
 80      // Customize the function body
 81      *((DWORD_PTR *)(func_memory + xfer_offset)) = (DWORD_PTR)c_func1;
 82      *((DWORD_PTR *)(func_memory + param_offset)) = (DWORD_PTR)param1;
 83
 84      // Lock the memory and make executable only
 85      DWORD old_protect;
 86      if (VirtualProtect(
 87          func_memory,
 88          code_size,
 89          PAGE_EXECUTE,
 90          &old_protect) == FALSE)
 91      {
 92        VirtualFree(func_memory, 0, MEM_RELEASE); // size must be 0 with MEM_RELEASE
 93        return NULL;
 94      }
 95
 96      *func_size = code_size;
 97      return func_memory;
 98    }
 99
100    void free_func()
101    {
102      if (func_body == NULL)
103        return;
104      VirtualFree(
105        func_body,
106        0, // size must be 0 with MEM_RELEASE
107        MEM_RELEASE);
108      func_body = NULL;
109    }
110
111  public:
112    cf1bind_t() : func_body(NULL), func_size(0)
113    {
114    }
115
116    bool bind(void *c_func1, void *arg1)
117    {
118      free_func();
119      func_body = dynamic_code(
120        c_func1,
121        arg1,
122        &func_size);
123
124      return func_body != NULL;
125    }
126
127    void call()
128    {
129      typedef void (__cdecl *callfunc_t)(void);
130      ((callfunc_t)func_body)();
131    }
132
133    ~cf1bind_t()
134    {
135      free_func();
136    }
137
138    static cf1bind_t *make(
139      void *c_func1,
140      void *arg1)
141    {
142      cf1bind_t *f = new cf1bind_t();
143      if (!f->bind(c_func1, arg1))
144      {
145        delete f;
146        return NULL;
147      }
148      return f;
149    }
150  };


Explanation:

  • Lines 1-8: Define the cf1bind_t class with two member variables.
    • func_body: points to the newly generated function body
    • func_size: the size of the generated function
  • Line 10: Declare the dynamic_code function. This function generates the bound function. It takes three arguments:
    • in: function to bind
    • in: function argument to bind with the function
    • out: the generated function size
  • Lines 19-23: Declare a few static variables that are used to remember characteristics of the inlined ASM code. I chose static because I want to initialize those variables once per program lifetime.
  • Lines 26 and 61: One-time initialization for the inlined ASM code template
  • Line 31: Skip over the inlined ASM template so that it is never executed in place
  • Lines 32-40: Define the ASM code template
    • Declare some labels so that we can later patch in proper operands and values
    • Declare a proper dynamic function body. This body will be the actual bound function body
    • All the locations at the labels will later be updated with correct values
  • Line 35: Issue a relative call to another location that will do a pseudo far/absolute call. Because the call is relative, it stays valid after the template is copied elsewhere
  • Lines 38-40: A pseudo absolute call using the “push ADDRESS/ret” construct
  • Lines 42-49: Emit some bytes. This is just for demonstration purposes
  • Lines 53-56: Take the addresses of all the needed ASM code locations and store them into local variables
  • Lines 58-60: Compute the template size and the operand offsets. For the two placeholders we add one to the label offset in order to skip the one-byte PUSH opcode (0x68) and land on its 32-bit operand
  • Line 65: Allocate Read/Write memory. This will hold the code of the bound function
  • Line 75: Copy the code template to the allocated memory location
  • Lines 81 and 82: Patch in both the address of the function to call and its bound parameter. This specializes the ASM code template so that it always calls the bound function with the bound parameter
  • Lines 85-94: Make the memory executable only. We should not keep R/W/X memory pages if we don’t have to
  • Line 100: Define the free_func() function that frees the bound function. Note that VirtualFree() requires a size of zero when used with MEM_RELEASE
  • Line 116: Define the bind() function. This function calls dynamic_code() to generate a new bound function and stores the function’s size and body address in the cf1bind_t class member variables (func_body and func_size)
  • Line 127: Define the call() function. This function casts the func_body member variable into a void function that takes no arguments and then calls it. When called, execution goes to the specialized ASM code template that passes the previously bound argument to the bound function
  • Line 138: Define the class static function make() that returns a new instance of the cf1bind_t class that is ready to be used with the newly constructed bound function

In short, the cf1bind_t class does the following:

  • Declares an ASM code template that PUSHes an argument and CALLs a function
  • Creates a new memory page, copies the ASM code template, and patches in the correct argument and call address
  • Locks the memory by making it executable only
  • That’s it! Now you can call the newly generated function body
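
The copy-and-patch step can be tried out on an ordinary byte buffer, without allocating executable memory. The sketch below makes several assumptions: the template bytes are my own hand-assembled encoding of the instructions in the listing (the compiler may encode them differently), the offsets are fixed constants rather than computed from labels, and make_patched_copy and read_imm32 are hypothetical helpers. The buffer is never executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed x86 encoding of the code template, with both placeholders in place.
static const std::uint8_t k_template[] = {
    0x68, 0x17, 0x04, 0x81, 0x19,  // push 0x19810417  (parameter placeholder)
    0xE8, 0x04, 0x00, 0x00, 0x00,  // call +4          (relative, to the stub)
    0x83, 0xC4, 0x04,              // add esp, 4
    0xC3,                          // ret
    0x68, 0xDD, 0xCC, 0xBB, 0xAA,  // push 0xaabbccdd  (function placeholder)
    0xC3                           // ret
};

// Operand offsets: instruction offset plus one to skip the 0x68 opcode byte.
static const std::size_t k_param_offset = 0 + 1;
static const std::size_t k_xfer_offset = 14 + 1;

// Copy the template and overwrite both 32-bit operands.
std::vector<std::uint8_t> make_patched_copy(std::uint32_t func_addr,
                                            std::uint32_t param)
{
    std::vector<std::uint8_t> code(k_template, k_template + sizeof(k_template));

    // Both patch sites must sit right after a PUSH imm32 opcode
    assert(code[k_param_offset - 1] == 0x68);
    assert(code[k_xfer_offset - 1] == 0x68);

    std::memcpy(&code[k_param_offset], &param, sizeof(param));
    std::memcpy(&code[k_xfer_offset], &func_addr, sizeof(func_addr));
    return code;
}

// Read back a 32-bit operand from the patched copy.
std::uint32_t read_imm32(const std::vector<std::uint8_t> &code, std::size_t offset)
{
    std::uint32_t value;
    std::memcpy(&value, &code[offset], sizeof(value));
    return value;
}
```

Only the two operands change between copies; the relative call in the middle is position independent, which is why a plain memcpy is enough to relocate the template.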

Note: For production code, use HeapCreate() with HEAP_CREATE_ENABLE_EXECUTE and later use HeapAlloc(). This avoids allocating one page (normally 4 KB) per bound function.

Naked function with assembler code only

I wrote just one example in this section. To declare a pure assembler function, you have to specify that this function is “naked” like this:

return_type __declspec(naked) calling_convention Function_Name(Parameters)

Inside the naked function, you are allowed to write as many “__asm” blocks as you want and you are allowed to declare variables.

 1  //--------------------------------------------------------------------------
 2  size_t __declspec(naked) __stdcall sum_buf(
 3    const void *buf,
 4    size_t count = size_t(-1))
 5  {
 6    __asm
 7    {
 8      // Prolog <
 9      push ebp
10      mov ebp, esp
11      // >
12      push esi
13      push ecx
14      push ebx
15
16      mov esi, buf
17      mov ecx, count
18      cmp ecx, -1
19      jnz L_START
20
21      // strlen(buf) <
22      push esi
23      call strlen
24      add esp, 4
25      // >
26      mov ecx, eax
27    L_START:
28      xor eax, eax
29      xor edx, edx
30    L_LOOP:
31      lodsb
32      add edx, eax
33      loop L_LOOP
34
35      // Return value <
36      mov eax, edx
37      xor edx, edx
38      // >
39      // Restore saved registers <
40      pop ebx
41      pop ecx
42      pop esi
43      // >
44      // Epilog <
45      mov esp, ebp
46      pop ebp
47      // >
48      ret 4*2
49    }
50  }
51
52  void test_sum_buf()
53  {
54    if (sum_buf("\x01\x02\x03", 3) != 6)
55    {
56      printf("bad function!\n");
57      return;
58    }
59
60    const char *buf = "calling a naked asm x86 function";
61    printf("the sum of characters in '%s' is %d\n", buf, sum_buf(buf));
62  }


Explanation:

  • Lines 2-4: Declare the naked function sum_buf. Notice the use of the __declspec(naked) keyword
  • Lines 8-10: Write the function prolog. The compiler does not generate one for naked functions, so we have to do it ourselves
  • Lines 12-14: Save work registers. We need to restore them later
  • Lines 16-24: If the passed count argument is -1 then call strlen to compute the actual buffer length, treating the buffer as a null-terminated C string
  • Line 26: Store the counter value in ECX. We will be using the LOOP instruction to loop
  • Lines 27-29: Start of loop. Clear EAX, which will be used by LODSB, and clear EDX, which will hold the sum of all the bytes in the buffer
  • Lines 30-33: Use LODSB to load a single byte into AL (the low byte of EAX), add the byte value to EDX and then loop again. Note that LOOP decrements ECX before testing it, so a count of zero (or an empty string) would wrap around; this demo assumes a non-empty buffer
  • Line 36: Store EDX into EAX. EAX will be the return value of the function (per the __stdcall convention)
  • Lines 39-43: Restore work registers
  • Lines 44-47: Function epilog
  • Line 48: Purge the stack (per the __stdcall convention). Since the function takes two arguments, we need to purge sizeof(DWORD) * 2 bytes
  • Line 52: Declare a function to test our naked ASM function
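
As with the earlier examples, a portable reference version is handy for checking the results. The sketch below is not from the original sample; sum_buf_ref is a hypothetical name, and unlike the assembler version it returns zero for an empty buffer instead of wrapping around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Portable counterpart of sum_buf(): add up the bytes of a buffer.
// A count of size_t(-1) means "treat buf as a null-terminated C string".
std::size_t sum_buf_ref(const void *buf,
                        std::size_t count = static_cast<std::size_t>(-1))
{
    assert(buf != nullptr);

    if (count == static_cast<std::size_t>(-1))
        count = std::strlen(static_cast<const char *>(buf));

    // Bytes are read as unsigned, matching LODSB into a cleared EAX
    const unsigned char *p = static_cast<const unsigned char *>(buf);
    std::size_t sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += p[i];

    return sum;
}
```

For instance, the string "abc" sums to 97 + 98 + 99 = 294, which is what both versions should report on a 32-bit build.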

 

 

Final notes

I hope you learned something new from this post. It is by no means meant to teach all the tricks but hopefully it was enough. Please download the source code associated with this article from my GitHub repository.
