Intelligent Design: 02

The assembly language dialect used in this paper is not the one commonly used in the world of Windows, also known as LETNi syntax. It is AT&T syntax, known in the world of Linux and Unix. When I began to write code for the x86 platform in 1993, GCC was the only free development tool one could get, so I had to use AT&T syntax. If you only know LETNi syntax, there is a short introduction to AT&T syntax in one of the next posts to learn the difference between the both. If you worked with AS for a short time, you don't want to return to the complicated and perversed (or was that reversed?) LETNi syntax anymore. The programming techniques introduced in this paper do not rely on a specific syntax. However, knowing AT&T syntax might help you to understand the sample code. Reaching the goal is what really counts. How we get there is another, slightly different problem.

The Stack

Every x86 processor works as a stack machine. Parameters we have to pass to called functions, return addresses to calling functions, local variables and structures are put onto or read from the stack. Unfortunately, most of today's programmers just are used to click together some of those prefabricated code fragments coming along with the daily 20 GB version upgrade for their favourite VisualXYZ(plus-minus-dotnet) development suite. If you ask any of them 'What is a stack?', they probably tell you something about hay or - of course! - money. Sad, but true. Exceptionally sad, because the mechanisms of a software stack are that simple, one were attempted to call the big idea behind nothing else than 'brilliant'.

Whenever you compile a program, you pass a definition file with the extension .def to the linker (LINK.EXE or similar). The definition file holds some important information about the program for the session manager of your operating system. While the compiled program is started, the operating system reserves three independent memory blocks (code, data, stack) for the new process and the segment registers CS, DS+ES and SS are set to the address of one of these blocks. Whatever you defined as STACKSIZE in the definition file, exactly this size is allocated for your stack segment. After allocating the required memory blocks for those three segments, the program code is copied to the code segment, all defined global variables are written to the data segment - they are 'initialised' - and rSP is set to the top of the stack segment. Finally, the address of the array with the command line parameters and the argument count are pushed onto the virgin stack before the session manager calls the main() function of our program. Entering main(), the processor starts to execute the code found there, until it stumbles upon the final ret instruction and passes control back to the session manager.

Entering our program's main() function, the stack looks like this:

The top stack element holds the argument vector argv[] (parameter 2), the next lower stack element holds the argument count argc (parameter 1). The current stack element, the one ESP points to, holds the return address to the session manager. Whenever the program is terminated, the instruction pointer is loaded with this address and the terminating sequence of the session manager is executed. It first frees those resources our program eventually reserved for itself, e.g. allocated memory blocks, open files, open devices, and so on. Finally, it releases the allocated segments and cleans up all structures holding control data of our program.

In the running program, the content of the stack pointer is decreased with every push instruction, the creation of a stack frame or the call to another function. It is increased whenever we pop data from the stack, release a stack frame or return to a calling function. In 32 bit functions, four byte are subtracted from ESP with every push or call, while four byte are added to ESP with every pop or ret. All subtractions or additions are done by the processor automatically, because they are an integral part of the mentioned instructions. Using a picturesque language, we might state: The stack grows with every push or call and shrinks with every pop or ret.

Using The Stack

To make sensible use of the stack, any x86 processor provides the instructions call, enter, leave, pop, push and ret. All these instructions update the content of ESP automatically. Besides these special instructions, ESP can be used like any other register, as well. We are free to add or subtract immediate values or the content of another register to/from the stack pointer and do other funny things with ESP. However, the most utilised action probably is the subtraction of an immediate value from ESP to create a stack frame and the addition of exactly the same value to release that stack frame if we do not need it any longer. A detailed description follows later on, see About Stackframes. For now, we focus our attention on some more basic things like the instructions mentioned above. To understand the concept of conventional programming methods, it is very important to know how these instructions work and how they manipulate the stack and ESP.

CALL

If the processor encounters a call instruction, it first subtracts two, four or eight (depending on the standard datasize) from ESP, then stores the address of the instruction following the call on the stack. Next, the address passed as a part of the call instruction is loaded into the instruction pointer. Execution now is continued with the instructions found at the new location, until the processor stumbles upon a ret.

...
call DoNothing # call function DoNothing
xorl %eax,%eax #<- the address of this instruction
...             # is stored on the stack and loaded
...             # into EIP with the RET instruction

DoNothing:      # local function DoNothing
ret             # return to caller...

ENTER

Please do not use this vector path instruction - it blocks the processor for entire 13 clock cycles. The usual replacement

...
pushl %ebp       # save EBP
movl %esp,%ebp   # save ESP
subl $0x10,%esp # create stack frame
...

is executed in four clock cycles. Saving nine clock cycles is a speed improvement of 325 percent. You should prefer the replacement code over enter under any cicumstances! The 16 byte are just a randomly chosen example to keep the sample code valid. The real size you have to subtract from ESP depends on the amount of local variables and other temporary data your function has to store on the stack.

LEAVE

The leave instruction is equivalent with the following two instructions:

...
movl %ebp,%esp   # restore ESP (DP 1, 2 byte)
popl %ebp        # restore EBP (VP 4, 1 byte)
...

Like pop, leave is a vector path instruction. Because pop blocks the processor for four clock cycles and also has to wait for the valid result of the preceeding mov instruction, leave actually is two clock cycles faster. Moreover, the one byte leave is shorter than its three byte replacement. Hence, you should prefer leave over the alternative method.

POP

Pop copies the content of the current stack element to a register or memory location, then adds, depending on the processor mode and an optional prefix, two, four or eight to the content of the stack pointer. Unlike the direct path push, pop is a vector path instruction. It is executed in pipes 0 and 1, while pipe 2 is blocked for the time the instruction is processed. Every pop instruction lasts four clock cycles. Pop is a special vector path mov instruction, where ESP automatically is updated after it was used to address the source of a copy operation.

In general, pop is used to restore the content of a register. You should keep track of the stack pointer, because it is quite difficult to find errors caused by asymmetrically executed push and pop instructions. Especially, if some of them are inside a loop while others are not...

...
popl %ebx # restore EBX [=> ESP + 4(!)]
...

Push

Push first adds, depending on the processor mode and an optional prefix, two, four or eight to the stack pointer, then copies an imediate value, the content of a register or the content of a memory location into the stack element the updated ESP points to. Push is a special direct path mov instruction, where ESP automatically is updated before it is used to address the target of a copy operation. While the entire execution time of each push instruction is 3 clock cycles, ESP is available one clock earlier (after 2 cycles) for the following instructions.

In general, push is used to put parameters or register contents onto the stack. It is a good idea to remove 'used' parameters from the stack as soon as possible, because they decrease the available stack size, cause avoidable stack pointer arithmetics, and so on. To remove them, we don't have to pop them - beware! - from the stack. We just have to add the appropriate amount of byte to the stack pointer. If we pushed two parameters in a 32 bit function as shown in the below example, we add 8 byte to ESP. If it were eight parameters, we had to add 8 * 4 = 32 byte to ESP, and so on.

Regardless of the lazy behavior of HLL (high level language) compilers, it is a good idea to correct the content of ESP after each call instruction if you passed parameters to the called function. Humans have minor problems to keep track of the current content of the stack pointer, especially, if the function body exceeds the size of their display. Compilers can do that much better. However, their output is less optimised than the code written by a human... ;)

...
pushl %eax       # put parameter 2 onto the stack
pushl %ebx       # put parameter 1 onto the stack
call _helpling   # call another function
addl $0x08,%esp # correct ESP directly after CALL
...

RET

Whenever the processor stumbles upon a ret instruction, it copies the address stored in the current stack element into the instruction pointer EIP, then adds the standard datasize (2, 4 or 8 byte) to the stack pointer. After ESP was updated, execution continues with the instruction found at the address EIP now points to.

...
ret              # return to caller
...

Stack Management

As mentioned in the descriptions of the single instructions, keeping track of the current content of the stack pointer is a must with highest priority. Because passing of parameters generally is done with push instructions, while parameters are taken from the stack by adding the appropriate amount of byte to the stack pointer ESP, it is very important to handle these operations with exceptional care. Especially the necessary corrections of the stack pointer bear some potential to be erroneous. Unfortunately, such errors cause unexpected crashes and malfunctions, and it is quite hard to track them down until the culprit causing the mess is found. This reaches a new dimension with the creation of a stack frame. What is this already mentioned, sinister stack frame? The next post offers a detailed explanation.

Tuesday, April 13, 2010

02 - Basics

No comments:

Post a Comment