Intelligent Design: 03 - Stack Frames

When the session manager passes control to our main() function, we surely create a message queue and a frame window, but we also might play some music or those mega-cool animations which are a must for all recent applications swimming along the main stream.

Whatever programs might do when they get control over the computer, the first lines of any function, including main(), always follow the set routine. First: A stack frame is built. Second: All used registers are saved on the stack. Actually, ECX and EDX never are saved because the C-conventions say so (see Conventions). Third: The code found in the function body performs all tasks the function is coded for. Fourth: All saved registers are restored. Fifth: The stack frame is destroyed (released). Sixth: The final ret is executed and the function returns to its caller.

One thing should be mentioned explicitely: We only need a stack frame, if we have to store local variables or other temporary data structures on the stack. Functions without local variables do not need a stack frame at all. Building a stack frame lasts at least 6 clock cycles and occupies 10 byte of code. To save superfluous activities, you should switch on the fomit-frame-pointer option (GCC) by default. It skips building stack frames where they are not required.

Grasping how a base pointer is used and how it works is the key to understand conventional programming techniques. Because this is a very important issue, all explanations are very detailed. Some may find them too lengthy, but we should give others the chance to get in touch with all aspects, so (hopefully) anyone is able to gather the knowledge required to apply the learned stuff in real life.

Example

The following example shows how conventional stack frames are created. It is trivial code used in every program. Even huge monster applications like operating systems, browsers, audio studios, et cetera use the same pattern. They only differ in the size of their stack frames.

pushl %ebp               # save old base pointer
movl %esp,%ebp           # load new base pointer
subl $0x08,%esp          # reserve 2 local variables
pushl %ebx               # save EBX
movl 0x08(%ebp),%eax     # EAX = argument count
movl 0x0C(%ebp),%ebx     # EBX = argument vector
movl $0x00,-0x04(%ebp)   # local variable 1 = 0
movl $0x20,-0x08(%ebp)   # local variable 2 = 32
pushl %eax               # copy EAX to -0x10(%ebp)
pushl %ebx               # copy EBX to -0x14(%ebp)

The Stack

After execution of the above instructions, the stack looks like this:

ESP always points to the current stack element. In our case, this is the address where the content of EBX was copied to. Depending on the instructions in the function body, ESP moves down towards stack bottom or back towards stack top. In properly designed functions, the content of ESP never can be greater than or equal to the content of EBP. This tells us that the base pointer not only is used to address stack elements. It also marks the border between the current stack frame and the stack frame of the calling function.

pushl %ebp

The first step to create a new stack frame is saving the content of the old base pointer of the calling function on the stack. The content of the old base pointer must not be changed under any circumstances! Otherwise, the calling function uses an invalid basis to address its local variables. It will crash if it tries to read or write data to an address stored in a local variable, because that address, formerly stored at -0x0C[EBP], now might be found at 0x3C[EBP]. At the latest, the calling function commits suicide while trying to return to its caller. It fills the base pointer with random data, then loads other random data into the instruction pointer. The attempt to execute that 'code' raises one of those exceptions and an 'access violation' is reported on the user's screen.

movl %esp,%ebp

In the second step, we copy the address of the current stack element to EBP. The new base pointer now contains the address, where the content of the old base pointer is stored. The next stack element below the new base pointer is the topmost stack element we are allowed to write to. All stack elements above EBP - including 0x00[EBP]! - only should be used to read from, but never to write to! From here on, we use the base pointer to address local variables and parameters passed to our function. In other words, these data are addressed with offsets to the 'frozen' base pointer.

As shown in figure 03, the old base pointer is stored at offset zero to the current base pointer. We write this as 0x00[EBP], where the 0x tells us this is a hexadecimal number. Everything related to programming should be written in hexadecimal notation. It is much easier to read, numbers are shorter and always formatted correctly. For example, 0xF100 tells us at first sight: 'It's the second page in memory block 0xF000.' Its decimal equivalent 61696 does not tell us anything. We have to start a calculator to translate it into 0xF100, wasting precious time.

Back to the base pointer. Passed parameters and the return address to the calling function are stored in stack elements above the one the base pointer points to. Therefore, they are addressed with positive offsets. Our return address is stored at 0x04[EBP], followed by one or more optional parameters, where parameter 1 always is stored at 0x08[EBP], parameter 2 at 0x0C[EBP], and so on. The 'top down' order is caused by a C-convention, saying that all parameters are pushed onto the stack back to forth, starting with the last parameter the compiler finds inside those round brackets following the name of the called function.

subl $0x08,%esp

The real creation of a stack frame is done with the third and last step. First, we calculate the required size for all variables and structures, then we subtract the result from ESP. Of course, the result must be rounded up to the next multiple of the standard datasize, before we subtract it from the stack pointer. If we subtracted an 'odd' number, the stack became corrupted and was not valid any longer. Everything addressed via ESP, e.g. call or push, was misaligned, causing a lot of penalty cycles. If ESP accidentally was set to 0xFFE2 instead of 0xFFE4 (e.g. by subtracting 0x0A rather than the proper value 0x08) in our sample code, then we push two local variables 0x89ABCDEF and 0x01234567 onto the stack and copy variable 2 from -0x08[EBP] to EBX, EBX finally contained the invalid number 0xCDEF0123 - the doubleword currently stored at address 0xFFE4.

With this subtraction, we reserve the subtracted amount of byte on the stack. This reserved area is safe from being overwritten by any following call or push instruction, because ESP always is set to the next lower stack element before a write operation. Because all local variables are lying below the stack element where the old base pointer is stored, they are addressed with negative offsets. The first is stored at address -0x04[EBP], the 2nd at -0x08[EBP], and so on. It neither matters how you name your local variables nor if you count them from top to bottom or vice versa. The only important thing is to remember where which of them is stored.

pushl %ebx

Saves the content of EBX on the stack. Following the C-conventions, the content of EBX, EDI and ESI must be saved before overwriting them and must be restored before returning to the calling function. In other words: The content of these registers must be preserved by all functions (see Conventions).

Urgent Needs

In conventional code, the content of EBP never must be changed. If you cannot avoid to use EBP for general purposes, because you urgently need an extra register, you have to save it before using it. You should restore EBP immediately after passing the bottleneck. Not worth mentioning, but, nevertheless: EBP cannot be used to adress local variables while it holds your private data...

The Function Body

ESP always points to the address of the current stack element. Looking at our example, it is quite obvious that we are going to call another function. Our local variable 1 might be a counter which is incremented each time the called function returns TRUE. Local variable 2 might be a loop counter, so the function might be called 32 times, counting how many times a specific condition was met.

Stack Frame Destruction

After all instructions in the function body are executed, we first have to restore the saved registers. Next, we have to destroy the stack frame to release the area we reserved for our private use. Finally, we return to the calling function. There are two possible ways to perform these tasks:

addl $0x08,%esp   # correction after 2 PUSH instructions
popl %ebx         # restore EBX
leave             # destroy stack frame (VP 3, 1 byte)
ret               # return to caller

or

addl $0x08,%esp   # correction after 2 PUSH instructions
popl %ebx         # restore EBX
movl %ebp,%esp    # restore ESP (DP 1, 2 byte)
popl %ebp         # restore EBP (VP 4, 1 byte)
ret               # return to caller

Both variants restore ESP and EBP. However, leave is two clock cycles faster and two byte smaller. The correction of ESP is required to restore the preserved content of EBX. Because we pushed two parameters onto the stack after we pushed EBX, ESP is eight byte below the address where the content of EBX is stored. To POP the proper content into EBX, we have to add these eight byte to ESP.

If no registers were saved on the stack, no correction of ESP is required and leave and ret are the only instructions we need to restore ESP and EBP before we return to the caller.

Return To Caller

In our example, ret copies the return address to the session manager of the operating system to EIP and continues execution of its code. The session manager closes our program, frees still allocated resources and passes the content we stored in EAX to the command interpreter.

What Did We Learn?

All introduced mechanisms are, except the size of the stack frame and the amount of preserved registers, identical for all functions following the C-conventions. As far as I know, all of the known operating systems follow these coventions. The advantages are obvious. Once coded functions are portable from one version to the next or from one operating system to another. In the latter case, minor changes must be applied if functions of the target OS have other names or await parameters in different order. Putting it all together, conventions simplify re-use and porting of existing code.

But - every object in our universe has at least two (or more) sides. The disadvantages of the introduced methods, mechanisms and conventions are not as obvious as the advantages, because they are hidden in the deepest and darkest parts of the machine room. Only experienced, well trained technicians are able to find out why the machine chokes and does not run as fast as expected. In the next post, we will analyse what C-conventions do on assembly language level.

Tuesday, April 13, 2010

03 - Stack Frames

No comments:

Post a Comment