What Instruction Pushes The Ip Register Onto The Stack?
This is the fifth chapter in a series about virtual memory. The goal is to learn some CS basics in a different and more practical way.
If you missed the previous chapters, you should probably start there:
- Chapter 0: Hack The Virtual Memory: C strings & /proc
- Chapter 1: Hack The Virtual Memory: Python bytes
- Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
- Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break
The Stack
As we have seen in chapter 2, the stack resides at the high end of memory and grows downward. But how does it work exactly? How does it translate into assembly code? What are the registers used? In this chapter we will have a closer look at how the stack works, and how the program automatically allocates and de-allocates local variables.
Once we understand this, we will be able to play a bit with it, and hijack the flow of our program. Ready? Let's start!
Note: We will talk only about the user stack, as opposed to the kernel stack.
Prerequisites
In order to fully understand this article, you will need to know:
- The basics of the C programming language (especially pointers)
Environment
All scripts and programs have been tested on the following system:
- Ubuntu
  - Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
- Tools used:
  - gcc
    - gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  - objdump
    - GNU objdump (GNU Binutils for Ubuntu) 2.2
Everything we cover will be true for this system/environment, but may be different on another system.
Automatic allocation
Let's first look at a very simple program that has one function that uses one variable (0-main.c):

    #include <stdio.h>

    int main(void)
    {
        int a;

        a = 972;
        printf("a = %d\n", a);
        return (0);
    }

Let's compile this program and disassemble it using objdump:

    holberton$ gcc 0-main.c
    holberton$ objdump -d -j .text -M intel

The assembly code produced for our main function is the following:

    000000000040052d <main>:
      40052d:  55                    push   rbp
      40052e:  48 89 e5              mov    rbp,rsp
      400531:  48 83 ec 10           sub    rsp,0x10
      400535:  c7 45 fc cc 03 00 00  mov    DWORD PTR [rbp-0x4],0x3cc
      40053c:  8b 45 fc              mov    eax,DWORD PTR [rbp-0x4]
      40053f:  89 c6                 mov    esi,eax
      400541:  bf e4 05 40 00        mov    edi,0x4005e4
      400546:  b8 00 00 00 00        mov    eax,0x0
      40054b:  e8 c0 fe ff ff        call   400410 <printf@plt>
      400550:  b8 00 00 00 00        mov    eax,0x0
      400555:  c9                    leave
      400556:  c3                    ret
      400557:  66 0f 1f 84 00 00 00  nop    WORD PTR [rax+rax*1+0x0]
      40055e:  00 00

Let's focus on the first three lines for now:

    000000000040052d <main>:
      40052d:  55                    push   rbp
      40052e:  48 89 e5              mov    rbp,rsp
      400531:  48 83 ec 10           sub    rsp,0x10

The first lines of the function main refer to rbp and rsp; these are special purpose registers. rbp is the base pointer, which points to the base of the current stack frame, and rsp is the stack pointer, which points to the top of the current stack frame.
Let's decompose step by step what is happening here. This is the state of the stack when we enter the function main before the first instruction is run:
- push rbp: this instruction pushes the value of the register rbp onto the stack. Because it "pushes" onto the stack, the value of rsp is now the memory address of the new top of the stack. The stack and the registers now look like this:
- mov rbp, rsp copies the value of the stack pointer rsp to the base pointer rbp -> rbp and rsp now both point to the top of the stack
- sub rsp, 0x10 creates a space to store the values of local variables. The space between rbp and rsp is this space. Note that this space is large enough to store our variable of type integer
We have just created a space in memory, on the stack, for our local variables. This space is called a stack frame. Every function that has local variables will use a stack frame to store those variables.
Using local variables
The fourth line of assembly code of our main function is the following:
400535: c7 45 fc cc 03 00 00 mov DWORD PTR [rbp-0x4],0x3cc
0x3cc is actually the value 972 in hexadecimal. This line corresponds to our C code line:
a = 972;
mov DWORD PTR [rbp-0x4],0x3cc sets the memory at address rbp - 4 to 972. [rbp - 4] IS our local variable a. The computer doesn't actually know the name of the variable we use in our code, it simply refers to memory addresses on the stack.
This is the state of the stack and the registers after this mov instruction:
leave, Automatic de-allocation
If we look now at the end of the function, we will find this:
400555: c9 leave
The instruction leave sets rsp to rbp, then pops the top of the stack into rbp.
Because we pushed the previous value of rbp onto the stack when we entered the function, rbp is now set to the previous value of rbp. This is how:
- The local variables are "de-allocated", and
- the stack frame of the previous function is restored before we leave the current function.
The state of the stack and the registers rbp and rsp are restored to the same state as when we entered our main function.
Playing with the stack
When the variables are automatically de-allocated from the stack, they are not completely "destroyed". Their values are still in memory, and this space will potentially be used by other functions.
This is why it is important to initialize your variables when you write your code, because otherwise, they will take whatever value there is on the stack at the moment when the program is running.
Let's consider the following C code (1-main.c):

    #include <stdio.h>

    void func1(void)
    {
        int a;
        int b;
        int c;

        a = 98;
        b = 972;
        c = a + b;
        printf("a = %d, b = %d, c = %d\n", a, b, c);
    }

    void func2(void)
    {
        int a;
        int b;
        int c;

        printf("a = %d, b = %d, c = %d\n", a, b, c);
    }

    int main(void)
    {
        func1();
        func2();
        return (0);
    }

As you can see, func2 does not set the values of its local variables a, b and c, yet if we compile and run this program it will print…

    holberton$ gcc 1-main.c && ./a.out
    a = 98, b = 972, c = 1070
    a = 98, b = 972, c = 1070
    holberton$

… the same variable values as func1! This is because of how the stack works. The two functions declared the same number of variables, with the same type, in the same order. Their stack frames are exactly the same. When func1 ends, the memory where the values of its local variables reside is not cleared; only rsp is incremented.
As a consequence, when we call func2 its stack frame sits at exactly the same place as the previous func1 stack frame, and the local variables of func2 have the same values as the local variables of func1 when we left func1.
Let's examine the assembly code to prove it:

    holberton$ objdump -d -j .text -M intel

    000000000040052d <func1>:
      40052d:  55                    push   rbp
      40052e:  48 89 e5              mov    rbp,rsp
      400531:  48 83 ec 10           sub    rsp,0x10
      400535:  c7 45 f4 62 00 00 00  mov    DWORD PTR [rbp-0xc],0x62
      40053c:  c7 45 f8 cc 03 00 00  mov    DWORD PTR [rbp-0x8],0x3cc
      400543:  8b 45 f8              mov    eax,DWORD PTR [rbp-0x8]
      400546:  8b 55 f4              mov    edx,DWORD PTR [rbp-0xc]
      400549:  01 d0                 add    eax,edx
      40054b:  89 45 fc              mov    DWORD PTR [rbp-0x4],eax
      40054e:  8b 4d fc              mov    ecx,DWORD PTR [rbp-0x4]
      400551:  8b 55 f8              mov    edx,DWORD PTR [rbp-0x8]
      400554:  8b 45 f4              mov    eax,DWORD PTR [rbp-0xc]
      400557:  89 c6                 mov    esi,eax
      400559:  bf 34 06 40 00        mov    edi,0x400634
      40055e:  b8 00 00 00 00        mov    eax,0x0
      400563:  e8 a8 fe ff ff        call   400410 <printf@plt>
      400568:  c9                    leave
      400569:  c3                    ret

    000000000040056a <func2>:
      40056a:  55                    push   rbp
      40056b:  48 89 e5              mov    rbp,rsp
      40056e:  48 83 ec 10           sub    rsp,0x10
      400572:  8b 4d fc              mov    ecx,DWORD PTR [rbp-0x4]
      400575:  8b 55 f8              mov    edx,DWORD PTR [rbp-0x8]
      400578:  8b 45 f4              mov    eax,DWORD PTR [rbp-0xc]
      40057b:  89 c6                 mov    esi,eax
      40057d:  bf 34 06 40 00        mov    edi,0x400634
      400582:  b8 00 00 00 00        mov    eax,0x0
      400587:  e8 84 fe ff ff        call   400410 <printf@plt>
      40058c:  c9                    leave
      40058d:  c3                    ret

    000000000040058e <main>:
      40058e:  55                    push   rbp
      40058f:  48 89 e5              mov    rbp,rsp
      400592:  e8 96 ff ff ff        call   40052d <func1>
      400597:  e8 ce ff ff ff        call   40056a <func2>
      40059c:  b8 00 00 00 00        mov    eax,0x0
      4005a1:  5d                    pop    rbp
      4005a2:  c3                    ret
      4005a3:  66 2e 0f 1f 84 00 00  nop    WORD PTR cs:[rax+rax*1+0x0]
      4005aa:  00 00 00
      4005ad:  0f 1f 00              nop    DWORD PTR [rax]

As you can see, the way the stack frame is formed is always consistent. In our two functions, the size of the stack frame is the same since the local variables are the same.

    push   rbp
    mov    rbp,rsp
    sub    rsp,0x10

And both functions end with the leave instruction.
The variables a, b and c are referenced the same way in the two functions:
- a lies at memory address rbp - 0xc
- b lies at memory address rbp - 0x8
- c lies at memory address rbp - 0x4
Note that the order of those variables on the stack is not the same as the order of those variables in our code. The compiler orders them as it wants, so you should never assume the order of your local variables in the stack.
So, this is the state of the stack and the registers rbp and rsp before we leave func1:
When we leave the function func1, we hit the instruction leave; as previously explained, this is the state of the stack, rbp and rsp right before returning to the function main:
So when we enter func2, the local variables are set to whatever sits in memory on the stack, and that is why their values are the same as the local variables of the function func1.
ret
You might have noticed that all our example functions end with the instruction ret. ret pops the return address from the stack and jumps there. When functions are called the program uses the instruction call to push the return address before it jumps to the first instruction of the function called.
This is how the program is able to call a function and then return from said function to the calling function to execute its next instruction.
So this means that there are more than just variables on the stack, there are also memory addresses of instructions. Let's revisit our 1-main.c code.
When the main function calls func1,

    400592:  e8 96 ff ff ff        call   40052d <func1>

it pushes the memory address of the next instruction onto the stack, and then jumps to func1.
As a consequence, before executing any instructions in func1, the top of the stack contains this address, so rsp points to this value.
After the stack frame of func1 is formed, the stack looks like this:
Wrapping everything up
Given what we just learned, we can use rbp to directly access all our local variables (without using the C variables!), as well as the saved rbp value on the stack and the return address values of our functions.
To do so in C, we can use:

    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

Here is the listing of the program 2-main.c:

    #include <stdio.h>

    void func1(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        a = 98;
        b = 972;
        c = a + b;
        printf("a = %d, b = %d, c = %d\n", a, b, c);
        printf("func1, rpb = %lx\n", rbp);
        printf("func1, rsp = %lx\n", rsp);
        printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
        printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
        printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
        printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
        printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
    }

    void func2(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
        printf("func2, rpb = %lx\n", rbp);
        printf("func2, rsp = %lx\n", rsp);
    }

    int main(void)
    {
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("main, rpb = %lx\n", rbp);
        printf("main, rsp = %lx\n", rsp);
        func1();
        func2();
        return (0);
    }

Getting the values of the variables
From our previous discoveries, we know that our variables are referenced via rbp - 0xX:
- a is at rbp - 0xc
- b is at rbp - 0x8
- c is at rbp - 0x4
So in order to get the values of those variables, we need to dereference rbp. For the variable a:
- cast our variable rbp to a char *: (char *)rbp
- subtract the correct number of bytes to get the address of where the variable is in memory: ((char *)rbp) - 0xc
- cast it again to a pointer pointing to an int since a is of type int: (int *)(((char *)rbp) - 0xc)
- and dereference it to get the value sitting at this address: *(int *)(((char *)rbp) - 0xc)
The saved rbp value
Looking at the above diagram, the current rbp directly points to the saved rbp, so we simply have to cast our variable rbp to a pointer to an unsigned long int and dereference it: *(unsigned long int *)rbp.
The return address value
The return address was pushed right before the saved previous rbp, so it sits just above it on the stack, at the next higher address. rbp is 8 bytes long, so we just need to add 8 to the current value of rbp to get the address where this return address is on the stack. This is how we do it:
- cast our variable rbp to a char *: (char *)rbp
- add 8 to this value: ((char *)rbp + 8)
- cast it to point to an unsigned long int: (unsigned long int *)((char *)rbp + 8)
- dereference it to get the value at this address: *(unsigned long int *)((char *)rbp + 8)
The output of our program

    holberton$ gcc 2-main.c && ./a.out
    main, rpb = 7ffc78e71b70
    main, rsp = 7ffc78e71b70
    a = 98, b = 972, c = 1070
    func1, rpb = 7ffc78e71b60
    func1, rsp = 7ffc78e71b50
    func1, a = 98
    func1, b = 972
    func1, c = 1070
    func1, previous rbp value = 7ffc78e71b70
    func1, return address value = 400697
    func2, a = 98, b = 972, c = 1070
    func2, rpb = 7ffc78e71b60
    func2, rsp = 7ffc78e71b50
    holberton$

We can see that:
- from func1 we can access all our variables correctly via rbp
- from func1 we can get the rbp of the function main
- we confirm that func1 and func2 do have the same rbp and rsp values
- the difference between rsp and rbp is 0x10, as seen in the assembly code (sub rsp,0x10)
- in the main function, rsp == rbp because there are no local variables
The return address from func1 is 0x400697. Let's double check this assumption by disassembling the program. If we are correct, this should be the address of the instruction right after the call of func1 in the main function.
holberton$ objdump -d -j .text -M intel | less

    0000000000400664 <main>:
      400664:  55                    push   rbp
      400665:  48 89 e5              mov    rbp,rsp
      400668:  48 89 e8              mov    rax,rbp
      40066b:  48 89 c6              mov    rsi,rax
      40066e:  bf 3b 08 40 00        mov    edi,0x40083b
      400673:  b8 00 00 00 00        mov    eax,0x0
      400678:  e8 93 fd ff ff        call   400410 <printf@plt>
      40067d:  48 89 e0              mov    rax,rsp
      400680:  48 89 c6              mov    rsi,rax
      400683:  bf 4c 08 40 00        mov    edi,0x40084c
      400688:  b8 00 00 00 00        mov    eax,0x0
      40068d:  e8 7e fd ff ff        call   400410 <printf@plt>
      400692:  e8 96 fe ff ff        call   40052d <func1>
      400697:  e8 7a ff ff ff        call   400616 <func2>
      40069c:  b8 00 00 00 00        mov    eax,0x0
      4006a1:  5d                    pop    rbp
      4006a2:  c3                    ret
      4006a3:  66 2e 0f 1f 84 00 00  nop    WORD PTR cs:[rax+rax*1+0x0]
      4006aa:  00 00 00
      4006ad:  0f 1f 00              nop    DWORD PTR [rax]

And yes! \o/
Hack the stack!
Now that we know where to find the return address on the stack, what if we were to modify this value? Could we change the flow of a program and make func1 return to somewhere else? Let's add a new function, called bye, to our program (3-main.c):

    #include <stdio.h>
    #include <stdlib.h>

    void bye(void)
    {
        printf("[x] I am in the function bye!\n");
        exit(98);
    }

    void func1(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        a = 98;
        b = 972;
        c = a + b;
        printf("a = %d, b = %d, c = %d\n", a, b, c);
        printf("func1, rpb = %lx\n", rbp);
        printf("func1, rsp = %lx\n", rsp);
        printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
        printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
        printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
        printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
        printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
    }

    void func2(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
        printf("func2, rpb = %lx\n", rbp);
        printf("func2, rsp = %lx\n", rsp);
    }

    int main(void)
    {
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("main, rpb = %lx\n", rbp);
        printf("main, rsp = %lx\n", rsp);
        func1();
        func2();
        return (0);
    }

Let's see at which address the code of this function starts:
holberton$ gcc 3-main.c && objdump -d -j .text -M intel | less

    00000000004005bd <bye>:
      4005bd:  55                    push   rbp
      4005be:  48 89 e5              mov    rbp,rsp
      4005c1:  bf d8 07 40 00        mov    edi,0x4007d8
      4005c6:  e8 b5 fe ff ff        call   400480 <puts@plt>
      4005cb:  bf 62 00 00 00        mov    edi,0x62
      4005d0:  e8 eb fe ff ff        call   4004c0 <exit@plt>

Now let's replace the return address on the stack from the func1 function with the address of the beginning of the function bye, 4005bd (4-main.c):

    #include <stdio.h>
    #include <stdlib.h>

    void bye(void)
    {
        printf("[x] I am in the function bye!\n");
        exit(98);
    }

    void func1(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        a = 98;
        b = 972;
        c = a + b;
        printf("a = %d, b = %d, c = %d\n", a, b, c);
        printf("func1, rpb = %lx\n", rbp);
        printf("func1, rsp = %lx\n", rsp);
        printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
        printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
        printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
        printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
        printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
        /* hack the stack! */
        *(unsigned long int *)((char *)rbp + 8) = 0x4005bd;
    }

    void func2(void)
    {
        int a;
        int b;
        int c;
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
        printf("func2, rpb = %lx\n", rbp);
        printf("func2, rsp = %lx\n", rsp);
    }

    int main(void)
    {
        register long rsp asm ("rsp");
        register long rbp asm ("rbp");

        printf("main, rpb = %lx\n", rbp);
        printf("main, rsp = %lx\n", rsp);
        func1();
        func2();
        return (0);
    }


    holberton$ gcc 4-main.c && ./a.out
    main, rpb = 7fff62ef1b60
    main, rsp = 7fff62ef1b60
    a = 98, b = 972, c = 1070
    func1, rpb = 7fff62ef1b50
    func1, rsp = 7fff62ef1b40
    func1, a = 98
    func1, b = 972
    func1, c = 1070
    func1, previous rbp value = 7fff62ef1b60
    func1, return address value = 40074d
    [x] I am in the function bye!
    holberton$ echo $?
    98
    holberton$

We have called the function bye, without calling it!
Outro
I hope that you enjoyed this and learned a couple of things about the stack. As usual, this will be continued! Let me know if you have anything you would like me to cover in the next chapter.
Questions? Feedback?
If you have questions or feedback don't hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.
Happy Hacking!
Thanks for reading!
As always, no one is perfect (except Chuck of course), so don't hesitate to contribute or send me your comments if you find anything I missed.
Files
This repo contains the source code (X-main.c files) for programs created in this tutorial.
Read more about the virtual memory
Follow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! This was the fifth chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them:
- Chapter 0: Hack The Virtual Memory: C strings & /proc
- Chapter 1: Hack The Virtual Memory: Python bytes
- Chapter 2: Hack The Virtual Memory: Drawing the VM diagram
- Chapter 3: Hack the Virtual Memory: malloc, the heap & the program break
Many thanks to Naomi for proof-reading!
Source: https://blog.holbertonschool.com/hack-virtual-memory-stack-registers-assembly-code/
Posted by: kreidersonters.blogspot.com