A Simple Low Level Example of How Pointers Work in C

As the applications of programming have expanded at an unprecedented pace in recent years, more and more people are trying to learn a programming language so that computers can help them with their work. If you are in academia, whether your field is biology, physics, math, mechanics, electrical engineering, or something else, you will need to learn how to code. It is no exaggeration to say that programming has become something like mathematics for the sciences. Different programming languages have emerged precisely because almost every task now needs some help from a computer, and whoever performs the task must know how to tell the computer what to do. But just as not everybody needs to be a mathematician, not everybody needs to be a professional programmer. This is why we have different kinds of languages for different kinds of needs.

One of those general-purpose languages, one that has accompanied the machine we know as the “computer” almost from its inception (or at least from the inception of the pioneering operating systems), is C. By general purpose I mean that there is no task a computer is capable of that you can’t do with this language. C is one step higher level than assembly, which is in turn one step above raw machine code (aka ones and zeros). The problem with C is that it is inherently difficult to master. One of the notoriously confusing concepts in the C family of languages is pointers, which we try to clarify in this post. Before we start, there is something important to note.

Of the many people trying to learn programming, only a small percentage actually need to learn C/C++. Most tasks that need some programming can be done with other, less difficult, higher-level languages. You do not need C for interacting with web servers, doing arithmetic, solving complex mathematical equations, drawing charts, retrieving information, making smartphone apps, etc.

C is a language suited to low-level system programming: tampering with operating system structures, doing low-level network tasks, low-level file system programming, hacking and security-oriented coding, and writing firmware for custom hardware. In one word: low level and low level.

So if you are not into these fields of work and study, trust me: you don’t need C. Just stick to one of those beautiful high-level languages such as Python or Java.

But if you are curious about how a computer works, want to become a professional system programmer, or are a student who has to learn some C to pass her university courses, let’s start our discussion of one of the pieces of this harsh, aggressive, powerful language: pointers.

Any variable stored in memory has a unique address in the address space of the owning process. From the compiler’s perspective, a variable is just an address. How the bytes stored at that address are interpreted is what gives us the concept of data types. The stored value can itself be an address, and that is what we call a pointer. So a pointer is a variable containing the address of another part of memory. (It points to somewhere in memory, which is how it got its name.) Consider the simple structure below:

    struct test_struct{
        int integer;
        char c;
        double h;
    } x,*y,*z,**p;

Okay. We have x of type test_struct, y and z as pointers to objects of type test_struct, and p as a pointer to a pointer to a test_struct object. Suppose we allocate a test_struct object dynamically and assign its address to z. We call this dynamic object dyn_object. Also, y holds the address of x.

– x is an object
– y has the address of x
– z has the address of dyn_object, so z and y point to different objects
– p has the address of z

y ----> x
z ----> dyn_object
p ----> z


0x6009d0    |--------------------|
            |   Object Content   |
            |                    |
            |                    |
            |--------------------|
            
            
            
0x6009e0    |--------------------|
            |   Object Address   |
            |      0x6009d0      |
            |--------------------|
 

This is the sequence of statements to run:

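    x.integer = 1;
    y = &x;
    z = malloc(sizeof(struct test_struct));  /* allocate dyn_object; needs <stdlib.h> */
    y->integer = 2;
    z->integer = 3;
    p = &z;
    (*p)->integer = 4;
    free(z);  /* assumed: matches the final call in the dump, which receives z */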

To understand what’s going on at the machine level we should look at the assembly dump of the compiled executable:

 <+0>:	push   rbp
 <+1>:	mov    rbp,rsp
 <+4>:	mov    DWORD PTR [rip+0x20047c],0x1        # 0x6009d0 x
 <+14>:	mov    QWORD PTR [rip+0x200481],0x6009d0        # 0x6009e0 y
 <+25>:	mov    edi,0x10
 <+30>:	call   0x400440 
 <+35>:	mov    QWORD PTR [rip+0x200458],rax        # 0x6009c8 z
 <+42>:	mov    rax,QWORD PTR [rip+0x200469]        # 0x6009e0 y
 <+49>:	mov    DWORD PTR [rax],0x2
 <+55>:	mov    rax,QWORD PTR [rip+0x200444]        # 0x6009c8 z
 <+62>:	mov    DWORD PTR [rax],0x3
 <+68>:	mov    QWORD PTR [rip+0x20042b],0x6009c8        # 0x6009c0 p
 <+79>:	mov    rax,QWORD PTR [rip+0x200424]        # 0x6009c0 p
 <+86>:	mov    rax,QWORD PTR [rax]
 <+89>:	mov    DWORD PTR [rax],0x4
 <+95>:	mov    rax,QWORD PTR [rip+0x20041c]        # 0x6009c8 z
 <+102>:	mov    rdi,rax
 <+105>:	call   0x400410 
 <+110>:	pop    rbp
 <+111>:	ret    

As the comments imply, variable x is located at 0x6009d0. To the compiler and the computer itself there is no such thing as x; x is represented by its address, 0x6009d0. At +4, x.integer is assigned a value. The object itself lies at memory address 0x6009d0, and the struct members are stored at that address one after another (the compiler may insert padding between them for alignment). So the size of x (the size of the variable at address 0x6009d0) is equal to the size of test_struct. At +14, y (at address 0x6009e0) gets the address of x, which is 0x6009d0.

Look at +42. 8 bytes of memory are read from address 0x6009e0, which is where y is stored (QWORD PTR). These 8 bytes are the address that y holds as a pointer. Then at +49 this address is dereferenced and the value 2 is stored there, occupying 4 bytes of memory (DWORD, which is the size of the integer member of our struct). The same thing happens at +55 and +62 for z, but for a different object.

Now at +79, +86 and +89 we have two separate dereferences for p. p is a pointer to a pointer to test_struct. In the first stage, at +79, 8 bytes are read from where p resides; this is the value of p, which is the address of z. At +86 rax is dereferenced to load the value of z, which is the address of dyn_object, and at +89 rax is dereferenced again to finally store the value 4 in the integer member of dyn_object.

All pointers, whatever type they point to, are 8 bytes on the x64 architecture. A pointer, a pointer to a pointer, a pointer to a pointer to a pointer, and so on are all fixed 8-byte values in which an address is stored. So you may ask why we need types for pointers if they are all equal in size. The answer lies in the different things you can do with different kinds of variables, including pointers. One reason is to limit programming errors: a pointer of a given type should only point to a variable of that type. Another reason is pointer arithmetic. While an int is typically 4 bytes in size, a double is 8 bytes. So incrementing a pointer to int actually makes the address grow by 4, not by 1. In contrast, adding one to a char pointer makes its value exactly one larger, since a char takes up one byte in memory.

Suppose you have an array of integers and a pointer pointing to one item of the array. To point to the next item, 4 bytes must be added to the address. But this doesn’t change the amount of memory needed to store the pointer itself. A pointer is a memory address, and such an address has a fixed size (8 bytes in our case).

Conclusion
For anybody who wants to professionally develop or analyze system programs, a deep understanding of memory and of how machine code deals with hardware resources is indispensable. Using a simple program, we tried to make the idea behind pointers, and how they are stored in memory, a bit clearer.

Contact: sirus.shahini@gmail.com
         twitter.com/_BitWar
         BTC Donation: 14VbVxML8M2MUnXF9kPAKWCEQka232pc5h
Iran University of Science and Technology
Department of Computer Engineering

