Sunday, March 28, 2010

Dynamic Memory Allocation in c


Dynamic Memory Allocation in C--------------------------
One of the big differences between a high-level scripting language and C is that all variables have to be declared at the beginning of a function in C. You cannot simply define a new variable when you need it, as you could in say, Perl or Python, or any other scripting language
for that matter. C is different; everything has to be declared at the beginning. This brings up some problems.
CODE :

#include
int main(){
int x;

printf ("Type a number: ");
scanf ("%d",&x);
printf ("You typed %d\n",x);

return 0;
}

For a simple program such as this, there's no problem. The necessity of declaring variables at the start of the program isn't a big deal. But what about when the value of one variable depends on another? An example is a program I wrote to find prime numbers, as an exercise in learning C. Here is the beginning of the the main function:
CODE :

int main() {
int maxprimes;
int count = 1;
int value = 1;
int composite;
int j;
int primes[maxprimes]; //this is the part that doesn't work.

printf ("How many primes? ");
scanf ("%d", &maxprimes);
...

I needed to set up an array, called 'primes' to hold a certain number of prime numbers, which was held in the variable 'maxprimes.' However, since I don't define maxprimes until I call scanf() later on, the size of the 'primes' array is arbitrary, just as the value of maxprimes is arbitrary. For example, if we run this code:
CODE :

#include
int main(){
int x;
printf ("%d\n",x);
return 0;
}

we get some interesting values:
CODE :

-1208182672
-1207551888
-1208387472
-1207273360
-1208219536

First of all, notice that the values are more or less random. This causes a problem simply because it will set up a random-sized array. Also, the values are negative.
So when the code is run with the array primes defined as it was above, I get a segmentation fault. How do we fix this?

The answer is called dynamic memory allocation. We use the malloc() function to allocate memory to a variable. The good thing about this is that we can do this using previously undefined variables. The beginning of the prime numbers program is now as follows:
CODE :

int main() {
int maxprimes;
int count = 1;
int value = 1;
int composite;
int j;

printf ("How many primes? ");
scanf ("%d", &maxprimes);
int *primes = malloc(maxprimes * sizeof(int));
...

There are a few things going on here. First of all, notice that the value of primes is actually a pointer. We have to dereference it with the * operator. This is because the malloc() function returns a pointer to a memory location--specifically, to the starting address of the block of memory that it allocates. Inside the parentheses, we define the size of the block we want allocated. In this case, we want maxprimes amount of blocks, and then we have to define the size. Since the variables to be stored in our memory are going to be the int type, we multiply maxprimes by the size of an integer. The sizeof() function returns the size of its argument (in bytes). Now, if maxprimes were, say, 50, we have just allocated enough memory for an array of 50 integers. This is C's way of getting around the need to declare every variable at the beginning of the program.

I said before that the malloc() function returns a pointer to a location in memory. Well, that's only if it works. If it doesn't work, you want to be notified. Mistakes in memory, especially with a language like C, can be catastrophic. At the very least, if malloc() doesn't work, your program won't work properly. So, after we allocate memory, we add this conditional statement:
CODE :

if(primes == NULL) {
fprintf(stderr, "Out of memory, exiting\n");
exit(1);
}

If the malloc() function fails, it returns a NULL pointer. So, if this happens, we simply output to stderr (the fprintf() function is simply a way to specify the filestream that it outputs to; printf() goes only to stdout) that the program is out of memory, and exit.

Another thing you always wanted to do when allocating memory for a program is to free it. Since malloc() allocates static memory, it stays allocated after the program exits, not allowing any other programs to access it. So, the end of this program looks like this:
CODE :

free (primes);
return 0;
}

The free() function takes the pointer returned by malloc() as its argument and frees any memory it had allocated to it.

There are two other allocation functions, calloc() and realloc(). The former is simply a way to fill the allocated memory with initialized data. It also works differently, in that you would say
CODE :

calloc(5, sizeof(int))

to allocate five integers. The five could also be variable.

The realloc() function is used to change the size of a memory block previously allocated with malloc(). For example, if we said
CODE :

int *primes = malloc(maxprimes * sizeof(int))

and we decided to store each value as a long int in a new array, we would say:
CODE :

int *new_primes = realloc(primes, maxprimes * sizeof(long int))

This function returns a pointer to a memory location of the newly specified size, and that new location has the same data as was in the old one, pointed to by the pointer (in this case, primes).

As a final note, to access elements of the array, one uses the syntax of *(pointer + x), where x is the number of the element you want to access. Here are a few examples:
CODE :

---------------------
| h | e | l | l | o |
---------------------
^--int *pointer
*(pointer + 0) = h
*(pointer + 3) = l
*(pointer + 5) = ? //this would give you an unpredictable value. this is known as a 'buffer overflow'


So there's dynamic memory allocation. I meant it to be an explanation of how it works to people who have programming experience in higher-level languages like Perl, Python, BASH scripting, etc.

No comments:

Post a Comment