13: Dynamic Data

UC Irvine - Fall ’22 - ICS 45C

Quick list of things I want to talk about:

  • variable-defined-sized arrays
  • stack vs heap
  • new / delete
    • memory leaks
  • linked lists

Expanded notes:

So far in this course we have only used variables that are statically defined. That is, we have created local variables whose memory is managed entirely by the compiler/computer. We haven’t done anything besides declaring and using them.

This is nice because we don’t need to manage those memory spaces, but it doesn’t give us full flexibility. For example, we cannot create an array whose size is chosen by the user while the program runs. We need to know the exact size beforehand.

Stack vs Heap

When we use variables that are statically defined, they are stored in a memory space called the stack. The stack contains the values needed to execute the current function (e.g., local variables). Once the function is over, all that memory is released and made available to the computer again.

But there are other memory spaces [1]! In particular, there is a space called the heap, which is not automatically managed. To create variables in the heap, we have to explicitly allocate that space using the new keyword, and later free it with the delete keyword. This lets us create more variables and reserve more space as we need it while the program runs. Also, your program usually has more space available in the heap than in the stack by default, so we can create larger arrays, store more elements, etc.

Allocating memory

To create some space, we use the new keyword. The usage is new int; for an int variable, or new int[100]; for an array of ints.

new always returns a pointer! So if you create an int, it will return a pointer to the memory where we can store our integer.

Let’s see an example:

#include <iostream>

using namespace std;

int main() {
  int *new_var = new int;
  int *new_array = new int[100];

  *new_var = 2;
  cout << "*new_var: " << *new_var << '\n';

  new_array[0] = 2;
  cout << "new_array[0]: " << new_array[0] << '\n';

  return 0;
}

If you run this code, you should get this output:

*new_var: 2
new_array[0]: 2

Notice how we have an int * for both a regular int and an array of ints. The only difference is how we use them, *new_var vs new_array[0]. If you print the pointers without dereferencing them, you will see the addresses that the new operator reserved.

This example has a mistake though! We allocated memory for our two variables, but never deleted them. Since they are in the heap, they are not freed automatically the way local variables are.

Freeing memory

Once you’re done using the memory you requested, you should free it up so the computer can use it again. The way we do this is by using the delete keyword.

If you have allocated some memory and saved the address in my_ptr, then you can use delete my_ptr; to free it. If you have allocated an array in my_ptr, then you should use delete[] my_ptr;.

As an example, let’s fix our previous code:

#include <iostream>

using namespace std;

int main() {
  // Allocates new memory.
  int *new_var = new int;
  int *new_array = new int[100];

  // Use those spaces.
  *new_var = 2;
  cout << "*new_var: " << *new_var << '\n';

  new_array[0] = 2;
  cout << "new_array[0]: " << new_array[0] << '\n';

  // Free the memory after we're done.
  delete new_var;
  delete[] new_array;

  return 0;
}

Memory leaks

Since the heap is not managed for us, we need to be careful how we use it. In our first example, we created an int and an array with 100 ints. The new keyword reserves the required space, in this case 404 bytes (assuming a 4-byte int: 4 bytes for the single int plus 100 × 4 bytes for the array), and returns the addresses, which we store in our pointers.

However, once main finishes, our pointers are erased since they are local variables. But those heap values are not! So we are left with 404 bytes whose addresses we no longer know, which means we can’t delete them anymore.

As a rule: if you use new, you need to delete later!

Detecting memory leaks

There are tools that can help detect memory leaks. One of them is cppcheck, which we use below; you can probably find more if you search for them.

As an example, let’s try running our first example (the one without delete) through cppcheck.
I saved that code as mem_leak_test.cpp. Then, I navigated to that directory and ran:

~ cppcheck mem_leak_test.cpp
Checking mem_leak_test.cpp ...
mem_leak_test.cpp:15:3: error: Memory leak: new_var [memleak]
  return 0;
  ^
mem_leak_test.cpp:15:3: error: Memory leak: new_array [memleak]
  return 0;
  ^

cppcheck tells us there are two memory leaks, one with new_var and one with new_array.

This is also a nice example because Gradescope uses cppcheck to analyze your project submissions. It’s an open test case, so you should see any feedback over there too, but you can try running it yourself beforehand if you want.

Again, keep in mind that:

As a rule: if you use new, you need to delete later!

“Hidden” memory leaks

That leak was pretty easy to detect. That might not always be the case though.

For example, here is a small program that still has leaks:

#include <iostream>

using namespace std;

int main() {
  int *user_input;
  
  do {
    cout << "Type in a number (-1 to exit): ";
    user_input = new int;
    cin >> *user_input;
  } while (*user_input != -1);
  
  delete user_input;
  
  return 0;
}

We have a new and a delete, so what’s the problem?

At each loop iteration, we allocate space for a new int and store its address. What happens to the address of the previous int? It’s lost! We overwrite it with the new one, so we couldn’t delete the previous int even if we wanted to. We only delete the very last int we stored. All the previous ones are leaked.

One last reminder that:

As a rule: if you use new, you need to delete later!

Array size comparison

Many lectures ago we saw that there is a limit on how large an array we can declare. If we create an array in the stack, the computer has stricter restrictions on the size we have available. However, in the heap we usually have more space that we can use.

For example, let’s try running this code:

#include <iostream>
using std::cout;

const long size = 10000000;

int main() {
  int my_array[size];
  for (long i = 0; i < size; i++) {
    my_array[i] = i;
  }

  cout << "All done :)";

  return 0;
}

On my computer, I get a segmentation fault:

[1]    89434 segmentation fault  ./a.out

This means that we had some memory management problem. In this case, we asked for more memory than the stack has available.

Let’s try running the same thing using the heap:

#include <iostream>
using std::cout;

const long size = 10000000;

int main() {
  int *my_array = new int[size];
  for (long i = 0; i < size; i++) {
    my_array[i] = i;
  }

  delete[] my_array;

  cout << "All done :)";

  return 0;
}

When I tried running this, it worked fine, printed All done :), and finished successfully. So, on my computer [2], I can create an array with 10,000,000 ints in the heap, but not in the stack.

“Infinitely large” structures

These arrays still have a fixed size though, and if we want more elements we would need to keep creating larger arrays, which is not ideal. A possible way to store any number of elements you want is to use “infinitely large” structures!

What I mean by an infinitely large structure is something in memory that can keep adding new values as long as we have memory available. To create a structure like this, we need to combine a few things we’ve learned: (i) structs, (ii) pointers, and (iii) dynamic memory management.

Unlike arrays, the elements of this structure are not necessarily stored in a contiguous area of memory. Instead, each element has pointer(s) that store the address(es) of the next element(s). The elements can be all over the memory and that’s fine, we just need to know where to start.

Let’s look at an example of one of these structures, which should help a lot in understanding how they work.

Linked list example

A classic example of a data structure that can grow infinitely large is a linked list. A linked list consists of separate nodes, where each node stores a single value and the address of the next node. The figure below shows a visual representation of a linked list structure:

[Figure: Visualization of a linked list]

Whenever we want to add a new value to this list, we just allocate more space for a single node with new and fix any pointers that need to be updated. Similarly, if you don’t want a value anymore, you just delete that node and fix any pointers that need to be updated.

Let’s try implementing a simple linked list, with a few functions, so we can see how to combine dynamic data, pointers, and structs. For this example, we will store ints in the list.

Creating a struct to store data

First, we need to create a structure to store the data containing a value and a pointer. We use a struct that can store an int value and a pointer to another variable of the same type.

typedef struct LinkedList {
  int value;
  struct LinkedList *next;
} LinkedList;

Note that we use struct LinkedList *next. Inside the typedef, the alias LinkedList is not available yet, so we need to give the struct its own name and refer to it as struct NAME within the body. This is the C-style way of writing it, and it also compiles as C++.

Creating a new node

Now, let’s create our first function: it reserves space for a new node, stores the value, and returns a pointer to the newly created node.

LinkedList* create_node(int x) {
  // Create space in memory for new node.
  LinkedList* node = new LinkedList();
  // Initialize fields.
  node->value = x;
  node->next = nullptr;
  return node;
}

Notice how we’re using the pointer operator -> like we mentioned in the previous notes!

Adding a node in the front

We can then build on top of creating a node, by creating a sequence of nodes. The easiest way to do that is to add a new node at the very front of the list and push everything else to the back. We then return the pointer to the new head of the list.

LinkedList* add_node(LinkedList* head, int x) {
  LinkedList* new_head = create_node(x);
  new_head->next = head;
  return new_head;
}

Printing the list

Now that we can add a bunch of nodes to this list, let’s print it out to make sure they’re all there. To do this, we will iterate through the list until we reach the nullptr that indicates the end.

void print_list(LinkedList* head) {
  cout << "List: ";
  while (head) {
    // Print value at current node and advance to next.
    cout << head->value << ", ";
    head = head->next;
  }
  cout << '\n';
}

Since the nodes are not stored in a sequence, we can’t just use head[5] to get to the 6th node. We need to iterate through the pointers to get there. Our head pointer is a local variable in this function, so it’s not a problem to assign a new value to it, as it won’t change anything outside the function [3].

Deleting the list

Finally, we should also be ready to delete the list. Whenever we don’t want to store these values anymore, we need to delete them so the memory is free again – we don’t want leaks!

To make it simple, we’ll delete the entire list.

void delete_list(LinkedList* head) {
  LinkedList* next;
  while (head) {
    // Save address of next node before deleting this.
    next = head->next;
    delete head;
    // Advance to next node.
    head = next;
  }
}

Notice how we need to delete the nodes one by one. Since they’re not necessarily stored all together, we can’t do delete[] in this structure. Instead, we have to traverse the list and delete them individually.

Testing our code

After implementing all those functions, let’s add a main function to put it all together.

int main() {
  int n;
  cout << "Type in a size: ";
  cin >> n;

  LinkedList* head = nullptr;
  for (int i = 0; i < n; i++) {
    head = add_node(head, i);
  }

  print_list(head);
  delete_list(head);
  // We deleted our list, so head is now an "old" pointer to freed memory.
  // Calling print_list(head) at this point would be undefined behavior!
  // After deleting something, it's important to reset addresses.
  head = nullptr;
  print_list(head); // safe: prints an empty list

  return 0;
}

You can download this complete example here.

References


  1. If you’re interested in learning more, you can look for a more in-depth overview of how a program’s memory is organized. This is out of the scope of our course, but it is interesting :)

  2. As usual, your results may vary depending on your memory size, computer architecture, platform, etc.

  3. We can use a pointer to change the value stored at a memory address outside the function, but assigning to the pointer we receive won’t change the caller’s pointer. If you want to modify an external pointer, you would need to receive a pointer to a pointer.