11: Structs and Unions

UC Irvine - Fall ‘22 - ICS 45C

Quick list of things I want to talk about:

  • multi-type things
  • structs
  • unions
  • memory usage

Expanded notes:

So far we have used variables to store single values – or values of the same type. All our functions and examples use the basic types (e.g., int, float), strings, or arrays of basic types.

But what do we do if we need to write a function that accepts more than one type of variable? Or if we want to (i) group variables that make sense to go together (e.g., name and age of a person, make, model, and year of a car), or (ii) make a function that can calculate a result that could be int or float?

One way to do that is to use structs for (i) or unions for (ii).

Structs

A struct is used to define a new type that stores multiple pieces of information inside of it.

For example, in the comic at the top of these notes, we could create a struct that stores all those different flags, so instead of keeping track of many variables we would use only one.

To create a struct, you use the following syntax:

struct -NAME- {
  TYPE1 var_t1_1, var_t1_2, ..., var_t1_N;
  TYPE2 var_t2_1, var_t2_2, ..., var_t2_N;
  ...
  TYPEN var_tN_1, var_tN_2, ..., var_tN_N;
};

This creates a type called struct -NAME-, where -NAME- is a valid name you define. We can declare variables of this type like this:

struct -NAME- my_var;

Each variable of this type gives you access to every field we defined, and we reach a field with the . operator. For example, you could use my_var.var_t1_1 to access the value of (or assign something to) that field of type TYPE1.

An example struct:

struct MyStruct {
  int field1;
  double field2;
  char field4, field5;
  bool field6;
  int field7;
};

struct MyStruct test1;

// Can assign and access each field individually, but test1 is still a single variable.
test1.field1 = 1;
test1.field2 = 3.14;

cout << test1.field1 << ", " << test1.field2;

You can then use your custom-defined structs as variable types, function return types, function parameters, or anywhere else you use other types.

Initializing a struct

You can use the list-initializer to assign values to your struct. The order of the values is defined by the order you have declared them inside your struct.

For example:

struct MyStruct {
  int field1;
  double field2;
  char field4, field5;
  bool field6;
  int field7;
};

struct MyStruct test1{1, 2.3, '4', '5', true, 7};

cout << test1.field1 << ", " << test1.field2;

Should print 1, 2.3.

Unions

A union differs from a struct in that it does not combine several values in a single variable. Instead, a union lets you access the same piece of memory as if it held different types. In other words, the fields of a union share the same space in memory; you are just changing how you interpret that space when you read it through a different field.

For example, if you want to write a function that could return an int or a float, you could return a union between those two types, and the user could access the correct one.

Similarly to structs, we can define a custom union and then use it wherever we want. Also similar to structs, we use the . operator to access the field we want.

To create a union we use the following syntax:

union -NAME- {
  TYPE1 var_t1;
  TYPE2 var_t2;
  ...
  TYPEN var_tN;
};

This creates a type called union -NAME-, where -NAME- is a valid name you define. We can declare variables of this type like this:

union -NAME- my_var;

Example:

union MyUnion {
  unsigned char a;
  char b;
};

union MyUnion test2;

// Decide which one you want to choose, and then we can just access that field;
// But test2 is still a single variable.
test2.a = 255;
cout << (short) test2.a;

Unnamed structs/unions

In the examples above, we gave names to our struct and union. However, if you want to use your struct/union immediately and don’t need a named type, you can create unnamed ones:

struct {
  int a;
  float b;
  long long c;
} my_var1;

my_var1.a = 123;
cout << my_var1.a;


union {
  int a;
  float b;
} my_var2;
my_var2.a = 123;
cout << my_var2.a;

The problem here is that we cannot use these types in function definitions, since we haven’t named them. And if we want to create more variables later on, we’d need to recreate the same structure.

Nested construction

Both unions and structs allow you to create nested things. For example, you could put a struct, inside a union, inside a struct.

struct S1 {
  int a;
  float b;
};

union U1 {
  char a;
  struct S1 b;
};

struct S2 {
  U1 a;
  long double b;
};

struct S2 test_var;

test_var.a.b.a = 123;
cout << test_var.a.b.a;

Accessing the wrong field…

in a struct

What do you think the following piece of code prints?

struct MyStruct {
  int field1;
  double field2;
  char field4, field5;
  bool field6;
};

struct MyStruct test3;

// Assigning to field1 but accessing field2.
test3.field1 = 1;
cout << test3.field2;

Well, most of the time probably 0. But, this is undefined behavior!

Like we discussed in lecture, we shouldn’t use values of variables before they are initialized. A field inside a struct is similar to a regular variable, so we don’t know what’s there.

in a union

What do you think the following piece of code prints?

union MyUnion {
  unsigned char a;
  char b;
};

union MyUnion test4;

// Assigning to a but accessing b.
test4.a = 255;
cout << (short) test4.b;

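On most desktop compilers this prints -1. Remember that the fields inside a union share the same space. Since we have an unsigned char and a char, this union takes a single byte. Whenever we assign 255 to test4.a, we store 11111111 in that space. When we read that same byte as a signed char, it is equal to -1. Two caveats: whether plain char is signed depends on the platform (where it is unsigned, this prints 255), and the C++ standard formally treats reading a union field other than the one last written as undefined behavior (C allows it), even though mainstream compilers generally behave as described.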

Memory usage

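Since a struct uses all its values independently, it needs space for all of them. So a struct that contains 4 chars, 2 shorts, and 5 ints will use at least 4 * 1 byte + 2 * 2 bytes + 5 * 4 bytes, for a total of 28 bytes [1]. The compiler may also insert padding between fields to keep them aligned, so sizeof can report more than this sum.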

A union, however, does not use its values separately. It just allows the user/programmer to read the same value as a different type. So a union takes at least as much space as the largest type inside of it. For example, a union with a char, an int, and a long long int would take max(1, 4, 8) bytes, so 8 bytes [1].

Bit-field struct

This won’t be used in this course, but if you really want to save as much space as possible, you can access individual bits independently. This can save a lot of space when using flags, for example, since each flag can take a single bit instead of an entire byte, which is one eighth of the space.

To create a bit-field struct, you would use the following syntax:

// Example adapted from: https://en.cppreference.com/w/cpp/language/bit_field
struct S
{
    // A variable of this struct usually occupies 2 bytes:
    // 3 bits: value of b1
    // 2 bits: unused
    // 6 bits: value of b2
    // 2 bits: value of b3
    // 3 bits: unused
    unsigned char b1 : 3, : 2, b2 : 6, b3 : 2;
};

Although you could access individual bits using bit-masks:

// to get the 4th bit of an int
(my_int >> 3) & 1;

bit-fields have some advantages. They allow you to define custom sizes (54 bits, for example) instead of being tied to the predefined 8, 16, 32, or 64 bits. And even though a struct might take a little extra space, since it can only occupy whole bytes, each field has the width you asked for on any compiler and platform, because you are specifying that size yourself. The exact layout of the fields in memory is still up to the compiler.

You can read more about them here: https://en.cppreference.com/w/cpp/language/bit_field

Using typedef

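Now we know how to create structs and unions, but so far we have written the types with the keyword struct/union as a prefix (e.g., struct MyStruct, union MyUnion). C requires that prefix. C++ lets you drop it and write just MyStruct, but the prefixed form is still common in code shared with C.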

typedef is a keyword that lets us define a name for an existing type. We can combine it with our previous syntax to get rid of the extra struct/union we need to type.

The examples below try to illustrate this:

Example 1

struct MyStruct {
  int field1;
  double field2;
  char field4, field5;
  bool field6;
};

struct MyStruct test5;

test5.field1 = 123;
cout << test5.field1;

This is similar to what we saw above. We create a struct, then use struct MyStruct as a type for our variable.

Example 2

struct MyStruct {
  int field1;
  double field2;
  char field4, field5;
  bool field6;
};

typedef struct MyStruct NewName;

NewName test5;

test5.field1 = 123;
cout << test5.field1;

The first part is the same here. We define the same struct with the same name and fields.

Then, we give it a new name (NewName) and use that to create variables.

Example 3

typedef struct MyStruct {
  int field1;
  double field2;
  char field4, field5;
  bool field6;
} NewName;

NewName test5;

test5.field1 = 123;
cout << test5.field1;

This is a combination of the struct definition part and the name definition from the example above. Here, we create the struct inside the typedef, and immediately give it a NewName.

Example 4

typedef struct {
  int field1;
  double field2;
  char field4, field5;
  bool field6;
} NewName;

NewName test5;

test5.field1 = 123;
cout << test5.field1;

In this final example, we create an unnamed struct and give it a name with typedef. This way, we can use NewName in functions and create new variables later on as needed.

typedef summary

In all examples above, the test5 variable has the same fields and all code snippets output the same thing. However, under C's rules:

  • in example 1 we can only use struct MyStruct;
  • in examples 2 and 3 we can use both struct MyStruct and NewName;
  • in example 4 we can only use NewName.

In C++, the struct name also works without the prefix, so plain MyStruct is accepted in examples 1 to 3 as well.

Note that in all examples NewName could be any valid name, including MyStruct!
In C, MyStruct is not a type name by itself (the type is struct MyStruct), so the name is free to be reused for the typedef. C++ also accepts a typedef that gives a struct its own name.

All the examples above use struct as the custom type, but the usage of typedef would be the same for unions.

References


  1. Assuming the same sizes we used in the Variable notes a little while ago.