C Programming/Advanced data types
In the chapter Variables we looked at the primitive data types. However, more advanced data types allow us greater flexibility in managing data in our program.
Structs
Structs are data types made of variables of other data types (possibly including other structs). They are used to group pieces of information into meaningful units, and also permit some constructs not possible otherwise. The variables declared in a struct are called members.
Defining
One defines a struct using the struct keyword and a block of members. These members are specified using variable declarations. For example:
struct mystruct {
    int int_member;
    double double_member;
    char string_member[25];
} struct_var;
struct_var is a variable of type struct mystruct, which we declared along with the definition of the new struct mystruct data type.
This new type's name is made up of multiple words, just like some built-in types, such as unsigned long.
More commonly, struct variables are declared after the definition of the struct, using the form:
struct mystruct {
    // ...
};
struct mystruct struct_var;
Accessing members
The members of a struct variable may be accessed using the member access operator . (a dot):
struct_var.int_member = 0;
Type synonyms
It is common practice to make a type synonym so we don't have to type "struct mystruct" all the time. C lets us do so with a typedef declaration, which aliases a type:
typedef struct {
    // ...
} Mystruct;
The struct itself has no tag (no name follows the struct keyword on the first line), so it can only be referred to through its alias, Mystruct. Then the following may be used:
Mystruct struct_var;
Nesting
Structs may contain not only their own variables but may also contain other structs:
#include <stdio.h>
#include <stdlib.h>

struct weapon {
    char name[100];
    int attack_power;
    /* An unnamed struct type nested as the member "attributes". */
    struct {
        int strength;
        int agility;
        int intelligence;
    } attributes;
};

int main(int argc, char *argv[]) {
    struct weapon sword = { "A cool thing", 5, { 3, 1, 0 } };
    printf("This sword requires %d STR.\n", sword.attributes.strength);
    /* sizeof yields a size_t, which is printed with %zu. */
    printf("It also takes up %zu bytes.\n", sizeof sword);
    return EXIT_SUCCESS;
}
Outputs:
This sword requires 3 STR.
It also takes up 116 bytes.
Members and memory
If our new type really is a type, then like any other type, it must have a size.
Size
Recall struct mystruct from earlier. It is composed of an int, a double, and a char[25]. On most modern systems, these have sizes of 4, 8, and 25 bytes, respectively. What do you think is the size of struct mystruct as a whole? Would it be 4 + 8 + 25 = 37 bytes?
Note: Storage and alignment sizes for each data type can vary from system to system. In this section, we'll use the numbers for a 64-bit computer, but the concepts themselves apply everywhere.
Let's test this assumption.
#include <stdalign.h> /* provides alignof before C23 */
#include <stdio.h>
#include <stdlib.h>

struct mystruct {
    int int_member;
    double double_member;
    char string_member[25];
};

int main(int argc, char *argv[]) {
    printf("sizeof(int) = %zu\n", sizeof(int));
    printf("sizeof(double) = %zu\n", sizeof(double));
    printf("sizeof(char[25]) = %zu\n", sizeof(char[25]));
    printf("sizeof(struct mystruct) = %zu\n", sizeof(struct mystruct));
    printf("alignof(int) = %zu\n", alignof(int));
    printf("alignof(double) = %zu\n", alignof(double));
    printf("alignof(char[25]) = %zu\n", alignof(char[25]));
    printf("alignof(struct mystruct) = %zu\n", alignof(struct mystruct));
    return EXIT_SUCCESS;
}
Output:
sizeof(int) = 4
sizeof(double) = 8
sizeof(char[25]) = 25
sizeof(struct mystruct) = 48
alignof(int) = 4
alignof(double) = 8
alignof(char[25]) = 1
alignof(struct mystruct) = 8
alignof(...) is a keyword in its own right only since C23. In C11 and C17, you must #include <stdalign.h> beforehand or write _Alignof(...) instead.
The whole is greater than the sum of its parts! Why is this? It has to do with alignment, which we'll cover now.
Alignment
If a two-byte short is placed in memory at address 0080, another one couldn't be placed at 0079 or 0081 since it would overlap; 0078 and 0082 are the next closest options, leaving no gap between them in memory. However, if there were a one-byte char at 0082, a following second short wouldn't be allocated at 0083, but at 0084, leaving one byte of padding in the middle.
This is because your processor is a little picky in how it wants to load data from and store data to memory. This pickiness is related to your processor's word size, which is 64 bits (or 8 bytes) on most computers today. There might be a performance penalty if either of the following happen:
- the load address isn't a multiple of the load size
- the load size isn't a power of two (in bytes) including one, capped at the word size
Compiled programs must have the processor either access members of the new type directly or split a load or store of the whole thing into multiple word-size loads. And, on some systems, the above scenarios are outright prohibited. To avoid performance penalties (at best) or crashes (at worst), every type in C has an alignment: the address of every object of that type must be a multiple of that number of bytes. When data is allocated, unused padding bytes are inserted before it until its address meets the alignment. Primitive types usually have sizes that are powers of two, so their alignments are typically their sizes or the word size, whichever is smaller. Structs, being made of multiple members of possibly different sizes, have the alignment of their largest-alignment member, and their size is rounded up to a multiple of that alignment so that consecutive structs in an array stay aligned.
The alignof keyword in the above example works like sizeof. It is not a function; during compilation, it evaluates to the alignment of the type or object it is given. Using the values from the example, we can recreate how our data is laid out in memory:
| Offset (hex) | Value |
|---|---|
| 00-03 | 4 bytes for int_member |
| 04-07 | 4 bytes of padding, to align double_member |
| 08-0F | 8 bytes for double_member |
| 10-28 | 25 bytes for string_member |
| 29-2F | 7 bytes of padding, to align the struct itself |
Repacking
If we rearrange the struct so that the members are ordered from greatest alignment to least alignment, the padding hole is removed from the middle of the struct.
Swapping the first two members:
struct mystruct {
    double double_member;
    int int_member;
    char string_member[25];
};
...results in sizeof(struct mystruct) == 40, saving eight bytes! Let's check its new memory layout:
| Offset (hex) | Value |
|---|---|
| 00-07 | 8 bytes for double_member |
| 08-0B | 4 bytes for int_member |
| 0C-24 | 25 bytes for string_member |
| 25-27 | 3 bytes of padding, to align the struct itself |
Now, the only unused space is the unavoidable alignment padding at the end of the struct.
Enumerations
Enumerations are artificial data types representing associations between labels and integers. Unlike structs or unions, they are not composed of other data types. An example declaration:
enum weather {
    sunny,
    windy,
    cloudy,
    rain,
} weather_outside;
In the example above, sunny equals 0, windy equals 1, and so on. It is possible to assign explicit values to labels, but each value must be an integer constant expression (such as a literal) that fits within the integer range.
Switching on value
The declaration syntax that applies to structs and unions also applies to enums. Also, one normally doesn't need to be concerned with the integers that labels represent:
enum weather weather_outside = rain;
This peculiar property makes enums especially convenient in switch-case statements:
switch (weather_outside) {
case sunny:
    wear_sunglasses();
    break;
case windy:
    wear_windbreaker();
    break;
case cloudy:
    get_umbrella();
    break;
case rain:
    get_umbrella();
    wear_raincoat();
    break;
}
Changing data type
Sometimes, the underlying data type for the enumeration matters. Since C23, it is possible to set the exact type of integer to use:[1]
enum weather : short {
    sunny,
    windy,
    cloudy,
    rain,
} weather_outside;
Note: This is an example of a feature that originated in C++ and in some C compilers' extensions to the language, and was later standardized.
Unions
The definition of a union is similar to that of a struct. The difference between the two is that in a struct, the members occupy different areas of memory, but in a union, the members occupy the same area of memory. Thus, in the following type, for example:
union {
    int i;
    double d;
} u;
The programmer can access either u.i or u.d, but not both at the same time. Since u.i and u.d occupy the same area of memory, modifying one modifies the value of the other.
The size of a union is at least the size of its largest member; it may be rounded up further to satisfy alignment.
Tagged union
Imagine that you are developing a settings editor for an application. In this application, setting values can be integers, floating-point numbers, or single characters. Despite this, it's possible to represent all setting values with a single type.
Structs, enums, and unions can be combined to create a tagged union, a complex type which pairs some data that varies in type with information on the current type of that data. The C runtime does not keep track of the types of data in your program, so an enum is used to define all types that data could be. A union represents the varying-type data, and a struct keeps the type enum and data union together. Combining this with type synonyms, we get the following:
#include <stdio.h>
#include <stdlib.h>

/* The tag: records which member of the union is currently valid. */
typedef enum {
    integer,
    decimal,
    character
} SettingValueType;

typedef struct {
    SettingValueType type;
    union {
        int integer;
        double decimal;
        char character;
    } data;
} SettingValue;

int main(int argc, char *argv[]) {
    SettingValue value = { decimal, { .decimal = 3.7 } };
    /* Check the tag before reading the matching union member. */
    switch (value.type) {
    case integer:
        printf("int value = %d\n", value.data.integer);
        break;
    case decimal:
        printf("double value = %lf\n", value.data.decimal);
        break;
    case character:
        printf("char value = %c\n", value.data.character);
        break;
    }
    /* sizeof yields a size_t, which is printed with %zu. */
    printf("sizeof value = %zu\n", sizeof value);
    return EXIT_SUCCESS;
}
Outputs:
double value = 3.700000
sizeof value = 16
References
1. "N3030: Enhancements to Enumerations". open-std.org. Retrieved 2025-11-09.