Objects, pointers and references
An object is something that has a lifetime and occupies storage. Even a humble int is an object, but a function is not.
A pointer is a typed address. It associates a type with what is found at some memory location.
Pointers allow us to do arithmetic, but that’s legitimately seen as a dangerous operation, as it can take us to arbitrary locations. Accessing the contents of arbitrary addresses is just asking for trouble.
C++ has special types for pointer manipulation:
- void* means an address with no specific type semantics: a void* is an address with no associated type. All object pointers are implicitly convertible to void*; an informal way to read this is that all pointers, regardless of type, are addresses. The converse does not hold. For example, it’s not true that all addresses are implicitly convertible to int pointers.
- char* means pointer to a byte. Due to the C language roots of C++, a char* can alias any address in memory (the char type, regardless of its name, which evokes character, really means byte in C and by extension in C++). There is an ongoing effort in C++ to give char the meaning of character.
- std::byte* is the new pointer to a byte, at least since C++17. The long-term intent of std::byte* is to replace char* in those functions that do byte-per-byte manipulation or addressing, but since there’s so much code that uses char* to that effect, this will take time.
String literals
A string literal is a character sequence enclosed within double quotes.
A string literal contains one more character than it appears to have; it is terminated by the null character \0 (having integer value 0).
The type of a string literal is an array of the appropriate number of const characters, so "Bohr" is of type const char[5].
In C and older C++ code, you could assign a string literal to a non-const char*.
Modifying string literals is undefined behavior. In practice, the implementation can for instance store the string literal in read-only memory, such as the .rodata segment on Linux.
void f()
{
//char* p = "Cauchy"; // error, since C++11
const char* s = "Cauchy"; // ok
//s[4] = 'e'; // error, assignment to const
}
Keeping string literals immutable is not only natural, it also allows implementations to do significant optimizations in the way string literals are stored and accessed.
If we want a string that we are guaranteed to be able to modify, we must place the characters in a non-const array.
A string literal is statically allocated so it is safe to return one from a function. It is an lvalue.
Whether two identical string literals are allocated as one array or as two is implementation-defined. For example:
const char* p = "Carl Friedrich Gauss";
const char* q = "Carl Friedrich Gauss";
if(p == q)
std::cout << "\n" << "one!"; // Implementation defined
Note that == compares addresses (pointer values) when applied to pointers, and not the objects pointed to.
Challenge puzzle
Observe the code snippet below. What is printed?
#include <iostream>
int main()
{
// string literals
char s1[] = {'h','e','l','l','o', '\0'};
char s2[] = "hello";
const char* s3 = "world";
std::cout << "s1 = " << s1 << "\n";
std::cout << "s2 = " << s2 << "\n";
std::cout << "s3 = " << s3 << "\n";
const char* c[] = {
"C++", "is", "a", "general", "purpose",
"programming", "language"
};
std::cout << "c + 0 : " << (c) << "\n";
std::cout << "c + 1 : " << (c + 1) << "\n";
std::cout << "c + 2 : " << (c + 2) << "\n";
std::cout << "c + 3 : " << (c + 3) << "\n";
std::cout << "c[0] : " << c[0] << "\n";
std::cout << "c[1] : " << c[1] << "\n";
std::cout << "c[2] : " << c[2] << "\n";
std::cout << "c[3] : " << *(c + 3) << "\n";
const char** cp[] = { c + 2, c + 3, c, c + 1 };
const char*** cpp = cp;
std::cout << *cpp[1] << ' ';
std::cout << *(*(*(cpp + 2) + 3) + 3) << ' ';
std::cout << (*cpp)[-1] << ' ';
std::cout << *(cpp + 3)[-1] << std::endl;
return 0;
}
Challenge puzzle
Which of the following can be used to print the address of a char variable?
- &ch[0]
- (char*)&ch
- (void*)&ch
- &ch
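Only (void*)&ch prints the address. &ch is of type char*. The << operator for std::cout has an overload for char* that interprets it as a pointer to a null-terminated C-style string, so it attempts to print characters starting from the address of ch until it encounters a null terminator (\0), reading past ch itself. (char*)&ch behaves the same way, and &ch[0] does not compile because ch is not an array. Converting to void* selects the pointer overload, which prints the address.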
A pointer has to be able to address the whole memory space. On a 64-bit architecture, sizeof(T*) is therefore typically \(8\) bytes = \(64\) bits.
void f(int* pi)
{
void* pv = pi; // ok : implicit conversion of `int*` to `void*`
// *pv; // error: can't dereference void*
// ++pv; // error: can't increment void*
int* pi2 = static_cast<int*>(pv); // explicit conversion back to `int*`
//double* pd1 = pv; // error
//double* pd2 = pi; // error
double* pd3 = static_cast<double*>(pv); // unsafe
}
In general, it is not safe to use a pointer that has been converted to a type that differs from the type of the object pointed to.
The primary use for void* is for passing pointers to functions that are not allowed to make assumptions about the type of the object and for returning untyped objects from functions.
Pointer declarations
Use the spiral rule when reading pointer declarations. Start at the identifier being declared and work your way outwards, spiralling in a clockwise direction.
Dereferencing null pointers
Trying to dereference a null pointer is an error (undefined behavior in C++). On most platforms, it generally causes a signal, usually SIGSEGV (see Signals).
The same goes for dereferencing a pointer that has the wrong alignment for the target data type (on most types of computer), or that points to a part of memory that has not been allocated in the process’s address space.
Pointer comparisons
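Two pointer values are equal if they point to the same memory address or they are both nullptr. Ordering comparisons such as > and >= are only guaranteed to be meaningful for pointers into the same array or object; on most platforms they are implemented by comparing the addresses as unsigned integers. When a total order over unrelated pointers is needed, std::less provides one.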
Pointers into arrays
In C++, pointers and arrays are closely related. The name of an array can be used as a pointer to its initial element.
int v[] = {1,2,3,4};
int* p1 = v; // pointer to initial element
int* p2 = &v[0]; // pointer to initial element
int* p3 = v+4; // pointer to one beyond the last element
Taking a pointer to the element one beyond the end of an array is guaranteed to work. This is important for many algorithms. However, since such a pointer does not in fact point to an element of the array, it must not be dereferenced, whether for reading or for writing.
The result of forming a pointer to a position before the initial element, or beyond the one-past-the-last position, is undefined and should be avoided.
Challenge puzzle
int x = 10;
int *p = &x;
int **pp = &p;
// What happens with each line?
**pp = 20;
*pp = nullptr;
// Can you still access x? What's its value?
After the line **pp = 20, the variable x has been assigned the new value 20. *pp = nullptr then resets the pointer variable p to nullptr. We can no longer access the contents of the variable x through the pointer variables p and pp, but x itself is still accessible by name and still holds 20.
Challenge Puzzle
Observe the code snippet below. What is printed?
#include <iostream>
int main()
{
const char* str[] = { "AAAAA", "BBBBB", "CCCCC", "DDDDD" };
const char** sptr[] = { str + 3, str + 2, str + 1, str };
const char*** pp;
pp = sptr;
++pp;
std::cout << **++pp + 2;
}
Initially, pp is incremented to point to sptr[1].
In the order of precedence, from high to low, we have:
- * dereference and ++ pre-increment operators (right-to-left)
- + and - binary additive operators (left-to-right)
So, **++pp + 2 is parsed as (*(*(++pp))) + 2. Thus, pp is incremented once more and now points to sptr[2], which holds str + 1. It is then dereferenced twice to yield the string literal "BBBBB" as a const char*. Finally, adding an offset of 2 prints the text BBB.
Passing C-style arrays
Arrays cannot be directly passed by value. Instead, an array is passed as a pointer to its first element.
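#include <cmath>
#include <cstddef>
// Euclidean norm of the n elements starting at vec
double vec_norm(const double* vec, std::size_t n)
{
    double sum_of_squares{0.0};
    for(std::size_t i{0}; i<n; ++i)
    {
        sum_of_squares += vec[i] * vec[i]; // accumulate the square of each element
    }
    return std::sqrt(sum_of_squares);
}
Multi-dimensional arrays can be passed in a similar fashion.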
References
The C++ language supports two families of indirections: pointers and references. A reference can be seen as an alias for an existing entity. We deliberately did not use the word object, since one could refer to a function and we already know that a function is not an object.
Pointers are objects. As such they occupy storage. References, on the other hand, are not objects; conceptually they are just another name for what they refer to, and the language does not require them to use any storage of their own.
The sizeof operator applied to a reference yields the size of whatever it refers to. In C++, a reference is always bound to an object and remains bound to that object until the end of the reference’s lifetime. A pointer, on the other hand, can point to numerous distinct objects during its lifetime.
Another difference between pointers and references is that, contrary to the situation with pointers, there is no such thing as reference arithmetic. This makes references safer than pointers.
Understanding the fundamental properties of objects
We saw earlier that in C++, an object has a type and an address. It occupies a region of storage from the beginning of its construction to the end of its destruction.
Object lifetime
In C++, generally speaking, automatic objects are destructed at the end of their scope in a well-defined order. Static (global) objects are destructed on program termination in a somewhat well-defined order. Dynamically allocated objects are destroyed when your program says so.
Let’s examine some aspects of object lifetime with the following very simple program:
#include <string>
#include <string_view>
#include <print>
#include <format>
struct X{
std::string s;
X(std::string_view s) : s{ s }
{
std::print("X::X({})\n", s);
}
~X(){
std::print("~X::X() for {}\n", s);
}
};
X glob{ "glob" };
void g(){
X xg{ "g()" };
}
int main()
{
X* p0 = new X{ "p0" };
[[maybe_unused]] X* p1 = new X{ "p1" }; // will leak
X xmain{ "main()" };
g();
delete p0;
// oops, forgot to delete p1
return 0;
}
When executed, the program will print the following:
X::X(glob)
X::X(p0)
X::X(p1)
X::X(main())
X::X(g())
~X::X() for g()
~X::X() for p0
~X::X() for main()
~X::X() for glob
The fact that the number of constructor calls and the number of destructor calls do not match is a sign that we did something wrong. More specifically, in this example, we manually created an object (pointed to by p1) with the operator new but never manually destructed that object afterward. This is a memory leak.
Object size, alignment and padding
Since each object occupies storage, the space associated with an object is an important (if low-level) property of C++ types. For example, look at the following code:
class B; // forward declaration: there will be a class B
         // at some point in the future
void f(B*); // fine, we know what B is, even if we don't know the details yet,
            // and all object addresses are of the same size
class D : B{}; // oops! This is the definition of class D. To determine
               // sizeof(D), we have to know how big sizeof(B) is and what
               // a B object contains since a D is a B
In the above example, trying to define the D class would not compile. This is because, in order to create a D object, the compiler needs to reserve enough space for a D object, but a D object is also a B object, and as such we cannot know the size of a D object without knowing the size of a B object.
The size of an object or equivalently of a type can be obtained through the sizeof operator. This operator yields a compile-time, non-zero unsigned integral value corresponding to the number of bytes required to store an object.
#include <print>
int main(){
char c;
// a char precisely occupies one byte of storage, per
// standard wording
static_assert(sizeof(c) == 1);
struct Tiny{};
// all C++ types occupy non-zero bytes of storage by
// definition, even if they are empty like type Tiny
static_assert(sizeof(Tiny) == 1);
}
In the preceding example, the Tiny class is empty because it has no data member. A class could have member functions and still be empty.
A C++ object always occupies at least one byte of storage, even in the case of empty classes such as Tiny. That’s because if the object’s size was zero, that object could be at the same memory location as its immediate neighbor, which would be somewhat hard to reason about.
C++ differs from many other languages in that it does not standardize the size of all fundamental types. For example, sizeof(int) can yield different values depending on the compiler and the platform. Still there are rules concerning the size of objects:
- The size reported by the sizeof operator for objects of type signed char, unsigned char and char is \(1\), and the same goes for sizeof(std::byte), as each of these types can be used to represent a single byte.
- The standard specifies a minimum width, in bits, for the fundamental types:
- Signed integer types: Since C++20, signed integers are represented in two’s complement form. The rules of binary arithmetic are the same for two’s complement as for the standard base-2 representation. Hence, the same binary adder hardware circuits can be used for signed integer addition.

| Type | Minimum width \(N\) |
|---|---|
| signed char | \(8\) bits |
| short | \(16\) bits |
| int | \(16\) bits |
| long | \(32\) bits |
| long long | \(64\) bits |

- For each of the signed integer types, there exists a corresponding, but different, standard unsigned integer type. An unsigned integer type has the same width \(N\) as the corresponding signed integer type. The range of representable values for an unsigned type is \(0\) to \(2^N - 1\). Arithmetic for the unsigned type is performed modulo \(2^N\). Unsigned arithmetic does not overflow. Each value \(x\) of an unsigned integer type of width \(N\) has a unique representation \(x = x_0 2^0 + x_1 2^1 + \ldots + x_{N-1}2^{N-1}\), where each coefficient \(x_i\) is either \(0\) or \(1\): this is called the base-2 representation of \(x\).
- The size occupied by an object of any struct or class cannot be less than the sum of the sizes of its data members.
Empty base class optimization (EBCO)
Consider the following code snippet:
class X{};
class Y : X{ // private inheritance
char c;
};
int main()
{
Y y;
static_assert(sizeof(Y) == 1);
return 0;
}
Since the base class X is empty, and objects of the derived class Y occupy at least one byte of storage anyway, the base class can be flattened into the derived class (it takes up no storage of its own) when creating objects of Y. Note that, since the presence of X in Y is an implementation detail, not something that participates in the interface of class Y, I used private inheritance.
Alignment
struct X{
    char c;       // at least 1 byte
    short s;      // at least 2 bytes
    int i;        // at least 2 bytes
    long long ll; // at least 8 bytes
};
int main(){
    static_assert(sizeof(X) >= 13);
}
We know that sizeof(X) will be at least \(13\) bytes (assuming 8-bit bytes). In practice however, sizeof(X) is likely to be equal to \(16\) bytes. This might seem surprising at first, but it’s a logical consequence of something called alignment.
The alignment of an object tells us where that object can be placed in memory. The char type has an alignment of \(1\), and as such one can place a char object literally anywhere (as long as one can access that memory). short has an alignment of 2. If a type has an alignment \(n\), then objects of that type must be placed at an address that is a multiple of \(n\).
The alignment has to be a strictly positive power of \(2\).
The C++ language offers two keywords related to alignment:
- The alignof operator, which yields the natural alignment of a type T.
- The alignas specifier, which lets programmers impose the alignment of an object. This is often useful when playing tricks with memory (as we will), or when interfacing with exotic hardware. Of course, alignas can only reasonably increase the natural alignment of a type T, not reduce it.
For some fundamental type T, we can expect the assertion sizeof(T) is equal to alignof(T) to hold, but that assertion does not generalize to composite types. For example, consider again the struct X:
struct X{
char c; // alignof(c) == 1
short s; // alignof(s) == 2
int i; // alignof(i) == 4 (most probable)
long long ll; // alignof(ll) == 8 (most probable)
};
Generally speaking, for a composite type, the alignment will correspond to the worst alignment of its data members. Here, worst means biggest. For struct X, the worst-aligned data member is ll, of type long long, and as such X objects will be aligned on \(8\)-byte boundaries, so one can be placed at address \(8\), \(16\), \(24\) and so forth. It is highly probable that sizeof(X) == 16.
Offset Content
------ -------
0 | c | (char, 1 byte)
+----+
1 |pad | (padding, 1 byte)
+----+
2 | s | (short, 2 bytes)
3 | |
+----+
4 | i | (int, 4 bytes)
5 | |
6 | |
7 | |
+----+
8 | ll | (long long, 8 bytes)
9 | |
10 | |
11 | |
12 | |
13 | |
14 | |
15 | |
+----+
Now that we know about alignment, we can see that just changing the order of the members in a struct can affect memory consumption. As a rule of thumb, declaring data members from largest to smallest alignment keeps padding to a minimum.