Aliasing of C-language arrays.

Version of Friday 3 January 2014.
Dave Barber's other pages.

In the well-known C programming language, the management of arrays is particularly transparent; this arises from the designers' quest for a language that could be implemented very efficiently in terms of both time and space.

In particular, when a C program contains code to access an array element, the compiler can almost always generate a very brief sequence of machine language instructions for the central processing unit to execute. The tradeoff is that the C implementation of arrays gives little protection to the programmer who writes erroneous code, often leading to bugs that are difficult to track down.

Among C derivatives, C++ is practically the same in this regard, but Java and C# exhibit noticeable differences.

Note that in C terminology, a definition is an instruction to the compiler that not only describes an item, but also reserves storage for it and might effect an initialization. In contrast, a declaration describes an item, but does not reserve storage or initialize. A definition is also a declaration, but not vice versa. Every item must have exactly one definition, but may have many declarations, as long as they are consistent.

Multiple declarations of the same item become essential when several sections of a program are compiled separately and subsequently linked. Each section needs a declaration for any item that it must know about, but exactly one section must contain the definition. Beyond that, multiple declarations are frequently necessary when recursive data structures or algorithms are employed.

First we look at the one-dimensional array. For example, the definition

    int P[7];

creates an array named P that has 7 components, all integers. Those components are numbered 0 through 6, with square brackets being used for subscripts. Thus P[0] is the first component, P[1] the second component, and P[6] the last. Subscripts begin at 0 rather than 1 because this simplifies address calculations in the executable version of the program. This gives rise to the surprising phenomenon that we write P[7] to create the array, but we never write P[7] to use any of the components of the array.

When the programmer defines an array, the computer stores the data within memory wherever it will; from the programmer's point of view, the ultimate address is arbitrary and unpredictable. (Still, once the array is created, the program can find out what the address turned out to be.) The important thing is the relative locations of the array components. If for example the following conditions apply:

the machine decides to place P at the starting address 1F30 (hexadecimal),
each integer requires two bytes, and
machine addresses represent bytes,

then P[0] will be stored within addresses 1F30 and 1F31, P[1] at 1F32 and 1F33, and P[6] at 1F3C and 1F3D. If the programmer writes P[7] (7 being an out-of-range subscript) then whatever happens to be stored at addresses 1F3E and 1F3F will be accessed, and because those addresses are not part of P, the program will probably malfunction. Also out of range is any negative subscript, for instance P[-1], which would refer to 1F2E and 1F2F. Here is a tabulation:

Table one.
address	component
`1F2F` and below	unrelated data
`1F30` and `1F31`	`P[0]`
`1F32` and `1F33`	`P[1]`
`1F34` and `1F35`	`P[2]`
`1F36` and `1F37`	`P[3]`
`1F38` and `1F39`	`P[4]`
`1F3A` and `1F3B`	`P[5]`
`1F3C` and `1F3D`	`P[6]`
`1F3E` and above	unrelated data

Two-dimensional arrays can also be defined, and they are mapped into one-dimensional arrays according to a simple polynomial formula. For instance,

    int Q[7][12];

will have 7 × 12 = 84 integer components, the first being Q[0][0] and the last Q[6][11].

Consider a hypothetical one-dimensional array that precisely overlays Q:

    int R[84];

Then in C (and other languages that use row-major order), component Q[i][j] is the same piece of memory as R[i×12 + j]. (The 7 that appears in int Q[7][12] plays no role here.) As will be seen later, C provides a syntax that permits definitions similar to Q and R to indeed be in effect simultaneously, as aliases.

Arrays of three or more dimensions readily follow, for instance

    int S[3][7][4];

If S with its 3 × 7 × 4 = 84 components precisely overlays R, then S[i][j][k] refers to the same address as R[i×7×4 + j×4 + k].

Demonstrating initialization is the following C-language definition of a 3-by-5 array named T, containing floating-point numbers. The fifteen components are initialized to arbitrary values chosen by the programmer:

    float T[3][5] = {
        { +1.432, +6.545, -8.767,  0.000, -9.999 }, 
        { -2.213, -2.543, +3.949, +6.789, +3.232 }, 
        { +0.243, -5.949, +7.191, +4.321, -0.017 } 
    };

Some components are T[2][3] which equals +4.321 and T[0][2] which equals −8.767. Note that T[3][0] represents a location outside the array; and T[1][5] is an irregular way to access T[2][0].

Next is a C-language definition of a 4-by-6 array of floats loaded with values that form an informative pattern. The components are also displayed in column A of table two, which reveals why the initialization is called subscriptive.

    float A[4][6] = {
        { 0.00, 0.01, 0.02, 0.03, 0.04, 0.05 }, 
        { 0.10, 0.11, 0.12, 0.13, 0.14, 0.15 }, 
        { 0.20, 0.21, 0.22, 0.23, 0.24, 0.25 }, 
        { 0.30, 0.31, 0.32, 0.33, 0.34, 0.35 } 
    };

Table two.
`A`	`B`	`C`	`D`	`E`
`A[0][0] == 0.00`	`B[ 0] == 0.00`	`C[0][0] == 0.00`	`D[0][0] == 0.00`	`E[0][0][0] == 0.00`
`A[0][1] == 0.01`	`B[ 1] == 0.01`	`C[0][1] == 0.01`	`D[0][1] == 0.01`	`E[0][0][1] == 0.01`
`A[0][2] == 0.02`	`B[ 2] == 0.02`	`C[0][2] == 0.02`	`D[0][2] == 0.02`	`E[0][1][0] == 0.02`
`A[0][3] == 0.03`	`B[ 3] == 0.03`	`C[0][3] == 0.03`	`D[0][3] == 0.03`	`E[0][1][1] == 0.03`
`A[0][4] == 0.04`	`B[ 4] == 0.04`	`C[1][0] == 0.04`	`D[0][4] == 0.04`	`E[0][2][0] == 0.04`
`A[0][5] == 0.05`	`B[ 5] == 0.05`	`C[1][1] == 0.05`	`D[0][5] == 0.05`	`E[0][2][1] == 0.05`
`A[1][0] == 0.10`	`B[ 6] == 0.10`	`C[1][2] == 0.10`	`D[0][6] == 0.10`	`E[1][0][0] == 0.10`
`A[1][1] == 0.11`	`B[ 7] == 0.11`	`C[1][3] == 0.11`	`D[0][7] == 0.11`	`E[1][0][1] == 0.11`
`A[1][2] == 0.12`	`B[ 8] == 0.12`	`C[2][0] == 0.12`	`D[1][0] == 0.12`	`E[1][1][0] == 0.12`
`A[1][3] == 0.13`	`B[ 9] == 0.13`	`C[2][1] == 0.13`	`D[1][1] == 0.13`	`E[1][1][1] == 0.13`
`A[1][4] == 0.14`	`B[10] == 0.14`	`C[2][2] == 0.14`	`D[1][2] == 0.14`	`E[1][2][0] == 0.14`
`A[1][5] == 0.15`	`B[11] == 0.15`	`C[2][3] == 0.15`	`D[1][3] == 0.15`	`E[1][2][1] == 0.15`
`A[2][0] == 0.20`	`B[12] == 0.20`	`C[3][0] == 0.20`	`D[1][4] == 0.20`	`E[2][0][0] == 0.20`
`A[2][1] == 0.21`	`B[13] == 0.21`	`C[3][1] == 0.21`	`D[1][5] == 0.21`	`E[2][0][1] == 0.21`
`A[2][2] == 0.22`	`B[14] == 0.22`	`C[3][2] == 0.22`	`D[1][6] == 0.22`	`E[2][1][0] == 0.22`
`A[2][3] == 0.23`	`B[15] == 0.23`	`C[3][3] == 0.23`	`D[1][7] == 0.23`	`E[2][1][1] == 0.23`
`A[2][4] == 0.24`	`B[16] == 0.24`	`C[4][0] == 0.24`	`D[2][0] == 0.24`	`E[2][2][0] == 0.24`
`A[2][5] == 0.25`	`B[17] == 0.25`	`C[4][1] == 0.25`	`D[2][1] == 0.25`	`E[2][2][1] == 0.25`
`A[3][0] == 0.30`	`B[18] == 0.30`	`C[4][2] == 0.30`	`D[2][2] == 0.30`	`E[3][0][0] == 0.30`
`A[3][1] == 0.31`	`B[19] == 0.31`	`C[4][3] == 0.31`	`D[2][3] == 0.31`	`E[3][0][1] == 0.31`
`A[3][2] == 0.32`	`B[20] == 0.32`	`C[5][0] == 0.32`	`D[2][4] == 0.32`	`E[3][1][0] == 0.32`
`A[3][3] == 0.33`	`B[21] == 0.33`	`C[5][1] == 0.33`	`D[2][5] == 0.33`	`E[3][1][1] == 0.33`
`A[3][4] == 0.34`	`B[22] == 0.34`	`C[5][2] == 0.34`	`D[2][6] == 0.34`	`E[3][2][0] == 0.34`
`A[3][5] == 0.35`	`B[23] == 0.35`	`C[5][3] == 0.35`	`D[2][7] == 0.35`	`E[3][2][1] == 0.35`

In the table, note the use of the double equal sign to indicate the state of equality, as contrasted to the operation of assignment. This is a C language practice.

Here are statements creating some aliases of array A, detailed in the remaining columns of the table above:

    float * B = (float *) A; 
    // corresponding to float B[24];

    float (*C)[4] = (float (*)[4]) A;
    // corresponding to float C[6][4];

    float (*D)[8] = (float (*)[8]) A;
    // corresponding to float D[3][8];

    float (*E)[3][2] = (float (*)[3][2]) A;
    // corresponding to float E[4][3][2];

Note that B, C, D and E are not copies of A, but rather different ways of looking at the same data. In other words, all five refer to the same memory address. Thus if we change A[2][3] to 3.14159, each of B[15], C[3][3], D[1][7] and E[2][1][1] will reflect the change. Because of this, the statements creating B, C, D and E might appear to be declarations and not definitions. However, the compiler presumably will allocate for each of B, C, D and E enough memory to hold a pointer, and because of that the statements turn out to be definitions of pointers but not arrays.

We say "presumably" because, depending on the complexity of the program, the compiler might be able to figure out the aliasing and optimize B, C, D and E away. That simplification is more likely if the pointers are defined as constant, as with the following variants BK, CK, DK and EK, which will always point to the data in A:

    float * const BK = (float *) A; 
    float (* const CK)[4] = (float (*)[4]) A;
    float (* const DK)[8] = (float (*)[8]) A;
    float (* const EK)[3][2] = (float (*)[3][2]) A;

Note that these are constant pointers to arrays of variable floats. Under the following definition, it would not be possible to change the contents of A by using F:

    float const (*F)[8] = (float const (*)[8]) A;

Sample program (written in C++ for its stream output; the array and pointer definitions are the same as in C):

    #include <iostream>
    #include <iomanip>
    using std::cout;
    using std::setw;

    // The array itself, with subscriptive initialization.
    float A[4][6] = {
        { 0.00, 0.01, 0.02, 0.03, 0.04, 0.05 },
        { 0.10, 0.11, 0.12, 0.13, 0.14, 0.15 },
        { 0.20, 0.21, 0.22, 0.23, 0.24, 0.25 },
        { 0.30, 0.31, 0.32, 0.33, 0.34, 0.35 }
    };

    // Aliases: pointers that view the storage of A in other shapes.
    float (*B)       = (float (*)      ) A;
    float (*C)[4]    = (float (*)[4]   ) A;
    float (*D)[8]    = (float (*)[8]   ) A;
    float (*E)[3][2] = (float (*)[3][2]) A;

    int main () {
        cout.setf (std::ios::fixed, std::ios::floatfield);
        cout.precision (2);

        // &A is the address of the array; &B through &E are the
        // addresses of the pointer variables, not of the data.
        cout << "\n &A = " << &A;
        cout << "\n &B = " << &B;
        cout << "\n &C = " << &C;
        cout << "\n &D = " << &D;
        cout << "\n &E = " << &E;

        cout << "\n ";

        // A as defined: 4 rows of 6.
        for (int i (0); i < 4; ++i)
            for (int j (0); j < 6; ++j) {
                cout << "\n A[" << i << "][" << j;
                cout << "] = " << A[i][j];
            }

        cout << "\n ";

        // B: flat view of 24 components.
        for (int i (0); i < 24; ++i) {
            cout << "\n B[" << setw(2) << i;
            cout << "] = " << B[i];
        }

        cout << "\n ";

        // C: 6 rows of 4.
        for (int i (0); i < 6; ++i)
            for (int j (0); j < 4; ++j) {
                cout << "\n C[" << i << "][" << j;
                cout << "] = " << C[i][j];
            }

        cout << "\n ";

        // D: 3 rows of 8.
        for (int i (0); i < 3; ++i)
            for (int j (0); j < 8; ++j) {
                cout << "\n D[" << i << "][" << j;
                cout << "] = " << D[i][j];
            }

        cout << "\n ";

        // E: 4 blocks of 3 rows of 2.
        for (int i (0); i < 4; ++i)
            for (int j (0); j < 3; ++j)
                for (int k (0); k < 2; ++k) {
                    cout << "\n E[" << i;
                    cout << "][" << j << "][" << k;
                    cout << "] = " << E[i][j][k];
                }

        cout << "\n";   // finish the last line of output
        return 0;
    }

Remarks.

1. The C-language syntax for declaring pointers * and arrays [ ] can seem impenetrable to beginners, and tricky to veterans.

2. C allows the explicit conversion of ANY pointer type to ANY pointer type, even if the conversion is absolute nonsense, so the programmer has to be very careful. A C compiler will accept the following code, even though it will surely yield horrible results when ipp is used:

    double n = 1.2345;
    double * dp = &n;
    int *** ipp = (int ***) dp;

3. Although array A (from above) has only 24 components, the compiler is perfectly happy to treat array *G as having 4845 components, so the next definition will certainly lead to perdition:

    float (* G)[19][17][15] = (float (*)[19][17][15]) A;

Thus the programmer must be careful when using a pointer conversion to reïnterpret an array.