[A89] Re: Pointers and Matricies

To: <assembly-89@lists.ticalc.org>
Subject: [A89] Re: Pointers and Matricies
From: "Zeljko Juric" <zjuric@utic.net.ba>
Date: Sat, 24 Feb 2001 09:50:00 +0100
Reply-To: assembly-89@lists.ticalc.org

Hi!

> Uhoh -- I don't fell very confident when disagreeing with Zeljko,
> although my understanding is different.  Zeljko, if I'm wrong here,
> please correct me =)

I am very sorry, but you are wrong (I am not happy when I need to
say that someone is wrong). And, if you read in some books that you
are right (which is very possible, because I saw such books), I must
to say that, unfortunately, these books are wrong.

> I don't believe that it does need to know the width, because of the
> way that C treats multidimensional arrays. The main array contains a
> list of pointers to each array.

This is not true (read further). This _is_ true when you create a 
twodimensional array dinamically using a double pointer (i.e. 'char**'),
but, the interpretation of 'char a[m][n]' is quite different than the
interpretation of 'char **a'. More precise 'char a[m][n]' is much more
similar to '(char *a)[n]', although not the same (read further).

> Two dimensional arrays don't contain all values in one chunk of memory.

This is not true. They _are_ in one chunk of memory. Due to this, only
the first dimension of the array may be omitted. For example, 'a[][3]' and
'a[][3][5]' are legal, but 'a[][]', 'a[3][]' and 'a[][3][]' are not.
I will explain this in more details further.

> For instance, I thought that C treats these two examples identically:
> 
> char example1[][] =
>  {{0,1,2,3,4}{5,6,7,8,9}};

Have you ever tried to compile this example? It does not compile. For
example, Borland Turbo C++ will give

Size of 'example1' is unknown or zero

(in this case "unknown" is true), and GNU C will give even three errors:

Elements of array 'example1' have incomplete type
Array size missing in 'example1'
Storage size of 'example1' isn't known

But, this will work:

char example1[][5] =
 {{0,1,2,3,4}{5,6,7,8,9}};

> char example2row1[] =
>  {0,1,2,3,4};
> 
> char example2row2[] =
>  {5,6,7,8,9};
> 
> char * example2[] =
>  {example2row1, example2row2};

This example will _not_ produce the same image in the memory as the
first example, although in the both cases dereferencing like
'example2[i][j]' and 'example1[i][j]' will work correctly. You ask
how? Read further.

> The beauty of the second example is that you can reference example2
> in the exact same way as example1, and the compiler only needs to know
> the size of the final type (in this case, char).

This is true, but what we need to know is the following. Note that

a[i]

is the equivalent to

*(a+i);

assuming that '+' works using the pointer arithmetic, i.e. 'a+i' is,
in fact,

(typeof a)((long)a+i*sizeof(*a))

Let's see now how 'example2[1][2]' and 'example1[1][2]' are evaluated.
Note first that 'example2' is the (initialized) array of pointers,
so 'example2[i]' is the pointer to the char, '*example2' is also the
pointer to the char, that's why 'sizeof(*example2)' is 4. Note also that
'char[]' may be converted to 'char*' for the purpose of typecasting:

example2[1][2] =

(example2[1])[2] = 

(*(example2+1))[2] =

(*(char*[])((long)example2+1*sizeof(*example2)))[2] =

(*(char*[])((long)example2+1*sizeof(char*)))[2] =

(*(char*[])((long)example2+4))[2] =

example2row[2] =

*(example2row+2) =

*(char[])((long)example2row+2*sizeof(*example2row)) =

*(char[])((long)example2row+2*sizeof(char*)) =

*(char[])((long)example2row+2) =

*(char*)((long)example2row+2) =

7

But, see now how 'example1[1][2]' is evaluated. 'example1' is _not_
the array of pointers, but a pointer to an array. I.e. 'example1[][5]'
is like '(example1*)[5]'. So, 'sizeof(*example1)' is not 4 but 5 (you
can check this using the compiler), because '*example1' is not a pointer,
but a 5-bytes long array of chars. Although many books say that arrays
are only const pointers, this is not true. For example, they behave
differently with the 'sizeof' operator. And, this is not the only
difference (read further). 'example1[1][2]' is evaluated as follows
(note that it is extremely hard to track, and note that the evaluator
may omit the array size when it is not necessary for the calculation): 

example1[1][2] =

(example1[1])[2] = 

(*(example1+1))[2] =

(*((char*)[5])((long)example1+1*sizeof(*example1)))[2] =

(*((char*)[5])((long)example1+1*sizeof(char[5])))[2] =

(*((char*)[5])((long)example1+5))[2] =

(*((char*)[])((long)example1+5))[2] =

(*(char**)((long)example1+5))[2] =

((char*)((long)example1+5))[2] =

*((char*)((long)example1+5)+2) =

*((char*)((long)(char*)((long)example1+5)+2*sizeof(char))) =

*((char*)((long)(char*)((long)example1+5)+2)) =

*((char*)((long)((long)example1+5)+2)) =

*((char*)((long)example1+5+2)) =

*((char*)((long)example1+5+2)) =

*(char*)((long)example1+7) =

7

>From the '*(char*)((long)example1+7)' you can see how the elements in
the two-dimensional arrays are really stored. Clear? I believe that this
is not, but unfortunately this is the truth!

> No need to know the height _or the width_ of the array, it just
> continually dereferences pointers until it gets to the innermost array.

Again, no. You _can_ make such organization (like in your 'example2'),
and indexing with 'a[i][j][k][l]' will work correctly for example, but
this is _not_ the method used in C when you declare 4-dimensional array
for example!

> Using this method - with multidimensional arrays really just being
> arrays with pointers to more arrays.

No. Two-dimensional array is not an array of pointers, but a pointer to
arrays, although you can make an array of pointers, and it will work
well too, as I already explained.

> the compiler implementation is extremely simple,

But, array_of_pointers implementation may be terrible memory-comsuming
with n-dimensional arrays where n>2. Fortunately, you are not right,
especially the fortune is on TI. See why. Using your method, for 100x100
matrix, you will have 400 bytes of space wasting for 100 pointers to
each row.

> the number of dimensions it can handle are theoretically unlimited,

It is unlimited independently of the method.

> it's very efficient.

Your array_of_pointers method is really much more efficient then the
default method when indices are variable, but the default method is
more efficient with constant indices.

> and the pointer can be copied and used with array notation (as we
> want to do).

It can be, regardless of the method used. This is clear, if you understand
what I wrote above (although it is very confuse).

> That's why this implementation of arrays is used by most all compilers
> (or at least I thought it is).

No. My theory is checked yesterday with 5 different compilers!

> I believe I read the explanation of this in Dennis Ritchie's paper on
> the development of the C language, available at 
> http://cm.bell-labs.com/cm/cs/who/dmr/chist.html although it might have
> been somewhere else, I'm not quite sure =(

I believe that you mis-interpreted something, because I think that the
behaviour explained by me is well-defined in Ritchie, but it is written
on so confise way that the misinterpretation is very likely.

It will be good here to break yet another misinterpretation related to
arrays and pointers, which are more different than what you can read in
the most of the books. For example, the most of books say that the only
difference between

char str[] = "Hello";
char *str = "Hello";

is the fact that in the first case 'str' is the constant pointer. But,
these two statements are very different!!! Suppose that 'str' is declared
in the body of the function (i.e. it is an automatic local variable). In
both cases, text "Hello" is stored somewhere in the memory, and the
"result" of "Hello" is the address where it is stored (we will call this
address 'addr'). In the first case, 'str' is a local array (i.e. it is
created on the stack in a run time, and there will be 6 bytes reserved
for it) which content is initialized in the run-time (because it does not
exist in the compile-time) with the sequence of bytes located at 'addr'.
In other words, it is the same as you wrote: 

char str[6];
strcpy (str, "Hello");

The consequence is that if you change the content of 'str' (note that
this is not the same as changing bytes pointed to by 'addr'), it will be
reinitialized each time when this statement is encountered again. But,
in the second case, str is a local pointer, which content is initialized
(in the run-time) with 'addr'! So, if the content of string "Hello" is
changed, this change will be permanent, because when this statement is
encountered again, 'str' will simply be reinitialized (if changed) to
'addr' but bytes pointed to by it are not restored! Confused? See the
following code: 

for (i = 0; i < 2; i++)
  {
    char str[] = "Hello";
    printf (str);
    str[0] = 'a';
  }

This program will work as expected (it will display "Hello" twice).
But, if you change 'str[]' with '*str', it will not work as expected
(it will display "Hello" then "aello"). 

> Zeljko, have you tried my (and Olle's) solution on an actually compiler?

No, I had not enough time.

Cheers,

Zeljko Juric
Prev: [A89] Re: Pointers and Matricies
Next: [A89] pseudo-random numbers
Index(es):
- Main
- Thread