Variable, Data Type and Array Data Structures

fog37 · Jan 23, 2021

Hello,
In Python and other programming languages, data can be of different types: integer, float, string, Boolean. Data structures, on the other hand, are containers of data items which may or may not have the same data type.

A variable, when created, has:
a) name (a piece of data stored in memory and separate from the data the variable points to)
b) memory address of the memory cell, or of multiple consecutive cells, where the data content of the variable is stored (the content can be a data item of any type)

Is that correct?

Is a Python list (or a set, dictionary, etc. all examples of data structures) also to be considered a "variable"? The list has a name, memory address, data content. All the data content of the list should be stored in consecutive memory cells, I believe.

Or does each item in the list represent a variable on its own? Thanks!

phyzguy · Jan 23, 2021

fog37 said:

Is a Python list (or a set, dictionary, etc. all examples of data structures) also to be considered a "variable"? The list has a name, memory address, data content. All the data content of the list should be stored in consecutive memory cells, I believe.

A Python list is just a group of variables. There is no guarantee that they are stored in consecutive memory locations. For example, in the following code, the three separate variables x, y, and z are probably not stored in consecutive locations.

Python:

x = 7.0
y = 'String'
z = True
myList = [x,y,z]

PeterDonis · Jan 23, 2021

fog37 said:

In Python and other programming languages, data can be of different types: integer, float, string, Boolean. Data structures, on the other hand, are containers of data items which may or may not have the same data type.

What you are calling "Data structures" are, to Python, just additional data types. They are sometimes called "containers" to indicate that they can "hold" references to other objects. But the Python interpreter just treats them as additional data types.

fog37 said:

A variable, when created, has:
a) name (a piece of data stored in memory and separate from the data the variable points to)
b) memory address of the memory cell, or of multiple consecutive cells, where the data content of the variable is stored (the content can be a data item of any type)

Sort of. The operation you are describing as "creating a variable", which I assume would correspond to a line of code like this...

Python:

my_list = []

...is actually two operations to the Python interpreter. The first operation is "create a new list object which is empty"; the second operation is "store a reference to the list object you just created in the current namespace under the name my_list".
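As a minimal sketch of those two operations (the names alias and other_list are illustrative and not from the original post), binding a second name to an existing list repeats only the "store a reference" step, while a second pair of brackets creates a separate object:

```python
# Operation 1: create a new, empty list object.
# Operation 2: store a reference to it under the name my_list.
my_list = []

# Binding another name repeats only operation 2; no new list is created.
alias = my_list

alias.append(42)
print(my_list)                # [42]  (the change is visible through both names)
print(alias is my_list)       # True  (two names, one object)

# A second pair of brackets repeats operation 1, giving a distinct object.
other_list = []
print(other_list is my_list)  # False
```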

fog37 said:

Is a Python list (or a set, dictionary, etc. all examples of data structures) also to be considered a "variable"?

Again, the term "variable" is a bad one since it conflates two things that are separate in Python: an object, and a reference to that object stored in a namespace.

Python lists, sets, dictionaries, etc. are objects, just like Python integers, booleans, floats, strings, etc. The only thing that makes "containers" like lists, sets, or dictionaries different from other objects is that they can store references to other objects. But they're still objects themselves.

A namespace can store a reference to any object; the namespace doesn't care what kind of object it is. (Under the hood, a Python namespace is just a specialized Python dictionary; the names are the keys and the object references are the values. So a namespace in Python is really just a kind of container.)
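A rough way to see this is to hand exec an ordinary dict to use as its namespace. This is only a sketch: the dict name namespace and the extra key "w" are made up for illustration, and it says nothing about how function-local names are stored.

```python
# A plain dict that will act as the namespace for the executed code.
namespace = {}
exec("x = 7.0\ny = 'String'\nz = True", namespace)

# exec also adds a '__builtins__' entry, so skip double-underscore keys.
names = sorted(name for name in namespace if not name.startswith("__"))
print(names)                        # ['x', 'y', 'z']

# Storing a key directly makes it usable as a name by code run in that namespace.
namespace["w"] = [1, 2]
exec("result = len(w)", namespace)
print(namespace["result"])          # 2

# The module-level namespace is itself exposed as a dict.
print(isinstance(globals(), dict))  # True
```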

fog37 said:

All the data content of the list should be stored in consecutive memory cells, I believe.

That is true in CPython, at least, because the underlying C code that implements the list object does it that way. But Python itself has no requirement that it be done this way. The only thing Python itself requires is that the ordering of the object references in the list is preserved, and that the list object supports all the operations (i.e., methods) defined in the Python specification for list objects.
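The sketch below does not prove anything about contiguity, but it is consistent with a list holding fixed-size references rather than the objects' data. It assumes CPython, where sys.getsizeof reports the list's own footprint without the objects it refers to; the variable names are illustrative.

```python
import sys

small = 0
big = "x" * 1_000_000   # a string whose data occupies about a megabyte

list_of_small = [small]
list_of_big = [big]

# Each list holds exactly one reference, so under the CPython assumption
# both lists report the same size regardless of how large the element is.
same_footprint = sys.getsizeof(list_of_small) == sys.getsizeof(list_of_big)
print(same_footprint)

# The big string's own storage is far larger than the list that refers to it.
print(sys.getsizeof(big) > sys.getsizeof(list_of_big))
```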

PeterDonis · Jan 23, 2021

phyzguy said:

A Python list is just a group of variables.

No, it isn't; it's an ordered group of object references. "Variables" is a bad term when discussing Python, for the reasons given in my post in response to the OP just now.

phyzguy said:

in the following code, the three separate variables x, y, and z are probably not stored in consecutive locations.

You are mixing up two different references to the objects labeled by the names "x", "y", "z".

In the namespace in which the code you wrote is executed, there are references to objects stored under the names "x", "y", and "z". The name "x" stores a reference to a float object with value 7.0; the name "y" stores a reference to a string object with value "String"; the name "z" stores a reference to a bool object with value True. The first two objects will be created and stored on the heap; the last object will be the global singleton bool object True that is created by the interpreter on initialization. So yes, the data for those three objects will not all be in consecutive areas of memory (the data for the first two might be if they are the first two objects allocated on the heap, but the last will certainly not be since it will be in the Python interpreter's global data segment). And the references to those three objects stored in the current namespace will probably also not be in consecutive areas of memory (since I don't think the underlying implementation of the Python dict object, which is what the namespace is, does that for pointers to values).

A fourth object is then created, a list object that stores references to the three objects described above (the float, the string, and the bool). A reference to this object is then stored in the current namespace under the name "myList". The object references contained in the list will be in consecutive areas of memory, because that is how the underlying CPython implementation of the list object works.

Since object references are just pointers, the end result will be that two copies of the same three pointers will be stored: one copy of each in the current namespace, under the names "x", "y", and "z"; the second copy of each in the internal memory of the list object. The first set of pointers will most likely not be in consecutive areas of memory, but the second set will.
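The "two copies of the same pointers" picture can be checked with identity tests. This is a small sketch built on the code from the quoted post; the name shared is introduced only for illustration.

```python
x = 7.0
y = 'String'
z = True
myList = [x, y, z]

# Each list slot refers to the very same object as the corresponding name.
shared = [item is obj for item, obj in zip(myList, (x, y, z))]
print(shared)          # [True, True, True]

# Rebinding the name x changes only the namespace's reference;
# the list still refers to the original float object.
x = 8.0
print(myList[0])       # 7.0
print(myList[0] is x)  # False
```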

pbuk · Jan 23, 2021

I think it is a mistake to get hung up about physical memory locations; these are implementation dependent. In fact, a well-optimised compiler may not even use a memory location at all for a variable with a limited scope, such as a loop counter.

When we talk about a variable in Python (or any other language), we simply mean a container with a label (or name) which we use in the language's source code to refer to the container and/or its contents.

hmmm27 · Jan 30, 2021

PeterDonis said:

the last object will be the global singleton bool object True that is created by the interpreter on initialization.

That seems rather unnecessarily inefficient; more likely that, just as the memory reserved for 'x' and 'y' is being loaded with the float and string representations of the coded values, 'z' is also, based on the compiler's definition of 'TRUE' or 'FALSE' (i.e., x'01' or x'00'). Why would a pointer be loaded instead of the actual value? Or am I misinterpreting what you've written?

pbuk · Jan 30, 2021

hmmm27 said:

Or am I misinterpreting what you've written?

Yes. Read this bit again.

PeterDonis said:

The name "x" stores a reference to a float object with value 7.0; the name "y" stores a reference to a string object with value "String"; the name "z" stores a reference to a bool object with value True.

Python is a dynamically typed language, so it needs to keep track at run time of each symbol's (current) type as well as its value. So it needs to know that 'x' is currently a float (pointer to 'float' type class in program memory) and its value is currently 7.0 (it can store this inline in the symbol table); and 'y' is currently a string (pointer to 'string' type class in program memory) and its value is currently 'String' (pointer to the start of the string on the heap).

So for 'z' it could store the fact that this is currently a boolean (pointer to 'boolean' type class in program memory) and its value is currently True (inline or on the heap), but as a boolean can only have True or False values it is more efficient to represent 'boolean True' and 'boolean False' as separate singleton objects or static classes.

hmmm27 · Jan 30, 2021

pbuk said:

So for 'z' it could store the fact that this is currently a boolean (pointer to 'boolean' type class in program memory) and its value is currently True (inline or on the heap), but as a boolean can only have True or False values it is more efficient to represent 'boolean True' and 'boolean False' as separate singleton objects or static classes.

So... every time you actually use the variable 'z', instead of just checking if it's equal to x'00' or x'01' right on the spot, it has to go check to see if the value of the pointee is equal to x'00' or x'01'? Doesn't seem more efficient. Opposite, in fact.

PeterDonis · Jan 30, 2021

pbuk said:

Python is a dynamically typed language, so it needs to keep track at run time of each symbol's (current) type as well as its value. So it needs to know that 'x' is currently a float (pointer to 'float' type class in program memory) and its value is currently 7.0 (it can store this inline in the symbol table); and 'y' is currently a string (pointer to 'string' type class in program memory) and its value is currently 'String' (pointer to the start of the string on the heap).

That's not quite how Python namespaces work. Python namespaces are Python dicts; in other words, they are hash tables. The variable we are calling 'x' is a hash table entry whose key is the Python string object 'x' (actually its hash--we'll leave out how the implementation deals with hash collisions) and whose value is a pointer to a Python float object, which is a C structure. One of the fields in that C structure stores the value of the float; another field is a pointer to a different C structure that stores the type information for float objects. So all the type information is stored in the C structure of the object; none of that information is stored in the hash table for the variable namespace.

Similarly, the hash table entry whose key is the Python string object 'y' has a value that is a pointer to a Python string object, which is another C structure; one of its fields stores a pointer to the actual string data, another field is a pointer to the C structure that stores the type information for string objects.

And, finally, the hash table entry whose key is the Python string object 'z' has a value that is a pointer to the global singleton bool object "True", which is a C structure, one of whose fields stores the integer data 1 (how many bits depends on whether the interpreter was compiled in 32-bit or 64-bit mode), another of whose fields stores a pointer to the C structure that stores the type information for bool objects.
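From Python code, the visible side of this is that the type is a property of the object rather than of the name, and that every expression evaluating to true yields the same True object. A short sketch (the name comparison is illustrative):

```python
x = 7.0
y = 'String'
z = True

# The type is looked up on the object each name refers to.
print(type(x).__name__, type(y).__name__, type(z).__name__)  # float str bool

# Any expression that evaluates to true yields the one singleton True object.
comparison = (1 + 1 == 2)
print(comparison is z)             # True
print(bool("non-empty") is z)      # True

# bool is a subclass of int, and True carries the integer value 1.
print(isinstance(z, int), int(z))  # True 1
```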

PeterDonis · Jan 30, 2021

hmmm27 said:

every time you actually use the variable 'z', instead of just checking if it's equal to x'00' or x'01' right on the spot, it has to go check to see if the value of the pointee is equal to x'00' or x'01'? Doesn't seem more efficient. Opposite, in fact.

If you think you can build a dynamic language with all of Python's capabilities without using this kind of indirection, have at it.
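One capability that relies on names holding references rather than raw values is that a name can be rebound at run time to objects of different types. A minimal sketch, with history as a purely illustrative name:

```python
z = True
history = [type(z).__name__]

# The same name can later refer to a string...
z = 'now a string'
history.append(type(z).__name__)

# ...or to a list that itself holds references to other objects.
z = [1.5, z]
history.append(type(z).__name__)

print(history)  # ['bool', 'str', 'list']
print(z)        # [1.5, 'now a string']
```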

hmmm27 · Jan 30, 2021

PeterDonis said:

If you think you can build a dynamic language with all of Python's capabilities without using this kind of indirection, have at it.

Ah, I finally did notice that your replies are laced with "dynamic" and "interpreter"... puts things in a different light. Elegant from that point of view, I suppose.
