Variable, Data Type and Array Data Structures

In summary: A Python list stores references to objects. In CPython those references sit in consecutive memory cells, but the objects they refer to (such as "x", "y", and "z" in the thread's example) generally do not.
  • #1
fog37
TL;DR Summary
Understand the difference between variables, data types, and array data structures.
Hello,
In Python and other programming languages, data can be of different types: integer, float, string, Boolean. Data structures, on the other hand, are containers of data items which may or may not all have the same data type.

A variable, when created, has:
a) name (a piece of data stored in memory and separate from the data the variable points to)
b) memory address of the memory cell, or of multiple consecutive cells, where the data content of the variable is stored (the content can be a data item of any type)

Is that correct?

Is a Python list (or a set, dictionary, etc. all examples of data structures) also to be considered a "variable"? The list has a name, memory address, data content. All the data content of the list should be stored in consecutive memory cells, I believe.

Or does each item in the list represent a variable on its own?

Thanks!
 
  • #2
fog37 said:
Is a Python list (or a set, dictionary, etc. all examples of data structures) also to be considered a "variable"? The list has a name, memory address, data content. All the data content of the list should be stored in consecutive memory cells, I believe.
A Python list is just a group of variables. There is no guarantee that they are stored in consecutive memory locations. For example, in the following code, the three separate variables x, y, and z are probably not stored in consecutive locations.
Python:
x = 7.0
y = 'String'
z = True
myList = [x,y,z]
 
  • #3
fog37 said:
In Python and other programming languages, data can be of different types: integer, float, string, Boolean. Data structures, on the other hand, are containers of data items which may or may not all have the same data type.

What you are calling "Data structures" are, to Python, just additional data types. They are sometimes called "containers" to indicate that they can "hold" references to other objects. But the Python interpreter just treats them as additional data types.

fog37 said:
A variable, when created, has:
a) name (a piece of data stored in memory and separate from the data the variable points to)
b) memory address of the memory cell, or of multiple consecutive cells, where the data content of the variable is stored (the content can be a data item of any type)

Sort of. The operation you are describing as "creating a variable", which I assume would correspond to a line of code like this...

Python:
my_list = []

...is actually two operations to the Python interpreter. The first operation is "create a new list object which is empty"; the second operation is "store a reference to the list object you just created in the current namespace under the name my_list".
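A minimal sketch of those two steps, assuming any standard Python 3 interpreter. The name alias is introduced here purely for illustration; it is not part of the original example.

```python
# Step 1: the expression [] creates a new, empty list object.
# Step 2: the assignment stores a reference to it under the name my_list.
my_list = []

# Binding a second name copies the reference, not the list.
alias = my_list
alias.append(1)

print(my_list)            # the change is visible through both names
print(alias is my_list)   # both names refer to the same object

# Rebinding my_list stores a reference to a different, newly created list;
# alias still refers to the original object.
my_list = []
print(alias, my_list)
```

The second assignment to my_list does not modify the first list at all; it only replaces which object the name refers to.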

fog37 said:
Is a Python list (or a set, dictionary, etc. all examples of data structures) also to be considered a "variable"?

Again, the term "variable" is a bad one since it conflates two things that are separate in Python: an object, and a reference to that object stored in a namespace.

Python lists, sets, dictionaries, etc. are objects, just like Python integers, booleans, floats, strings, etc. The only thing that makes "containers" like lists, sets, or dictionaries different from other objects is that they can store references to other objects. But they're still objects themselves.

A namespace can store a reference to any object; the namespace doesn't care what kind of object it is. (Under the hood, a Python namespace is just a specialized Python dictionary; the names are the keys and the object references are the values. So a namespace in Python is really just a kind of container.)
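One way to see this for yourself is to hand the interpreter an ordinary dict to use as a global namespace. This is only a sketch: the names namespace and user_names are made up for the illustration, and the assignments are the ones from the example earlier in the thread.

```python
# Run three assignments with an ordinary dict acting as the global namespace.
namespace = {}
exec("x = 7.0\ny = 'String'\nz = True", namespace)

# exec() also adds a '__builtins__' entry to the dict, so filter it out.
user_names = {k: v for k, v in namespace.items() if not k.startswith("__")}
print(user_names)

# Looking up a name is then just a dictionary lookup.
print(namespace["x"])
```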

fog37 said:
All the data content of the list should be stored in consecutive memory cells, I believe.

That is true in CPython, at least for the object references the list holds, because the underlying C code that implements the list object stores them in one contiguous array. (The objects those references point to can be anywhere in memory.) But Python itself has no requirement that it be done this way. The only thing Python itself requires is that the ordering of the object references in the list is preserved, and that the list object supports all the operations (i.e., methods) defined in the Python specification for list objects.
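A rough way to observe this, assuming CPython: sys.getsizeof reports the footprint of the list object itself, not of the objects it refers to, so two one-element lists should report the same size however large their single element is. The variable names below are hypothetical, and other Python implementations may not support sys.getsizeof in the same way.

```python
import sys

small_item = 1
large_item = "a" * 10_000

one_small = [small_item]
one_large = [large_item]

# Each list holds a single reference, so in CPython the list objects
# themselves are expected to be the same size, whatever they refer to.
print(sys.getsizeof(one_small), sys.getsizeof(one_large))

# The large string's own storage lives elsewhere and is counted separately.
print(sys.getsizeof(large_item) > sys.getsizeof(one_large))
```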
 
  • #4
phyzguy said:
A Python list is just a group of variables.

No, it isn't; it's an ordered group of object references. "Variables" is a bad term when discussing Python, for the reasons given in my post in response to the OP just now.

phyzguy said:
in the following code, the three separate variables x, y, and z are probably not stored in consecutive locations.

You are mixing up two different references to the objects labeled by the names "x", "y", "z".

In the namespace in which the code you wrote is executed, there are references to objects stored under the names "x", "y", and "z". The name "x" stores a reference to a float object with value 7.0; the name "y" stores a reference to a string object with value "String"; the name "z" stores a reference to a bool object with value True.

The first two objects will be created and stored on the heap; the last object will be the global singleton bool object True that is created by the interpreter on initialization. So yes, the data for those three objects will not all be in consecutive areas of memory (the data for the first two might be if they are the first two objects allocated on the heap, but the last will certainly not be since it will be in the Python interpreter's global data segment).

And the references to those three objects stored in the current namespace will probably also not be in consecutive areas of memory (since I don't think the underlying implementation of the Python dict object, which is what the namespace is, does that for pointers to values).

A fourth object is then created, a list object that stores references to the three objects described above (the float, the string, and the bool). A reference to this object is then stored in the current namespace under the name "myList". The object references contained in the list will be in consecutive areas of memory, because that is how the underlying CPython implementation of the list object works.

Since object references are just pointers, the end result will be that two copies of the same three pointers will be stored: one copy of each in the current namespace, under the names "x", "y", and "z"; the second copy of each in the internal memory of the list object. The first set of pointers will most likely not be in consecutive areas of memory, but the second set will.
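The "two copies of the same pointers" picture can be checked with the is operator, which compares object identity. This sketch reuses the names from the example in the thread; the rebinding of x at the end is an extra step added here for illustration.

```python
x = 7.0
y = 'String'
z = True
myList = [x, y, z]

# Each list slot refers to the very same object as the corresponding name.
print(myList[0] is x, myList[1] is y, myList[2] is z)

# Rebinding the name x replaces only the reference held in the namespace;
# the list's copy of the reference still leads to the original float object.
x = 'something else'
print(myList[0])
```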
 
  • #5
I think it is a mistake to get hung up on physical memory locations; these are implementation dependent. In fact, a well-optimised compiler may not even use a memory location at all for a variable with a limited scope, such as a loop counter.

When we talk about a variable in Python (or any other language), we simply mean a container with a label (or name) which we use in the language's source code to refer to the container and/or its contents.
 
  • #6
PeterDonis said:
the last object will be the global singleton bool object True that is created by the interpreter on initialization.
That seems unnecessarily inefficient. More likely, just as the memory reserved for 'x' and 'y' is loaded with the float and string representations of the coded values, 'z' is loaded too, based on the compiler's definition of 'TRUE' or 'FALSE' (i.e., x'01' or x'00'). Why would a pointer be loaded instead of the actual value? Or am I misinterpreting what you've written?
 
  • #7
hmmm27 said:
Or am I misinterpreting what you've written?
Yes. Read this bit again.

PeterDonis said:
The name "x" stores a reference to a float object with value 7.0; the name "y" stores a reference to a string object with value "String"; the name "z" stores a reference to a bool object with value True.

Python is not a statically typed language, so it needs to keep track at run time of each symbol's (current) type as well as its value. So it needs to know that 'x' is currently a float (pointer to 'float' type class in program memory) and its value is currently 7.0 (it can store this inline in the symbol table); and 'y' is currently a string (pointer to 'string' type class in program memory) and its value is currently 'String' (pointer to the start of the string on the heap).

So for 'z' it could store the fact that this is currently a boolean (pointer to 'boolean' type class in program memory) and its value is currently True (inline or on the heap), but as a boolean can only have True or False values it is more efficient to represent 'boolean True' and 'boolean False' as separate singleton objects or static classes.
 
  • #8
pbuk said:
So for 'z' it could store the fact that this is currently a boolean (pointer to 'boolean' type class in program memory) and its value is currently True (inline or on the heap), but as a boolean can only have True or False values it is more efficient to represent 'boolean True' and 'boolean False' as separate singleton objects or static classes.
So... every time you actually use the variable 'z', instead of just checking if it's equal to x'00' or x'01' right on the spot, it has to go check whether the value of the pointee is equal to x'00' or x'01'? That doesn't seem more efficient. The opposite, in fact.
 
  • #9
pbuk said:
Python is not a statically typed language, so it needs to keep track at run time of each symbol's (current) type as well as its value. So it needs to know that 'x' is currently a float (pointer to 'float' type class in program memory) and its value is currently 7.0 (it can store this inline in the symbol table); and 'y' is currently a string (pointer to 'string' type class in program memory) and its value is currently 'String' (pointer to the start of the string on the heap).

That's not quite how Python namespaces work. Python namespaces are Python dicts; in other words, they are hash tables. The variable we are calling 'x' is a hash table entry whose key is the Python string object 'x' (actually its hash--we'll leave out how the implementation deals with hash collisions) and whose value is a pointer to a Python float object, which is a C structure. One of the fields in that C structure stores the value of the float; another field is a pointer to a different C structure that stores the type information for float objects. So all the type information is stored in the C structure of the object; none of that information is stored in the hash table for the variable namespace.
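At the Python level, the fact that type information travels with the object rather than with the name can be seen by rebinding a name and asking the object for its type each time. This is only an illustration of the visible behaviour, not of the C structures; first_type and second_type are names made up for the sketch.

```python
x = 7.0
first_type = type(x)          # the float object reports its own type
print(first_type, x.__class__ is float)

# Rebinding the name changes which object it refers to;
# nothing about the name itself had to be "retyped".
x = 'String'
second_type = type(x)
print(second_type)
```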

Similarly, the hash table entry whose key is the Python string object 'y' has a value that is a pointer to a Python string object, which is another C structure; one of its fields stores a pointer to the actual string data, another field is a pointer to the C structure that stores the type information for string objects.

And, finally, the hash table entry whose key is the Python string object 'z' has a value that is a pointer to the global singleton bool object "True", which is a C structure, one of whose fields stores the integer data 1 (how many bits depends on whether the interpreter was compiled in 32-bit or 64-bit mode), another of whose fields stores a pointer to the C structure that stores the type information for bool objects.
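The singleton behaviour is easy to confirm from Python code. A short sketch follows; the name comparison is hypothetical and simply holds a bool produced at run time.

```python
z = True
comparison = (1 == 1)     # a bool produced at run time

# Both names refer to the one and only True object.
print(z is True, comparison is z)

# bool is a subclass of int, and True carries the integer value 1.
print(issubclass(bool, int), int(z), z + z)
```

Because there are only ever two bool objects, a truth test needs no new allocation; it only has to follow a reference to an object that already exists.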
 
  • #10
hmmm27 said:
every time you actually use the variable 'z', instead of just checking if it's equal to x'00' or x'01' right on the spot, it has to go check whether the value of the pointee is equal to x'00' or x'01'? That doesn't seem more efficient. The opposite, in fact.

If you think you can build a dynamic language with all of Python's capabilities without using this kind of indirection, have at it.
 
  • #11
PeterDonis said:
If you think you can build a dynamic language with all of Python's capabilities without using this kind of indirection, have at it.
Ah, I finally did notice that your replies are laced with "dynamic" and "interpreter"... puts things in a different light. Elegant from that point of view, I suppose.
 

1. What is a variable and how is it used in programming?

A variable is a placeholder for a value that can be changed during the execution of a program. It is used to store and manipulate data, making programs more dynamic and flexible. Variables can hold various types of data such as numbers, strings, or boolean values.

2. What are data types and why are they important in programming?

Data types are classifications of data that determine the type of operations that can be performed on the data. They are important in programming because they help ensure data is used in a consistent and meaningful way. They also help to optimize memory usage and increase program efficiency.

3. How are arrays used to store and organize data?

Arrays are a data structure used to store a collection of elements of the same data type. They allow for easy and efficient storage and retrieval of data. Arrays can be accessed using an index, which specifies the position of the element in the array. They can also be used to perform operations on a large set of data efficiently.
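In Python, the built-in list is not restricted to one element type, but the standard library's array module provides an array in the sense described here. The sketch below assumes that module; the names readings, mixed, and rejected are invented for the illustration.

```python
from array import array

# 'd' means every element is a C double; the type code is fixed at creation.
readings = array('d', [1.5, 2.5, 3.5])

print(readings[1])          # access by index (position)
print(readings.itemsize)    # bytes per element, platform dependent

# A built-in list accepts mixed types; a typed array rejects them.
mixed = [1.5, 'String', True]
rejected = False
try:
    readings.append('String')
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```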

4. What is the difference between a one-dimensional and multi-dimensional array?

A one-dimensional array is a collection of elements organized in a single row or column, while a multi-dimensional array is a collection of elements organized in multiple rows and columns. One-dimensional arrays are useful for storing lists of data, while multi-dimensional arrays are useful for storing more complex data structures, such as tables or matrices.
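As a small illustration using nested Python lists (one common way to model a table, though not the only one), a one-dimensional structure needs one index and a two-dimensional one needs two. The names row and matrix are hypothetical.

```python
# One-dimensional: a single sequence of values.
row = [1, 2, 3]

# Two-dimensional: a list whose elements are themselves lists (rows).
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

print(row[2])         # one index selects an element
print(matrix[1][2])   # two indices: row 1, then column 2
```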

5. How do data structures like arrays impact the performance of a program?

Data structures like arrays can greatly impact the performance of a program. Choosing the right data structure can lead to faster execution times and more efficient use of memory. For example, reading an element by index from an array takes constant time, whereas reaching the same position in a linked list requires walking the list node by node.
