# Elementary Python Questions: Data Frames, k-nary functions.

Gold Member
2019 Award

## Main Question or Discussion Point

Hi All,

1) Say `df` is a dataframe in Python pandas, and I select a specific column from it:
`Y = df[column].values`
What kind of data structure is `Y`?

2) I want to find the sum of two numbers:

```python
def Sum(a=0, b=0):
    return a + b
```

If I want to find a sum over some data structure (say, a list), how can I define the sum, i.e., how do I extend it from a binary operation? Should I use recursion and/or a `for` loop?

Thanks.

jedishrfu
Mentor
Here's a tutorial on pandas. It shows ints, floats and strings as column values in a dataframe.

https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm

Tuples and other Python data types can be used as well. The best way to find out, though, is to try it yourself.

The key point is that a column must use the same data type throughout all its rows, although that may not be strictly true either.

It's similar to a SQL table schema.

Instead of `.values`, the pandas documentation suggests using `.to_numpy()`.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html

So you could extract a column into a NumPy array and use NumPy functions to sum it or compute statistics.
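To answer question 1 directly: for a column with an ordinary numeric dtype, both `.values` and `.to_numpy()` return a NumPy `ndarray`. For extension dtypes such as categorical, `.values` can return a pandas array type instead. The following is a minimal sketch that uses a small made-up dataframe with hypothetical columns `x` and `y`:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names and values are made up.
df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})

y_values = df["y"].values      # older attribute
y_array = df["y"].to_numpy()   # the replacement pandas recommends

print(type(y_values))   # <class 'numpy.ndarray'>
print(y_array.dtype)    # float64
print(df.dtypes)        # one dtype per column, like a table schema

# NumPy methods then give sums and summary statistics directly.
print(y_array.sum())    # 4.5
print(y_array.mean())   # 1.5
```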

PeterDonis
Mentor
2019 Award
> If I want to find a sum over some data structure (say, a list), how can I define sum
You don't have to: the Python built-in `sum` function takes any iterable as an argument.
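For example, the built-in accepts a list, the values of a dictionary, or a generator, along with an optional start value. The dictionary `prices` below is hypothetical example data:

```python
numbers = [1, 2, 3, 4]
prices = {"apple": 0.5, "pear": 0.25, "plum": 1.25}  # hypothetical data

print(sum(numbers))                  # 10
print(sum(prices.values()))          # 2.0
print(sum(n * n for n in numbers))   # 30 (sum of squares via a generator)
print(sum(numbers, 100))             # 110 (start value added to the total)
```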

Ibix
> You don't have to: the Python built-in `sum` function takes any iterable as an argument.
In this context, I suspect @WWGD wants a function that returns a list whose ith element is `a[i] + b[i]`, which isn't what the built-in `sum` does with a list-type object. The built-in `sum` returns the sum of all the elements in one list, and fails when given multiple lists.

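Assuming I'm understanding the problem correctly, I think you want something like:

```python
def sum(a, b):
    # Note: this shadows Python's built-in sum wherever it is defined.
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        # Add element by element, recursing to handle lists of lists.
        return [sum(ai, bi) for (ai, bi) in zip(a, b)]
    else:
        return a + b
```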
What that does is check whether `a` and `b` are lists or tuples. If they are, it adds them element by element, calling itself so that it can handle lists of lists. Otherwise it just tries adding them.

I don't know if that's the most efficient way to do things. You would also want to add sanity checking. For example, what behaviour would you want if the lists are different lengths? (`zip` silently stops at the end of the shorter one.) You may also want to replace the `isinstance` calls with something better suited to your application, and you may not want the recursive behaviour with lists of lists.
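One way to add the length check mentioned above is to raise an error on mismatched lengths instead of letting `zip` truncate to the shorter input. This is a sketch that assumes raising `ValueError` is the behaviour you want. The name `elementwise_sum` is hypothetical; it also avoids shadowing the built-in `sum`.

```python
def elementwise_sum(a, b):
    """Add a and b element by element, recursing into nested lists/tuples."""
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        if len(a) != len(b):
            # zip() would otherwise silently drop the extra elements.
            raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
        return [elementwise_sum(ai, bi) for ai, bi in zip(a, b)]
    # Scalars (or anything else) fall through to ordinary addition.
    return a + b

print(elementwise_sum([1, 2, 3], [10, 20, 30]))       # [11, 22, 33]
print(elementwise_sum([[1, 2], [3]], [[4, 5], [6]]))  # [[5, 7], [9]]
```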

WWGD
Gold Member
2019 Award
Thank you. I was actually trying to define a k-ary function ##(x_1, x_2, \ldots, x_k) \mapsto x_1 + x_2 + \cdots + x_k##.

This is trivial for 2 terms:

```python
def sum(x_1, x_2):
    return x_1 + x_2
```

But I was hoping to define a sum over, say, a list, or the values in a dictionary, etc., and could not think of a way to define it. I thought I would need a `for` loop somewhere, but beyond that it wasn't clear.
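The question of extending a binary operation can be answered in both of the ways suggested. This sketch assumes a hypothetical binary `add(a, b)`. A `for` loop folds it over any iterable, a recursive version peels off one element at a time, and `functools.reduce` from the standard library does the same fold in one call.

```python
from functools import reduce

def add(a, b):
    """The binary operation we want to extend (hypothetical name)."""
    return a + b

def sum_loop(xs):
    """Fold add over any iterable with a for loop, starting from 0."""
    total = 0
    for x in xs:
        total = add(total, x)
    return total

def sum_rec(xs):
    """Recursive version: sum(x1..xk) = x1 + sum(x2..xk); needs a sequence."""
    if len(xs) == 0:
        return 0
    return add(xs[0], sum_rec(xs[1:]))

values = [1, 2, 3, 4]
d = {"a": 1.5, "b": 2.5}  # hypothetical dictionary

print(sum_loop(values))         # 10
print(sum_loop(d.values()))     # 4.0
print(sum_rec(values))          # 10
print(reduce(add, values, 0))   # 10
```

The recursive version runs into Python's default recursion limit (1000) on long lists, so in practice the loop or the built-in `sum` is the better choice.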

I tried using it to define variance, but I got an error message saying it could not iterate over floats.
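The exact code isn't shown, so this is only a guess at the cause. An error like `'float' object is not iterable` usually means a single float was passed where `sum` expects an iterable, for example `sum(3.0)` or `sum(x - m)` instead of a sum over a generator. Here is a sketch of population and sample variance built on the built-in `sum`, checked against the standard library's `statistics` module. The name `my_variance` is hypothetical.

```python
import statistics

def my_variance(xs, sample=False):
    """Variance of a list of numbers using the built-in sum.

    sample=False divides by n (population variance);
    sample=True divides by n - 1 (sample variance).
    """
    n = len(xs)
    mean = sum(xs) / n
    # Sum over a generator of squared deviations, not over a single float.
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1 if sample else n)

data = [1.0, 2.0, 3.0, 4.0]
print(my_variance(data))               # 1.25
print(statistics.pvariance(data))      # 1.25
print(my_variance(data, sample=True))  # about 1.667
print(statistics.variance(data))       # about 1.667
```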

Ibix
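In that case, Peter's answer about the built-in `sum` function is correct. If `a` is a list, `sum(a)` returns the sum of the elements of the list. To add n separate variables, collect them into a list or tuple first: `sum([x1, x2, ..., xn])`. Calling `sum(x1, x2, ..., xn)` directly does not work, because `sum` takes one iterable plus an optional start value. It raises a `TypeError` instead; for two floats, the message is `'float' object is not iterable`.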

The general pattern for defining a function without a prespecified argument list is:

```python
def f(*args, **kwargs):
    print(args)
    print(kwargs)
```
Calling `f` with arbitrary arguments puts the unnamed (positional) values into a tuple called `args` and the named ones into a dict called `kwargs`. `f(1, 2, 3, a=4, b=5)` would make `args = (1, 2, 3)` and `kwargs` a two-element dictionary with keys `"a"` and `"b"` and corresponding values 4 and 5.

You don't have to specify both `*args` and `**kwargs` if you only need one. As far as I'm aware, the names `args` and `kwargs` are merely conventional and the asterisks are what matter, but I don't think I've ever seen anyone use names other than `args` and `kwargs`.
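Combining `*args` with the built-in `sum` gives the k-ary function asked about earlier in the thread. This is a minimal sketch; the names `add_all` and `show` are hypothetical.

```python
def add_all(*args):
    """k-ary sum: add_all(x1, ..., xk) returns x1 + ... + xk (0 when k == 0)."""
    # args arrives as a tuple, which the built-in sum can iterate over.
    return sum(args)

def show(*args, **kwargs):
    print(type(args), args)      # positional arguments are collected in a tuple
    print(type(kwargs), kwargs)  # keyword arguments are collected in a dict

values = [0.5, 1.5, 2.0]
print(add_all(1, 2, 3, 4))   # 10
print(add_all())             # 0
print(add_all(*values))      # 4.0 (unpack a list into separate arguments)
show(1, 2, 3, a=4, b=5)
```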

Ibix
`def f(a, b, *args, **kwargs):` is perfectly acceptable usage. `f(1, 2, 3)` will put 1 and 2 in `a` and `b`, and `args` will be `(3,)`.
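Mixing required and variadic parameters is also a way to make a k-ary function refuse to be called with no arguments. Here is a sketch with the hypothetical name `add_at_least_one`, alongside the `f` signature from the post.

```python
def add_at_least_one(first, *rest):
    """Sum one or more arguments; calling with none raises TypeError."""
    total = first
    for x in rest:          # rest is a (possibly empty) tuple
        total = total + x
    return total

def f(a, b, *args, **kwargs):
    return a, b, args, kwargs

print(f(1, 2, 3))                  # (1, 2, (3,), {})
print(add_at_least_one(5))         # 5
print(add_at_least_one(1, 2, 3))   # 6
print(add_at_least_one("a", "b"))  # ab
```

Because it starts from the first argument rather than from 0, this version also works for strings. The built-in `sum` refuses to add strings and points you to `''.join` instead.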