Elementary Python Questions: Data Frames, k-nary functions

WWGD · Jun 26, 2020

Hi All,
A couple of questions, please:

1) Say df is a dataframe in Python Pandas, and I select a specific column from df:
Y=df[column].values.
What kind of data structure is Y?

2)
I want to find the sum of two numbers:
def Sum(a=0, b=0):
    return a + b

If I want to find a sum over some data structure (say a list), how can
I define sum, i.e., how do I extend it from a binary operation? Should I use
recursion and/or some 'for' clause?

Thanks.

jedishrfu · Jun 26, 2020

Here’s a tutorial on pandas; it shows ints, floats and strings for columns in a dataframe.

https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm

Tuples can be used, as well as other data types in Python. The best way to find out, though, is to try it yourself.

The key point is that a column must use the same data type through all its rows, although that may not be strictly true either (a column of dtype object can hold mixed Python objects).

It's similar to a SQL table schema.

Instead of .values, the pandas documentation suggests using .to_numpy().

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html

So you could extract a column into a numpy array (which is what Y is in your example) and use numpy functions to sum it or get stats.
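
To make the answer to question 1 concrete, here is a minimal sketch. The frame and its column names are made up for illustration, and it assumes an ordinary numeric column; columns with special pandas dtypes may give a different array type.

```python
import pandas as pd

# Hypothetical example frame; the column names are invented for this sketch.
df = pd.DataFrame({"height": [1.5, 1.7, 1.6], "name": ["a", "b", "c"]})

Y = df["height"].values       # older spelling
Z = df["height"].to_numpy()   # spelling recommended by the pandas docs

print(type(Y))                # a numpy.ndarray for a plain numeric column
print(Y.dtype)                # float64
print(Y.sum(), Y.mean())      # numpy methods work directly on the array
```

Note that df[column] on its own (without .values) is a pandas Series, not a numpy array.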

PeterDonis · Jun 26, 2020

WWGD said:

If I want to find a sum over some data structure (say a list), how can
I define sum

You don't have to, the Python built-in sum function takes any iterable as an argument.
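
As a quick illustration (the variable names and data are made up), the builtin works on lists, tuples, the values of a dictionary, and generator expressions alike:

```python
numbers = [1.5, 2.5, 3.0]
prices = {"apple": 2, "pear": 3, "plum": 5}

total_list = sum(numbers)                     # sum over a list
total_tuple = sum((1, 2, 3))                  # sum over a tuple
total_values = sum(prices.values())           # sum over the values of a dict
total_squares = sum(x * x for x in numbers)   # sum over a generator expression

print(total_list, total_tuple, total_values, total_squares)
```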

Ibix · Jun 27, 2020

PeterDonis said:

You don't have to, the Python built-in sum function takes any iterable as an argument.

In this context, I suspect @WWGD wants a function that returns a list whose ith element is a[i]+b[i], which isn't what the builtin sum function does with a list-type object. Builtin sum returns the sum of all elements in one list, and fails with multiple lists.

Assuming I'm understanding the problem correctly, I think you want something like

Python:

def sum(a, b):
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return [sum(ai, bi) for (ai, bi) in zip(a, b)]
    else:
        return a + b

What that does is check if a and b are lists or tuples. If they are it adds them element by element, calling itself so it can handle lists-of-lists. Otherwise it just tries adding them.

I don't know if that's the most efficient way to do things. You would also want to add sense checking - for example, what would you want the behaviour to be if the lists are different lengths? And you may want to replace the isinstance calls with something better suited to your application. And you may not want the recursive behaviour with lists of lists.
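
As one possible answer to the different-lengths question, here is a sketch that raises an error instead of letting zip silently stop at the shorter list. The name elementwise_sum is my own choice (it also avoids shadowing the builtin sum), and raising ValueError is just one reasonable policy.

```python
def elementwise_sum(a, b):
    """Add two lists/tuples element by element, recursing into nested ones."""
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        if len(a) != len(b):
            # zip would silently truncate to the shorter input, so fail loudly.
            raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
        return [elementwise_sum(ai, bi) for ai, bi in zip(a, b)]
    return a + b


print(elementwise_sum([1, 2, 3], [10, 20, 30]))
print(elementwise_sum([[1, 2], [3]], [[10, 20], [30]]))
```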

WWGD · Jun 27, 2020

Thank you. I was actually trying to define a k-nary function ##(x_1, x_2, \dots, x_n) \rightarrow x_1 + x_2 + \dots + x_n##.

This is trivial for 2 terms:
def sum(x_1, x_2):
    return x_1 + x_2

But I was hoping to define a sum over, say, a list, or maybe the values in a dictionary, etc and could not think of a way of defining it. Thought I would need a 'for' clause somewhere but not clear otherwise.

I tried using it to define variance but I got an error message on not being able to iterate on floats.

Ibix · Jun 27, 2020

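In that case, Peter's answer is correct about the builtin sum function. If a is a list, sum(a) will return the sum of the elements of the list. Note that the builtin takes a single iterable (plus an optional start value), so sum(x1, x2, ..., xn) with separate arguments raises a TypeError; wrap the variables in a list or tuple instead, as in sum([x1, x2, ..., xn]).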

The general pattern for defining a function without a prespecified argument list is

Python:

def f(*args, **kwargs):
    print(args)
    print(kwargs)

Calling f with arbitrary arguments will put the values into a tuple called args or a dict called kwargs, depending on whether you named the arguments or not. f(1, 2, 3, a=4, b=5) would make args = (1, 2, 3) and kwargs a two-element dictionary with keys "a" and "b" and corresponding values 4 and 5.

You don't have to specify both of *args and **kwargs if you only need one. As far as I'm aware the names args and kwargs are merely conventional and the asterisks are the important things, but I don't think I've ever seen anyone use anything other than args and kwargs.
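
Applied to the original question, a sum over any number of arguments could be sketched like this. The name total is my own, chosen so as not to shadow the builtin sum, and the explicit for loop is only there to show the mechanics.

```python
def total(*args):
    """Return the sum of however many positional arguments are passed."""
    result = 0
    for x in args:   # args is a tuple holding the positional arguments
        result += x
    return result


print(total())            # no arguments at all
print(total(1, 2))
print(total(1, 2, 3, 4.5))

values = [1, 2, 3]
print(total(*values))     # the asterisk unpacks a list into separate arguments
```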

Ibix · Jun 27, 2020

Oh, and

Python:

def f(a, b, *args, **kwargs):
    print(a, b, args, kwargs)

is perfectly acceptable usage - f(1, 2, 3) will put 1 and 2 in a and b, and args will be (3,).

pasmith · Jul 16, 2020

WWGD said:

I tried using it to define variance

There are library functions for that.

For values in a pandas DataFrame, there's DataFrame.var.

Python:

import pandas

df = pandas.DataFrame(...)

# Column-wise variance (axis=0 is the default)
df.var()
df.var(axis=0)

# Row-wise variance
df.var(axis=1)

# Variance of a single column
df.loc[:, column].var()

For general iterables, there's numpy's var

Python:

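import numpy

# numpy.var defaults to ddof=0 (population); pandas' .var() defaults to ddof=1 (sample).
numpy.var(a_list)

# Should also work on a pandas.Series object:
numpy.var(df[column])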

WWGD · Jul 16, 2020

Thanks. I was trying to practice by defining it on my own and was getting an error about iterating over floats when defining it as sum[(x_i-xbar)(x_i-xbar) for x_i in list], where xbar is the mean.
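
For practice purposes, a hand-rolled version might look like the sketch below. It is only a guess at what went wrong, since the actual code isn't shown, but that kind of error usually means sum was handed a single float rather than a list; the product also has to be written with an explicit *. The function names and the sample data are made up, and this computes the population variance (dividing by n).

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    xbar = mean(values)
    # sum() receives the whole generator expression, and the
    # multiplication between the two brackets is written out explicitly.
    return sum((x - xbar) * (x - xbar) for x in values) / len(values)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data))
```

Dividing by len(values) - 1 instead would give the sample variance.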
