Does this python script match the math?

member 428835 · Aug 23, 2022

Hi PF!

I've posted a bunch lately and you have been SO helpful (seriously, thank you all). Can you corroborate if my python script below matches the math? I'd seriously appreciate it!

The math: $$\max_{x\in [0.7,1)}F(x) : F(x) := \sum_{t = 250}^{750} \log\left( P_{\nu=3} \left( \frac{y_t^2}{\sigma_t^2} \right) \right) : \sigma_t^2 = x\sigma_{t-1}^2 + (1-x)y_{t-1}^2,$$ $$ \sigma_{250}^2 = \sum_{t=1}^{250} y_t^2 / 250$$. Here ##P## is the student-t PDF with 3 degrees of freedom. The data ##y_t## are generated as

$$y_t = N(0,\hat\sigma_t) :\hat\sigma_t^2 = 0.95\hat\sigma_{t-1}^2 + (1-0.95)y_{t-1} $$

where ##y_0 = N(0,0.2)## where ##N(0,\hat\sigma)## is a normally random distributed variable with mean 0 and standard deviation ##\hat\sigma##. Do the maximizing values of this function correspond to the following python code (I minimize the negative of ##F##, which is the same as maximizing F)?

Python:

import numpy as np
from scipy.stats import t
import scipy.optimize as optimize
import matplotlib.pyplot as plt
from scipy.stats import t
# STEP 1: BUILD DATA
def build_data(N=750, lmda=0.95, sig_0=0.2):
    # PREALLOCATE AND INITIATE DATA
    sig_sq = np.zeros(N)
    sig_sq[0] = sig_0**2
    y = np.zeros(N)
    y[0] = np.random.normal(loc=0, scale=sig_0)
    # CONSTRUCT SUCCESSIVE DATA
    for i in range(1, N):
        sig_sq[i] = lmda*sig_sq[i-1] + (1-lmda)*y[i-1]**2
        y[i] = np.random.normal(loc=0, scale=np.sqrt(sig_sq[i]))
    plt.plot(y)
    plt.plot(sig_sq)
    plt.show()
    return y, sig_sq
# STEP 2: OPTIMIZE
def arg_max(M = 250):
    # GET DATA
    data = build_data(750, 0.95, 0.2)[0]
    y = data[M:]
    # OPTIMIZING FUNCTION
    def F(X):
        # BUILD pdf argument
        sig_sq_pre = np.sum((data[0:M]) ** 2) / M
        lmda_star = -np.log(t.pdf(y[0] ** 2 / sig_sq_pre, df=3))
        for i in range(len(y)):
            sig_sq = X*sig_sq_pre + (1-X)*(y[i-1])**2
            sig_sq_pre = sig_sq
            lmda_star += -np.log( t.pdf( y[i]**2/sig_sq, df=3 ))
        print(X)
        return lmda_star
    bnds = [(0.7, 0.999)]
    IC = np.array(0.75)
    result = optimize.minimize(F, method='TNC', bounds=bnds, x0=IC, options={'maxiter': 200})
    print(result)
    return result.x
if __name__ == '__main__':
    arg_max()

EDIT: question stem and python code accurate as of 8:18 PM EST (sorry, for some reason PF rarely shows me math mode and python mode until I've posted the problem, leading to tons of edits after. Maybe a Chrome issue, or ubuntu?

jedishrfu · Aug 23, 2022

@joshmccraney you should be checking that aspect. We could say it does but then you discover we were wrong.

I would place prints inside your loops to show variable values so you know the indexing and intermediate values are correct.

member 428835 · Aug 23, 2022

jedishrfu said:

@joshmccraney you should be checking that aspect. We could say it does but then you discover we were wrong.

I would place prints inside your loops to show variable values so you know the indexing and intermediate values are correct.

I've done this so many times it's unbelievable. I'm lead to believe it doesn't work (for alternative reasons I won't discuss), but I can't see where the issue is. I think I've implemented it correctly. I spent about 3 hours debugging, which didn't fix any bugs but lead to me writing it this way (which I think is the cleanest, aside from OOP, which I'm still learning)

berkeman · Aug 23, 2022

joshmccraney said:

I've done this so many times it's unbelievable. I'm lead to believe it doesn't work (for alternative reasons I won't discuss), but I can't see where the issue is. I think I've implemented it correctly. I spent about 3 hours debugging, which didn't fix any bugs but lead to me writing it this way (which I think is the cleanest, aside from OOP, which I'm still learning)

If that were happening to me, I would simplify the problem until I could get the results to agree, and then keep adding in more and more of the original complexity to see where it breaks.

This is an important learning lesson for you -- you will need to be able to figure these kinds of problems/puzzles out many times in your life, and PF will not always be able to help you (like when the problem is proprietary to your work... Been there and done that).

member 428835 · Aug 23, 2022

berkeman said:

If that were happening to me, I would simplify the problem until I could get the results to agree, and then keep adding in more and more of the original complexity to see where it breaks.

This is an important learning lesson for you -- you will need to be able to figure these kinds of problems/puzzles out many times in your life, and PF will not always be able to help you (like when the problem is proprietary to your work... Been there and done that).

This is very simplified actually (sad I know).

The problem is a simple one isn't it? Find the minimizing value (under constraint) for a function of a single variable. Sure, there are some noisy constants, but nothing too bad truthfully. I honestly think a sophomore in college could solve this as I have, or am I missing something?

berkeman · Aug 23, 2022

Have you implemented that equation in Excel? You can compare what you get for each summation step result in Excel with single-stepping through your Python code...

member 428835 · Aug 23, 2022

berkeman said:

Have you implemented that equation in Excel? You can compare what you get for each summation step result in Excel with single-stepping through your Python code...

I don't actually own excel (but am about to code it in Mathematica, and will try MATLAB too). I'll report back when I get results.

berkeman · Aug 23, 2022

joshmccraney said:

I don't actually own excel

Ah, now we find the core problem. You're a Windows hater are you?! Well, that'll teach you!

(J/K, I just live with the hatred boiling in my gut every day, but I still use their tools to get my work done...)

member 428835 · Aug 23, 2022

berkeman said:

Ah, now we find the core problem. You're a Windows hater are you?! Well, that'll teach you!

(J/K, I just live with the hatred boiling in my gut every day, but I still use their tools to get my work done...)

Hahahahahahahah jumbo shrimp, country music, microsoft works...oxymorons right there hahahaha I'm of course joking

jedishrfu · Aug 23, 2022

There are alternatives in the free open office suite.

https://www.openoffice.org/

Using the spreadsheet is a great way to check things because each cell could be an intermediate value.

Where I've been tripped up by python is in converting 2.7 code to 3.x code where the meaning of the / operator changes. In 2.7 division of two integers produces an integer and division of one or more floats produces a float.

I had code that took advantage of the integer feature to compute indices into a random file database. The 3.x code failed miserably and I had to do the step by step thing to isolate the problem. It was an absolute nightmare.

Here's more on the change from / in 2.7 to two separate operators // and / for 3.x code

https://www.tutorialspoint.com/division-operators-in-python

Filip Larsen · Aug 24, 2022

joshmccraney said:

Can you corroborate if my python script below matches the math

While a good code review by other insightful people can be very valuable, nothing beats the general approach of (unit) testing as this is a working proof that the code works as specified (in the tests). In your case I would recommend you look into unit testing in Python (e.g. https://docs.python.org/3/library/unittest.html) and then cover your code with as many test as you feel relevant of what you know must be true. Sometimes this means you have to break your code into smaller units in order to test it properly, but that is only a good thing.

Ibix · Aug 24, 2022

joshmccraney said:

for some reason PF rarely shows me math mode and python mode

The maths is a known issue - if there isn't already LaTeX on the page maths doesn't render until you refresh the page. There's a workaround for replies, which is to enter some LaTeX, copy your text to clipboard (just in case) then go to preview and refresh the page. It doesn't work for new threads, unfortunately.

Not displaying code blocks is not an issue I've seen, and it seemed to work when I created a dummy thread just now. That one may be a browser issue, don't know.

Incidentally, your definition of ##\sigma_t##:

joshmccraney said:

$$\sigma_t^2 = x\sigma_{t-1}^2 + (1-x)\sigma_{t-1}^2,$$

simplifies to ##\sigma_t^2=\sigma_{t-1}^2##.

Looking at your maths you seem to have a lot of divisions and logs and sums over five hundred terms. Accumulated rounding errors may be an issue, particularly if your terms are highly variable in magnitude or if some terms can be negative and only slightly different from others (e.g. if a=1e17, then a+1 is also 1e17 because the 1 is too small for the machine to represent, and even worse (a+1)-(a-1)=0, not 2). What precision does np.zeros give you by default? Does increasing or decreasing it significantly change your result?

Can you test a simpler thing? Can whatever external sanity checks are complaining be run for ##\sum_{t=250}^{252}## or something? If so does the sanity check fail then, or only when you increase the number of terms?

Can you re-order the maths to be better behaved? Note that ##\sum_i\log(a_i)=\log\left(\prod_ia_i\right)##. Is the product of terms better behaved than the sum of their logs?

pbuk · Aug 24, 2022

I think you are getting confused with what variables are vectors and what are (or should be) scalars (i.e. floats).

In particular your objective function F(X) is passed a np.array of the same size as the x0 parameter so what do you think (1 - X) is? Also F(X) should return a float but you are returning lambda_star which is set to the return value of np.log, a np.array.

member 428835 · Aug 24, 2022

Ibix said:

The maths is a known issue - if there isn't already LaTeX on the page maths doesn't render until you refresh the page. There's a workaround for replies, which is to enter some LaTeX, copy your text to clipboard (just in case) then go to preview and refresh the page. It doesn't work for new threads, unfortunately.

Not displaying code blocks is not an issue I've seen, and it seemed to work when I created a dummy thread just now. That one may be a browser issue, don't know.

Incidentally, your definition of ##\sigma_t##:simplifies to ##\sigma_t^2=\sigma_{t-1}^2##.

I thought I wrote: ##\sigma_t^2=y_{t-1}^2##?

Interesting observation about the product! It might make things better behaved, so I'll have to check this out!

Funny you said sanity checks: I actually got this thing working in Mathematica and was able to do some sanity checks for 250:251,252,...260 and really look at the plots. Indeed, I think I've programmed it correctly (same result in both Mathematica and python, and VERY different languages) and am using the wrong test to check whether it works or not. Your post was very insightful; thank you!

Vanadium 50 · Aug 24, 2022

joshmccraney said:

'm lead to believe it doesn't work

Boy if someone asked me to check code for them and I found out they didn't think it worked, I'd be pretty cheesed off. Only when you think it's right should you be asking people to check it. If you think it's wrong, you should ideally check it yourself, not leaving it for your helpers to figure out. I'd be really, really cheesed off.

I agree with the suggestion to try Excel.

I also think it;s good to build up the calculation a little at a time, printing out the intermediate steps.

member 428835 · Aug 24, 2022

Vanadium 50 said:

Boy if someone asked me to check code for them and I found out they didn't think it worked, I'd be pretty cheesed off. Only when you think it's right should you be asking people to check it. If you think it's wrong, you should ideally check it yourself, not leaving it for your helpers to figure out. I'd be really, really cheesed off.

I agree with the suggestion to try Excel.

I also think it;s good to build up the calculation a little at a time, printing out the intermediate steps.

Turns out everything I had was correct. The issue is someone in a position of authority thought it should tend to a particular value, hence me thinking it was wrong. By my account and looking at the math alone, I indeed thought it was correct.

Does this python script match the math?

Is A.I. more than the sum of its parts?

AI vs. Humans as Processors in an Environment

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Does this python script match the math?

Similar threads