Statistics proof regarding integration of cdf

In summary: The thread shows that, for any cdf F of a continuous random variable, the integral of F(x+b) - F(x+a) over the whole real line equals b - a. Two routes are discussed: treating the improper integral as a limit of integrals over finite intervals, and writing F(x+c) - F(x) as an integral of the density and exchanging the order of integration using Fubini's theorem.
  • #1
phiiota

Homework Statement


For any cdf F(x) of a continuous random variable, show that

[tex]\int_{-\infty}^{\infty}[F(x+b)-F(x+a)]dx=b-a[/tex]

for any a<b.


Homework Equations





The Attempt at a Solution


Not really sure where to begin. I figure I can split the integrals and do u-substitutions, and (after some magic I don't understand) I'll end up with something along the lines of R+b-(R+a)=b-a, but I have no idea what to do with these cdfs (I mean, I have no idea what R would be, or even if that's right at all). I've looked all over the internet, and the only thing I could find about integrating a cdf was that [itex]E(X) = \int_0^\infty (1-F(x))\,dx[/itex] when X is non-negative, but I don't seem to have that situation here.

Anyway, a push in the right direction would be most appreciated.
 
  • #2

It is probably a bit easier to re-write the problem: let y = x+a. Then you need to show that
[tex] \int_{-\infty}^{\infty} [F(y+c) - F(y)] \, dy = c,[/tex]
where c = b-a. It is important to remember that the improper integral is defined as
[tex] \int_{-\infty}^{\infty} [F(y+c) - F(y)] \, dy
= \lim_{M,N \rightarrow \infty} \int_{-M}^{N} [F(y+c)-F(y)] \, dy.[/tex]

RGV
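The limit definition quoted above can be explored numerically before attempting the proof. The following is only a sketch: the two cdfs (standard normal and Exponential(1)), the shift c = 1.5, and the helper name `truncated_integral` are illustrative assumptions and do not come from the thread. It approximates the integral over [-M, N] with a midpoint rule and prints how the value behaves as the limits grow.

```python
import math

def normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exponential_cdf(x):
    """Exponential(1) cdf; zero for non-positive x."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def truncated_integral(cdf, c, M, N, steps=100_000):
    """Midpoint-rule approximation of the integral of
    cdf(y + c) - cdf(y) over the finite interval [-M, N]."""
    h = (N + M) / steps
    total = 0.0
    for i in range(steps):
        y = -M + (i + 0.5) * h
        total += cdf(y + c) - cdf(y)
    return total * h

c = 1.5  # plays the role of c = b - a (illustrative value)
for name, cdf in [("normal", normal_cdf), ("exponential", exponential_cdf)]:
    for limit in (1, 5, 20, 40):
        approx = truncated_integral(cdf, c, M=limit, N=limit)
        print(f"{name:12s} M=N={limit:3d}  integral ~ {approx:.6f}")
```

For both sample cdfs the printed values should approach c as M and N grow, which is consistent with the claim but is of course not a proof.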
 
  • #3
Thank you for your reply. I'm still a bit lost. Is that integral a special form I should recognize? I still don't understand how I can say anything about what it equals without knowing what F(x) is. All I know is that it's non-negative and bounded between 0 and 1.

It took me a little while to exactly figure out what this looks like, and I've done a few graphs to show what the curve of F(y+c)-F(y) looks like for a sample CDF, but, like I said, I'm still lost in the general case.
 
  • #4

Look at WHY we need to use the limiting definition of the integral. If we did not do that we would not be allowed to write
[tex]\int_{-\infty}^{\infty}[F(x+c) - F(x)] dx[/tex]
as
[tex] \int_{-\infty}^{\infty} F(x+c)\,dx - \int_{-\infty}^{\infty} F(x) \, dx,[/tex]
because that would be a difference of two divergent integrals and so would have the indeterminate form ∞ - ∞. That does not occur when we start with the finite M, N form.

RGV
 
  • #5

You know something about F(x):

$$F(x) = \int_{-\infty}^x dt f(t),$$

where f(t) is the probability density function.

Now, my recommendation would be to forget about the overall integral over x for a moment, and instead focus on

$$F(x+c) - F(x).$$

Given that you can express each of these terms as an integral over the probability density, can you see how you might be able to combine these two terms into a single term? (This will also have the benefit that you will not have to treat the overall x integral as having finite integration limits which are taken to infinity at the end of the calculation, as Ray suggests).

See how far you can get with this hint, and if you can't get it we'll give you another push in the right direction.
 
  • #6
Mute said:
You know something about F(x):

$$F(x) = \int_{-\infty}^x dt f(t),$$

where f(t) is the probability density function.


Your suggestion is a good one, but the result is true in general, even if F does not have a density: it just has to have the standard properties of a cdf on ℝ. However, a density gives the OP some place to start.

RGV
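The remark that no density is needed can be checked on a cdf that has none. The sketch below uses a hypothetical two-point distribution (P(X=0) = 0.3, P(X=2) = 0.7), chosen purely for illustration, whose cdf is a step function. The helper name and the integration range are likewise assumptions.

```python
def two_point_cdf(x):
    """cdf of a hypothetical discrete variable with P(X=0)=0.3, P(X=2)=0.7."""
    if x < 0:
        return 0.0
    if x < 2:
        return 0.3
    return 1.0

def shifted_difference_integral(cdf, c, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of
    cdf(x + c) - cdf(x) over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += cdf(x + c) - cdf(x)
    return total * h

c = 1.5
# For this cdf the integrand is zero outside [-c, 2], so [-10, 10] captures all of it.
print(shifted_difference_integral(two_point_cdf, c, -10.0, 10.0))
```

The printed value should be close to c even though this F is not continuous, in line with the statement that only the standard cdf properties are used.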
 
  • #7
So I get that I can rewrite F(x+c)-F(x) as [itex]\int_x^{x+c} f(t)\,dt[/itex]...

Is it all right for me to change the order of integration? Because then I would have
[tex]\intop_{-\infty}^{\infty}\int_{y}^{y+c}f(t)dtdx=\int_{y}^{y+c}\int_{-\infty}^{\infty}f(t)dtdx=\int_{y}^{y+c}1\cdot dx=(y+c)-y=c[/tex]

But I'm not sure if I'm justified in doing this.
 
  • #8

Fubini's theorem tells you the conditions under which it is valid to exchange the order of integrals.

Note that, as Ray says, you can do the calculation without using the density function and instead manipulate the integrals using finite integration limits that you take to infinity at the end of the calculation.
 
  • #9
I'm guessing off your last reply that I'm not in fact allowed to do what I did, and I think I see the reason why: I switched the limits, but not the variables; so I first should have integrated f(t)dx as xf(t), which, evaluated from the proper limits should go to infinity minus negative infinity times f(t), which is nonsense. Is this correct?

However, going over what Ray wrote, I'm starting to see what he was getting at. There's a point where F(x+c)=F(x) (or at least converging, which is what I'm guessing he meant when he talked about taking the limits), two points actually, so I'm bound not only with F(x) being between 0 and 1, but with x being between some two finite points. So I can more or less restrict myself to considering those points.

There's a proof I haven't done yet, which says that F(X) has a uniform(0,1) distribution, so when I integrate that, I get just x. When I take the difference then, between the curves F(x) and F(x+c), I get a parallelogram with base c and height 1, thus giving me the area I'm looking for. I'm still a bit unclear on how to get this exactly, or if this is all right (I haven't proven the thing I'm quoting yet), but this problem is starting to make more sense, now that I can visualize it more.
 
  • #10
phiiota said:
However, going over what Ray wrote, I'm starting to see what he was getting at. There's a point where F(x+c)=F(x) (or at least converging, which is what I'm guessing he meant when he talked about taking the limits),
Take a good look at what Ray wrote in reply #4; it's a really good way to do the proof. You can make a substitution like [itex]u = x+c[/itex] for the first integral, but it's important to also write the integral bounds using limits, e.g. [itex]\lim_{T \rightarrow \infty} \int_{-T}^{T} \ldots dx[/itex], so that you can adjust the bounds on the integral to reflect the new variable (e.g. [itex]\int_{-T+c}^{T+c} \ldots du[/itex]).
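
Written out, the substitution described above might look like the following sketch. It uses the symmetric limits -T and T from this post (the version with separate limits M and N works the same way) and assumes only that F is non-decreasing with F(u) → 1 as u → ∞ and F(u) → 0 as u → -∞.

```latex
\begin{align*}
\int_{-T}^{T} [F(x+c) - F(x)]\,dx
  &= \int_{-T+c}^{T+c} F(u)\,du - \int_{-T}^{T} F(x)\,dx \\
  &= \int_{T}^{T+c} F(u)\,du - \int_{-T}^{-T+c} F(u)\,du .
\end{align*}
% Because F is non-decreasing and bounded by 1 (with c > 0):
%   c F(T) \le \int_{T}^{T+c} F(u)\,du \le c,          so this term tends to c;
%   0 \le \int_{-T}^{-T+c} F(u)\,du \le c F(-T+c),      so this term tends to 0.
```

Both integrals on the last line are over intervals of fixed length c, so neither diverges, and taking T → ∞ leaves c.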
 
  • #11
phiiota said:
I'm guessing off your last reply that I'm not in fact allowed to do what I did, and I think I see the reason why: I switched the limits, but not the variables; so I first should have integrated f(t)dx as xf(t), which, evaluated from the proper limits should go to infinity minus negative infinity times f(t), which is nonsense. Is this correct?

No, you are just switching the order of the integrals, ##\int_{-\infty}^{\infty}dt \int_{y}^{y+c} dx \rightarrow \int_{y}^{y+c} dx \int_{-\infty}^{\infty}dt##. Your calculation was correct; the link to Fubini's theorem was to give you a way to show that switching the order of integration was valid in this case. (There are cases for which it is not).


See uart's suggestion for help solving the problem following Ray's suggestion.
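
For the density route, one way to write the exchange of order while keeping track of the limits is sketched below. It assumes F has a density f and c > 0. The exchange is justified because the integrand is non-negative (the Tonelli form of Fubini's theorem). The key observation is that the region x ≤ t ≤ x+c is the same set of points as t-c ≤ x ≤ t.

```latex
\begin{align*}
\int_{-\infty}^{\infty} [F(x+c) - F(x)]\,dx
  &= \int_{-\infty}^{\infty} \int_{x}^{x+c} f(t)\,dt\,dx \\
  &= \int_{-\infty}^{\infty} f(t) \left( \int_{t-c}^{t} dx \right) dt \\
  &= c \int_{-\infty}^{\infty} f(t)\,dt = c .
\end{align*}
```

The inner x-integral has length c for every t, which is where the factor c comes from; the remaining integral of the density is 1.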
 

1. What is the purpose of integrating the cumulative distribution function (CDF)?

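Integrating the CDF (or its complement) yields expectations and related quantities rather than probabilities. For example, for a non-negative random variable the mean is E(X) = ∫ (1 - F(x)) dx taken over x from 0 to ∞, and the thread above shows that the integral of F(x+b) - F(x+a) over the real line equals b - a. Probabilities of ranges come from differences of CDF values, not from integrating the CDF.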
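
The tail formula for the mean mentioned in the first post is one concrete case where a cdf is integrated. The sketch below checks it numerically for an Exponential distribution. The rate of 2, the truncation point of 50, and the function names are assumptions made only for this illustration.

```python
import math

def exponential_cdf(x, rate):
    """cdf of an Exponential(rate) variable, which is non-negative."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def mean_from_cdf(cdf, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of 1 - cdf(x) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        total += 1.0 - cdf((i + 0.5) * h)
    return total * h

rate = 2.0
estimate = mean_from_cdf(lambda x: exponential_cdf(x, rate), upper=50.0)
# The exact mean of an Exponential(rate) variable is 1 / rate.
print(estimate, 1.0 / rate)
```

The two printed numbers should agree closely, since the tail beyond the truncation point is negligible for this distribution.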

2. How is the CDF related to the probability density function (PDF)?

The CDF is the integral of the PDF: for a continuous random variable, F(x) is the integral of f(t) over t from -∞ to x, so F(x) is the probability that the variable takes a value less than or equal to x. Conversely, the PDF is the derivative of the CDF wherever that derivative exists.

3. Can the CDF be used to calculate the probability of a specific value?

For a continuous random variable, the probability of any single specific value is zero. What the CDF gives directly is the probability of a range: P(a < X ≤ b) = F(b) - F(a), the CDF value at the upper bound minus the CDF value at the lower bound.

4. How is the CDF used in hypothesis testing?

The CDF of the test statistic under the null hypothesis is used to calculate the p-value, which is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. For an upper-tailed test with observed statistic t, for example, the p-value is 1 - F(t). The p-value is not the probability that the null hypothesis is true.

5. Are there any limitations to using the CDF in statistics?

The CDF is defined for every real-valued random variable, but using it in practice has limitations. The true distribution of the data is often unknown, so the CDF has to be assumed or estimated, for example with the empirical CDF. Many common distributions, such as the normal, also have no closed-form CDF, so its values must be computed numerically. It is important to keep these assumptions in mind when using the CDF in statistical analysis.
