Range of Difference: Bounds for Length of Stay

WWGD · Jul 24, 2022

OK, so I'm given hotel data {Arrival Date, Departure Date}, each expressed as the nth day of the year, and I want to estimate whether the range/difference, i.e. the length of stay, is below some bound. Say a week (7 days) for definiteness.

I'm thinking of working with the auxiliary variable Difference in Dates, D := Departure Date - Arrival Date: compute its order statistics and use the distribution of the range ##D_{Max}- D_{Min}##, or maybe just the distribution of the max.
Is this a good way?
The thing is, I don't know the distribution of either Arrival Date or Departure Date, so I don't see how to compute the distribution of any of these three (Max Departure, Min Arrival, Max Departure - Min Arrival) in order to compute the order statistics.
Maybe @StephenTashi can comment?

Stephen Tashi · Jul 25, 2022

WWGD said:

and use the distribution of the range ##D_{Max}- D_{Min}##

I don't understand the data. Is there a possibly different ##D_{max}## for each person who stayed at the hotel? Do most people in the data have more than one stay?

pbuk · Jul 25, 2022

Stephen Tashi said:

I don't understand the data.

Me either. If you are interested in the length of stay then surely it is trivial to compute it for each stay and do stats on that directly? Are you interested in some mathematical equivalence or actually computing values, and if the latter how is the data stored and what language are you using to analyse it?

WWGD · Jul 25, 2022

Stephen Tashi said:

I don't understand the data. Is there a possibly different ##D_{max}## for each person who stayed at the hotel? Do most people in the data have more than one stay?

Thanks for your reply. Apologies for the confusion.

pbuk said:

Me either. If you are interested in the length of stay then surely it is trivial to compute it for each stay and do stats on that directly? Are you interested in some mathematical equivalence or actually computing values, and if the latter how is the data stored and what language are you using to analyse it?

Yes, this is what I meant to do, but I was on my phone, which was dying on me. I wanted to define order statistics on the length of stay = Departure Date - Arrival Date. Each date described as nth day of the year.

WWGD · Jul 25, 2022

Length of stay is a discrete random variable, taking values in the Natural Numbers and ##\{0\}##. As such, we can compute its deciles, including median, etc., unless I'm missing something. Or maybe there's some other statistic to evaluate claims about its range.

pbuk · Jul 26, 2022

WWGD said:

Length of stay is a discrete random variable,

I would rather say length of stay can be modeled by a discrete random variable.

WWGD said:

taking values in the Natural Numbers and ##\{0\}##.

Unless this is the kind of hotel that rents rooms by the hour, I think the range is strictly positive.

WWGD said:

As such, we can compute its deciles, including median, etc., unless I'm missing something.

Yes, of course; I'm still not seeing where the difficulty lies.

WWGD said:

Each date described as nth day of the year.

That will cause problems over year ends; I would be inclined to convert to some other format, such as a POSIX timestamp, so you have ## \text{days} = \left\lfloor \frac{\text{end} - \text{start}}{86{,}400} \right\rfloor ##.
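As a rough illustration of the year-end problem, here is a minimal Python sketch. It assumes the year of each date is known, which the thread does not state; `from_day_of_year` and `stay_nights` are hypothetical helper names. It works with calendar dates rather than raw timestamps, which gives the same whole-day count and sidesteps days that are not exactly 86,400 seconds long.

```python
from datetime import date, timedelta

def from_day_of_year(year: int, day_of_year: int) -> date:
    """Convert (year, nth day of the year) to a calendar date; day 1 is 1 January."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

def stay_nights(arrive: date, depart: date) -> int:
    """Whole days between arrival and departure."""
    return (depart - arrive).days

# Assumed example: arrive on day 363 of 2021, depart on day 3 of 2022.
arrive = from_day_of_year(2021, 363)   # 29 December 2021
depart = from_day_of_year(2022, 3)     # 3 January 2022

print(3 - 363)                      # naive day-of-year difference: -360
print(stay_nights(arrive, depart))  # 5
```

The naive subtraction of day numbers goes negative across the year boundary, while the date-based difference does not.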

WWGD · Jul 26, 2022

pbuk said:

I would rather say length of stay can be modeled by a discrete random variable. Unless this is the kind of hotel that rents rooms by the hour I think the range is strictly positive. Yes of course, I'm still not seeing where the difficulty lies? That will cause problems over year ends; I would be inclined to convert to some other format such as posix timestamp so you have ## \text{days} = \left\lfloor \frac{\text{end} - \text{start}}{86{,}400} \right\rfloor ##.

But in order to compute the order statistics, I need to know the distribution of the length of stay. How do I do that? Maybe a bootstrap? Sorry if my question is too simple; I'm not too familiar with this topic.
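For readers unfamiliar with the idea, a percentile bootstrap could be sketched roughly as below. This is only an illustration under assumptions: the `stays` values are made up, `bootstrap_ci` is a hypothetical helper, and the percentile method is one of several bootstrap variants.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Made-up lengths of stay, in nights.
stays = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 7, 10, 14]

print(bootstrap_ci(stays, stat=statistics.mean))
print(bootstrap_ci(stays, stat=statistics.median))
```

The bootstrap treats the observed sample as a stand-in for the population, so it does not need a named distribution, but it cannot say anything about stays longer than the longest one observed.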

pbuk · Jul 26, 2022

WWGD said:

I need to know the distribution of the length of stay. How do I do that?

By constructing the sample ## \{ l_i \} = \{ depart_i - arrive_i \} ## and looking at its empirical distribution.
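A minimal sketch of that construction, assuming arrival and departure are already comparable day numbers (the values below are made up, and `empirical_pmf` and `fraction_at_most` are hypothetical names). It also shows the empirical fraction of stays at or below the 7-day bound from the opening post.

```python
from collections import Counter

def empirical_pmf(arrivals, departures):
    """Relative frequency of each observed length of stay l_i = depart_i - arrive_i."""
    stays = [d - a for a, d in zip(arrivals, departures)]
    counts = Counter(stays)
    n = len(stays)
    return {length: counts[length] / n for length in sorted(counts)}

def fraction_at_most(pmf, bound):
    """Empirical P(stay <= bound)."""
    return sum(p for length, p in pmf.items() if length <= bound)

# Made-up day-of-year values, all within one year.
arrivals = [10, 12, 40, 41, 100]
departures = [12, 19, 41, 55, 103]

pmf = empirical_pmf(arrivals, departures)
print(pmf)
print(fraction_at_most(pmf, 7))
```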

Stephen Tashi · Jul 26, 2022

WWGD said:

I wanted to define order statistics on the length of stay = Departure Date - Arrival Date.

More vocabulary issues: For a specific sample of data, "order statistics" is already a defined term - just like "sample mean" is already a defined term. An order statistic for a specific set of data is a constant. Considering it as a formula for computing that number, an order statistic is a random variable.

The same applies to terms like "quartiles" except that one might also apply such a term to a probability distribution instead of a sample. If you think of "quartiles" applying to a probability distribution then they are population parameters instead of sample statistics.
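To make the distinction concrete, here is a small Python sketch (the data are made up and `order_statistic` is a hypothetical helper). Applied to one fixed sample, the order statistic is just a number; applied to repeated random samples, the same formula produces varying values.

```python
import random

def order_statistic(sample, k):
    """k-th smallest value of the sample (k = 1 gives the minimum)."""
    return sorted(sample)[k - 1]

# For one specific sample, each order statistic is a constant.
fixed_sample = [3, 1, 7, 2, 5]
print(order_statistic(fixed_sample, 1))                  # 1, the sample minimum
print(order_statistic(fixed_sample, len(fixed_sample)))  # 7, the sample maximum

# As a formula applied to random draws, the same statistic varies between samples.
rng = random.Random(1)
population = [1, 1, 2, 2, 3, 4, 5, 7, 10, 14]  # made-up lengths of stay
maxima = [order_statistic(rng.choices(population, k=5), 5) for _ in range(5)]
print(maxima)
```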

pbuk · Jul 27, 2022

For a concrete example, if the source data is a SQL table we might have

SQL:

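-- Assumes MySQL 8+ syntax: DATEDIFF(end, start) returns whole days
-- and NTILE is available as a window function.
SELECT
  decile
  , MIN(stay) AS min_stay
  , MAX(stay) AS max_stay
  , AVG(stay) AS mean_stay
FROM (
  -- One row per stay, tagged with the decile of its length.
  SELECT
    DATEDIFF(depart, arrive) AS stay
    , NTILE (10) OVER (
      ORDER BY DATEDIFF(depart, arrive)
    ) AS decile
  FROM
    stays
) AS stay_deciles
GROUP BY
  decile
ORDER BY
  decile
;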

WWGD · Jul 27, 2022

pbuk said:

For a concrete example, if the source data is a SQL table we might have

SQL:

SELECT
  decile
  , MIN(stay) AS min
  , MAX(stay) AS max
  , AVG(stay) AS mean

FROM (
  SELECT
    DATEDIFF(depart, arrive) AS stay
    , NTILE (10) OVER (
      ORDER BY DATEDIFF(depart, arrive)
    ) AS decile
  FROM
    stays
) AS stays

GROUP BY
  decile
;

Thanks. But how do I use this to test a claim about length of stay, or to construct a confidence interval of some sort? Say the claim is made that average length of stay is 5 days. What statistic do I compute, and what is its distribution? What is an estimator for the population range?

pbuk · Jul 27, 2022

WWGD said:

what is its distribution?

That is indeed the question. You see, we can't say anything about the relationship between a sample and the population unless we know how the data are distributed.

WWGD said:

Say the claim is made that average length of stay is 5 days.

Well let's say the claim is made that length of stay is Poisson distributed with a mean value of 5 days. We could test this with a chi-squared test.

WWGD said:

What is an estimator for the population range?

Again, that depends on the distribution: many distributions, including Poisson, have no upper bound. On the other hand, a distribution with finite support, such as a uniform distribution, is bounded. But think about the implications of this: are you saying that by looking at a sample of some people who have stayed in some hotels during a certain period, you want to draw the conclusion that nobody ever stays in any hotel for more than n days?

This situation has other dangers. Let's say you calculate that the average length of stay is 5 days: what does this actually tell you? Certainly not that people who stay for 5 days are your most important customers, for two reasons:

1. You may not have any customers who stay for 5 days: the mean could be made up of 100 stays of 1 night and 200 stays of 7 nights! This points towards the bigger problem.
2. Length of stay is probably not a useful statistic anyway; you probably want length of stay squared, which is what hotels generally measure, although in a slightly different form: they look at bed nights, or room nights. So in the above example we would see 100 bed nights on stays of 1 night and 1,400 bed nights on stays of 7 nights, so the average length of stay weighted by bed nights would be (100 x 1 + 1,400 x 7) / 1,500 = 6.6 nights.
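The arithmetic in that example can be checked with a short Python sketch (the function names are hypothetical). Weighting each stay by its own length is where the "length of stay squared" comes from: the bed-night-weighted mean is the sum of squared lengths divided by the sum of lengths.

```python
def mean_stay_per_booking(stays):
    """Ordinary mean: every booking counts once."""
    return sum(stays) / len(stays)

def mean_stay_per_bed_night(stays):
    """Mean weighted by bed nights: sum(n^2) / sum(n)."""
    return sum(n * n for n in stays) / sum(stays)

# The example from the post: 100 stays of 1 night and 200 stays of 7 nights.
stays = [1] * 100 + [7] * 200

print(sum(stays))                      # 1500 bed nights in total
print(mean_stay_per_booking(stays))    # 5.0
print(mean_stay_per_bed_night(stays))  # 6.6
```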

So if you are looking for concise answers to come from means and variances you are going to have to know a lot more about the population distribution.
