Regression Analysis - Constructing a Model ((Need ))

In summary: The problem involves finding a general function that satisfies a set of minimum/maximum data point constraints and can take on new values in different situations. The function must satisfy the constraints while minimizing the mean square distance between the function and a line between two points. To make the problem tractable, the function can be assumed to belong to a specific family, such as quadratic functions, defined by a few parameters. Methods for solving this type of optimization problem include simulated annealing, genetic programming, and algorithms that use calculus, simplex geometry, or interior point geometry.
  • #1
Fjolvar
Hello,

I am trying to construct a general function/method based on two sets of minimum/maximum data point constraints, which can take on new values in different situations. The only known data for this general function is the starting point (y-axis intercept) and the x-range. The rate of change over time must equal zero overall, so any amount increased or decreased must be compensated within the x-range. Optimally, the function will vary as little as possible, as long as it meets the constraints.

I attached a figure of an example plot. The minimum constraint data points are marked as the 'necessary' function. The maximum constraint data points are marked as 'maximal.' The function marked as 'case 2' is an example of the general function that would satisfy the constraints.

I would appreciate any advice or suggestions on how to approach this problem as I've had very little success so far. Thank you in advance.
 

Attachments

  • Example Graph.jpg
  • #2
In my opinion, a procedure for solving this general type of problem would boil down to having a computer program that minimizes a function of several variables subject to many constraints using a numerical method. There are many methods known for solving this type of problem, but they are not methods where you apply a relatively simple formula that produces the answer.

Let me try to interpret your description of the problem as mathematics.

We are given data for two functions [itex] y_{lower}, y_{upper}[/itex] that are defined on [itex]N+1[/itex] points [itex]\{x_0, x_1, \dots, x_N\}[/itex]. We want to find a function [itex] f(x) [/itex] satisfying:

[itex]N [/itex] inequality constraints:

[itex] y_{lower}(x_i) \lt f(x_i) \lt y_{upper}(x_i) [/itex] for [itex] i = 1, 2, \dots, N [/itex]

Two equality constraints:

[itex] f(x_0) = f(x_N) = y_0 [/itex] for some number [itex] y_0 [/itex]

You want [itex] f(x) [/itex] to vary as little as possible. One way to express that in mathematics is to say that we want to minimize the mean square distance between [itex] f(x) [/itex] and the line between [itex] (x_0,y_0) [/itex] and [itex] (x_N,y_0) [/itex].

So the "objective function" is [itex] \frac{1}{N} \sum_{i=1}^N (f(x_i) - y_0)^2 [/itex]

If you know the mathematical definition of "a function", you know that functions can be defined with symbols or words and combinations thereof. So trying to solve the above problem for "the" best function is intractable because there is such a great variety in functions.

To make the problem tractable, we could assume [itex] f [/itex] comes from a family of functions, such as quadratic functions, that is defined by a few parameters.

If we assume [itex] f(x) = Ax^2 + Bx + C [/itex] and also treat [itex] y_0 [/itex] as a variable, the problem becomes:

Minimize [itex] \frac{1}{N} \sum_{i=1}^N (f(x_i) - y_0)^2 [/itex]
with respect to the variables [itex] A,B,C,y_0 [/itex]
subject to the constraints
[itex] y_{lower}(x_i) \lt f(x_i) \lt y_{upper}(x_i) [/itex] for [itex] i = 1, 2, \dots, N [/itex]
[itex] f(x_0) = f(x_N) = y_0 [/itex] for some number [itex] y_0 [/itex]
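
As a side note that is not part of the original reply: for the quadratic family, and assuming [itex] x_0 \ne x_N [/itex], the two equality constraints can be used to eliminate [itex] B [/itex] and [itex] C [/itex]. The following is a sketch of that algebra, using only the symbols defined above.

```latex
\begin{align*}
f(x_0) = f(x_N) &\;\Longrightarrow\; A(x_0^2 - x_N^2) + B(x_0 - x_N) = 0
                 \;\Longrightarrow\; B = -A(x_0 + x_N), \\
f(x_0) = y_0    &\;\Longrightarrow\; C = y_0 - A x_0^2 - B x_0 = y_0 + A x_0 x_N, \\
f(x)            &= y_0 + A (x - x_0)(x - x_N), \\
\frac{1}{N} \sum_{i=1}^N (f(x_i) - y_0)^2
                &= \frac{A^2}{N} \sum_{i=1}^N (x_i - x_0)^2 (x_i - x_N)^2 .
\end{align*}
```

Under this reading the objective depends only on [itex] A [/itex], so the task reduces to finding the smallest [itex] |A| [/itex] for which some [itex] y_0 [/itex] satisfies all the inequality constraints. If a constant value already fits between the bounds at every point, [itex] A = 0 [/itex] is optimal.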

This format (Minimize ... with respect to ... subject to the constraints ...) is a standard way to state optimization problems. Some methods of solving such problems resemble organized forms of trial and error: "simulated annealing" and "genetic programming". Other algorithms rely on calculus and "if ... then ..." decisions, such as "the conjugate gradient method" and versions of it that handle constraints.
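
To make the quadratic case concrete, here is a minimal Python sketch. It is not one of the methods named above: it is a plain brute-force scan, chosen only because it is short and easy to check. It assumes [itex] x_0 \ne x_N [/itex] and enforces the equality constraints by construction, writing the quadratic as f(x) = y0 + A*(x - x0)*(x - xN), which always satisfies f(x0) = f(xN) = y0 and whose mean square deviation from y0 grows with |A|. The function name `fit_band_quadratic`, the step size, and the sample data are all hypothetical.

```python
def fit_band_quadratic(xs, lower, upper, step=0.01, max_steps=100_000):
    """Scan curvatures A in order of increasing |A| and return the first
    (A, y0) for which f(x) = y0 + A*(x - x0)*(x - xN) lies strictly
    between lower and upper at the points x_1..x_N.
    Returns None if no grid value of A is feasible."""
    x0, xn = xs[0], xs[-1]
    # g_i = (x_i - x0)(x_i - xN) at the constrained points i = 1..N
    g = [(x - x0) * (x - xn) for x in xs[1:]]
    for k in range(max_steps + 1):
        candidates = (0.0,) if k == 0 else (k * step, -k * step)
        for a in candidates:
            # For fixed A, each inequality bounds y0 from below and above.
            y_min = max(lo - a * gi for lo, gi in zip(lower[1:], g))
            y_max = min(up - a * gi for up, gi in zip(upper[1:], g))
            if y_min < y_max:
                return a, 0.5 * (y_min + y_max)  # midpoint of feasible y0 range
    return None


# Made-up data: the lower bound bulges in the middle, so no constant fits.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
lower = [0.0, 1.0, 3.5, 1.0, 0.0]
upper = [2.0, 4.0, 5.0, 4.0, 2.0]

result = fit_band_quadratic(xs, lower, upper)
a, y0 = result


def f(x):
    """The fitted quadratic, with f(xs[0]) = f(xs[-1]) = y0."""
    return y0 + a * (x - xs[0]) * (x - xs[-1])


# Objective from the problem statement: mean square distance from y0.
mean_sq = sum((f(x) - y0) ** 2 for x in xs[1:]) / len(xs[1:])
print(f"A = {a:.2f}, y0 = {y0:.2f}, objective = {mean_sq:.4f}")
```

The scan only finds the best A on the chosen grid, so the result is accurate to within one step. A general-purpose constrained optimizer would be the more usual choice for families with more parameters.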
 

1. What is regression analysis and why is it used?

Regression analysis is a statistical method used to analyze the relationship between two or more variables. It is used to understand how changes in one variable affect another variable, as well as to predict future values of the dependent variable based on the values of the independent variables.

2. How do you construct a regression model?

To construct a regression model, you first identify the dependent variable and the independent variables. Then you collect data for these variables and, for a single predictor, plot them on a scatter plot. Next, you fit a line or curve to the data points by estimating the parameters of a regression equation, commonly by least squares, which chooses the parameters that minimize the sum of squared residuals. The fitted equation describes the relationship between the variables.
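
As an illustration of the fitting step, the following is a minimal Python sketch of an ordinary least-squares fit of a straight line to one predictor. The function name `fit_line` and the sample data are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying roughly on the line y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```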

3. What are the assumptions of regression analysis?

Three commonly cited assumptions of linear regression are linearity, homoscedasticity, and normality; independence of the errors is usually assumed as well. Linearity assumes that there is a linear relationship between the variables being analyzed. Homoscedasticity assumes that the variance of the errors or residuals is constant across all values of the independent variable. Normality assumes that the errors or residuals are normally distributed.

4. How do you evaluate the accuracy of a regression model?

There are several methods for evaluating the accuracy of a regression model, including calculating the coefficient of determination (R-squared), conducting hypothesis tests on the regression coefficients, and examining the residual plots to check for patterns and outliers. Additionally, cross-validation techniques can be used to test the model's performance on new data.
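
For reference, the coefficient of determination can be computed directly from the observed values and the model's predictions. The sketch below assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the function name `r_squared` is illustrative.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation the model leaves unexplained
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total sum of squares: variation of the data around its own mean
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # predictions match the data
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicts the mean
print(perfect, mean_only)
```

A model that predicts the data exactly scores 1, while a model that always predicts the mean of the observations scores 0.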

5. Can regression analysis be used for categorical variables?

Yes. Categorical predictors can be included in a regression model by encoding them as dummy (indicator) variables. When the dependent variable itself is categorical, techniques such as logistic regression can be used to predict the probability of an event occurring based on the values of the independent variables; in that case some of the usual assumptions, such as normally distributed residuals, no longer apply in the same form.
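
A small Python sketch of dummy coding for a categorical predictor follows. It assumes the common convention of dropping one level as the reference category, here simply the first level in sorted order; the function name `dummy_encode` and the sample values are hypothetical.

```python
def dummy_encode(values):
    """Encode a categorical column as 0/1 indicator columns,
    dropping the first level (in sorted order) as the reference."""
    levels = sorted(set(values))
    kept = levels[1:]  # levels[0] is the reference category
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


columns, encoded = dummy_encode(["red", "blue", "green", "blue"])
print(columns)  # names of the indicator columns
print(encoded)  # one row of indicators per observation
```

Rows that are all zeros correspond to the reference category, so the fitted coefficients for the other levels are read as differences from that category.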
