Brute force regression software?

Brute force regression software is a type of statistical analysis tool that searches for the best fit to a set of data by testing many candidate combinations of variables and functional forms. This approach can be time-consuming and computationally demanding, and it risks overfitting unless the search is constrained and the result is validated on held-out data. Brute force regression software is commonly used in fields such as finance, economics, and engineering to identify patterns and make predictions based on numerical data.
  • #1
pslarsen
Hi all

I have a lot of data, and was wondering whether there exists a program that will apply a type of brute force regression tool to basically try any thinkable combination of variables and mathematical expressions to minimize the error between Y and Y_predicted.

The data [(x1 vs Y) (x2 vs Y)... (xn vs Y)] are very scattered (random example below), so I will need something significantly more complicated than linear terms to get a nice Y vs. Y_predicted plot.

Br,
Peter
 

Attachments

  • regression.jpg
  • #2
I don't think what you've asked for is really what you want. If you have N data points (with distinct x values), a polynomial of degree N-1 will fit the data exactly, with zero error. However, such a model is probably physically meaningless. Usually you provide some physical insight into what the model should look like. What is your data, and what relationship do you expect between X and Y?
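To make the zero-error point concrete, here is a minimal sketch in plain Python. The data values and the function name `lagrange_interpolate` are made up for illustration; the only assumption is a single input variable with distinct x values. It evaluates the unique polynomial of degree N-1 through N points and checks that the error at those points vanishes.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique degree-(N-1) polynomial through the N points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Made-up scattered data: 5 points, so the interpolant has degree 4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, -0.7, 3.3, 0.4, 5.0]

# The error at the data points themselves is zero.
max_training_error = max(
    abs(lagrange_interpolate(xs, ys, x) - y) for x, y in zip(xs, ys)
)
print("max error at the data points:", max_training_error)

# Between the data points the curve is unconstrained by the data.
print("value at x = 3.5:", lagrange_interpolate(xs, ys, 3.5))
```

A perfect fit at the recorded points says nothing about how the curve behaves between or beyond them, which is why zero error alone is not a useful target.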
 
  • #3
So you are saying that the fit depends only on the number of points in a variable, and not on the number of variables nor the variable dependencies - that sounds pretty strange.

Doesn't the N-1 rule apply to only a single variable with N values?

The purpose of the program would be to discover correlations between variables, starting with simple relationships and minimizing the error using various combinations. One would need some level of tolerance on the error, and the program would scale its complexity until an acceptable solution is found.

If there is enough data I guess that the correlation would eventually be meaningful, or what? One would obviously validate the model against some test data.
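The idea of scaling up complexity and validating against test data can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic (a sine curve plus seeded noise), the model family is polynomials of degree 1 to 5, and the helper names `solve_linear`, `fit_polynomial` and `mean_squared_error` are hypothetical. Every second point is held out, and the degree with the lowest held-out error is reported.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first) via normal equations."""
    k = degree + 1
    rows = [[x ** p for p in range(k)] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(ata, aty)

def mean_squared_error(coeffs, xs, ys):
    preds = [sum(c * x ** p for p, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Synthetic data for illustration only: sine curve plus seeded noise.
rng = random.Random(0)
xs = [i / 23 for i in range(24)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.2) for x in xs]

# Hold out every second point as test data.
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

results = []
for degree in range(1, 6):
    coeffs = fit_polynomial(train_x, train_y, degree)
    results.append((degree,
                    mean_squared_error(coeffs, train_x, train_y),
                    mean_squared_error(coeffs, test_x, test_y)))

best_degree = min(results, key=lambda r: r[2])[0]
for degree, train_mse, test_mse in results:
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
print("degree with lowest held-out error:", best_degree)
```

The training error can only go down as the degree grows, so it is the held-out error, not the training error, that indicates when extra complexity stops helping.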

Is this in reality a problem for a neural network? The problem with that is that I don't know if I have enough data.

Br, Peter
 
  • #4
The point is that if I have some number of data points, I can always find a model where the number of free parameters in the model equals the number of data points. Then the model can fit the data exactly. We usually call this "overfitting" (see the example below). To do what you are proposing, you would have to quantitatively define the following:
(1) What makes a model "simple"? How do you measure the "complexity" of a model? Which is more complex, a cubic polynomial, or an error function with one free parameter? Is it the number of free parameters? Or is a linear polynomial model simpler than some highly non-linear function?

I still think in order to do what you are trying to do, you need to inject some physical insight, and not search randomly through the infinite number of mathematical relationships between the variables.

Attachments

  • Overfitted_Data.png
  • #5
So, you have x1, x2 ... xn input variables.
For each value of y that you recorded, did you record what the values of all x1, x2 … xn were?

For how many values of y did you record all the xi inputs?
Where is that data table?
 
  • #6
Might be a good idea to try Fourier analysis, see if the findings hint at plausible processes, build hypotheses from such, test the resulting models...

I don't remember much from my Statistics courses, but one lecturer's dire warnings about 'Lies, Damned Lies and Inappropriate Correlations' still echo!
 
  • #7
pslarsen said:
to basically try any thinkable combination of variables and mathematical expressions to minimize the error between Y and Y_predicted.

As others have pointed out, doing that literally would produce nonsensical results. However, there are more sophisticated approaches to fitting models that try to find a trade-off between the number of parameters in the model and the error in fit. This prevents getting a model that fits the data well but has a zillion parameters. The specific software to do this will depend on the general form of model you want. For example, look up info on ANOVA (Analysis of Variance) software.
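One common way to express such a trade-off, offered here as an illustration rather than as the method this post has in mind, is an information criterion. Assuming a least-squares fit with Gaussian errors, n data points, k fitted parameters and residual sum of squares RSS, the Akaike and Bayesian criteria can be written (up to a constant C that is the same for every model fitted to the same data) as:

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + C,
\qquad
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n + C
```

The first term rewards a smaller fitting error and the second penalizes each extra parameter, so among candidate models the one with the lowest value is preferred. An exact fit (RSS = 0) makes the logarithm diverge, which is another sign that a zero-error model is outside what these criteria are meant to compare.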
 
  • #8
Baluncore said:
For how many values of y did you record all the xi inputs?
We still do not know if this is a Bayesian or a least squares fitting problem.
There are software packages in the cloud that seek to find optimum equations and parameters for big data sets.
 
  • #9
pslarsen said:
I have a lot of data
Sure. Why not tell us how big your n is?

The pictorial example looks pretty unreal.

If n is fairly small you might try the TableCurve 2D program: http://www.sigmaplot.co.uk/products/tablecurve2d/tablecurve2d.php

Otherwise, try PCA (principal component analysis).
 

1. What is brute force regression software?

Brute force regression software is a type of statistical analysis tool that uses an exhaustive search method to find the best fitting model for a given dataset. This method involves trying all possible combinations of variables and coefficients to determine the optimal model.
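A toy version of such an exhaustive search is sketched below. Everything in it is an assumption made for illustration: the two input columns, the five candidate transforms, the restriction to models of the form y = a + b*f(x), and the helper name `fit_line`. The made-up target is generated exactly as 3 + 2*log(x2), so the search has a known answer to recover.

```python
import itertools
import math

def fit_line(us, ys):
    """Closed-form least squares for y = a + b*u; returns (a, b, sum of squared errors)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = suy / suu
    a = mean_y - b * mean_u
    sse = sum((a + b * u - y) ** 2 for u, y in zip(us, ys))
    return a, b, sse

# Candidate expressions to try on each input variable (all inputs are positive).
TRANSFORMS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda x: 1.0 / x,
}

# Made-up data: y depends only on x2, through a logarithm.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.5, 0.8, 4.1, 1.7, 3.3, 0.5],
}
y = [3.0 + 2.0 * math.log(v) for v in data["x2"]]

# Try every (variable, transform) pair and record the fitting error.
candidates = []
for (var, values), (name, func) in itertools.product(data.items(), TRANSFORMS.items()):
    a, b, sse = fit_line([func(v) for v in values], y)
    candidates.append((sse, var, name, a, b))

candidates.sort(key=lambda c: c[0])
best = candidates[0]
print(f"best model: y = {best[3]:.3f} + {best[4]:.3f} * {best[2]}({best[1]})")
```

Here there are only 2 x 5 = 10 candidates. Allowing sums, products, or nested expressions makes the number of candidates grow combinatorially, which is where the computational cost of a literal brute force search comes from.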

2. How does brute force regression software differ from other regression methods?

Unlike regression methods that start from a model form chosen by the analyst, brute force regression makes few assumptions about the relationship between the variables. It systematically tests every candidate in a chosen family of models, which can lower the fitting error on the data at hand but is more computationally intensive and more prone to overfitting.

3. What are the benefits of using brute force regression software?

The main benefit of using brute force regression software is that it can find the best-fitting model among the candidates it searches without requiring the form of the relationship to be specified in advance. It also provides a comparison of all the models tried, allowing for a more thorough understanding of the data and potential relationships.

4. Are there any limitations to using brute force regression software?

One limitation of brute force regression is that it can be computationally demanding, especially for large datasets with many variables. It also does not provide any insights into the underlying relationship between the variables, as it simply tests all possible combinations without considering the data's context.

5. How can I choose the best brute force regression software for my needs?

Choosing the best brute force regression software depends on your specific needs and the capabilities of the software. Some factors to consider include the size and complexity of your dataset, the software's computational power, and its ability to handle different types of regression models. It is also helpful to read reviews and compare features before making a decision.
