Can someone recommend an algorithm to optimize this data

In summary: Just "finding the sum" is easy: compute it. Do you mean you want to find the combination of coeficients that maximizes the sum ##ZN##?Yes, I would like to find the combination of coefficients that maximizes the sum ##ZN##.
  • #1
fahraynk
186
6
I have a data set of samples, and I made up some variables that filter out unwanted samples from the data set.

Say each sample has values ##\vec{x}=x_1, x_2,..., x_n##, and I know the values of ##\vec{x}## for each sample.
Also, if I sum up all of the samples, I get a value, ##Z##, which tells me something about the samples.

Now I want to find a series of parameters, ##a_i, b_i## such that :
$$
(a_1<x_1<b_1),\,\,\,\, (a_2<x_2<b_2)\,\,\,\, ... \,\,\,\,(a_n<x_n<b_n)
$$

These parameters essentially filter out unwanted samples, to affect the value ##Z##. So, for example, sample ##1## has ##x_1=0##, so if I set ##0<x_1<2##, then sample##1## is left out, and only samples with ##x_1 \in (0,2)## are included.

How can I optimize the parameters ##a_i, b_i##, such that I get the best value of ##Z## ?

##Z## is interesting also, I want the highest ##Z## with the largest number of samples ##(N)##, so it's probably better to multiply ##Z*N## and fit for this.

I don't think this is a standard regression type of problem. I tried a brute force solution, but it will take way too long. Can anyone recommend an algorithm, or a new way to represent the problem so that it can be solved by some algorithm?
 
Technology news on Phys.org
  • #2
fahraynk said:
How can I optimize the parameters ##a_i##, ##b_i##, such that I get the best value of ##Z## ?

By "best" do you mean "highest"?

Also, can any of the ##x_i## be negative? If not, it should be obvious that removing any samples will decrease ##Z##.

fahraynk said:
I want the highest ##Z## with the largest number of samples ##(N)##, so it's probably better to multiply ##Z*N## and fit for this.

You need to pick what you want to optimize for. Optimizing for the highest ##Z## is not the same as optimizing for the highest ##Z * N##. You can only optimize for one thing, so you need to pick which.
 
  • #3
Thank you for your reply,

PeterDonis said:
By "best" do you mean "highest"?
You need to pick what you want to optimize for. Optimizing for the highest ##Z## is not the same as optimizing for the highest ##Z * N##. You can only optimize for one thing, so you need to pick which.

Let me define ##Z## better.
Let each sample ##i## have a value, ##z_i##, then ##Z## is the average of ##z##, or $$Z=\frac{\sum_1^N z_i}{N} \,\,\text{where N is the number of samples}$$

It's my hypothesis that the values in ##\vec{x_i}## for each sample say something about the value of ##z_i##.
##a_i, b_i, z_i## can be positive or negative as well.

I am interested in both ##Z*N## and ##\text{max}(Z)##. Somewhat more interested in ##Z*N## though.
 
  • #4
fahraynk said:
Let each sample ##i## have a value, ##z_i##,

I thought you said each sample had multiple values ##x_i##. Is ##z_i## just one of those, or an average of them, or a sum of them, or what?
 
  • #5
PeterDonis said:
I thought you said each sample had multiple values ##x_i##. Is ##z_i## just one of those, or an average of them, or a sum of them, or what?

##z_i = f(\vec{x_i})##, the problem is the ##x_i## are somewhat piecewise, so I don't think I can do a simple regression. For example, say ##x_1 \in [-22,-18] \cup [+18,+22]##, and ##z_i## might be higher in the cases where ##x_i\in[-22,-20] \cup [20,22]##.
 
  • #6
fahraynk said:
##z_i = f(\vec{x_i})##,

Ok, so ##z_i## is just some number computed from the vector ##\vec{x}_i##. And ##Z## is the average of all those numbers. Got it.

fahraynk said:
I am interested in both ##Z*N## and ##\text{max}(Z)##. Somewhat more interested in ##Z*N## though.

From the above, ##Z * N## would just be the sum of all the ##z_i##. Are you saying you're interested in how to maximize that sum?
 
  • #7
PeterDonis said:
Ok, so ##z_i## is just some number computed from the vector ##\vec{x}_i##. And ##Z## is the average of all those numbers. Got it.
From the above, ##Z * N## would just be the sum of all the ##z_i##. Are you saying you're interested in how to maximize that sum?
Yeah, I would like to find that sum, ##ZN##, and as another problem find the maximum ##Z##
 
  • #8
fahraynk said:
I would like to find that sum, ##ZN##,

Just "finding the sum" is easy: compute it. Do you mean you want to find the combination of coeficients that maximizes the sum ##ZN##?

Also, what constraints are there on the coefficients? If the coefficients can be any real numbers whatever, there is no way to maximize either ##ZN## or ##Z##: you can always just pick larger coefficients since there is no largest real number.
 
  • #9
PeterDonis said:
Just "finding the sum" is easy: compute it. Do you mean you want to find the combination of coefficients that maximizes the sum ##ZN##?

Also, what constraints are there on the coefficients? If the coefficients can be any real numbers whatever, there is no way to maximize either ##ZN## or ##Z##: you can always just pick larger coefficients since there is no largest real number.

Yes, I want to find the coefficients that maximize the sum, and it would be nice to find the coefficients that maximize ##Z## as well.
The coefficients can be anything, but if you make them too weird then there are no samples which match them. So for example if I set ##x_1>1000## then ##N\rightarrow 0##
 
  • #10
fahraynk said:
The coefficients can be anything, but if you make them too weird then there are no samples which match them.

Where are the samples coming from?

fahraynk said:
if I set ##x_1>1000## then ##N \rightarrow 0##

Why?
 
  • #11
PeterDonis said:
Where are the samples coming from?
Why?
Think of it like a population. if ##x_1## is the amount of money a rich person makes for them to be able to buy a car, ##x_1>100,000,000,000## will have no people if no one in the population makes that much money.
 
  • #12
fahraynk said:
if ##x_1## is the amount of money a rich person makes for them to be able to buy a car, ##x_1>100,000,000,000## will have no people if no one in the population makes that much money.

This doesn't answer my question. Where are the samples coming from? What kind of data? Where from?

I can't help you if I don't understand where your data is coming from.
 
  • #13
Do you know any algorithms that sound like they might help? Do you think something like support vector machine could help?

I'm looking for an algorithm to create a subset ##S^* \subset S## , where ##S## is the set of all samples in dataset.

S is of size ##N>n##, ##S^*## is of size ##n##.

{##A_i##}##_1^N## is the sequence of all samples in ##S##. Each ##A_i## has a list of properties ##\vec{x_i}=x_1,x_2,...,x_n##, and ##z_i=f(\vec{x_i})##.

##x, z \in R##

I want to create a subset ##S^*## that maximizes ##\sum_1^n z_i##, a sum over all ##A_i \in S^*##

For example, if I create a subset, ##S^*=##{##A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3##}, the sum, ##\sum_1^n z_i##, will be a certain value, say ##\sum_1^n z_i=C##. I want to find the set of constraints on ##\vec{x_i}## to create the subset that maximizes ##C##
 
  • #14
fahraynk said:
I want to create a subset ##S^*## that maximizes ##\sum_1^n z_i##, a sum over all ##A_i \in S^*##

This is simple: just remove all the samples that have negative ##z_i##.
 
  • #15
You're trolling.
 
  • #16
fahraynk said:
You're trolling.

No, I'm responding to what you actually wrote. You already said that ##z_i## for each sample is just a number (this number is obtained by computing some function of the numbers ##x_i## in the vector ##\vec{x}_i##, but that doesn't change the fact that in the end it's just a number for each sample). So if you have ##N## samples, you have ##N## numbers that sum to some number. If you want to find a subset of the numbers that maximizes the sum, you take out any numbers which are negative. That's obvious from basic arithmetic.

If that answer isn't what you were looking for, then you need to do a better job of describing what you are actually doing and what you are trying to accomplish.
 
  • #17
You have written a poorly specified question, and Peter has shown more patience with it than many others would.
 
  • #18
Vanadium 50 said:
You have written a poorly specified question, and Peter has shown more patience with it than many others would.
I wrote the following :

fahraynk said:
I want to create a subset ##S^*## that maximizes ##\sum_1^n z_i##, a sum over all ##A_i \in S^*##

For example, if I create a subset, ##S^*=##{##A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3##}, the sum, ##\sum_1^n z_i##, will be a certain value, say ##\sum_1^n z_i=C##. I want to find the set of constraints on ##\vec{x_i}## to create the subset that maximizes ##C##

Then, Peter wrote the following :

PeterDonis said:
This is simple: just remove all the samples that have negative ##z_i##.

So, I wrote I want to find the set of constraints on ##\vec{x_i}## to create the subset that maximizes ##C##, where ##\sum_1^n z_i=C##.
Meaning I want to find constraints on ##\vec{x}## to maximize ##C##. And I am asking for an algorithm.

Peter then wrote : just remove the samples with negative ##z_i##.
Does he mean by hand? I'm trying to optimize a set of constraints to do that for me. It just felt like a troll.
 
  • #19
fahraynk said:
Does he mean by hand?

You don't have to do it "by hand"; you can just program the computer to do it for you.

fahraynk said:
I'm trying to optimize a set of constraints to do that for me.

Isn't ##z_i > 0## a constraint? What other constraint would you need?
 
  • #20
I need to find constraints on ##\vec{x}## to maximize the sum of ##z_i## in the subset. You are suggesting I just choose a subset without negative ##z_i##.

Yeah, I would like to, but I need to do that by finding the best constraints on ##\vec{x_i}##.
fahraynk said:
Do you know any algorithms that sound like they might help? Do you think something like support vector machine could help?
For example, if I create a subset, ##S^*=##{##A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3## }, the sum, ##\sum_1^n z_i##, will be a certain value, say ##\sum_1^n z_i=C##. I want to find the set of constraints on ##\vec{x_i}## to create the subset that maximizes ##C##
 
  • #21
fahraynk said:
I would like to, but I need to do that by finding the best constraints on ##\vec{x_i}##.

Why would you do it in that roundabout fashion, when you already know how to compute ##z_i## for each ##\vec{x}_i##? Just compute ##z_i## for each sample, and throw out the samples where it's negative. Simple algorithm. Done.
 
  • #22
PeterDonis said:
Why would you do it in that roundabout fashion, when you already know how to compute ##z_i## for each ##\vec{x}_i##? Just compute ##z_i## for each sample, and throw out the samples where it's negative. Simple algorithm. Done.

Because I don't know future ##z_i##. I only know ##\vec{x}## and I want to predict ##z_i##. So I have a dataset of known ##z_i, \vec{x_i}##, and I want to find the best constraints on ##\vec{x}## to predict ##z_i##.

I stated what I wanted to do, find best subset by finding constraints on ##\vec{x}##. Is my question poorly defined because I did not state why?

I ask you, what is the best way to freeze water? You say, well why don't you just start off with ice?
Because I want to make ice, if I could start off with ice, why would I ask how to freeze water?

I don't know dood. I dunno...
 
  • #23
fahraynk said:
Because I don't know future ##z_i##.

What "future" are you talking about? You said you have a set of samples. You can compute ##z_i## for each of them.

fahraynk said:
I only know ##\vec{x}## and I want to predict ##z_i##.

This makes no sense. You know the function that computes ##z_i## from any ##\vec{x}##. There's nothing to predict.

fahraynk said:
I ask you, what is the best way to freeze water? You say, well why don't you just start off with ice?

No, that's not what I'm saying. I'm saying that if you have water, and you want to freeze it, you already know how to do that: you just cool it down until it freezes.

What you're asking looks to me like "I have water and I know how to freeze it now, but I might get some different water in the future and I want to know how to predict how to freeze that." Why would you do anything different from what you do with the water you have now?
 
  • #24
fahraynk said:
I have a dataset of known ##z_i##, ##\vec{x_i}##, and I want to find the best constraints on ##\vec{x}## to predict ##z_i##.

Do you not even see the contradiction here? You say you know ##z_i##, but then you say you want to predict ##z_i##? Huh?
 
  • #25
PeterDonis said:
Do you not even see the contradiction here? You say you know ##z_i##, but then you say you want to predict ##z_i##? Huh?

I want to predict ##z_i## on samples of which I don't know ##z_i##.

I have a dataset of ##[\vec{x_i},z_i##] points, where I already know ##z_i##.

I know ##z_i## is a function of ##\vec{x_i}##, but I don't know the function.

I am trying to find constraints on ##x_i##. I want to find these constraints to create a subset of my dataset with a maximum ##\sum z_i##.

Even if I knew ##z_i## initially, if I wanted to find a function to predict ##z_i## from ##x_i##, it would still not be a contradiction. It can give you useful information about ##z_i##.
 
  • #26
fahraynk said:
I want to predict ##z_i## on samples of which I don't know ##z_i##.

You can't, unless there is some pattern in the samples. But whether there is a pattern in the samples is a completely separate question from how to maximize the sum of the ##z_i##.

fahraynk said:
I know ##z_i## is a function of ##\vec{x_i}##, but I don't know the function.

Then, again, the only way to find it out is to look for some pattern in the samples; in short, trial and error. But, again, that's a completely separate question from how to maximize the sum of the ##z_i##.

fahraynk said:
I am trying to find constraints on ##x_i##. I want to find these constraints to create a subset of my dataset with a maximum ##\sum z_i##.

Finding the subset of the dataset you already have that maximizes ##\sum z_i## is easy: as I said before, just throw out all the samples with negative ##z_i##.

Finding a set of constraints on the ##\vec{x}_i## that pick out that particular subset of the samples is a matter of trial and error; there is no algorithm for doing it. Plus, even if you found such a function on the known samples, you would have no way of knowing whether the same function also works for the samples you're going to get in the future.

Also, finding constraints on the ##\vec{x}_i## that pick out the subset of known samples that maximizes ##\sum z_i## is a separate question from finding the function ##\vec{x}_i \rightarrow z_i## that let's you compute ##z_i## for each sample. Finding the latter is also a process of trial and error, as noted above, but it's a different process.
 

1. What is an algorithm?

An algorithm is a set of instructions or rules that are used to solve a problem or perform a specific task. In computer science, algorithms are used to process and manipulate data.

2. How do algorithms optimize data?

Algorithms optimize data by analyzing and organizing the data in a way that makes it easier and more efficient to access, process, and retrieve information. They can also identify patterns and make predictions based on the data.

3. Can algorithms be used for any type of data?

Yes, algorithms can be used for any type of data, including numerical data, text data, images, and more. Different types of data may require different algorithms to optimize them effectively.

4. How do I choose the right algorithm for my data?

Choosing the right algorithm for your data depends on the specific problem you are trying to solve and the type of data you have. It is important to research and understand different algorithms and their capabilities before selecting one for your data.

5. Are there any potential drawbacks to using algorithms to optimize data?

While algorithms can greatly improve the efficiency and accuracy of data optimization, they are not infallible. Depending on the quality and quantity of the data, algorithms may produce incorrect or biased results. It is important to regularly evaluate and adjust algorithms to ensure they are still effectively optimizing the data.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
5
Views
455
  • Programming and Computer Science
Replies
2
Views
713
  • Programming and Computer Science
Replies
30
Views
4K
  • Set Theory, Logic, Probability, Statistics
Replies
5
Views
997
  • General Math
Replies
5
Views
829
  • Science and Math Textbooks
Replies
4
Views
645
  • General Math
Replies
6
Views
779
Replies
27
Views
1K
  • Topology and Analysis
Replies
1
Views
969
Replies
22
Views
2K
Back
Top