# Can someone recommend an algorithm to optimize this data

#### fahraynk

I have a data set of samples, and I made up some variables that filter out unwanted samples from the data set.

Say each sample has values $\vec{x}=x_1, x_2,..., x_n$, and I know the values of $\vec{x}$ for each sample.
Also, if I sum up all of the samples, I get a value, $Z$, which tells me something about the samples.

Now I want to find a series of parameters, $a_i, b_i$ such that :
$$(a_1<x_1<b_1),\,\,\,\, (a_2<x_2<b_2)\,\,\,\, ... \,\,\,\,(a_n<x_n<b_n)$$

These parameters essentially filter out unwanted samples, to affect the value $Z$. So, for example, sample $1$ has $x_1=0$, so if I set $0<x_1<2$, then sample$1$ is left out, and only samples with $x_1 \in (0,2)$ are included.

How can I optimize the parameters $a_i, b_i$, such that I get the best value of $Z$ ?

$Z$ is interesting also, I want the highest $Z$ with the largest number of samples $(N)$, so it's probably better to multiply $Z*N$ and fit for this.

I don't think this is a standard regression type of problem. I tried a brute force solution, but it will take way too long. Can anyone recommend an algorithm, or a new way to represent the problem so that it can be solved by some algorithm?

Related Programming and Computer Science News on Phys.org

#### PeterDonis

Mentor
How can I optimize the parameters $a_i$, $b_i$, such that I get the best value of $Z$ ?
By "best" do you mean "highest"?

Also, can any of the $x_i$ be negative? If not, it should be obvious that removing any samples will decrease $Z$.

I want the highest $Z$ with the largest number of samples $(N)$, so it's probably better to multiply $Z*N$ and fit for this.
You need to pick what you want to optimize for. Optimizing for the highest $Z$ is not the same as optimizing for the highest $Z * N$. You can only optimize for one thing, so you need to pick which.

#### fahraynk

By "best" do you mean "highest"?
You need to pick what you want to optimize for. Optimizing for the highest $Z$ is not the same as optimizing for the highest $Z * N$. You can only optimize for one thing, so you need to pick which.
Let me define $Z$ better.
Let each sample $i$ have a value, $z_i$, then $Z$ is the average of $z$, or $$Z=\frac{\sum_1^N z_i}{N} \,\,\text{where N is the number of samples}$$

It's my hypothesis that the values in $\vec{x_i}$ for each sample say something about the value of $z_i$.
$a_i, b_i, z_i$ can be positive or negative as well.

I am interested in both $Z*N$ and $\text{max}(Z)$. Somewhat more interested in $Z*N$ though.

#### PeterDonis

Mentor
Let each sample $i$ have a value, $z_i$,
I thought you said each sample had multiple values $x_i$. Is $z_i$ just one of those, or an average of them, or a sum of them, or what?

#### fahraynk

I thought you said each sample had multiple values $x_i$. Is $z_i$ just one of those, or an average of them, or a sum of them, or what?
$z_i = f(\vec{x_i})$, the problem is the $x_i$ are somewhat piecewise, so I don't think I can do a simple regression. For example, say $x_1 \in [-22,-18] \cup [+18,+22]$, and $z_i$ might be higher in the cases where $x_i\in[-22,-20] \cup [20,22]$.

#### PeterDonis

Mentor
$z_i = f(\vec{x_i})$,
Ok, so $z_i$ is just some number computed from the vector $\vec{x}_i$. And $Z$ is the average of all those numbers. Got it.

I am interested in both $Z*N$ and $\text{max}(Z)$. Somewhat more interested in $Z*N$ though.
From the above, $Z * N$ would just be the sum of all the $z_i$. Are you saying you're interested in how to maximize that sum?

#### fahraynk

Ok, so $z_i$ is just some number computed from the vector $\vec{x}_i$. And $Z$ is the average of all those numbers. Got it.

From the above, $Z * N$ would just be the sum of all the $z_i$. Are you saying you're interested in how to maximize that sum?
Yeah, I would like to find that sum, $ZN$, and as another problem find the maximum $Z$

#### PeterDonis

Mentor
I would like to find that sum, $ZN$,
Just "finding the sum" is easy: compute it. Do you mean you want to find the combination of coeficients that maximizes the sum $ZN$?

Also, what constraints are there on the coefficients? If the coefficients can be any real numbers whatever, there is no way to maximize either $ZN$ or $Z$: you can always just pick larger coefficients since there is no largest real number.

#### fahraynk

Just "finding the sum" is easy: compute it. Do you mean you want to find the combination of coefficients that maximizes the sum $ZN$?

Also, what constraints are there on the coefficients? If the coefficients can be any real numbers whatever, there is no way to maximize either $ZN$ or $Z$: you can always just pick larger coefficients since there is no largest real number.
Yes, I want to find the coefficients that maximize the sum, and it would be nice to find the coefficients that maximize $Z$ as well.
The coefficients can be anything, but if you make them too weird then there are no samples which match them. So for example if I set $x_1>1000$ then $N\rightarrow 0$

#### PeterDonis

Mentor
The coefficients can be anything, but if you make them too weird then there are no samples which match them.
Where are the samples coming from?

if I set $x_1>1000$ then $N \rightarrow 0$
Why?

#### fahraynk

Where are the samples coming from?

Why?
Think of it like a population. if $x_1$ is the amount of money a rich person makes for them to be able to buy a car, $x_1>100,000,000,000$ will have no people if no one in the population makes that much money.

#### PeterDonis

Mentor
if $x_1$ is the amount of money a rich person makes for them to be able to buy a car, $x_1>100,000,000,000$ will have no people if no one in the population makes that much money.
This doesn't answer my question. Where are the samples coming from? What kind of data? Where from?

#### fahraynk

Do you know any algorithms that sound like they might help? Do you think something like support vector machine could help?

I'm looking for an algorithm to create a subset $S^* \subset S$ , where $S$ is the set of all samples in dataset.

S is of size $N>n$, $S^*$ is of size $n$.

{$A_i$}$_1^N$ is the sequence of all samples in $S$. Each $A_i$ has a list of properties $\vec{x_i}=x_1,x_2,...,x_n$, and $z_i=f(\vec{x_i})$.

$x, z \in R$

I want to create a subset $S^*$ that maximizes $\sum_1^n z_i$, a sum over all $A_i \in S^*$

For example, if I create a subset, $S^*=${$A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3$}, the sum, $\sum_1^n z_i$, will be a certain value, say $\sum_1^n z_i=C$. I want to find the set of constraints on $\vec{x_i}$ to create the subset that maximizes $C$

#### PeterDonis

Mentor
I want to create a subset $S^*$ that maximizes $\sum_1^n z_i$, a sum over all $A_i \in S^*$
This is simple: just remove all the samples that have negative $z_i$.

You're trolling.

#### PeterDonis

Mentor
You're trolling.
No, I'm responding to what you actually wrote. You already said that $z_i$ for each sample is just a number (this number is obtained by computing some function of the numbers $x_i$ in the vector $\vec{x}_i$, but that doesn't change the fact that in the end it's just a number for each sample). So if you have $N$ samples, you have $N$ numbers that sum to some number. If you want to find a subset of the numbers that maximizes the sum, you take out any numbers which are negative. That's obvious from basic arithmetic.

If that answer isn't what you were looking for, then you need to do a better job of describing what you are actually doing and what you are trying to accomplish.

Staff Emeritus
You have written a poorly specified question, and Peter has shown more patience with it than many others would.

#### fahraynk

You have written a poorly specified question, and Peter has shown more patience with it than many others would.
I wrote the following :

I want to create a subset $S^*$ that maximizes $\sum_1^n z_i$, a sum over all $A_i \in S^*$

For example, if I create a subset, $S^*=${$A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3$}, the sum, $\sum_1^n z_i$, will be a certain value, say $\sum_1^n z_i=C$. I want to find the set of constraints on $\vec{x_i}$ to create the subset that maximizes $C$
Then, Peter wrote the following :

This is simple: just remove all the samples that have negative $z_i$.
So, I wrote I want to find the set of constraints on $\vec{x_i}$ to create the subset that maximizes $C$, where $\sum_1^n z_i=C$.
Meaning I want to find constraints on $\vec{x}$ to maximize $C$. And I am asking for an algorithm.

Peter then wrote : just remove the samples with negative $z_i$.
Does he mean by hand? I'm trying to optimize a set of constraints to do that for me. It just felt like a troll.

#### PeterDonis

Mentor
Does he mean by hand?
You don't have to do it "by hand"; you can just program the computer to do it for you.

I'm trying to optimize a set of constraints to do that for me.
Isn't $z_i > 0$ a constraint? What other constraint would you need?

#### fahraynk

I need to find constraints on $\vec{x}$ to maximize the sum of $z_i$ in the subset. You are suggesting I just choose a subset without negative $z_i$.

Yeah, I would like to, but I need to do that by finding the best constraints on $\vec{x_i}$.
Do you know any algorithms that sound like they might help? Do you think something like support vector machine could help?
For example, if I create a subset, $S^*=${$A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3$ }, the sum, $\sum_1^n z_i$, will be a certain value, say $\sum_1^n z_i=C$. I want to find the set of constraints on $\vec{x_i}$ to create the subset that maximizes $C$

#### PeterDonis

Mentor
I would like to, but I need to do that by finding the best constraints on $\vec{x_i}$.
Why would you do it in that roundabout fashion, when you already know how to compute $z_i$ for each $\vec{x}_i$? Just compute $z_i$ for each sample, and throw out the samples where it's negative. Simple algorithm. Done.

#### fahraynk

Why would you do it in that roundabout fashion, when you already know how to compute $z_i$ for each $\vec{x}_i$? Just compute $z_i$ for each sample, and throw out the samples where it's negative. Simple algorithm. Done.
Because I don't know future $z_i$. I only know $\vec{x}$ and I want to predict $z_i$. So I have a dataset of known $z_i, \vec{x_i}$, and I want to find the best constraints on $\vec{x}$ to predict $z_i$.

I stated what I wanted to do, find best subset by finding constraints on $\vec{x}$. Is my question poorly defined because I did not state why?

I ask you, what is the best way to freeze water? You say, well why don't you just start off with ice?
Because I want to make ice, if I could start off with ice, why would I ask how to freeze water?

I dunno dood. I dunno...

#### PeterDonis

Mentor
Because I don't know future $z_i$.
What "future" are you talking about? You said you have a set of samples. You can compute $z_i$ for each of them.

I only know $\vec{x}$ and I want to predict $z_i$.
This makes no sense. You know the function that computes $z_i$ from any $\vec{x}$. There's nothing to predict.

I ask you, what is the best way to freeze water? You say, well why don't you just start off with ice?
No, that's not what I'm saying. I'm saying that if you have water, and you want to freeze it, you already know how to do that: you just cool it down until it freezes.

What you're asking looks to me like "I have water and I know how to freeze it now, but I might get some different water in the future and I want to know how to predict how to freeze that." Why would you do anything different from what you do with the water you have now?

#### PeterDonis

Mentor
I have a dataset of known $z_i$, $\vec{x_i}$, and I want to find the best constraints on $\vec{x}$ to predict $z_i$.
Do you not even see the contradiction here? You say you know $z_i$, but then you say you want to predict $z_i$? Huh?

#### fahraynk

Do you not even see the contradiction here? You say you know $z_i$, but then you say you want to predict $z_i$? Huh?
I want to predict $z_i$ on samples of which I don't know $z_i$.

I have a dataset of $[\vec{x_i},z_i$] points, where I already know $z_i$.

I know $z_i$ is a function of $\vec{x_i}$, but I don't know the function.

I am trying to find constraints on $x_i$. I want to find these constraints to create a subset of my dataset with a maximum $\sum z_i$.

Even if I knew $z_i$ initially, if I wanted to find a function to predict $z_i$ from $x_i$, it would still not be a contradiction. It can give you useful information about $z_i$.

"Can someone recommend an algorithm to optimize this data"

### Physics Forums Values

We Value Quality
• Topics based on mainstream science
• Proper English grammar and spelling
We Value Civility
• Positive and compassionate attitudes
• Patience while debating
We Value Productivity
• Disciplined to remain on-topic
• Recognition of own weaknesses
• Solo and co-op problem solving