Can someone recommend an algorithm to optimize this data

fahraynk · Nov 30, 2018

I have a data set of samples, and I made up some variables that filter out unwanted samples from the data set.

Say each sample has values ##\vec{x}=x_1, x_2,..., x_n##, and I know the values of ##\vec{x}## for each sample.
Also, if I sum up all of the samples, I get a value, ##Z##, which tells me something about the samples.

Now I want to find a series of parameters, ##a_i, b_i## such that :
$$
(a_1<x_1<b_1),\,\,\,\, (a_2<x_2<b_2)\,\,\,\, ... \,\,\,\,(a_n<x_n<b_n)
$$

These parameters essentially filter out unwanted samples, to affect the value ##Z##. So, for example, sample ##1## has ##x_1=0##, so if I set ##0<x_1<2##, then sample##1## is left out, and only samples with ##x_1 \in (0,2)## are included.

How can I optimize the parameters ##a_i, b_i##, such that I get the best value of ##Z## ?

##Z## is interesting also, I want the highest ##Z## with the largest number of samples ##(N)##, so it's probably better to multiply ##Z*N## and fit for this.

I don't think this is a standard regression type of problem. I tried a brute force solution, but it will take way too long. Can anyone recommend an algorithm, or a new way to represent the problem so that it can be solved by some algorithm?

PeterDonis · Nov 30, 2018

fahraynk said:

How can I optimize the parameters ##a_i##, ##b_i##, such that I get the best value of ##Z## ?

By "best" do you mean "highest"?

Also, can any of the ##x_i## be negative? If not, it should be obvious that removing any samples will decrease ##Z##.

fahraynk said:

I want the highest ##Z## with the largest number of samples ##(N)##, so it's probably better to multiply ##Z*N## and fit for this.

You need to pick what you want to optimize for. Optimizing for the highest ##Z## is not the same as optimizing for the highest ##Z * N##. You can only optimize for one thing, so you need to pick which.

fahraynk · Nov 30, 2018

Thank you for your reply,

PeterDonis said:

By "best" do you mean "highest"?
You need to pick what you want to optimize for. Optimizing for the highest ##Z## is not the same as optimizing for the highest ##Z * N##. You can only optimize for one thing, so you need to pick which.

Let me define ##Z## better.
Let each sample ##i## have a value, ##z_i##, then ##Z## is the average of ##z##, or $$Z=\frac{\sum_1^N z_i}{N} \,\,\text{where N is the number of samples}$$

It's my hypothesis that the values in ##\vec{x_i}## for each sample say something about the value of ##z_i##.
##a_i, b_i, z_i## can be positive or negative as well.

I am interested in both ##Z*N## and ##\text{max}(Z)##. Somewhat more interested in ##Z*N## though.

PeterDonis · Nov 30, 2018

fahraynk said:

Let each sample ##i## have a value, ##z_i##,

I thought you said each sample had multiple values ##x_i##. Is ##z_i## just one of those, or an average of them, or a sum of them, or what?

fahraynk · Nov 30, 2018

PeterDonis said:

I thought you said each sample had multiple values ##x_i##. Is ##z_i## just one of those, or an average of them, or a sum of them, or what?

##z_i = f(\vec{x_i})##, the problem is the ##x_i## are somewhat piecewise, so I don't think I can do a simple regression. For example, say ##x_1 \in [-22,-18] \cup [+18,+22]##, and ##z_i## might be higher in the cases where ##x_i\in[-22,-20] \cup [20,22]##.

PeterDonis · Nov 30, 2018

fahraynk said:

##z_i = f(\vec{x_i})##,

Ok, so ##z_i## is just some number computed from the vector ##\vec{x}_i##. And ##Z## is the average of all those numbers. Got it.

fahraynk said:

I am interested in both ##Z*N## and ##\text{max}(Z)##. Somewhat more interested in ##Z*N## though.

From the above, ##Z * N## would just be the sum of all the ##z_i##. Are you saying you're interested in how to maximize that sum?

fahraynk · Nov 30, 2018

PeterDonis said:

Ok, so ##z_i## is just some number computed from the vector ##\vec{x}_i##. And ##Z## is the average of all those numbers. Got it.
From the above, ##Z * N## would just be the sum of all the ##z_i##. Are you saying you're interested in how to maximize that sum?

Yeah, I would like to find that sum, ##ZN##, and as another problem find the maximum ##Z##

PeterDonis · Nov 30, 2018

fahraynk said:

I would like to find that sum, ##ZN##,

Just "finding the sum" is easy: compute it. Do you mean you want to find the combination of coeficients that maximizes the sum ##ZN##?

Also, what constraints are there on the coefficients? If the coefficients can be any real numbers whatever, there is no way to maximize either ##ZN## or ##Z##: you can always just pick larger coefficients since there is no largest real number.

fahraynk · Nov 30, 2018

PeterDonis said:

Just "finding the sum" is easy: compute it. Do you mean you want to find the combination of coefficients that maximizes the sum ##ZN##?

Also, what constraints are there on the coefficients? If the coefficients can be any real numbers whatever, there is no way to maximize either ##ZN## or ##Z##: you can always just pick larger coefficients since there is no largest real number.

Yes, I want to find the coefficients that maximize the sum, and it would be nice to find the coefficients that maximize ##Z## as well.
The coefficients can be anything, but if you make them too weird then there are no samples which match them. So for example if I set ##x_1>1000## then ##N\rightarrow 0##

PeterDonis · Nov 30, 2018

fahraynk said:

The coefficients can be anything, but if you make them too weird then there are no samples which match them.

Where are the samples coming from?

fahraynk said:

if I set ##x_1>1000## then ##N \rightarrow 0##

Why?

fahraynk · Nov 30, 2018

PeterDonis said:

Where are the samples coming from?
Why?

Think of it like a population. if ##x_1## is the amount of money a rich person makes for them to be able to buy a car, ##x_1>100,000,000,000## will have no people if no one in the population makes that much money.

PeterDonis · Dec 1, 2018

fahraynk said:

if ##x_1## is the amount of money a rich person makes for them to be able to buy a car, ##x_1>100,000,000,000## will have no people if no one in the population makes that much money.

This doesn't answer my question. Where are the samples coming from? What kind of data? Where from?

I can't help you if I don't understand where your data is coming from.

fahraynk · Dec 1, 2018

Do you know any algorithms that sound like they might help? Do you think something like support vector machine could help?

I'm looking for an algorithm to create a subset ##S^* \subset S## , where ##S## is the set of all samples in dataset.

S is of size ##N>n##, ##S^*## is of size ##n##.

{##A_i##}##_1^N## is the sequence of all samples in ##S##. Each ##A_i## has a list of properties ##\vec{x_i}=x_1,x_2,...,x_n##, and ##z_i=f(\vec{x_i})##.

##x, z \in R##

I want to create a subset ##S^*## that maximizes ##\sum_1^n z_i##, a sum over all ##A_i \in S^*##

For example, if I create a subset, ##S^*=##{##A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3##}, the sum, ##\sum_1^n z_i##, will be a certain value, say ##\sum_1^n z_i=C##. I want to find the set of constraints on ##\vec{x_i}## to create the subset that maximizes ##C##

PeterDonis · Dec 1, 2018

fahraynk said:

I want to create a subset ##S^*## that maximizes ##\sum_1^n z_i##, a sum over all ##A_i \in S^*##

This is simple: just remove all the samples that have negative ##z_i##.

fahraynk · Dec 1, 2018

You're trolling.

PeterDonis · Dec 1, 2018

fahraynk said:

You're trolling.

No, I'm responding to what you actually wrote. You already said that ##z_i## for each sample is just a number (this number is obtained by computing some function of the numbers ##x_i## in the vector ##\vec{x}_i##, but that doesn't change the fact that in the end it's just a number for each sample). So if you have ##N## samples, you have ##N## numbers that sum to some number. If you want to find a subset of the numbers that maximizes the sum, you take out any numbers which are negative. That's obvious from basic arithmetic.

If that answer isn't what you were looking for, then you need to do a better job of describing what you are actually doing and what you are trying to accomplish.

Vanadium 50 · Dec 1, 2018

You have written a poorly specified question, and Peter has shown more patience with it than many others would.

fahraynk · Dec 1, 2018

Vanadium 50 said:

You have written a poorly specified question, and Peter has shown more patience with it than many others would.

I wrote the following :

fahraynk said:

I want to create a subset ##S^*## that maximizes ##\sum_1^n z_i##, a sum over all ##A_i \in S^*##

For example, if I create a subset, ##S^*=##{##A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3##}, the sum, ##\sum_1^n z_i##, will be a certain value, say ##\sum_1^n z_i=C##. I want to find the set of constraints on ##\vec{x_i}## to create the subset that maximizes ##C##

Then, Peter wrote the following :

PeterDonis said:

This is simple: just remove all the samples that have negative ##z_i##.

So, I wrote I want to find the set of constraints on ##\vec{x_i}## to create the subset that maximizes ##C##, where ##\sum_1^n z_i=C##.
Meaning I want to find constraints on ##\vec{x}## to maximize ##C##. And I am asking for an algorithm.

Peter then wrote : just remove the samples with negative ##z_i##.
Does he mean by hand? I'm trying to optimize a set of constraints to do that for me. It just felt like a troll.

PeterDonis · Dec 1, 2018

fahraynk said:

Does he mean by hand?

You don't have to do it "by hand"; you can just program the computer to do it for you.

fahraynk said:

I'm trying to optimize a set of constraints to do that for me.

Isn't ##z_i > 0## a constraint? What other constraint would you need?

fahraynk · Dec 1, 2018

I need to find constraints on ##\vec{x}## to maximize the sum of ##z_i## in the subset. You are suggesting I just choose a subset without negative ##z_i##.

Yeah, I would like to, but I need to do that by finding the best constraints on ##\vec{x_i}##.

fahraynk said:

Do you know any algorithms that sound like they might help? Do you think something like support vector machine could help?
For example, if I create a subset, ##S^*=##{##A_i : x_1 \in [-30,-20.5] \cup [20.5,30], ||\vec{x_i}||>3## }, the sum, ##\sum_1^n z_i##, will be a certain value, say ##\sum_1^n z_i=C##. I want to find the set of constraints on ##\vec{x_i}## to create the subset that maximizes ##C##

PeterDonis · Dec 1, 2018

fahraynk said:

I would like to, but I need to do that by finding the best constraints on ##\vec{x_i}##.

Why would you do it in that roundabout fashion, when you already know how to compute ##z_i## for each ##\vec{x}_i##? Just compute ##z_i## for each sample, and throw out the samples where it's negative. Simple algorithm. Done.

fahraynk · Dec 1, 2018

PeterDonis said:

Why would you do it in that roundabout fashion, when you already know how to compute ##z_i## for each ##\vec{x}_i##? Just compute ##z_i## for each sample, and throw out the samples where it's negative. Simple algorithm. Done.

Because I don't know future ##z_i##. I only know ##\vec{x}## and I want to predict ##z_i##. So I have a dataset of known ##z_i, \vec{x_i}##, and I want to find the best constraints on ##\vec{x}## to predict ##z_i##.

I stated what I wanted to do, find best subset by finding constraints on ##\vec{x}##. Is my question poorly defined because I did not state why?

I ask you, what is the best way to freeze water? You say, well why don't you just start off with ice?
Because I want to make ice, if I could start off with ice, why would I ask how to freeze water?

I don't know dood. I dunno...

PeterDonis · Dec 1, 2018

fahraynk said:

Because I don't know future ##z_i##.

What "future" are you talking about? You said you have a set of samples. You can compute ##z_i## for each of them.

fahraynk said:

I only know ##\vec{x}## and I want to predict ##z_i##.

This makes no sense. You know the function that computes ##z_i## from any ##\vec{x}##. There's nothing to predict.

fahraynk said:

I ask you, what is the best way to freeze water? You say, well why don't you just start off with ice?

No, that's not what I'm saying. I'm saying that if you have water, and you want to freeze it, you already know how to do that: you just cool it down until it freezes.

What you're asking looks to me like "I have water and I know how to freeze it now, but I might get some different water in the future and I want to know how to predict how to freeze that." Why would you do anything different from what you do with the water you have now?

PeterDonis · Dec 1, 2018

fahraynk said:

I have a dataset of known ##z_i##, ##\vec{x_i}##, and I want to find the best constraints on ##\vec{x}## to predict ##z_i##.

Do you not even see the contradiction here? You say you know ##z_i##, but then you say you want to predict ##z_i##? Huh?

fahraynk · Dec 1, 2018

PeterDonis said:

Do you not even see the contradiction here? You say you know ##z_i##, but then you say you want to predict ##z_i##? Huh?

I want to predict ##z_i## on samples of which I don't know ##z_i##.

I have a dataset of ##[\vec{x_i},z_i##] points, where I already know ##z_i##.

I know ##z_i## is a function of ##\vec{x_i}##, but I don't know the function.

I am trying to find constraints on ##x_i##. I want to find these constraints to create a subset of my dataset with a maximum ##\sum z_i##.

Even if I knew ##z_i## initially, if I wanted to find a function to predict ##z_i## from ##x_i##, it would still not be a contradiction. It can give you useful information about ##z_i##.

PeterDonis · Dec 1, 2018

fahraynk said:

I want to predict ##z_i## on samples of which I don't know ##z_i##.

You can't, unless there is some pattern in the samples. But whether there is a pattern in the samples is a completely separate question from how to maximize the sum of the ##z_i##.

fahraynk said:

I know ##z_i## is a function of ##\vec{x_i}##, but I don't know the function.

Then, again, the only way to find it out is to look for some pattern in the samples; in short, trial and error. But, again, that's a completely separate question from how to maximize the sum of the ##z_i##.

fahraynk said:

I am trying to find constraints on ##x_i##. I want to find these constraints to create a subset of my dataset with a maximum ##\sum z_i##.

Finding the subset of the dataset you already have that maximizes ##\sum z_i## is easy: as I said before, just throw out all the samples with negative ##z_i##.

Finding a set of constraints on the ##\vec{x}_i## that pick out that particular subset of the samples is a matter of trial and error; there is no algorithm for doing it. Plus, even if you found such a function on the known samples, you would have no way of knowing whether the same function also works for the samples you're going to get in the future.

Also, finding constraints on the ##\vec{x}_i## that pick out the subset of known samples that maximizes ##\sum z_i## is a separate question from finding the function ##\vec{x}_i \rightarrow z_i## that let's you compute ##z_i## for each sample. Finding the latter is also a process of trial and error, as noted above, but it's a different process.

Can someone recommend an algorithm to optimize this data

Is A.I. more than the sum of its parts?

AI vs. Humans as Processors in an Environment

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Can someone recommend an algorithm to optimize this data

Similar threads