Multidimensional machine learning

caseybasichis · May 12, 2012

Hi,

I am working on a project that involves classifying many 'situations' where there are some number of objects and the objects can be defined by the makeup and weighting of their parameters. There are probably 200-2000 parameters per object, 1 to 100 objects and hundreds of situations.

The situations may all be very different from each other and are graded by an unknown process -- no scheme can be assumed.

The goal is to generate new combinations of objects, where only objects above a grade threshold are selected and the combinations are derived from correlations between similar situations, then grade them, and so on.

I've been looking at C++ libraries like Shark Machine Learning and Classias.

*sorry I can't include links yet, they are both at the tip top of google*

I am having trouble understanding which approaches are most appropriate for the problem. I am new to this, which I suppose is apparent in the terminology.

I would really appreciate any thoughts on which, if any, of the approaches in those libraries would be appropriate.

Ultimately I would like to try any approaches that would be appropriate to find the best results.

chiro · May 12, 2012

Hey caseybasichis and welcome to the forums.

I'm afraid that you will need to be a little more specific. Your threshold seems to indicate some quantitative minimum of some sort, but what this pertains to is completely unclear to the readers here.

Also one would need to know how you are creating new objects. The actual method of creating new objects is vital for your ability to get the results. Knowing how to first combine objects and knowing what kind of properties you get upon combination will be critical to how you analyze your data (in terms of techniques) and how you take it from there.

Even if you don't know the exact combination method per se (and are trying to find it), you still need to know what combining objects does in terms of the final object's parameters (which I imagine you have).

Maybe you could quantify what you are trying to achieve (or even qualify) with a bit more specificity because it is very hard to give specific advice for a highly non-specific question.

caseybasichis · May 12, 2012

Hi,

I am a composer. This is related to music.

I have a time-based structure whose narrative is described with a number of continuous -1...1 parameters that vary over a duration. These are meta parameters, and the range is a weight for the parameter's relevance in the context of the section's position in the narrative and musical structures.

The situations span a section. A section may have many situations with only one active at a time. Situations are graded as a whole in terms of aesthetics according to weighted meta parameters. The situations are a hierarchical combination of modules with hundreds, sometimes thousands, of parameters. An early level of the hierarchy is split into 1 to 50 parts, which may be ordered in terms of preference as a secondary sort of meta tag. Each of the branches at that split level stores a 0...1 float value of normalized divisions, so branch 25 of 100 can be compared to branch 5 of 20, and the highest rated branch will always be 1.0.
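To make the normalization concrete, here is a minimal Python sketch. The function name `normalized_rank` and the convention that positions are 1-based with the most preferred branch holding the highest position are assumptions for illustration only, not a description of the actual system.

```python
def normalized_rank(position: int, total: int) -> float:
    """Map a 1-based preference position onto (0, 1].

    Assumption: the most preferred branch has position == total,
    so it always maps to 1.0 regardless of how many branches exist.
    """
    if total < 1 or not 1 <= position <= total:
        raise ValueError("position must satisfy 1 <= position <= total")
    return position / total


# Branches from splits of different sizes become directly comparable.
print(normalized_rank(25, 100), normalized_rank(5, 20))
print(normalized_rank(20, 20))
```

Dividing by the size of the split is what lets a 50-way split and a 5-way split share one scale.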

One situation is active in a section at a time -- the rest are candidates and are used as training data, weighted in some proportion against the active data. They may also be swapped out with other situations from the same section in instances where that might allow pattern propagation across setting parameters in adjacent sections within a range. So if patterns are detected they are encouraged and propagated -- possibly by swapping out the situation with an existing inactive situation from that section if it fulfills a greater number of patterns -- or by some other reasonable method.

Additionally there are many other parametric measurements of the situation's internal and surrounding musical environment. These are defining characteristics of the music.

This hierarchy can be generated any number of ways but is more or less equally distributed currently. In working with these sorts of libraries, I am thinking that there will be processes for integrating the generation stage with the machine learning library itself, for whichever approaches I end up taking. I would like to have control in steering towards or away from a target.

There are 2-10 input parameters for defining the narrative. There are around 20 situations in a section. There are around a hundred sections, but that might increase to 400 or more. The pattern influence area for finding and propagating sections would be 0-16 adjacent sections on either side. Each situation contains a hierarchy of modules and parameters with thousands of nodes -- these parameters are settings, not meta parameters. The situation will likely have 1-5 weighted parameters, one of which is a standard grade that all situations have.

Ultimately other elements can have meta parameters attached as well and a smattering of other data representing facets of the situation -- with the hopes of tagging as many identifiers to the character as possible.

The process starts with no situations. The meta input data is in place, and some of the musical contextual data in and around where the situation is going to occur already exists; some of it is a result of the situation hierarchy and is updated by it.

Situations are generated and weighted with meta parameters. As they accumulate they should influence the later generations. I am looking to choose a full set of parameters for each situation depending on a modular set of motives. The two main interests are detecting and propagating patterns found in parameter data across sections in active situations, and generating settings that are common with or related to heavily weighted situations, but I would also like to know more about what possibilities the tools introduce.

I'd like to know the best bet, but I am also interested in experimenting with the aesthetic effect of the bias that one approach might apply over another in the process. I want to avoid avenues that are not suitable.

So to clarify, the parts are --

the structure and input meta parameters of the absolute time base

the situations, which may be graded by a number of meta parameters that may correspond to the input tags

the hierarchy of modules and parameters - the system should try to preserve larger chunks of the hierarchy while satisfying its other conditions.

I hope that is clear enough. Let me know if there is something I need to clarify. I don't know the jargon yet.

Thanks for taking a look at this, I've been reading about it for the last few days but I could really use some guidance.

chiro · May 12, 2012

I think I understand what you are trying to do, but I have a few questions in terms of the specifics.

They boil down to this: do you want to create a model with your training data based on some specific categorization that you have provided with your training data, or do you want the algorithm to create its own clustering based on some kind of 'hint' you provide or some other information?

From what you have said about the weights, it seems that you may want to give the weights as a foundation for it to build a training model, but I'm not sure because you also said that you want to generate different processes for different scenarios.

The question above basically boils down to what kind of patterns (and forgive me if this is an obvious answer from your post) you are looking for, in terms of whether these are 'hard' or 'soft'. The first case (the 'hard' one) says that you have definitive classifications and that your algorithm will train the model for these, and the 'soft' one says that you introduce leeway for the algorithm to create its own classifications, which you can then look at.

The difference is subtle, but in the jargon, the 'hard' one is known as classification and the 'soft' one is known as clustering. If you classify, you are telling the algorithm specifically what the classes are, and the algorithm will use that in various ways to take your training data and build the model to classify data it was not trained on.

In the clustering example it is a little different. The process will do anything from completely unguided clustering to semi-supervised clustering. The unsupervised clustering will typically use general algorithms to see whatever it thinks a 'good pattern' is. Good techniques for this usually involve probabilistic techniques including entropy and conditional probability (Bayesian methods).

If you have a semi-supervised approach, then this means that you give some hard criteria but leave the algorithm to do the rest.
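The three cases above can be contrasted in a small Python sketch. It is only an illustration under stated assumptions: nearest-centroid classification and k-means stand in for "classification" and "clustering" in general, the function names are hypothetical, and the two-dimensional points are made-up stand-ins for parameter vectors. Supplying `seeds` to `kmeans` is one simple way of giving a hint, not the only form of semi-supervision.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(axis) for axis in zip(*points)]


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def fit_classifier(points, labels):
    """'Hard' case: classes are given, so learn one centroid per label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: centroid(members) for label, members in groups.items()}


def classify(model, point):
    """Assign `point` to the label with the closest learned centroid."""
    names = list(model)
    return names[nearest(point, [model[name] for name in names])]


def kmeans(points, k, seeds=None, iterations=20):
    """'Soft' case: no labels; returns a group index for every point.

    Passing `seeds` (one starting center per group) is a simple
    semi-supervised hint; without it the first k points are used.
    When `seeds` is given, its length decides the number of groups.
    """
    start = seeds if seeds is not None else points[:k]
    centers = [list(c) for c in start]
    for _ in range(iterations):
        assignment = [nearest(p, centers) for p in points]
        for i in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old center if a group is empty
                centers[i] = centroid(members)
    return [nearest(p, centers) for p in points]


points = [(-0.9, -0.8), (0.8, 0.9), (-0.7, -0.9), (0.9, 0.7)]
model = fit_classifier(points, ["reject", "keep", "reject", "keep"])
print(classify(model, (0.6, 0.5)))  # label chosen from the given classes
print(kmeans(points, k=2))          # group indices invented by the algorithm
```

In the classifier the names "reject" and "keep" come from the person supplying the data; in the clustering call the groups are just numbers, and it is up to you to decide afterwards what, if anything, they mean.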

I'm not sure, but it seems you want to generate different processes with different sets of criteria which in my mind seems semi-supervised.

The thing that you have to do is to quantify exactly the criteria for this semi-supervision. I don't know what your mathematical background is, but all this means is coming up with the conditions mathematically for your constraints to look for at the most general level.

What I mean by the above is that if you don't provide any constraints, then basically what you end up with is an unsupervised classification (completely unsupervised), which means you will have no control over what you want.

If you specify mathematically the 'maximum' flexibility of what you want the algorithm to find, then we can help you figure out precisely and specifically what to actually do, not only in terms of the algorithms and packages, but also in the parameters you pass to those packages.

Also, you should know that this is not a trivial step for anyone: things change as you get new results, and as a consequence of not being able to foresee everything when first thinking about these things.

If you can mathematically tell us the minimum number of things that you 'must have' in terms of what has to be enforced, then this will eventually lead you to the solution you require.

This is probably going to be an absolute ***** of a thing for you given your dataset, but you can do it in steps. You can start by looking at small numbers of variables together, then build up the model like building up a house, get rid of things that are unnecessary, and finally you will get to a point where you can start your data mining journey.

Stephen Tashi · May 12, 2012

caseybasichis said:

Hi,

I am working on a project that involves classifying many 'situations'

The goal is to generate new combinations of objects, where only objects above a grade threshold are selected and the combinations are derived from correlations between similar situations, then grade them, and so on.

Grading objects by assigning them a rating or rank is a different problem than merely classifying them. Grading objects according to a single number is a different problem than rating them by a vector of numbers, each of which grades the object on a different aspect. You haven't described your goal clearly.

In my opinion, if you need a mathematical consultant for a real world problem, you shouldn't try to abstract the mathematical details of the problem yourself. Just state the real world problem. The hazy picture that I get of the real world problem is that you want to write software to compose music.

Your links:
Classias software: http://www.chokkan.org/software/classias/
Shark Machine Learning: http://shark-project.sourceforge.net/

caseybasichis · May 12, 2012

It's not really a system that composes music; it's a bit of everything but. It is not relying on specific music theory in any way or trying to adhere to a set of static rules.

So from what I understand from your response, the algorithm creates its own clustering using soft patterns. It is semi-supervised, but there is no numerical gauge of accuracy. I am generating different processes, but their output destinations are the same, and the system isn't concerned about how the parameters were made, other than the fact that the settings of the generation algorithms themselves represent parameters that embody patterns that can be included with the rest.

The 5 input meta parameters are essentially tags with positive and negative weights. I won't actually know the reason why a situation was tagged -- it could be for arbitrary reasons that I can't predict. A continuous -1..1 'good' curve that spans the time duration of the narrative is an example.

The language will be linked like a hierarchical thesaurus so fast < speedy < ludicrous-speed - these relationships are also defined for unknown reasons.

The input value tag weighting acts as a filter to determine which situations are eligible for inclusion when trimming source material for new situations. Situations where similar simple patterns between values are found between adjacent sections are preferred over situations with less pattern correspondence amongst the parameters.

Taking the above 'speed,' I don't want any specific model as to how to make speed faster, but I want more general models on how to derive new values from the surrounding environment. As more iterations are tagged and weighted, the system should adapt to avoid certain parameter settings in the hierarchy based on the previous tagging of the situations in the current and similar sections over many iterations. There are a few main approaches, which are selected with a parameter that is itself within that system of derivation.

The first method is to trim and copy hierarchical branches from situations whose meta parameters and musical contexts have values and value patterns similar to the current context. So if speed is positively weighted, it might pull a large branch from another distant inactive situation that is similarly weighted.
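One possible way to express "similarly weighted" is cosine similarity between tag weightings, sketched below in Python. This is an assumption made for illustration: the tag names, the situation names, and the functions `cosine_similarity` and `best_donor` are hypothetical, and a real system might measure similarity quite differently.

```python
from math import sqrt


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse tag -> weight mappings.

    Tags missing from a mapping are treated as weight 0.0.
    Returns 0.0 when either mapping has no non-zero weight.
    """
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_donor(context: dict, candidates: dict) -> str:
    """Name of the candidate whose tag weights best match the context."""
    return max(
        candidates,
        key=lambda name: cosine_similarity(context, candidates[name]),
    )


# Hypothetical tag weights in the -1..1 range described in the thread.
context = {"speed": 0.9, "good": 0.4}
candidates = {
    "situation_a": {"speed": 0.8, "good": 0.5},
    "situation_b": {"speed": -0.7, "good": 0.2},
    "situation_c": {"good": 1.0},
}
print(best_donor(context, candidates))
```

Because the weights can be negative, a situation tagged as strongly "not fast" scores below one that simply lacks the tag, which seems to match the idea of negative weighting steering the system away from certain settings.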

It might prefer one situation over another 'fast' situation if patterns between any values of n adjacent sections (mostly binary with few n-unique element type patterns) are discovered -- maybe something like an autocorrelation value if that would be appropriate. The pattern matches are general and don't relate specifically to speed, though they can. Ideally the matches could be general and handle inversion, transposition and scaling, but I understand that adds a lot more complexity.
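As a rough illustration of such a measure, the Python sketch below uses the Pearson correlation coefficient, which happens to ignore transposition (adding a constant) and scaling (multiplying by a positive constant), and returns -1 for an exact inversion. Whether this is the right measure for the musical data is an open question; the function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, in [-1, 1].

    Returns 0.0 if either sequence is constant (no pattern to compare).
    """
    mean_x, mean_y = fmean(xs), fmean(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def pattern_match(xs, ys):
    """Match strength in [0, 1]; an inverted pattern counts as a match."""
    return abs(pearson(xs, ys))


def autocorrelation(xs, lag):
    """Correlation of a sequence with itself shifted by `lag` steps."""
    if not 1 <= lag < len(xs) - 1:
        raise ValueError("lag must leave at least two overlapping values")
    return pearson(xs[:-lag], xs[lag:])


# One parameter's value across adjacent sections (made-up numbers).
base = [0.1, 0.5, -0.2, 0.3]
transposed_and_scaled = [2 * x + 0.3 for x in base]
inverted = [-x for x in base]
print(pearson(base, transposed_and_scaled))
print(pearson(base, inverted))
print(autocorrelation([1, -1, 1, -1, 1, -1], lag=2))
```

Taking the absolute value in `pattern_match` is a design choice: it treats an inversion as just as strong a relationship as a direct copy.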

A branch might also be copied, not because its pattern matches another pattern between values in adjacent section situations, but because those values from the copied branch will, to a degree, satisfy a pattern propagation between the situation that is currently being generated and the adjacent sections.

Branches might be pulled from different places, combined with previous generation values, or generated in an unknown way.

There is not a numerical way to verify accuracy. Instead the process is repeated, generating with the aim of exploring new combinations of values that may lean toward pattern trends amongst swarms of parameters, while trying to minimize the occurrence of certain values based on negative weighting in the accumulated tagging.

The goal is to present a very broad spectrum and narrow it down iteratively with positive and negative weighting across different parameters. Weightings are not done to create an ideal system; they occur in context and have no application outside the current section and sections whose tags have similarly weighted input meta parameters. These weights are not like the weights of a neural network, which are designed to be tuned to precise values.

That said a neural network, with its own distinct weights, might be applicable. I don't know. I just wanted to clarify what kind of weighting I am referring to.

Given this, is the 'maximum' flexibility still applicable? I am looking to use all approaches that are relevant, as the approach itself is something that is parameterized. I'm looking to find a handful of the most applicable ones.

I have considered getting a mathematical consultant, but at this stage I want to get a sense of it for my own experience, as I intend to expand upon it in a number of ways. I understand that there is a giant learning curve, but I also know that a great number of these libraries are designed to be user friendly.

ClifDavis · May 18, 2012

It's not completely clear what you want the system you are building to do vs. what you want the machine learning algorithm to do.

I'm kind of guessing here, but it sounds like you may want the machine learning algorithm to learn how to apply tags to your objects or situations based on their parameters. The goal is to decide which situations match based on the tagged patterns they share.

If this understanding is correct then I think your largest and most immediate problem is that your parameters aren't a consistent number of reals, but instead the parameters are variable in number and have a hierarchical structure. Most machine learning algorithms assume that you have an input vector rather than a variable set of trees as input.
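One common workaround, sketched here in Python, is to flatten each tree into path/value pairs and then lay every situation out on the union of all paths seen. This is only a sketch under assumptions: the nested-dict representation, the parameter names, and the functions `flatten` and `vectorize` are hypothetical, and filling absent parameters with a default value conflates "missing" with "set to zero", which may or may not be acceptable for this data.

```python
def flatten(tree, prefix=""):
    """Turn a nested dict of parameters into {"a/b/c": value} pairs."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = float(value)
    return flat


def vectorize(trees, default=0.0):
    """Map differently shaped trees onto one shared, fixed-length layout.

    Returns the sorted column names and one row of floats per tree;
    parameters a tree does not have are filled with `default`.
    """
    flats = [flatten(tree) for tree in trees]
    columns = sorted({path for flat in flats for path in flat})
    rows = [[flat.get(path, default) for path in columns] for flat in flats]
    return columns, rows


# Two made-up situations with different hierarchy shapes.
situation_a = {"osc": {"freq": 0.5, "amp": 0.2}, "mix": 1.0}
situation_b = {"osc": {"freq": 0.1}, "filter": {"cutoff": 0.7}}
columns, rows = vectorize([situation_a, situation_b])
print(columns)
print(rows)
```

Every row now has the same length and column order, which is the input shape most vector-based learning algorithms expect.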
