Graduate Algorithm creates representative set of data

Summary
The discussion focuses on an algorithm designed to select a representative subset of data from a table, based on the variance of the cardinality column. The algorithm iterates through rows, adding them until the variance drops to a specified threshold, with the goal of producing a set whose variance stays above a defined value. Participants suggest optimizing the process with a binary search, which checks the variance at the midpoint of the current range to narrow down the representative set efficiently. The need for a more robust mathematical tool to filter out less representative elements is also raised, since static thresholds on cardinality are ineffective when the data's order of magnitude changes from day to day. Overall, the conversation emphasizes improving the algorithm's efficiency while preserving its effectiveness at representing the data.
mundek88
Hi all,
I have an algorithm to analyze and make easier to implement in a programming language (Python). We have a table with data and we want to select only a representative part of it.

It looks like:
ID_PRODUCT | CARDINALITY | SET VARIANCE WITH THIS ELEMENT AND ABOVE
10         | 110         | 400
11         | 90          | 350
12         | 80          | 300
...        | ...         | ...

* Variance is calculated over the cardinality column.

The algorithm works as follows: iterate over the rows from the top of the table; in each loop, add a new row and compute the variance of the cardinality column. Stop iterating when the variance is equal to or less than a specified threshold (so we ultimately produce a set of rows whose variance is greater than X), then return the created (now representative) set.
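A minimal sketch of that legacy procedure, assuming the table is a list of `(product_id, cardinality)` tuples in table order; the function name `representative_set` and the reading of the stopping rule (return the last prefix whose variance is still above the threshold) are my assumptions, not from the original description:

```python
from statistics import pvariance

def representative_set(rows, threshold):
    # rows: list of (product_id, cardinality) tuples in table order.
    # Grow a prefix one row at a time; as soon as the population
    # variance of the cardinality column falls to the threshold or
    # below, return the previous prefix, whose variance is still
    # above the threshold (one reading of "variance bigger than X").
    cardinalities = []
    for i, (_, cardinality) in enumerate(rows):
        cardinalities.append(cardinality)
        # Variance needs at least two values to be meaningful.
        if len(cardinalities) >= 2 and pvariance(cardinalities) <= threshold:
            return rows[:i]   # last prefix with variance > threshold
    return rows[:]            # variance never dropped: keep everything
```

This recomputes the variance from scratch on every iteration, which is O(n²) over the whole table; the binary-search idea discussed below avoids that when the cumulative variance column is already stored.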

Question:
This is a legacy solution and it is hard for me to say how we can do it better. Is there a mathematical tool that cuts away barely representative elements? We cannot filter statically on cardinality (e.g., just take rows with cardinality > 50), because from day to day the order of magnitude can change.

Thanks in advance!
 
I'm guessing that English is not your first language.
In any case, I think I understand roughly what you are looking for. So I'll describe what I understand and if I miss your point, let me know.

You have a table of N rows and 3 columns. One field (column) is a numeric "variance", and the rows are ordered by descending variance.
Basically, you are looking for a function that will discover how many of these rows have a variance greater than a specified value. A declaration of this function might look something like this: int RecordCount(array Table, int RowCount, float LowVariance)

The most obvious optimization here would be to perform a binary search.
For example, if RowCount is 120, the answer will be a value from 0 to 120, so take the midpoint of that range (60) and check the variance in that row.
If the variance at row 60 is less than or equal to LowVariance, then you know your answer lies in 0 to 59, so repeat the process on that range.
If the variance at row 60 is greater than LowVariance, then your answer lies in 60 to 120, so repeat the process on that range.
Eventually you will have a range whose endpoints are the same value, which you then return as your answer.
Be careful how you round when taking the midpoint: make sure you never start a new iteration with the same range you used in the previous iteration, or the search will never terminate.
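The steps above can be sketched in Python as follows; the names `record_count` and `low_variance` mirror the hypothetical declaration given earlier, and the input is assumed to be just the cumulative-variance column, sorted descending as in the table:

```python
def record_count(variances, low_variance):
    # variances: cumulative set variances per row, sorted descending.
    # Returns how many leading rows have variance strictly greater
    # than low_variance, by repeatedly halving the candidate range.
    lo, hi = 0, len(variances)      # the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if variances[mid] > low_variance:
            lo = mid + 1            # rows 0..mid all qualify
        else:
            hi = mid                # the answer is at most mid
    return lo
```

The `(lo + hi) // 2` midpoint together with the `lo = mid + 1` step guarantees the range shrinks on every iteration, which is exactly the rounding pitfall warned about above.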
 
