N% uniqueness of x columns * y rows of output

AI Thread Summary
The discussion revolves around generating unique combinations from multiple lists while maintaining a specified percentage of uniqueness. The user seeks a method to calculate the number of combinations that meet a certain uniqueness threshold and an algorithm to generate only those combinations without exhaustively comparing all possible outputs. The challenge lies in the varying number of lists and items within them, complicating the uniqueness calculations. There is no straightforward algorithm available due to the unpredictable nature of the input lists, and any solution would likely require a complex implementation. Ultimately, the user is looking for efficient ways to handle large datasets while ensuring the desired uniqueness in output.
thallstd
Messages
3
Reaction score
0
Hi all. I need more brain power than I can muster.

This is a two part question for a software function I need to write.

I'm trying to


  • 1) pre-determine how many n% unique ways there are to intermix items from various lists and
    2) produce output of only combinations that are n% unique

For example, assume 8 list with varying items:

  • Size list (3 items): small, medium, large
    Color list (7 items): orange, blue, red, purple, yellow, green, black
    Weight list (7 items): 4oz, 12oz, 1lb, 4lb, 10lb, 15lb
    Age list (5 items): ancient, antique, very old, old, new
    Temp list (3 items): cold, warm, hot
    Animal list (6 items): dog, cat, mouse, bear, horse, zebra
    Thing list (4 items): ball, chair, TV, floor, lamp
    Action list (3 items): run, jump, hop

With these 8 lists there are (3*7*7*5*3*6*4*3) 158,760 combinations of possible unique output. ie

  • 1) small orange 4oz ancient cold dog ball run
    2) small orange 4oz ancient cold dog ball jump
    3) small orange 4oz ancient cold dog ball hop
    4) small orange 4oz ancient cold dog chair run
    5) small orange 4oz ancient cold dog chair jump
    6) small orange 4oz ancient cold dog chair hop
    ... etc

Each list occupies one column in the ouput. So each column reflects 12.5% uniqueness or sameness.

Comparing any of the first 3 outputs to each other, they are all only 12.5% unique since they are identical except for the Action column - run, jump, hop. The same is true if you compare outputs 4 thru 6 with each other.

If we compare output 1 to output 4 it is again only 12.5% unique since it varies only by the Thing column - ball vs chair. If we compare output 1 to outputs 5 or 6, it is 25% unique - the first 6 columns are identical, only the last 2 of output 1 differ from the last 2 of outputs 5 & 6 - ball run vs chair jump and chair hop.

If I were looking for combinations that were at least 25% unique, from these first 6 I could choose (1,5), (1,6), (2,4), (2,6), (3,4) or (3,5).

The first question: Is there is a formula, recursive or not, that can determine how many combinations there are that would be at least n% unique? It would need to work with varied data sets, each having a different number of lists numbering in the hundreds with each list containing from 1 to hundreds, possibly thousands of items.

The second question: Is there an algorithm that would allow the generation of only n% unique output? So instead of generating all possible combinations then going back and comparing them (as I did manually in the example above), to determine as each output row is being generated which columns should be populated with which item from that list.

My fall-back is to generate all possible combinations then go through and programmatically compare each output to all the others, which could take quite a long time to run. Any assistance that could speed up that effort would be greatly appreciated.
 
Physics news on Phys.org
You've done a better job than most people who ask real world questions in math forums, but a few things need to be clarified.

When you ask for the number of "combinations" of these property lists that meet the requirement that they differ by exactly k columns, do you mean "combinations of two"? Or are you interested in larger groupings ? 3? 4? etc.?

( It's simpler to talk about the number of entries where these lists differ than to use a percentage measure.)
 
Hi Stephen,

By combinations I mean output rows, where each row has one column for each list and contains one element from that list.

The # of columns that must be unique will vary depending on how many lists there are and what % of uniqueness is required.

If there are 100 lists and 40% uniqueness is asked for, then each output row would have to differ from every other output row by at least 40 columns. If 200 lists and 40% then 80 columns... etc.
 
thallstd said:
Hi Stephen,

By combinations I mean output rows, where each row has one column for each list and contains one element from that list.

You still aren't making it clear whether "output rows" means two rows (i.e. a pair of rows, as in your example) or whether it might mean a group of 3 rows, each of which differed by, say, 2 columns from the other rows in the group.
 
It is n rows.

If I have 100 lists that average 25 entries each and someone want to generate all possible rows that are 40% unique, how many rows is that?

Question 1: how to compute n.

Question 2. can each column of each row be algorithmically determined?
 
I don't know if you intended to answer my question.

As I understand your problem, there is no simple algorithm to determine the number of combinations of lists that have a given property since we don't know in advance what types of lists will exist. The lists that you get as input presumably won't have an example of every possible list that could exist.

A computer program would have to duplicate the work that you did manually. This is certainly possible. Any sane implementation of it is going to use a variety of different functions, so it won't look like one simple algorithm.
 
I was reading documentation about the soundness and completeness of logic formal systems. Consider the following $$\vdash_S \phi$$ where ##S## is the proof-system making part the formal system and ##\phi## is a wff (well formed formula) of the formal language. Note the blank on left of the turnstile symbol ##\vdash_S##, as far as I can tell it actually represents the empty set. So what does it mean ? I guess it actually means ##\phi## is a theorem of the formal system, i.e. there is a...
Back
Top