Evaluating similarity between two subsets

  • Thread starter: bagatur
  • Tags: Subsets
AI Thread Summary
The discussion concerns how to evaluate the similarity between two random splits of a set of IDs, in the context of clinical research. The user splits the same set of volunteer IDs into two subsets more than once and seeks a standard way to tell how much one split differs from another. They describe assigning 1 or -1 to each ID according to the subset it falls in, then taking the scalar product of the two resulting vectors and dividing by the number of IDs. A value of 1 means the two splits are identical, -1 means they are identical with the subset labels swapped, and values near 0 mean the splits differ the most. The user asks whether this calculation has a widely accepted name, such as a similarity or correlation coefficient.
bagatur
I have the following problem that I can't figure out.
I have a set of IDs which I pseudo-randomly split into two subsets, A and B. Let's say for the sake of simplicity that I did it only twice, so I have subsets A & B and A' & B'. The sizes are the same across the different splits: not the sizes of A and B, but the sizes of A and A', and of B and B'.
What I need to know is whether there is a standard way of telling how different those splits are. I just want to make sure that subset A does not differ from A' or B' by just a couple of IDs (the sizes of A and B differ by only 1).
 
Well, if you don't have a distance measure between pairs of IDs (if two IDs are either equal or unequal, with no shades of gray), then your problem is simple. Define the similarity between two subsets to be the size of their intersection. If you want to compare subsets of different sizes, one possibility is to define the similarity of A and B to be 2|A ∩ B| / (|A| + |B|).
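The normalized overlap suggested above can be sketched in a few lines of Python. The function name and the sample IDs are made up for illustration, and treating two empty subsets as identical is an assumption, not something stated in the thread.

```python
def overlap_similarity(a, b):
    """Return 2|A ∩ B| / (|A| + |B|) for two collections of IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty subsets are treated as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical volunteer IDs, for illustration only.
A = {101, 102, 103, 104}
A_prime = {101, 102, 105, 106}

print(overlap_similarity(A, A))        # 1.0 (identical subsets)
print(overlap_similarity(A, A_prime))  # 0.5 (two of four IDs shared)
```

The value is 1 when the two subsets coincide and 0 when they share no IDs.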
 
Thank you for your reply!
I don't think I was clear about my problem, but what you suggested is close to what I tried to do.
The IDs are all different from each other (we do clinical research, and the IDs are the identifying numbers of our volunteers).
What I did was assign the number 1 to an ID if it is in subgroup A and -1 if it is in B. Then I did the same for the second selection: assign 1 if the ID is in subgroup A1 and -1 if it is in B1.
So I get a vector for each of the two selections, let's name them C and C1, consisting of 1's and -1's.
Then I just take the scalar product and divide by the total number of IDs.
In the extreme cases, if A is the same as A1 and B is the same as B1, we get 1; and if A is the same as B1 and B is the same as A1, we get -1. The closer this value is to 0, the more they differ.
Now, is there some widely accepted name for this kind of calculation? Like a similarity or correlation coefficient? Or something else?
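The calculation described above can be written out as a short Python sketch. The function name and the example IDs are hypothetical. Two observations follow directly from the definition: the result equals (number of IDs assigned the same way minus number assigned differently) divided by the total, and, because each vector of 1's and -1's has length sqrt(N), it coincides with the cosine similarity of C and C1.

```python
def split_agreement(ids, a, a1):
    """Scalar product of the two +1/-1 membership vectors, divided by N.

    ids: all IDs.
    a:   IDs placed in A by the first split (the rest are taken to be in B).
    a1:  IDs placed in A1 by the second split (the rest are taken to be in B1).
    """
    ids = list(ids)
    a, a1 = set(a), set(a1)
    c = [1 if i in a else -1 for i in ids]    # vector C
    c1 = [1 if i in a1 else -1 for i in ids]  # vector C1
    return sum(x * y for x, y in zip(c, c1)) / len(ids)


# Hypothetical IDs 1..8, for illustration only.
ids = range(1, 9)
A = {1, 2, 3, 4}  # B is then {5, 6, 7, 8}

print(split_agreement(ids, A, {1, 2, 3, 4}))  # 1.0  (same split)
print(split_agreement(ids, A, {5, 6, 7, 8}))  # -1.0 (labels swapped)
print(split_agreement(ids, A, {1, 2, 5, 6}))  # 0.0  (half the IDs moved)
```

In the last case four of the eight IDs change sides, so agreements and disagreements cancel and the value is 0.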
 