MATLAB Calculating similarity between users using matlab

  • Thread starter: Minaxi
  • Tags: Matlab
AI Thread Summary
To calculate the similarity between every pair of users in a database of 963 individuals, a weighted formula is used: age similarity counts for 80%, and gender and occupation similarity for 10% each. Age similarity is computed as sim(age) = 1 - [|Ai - Aj| / (agemax - agemin)], where agemax and agemin are the maximum and minimum ages in the dataset. Gender similarity is binary, returning 1 if the genders match and 0 otherwise; occupation similarity likewise returns 1 for matching occupations and 0 for different ones. To compute the 963x963 similarity matrix efficiently, vectorized matrix operations are recommended over nested loops. The attributes should be encoded as integers so that all records fit in a single numeric matrix. The code provided builds the similarity matrix with matrix operations and broadcasting functions, which is faster than explicit loops for large datasets.
Minaxi
Hi all,
I have a database of 963 users.
The records of two users are:

uid gender occupation age
1 F student 23
2 M teacher 30

Now I need to calculate the similarity of each user with every other user as

sim(ui,uj)=0.8*sim(age) + 0.1*sim(gender) + 0.1*sim(occupation)


where sim(age) = 1 - [|Ai - Aj| / (agemax - agemin)];

agemax is the maximum age and agemin is the minimum age over all users.

sim(gender) = 1 if Gi = Gj, else 0
sim(occupation) = 1 if the occupations are the same, else 0.


Kindly tell me the code for it so that I can get a matrix of similarities (963*963).
 
The data import depends on the form of your database, but assuming you can do this and obtain a 4-by-963 cell array, you can compute the similarities using a pair of for loops, e.g.

Code:
n = 963;                 % number of users
sim = zeros(n);          % preallocate the n-by-n result
for i = 1:n-1
    for j = i+1:n
        sim(i,j) = pairSim(i, j);  % pairSim stands in for your specific sim function
    end
end

This should give you a strictly upper triangular matrix holding the result for every possible pairing; the diagonal and the lower triangle stay zero.
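As a rough sketch of how the placeholder could be filled in, the loop below spells out the three terms of the formula from the question on a tiny made-up data set. The vector names (gender, occup, age), the integer codes and the third user are assumptions for illustration only; the absolute age difference is used so that sim(i,j) equals sim(j,i), and the ages are assumed not to be all equal (otherwise the age range is zero).

```matlab
% Toy data: three users, attributes already held as numeric column vectors.
gender = [2; 1; 2];      % assumed coding: 1 = M, 2 = F
occup  = [1; 2; 1];      % assumed coding: 1 = student, 2 = teacher
age    = [23; 30; 30];

n = numel(age);
ageRange = max(age) - min(age);   % agemax - agemin
sim = zeros(n);
for i = 1:n-1
    for j = i+1:n
        simAge = 1 - abs(age(i) - age(j)) / ageRange;
        simGen = double(gender(i) == gender(j));
        simOcc = double(occup(i) == occup(j));
        sim(i,j) = 0.8*simAge + 0.1*simGen + 0.1*simOcc;
    end
end
% Mirror the upper triangle and set the diagonal: every user has
% similarity 1 with itself under this formula.
sim = sim + sim' + eye(n);
```

Only the upper triangle is computed in the loop, so the work is roughly halved; the last line fills in the rest if a full symmetric matrix is wanted.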
 
Hi, if I were you I'd encode all attributes as integers, so that all records can be stored as rows in a matrix of type int. I'll assume that you've done this, and your database is a matrix "db" of size [numusers 4]. I'll let you work out the details of that. First, let me give names to the columns:

Code:
g = db(:,2); % the genders, assuming male=1 female=2
o = db(:,3); % the occupations, assuming integers
a = db(:,4); % the ages
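In case it helps with the encoding step, here is one possible way to build such a matrix. It assumes the raw columns have already been imported as cell arrays of strings plus numeric vectors; the variable names (uid, genStr, occStr, age) are made up for this sketch, and the two rows are the sample records from the question. Note that db comes out as a double matrix here rather than an integer type, which keeps later divisions exact.

```matlab
% Hypothetical raw columns for the two sample records.
uid    = [1; 2];
genStr = {'F'; 'M'};
occStr = {'student'; 'teacher'};
age    = [23; 30];

g = 1 + strcmp(genStr, 'F');        % male = 1, female = 2
[occNames, ~, o] = unique(occStr);  % o(k) is the index of occStr{k} in the sorted list occNames
o = o(:);                           % make sure the codes are a column vector
db = [uid, g, o, age];              % numusers-by-4 numeric matrix
```

Keeping occNames around lets you map an occupation code back to its label later, e.g. occNames{o(1)}.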

Now you can compute the pairwise similarity matrix, S, pretty quickly:

Code:
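a = double(a);                      % avoid integer arithmetic in the age term
agemax = max(a); agemin = min(a);   % age range over all users
G = bsxfun(@eq, g, g');             % 1 where genders match, 0 otherwise
O = bsxfun(@eq, o, o');             % 1 where occupations match, 0 otherwise
A = 1 - abs(bsxfun(@minus, a, a'))/(agemax - agemin);  % absolute age gap scaled to [0,1]
S = 0.8*A + 0.1*G + 0.1*O;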

This'll give a nice speedup over for-loops when the number of users gets very large.
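On MATLAB R2016b or later (and in Octave), implicit expansion makes the bsxfun calls optional, since a column vector can be combined directly with a row vector. A minimal sketch on a made-up three-user matrix, using the absolute age difference so that S is symmetric; the data values are assumptions for illustration only:

```matlab
% Toy database: columns are uid, gender code, occupation code, age.
db = [1 2 1 23;
      2 1 2 30;
      3 2 1 30];
g = db(:,2); o = db(:,3); a = db(:,4);

ageRange = max(a) - min(a);                 % agemax - agemin
A = 1 - abs(a - a') / ageRange;             % n-by-n via implicit expansion
S = 0.8*A + 0.1*(g == g') + 0.1*(o == o');  % weighted pairwise similarity
```

A quick sanity check on the result is that S equals its transpose and has ones on the diagonal, because every user matches itself on all three attributes.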
 