Calculating similarity between users using MATLAB

  • Context: MATLAB
  • Thread starter: Minaxi
  • Tags: Matlab
SUMMARY

This discussion focuses on calculating user similarity using MATLAB for a dataset of 963 users. The similarity is computed based on age, gender, and occupation using a weighted formula: sim(ui,uj)=0.8*sim(age) + 0.1*sim(gender) + 0.1*sim(occupation). The method utilizes matrix operations to efficiently compute a pairwise similarity matrix, significantly speeding up the process compared to traditional for-loops. Key MATLAB functions such as bsxfun are employed to optimize calculations.

PREREQUISITES
  • Understanding of MATLAB programming and syntax
  • Familiarity with matrix operations in MATLAB
  • Knowledge of user similarity metrics and their calculations
  • Basic understanding of data encoding for categorical variables
NEXT STEPS
  • Explore MATLAB's bsxfun function for efficient array operations
  • Learn about advanced matrix manipulation techniques in MATLAB
  • Research user similarity algorithms in machine learning
  • Investigate data preprocessing methods for categorical data in MATLAB
USEFUL FOR

This discussion is beneficial for data scientists, machine learning practitioners, and MATLAB users interested in user similarity analysis and optimization techniques for large datasets.

Minaxi
Hi all,
I have a database of 963 users.
The records of two of them are:

uid gender occupation age
1 F student 23
2 M teacher 30

Now I need to calculate the similarity of each user with every other user as

sim(ui,uj) = 0.8*sim(age) + 0.1*sim(gender) + 0.1*sim(occupation)

where

sim(age) = 1 - |Ai - Aj|/(agemax - agemin),

with agemax and agemin the maximum and minimum ages over all users,

sim(gender) = 1 if Gi = Gj, else 0,
sim(occupation) = 1 if the occupations are the same, else 0.

Could you tell me the code for this, so that I get a 963-by-963 matrix of similarities?
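To make the formula concrete, here is a hand-checkable calculation for the two sample users. The age range is an assumption: with only these two records, agemin = 23 and agemax = 30, which would force sim(age) to exactly 0, so this sketch assumes a hypothetical full-database range of 18 to 60.

```matlab
% Worked example for users 1 (F, student, 23) and 2 (M, teacher, 30).
% ASSUMPTION: ages in the full database range from 18 to 60 (hypothetical).
agemin = 18; agemax = 60;
simAge    = 1 - abs(23 - 30) / (agemax - agemin);   % 1 - 7/42 = 5/6
simGender = double(strcmp('F', 'M'));               % different genders -> 0
simOcc    = double(strcmp('student', 'teacher'));   % different occupations -> 0
s12 = 0.8*simAge + 0.1*simGender + 0.1*simOcc       % 0.8 * 5/6 = 2/3, about 0.6667
```

Only the age term contributes here, because the two users differ in both gender and occupation.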
 
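The data import depends on the format of your database, but assuming you can load it into a 4-by-963 cell array C (one column per user; rows 1 to 4 holding uid, gender, occupation and age), you can do this with a pair of for loops, e.g.:

Code:
n = size(C, 2);                        % number of users (one column each)
ages = [C{4,:}];                       % numeric ages of all users
agemax = max(ages); agemin = min(ages);
S = zeros(n);                          % preallocate the similarity matrix
for i = 1:n-1
    for j = i+1:n                      % visit each unordered pair once
        simAge = 1 - abs(C{4,i} - C{4,j})/(agemax - agemin);
        S(i,j) = 0.8*simAge + 0.1*strcmp(C{2,i}, C{2,j}) ...
                            + 0.1*strcmp(C{3,i}, C{3,j});
    end
end

This fills only the pairs with i < j, so S is a strictly upper triangular matrix covering every possible pairing. Since the similarity is symmetric and every user is fully similar to itself, S = S + S' + eye(n) gives the full 963-by-963 matrix.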
 
Hi, if I were you, I'd encode all attributes as integers so that every record can be stored as a row of a numeric matrix. I'll assume you've done this and that your database is a matrix "db" of size [numusers 4], with columns uid, gender, occupation and age; I'll let you work out the details of the encoding. First, let me give names to the columns:

Code:
g = db(:,2); % the genders, assuming male=1 female=2
o = db(:,3); % the occupations, assuming integers
a = db(:,4); % the ages
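One way to do the encoding left as an exercise above is the third output of `unique`, which maps each string to the index of its value in the sorted list of distinct values. The three records below are hypothetical (the third user is made up). Codes follow sorted order, so 'F' becomes 1 and 'M' becomes 2, the reverse of the male=1, female=2 assumption; since only equality of codes matters for the similarity, either assignment works. The matrix is stored as double rather than an integer type, which keeps the later age division from being rounded.

```matlab
% Hypothetical raw records: cell arrays of strings plus numeric ids and ages.
uid        = [1; 2; 3];
gender     = {'F'; 'M'; 'F'};
occupation = {'student'; 'teacher'; 'student'};
age        = [23; 30; 41];

% The third output of unique gives an integer code per category.
[~, ~, gcode] = unique(gender);        % 'F' -> 1, 'M' -> 2 (sorted order)
[~, ~, ocode] = unique(occupation);    % 'student' -> 1, 'teacher' -> 2

% One row per user: uid, gender code, occupation code, age (all double).
db = [uid, gcode(:), ocode(:), age];
```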

Now you can compute the pairwise similarity matrix, S, pretty quickly:

Code:
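a = double(a);                                          % integer types would round the division below
agemax = max(a); agemin = min(a);                       % age range over all users
G = bsxfun(@eq, g, g');                                 % 1 where genders match
O = bsxfun(@eq, o, o');                                 % 1 where occupations match
A = 1 - abs(bsxfun(@minus, a, a'))/(agemax - agemin);   % age similarity in [0, 1]
S = 0.8*A + 0.1*G + 0.1*O;                              % weighted numusers-by-numusers similarity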

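Because every step is a whole-array operation instead of an interpreted double loop, this should be noticeably faster than the for-loop version as the number of users grows. In MATLAB R2016b and later, implicit expansion also lets you write a - a' and g == g' directly instead of calling bsxfun.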
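Before relying on the vectorized version for all 963 users, a quick sanity check is to compare it against a plain double loop on synthetic data. The data below is made up and deterministic (two gender codes, twenty occupation codes, ages 18 to 70), and the vectorized formula uses the absolute age difference, so A stays in [0, 1].

```matlab
% Synthetic, deterministic test data (hypothetical): n users.
n = 200;
idx = (1:n)';
g = mod(idx, 2) + 1;              % gender codes 1/2
o = mod(7*idx, 20) + 1;           % occupation codes 1..20
a = 18 + mod(13*idx, 53);         % ages from 18 to 70 (double)

% Vectorized similarity matrix.
agemax = max(a); agemin = min(a);
A = 1 - abs(bsxfun(@minus, a, a'))/(agemax - agemin);
S = 0.8*A + 0.1*bsxfun(@eq, g, g') + 0.1*bsxfun(@eq, o, o');

% Reference result from an explicit double loop over every pair.
Sloop = zeros(n);
for i = 1:n
    for j = 1:n
        Sloop(i,j) = 0.8*(1 - abs(a(i) - a(j))/(agemax - agemin)) ...
                   + 0.1*(g(i) == g(j)) + 0.1*(o(i) == o(j));
    end
end

maxDiff = max(abs(S(:) - Sloop(:)))   % expected to be zero or at round-off level
```

Wrapping each half in tic/toc on the same data is an easy way to measure the speedup on your own machine.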