Calculating similarity between users using MATLAB

  Nov 19, 2012 #1
    Hi all,
    I have a database of 963 users.
    The records of two of the users are:

    uid gender occupation age
    1 F student 23
    2 M teacher 30

    Now I need to calculate the similarity of each user with every other user as

    sim(ui,uj)=0.8*sim(age) + 0.1*sim(gender) + 0.1*sim(occupation)

    where sim(age) = 1 - |Ai-Aj|/(agemax-agemin),

    agemax is the maximum age and agemin is the minimum age over all users.

    sim(gender)=1 if Gi=Gj, else 0
    sim(occupation)=1 if the occupations are the same, else 0

    Kindly tell me the code for it so that I can get a 963-by-963 matrix of similarities.
  Nov 22, 2012 #2
    The data import depends on the form of your database, but assuming you can do this and obtain a 4-by-963 cell array, you can compute the similarities using a pair of nested for loops, e.g.

    Code (Text):
    sim = zeros(963);
    for i = 1:962
        for j = i+1:963
            sim(i,j) = [your specific sim function]
        end
    end
    This should give you a strictly upper triangular matrix holding the result for every possible pairing.
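
    To make the loop concrete, here is a minimal sketch with the similarity formula from the question filled in. The three-user data set is hypothetical (the two users from the question plus a made-up third), the variable names are my own, and I am assuming the age term is meant to use the absolute age difference.

    ```matlab
    % Hypothetical sample data: the two users from the question plus a made-up third.
    gender     = {'F', 'M', 'F'};
    occupation = {'student', 'teacher', 'student'};
    age        = [23, 30, 25];

    n = numel(age);
    ageRange = max(age) - min(age);   % agemax - agemin, assumed to be nonzero

    sim = zeros(n);
    for i = 1:n-1
        for j = i+1:n
            % Assumption: absolute age difference, so the result is symmetric.
            simAge = 1 - abs(age(i) - age(j)) / ageRange;
            simGen = strcmp(gender{i}, gender{j});          % 1 if same gender
            simOcc = strcmp(occupation{i}, occupation{j});  % 1 if same occupation
            sim(i,j) = 0.8*simAge + 0.1*simGen + 0.1*simOcc;
        end
    end

    % Optional: mirror the upper triangle and set each self-similarity to 1.
    simFull = sim + sim' + eye(n);
    ```

    With only these three users the age range is 7, so users 1 and 2 (who differ in every attribute and sit at the two age extremes) get a similarity of 0. For the real database, replace the sample data with your imported columns and n becomes 963.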
  Dec 5, 2012 #3
    Hi, if I were you I'd encode all attributes as integers, so that all records can be stored as rows in a matrix of type int. I'll assume that you've done this, and your database is a matrix "db" of size [numusers 4]. I'll let you work out the details of that. First, let me give names to the columns:

    Code (Text):
    g = db(:,2); % the genders, assuming male=1 female=2
    o = db(:,3); % the occupations, assuming integers
    a = db(:,4); % the ages
    Now you can compute the pairwise similarity matrix, S, pretty quickly:

    Code (Text):
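    agemax = max(a); agemin = min(a);   % age range over all users
    G = g*g' ~= 1*2;                    % the product is 2 only for a male/female pair
    O = bsxfun(@minus,o,o') == 0;       % true where the occupations match
    A = 1-abs(bsxfun(@minus,a,a'))/(agemax-agemin);   % abs keeps A symmetric and within [0,1]
    S = 0.8*A + 0.1*G + 0.1*O;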
    This'll give a nice speedup over for-loops when the number of users gets very large.
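
    In case the encoding step is the sticking point, here is a rough end-to-end sketch on a tiny data set. The three records are hypothetical (the two users from the question plus a made-up third), the variable names are my own, and I am assuming the absolute age difference is intended. It uses strcmp and the third output of unique to turn the strings into integer codes, then applies the same bsxfun approach as above.

    ```matlab
    % Hypothetical raw records: the two users from the question plus a made-up third.
    uid        = [1; 2; 3];
    gender     = {'F'; 'M'; 'F'};
    occupation = {'student'; 'teacher'; 'student'};
    age        = [23; 30; 25];

    % Encode gender with the male=1, female=2 convention.
    g = 1 + strcmp(gender, 'F');

    % The third output of unique assigns one integer code per distinct string.
    [occNames, ~, o] = unique(occupation);
    o = o(:);                       % make sure the codes are a column

    db = [uid, g, o, age];          % numusers-by-4 integer-coded database

    % Pairwise similarity matrix, vectorized with bsxfun.
    a = db(:,4);
    G = bsxfun(@eq, db(:,2), db(:,2)');                       % same gender
    O = bsxfun(@eq, db(:,3), db(:,3)');                       % same occupation
    A = 1 - abs(bsxfun(@minus, a, a')) / (max(a) - min(a));   % age similarity
    S = 0.8*A + 0.1*G + 0.1*O;
    ```

    Here S is the full symmetric matrix, with ones on the diagonal since every user matches itself on all three attributes. Comparing with bsxfun(@eq, ...) instead of the g*g' trick is just a choice that does not depend on which integer codes are used.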
