
Question about Applications 2 in Leon

  1. Sep 11, 2012 #1
    Hi all. I'm studying from Leon's Linear Algebra with Applications. I'm wondering if one of his examples has an error.

    In Application 2, he's talking about searching databases. He says we should imagine a database of m documents and n possible search words. Then he says that this can be put into an mxn matrix in which each column represents a book: the jth entry of the column is 1 if that book contains the jth word, 0 if it doesn't.

    He then says a "search vector" lives in R^m, and it's a column vector whose jth entry is 1 if you are searching for the jth word.

    Does he have his m's and n's mixed up? It seems to me like if you have m documents and each column of the matrix represents one, you need an nxm matrix. Similarly, you can't search through n words by using a vector in R^m, correct? Does he mean you take an nxm matrix and multiply its transpose by the vector in R^n?
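The dimension bookkeeping in the question above can be checked with a small sketch. This is only an illustration under assumed toy data (3 words, 2 documents); the names `shape`, `transpose`, and `matvec` are hypothetical helpers, not anything from Leon's text. It stores one document per column, so the matrix is nxm, and shows that its transpose times a search vector in R^n gives one score per document, while multiplying the untransposed matrix by that vector fails on shape.

```python
def shape(matrix):
    """Return (rows, cols) of a list-of-lists matrix."""
    return len(matrix), len(matrix[0])


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


def matvec(matrix, vector):
    """Multiply a matrix by a column vector, checking that the shapes agree."""
    rows, cols = shape(matrix)
    if cols != len(vector):
        raise ValueError(
            f"cannot multiply a {rows}x{cols} matrix by a vector in R^{len(vector)}"
        )
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Assumed toy data: n = 3 words (rows), m = 2 documents (columns).
A = [
    [1, 0],  # word 0 appears in document 0 only
    [1, 1],  # word 1 appears in both documents
    [0, 1],  # word 2 appears in document 1 only
]
x = [1, 0, 1]  # search vector in R^n: looking for words 0 and 2

# (m x n) times a vector in R^n gives a vector in R^m: one score per document.
scores = matvec(transpose(A), x)
print(shape(A), scores)
```

With this layout the search vector has one entry per word, and the result has one entry per document, which is the shape argument made in the post.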
  2. jcsd
  3. Sep 17, 2012 #2
    Any ideas?!
  4. Sep 18, 2012 #3


    Science Advisor

    I think he has it right: remember you have m records with n attributes, so you have m rows (one row for each record) and n columns (one column for each attribute).

    Now let's say you have a column vector with some entries set to 1. To search for several words at once, set the entry for each word you want to 1 and the rest to zero, then multiply the matrix by that vector. Each entry of the product is the number of searched words found in the corresponding record, so a value > 0 means that record had at least one hit, and a value equal to the number of words you flagged means it had all of them.

    The author is basically checking one word at a time: he sets one entry of the search vector to 1 and the rest to zero, so a 1 in a record's entry of the result means the word was found in that record and a 0 means it wasn't.
  5. Sep 18, 2012 #4
    But if you have n search words, how can that be embedded in a vector in R^m?
  6. Sep 18, 2012 #5


    Science Advisor

    You can either do n individual searches with n individual search vectors (each with only 1 non-zero entry), or return the "number" of hits in one go by flagging every term you are interested in with a 1 and keeping everything else zero.

    Imagine three terms: a normal search for the first term would be [1 0 0]^T. To do three individual searches, one for each term, you have [1 0 0]^T, [0 1 0]^T and [0 0 1]^T respectively. To find out how many of the terms appear in each document, considering all terms at once, your vector will be [1 1 1]^T. If we have, say, 6 observations and we get the output [1 1 1 3 2 1]^T, then it means that the fourth has all three terms, the fifth has 2, and the rest have only 1 each.
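The three-term example can be reproduced with a short sketch. The 6x3 matrix `D` below is made up for illustration (one row per observation, one column per term) and was chosen only so that the all-terms search gives the output quoted above; `search` is a hypothetical helper name.

```python
# Assumed toy data: row i is an observation (document), column j is a term.
D = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
]


def search(docs, query):
    """Matrix-vector product: entry i counts the flagged terms found in row i."""
    return [sum(d * q for d, q in zip(row, query)) for row in docs]


# Individual searches: one non-zero entry per search vector.
first_term = search(D, [1, 0, 0])
second_term = search(D, [0, 1, 0])
third_term = search(D, [0, 0, 1])

# Combined search: flag every term, and each entry becomes a hit count.
all_terms = search(D, [1, 1, 1])

print(first_term)  # [1, 0, 0, 1, 1, 0]
print(all_terms)   # [1, 1, 1, 3, 2, 1]
```

The combined result is just the entrywise sum of the three individual results, which is why flagging several terms returns a count rather than a yes/no answer.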