I need to construct a term-document matrix for LSA (latent semantic analysis), but I have no term-frequency information (how often each term occurs in each document). All I have are two things: the probability of a term occurring in each document, and the probability of each document for a given term. Can I use these to construct a reliable term-document matrix with little arbitrariness? Can I treat the first probability as a local weight and the second as a global weight?
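
One way to see how the two probabilities could map onto LSA weighting is the log-entropy scheme. Its global weight for term i is computed from p_ij = tf_ij / gf_i, the share of term i's occurrences that fall in document j. If "the probability of each document for a given term" means exactly that share, it can be plugged in without any counts. The sketch below assumes that interpretation. It also assumes P(term | document) can serve directly as the local weight in place of a count-based one such as log(1 + tf_ij), and it stores both inputs as term-by-document lists of lists. The function names and the toy data are illustrative, not part of any library.

```python
import math

def entropy_global_weights(p_doc_given_term):
    """Return one entropy global weight per term (row).

    Assumes p_doc_given_term[i][j] = P(doc j | term i), playing the role of
    p_ij = tf_ij / gf_i in log-entropy weighting; each row should sum to 1.
    g_i = 1 + sum_j p_ij * log(p_ij) / log(n_docs)
    """
    n_docs = len(p_doc_given_term[0])
    if n_docs < 2:
        raise ValueError("entropy weighting needs at least two documents")
    weights = []
    for row in p_doc_given_term:
        # By convention 0 * log(0) = 0, so zero probabilities are skipped.
        entropy_term = sum(p * math.log(p) for p in row if p > 0)
        weights.append(1.0 + entropy_term / math.log(n_docs))
    return weights

def weighted_term_document_matrix(p_term_given_doc, p_doc_given_term):
    """Cell (i, j) = local weight(i, j) * global weight(i).

    Assumption: the local weight is P(term i | doc j) itself, standing in
    for a count-based local weight such as log(1 + tf_ij).
    """
    global_w = entropy_global_weights(p_doc_given_term)
    return [
        [p * global_w[i] for p in row]
        for i, row in enumerate(p_term_given_doc)
    ]

# Toy input derived from counts tf = [[2, 0], [1, 1]] (rows: terms, columns: documents).
p_term_given_doc = [[2 / 3, 0.0], [1 / 3, 1.0]]
p_doc_given_term = [[1.0, 0.0], [0.5, 0.5]]
print(entropy_global_weights(p_doc_given_term))
print(weighted_term_document_matrix(p_term_given_doc, p_doc_given_term))
```

In the toy data, term 0 occurs in only one document and gets a global weight of 1, while term 1 is spread evenly over both documents and gets a global weight of 0 (up to rounding), so its whole row is zeroed. Note that the global weight collapses each row of P(document | term) into a single number per term; that is what lets the second probability act as a global weight rather than a second per-cell value.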

# Constructing a reliable term-document matrix in LSA

