First of all, it's "tensor product", not "direct product". If U, V and W are vector spaces, a bilinear function \tau:U\times V\rightarrow W is said to be a tensor product if for every vector space X and every bilinear function \sigma:U\times V\rightarrow X, there's a unique linear function f:W\rightarrow X such that f(\tau(u,v))=\sigma(u,v) for all u in U and all v in V.
W is then said to be the tensor product of U and V, and we write U\otimes V instead of W. We also write u\otimes v instead of \tau(u,v), and when U and V are inner product spaces, we define the scalar product on U\otimes V by \langle u\otimes v,u'\otimes v'\rangle_{U\otimes V}=\langle u,u'\rangle_U\langle v,v'\rangle_V, extended to sums of such vectors by (sesqui)linearity.
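To make this concrete: in finite dimensions, once you pick bases, the Kronecker product of coordinate vectors is one realization of u\otimes v. The snippet below is only a numerical sanity check under that assumption (the dimensions, the random vectors and the helper random_vector are made up for illustration). It checks bilinearity in the first argument and the scalar product formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_vector(dim):
    """Return a random complex vector of the given dimension (illustrative helper)."""
    return rng.normal(size=dim) + 1j * rng.normal(size=dim)

# Assumed dimensions of U and V, chosen only for this example.
dim_u, dim_v = 2, 3
u, u2 = random_vector(dim_u), random_vector(dim_u)
v, v2 = random_vector(dim_v), random_vector(dim_v)

# In coordinates, np.kron(u, v) plays the role of "u tensor v".
# Bilinearity in the first argument: (a u + b u') tensor v = a (u tensor v) + b (u' tensor v).
a, b = 2.0 - 1.0j, 0.5j
lhs = np.kron(a * u + b * u2, v)
rhs = a * np.kron(u, v) + b * np.kron(u2, v)
bilinear_ok = np.allclose(lhs, rhs)

# Scalar product: <u tensor v, u' tensor v'> = <u, u'> <v, v'>.
# np.vdot conjugates its first argument, matching the physics convention.
inner_product = np.vdot(np.kron(u, v), np.kron(u2, v2))
product_of_inner = np.vdot(u, u2) * np.vdot(v, v2)
inner_ok = np.isclose(inner_product, product_of_inner)

print(bilinear_ok, inner_ok)
```

Note that the Kronecker product depends on the chosen bases, which is consistent with the tensor product only being defined up to isomorphism.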
The definition guarantees that U\otimes V is unique up to isomorphism. Proving that a tensor product \tau:U\times V\rightarrow W exists for arbitrary U and V is trickier. You do it by explicitly constructing a suitable space W and explicitly defining \tau. The construction involves the following steps: First find a vector space that has a basis that can be mapped bijectively onto U\times V. Then use it to define a second vector space, which consists of certain equivalence classes of vectors from the first vector space. When U and V are Hilbert spaces, the construction involves an additional step: you define a third vector space, which consists of equivalence classes of Cauchy sequences of vectors from the second vector space, so that the result is complete.
So the construction is quite complicated, and serves no other purpose than to prove existence.
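For reference, here is a sketch of the standard textbook version of the purely algebraic part of that construction. The names F and N are my own labels, and the completion step for Hilbert spaces is left out. Let F be the vector space of formal finite linear combinations of the pairs (u,v), so that U\times V is a basis of F. Let N be the subspace of F spanned by all vectors of the form
(au+bu',v)-a(u,v)-b(u',v)
and
(u,av+bv')-a(u,v)-b(u,v').
Then define
W=F/N,\qquad \tau(u,v)=(u,v)+N.
\tau is bilinear because the combinations that would have to vanish are all in N. Given a bilinear \sigma:U\times V\rightarrow X, the linear map on F defined by (u,v)\mapsto\sigma(u,v) sends every vector in N to 0, so it induces a linear map on W, and that map is unique because the vectors \tau(u,v) span W.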
I don't know enough about quantum logic to know if it will give you the best motivation for why the tensor product is used. What I do know is that the Born rule gives you some motivation. The Born rule says that if a system is in state |\psi\rangle when we measure the observable A, the probability that we will get the result a is P(a)=|\langle a|\psi\rangle|^2, where |a\rangle is the normalized eigenvector of A with eigenvalue a. Now consider the case where the two systems aren't interacting. The probability that a simultaneous measurement of A on the first system and B on the second system will give us the results a and b must satisfy P(a,b)=P(a)P(b), and this is automatically satisfied when we take the Hilbert space of the combined system to be the tensor product of the Hilbert spaces of the component systems: if the combined system is in the state |\psi\rangle\otimes|\phi\rangle, the scalar product on the tensor product space gives P(a,b)=|(\langle a|\otimes\langle b|)(|\psi\rangle\otimes|\phi\rangle)|^2=|\langle a|\psi\rangle|^2|\langle b|\phi\rangle|^2=P(a)P(b).
Suppose that the Hamiltonian can be expressed as H=H_1\otimes I+I\otimes H_2=H_1'+H_2'. Here the \otimes symbol is defined by (X\otimes Y)(|\psi\rangle\otimes|\phi\rangle)=X|\psi\rangle\otimes Y|\phi\rangle. Since H_1' and H_2' commute, the time evolution operator can be expressed as
U(t)=e^{-iHt}=e^{-iH_1't}e^{-iH_2't}=e^{-iH_1t}\otimes e^{-iH_2t}=U_1(t)\otimes U_2(t)
which means that the systems evolve completely independently of each other. This is what we would expect if and only if the systems aren't interacting, so it looks like we have found a definition of "non-interacting". The systems are "non-interacting" if the Hamiltonian can be expressed as above.
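If you want to convince yourself numerically, here is a minimal sketch in finite dimensions, with units where \hbar=1. Everything in it is an assumption made for illustration: the dimensions, the randomly generated Hamiltonians and states, and the helper names random_hermitian, evolution_operator and random_state. It checks that e^{-iHt} factorizes as above, and that the Born rule probability factorizes for a product state.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hermitian(dim):
    """Return a random Hermitian matrix (a stand-in for a Hamiltonian)."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (m + m.conj().T) / 2

def evolution_operator(h, t):
    """Compute exp(-i h t) for Hermitian h via its eigendecomposition."""
    eigenvalues, eigenvectors = np.linalg.eigh(h)
    phases = np.exp(-1j * eigenvalues * t)
    # V diag(phases) V^dagger; broadcasting scales column j by phases[j].
    return (eigenvectors * phases) @ eigenvectors.conj().T

def random_state(dim):
    """Return a random normalized state vector."""
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return psi / np.linalg.norm(psi)

# Illustrative dimensions and time.
dim_1, dim_2, t = 2, 3, 0.7
h1, h2 = random_hermitian(dim_1), random_hermitian(dim_2)

# H = H_1 tensor I + I tensor H_2, with np.kron as the tensor product of matrices.
h = np.kron(h1, np.eye(dim_2)) + np.kron(np.eye(dim_1), h2)

# Compare U(t) = exp(-iHt) with U_1(t) tensor U_2(t).
u_total = evolution_operator(h, t)
u_product = np.kron(evolution_operator(h1, t), evolution_operator(h2, t))
evolution_factorizes = np.allclose(u_total, u_product)

# Born rule on a product state: P(a,b) = P(a) P(b).
# a_vec and b_vec stand in for the eigenvectors |a> and |b>.
psi, phi = random_state(dim_1), random_state(dim_2)
a_vec, b_vec = random_state(dim_1), random_state(dim_2)
p_joint = abs(np.vdot(np.kron(a_vec, b_vec), np.kron(psi, phi))) ** 2
p_a = abs(np.vdot(a_vec, psi)) ** 2
p_b = abs(np.vdot(b_vec, phi)) ** 2
born_factorizes = np.isclose(p_joint, p_a * p_b)

print(evolution_factorizes, born_factorizes)
```

Adding an interaction term to h that isn't of the form X\otimes I or I\otimes Y will in general break the first check, which fits the definition of "non-interacting" given above.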