Becke's paper, "A new mixing of Hartree-Fock and local density-functional theories," gives a pretty good motivation for hybrid functionals. It's unclear from your post how much you already understand about HF and DFT, so let me know if I assume too much.
The rationale for using DFT instead of HF is that HF doesn't explicitly include certain electron correlation effects. There is some correlation already present in HF in the form of the exchange integral, which enforces the Pauli principle (so that electrons with like spins avoid each other). However, the Coulomb repulsion between electrons in HF theory is treated in a mean-field way: in effect, each electron sees the average electric field of all the other electrons, instead of the instantaneous field from each individual electron. This is what we mean when we say that HF doesn't include electron correlation. (Technical note: this is sometimes referred to as "dynamic correlation." There is another type of correlation called "static correlation," which refers to situations in which the ground state of the molecule is not well described by a single Slater determinant. This isn't particularly relevant to the present discussion.) Most post-HF methods that do include correlation (Møller-Plesset, configuration interaction, coupled cluster, etc.) are quite computationally expensive. DFT provides a method that includes correlation, through the exchange-correlation functional, while being more computationally tractable than those post-HF methods.
As it turns out, the exchange-correlation energy is dominated by the "exchange" portion, with the "correlation" portion being a small correction. This suggests splitting the exchange-correlation energy into two parts:
$$E_{XC} \approx E_X + E_C$$
Typically, some approximation based on the electron gas (such as LDA or GGA) is used to give functionals representing exchange $E_X$ and correlation $E_C$. You can, however, use the HF exchange integral for $E_X$ and, e.g., the LDA expression for $E_C$. In the paper mentioned above, Becke notes that this technique improves slightly upon HF, but it's nowhere near chemical accuracy. The reason is that the above equation is actually a terrible approximation: the exchange and correlation parts of the energy can't really be separated that cleanly. In addition, the exchange portion of $E_{XC}$ is not the same for HF as it is for something like LDA or GGA. The approximation gets somewhat better if you mix the overall exchange-correlation from DFT, $E^{DFT}_{XC}$, with the pure exchange from HF, $E^{HF}_{X}$, in some linear combination. For example, the PBE0 functional has the form:
$$E^{PBE0}_{XC} = \frac{1}{4} E^{HF}_X + \frac{3}{4} E^{PBE}_X + E^{PBE}_C$$
You might ask: if HF is so crappy, why not just use the LDA exchange-correlation functional? Why even drag HF exchange into it in the first place? In fact, the HF exchange does add something to DFT. In general, exchange-correlation energies are nonlocal, but most DFT functionals assume some sort of local approximation (LDA stands for "local density approximation"). This causes local DFT to underestimate the magnitude of the true exchange-correlation energy. Including HF exchange as a nonlocal correction improves results for calculations like bond energies and ionization potentials, among other things.
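To make the "local DFT underestimates exchange" point concrete, here is a minimal sketch for the one case where everything is known in closed form: the hydrogen atom. It is not from Becke's paper. It assumes the exact hydrogen density $n(r) = e^{-2r}/\pi$ (atomic units), the standard Dirac/Slater LDA exchange expression $E_X^{LDA} = -C_X \int n^{4/3}\, d^3r$, and the exact exchange energy of hydrogen, $-5/16$ Hartree. The function names (`lda_exchange_energy`, `hybrid_exchange`) are made up for illustration, and the mixing step uses LDA exchange as a stand-in for PBE exchange, so it is only PBE0-like in its 1/4 fraction; correlation is left out entirely.

```python
import math

# Dirac/Slater LDA exchange constant, C_x = (3/4) * (3/pi)^(1/3), Hartree atomic units
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)


def hydrogen_density(r):
    """Exact ground-state density of the hydrogen atom, n(r) = exp(-2r) / pi."""
    return math.exp(-2.0 * r) / math.pi


def lda_exchange_energy(density, r_max=40.0, n_points=40001, spin_polarized=False):
    """E_x^LDA = -C_x * integral of n^(4/3) over space, via a radial trapezoid rule.

    Hypothetical helper for illustration only (not a library API).
    """
    # Spin scaling: for a fully polarized density, E_x[n, 0] = (1/2) E_x^unpol[2n],
    # which multiplies the unpolarized result by 2^(1/3).
    scale = 2.0 ** (1.0 / 3.0) if spin_polarized else 1.0
    h = r_max / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        r = i * h
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        total += weight * 4.0 * math.pi * r * r * density(r) ** (4.0 / 3.0)
    return -scale * C_X * total * h


def hybrid_exchange(e_x_hf, e_x_dft, a=0.25):
    """One-parameter mix a * E_x^HF + (1 - a) * E_x^DFT; a = 1/4 is the PBE0 fraction."""
    return a * e_x_hf + (1.0 - a) * e_x_dft


E_X_EXACT = -5.0 / 16.0  # exact (HF) exchange energy of the hydrogen atom, in Hartree

e_x_lda = lda_exchange_energy(hydrogen_density)                        # spin-unpolarized
e_x_lsda = lda_exchange_energy(hydrogen_density, spin_polarized=True)  # one spin-up electron
e_x_mix = hybrid_exchange(E_X_EXACT, e_x_lsda)  # LDA used here in place of PBE (assumption)

print(f"exact exchange:        {E_X_EXACT:.4f} Ha")
print(f"LDA (unpolarized):     {e_x_lda:.4f} Ha")
print(f"LSDA (spin-polarized): {e_x_lsda:.4f} Ha")
print(f"25% exact + 75% LSDA:  {e_x_mix:.4f} Ha")
```

Both local values come out smaller in magnitude than the exact $-0.3125$ Ha (roughly $-0.213$ and $-0.268$ Ha), and mixing in a quarter of exact exchange moves the result part of the way back. Real hybrids like PBE0 apply the same linear mix to a GGA exchange functional and keep the full DFT correlation term, so treat this as a toy illustration of the trend rather than a PBE0 calculation.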