In data analysis, one often measures the average value of a sample of one variable, Y, over a small range of sample values of another variable, X. This is a way to see whether knowledge of X gives information about Y. If the random variables are independent, these conditional expectations are constant; they do not depend on the value of X, and so the values of X tell you nothing about the expected values of Y.
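A minimal sketch of this binned-average estimate using NumPy. The simulated linear relationship, the bin edges, and the sample size are all illustrative assumptions, not anything fixed by the discussion above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate dependent data: Y depends on X plus noise (assumed for illustration).
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)

# Estimate E[Y | X in bin] by averaging Y over small ranges of X.
edges = np.linspace(-2, 2, 9)      # 8 bins over a central range of X
which = np.digitize(x, edges)      # bin index for each sample
cond_means = [y[which == k].mean() for k in range(1, len(edges))]
```

Here the binned means rise with X, signalling dependence; for independent X and Y they would all hover around the same value.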
As X varies, the conditional expectation of Y given X is a function of X. This function is known as the regression of Y on X. In general this function could have any shape, but in the case where X and Y are jointly normal it is linear. In this special case, the slope of the regression line is the correlation of X and Y times the ratio of the standard deviation of Y to the standard deviation of X. So correlation describes the dependence of the conditional expectations of Y given X in the case of a bivariate normal.
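The slope formula can be checked by simulation. This sketch draws from an assumed bivariate normal (the correlation and standard deviations are arbitrary choices) and compares the formula against a least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bivariate normal with sd_x = 1, sd_y = 2, correlation rho = 0.6 (assumed values).
rho = 0.6
cov = [[1.0, rho * 2.0], [rho * 2.0, 4.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

# Slope of the regression line: correlation times sd_y / sd_x.
slope_formula = np.corrcoef(x, y)[0, 1] * y.std() / x.std()

# Compare with an ordinary least-squares fit of y on x.
slope_lsq = np.polyfit(x, y, 1)[0]
```

Both estimates land near the population value rho * sd_y / sd_x = 0.6 * 2 = 1.2.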
While the bivariate normal may seem like a special case, in practice many random variables are close to normal, and when they are not, one can form new random variables by taking sample averages; by the central limit theorem, the distributions of the averages will be closer to normal. If the two sample averages are then jointly normal, the regression of one on the other will be linear.
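A small illustration of averages becoming more normal. The exponential distribution and the group size of 40 are assumptions chosen to make the effect visible; skewness is used here as a rough measure of departure from normality:

```python
import numpy as np

rng = np.random.default_rng(2)

def skew(a):
    """Sample skewness: zero for a symmetric (e.g. normal) distribution."""
    a = a - a.mean()
    return (a**3).mean() / (a**2).mean() ** 1.5

# Exponential draws are strongly skewed (population skewness 2).
raw = rng.exponential(size=(50_000, 40))
skew_raw = skew(raw[:, 0])

# Averages of 40 draws are much closer to normal (skewness ~ 2 / sqrt(40)).
avgs = raw.mean(axis=1)
skew_avg = skew(avgs)
```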
Also, in practice one may want to extract the linear part of a complex relationship between X and Y. A linear regression is an attempt to locate that linear part. Sometimes the linear part is valid only for small values of X but fails badly for large values. This happens with security prices, where large values can mean extreme and unusual events - such as a country defaulting on its debt - when the business-as-usual relationships no longer hold.
Often the relationship between two variables may be weak and difficult to detect, but if there are many weakly related variables, it may be possible to discover significant relationships between aggregates of the variables - e.g. weighted averages. Principal component analysis is a way of selecting such aggregates and works well for nearly jointly normal random variables. Again, these components may be valid only for small values of the underlying random variables. For instance, one might decompose short-term stock returns into principal components, each represented by a large basket of stocks. These components may work well in stable markets but fall apart when a country defaults on its debt.
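A sketch of principal components as eigenvectors of the sample covariance matrix. The one-factor return model, the number of stocks, and the volatilities are all hypothetical, chosen so that a single dominant component emerges:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical daily returns for 5 stocks driven by one common market factor.
n_days, n_stocks = 2_000, 5
market = rng.normal(scale=0.01, size=(n_days, 1))
returns = market + rng.normal(scale=0.005, size=(n_days, n_stocks))

# Principal components are the eigenvectors of the covariance matrix.
cov = np.cov(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
explained = eigvals[-1] / eigvals.sum()    # variance share of the top component

# The dominant component here is close to an equal-weighted basket of stocks.
weights = eigvecs[:, -1] / eigvecs[:, -1].sum()
```

The top component captures the common market factor; the remaining components pick up the smaller stock-specific variation.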
Abstractly, two random variables can be uncorrelated yet completely dependent, and the dependence can be arbitrarily complicated. In practice, one generally hopes for a linear relationship, at least in some range of the variables, because non-linear relationships are difficult to estimate and require large amounts of data.
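The classic example of uncorrelated but completely dependent variables is Y = X^2 with X symmetric about zero; this sketch checks it by simulation:

```python
import numpy as np

rng = np.random.default_rng(4)

# Y is a deterministic function of X, so the dependence is total...
x = rng.normal(size=100_000)
y = x**2

# ...yet Cov(X, X^2) = E[X^3] = 0 for symmetric X, so the correlation vanishes.
corr = np.corrcoef(x, y)[0, 1]
```

Correlation only measures the linear part of the relationship, and here the relationship has no linear part at all.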