Convolution is simple, even though the explanations on the web make it seem like you need to be a genius to understand it. It does still feel hard to explain in words. Here's a try.
We have an LTI system. This means the system is time invariant -- its impulse response or transfer function does not change with time. If you apply an input now or 20 minutes from now, you will get the same response. It also means the system is linear. If you apply input x1(t) and get y1(t) as output and you later apply input x2(t) and get y2(t) as output, then the output due to x1+x2 will be y1+y2. This describes most engineering systems and when it doesn't, we make approximate models of nonlinear systems that are linear to simplify matters (eg, the small signal models of transistors which approximate the exponential transistor characteristics).
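As a quick sanity check of the two properties, here is a minimal sketch in discrete time. The system y[n] = x[n] + 0.5·x[n-1] and the helper names `system` and `delay` are made up for illustration; any LTI system would do.

```python
def system(x):
    """Hypothetical LTI system: y[n] = x[n] + 0.5 * x[n-1], with x[-1] taken as 0."""
    return [x[n] + 0.5 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def delay(x, k):
    """Delay a finite sequence by k samples, keeping its length (tail is dropped)."""
    return [0.0] * k + x[:len(x) - k]

# Two arbitrary test inputs, padded with zeros so delays do not cut off data.
x1 = [1.0, 2.0, 0.0, 0.0, 0.0]
x2 = [0.0, 1.0, -1.0, 0.0, 0.0]

# Linearity: the response to x1 + x2 equals y1 + y2.
y_of_sum = system([a + b for a, b in zip(x1, x2)])
sum_of_ys = [a + b for a, b in zip(system(x1), system(x2))]
is_linear = y_of_sum == sum_of_ys

# Time invariance: delaying the input by 2 samples just delays the output by 2.
is_time_invariant = system(delay(x1, 2)) == delay(system(x1), 2)

print(is_linear, is_time_invariant)
```

Both checks come out true for this system. A system that squared its input, for example, would fail the first one.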
We begin by knowing that the response to an impulse δ(t) is the impulse response h(t). Because the system is linear and time invariant, the response to two impulses, perhaps δ(t)+δ(t-1), is h(t)+h(t-1). That is, the system begins to respond with h(t) at time t=0 due to the first impulse and then begins to generate a second impulse response h(t-1) at time t=1 due to the second impulse. The total output is the summation of the two.
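The same thing in discrete time, as a small sketch. The three-sample impulse response and the helper `shifted` are assumptions chosen only to make the numbers easy to follow.

```python
def shifted(h, delay, length):
    """Return h delayed by `delay` samples, zero-padded to `length` samples."""
    out = [0.0] * length
    for n, value in enumerate(h):
        if n + delay < length:
            out[n + delay] = value
    return out

h = [3.0, 2.0, 1.0]   # hypothetical decreasing impulse response
length = 5

first = shifted(h, 0, length)    # response to the impulse at time 0
second = shifted(h, 1, length)   # response to the impulse at time 1

# The total output is the sample-by-sample sum of the two responses.
y = [a + b for a, b in zip(first, second)]
print(y)  # [3.0, 5.0, 3.0, 1.0, 0.0]
```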
We can express an arbitrary signal x(t) as an infinite sum of impulses. You've already done this by sampling x(t) with period T to get a representation as a summation of impulses separated in time by T, each weighted by the value of x at that instant. If you continue to decrease T, the impulses become closer together until, in the limit as T approaches 0, x(t) is represented exactly as a solid wall of impulses. The proposal is that the system response to x(t) can then be regarded as a summation of weighted, delayed impulse responses h(t).
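Here is a rough numerical sketch of that proposal. The shapes are assumptions (a unit-height rectangle for x and a ramp falling from 1 to 0 over one second for h), and `output_from_impulses` is a made-up helper. Each impulse gets strength x(kT)·T and launches its own copy of h when it arrives.

```python
def x(t):
    """Assumed input: a unit-height rectangle on 0 <= t <= 1."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def h(t):
    """Assumed impulse response: a ramp falling from 1 to 0 on 0 <= t <= 1."""
    return 1.0 - t if 0.0 <= t <= 1.0 else 0.0

def output_from_impulses(t, n_impulses):
    """Output at time t from n_impulses impulses spaced T = t / n_impulses apart."""
    T = t / n_impulses
    total = 0.0
    for k in range(n_impulses):
        arrival = k * T                    # when this impulse hits the system
        weight = x(arrival) * T            # impulse strength: signal value times spacing
        total += weight * h(t - arrival)   # its impulse response at age t - arrival
    return total

# Shrinking the spacing T (more impulses) makes the sum settle down.
for n in (5, 50, 5000):
    print(n, output_from_impulses(0.5, n))
```

For these assumed shapes the printed values climb toward 0.375 as the impulses are packed closer together, which is the exact output at t=0.5.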
The computation of this summation of impulse responses is done with convolution. I've attached a badly drawn diagram.
On the left side I've drawn x(t), an input, chosen to be a rectangle. On the right is the system impulse response h(t), chosen to be a decreasing ramp.
The second image down on the left is a graph of x(-t), which is found by reflecting x(t) about the y axis. The important characteristic of this graph is that x(t) is ordered such that later input values appear further to the left. The value of x at the start t=0 is still at the origin.
At t=0 the output of the system begins because x(0) presents itself as an impulse. This is shown in the second diagram on the right where both x(-t) and h(t) are drawn. The impulse x(0) will cause the output at t=0 to be x(0)×h(0), which is the area of overlap between x(-t) and h(t) at the time shown.
Next look at the output at time t=0.5. The third image on the left is a graph of x(-t) shifted to the right by 0.5. The third image on the right shows x(0.5-t) superimposed on a graph of h(t).
That impulse x(0) that occurred at t=0 is still generating its impulse response and at t=0.5, the output due to impulse x(0) only is x(0)×h(0.5). On that third graph on the right, I marked this impulse in red. In fact as time passes, that impulse slides right tracing out the part of the impulse response it is responsible for generating. I've marked another impulse on that graph x(0.25). That input was presented to the system at time t=0.25 and began tracing out its impulse response at that time. At t=0.5, it is responsible for generating the response h(0.25)×x(0.25). In fact, there is a solid wall of impulses to the right of t=0 that are generating a part of their impulse responses at this time. We need to add up all the responses to get the total response of the system at this time. We do that with an integral.
At this snapshot in time (at t=0.5), the output will then be y(t=0.5) = ∫_0^{0.5} h(t) x(0.5-t) dt
A moment Δτ later, the x part of the graph on the bottom right slides Δτ to the right as each impulse selects the part of the impulse response that it is generating. The function x graphed is x(0.5+Δτ-t). And the output at this time, y(0.5+Δτ), can be found by adding up all the parts of the impulse response each impulse in x is generating: y(0.5+Δτ) = ∫_0^{0.5+Δτ} h(t) x(0.5+Δτ-t) dt
In general, writing the integration variable as τ so that t is free to stand for the output time, the output at any time t due to the impulse wall x(t) is:
y(t) = ∫_0^t h(τ) x(t-τ) dτ, and this is the convolution integral.
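To see the integral at work, here is a minimal numerical sketch. It assumes a unit-height, one-second rectangle for x and a ramp falling from 1 to 0 over one second for h; the diagram does not pin down the sizes, so these are my choice. `convolve_at` and `exact` are hypothetical helper names, and `exact` is the closed form worked out by hand for these particular shapes.

```python
def x(t):
    """Assumed input: a unit-height rectangle on 0 <= t <= 1."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def h(t):
    """Assumed impulse response: a ramp falling from 1 to 0 on 0 <= t <= 1."""
    return 1.0 - t if 0.0 <= t <= 1.0 else 0.0

def convolve_at(t, steps=10000):
    """Midpoint-rule estimate of y(t) = integral from 0 to t of h(tau) x(t - tau) dtau."""
    if t <= 0.0:
        return 0.0
    d_tau = t / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * d_tau                 # midpoint of the k-th slice
        total += h(tau) * x(t - tau) * d_tau    # flipped, shifted x times h
    return total

def exact(t):
    """Closed form for these two assumed shapes only."""
    if 0.0 <= t <= 1.0:
        return t - t * t / 2.0
    if 1.0 < t <= 2.0:
        return (2.0 - t) ** 2 / 2.0
    return 0.0

for t in (0.25, 0.5, 1.0, 1.5, 2.5):
    print(t, convolve_at(t), exact(t))
```

The two columns agree closely at every t, and at t=0.5 both give 0.375. After t=1 the rectangle starts sliding off the far end of the ramp, so the output falls back to zero by t=2.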
We flipped x(t) so that the part of x that occurs soonest would overlap h(t) first as it was slid to the right. This corresponds to earlier impulses in x each generating h(t) before later impulses of x arrive.
To find the output at a specific time, we shifted the flipped x to the right by that amount of time. The part of the impulse response each impulse in x was generating is then coincident so that multiplying each impulse of x with the impulse response value at the same t would yield the current output due to that impulse. Then to get the total response to all impulses in x currently generating their impulse responses, we need to add them up with an integration.
A note on the various graphs of x(t).
Suppose x(t) is some function of t, say x(t) = 1 + (t) + (t)^2 + ...
x(-t) is a reflection around the y axis. To find this function, we replace the 't' in x(t) by -t:
x(-t) = 1 + (-t) + (-t)^2 + ...
To shift this last function right by τ seconds, we need to replace the 't' with 't-τ':
1 + (-(t-τ)) + (-(t-τ))^2 + ...
= 1 + (τ-t) + (τ-t)^2 + ...
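Comparing this to the original x(t), the result of flipping and shifting right by τ can be found by replacing 't' in x(t) by 'τ-t', i.e. x(τ-t).
Many people are confused about why shifting x(-t) right by τ seconds does not give x(-t-τ). The shift replaces the 't' itself with 't-τ', so the minus sign applies to the whole of (t-τ). The expression x(-t-τ) = x(-(t+τ)) is x(-t) shifted left by τ instead.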
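A short sketch to check the algebra numerically. The polynomial is cut off after the t^2 term for convenience, and `flipped` and `shift_right` are made-up helper names.

```python
def x(t):
    """Example signal: the series above, cut off after the t^2 term."""
    return 1 + t + t ** 2

def flipped(t):
    """x(-t): x reflected about the y axis."""
    return x(-t)

def shift_right(f, tau):
    """Return f shifted right by tau, i.e. the function t -> f(t - tau)."""
    return lambda t: f(t - tau)

tau = 0.5
flipped_then_shifted = shift_right(flipped, tau)
flipped_shifted_left = shift_right(flipped, -tau)

for t in (-1.0, 0.0, 0.25, 2.0):
    # Flip, then shift right by tau: the same as x(tau - t).
    assert abs(flipped_then_shifted(t) - x(tau - t)) < 1e-12
    # The tempting x(-t - tau) is the flipped signal shifted LEFT by tau.
    assert abs(flipped_shifted_left(t) - x(-t - tau)) < 1e-12

print("flip then shift right by tau matches x(tau - t)")
```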
I hope that helped to explain. It seems hard to articulate the idea but it is simple once you've grasped it.