The way distributions work is that you have a function space of test functions, \phi(x), and its dual, which consists of the generalized functions. The nicer your test functions, the meaner the generalized functions you can deal with. For example, if your test functions are merely continuous, then you can have the delta function in the dual space, but not the derivative of the delta function. If they are once continuously differentiable, you can have the derivative, but not the second derivative, and so on.
With the space of test functions chosen, one defines a pairing between a test function and an element of the dual space. Using the Dirac delta as an example:
(\phi,\delta) = \phi(0)
This replaces the formal integral notation \int dx~\phi(x) \delta(x). Although for distributions the integral notation is only formal, things are set up so that the notation suggests correct results. If the element of the dual space is a genuine function rather than a proper distribution, the integral notation reduces to an actual integral. Anyway, as long as the test functions decay rapidly at infinity and have enough derivatives, we can define derivatives of the delta function as
(\phi,\delta^{(n)}) = (-1)^n \phi^{(n)}(0)
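A quick numerical sanity check of this formula is possible by replacing the delta function with a narrow Gaussian and doing the integral for real. This is only a sketch: the test function phi, the width eps, the grid, and the helper names below are all arbitrary choices made for illustration, not anything from the source.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def phi(x):
    # Hypothetical test function: smooth and rapidly decaying.
    # Its Taylor series at 0 gives phi(0) = 1, phi'(0) = 1, phi''(0) = 2.
    return (1.0 + x + 2.0 * x**2) * np.exp(-x**2)


def nascent_delta_derivative(x, n, eps):
    # n-th derivative of a normalized Gaussian of width eps.
    # As eps -> 0 this acts like delta^(n) under the integral.
    z = x / eps
    gauss = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    he_n = hermeval(z, [0.0] * n + [1.0])  # probabilists' Hermite polynomial He_n
    return (-1.0) ** n * he_n * gauss / eps ** (n + 1)


def pairing(n, eps=0.01):
    # Approximate (phi, delta^(n)) by a Riemann sum of phi(x) * delta_eps^(n)(x)
    # over a window of 20 widths on either side of the origin.
    x = np.linspace(-20.0 * eps, 20.0 * eps, 40001)
    dx = x[1] - x[0]
    return np.sum(phi(x) * nascent_delta_derivative(x, n, eps)) * dx


if __name__ == "__main__":
    for n in range(3):
        print(n, pairing(n))
```

For this choice of phi the three pairings should land close to 1, -1 and 2, that is, (-1)^n \phi^{(n)}(0) for n = 0, 1, 2, with a small error that shrinks as eps is reduced.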
In general, for a distribution v(x) one can define a "weak derivative" u(x) by
(\phi,u) \equiv -(\phi',v).
Obviously \phi' has to exist for this definition to make sense, which is why the test functions need enough derivatives.
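A standard worked example (not taken from the book below) is the Heaviside step function H(x), equal to 1 for x > 0 and 0 for x < 0. Read the weak derivative as moving the derivative onto the test function with a minus sign, which is the n = 1 case of the delta formula above, and assume \phi vanishes at infinity:

```latex
(\phi, H') \equiv -(\phi', H)
  = -\int_0^\infty dx~\phi'(x)
  = -\left[\phi(x)\right]_0^\infty
  = \phi(0)
  = (\phi, \delta)
```

So the weak derivative of the step function is the delta function, even though H has no ordinary derivative at x = 0.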
This description comes from this book. It also references F.G. Friedlander's Introduction to the Theory of Distributions if you're looking for more information on the topic.