I realize it's not a function in the classical sense, but how would one show that the Dirac delta function is a distribution, i.e. how do I show it's continuous and linear given that it's not truly a function?
The set of test functions is a vector space. The reals are also a vector space. Use that for linearity.
Test functions are smooth; use that for continuity.
I think your question really amounts to 'what is a distribution'? As mentioned by the previous posters it has to do with test functions, and more generally with special vector spaces (usually complete ones and those endowed with a norm).
An amazing triumph of functional analysis is representing vectors (in this case non-pathological functions) in terms of their actions on other vectors. By action on other vectors, I mean given any vector [itex]v[/itex] in the vector space [itex]V[/itex], define a mapping [itex]\hat{v}:V\rightarrow \mathbb{R}[/itex]. This is the kind of mapping that appears in the Riesz Representation Theorem, and in our case it means [itex]\hat{v}(g):=\int vg \mathrm{d}x[/itex]
[itex]\hat{v}[/itex] is called linear because [itex]\hat{v}(f+g)=\hat{v}(f)+\hat{v}(g)[/itex] and [itex]\hat{v}(cf)=c\hat{v}(f)[/itex] for any real number [itex]c[/itex].
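If it helps to see this concretely, here is a rough numerical sketch of the map [itex]g \mapsto \int vg \mathrm{d}x[/itex] and a check that it is linear. The helper name `make_functional`, the integration interval, and the sample functions are all my own choices for illustration; the integral is only approximated by a midpoint rule.

```python
import math

def make_functional(v, lo=-10.0, hi=10.0, n=20000):
    """Return the map g -> approximate integral of v(x) * g(x) over [lo, hi].

    Hypothetical helper: uses a plain midpoint rule, so it is only an
    approximation of the true integral.
    """
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]

    def v_hat(g):
        return sum(v(x) * g(x) for x in xs) * dx

    return v_hat

# A smooth, rapidly decaying function playing the role of the vector v.
v = lambda x: math.exp(-x * x)
v_hat = make_functional(v)

# Two sample functions to feed into the functional.
f = lambda x: math.exp(-x * x / 2)
g = lambda x: x * x * math.exp(-x * x)

# Additivity: v_hat(f + g) should match v_hat(f) + v_hat(g).
additive_gap = abs(v_hat(lambda x: f(x) + g(x)) - (v_hat(f) + v_hat(g)))

# Homogeneity: v_hat(c * f) should match c * v_hat(f).
c = 3.0
scaling_gap = abs(v_hat(lambda x: c * f(x)) - c * v_hat(f))

print(additive_gap, scaling_gap)
```

Both gaps should come out at the level of floating point rounding, which is all linearity of the integral promises here.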
The second important property we want [itex]\hat{v}[/itex] to have is that of continuity. Another surprising result of functional analysis says that a functional (any linear map from the vector space into the reals, like [itex]\hat{v}[/itex] for example) is continuous if and only if it is bounded in the operator sense. That is, [itex]\hat{v}[/itex] is bounded if and only if sup[itex]\{|\hat{v}(f)|: ||f||_\infty = 1 \}< \infty [/itex]
What I have done is built the necessary machinery to generalize functions. What I have shown is that any vector (or in this case non-pathological function) can be thought of as a continuous linear functional. A distribution is then just one of these continuous linear functionals.
So to answer your question, the Dirac delta function [itex]\delta[/itex] is defined as a functional, mapping some space of functions to the real line by [itex]\delta (f) = \int f\delta \mathrm{d}x := f(0)[/itex]. It is clear that [itex]\delta[/itex] is linear because the integral is linear (actually, strictly speaking, the integral doesn't make sense, hence the need for generalized functions to begin with. We really define [itex]\delta[/itex] by [itex]\delta(f) := f(0)[/itex], and linearity follows since [itex](af+bg)(0) = af(0)+bg(0)[/itex]).
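As a small sanity check of the definition [itex]\delta(f) := f(0)[/itex], here is a minimal sketch that treats [itex]\delta[/itex] as "evaluate at zero" and confirms linearity on one example. The function name `delta` and the sample functions are just illustrative choices of mine (they are not compactly supported test functions, which does not matter for this check).

```python
import math

def delta(f):
    """Dirac delta as a functional: evaluate the input function at 0."""
    return f(0.0)

# Two sample smooth functions; both happen to equal 1 at x = 0.
f = lambda x: math.exp(-x * x)
g = lambda x: math.cos(x) + x ** 3

a, b = 2.0, -3.0

# delta(a*f + b*g) versus a*delta(f) + b*delta(g).
lhs = delta(lambda x: a * f(x) + b * g(x))
rhs = a * delta(f) + b * delta(g)

print(lhs, rhs)  # prints -1.0 -1.0
```

No integral appears anywhere: linearity comes straight from how sums and scalar multiples of functions are evaluated pointwise.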
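Why is [itex]\delta[/itex] continuous? Because if [itex]||f||_\infty = 1[/itex] and [itex]f[/itex] is continuous, then [itex]|f(x)|\leq 1[/itex] for any [itex]x[/itex], and in particular [itex]|\delta(f)| = |f(0)| \leq 1[/itex]. Hence [itex]\delta[/itex] is bounded by [itex]1[/itex], and therefore continuous. (For test functions, convergence in the usual sense implies uniform convergence, so this sup-norm bound is enough.)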
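The bound [itex]|\delta(f)| \leq ||f||_\infty[/itex] can also be spot-checked numerically. This is only a sketch under my own assumptions: the sup norm is estimated on a finite grid over [-5, 5] (chosen so that 0 is a grid point), and the helper `sup_norm_on_grid` and the sample functions are hypothetical, picked just for illustration.

```python
import math

def delta(f):
    """Dirac delta as a functional: evaluate the input function at 0."""
    return f(0.0)

def sup_norm_on_grid(f, lo=-5.0, hi=5.0, n=1000):
    """Grid estimate of the sup norm of f on [lo, hi].

    With lo = -hi and n even, x = 0 is one of the grid points.
    """
    return max(abs(f(lo + (hi - lo) * i / n)) for i in range(n + 1))

# A few sample continuous functions.
samples = [
    math.sin,
    lambda x: math.cos(3 * x),
    lambda x: -2.0 * math.exp(-x * x),
]

# |delta(f)| = |f(0)| should never exceed the (estimated) sup norm.
bound_holds = all(abs(delta(f)) <= sup_norm_on_grid(f) for f in samples)

print(bound_holds)  # prints True
```

The check is trivial by design: [itex]|f(0)|[/itex] is one of the values the supremum is taken over, which is exactly the argument above.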