The probability value obtained from a probit model cannot be negative. The probit variable (latent variable, usually denoted y) itself can well be negative.
The probability of an outcome as estimated with a probit model cannot be negative because it is derived from a proper distribution (Normal or Gaussian). By definition probabilities cannot be negative.
But suppose I want to estimate a "probability of success" from the data and I don't have the luxury of using a probit model (perhaps because it is too time consuming and I only need a quick estimate). In that case I could apply OLS to my data and hope that it would give me somewhat sensible results. The danger is that, since OLS is a linear model, nothing keeps the "probability estimate" from becoming negative (or, for that matter, from exceeding 1). But if all someone needs is a quick estimate, they must live with its consequences.
On the other hand, if someone cannot accept a "probability" being negative (contrary to everything that one is hopefully taught in a probability course), then they must go with the more complicated probit model. Unlike the OLS line, the probit curve (the normal CDF, i.e. the cumulative area under the normal bell curve) has a suitable nonlinear S-shape which prevents negative probability estimates.
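To make the contrast concrete, here is a minimal Python sketch. The linear coefficients are made up purely for illustration (they are not fitted to any data), the probit coefficients reuse the drug-trial equation from the example further down, and the function names `linear_estimate` and `probit_estimate` are hypothetical. It only illustrates that a straight line can leave the range from 0 to 1 while the normal CDF cannot.

```python
from statistics import NormalDist

# Hypothetical coefficients, chosen only for illustration (not estimated from data).
A_LIN, B_LIN = 0.10, 0.20        # linear (OLS-style) "probability" = A_LIN + B_LIN * x
A_PROBIT, B_PROBIT = -1.5, 0.5   # probit latent index y = A_PROBIT + B_PROBIT * x


def linear_estimate(x: float) -> float:
    """Straight-line 'probability'; nothing bounds it to [0, 1]."""
    return A_LIN + B_LIN * x


def probit_estimate(x: float) -> float:
    """Standard normal CDF evaluated at the latent index."""
    return NormalDist().cdf(A_PROBIT + B_PROBIT * x)


for x in (-2.0, 0.0, 2.0, 6.0):
    print(f"x = {x:+.1f}  linear = {linear_estimate(x):+.3f}  probit = {probit_estimate(x):.3f}")
```

With these illustrative numbers the linear estimate is negative at x = -2 and above 1 at x = 6, while the probit estimate stays inside the unit interval at every x.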
If you read Saint-Exupery's The Little Prince, you should remember the snake that swallowed an elephant; it looked like a cross-sectional hat or bell. A normal distribution also looks like that (it is also called the bell curve); see the left-hand side graph in the following link.
You should mentally "erase" the "x" on the horizontal axis in the graph and put a "y" in its place. The horizontal axis is the latent variable y = a + b1x1 + ... + u.
As you can see, the horizontal axis extends to both sides of zero (at the vertical line in the middle). So just like an OLS prediction, the probit variable y can be negative. But y is not a probability estimate. The probability estimate is the area under the bell to the left of the predicted value. For example, if the predicted value is 0 then the probability is Prob(y < 0) = 0.5. That's because exactly half of the area under the bell lies to the left of zero (the origin). No matter which y value your model predicts (depending on your x), the area under the bell curve up to that y value (say, y*) will always be positive. For a very negative y value (e.g. y* = -1,000,000), that probability will be very, very small, but still positive. As y* gets very large and positive (e.g. y* = +1,000,000), the probability approaches the entire area under the curve, which by definition (of a probability distribution) is equal to 1.
So a probit model predicts 0 < Prob(y < y*) < 1 for all y* [itex]( -\infty < y^* < +\infty )[/itex].
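As a quick numerical check of that claim, the sketch below evaluates the area to the left of a few y* values using Python's standard-library `statistics.NormalDist`. The helper name `prob_below` and the particular y* values are my own choices for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_below(y_star: float) -> float:
    """Area under the standard normal bell curve to the left of y_star."""
    return standard_normal.cdf(y_star)


# Illustrative y* values on both sides of zero.
for y_star in (-6.0, -2.0, -0.5, 0.0, 0.5, 2.0, 6.0):
    p = prob_below(y_star)
    print(f"y* = {y_star:+5.1f}  ->  Prob(y < y*) = {p:.10f}")
```

One caveat: for truly extreme values such as y* = ±1,000,000, floating-point arithmetic rounds the result to exactly 0.0 or 1.0, even though the mathematical probability stays strictly between 0 and 1.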
For a picture of a snake swallowing an elephant whole and how it looks afterwards, click here. This picture looks less like a bell curve of the normal distribution, because the bell curve has only a single peak. In statistics, a distribution with a single peak is called a unimodal distribution (like the bell curve). A distribution that looks like this picture (having two peaks) would be called a bimodal distribution.
Here's a specific probit example:
Suppose in a medical trial for a new drug, the outcome is coded "1" for "success" and "0" for "failure." When the outcome variable was regressed against another variable showing the administered dosage level (x), the probit equation was estimated as y = -1.5 + 0.5 x, where y is the probit (latent) variable. At dosage level x* = 2, what is the predicted probability of success? What is the predicted outcome?
First, compute the latent value at x* = 2: y* = -1.5 + 0.5(2) = -0.5. Next, look up Prob(y < -0.5). You could either look it up in a normal distribution table or calculate it using software. For example, in Excel, go to a blank cell and type:
then press enter. The cell should display the value 0.30853753. So the probability of success for x* = 2 is approximately 0.31. Since this value is less than 0.5, the predicted outcome is "failure."
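For anyone without a spreadsheet at hand, the same calculation can be sketched in Python with the standard-library `statistics.NormalDist`. The coefficients come from the example above. The function names are hypothetical, and treating a probability of exactly 0.5 as "success" is my own assumption, since the example does not say how ties are handled.

```python
from statistics import NormalDist

# Coefficients of the estimated probit equation y = -1.5 + 0.5 x from the example.
INTERCEPT = -1.5
SLOPE = 0.5


def latent_index(x: float) -> float:
    """Probit (latent) variable y for dosage level x."""
    return INTERCEPT + SLOPE * x


def success_probability(x: float) -> float:
    """Prob(success) = area under the standard normal curve left of y."""
    return NormalDist().cdf(latent_index(x))


def predicted_outcome(x: float, threshold: float = 0.5) -> str:
    """Classify by comparing the probability with the threshold.

    Assumption: a probability exactly equal to the threshold counts as "success".
    """
    return "success" if success_probability(x) >= threshold else "failure"


x_star = 2.0
print(f"y* = {latent_index(x_star)}")
print(f"Prob(success) = {success_probability(x_star):.4f}")
print(f"Predicted outcome: {predicted_outcome(x_star)}")
```

At x* = 2 this gives a latent value of -0.5, a probability of about 0.31, and the predicted outcome "failure", matching the numbers above.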