I have posted this on other forums, and I have discussed this with my professors, but I thought I would share it here for those interested. Essentially, I have a function that efficiently approximates arctangent on [-1,1] and ln(1+x) on [0,1]. For some background about me, I am a Z80 programmer, so naturally, efficiency is a huge priority. For those who do not know, the Z80 is an 8-bit processor that can add, subtract, and shift 8-bit registers. There are no instructions for multiplication, division, or any of that fancy stuff.

First, I will tempt y'all with the equations:

[itex]ln(1+x)\approx \frac{x(8+x)}{8+5x}, x\in[0,1][/itex]

[itex]tan^{-1}(x)\approx \frac{x(9+2x^{2})}{9+5x^{2}}, x\in[-1,1][/itex]

Looking at them, you can probably tell that they stem from a similar process, and both achieve better than 8 bits of accuracy. To achieve that with Taylor series for these two functions, we would need hundreds of terms, and that is very slow.

The derivation is very simple. First, observe that [itex]\frac{x(1+ax^{k})}{1+bx^{k}}=x-(b-a)x^{k+1}+b(b-a)x^{2k+1}-b^{2}(b-a)x^{3k+1}+b^{3}(b-a)x^{4k+1}-...[/itex]. This can be derived from the Maclaurin series if you are smarter than I was when I first derived it, or you can follow the process I used:

[itex]\frac{x(1+ax^{k})}{1+bx^{k}}[/itex]
[itex]=x\left(\frac{1+bx^{k}-bx^{k}+ax^{k}}{1+bx^{k}}\right)[/itex]
[itex]=x\left(1+\frac{-bx^{k}+ax^{k}}{1+bx^{k}}\right)[/itex]
[itex]=x\left(1+x^{k}\frac{a-b}{1+bx^{k}}\right)[/itex]
[itex]=x\left(1+x^{k}(a-b)\frac{1}{1+bx^{k}}\right)[/itex]
[itex]=x\left(1-x^{k}(b-a)\frac{1}{1+bx^{k}}\right)[/itex]
[itex]=x\left(1-x^{k}(b-a)\frac{1+bx^{k}-bx^{k}}{1+bx^{k}}\right)[/itex]
[itex]=x\left(1-x^{k}(b-a)\left(1+\frac{-bx^{k}}{1+bx^{k}}\right)\right)[/itex]
[itex]=x\left(1-x^{k}(b-a)\left(1-bx^{k}\frac{1}{1+bx^{k}}\right)\right)[/itex]
[itex]...[/itex]

(At the last step, notice that we end up just rewriting [itex]\frac{1}{1+bx^{k}}[/itex] over and over, iterating the last three steps to infinity; expanding each iteration produces the alternating series above.)
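As a quick numerical sanity check (my own Python sketch, not part of the original derivation), the closed form and a long truncation of the series agree to machine precision whenever [itex]|bx^{k}|<1[/itex]. The coefficients below are the ln(1+x) ones derived later in the thread, used purely as illustrative values:

```python
def closed_form(x, a, b, k):
    # x(1 + a*x^k) / (1 + b*x^k)
    return x * (1 + a * x**k) / (1 + b * x**k)

def series(x, a, b, k, terms=60):
    # x - (b-a)x^(k+1) + b(b-a)x^(2k+1) - b^2(b-a)x^(3k+1) + ...
    total = x
    for n in range(1, terms):
        total += (-1)**n * b**(n - 1) * (b - a) * x**(n * k + 1)
    return total

a, b, k = 0.125, 0.625, 1   # a = 1/8, b = 5/8: the ln(1+x) coefficients
for x in (0.1, 0.5, 0.9):
    assert abs(closed_form(x, a, b, k) - series(x, a, b, k)) < 1e-12
```

Sixty terms are plenty here because the tail shrinks geometrically with ratio [itex]bx^{k}[/itex].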
Now notice the Taylor series for arctangent and ln(1+x) on the intervals described above:

[itex]ln(1+x)= x-\frac{x^{2}}{2}+\frac{x^{3}}{3}-\frac{x^{4}}{4}+..., x\in[0,1][/itex]

[itex]tan^{-1}(x)= x-\frac{x^{3}}{3}+\frac{x^{5}}{5}-\frac{x^{7}}{7}+..., x\in[-1,1][/itex]

These look pretty similar to the expansion above. By using k=1 and k=2, respectively, I realised I might be able to approximate infinitely many terms of the Taylor series of arctan and ln just by using a single division.

Indeed, for ln(1+x), to eliminate some error we match the first two terms of the Taylor series, giving the constraint [itex]b-a= \frac{1}{2}[/itex]. Adding another constraint allows us to solve a system of equations, but we have a few options. We can match the next term: [itex]b(b-a)= \frac{1}{3}\Rightarrow \frac{b}{2}=\frac{1}{3}\Rightarrow b=\frac{2}{3}\Rightarrow a=\frac{1}{6}[/itex]. From this, we get the formula [itex]\frac{x(1+\frac{x}{6})}{1+\frac{2x}{3}}[/itex], which deviates as x approaches 1, with a maximal error on [0,1] of about .00685.

However, noticing that the error grew as x→1, I tried a different constraint. Placing a constraint specifically at x=1, with target value z, gives [itex]\frac{1+a}{1+b}=z\Rightarrow 1+a=z+zb\Rightarrow -a+zb=z-1[/itex]. Ideally it may seem that we want z=ln(2), but this is not actually always the best choice. As well, I had Z80 programming in mind, so I used a rational approximation of ln(2): truncating its continued fraction gives [itex]ln(2)\approx \frac{9}{13}[/itex]. Solving the system of linear equations, we end up with [itex]a=\frac{1}{8},b=\frac{5}{8}[/itex]. Indeed, the error is now much better at x=1, but the worst error has moved to about x=.8000, with magnitude .001119998 according to my TI-84+.

Similarly, we can derive coefficients for arctangent, using the approximation [itex]tan^{-1}(1)=\frac{\pi}{4}\approx \frac{22/7}{4}=\frac{11}{14}[/itex].
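Both ln(1+x) variants are easy to check numerically. Here is a short Python sketch (my addition, not from the thread) that confirms the two error figures quoted above over a fine grid of [0,1]:

```python
import math

def ln_taylor_matched(x):
    # a = 1/6, b = 2/3: matches the first three Taylor terms of ln(1+x)
    return x * (1 + x / 6) / (1 + 2 * x / 3)

def ln_endpoint_matched(x):
    # a = 1/8, b = 5/8: forces the value 9/13 (≈ ln 2) at x = 1
    return x * (8 + x) / (8 + 5 * x)

xs = [i / 10000 for i in range(10001)]
err1 = max(abs(ln_taylor_matched(x) - math.log(1 + x)) for x in xs)
err2 = max(abs(ln_endpoint_matched(x) - math.log(1 + x)) for x in xs)
assert 0.0068 < err1 < 0.0069   # worst case sits at x = 1, about .00685
assert 0.0011 < err2 < 0.0012   # worst case sits near x = 0.8, about .00112
```

Trading a little endpoint accuracy for a flatter error curve cuts the worst-case error roughly sixfold, which is the whole point of the second constraint.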
The resulting values for a and b are then [itex]a=\frac{2}{9}, b=\frac{5}{9}[/itex], and the maximal error is smaller than [itex]\frac{1}{2^{10}}[/itex].

After realising this, I did some searching, and the closest thing I found to my derivation is here, dealing with applications in signal processing. However, it does not show the derivation of their formula, and it is less accurate. The only cost for several more bits of accuracy would have been a single addition. In fact, here is an implementation of my formulas in the Axe programming language (for the TI-83+/84+ graphing calculators):

Code (Text):

Lbl ATAN
r1+9.0→r2+r1*4→r3
r1**(r2+r1)/*r3
Return

Lbl LN
8.0+r1→r4
r1*4+r4→r3
r4**r1/*r3+r2
Return

(The Wolfram Alpha plots of the error for ln and for arctangent are not reproduced here.)

For further exploration, adding more terms to the numerator or denominator yields better accuracy on the target ranges. Each additional term allows a further constraint: you can match the Taylor or Chebyshev approximation to more terms, or add more points (nodes) on the actual function. For example, matching the first two terms of the Taylor series of arctangent and then the endpoint at x=1 as 355/452, using the approximating model [itex]\frac{x(1+ax^{2})}{1+bx^{2}+cx^{4}}[/itex], I obtained [itex]a=\frac{23}{48}, b=\frac{13}{16}, c=\frac{17}{240}[/itex], giving the approximating function [itex]tan^{-1}(x)\approx \frac{x(240+115x^{2})}{240+195x^{2}+17x^{4}}, x\in [-1,1][/itex]. That has 4 decimal digits of accuracy.
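For anyone who wants to verify the two arctangent approximants without a calculator, here is a small Python check (my own sketch, written against the formulas above); since both approximants are odd functions, sampling [0,1] covers [-1,1]:

```python
import math

def atan_cubic(x):
    # x(9 + 2x^2) / (9 + 5x^2), from a = 2/9, b = 5/9
    return x * (9 + 2 * x * x) / (9 + 5 * x * x)

def atan_quartic(x):
    # x(240 + 115x^2) / (240 + 195x^2 + 17x^4)
    x2 = x * x
    return x * (240 + 115 * x2) / (240 + (195 + 17 * x2) * x2)

xs = [i / 10000 for i in range(10001)]
err3 = max(abs(atan_cubic(x) - math.atan(x)) for x in xs)
err5 = max(abs(atan_quartic(x) - math.atan(x)) for x in xs)
assert err3 < 1 / 2**10   # better than 10 bits, as claimed
assert err5 < 2e-4        # roughly 4 decimal digits on [-1,1]
```

The quartic denominator costs one more multiplication and addition per evaluation but buys roughly three extra bits.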
Those are interesting functions. I'm not so sure about speed, though: a division is time-consuming. I found a worse approximation of arctan that needs only multiplication and addition here: atan(x) ≈ pi/4*x - x(|x|-1)*(0.2447+0.0663*|x|). Plotting the difference in WA shows it stays below about 0.0015. It would be interesting to see how fast approximations with more digits are compared to conventional algorithms.
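Spot-checking this polynomial formula in Python (my addition, sampling [0,1] since the expression is odd in x) confirms the quoted error level, with the worst deviation landing near x ≈ 0.48:

```python
import math

def atan_poly(x):
    # pi/4*x - x(|x|-1)*(0.2447 + 0.0663*|x|), intended for x in [-1, 1]
    ax = abs(x)
    return math.pi / 4 * x - x * (ax - 1) * (0.2447 + 0.0663 * ax)

xs = [i / 10000 for i in range(10001)]
err = max(abs(atan_poly(x) - math.atan(x)) for x in xs)
assert 0.0014 < err < 0.0016   # maximum deviation is about 0.0015
```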
The division in my implementations takes about twice as long as a multiplication. Using a Taylor series would require >100 terms for the desired accuracy, which is a lot of multiplications (even optimising arctangent to compute x^{2} once and reuse it instead of computing higher powers of x).

And that is an excellent formula! The multiplications by constants can shave some clock cycles, so it really comes down to two full multiplications, two constant multiplications, and two add/subtracts, and it fits 8-bit accuracy. For comparison, in software, CORDIC is way too slow on the Z80, as is the arithmetic-geometric mean. Those may be faster for much better accuracy on a device that has hardware multiplication and a barrel shifter, such as the ARM. Your formula would probably be faster than mine on the ARM, and slightly slower on the Z80.

I modified your constants a little for easier computation (for integer optimisations), and the following still stays within the 8-bit range: atan(x) ≈ 201*x/256 - x(|x|-1)*(63/256+|x|/16)

How did you come up with that formula? Was it through a specific process, or by testing values? I have been looking at atan() and ln() by starting with a linear function, then slightly curving it toward the x-axis by adjusting additional terms.
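A quick Python check of the modified constants (my own sketch) shows the error does stay under 1/256, i.e. within 8-bit accuracy, even though the tweaked coefficients roughly double the worst-case deviation of the original formula:

```python
import math

def atan_int_friendly(x):
    # 201*x/256 - x(|x|-1)*(63/256 + |x|/16): all constants are
    # power-of-two fractions, convenient for shift-based Z80 code
    ax = abs(x)
    return 201 * x / 256 - x * (ax - 1) * (63 / 256 + ax / 16)

xs = [i / 10000 for i in range(10001)]   # odd function, so [0,1] suffices
err = max(abs(atan_int_friendly(x) - math.atan(x)) for x in xs)
assert err < 1 / 256   # stays within the 8-bit range, as claimed
```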
See the link in my post; I found it with Google. Looking at the plot of the differences, the parameters are tuned to reduce the maximal deviation.
(I was debating posting, since my reply is programming-oriented instead of math/physics.) I did not originally see that link (I only saw the other one), so thanks! I tried implementing it in code too, and it worked phenomenally. I managed to get it to run in an average of 545 clock cycles (compared to an optimised 8-bit multiplication, which averages 230 clock cycles, or a 16-bit square root, which averages 509 t-states). Using the formula I gave, the fixed-point division alone would require an average of 370 clock cycles; together with that optimised multiplication, and excluding the rest of the code, that puts it at over 600 clock cycles. I said it worked 'phenomenally' because, on top of allowing lots of math shortcuts that make it really fast, I managed to avoid precision issues: it correctly evaluates arctangent to every bit for all inputs on [-1,1] in 8.8 fixed-point format.
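To illustrate what an 8.8 fixed-point evaluation looks like, here is an integer-only Python sketch of the rational arctangent formula from earlier in the thread. This is my own illustration with naive truncating arithmetic, not the carefully rounded routine described above, so it is accurate to about one 8.8 ulp rather than to every bit:

```python
import math

def atan_fixed(X):
    """Arctangent in 8.8 fixed point (X = round(256*x), 0 <= X <= 256).
    Evaluates x*(9+2x^2)/(9+5x^2) using only integer multiply, add,
    shift, and divide, as a Z80-style routine would."""
    X2 = (X * X) >> 8              # x^2 in 8.8 (truncated)
    num = X * (9 * 256 + 2 * X2)   # x*(9+2x^2) in 16.16
    den = 9 * 256 + 5 * X2         # 9+5x^2 in 8.8
    return num // den              # quotient back in 8.8

# compare against floating point over [0, 1]
worst = max(abs(atan_fixed(X) / 256 - math.atan(X / 256)) for X in range(257))
assert worst < 1 / 256 + 1 / 512   # within ~1.5 ulp despite truncation
```

The remaining error is dominated by the two truncations (of X2 and of the final quotient); rounding instead of truncating is what lets a careful implementation hit every bit.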