I have just created a Java native wrapper for my fast Walsh-Hadamard transform (WHT) and O'Connor Transform (OCT) code. On 65536-point inputs I am getting 2000 WHTs per second and 540 OCTs per second. That is about as fast as you will get without resorting to Nvidia CUDA or some other GPU approach.
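For anyone wondering what the WHT part involves, here is a minimal pure-Java sketch of the textbook in-place fast WHT (unnormalized, natural/Hadamard ordering, n log2 n additions/subtractions). It is not the native code being described: the class name `FastWHT`, the `float[]` element type and the crude timing loop are my own assumptions, and it does not attempt the O'Connor Transform. A pure-Java loop timed with `System.nanoTime()` is not directly comparable to the native figures.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch only: unnormalized in-place fast Walsh-Hadamard transform (hypothetical class). */
public class FastWHT {

    /** Transforms x in place. x.length must be a power of two. */
    public static void wht(float[] x) {
        int n = x.length;
        if (n == 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // log2(n) butterfly passes: each pass replaces pairs (a, b) that are
        // h apart with (a + b, a - b), doubling h every pass.
        for (int h = 1; h < n; h <<= 1) {
            for (int i = 0; i < n; i += h << 1) {
                for (int j = i; j < i + h; j++) {
                    float a = x[j];
                    float b = x[j + h];
                    x[j] = a + b;
                    x[j + h] = a - b;
                }
            }
        }
    }

    public static void main(String[] args) {
        float[] v = {1, 0, 1, 0, 0, 1, 1, 0};
        wht(v);
        System.out.println(Arrays.toString(v)); // prints [4.0, 2.0, 0.0, -2.0, 0.0, 2.0, 0.0, 2.0]

        // Rough throughput estimate on 65536 points. The input is re-copied each
        // rep (copy time included) so the unnormalized values cannot overflow.
        float[] src = new float[65536];
        Random rng = new Random(42);
        for (int i = 0; i < src.length; i++) {
            src[i] = rng.nextFloat() - 0.5f;
        }
        float[] work = new float[src.length];
        for (int r = 0; r < 200; r++) { // JIT warm-up
            System.arraycopy(src, 0, work, 0, src.length);
            wht(work);
        }
        int reps = 500;
        long t0 = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            System.arraycopy(src, 0, work, 0, src.length);
            wht(work);
        }
        double secs = (System.nanoTime() - t0) / 1e9;
        System.out.printf("~%.0f 65536-point WHTs per second (pure Java)%n", reps / secs);
    }
}
```

Applying the unnormalized transform twice returns the input scaled by n, which makes a handy sanity check; multiply by 1/sqrt(n) after each pass if you want the orthonormal version.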
The code is...