I'm trying to find a bit more information on efficient square root algorithms, most likely for implementation on an FPGA. I've already found a lot of algorithms, but which ones do, for example, Intel or AMD use? By efficient I mean they are either really fast or they don't need much memory.
EDIT: I should probably mention that the input is generally a floating point number, and most hardware implements the IEEE 754 standard, where a single-precision number is represented as 1 sign bit, 8 bits of biased exponent, and 23 bits of mantissa.
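(So a normalized value decodes as (-1)^sign * 1.mantissa * 2^(exponent - 127), with an exponent bias of 127.)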
Thanks!
Not a full solution, but a couple of pointers.
I assume you're working in floating point, so point 1 is to remember that a float is stored as a mantissa and an exponent. Since sqrt(m * 2^e) = sqrt(m) * 2^(e/2), the exponent of the square root is simply half the exponent of the original number (after adjusting the mantissa to make the exponent even).
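To make that concrete, here's a rough C sketch of the exponent handling (my own illustration, not any particular vendor's implementation): split the float into its fields, fold a factor of 2 into the mantissa when the exponent is odd, then halve the exponent.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Split an IEEE 754 single into its unbiased exponent and its
 * mantissa (with the implicit leading 1 restored), then halve
 * the exponent for the square root. Assumes a normalized,
 * positive input; specials (0, Inf, NaN, denormals) are not
 * handled in this sketch. */
int main(void) {
    float x = 20.0f;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);           /* safe type-pun */

    int e = (int)((bits >> 23) & 0xFF) - 127; /* unbiased exponent */
    /* mantissa in [1, 2): implicit 1 plus the 23 stored bits */
    double m = 1.0 + (bits & 0x7FFFFF) / (double)(1 << 23);

    if (e & 1) {        /* odd exponent: move a factor of 2 */
        m *= 2.0;       /* into the mantissa so e becomes even */
        e -= 1;
    }
    int e_half = e / 2; /* exponent of the result */
    /* m is now in [1, 4), so sqrt(m) is in [1, 2) and the result
     * sqrt(m) * 2^e_half is already normalized. */
    printf("x = %g, m = %g, e/2 = %d\n", x, m, e_half);
    return 0;
}
```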
Then the square root of the mantissa can be approximated with a look-up table, and a couple of Newton-Raphson rounds will refine the LUT result to full accuracy.
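A minimal software model of that idea (the table size and iteration count here are my own choices for illustration): index a small LUT with the mantissa to get a first guess for sqrt(m), then refine with Newton-Raphson, y = (y + m/y)/2. In hardware you'd more likely use a division-free iteration (e.g. via the reciprocal square root), but this shows the shape:

```c
#include <math.h>
#include <stdio.h>

#define LUT_BITS 6              /* 64-entry table */
#define LUT_SIZE (1 << LUT_BITS)

static double lut[LUT_SIZE];

/* Fill the table once: entry i approximates sqrt at the midpoint
 * of the i-th subinterval of [1, 4), the post-adjustment mantissa
 * range from the previous sketch. In hardware this is a ROM. */
static void init_lut(void) {
    for (int i = 0; i < LUT_SIZE; i++) {
        double mid = 1.0 + 3.0 * (i + 0.5) / LUT_SIZE;
        lut[i] = sqrt(mid);
    }
}

/* Approximate sqrt(m) for m in [1, 4): LUT guess plus 2 rounds of
 * Newton-Raphson for y^2 = m. Each round roughly doubles the number
 * of correct bits, so a ~6-bit guess reaches ~24 bits in 2 rounds. */
static double sqrt_mantissa(double m) {
    int idx = (int)((m - 1.0) / 3.0 * LUT_SIZE);
    double y = lut[idx];
    for (int round = 0; round < 2; round++)
        y = 0.5 * (y + m / y);
    return y;
}

int main(void) {
    init_lut();
    double m = 2.5;
    printf("approx %.9f vs libm %.9f\n", sqrt_mantissa(m), sqrt(m));
    return 0;
}
```

Combine sqrt_mantissa(m) with the halved exponent from the first sketch and you have the whole square root; the LUT width and the number of rounds trade memory against latency, which is exactly the efficiency knob you asked about.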
I haven't implemented anything like this for about 8 years, but I think this is how I did it, and I was able to get a result in 3 or 4 cycles.