# Xilinx XC7A200T vs Intel Processor

#### bhobba

Mentor
Hi

There is a device made by Chord called an M-Scaler:
https://chordelectronics.co.uk/product/hugo-mscaler/

It uses a Xilinx XC7A200T FPGA with 740 "cores" (DSP slices), and it took a couple of years for the engineer to write the code for it. It pushes the device to its limit to get 1 million taps on the filter used in the device. The designer claims it was only the release of processors of this power that allowed him to do it.

However some have said an Intel processor should be able to do it, possibly even a PC. I find that quite hard to believe - I would think it would not have the power of the XC7A200T with its 740 cores. But the person concerned is insistent.

What do people here think?

Thanks
Bill

#### Mark44

Mentor
However some have said an Intel processor should be able to do it, possibly even a PC. I find that quite hard to believe - I would think it would not have the power of the XC7A200T with its 740 cores. But the person concerned is insistent.
I don't know anything about the XC7A200T, but I know something about Intel processors. The top-of-the-line Intel processors right now are the Intel Xeon Scalable Platinum models, with up to 28 cores, and the Intel Core i9 models, with up to 18 cores. The Skylake microarchitecture, on which these processor families are based, also includes a set of 32 registers, each 512 bits in size. These registers can hold a vector of 16 four-byte integers or single-precision floats, or 8 eight-byte doubles, and the instruction set can perform arithmetic operations on them component-wise.
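As a rough illustration of the component-wise operation described above, here is a sketch in plain Python that treats a 512-bit register as 16 single-precision lanes and accumulates a dot product (the core of one FIR output sample) lane by lane, then sums the lanes at the end. It only mimics the data layout; it is not AVX-512 code, and the names `LANES_F32`, `LANES_F64` and `simd_dot` are hypothetical.

```python
REGISTER_BITS = 512
FLOAT32_BITS = 32
FLOAT64_BITS = 64

# Number of elements one 512-bit register holds for each element width.
LANES_F32 = REGISTER_BITS // FLOAT32_BITS  # 16 single-precision floats
LANES_F64 = REGISTER_BITS // FLOAT64_BITS  # 8 doubles


def simd_dot(a, b, lanes=LANES_F32):
    """Dot product computed the way a vectorised loop would: `lanes` partial
    sums are updated component-wise, then reduced ("horizontal sum") at the end."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    acc = [0.0] * lanes  # emulates one vector accumulator register
    full = len(a) - len(a) % lanes
    for i in range(0, full, lanes):
        # One "vector instruction": multiply and add in every lane at once.
        for lane in range(lanes):
            acc[lane] += a[i + lane] * b[i + lane]
    # Leftover elements that do not fill a whole register.
    tail = sum(x * y for x, y in zip(a[full:], b[full:]))
    return sum(acc) + tail


if __name__ == "__main__":
    print(LANES_F32, LANES_F64)  # 16 8
    x = [float(i) for i in range(40)]
    print(simd_dot(x, x) == sum(v * v for v in x))  # True
```

On real hardware the inner lane loop is a single instruction, which is where the speed-up comes from.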

Having said that, comparing the XC7A200T with an Intel processor seems to me sort of an apples-and-oranges comparison. Having gobs of cores is one thing, but keeping them busy is something else entirely, and seems to be a much harder problem in general: how to parallelize an arbitrary program to take advantage of all the processor cores. The M-Scaler appears to be dedicated to processing audio signals, so what it is intended to do is different from what a general-purpose processor from Intel or AMD or whoever is intended to do.

#### anorlunda

Mentor
Gold Member
took a couple of years for the engineer to write the code for it.
For a digital filter? I find that hard to believe. The code should be short, simple and repetitive. Maybe it took him a couple of years to debug it?
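To illustrate how short the core loop is: a direct-form FIR filter is one multiply-accumulate per tap per output sample. This is a minimal sketch in Python with a tiny made-up filter; the function name `fir_filter` is hypothetical, and it says nothing about how Chord's filter is actually structured.

```python
from collections import deque


def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k].

    Each output sample costs len(coeffs) multiplications, which is why a
    1 million tap filter is expensive even though the code is tiny.
    """
    # Delay line holding the most recent len(coeffs) inputs, newest first.
    history = deque([0.0] * len(coeffs), maxlen=len(coeffs))
    out = []
    for x in samples:
        history.appendleft(x)  # oldest sample falls off the right-hand end
        out.append(sum(c * h for c, h in zip(coeffs, history)))
    return out


if __name__ == "__main__":
    # A 3-tap filter applied to an impulse returns its own coefficients.
    print(fir_filter([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 2.0, 3.0, 0.0]
```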

The M-Scaler is a 5000 USD box that up-scales 44 kHz audio to more than 700 kHz and is marketed to a certain type of HiFi enthusiast (I have a different name for them). You can safely assume most things written about it to be completely wrong and misinformed, especially if it comes from the company producing it or from the mouth of that particular type of HiFi enthusiast.

#### anorlunda

Mentor
Gold Member
up-scales 44 kHz audio to more than 700 kHz
Wow. That's not me. My hearing cuts off at 1200 Hz.

#### russ_watters

Mentor
Wow. That's not me. My hearing cuts off at 1200 Hz.
Searching for sarcasm....

#### Vanadium 50

Staff Emeritus
Science Advisor
Education Advisor
The M-Scaler is a 5000 USD box that up-scales 44 kHz audio to more than 700 kHz
Do you need Pear Cables for this? And on the motherboard traces?

#### glappkaeft

Funny thing: now that I think of it, while we can't know one way or the other without Chord disclosing the design of what they claim is "the world’s most advanced digital filter", the claim that it can't be run on an Intel processor is at least plausible. Running a 1 million tap filter at audio rates is exactly the kind of thing a custom design using an FPGA like the Xilinx Artix-7 XC7A200T is well suited for.

#### bhobba

Mentor
We won't get into the claims of HiFi enthusiasts - I am one myself and that is a massive can of worms.

I didn't mention it in the original post, but some simple math sheds a bit of light on it. Each tap is a multiplication. The device takes a signal sampled at 96 kHz and performs a million multiplications per sample, meaning about 96 billion multiplications a second. There are two channels, so about 200 billion multiplications per second. My reading is that modern high-end Intel processors can do about 100 billion instructions per second. Of course, they have a number of cores.
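Bill's back-of-envelope estimate, written out as a calculation under his stated assumptions (96 kHz, 1 million taps, one multiplication per tap per sample, two channels). The function name `required_mults_per_second` is hypothetical, and the calculation assumes every tap is evaluated for every sample, as in the post; a real design may differ.

```python
def required_mults_per_second(sample_rate_hz, taps, channels):
    """Multiplications per second for a direct-form FIR run at sample_rate_hz,
    assuming one multiplication per tap per sample per channel."""
    return sample_rate_hz * taps * channels


if __name__ == "__main__":
    rate = required_mults_per_second(96_000, 1_000_000, 2)
    print(f"{rate:.3e} multiplications per second")  # 1.920e+11 multiplications per second
```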

The issue then boils down to how fast modern Intel processors can do multiplications. My reading of what Intel has written is that a multiplication can take 3 to 15 times as long as an addition. So these rough calculations indicate an Intel processor might have a hard time of it.

Thanks
Bill

#### glappkaeft

Like I said in the previous post, without the details it is not possible to tell.

#### bhobba

Mentor
Funny thing: now that I think of it, while we can't know one way or the other without Chord disclosing the design of what they claim is "the world’s most advanced digital filter", the claim that it can't be run on an Intel processor is at least plausible. Running a 1 million tap filter at audio rates is exactly the kind of thing a custom design using an FPGA like the Xilinx Artix-7 XC7A200T is well suited for.
There are a lot of unknowns here, such as the precise nature of the filter algorithm used. It's what the designer, Rob Watts, calls a WTA filter, and this is what he says about it:
'Yes all my WTA filters are extremely fast roll-off; the 49,152 tap WTA filter is at -1dB within 75Hz of FS/2; the 1M tap WTA filter in the M scaler gets to 4Hz of FS/2. The M scaler WTA coefficients are identical to an ideal sinc function coefficients to a better than 16 bit accuracy. This implies that under all circumstances the M scaler 16FS WTA filter will recover the bandwidth limited un-sampled analogue signal in the ADC to a better than 16 bit accuracy.'

It's not much actual detail, as you would expect of a proprietary technology they obviously want to keep hush-hush. But my gut tells me it's horses for courses: this type of processing is what the XC7A200T is well suited for, Intel processors not so much. Can an Intel processor do it? I don't think we have enough information to be 100% sure, but it at least seems plausible that it would be pushing it.
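Rob Watts does not disclose the WTA algorithm, so the following is only a generic illustration of what "ideal sinc function coefficients" means for a 16x (16FS) interpolator: the truncated ideal impulse response h[n] = sinc(n / 16). A useful property is that every 16th coefficient away from the centre is zero, so the original samples pass through unchanged and the filter only fills in the 15 new points between them. The function name `ideal_sinc_coeffs` and the tiny tap count are illustrative assumptions; no windowing is applied, and this is not the WTA filter.

```python
import math


def ideal_sinc_coeffs(factor, half_length):
    """Truncated ideal low-pass for upsampling by `factor`:
    h[n] = sinc(n / factor) for n = -half_length .. half_length,
    using the normalised sinc sin(pi x) / (pi x)."""
    coeffs = []
    for n in range(-half_length, half_length + 1):
        x = n / factor
        coeffs.append(1.0 if n == 0 else math.sin(math.pi * x) / (math.pi * x))
    return coeffs


if __name__ == "__main__":
    h = ideal_sinc_coeffs(16, 64)  # 129 taps; the M-Scaler reportedly uses about 1 million
    centre = len(h) // 2
    print(h[centre])  # 1.0
    # Coefficients at multiples of 16 from the centre are (numerically) zero.
    print(all(abs(h[centre + 16 * k]) < 1e-12 for k in range(1, 5)))  # True
```

More taps make the truncated response closer to the ideal one, which is the trade-off behind the very long filters discussed here.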

What I do know is that a guy who posts under the name Miska has an ongoing debate with Rob over exactly what a tap is; when it gets down to that level it really makes things hard.

Thanks
Bill

Staff Emeritus
An AVX-512 processor can do two 8-wide double-precision FMA instructions per cycle. For example, an i7-7800X runs at 3.5 GHz, so you have 56 billion multiplications per second per core. You have 6 physical cores, so in principle you have the horsepower.
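The peak-throughput arithmetic in the post, as a small calculation. The parameters (3.5 GHz, 6 cores, two 8-wide FMA instructions per cycle) are the figures quoted for the i7-7800X; the function name `peak_mults_per_second` is hypothetical, and sustained rates will be lower because of memory traffic and thermal throttling.

```python
def peak_mults_per_second(clock_hz, cores, vector_ops_per_cycle, lanes):
    """Theoretical peak multiplications per second: every core issues
    `vector_ops_per_cycle` vector multiplies (or FMAs) of `lanes` elements each cycle."""
    return clock_hz * cores * vector_ops_per_cycle * lanes


if __name__ == "__main__":
    per_core = peak_mults_per_second(3_500_000_000, 1, 2, 8)
    total = peak_mults_per_second(3_500_000_000, 6, 2, 8)
    needed = 96_000 * 1_000_000 * 2  # Bill's estimate for two channels
    print(per_core, total, total >= needed)  # 56000000000 336000000000 True
```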

In practice, this would require that you can keep the cores fed, and these million multiplications involve information that is always in registers. That is almost certainly not the case, and it's also much easier to achieve with an FPGA. The other problem is that Intel chips throttle when one part of them gets hot, so you run the risk of it slowing down the FP arithmetic if you tried to run it flat out. Again, an FPGA is less prone to this.

#### Mark44

Mentor
My reading is that modern high-end Intel processors can do about 100 billion instructions per second. Of course, they have a number of cores.
That's pretty close. I recently bought a new computer with a Xeon Scalable Silver processor, with 10 cores, and running at 3.0 GHz.
The vmulps instruction can multiply 16 pairs of floats in half a clock (there's a latency of 4 clocks if all values are in registers). So at 3.0 GHz, my computer could theoretically do $6.0 \times 10^9$ multiplications of 16 pairs of floats, or 96 billion multiplications per second. That's in one half of one core. Delegating the work to additional threads would bump up the throughput.
An AVX-512 processor can do two 8-wide double-precision FMA instructions per cycle. For example, an i7-7800X runs at 3.5 GHz, so you have 56 billion multiplications per second per core. You have 6 physical cores, so in principle you have the horsepower.

In practice, this would require that you can keep the cores fed
No easy task to keep the pipeline full. Loop unrolling helps, but can only go so far.
...and these million multiplications involve information that is always in registers. That is almost certainly not the case, and it's also much easier to achieve with an FPGA. The other problem is that Intel chips throttle when one part of them gets hot, so you run the risk of it slowing down the FP arithmetic if you tried to run it flat out. Again, an FPGA is less prone to this.

#### Rive

However some have said an Intel processor should be able to do it, possibly even a PC. I find that quite hard to believe - I would think it would not have the power of the XC7A200T with its 740 cores. But the person concerned is insistent.

What do people here think?
There is a proverb for this kind of thing: it is about the useless but very seriously taken comparison of the size of a certain body part (often said to be inversely proportional to car size).

Just mix in some video card stats and keep a safe distance.

#### willem2

No easy task to keep the pipeline full. Loop unrolling helps, but can only go so far.
It's a big problem that the filter coefficients and the window of the signal that the filter is looking at won't fit in the second-level cache, so you'll have to go to the third-level cache, and there is only one of those per chip. Multiple cores on a single processor chip likely won't help at all.
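A rough check of the cache argument. The sizes are assumptions: single- or double-precision coefficients, a signal window as long as the filter, and 1 MiB of L2 per core (the figure for Skylake-SP/Skylake-X parts; other CPUs differ). The function name `working_set_bytes` is hypothetical.

```python
def working_set_bytes(taps, bytes_per_value):
    """Coefficients plus an equally long window of input samples."""
    return 2 * taps * bytes_per_value


if __name__ == "__main__":
    L2_BYTES = 1 * 1024 * 1024  # assumed L2 size per core
    for name, size in (("float32", 4), ("float64", 8)):
        ws = working_set_bytes(1_000_000, size)
        print(f"{name}: {ws / 1e6:.0f} MB, fits in L2: {ws <= L2_BYTES}")
```

Even in single precision the working set is several times larger than the assumed L2, which is the point being made about falling back to the shared L3.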

Since the filter is just a giant convolution, it should be much faster to apply a fast Fourier transform to a block of sound and to the filter, multiply them in the frequency domain, and transform back. One problem is that there will be a larger delay between input and output, because you need to work on large blocks of samples.
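A minimal sketch of the FFT approach in pure Python, assuming a whole block is available at once: both sequences are zero-padded to a power of two, transformed with a textbook radix-2 FFT, multiplied, and transformed back, and the result is checked against direct convolution. All function names are hypothetical, and this is not an optimised implementation.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The inverse transform is returned unscaled (caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(signal, taps):
    """Linear convolution via FFT: zero-pad both inputs to a power of two
    at least len(signal) + len(taps) - 1, multiply spectra, transform back."""
    out_len = len(signal) + len(taps) - 1
    n = 1
    while n < out_len:
        n *= 2
    a = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    b = fft([complex(v) for v in taps] + [0j] * (n - len(taps)))
    y = fft([p * q for p, q in zip(a, b)], inverse=True)
    return [v.real / n for v in y[:out_len]]


def direct_convolve(signal, taps):
    """Reference O(N*M) convolution: one multiplication per tap per sample."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    sig = [1.0, 2.0, 3.0, 4.0, 5.0]
    h = [0.5, -1.0, 0.25]
    fast = fft_convolve(sig, h)
    slow = direct_convolve(sig, h)
    print(all(abs(f - s) < 1e-9 for f, s in zip(fast, slow)))  # True
```

A streaming version would split the input into blocks and stitch the results together with overlap-add (or overlap-save); the block length is what sets the extra input-to-output delay.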

#### f95toli

Gold Member
However some have said an Intel processor should be able to do it, possibly even a PC. I find that quite hard to believe - I would think it would not have the power of the XC7A200T with its 740 cores. But the person concerned is insistent.
This is, as was pointed out above, comparing apples and oranges. A regular PC running a normal OS (Windows, Linux etc.) is really, really bad at real-time DSP. The speed of the processor is only one problem; another issue is the time it takes to actually transfer data between the different memories, especially if the data is coming from an external source via, say, the PCI bus.
However, the main issue is that the time it takes to process something can vary a LOT (on the order of ms), meaning that unless you are able to cache data (although I guess that should be possible in this case), it simply won't work. You can improve things by running a real-time OS, but then you are already going down the "non-standard" route.
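A quick way to see why millisecond-scale timing variation matters: the processing for each buffer has to finish within the time that buffer takes to play. The buffer sizes below are illustrative assumptions, 705.6 kHz is simply 16 times 44.1 kHz, and the function name `buffer_deadline_ms` is hypothetical.

```python
def buffer_deadline_ms(frames, sample_rate_hz):
    """Time available to produce one buffer before playback runs dry, in ms."""
    return 1000.0 * frames / sample_rate_hz


if __name__ == "__main__":
    for frames, rate in ((512, 44_100), (4096, 705_600)):
        print(f"{frames} frames at {rate} Hz: {buffer_deadline_ms(frames, rate):.2f} ms")
```

If scheduling delays can reach several milliseconds, a deadline of a few milliseconds will occasionally be missed unless the buffer (and hence the latency) is made larger.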

There is a good reason why there is a thriving market for "easy"-to-program FPGA cards. Some of them are made to fit into a standard PC; others use PXI or other standards.

In many applications you end up using a combination of CPUs, GPUs and FPGAs; they each have their strengths. For a filter I would have thought that even a good GPU would be way faster than a state-of-the-art processor, but again only if you do not need to do it in real time (it takes a lot of time to transfer data between RAM and the GPU, especially if you have to involve the CPU).