Xilinx XC7A200T vs Intel Processor

  • Thread starter bhobba
In summary, the M-Scaler is a 5000 US$ box that up-scales 44kHz audio to more than 700 kHz. It is marketed to a certain type of HiFi enthusiast and many things written about it are wrong. An Intel processor might have a hard time of it running a 1 million tap filter at audio rates.
  • #1
Hi

There is a device made by Chord called an M-Scaler:
https://chordelectronics.co.uk/product/hugo-mscaler/

It uses a Xilinx XC7A200T with 740 DSP cores and took a couple of years for the engineer to write the code for it. Getting 1 million taps on the filter pushed the device to its limit. The designer claims it was only the release of chips this powerful that allowed him to do it.

However some have said an Intel processor should be able to do it, possibly even a PC. I find that quite hard to believe - I would think it would not have the power of the XC7A200T with its 740 cores. But the person concerned is insistent.

What do people here think?

Thanks
Bill
 
  • #2
bhobba said:
However some have said an Intel processor should be able to do it, possibly even a PC. I find that quite hard to believe - I would think it would not have the power of the XC7A200T with its 740 cores. But the person concerned is insistent.
I don't know anything about the XC7A200T, but I know something about Intel processors. The top-of-the-line Intel processors right now are the Intel Xeon Scalable Platinum models, with up to 28 cores, and the Intel Core i9 models, with up to 18 cores. The Skylake microarchitecture on which these processor families are based also includes a set of 32 registers, each 512 bits wide. These registers can hold a vector of 16 four-byte integers or single-precision floats, or 8 eight-byte doubles, and the instruction set can perform arithmetic operations on them component-wise.

Having said that, comparing the XC7A200T with an Intel processor seems to me an apples-and-oranges comparison. Having gobs of cores is one thing, but keeping them busy is something else entirely, and seems to be a much harder problem in general: how to parallelize an arbitrary program to take advantage of all the processor cores. The M-Scaler appears to be dedicated to processing audio signals, which is a different job from the one a general-purpose processor from Intel, AMD, or anyone else is intended to do.
 
  • #3
bhobba said:
took a couple of years for the engineer to write the code for it.
For a digital filter? I find that hard to believe. The code should be short, simple and repetitive. Maybe it took him a couple of years to debug it?
 
  • #4
The M-Scaler is a 5000 US$ box that up-scales 44kHz audio to more than 700 kHz, and it is marketed to a certain type of HiFi enthusiast (I have a different name for them). You can safely assume that most things written about it are completely wrong and misinformed, especially if they come from the company producing it or from the mouth of that particular type of HiFi enthusiast.
 
  • #5
glappkaeft said:
up-scales 44kHz audio to more than 700 kHz

Wow. That's not me. My hearing cuts off at 1200Hz.
 
  • #6
anorlunda said:
Wow. That's not me. My hearing cuts off at 1200Hz.
Searching for sarcasm...
 
  • #7
glappkaeft said:
The M-Scaler is a 5000 US$ box that up-scales 44kHz audio to more than 700 kHz

Do you need Pear Cables for this? And on the motherboard traces?
 
  • #8
Funny thing: when I think of it, while we can't know one way or the other without Chord disclosing the design of what they claim is "the world’s most advanced digital filter", the claim that it can't be run on an Intel processor is at least plausible. Running a 1 million tap filter at audio rates is one thing a custom design using an FPGA like the Xilinx Artix-7 XC7A200T is well suited for.
 
  • #9
We won't get into the claims of hi-fi enthusiasts. I am one myself, and that is a massive can of worms.

I didn't mention it in the original post, but some simple math sheds a bit of light on it. Each tap is a multiplication. The device takes a signal sampled at 96 kHz and performs a million multiplications per sample, meaning about 96 billion multiplications a second. With two channels, that is about 200 billion multiplications per second. My reading of Intel processors is modern powerful versions can do about 100 billion instructions per second. Of course they have a number of cores.

The issue then boils down to how fast modern Intel processors can do multiplications. My reading of what Intel has written is that a multiplication can take from 3 to 15 times as long as an addition. So these rough calculations indicate an Intel processor might have a hard time of it.
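The multiplication count above can be written down as a one-line formula. This is a minimal sketch, assuming one multiplication per tap per input sample and ignoring any savings from tricks such as exploiting coefficient symmetry. For an L-times polyphase interpolator the count comes out the same: each output sample uses taps / L coefficients and there are L output samples per input sample. The function name is hypothetical.

```python
def multiplies_per_second(taps: int, input_rate_hz: float, channels: int) -> float:
    """Rough multiply count for an FIR filter that does `taps` multiplies per
    input sample. For an L-times polyphase interpolator each output sample uses
    taps / L coefficients and there are L outputs per input sample, so the total
    is still about `taps` multiplies per input sample."""
    return taps * input_rate_hz * channels

# Assumptions from the post: 1,000,000 taps, 96 kHz input, stereo.
demand = multiplies_per_second(1_000_000, 96_000, 2)
print(f"{demand / 1e9:.0f} billion multiplies/s")  # 192 billion multiplies/s

# At the CD rate of 44.1 kHz the same filter needs less than half that.
print(f"{multiplies_per_second(1_000_000, 44_100, 2) / 1e9:.1f} billion multiplies/s")  # 88.2 billion multiplies/s
```

So "about 200 billion" is the right order of magnitude for 96 kHz input. The figure depends on the input rate, not the 700 kHz output rate.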

Thanks
Bill
 
  • #10
Like I said in the previous post, without the details it is not possible to tell.
 
  • #11
glappkaeft said:
Funny thing: when I think of it, while we can't know one way or the other without Chord disclosing the design of what they claim is "the world’s most advanced digital filter", the claim that it can't be run on an Intel processor is at least plausible. Running a 1 million tap filter at audio rates is one thing a custom design using an FPGA like the Xilinx Artix-7 XC7A200T is well suited for.

There are a lot of unknowns here, such as the precise nature of the filter algorithm used. It's what the designer, Rob Watts, calls a WTA filter, and this is what he says about it:
'Yes all my WTA filters are extremely fast roll-off; the 49,152 tap WTA filter is at -1dB within 75Hz of FS/2; the 1M tap WTA filter in the M scaler gets to 4Hz of FS/2. The M scaler WTA coefficients are identical to an ideal sinc function coefficients to a better than 16 bit accuracy. This implies that under all circumstances the M scaler 16FS WTA filter will recover the bandwidth limited un-sampled analogue signal in the ADC to a better than 16 bit accuracy.'
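The WTA design itself is proprietary, so it cannot be reproduced here. What can be shown is the ideal sinc reference it is being compared against. This is a minimal sketch, assuming a plain truncated, unwindowed sinc for 16x upsampling and only 257 taps for the demo. The property it illustrates is that h[k·16] = 0 for every k ≠ 0, so an ideal 16FS interpolator reproduces the original samples exactly and only fills in the 15 new samples between them. The function name is hypothetical.

```python
import math

def ideal_sinc_coefficients(factor: int, half_length: int) -> list[float]:
    """Truncated ideal low-pass (sinc) coefficients for upsampling by `factor`.

    h[n] = sinc(n / factor) for n = -half_length .. half_length, with the cutoff
    at the original Nyquist frequency (FS/2). No window is applied: this is the
    ideal sinc reference, NOT the proprietary WTA design.
    """
    def sinc(x: float) -> float:
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return [sinc(n / factor) for n in range(-half_length, half_length + 1)]

factor = 16                 # 16FS output, as in the quote
half = 8 * factor           # 8 input samples either side of the centre (tiny demo)
h = ideal_sinc_coefficients(factor, half)
print(len(h))  # 257

# Every 16th coefficient away from the centre is (numerically) zero, so the
# original input samples pass through the interpolator unchanged.
zeros = [h[half + k * factor] for k in range(-8, 9) if k != 0]
print(max(abs(z) for z in zeros) < 1e-12)  # True
```

A real filter of a million taps would be far longer and would shape or window the ends of this sequence. The quote's claim is that the WTA coefficients stay within 16-bit accuracy of these ideal values.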

It's not much actual detail, as you would expect of a proprietary technology they obviously want to keep hush-hush. But my gut tells me it's horses for courses: this type of processing is what the XC7A200T is well suited for, Intel processors not so much. Can an Intel processor do it? I think we don't have enough information to be 100% sure, but it at least seems plausible that it would be pushing it.

What I do know is that a guy who posts under the name Miska has an ongoing debate with Rob over exactly what a tap is. When it gets down to that level, it really makes things hard.

Thanks
Bill
 
  • #12
An AVX512 processor can do two 8-wide double precision FMA instructions per cycle. For example, an i7-7800X runs at 3.5 GHz, so you have 56 billion multiplications per second per core. You have 6 physical cores, so in principle you have the horsepower.
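The peak-rate arithmetic in this post can be checked the same way. This is a minimal sketch using the post's own figures: 6 cores at 3.5 GHz, with two 8-wide double-precision FMA units per core. The number of 512-bit FMA units differs between CPU models, so treat fma_units as an assumption. The function name is hypothetical.

```python
def peak_multiplies_per_second(cores: int, clock_hz: float,
                               fma_units: int, lanes: int) -> float:
    """Theoretical peak: every FMA unit on every core issues one vector FMA per
    cycle, each doing `lanes` multiplies. Real code rarely gets close."""
    return cores * clock_hz * fma_units * lanes

# Figures from the post: i7-7800X, 3.5 GHz, two 8-wide (double) FMAs per cycle.
per_core = peak_multiplies_per_second(1, 3.5e9, 2, 8)
whole_chip = peak_multiplies_per_second(6, 3.5e9, 2, 8)
print(per_core / 1e9, whole_chip / 1e9)  # 56.0 336.0

# Compare with the "about 200 billion multiplications per second" estimated in post #9.
print(whole_chip > 200e9)  # True
```

On paper the headroom is less than 2x. That is why the practical issues raised in the next paragraph (feeding the cores, thermal throttling) decide the question.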

In practice, this would require that you can keep the cores fed, and these million multiplications involve information that is always in registers. That is almost certainly not the case, and it's also much easier to achieve with an FPGA. The other problem is that Intel chips throttle when one part of them gets hot, so you run the risk of it slowing down the FP arithmetic if you tried to run it flat out. Again, an FPGA is less prone to this.
 
  • #13
bhobba said:
My reading of Intel processors is modern powerful versions can do about 100 billion instructions per second. Of course they have a number of cores.
That's pretty close. I recently bought a new computer with a Xeon Scalable Silver processor, with 10 cores, and running at 3.0 GHz.
The vmulps instruction can multiply 16 pairs of floats in half a clock (there's a latency of 4 clocks if all values are in registers). So at 3.0 GHz, my computer could theoretically do ##6.0 \times 10^9## multiplications of 16 pairs of floats, or 96 billion multiplications per second. That's in one half of one core. Delegating the work to additional threads would bump up the throughput.
Vanadium 50 said:
An AVX512 processor can do two 8-wide double precision FMA instructions per cycle. For example, an i7-7800X runs at 3.5 GHz, so you have 56 billion multiplications per second per core. You have 6 physical cores, so in principle you have the horsepower.

In practice, this would require that you can keep the cores fed
No easy task to keep the pipeline full. Loop unrolling helps, but can only go so far.
Vanadium 50 said:
, and these million multiplications involve information that is always in registers. That is almost certainly not the case, and it's also much easier to achieve with an FPGA. The other problem is that Intel chips throttle when one part of them gets hot, so you run the risk of it slowing down the FP arithmetic if you tried to run it flat out. Again, an FPGA is less prone to this.
 
  • #14
bhobba said:
However some have said an Intel processor should be able to do it, possibly even a PC. I find that quite hard to believe - I would think it would not have the power of the XC7A200T with its 740 cores. But the person concerned is insistent.

What do people here think?
There is a proverb for this kind of thing: it is about the useless but very seriously taken comparison of the size of a certain human part (often said to be inversely proportional to car size).

Just mix in some videocard stats and keep safe distance. :wink:
 
  • #15
Mark44 said:
No easy task to keep the pipeline full. Loop unrolling helps, but can only go so far.
It's a big problem that the filter coefficients and the window of the signal that the filter is looking at won't fit in the second-level cache, so you'll have to go to the third-level cache, and there is only one of those per chip. Multiple cores on a single processor chip likely won't help at all.
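A quick size estimate shows why the cache matters. This is a minimal sketch with two assumptions: the coefficients are stored as 4-byte single-precision floats, and each core has a 1 MiB L2 cache, as on Skylake-SP server parts. It also assumes that every coefficient is read once per input sample, which is what a polyphase interpolator does. The two channels could share one pass over the coefficients. The function and constant names are hypothetical.

```python
def coefficient_bytes(taps: int, bytes_per_coeff: int) -> int:
    """Storage needed for the filter coefficients alone (no signal history)."""
    return taps * bytes_per_coeff

MiB = 1024 * 1024
coeffs = coefficient_bytes(1_000_000, 4)   # assumption: float32 coefficients
print(f"{coeffs / MiB:.2f} MiB")           # 3.81 MiB

L2_BYTES = 1 * MiB                         # assumption: 1 MiB per-core L2
print(coeffs > L2_BYTES)                   # True: coefficients alone overflow L2

# Bandwidth if all coefficients are streamed once per input sample at 44.1 kHz.
bandwidth = coeffs * 44_100
print(f"{bandwidth / 1e9:.0f} GB/s")       # 176 GB/s
```

The arithmetic is only a small part of the cost. Moving roughly 176 GB/s of coefficients through the cache hierarchy is the harder part.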

Since the filter is just a giant convolution, using a fast Fourier transform on a block of sound and on the filter, then multiplying them in the frequency domain and transforming back, should be much faster. One problem is that there will be a larger delay between the input and the output, because you need to work on large blocks of samples.
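The FFT idea in this post is textbook fast convolution. This is a minimal sketch using a hand-written recursive radix-2 FFT so that it needs only the standard library. It shows one zero-padded block, not Chord's method. A real-time version would process the stream in blocks with overlap-add or overlap-save, which is where the extra latency mentioned above comes from. All function names are hypothetical.

```python
import cmath

def fft(a: list[complex], invert: bool = False) -> list[complex]:
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The inverse transform is returned unscaled (the caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_convolve(x: list[float], h: list[float]) -> list[float]:
    """Linear convolution: zero-pad both, transform, multiply, transform back."""
    size = len(x) + len(h) - 1
    n = 1
    while n < size:            # next power of two, large enough to avoid wrap-around
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [v.real / n for v in y[:size]]

def direct_convolve(x: list[float], h: list[float]) -> list[float]:
    """Reference convolution: one multiply per tap per input sample."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

x = [1.0, 2.0, 3.0, 4.0]
h = [0.5, 0.25, -1.0]
print(direct_convolve(x, h))  # [0.5, 1.25, 1.0, 0.75, -2.0, -4.0]
print(all(abs(a - b) < 1e-9 for a, b in zip(fft_convolve(x, h), direct_convolve(x, h))))  # True
```

Direct convolution costs about len(x) × len(h) multiplies, while the FFT route costs about n log n for the padded length n. With a million taps that difference is enormous, which is the point the post makes.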
 
  • #17
bhobba said:
However some have said an Intel processor should be able to do it, possibly even a PC. I find that quite hard to believe - I would think it would not have the power of the XC7A200T with its 740 cores. But the person concerned is insistent.

As was pointed out above, this is comparing apples and oranges. A regular PC running a normal OS (Windows, Linux, etc.) is really, really bad at real-time DSP. The speed of the processor is only one problem; another is the time it takes to actually transfer data between the different memories, especially if the data is coming from an external source via, say, the PCI bus.
However, the main issue is that the time it takes to process something can vary a LOT (on the order of milliseconds), meaning that unless you are able to buffer data (although I guess that should be possible in this case) it simply won't work. You can improve things by running a real-time OS, but then you are already going down the "non-standard" route.
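The trade-off described here is between processing-time variation and how much data can be held back. It can be put in numbers. This is a minimal sketch with hypothetical values: 256- and 4096-sample buffers at 44.1 kHz, a 3 ms average processing time, and a 4 ms scheduling hiccup. The function names are also hypothetical.

```python
def buffer_period_ms(block_size: int, sample_rate_hz: float) -> float:
    """Time between successive audio buffers: the hard deadline for processing one."""
    return 1000.0 * block_size / sample_rate_hz

def meets_deadline(block_size: int, sample_rate_hz: float,
                   worst_case_processing_ms: float) -> bool:
    """Real time means the WORST case, including OS scheduling jitter, must fit."""
    return worst_case_processing_ms < buffer_period_ms(block_size, sample_rate_hz)

# Hypothetical numbers: 256-sample buffers at 44.1 kHz leave about 5.8 ms.
print(round(buffer_period_ms(256, 44_100), 2))  # 5.8
# A 3 ms average fits, but a 4 ms scheduling hiccup on top of it does not.
print(meets_deadline(256, 44_100, 3.0))         # True
print(meets_deadline(256, 44_100, 3.0 + 4.0))   # False
# Larger buffers tolerate more jitter, at the cost of more latency.
print(round(buffer_period_ms(4096, 44_100), 1))  # 92.9
```

Average speed is not what matters for real-time audio. A single late buffer produces an audible glitch.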

There is a good reason why there is a thriving market for "easy"-to-program FPGA cards. Some of them are made to fit into a standard PC; others use PXI or other standards.

In many applications you end up using a combination of CPUs, GPUs and FPGAs. They each have their strengths. For a filter I would have thought that even a good GPU would be way faster than a state-of-the-art processor, but again only if you do not need to do it in real time (it takes a lot of time to transfer data between RAM and the GPU, especially if you have to involve the CPU).
 

1. What is the difference between the Xilinx XC7A200T and an Intel processor?

Both are high-performance devices used in a wide variety of applications, but they differ in architecture and capabilities. The Xilinx XC7A200T is a Field-Programmable Gate Array (FPGA): its logic can be configured to implement a specific circuit, such as a dedicated filter pipeline. An Intel processor is a traditional Central Processing Unit (CPU) that executes a fixed instruction set. This means the XC7A200T can be reconfigured and optimized for a particular task, while an Intel processor is designed for general-purpose computing.

2. Which one is better for performance?

It is difficult to say which is better, as it largely depends on the specific application and task at hand. In general, the XC7A200T may have an advantage in tasks that need massive parallelism and deterministic real-time data processing, such as a long FIR filter, while an Intel processor may excel at general-purpose, largely sequential workloads with complex control flow.

3. What is the cost difference between the Xilinx XC7A200T and an Intel processor?

The cost difference can vary greatly depending on the specific model and configuration. An FPGA like the XC7A200T may cost more than a comparable CPU, and the overall cost also depends on purchase volume, the supporting hardware, and the development effort needed to program it.

4. Can the Xilinx XC7A200T and an Intel processor be used together?

Yes, in certain applications. For example, the XC7A200T can be used as a co-processor to offload specific tasks from an Intel processor, or the two can be integrated into a heterogeneous computing system to take advantage of their respective strengths. However, this may require specialized software and programming to ensure compatibility and performance.

5. Which one is more energy efficient?

Again, energy efficiency depends on the specific task and usage. For a fixed, highly parallel task, an FPGA like the XC7A200T tends to be more energy-efficient than a general-purpose CPU, because its hardware is configured to do only that task. However, overall energy consumption also depends on the specific model and configuration, as well as the workload.
