Processor Question for Intense Single Core Calculations (audio signal processing)

  • #1

Main Question or Discussion Point

I have a question for which I can't seem to find a satisfactory answer.

We do a lot of intense real-time processing of audio signals in my line of work (generally programmed in C).
I am a mechanical engineer with only a rudimentary understanding of what goes on in the software for the project, but I can't seem to find an answer that really fits our situation.

We often spec laptops/PCs for field work, and the machines I select for approval, with the newest architecture and high performance, are often overruled in favor of machines with processors that are sometimes several years old and seemingly offer only a higher clock rate: 3.1 GHz (2-3 years old) as opposed to 2.5-2.8 GHz (current generation, new architecture).

When performing intense processing, is it still purely the clock rate that matters? Why do benchmarks show improved performance in the lower-clock-rate processors with newer architecture, and why does this not seem to matter when selecting a PC for custom-coded applications?

I guess my question is whether there is actually a reason for this, or whether the other engineers are following a rule of thumb that discounts the advantages of new architecture.

When trying to research this, I found that most of the comments centered on the perceived performance advantage of current-generation processors on specific metrics, such as file transfer or game performance, that could seemingly be improved by the architecture but not necessarily attributed to the actual processing being done faster.
 

Answers and Replies

  • #2
We often spec laptops/PCs for field work, and the machines I select for approval, with the newest architecture and high performance, are often overruled in favor of machines with processors that are sometimes several years old and seemingly offer only a higher clock rate: 3.1 GHz (2-3 years old) as opposed to 2.5-2.8 GHz (current generation, new architecture).
Your thread title is "Processor Question for Intense Single Core Calculations," which confused me somewhat. If you're asking about Intel or AMD processors produced in the last two or three years, I believe all of them are multicore processors, with at least two physical cores.

When performing intense processing, is it still purely the clock rate that matters?
Not necessarily. CPU designers can improve performance by reducing the number of micro-instructions needed for frequently used instructions, so that even at the same clock rate a new design can outperform an older one. Other possibilities include better branch-prediction algorithms, which reduce the chance of incorrectly predicting the next instruction and therefore reduce how often the instruction pipeline has to be flushed.
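To make the branch-prediction point concrete, here is a minimal C sketch (my own illustration, nothing from your code): the same loop is timed over random data and then over sorted data. On most modern CPUs the sorted pass is noticeably faster, because the branch becomes predictable. (Note that an optimizing compiler may convert the branch to a conditional move, which hides the effect.)
Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sum only the samples above a threshold. How fast this runs depends
   heavily on how well the CPU predicts the `if`: random data defeats
   the predictor; sorted data makes it nearly perfect. */
static double sum_above(const float *x, size_t n, float thresh)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        if (x[i] > thresh)      /* data-dependent branch */
            s += x[i];
    return s;
}

static int cmp_float(const void *a, const void *b)
{
    float fa = *(const float *)a, fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

int main(void)
{
    enum { N = 1 << 24 };
    float *x = malloc(N * sizeof *x);
    for (size_t i = 0; i < N; i++)
        x[i] = (float)rand() / RAND_MAX - 0.5f;   /* random in [-0.5, 0.5] */

    clock_t t0 = clock();
    double s1 = sum_above(x, N, 0.0f);
    clock_t t1 = clock();

    qsort(x, N, sizeof *x, cmp_float);            /* sorted => predictable branch */
    clock_t t2 = clock();
    double s2 = sum_above(x, N, 0.0f);
    clock_t t3 = clock();

    printf("random: %.3f s (sum %g)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, s1);
    printf("sorted: %.3f s (sum %g)\n", (double)(t3 - t2) / CLOCKS_PER_SEC, s2);
    free(x);
    return 0;
}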
Pattonias said:
Why do benchmarks show improved performance in the lower-clock-rate processors with newer architecture, and why does this not seem to matter when selecting a PC for custom-coded applications?
The benchmarks are based on commonly performed tasks, and probably don't include the kinds of tasks for signal processing that you're doing. That would be my guess.
 
  • #3
Your thread title is "Processor Question for Intense Single Core Calculations," which confused me somewhat. If you're asking about Intel or AMD processors produced in the last two or three years, I believe all of them are multicore processors, with at least two physical cores.
Sorry, I should have clarified. The code we are running is not generally designed to run on multiple cores, so single-core processing power seems to be the driving metric. In this instance, a dual-core laptop with a higher clock rate was chosen over a newer quad-core laptop with a lower clock rate.

Your answer is very helpful. I imagine that, at the very least, the computer chosen can likely do the work, but it may remain something of a mystery to me until I dive into what the program is actually doing that is so CPU-intensive. It very well could be that the new architecture does not offer any advantages for the specific process we are running.
 
  • #4
Besides my previous comments, processor designers add new features to new architectures. One feature I'm particularly interested in is AVX-512 (Advanced Vector Extensions with 512-bit registers), which is currently available only on the Xeon Phi and the Xeon Scalable processors (Xeon Bronze, Silver, Gold, and Platinum) from Intel. AFAIK, AMD doesn't have this capability in its Zen-based Ryzen processors.

Having 512-bit registers means you can load 16 floats (at 4 bytes each) into a single register, and another 16 floats into a second register, and operate on all of them in a single instruction. This could be advantageous in audio signal processing.
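For example, applying a gain to a buffer could look something like the following sketch using compiler intrinsics (illustrative only; it assumes an AVX-512F-capable CPU and a compiler flag such as gcc's -mavx512f):
Code:
#include <immintrin.h>
#include <stddef.h>

/* Apply a gain to a buffer 16 floats at a time with AVX-512 intrinsics. */
void apply_gain_avx512(float *samples, size_t n, float gain)
{
    __m512 g = _mm512_set1_ps(gain);              /* broadcast gain to 16 lanes */
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 x = _mm512_loadu_ps(&samples[i]);  /* load 16 floats */
        x = _mm512_mul_ps(x, g);                  /* 16 multiplies, one instruction */
        _mm512_storeu_ps(&samples[i], x);         /* store 16 floats */
    }
    for (; i < n; i++)                            /* scalar tail for leftovers */
        samples[i] *= gain;
}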

My previous computer runs an Intel Core i7 (from about 5 years ago). It supports 256-bit registers, but only a limited number of operations can be performed on them. I just bought a new computer with a Xeon Silver processor that supports many of the AVX-512 capabilities. As soon as I get it up and running, I will start writing x64 assembly code to take advantage of these new capabilities.
 
  • #5
Sorry, I should have clarified. The code we are running is not generally designed to run on multiple cores, so single-core processing power seems to be the driving metric.
I thought that might be the case, based on your thread title.
Pattonias said:
In this instance, a dual-core laptop with a higher clock rate was chosen over a newer quad-core laptop with a lower clock rate.
Have you or your team thought about modifying the code to take advantage of the multiple cores? Instead of having a single thread do all the work, it might be advantageous to divide the work so that two or more threads are working on different but independent parts of the problem. I've done some exploration of this; you need to give each thread a big chunk of work, otherwise the overhead of starting and stopping threads can eat up the time saved by using multiple threads.
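As a minimal sketch of the idea (assuming a POSIX system with pthreads; names like process_chunk are illustrative), each thread gets one large contiguous half of the buffer, so the thread-management overhead is paid only once:
Code:
#include <pthread.h>
#include <stddef.h>

typedef struct {
    float  *samples;   /* start of this thread's chunk */
    size_t  count;     /* number of samples in the chunk */
    float   gain;
} chunk_t;

static void *process_chunk(void *arg)
{
    chunk_t *c = arg;
    for (size_t i = 0; i < c->count; i++)
        c->samples[i] *= c->gain;   /* stand-in for the real per-sample work */
    return NULL;
}

/* Split one big buffer across two threads: one new thread plus the
   calling thread. Compile with -pthread. */
void process_parallel(float *samples, size_t n, float gain)
{
    size_t half = n / 2;
    chunk_t a = { samples,        half,     gain };
    chunk_t b = { samples + half, n - half, gain };

    pthread_t t;
    pthread_create(&t, NULL, process_chunk, &a);  /* first half on a new thread */
    process_chunk(&b);                            /* second half on this thread */
    pthread_join(t, NULL);
}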
 
  • #6
Have you or your team thought about modifying the code to take advantage of the multiple cores? Instead of having a single thread do all the work, it might be advantageous to divide the work so that two or more threads are working on different but independent parts of the problem. I've done some exploration of this; you need to give each thread a big chunk of work, otherwise the overhead of starting and stopping threads can eat up the time saved by using multiple threads.
We were actually talking about this yesterday. We have the capability to write the code for multiple threads, but the project is in the prototype phase, so currently it is written for a single thread. I would guess that the code could eventually be optimized should we develop a next generation of the system, but currently they are going with quick and "easy" on the software side. (I say easy in jest. These guys are very good at what they do.)

With regard to the Xeon Phi's 512-bit registers, I'll have to do more research, but you have given me some really interesting starting points.
 
  • #7
It may be a silly question, but won't those extra cores offload the PC's housekeeping from the core running your algorithm?

This 8-core CAD-tower PC is clocked significantly slower than the single-core zoomer I built some years ago, but it processes my 'work' much faster, as 'housekeeping' now uses only 3-5% of CPU cycles...
YMMV.
 
  • #8
With regard to the Xeon Phi's 512-bit registers, I'll have to do more research, but you have given me some really interesting starting points.
The Xeon Phi processors are very pricey. The top of the line is the Xeon Phi 7290, with 72 cores. The clock speed is relatively low: 1.5 GHz base or 1.7 GHz turbo. One website offers this processor for about $6000 -- and that's just the processor, nothing else.

For what I needed, I chose a Xeon Silver 4114, with 10 cores. The same site offers this processor for about $1200, but I've seen it offered for $740 or thereabouts at other sites.
 
  • #9
When performing intense processing, is it still purely the clock rate that matters?
Definitely not, but without further details it is hard to say anything for sure. There are just too many components (with many parameters) around a CPU.

When trying to research this, I found that most of the comments centered on the perceived performance advantage of current-generation processors on specific metrics...
Actually, this is what might bring you closer to practical answers. You should just start nagging the guys in the software department to add a benchmark mode to the software... o0)
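Even a crude benchmark mode can be useful. Here is a minimal C sketch of the idea, assuming the real routine looks something like the placeholder process_block below: time many repetitions of the actual processing code on each candidate machine and compare blocks per second.
Code:
#include <stdio.h>
#include <stddef.h>
#include <time.h>

/* Placeholder for the real DSP routine under test. */
static void process_block(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] * 0.5f + 0.1f;
}

int main(void)
{
    enum { N = 4096, REPS = 100000 };
    static float buf[N];

    clock_t t0 = clock();
    for (int r = 0; r < REPS; r++)
        process_block(buf, N);
    clock_t t1 = clock();

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    printf("%d blocks of %d samples in %.3f s (%.0f blocks/s)\n",
           REPS, N, secs, REPS / secs);
    return 0;
}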
 
  • #10
analogdesign
Hi Pattonias,

We have a similar use case where I work, in that we need to optimize our single-core performance, since several of the codes we use aren't written for multi-threading and will not be upgraded to multi-threading.

What we have found is that newer architectures are optimized for things other than raw single-core performance. For instance, they can have better multi-core performance, and they are typically much more power-efficient (watts per MIPS).

Our experience is this: you have to benchmark candidate machines using your own code. The published benchmarks have not been as helpful for us as I expected. We have found that our fastest machine for single-core processing is a six-year-old machine using an Intel Xeon with the Westmere architecture. It still works best for us. We do a lot of multi-core work, and for that we have a lot of newer machines (the best overall is based on an Intel Xeon with the Broadwell architecture).

The surprising thing is that we still keep the Westmere machine around to run a specific simulator, because it is the fastest thing we have for single-core operations. (Moore's Law? haha) I have no idea if this is relevant, but our simulation machines run Linux.

So my advice to you is to look into a used server that is Westmere-based. It may be just the ticket (and cheap too!).
 
  • #11
FactChecker
On-board memory access might be another speed factor to consider; more memory at each level of the memory hierarchy is an advantage. And you should not forget to bind your program to a core at the highest priority. If you are running Windows, you need at least one other processor core to take care of the Windows OS tasks.
 
  • #12
On-board memory access might be another speed factor to consider; more memory at each level of the memory hierarchy is an advantage. And you should not forget to bind your program to a core at the highest priority. If you are running Windows, you need at least one other processor core to take care of the Windows OS tasks.
How does one tell a single core that a self-written program has the highest priority? And how does one make another core take care of Windows?
Do you have a guide or something for this?
 
  • #13
For Windows, I believe it is called something like 'CPU affinity'. You can set it per-process from Task Manager, or from code.
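A minimal Win32 sketch of doing both from code (these are standard Windows API calls; error handling is mostly omitted):
Code:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE self = GetCurrentProcess();

    /* Restrict this process to CPU 0 (bit 0 of the mask); Windows then
       schedules its own housekeeping on the remaining cores. */
    if (!SetProcessAffinityMask(self, 1))
        fprintf(stderr, "SetProcessAffinityMask failed\n");

    /* Raise scheduling priority. REALTIME_PRIORITY_CLASS also exists but
       can starve the system; HIGH_PRIORITY_CLASS is usually safer. */
    if (!SetPriorityClass(self, HIGH_PRIORITY_CLASS))
        fprintf(stderr, "SetPriorityClass failed\n");

    /* ... run the DSP workload here ... */
    return 0;
}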
 
  • #15
Baluncore
Take a look at OpenCL to run alongside your C on your system. It may make it easy to use more of your CPU cores, or justify a cheap graphics card for the GPU. That could give you 20 times the throughput without much effort.
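For a flavor of what that looks like, here is a minimal host-program sketch in C that applies a gain to a buffer with an OpenCL kernel (a sketch only: it assumes an OpenCL runtime and headers are installed, linking with -lOpenCL, and it omits error handling; the actual speedup depends entirely on the hardware and workload):
Code:
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

/* OpenCL kernel source: multiply every sample by a gain. */
static const char *src =
    "__kernel void gain(__global float *buf, float g) {"
    "    size_t i = get_global_id(0);"
    "    buf[i] *= g;"
    "}";

int main(void)
{
    enum { N = 1 << 20 };
    static float samples[N];                 /* pretend this holds audio data */
    for (int i = 0; i < N; i++) samples[i] = 1.0f;

    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "gain", NULL);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof samples, samples, NULL);
    float g = 0.5f;
    clSetKernelArg(k, 0, sizeof buf, &buf);
    clSetKernelArg(k, 1, sizeof g, &g);

    size_t global = N;                       /* one work-item per sample */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof samples, samples, 0, NULL, NULL);

    printf("samples[0] = %f\n", samples[0]); /* expect 0.5 */
    return 0;
}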
 
