# AVX-512 Programming: Extracting Column Subtotals from a Table

In summary, the article shows how AVX-512 instructions can extract column subtotals from a table of monthly expenses such as mortgage, homeowner insurance, and utilities. The data-parallel work involved is the kind of task that vector hardware handles efficiently. However, because only a few chips support AVX-512, software vendors may not code for it. Additionally, AVX-512 sits in an uncomfortable space between CPUs and GPUs, and partitioning a program to take advantage of parallelization can be difficult.
Mark44
Mentor
Greg Bernhardt submitted a new blog post


I hope that the example I'm presenting here will be of interest to some of you. The program in the article uses a list of monthly expenses for four months in eight categories, such as mortgage, homeowner insurance, utilities, and so on. The example program can calculate the subtotals of any combination of the eight categories in a loop that has three lines of code. The heart of the loop reads all eight values for a given month in one operation, but writes only the ones of interest into a 32-byte destination register (that could hold all eight values, if necessary). The remainder of the loop adds the items of interest to an accumulator, and starts the loop body again until the data is exhausted.
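The loop described above can be modeled in a few lines. The following is a minimal sketch in plain Python, not the article's assembly code: each list stands in for a vector register, the integer `mask` plays the role of an AVX-512 opmask, and the expense figures and the column order are invented for illustration.

```python
# Hypothetical data: 4 months x 8 expense categories (values invented for illustration).
# Assumed column order: 0 = mortgage, 1 = homeowner insurance, 2 = utilities, 3..7 = other.
EXPENSES = [
    [1200, 100, 150, 60, 400, 80, 45, 30],
    [1200, 100, 170, 60, 380, 80, 45, 25],
    [1200, 100, 140, 60, 420, 80, 45, 35],
    [1200, 100, 160, 60, 410, 80, 45, 40],
]

def masked_move(row, mask):
    """Model a zero-masked vector move: lane i keeps row[i] if bit i of mask is set, else 0."""
    return [value if (mask >> i) & 1 else 0 for i, value in enumerate(row)]

def column_subtotals(table, mask):
    """Accumulate the selected columns over all rows, one 'vector' per row."""
    accumulator = [0] * 8
    for row in table:
        # Read all eight values, but keep only the lanes of interest.
        selected = masked_move(row, mask)
        # Lane-wise add into the accumulator (a single vector add in hardware).
        accumulator = [a + s for a, s in zip(accumulator, selected)]
    return accumulator

# Select columns 0 and 2: mask bits 0 and 2 set.
print(column_subtotals(EXPENSES, 0b00000101))  # prints [4800, 0, 620, 0, 0, 0, 0, 0]
```

Changing the mask is all it takes to subtotal a different combination of categories; the loop body stays the same.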

Now that AMD has announced that Zen4 will support AVX-512, do you think it will gain in popularity, especially for consumer chips?

One problem is that AVX-512 really shines in fused multiply-adds (FMAs). It's 512 bits wide, not 64, so there's a factor of 8, and it takes half a clock, so that's 16. But this runs into two problems. One is Amdahl's Law: if half your time is spent doing FMAs, even an infinitely fast FMA buys you only a factor of 2. The other is that GPUs are already pretty good at FMAs.
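To put numbers on the Amdahl's Law point, here is a small sketch using the standard formula, speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time that is accelerated and s is the acceleration factor. The function name is just an illustrative choice.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Half the time in FMAs, FMAs made 16 times faster: overall gain is about 1.88x.
print(amdahl_speedup(0.5, 16))

# Half the time in FMAs, FMAs made infinitely fast: overall gain is capped at 2x.
print(amdahl_speedup(0.5, float("inf")))
```

So the factor of 16 on the FMA portion delivers less than a factor of 2 overall when FMAs are only half of the work.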

Related, if most CPUs don't have this feature, vendors won't code for it.

Thanks for commenting!
Now that AMD has announced that Zen4 will support AVX-512, do you think it will gain in popularity, especially for consumer chips?
I don't have any idea whether AVX-512 will gain in popularity now that AMD also supports it. I've been interested in vector (SIMD: single instruction, multiple data) operations at the assembly level since I first found out that Intel supported them, nearly 20 years ago. When I learned that some Intel processors supported 512-bit instructions in their AVX-512 extensions, I went out and bought a Dell computer with a Xeon Scalable processor, one of the few Intel products that supported AVX-512. Not long after getting that computer, about 4 years ago, I wrote about a dozen example assembly programs that used a number of different AVX-512 instructions. The program described in this Insights article was one of them.

Related, if most CPUs don't have this feature, vendors won't code for it.
IIRC, when I bought the Dell computer with its Xeon Scalable, the high-end AMD processors didn't have support for AVX-512. Now that AMD, the other big player in the CPU market, also has it, perhaps software vendors will be more apt to use it.

What I do know is that compiler vendors, such as Microsoft with their Visual Studio product, and likely other compilers such as gcc, generate code that takes advantage of at least some processor capabilities, such as SSE (streaming SIMD extensions) rather than the legacy 8087 floating point instructions. My current VS compiler is about 6 years old. I keep thinking I will upgrade to a newer version, but haven't done so just yet. I haven't investigated the extent to which that compiler and newer versions take advantage of these newer processor extensions.

The AVX-512 extensions for CPUs and the capabilities of GPUs from nVidia and others are naturals for parallel processing. The hard part seems to be coming up with ways to partition a program to take advantage of parallelization, at least from my perspective.

Well, IMHO the problem with parallelization is that people are trying to optimize the wrong thing, i.e. CPU efficiency. In a SIMD world, you don't try to figure out which calculation you want before you do it: you do them all and then pick the one you want. People don't learn to code that way. Based on some of the posts here, many don't learn to code at all.
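A small sketch of the "do them all, then pick" style, written in plain Python with lists standing in for vector lanes. The function names and the clamp-or-double rule are hypothetical, chosen only to show the pattern; on real hardware the selection step would be a single masked blend rather than a per-element choice.

```python
def select(mask, if_true, if_false):
    """Model a vector blend: per lane, take if_true where the mask lane is set, else if_false."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def double_with_cap(values, limit):
    # Compute BOTH candidate results for every lane up front.
    doubled = [v * 2 for v in values]
    capped = [limit for _ in values]
    # Build the per-lane mask, then pick the wanted result afterwards.
    mask = [d > limit for d in doubled]
    return select(mask, capped, doubled)

print(double_with_cap([1, 4, 6, 9], 10))  # prints [2, 8, 10, 10]
```

Some of the computed values are thrown away, which looks wasteful from a scalar point of view, but every lane follows the same instruction stream with no data-dependent branches.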

Intel told the world "Use AVX-512" and then released only a few chips that supported it, and did so via a bewildering array of subsets of the instructions. I think if AMD comes out with a fairly wide-ranging standard, that will be different than if they too offer it only here and there. And remember, we don't always get the chance to recompile our code: sometimes we buy it and are at the vendor's mercy.

I may want to be more explicit: AVX-512 lies in an uncomfortable space between CPUs and GPUs. There certainly are workflows where it's better than either scalar code or GPUs, but are these enough to get people to pay the price premium?

Update: the AMD chips with AVX-512 accept AVX-512 instructions, but execute them as two sequential 256-bit operations.


## What is AVX-512 and why is it used in programming?

AVX-512 (Advanced Vector Extensions 512) is a family of SIMD (single instruction, multiple data) extensions to the x86 instruction set that operate on 512-bit registers. It is used to accelerate tasks that apply the same arithmetic to many data elements at once, such as heavy mathematical computations.

## How do you extract column subtotals from a table using AVX-512 programming?

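To extract column subtotals from a table using AVX-512 programming, you load one row of the table into a vector register at a time. As described in the discussion above, a masked move then copies only the columns of interest into a destination register (the mask has one bit per column), and a vector add accumulates those values. Repeating this for every row leaves the subtotal of each selected column in the accumulator.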

## What are the benefits of using AVX-512 programming for extracting column subtotals?

The main benefits of using AVX-512 programming for extracting column subtotals are improved performance and efficiency. AVX-512 instructions operate on many values in parallel, which can significantly speed up the extraction process. A 512-bit register holds sixteen 32-bit or eight 64-bit values, twice as many as a 256-bit AVX2 register, and per-lane masking lets a single instruction act on only the columns of interest.

## Are there any limitations or compatibility issues when using AVX-512 programming for extracting column subtotals?

Yes, there are some limitations and compatibility issues to consider. AVX-512 instructions run only on x86 processors that implement them, and support is split into subsets (such as AVX-512F and AVX-512VL), so a given chip may support some instructions and not others. Older compilers and assemblers may also lack support, so check both the toolchain and the target CPU before relying on AVX-512 in your code.
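One way to check the target CPU is to look at the feature flags the operating system reports. The sketch below assumes a Linux system, where `/proc/cpuinfo` lists flags such as `avx512f` and `avx512vl` on x86 processors; the helper names are illustrative, and other operating systems need a different mechanism.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

def supports(cpuinfo_text, *required):
    """True only if every required flag is present."""
    return set(required) <= cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    try:
        # Assumption: Linux on x86; the file does not exist elsewhere.
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""
    print("AVX-512F:", supports(text, "avx512f"))
    print("AVX-512F + VL:", supports(text, "avx512f", "avx512vl"))
```

Checking for each required subset separately matters because a processor can report `avx512f` without reporting every other AVX-512 subset.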

## What are some practical applications for extracting column subtotals using AVX-512 programming?

Extracting column subtotals from a table using AVX-512 programming can be useful in a variety of applications, such as data analysis, financial calculations, and scientific simulations. It can also be beneficial for processing large datasets in real-time, making it a valuable tool for industries such as finance, healthcare, and engineering.
