Drawing of a mountain sunset

Sopy's Blog

Changing the world, a line of code at a time

Internship Halfway Point: Learning to Optimize AArch64

As I reached the halfway point of my Outreachy internship, I wanted to share what I’ve been working on for the AArch64 back-end in LLVM. My first task involved optimizing two large shifts and a vector combine, which could be narrowed down into a single combine and vector shift.

The Problem

We start with the following unoptimized code, where two right shifts by 20 bits (ushr) are applied to two vectors, followed by a combine instruction (uzp1) that interleaves the least significant half of the bits from each vector into a new vector.

C
/*
        ushr    v0.4s, v0.4s, #20
        ushr    v1.4s, v1.4s, #20
        uzp1    v0.8h, v0.8h, v1.8h
*/
uint16x8_t unoptimized(uint32x4_t a, uint32x4_t b) {
    a = vshrq_n_u32(a, 20);
    b = vshrq_n_u32(b, 20);
    return vcombine_u16(vmovn_u32(a), vmovn_u32(b));
}

In this example, because the shift is greater than the output scalar bit size (20 > 16), we can skip shifting both vectors and instead directly take the most significant 16 bits, placing them into a new vector. Since the original vectors are already at the 128-bit register limit, we first need to narrow them down to 16-bit pieces. This allows us to combine them into a 16×8 vector.

In the optimized version, we take the narrow down 32 bit scalars taking the most significant 16 bits directly (uzp2) and then shift right by 4 on the combined vector (ushr).

C
/*
        uzp2    v0.8h, v0.8h, v1.8h
        ushr    v0.8h, v0.8h, #4
*/
uint16x8_t optimized(uint32x4_t a, uint32x4_t b) {
    uint16x4_t a_u16 = vshrn_n_u32(a, 16);
    uint16x4_t b_u16 = vshrn_n_u32(b, 16);
    uint16x8_t r = vcombine_u16(a_u16, b_u16);
    return vshrq_n_u16(r, 4);
}

In this version, we cut down on a shift instruction, resulting in a small speedup.

TableGen approach

At first, I thought I could handle this optimization with a few lines of LLVM TableGen code, a domain-specific language used in LLVM to define instructions, patterns, and optimizations. I believed it would be straightforward, but I soon found myself struggling for nearly a month, even with guidance from my mentor.

Challenges with TableGen

The main issue I struggled with was a fundamental misunderstanding how concat_vectors worked in TableGen. I initially thought concat_vectos was just a simple operation gluing nodes together for the purpose of the pattern, but in reality it was actually just joining multiple smaller vectors together into a single vector.

This realization happened while I was working on the C++ custom lowering version of the fix when it was easier to finish with the custom lowering than go back to TableGen.

I was stuck attempting to optimize with TableGen for around a month, so the logical conclusion was that it might not be the right tool for this particular optimization, so I shifted my focus to implementing the optimization directly in C++ with a custom lowering.

Shifting to C++

Since I switched to C++, things started to make more sense. The optimization code essentially checks if the operands of a concat_vector are both right shifts (ushr). If the shift amounts are equal and greater than the scalar size in bits of the concat_vector, it replaces the concat_vector with a uzp2 followed by a shift.

Here’s a simplified version of the code logic:

  1. Check for Right Shifts: Ensure both operands of the concat_vector are right shifts.
  2. Compare Shift Amounts: Verify that the shift amounts are the same and greater than the size of the scalar elements.
  3. Replace with Optimized Sequence: Replace the concat_vector with a uzp2 instruction followed by a single shift.

This approach was much more straightforward in C++, and I was able to implement the optimization successfully. The code is soon going to be under review, and I’m excited to see the reviews.

Closing notes

While this all takes longer than I was expecting to. I am excited to work on the AArch64 back-end. I never expected working on something so hard to wrap my head around make me feel so proud of myself and my ability to learn.

12 responses to “Internship Halfway Point: Learning to Optimize AArch64”

  1. Модные советы по выбору необычных образов на любой день.
    Обзоры профессионалов, новости, все новинки и мероприятия.
    https://mvmedia.ru/novosti/122-gde-luchshe-pokupat-originalnye-brendovye-veshchi-kak-vybrat-nadezhnye-magaziny-i-platformy/

  2. Модные заметки по выбору крутых луков на любой день.
    Заметки стилистов, события, все дропы и мероприятия.
    https://lecoupon.ru/news/2024-09-26-5-samyh-udachnyh-modeley-krossovok-dolce-gabbana/

  3. Модные заметки по подбору необычных луков на любой день.
    Обзоры экспертов, новости, все показы и шоу.
    https://lecoupon.ru/news/2029-10-02-7-prichin-lyubit-brend-herno/

  4. Модные советы по подбору крутых образов на каждый день.
    Мнения экспертов, события, все показы и мероприятия.
    https://luxe-moda.ru/chic/564-10-prichin-lyubit-brend-brunello-cucinelli/

  5. Одежду и аксессуары модного дома Balenciaga можно приобрести на нашем сайте. Широкий ассортимент товаров доступен для покупателей, желающих обновить гардероб. Ознакомьтесь с предложениями и найдите идеальный вариант уже сегодня.
    Balenciaga одежда купить

  6. На этом сайте доступны стильные сумки Боттега Венета. Здесь можно приобрести актуальные модели, которые станут изюминкой вашего гардероба. Каждая сумка отличается превосходным исполнением, что присуще бренду Боттега Венета
    https://dirstop.com/story20975030/bottega-veneta

  7. Its excellent as your other posts : D, appreciate it for putting up.

  8. Can you tell us more about this? I’d like to find out some
    additional information.!

  9. Hello! Do you know if they make any plugins
    to help with SEO? I’m trying to get my website to rank for
    some targeted keywords but I’m not seeing very good results.
    If you know of any please share. Kudos! I saw
    similar art here: Eco blankets

  10. Hey! Do you know if they make any plugins to assist with
    SEO? I’m trying to get my site to rank for some targeted keywords
    but I’m not seeing very good success. If you know of any
    please share. Appreciate it! You can read similar art here:
    Eco blankets

  11. whoah this blog is fantastic i love reading your posts. Stay up the great paintings! You already know, lots of individuals are looking around for this information, you can aid them greatly.

Leave a Reply

Your email address will not be published. Required fields are marked *