Dmitri Makarov

PhD student
Università della Svizzera italiana
Faculty of Informatics
Via G. Buffi 13
6904 Lugano
Switzerland

Email
dmitri.makarov@usi.ch

Phone
Office +41 58 666 4455 ext. 2110
Fax +41 58 666 4536

Profiles
conf.researchr.org
Google Scholar

Branch Optimizations in GCC

INTRODUCTION

The goal of the project is to implement a branch optimization pass in GCC. The optimizations are mainly oriented to systems based on the PowerPC 604. The transformations to be implemented are

  • gluing
  • down code motion
  • branch swapping

These transformations were designed to eliminate delays caused by branches in systems that do not have dynamic branch prediction (RS6000, POWER2). In such cases, measurements have shown noticeable speedup. In the case of the PowerPC 604, which has dynamic branch prediction, only the first two transformations are useful. By moving a few non-branch instructions between two close branches, we can better exploit the superscalar features of the CPU. We simply increase the distance between any two branches. Experiments (on contrived examples) verify the usefulness of these techniques for 604-based systems. Each of the transformations is applicable to a pair of branch instructions close to each other.

GLUING

The gluing transformation is applicable to the case where a conditional branch is close to an unconditional one (a BC-B pair). Several instructions from the target of the unconditional branch are copied between the two branches. This transformation is illustrated in Figure 1

     BEFORE:                  AFTER:

     BC CR, LABEL             BC CR, LABEL
     B OUT                    INSN1
     ........                 INSN2
                              B NEW_OUT
                              ........
     ........                 ........
OUT:                     OUT:
     INSN1                    INSN1
     INSN2                    INSN2
     ........            NEW_OUT:
     ........                 ........

The number of instructions to be copied depends on the number of instructions already existing between the two branches. The optimal number of instructions is determined by the tuning. The transformation should be performed carefully so as not to create a new pair of close branches. A sufficient number of instructions should be left between the target of the unconditional branch and the first branch following it.

DOWN CODE MOTION

This transformation is applicable when two conditional branches are in close proximity. It is illustrated below for the case where the first conditional branch is a regular conditional branch (BC) and the second is a branch on count register (BCT). The code may be improved by moving a few of the instructions preceding the first conditional branch to a location between the conditional branches (i.e. moving them down). The complete transformation is illustrated in Figure 2

     BEFORE:                  AFTER:

LOOP:                    LOOP:
     .......                  .......
     .......                  .......
     INSN1                    BC CR, LABEL1
     INSN2                    INSN1
     BC CR, LABEL             INSN2
     BCT LOOP                 BCT LOOP
     .......                  .......
     .......                  .......
                              B LABEL
                         LABEL1:
                              INSN1
     .......                  INSN2
LABEL:                   LABEL:

An instruction may be moved down only if there is no chain of data dependencies between it and the first conditional branch of the pair. It should be noted that in the context of GCC, it is easy to determine which instructions are suitable for such motions as the backward data dependencies built during the second pass of scheduling are still available. It is natural to require the second conditional branch to be a BCT. This pair of branches usually occurs at the end of a loop. As a consequence, the code in the most frequent path is improved. The transformation works even if the BCT instruction is replaced by an ordinary conditional branch. Again, the optimal number of instructions to be moved between the two branches is determined by tuning.

BRANCH SWAPPING

Although it seems that branch swapping yields no results for systems based on the PowerPC 604, it still useful for POWER and POWER2 systems. This transformation is applied to BC-BCT pairs. As in the down code motion transformation, requiring the last of the two branches to be a BCT, usually results in such a pair occurring at the end of a loop. Consequently the code on the most frequent path is improved. The transformation is illustrated in Figure 3

     BEFORE:                  AFTER:

                              B LOOP1
LOOP:                    LOOP:
     .........                BC CR, LABEL
                         LOOP1:
     .........                .........
     BC CR, LABEL             BCT LOOP
     BCT LOOP                 BC CR, LABEL