COMP207P Compilers Guidelines Part 2: Java Bytecode Optimisation

Abstract

As described in the previous article, during my time in UCL I had to work on several pieces of coursework related to the development of compilers. In this article I'll be expressing some thoughts and suggestions about the second coursework, the one concerned with altering Java bytecode to implement things like constant folding and constant propagation.

In contrast to the previous article, this one will be relatively short and won't have as many instructions. Instead, it will provide some food for thought that will hopefully help you wrap your head around this coursework. I'll talk a bit about Java bytecode and then will move on to Byte Code Engineering Library (BCEL).

FYI, I got Federica Sarro's permission to share this article. She asked me to remind everyone that these are just my own thoughts and this article was not reviewed by any lecturers.

Java bytecode

Understanding how Java bytecode is interpreted by the Java Virtual Machine (JVM) is crucial for this coursework. A good place to start would be Mastering Java Bytecode at the Core of the JVM by Anton Ahripov - it's quite a lengthy guide but it covers pretty much everything you'd want to know about bytecode at this stage. That said, you most likely won't need to read the whole thing, just the first couple of pages to get a basic idea of what you're working with.

Once you understand what Java bytecode is, I'd suggest you begin playing with the actual code provided as a part of the coursework. The process for preparing the project is extremely straightforward - Just install Ant, unzip the coursework archive somewhere and run ant test in the project root. You might notice that all tests will pass although you haven't done anything - that is because the tests only confirm that the alterations you made to the bytecode do not change the behaviour of the code but don't actually test how efficiently you've optimised the code.

Quick tip: IDEs like IntelliJ Idea usually have built-in features to decompile bytecode, which might be pretty useful when it comes to not only understanding how bytecode works but also confirming that your optimisations worked as intended.

To see what bytecode is produced from the example code in the project, you can add the snippet below to your ConstantFolder.java. Keep in mind that this is just a quick way to peek at the bytecode, I'd suggest coming up with a different abstraction for the actual implementation. In fact, the implementation I used differed significantly from what you see below.

public void optimize() {  
    ClassGen classGen = new ClassGen(original);
    ConstantPoolGen constPoolGen = classGen.getConstantPool();
    Method[] methods = classGen.getMethods();
    for (Method method : methods) {
        MethodGen methodGen = new MethodGen(method, classGen.getClassName(), constPoolGen);
        System.out.println(classGen.getClassName() + " > " + method.getName());
        System.out.println(methodGen.getInstructionList());
    }
    this.optimized = gen.getJavaClass();
}

The output this code will give you can be seen below. On the left you'll see decompiled bytecode of one of the dummy test classes and on the right is the list of instructions that BCEL has extracted for you.

Decompiled Java bytecode on the left, instruction list outputted by BCEL on the right

I found Java bytecode instruction listings on Wikipedia extremely useful when trying to find how bytecode relates to the actual Java code. Once you get to a level where you're confident with logic that bytecode instructions follow, you shouldn't have any issues coming up with strategies for different optimisations.

For this coursework, you'll mostly be working ConstantPushInstruction, StoreInstruction, LoadInstruction, ArithmeticInstruction, GotoInstruction and the children of these classes. Sadly this is as far as I can go for Java bytecode without giving too much info away. In the next section I'll talk a bit about using BCEL.

BCEL

First of all, obligatory link to BCEL API manual. Now BCEL is reasonably flexible and there are multiple ways in which you could design your solution. I'm going to talk a bit about the abstraction that worked reasonably well for me.

If you found this post useful, feel free to like and share:

Comments