Jikes RVM hacker's guide

WARNING: The information in this guide may be outdated. I haven't touched the RVM code since August 2014. There have been many commits to the RVM code base since then, and it no longer even builds on Mac OS X (the project migrated to Classpath 0.99, which fails to build on OS X). I wrote this guide while working on MVVM (now abandoned) to help myself keep track of the Jikes RVM internals. If I ever work on the Jikes RVM again, I'll update the guide then.

These notes are intended to help a developer who is starting to hack on the Jikes RVM. They are not a substitute for reading the Jikes RVM User's Guide; rather, they are intended to make it easier to navigate the vast Jikes RVM source code.

The Jikes RVM is a metacircular VM: it is implemented in the same language for which it acts as a runtime environment. The Jikes RVM is a managed runtime for Java applications and is itself written in Java. Although the RVM is a Java application, it does not run on an external JVM. Instead it is bootstrapped by another program, the boot image runner, which loads the Jikes RVM image files and transfers control to the native VM code that executes on the host processor.

Metacircular.png

Figure 1: Stages of building the Jikes RVM

1 Working with Jikes RVM

This section contains helpful hints about working in the Jikes RVM environment.

1.1 Building Jikes RVM

The Jikes RVM build system is complicated, but for most day-to-day tasks it's sufficient to know only a handful of build properties and a few targets for building the VM. In addition, Jikes RVM provides a shell script, buildit, that performs most of the build-related operations.

Here's an example of a command that builds the Jikes RVM on Mac OS X and places the generated files in a directory structure separate from the Jikes RVM source tree:

ant -Dhost.name=x86_64-osx -Dtarget.name=x86_64-osx               \
    -Dconfig.name=development                                     \
    -Ddist.dir=/Users/makarovd/rvm/dist                           \
    -Dbuild.dir=/Users/makarovd/rvm/target                        \
    -Dcomponents.dir=/Users/makarovd/rvm/components               \
    -Dcomponents.cache.dir=/Users/makarovd/rvm/cache -e

The directory ${components.cache.dir} is where external packages such as GNU Classpath are downloaded, and ${components.dir} is where those packages are built. Neither directory is cleaned by any of the Jikes RVM build clean targets. The external packages need to be downloaded and built only once; afterwards the cache and components directories should be left intact.

On Mac OS X the VM executable and the supporting shared object libraries are built as 32-bit code, even when the x86_64-osx target is used. One reason is that GNU Classpath cannot currently be built as a 64-bit library on Mac OS X. Another is that the Jikes RVM compilers cannot generate 64-bit machine code for x86-64.

More than one build of the Jikes RVM can co-exist under the same directory structure. The builds are distinguished by their configuration and target names. If the VM builds successfully, the dist/${config.name}_${target.name} directory should contain the wrapper shell script rvm, the boot image runner executable JikesRVM, many native libraries packaged as shared object libraries (with the file extension jnilib on Mac OS X), the three image files, the text output of the boot image writer messages in BootImageOutput.txt, a text dump of the JTOC in RVM.map, and the class library jar rvmrt.jar.

Sometimes the build fails for no obvious reason. The most fragile part of the build system seems to be the GNU Classpath library. For example, setting the GREP_OPTIONS environment variable to something that changes the normal output of the grep command (e.g. --color=always) can break the build of GNU Classpath.

1.2 Running Jikes RVM

Jikes RVM is started by the shell script rvm located in ${dist.dir}/${config.name}_${target.name}/, where config.name is the name of the configuration used to build the RVM and target.name is the platform it was built for. Currently, however, the host and the target must be the same. The script is responsible for setting up the environment and executing the boot image runner (JikesRVM) with the correct command line arguments. Setting up the environment includes setting the paths that the dynamic linker searches for the shared object libraries the VM needs to operate.

1.3 Debugging Jikes RVM

Debugging Jikes RVM is a tedious process, and without a convenient user interface to GDB it becomes unbearable. Emacs provides a special mode for running command-line GDB from within Emacs. Xcode includes command-line GDB version 6.3.50, but it does not work with Emacs. Luckily, it's easy to install the latest version of GDB using the Homebrew package manager. Homebrew maintains several formula repositories, only one of which is cloned by the default installation. To use a formula from another repository, one needs to tap that repository. Concretely, to install GDB (version 7.4 as of this writing) it is necessary to tap the homebrew/dupes repository:

$ brew tap homebrew/dupes

Now GDB can be installed:

$ brew install gdb

The Mach kernel prevents unsigned programs from taking control of other processes. This page contains instructions for creating a certificate and signing GDB with it. Follow the instructions precisely and sign /usr/local/Cellar/gdb/7.4/bin/gdb. Once the GDB executable is signed, change the group and permissions of the executable:

$ sudo chgrp procmod /usr/local/Cellar/gdb/7.4/bin/gdb
$ sudo chmod g+s /usr/local/Cellar/gdb/7.4/bin/gdb

The shell wrapper script rvm that runs Jikes RVM recognizes the command line parameter -gdb, which starts the debugger and runs the VM from it. To run GDB from Emacs, GDB must be started in MI mode. The rvm script is located at RVM_BUILD_ROOT/dist/development_x86_64-osx/rvm. First, modify the last line of the rvm script to include the -i=mi parameter, similar to this:

gdb -i=mi "${gdb_args[@]}" "$RVM_HOME/JikesRVM" "--command=$TMP_GDB_COMMANDS"

Now start a GUD session in Emacs with M-x gdb and specify how to run GDB:

RVM_BUILD_ROOT/dist/development_x86_64-osx/rvm -gdb -classpath whatever_you_need YourApp [app args if any]

1.4 Options to trigger internal dumps

In addition to the usual JVM options, Jikes RVM defines many options specific to its subsystems. Here are some of the options that are especially useful for debugging the VM:

  • -X:vm:verboseTraceClassLoading=true dumps (begin) (end) sequences for class loading, resolution, instantiation and initialization.
  • -X:base:mc=true and -X:opt:mc=true dump the machine code of every method as it is being compiled.
  • -X:base:method_to_print=string and -X:opt:method_to_print=string dump the machine code only for methods whose names contain the string.

1.5 Adding RVM command line options

To define a new set of -X:subsystem: options:

  • The following three files need to be created:
    • XxxOptions.template: mostly Java code that specifies how to print the help message for the subsystem-specific options and how the options are processed. In the most common case this file can include MasterOptions.template, which is used by the options of both compilers.
    • BooleanOptions.Xxx.dat: declarations of on/off switch options.
    • ValueOptions.Xxx.dat: declarations of value-passing options.
  • The bootImageRunner must recognize the new options. For that, a new index needs to be created in tools/bootImageRunner/cmdLine.h and used in tools/bootImageRunner/RunBootImage.C.
  • A static object of type XxxOptions must be instantiated some time before the options are parsed by the VM in VM.finishBooting().
  • Two new cases must be added to the switch statement in CommandLineArgs.earlyProcessCommandLineArguments.
  • In build.xml, a new <GenerateFromTemplate> command must be added to the gen-options target.

1.6 Modifying classes loaded by the VM

The Callbacks class declares several interfaces that can be used to define observers for certain VM events. For example, once a new class is loaded, an observer implementing the ClassLoadedMonitor interface is notified and passed the RVMClass object created for the loaded class. Similarly, class resolution, initialization, instantiation and method compilation events can be monitored.

The class VM is the most important primordial class. It boots the VM and creates the main application thread. The method finishBooting() initializes various VM subsystems such as the JIT compilers. Once the VM is fully booted it can execute arbitrary Java code, so this is a good place to add a custom bytecode transformer. The transformer is added by passing the transformer object as an argument to the Callbacks.addMethodCompileMonitor() method. The transformer class must implement the Callbacks.MethodCompileMonitor interface, in particular the method notifyMethodCompile(), which is passed an RVMMethod object representing the method to be compiled and the compiler invoked to compile it. The compiler is an int constant defined in the compilers.common.CompiledMethod class. Currently there are 3 compilers: baseline, opt and JNI.

A new RVMClass object is created when a class 'file' is being loaded. Loading of a class file is started by the defineClassInternal method of RVMClassLoader. The newly created RVMClass object is associated with the corresponding TypeReference object. The method defineClassInternal is invoked from BootstrapClassLoader's method findClass, and the resulting class is saved in the hash map loaded, which is a private field of the BootstrapClassLoader object. A generated class doesn't need to be in the loaded map. It is, however, necessary to create a corresponding TypeReference object, which will be saved in a dictionary of TypeReferences. To create the type reference for a class named "klass", use the factory method TypeReference.findOrCreate( String ).

1.7 Printing debug output

Some classes and methods in RVM are @Uninterruptible. VM.sysWrite can be used for printing from @Uninterruptible methods, but no String operations are allowed. For example, to print something similar to System.out.println("value " + value); use VM.sysWriteln("value", value); instead. VM.sysWrite* is overloaded for many possible argument combinations. The important thing is not to use the + operation on String objects.

1.8 Troubleshooting

The output of VM.sysWrite is buffered when the boot image is being written. If there's too much output you will run out of memory. To prevent this, protect the debugging output statements with if(!VM.writingBootImage) condition.

2 The RVM image files

The BootImageWriter application creates the VM image, which consists of three binary files. When the system is started by the boot image runner, a C program compiled to the executable file JikesRVM, the runner maps the files at predefined addresses in memory. BootImageWriter takes these addresses as command-line arguments, together with the names of the image files, the name of the file containing the list of primordial classes, the name of the file in which to output the JTOC in textual form, and the name of the file in which to output all the log messages BootImageWriter may produce. The last two files are not needed for the RVM to run, but they are useful for debugging.

This is a possible command line for running BootImageWriter:

`/usr/libexec/java_home -v 1.6`/bin/java                                                                             \
-Xbootclasspath/a:target/mvvm_x86_64-osx/jksvm.jar:target/mvvm_x86_64-osx/rvmrt.jar                                  \
-Xmx500M                                                                                                             \
-Dmmtk.hostjvm=mm.mmtk.Factory                                                                                       \
-Dmmtk.properties=mvvm/build/mmtk/default.properties                                                                 \
-Drvm.properties=target/mvvm_x86_64-osx/rvm.properties                                                               \
-classpath target/mvvm_x86_64-osx/bootimage-writer:target/mvvm_x86_64-osx/jksvm.jar:target/mvvm_x86_64-osx/rvmrt.jar \
'tools.bootImageWriter.BootImageWriter'                                                                              \
-log target/mvvm_x86_64-osx/BootImageWriterOutput.txt                                                                \
-classpath target/mvvm_x86_64-osx/jksvm.jar:target/mvvm_x86_64-osx/rvmrt.jar                                         \
-n target/mvvm_x86_64-osx/Primordials.txt                                                                            \
-oc target/mvvm_x86_64-osx/RVM.code.image                                                                            \
-od target/mvvm_x86_64-osx/RVM.data.image                                                                            \
-or target/mvvm_x86_64-osx/RVM.rmap.image                                                                            \
-demographics                                                                                                        \
-m target/mvvm_x86_64-osx/RVM.map                                                                                    \
-littleEndian                                                                                                        \
-da 0x31000000                                                                                                       \
-ca 0x35000000                                                                                                       \
-ra 0x38000000                                                                                                       \
-numThreads=1                                                                                                        \
-classlib Classpath

Once the boot image runner has loaded the image files, the layout of the RVM in memory is similar to the diagram below.

RVM Image.png

Figure 2: RVM memory image layout

The lower memory addresses are occupied by the sections of the boot image runner. The image files are mapped to the addresses 0x31000000 for the data, 0x35000000 for the code and 0x38000000 for the root map. The first addresses of the data image are allocated for the Boot record. The Boot record is a normal RVM object, so its first two words are the object header, consisting of the TIB reference and the status word. Some of the Boot record's fields are initialized to valid values when the boot image is created, while others need to be initialized by the boot image runner after it has loaded the data image into memory. In the diagram the latter fields are shown in gray.

The C declaration of the BootRecord structure is generated by tools.header_gen.GenerateInterfaceDeclarations and saved in the InterfaceDeclarations.h header file. The corresponding class is runtime.BootRecord. Most of the fields in BootRecord are addresses of some sort and therefore unboxed types. The only reference is the heapRanges field, which points to an array of address pairs, one pair per range, giving the start and end addresses of the range. The fields of BootRecord are laid out in the order they appear in the class definition, except that heapRanges is the very first field. This is because the current field layout scheme places reference fields at the beginning of the object that contains them. The field layout scheme is defined in objectmodel.FieldLayout.

The field tocRegister of the BootRecord contains the address of the zeroth element of the global table of contents (JTOC). The JTOC contains literals, values of static fields of primitive types, addresses of static methods, addresses of literal Strings and addresses of objects referred to by static reference fields. The JTOC is special because its zeroth element is in the middle of the table. The lower half, indexed by negative numbers, contains primitive literals and static fields of primitive types. Values of the long type occupy two words, with the low part closer to the middle of the table and the high part further away. As currently implemented, the JTOC has a fixed size. Jikes RVM can run out of JTOC slots for statics, although it's unlikely that a reasonable application declares either 128K static numeric values or 128K static references. The class that implements the JTOC is runtime.Statics. However, the table is not stored as an object of the Statics class; rather, it is constructed during the boot image writing phase and its elements are laid out directly in memory.

To find slot X of the JTOC in the data image file, take the value of the tocRegister field of the Boot record and subtract the value of the bootImageDataStart field of the Boot record. The result is the offset from the beginning of the RVM.data.image file at which the 0-th element of the JTOC is located. Multiply the index X by 4 and add it to the 0-th element offset. The resulting value is the offset of JTOC element X from the beginning of the RVM.data.image file. The boot image writer conveniently dumps the JTOC in human-readable format to the file RVM.map in the dist/${config.name}_${target.name} directory.
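The offset arithmetic for locating a JTOC slot can be sketched in Java. This is a minimal illustration, not code from the Jikes RVM: the class and method names are hypothetical, the tocRegister value used in main is invented, and the sketch assumes 4-byte JTOC slots as stated in the text.

```java
// Hypothetical helper; not part of the Jikes RVM source tree.
final class JtocSlotLocator {
    // Slot width in bytes, as stated in the text (assumed 32-bit build).
    static final int BYTES_PER_SLOT = 4;

    /**
     * Returns the offset of JTOC slot {@code index} from the beginning of
     * RVM.data.image. tocRegister and bootImageDataStart are the values of
     * the Boot record fields with the same names.
     */
    static long slotFileOffset(long tocRegister, long bootImageDataStart, int index) {
        // Offset of the 0-th JTOC element within the data image file.
        long zerothSlotOffset = tocRegister - bootImageDataStart;
        // Negative indexes address the lower half of the table.
        return zerothSlotOffset + (long) index * BYTES_PER_SLOT;
    }

    public static void main(String[] args) {
        long dataStart = 0x31000000L;   // data image address from the example build
        long tocRegister = 0x31020000L; // invented value, for illustration only
        System.out.printf("slot -3 is at file offset 0x%x%n",
                slotFileOffset(tocRegister, dataStart, -3));
    }
}
```

The real values of tocRegister and bootImageDataStart have to be read from the Boot record at the start of the data image; the sketch only covers the arithmetic.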

JTOC.png

Figure 3: Jikes RVM JTOC

Everything in the memory of the Jikes RVM is an object, with one or two exceptions (the JTOC being one). The Jikes RVM object model is defined in the classes of the objectmodel package. From the point of view of object representation there are two types of objects: scalar objects and array objects. The difference between the two representations is that an array object's header contains the length of the array. An object's header comprises three parts: GCHeader, MiscHeader, and JavaHeader. The latter two headers are defined by the classes objectmodel.MiscHeader and objectmodel.JavaHeader. In the diagram displaying the object layouts, only JavaHeader is present. Its first word is the address of the TIB (Type Information Block) for the object's class.

Object model.png

Figure 4: Jikes RVM Object model

The RVM.code.image is a collection of CodeArray objects. Every CodeArray object is actually a byte array holding the machine code of a compiled method.

The third image file generated by the boot image writer is RVM.rmap.image. This is a map of the root references for the data image. Each element of the map is a location, relative to the beginning of the data image, that contains a reference to an object. The elements of the table are compressed using three kinds of encoding:

  • Direct: a 4-byte representation of an offset in little-endian order. This encoding is identified by the least significant bit of the least significant byte (the byte with the lowest address) being set to 1.
  • Run: a run of consecutive offsets, encoded as a single byte holding the number of such offsets. If the last decoded offset was X and the current entry is a run with the value N, then the next N offsets are X + 4, X + 8, …, X + 4 * N. A run is announced by setting the second least significant bit of the previous entry to 1. For example, if the previous entry was a direct 4-byte entry, the second bit of its least significant byte is set to 1.
  • Delta: a 1-byte delta from the previous offset. If the last decoded offset was X and the value of the delta byte is Y, then the offset represented by this entry is X + (Y & 0xFC). This is the default encoding. The second least significant bit of the delta byte can be set to 1 if the next entry is a run.
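The root map encodings can be made concrete with a small decoder. This is a sketch written from the description above, not the decoder used by the RVM: the class and method names are hypothetical, and it assumes that the encoded stream simply ends at the end of the byte array and that the two flag bits of a direct entry are masked out of the offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for the root map encoding; not RVM source code.
final class RootMapDecoder {
    static final int LONG_ENCODING = 0x1; // entry is a 4-byte direct offset
    static final int RUN_FOLLOWS = 0x2;   // next byte is a run length
    static final int WORD = 4;            // bytes between consecutive slots

    /** Decodes the compressed map into offsets relative to the data image start. */
    static List<Integer> decode(byte[] map) {
        List<Integer> offsets = new ArrayList<>();
        int offset = 0;
        int cursor = 0;
        while (cursor < map.length) {
            int first = map[cursor++] & 0xFF;
            if ((first & LONG_ENCODING) != 0) {
                // Direct encoding: little-endian, flag bits masked out.
                offset = (first & 0xFC)
                        | (map[cursor] & 0xFF) << 8
                        | (map[cursor + 1] & 0xFF) << 16
                        | (map[cursor + 2] & 0xFF) << 24;
                cursor += 3;
            } else {
                // Default encoding: 1-byte delta from the previous offset.
                offset += first & 0xFC;
            }
            offsets.add(offset);
            if ((first & RUN_FOLLOWS) != 0) {
                // Run encoding: N further slots, one word apart.
                int n = map[cursor++] & 0xFF;
                for (int i = 0; i < n; i++) {
                    offset += WORD;
                    offsets.add(offset);
                }
            }
        }
        return offsets;
    }
}
```

For the bytes 03 01 00 00 02 10 the sketch yields the offsets 0x100, 0x104, 0x108 and 0x118: a direct entry with the run bit set, a run of two, and a delta of 0x10.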

This diagram illustrates the three kinds of encoding used for the RMAP elements.

RMAP.png

Figure 5: Root map encoding

Using the JTOC and the root map in the image files, one can browse the entire Jikes RVM in the state it would be in immediately after bootstrapping. A reference to an object X makes it possible to find the object's TIB. The TIB contains the address of the RVMClass object that represents the class of the original object X. Knowing the fields of RVMClass, it is easy to fully describe the object X: all its fields with their descriptors, as well as its methods. The following diagram illustrates the relationships between the various entities in the Jikes RVM image files.

DATA.png

Figure 6: Connections between items in the Jikes RVM data segment

2.1 Boot image writer application

The boot image writer is a stand-alone Java application. The Jikes RVM build process runs BootImageWriter.main on the host VM (e.g. the HotSpot JVM). The application loads the list of names of the boot image classes from the file specified with the -n command line option. The list is passed as an argument to the createBootImageObjects method. This method invokes the TypeReference.findOrCreate static method to load the class file containing the bytecode for each class in the list. Each class is then resolved.

To resolve a class, the boot image writer uses a backdoor method, loadVMClass of classloader.BootstrapClassLoader, which defines the class in terms of Jikes RVM. The corresponding class file is located and opened as a stream using the getResourceAsStream method. The stream is passed to RVMClassLoader.defineClassInternal, which creates the necessary data structures and returns the RVMType object created for the class. All of the work is done in ClassFileReader.readClass.

3 Booting process

The boot image runner is a C++ program. Its main() function ends with a call to the createVM() function, which is platform specific. The function is defined separately for the IA32 and PPC architectures in libvm.c. In particular, it sets up the initial stack frame and loads the function pointer to which control is passed. The function pointer is located in the ipRegister field of the bootRecord data structure. The data structure is initialized in the main() method of the BootImageWriter class. Concretely, ipRegister is set to the value returned from Entrypoints.bootMethod.getCurrentEntryCodeArray(), which is the address of the VM.boot() method.

The JikesRVM executable consists of 4 major parts:

  • JikesRVM: a normal executable file that processes command line options, loads the code and data image files and passes control to the main thread of the VM.
  • RVM.code.image: the machine code of each method necessary to start the VM and have the application classes loaded, compiled and run.
  • RVM.data.image: a memory dump of the VM heap as it would be for a running VM without any application classes loaded.
  • RVM.rmap.image: the root references map.

The target addresses for mmapping the image files are found in the build/targets/* properties files, passed as parameters to the bootImageWriter and built into the bootImageRunner executable. The values built into the executable come from the generated header file InterfaceDeclarations.h, which is created by the tools.header_gen.GenerateInterfaceDeclarations application.

The stack size for the boot thread is specified in ia32.StackframeLayoutConstants.java and is currently set to STACK_SIZE_GUARD + STACK_SIZE_GCDISABLED + 30 * 1024, which equals 64K + 4K (or 8K in a 64-bit build) + 30K = 98K, or 100352 bytes. The stack is allocated as a byte[] array.

TIB tables are not of equal size. Their minimal size appears to be 8, which means a JavaHeader header of two words and no methods or other headers. The TIB pointer in each object's header points to the array backing this TIB. TIB Class itself is an object in memory and as such it contains a pointer to its TIB in the header, which is a TIB for RVMClass. The size of the TIB data depends on the number of virtual methods, so there can be as many different TIB tables as there are classes loaded in the VM. The size of a TIB also depends on the number of specialized methods. These are used by the memory manager and their number depends on the GC being used. MemoryManager.numSpecializedMethods() returns the number of specialized method slots that need to be allocated in each TIB table. All other slots in the table are fixed and described in objectmodel.TIBLayoutConstants.

4 Loading class files

Jikes RVM includes specific implementations of java.lang.Object, java.lang.Class, etc. that interact properly with classloader.RVMType. When an application class is loaded and an RVMClass object is created for it, the constructor of RVMType, the superclass of RVMClass, also creates a Class object that matches the RVMClass. The Class object is accessible via the private classForType field of RVMType and via the public methods of RVMType that return the classForType value. The java.lang.Class object in turn contains a reference to the matching RVMType object, which is actually the RVMClass object, so many methods of java.lang.Class can extract the necessary information from the RVMType object.

5 Object layout

org.jikesrvm.objectmodel.ObjectModel is the main class responsible for the layout of objects in memory.

6 Baseline compiler

This section covers the creation of bytecode maps, reference maps and code generation by the baseline compiler. The dynamic linking is briefly described in the next section to provide context for understanding the purpose of the various tables the compiler creates.

The method compilers.baseline.BaselineCompiler.compile() is the top-level driver for baseline compilation. The compilation process consists of 5 phases.

6.1 Phase 1: Reference maps

In this phase the RVM creates reference maps and computes stack heights for the compiled method. A reference map specifies, for a bytecode instruction, which slots of the operand stack, as it exists before executing the instruction, contain references. Reference maps are represented by two tables: a byte array referenceMaps and an int array MCSites. The elements of the first array are bit-vectors; each bit-vector is the map for an instruction at a particular index in the method's code array. The elements of MCSites are offsets in the machine code array corresponding to the reference maps with the same indexes. Reference maps are built by traversing the CFG of the basic blocks of a method. Consequently, the entries in referenceMaps are not ordered by the bytecode index they correspond to. Instead, the MCSites elements are used to identify which reference map should be used for a particular GC site.
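The relationship between the two tables can be illustrated with a lookup over parallel arrays. This is a hypothetical sketch, not the RVM's ReferenceMaps code: the class and method names are invented, bytesPerMap is taken as a parameter, and the sketch assumes that map i occupies the bytes from i * bytesPerMap up to (i + 1) * bytesPerMap in referenceMaps.

```java
import java.util.Arrays;

// Hypothetical illustration of the two parallel tables; not RVM source code.
final class ReferenceMapLookup {
    /**
     * Returns the bit-vector for the GC point at the given machine code
     * offset, or null if the offset is not a recorded GC site.
     */
    static byte[] mapForSite(int[] mcSites, byte[] referenceMaps,
                             int bytesPerMap, int machineCodeOffset) {
        // The maps are not ordered by site, so a linear scan is used.
        for (int i = 0; i < mcSites.length; i++) {
            if (mcSites[i] == machineCodeOffset) {
                int from = i * bytesPerMap; // assumed layout, see lead-in
                return Arrays.copyOfRange(referenceMaps, from, from + bytesPerMap);
            }
        }
        return null;
    }
}
```

The point of the sketch is only that the index found in MCSites selects the map; the actual bit layout inside each map is not modelled.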

The basic block boundaries are computed in compilers.baseline.BuildBB.determineTheBasicBlocks. This method creates an array of BasicBlock elements and a mapping from a bytecode index to the number of the basic block the bytecode instruction belongs to (byteToBlockMap). It also computes the number of GC points in the method being compiled. In Jikes RVM garbage collection can interrupt an active thread only at GC points. These points are call sites, new instructions, branch instructions and instructions that can trigger null pointer exceptions, such as getfield or putfield. There is exactly one reference map for each GC point.

The main method that computes the reference maps is compilers.baseline.BuildReferenceMaps.buildReferenceMaps. It loops over the basic blocks until every block has been processed. For each basic block it iterates over the bytecode instructions of the block. It is known for every bytecode instruction how it modifies the operand stack. Maintaining an abstraction of the operand stack as each instruction is visited, the RVM computes and records the reference map for every GC point it encounters. The reference maps are encoded as bit vectors in an array of bytes. The size of the array is determined from two values: the number of GC points, computed in determineTheBasicBlocks, and the maximum number of bits per map, which is the sum of the number of local variables and the maximum operand stack size for the compiled method. The latter two values are available in the class file of the class declaring the method. The right-most bit of each reference map is reserved for the JSR bit, which indicates whether the reference map is for a JSR or not.

Classes involved: ReferenceMaps, BuildBB, BuildReferenceMaps (modified for MVVM: BuildBB, BuildReferenceMaps).

6.2 Phase 2: OSR setup

OSR setup: recompute stack heights. Not important for our purposes.

6.3 Phase 3: Code generation

During phase 3 uncompressed bytecode maps are built. Element i of the int array bytecodeMap, declared in TemplateCompilerFramework, is the machine code offset of the first native code instruction generated for the JVM instruction located at bytecode index i. If a JVM instruction occupies more than a single byte, the elements of bytecodeMap corresponding to the locations of the instruction's operands contain 0s.

The main purpose of phase 3 is to translate JVM instructions to native machine code. This is done in the TemplateCompilerFramework.genCode method. The method iterates over the bytecode instructions and generates assembly code using emit_* methods overridden by the platform-specific subclass of TemplateCompilerFramework. For the Intel architecture the subclass is compilers.baseline.ia32.BaselineCompilerImpl. The emit_* methods rely on the values computed in the stackHeights array to generate correct offsets from the stack pointer when instructions need to access values on the compiled method's operand stack.

Classes involved: TemplateCompilerFramework, MachineCode, CodeArray.

6.4 Phase 4: OSR part 2

Adjust bytecode map and restore the original bytecode for building reference maps later. Not important for MVVM.

6.5 Phase 5: Maps encoding

In this phase bytecode maps are packed (method compilers.baseline.BaselineCompiledMethod.encodeMappingInfo) and MCSites elements are updated to the actual machine code offsets (method ReferenceMaps.translateByte2Machine).

Jikes RVM uses a simple compression algorithm to pack the sparse bytecode maps. Each map is stored as a byte array. Every map entry contains two pieces of information: the increment in bytes from the previous JVM instruction index and the increment in bytes from the previous machine code offset. A map entry is encoded in either 1 or 5 bytes. In the 1-byte encoding the 3 most significant bits of the byte contain the JVM instruction index increment and the 5 least significant bits contain the machine code offset increment. If the first component is larger than 6 or the second is larger than 31, the 5-byte encoding is used. The value 255 is reserved to mark the beginning of a 5-byte encoding sequence (this is why the pair 7 and 31 cannot be encoded in a single byte). The next two bytes are the high and low parts of the JVM instruction index increment, and the last two bytes are the high and low parts of the machine code offset increment.
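The packing scheme can be sketched as follows. This is an illustrative sketch, not the code of encodeMappingInfo: the class and method names are hypothetical, and it assumes that the uncompressed map holds 0 at every bytecode index that does not start an instruction, as described for phase 3.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of the bytecode map packing; not RVM source code.
final class BytecodeMapPacker {
    /** Packs an uncompressed map (index = bytecode index, value = machine code offset). */
    static byte[] pack(int[] bytecodeMap) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastBytecodeIndex = 0;
        int lastMachineOffset = 0;
        for (int bc = 0; bc < bytecodeMap.length; bc++) {
            int mc = bytecodeMap[bc];
            if (mc == 0) {
                continue; // operand byte: no entry is emitted
            }
            int deltaBC = bc - lastBytecodeIndex;
            int deltaMC = mc - lastMachineOffset;
            if (deltaBC <= 6 && deltaMC <= 31) {
                // 1-byte form: 3 high bits for bytecode delta, 5 low bits for code delta.
                out.write((deltaBC << 5) | deltaMC);
            } else {
                // 5-byte form: marker 255, then two 16-bit big-endian deltas.
                out.write(255);
                out.write(deltaBC >> 8);
                out.write(deltaBC & 0xFF);
                out.write(deltaMC >> 8);
                out.write(deltaMC & 0xFF);
            }
            lastBytecodeIndex = bc;
            lastMachineOffset = mc;
        }
        return out.toByteArray();
    }
}
```

For the uncompressed map {12, 0, 15, 0, 0, 60} the sketch emits two 1-byte entries followed by one 5-byte entry, because the last machine code increment (45) exceeds 31.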

Finally, the ReferenceMaps.translateByte2Machine method rewrites the elements of the MCSites array to contain machine code offsets instead of JVM instruction offsets for every GC point.

6.6 Example

This is an example of the code and maps the baseline compiler produces for a simple Java method compute(). The JVM instructions are interleaved with the machine code generated for them. The numbers in square brackets are the bytecode indexes of the first byte of each instruction. The 6-digit numbers in the left column are the machine code offsets of the machine code instructions. The red ones designate the GC points for this method. Both the unencoded and the encoded bytecode maps are shown. For the encoded map we show the binary encoding next to the hexadecimal value stored in the byte array. The reference maps are presented together with the corresponding MCSites. Reference maps are stored as a byte array, separately from MCSites, which is stored as an int array. For the reference maps we show the binary representation of every byte element. Each map element can occupy more than one byte.

Bytecode Map.png

Figure 7: Machine code and reference maps generated by the baseline compiler

7 Dynamic Linker

If a method invokes another method for the first time the called method has to be compiled dynamically. Also the class that defines the method may have to be loaded, resolved, instantiated and initialized. These activities are triggered by the Dynamic Linker.

When Jikes RVM loads, resolves and instantiates a class, it builds the data structures necessary for creating objects of this class as well as the tables that are used to find the addresses of the methods of the class. Addresses of virtual methods are stored in a table called the TIB (Type Information Block), and addresses of static methods are stored in another table, the JTOC (table of contents). These tables are initialized and updated in the process of class instantiation. Initially, the slots for method entry points contain the address of runtime.DynamicLinker.lazyMethodInvoker.

The first time a method is invoked, control passes to lazyMethodInvoker. Its purpose is to resolve the dynamic invocation, compile the invoked method, update the corresponding TIB or JTOC entry and pass control to the newly compiled method.

To resolve the dynamic invocation, RVM finds the return address and the frame of the method that invoked lazyMethodInvoker. Given the frame pointer, RVM obtains the ID of the calling method, which is always located in the first slot of the method's frame. The ID is an ordinal number assigned to a method when it is compiled. The CompiledMethods class keeps a static table of all compiled methods indexed by their IDs.

The caller frame also contains the return address. RVM uses the return address to compute the offset of the call instruction from the beginning of the method's machine code. The offset is reverse mapped to the bytecode index of the corresponding JVM invoke* instruction. This reverse mapping is computed in the compilers.baseline.BaselineCompiledMethod.getDynamicLink and findBytecodeIndexForInstruction methods. The latter method iterates over the entries of the bytecodeMap table built by the baseline compiler. Once it finds an entry with a machine code offset greater than the offset of the call instruction, it stops and returns the previous bytecode index. Given the bytecode index, RVM decodes the invoke* instruction located at that index in the method's bytecode and extracts a constant pool index from it. Next, it uses the index to obtain the constant pool entry containing the reference to the method invoked by the bytecode instruction. Finally, the reference is used to obtain the matching RVMMethod object and start compilation of the method. As soon as the compilation returns, DynamicLinker updates the declaring class's TIB entry for the method if it's a virtual or special method, or the JTOC entry if it's a static method.
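The reverse mapping and the decoding of the invoke* instruction described above can be sketched as follows. This is a simplified illustration, not the RVM's code: it assumes the bytecode map is available as two unencoded parallel arrays (the real bytecodeMap is encoded), and the class name, method names and sample values are hypothetical. The only outside fact it relies on is the class file format, where the opcodes 0xB6 through 0xB9 (invokevirtual, invokespecial, invokestatic, invokeinterface) are followed by a two-byte, big-endian constant pool index.

```java
final class BytecodeLookupSketch {
    /**
     * Walks the map entries in order and stops at the first entry whose
     * machine code offset is greater than callOffset. Returns the bytecode
     * index of the previous entry, or -1 if there is none.
     */
    static int findBytecodeIndex(int[] bytecodeIndexes, int[] machineOffsets, int callOffset) {
        int previous = -1;
        for (int i = 0; i < machineOffsets.length; i++) {
            if (machineOffsets[i] > callOffset) {
                break;
            }
            previous = bytecodeIndexes[i];
        }
        return previous;
    }

    /** Extracts the constant pool index from the invoke* instruction at bcIndex. */
    static int constantPoolIndex(byte[] bytecode, int bcIndex) {
        int opcode = bytecode[bcIndex] & 0xFF;
        if (opcode < 0xB6 || opcode > 0xB9) {
            throw new IllegalArgumentException("not an invoke* instruction at index " + bcIndex);
        }
        // The two bytes after the opcode form an unsigned big-endian index.
        return ((bytecode[bcIndex + 1] & 0xFF) << 8) | (bytecode[bcIndex + 2] & 0xFF);
    }

    public static void main(String[] args) {
        // aload_0; bipush 5; invokevirtual #18; ireturn
        byte[] bytecode = {0x2A, 0x10, 0x05, (byte) 0xB6, 0x00, 0x12, (byte) 0xAC};
        int[] bytecodeIndexes = {0, 1, 3, 6}; // first byte of each instruction
        int[] machineOffsets = {0, 8, 20, 44}; // made-up machine code offsets
        int bcIndex = findBytecodeIndex(bytecodeIndexes, machineOffsets, 36);
        System.out.println(bcIndex);                              // prints 3
        System.out.println(constantPoolIndex(bytecode, bcIndex)); // prints 18
    }
}
```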

At this point the entry in the TIB or JTOC for the method contains the machine code address of the method entry point, and all subsequent invocations of the method jump directly to its first instruction without passing through the lazyMethodInvoker indirection.
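The idea of a table slot that first points at a lazy stub and is later patched with the real entry point can be modeled in a few lines. The sketch below is an analogy only: an array of function objects stands in for the TIB or JTOC, "compilation" is just a lookup in a second array, and all names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

final class LazySlotSketch {
    /** Stand-in for a TIB or JTOC: one entry point per method slot. */
    private final IntUnaryOperator[] slots;
    /** Stand-in for the code the compiler would produce for each method. */
    private final IntUnaryOperator[] bodies;
    /** How many times the lazy path ran; kept for demonstration only. */
    int lazyInvocations = 0;

    LazySlotSketch(IntUnaryOperator[] bodies) {
        this.bodies = bodies;
        this.slots = new IntUnaryOperator[bodies.length];
        for (int i = 0; i < slots.length; i++) {
            final int slot = i;
            // Initially every slot refers to the lazy invoker stub.
            slots[i] = arg -> lazyInvoke(slot, arg);
        }
    }

    private int lazyInvoke(int slot, int arg) {
        lazyInvocations++;
        IntUnaryOperator compiled = bodies[slot]; // "compile" the method
        slots[slot] = compiled;                   // patch the table entry
        return compiled.applyAsInt(arg);          // pass control to the method
    }

    int invoke(int slot, int arg) {
        return slots[slot].applyAsInt(arg);
    }

    public static void main(String[] args) {
        LazySlotSketch table =
            new LazySlotSketch(new IntUnaryOperator[] {x -> x + 1, x -> x * 2});
        System.out.println(table.invoke(0, 41));   // first call goes through the stub
        System.out.println(table.invoke(0, 41));   // second call reaches the body directly
        System.out.println(table.lazyInvocations); // prints 1
    }
}
```

Only the first call through a slot pays for the indirection; after the slot is patched the stub is never reached again for that method.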

Clearly, the dynamic linking mechanism depends heavily on the rigid structure imposed by using the class file constant pool to find the invoked method in order to compile it before its first invocation.

7.1 Example

Dynamic Linker.png

Figure 8: Dynamic linking

8 Optimizing compiler

Here's the master plan of the optimization planner

Compilation plan
  1. Optimization plan
    • Convert Bytecodes to HIR
      • Generate HIR [ ConvertBCtoHIR ]
      • AdjustBytecodeIndexes [ AdjustBCIndexes ]
      • OsrPointConstructor
      • Branch Optimizations [ BranchOptimizationDriver ]
      • HIR Verification [ OptimizationPlanner.BC2HIR ]
      • Adjust Branch Probabilities [ AdjustBranchProbabilities ]
      • IRPrinter: Initial HIR [ IRPrinter ]
    • CFG Transformations
      • Tail Recursion Elimination
      • Basic Block Frequency Estimation
      • Build LST [ BuildLST ]
      • Estimate Block Frequencies
      • Static Splitting
      • Loop Normalization [ CFGTransformations ]
      • Loop Unrolling
      • Branch Optimizations [ BranchOptimizationDriver ]
    • CFG Structural Analysis
      • Build LST [ BuildLST ]
      • Yield Point Insertion
      • Estimate Block Frequencies
    • Simple Opts
    • Escape Transformations
    • Branch Optimizations [ BranchOptimizationDriver ]
    • Local CopyProp
    • Local ConstantProp
    • Local CSE
    • Field Analysis
    • Convert HIR to LIR
      • IRPrinter: Final HIR [ IRPrinter ]
      • Expand Runtime Services
      • Branch Optimizations [ BranchOptimizationDriver ]
      • Local Cast Optimizations
      • HIR Operator Expansion [ ConvertHIRtoLIR ]
      • Branch Optimizations [ BranchOptimizationDriver ]
      • Adjust Branch Probabilities [ AdjustBranchProbabilities ]
      • IRPrinter: Initial LIR [ IRPrinter ]
    • Local CopyProp
    • Local ConstantProp
    • Local CSE
    • Simple Opts
    • Basic Block Frequency Estimation
      • Build LST [ BuildLST ]
      • Estimate Block Frequencies
    • Code Reordering
    • Branch Optimizations [ BranchOptimizationDriver ]
    • Convert LIR to MIR
      • SplitBasicBlock
      • IRPrinter: Final LIR [ IRPrinter ]
      • Mutate Splits
      • Instruction Selection
      • Reduce Operators [ ReduceOperators ]
      • ConvertALUOps
      • Normalize Constants [ NormalizeConstantsPhase ]
      • Live Handlers [ DoLiveness ]
      • DepGraph & BURS [ DoBURS ]
      • Complex Operators [ ComplexOperators ]
      • NullCheckCombining
      • IRPrinter: Initial MIR [ IRPrinter ]
    • Register Mapping
      • MIR Range Splitting
      • Expand Calling Convention
      • Expand Calling Convention
      • Live Analysis
      • Register Allocation
      • Register Allocation Preparation
      • Linear Scan Composite Phase
      • Interval Analysis
      • Register Restrictions
      • Linear Scan
      • Update GCMaps 1
      • Spill Code
      • Update GCMaps 2
      • Update OSRMaps
      • Insert Prologue/Epilogue
    • Branch Optimizations [ BranchOptimizationDriver ]
    • Generate Machine Code
      • Final MIR Expansion
      • Assembler Driver [ AssemblerDriver ]
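The plan above is a tree: composite elements group other elements, and the leaves are the individual phases that run in order. A minimal sketch of such a structure, using a short excerpt of the plan, might look like this. The class and method names are hypothetical and do not mirror the RVM's own plan classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PlanSketch {
    /** A plan node: a leaf is a single phase, a node with children is a composite. */
    static final class Element {
        final String name;
        final List<Element> children;

        Element(String name, Element... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        boolean isComposite() {
            return !children.isEmpty();
        }
    }

    /** Returns the names of the leaf phases in the order they would run. */
    static List<String> atomicPhases(Element root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Element element, List<String> out) {
        if (!element.isComposite()) {
            out.add(element.name);
            return;
        }
        for (Element child : element.children) {
            collect(child, out); // depth-first, preserving the listed order
        }
    }

    /** A small excerpt of the plan listed above. */
    static Element excerpt() {
        return new Element("Optimization plan",
            new Element("Convert Bytecodes to HIR",
                new Element("Generate HIR"),
                new Element("AdjustBytecodeIndexes")),
            new Element("Simple Opts"),
            new Element("Escape Transformations"));
    }

    public static void main(String[] args) {
        // prints [Generate HIR, AdjustBytecodeIndexes, Simple Opts, Escape Transformations]
        System.out.println(atomicPhases(excerpt()));
    }
}
```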

Author: Dmitri Makarov

Email: dmakarov@alumni.stanford.edu

Created: 2015-09-12 Sat 22:16

Emacs 25.0.50.1 (Org mode 8.2.10)
