Java Bytecode: Stack-based Machine Code

CS 301 Lecture, Dr. Lawlor

OK, so a typical machine like x86 has:
Java is not like that.  Java has:
Instead of just undifferentiated memory, divided into sections by convention at runtime and accessed via uniform pointers, Java uses different machine code instructions to access data in different places:
That's a lot of different ways to get at integers!  The big advantage of doing this, though, is reliability: pointers can't go haywire, because bytecode has no raw pointers or pointer arithmetic, only typed references.  The operand stack can't overflow, because every method declares its maximum stack depth and the verifier checks it statically before the code runs.  It looks strange at first, but it's actually quite well done.
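
As a rough sketch of those separate access paths, here is a small class where each method fetches an int from a different place. The class, field, and method names are made up for illustration, and the instruction named in each comment is what javac typically emits; running `javap -c Places` shows the exact output for your compiler.

```java
// Hypothetical example class: each method reads an int from a different kind of storage.
class Places {
    static int counter = 7;   // static field
    int size = 5;             // instance field

    static int fromConstant()       { return 3; }        // iconst_3
    static int fromLocal(int i)     { return i; }        // iload_0 (the argument lives in local slot 0)
    static int fromArray(int[] a)   { return a[1]; }     // aload_0, iconst_1, iaload
    static int fromStatic()         { return counter; }  // getstatic
    int fromField()                 { return size; }     // aload_0 ("this"), getfield
}
```

Five reads of an int, five different instruction families, and none of them can be aimed at the wrong kind of storage.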

Here are all the Java bytecodes.  See the full table here, and the instruction descriptions starting with "A".
0x00 nop

Load immediate value:
0x01 aconst_null
0x02 iconst_m1
0x03 iconst_0
0x04 iconst_1
0x05 iconst_2
0x06 iconst_3
0x07 iconst_4
0x08 iconst_5
0x09 lconst_0
0x0a lconst_1
0x0b fconst_0
0x0c fconst_1
0x0d fconst_2
0x0e dconst_0
0x0f dconst_1

Load from next byte (bipush) or next two bytes (sipush):
0x10 bipush
0x11 sipush

Load from constant pool:
0x12 ldc
0x13 ldc_w
0x14 ldc2_w
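
To see how the three groups above divide up the work for int constants, here is a sketch of the choice javac typically makes. The class and method names are hypothetical; the ranges follow from the operand sizes (none, one signed byte, two signed bytes, or a constant pool index).

```java
// Sketch: which instruction is typically used to push a given int constant.
class ConstPush {
    static String pushInstructionFor(int v) {
        if (v >= -1 && v <= 5) {
            // The value is baked into the opcode itself, so no operand bytes are needed.
            return v == -1 ? "iconst_m1" : "iconst_" + v;
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return "bipush " + v;    // one signed operand byte follows the opcode
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return "sipush " + v;    // two signed operand bytes follow the opcode
        }
        return "ldc";                // anything bigger is fetched from the constant pool
    }
}
```

This matches the disassembly later in these notes: 3 is pushed with iconst_3, while 100 needs bipush.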

Read from local variable:
0x15 iload
0x16 lload
0x17 fload
0x18 dload
0x19 aload
0x1a iload_0
0x1b iload_1
0x1c iload_2
0x1d iload_3
0x1e lload_0
0x1f lload_1
0x20 lload_2
0x21 lload_3
0x22 fload_0
0x23 fload_1
0x24 fload_2
0x25 fload_3
0x26 dload_0
0x27 dload_1
0x28 dload_2
0x29 dload_3
0x2a aload_0
0x2b aload_1
0x2c aload_2
0x2d aload_3

Read from an array index:
0x2e iaload
0x2f laload
0x30 faload
0x31 daload
0x32 aaload
0x33 baload
0x34 caload
0x35 saload

Write to local variable:
0x36 istore
0x37 lstore
0x38 fstore
0x39 dstore
0x3a astore
0x3b istore_0
0x3c istore_1
0x3d istore_2
0x3e istore_3
0x3f lstore_0
0x40 lstore_1
0x41 lstore_2
0x42 lstore_3
0x43 fstore_0
0x44 fstore_1
0x45 fstore_2
0x46 fstore_3
0x47 dstore_0
0x48 dstore_1
0x49 dstore_2
0x4a dstore_3
0x4b astore_0
0x4c astore_1
0x4d astore_2
0x4e astore_3

Write to an array index:
0x4f iastore
0x50 lastore
0x51 fastore
0x52 dastore
0x53 aastore
0x54 bastore
0x55 castore
0x56 sastore



Stack Manipulation:
0x57 pop
0x58 pop2
0x59 dup
0x5a dup_x1
0x5b dup_x2
0x5c dup2
0x5d dup2_x1
0x5e dup2_x2
0x5f swap
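
The stack manipulation opcodes are easiest to understand by what they do to the top few entries. Here is a minimal sketch of four of them acting on a stack of ints (one slot per value); the StackOps class is invented for illustration and uses java.util.ArrayDeque as the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of four stack-manipulation opcodes; the comments show the stack before -> after,
// with the top of the stack on the right.
class StackOps {
    static void pop(Deque<Integer> s)  { s.pop(); }              // ..., a    -> ...
    static void dup(Deque<Integer> s)  { s.push(s.peek()); }     // ..., a    -> ..., a, a
    static void swap(Deque<Integer> s) {                         // ..., a, b -> ..., b, a
        int b = s.pop(), a = s.pop();
        s.push(b);
        s.push(a);
    }
    static void dupX1(Deque<Integer> s) {                        // ..., a, b -> ..., b, a, b
        int b = s.pop(), a = s.pop();
        s.push(b);
        s.push(a);
        s.push(b);
    }
}
```

The pop2 and dup2 variants do the same jobs for two slots at a time, which is how long and double values (two slots each) get handled.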

Arithmetic:
0x60 iadd
0x61 ladd
0x62 fadd
0x63 dadd
0x64 isub
0x65 lsub
0x66 fsub
0x67 dsub
0x68 imul
0x69 lmul
0x6a fmul
0x6b dmul
0x6c idiv
0x6d ldiv
0x6e fdiv
0x6f ddiv
0x70 irem
0x71 lrem
0x72 frem
0x73 drem

Negate:
0x74 ineg
0x75 lneg
0x76 fneg
0x77 dneg
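
The divide and remainder opcodes have a few corner cases worth seeing. This small sketch (the DivRem class is made up for illustration) uses the Java operators that compile down to idiv, irem, and ddiv.

```java
// Sketch: Java operators that compile to the divide and remainder opcodes.
class DivRem {
    static int quotient(int a, int b)           { return a / b; }  // idiv: truncates toward zero
    static int remainder(int a, int b)          { return a % b; }  // irem: sign follows the dividend a
    static double fquotient(double a, double b) { return a / b; }  // ddiv: never throws
}
```

Integer division by zero throws ArithmeticException, while floating-point division by zero quietly produces infinity (or NaN for 0.0/0.0).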

Bitwise:
0x78 ishl
0x79 lshl
0x7a ishr
0x7b lshr

Unsigned shift:
0x7c iushr
0x7d lushr
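
The difference between the two right shifts only shows up on negative numbers. A quick sketch, with a hypothetical Shifts class wrapping the three Java operators that compile to ishl, ishr, and iushr:

```java
// Sketch: the Java shift operators and the int opcodes they compile to.
class Shifts {
    static int left(int x, int n)            { return x << n; }   // ishl: only the low 5 bits of n are used
    static int arithmeticRight(int x, int n) { return x >> n; }   // ishr: copies of the sign bit shift in
    static int logicalRight(int x, int n)    { return x >>> n; }  // iushr: zeros shift in
}
```

So -8 >> 1 stays negative (-4), while -8 >>> 1 turns into a large positive number.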

Bitwise and/or/xor:
0x7e iand
0x7f land
0x80 ior
0x81 lor
0x82 ixor
0x83 lxor

Increment a local variable in place:
0x84 iinc

Conversions:
0x85 i2l
0x86 i2f
0x87 i2d
0x88 l2i
0x89 l2f
0x8a l2d
0x8b f2i
0x8c f2l
0x8d f2d
0x8e d2i
0x8f d2l
0x90 d2f
0x91 i2b
0x92 i2c
0x93 i2s
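
The narrowing conversions are the ones that can change a value. Here is a short sketch of what a few of them do, using the Java casts that compile to them; the Narrowing class is invented for illustration.

```java
// Sketch: Java casts and the conversion opcodes they compile to.
class Narrowing {
    static byte toByte(int i)   { return (byte) i; }  // i2b: keep the low 8 bits, then sign-extend
    static char toChar(int i)   { return (char) i; }  // i2c: keep the low 16 bits, then zero-extend
    static int  toInt(long l)   { return (int) l; }   // l2i: keep the low 32 bits
    static int  toInt(double d) { return (int) d; }   // d2i: rounds toward zero, NaN gives 0,
                                                      //      out-of-range values clamp to min/max int
}
```

Note that the integer conversions just chop off bits, while the floating-point ones clamp instead of wrapping around.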


Compares:
0x94 lcmp
0x95 fcmpl
0x96 fcmpg
0x97 dcmpl
0x98 dcmpg
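
These compare opcodes don't jump; they push an int (-1, 0, or 1) that a following ifeq/iflt/etc. tests. The "l" and "g" float variants differ only in what they push when either operand is NaN. A sketch of the semantics in plain Java (the Compares class and its method names just mirror the opcode names; this is not how the JVM implements them):

```java
// Sketch of the value each compare opcode pushes onto the operand stack.
class Compares {
    static int lcmp(long a, long b) {
        return a < b ? -1 : (a == b ? 0 : 1);
    }
    static int fcmpl(float a, float b) {
        if (a > b) return 1;
        if (a == b) return 0;
        return -1;   // a < b, or either operand is NaN ("l" = NaN counts as less)
    }
    static int fcmpg(float a, float b) {
        if (a < b) return -1;
        if (a == b) return 0;
        return 1;    // a > b, or either operand is NaN ("g" = NaN counts as greater)
    }
}
```

Having both variants lets the compiler pick whichever one makes a comparison involving NaN come out false.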

Jump after compare (tests the int on top of the stack against zero):
0x99 ifeq
0x9a ifne
0x9b iflt
0x9c ifge
0x9d ifgt
0x9e ifle

Int-only cmp-&-jmp:
0x9f if_icmpeq
0xa0 if_icmpne
0xa1 if_icmplt
0xa2 if_icmpge
0xa3 if_icmpgt
0xa4 if_icmple

Object reference cmp-&-jmp:
0xa5 if_acmpeq
0xa6 if_acmpne

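Jmp with 2-byte offset:
0xa7 goto
0xa8 jsr

Return from a jsr subroutine (operand is a local variable index, not an offset):
0xa9 ret

Switch statements (multi-way jump with 4-byte offsets):
0xaa tableswitch
0xab lookupswitch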

Return a value:
0xac ireturn
0xad lreturn
0xae freturn
0xaf dreturn
0xb0 areturn

Return from a void method:
0xb1 return

Read/write static and object fields:
0xb2 getstatic
0xb3 putstatic
0xb4 getfield
0xb5 putfield

Call a method:
0xb6 invokevirtual

Call a constructor (also private and superclass methods; Java has no destructors):
0xb7 invokespecial

Call static method:
0xb8 invokestatic

Call an interface ("implements") method:
0xb9 invokeinterface
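
To see which kind of call gets which invoke instruction, here is a small sketch. Shape, Square, and InvokeDemo are made-up names, and the comments give the instruction javac typically emits; `javap -c InvokeDemo` will confirm it for your compiler.

```java
// Hypothetical types used only to show the different invoke instructions.
interface Shape { double area(); }

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }         // begins with invokespecial Object.<init>
    public double area() { return side * side; }
    static Square unit() { return new Square(1.0); }  // new, dup, invokespecial Square.<init>
}

class InvokeDemo {
    static double viaClass(Square s)    { return s.area(); }              // invokevirtual
    static double viaInterface(Shape s) { return s.area(); }              // invokeinterface
    static double viaStatic()           { return Square.unit().area(); }  // invokestatic, then invokevirtual
}
```

The same area() method is reached by invokevirtual or invokeinterface depending only on the declared type of the variable it is called through.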

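Call through a dynamically linked call site (added in Java 7; unused before that):
0xba invokedynamic

Allocate objects, arrays: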
0xbb new
0xbc newarray
0xbd anewarray

Check array length:
0xbe arraylength

Exceptions, type checks, locks, and other odds and ends:
0xbf athrow
0xc0 checkcast
0xc1 instanceof
0xc2 monitorenter
0xc3 monitorexit
0xc4 wide
0xc5 multianewarray
0xc6 ifnull
0xc7 ifnonnull

Like goto, but 4-byte offset:
0xc8 goto_w
0xc9 jsr_w

Stuff that's weird to see in machine code:
Stuff you don't see there:

Example Java Bytecode & Disassembly

Here's a tiny piece of source code, "Hello.java":

class Hello {
    public static int fn(int i) {
        return i + 3;
    }
    public static void main(String[] args) {
        System.out.println("fn=" + fn(100));
    }
}

You compile this with:
    javac Hello.java

And run with:
    java Hello

Disassemble with:
    javap -c Hello

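Which prints the following (the comments in parentheses are mine).  This listing is from an older javac; since Java 9, javac compiles the string concatenation in main to a single invokedynamic call instead of the StringBuilder sequence: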
Compiled from "Hello.java"
class Hello extends java.lang.Object{
Hello();    (This is the compiler's default auto-generated constructor)
  Code:
   0:    aload_0    (my "this" pointer)
   1:    invokespecial    #1; //Method java/lang/Object."<init>":()V    (call my superclass's constructor)
   4:    return

public static int fn(int);
  Code:
   0:    iload_0    (my function argument is stored in my local variable array, load onto operand stack)
   1:    iconst_3  (load 3 onto operand stack)
   2:    iadd           (add i and 3)
   3:    ireturn       (return the sum)

public static void main(java.lang.String[]);
  Code:
   0:    getstatic    #2; //Field java/lang/System.out:Ljava/io/PrintStream;   (Basically Java's "cout")
   3:    new    #3; //class java/lang/StringBuilder  (used to assemble the output string)
   6:    dup   (duplicate the StringBuilder reference: the constructor call below consumes one copy, and the other stays on the stack for the append calls)
   7:    invokespecial    #4; //Method java/lang/StringBuilder."<init>":()V
   10:    ldc    #5; //String fn=
   12:    invokevirtual    #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   15:    bipush    100   (load the constant 100 onto the operand stack; this function argument will be copied into fn's local variable array during invocation)
   17:    invokestatic    #7; //Method fn:(I)I   (call fn)
   20:    invokevirtual    #8; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;   (concat return value)
   23:    invokevirtual    #9; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   26:    invokevirtual    #10; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   29:    return

}

This is a pretty darn straightforward translation of the original code!
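
To make the operand stack and local variable array concrete, here is a toy interpreter that runs the four opcodes from fn's disassembly above. It is only a sketch: TinyVm is a made-up class, it understands just these four opcodes (using the opcode values from the table earlier in these notes), and a real JVM does far more, such as verification, frames, and types.

```java
// Toy interpreter for the bytecode of fn: iload_0, iconst_3, iadd, ireturn.
class TinyVm {
    static final int ILOAD_0 = 0x1a, ICONST_3 = 0x06, IADD = 0x60, IRETURN = 0xac;

    static int run(int[] code, int[] locals, int maxStack) {
        int[] stack = new int[maxStack];   // operand stack, sized up front like a real method's max stack
        int sp = 0;                        // number of values currently on the stack
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case ILOAD_0:  stack[sp++] = locals[0]; break;   // copy local 0 onto the stack
                case ICONST_3: stack[sp++] = 3; break;           // push the constant 3
                case IADD: {                                     // pop two ints, push their sum
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case IRETURN:  return stack[--sp];               // pop the result and return it
                default: throw new IllegalArgumentException("unsupported opcode " + code[pc]);
            }
        }
        throw new IllegalStateException("fell off the end of the code");
    }

    static int fn(int i) {
        int[] code = { ILOAD_0, ICONST_3, IADD, IRETURN };
        return run(code, new int[] { i }, 2);   // the argument goes in local 0; 2 stack slots suffice
    }
}
```

Calling TinyVm.fn(100) walks through the same steps as the real fn: the argument sits in local variable 0, gets copied to the operand stack, and the sum comes back off the stack.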