APCS – Unit 2 – Lesson 2. Compilers, Interpreters and Java’s Hybrid Approach

1. What is a compiler?

A compiler is a program that takes as input a program written in some high-level programming language and outputs machine language code for some machine architecture. The machine language code can then be executed any number of times, with different input data each time.

Example of compiling a C++ program:

A programmer writes the text of the program using a software program called an editor. The text of a program is referred to as source code, and the file that holds it is called a source file.

As an example, the Unix program g++ transforms a C++ source file into a machine-executable file, a.out, which runs natively on the architecture it was compiled for (SPARC microprocessors, in this example). As a second example, the Java compiler javac transforms a .java source file into a .class file written in Java bytecode, which is the machine language for an imaginary machine known as the Java Virtual Machine.
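To make the idea of translation concrete, here is a minimal sketch of a toy "compiler" written in Java. It is only an illustration, not how g++ or javac work: it handles nothing but sums of integers, and the target instructions PUSH and ADD belong to a made-up stack machine invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy "compiler": translates a sum such as "1 + 2 + 3" into instructions
// for a hypothetical stack machine. PUSH and ADD are made-up instruction names.
class ToyCompiler {

    // Translates the whole source text once and returns the generated instructions.
    static List<String> compile(String source) {
        List<String> code = new ArrayList<>();
        String[] terms = source.trim().split("\\s*\\+\\s*");
        for (int i = 0; i < terms.length; i++) {
            int value = Integer.parseInt(terms[i]); // rejects anything that is not an integer
            code.add("PUSH " + value);
            if (i > 0) {
                code.add("ADD"); // add the new term to the running sum
            }
        }
        return code;
    }

    public static void main(String[] args) {
        for (String instruction : compile("1 + 2 + 3")) {
            System.out.println(instruction);
        }
    }
}
```

Note that the toy compiler only produces instructions; it never computes the sum itself. Running the generated code is a separate step, just as running a.out is separate from running g++.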

Some of the most popular high-level programming languages are compiled languages, such as C, C++, Objective-C, Rust and Swift.

2. What are the advantages and disadvantages of compiled languages?

Programs compiled into native code at compile time tend to be faster than those translated at run time, due to the overhead of the translation process. However, newer technologies such as just-in-time compilation, and general improvements in the translation process are starting to narrow this gap. Mixed solutions using bytecode tend toward intermediate efficiency.
Low-level programming languages are typically compiled, especially when efficiency is the main concern, rather than cross-platform support. For such languages, there are more one-to-one correspondences between the programmed code and the hardware operations performed by machine code, making it easier for programmers to control the use of the central processing unit (CPU) and memory in fine detail.
With some effort, it is always possible to write compilers even for traditionally interpreted languages. For example, Common Lisp can be compiled to Java bytecode (then interpreted by the Java virtual machine), C code (then compiled to native machine code), or directly to native code.
Programming languages that support multiple compiling targets give more control to developers to choose either execution speed or cross-platform compatibility. [Source: Wikipedia]

3. What is an interpreter?

An interpreter is a program that takes as input a source program, along with data for the program, and translates and executes the source program instruction by instruction.
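The instruction-by-instruction behavior can be sketched with a toy interpreter written in Java. This is a minimal illustration under stated assumptions, not a model of any real interpreter: the instruction names PUSH, ADD, MUL and PRINT are made up for a hypothetical stack machine.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter for a hypothetical stack-machine language.
// PUSH, ADD, MUL and PRINT are made-up instruction names.
class ToyInterpreter {

    // Executes the program one instruction at a time and returns whatever it printed.
    static String run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        StringBuilder output = new StringBuilder();
        for (String instruction : program) {
            String[] parts = instruction.trim().split("\\s+");
            switch (parts[0]) {
                case "PUSH":
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "ADD":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "MUL":
                    stack.push(stack.pop() * stack.pop());
                    break;
                case "PRINT":
                    output.append(stack.peek()).append('\n');
                    break;
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + instruction);
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4 and prints 20.
        String[] program = { "PUSH 2", "PUSH 3", "ADD", "PUSH 4", "MUL", "PRINT" };
        System.out.print(run(program));
    }
}
```

Each instruction is decoded and carried out immediately, every time the program runs. That repeated decoding is the translation overhead that makes interpretation slower than running precompiled machine code.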

For example, the Java interpreter java translates a .class file into code that can be executed natively on the underlying machine. As a second example, the program Virtual PC interprets programs written for the Intel Pentium architecture (IBM PC clone) so that they run on the PowerPC architecture (Macintosh). This enables Macintosh users to run Windows programs on their computers.

Some of the most popular interpreted languages are JavaScript, Perl and PHP.

4. What are the advantages and disadvantages of interpreted languages?

Advantages:

Interpreting a language gives implementations some additional flexibility over compiled implementations. Features that are often easier to implement in interpreters than in compilers include:

platform independence (Java’s bytecode, for example)
reflection and reflective use of the evaluator (e.g. a first-order eval function)
dynamic typing
smaller executable program size (since implementations have flexibility to choose the instruction code)
dynamic scoping

Furthermore, source code can be read and copied, giving users more freedom.
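As an illustration of reflection from the list above, the following minimal Java sketch looks up a method by its name at run time and calls it. The class name ReflectionDemo and the helper call are made up for this example; getMethod and invoke are part of Java's standard reflection API.

```java
import java.lang.reflect.Method;

// Looks up and calls a method by name at run time using Java reflection.
class ReflectionDemo {

    // Invokes a public no-argument method on target, chosen by its name.
    static Object call(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("hello", "length"));      // prints 5
        System.out.println(call("hello", "toUpperCase")); // prints HELLO
    }
}
```

The method name is an ordinary string here, so the choice of which method to run can be made while the program is executing rather than when it is compiled.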

Disadvantages:

Without static type-checking, which is usually performed by a compiler, programs can be less reliable, because type checking eliminates a class of programming errors (though type-checking of the code can be done by using additional stand-alone tools).
Interpreters can be susceptible to code injection attacks.
Slower execution compared to direct native machine code execution on the host CPU. A technique used to improve performance is just-in-time (JIT) compilation, which converts frequently executed sequences of interpreted instructions to host machine code. JIT is most often combined with compilation to bytecode, as in Java.
Source code can be read and copied (e.g. JavaScript in web pages), or more easily reverse engineered through reflection in applications where intellectual property has a commercial advantage.
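The first disadvantage above, the loss of static type-checking, can be illustrated in Java itself. In this minimal sketch (the class name TypeCheckDemo and the method describe are made up for the example), the commented-out line would be rejected by the compiler before the program ever runs, while the cast inside describe is only checked at run time.

```java
// Contrasts an error caught at compile time with one that only surfaces at run time.
class TypeCheckDemo {

    // Object hides the real type from the compiler, so the cast is only checked at run time.
    static String describe(Object value) {
        try {
            String text = (String) value;
            return "string of length " + text.length();
        } catch (ClassCastException e) {
            return "run-time type error";
        }
    }

    public static void main(String[] args) {
        // String s = 42;   // rejected by javac: incompatible types
        System.out.println(describe("hello")); // prints: string of length 5
        System.out.println(describe(42));      // prints: run-time type error
    }
}
```

The second call shows the kind of mistake that stays hidden until the faulty line actually executes, which is the situation described above for languages without static type-checking.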

5. What is Java’s hybrid approach?

Java uses both a compiler and an interpreter. The diagram below illustrates Java’s compiler + interpreter approach.

After a programmer uses an editor to enter a simple Java program and saves it under the name Hello, the program is stored as a source file with the file extension .java. The Java compiler then compiles the source file Hello.java into Hello.class, which is written in Java bytecode, the machine language for an imaginary machine known as the Java Virtual Machine (JVM). Finally, the Java interpreter translates Hello.class into code that can be executed natively on the underlying machine.
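The steps above can be tried with a source file like the following sketch. The greeting text and the helper method greeting are assumptions made for this example; javac and java are the standard compiler and launcher commands already named in this lesson.

```java
// Hello.java: a simple source file of the kind described above.
class Hello {

    // Returns the text to display (the wording is arbitrary).
    static String greeting() {
        return "Hello, World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}

// Step 1, compile (produces Hello.class, which contains bytecode):   javac Hello.java
// Step 2, run (the JVM loads and executes Hello.class):              java Hello
```

After step 1, a new file named Hello.class appears next to Hello.java. Step 2 takes the class name Hello, without the .class extension.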

Why does Java typically interpret instead of compile? The main advantage of compilation is that you end up with raw machine language code that can be efficiently executed on your machine. However, it can only be executed on one type of machine architecture (Intel Pentium, PowerPC).

A primary advantage of compiling to an intermediate language like Java bytecode and then interpreting it is platform independence: you can interpret the same .class file on different types of machine architectures. However, interpreting the bytecode is typically slower than executing pre-compiled machine language code.

A second advantage of using Java bytecode is that it acts as a buffer between your computer and the program. This enables you to download an untrusted program from the Internet and execute it on your machine with some assurances. Since you are running the Java interpreter (and not raw machine language code), you are protected by a layer of security which guards against malicious programs. It is the combination of Java and the Java bytecode that yields a platform-independent and secure environment, while still embracing a full set of modern programming abstractions.
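One way to observe platform independence is to compile a small program once and run the same .class file on different computers. The sketch below (the class name WhereAmI is made up for this example) asks the JVM which platform it is running on, using the standard system properties os.name, os.arch and java.version.

```java
// Reports the platform on which the JVM is currently running this class.
class WhereAmI {

    // Combines the operating system name and the machine architecture.
    static String platform() {
        return System.getProperty("os.name") + " / " + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println("Running on: " + platform());
        System.out.println("Java version: " + System.getProperty("java.version"));
    }
}
```

The bytecode in WhereAmI.class stays the same from machine to machine; only the output changes, because a different JVM is interpreting it each time.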

List of sources and references:

Introduction to Programming, Princeton University
Wikipedia
Java Methods, Litvin
Images online