This semester-long course covers Java programming for grids, e-Science, e-Business, and e-Government applications. Topics include networking, XML, web services, grid systems, and advanced technology discussions.
e-Science e-Business e-Government and their Technologies: Advanced Java
Bryan Carpenter, Geoffrey Fox, Marlon Pierce
Pervasive Technology Laboratories, Indiana University, Bloomington IN 47404
January 12 2004
dbcarpen@indiana.edu gcf@indiana.edu mpierce@cs.indiana.edu
http://www.grid2004.org/spring2004
What are we doing • This is a semester-long course on Grids (viewed as technologies and infrastructure) and their application – mainly to science but also to business and government • We will assume a basic knowledge of the Java language and then interweave 6 topic areas – the first four cover technologies that will be used by students • 1) Advanced Java: including networking, Java Server Pages and perhaps servlets • 2) XML: Specification, Tools, Linkage to Java • 3) Web Services: Basic Ideas, WSDL, Axis and Tomcat • 4) Grid Systems: GT3/Cogkit, Gateway, XSOAP, Portlet • 5) Advanced Technology Discussions: CORBA as history, OGSA-DAI, security, Semantic Grid, Workflow • 6) Applications: Bioinformatics, Particle Physics, Engineering, Crises, Computing-on-demand Grid, Earth Science
Course Topic 1 • Advanced Java Programming • We will assume basic Java programming proficiency • We will cover Java client/server, three-tiered and network programming. • Ancillary but interesting Java topics to be covered include Apache Ant, XML-Beans, and Java Message Service • Material in the last bullet will mostly be introduced in later sections, as the course unfolds. • First lecture of the segment starts with a fairly discursive review of Java features.
Reading Material • No particular text for this section, but some material will come from earlier related courses: • Java HPC Course, September 2003 http://www.hpjava.org/courses/arl • Opennet Technologies Online Course, Fall 2001 http://aspen.ucs.indiana.edu/ptliu • Applications of Information Technology I and II, Spring 2001 http://aspen.ucs.indiana.edu/it1spring01 http://aspen.ucs.indiana.edu/it2spring01
Java History • The Java language grabbed public attention in 1995, with the release of the HotJava experimental Web browser, and the subsequent incorporation of Java into the Netscape browser. • Java had originally been developed—under the name of Oak—as an operating environment for PDAs, a few years before. • Very suddenly, Java became one of the most important programming languages in the industry. • The trend continued. Although Web applets are less important today than they were originally, Java was rapidly adopted by many other sectors of the programming community.
The Java Virtual Machine • Java programs are not compiled to machine code in the same way as programs in conventional programming languages. • To support safe execution of compiled code on multiple platforms (portability, security), they are compiled to instructions for an abstract machine called the Java Virtual Machine (JVM). • The JVM is a specification originally published by Sun Microsystems. • JVM instructions are called Java byte codes. They are stored in class files. • This execution model is part of the specification of the Java platform. There are a few compilers from the Java language to machine code, but it is hard to get these recognized as “Java compliant”.
JVM and Performance • The first implementations of the JVM simply interpreted the byte codes. These implementations were very slow. • This led to a common misconception that Java is an interpreted language and inherently slow. • Modern JVMs normally perform some form of compilation from byte codes to machine code on the fly, as the Java program is executed.
Run-time Compilation • In one form of Just-In-Time compilation, methods may be compiled to machine code immediately before they are executed for the first time. Then subsequent calls to the method just involve jumping into the machine code. • More sophisticated forms of adaptive compilation (like in the Sun Hotspot JVMs) initially run methods in interpreted mode, monitor program behavior, and only spend time compiling portions of the byte code where the program spends significant time. This allows more intelligent allocation of CPU time to compilation and optimization. • Modern JVMs (like the Hotspot server JVM) implement many of the most important kinds of optimization used by the static compilers of “traditional” programming languages. • Adaptive compilation may also allow some optimization approaches that are impractical for static compilers, because they don’t have the run-time information.
Prerequisites • We assume you know either Java or C++ moderately well. • But some things, like threaded and network programming with Java, will be covered from an introductory level later on. • In this section I will only point out some features and terminologies that are characteristic of Java and that you probably should understand. • And highlight some of the differences from C++.
What Java Isn’t • C++, mainly: it is now hard to think of the two languages as closely related. • They have similar syntax for expressions, control constructs, etc., but these are perhaps the least characteristic features of C++ or Java. • In C++ we use features like operator overloading, copy constructors, templates, etc., to create “little languages” through class libraries. • We worry about memory management and efficient creation of objects. • We worry about inline versus virtual methods, pointers versus references, minimizing overheads. • In Java most of these things go away. • Minimal control over memory management, due to automatic garbage collection. • Highly dynamic: all code is loaded dynamically on demand; implicit run-time descriptors play an important role, through run-time type checks, instanceof, etc. • Logically all methods are virtual; overloading and implementation of interfaces are ubiquitous. • Exceptions, rarely used in C++, are used universally in Java.
Java Class Structure • All methods and (non-local) variables are explicitly members of classes (or interfaces). • There is no default, global namespace (except for the names of classes and interfaces). • Java discards multiple inheritance at the class level. Inheritance relations between classes are strictly tree-like. • Every class inheritance diagram has the universal base class Object at its root.
Java Class Structure (2) • Java introduces the important idea of an interface, which is logically different from a class. Interfaces contain no implementation code for the methods they define. • Multiple inheritance of interfaces is allowed, and this is one way Java manages without multiple inheritance at the class level. • Since Java 1.1, classes and interfaces can be nested. • This is a big change to the language: read the JLS 2nd Edition in detail if you don’t believe this!
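A small sketch of the class-structure rules above: an interface extending two interfaces, a class implementing it, a static nested class, and Object at the root of the class tree. The names Named, Describable, Item, Book and Builder are hypothetical, chosen only for illustration.

```java
// Hypothetical types, for illustration only.
interface Named {
    String name();
}

interface Describable {
    String describe();
}

// An interface may extend several interfaces at once.
interface Item extends Named, Describable {
}

// A class extends at most one class, but may implement several interfaces.
class Book implements Item {
    private final String title;

    Book(String title) {
        this.title = title;
    }

    public String name() {
        return title;
    }

    public String describe() {
        return "Book: " + title;
    }

    // A static nested class, declared inside Book.
    static class Builder {
        private String title = "untitled";

        Builder title(String t) {
            title = t;
            return this;
        }

        Book build() {
            return new Book(title);
        }
    }
}

class InterfaceDemo {
    public static void main(String[] args) {
        Item item = new Book.Builder().title("Oak").build();
        System.out.println(item.describe());            // prints "Book: Oak"
        System.out.println(item instanceof Named);      // prints "true"
        // Book declares no superclass, so it implicitly extends Object.
        System.out.println(Book.class.getSuperclass()); // prints "class java.lang.Object"
    }
}
```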
Classes and Instances • We will consistently use the following terminology (which is “correct”): • A class is a type, e.g. public class A {int x ; void foo() {x = 23 ;}} • An interface is a type, e.g. public interface B {void goo() ;} • An instance is an object. An object is always an instance of one particular class. • That class may extend one other class directly (and further classes indirectly), and implement multiple interfaces.
Pointers in Java? • Any expression in Java that has class type (or interface type) is a reference to some instance (or it is a null reference). E.g. a variable declared: A a ; holds a reference to an instance of A (or null). The objects themselves are “behind the scenes” in Java: we can only manipulate pointers (references) to them. • E.g. a = b ; only copies a reference, not an object. • But it is important to note that references to objects and arrays are the only kinds of pointer in Java. E.g. there are no pointers to fields, array elements, or local variables.
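A minimal sketch of reference semantics, assuming a hypothetical Point class: assignment copies the reference, so both variables then name the same object, and == compares references rather than contents.

```java
// Hypothetical class, for illustration only.
class Point {
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class ReferenceDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = a;                 // copies the reference, not the object
        b.x = 99;                    // a and b refer to the same instance
        System.out.println(a.x);     // prints 99
        System.out.println(a == b);  // prints true: same object

        Point c = new Point(1, 2);
        Point d = new Point(1, 2);
        System.out.println(c == d);  // prints false: distinct objects, equal contents

        Point e = null;              // a null reference refers to no instance
        System.out.println(e == null); // prints true
    }
}
```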
Instance and static members • The following terminology is common. In:
    public class A {
        int x ;
        void foo() {…}
        static int y ;
        static void goo() {…}
    }
we say: x is an instance variable, or non-static field. foo() is an instance method, or non-static method. y is a static field, or class variable. goo() is a static method, or class method.
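To show the practical difference, here is a sketch that fills in hypothetical bodies for foo() and goo() (the slide elides them): each instance gets its own x, while all instances share the single y.

```java
class A {
    int x;          // instance variable: one copy per instance
    static int y;   // static field (class variable): one copy shared by all instances

    void foo() {    // instance method: invoked on an instance, can use x and y
        x++;
        y++;
    }

    static void goo() {  // static method: invoked on the class, has no 'this'
        y += 10;
        // x++;  // would not compile: there is no instance to refer to
    }
}

class StaticDemo {
    public static void main(String[] args) {
        A p = new A();
        A q = new A();
        p.foo();
        q.foo();
        A.goo();
        System.out.println(p.x + " " + q.x + " " + A.y);  // prints "1 1 12"
    }
}
```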
Class Loading • A Java program is typically written as a class with a public, static, void main() method, as follows:
    public class MyProgram {
        public static void main(String [] args) {
            … body of program …
        }
    }
and started by a command like:
    $ java MyProgram
• This command creates a Java Virtual Machine, loads the class MyProgram into the JVM, then invokes its main() method. • As this process unfolds, dependencies on other classes and interfaces and their supertypes will be encountered, e.g. through statements that use other classes. The class loader brings in the class files for these types on demand. Code is loaded, and methods linked, incrementally, throughout execution.
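The on-demand behavior can be observed through class initialization, which the JLS guarantees happens on a class's first active use (the JVM has some latitude over exactly when the class file itself is loaded). In this sketch the classes LoadingDemo and Helper are hypothetical; Helper's static initializer records when it runs.

```java
import java.util.ArrayList;
import java.util.List;

class LoadingDemo {
    // Records events in the order they happen.
    static final List<String> log = new ArrayList<String>();

    // Hypothetical helper class; its static initializer logs when it runs.
    static class Helper {
        static {
            log.add("Helper initialized");
        }

        static int answer() {
            return 42;
        }
    }

    static List<String> run() {
        log.add("start");
        int v = Helper.answer();   // first active use: Helper is initialized here
        log.add("got " + v);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints [start, Helper initialized, got 42]
    }
}
```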
The CLASSPATH • Many people have problems getting the CLASSPATH environment variable right. • Because all linking is done at run-time, you must ensure that this environment variable includes the right class files. • The class path is a colon-separated (semicolon-separated on Windows) list of directories and jar files. • If the class path is empty, it is equivalent to “.”. But if the class path is not empty, “.” is not included by default. • A directory entry means a root directory in which class files or package directories are stored; a jar entry means a jar archive in which class files or package directories are stored.
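A running program can inspect the class path it was started with through the standard java.class.path system property. This sketch (the class name ClassPathDemo is hypothetical) splits it with File.pathSeparator, which is ":" on Unix-like systems and ";" on Windows, and guesses each entry's kind from its suffix.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class ClassPathDemo {
    // Split a class path string into its entries, using the platform's
    // separator: ':' on Unix-like systems, ';' on Windows.
    static List<String> entries(String classPath) {
        List<String> result = new ArrayList<String>();
        for (String entry : classPath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The class path this JVM was actually started with.
        String cp = System.getProperty("java.class.path");
        for (String entry : entries(cp)) {
            String kind = entry.endsWith(".jar") ? "jar" : "directory";
            System.out.println(kind + ": " + entry);
        }
    }
}
```

Started as, say, `java -cp lib/a.jar:classes ClassPathDemo` on a Unix-like system, it would list one jar entry and one directory entry; the -cp option overrides the CLASSPATH environment variable for that run.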
Binary Compatibility • There is a useful property called binary-compatibility between classes. This means that (within some specified limits) two class files that implement the same public interface can be used interchangeably. • It also means that if you pick up an inappropriate implementation of a given class from the CLASSPATH at runtime, things can go wrong in an opaque way.
Java Native Interface • Some methods in a class may be declared as native methods, e.g.: class B { public native long add(int [] nums) ; } Notice the method add() has the modifier native, and the body of the method declaration is missing • It is replaced by a semicolon—similar to abstract methods in interfaces, etc. But in this case the method isn’t abstract. • The implementation of a native method will be given in another language, typically C or C++ (we consider C). • Implementing native methods is quite involved. • Arguably a good thing—it discourages casual use! Generally need a good reason for resorting to JNI.
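A Definition of Java_B_add()
    #include <jni.h>

    JNIEXPORT jlong JNICALL
    Java_B_add(JNIEnv * env, jobject this, jintArray nums) {
        jint *cnums ;
        jsize i, n ;
        jlong sum = 0 ;
        n = (*env)->GetArrayLength(env, nums) ;
        cnums = (*env)->GetIntArrayElements(env, nums, NULL) ;
        if (cnums == NULL)
            return 0 ;   /* the operation failed */
        for(i = 0 ; i < n ; i++)
            sum += cnums [i] ;
        /* Release the elements; JNI_ABORT skips copying back, since they were not modified */
        (*env)->ReleaseIntArrayElements(env, nums, cnums, JNI_ABORT) ;
        return sum ;
    }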
The Invocation API • JNI also provides a very powerful mechanism for going the other way—calling from a C program into Java. • First the C program needs to create a JVM (initialize all the data structures associated with a running JVM), which it does with a suitable library call. • The standard java command works exactly this way—it uses the JNI invocation API to create a JVM, and call the main() method of the class specified on the command line.
The Rest of this Segment • Will cover three core topics in “advanced Java”: • Multithreaded Programming in Java • Java as a multithreaded language; Java thread synchronization primitives. • Network Programming in Java • Traditional Java class libraries for sockets, URLs. • Overview of Java “New I/O”. • Java Servlets and Java Server Pages. • Java technologies for “Web Applications”. • Other Java techniques (e.g. Java for XML, Web Services) will be introduced as the course unfolds.
Need for Concurrent Programming • This course is mostly about distributed programming. • This is a different discipline from concurrent or multithreaded programming, but doing distributed programming without understanding concurrent programming is error prone. • Some frameworks (e.g. EJB) try to enable distributed programming while insulating the programmer from the difficulties of concurrent programming, but eventually you are likely to hit concurrency issues.
[Figure: Sequential programming → (+ non-determinism) → Concurrent programming → (+ partial failures) → Distributed programming]
Java as a Threaded Language • In C, C++, etc it is possible to do multithreaded programming, given a suitable library. • e.g. the pthreads library. • Unlike other languages, Java integrates threads into the basic language specification in a much tighter way. • Every Java Virtual Machine must support threads.
Features of Java Threads • Java provides a set of synchronization primitives based on the monitor and condition variable paradigm of C.A.R. Hoare. • The underlying functionality is similar to, e.g., POSIX threads. • The syntactic extension for threads is (deceptively?) small: • synchronized modifier on methods. • synchronized statement. • volatile keyword. • Other thread management and synchronization is captured in the Thread class and related classes. • But the presence of threads has a wide-ranging effect on language specification and JVM implementation.
Contents of this Lecture • Introduction to Java Threads. • Mutual Exclusion. • Synchronization between Java Threads using wait() and notify(). • Other features of Java Threads. • Suggested Exercises
Threads of Execution • Every statement in a Java program is executed in a context called its thread of execution. • When you start a Java program in the normal way, the main() method, and any methods called from that method, are executed in a distinguished (but otherwise ordinary) thread sometimes called the main thread. • Other threads can run concurrently with the main thread. These threads share access to the same classes and objects as the main thread, but they execute asynchronously, in their own time. • The main thread can create new threads; these threads can create further threads, etc.
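A short sketch of a second thread sharing an object with the thread that created it. The class MainThreadDemo and method runWorker() are hypothetical; join() is used so the caller waits for the worker and is guaranteed to see its write.

```java
class MainThreadDemo {
    // Starts a second thread that writes into an array shared with the
    // calling thread, waits for it, then returns what the worker wrote.
    static int runWorker() throws InterruptedException {
        final int[] shared = new int[1];   // one object, visible to both threads
        Thread worker = new Thread(new Runnable() {
            public void run() {
                System.out.println("worker runs in: " + Thread.currentThread().getName());
                shared[0] = 42;
            }
        });
        worker.start();
        worker.join();   // wait for the worker; also makes its write visible here
        return shared[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // In a normally launched program this prints "main() runs in: main".
        System.out.println("main() runs in: " + Thread.currentThread().getName());
        System.out.println("worker wrote " + runWorker());  // prints "worker wrote 42"
    }
}
```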
Creating New Threads • Any Java thread of execution (including the main thread) is associated with an instance of the Thread class. Before starting a new thread, you must create a new instance of this class. • The Java Thread class implements the interface Runnable. So every Thread instance has a method: public void run() { . . . } • When the thread is started, the code executed in the new thread is the body of the run() method. • Generally speaking the new thread ends when this method returns.
Making Thread Instances • There are two ways to create a thread instance (and define the thread run() method). Choose at your convenience: • Extend the Thread class and override the run() method, e.g.:
    class MyThread extends Thread {
        public void run() {
            System.out.println("Hello from another thread") ;
        }
    }
    . . .
    Thread thread = new MyThread() ;
• Create a separate Runnable object and pass it to the Thread constructor:
    class MyRunnable implements Runnable {
        public void run() {
            System.out.println("Hello from another thread") ;
        }
    }
    . . .
    Thread thread = new Thread(new MyRunnable()) ;
Starting a Thread • Creating the Thread instance does not in itself start the thread running. • To do that you must call the start() method on the new instance: thread.start() ; This operation causes the run() method to start executing concurrently with the original thread. • In our example the new thread will print the message “Hello from another thread” to standard output, then immediately terminate. • You can only call the start() method once on any Thread instance. Trying to “restart” a thread causes an IllegalThreadStateException to be thrown.
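A runnable sketch of the start-once rule, using a hypothetical StartDemo class: the thread is started, joined, and then started a second time, which Thread.start() rejects with IllegalThreadStateException.

```java
class StartDemo {
    // Starts a thread, waits for it, then tries to start it again.
    // Returns true if the second start() was rejected.
    static boolean restartIsRejected() throws InterruptedException {
        Thread thread = new Thread(new Runnable() {
            public void run() {
                System.out.println("Hello from another thread");
            }
        });
        thread.start();   // run() now executes concurrently with this thread
        thread.join();    // wait until run() has returned
        try {
            thread.start();  // a Thread instance can only be started once
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("restart rejected: " + restartIsRejected());
    }
}
```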
Example: Multiple Threads
    class MyThread extends Thread {
        MyThread(int id) {
            this.id = id ;
        }
        public void run() {
            System.out.println("Hello from thread " + id) ;
        }
        private int id ;
    }
    . . .
    Thread [] threads = new Thread [p] ;
    for(int i = 0 ; i < p ; i++)
        threads [i] = new MyThread(i) ;
    for(int i = 0 ; i < p ; i++)
        threads [i].start() ;
Remarks • This is one way of creating and starting p new threads to run concurrently. • The output might be something like (for p = 4, so the ids run from 0 to 3):
    Hello from thread 2
    Hello from thread 3
    Hello from thread 0
    Hello from thread 1
Of course there is no guarantee of order (or atomicity) of outputs, because the threads are concurrent. • One might worry about the efficiency of this approach for large numbers of threads (massive parallelism).
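A complete, runnable variant of the example, under a hypothetical MultiThreadDemo wrapper. It adds two things the slide omits: a synchronized list where each thread records its id, and join() calls so the caller waits for every thread before reading the results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MultiThreadDemo {
    static class MyThread extends Thread {
        private final int id;
        private final List<Integer> seen;

        MyThread(int id, List<Integer> seen) {
            this.id = id;
            this.seen = seen;
        }

        public void run() {
            System.out.println("Hello from thread " + id);
            seen.add(id);   // 'seen' is a synchronized list, so concurrent adds are safe
        }
    }

    // Starts p threads, waits for all of them, and returns the ids
    // in the order the threads happened to record them.
    static List<Integer> runThreads(int p) throws InterruptedException {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<Integer>());
        Thread[] threads = new Thread[p];
        for (int i = 0; i < p; i++)
            threads[i] = new MyThread(i, seen);
        for (int i = 0; i < p; i++)
            threads[i].start();
        for (int i = 0; i < p; i++)
            threads[i].join();   // without this, the caller might read 'seen' too early
        return new ArrayList<Integer>(seen);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runThreads(4));  // some permutation of [0, 1, 2, 3]
    }
}
```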
JVM Termination and Daemon Threads • When a Java application is started, the main() method of the application is executed in the main thread. • If the main() method never creates any new threads, the JVM keeps running until main() completes (and the main thread terminates). • Typically, the java command then finishes. • If main() creates new threads, by default the JVM terminates only when the main thread and all user-created threads have terminated. • More generally there are system threads executing in the background (e.g. threads associated with garbage collection). These are marked as daemon threads, meaning that they don’t have the property of “keeping the JVM alive”. So actually the JVM terminates when all non-daemon threads terminate. • Ordinary user threads can create daemon threads by calling setDaemon(true) on the thread instance before starting it.
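A sketch of a daemon thread, assuming a hypothetical "heartbeat" thread that would otherwise loop forever. Because it is marked as a daemon before start(), it does not keep the JVM alive once main() returns.

```java
class DaemonDemo {
    // Starts a thread that would loop forever; as a daemon it does
    // not keep the JVM alive.
    static Thread startHeartbeat() {
        Thread t = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        return;   // stop if interrupted
                    }
                }
            }
        });
        t.setDaemon(true);   // must be called before start()
        t.start();
        return t;
    }

    public static void main(String[] args) {
        Thread heartbeat = startHeartbeat();
        System.out.println("daemon? " + heartbeat.isDaemon());  // prints "daemon? true"
        // main() returns here; no non-daemon user threads remain, so the JVM
        // exits even though 'heartbeat' never finishes its loop.
    }
}
```

Calling setDaemon() after start() throws IllegalThreadStateException, which is why the flag is set first.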
Avoiding Interference • In any non-trivial multithreaded (or shared-memory-parallel) program, interference between threads is an issue. • Generally interference (or a race condition) occurs if two threads are trying to do operations on the same variables at the same time. This often results in corrupt data. • But not always. It depends on the exact interleaving of instructions. This non-determinism is the worst feature of race conditions. • A popular solution is to provide some kind of lock primitive. Only one thread can acquire a particular lock at any particular time. The concurrent program can be written so that operations on some given variables are only performed by threads holding the lock for those variables. • In POSIX threads, for example, the lock objects are called mutexes.
Monitors • Java adopts a version of monitors, proposed by C.A.R. Hoare. • Every Java object is created with its own lock (and every lock is associated with an object; there is no way to create an isolated mutex). In Java this lock is often called the monitor lock. • Methods of a class can be declared to be synchronized. • The object’s lock is acquired on entry to a synchronized method, and released on exit from the method. • Synchronized static methods need slightly different treatment: since there is no instance, they acquire the lock of the Class object for the class. • If methods generally modify the fields (instance variables) of the instance they are invoked on, this leads to a natural and systematic association between locks and the variables they guard. • The critical region is the body of the synchronized method.
Example use of Synchronized Methods
    Thread A                                   Thread B
    … call to counter.increment() …
      // body of synchronized method
      tmp1 = count ;                            … call to counter.decrement() …
      count = tmp1 + 1 ;                        Blocked
    … counter.increment() returns …
                                                  // body of synchronized method
                                                  tmp2 = count ;
                                                  count = tmp2 - 1 ;
                                                … counter.decrement() returns …
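A runnable sketch of the counter scenario, with a hypothetical CounterDemo class whose increment() and decrement() match the timeline. It also includes an unsynchronized counter for contrast, illustrating the race condition described earlier; whether that one actually loses updates varies from run to run, but it can never exceed the correct total.

```java
class CounterDemo {
    // Unsafe: count++ is a read-modify-write, so concurrent calls can lose updates.
    static class UnsafeCounter {
        private int count;
        void increment() { count++; }
        int get() { return count; }
    }

    // Safe: each call holds this object's monitor lock for its whole body.
    static class Counter {
        private int count;
        synchronized void increment() { count++; }
        synchronized void decrement() { count--; }
        synchronized int get() { return count; }
    }

    // Runs 'threads' threads, each incrementing both counters 'n' times.
    // Returns { unsafe total, safe total }.
    static int[] run(int threads, int n) throws InterruptedException {
        final UnsafeCounter unsafe = new UnsafeCounter();
        final Counter safe = new Counter();
        final int iterations = n;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(new Runnable() {
                public void run() {
                    for (int k = 0; k < iterations; k++) {
                        unsafe.increment();
                        safe.increment();
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return new int[] { unsafe.get(), safe.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(4, 100000);
        // The safe total is exactly 400000; the unsafe one may come out lower.
        System.out.println("unsafe: " + r[0] + ", safe: " + r[1]);
    }
}
```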
Caveats • This approach helps to encourage good practices, and make multithreaded Java programs less error-prone than, say, multithreaded C programs. • But it isn’t magic—it still depends on correct identification of the critical regions, to avoid race conditions. • Concurrent programming is hard, and if you start with the assumption Java somehow makes concurrent programming “easy”, you are probably going to write some broken programs!
Example: A Simple Queue
    public class SimpleQueue {
        public synchronized void add(Object data) {
            if (front != null) {
                back.next = new Node(data) ;
                back = back.next ;
            }
            else {
                front = new Node(data) ;
                back = front ;
            }
        }
        public synchronized Object rem() {
            Object result = null ;
            if (front != null) {
                result = front.data ;
                front = front.next ;
            }
            return result ;
        }
        private Node front, back ;

        // List node: a data field and a next field
        private static class Node {
            Node(Object data) { this.data = data ; }
            Object data ;
            Node next ;
        }
    }
Remarks • This queue is implemented as a linked list with a front pointer and a back pointer. • The method add() adds a node to the back of the list; the method rem() removes a node from the front of the list. • The rem() method immediately returns null when the queue is empty. • The Node class just has a data field (type Object) and a next field (type Node). • The following slide gives an example of what could go wrong without mutual exclusion. It assumes two threads concurrently add nodes to the queue. • In the initial state, Z is the last item in the queue. In the final state, the X node is orphaned, and the back pointer is null.
The Need for Synchronized Methods (initially Z is the last node, back points to Z, and Z.next is null)
    Thread A: add(X)   back.next = new Node(X) ;   Z → X → null, back = Z
    Thread B: add(Y)   back.next = new Node(Y) ;   Z → Y → null (X orphaned), back = Z
    Thread A:          back = back.next ;          back = Y
    Thread B:          back = back.next ;          back = Y.next = null
Corrupt data structure!
The synchronized construct • The keyword synchronized also appears in the synchronized statement, which has syntax like: synchronized (object) { … critical region … } • Here object is a reference to any object. The synchronized statement first acquires the lock on this object, then executes the critical region, then releases the lock. • Typically you might use this (the current instance) as the lock object, somewhere inside a non-synchronized method, when the critical region is smaller than the whole method body. • In general, though, the synchronized statement allows you to use the lock in any object to guard any code.
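A small sketch of the synchronized statement guarding only part of a method. HitCounter and its methods are hypothetical. The lock here is a private Object rather than this, a common variant (not from the slides) that stops outside code from acquiring the same lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: counts hits per page name from many threads.
class HitCounter {
    // A private lock object: unlike 'this', no outside code can lock it.
    private final Object lock = new Object();
    private final Map<String, Integer> hits = new HashMap<String, Integer>();

    void record(String page) {
        String key = page.trim();   // touches no shared state, so done outside the lock
        synchronized (lock) {       // the critical region covers only the map update
            Integer old = hits.get(key);
            hits.put(key, old == null ? 1 : old + 1);
        }
    }

    int count(String page) {
        synchronized (lock) {
            Integer c = hits.get(page.trim());
            return c == null ? 0 : c;
        }
    }

    // Has 'threads' threads each record 'n' hits on the same page,
    // then returns the total count for that page.
    static int hammer(int threads, final int n) throws InterruptedException {
        final HitCounter counter = new HitCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(new Runnable() {
                public void run() {
                    for (int k = 0; k < n; k++) counter.record(" index ");
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return counter.count("index");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(hammer(4, 5000));  // prints 20000
    }
}
```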
Deadlock • Deadlock occurs when a group of threads are mutually waiting for one another in such a way that none can proceed. • This happens if there is a cycle of waits-for dependencies, e.g. A waits for B, B waits for C, … , D waits for A. • There are unfortunately many ways this can occur. One common situation is if two threads try to acquire the same pair of locks in different orders, e.g.: Thread A synchronized(x) { … synchronized(y) { … } } Thread B synchronized(y) { … synchronized(x) { … } }
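One standard way to break the cycle, not spelled out on the slide, is to make every thread acquire the locks in the same global order. In this hypothetical LockOrdering sketch both threads take x before y, so the waits-for cycle above cannot form.

```java
// Hypothetical sketch: two threads that both need locks x and y.
class LockOrdering {
    static final Object x = new Object();
    static final Object y = new Object();
    static int updates = 0;   // only modified while holding both x and y

    // Every thread takes the locks in the same global order (x, then y),
    // so no cycle of waits-for dependencies can form.
    static void work() {
        for (int i = 0; i < 10000; i++) {
            synchronized (x) {
                synchronized (y) {
                    updates++;
                }
            }
        }
    }

    // Runs two threads executing work(); returns true if both finish
    // within the given time limit (they would not if they deadlocked).
    static boolean runBoth(long timeoutMillis) throws InterruptedException {
        Runnable task = new Runnable() {
            public void run() {
                work();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join(timeoutMillis);
        b.join(timeoutMillis);
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished: " + runBoth(5000));  // expected: "finished: true"
    }
}
```

If thread B instead nested synchronized (x) inside synchronized (y), as in the slide's example, each thread could end up holding one lock while waiting for the other.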
Performance Cost of synchronized • Acquiring locks introduces an overhead in execution of synchronized methods. See, for example: “Performance Limitations of the Java Core Libraries”, Allan Heydon and Marc Najork (Compaq), Proceedings of ACM 1999 Java Grande Conference. • Many of the original utility classes in the Java platform (e.g. Vector, Hashtable) were specified to have synchronized methods, to make them safe for the multithreaded environment. • This was probably a mistake: newer replacement classes (e.g. ArrayList, HashMap) don’t have synchronized methods; the programmer provides synchronization as needed, e.g. through wrappers such as those returned by Collections.synchronizedList().
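A sketch of the wrapper approach, using the standard Collections.synchronizedList() method from java.util; the class name WrapperDemo is hypothetical. Individual calls on the wrapper are synchronized, but iteration spans many calls, so the java.util documentation requires the caller to hold the list's lock while iterating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class WrapperDemo {
    static List<String> build() {
        // ArrayList itself is unsynchronized; the wrapper synchronizes each method call.
        List<String> names = Collections.synchronizedList(new ArrayList<String>());
        names.add("Vector");
        names.add("ArrayList");

        // Iteration spans many calls, so hold the wrapper's lock explicitly.
        List<String> copy = new ArrayList<String>();
        synchronized (names) {
            Iterator<String> it = names.iterator();
            while (it.hasNext()) {
                copy.add(it.next());
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(build());  // prints [Vector, ArrayList]
    }
}
```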
Beyond Mutual Exclusion • The mutual exclusion provided by synchronized methods and statements is an important category of synchronization. • But there are other interesting forms of synchronization between threads. Mutual exclusion by itself is not enough to implement these more general sorts of thread interaction (not efficiently, anyway). • POSIX threads, for example, provides a second kind of synchronization object called a condition variable to implement more general inter-thread synchronization. • In Java, condition variables (like locks) are implicit in the definition of objects: every object effectively has a single condition variable associated with it.
A Motivating Example • Consider the simple queue from the previous example. • If we try to remove an item from the front of the queue when the queue is empty, SimpleQueue was specified to just return null. • This is reasonable if our queue is just meant as a data structure buried somewhere in an algorithm. But what if the queue is a message buffer in a communication system? • In that case, if the queue is empty, it may be more natural for the “remove” operation to block until some other thread adds a message to the queue.
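A minimal preview sketch of such a blocking queue, assuming a hypothetical MessageQueue class. It uses wait() and notifyAll(), the mechanism listed in this lecture's contents; storing items in a LinkedList and using notifyAll() rather than notify() are choices made for the sketch.

```java
import java.util.LinkedList;

// Hypothetical class: a queue whose rem() blocks instead of returning null.
class MessageQueue {
    private final LinkedList<Object> items = new LinkedList<Object>();

    synchronized void add(Object data) {
        items.addLast(data);
        notifyAll();   // wake any threads blocked in rem()
    }

    // Blocks until an item is available.
    synchronized Object rem() throws InterruptedException {
        while (items.isEmpty()) {   // loop: re-check the condition after every wake-up
            wait();                 // releases this object's lock while waiting
        }
        return items.removeFirst();
    }

    // A consumer that (usually) starts waiting before the producer adds a message.
    static Object demo() throws InterruptedException {
        final MessageQueue q = new MessageQueue();
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;
                }
                q.add("hello");
            }
        });
        producer.start();
        Object msg = q.rem();   // blocks until the producer adds a message
        producer.join();
        return msg;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());  // prints "hello"
    }
}
```

The while loop around wait() matters: a woken thread may find the queue empty again (for example, if another consumer got there first), so it re-checks before removing. Since Java 5 the standard library also provides java.util.concurrent.BlockingQueue for this purpose.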