Serialization
Flatten your object for automated storage or network transfer
Distributed Software Engineering C:\unocourses\4350\slides\DefiningThreads

Software object persistence
• Persistence: saving information about an object so that it can be recreated at a different time, in a different place, or both.
• Object serialization is one means of implementing persistence: it converts an object’s state into a byte stream that can later be used to reconstruct (deserialize) a virtually identical copy of the original object.
• Default serialization for an object writes:
  • the class of the object,
  • the class signature,
  • the values of all non-transient and non-static fields.

Serialization protocol
• For serialization: java.io.ObjectOutputStream, via writeObject, which relies on the default mechanism (defaultWriteObject).
• For deserialization: java.io.ObjectInputStream, via readObject, which relies on the default mechanism (defaultReadObject).
• Any object instance that belongs to the graph of the object being serialized must be serializable as well.
• A superclass must either be Serializable or provide an accessible parameterless constructor.
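
A minimal sketch of the object-graph rule above. The classes Engine and Car and the helper canSerialize are hypothetical names invented for this illustration: Car is Serializable, but its Engine field is not, so writing a Car should fail with NotSerializableException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class GraphDemo {
    /** Returns true if the whole graph rooted at o can be written with default serialization. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;                 // some object in the graph is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize("a string")); // String is Serializable
        System.out.println(canSerialize(new Car()));  // Car references a non-serializable Engine
    }
}

// Hypothetical classes, for illustration only.
class Engine { }                          // does not implement Serializable

class Car implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Engine engine = new Engine();   // part of Car's object graph
}
```

Marking the engine field transient, or making Engine implement Serializable, would be the two usual ways out.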

Serialization protocol
• Customize the default: implement extended versions of the default methods in your class:
  • writeObject
  • readObject
• But final fields cannot be assigned in a custom readObject; they must be restored by the default mechanism (defaultReadObject).
• Create your own complete serialization by implementing the interface Externalizable.

Specifying persistent objects
• The class of an object to be serialized must implement the interface java.io.Serializable.
• This is an empty (marker) interface, used to mark the objects of such a class as persistent.

Deserialization
• It reads the values written during serialization.
• Static fields in the class are left untouched.
• If the class needs to be loaded, normal initialization of the class takes place, giving static fields their initial values.
• Transient fields are initialized to default values.
• Recreation of the object graph will occur in reverse order from its serialization.

Example

import java.io.Serializable;
import java.util.Calendar;
import java.util.Date;

public class PersistentTime implements Serializable {
    // The moment this object was created; saved as part of the object's state.
    private Date time;

    public PersistentTime() {
        time = Calendar.getInstance().getTime();
    }

    public Date getTime() {
        return time;
    }
}

Class java.io.ObjectOutputStream
• An ObjectOutputStream instance writes primitive data types and graphs of Java objects to an OutputStream. The objects can be read (reconstituted) using an ObjectInputStream. Persistent storage of objects can be accomplished by using a file for the stream. If the stream is a network socket stream, the objects can be reconstituted on another host or in another process.
• Only objects that support the java.io.Serializable interface can be written to streams. The class of each serializable object is encoded including the class name and signature of the class, the values of the object's fields and arrays, and the closure of any other objects referenced from the initial objects.

Class java.io.ObjectOutputStream
• The method writeObject is used to write an object to the stream. Any object, including Strings and arrays, is written with writeObject. Multiple objects or primitives can be written to the stream. The objects must be read back from the corresponding ObjectInputStream with the same types and in the same order as they were written.
• Primitive data types can also be written to the stream using the appropriate methods from DataOutput. Strings can also be written using the writeUTF method.
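
The "same types, same order" rule can be sketched with an in-memory stream. The class OrderDemo and the method writeThenRead are hypothetical names used only for this illustration; the stream calls are the standard java.io ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;
import java.util.Date;

class OrderDemo {
    /** Writes a primitive, a UTF string and two objects, then reads them back in the same order. */
    static String writeThenRead() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeInt(42);                      // primitive, via DataOutput
                out.writeUTF("hello");                 // string, via writeUTF
                out.writeObject(new int[] {1, 2, 3});  // arrays are objects
                out.writeObject(new Date(0L));
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                int number = in.readInt();             // reads mirror the writes one for one
                String text = in.readUTF();
                int[] values = (int[]) in.readObject();
                Date epoch = (Date) in.readObject();
                return number + " " + text + " " + Arrays.toString(values) + " " + epoch.getTime();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead()); // prints "42 hello [1, 2, 3] 0"
    }
}
```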

Example

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class FlattenTime {
    public static void main(String[] args) {
        // Default file name; can be overridden from the command line.
        String filename = "time.ser";
        if (args.length > 0) {
            filename = args[0];
        }
        PersistentTime time = new PersistentTime();
        // try-with-resources closes both streams even if writeObject fails.
        try (FileOutputStream fos = new FileOutputStream(filename);
             ObjectOutputStream out = new ObjectOutputStream(fos)) {
            out.writeObject(time);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}

import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.Calendar;

public class InflateTime {
    public static void main(String[] args) {
        // Default file name; can be overridden from the command line.
        String filename = "time.ser";
        if (args.length > 0) {
            filename = args[0];
        }
        PersistentTime time;
        // try-with-resources closes both streams even if readObject fails.
        try (FileInputStream fis = new FileInputStream(filename);
             ObjectInputStream in = new ObjectInputStream(fis)) {
            time = (PersistentTime) in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            ex.printStackTrace();
            return; // nothing was read, so there is nothing to print
        }
        System.out.println("Flattened time: " + time.getTime());
        System.out.println("Current time: " + Calendar.getInstance().getTime());
    }
}

Serializable vs. Non-Serializable objects
• java.lang.Object does not implement Serializable, so you must decide which of your classes need to implement it.
• AWT and Swing components, strings, and arrays are defined as serializable.
• Certain classes and their subclasses are not serializable: Thread, OutputStream, Socket.
• When a serializable class contains instance variables that are not, or should not be, serializable, they should be marked with the keyword transient.

Transient fields
• These fields will not be serialized.
• When deserialized, these fields will be initialized to default values:
  • null for object references
  • zero for numeric primitives
  • false for boolean fields
• If these values are unacceptable:
  • Provide a readObject() that invokes defaultReadObject() and then restores the transient fields to acceptable values.
  • Or initialize the fields when they are used for the first time (lazy initialization).
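
Both remedies can be sketched in one small class. Session, its fields and the roundTrip helper are hypothetical names made up for this illustration: log is restored inside readObject, while cache is created lazily on first use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class TransientDemo {
    /** Serializes obj to memory and deserializes it again. */
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session("ann"));
        System.out.println(copy.user() + " " + copy.log() + " " + copy.cache());
    }
}

// Hypothetical class, for illustration only.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String user;             // serialized by default
    private transient StringBuilder log;   // not serialized; restored in readObject
    private transient List<String> cache;  // not serialized; created lazily

    Session(String user) {
        this.user = user;
        this.log = new StringBuilder("created;");
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                // restores the non-transient field user
        log = new StringBuilder("restored;");  // would otherwise be null after deserialization
    }

    String user() { return user; }
    String log() { return log.toString(); }

    List<String> cache() {
        if (cache == null) {                   // lazy initialization on first use
            cache = new ArrayList<>();
        }
        return cache;
    }
}
```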

Serial version UID
• You should explicitly declare a serial version UID in every serializable class.
  • Eliminates the serial version UID as a potential source of incompatibility.
  • Small performance benefit, as Java does not have to compute this number at run time.
• private static final long serialVersionUID = rlv;
• rlv can be any long value picked out of thin air; it is not required to be unique across the serializable classes in your development.
• If you want to make a new version of the class incompatible with existing versions, choose a different UID. Deserialization of instances of the previous version will fail with InvalidClassException.
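
A short sketch of the declaration, assuming a hypothetical Account class. ObjectStreamClass.lookup is the standard way to ask the serialization runtime which UID it will use for a class; the value 42L is arbitrary.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class UidDemo {
    /** Returns the serial version UID the serialization runtime uses for a Serializable class. */
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Account.class)); // prints 42
    }
}

// Hypothetical class, for illustration only.
class Account implements Serializable {
    private static final long serialVersionUID = 42L; // arbitrary value chosen by hand
    private String owner;
}
```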

Customizing ObjectOutputStream, ObjectInputStream
• To provide special behavior in the writing or reading of stream object bytes, implement in your class:
  private void writeObject(ObjectOutputStream out) throws IOException;
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;

Creating your own protocol: Externalizable
• Instead of implementing the Serializable interface directly, implement Externalizable:
  public interface Externalizable extends Serializable {
      void writeExternal(ObjectOutput out) throws IOException;
      void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;
  }
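
A minimal sketch of an Externalizable class, assuming a hypothetical Point type and roundTrip helper. The class decides exactly which bytes are written, and it needs a public no-arg constructor because deserialization creates the instance first and then calls readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalDemo {
    /** Serializes obj to memory and deserializes it again. */
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point p = roundTrip(new Point(3, 4));
        System.out.println(p.x() + "," + p.y()); // prints 3,4
    }
}

// Hypothetical class, for illustration only.
class Point implements Externalizable {
    private int x;
    private int y;

    public Point() { }                    // public no-arg constructor required by Externalizable

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);                  // the class chooses the stream layout itself
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        x = in.readInt();                 // must mirror writeExternal exactly
        y = in.readInt();
    }

    int x() { return x; }
    int y() { return y; }
}
```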

Performance
• Serialization is a very expensive process. You should have clear reasons to serialize rather than directly writing only what you need to save about the state of an object.

Default or Customized serialization? Or: Implementing Serializable judiciously
• Allowing a class’s instances to be serializable can be as simple as adding the words “implements Serializable” to the class specification.
• It is a common misconception that this is all there is to it; the truth is far more complex.
• While efficiency is one cost associated with it, there are other long-term costs that are much more substantial.
• Using default serialization is very easy, but that ease is specious.

Serialization Costs
• Your object’s private structure is out for viewing! It becomes part of the exported API.
• A major cost is that it decreases the flexibility to change a class’s implementation once the class has been released.
• It increases the likelihood of bugs and security holes.
• It increases the testing burden associated with releasing a new version of the class.

Serialization caveats
• Implementing Serializable is not a decision to be undertaken lightly.
• Classes designed for inheritance should rarely implement Serializable, and interfaces should rarely extend it.
• You should provide a parameterless constructor on non-serializable classes designed for inheritance, in case such a class is subclassed and the subclass wants to provide serialization.
• Inner classes should rarely, if ever, implement Serializable.
• A static member class can be serializable.
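
The parameterless-constructor advice can be sketched as follows, assuming hypothetical classes Shape (not serializable) and Circle (a serializable subclass). During deserialization the state of the non-serializable superclass is not read from the stream; it is rebuilt by running that superclass's no-arg constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class InheritDemo {
    /** Serializes obj to memory and deserializes it again. */
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Circle copy = roundTrip(new Circle("wheel", 2.5));
        System.out.println(copy.label() + " " + copy.radius()); // prints "unnamed 2.5"
    }
}

// Hypothetical classes, for illustration only.
class Shape {                             // does not implement Serializable
    protected String label;

    Shape() {                             // run when a serializable subclass is deserialized
        this.label = "unnamed";
    }

    Shape(String label) {
        this.label = label;
    }

    String label() { return label; }
}

class Circle extends Shape implements Serializable {
    private static final long serialVersionUID = 1L;
    private final double radius;          // Circle's own state travels in the stream

    Circle(String label, double radius) {
        super(label);
        this.radius = radius;
    }

    double radius() { return radius; }
}
```

If the subclass needs the superclass state preserved, it has to write and restore that state itself in custom writeObject and readObject methods.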

Consider using a custom serialized form
• The default serialized form of an object is an encoding of the physical representation of the object graph rooted at the object:
  • Data contained in the object.
  • Data contained in every object reachable from it.
  • Topology by which all of these objects are interlinked.
• The ideal serialized form contains only the logical data represented by the object. It is independent of its physical representation.

Consider using a custom serialized form
• Default serialization is likely to be appropriate if an object’s physical representation is identical to its logical content.
  • Appropriate: a Name class.
  • Not appropriate: a doubly linked List class.

Consider using a custom serialized form
• Disadvantages of default serialization when physical and logical representation differ:
  • Permanently ties the exported API to the internal representation.
  • Can consume excessive space.
  • Can consume excessive time.
  • Can cause stack overflow.

Consider using a custom serialized form
• A reasonable serialized form for a List is the number of entries followed by each of the entries.
• Although the default serialized form is correct in the List case, it may not be for an object whose invariants are tied to implementation-specific details.
• Example: a hash table using buckets. Bucket placement is based on the hash code of the key, which may change from JVM to JVM, or from run to run on the same JVM. Thus the default serialized form can violate the hash table's invariants in this case.
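
The "number of entries followed by each of the entries" form can be sketched for a doubly linked list of strings. StringList, Entry and roundTrip are hypothetical names for this illustration: all physical fields are transient, so only the logical content reaches the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ListDemo {
    /** Serializes obj to memory and deserializes it again. */
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringList list = new StringList();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(roundTrip(list).toList()); // prints [a, b, c]
    }
}

// Hypothetical class, for illustration only.
final class StringList implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient int size;           // physical representation: never written by default
    private transient Entry head;
    private transient Entry tail;

    private static final class Entry {    // deliberately not Serializable
        final String data;
        Entry next;
        Entry previous;
        Entry(String data) { this.data = data; }
    }

    void add(String s) {
        Entry e = new Entry(s);
        if (head == null) {
            head = e;
        } else {
            tail.next = e;
            e.previous = tail;
        }
        tail = e;
        size++;
    }

    List<String> toList() {
        List<String> result = new ArrayList<>();
        for (Entry e = head; e != null; e = e.next) {
            result.add(e.data);
        }
        return result;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();          // nothing to write today, but keeps the form extensible
        out.writeInt(size);                // logical form: the count ...
        for (Entry e = head; e != null; e = e.next) {
            out.writeObject(e.data);       // ... followed by each entry
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            add((String) in.readObject()); // rebuild the links instead of reading them
        }
    }
}
```

Because the links are rebuilt by iteration, this form also avoids the deep recursion that default serialization of a long linked structure can cause.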

readObject() and security attacks
• Deserialization uses defaultReadObject() and readObject() to create a new instance of a class.
• Thus readObject is effectively a constructor!
• So readObject must behave like any other constructor:
  • Check the validity of its arguments where needed.
  • Make defensive copies of parameters where needed.
• Otherwise, it is a very simple job for an attacker to violate the object’s invariants:
  • The attacker provides a hand-made serialized form of the attack object.

Guide for writing a bulletproof readObject
• Private reference fields should be initialized with defensive copies of the objects they refer to.
• Check invariants and throw an InvalidObjectException if they fail.
• As with constructors, do not invoke any overridable methods.
• If an entire object graph must be checked for validity after deserialization, the ObjectInputValidation interface should be used.
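
A sketch of these guidelines on a hypothetical Period class with the invariant "start is not after end". The helper corruptStreamIsRejected is also hypothetical: it stands in for an attacker's hand-made stream by breaking the invariant through reflection before writing the object, which is an assumption of this illustration rather than a real attack.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.Date;

class ReadObjectDemo {
    /** Returns true if deserializing a Period whose invariant is broken is refused. */
    static boolean corruptStreamIsRejected() {
        try {
            Period p = new Period(new Date(1_000L), new Date(2_000L));
            Field end = Period.class.getDeclaredField("end");
            end.setAccessible(true);
            end.set(p, new Date(0L));      // now end is before start: invariant broken
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                in.readObject();
                return false;              // the invalid object slipped through
            } catch (InvalidObjectException expected) {
                return true;               // readObject refused the invalid state
            }
        } catch (ReflectiveOperationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(corruptStreamIsRejected());
    }
}

// Hypothetical class, for illustration only.
final class Period implements Serializable {
    private static final long serialVersionUID = 1L;
    private Date start;                    // not final, so readObject can swap in copies
    private Date end;

    Period(Date start, Date end) {
        this.start = new Date(start.getTime());   // defensive copies
        this.end = new Date(end.getTime());
        if (this.start.after(this.end)) {
            throw new IllegalArgumentException("start after end");
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        start = new Date(start.getTime()); // defensive copies, as in the constructor
        end = new Date(end.getTime());
        if (start.after(end)) {            // same invariant check as the constructor
            throw new InvalidObjectException("start after end");
        }
    }

    Date start() { return new Date(start.getTime()); }
    Date end() { return new Date(end.getTime()); }
}
```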

writeReplace()
• Sometimes it may not be appropriate to serialize the actual object, but rather some specifically chosen replacement object.
  <access> Object writeReplace() throws ObjectStreamException;
• Returns an object that will replace the current object during serialization. Any object may be returned, including the current one.
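
A small sketch of writeReplace, assuming a hypothetical NameRegistry class that chooses to be stored as a plain ArrayList snapshot of its names. What comes back from the stream is therefore the replacement, not a NameRegistry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ReplaceDemo {
    /** Serializes obj to memory and returns whatever object deserialization produces. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NameRegistry registry = new NameRegistry();
        registry.register("ann");
        Object copy = roundTrip(registry);
        System.out.println(copy.getClass().getName() + " " + copy);
    }
}

// Hypothetical class, for illustration only.
final class NameRegistry implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> names = new ArrayList<>();

    void register(String name) {
        names.add(name);
    }

    // Called by the serialization runtime; the returned object is written in place of this one.
    private Object writeReplace() throws ObjectStreamException {
        return new ArrayList<>(names);
    }
}
```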

A comment about access qualifier
• These methods can have any accessibility.
• They will be used if they are accessible to the type of the object being serialized.
• A private readResolve affects only objects that are exactly of the declaring class's type.
• A package-accessible readResolve also affects subclasses, but only those within the same package.
• public and protected readResolve affect objects of all subclasses.

readResolve()
• Recall that deserialization produces a new instance of a class.
• If a given class should have only one instance (singleton pattern), then deserialization can produce a different instance!
• In general, you need to be concerned about what is being created for instance-controlled classes.
• Enter readResolve(): a method that returns the appropriate instance of the class in place of the one created by the readObject() or defaultReadObject() methods.
  <access> Object readResolve() throws ObjectStreamException;
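
A minimal sketch of readResolve protecting a singleton, assuming a hypothetical Config class. Without the readResolve method, the round trip below would hand back a second Config instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class SingletonDemo {
    /** Serializes obj to memory and returns whatever object deserialization produces. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Config.INSTANCE) == Config.INSTANCE); // prints true
    }
}

// Hypothetical class, for illustration only.
final class Config implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Config INSTANCE = new Config();

    private Config() { }

    // Called after deserialization; the freshly created copy is discarded in favor of INSTANCE.
    private Object readResolve() throws ObjectStreamException {
        return INSTANCE;
    }
}
```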