History, Architecture, and Implementation of the CLR Serialization and Formatter Classes
Peter de Jong
April 24, 2003
History
• J++ DCOM: 1997
• J++ SOAP: 1998
• CLR .NET Remoting: Spring 1999
• CLR Serialization Classes: Spring 1999
• CLR SoapFormatter: Spring 1999
• CLR BinaryFormatter: December 1999
• CLR V1: January 2002
Original SOAP Spec (Bob Atkinson), 1997
• Protocol
  • HTTP
  • Bi-directional: "Give me a call", a server callback using the response from a hanging HTTP request
• XML
  • No namespaces, no XSD
• RPC
  • SOAP Header root for SOAP Headers and parameter graph
  • No Envelope
• J++
  • Proxy/Stub for serialization/deserialization of interface parameters
[Diagram labels: J++ SOAP HTTP Server, Client, SOAP Root, Parameters, SOAP Headers]
CLR SOAP
• SOAP 0.9 spec
  • Section 5 specifies how to map objects
  • Namespaces, no XSD
  • SOAP Envelope
• RPC: rooted Headers and Parameters
• Serialization: root of object graph
• Most annoying part
  • Headers are really an array of objects
  • For XML beauty, they are specified as XML field elements
  • Led to the specification of the root attribute
SOAP Moving Target
• Original SOAP
• SOAP 0.9
• SOAP as a cottage industry
  • Easy to produce a subset of SOAP
  • Microsoft had 5 or so implementations
  • Individuals and companies set up SOAP web sites
• SOAP Interop Meeting (IBM, 2000-2001)
  • SOAP application benchmarks
  • Led to web sites which implemented the applications
  • ~15 sites to test interoperability
• SOAP 1.0
  • Standards effort which included many of the SOAP producers
  • Envelope, body; no header or parameter root
  • Moved Section 5 to an appendix
• SOAP 1.1
  • Nest top-level object
Architecture
[Diagram: BinaryFormatter (Serializer / Parser, Binary Stream) and SoapFormatter (Serializer / Parser, SOAP XML Stream), each with an Object Reader / Object Writer, layered on the Serialization Classes]
Serialization Classes
• Designed to make it easy to produce Formatters
  • True for a subset of the CLR
  • False for the complete CLR object model
• SoapFormatter and BinaryFormatter are the only serialization/deserialization engines that support the complete CLR object model
Serialization Classes Services
• System-controlled serialization (Serializable, NonSerialized)
• User-controlled serialization (ISerializable)
• Type substitution (ISerializationSurrogate, ISurrogateSelector)
• Object substitution (IObjectReference)
• Object sharing
• Fixups
System Controlled Serialization
• Serialization
  • Serializable custom attribute
  • NonSerialized custom attribute
  • public, internal, and private fields are serialized
• Deserialization
  • Creates an uninitialized object
  • Populates the fields
  • Constructor is not called
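
The deserialization steps above can be modelled in a few lines. This is a minimal Python sketch of the idea, not the CLR API: the names `Account`, `not_serialized`, `serialize`, and `deserialize` are hypothetical, and the set of skipped field names only loosely plays the role of the NonSerialized attribute.

```python
class Account:
    # Hypothetical marker: field names listed here are skipped on output.
    not_serialized = {"cache"}
    constructor_calls = 0

    def __init__(self, owner, balance):
        Account.constructor_calls += 1
        self.owner = owner
        self._balance = balance       # non-public fields are captured too
        self.cache = {"hits": 0}      # transient state, not written out


def serialize(obj):
    """Capture every instance field except the ones marked as skipped."""
    skip = getattr(type(obj), "not_serialized", set())
    return {name: value for name, value in vars(obj).items() if name not in skip}


def deserialize(cls, fields):
    """Create an uninitialized instance and populate its fields directly."""
    obj = cls.__new__(cls)            # __init__ is deliberately not run
    obj.__dict__.update(fields)
    return obj


original = Account("ada", 100)
copy = deserialize(Account, serialize(original))
```

Because the constructor is skipped, any invariant it would normally establish (here, the `cache` field) is simply absent on the copy until something restores it.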
User Controlled Serialization
• Type implements ISerializable
• Serialization: GetObjectData gives name/value pairs to the serializer
• Deserialization: a constructor is used to retrieve the name/value pairs and populate the object
  • The constructor is not in the interface, so the compiler can't check whether it is present
  • Constructors aren't inherited, so each subclass needs its own constructor
  • An earlier version used SetObjectData instead of a constructor
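
A small Python model of the name/value-pair contract may help here. It is a sketch under assumptions, not the ISerializable API: `Money`, `get_object_data`, `from_info`, and the plain `dict` standing in for the name/value store are all hypothetical. The runtime check in `deserialize` mirrors the point that the compiler cannot verify the special constructor exists.

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def get_object_data(self, info):
        # The object decides which name/value pairs represent it.
        info["units"] = self.cents // 100
        info["fraction"] = self.cents % 100

    @classmethod
    def from_info(cls, info):
        # Plays the role of the deserialization constructor.
        return cls(info["units"] * 100 + info["fraction"])


def serialize(obj):
    info = {}
    obj.get_object_data(info)
    return {"type": type(obj).__name__, "info": info}


def deserialize(record, known_types):
    cls = known_types[record["type"]]
    factory = getattr(cls, "from_info", None)
    if factory is None:
        # Only discoverable at run time, as with the missing constructor.
        raise TypeError(f"{cls.__name__} has no deserialization entry point")
    return factory(record["info"])


record = serialize(Money(1234))
restored = deserialize(record, {"Money": Money})
```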
Surrogates
• Type substitution
• Objects of a specified type are replaced by a new object of a different type
[Diagram labels: MarshalByRefObject, Proxy, ObjRef]
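
Type substitution can be pictured as a lookup from type to replacement function. The sketch below is a hypothetical Python model, not ISurrogateSelector itself: `LiveService` stands in for an object that should stay in place, and `ServiceRef` for the small reference object that is written to the stream instead.

```python
class LiveService:
    """Hypothetical object that must not be copied by value."""

    def __init__(self, address):
        self.address = address
        self.open_handles = ["socket"]   # state that should never travel


class ServiceRef:
    """Hypothetical lightweight reference written in its place."""

    def __init__(self, address):
        self.address = address


def to_ref(service):
    return ServiceRef(service.address)


# The selector maps a type to the surrogate that handles it.
surrogates = {LiveService: to_ref}


def substitute(obj, selector):
    """Return the surrogate's replacement, or the object itself if none applies."""
    surrogate = selector.get(type(obj))
    return surrogate(obj) if surrogate else obj


written = substitute(LiveService("tcp://host:9000"), surrogates)
unchanged = substitute("plain string", surrogates)
```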
Object Substitution
• IObjectReference
• GetRealObject method returns the deserialized object
• When the object is returned, it and its descendants are completely deserialized
• Used extensively for returning singleton system objects
  • Types, Delegates
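
The singleton case can be sketched as follows. This is a Python model of the pattern only: `Registry`, `RegistryRef`, `get_real_object`, and `resolve` are hypothetical names, and the real interface is IObjectReference with its GetRealObject method.

```python
class Registry:
    """A process-wide singleton that must never be duplicated."""

    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


class RegistryRef:
    """Placeholder that travels in the stream instead of the singleton."""

    def get_real_object(self):
        return Registry.instance()


def resolve(obj):
    """After deserialization, swap a placeholder for the object it names."""
    getter = getattr(obj, "get_real_object", None)
    return getter() if getter else obj


first = resolve(RegistryRef())
second = resolve(RegistryRef())
```

However many placeholders arrive, every one resolves to the same local instance, which is the property singleton system objects need.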
Object Fixup
• Reference before object
• Serialization swizzles objref to integer
Object Fixup Complications
• Value classes must be fixed up before they are boxed
• Object graphs directly referenced by an ISerializable object must be deserialized one level
• An IObjectReference object graph must be completely deserialized
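
The basic fixup mechanism, where a reference arrives in the stream before the object it points to, can be sketched like this. It is a simplified Python model that ignores the complications listed above; the record layout and the `("ref", n)` marker are assumptions made for the example.

```python
def deserialize(records):
    """Rebuild objects from (object_id, fields) records.

    A field value of ("ref", n) is a swizzled reference to object n, which
    may not have been read yet.
    """
    objects = {}
    fixups = []   # (holder_id, field_name, target_id) to patch later

    for obj_id, fields in records:
        obj = {}
        objects[obj_id] = obj
        for name, value in fields.items():
            if isinstance(value, tuple) and value[0] == "ref":
                target = value[1]
                if target in objects:
                    obj[name] = objects[target]             # already available
                else:
                    fixups.append((obj_id, name, target))   # resolve later
            else:
                obj[name] = value

    # Every object now exists, so the recorded holes can be filled in.
    for holder, name, target in fixups:
        objects[holder][name] = objects[target]
    return objects


records = [
    (1, {"name": "head", "next": ("ref", 2)}),   # forward reference
    (2, {"name": "tail", "next": ("ref", 1)}),   # backward reference
]
graph = deserialize(records)
```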
IDeserializationCallback
• Used to signal that deserialization is complete
• E.g. Hashtable can't create hashes until all the objects are deserialized
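
The Hashtable example can be modelled as follows. This is a hypothetical Python sketch, not the IDeserializationCallback interface: `Key`, `Table`, and `on_deserialization` are illustrative names. The key's field is populated after the table has already received it, so hashing has to wait for the completion callback.

```python
class Key:
    def __init__(self, text):
        self.text = text

    def __hash__(self):
        return hash(self.text)

    def __eq__(self, other):
        return isinstance(other, Key) and self.text == other.text


class Table:
    def __init__(self):
        self.pairs = []      # raw key/value pairs as read from the stream
        self.lookup = None   # built only once the whole graph is complete

    def on_deserialization(self):
        # Safe to hash now: every key has all of its fields.
        self.lookup = {key: value for key, value in self.pairs}


def deserialize():
    key = Key.__new__(Key)            # exists, but its field is not set yet
    table = Table()
    table.pairs.append((key, "value"))
    key.text = "alpha"                # the field arrives later, via a fixup
    for obj in [table]:               # objects that asked for the callback
        obj.on_deserialization()
    return table


table = deserialize()
```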
IFormatter - Object Graph
• Serialize(Stream s, Object graph)
• Object Deserialize(Stream s)
• Properties
  • ISurrogateSelector
  • SerializationBinder (type substitution when deserializing)
  • StreamingContext
    • CrossProcess
    • CrossMachine
    • File
    • Persistence
    • Remoting
    • Other
    • Clone
    • CrossAppDomain
    • All
IRemotingFormatter - RPC
• Serialize(Stream s, Object graph, Header[] headers)
  • Two serializations
    • Graph (parameter array)
    • Headers (Header array)
• Object Deserialize(Stream s, HeaderHandler handler)
  • Delegate: Object HeaderHandler(Header[] headers)
  • Headers are handed to the delegate; the delegate returns the object into which the parameters are deserialized
Formatter Property Enums
• FormatterTypeStyle
  • TypesWhenNeeded: types are output for
    • Arrays of Objects
    • Object fields, inheritable fields
    • ISerializable
  • TypesAlways
    • Version compatibility
    • MemberInfo -> ISerializable
• FormatterAssemblyStyle
  • Simple: no version information
  • Full: full assembly name
• Defaults
  • Remoting: serialization Full, deserialization Simple
  • Non-remoting: serialization Full, deserialization Full
SoapFormatter Additional Properties
• ISoapMessage: alternate way of specifying Parameter/Header serialization
  • ParamNames
  • ParamValues
  • ParamTypes
  • MethodName
  • XmlNameSpace
  • Header[] headers
BinaryFormatter
• Binary stream format design
  • Primitive types are written directly
  • Array of primitives: bytes are copied directly from the CLR (100x faster than using reflection)
  • All other types are written as records
• Basic record types
  • SerializedStreamHeader, Object, ObjectWithMap, ObjectWithMapAssemId, ObjectWithMapTyped, ObjectWithMapTypedAssemId, ObjectString, Array, MemberPrimitiveTyped, MemberReference, ObjectNull, MessageEnd, Assembly
• Record types added later for performance
  • ObjectNullMultiple256, ObjectNullMultiple, ArraySinglePrimitive, ArraySingleObject, ArraySingleString, CrossAppDomainMap, CrossAppDomainString, CrossAppDomainAssembly, MethodCall, MethodReturn
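
The difference between copying a primitive array as one block of bytes and writing it element by element can be illustrated in Python. This is an analogy using the standard `array` and `struct` modules, not the BinaryFormatter implementation, and it makes no claim about the size of the speedup.

```python
import array
import struct

values = array.array("i", range(1000))   # contiguous native ints

# Bulk path: one copy of the underlying buffer.
bulk = values.tobytes()

# Element path: one conversion call per value, in the same native layout.
per_element = b"".join(struct.pack("i", v) for v in values)

# Reading back is likewise a single block copy.
restored = array.array("i")
restored.frombytes(bulk)
```

Both paths produce identical bytes; the bulk path simply avoids per-element work, which is the design choice the slide describes.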
Serialization
[Diagram: object graph with nodes numbered 1 to 10]
Serialization Complications
• MethodCall/MethodReturn
• CrossAppDomain
• Determining when type information is needed
• Value classes are nested; non-value classes are top level
• Arrays: mix of jagged and multi-dimensional, e.g. [][,,][]
• Arrays of primitives are copied to the stream as a collection of bytes
• Surrogates
• ISerializable
Deserialization
[Diagram: object graph with nodes numbered 1 to 10]
Fixups
• Process 1, fixups 2, 3, 4
• Process 2, fixups 5, 6
• Process 3, fixups 7
• Process 4, fixups 8, 9
Deserialization Binary
• Parsing
  • Record headers specify what is coming next in the stream
  • Primitives do not have headers, so previously encountered record headers are used as a map for reading primitives
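
The parsing rule above can be sketched with a toy format. The record codes, type codes, and byte layout below are invented for illustration and are not the actual BinaryFormatter stream format: the first record for a type carries a member map, later records name only the map, and the raw primitive values carry no headers of their own.

```python
import io
import struct

OBJECT_WITH_MAP = 1   # hypothetical: record carries its member type map
OBJECT = 2            # hypothetical: record reuses a map seen earlier
FORMATS = {0: "<i", 1: "<d"}   # hypothetical type codes: int32, float64


def write_values(out, type_codes, values):
    for code, value in zip(type_codes, values):
        out.write(struct.pack(FORMATS[code], value))   # raw, no per-value header


def write_with_map(out, map_id, type_codes, values):
    out.write(bytes([OBJECT_WITH_MAP, map_id, len(type_codes)]))
    out.write(bytes(type_codes))
    write_values(out, type_codes, values)


def write_reusing_map(out, map_id, type_codes, values):
    out.write(bytes([OBJECT, map_id]))
    write_values(out, type_codes, values)


def read_all(data):
    stream = io.BytesIO(data)
    maps, objects = {}, []
    while True:
        header = stream.read(1)
        if not header:
            break
        map_id = stream.read(1)[0]
        if header[0] == OBJECT_WITH_MAP:
            count = stream.read(1)[0]
            maps[map_id] = list(stream.read(count))   # remember the map
        values = []
        for code in maps[map_id]:                     # the map drives the reads
            fmt = FORMATS[code]
            (value,) = struct.unpack(fmt, stream.read(struct.calcsize(fmt)))
            values.append(value)
        objects.append(values)
    return objects


out = io.BytesIO()
write_with_map(out, 7, [0, 1], [3, 0.5])
write_reusing_map(out, 7, [0, 1], [4, 1.5])
decoded = read_all(out.getvalue())
```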
Deserialization Complications
• Remoting
  • MethodCall/MethodReturn optimization
  • CrossAppDomain
• Value type
• ISerializable
• Surrogate
What Went Wrong - 1
• Beta 1 gave the GC a workout
  • Object-oriented style is dangerous for plumbing: lots of objects are created
• Solution
  • Use object singletons (or a fixed number of objects)
  • Object pools
  • Start with larger storage for growing objects such as ArrayLists
  • Special cases: for primitive parameters the serialization classes aren't used, so they aren't initialized
What Went Wrong - 2
• Performance is never good enough
• Reflection is slow
  • Boxes value types
  • Interpretive
• Serialization classes are slow
  • Boxes value types
  • Keeps lots of state around in resizable arrays
What Went Wrong - 3
• Formatters are slow
  • Object type and field information inflates the size of the stream (reflection and versioning requirement)
  • Lots of irregular cases
    • CLR: value types, singletons, transformations
    • Serialization: ISerializable, resolving graph rules
  • Code is more general than it has to be
    • Now we know, but during development the underlying system kept changing
      • CLR object model (variants, reflection, security, BCL, etc.)
      • Serialization model (ISerializable underwent many changes)
      • SOAP spec kept changing
      • Binary format changed for perf reasons
  • Fixups are used too much: strings and value classes are put in the stream when encountered, whereas object references are put in the stream with the object coming later
    • SOAP 1.2 nests referenced objects
    • BinaryFormatter should be changed to nest objects
What Went Wrong - 4
• Why didn't we use Reflection.Emit?
  • 1200 serializations to make up the cost
  • Couldn't serialize private and internal fields
• BinaryFormatter primitive arrays use array copy rather than reflection
  • 100x faster when the switch was made
• Cross-AppDomain smuggling
  • Primitives and strings bypass the BinaryFormatter, which results in faster times than COM cross-process
• BinaryFormatter prototyped an option to omit type information in the stream
  • 4-byte point class serialized in 10 bytes instead of 125 bytes
• Future versions of the Formatters will be much faster
  • Improvements to Reflection.Emit
  • Cross-AppDomain serialization prototype implemented in the EE
What Went Wrong - 5
• Web Services
  • The BinaryFormatter and SoapFormatter existed before the Web Service classes
  • Serialization, Formatter, and Remoting classes are based on object-oriented programming, RPC, and COM models
  • Web Services started to gain importance late in the development of the .NET Framework
  • Future releases will combine the two models, using the same custom attributes and underlying messaging model
• SoapFormatter
  • Specify shape of stream to some extent
  • Object WSDL: added additional schema information to WSDL to allow generation of the CLR object model in client proxies
  • Object WSDL is the only way in .NET Framework V1 to copy CLR metadata without copying the DLL, which includes code
The Formatters are Great (at least useful)
• Only way to make a deep copy of an object graph with complete fidelity
• Integrated with .NET Remoting
• Combines the CLR object model with the Web Services model
• Version resilient (at least the attempt is made)
• Secure
• Perf isn't all that bad
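
The deep-copy point can be illustrated by round-tripping a graph through a serializer. The sketch uses Python's standard `pickle` module as a stand-in for the formatters, which is an analogy rather than the CLR mechanism; it shows that shared references and cycles survive the round trip.

```python
import io
import pickle

shared = ["x"]
graph = {"a": shared, "b": shared}   # two fields share one object
graph["self"] = graph                # and the graph contains a cycle

stream = io.BytesIO()
pickle.dump(graph, stream)           # serialize the whole graph
stream.seek(0)
clone = pickle.load(stream)          # deserialize into fresh objects
```

The clone shares nothing with the original, yet its internal structure is the same: `clone["a"]` and `clone["b"]` are still one object, and `clone["self"]` still points back at the clone.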