A Short Description of Serialization
Definition of Serialization
Serialization is the process of taking an object and converting it to a format in which it can be transported across a network or persisted to a storage location, such as a file or a database. The serialized format contains the object’s state information. Deserialization is the reverse process: it uses the serialized state information to reconstruct the object in its original state. In short, serialization allows an object to be shipped across the network for remoting, or persisted in a storage location such as the ASP.NET cache, and then reconstructed for use at a later time.
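To make the definition concrete, here is a minimal C sketch of a round trip. The `employee` struct and both function names are hypothetical, not part of any .NET API: serialization copies the object's state field by field into a flat byte buffer, and deserialization rebuilds an equivalent object from those bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical object whose state we want to persist or transmit. */
struct employee {
    uint32_t id;
    char name[32];
};

/* Serialize: copy each field's state into a flat byte buffer and return
 * the number of bytes written. The integer is copied in host byte order,
 * so the bytes are only portable between machines that represent
 * integers the same way. */
size_t employee_serialize(const struct employee *e, unsigned char *buf, size_t cap)
{
    size_t n = sizeof e->id + sizeof e->name;
    assert(cap >= n);                      /* caller must supply enough room */
    memcpy(buf, &e->id, sizeof e->id);
    memcpy(buf + sizeof e->id, e->name, sizeof e->name);
    return n;
}

/* Deserialize: rebuild an object from the serialized state. */
void employee_deserialize(struct employee *e, const unsigned char *buf)
{
    memcpy(&e->id, buf, sizeof e->id);
    memcpy(e->name, buf + sizeof e->id, sizeof e->name);
}
```

Copying field by field, rather than copying the whole struct at once, keeps compiler-inserted padding bytes out of the serialized form.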
The Microsoft .NET Framework provides three formats into which objects can be serialized: binary, SOAP, and XML. The format is determined by the class used to perform the serialization. The XML format is produced by the System.Xml.Serialization.XmlSerializer class. The SOAP and binary formats are produced by the SoapFormatter and BinaryFormatter classes under the System.Runtime.Serialization.Formatters namespace.
There are subtle differences among the serialized formats. The binary format is the most compact of the three. XmlSerializer serializes only public fields and properties, whereas the binary and SOAP formatters are not subject to that limitation.
The .NET Framework features two serialization technologies:
· Binary serialization preserves type fidelity, which is useful for preserving the state of an object between different invocations of an application. For example, you can share an object between different applications by serializing it to the Clipboard. You can serialize an object to a stream, to a disk, to memory, over the network, and so forth. Remoting uses serialization to pass objects “by value” from one computer or application domain to another.
· XML serialization serializes only public properties and fields and does not preserve type fidelity. This is useful when you want to provide or consume data without restricting the application that uses the data. Because XML is an open standard, it is an attractive choice for sharing data across the Web. SOAP is likewise an open standard, which makes it attractive for the same reason.
Serialization has a number of advantages. It provides:
- a method of persisting objects that is more convenient than writing their properties to a text file on disk and reassembling them by reading the file back in.
- a method of issuing remote procedure calls, as in SOAP.
- a method for distributing objects, especially in software componentry such as COM, CORBA, etc.
- a method for detecting changes in time-varying data.
For some of these features to be useful, architecture independence must be maintained. For example, to get the most out of distribution, a computer with a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of endianness. This means that the simpler and faster approach of directly copying the memory layout of a data structure cannot work reliably across all architectures. Serializing the data structure in an architecture-independent format avoids the problems of byte ordering, memory layout, and the different ways that programming languages represent data structures.
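One common way to obtain an architecture-independent encoding is to fix the byte order in the format itself. The C sketch below (function names are illustrative) writes a 32-bit integer most-significant byte first using shifts, so a little-endian and a big-endian host produce identical bytes, and the reader reassembles the value the same way on either kind of machine.

```c
#include <stdint.h>

/* Write a 32-bit value in big-endian ("network") byte order. Using shifts
 * instead of copying memory makes the output independent of the host's
 * own endianness. */
void put_u32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Read the value back, again without depending on host byte order. */
uint32_t get_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```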
Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications this linearity is an asset, because it enables simple, common I/O interfaces to be used to hold and pass on the state of an object. In applications where performance is critical, it can make sense to expend more effort on a more complex, non-linear storage organization.
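The cost of this linearity shows up even in a very simple format. In the hypothetical layout below, each record is one length byte followed by that many payload bytes; because records vary in length, the only way to locate record k is to walk every record before it from the start of the stream.

```c
#include <stddef.h>

/* Return a pointer to the payload of record k and store its length in
 * *len, or return NULL if the stream ends (or is truncated) first.
 * Record layout (an assumption for this sketch): 1 length byte + payload. */
const unsigned char *find_record(const unsigned char *buf, size_t size,
                                 size_t k, size_t *len)
{
    size_t pos = 0;
    for (size_t i = 0; pos < size; i++) {
        size_t n = buf[pos];
        if (pos + 1 + n > size)
            return NULL;            /* truncated stream */
        if (i == k) {
            *len = n;
            return buf + pos + 1;
        }
        pos += 1 + n;               /* skip over record i to reach i + 1 */
    }
    return NULL;
}
```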
Even on a single machine, primitive pointer objects are too fragile to save, because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called unswizzling or pointer unswizzling and the deserialization process includes a step called pointer swizzling.
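Unswizzling and swizzling can be sketched in C for a linked list held in an array. The struct and function names are hypothetical; the essential step is replacing each `next` pointer with the index of its target before saving, then turning those indices back into pointers after loading, wherever the array now lives in memory.

```c
#include <stddef.h>

/* In-memory node: `next` is a raw pointer, meaningless once saved. */
struct node {
    int value;
    struct node *next;
};

/* Stored form: the pointer is replaced by an array index (-1 for NULL). */
struct stored_node {
    int value;
    long next_index;
};

/* Unswizzling: convert pointers into position-independent indices.
 * Assumes every non-NULL `next` points into the same `nodes` array. */
void unswizzle(const struct node *nodes, size_t n, struct stored_node *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i].value = nodes[i].value;
        out[i].next_index = nodes[i].next ? (long)(nodes[i].next - nodes) : -1;
    }
}

/* Swizzling: after loading into a (possibly different) array, turn the
 * indices back into pointers that are valid at the new location. */
void swizzle(const struct stored_node *in, size_t n, struct node *nodes)
{
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = in[i].value;
        nodes[i].next = in[i].next_index >= 0 ? &nodes[in[i].next_index] : NULL;
    }
}
```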
Since both serializing and deserializing can be driven from common code (for example, the Serialize function in the Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy, since differences can be detected “on the fly”. This is one way to understand the technique called Differential Execution. It is useful in programming user interfaces whose contents vary over time: graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.
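A much-simplified sketch of the idea in C: a single `emit` routine both serializes each value of the current pass and compares it, on the fly, with the value at the same position from the previous pass. The names, the fixed capacity, and the flat `int` stream are assumptions for illustration; a real user-interface implementation would create, alter, or remove graphical objects at the points where this sketch only counts changes.

```c
#include <assert.h>
#include <string.h>

#define MAX_VALUES 64

/* `prev` holds what the previous pass serialized; `cur` receives this pass. */
struct diff_ctx {
    int prev[MAX_VALUES];
    size_t prev_len;
    int cur[MAX_VALUES];
    size_t cur_len;
    int changes;                   /* differences detected during this pass */
};

/* Start a pass: the last pass's output becomes the reference copy. */
static void begin_pass(struct diff_ctx *c)
{
    memcpy(c->prev, c->cur, c->cur_len * sizeof c->cur[0]);
    c->prev_len = c->cur_len;
    c->cur_len = 0;
    c->changes = 0;
}

/* Serialize one value while comparing it with the previous pass. */
static void emit(struct diff_ctx *c, int value)
{
    size_t i = c->cur_len;
    assert(i < MAX_VALUES);
    if (i >= c->prev_len || c->prev[i] != value)
        c->changes++;              /* new or altered item */
    c->cur[c->cur_len++] = value;
}

/* Finish a pass: items emitted last time but not this time were removed. */
static void end_pass(struct diff_ctx *c)
{
    if (c->prev_len > c->cur_len)
        c->changes += (int)(c->prev_len - c->cur_len);
}
```

No separate diffing code is needed: the same `emit` calls that produce the new serialized state also detect what changed since the last pass.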
Serialization, however, breaks the opacity of an abstract data type by potentially exposing private implementation details. To discourage competitors from making compatible products, publishers of proprietary software often keep the details of their programs’ serialization formats a trade secret. Some deliberately obfuscate or even encrypt the serialized data. This process is often termed “instantated oatmealization” by the open source community, a pun on the homophony of serialization and cereal. 
Yet interoperability requires that applications be able to understand each other’s serialization formats. Therefore, remote method call architectures such as CORBA define their serialization formats in detail, and often provide methods of checking the consistency of any serialized stream when converting it back into an object.
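A consistency check of this kind can be sketched in C with a small header in front of the payload: a magic number, the payload length, and a checksum, all verified before the payload is handed to the code that rebuilds the object. The layout, the magic value, and the use of 32-bit FNV-1a as the checksum are assumptions for illustration, not taken from CORBA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAGIC 0x53455231u          /* "SER1": hypothetical format tag */
#define HEADER_SIZE 12             /* magic + length + checksum */

/* 32-bit FNV-1a hash, used here as a simple checksum. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static void put32(unsigned char *p, uint32_t v)   /* big-endian */
{
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

static uint32_t get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Frame a payload; `out` must hold HEADER_SIZE + n bytes. */
size_t frame(const unsigned char *payload, uint32_t n, unsigned char *out)
{
    put32(out, MAGIC);
    put32(out + 4, n);
    put32(out + 8, fnv1a(payload, n));
    memcpy(out + HEADER_SIZE, payload, n);
    return HEADER_SIZE + (size_t)n;
}

/* Verify a framed stream before deserializing it. Returns the payload
 * length, or -1 if the stream is inconsistent. */
long check_frame(const unsigned char *buf, size_t size)
{
    if (size < HEADER_SIZE || get32(buf) != MAGIC)
        return -1;
    uint32_t n = get32(buf + 4);
    if (n > size - HEADER_SIZE)
        return -1;                                 /* truncated */
    if (get32(buf + 8) != fnv1a(buf + HEADER_SIZE, n))
        return -1;                                 /* corrupted */
    return (long)n;
}
```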
In the late 1990s, a push to provide an alternative to the standard serialization protocols started: the XML markup language was used to produce a human-readable, text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. Its disadvantage is the loss of the more compact, byte-stream-based encoding, which is generally more practical. A future solution to this dilemma could be transparent compression schemes (see binary XML).
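The size trade-off can be seen by encoding one hypothetical record both ways in C: as a small XML document and as two fixed-width big-endian integers. The record type and the element names are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record with two 32-bit fields. */
struct reading {
    uint32_t station;
    int32_t temperature;   /* tenths of a degree */
};

/* Human-readable XML encoding; returns the text length (excluding the
 * terminating NUL), or a negative value on error. */
int to_xml(const struct reading *r, char *out, size_t cap)
{
    return snprintf(out, cap,
        "<reading><station>%lu</station><temperature>%ld</temperature></reading>",
        (unsigned long)r->station, (long)r->temperature);
}

/* Compact binary encoding: each field as 4 big-endian bytes. */
size_t to_binary(const struct reading *r, unsigned char *out)
{
    uint32_t f[2] = { r->station, (uint32_t)r->temperature };
    for (int i = 0; i < 2; i++)
        for (int b = 0; b < 4; b++)
            out[4 * i + b] = (unsigned char)(f[i] >> (24 - 8 * b));
    return 8;
}
```

For the record { 1234, -57 }, the XML text is 72 characters long, while the binary form is 8 bytes; the text, however, can be read by a person and parsed by any XML-aware language.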
For large volume scientific datasets, such as satellite data and output of numerical climate, weather, or ocean models, specific binary serialization standards have been developed, e.g. HDF, netCDF and the older GRIB.
There are also libraries available that add serialization support to languages that lack native support for it.
In the .NET languages, classes can be serialized and deserialized by adding the Serializable attribute to the class.
<Serializable()> Class Employee
End Class
' VB.NET example

[Serializable]
class Employee { }
// C# example
If new members are added to a serializable class, they can be tagged with the OptionalField attribute so that data serialized by previous versions of the class can still be deserialized without error. This attribute affects only deserialization: it prevents the runtime from throwing an exception if a member is missing from the serialized stream. A member can also be marked with the NonSerialized attribute to indicate that it should not be serialized. This keeps the member's value out of the serialized stream, so that sensitive details can be kept secret.
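The versioning behaviour that OptionalField enables can be sketched outside .NET as well. In this hypothetical C format, a leading version byte tells the reader which fields are present: a stream written by version 1 lacks `dept`, so the reader substitutes a default rather than failing, and the `cache` member is never written at all. This is an analogy to the two attributes, not how the .NET runtime implements them.

```c
#include <stddef.h>
#include <stdint.h>

/* Version 1 of the format stored only `id`; version 2 added `dept`.
 * `cache` is never serialized (compare a NonSerialized member). */
struct employee {
    uint16_t id;
    uint16_t dept;     /* added in version 2: an "optional" field */
    int cache;         /* not serialized */
};

/* Write the current (version 2) format: version byte, then fields. */
size_t write_v2(const struct employee *e, unsigned char *out)
{
    out[0] = 2;
    out[1] = (unsigned char)(e->id >> 8);   out[2] = (unsigned char)e->id;
    out[3] = (unsigned char)(e->dept >> 8); out[4] = (unsigned char)e->dept;
    return 5;
}

/* Read either version. Returns 0 on success, -1 if the stream is too
 * short or has an unknown version. */
int read_any(const unsigned char *in, size_t size, struct employee *e)
{
    if (size < 3 || in[0] < 1 || in[0] > 2)
        return -1;
    e->id = (uint16_t)((in[1] << 8) | in[2]);
    if (in[0] >= 2) {
        if (size < 5)
            return -1;
        e->dept = (uint16_t)((in[3] << 8) | in[4]);
    } else {
        e->dept = 0;   /* field absent in an old stream: use a default */
    }
    e->cache = 0;      /* non-serialized member: reset, never read */
    return 0;
}
```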
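To modify the default deserialization (for example, to automatically initialize a member marked NonSerialized), the class must implement the IDeserializationCallback interface and define its OnDeserialization method, which the runtime calls once the entire object graph has been deserialized.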
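The same pattern, a hook that runs after reconstruction to initialize state that was deliberately not stored, can be sketched in C. The `order` type, the stream layout, and the function names are hypothetical; `order_on_deserialization` merely plays a role analogous to .NET's OnDeserialization.

```c
#include <stddef.h>

#define MAX_ITEMS 8

/* `prices` and `count` are serialized; `total` is derived state that is
 * deliberately left out of the stream. */
struct order {
    int prices[MAX_ITEMS];
    size_t count;
    int total;
};

/* Post-deserialization hook: runs once the object has been rebuilt and
 * initializes the member that was not stored. */
static void order_on_deserialization(struct order *o)
{
    o->total = 0;
    for (size_t i = 0; i < o->count; i++)
        o->total += o->prices[i];
}

/* Deserialize from a stream laid out as: count, then `count` prices.
 * Returns 0 on success, -1 if the stream is malformed. */
int order_deserialize(const int *stream, size_t len, struct order *o)
{
    if (len < 1 || stream[0] < 0 || stream[0] > MAX_ITEMS ||
        len < 1 + (size_t)stream[0])
        return -1;
    o->count = (size_t)stream[0];
    for (size_t i = 0; i < o->count; i++)
        o->prices[i] = stream[1 + i];
    order_on_deserialization(o);   /* fix up derived state last */
    return 0;
}
```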
Objects may be serialized in binary format for deserialization by other .NET applications. The framework also provides the XmlSerializer class to support serialization in human-readable, cross-platform XML.
In the Objective-C programming language, serialization (most commonly known as archiving) is achieved by overriding the write: and read: methods in the Object root class. (Note: this describes the GNU runtime variant of Objective-C; the implementation in the NeXT-style runtime is very similar.)
The following example demonstrates two independent programs: a “sender”, which takes the current time (as returned by time() from the C standard library), archives it, and prints the archived form to standard output; and a “receiver”, which decodes the archived form, reconstructs the time, and prints it.
When compiled, this gives a sender program and a receiver program. Running the sender program on its own produces a serialization that looks like:
GNU TypedStream 1D@îC¡
(with a NUL character after the 1). Piping the two programs together, as sender | receiver, makes the receiver print "received" followed by the same time value, showing that the object was serialized, sent, and reconstructed properly.
In essence, the sender and receiver programs could be distributed across a network connection, providing distributed object capabilities.
// Sender.h: shared by both programs (GNU Objective-C runtime)
#import <objc/Object.h>
#import <objc/typedstream.h>
#import <stdio.h>
#import <time.h>
@interface Sender : Object { time_t current_time; }
- (id) setTime;
- (time_t) time;
- (id) send;
- (id) read: (TypedStream *) s;
- (id) write: (TypedStream *) s;
@end

// Sender.m: linked into both programs
#import "Sender.h"
@implementation Sender
- (id) setTime { current_time = time(NULL); return self; } // Set the time
- (time_t) time { return current_time; }
- (id) write: (TypedStream *) stream
{
  // Write the superclass first, so the complete object hierarchy is saved.
  [super write: stream];
  // Write current_time as "i" (int, in @encode notation); assumes an int-sized time_t.
  objc_write_types(stream, "i", &current_time);
  return self;
}
- (id) read: (TypedStream *) stream
{
  // Do the reverse of write: rebuild the superclass, then the ivars.
  [super read: stream];
  objc_read_types(stream, "i", &current_time);
  return self;
}
- (id) send
{
  // Convenience method: use stdout as our byte stream, write, then close.
  TypedStream *s = objc_open_typed_stream(stdout, OBJC_WRITEONLY);
  [self write: s];
  objc_close_typed_stream(s);
  return self;
}
@end

// sender.m: main program of the sender
#import "Sender.h"
int main(void)
{
  [[[Sender new] setTime] send];
  return 0;
}

// receiver.m: main program of the receiver
#import "Sender.h"
@interface Receiver : Object { Sender *t; }
- (id) receive;
- (id) print;
@end
@implementation Receiver
- (id) receive
{
  // Open stdin as our stream, then allocate and rebuild the object from it.
  TypedStream *s = objc_open_typed_stream(stdin, OBJC_READONLY);
  t = [[Sender alloc] read: s];
  objc_close_typed_stream(s);
  return self;
}
- (id) print { fprintf(stderr, "received %ld\n", (long) [t time]); return self; }
@end
int main(void)
{
  [[[Receiver new] receive] print];
  return 0;
}