Improve Your Technology

Just another blog for technology

A Small Description about Serialization

Serialization

Definition of Serialization

Serialization is the process of converting an object into a format that can be transported across a network or persisted to a storage location, such as a file or a database. The serialized format contains the object's state information. Deserialization is the reverse process: it uses the serialized state information to reconstruct the object in its original state. In essence, serialization allows an object to be shipped across the network for remoting, or persisted in a storage location such as the ASP.NET cache, and then reconstructed for use at a later point in time.
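The round trip described above can be sketched in a few lines of Python. This is only an illustration: the Employee class and the serialize and deserialize helpers are hypothetical names, and JSON is used as the serialized format purely for convenience.

```python
import json

class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

def serialize(emp):
    """Capture the object's state as UTF-8 encoded JSON bytes."""
    return json.dumps({"name": emp.name, "salary": emp.salary}).encode("utf-8")

def deserialize(data):
    """Rebuild a new Employee from previously serialized bytes."""
    state = json.loads(data.decode("utf-8"))
    return Employee(state["name"], state["salary"])

original = Employee("Ada", 50000)
data = serialize(original)      # these bytes could be written to a file or a socket
restored = deserialize(data)    # a distinct object carrying the same state

print(restored.name, restored.salary, restored is original)
```

The restored object is a new instance; only its state travelled through the byte string.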

 

Serialization Formats

The Microsoft .NET Framework can serialize objects to three formats: binary, SOAP, and XML. The format is determined by the class used to perform the serialization. The XML format is produced by the System.Xml.Serialization.XmlSerializer class. The SOAP and binary formats are produced by classes under the System.Runtime.Serialization.Formatters namespace.

There are subtle differences among the serialized formats. The binary format is the most compact of the three. XmlSerializer serializes only public fields and properties, while the binary and SOAP formatters are not subject to that limitation.

 

The .NET Framework features two serialization technologies:

  • Binary serialization preserves type fidelity, which is useful for preserving the state of an object between different invocations of an application. For example, you can share an object between different applications by serializing it to the Clipboard. You can serialize an object to a stream, to a disk, to memory, over the network, and so forth. Remoting uses serialization to pass objects "by value" from one computer or application domain to another.

  • XML serialization serializes only public properties and fields and does not preserve type fidelity. This is useful when you want to provide or consume data without restricting the application that uses the data. Because XML is an open standard, it is an attractive choice for sharing data across the Web. SOAP is likewise an open standard, which makes it an attractive choice as well.

Uses

Serialization has a number of advantages. It provides:

  • a method of persisting objects which is more convenient than writing their properties to a text file on disk, and re-assembling them by reading this back in.
  • a method of issuing remote procedure calls, e.g., as in SOAP
  • a method for distributing objects, especially in software componentry such as COM, CORBA, etc.
  • a method for detecting changes in time-varying data.

For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of endianness. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture independent format means that we do not suffer from the problems of byte ordering, memory layout, or simply different ways of representing data structures in different programming languages.
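The byte-ordering problem can be illustrated with Python's struct module. This is a small sketch rather than a complete wire format; the value is the timestamp that appears in the Objective-C example output later in this post.

```python
import struct

value = 1089356705

little = struct.pack("<I", value)   # 4 bytes, little-endian layout
big = struct.pack(">I", value)      # 4 bytes, big-endian layout

# The same integer has two different in-memory layouts.
same_bytes_reversed = (little == big[::-1])

# A format that fixes the byte order decodes correctly on any machine.
decoded = struct.unpack(">I", big)[0]

# Reading the bytes with the wrong byte order silently yields another number.
misread = struct.unpack("<I", big)[0]

print(big.hex())                      # 40ee43a1
print(decoded == value, misread == value)
```

Fixing the byte order in the format (here ">I") is what makes the stream architecture independent. Incidentally, the big-endian bytes 40 ee 43 a1 correspond to the characters "@îC¡" visible in the sample TypedStream output further down.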

Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications this linearity is an asset, because it enables simple, common I/O interfaces to be utilized to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort to deal with a more complex, non-linear storage organization.
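The linearity can be seen in a toy length-prefixed stream. This Python sketch assumes a hypothetical format in which each record is preceded by a 4-byte big-endian length; write_records and read_record are illustrative names, not part of any library.

```python
import io
import struct

def write_records(records):
    """Serialize byte strings back to back, each prefixed by a 4-byte length."""
    out = io.BytesIO()
    for rec in records:
        out.write(struct.pack(">I", len(rec)))
        out.write(rec)
    return out.getvalue()

def read_record(data, index):
    """Return record number `index` and how many records had to be visited."""
    stream = io.BytesIO(data)
    visited = 0
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise IndexError("record index out of range")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        visited += 1
        if visited == index + 1:
            return payload, visited

blob = write_records([b"alpha", b"beta", b"gamma"])
payload, visited = read_record(blob, 2)
print(payload, visited)
```

Reaching the third record means walking past the first two, because nothing in the stream says where a later record starts. A non-linear layout would add an index of offsets, at the cost of a more complex format.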

Even on a single machine, primitive pointer objects are too fragile to save, because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called unswizzling or pointer unswizzling and the deserialization process includes a step called pointer swizzling.
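A minimal Python sketch of the idea follows, assuming a hypothetical Node class whose next attribute plays the role of a pointer. Unswizzling replaces each reference with a position-independent index; swizzling turns the indexes back into references after loading.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None   # in-memory reference ("pointer") to another Node

def unswizzle(nodes):
    """Replace object references with indexes into the node list."""
    index_of = {id(n): i for i, n in enumerate(nodes)}
    return [
        {"name": n.name, "next": None if n.next is None else index_of[id(n.next)]}
        for n in nodes
    ]

def swizzle(records):
    """Rebuild the nodes, then convert stored indexes back into references."""
    nodes = [Node(r["name"]) for r in records]
    for node, r in zip(nodes, records):
        if r["next"] is not None:
            node.next = nodes[r["next"]]
    return nodes

a, b = Node("a"), Node("b")
a.next, b.next = b, a               # two nodes that point at each other

records = unswizzle([a, b])         # safe to store: no memory addresses inside
restored = swizzle(records)
print(records)
print(restored[0].next is restored[1], restored[1].next is restored[0])
```

The restored nodes live at new memory locations, yet the links between them are intact because only indexes were stored.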

Since both serializing and deserializing can be driven from common code (for example, the Serialize function in Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy, since differences can be detected "on the fly". This is a way to understand the technique called Differential Execution. It is useful in the programming of user interfaces whose contents are time-varying: graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.

Consequences

Serialization, however, breaks the opacity of an abstract data type by potentially exposing private implementation details. To discourage competitors from making compatible products, publishers of proprietary software often keep the details of their programs’ serialization formats a trade secret. Some deliberately obfuscate or even encrypt the serialized data. This process is often termed “instantated oatmealization” by the open source community, a pun on the homophony of serialization and cereal. [1]

Yet, interoperability requires that applications be able to understand each other’s serialization formats. Therefore remote method call architectures such as CORBA define their serialization formats in detail and often provide methods of checking the consistency of any serialized stream when converting it back into an object.

Human-readable serialization

In the late 1990s, a push began to provide an alternative to the standard serialization protocols: the XML markup language was used to produce a human-readable, text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. Its disadvantage is that it gives up the more compact byte-stream encoding, which is generally more practical. A future solution to this dilemma could be transparent compression schemes (see binary XML).

XML is today often used for asynchronous transfer of structured data between client and server in Ajax web applications. An alternative for this use case is JSON, a more lightweight text-based serialization protocol which uses JavaScript syntax but is supported in numerous other programming languages as well.
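To make the comparison concrete, here is a small Python sketch that writes the same record as JSON and as XML using only the standard library. The record and the element names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

record = {"name": "Ada", "salary": "50000"}

# JSON: the dictionary maps directly onto the text format.
json_text = json.dumps(record)

# XML: one child element per field under a root element.
root = ET.Element("employee")
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(json_text)
print(xml_text)
print(len(json_text), len(xml_text))

# Both encodings can be read back into the same data.
from_json = json.loads(json_text)
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}
```

For this record the JSON text is shorter because field names are not repeated in closing tags. Both remain readable by a person and by programs written in other languages.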

Scientific serialization

For large volume scientific datasets, such as satellite data and output of numerical climate, weather, or ocean models, specific binary serialization standards have been developed, e.g. HDF, netCDF and the older GRIB.

Programming language support

Several object-oriented programming languages directly support object serialization (or object archival), either by syntactic sugar elements or providing a standard interface for doing so.

Some of these programming languages are Ruby, Smalltalk, Python, PHP, Objective-C, Java, and the .NET family of languages.

There are also libraries available that add serialization support to languages that lack native support for it.

.NET Framework

In the .NET languages, classes can be serialized and deserialized by adding the Serializable attribute to the class.

' VB example
<Serializable()> Class Employee
End Class

// C# example
[Serializable]
class Employee
{
}

If new members are added to a serializable class, they can be tagged with the OptionalField attribute to allow previous versions of the object to be deserialized without error. This attribute affects only deserialization, and prevents the runtime from throwing an exception if a member is missing from the serialized stream. A member can also be marked with the NonSerialized attribute to indicate that it should not be serialized. This will allow the details of those members to be kept secret.

To modify the default deserialization (for example, to automatically initialize a member marked NonSerialized), the class must implement the IDeserializationCallback interface and define the IDeserializationCallback.OnDeserialization method.
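The same ideas can be illustrated outside .NET. The Python sketch below is only an analogy and does not use the .NET attributes: the Employee class, to_json and from_json are hypothetical. The password member is skipped during serialization (like a NonSerialized member) and given a fresh value on load, and the department member tolerates being absent from older streams (like an OptionalField member).

```python
import json

class Employee:
    def __init__(self, name, password, department="unassigned"):
        self.name = name
        self.password = password      # never written out
        self.department = department  # added in a later version of the class

    def to_json(self):
        # The password is deliberately left out of the serialized state.
        return json.dumps({"name": self.name, "department": self.department})

    @classmethod
    def from_json(cls, text):
        state = json.loads(text)
        # Skipped members are re-initialized after loading, and a missing
        # optional member falls back to a default instead of raising an error.
        return cls(state["name"], password=None,
                   department=state.get("department", "unassigned"))

old_stream = '{"name": "Ada"}'    # written before 'department' existed
restored = Employee.from_json(old_stream)
current = Employee("Bob", "s3cret", "R&D").to_json()

print(restored.name, restored.department, restored.password)
print(current)
```

In .NET the runtime does this bookkeeping for you once the attributes are applied; here it is spelled out by hand to show what the attributes achieve.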

Objects may be serialized in binary format for deserialization by other .NET applications. The framework also provides the SoapFormatter and XmlSerializer objects to support serialization in human-readable, cross-platform XML.

Objective-C

In the Objective-C programming language, serialization (most commonly known as archival) is achieved by overriding the write: and read: methods in the Object root class. (NB This is in the GNU runtime variant of Objective-C. In the NeXT-style runtime, the implementation is very similar.)

Example

The following example demonstrates two independent programs: a "sender", which takes the current time (as returned by time in the C standard library), archives it and prints the archived form to the standard output, and a "receiver", which decodes the archived form, reconstructs the time and prints it out.

When compiled, we get a sender program and a receiver program. If we just execute the sender program, we will get out a serialization that looks like:

GNU TypedStream 1D@îC¡

(with a NULL character after the 1). If we pipe the two programs together, as sender | receiver, we get

received 1089356705

showing the object was serialized, sent, and reconstructed properly.

In essence, the sender and receiver programs could be distributed across a network connection, providing distributed object capabilities.

Sender.h

#import <objc/Object.h>
#import <time.h>
#import <stdio.h>

@interface Sender : Object
{
   time_t  current_time;
}

- (id) setTime;
- (time_t) time;
- (id) send;
- (id) read: (TypedStream *) s;
- (id) write: (TypedStream *) s;

@end

Sender.m

#import "Sender.h"

@implementation Sender

- (id) setTime
{
   // Set the time
   current_time = time(NULL);
   return self;
}

- (time_t) time
{
   return current_time;
}

- (id) write: (TypedStream *) stream
{
   /*
    * Write the superclass to the stream.
    * We do this so we have the complete object hierarchy,
    * not just the object itself.
    */
   [super write:stream];

   /*
    * Write the current_time out to the stream.
    * The second argument, the string "i", specifies the types to write
    * as per the @encode directive. This assumes time_t is int-sized
    * on the target platform.
    */
   objc_write_types(stream, "i", &current_time);
   return self;
}

- (id) read: (TypedStream *) stream
{
   /*
    * Do the reverse of write: and reconstruct the superclass...
    */
   [super read:stream];

   /*
    * ...and reconstruct the instance variables from the stream.
    */
   objc_read_types(stream, "i", &current_time);
   return self;
}

- (id) send
{
   // Convenience method to do the writing. We open stdout as our byte stream.
   TypedStream *s = objc_open_typed_stream(stdout, OBJC_WRITEONLY);

   // Write the object to the stream
   [self write:s];

   // Finish up: close the stream.
   objc_close_typed_stream(s);
   return self;
}

@end

Sender.c

#import "Sender.h"

int
main(void)
{
   Sender *s = [Sender new];

   [s setTime];
   [s send];

   return 0;
}

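Receiver.h

/* Not part of the original listing: a minimal header inferred from Receiver.m. */
#import "Sender.h"

@interface Receiver : Object
{
   Sender *t;
}

- (id) receive;
- (id) print;

@end

Receiver.m

#import "Receiver.h"

@implementation Receiver

- (id) receive
{
   // Open stdin as our stream for reading.
   TypedStream *s = objc_open_typed_stream(stdin, OBJC_READONLY);

   // Allocate memory for, and instantiate the object from reading the stream.
   t = [[Sender alloc] read:s];

   objc_close_typed_stream(s);
   return self;
}

- (id) print
{
   // The cast matches the "i" (int) encoding used when the time was written.
   fprintf(stderr, "received %d\n", (int) [t time]);
   return self;
}

@end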

Receiver.c

#import "Receiver.h"

int
main(void)
{
   Receiver *r = [Receiver new];

   [r receive];
   [r print];

   return 0;
}


Posted September 13, 2008 | Serialization, Technology
