
 


 

Chapter 4 Part 1:

Serialization

 

 

What is covered in Chapter 4, Part 1?

  1. An Overview

  2. Serialization Formats

  3. Binary Serialization

  4. Controlling Binary Serialization

 

 

An Overview

 

Serialization is the process of packaging data structures into a format that can be easily transported. Typically, an instance of a class in one process is taken and sent to another process over some kind of streaming mechanism such that the remote process can reconstruct an instance of the same class with the same member values. This chapter will discuss the process of serialization as well as the different types of serialization available.

Serialization is a very powerful and useful mechanism for sharing data. If you’ve ever had to send all the properties or data or both associated with a data structure to another process, say on another machine, you’re familiar with the difficulty. The challenge is that a class or other data structure can reference memory and pointers that are valid only in the context of the current process. Implementing your own scheme for packaging complex structures can be extremely cumbersome, especially if the members are variable-length arrays or arbitrary string values. Fortunately, the Microsoft Windows .NET Framework offers built-in support for serialization.

In our scenario, we transport the structure across process boundaries, but with the .NET Framework, serialization can also occur across application domains. An application domain can coincide with a process boundary, but it can also be an application running in a “sandbox” inside a process that it shares with other applications.

In the .NET Framework, objects are serialized onto a stream and deserialized from a stream. As we saw in Chapter 2, a stream can take many forms: network, memory, file, and so on. In this chapter, we’ll concentrate on the process of serialization and deserialization and not the medium over which the data is transported. Before getting into the details of serialization, we’ll first look at the different formats that data can be serialized into. Then we’ll have a detailed discussion of each type of serialization.

 

Serialization Formats

 

There are three major formats of serialization available:

 

  1. Binary,
  2. XML, and
  3. SOAP

 

The most obvious difference between these formats is the physical output of the serialization process and, of course, the source format from which the object is deserialized. Binary serialization produces a non-printable sequence of byte-oriented data that represents the source object. For example, if a class is binary serialized to a file, the file contents might look like the following dump, where the left side contains the raw hex and the right side contains the printable characters (with a dot indicating a non-printable character). You can use a hex editor to produce this output:

 

00 01 00 00 00 FF FF FF-FF 01 00 00 00 00 00 00  ................

00 0C 02 00 00 00 3D 73-69 6D 70 6C 65 2C 20 56  ......=simple, V

65 72 73 69 6F 6E 3D 30-2E 30 2E 30 2E 30 2C 20  ersion=0.0.0.0,

43 75 6C 74 75 72 65 3D-6E 65 75 74 72 61 6C 2C  Culture=neutral,

20 50 75 62 6C 69 63 4B-65 79 54 6F 6B 65 6E 3D   PublicKeyToken=

6E 75 6C 6C 05 01 00 00-00 24 53 69 6D 70 6C 65  null.....$Simple

42 69 6E 61 72 79 53 65-72 6C 69 61 7A 61 74 69  BinarySerliazati

6F 6E 2E 4D 79 42 61 73-69 63 44 61 74 61 03 00  on.MyBasicData..

00 00 09 49 6E 74 46 69-65 6C 64 31 0C 53 74 72  ...IntField1.Str

69 6E 67 46 69 65 6C 64-31 09 49 6E 74 46 69 65  ingField1.IntFie

6C 64 32 00 01 00 08 08-02 00 00 00 2A 04 00 00  ld2.........*...

06 03 00 00 00 0F 42 61-73 69 63 20 44 61 74 61  ......Basic Data

20 49 6E 66 6F 38 06 00-00 0B                     Info8....

 

Another characteristic of binary serialization is that it retains type information within the generated data stream, which means that when the object is deserialized, the re-created object is an exact copy of the original.
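The exact-copy behavior can be sketched with a round trip through a MemoryStream. This is only a minimal illustration, assuming the .NET Framework (BinaryFormatter is obsolete in current .NET releases); the SampleData class and the RoundTripDemo helper are hypothetical names introduced for this example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical sample type used only for this illustration
[Serializable]
public class SampleData
{
    public int IntField1 = 1066;
    public string StringField1 = "Basic Data Info";
}

public static class RoundTripDemo
{
    // Serializes an object to memory and immediately deserializes it
    public static object RoundTrip(object source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;   // Rewind before reading
            return formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        object copy = RoundTrip(new SampleData());
        // The stream carried the type name, so Deserialize() needed no type hint
        Console.WriteLine(copy.GetType().FullName);
        Console.WriteLine(((SampleData)copy).StringField1);
    }
}
```

Note that Deserialize() returns object; the cast to SampleData succeeds because the formatter rebuilt the original type from the information stored in the stream.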

The advantage of binary serialization is that the resulting serialized stream is very compact, and the disadvantage is that a binary stream is not very portable. As you’ll see later in this section, the difference in the serialized size between binary and the other serialization methods can be great, which can have considerable impact if the data is being serialized to the network. If both the producer and the consumer of the binary stream use the .NET Framework, portability won’t be an issue. However, if the application needs to send a serialized object to an application running on a different operating system, portability issues are likely because possible differences in byte ordering and data type sizes could introduce compatibility problems.

XML serialization, on the other hand, is all about standards and portability. When a class is serialized to XML, a character stream is created that’s formatted according to the XML language standard. XML is an incredibly flexible markup language used to describe arbitrary data. The element tags used to describe the data are specific to the data itself. The following is an example of valid XML data:

 

<?xml version="1.0" encoding="utf-8" ?>

<MyBasicData xmlns:xsd="http://www.w3.org/2001/XMLSchema"

         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <IntField1>1234</IntField1>

  <StringField1>Hello World</StringField1>

</MyBasicData>

 

The element tags in this example are MyBasicData, IntField1, and StringField1. Because XML allows arbitrary element names, they could be named something completely different. Of course, if XML is being used to exchange information, the producer and the consumer need to agree on a set of standard element tags to describe the data. As you can see in the previous example, the serialized XML does not contain any type information unless the element tags chosen somehow describe the data type of the element value. For example, there’s no indication whether the element value 1234 is an integer or a string.
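As a rough sketch of how XML like this can be produced, the following code uses the XmlSerializer class, which is discussed in detail later. The class shown is a simplified stand-in for MyBasicData with only public fields, and XmlDemo is a hypothetical helper name. The exact XML declaration and namespace attributes in the output may differ from the listing above.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// XmlSerializer requires a public type with a public parameterless constructor
public class MyBasicData
{
    public int IntField1 = 1234;
    public string StringField1 = "Hello World";
}

public static class XmlDemo
{
    // Serializes the object to an XML string
    public static string ToXml(MyBasicData data)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(MyBasicData));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, data);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // The element names come from the class and field names
        Console.WriteLine(ToXml(new MyBasicData()));
    }
}
```

The element names are taken from the class definition, and nothing in the output records that IntField1 is an integer; the consumer has to know that from the agreed class or schema.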

As mentioned, XML is ideal when portability is an issue. Also, because many services today use XML to describe data, it’s easy to write code that outputs XML in a specified format that can be used by other entities. The portability and interoperability gains do come with a cost: the generated XML stream is significantly larger than the equivalent binary serialized stream, and XML serialization does not retain type information. Table 4-1 compares the size of the serialized object for both binary and XML serialization in the .NET Framework version 1.

 

Table 4-1: Serialization Efficiency Comparison

 

Serialization Class   Type     Class Size (Bytes)   Field Count   Serialized Size (Bytes)
-------------------   ------   ------------------   -----------   -----------------------
BinaryFormatter       binary   24                   3             202
XmlSerializer         XML      24                   3             745
BinaryFormatter       binary   4020                 3 (array)     4213
XmlSerializer         XML      4020                 3 (array)     16827

 

The class size column indicates the combined size of all member fields of the class. Field count indicates the number of fields contained in the class, and the serialized size column indicates the size of the resulting serialized object in bytes. Note that the second pair of entries is a class with three fields, of which one is a 1000-element array. As you can see, binary serialization adds a roughly fixed overhead regardless of the original class size: 178 bytes (202 - 24) for the small class and 193 bytes (4213 - 4020) for the class with the array. On the other hand, with XML serialization, the serialized data size is not directly tied to the field count or the class size because the XML element names generated depend on how the class is defined. We’ll discuss the BinaryFormatter and XmlSerializer classes in the next two sections.
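A comparison of this kind can be sketched by serializing the same object to a MemoryStream with each serializer and reading the stream length. This is a minimal sketch, assuming the .NET Framework; the SizedData class and the SizeDemo helper are hypothetical, and the byte counts it prints depend on the framework version and class definition, so they will not match Table 4-1 exactly.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using System.Xml.Serialization;

// Hypothetical class with three fields, one of which is a 1000-element array
[Serializable]
public class SizedData
{
    public int IntField1 = 1;
    public int IntField2 = 2;
    public int[] Values = new int[1000];
}

public static class SizeDemo
{
    // Number of bytes produced by binary serialization
    public static long BinarySize(object data)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, data);
            return stream.Length;
        }
    }

    // Number of bytes produced by XML serialization
    public static long XmlSize(object data)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            new XmlSerializer(data.GetType()).Serialize(stream, data);
            return stream.Length;
        }
    }

    public static void Main()
    {
        SizedData data = new SizedData();
        Console.WriteLine("Binary: {0} bytes", BinarySize(data));
        Console.WriteLine("XML:    {0} bytes", XmlSize(data));
    }
}
```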

The last serialization type is a protocol based on XML called SOAP (Simple Object Access Protocol), which is a standard method for describing, discovering, and invoking methods. The XML language itself imposes no restrictions on how to describe the meaning of the data; it only defines the format and sequence of the tags that describe the data. The SOAP specification defines the set of common element tags and properties to describe data. The SOAP specification is published by the World Wide Web Consortium (W3C).

Although the SOAP protocol is transport-independent, meaning that it can be sent over a number of different transport protocols, SOAP is typically transported over the HTTP protocol in version 1 of the .NET Framework and is an integral part of XML-based Web services, which are covered in greater detail in Chapter 12. The following code listing shows a class serialized to the SOAP protocol:

 

<SOAP-ENV:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

   xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"

   xmlns:clr="http://schemas.microsoft.com/soap/encoding/clr/1.0" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

  <SOAP-ENV:Body>

    <a1:MyBasicData id="ref-1"

    xmlns:a1="http://schemas.microsoft.com/clr/nsassem/SimpleBinarySerialization/simple%2C%20Version%3D0.0.0.0%2C%20Culture%3Dneutral%2C%20PublicKeyToken%3Dnull">

      <IntField1>999</IntField1>

      <StringField1 id="ref-3">Hello World</StringField1>

      <IntField2>1492</IntField2>

    </a1:MyBasicData>

  </SOAP-ENV:Body>

</SOAP-ENV:Envelope>

 

Now it’s time to get into the details of serialization, and we’ll start with binary serialization.

 

Binary Serialization

 

As mentioned earlier, binary serialization takes a class and packages it onto a stream as a series of bytes that can be rebuilt into an exact copy of the object at the destination. For binary serialization to work, both the serializer and deserializer must have access to the assembly that contains the definition for the object. The .NET Framework offers considerable flexibility in controlling how data is serialized. Before an object can be binary serialized, it must be marked as such by adding the [Serializable] attribute to the class, as shown here:

 

C#

 

[Serializable]

public class MyBasicData

{

    public int IntField1;

    public string StringField1;

    private int IntField2;

}

 

Visual Basic .NET

 

<Serializable()> _

Class MyBasicData

    Public IntField1 As Integer

    Public StringField1 As String

    Private IntField2 As Integer

End Class

 

Each class that is to be serialized requires the Serializable attribute, which means that classes derived from a serializable class must also have the Serializable attribute set for them to be serialized. Likewise, if a class contains instances of other classes as members, those too must be marked with the Serializable attribute for the entire structure to be packaged. If an error is encountered during serialization or deserialization, the System.Runtime.Serialization.SerializationException exception is thrown.
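The requirement that member types also carry the attribute can be illustrated with a short sketch. It assumes the .NET Framework; the Address and Customer classes and the AttributeDemo helper are hypothetical names made up for this example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Deliberately NOT marked [Serializable]
public class Address
{
    public string City = "Sample City";
}

[Serializable]
public class Customer
{
    public string Name = "Sample";
    public Address Home = new Address();   // Member type lacks the attribute
}

public static class AttributeDemo
{
    // Returns true if the whole object graph can be binary serialized
    public static bool CanSerialize(object graph)
    {
        try
        {
            using (MemoryStream stream = new MemoryStream())
            {
                new BinaryFormatter().Serialize(stream, graph);
            }
            return true;
        }
        catch (SerializationException err)
        {
            Console.WriteLine("Serialization failed: " + err.Message);
            return false;
        }
    }

    public static void Main()
    {
        // Customer is marked, but its Address member is not
        Console.WriteLine(CanSerialize(new Customer()));
    }
}
```

Marking Address with [Serializable] as well (or marking the Home field as non-serialized, as described later) would let the Customer instance serialize successfully.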

The next step is to create an instance of the formatter, which will perform the work of serializing the class. In the case of binary serialization, a binary formatter object is created that implements the IFormatter interface. After the object is created, the Serialize() method is invoked on a valid stream with the object to be serialized. The BinaryFormatter class is part of the System.Runtime.Serialization.Formatters.Binary namespace. The following C# code illustrates serializing a simple class (defined earlier) to a file stream.

 

C#

 

// Requires: using System; using System.IO;
//           using System.Runtime.Serialization;
//           using System.Runtime.Serialization.Formatters.Binary;
MyBasicData  myData = new MyBasicData();
IFormatter   binFormatter = new BinaryFormatter();
FileStream   binFileStream;

binFileStream = new FileStream(  // Open a file stream to write to
    "Binary_Serialization.bin",
    FileMode.Create,
    FileAccess.Write,
    FileShare.None
    );
try
{
    binFormatter.Serialize( binFileStream, myData );  // Serialize it
}
catch (SerializationException err)
{
    Console.WriteLine("Error occurred during serialization: " + err.Message);
}
finally
{
    binFileStream.Close();  // Close the stream even if serialization fails
}

 

Visual Basic .NET

 

' Requires: Imports System, System.IO, System.Runtime.Serialization,
'           System.Runtime.Serialization.Formatters.Binary
Dim myData As MyBasicData = New MyBasicData()
Dim binFileStream As FileStream
Dim binFormatter As IFormatter = New BinaryFormatter()

binFileStream = New FileStream( _
    "Binary_Serialization.bin", _
    FileMode.Create, _
    FileAccess.Write, _
    FileShare.None _
    )
Try
    binFormatter.Serialize( binFileStream, myData )
Catch err As SerializationException
    Console.WriteLine("Error occurred during serialization: " + err.Message)
Finally
    binFileStream.Close()   ' Close the stream even if serialization fails
End Try

 

This code serializes an instance of the MyBasicData class to a file. The basic steps are:

 

  1. Create an instance of the serializable class that’s to be serialized.
  2. Create the stream on which the data is to be serialized.
  3. Create the binary formatter.
  4. Serialize the object to the stream.

 

Once the object is serialized to a binary stream, the following code will deserialize the object:

 

C#

 

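// Requires: using System; using System.IO;
//           using System.Runtime.Serialization;
//           using System.Runtime.Serialization.Formatters.Binary;
FileStream    binFileStream;
IFormatter    binFormatter;
MyBasicData   myData = null;  // Remains null if deserialization fails

binFileStream = new FileStream(
    "Binary_Serialization.bin",
    FileMode.Open,
    FileAccess.Read,
    FileShare.Read
    );
binFormatter = new BinaryFormatter();
try
{
    // Deserialize() returns object, so cast to the expected type
    myData = (MyBasicData) binFormatter.Deserialize( binFileStream );
}
catch ( SerializationException err )
{
    Console.WriteLine("An error occurred during deserialization: {0}", err.Message);
}
finally
{
    binFileStream.Close();  // Close the stream in all cases
}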

 

Visual Basic .NET

 

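' Requires: Imports System, System.IO, System.Runtime.Serialization,
'           System.Runtime.Serialization.Formatters.Binary
Dim binFileStream As FileStream
Dim binFormatter As IFormatter
Dim myData As MyBasicData = Nothing   ' Remains Nothing if deserialization fails

binFileStream = New FileStream( _
    "Binary_Serialization.bin", _
    FileMode.Open, _
    FileAccess.Read, _
    FileShare.Read _
    )
binFormatter = New BinaryFormatter()
Try
    ' Deserialize() returns Object, so convert to the expected type
    myData = CType(binFormatter.Deserialize( binFileStream ), MyBasicData)
Catch err As SerializationException
    Console.WriteLine("An error occurred during deserialization: {0}", err.Message)
Finally
    binFileStream.Close()   ' Close the stream in all cases
End Try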

 

To summarize the binary deserialization process:

 

  1. Open the file stream where the serialized object is contained.
  2. Create an instance of the binary formatter.
  3. Call the Deserialize() method on the formatter with the stream as an argument.

 

If an error occurs while serializing or deserializing the data, a SerializationException is thrown. This exception typically occurs during deserialization when the data on the stream does not begin with a valid serialized object header. For example, suppose the text string hello is written to the FileStream before the serialized object. If those 5 bytes are not consumed before Deserialize() is called, the formatter reads the bytes of the string where it expects the object header and throws the exception.
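The invalid-header scenario can be reproduced in memory with a short sketch. It assumes the .NET Framework and uses a MemoryStream in place of the FileStream; HeaderDemo and its methods are hypothetical names introduced for the illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Text;

public static class HeaderDemo
{
    // Writes the 5 ASCII bytes of "hello" followed by a serialized string
    public static MemoryStream BuildStream()
    {
        MemoryStream stream = new MemoryStream();
        byte[] prefix = Encoding.ASCII.GetBytes("hello");
        stream.Write(prefix, 0, prefix.Length);
        new BinaryFormatter().Serialize(stream, "payload");
        return stream;
    }

    // Deserializes starting at the given offset; returns null on a bad header
    public static object ReadFrom(MemoryStream stream, long offset)
    {
        stream.Position = offset;
        try
        {
            return new BinaryFormatter().Deserialize(stream);
        }
        catch (SerializationException err)
        {
            Console.WriteLine("Deserialization failed: " + err.Message);
            return null;
        }
    }

    public static void Main()
    {
        MemoryStream stream = BuildStream();
        object first = ReadFrom(stream, 0);   // Prefix not consumed
        object second = ReadFrom(stream, 5);  // Prefix skipped
        Console.WriteLine(first == null);
        Console.WriteLine(second);
    }
}
```

The practical consequence is that any application data written around a serialized object must be read back in exactly the same order and length before Deserialize() is called.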

As you can see, the binary serialization process is simple, straightforward, and powerful. Imagine having a class that represents some data set such as a customer database. The entire set of customers (that is, multiple instances of the customer class) can be serialized to a file using a file stream or even transmitted across the network using a network stream.
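One way to serialize such a set is to place the instances in an array and serialize the array in a single call. The following is a minimal sketch, assuming the .NET Framework; the Customer class and the CustomerStore helper are hypothetical, and a MemoryStream stands in for the file or network stream.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical customer record
[Serializable]
public class Customer
{
    public int Id;
    public string Name;

    public Customer(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

public static class CustomerStore
{
    // Serializes the whole array, including every element, in one call
    public static void Save(Stream stream, Customer[] customers)
    {
        new BinaryFormatter().Serialize(stream, customers);
    }

    // Rebuilds the array and all of its elements from the stream
    public static Customer[] Load(Stream stream)
    {
        return (Customer[]) new BinaryFormatter().Deserialize(stream);
    }

    public static void Main()
    {
        Customer[] customers = { new Customer(1, "First"), new Customer(2, "Second") };
        using (MemoryStream stream = new MemoryStream())
        {
            Save(stream, customers);
            stream.Position = 0;   // Rewind before reading
            foreach (Customer c in Load(stream))
                Console.WriteLine("{0}: {1}", c.Id, c.Name);
        }
    }
}
```

Because Save() and Load() accept any Stream, the same two methods would work unchanged with a FileStream or a NetworkStream.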

 

 

Controlling Binary Serialization

 

In the previous code sample, the entire class was serialized to the data stream. There could be a situation in which you do not want all the class data serialized. For example, it would not be a good idea to serialize the password data contained in a class that describes a user account (at the very least, not in clear text). There are two methods for controlling how data is binary serialized: selectively excluding fields by marking them with an additional attribute, and implementing a custom serialization interface. If the issue is that only certain fields should be serialized, the attribute [NonSerialized] can be placed before each field that is not to be serialized. The following class definition illustrates this method:

 

[Serializable]

public class MySelectiveData

{

    public int UserId;

    public string UserName;

    [NonSerialized]

    private string Password;

}

 

In this class definition, the Password element is marked as NonSerialized, which means that this field will not be packaged for transport over a stream when the Serialize() method is invoked.
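The effect of the attribute can be checked with a round trip. The sketch below assumes the .NET Framework and extends the class with a constructor and a HasPassword property, both of which are hypothetical additions made only so that the result can be observed; SelectiveDemo is likewise a made-up helper.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class MySelectiveData
{
    public int UserId;
    public string UserName;
    [NonSerialized]
    private string Password;

    // Hypothetical constructor added for the demonstration
    public MySelectiveData(int userId, string userName, string password)
    {
        UserId = userId;
        UserName = userName;
        Password = password;
    }

    // Hypothetical property that reports whether a password is present
    public bool HasPassword
    {
        get { return Password != null; }
    }
}

public static class SelectiveDemo
{
    // Serializes to memory and deserializes again
    public static MySelectiveData RoundTrip(MySelectiveData source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (MySelectiveData) formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        MySelectiveData original = new MySelectiveData(7, "user", "secret");
        MySelectiveData copy = RoundTrip(original);
        Console.WriteLine("{0} {1}", copy.UserId, copy.UserName);
        // The Password field was skipped, so it holds its default value (null)
        Console.WriteLine(copy.HasPassword);
    }
}
```

A skipped field is left at its default value after deserialization, so code that uses the copy must be prepared to repopulate it.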

If exact control over how serialization and deserialization occurs is required, the process can be further customized by having the class implement the ISerializable interface. This method is useful when certain marshaled data is no longer valid in the process where deserialization takes place. For example, if a serializable class contains a reference to the local IP address and is then marshaled to a process on a different machine, it might be desirable to have the local IP field reflect the current machine. If a class implements a distributed service, which can reside on any machine in the network, the class representing this service can be serialized to another machine for load balancing purposes and would need to update the local IP information to re-create sockets to handle client requests. For a class to support custom serialization through ISerializable, it must provide the following two members: the GetObjectData() method that the interface declares, and a special constructor that the formatter calls during deserialization:

  1. public virtual void GetObjectData(SerializationInfo info, StreamingContext context );

  2. protected  MyObjectConstructor ( SerializationInfo info, StreamingContext context );

 

The GetObjectData() method is used in the serialization process. Each field that is to be serialized is assigned a value name in the SerializationInfo object, which is achieved by calling the AddValue() method of SerializationInfo with the value name and the field. The second required method is a constructor for the class, which is called when the object is deserialized. This constructor retrieves the serialized field values and initializes the member properties to values that are meaningful in the current context. The following code illustrates custom serialization:

 

C#

 

// Requires: using System; using System.Net; using System.Net.Sockets;
//           using System.Runtime.Serialization;
[Serializable]
public class MyCustomData : ISerializable
{
    public int IntField1;
    public string StringField1;
    public IPAddress LocalAddress;

    // Default constructor
    public MyCustomData()
    {
        IntField1 = 1234;
        StringField1 = "Initialize Data";
        LocalAddress = IPAddress.Any;
    }

    // Called in the serialization process
    public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("IntField1", IntField1);
        info.AddValue("whatever", StringField1);
        // LocalAddress is not stored; it is resolved again on deserialization
    }

    // Constructor used in the deserialization process
    protected MyCustomData(SerializationInfo info, StreamingContext context)
    {
        IPHostEntry ipHost;

        IntField1 = info.GetInt32( "IntField1" );
        StringField1 = info.GetString( "whatever" );

        // Resolve the address of the current machine for LocalAddress
        try
        {
            // GetHostByName is obsolete in .NET Framework 2.0 and later,
            // where Dns.GetHostEntry replaces it
            ipHost = Dns.GetHostByName( Dns.GetHostName() );
            if (ipHost.AddressList.Length > 0)
                LocalAddress = ipHost.AddressList[0];
            else
                LocalAddress = IPAddress.Loopback;
        }
        catch ( SocketException )
        {
            Console.WriteLine("Unable to resolve localhost; using loopback");
            LocalAddress = IPAddress.Loopback;
        }
    }
}

 

Visual Basic .NET

 

<Serializable()> _

Public Class MyCustomData

    Implements ISerializable

    Public IntField1 As Integer

    Public StringField1 As String

    <NonSerialized()> Public LocalAddress As IPAddress

 

    ' Default constructor

    Public Sub New()

        IntField1 = 1234

        StringField1 = "Initialize Data"

        LocalAddress = IPAddress.Any

    End Sub

 

    ' Called in the serialization process

    Sub GetObjectData(ByVal info As SerializationInfo, ByVal context As StreamingContext) Implements ISerializable.GetObjectData

        info.AddValue("IntField1", IntField1)

        info.AddValue("whatever", StringField1)

    End Sub

 

    ' Constructor used in the deserialization process

    Private Sub New(ByVal info As SerializationInfo, ByVal c As StreamingContext)

        Dim ipHost As IPHostEntry

 

        IntField1 = info.GetInt32("IntField1")

        StringField1 = info.GetString("whatever")

 

        ' Retrieve the value of LocalAddress

        Try

            ipHost = Dns.GetHostByName(Dns.GetHostName())

            If (ipHost.AddressList.Length > 0) Then

                LocalAddress = ipHost.AddressList(0)

            Else

                LocalAddress = IPAddress.Loopback

            End If

        Catch err As System.Net.Sockets.SocketException

            Console.WriteLine("Unable to resolve localhost; using loopback")

            LocalAddress = IPAddress.Loopback

        End Try

    End Sub

End Class

 

In this example, the GetObjectData() method stores a named value for each field that is to be serialized. As you can see, the value names are arbitrary strings (StringField1 is stored under the name "whatever"); all that matters is that the deserialization constructor retrieves each value by the same name that GetObjectData() used to store it.

The constructor for the class takes SerializationInfo and StreamingContext objects for parameters. In the preceding example, the constructor retrieves the stored values for IntField1 as well as StringField1. Because the LocalAddress field might not have meaning at the destination process, the custom constructor resolves the current host name and assigns the first IPAddress resolved to the member field. If the DNS lookup fails, the Internet Protocol version 4 (IPv4) loopback address is assigned.
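The same pattern can be exercised end to end with a smaller class. This is only a sketch, assuming the .NET Framework; the MachineBoundData class and the CustomDemo helper are hypothetical, and Environment.MachineName is swapped in for the DNS lookup so that the example does not depend on name resolution.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class MachineBoundData : ISerializable
{
    public string Payload;
    public string MachineName;   // Meaningful only on the current machine

    public MachineBoundData(string payload)
    {
        Payload = payload;
        MachineName = "(stale value from the sender)";
    }

    // Called in the serialization process; MachineName is intentionally omitted
    public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("whatever", Payload);
    }

    // Called in the deserialization process
    protected MachineBoundData(SerializationInfo info, StreamingContext context)
    {
        Payload = info.GetString("whatever");     // Same name as in GetObjectData()
        MachineName = Environment.MachineName;    // Recomputed for this machine
    }
}

public static class CustomDemo
{
    // Serializes to memory and deserializes again
    public static MachineBoundData RoundTrip(MachineBoundData source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (MachineBoundData) formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        MachineBoundData copy = RoundTrip(new MachineBoundData("data"));
        Console.WriteLine(copy.Payload);
        Console.WriteLine(copy.MachineName);
    }
}
```

The formatter never calls the public constructor during deserialization; it calls the protected one, which is why the stale MachineName value set by the sender does not appear in the copy.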

The following examples illustrate the methods of binary serialization covered earlier. The sample illustrates three ways to serialize different classes: a simple class, a class with the [NonSerialized] attribute, and a class that implements the ISerializable interface. A more advanced serialization sample that serializes data over a TCP socket connection is given after these examples for both C# and Visual Basic .NET.

 

 

 


 
