Wednesday, March 30, 2011

Serialization != Performance

In the course of a new endeavour I tried to serialize a large dictionary full of data. The amazing thing was that it worked flawlessly: with just a few lines of code the entire thing was written to disk and easily recreated. The only caveat was that every class or struct in there had to be marked with [Serializable] in the source. The code for saving the Dictionary is as follows:
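For context, here is a minimal sketch of what that requirement looks like. SGFDB is the container type from the loading code below; its exact shape, and the GameRecord entry type, are my assumptions:

// Assumption: SGFDB derives from Dictionary. Requires System,
// System.Collections.Generic and System.Runtime.Serialization.
[Serializable]
public class SGFDB : Dictionary<string, GameRecord>
{
    public SGFDB() { }

    // Dictionary<,> itself implements ISerializable, so a derived class
    // must supply this constructor or BinaryFormatter cannot rebuild it.
    protected SGFDB(SerializationInfo info, StreamingContext context)
        : base(info, context) { }
}

// Every type stored in the dictionary needs the attribute as well,
// or serialization throws a SerializationException at runtime.
[Serializable]
public class GameRecord
{
    public ushort Turn;
    public string Moves;
}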

// Saving the Dictionary
// (needs System.IO and System.Runtime.Serialization.Formatters.Binary)
private void SaveData(string fileName)
{
    using (var fs = new FileStream(fileName, FileMode.Create))
    {
        var formatter = new BinaryFormatter();
        formatter.Serialize(fs, this);
    }
}


// Loading (combined with creating the dictionary using a static call)
private static SGFDB LoadData(string fileName)
{
    using (var fs = new FileStream(fileName, FileMode.Open))
    {
        var formatter = new BinaryFormatter();
        return (SGFDB)formatter.Deserialize(fs);
    }
}


Unfortunately this resulted in a humongous file, and delving deeper into serialization I discovered that its magic had some serious drawbacks. The entries in the Dictionary were in fact .SGF files parsed into an object (.SGF is a text format for board games). In the end I tried three methods to save the parsed object; compared with the original file they came out as follows:

  1. Original .SGF file size: 1.5 KB
  2. Full Serialization file size: 7 KB
  3. Smart Serialization file size: 2 KB
  4. Do It Yourself file size: 0.5 KB
Smart Serialization means I implemented ISerializable, i.e. GetObjectData() and a deserialization constructor. In those I serialized only what was necessary, and not the extra objects that were added after parsing, like undo/redo information. So Full Serialization had extra information when compared to the .SGF, but Smart Serialization did not. And even with Smart Serialization the file was bigger than the .SGF. I used the BinaryFormatter for serialization, which I expected to be very lean and mean.
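To illustrate, here is a minimal sketch of that approach on the hypothetical GameRecord from above; the field names and keys are my assumptions:

// Smart Serialization: implement ISerializable and persist only the
// fields worth keeping. Requires System.Collections.Generic and
// System.Runtime.Serialization.
[Serializable]
public class GameRecord : ISerializable
{
    public ushort Turn;
    public string Moves;
    private Stack<string> undoStack = new Stack<string>();

    public GameRecord() { }

    // Deserialization constructor: the formatter calls this instead of
    // filling the fields by reflection.
    protected GameRecord(SerializationInfo info, StreamingContext context)
    {
        Turn = info.GetUInt16("t");
        Moves = info.GetString("m");
        // the undo/redo state is simply rebuilt empty
    }

    // Write only what is needed to recreate the record.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("t", Turn);
        info.AddValue("m", Moves);
    }
}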

As it turns out, you get both the field names and their contents in your file for every little piece of data, and all of it is done using reflection. By doing it yourself (streaming everything you need into a binary file without a label for each field) you gain both memory and performance. I will back up the performance claim with some numbers soon, as right now it is more of a feeling.

Doing it yourself in a binary format means ugly code like this:

// Write a ushort
s.Write(BitConverter.GetBytes(Turn), 0, 2);

// Read the same ushort back (note: Stream.Read may return fewer
// bytes than requested, so production code should check the count)
var buf2 = new byte[2];
s.Read(buf2, 0, 2);
ushort Turn = BitConverter.ToUInt16(buf2, 0);

I first tried using StreamReader and StreamWriter, but they work on arrays of char, and I couldn't get byte[] data into the same file with them without doing really bad stuff with the underlying BaseStream object.
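For what it's worth, BinaryReader and BinaryWriter wrap a Stream and handle exactly this mix of primitives and byte arrays; a minimal sketch of the same do-it-yourself idea (the file name and the fields are again my assumptions):

// Requires System.IO. Writes the same unlabelled binary data as the
// BitConverter code above, but through BinaryWriter/BinaryReader.
using (var w = new BinaryWriter(File.Create("games.dat")))
{
    w.Write(Turn);              // the ushort as 2 raw bytes, no field label
    w.Write(Moves);             // a length-prefixed string
    w.Write(rawBytes.Length);   // manual length prefix for a byte[]
    w.Write(rawBytes);
}

using (var r = new BinaryReader(File.OpenRead("games.dat")))
{
    ushort turn = r.ReadUInt16();
    string moves = r.ReadString();
    byte[] raw = r.ReadBytes(r.ReadInt32());
}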

So I now have a small footprint and fast speed, though at the cost of some flexibility (serialization probably plays nice when an object's definition changes only a little: with reflection everything still ends up in the correct place). And I gained a lot of ugly code :-(
