Serialization refers to the process of translating data structures or object state into a format that can be stored and resurrected later.
There are several applications of object serialization:
Communication: If you have two machines running the same code that need to communicate, an easy way is for one machine to build an object containing the information it would like to transmit, serialize it, and send it to the other machine. It’s not the best method for communication, but it gets the job done.
Persistence: If you want to store the state of a particular operation in a database, it can be easily serialized to a byte array, and stored in the database for later retrieval.
Deep Copy: If you need an exact replica of an object, and don’t want to go to the trouble of writing your own specialized clone() method, simply serializing the object to a byte array and then de-serializing it into another object achieves this goal.
Back at Tripvillas, prior to running a data migration script, I had the idea of pickling the objects as a form of backup/persistence. This way, if any problems arose from the changes made by the script, we could easily restore the original state of the objects.
The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy.
We are going to first serialize a number of Python objects, and later recover those objects by deserializing our output.
Let’s start by importing the relevant Python packages:
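Everything here comes from the standard library; a minimal sketch of the imports:

```python
import pickle  # serializing / deserializing Python objects
import csv     # writing the serialized output to CSV files
```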
We are going to serialize Python lists, but bear in mind that this works for lists, dictionaries, functions, classes, and more complex Python objects.
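As a minimal sketch of that round trip (the sample list below is illustrative):

```python
import pickle

data = [1, 2, 3, "hello"]

# dumps() turns the object into a serialized payload
# (a str on Python 2, bytes on Python 3)...
serialized = pickle.dumps(data)

# ...and loads() rebuilds an equivalent object from it.
restored = pickle.loads(serialized)
print(restored)  # [1, 2, 3, 'hello']
```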
In the code above, we have dumps(), which takes in a Python object and returns a serialized string output, and loads(), which takes in a string and returns a deserialized Python object. Now let’s try piping out a number of serialized Python dictionaries into a CSV file:
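A sketch of that step, assuming a file named test.csv and illustrative dictionaries (on Python 3, dumps() returns bytes, so we pickle with protocol 0 and decode to ASCII text so the csv writer will accept it):

```python
import csv
import pickle

dicts = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# Open the CSV file with append permissions ('a') and write one
# serialized object per row. newline='' lets the csv module manage
# line endings (including any embedded in the pickled payload).
with open("test.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for d in dicts:
        # Protocol 0 produces ASCII output that we can decode to str.
        writer.writerow([pickle.dumps(d, protocol=0).decode("ascii")])
```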
With a CSV file open (with append permissions), we use the csv writer object to start writing rows of serialized Python objects into our file. Do note that the csv writer and reader objects take in other optional parameters to output other CSV formats, for example:
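For instance, a tab-delimited variant with every field quoted (the file name and parameter values here are just illustrative):

```python
import csv

with open("test.tsv", "w", newline="") as f:
    # delimiter, quotechar and quoting control the output dialect.
    writer = csv.writer(f, delimiter="\t", quotechar='"',
                        quoting=csv.QUOTE_ALL)
    writer.writerow(["a", "b,c"])
```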
You can read up on these optional parameters in the Python docs.
Now that we have our CSV file of serialized Python dictionaries, we can recover them by deserializing, using csv reader objects and loads():
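A sketch of the recovery step (the test.csv file from the writing step is recreated here so the example is self-contained; the dictionaries are illustrative):

```python
import csv
import pickle

# Recreate test.csv as produced in the previous step.
with open("test.csv", "w", newline="") as f:
    w = csv.writer(f)
    for d in ({"id": 1}, {"id": 2}):
        w.writerow([pickle.dumps(d, protocol=0).decode("ascii")])

# Recover the objects: iterate over each CSV row and deserialize.
objects = []
with open("test.csv", newline="") as f:
    for row in csv.reader(f):
        z = pickle.loads(row[0].encode("ascii"))  # back to a dict
        objects.append(z)

print(objects)  # [{'id': 1}, {'id': 2}]
```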
Here, we iterate through each row in our CSV file and deserialize each string back into its original Python object form.
And we now have recovered each original Python dictionary z into our objects array! Alternatively, instead of using dumps(), which outputs serialized strings, you can use dump() to write them directly to your files. Do read the library documentation for their respective API definitions.
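For example, dump() and its counterpart load() work on open file objects directly (a sketch; data.pkl and the record are assumed names):

```python
import pickle

record = {"id": 42, "status": "ok"}

# dump() writes the pickled bytes straight to an open binary file...
with open("data.pkl", "wb") as f:
    pickle.dump(record, f)

# ...and load() reads one object back from it.
with open("data.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored)  # {'id': 42, 'status': 'ok'}
```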
Finally, it’s also worth noting that the pickle library has a more optimized cousin called cPickle, which has a similar API. You can trivially use cPickle by importing it as such:
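On Python 2 this is a one-line swap; on Python 3, cPickle was folded into pickle itself (the C implementation is used automatically), so a version-agnostic import looks like:

```python
try:
    import cPickle as pickle  # Python 2: the C-accelerated module
except ImportError:
    import pickle  # Python 3: C acceleration is built in
```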
You should use cPickle in production rather than pickle, simply because it is much faster in comparison (reputedly up to 1000x faster).