Yos Riady software craftsman 🌱

Python Object Serialization with pickle

Let’s learn how to serialize and deserialize Python objects using the built-in pickle library. We’ll also use the csv library to store our serialized output in a CSV (comma-separated values) file.

Serialization refers to the process of translating data structures or object state into a format that can be stored and resurrected later.

There are several applications of object serialization:

  • Communication: If you have two machines that are running the same code and need to communicate, an easy way is for one machine to build an object with the information it wants to transmit, serialize that object, and send the result to the other machine. It’s not the best method for communication, but it gets the job done.

  • Persistence: If you want to store the state of a particular operation in a database, it can be easily serialized to a byte array, and stored in the database for later retrieval.

  • Deep Copy: If you need an exact replica of an object and don’t want to go to the trouble of writing your own specialized clone() method, serializing the object to a byte array and then deserializing it into a new object achieves this goal.
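
The deep-copy use case from the list above can be sketched in a few lines. This is a minimal illustration rather than a recommendation over the standard copy.deepcopy(); the helper name deep_copy_via_pickle and the sample data are made up for this example.

```python
import copy
import pickle


def deep_copy_via_pickle(obj):
    """Return an independent copy of obj by serializing and deserializing it.

    Hypothetical helper for illustration; it only works for picklable objects.
    """
    return pickle.loads(pickle.dumps(obj))


original = {'name': 'Mark Z', 'tags': ['founder', 'ceo']}
clone = deep_copy_via_pickle(original)

# Mutating the nested list in the clone leaves the original untouched
clone['tags'].append('investor')
print(original['tags'])  # ['founder', 'ceo']
print(clone['tags'])     # ['founder', 'ceo', 'investor']

# The standard library's copy.deepcopy() also produces an equal, independent copy
assert copy.deepcopy(original) == original
```

For most code, copy.deepcopy() is the more direct tool; the pickle round trip is mainly useful when the object is going to be serialized anyway.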

Back at Tripvillas, prior to running a data migration script, I had the idea of pickling the objects as a form of backup/persistence. This way, if any problems arose from the changes made by the script, we could easily restore the original state of the objects.

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy.
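
To make "object hierarchy" concrete, here is a small sketch with made-up data. It assumes nothing beyond the standard library and shows that references shared inside the hierarchy, including a reference cycle, survive the round trip, while the result stays independent of the original.

```python
import pickle

shared = [1, 2, 3]
hierarchy = {'left': shared, 'right': shared}
hierarchy['self'] = hierarchy  # a reference cycle back to the top-level dict

restored = pickle.loads(pickle.dumps(hierarchy))

same_contents = restored['left'] == shared                # True: equal contents
sharing_kept = restored['left'] is restored['right']      # True: one list, referenced twice
cycle_kept = restored['self'] is restored                 # True: the cycle is rebuilt
shares_with_original = restored['left'] is shared         # False: a brand-new list

print(same_contents, sharing_kept, cycle_kept, shares_with_original)
```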

Pickle Documentation

We are first going to serialize a number of Python objects, and later recover those objects by deserializing our output.

Let’s start by importing the relevant Python modules:

import pickle
import csv

We are going to serialize Python lists, but bear in mind that this works for lists, dictionaries, functions, classes, and more complex Python objects.
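
One caveat about functions and classes is worth a quick sketch: pickle stores them by reference (module plus qualified name), not by value, so the same definition must be importable when unpickling. The example below uses standard-library objects only; the lambda is a made-up case that pickle is expected to reject because it has no importable name.

```python
import collections
import math
import pickle

# Functions and classes are pickled by reference, so unpickling returns
# the very same object rather than a copy of its code
restored_func = pickle.loads(pickle.dumps(math.sqrt))
restored_class = pickle.loads(pickle.dumps(collections.OrderedDict))
print(restored_func is math.sqrt)
print(restored_class is collections.OrderedDict)

# A lambda cannot be looked up by name, so pickle refuses to serialize it
try:
    pickle.dumps(lambda n: n + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)
```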

import pickle

x = [1, 2, 3]

# dumps() takes a Python object structure and returns its serialized form
# (bytes in Python 3, str in Python 2), which can later be deserialized
# back into an equal Python object
serialized_list = pickle.dumps(x)  # e.g. '(lp0\nI1\naI2\naI3\na.' with the text-based protocol 0
deserialized_list = pickle.loads(serialized_list)  # [1, 2, 3]
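
The exact bytes that dumps() produces depend on the pickle protocol in use. As a hedged sketch, the snippet below serializes the same list with the original ASCII-based protocol 0 and with the newest protocol the running interpreter supports; the variable names are illustrative only.

```python
import pickle

x = [1, 2, 3]

# Protocol 0 is the original ASCII-based format
ascii_pickle = pickle.dumps(x, protocol=0)
# HIGHEST_PROTOCOL is the newest binary format this interpreter supports
binary_pickle = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)

print(ascii_pickle)
print(binary_pickle)

# loads() detects the protocol on its own, so both round-trip to the same list
assert pickle.loads(ascii_pickle) == pickle.loads(binary_pickle) == x
```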

To recap, dumps() takes a Python object and returns its serialized form (a bytes object in Python 3, a str in Python 2), and loads() takes that serialized form and returns the deserialized Python object. Now let’s try writing a number of serialized Python dictionaries to a CSV file:

x = {'name': 'Mark Z', 'friends': 2550, 'account': 1}
y = {'name': 'Yos R', 'friends': 150, 'account': 2345}
z = {'name': 'Tess A', 'friends': 50, 'account': 222}
items = [x, y, z]

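import base64

with open('myfile.csv', 'a', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for p in items:
        # dumps() returns bytes, so base64-encode them into a text field;
        # writerow() expects a list of fields, hence the one-item list
        writer.writerow([base64.b64encode(pickle.dumps(p)).decode('ascii')])

With a CSV file open in append mode, we use the csv writer object to write one row per serialized Python object. Because pickle.dumps() returns bytes in Python 3 and a CSV file holds text, each pickle is base64-encoded first. Note that writerow() takes a sequence of fields; passing it a bare string would write every character as its own column. The csv writer and reader objects also take optional parameters for producing other CSV formats, for example:

delimiter=' ',
quotechar='|',
quoting=csv.QUOTE_MINIMAL

You can read more about these optional parameters in the Python docs.

Now that we have our CSV file of serialized Python dictionaries, we can recover them by reading the rows with a csv reader object and deserializing each one with pickle:

with open('myfile.csv', 'r', newline='') as csvfile:
    objects = []
    reader = csv.reader(csvfile)
    for row in reader:
        # each row holds a single field: one base64-encoded pickle
        objects.append(pickle.loads(base64.b64decode(row[0])))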

Here, we are iterating through each row in our CSV file and deserializing the stored values back into their original Python object form.
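
For the optional csv parameters (delimiter, quotechar, quoting), here is a small sketch of a round trip with a non-default format. The sample rows are made up, and an in-memory io.StringIO buffer stands in for a real file. The reader needs the same delimiter and quotechar as the writer to parse the rows back correctly.

```python
import csv
import io

rows = [['Mark Z', 'likes | pipes'], ['Yos R', 'two words']]

# In-memory stand-in for a file opened with newline=''
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
print(buffer.getvalue())

# Rewind and read the rows back with matching format parameters
buffer.seek(0)
reader = csv.reader(buffer, delimiter=' ', quotechar='|')
recovered = list(reader)
assert recovered == rows
```

With QUOTE_MINIMAL, only fields that contain the delimiter, the quote character, or a line break are wrapped in the quote character.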

We have now recovered the original Python dictionaries x, y, and z in our objects list! Alternatively, instead of using dumps() and loads(), which return and accept serialized data in memory, you can use dump() and load() to write to and read from a file object directly (open the file in binary mode). Do read the library documentation for their respective API definitions.
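
As a minimal sketch of the dump() and load() variant, the snippet below writes several objects to one binary file and reads them back. The file name items.pickle and the temporary directory are assumptions made for this example.

```python
import os
import pickle
import tempfile

items = [
    {'name': 'Mark Z', 'friends': 2550, 'account': 1},
    {'name': 'Yos R', 'friends': 150, 'account': 2345},
]

# Hypothetical file name inside a throwaway directory
path = os.path.join(tempfile.mkdtemp(), 'items.pickle')

# dump() writes to a file object opened in binary mode
with open(path, 'wb') as f:
    for item in items:
        pickle.dump(item, f)

# load() reads one pickled object per call and raises EOFError at end of file
objects = []
with open(path, 'rb') as f:
    while True:
        try:
            objects.append(pickle.load(f))
        except EOFError:
            break

assert objects == items
```

Pickling the whole list with a single dump() call would also work and avoids the EOFError loop; writing one object per call mirrors the row-per-object layout of the CSV example.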

Finally, it’s also worth noting that in Python 2 the pickle library has a more optimized cousin called cPickle, which is written in C and exposes a similar API. You can trivially use cPickle by importing it as such:

import cPickle as pickle

On Python 2, you should use cPickle in production rather than pickle, simply because of how much faster it is in comparison (reportedly up to 1000 times faster). On Python 3 there is no separate cPickle module: a plain import pickle already uses the C implementation when it is available.
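
If the same script has to run on both Python 2 and Python 3, a common idiom is to try the cPickle import first and fall back to pickle. This is a sketch of that idiom with made-up sample data, not something the rest of the article depends on.

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: already backed by C when available

data = {'name': 'Tess A', 'friends': 50, 'account': 222}

# The rest of the code uses the same API regardless of which import succeeded
assert pickle.loads(pickle.dumps(data)) == data
```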

Author

Yos is a software craftsman based in Singapore.