Custom Type Example

Warning

The following examples document a deprecated feature. The SONManipulator API has limitations as a technique for transforming your data. Instead, it is more flexible and straightforward to transform outgoing documents in your own code before passing them to PyMongo, and transform incoming documents after receiving them from PyMongo.

Thus the add_son_manipulator() method is deprecated. PyMongo 3’s new CRUD API does not apply SON manipulators to documents passed to bulk_write(), insert_one(), insert_many(), update_one(), or update_many(). SON manipulators are not applied to documents returned by the new methods find_one_and_delete(), find_one_and_replace(), and find_one_and_update().

This is an example of using a custom type with PyMongo. The example here is a bit contrived, but it shows how to use a SONManipulator to manipulate documents as they are saved to or retrieved from MongoDB. More specifically, it shows a couple of different mechanisms for working with custom datatypes in PyMongo.

Setup

We’ll start by getting a clean database to use for the example:

>>> from pymongo.mongo_client import MongoClient
>>> client = MongoClient()
>>> client.drop_database("custom_type_example")
>>> db = client.custom_type_example

Since the purpose of the example is to demonstrate working with custom types, we’ll need a custom datatype to use. Here we define the aptly named Custom class, which has a single method, x():

>>> class Custom(object):
...   def __init__(self, x):
...     self.__x = x
...
...   def x(self):
...     return self.__x
...
>>> foo = Custom(10)
>>> foo.x()
10

When we try to save an instance of Custom with PyMongo, we’ll get an InvalidDocument exception:

>>> db.test.insert({"custom": Custom(5)})
Traceback (most recent call last):
InvalidDocument: cannot convert value of type <class 'Custom'> to bson

Manual Encoding

One way to work around this is to convert our data into something PyMongo can save. To do so we define two functions, encode_custom() and decode_custom():

>>> def encode_custom(custom):
...   return {"_type": "custom", "x": custom.x()}
...
>>> def decode_custom(document):
...   assert document["_type"] == "custom"
...   return Custom(document["x"])
...

We can now manually encode and decode Custom instances and use them with PyMongo:

>>> import pprint
>>> db.test.insert({"custom": encode_custom(Custom(5))})
ObjectId('...')
>>> pprint.pprint(db.test.find_one())
{u'_id': ObjectId('...'),
 u'custom': {u'_type': u'custom', u'x': 5}}
>>> decode_custom(db.test.find_one()["custom"])
<Custom object at ...>
>>> decode_custom(db.test.find_one()["custom"]).x()
5
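The round trip above can also be checked without a running server. The following is a minimal sketch that repeats the Custom class and the two functions so the block runs on its own; the variable names encoded and decoded are illustrative and do not appear in the original example.

```python
class Custom(object):
    """Same toy type as in the setup section."""

    def __init__(self, x):
        self.__x = x

    def x(self):
        return self.__x


def encode_custom(custom):
    # Build a plain dict; the "_type" field marks what to decode later.
    return {"_type": "custom", "x": custom.x()}


def decode_custom(document):
    # Refuse documents that were not produced by encode_custom().
    assert document["_type"] == "custom"
    return Custom(document["x"])


# Illustrative names: no database is involved in this round trip.
encoded = encode_custom(Custom(5))
decoded = decode_custom(encoded)
print(encoded)
print(decoded.x())
```

Because the encoded form is an ordinary dict of an int and a str, it is the kind of value PyMongo can store directly.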

Automatic Encoding and Decoding

Needless to say, that was a little unwieldy. Let’s make this a bit more seamless by creating a new SONManipulator. SONManipulator instances allow you to specify transformations to be applied automatically by PyMongo:

>>> from pymongo.son_manipulator import SONManipulator
>>> class Transform(SONManipulator):
...   def transform_incoming(self, son, collection):
...     for (key, value) in son.items():
...       if isinstance(value, Custom):
...         son[key] = encode_custom(value)
...       elif isinstance(value, dict): # Make sure we recurse into sub-docs
...         son[key] = self.transform_incoming(value, collection)
...     return son
...
...   def transform_outgoing(self, son, collection):
...     for (key, value) in son.items():
...       if isinstance(value, dict):
...         if "_type" in value and value["_type"] == "custom":
...           son[key] = decode_custom(value)
...         else: # Again, make sure to recurse into sub-docs
...           son[key] = self.transform_outgoing(value, collection)
...     return son
...

Now we add our manipulator to the Database:

>>> db.add_son_manipulator(Transform())

After doing so we can save and restore Custom instances seamlessly:

>>> db.test.remove() # remove whatever has already been saved
{...}
>>> db.test.insert({"custom": Custom(5)})
ObjectId('...')
>>> pprint.pprint(db.test.find_one())
{u'_id': ObjectId('...'),
 u'custom': <Custom object at ...>}
>>> db.test.find_one()["custom"].x()
5

If we get a new Database instance, the SONManipulator we added is no longer attached to it:

>>> db = client.custom_type_example

This allows us to see what was actually saved to the database:

>>> pprint.pprint(db.test.find_one())
{u'_id': ObjectId('...'),
 u'custom': {u'_type': u'custom', u'x': 5}}

This is the same format that our encode_custom() function produces.
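As the warning at the top of this page recommends, the same recursion can be written as plain functions and applied in your own code, without the deprecated SONManipulator API. The sketch below is one possible way to do that; encode_document() and decode_document() are hypothetical helper names, not part of PyMongo. Like the manipulator above, they recurse into sub-documents but do not look inside lists.

```python
class Custom(object):
    """Same toy type as in the setup section."""

    def __init__(self, x):
        self.__x = x

    def x(self):
        return self.__x


def encode_document(document):
    """Return a copy of document with Custom values replaced by tagged dicts."""
    encoded = {}
    for key, value in document.items():
        if isinstance(value, Custom):
            encoded[key] = {"_type": "custom", "x": value.x()}
        elif isinstance(value, dict):  # recurse into sub-documents
            encoded[key] = encode_document(value)
        else:
            encoded[key] = value
    return encoded


def decode_document(document):
    """Return a copy of document with tagged dicts turned back into Custom."""
    decoded = {}
    for key, value in document.items():
        if isinstance(value, dict):
            if value.get("_type") == "custom":
                decoded[key] = Custom(value["x"])
            else:  # recurse into sub-documents
                decoded[key] = decode_document(value)
        else:
            decoded[key] = value
    return decoded


original = {"name": "example", "nested": {"custom": Custom(5)}}
stored = encode_document(original)
restored = decode_document(stored)

# With a live server, the calls would look roughly like this (assumption:
# db is a Database obtained as in the setup section):
#   db.test.insert_one(encode_document(original))
#   restored = decode_document(db.test.find_one())
```

Both helpers build new dicts rather than modifying their argument, so the caller's original document is left untouched.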

Binary Encoding

We can take this one step further by encoding to binary, using a user-defined subtype. This allows us to identify what to decode without resorting to tricks like the _type field used above.

We’ll start by defining the functions to_binary() and from_binary(), which convert Custom instances to and from Binary instances:

Note

You could just pickle the instance and save that. What we do here is a little more lightweight.

>>> from bson.binary import Binary
>>> def to_binary(custom):
...   return Binary(str(custom.x()).encode(), 128)
...
>>> def from_binary(binary):
...   return Custom(int(binary))
...
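To see what the payload looks like without installing the bson package, here is a rough sketch of the same round trip. FakeBinary is a hypothetical stand-in for Binary (bytes that carry a subtype attribute), written only for this illustration, and the functions operate on the integer payload directly rather than on a Custom instance.

```python
class FakeBinary(bytes):
    """Hypothetical stand-in for bson.binary.Binary: bytes plus a subtype."""

    def __new__(cls, data, subtype=0):
        self = bytes.__new__(cls, data)
        self.subtype = subtype
        return self


USER_SUBTYPE = 128  # the subtype value used in the text above


def to_binary(x):
    # Store the decimal text of the integer as the binary payload.
    return FakeBinary(str(x).encode(), USER_SUBTYPE)


def from_binary(binary):
    # int() accepts bytes containing a decimal literal.
    return int(binary)


payload = to_binary(5)
print(payload, payload.subtype)
print(from_binary(payload))
```

The subtype travels with the bytes, which is what lets the decoding side recognize the value without a separate _type field.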

Next we’ll create another SONManipulator, this time using the methods we just defined:

>>> class TransformToBinary(SONManipulator):
...   def transform_incoming(self, son, collection):
...     for (key, value) in son.items():
...       if isinstance(value, Custom):
...         son[key] = to_binary(value)
...       elif isinstance(value, dict):
...         son[key] = self.transform_incoming(value, collection)
...     return son
...
...   def transform_outgoing(self, son, collection):
...     for (key, value) in son.items():
...       if isinstance(value, Binary) and value.subtype == 128:
...         son[key] = from_binary(value)
...       elif isinstance(value, dict):
...         son[key] = self.transform_outgoing(value, collection)
...     return son
...

Now we’ll empty the test collection and add the new manipulator:

>>> db.test.remove()
{...}
>>> db.add_son_manipulator(TransformToBinary())

After doing so we can save and restore Custom instances seamlessly:

>>> db.test.insert({"custom": Custom(5)})
ObjectId('...')
>>> pprint.pprint(db.test.find_one())
{u'_id': ObjectId('...'),
 u'custom': <Custom object at ...>}
>>> db.test.find_one()["custom"].x()
5

We can see what’s actually being saved to the database (and verify that it is using a Binary instance) by clearing out the manipulators and repeating our find_one():

>>> db = client.custom_type_example
>>> pprint.pprint(db.test.find_one())
{u'_id': ObjectId('...'), u'custom': Binary('5', 128)}