Introduction to MongoDB & PyMongo

At Data Wow, we use MongoDB to store the metadata of our images (which are used to train our CNN models). Some of us may have never used NoSQL before (like myself, two months ago), so I hope this blog post will get you started!

MongoDB

MongoDB is a NoSQL database. Unlike a traditional relational database, it does not organize data into tables and rows; instead, it stores data as documents in collections.

Relational VS MongoDB | Source: https://www.mongodb.com/blog/post/getting-started-with-python-and-mongodb

You can think of a document as one entry in your collection. The format of a document is similar to a JSON object, and MongoDB does not force a fixed schema on a collection. Documents in the same collection therefore do not all need the same set of fields, whereas in a relational database the columns of every row are fixed when the table is created.
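To make the idea concrete, here is a minimal sketch that uses plain Python dictionaries as stand-ins for documents. No database is involved, and the field values are made up for illustration.

```python
# Two documents that could live in the same collection.
# The values are invented for this illustration.
john = {"first_name": "John", "surname": "Doe", "age": 30}
jane = {"first_name": "Jane", "surname": "Doe"}  # no "age" field

people = [john, jane]

# Fields present in every document versus fields seen in at least one.
common_fields = set.intersection(*(set(doc) for doc in people))
all_fields = set.union(*(set(doc) for doc in people))

print(sorted(common_fields))  # ['first_name', 'surname']
print(sorted(all_fields))     # ['age', 'first_name', 'surname']
```

A relational table would need an 'age' column for both rows (with a NULL for Jane); here the second document simply omits the field.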

PyMongo

If you want to work with MongoDB databases using Python, PyMongo is the tool you will need. This section presents the basic CRUD commands you will often use in order to manipulate the database.

Mongo Shell vs PyMongo

Note that there are two ways to access a MongoDB database:

  • Using the Mongo Shell
  • Through PyMongo

The two differ only slightly in their syntax.

For example, findOneAndUpdate in the Mongo Shell would look like:

db.collection.findOneAndUpdate(
   { "name" : "A. MacDyver" },
   { $inc : { "points" : 5 } },
   { sort : { "points" : 1 } }
)

And the PyMongo find_one_and_update syntax would look like:

db.collection.find_one_and_update(
   { "name" : "A. MacDyver" },
   { "$inc" : { "points" : 5 } },
   sort=[("points", 1)]
)

The differences are the method name (camelCase vs snake_case), the quotation marks around operators such as $inc, and the way options such as sort are passed (as keyword arguments in PyMongo).

Personally, I find the Mongo Shell documentation a lot more friendly, so I usually refer to it and adapt the examples to the PyMongo syntax.
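As one more illustration of that adaptation, the sketch below translates a typical shell query into PyMongo. The function name oldest_adults and the 'age' threshold are hypothetical, and the collection argument is assumed to be a PyMongo Collection object.

```python
def oldest_adults(collection, limit=5):
    """PyMongo version of the shell query
    db.collection.find({ age: { $gt: 18 } }).sort({ age: -1 }).limit(5)
    """
    cursor = (
        collection.find({"age": {"$gt": 18}})  # operators become quoted strings
        .sort("age", -1)                        # sort takes (key, direction), not a dict
        .limit(limit)
    )
    return list(cursor)
```

The filter document carries over almost unchanged; the part that usually needs attention is how options such as sort are expressed.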

Installing PyMongo

If you already have pip, you can simply run the following command:

pip install pymongo

For more information, see the pip and PyMongo installation guides.

Accessing the database

First of all, you need to import PyMongo into your program. Then, create a MongoClient object to connect to your database.

For example, if you want to connect to the MongoDB server at IP address ‘192.168.4.12’ and access the database named ‘example_db’ and its collection named ‘example_col’, you can use the following code:

>>> import pymongo
>>> ex_col = pymongo.MongoClient("192.168.4.12").example_db.example_col

The variable ‘ex_col’ is a Mongo Collection object. More specifically:

>>> ex_col
Collection(Database(MongoClient(host=['192.168.4.12:27017'], document_class=dict, tz_aware=False, connect=True), u'example_db'), u'example_col')

Now, you can run commands on this object to manipulate the ‘ex_col’ collection.
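Attribute access (client.example_db.example_col) only works when the names are valid Python identifiers. PyMongo also supports dictionary-style access, which is handy when the names come from configuration. The helper below is a minimal sketch; get_collection is a hypothetical name, not part of PyMongo.

```python
def get_collection(client, db_name, col_name):
    """Look up a collection by name using bracket (dictionary-style) access."""
    return client[db_name][col_name]


# Usage, assuming `import pymongo` and a reachable MongoDB server:
#   client = pymongo.MongoClient("mongodb://192.168.4.12:27017")
#   ex_col = get_collection(client, "example_db", "example_col")
```

Port 27017 is MongoDB's default, which is why it shows up in the Collection output even though the original connection string did not mention it.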

Find

You can use the command find() to query your database.

>>> ex_col.find({"surname": "Doe"})
<pymongo.cursor.Cursor object at 0x10ee1ec90>

The above command returns a Cursor object over all the documents matching the filter criteria: the value of the field ‘surname’ must equal ‘Doe’.

You can put the query results into a list of Python dictionaries.

>>> dict_list = list( ex_col.find({"surname": "Doe"}) )
>>> dict_list[0:2]
[   {    u'first_name': u'John', 
         u'_id': ObjectId('5a8fc0c462ee97ee4e5806cf'), 
         u'surname': u'Doe', 
         u'age': 30
    }, 
    {    u'first_name': u'Jane', 
         u'_id': ObjectId('5a8fc36b62ee97ee4e580938'), 
         u'surname': u'Doe'
    }
]

You may notice that the two documents can contain different sets of fields.
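Because fields can be missing, code that reads query results should not assume every key exists. The sketch below shows two ways to cope, assuming collection is a PyMongo Collection; both function names are hypothetical.

```python
def ages_by_first_name(collection, surname):
    """Map first_name -> age for one surname; age is None when the field is absent."""
    ages = {}
    for doc in collection.find({"surname": surname}):
        # dict.get avoids a KeyError for documents that have no "age" field
        ages[doc["first_name"]] = doc.get("age")
    return ages


def find_without_age(collection):
    """Return documents that have no "age" field, using the $exists operator."""
    return list(collection.find({"age": {"$exists": False}}))
```

The first function handles the gap on the Python side, while the second asks MongoDB to return only the documents that lack the field.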

The find_one() command returns the first document that matches the filter criteria, or None if no document matches.

>>> ex_col.find_one({"surname": "Doe"})
{   u'first_name': u'John', 
    u'_id': ObjectId('5a8fc0c462ee97ee4e5806cf'), 
    u'surname': u'Doe', 
    u'age': 30
}
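Since find_one() gives back None when nothing matches, it is worth checking the result before indexing into it. A minimal sketch, with get_age as a hypothetical helper name and collection assumed to be a PyMongo Collection:

```python
def get_age(collection, surname, default=None):
    """Return the age of the first document with this surname, or a default."""
    doc = collection.find_one({"surname": surname})
    if doc is None:  # no document matched the filter
        return default
    return doc.get("age", default)  # the document may also lack an "age" field
```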

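Calling find() without any filter matches every document in the collection.

To see the total number of documents in the collection, use count_documents() with an empty filter. (Older PyMongo 3.x code calls .count() on the cursor instead; that method was removed in PyMongo 4.0.)

>>> ex_col.count_documents({})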
185914

Insert

You can easily insert a document using the insert_one() command. (The older insert() command was deprecated in PyMongo 3.0 and removed in 4.0.)

>>> ex_col.insert_one({
        "first_name": "Natchuta", 
        "surname": "Wattanapenpaiboon", 
        "age": 99
    })
>>> ex_col.find_one({"surname": "Wattanapenpaiboon"})
{   u'first_name': u'Natchuta', 
    u'_id': ObjectId('5a8fcc7162ee97ee4e581338'), 
    u'surname': u'Wattanapenpaiboon', 
    u'age': 99
}

If you do not supply an ‘_id’ field yourself, one is generated automatically. Its value is unique within the collection.
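To insert several documents in one call, PyMongo 3.0 and later provide insert_many(), which returns a result object carrying the generated ‘_id’ values. The helper name add_people below is hypothetical, and collection is assumed to be a PyMongo Collection.

```python
def add_people(collection, people):
    """Insert several documents at once and return their generated _id values."""
    result = collection.insert_many(people)
    # inserted_ids is a list of _id values, in the same order as `people`
    return result.inserted_ids
```

For a single document, insert_one() works the same way and exposes the new identifier as result.inserted_id.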

Update

The update_one() command has the following syntax:

ex_col.update_one(<filter>, <update>)

Recall:

>>> ex_col.find_one({"surname": "Wattanapenpaiboon"})
{   u'first_name': u'Natchuta', 
    u'_id': ObjectId('5a8fcc7162ee97ee4e581338'), 
    u'surname': u'Wattanapenpaiboon', 
    u'age': 99
}

If we want to update the age from 99 to 20, we can do:

>>> ex_col.update_one(
        {"surname": "Wattanapenpaiboon"},
        {   "$set":
            {   "age": 20
            }
        }
    )

The update_one() method updates only the first document that matches the filter.

If you want to update all documents that satisfy the filter criteria, use the update_many() method instead:

>>> ex_col.update_many(
        {"surname": "Wattanapenpaiboon"},
        {   "$set":
            {    "age": 20
            }
        }
    )

Older code may use update(), which updates one document by default and all matching documents when called with the keyword argument multi=True. That method was deprecated in PyMongo 3.0 and removed in 4.0.
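In PyMongo 3.0 and later, update_one() and update_many() return a result object that reports how many documents were touched, which is a quick way to confirm the filter did what you expected. A minimal sketch, with set_age as a hypothetical helper and collection assumed to be a PyMongo Collection:

```python
def set_age(collection, surname, age):
    """Set the same age on every document with the given surname."""
    result = collection.update_many(
        {"surname": surname},
        {"$set": {"age": age}},
    )
    # matched_count: documents that satisfied the filter
    # modified_count: documents whose contents actually changed
    return result.matched_count, result.modified_count
```

The two counts can differ: a document that already has the target age is matched but not modified.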

Delete

Finally, the delete_one() and delete_many() commands have the syntax:

ex_col.delete_one(<filter>)
ex_col.delete_many(<filter>)

The delete_one() command deletes the first document that satisfies the filter criteria, while the delete_many() command deletes all matched documents.
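Both delete commands return a result object whose deleted_count attribute tells you how many documents were removed. The sketch below assumes collection is a PyMongo Collection; remove_surname is a hypothetical helper name.

```python
def remove_surname(collection, surname):
    """Delete every document with the given surname and return how many were removed."""
    result = collection.delete_many({"surname": surname})
    return result.deleted_count
```

Be careful with the filter: passing an empty filter ({}) to delete_many() removes every document in the collection.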
