At Data Wow, we use MongoDB to store the metadata of our images (which are used to train our CNN models). Some of us may have never used NoSQL before (like myself, two months ago), so I hope this blog post will get you started!
MongoDB is a NoSQL database. Unlike traditional relational databases, it does not organize data into tables and rows; instead, it stores data as documents in collections.
You can think of a document as one entry in your collection. The format of a document is similar to a JSON object. As a result, documents in the same collection do not all need to have the same set of fields, whereas in a relational database the format of every row is fixed when the table is created.
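To make the last point concrete, here is a minimal sketch in plain Python (no database required) of two documents that could sit in the same collection, written as dictionaries. The field names mirror the examples later in this post; the variable names are only illustrative.

```python
# Two documents that could live in the same collection, written as Python dicts.
john = {"first_name": "John", "surname": "Doe", "age": 30}
jane = {"first_name": "Jane", "surname": "Doe"}  # no "age" field, and that is fine

# Fields that appear in one document but not in the other.
only_in_john = set(john) - set(jane)
print(only_in_john)  # {'age'}
```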
If you want to work with MongoDB databases using Python, PyMongo is the tool you will need. This section presents the basic CRUD commands you will often use in order to manipulate the database.
Note that there are two ways to access a MongoDB database: through the Mongo Shell and through PyMongo.
The two differ only slightly in their syntax.
For example, findOneAndUpdate in the Mongo Shell would look like:
db.collection.findOneAndUpdate(
{ "name" : "A. MacDyver" },
{ $inc : { "points" : 5 } },
{ sort : { "points" : 1 } }
)
And the PyMongo find_one_and_update syntax would look like:
db.collection.find_one_and_update(
{ "name" : "A. MacDyver" },
{ "$inc" : { "points" : 5 } },
sort=[("points", 1)]
)
The differences are the method name (camelCase vs snake_case), the quotation marks required around operators such as $inc, and the options, which PyMongo takes as keyword arguments rather than as a document.
Personally, I find the documentation for the Mongo Shell a lot friendlier, so I usually refer to it and adapt the examples to the PyMongo syntax.
If you already have pip, you can simply run the following command:
pip install pymongo
More information is available in the pip and PyMongo installation guides.
First of all, you need to import PyMongo into your program. Then, create a MongoClient object to connect to your database server.
For example, if you want to connect to the MongoDB server at IP address ‘192.168.4.12’, to the database named ‘example_db’, and the collection named ‘example_col’, you can use the following code:
>>> import pymongo
>>> ex_col = pymongo.MongoClient("192.168.4.12").example_db.example_col
The variable ‘ex_col’ is a Mongo Collection object. More specifically:
>>> ex_col
Collection(Database(MongoClient(host=['192.168.4.12:27017'], document_class=dict, tz_aware=False, connect=True), u'example_db'), u'example_col')
Now, you can run commands on this object to manipulate the ‘example_col’ collection.
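Attribute access such as .example_db only works when the name is a valid Python identifier. PyMongo also accepts dictionary-style access, which helps when the names come from configuration or contain characters such as hyphens. A minimal sketch, where get_collection() is a hypothetical helper and not part of PyMongo:

```python
def get_collection(client, db_name, col_name):
    """Return the collection `col_name` from the database `db_name`.

    `client` is expected to be a pymongo.MongoClient. Bracket access is
    equivalent to attribute access (client.example_db.example_col).
    """
    return client[db_name][col_name]
```

With a real client, this would be called as get_collection(pymongo.MongoClient("192.168.4.12"), "example_db", "example_col").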
You can use the command find() to query your database.
>>> ex_col.find({"surname": "Doe"})
<pymongo.cursor.Cursor object at 0x10ee1ec90>
The above command returns a Cursor object over all the documents matching the filter criteria: the value of the ‘surname’ field must equal ‘Doe’.
You can put the query results into a list of Python dictionaries.
>>> dict_list = list( ex_col.find({"surname": "Doe"}) )
>>> dict_list[0:2]
[ { u'first_name': u'John',
u'_id': ObjectId('5a8fc0c462ee97ee4e5806cf'),
u'surname': u'Doe',
u'age': 30
},
{ u'first_name': u'Jane',
u'_id': ObjectId('5a8fc36b62ee97ee4e580938'),
u'surname': u'Doe'
}
]
You may notice that the two documents can contain different sets of fields.
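Because fields can be missing, code that reads query results should not assume every key is present. The sketch below works on any iterable of dictionaries, so it should accept either a list like the one above or the Cursor itself. Here describe_people() is a hypothetical helper, and the 'unknown' placeholder is an arbitrary choice.

```python
def describe_people(docs):
    """Build one summary string per document, tolerating a missing 'age'."""
    lines = []
    for doc in docs:
        # dict.get() returns the default instead of raising KeyError.
        age = doc.get("age", "unknown")
        lines.append(f"{doc['first_name']} {doc['surname']} (age: {age})")
    return lines


sample = [
    {"first_name": "John", "surname": "Doe", "age": 30},
    {"first_name": "Jane", "surname": "Doe"},
]
print(describe_people(sample))
# ['John Doe (age: 30)', 'Jane Doe (age: unknown)']
```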
The find_one() command returns the first document that matches the filter criteria, or None if no document matches.
>>> ex_col.find_one({"surname": "Doe"})
{ u'first_name': u'John',
u'_id': ObjectId('5a8fc0c462ee97ee4e5806cf'),
u'surname': u'Doe',
u'age': 30
}
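Since a query may match nothing, it is worth checking the result of find_one() for None before reading fields from it. A minimal sketch, with age_of() as a hypothetical helper:

```python
def age_of(collection, surname):
    """Return the 'age' of the first document with this surname, or None."""
    doc = collection.find_one({"surname": surname})
    if doc is None:
        # No document matched the filter.
        return None
    # The matched document may still lack an 'age' field.
    return doc.get("age")
```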
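Calling find() without any filter returns every document in the collection.
To see the total number of documents in the collection, use count_documents() with an empty filter. (The older find().count() was deprecated in PyMongo 3.7 and removed in 4.0.)
>>> ex_col.count_documents({})
185914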
You can easily insert a document using the insert_one() command. (The older insert() command was deprecated in PyMongo 3.0 and removed in 4.0.)
>>> ex_col.insert_one({
"first_name": "Natchuta",
"surname": "Wattanapenpaiboon",
"age": 99
})
>>> ex_col.find_one({"surname": "Wattanapenpaiboon"})
{ u'first_name': u'Natchuta',
u'_id': ObjectId('5a8fcc7162ee97ee4e581338'),
u'surname': u'Wattanapenpaiboon',
u'age': 99
}
If you do not supply an ‘_id’ field yourself, one is generated automatically. Its value is unique within the collection.
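The insert methods return a result object rather than the document itself. As a minimal sketch, assuming PyMongo 3.0 or later, a hypothetical helper add_people() could insert several documents at once with insert_many() and hand back the generated _id values:

```python
def add_people(collection, people):
    """Insert several documents and return their generated _id values."""
    if not people:
        # insert_many() rejects an empty list, so return early.
        return []
    result = collection.insert_many(people)
    # InsertManyResult.inserted_ids lists the _id values in insertion order.
    return result.inserted_ids
```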
The update_one() command has the following syntax:
ex_col.update_one(<filter>, <update>)
Recall:
>>> ex_col.find_one({"surname": "Wattanapenpaiboon"})
{ u'first_name': u'Natchuta',
u'_id': ObjectId('5a8fcc7162ee97ee4e581338'),
u'surname': u'Wattanapenpaiboon',
u'age': 99
}
If we want to update the age from 99 to 20, we can do:
>>> ex_col.update_one(
{"surname": "Wattanapenpaiboon"},
{ "$set":
{ "age": 20
}
}
)
The update_one() method updates only the first document that matches the filter. (The legacy update() method, which behaved the same way by default, was deprecated in PyMongo 3.0 and removed in 4.0.)
If you want to update every document that satisfies the filter criteria, use the update_many() method instead:
>>> ex_col.update_many(
{"surname": "Wattanapenpaiboon"},
{ "$set":
{ "age": 20
}
}
)
This is equivalent to calling the legacy update() method with the ‘multi’ option set to True.
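The update_one() and update_many() methods return a result object that reports how many documents were affected, which is a cheap way to confirm that a filter matched what you expected. A minimal sketch, with set_age() as a hypothetical helper:

```python
def set_age(collection, surname, age):
    """Set 'age' on every document with this surname.

    Returns (matched, modified): the number of documents matched by the
    filter, and the number whose stored value actually changed.
    """
    result = collection.update_many(
        {"surname": surname},
        {"$set": {"age": age}},
    )
    return result.matched_count, result.modified_count
```

The second number can be smaller than the first when some of the matched documents already had the requested age.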
Finally, the delete_one() and delete_many() commands have the syntax:
ex_col.delete_one(<filter>)
ex_col.delete_many(<filter>)
The delete_one() command deletes the first document that satisfies the filter criteria, while the delete_many() command deletes all matching documents.
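An empty filter matches every document, so delete_many({}) empties the collection. A minimal sketch of a guard against doing that by accident, with delete_matching() as a hypothetical helper:

```python
def delete_matching(collection, query):
    """Delete all documents matching `query` and return how many were removed."""
    if not query:
        # An empty filter would match (and delete) every document.
        raise ValueError("refusing to delete with an empty filter")
    result = collection.delete_many(query)
    # DeleteResult.deleted_count is the number of documents removed.
    return result.deleted_count
```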