MongoDB with Python Crash Course - Tutorial for Beginners

Learn the most popular NoSQL / document database: MongoDB. In this quickstart tutorial, you'll be up and running with MongoDB and Python.


⭐️Course Contents⭐️

⌨️ (0:00:00) Welcome

⌨️ (0:04:33) Intro to MongoDB

⌨️ (0:07:49) How do document DBs work?

⌨️ (0:10:34) Who uses MongoDB

⌨️ (0:13:02) Data modeling

⌨️ (0:16:30) Modeling guidelines

⌨️ (0:22:11) Integration database

⌨️ (0:24:23) Getting demo code

⌨️ (0:30:07) How ODMs work?

⌨️ (0:32:55) Introduction to mongoengine

⌨️ (0:34:01) Demo: Registering connections with MongoEngine

⌨️ (0:37:20) Concept: Registering connections

⌨️ (0:39:14) Demo: Defining mongoengine entities (classes)

⌨️ (0:45:22) Concept: mongoengine entities

⌨️ (0:49:03) Demo: Create a new account

⌨️ (0:56:55) Demo: Robo 3T for viewing and managing data

⌨️ (0:58:18) Demo: Login

⌨️ (1:00:07) Demo: Register a cage

⌨️ (1:10:28) Demo: Add a bookable time as a host

⌨️ (1:16:13) Demo: Managing your snakes as a guest

⌨️ (1:19:18) Demo: Book a cage as a guest

⌨️ (1:33:41) Demo: View your bookings as guest

⌨️ (1:41:29) Demo: View bookings as host

⌨️ (1:46:18) Concept: Inserting documents

⌨️ (1:47:28) Concept: Queries

⌨️ (1:48:09) Concept: Querying subdocuments with mongoengine

⌨️ (1:49:37) Concept: Query using operators

⌨️ (1:50:24) Concept: Updating via whole documents

⌨️ (1:51:46) Concept: Updating via in-place operators

⌨️ (1:54:01) Conclusion

Thanks to Michael Kennedy for letting us post his course. Check out his other content:

🔗 MongoDB for Developers with Python Full Course:

🔗 Python Jumpstart by Building 10 Apps:

Original video source:

How to integrate MongoDB with Python Applications

In this article, you'll learn how to integrate MongoDB with your Python applications.

MongoDB is a leading open-source NoSQL database written in C++. This tutorial explains the MongoDB concepts you need in order to integrate MongoDB into your Python applications.

The SQL vs. NoSQL Difference

SQL databases use Structured Query Language (SQL) for defining and manipulating data. When using SQL, we need a Relational Database Management System (RDBMS) server such as SQL Server, MySQL, or MS Access. Data in an RDBMS is stored in database objects called tables. A table is a collection of related data entries, and it consists of columns and rows.

A NoSQL database has a dynamic schema for unstructured data. In NoSQL, data is stored in several ways: it can be column-oriented, document-oriented, graph-based or organized as a key-value store. A NoSQL database has the following advantages:

  • Documents can be created without first defining their structure
  • Each document can have its own unique structure
  • The query syntax can vary from database to database
  • They can handle large volumes of structured, semi-structured, and unstructured data
  • They fit naturally with object-oriented programming, which makes them easy to use and flexible
  • They are horizontally scalable

NoSQL Database Types

The following are the different types of NoSQL databases:

  • Document databases pair each key with a complex data structure known as a document. A document is a set of key-value pairs. MongoDB is an example of a document store database. A group of MongoDB documents is known as a collection. This is the equivalent of an RDBMS table.
  • Graph stores are used to store information about networks of data, for instance, social connections. Graph stores include Neo4J and Giraph.
  • Key-value databases store every single item in the database as a key together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as an integer, which adds functionality.
  • Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.

Comparing MongoDB to RDBMS

In order to get a thorough understanding of the terms used in MongoDB, we'll compare them with the equivalent in RDBMS.

MongoDB and Python

In order to start using MongoDB, we first have to install it. Installation instructions can be found in the official MongoDB documentation. On Ubuntu releases that still ship a mongodb package (newer releases require MongoDB's own package repository instead), a quick install looks like this:

sudo apt update
sudo apt install -y mongodb

Once this is done we'll check the service and database by running this command on the terminal:

sudo systemctl status mongodb

● mongodb.service - An object/document-oriented database
   Loaded: loaded (/lib/systemd/system/mongodb.service; enabled; vendor preset:
   Active: active (running) since Thu 2018-09-20 13:14:02 EAT; 23h ago
     Docs: man:mongod(1)
 Main PID: 11446 (mongod)
    Tasks: 27 (limit: 4915)
   CGroup: /system.slice/mongodb.service
           └─11446 /usr/bin/mongod --unixSocketPrefix=/run/mongodb --config /etc

Sep 20 13:14:02 derrick systemd[1]: Started An object/document-oriented database

The message above means that all is well and that we are set to start using MongoDB.

Now that we have MongoDB installed we need a way to interact with it in our Python code. The official Python MongoDB driver is called PyMongo. We can install it using pip as shown below:

pip install pymongo

It's possible to interact with MongoDB from the terminal; however, for the purposes of this tutorial we'll run all our code in a Jupyter Notebook.

Making a Connection with MongoClient

The first thing we need to do is import pymongo. If the import runs without any errors, the installation succeeded.

import pymongo

Establishing a connection in MongoDB requires us to create a MongoClient to the running MongoDB instance.

from pymongo import MongoClient
client = MongoClient()

The above code connects to the default host and port (localhost:27017), but we can also specify the host and port explicitly, as shown below:

client = MongoClient("localhost", 27017)

MongoDB also has a URI format for doing this.

client = MongoClient('mongodb://localhost:27017/')

Creating a Database

To create a database in MongoDB, we use the MongoClient instance and specify a database name. MongoDB will create a database if it doesn't exist and connect to it.

db = client['datacampdb']

It is important to note that databases and collections are created lazily in MongoDB. This means that the collections and databases are created when the first document is inserted into them.

Data in MongoDB

Data in MongoDB is represented and stored as JSON-style documents. In PyMongo we use dictionaries to represent documents. Here is an example of a PyMongo document:

article = {"author": "Derrick Mwiti",
           "about": "Introduction to MongoDB and Python",
           "tags": ["mongodb", "python", "pymongo"]}

Inserting a Document

To insert a document into a collection, we use the insert_one() method. As we saw earlier, a collection is similar to a table in RDBMS while a document is similar to a row.

articles = db.articles
result = articles.insert_one(article)

When the document is inserted, a special key, _id, is generated, and its value is unique to this document. We can print the document ID as shown below:

print("First article key is: {}".format(result.inserted_id))
First article key is: 5ba5c05e2e8ca029163417f8
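The generated _id is an ObjectId: 12 bytes, printed as 24 hex characters, whose first 4 bytes hold the creation time as a big-endian count of seconds since the Unix epoch. As a minimal sketch using only the standard library (object_id_timestamp is a hypothetical helper, not part of PyMongo), we can decode the ID printed above; with PyMongo installed, ObjectId(...).generation_time returns the same information.

```python
from datetime import datetime, timezone


def object_id_timestamp(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-character hex ObjectId."""
    raw = bytes.fromhex(oid_hex)  # raises ValueError on non-hex input
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    # The first 4 bytes are a big-endian count of seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(object_id_timestamp("5ba5c05e2e8ca029163417f8"))  # 2018-09-22 04:09:02+00:00
```

The same check is a cheap way to reject malformed IDs taken from a URL before querying with them.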

The articles collection is created after inserting the first document. We can confirm this using the list_collection_names method.

print(db.list_collection_names())
['articles', 'user']

We can insert multiple documents to a collection using the insert_many() method as shown below.

article1 = {"author": "Emmanuel Kens",
            "about": "Knn and Python",
            "tags": ["Knn", "pymongo"]}
article2 = {"author": "Daniel Kimeli",
            "about": "Web Development and Python",
            "tags": ["web", "design", "HTML"]}
new_articles = articles.insert_many([article1, article2])
print("The new article IDs are {}".format(new_articles.inserted_ids))
The new article IDs are [ObjectId('5ba5c0c52e8ca029163417fa'), ObjectId('5ba5c0c52e8ca029163417fb')]
Retrieving a Single Document with find_one()

find_one() returns a single document matching the query, or None if no document matches. It returns the first match it comes across, so when we call it without a filter, as below, we get the first article we inserted into our collection.

print(articles.find_one())
{'_id': ObjectId('5ba5c0b52e8ca029163417f9'), 'author': 'Derrick Mwiti', 'about': 'Introduction to MongoDB and Python', 'tags': ['mongodb', 'python', 'pymongo']}
Finding all Documents in a Collection

MongoDB also allows us to retrieve all documents in a collection using the find method.

for article in articles.find():
    print(article)
{'_id': ObjectId('5ba5c0b52e8ca029163417f9'), 'author': 'Derrick Mwiti', 'about': 'Introduction to MongoDB and Python', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('5ba5c0c52e8ca029163417fa'), 'author': 'Emmanuel Kens', 'about': 'Knn and Python', 'tags': ['Knn', 'pymongo']}
{'_id': ObjectId('5ba5c0c52e8ca029163417fb'), 'author': 'Daniel Kimeli', 'about': 'Web Development and Python', 'tags': ['web', 'design', 'HTML']}

When building web applications, we usually get document IDs from the URL and try to retrieve them from our MongoDB collection. In order to achieve this, we first have to convert the obtained string ID into an ObjectId.

from bson.objectid import ObjectId

def get(post_id):
    # Convert the string ID taken from the URL into an ObjectId before querying
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    return document

Return Some Fields Only

Sometimes we might not want to return all the fields from our documents. Let's see how to fetch specific fields. In our case, we use 0 to specify that _id should not be fetched and 1 to specify that author and about should be fetched. We are not allowed to mix 0 and 1 values in the same projection (the _id field is the only exception), so adding "tags": 0 to the projection below would generate an error. When we specify a field with the value 0, all other fields are returned, as if they had the value 1.

for article in articles.find({}, {"_id": 0, "author": 1, "about": 1}):
    print(article)
{'author': 'Derrick Mwiti', 'about': 'Introduction to MongoDB and Python'}
{'author': 'Emmanuel Kens', 'about': 'Knn and Python'}
{'author': 'Daniel Kimeli', 'about': 'Web Development and Python'}
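To make the projection rules above concrete, here is a simplified, in-memory model of a 0/1 projection. It is only an illustration, assuming top-level fields and plain 0/1 values; the server applies the real rules, apply_projection is a hypothetical helper rather than part of PyMongo, and the _id value of 1 stands in for a real ObjectId.

```python
def apply_projection(doc: dict, projection: dict) -> dict:
    """Simplified, top-level-only model of a 0/1 find() projection."""
    fields = {k: bool(v) for k, v in projection.items() if k != "_id"}
    modes = set(fields.values())
    if len(modes) > 1:
        # Mirrors the server rule: only _id may differ from the other fields.
        raise ValueError("cannot mix inclusion (1) and exclusion (0) in one projection")
    if True in modes:
        # Inclusion projection: keep only the listed fields.
        result = {k: v for k, v in doc.items() if k in fields}
    else:
        # Exclusion projection: keep everything except the fields set to 0.
        result = {k: v for k, v in doc.items() if k not in fields}
    # _id is returned unless it is explicitly excluded.
    if "_id" in doc and projection.get("_id", 1):
        result["_id"] = doc["_id"]
    else:
        result.pop("_id", None)
    return result


article = {"_id": 1, "author": "Derrick Mwiti",
           "about": "Introduction to MongoDB and Python",
           "tags": ["mongodb", "python", "pymongo"]}
print(apply_projection(article, {"_id": 0, "author": 1, "about": 1}))
```

Passing {"author": 1, "tags": 0} raises ValueError here, just as the equivalent find() call is rejected by the server.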
Sorting the Results

We can use the sort() method to sort the results in ascending or descending order. The default order is ascending. We use 1 to signify ascending and -1 to signify descending.

doc = articles.find().sort("author", -1)

for x in doc:
    print(x)
{'_id': ObjectId('5ba5c0c52e8ca029163417fa'), 'author': 'Emmanuel Kens', 'about': 'Knn and Python', 'tags': ['Knn', 'pymongo']}
{'_id': ObjectId('5ba5c0b52e8ca029163417f9'), 'author': 'Derrick Mwiti', 'about': 'Introduction to MongoDB and Python', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('5ba5c0c52e8ca029163417fb'), 'author': 'Daniel Kimeli', 'about': 'Web Development and Python', 'tags': ['web', 'design', 'HTML']}
Updating a Document

We update a document using the update_one() method. The first parameter taken by this function is a query object defining the document to be updated. If the method finds more than one document, it will only update the first one. Let's update the name of the author in the article written by Derrick.

query = { "author": "Derrick Mwiti" }
new_author = { "$set": { "author": "John David" } }

articles.update_one(query, new_author)

for article in articles.find():
    print(article)
{'_id': ObjectId('5ba5c0b52e8ca029163417f9'), 'author': 'John David', 'about': 'Introduction to MongoDB and Python', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('5ba5c0c52e8ca029163417fa'), 'author': 'Emmanuel Kens', 'about': 'Knn and Python', 'tags': ['Knn', 'pymongo']}
{'_id': ObjectId('5ba5c0c52e8ca029163417fb'), 'author': 'Daniel Kimeli', 'about': 'Web Development and Python', 'tags': ['web', 'design', 'HTML']}
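$set is one of several in-place update operators. The sketch below is a simplified, in-memory model of what $set, $inc and $push do to a single document. It is an illustration only, assuming top-level or dotted paths without array indexes and no type checking; apply_update is a hypothetical helper, and against a real collection you would pass the same update document to update_one().

```python
import copy


def _parent_and_key(doc, path):
    """Walk a dotted path such as 'stats.views', creating subdocuments as needed."""
    *parents, key = path.split(".")
    for part in parents:
        doc = doc.setdefault(part, {})
    return doc, key


def apply_update(doc: dict, update: dict) -> dict:
    """Return a copy of doc with $set, $inc and $push applied (simplified model)."""
    new_doc = copy.deepcopy(doc)  # leave the caller's document untouched
    for operator_name, changes in update.items():
        for path, value in changes.items():
            target, key = _parent_and_key(new_doc, path)
            if operator_name == "$set":
                target[key] = value
            elif operator_name == "$inc":
                # A missing field is created with the increment as its value.
                target[key] = target.get(key, 0) + value
            elif operator_name == "$push":
                # A missing field becomes a one-element array.
                target.setdefault(key, []).append(value)
            else:
                raise ValueError(f"operator not modelled here: {operator_name}")
    return new_doc


article = {"author": "Derrick Mwiti", "tags": ["mongodb"]}
updated = apply_update(article, {"$set": {"author": "John David"},
                                 "$push": {"tags": "pymongo"},
                                 "$inc": {"stats.views": 1}})
print(updated)  # {'author': 'John David', 'tags': ['mongodb', 'pymongo'], 'stats': {'views': 1}}
```

Because the server applies operators like these inside a single document, the change is atomic for that document, which is why operator-based updates are usually preferred over rewriting whole documents.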
Limiting the Result

MongoDB enables us to limit the result of our query using the limit method. In our query below we'll limit the result to one record.

limited_result = articles.find().limit(1)
for x in limited_result:
    print(x)
{'_id': ObjectId('5ba5c0b52e8ca029163417f9'), 'author': 'John David', 'about': 'Introduction to MongoDB and Python', 'tags': ['mongodb', 'python', 'pymongo']}
MongoDB Delete Document

We use the delete_one() method to delete a document in MongoDB. The first parameter for this method is the query object that selects the document we want to delete. If the query matches more than one document, only the first one found is deleted. Let's delete the article with the ID 5ba4cbe42e8ca029163417ce.

articles.delete_one({"_id": ObjectId("5ba4cbe42e8ca029163417ce")})

Deleting Many Documents

In order to delete many documents, we use the delete_many() method. Passing an empty query object will delete all the documents.

delete_articles = articles.delete_many({})
print(delete_articles.deleted_count, " articles deleted.")
3  articles deleted.
Dropping a Collection

In MongoDB, we can delete a collection using the drop() method.

articles.drop()

We can confirm that the collection has been deleted by calling list_collection_names again: articles no longer appears in the returned list.

print(db.list_collection_names())

It is impossible to cover every MongoDB method in this tutorial. I recommend that readers visit the official PyMongo and MongoDB documentation to learn more.

MongoDB object document mapper (ODM)

In SQL, we have object-relational mappers (ORMs) that provide an abstraction when working with SQL. MongoDB has something similar known as an object-document mapper (ODM). MongoEngine is a library that provides a high-level abstraction on top of PyMongo. Run the command below to install it using pip.

pip install mongoengine

There are quite a few other MongoDB ODMs that we can experiment with before choosing the best option for our use case. Examples include ming, minimongo, and mongokit.

After we have imported mongoengine, we use the connect function and specify the database, port, and the host in order to establish a connection with the MongoDB instance.

from mongoengine import *
connect('datacampdb', host='localhost', port=27017)
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary())
Defining our Documents

Let's assume that we are developing a social site that allows users to post messages. This means that we need a users document and a posts document. Just as we would with an ORM for a relational database, we define the fields a user will have and their data types. We create the document by subclassing the Document class from mongoengine. required=True means that we have to specify this field when creating a user; otherwise, mongoengine raises a validation error when the document is saved.

class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=30)
    last_name = StringField(max_length=30)

Now let's show how we'd create a posts document and reference the users document. The ReferenceField enables us to make reference from one document to another in mongoengine.

class Post(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField(User)
Saving Documents

To save the document to the database, call the save() method. If the document does not exist in the database, it will be created. If it does already exist, then changes will be updated atomically.

user = User(email="[email protected]", first_name="Derrick", last_name="Mwiti")
user.save()

Accessing the fields of the newly created document is very similar to other ORMs:

print(user.id, user.email, user.first_name, user.last_name)
5ba5c3bf2e8ca029163417fc [email protected] Derrick Mwiti

In this tutorial, we have learned how we can use MongoDB in Python. We've also introduced mongoengine, an Object Document Mapper that makes it easier for us to interact with MongoDB in Python.

Python+MongoDB = Rapid & scalable app development

Accessing MongoDB from Python applications is easy and familiar to many Python developers. PyMongo uses Python's rich dictionary support to provide an API similar to MongoDB's native JavaScript query syntax. That said, getting good performance still requires some understanding of how queries execute. There is also a second API built upon MongoDB's atomic operators, such as $set and $push, which truly leverages the full power of MongoDB and its aggregate-level atomicity.

MongoDB from Python:
Python and PyMongo allow direct coding against MongoDB from Python. This is most appropriately compared to programming at the level of raw SQL for RDBMSes. That level is a necessary building block, but for most applications working at a higher level and building upon custom classes is more appropriate. This module explores one of the most popular Object-Data Mappers for Python and MongoDB: MongoEngine.
MongoDB from Python:
Designing entities in MongoDB, and in document databases more generally, is very different from designing third-normal-form SQL tables. To be successful with MongoDB, you as a developer will need to master this skill. Getting your entity design right is key to building high-performance, flexible applications.
MongoDB from Python:
One nice thing about the pymongo connection is that it is automatically pooled. This means that pymongo maintains a pool of connections to the MongoDB server and reuses them over the lifetime of your application. This is good for performance, since pymongo doesn't need to pay the overhead of establishing a new connection for every operation. Mostly, this happens automatically. Older PyMongo 2.x releases also required you to notify pymongo manually when you were “done” with a connection (via end_request()) so it could be returned to the pool; since PyMongo 3.0 the driver handles this for you, so the usual advice is simply to create one client and reuse it.

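The easiest way to connect to a MongoDB database from Python is shown below. Note that this transcript uses the legacy PyMongo API (pymongo.Connection and Collection.insert()), which later PyMongo releases removed; current code uses MongoClient and insert_one() instead: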

In: import pymongo

In: conn = pymongo.Connection()

Inserting documents begins by selecting a database. To create a database, you do… well, nothing, actually. The MongoDB server creates it for you automatically the first time you store data in it. Once you have your database, you need to decide which “collection” to store your documents in. To create a collection, you do… right – nothing.

In: db = conn.tutorial

In: db.test

Out: Collection(Database(Connection('localhost', 27017), u'tutorial'), u'test')

In: db.test.insert({'name': 'My Document', 'ids': [1,2,3], 'subdocument': {'a':2}})  

Out: ObjectId('4f25bcffeb033049af000000')


Here the insert command returned an ObjectId value. This is the value that pymongo generated for the _id property, the “primary key” of a MongoDB document. We can also specify the _id manually if we want, and we don’t have to use ObjectIds:

In: db.test.insert({'_id': 42, 'name': 'My Document', 'ids': [1,2,3], 'subdocument': {'a':2}})

Out: 42

MongoDB from Python:
Simply put, indexes are the single biggest contributor to extremely high-performance MongoDB deployments and applications. Make sure your applications use indexes to full advantage. Finding the queries that need to be optimized can be tricky, especially when there is a translation layer in the middle, such as an ODM like MongoEngine.

MongoDB has an extremely fast query path that it can use in some cases where it doesn’t have to scan any documents, only the index entries. Assuming an index on the queried field exists, this happens when the only data you’re returning from a query is part of the index (a so-called covered query):

In: db.test.find({'a':2}, {'a':1, '_id':0}).explain()



u'indexBounds': {u'a': [[2, 2]]},
u'indexOnly': True,
u'isMultiKey': False,


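Here the indexOnly field is True, indicating that MongoDB only had to inspect the index (and not the actual collection data) to satisfy the query. The indexOnly field belongs to the explain format of older MongoDB servers; since MongoDB 3.0, a covered query instead shows totalDocsExamined: 0 in the execution statistics.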

MongoDB from Python:
MongoDB has a facility, called GridFS, to store, classify, and query files of virtually unlimited size, whether they hold binary data, text data, or something else. This section shows how to work with GridFS from Python: you can upload, download, and list files in GridFS. You can also store custom metadata alongside your GridFS files, which can then be used for rich reporting and querying that does not exist in standard file systems.

Creating a GridFS instance to use:

>>> from pymongo import MongoClient

>>> import gridfs


>>> db = MongoClient().gridfs_example

>>> fs = gridfs.GridFS(db)

Every GridFS instance is created with, and operates on, a specific database.

MongoDB from Python:
The simplest way to work with GridFS is to use its key/value interface. To write data to GridFS, use put():

>>> a = fs.put(b"hello world")

put() creates a new file in GridFS and returns the value of the file document’s "_id" key. (In Python 3, put() expects bytes; to store a str, pass an encoding, e.g. fs.put("hello world", encoding="utf-8").) Given that "_id", we can use get() to get back the contents of the file:

>>> fs.get(a).read()
b'hello world'

In addition to putting bytes into GridFS, we can also put any file-like object (an object with a read() method). GridFS will handle reading the file in chunk-sized segments automatically. We can also add additional attributes to the file as keyword arguments:

>>> b = fs.put(fs.get(a), filename="foo", bar="baz")
>>> out = fs.get(b)
>>> out.read()
b'hello world'
>>> out.filename
'foo'
>>> out.upload_date

The attributes we set in put() are stored in the file document and can be retrieved after calling get(). Some attributes (like "filename") are special and are defined in the GridFS specification.
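Because GridFS stores a file as a series of chunk documents, the bookkeeping behind "chunk-sized segments" can be sketched in plain Python. This assumes PyMongo's default chunk size of 255 KiB and mirrors the files_id, n and data fields that GridFS chunk documents carry; split_into_chunks is a hypothetical illustration, not how PyMongo itself is implemented.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261,120 bytes, PyMongo's default GridFS chunk size


def split_into_chunks(files_id, data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Model how a file's bytes map onto numbered chunk documents."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


one_mib = b"x" * (1024 * 1024)
chunks = split_into_chunks("file-1", one_mib)
print(len(chunks), len(chunks[-1]["data"]))  # 5 4096

# Reading the file back means concatenating chunks in order of n.
restored = b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
assert restored == one_mib
```

Only the last chunk may be shorter than chunk_size, which is why a 1 MiB file needs five chunks rather than four.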

MongoDB from Python:
The aggregation framework in MongoDB allows you to execute rich queries and transformations on the server. While normal queries return documents in the exact structure in which they are stored, aggregation, which is similar in spirit to map-reduce, is much more flexible. It can transform, group, and query data, and act as a data pipeline on the server.
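As an illustration, the pipeline below groups documents by a rating field and counts each group, assuming hypothetical documents that carry a numeric rating. RATING_PIPELINE is what you would pass to a collection's aggregate() method; count_by_rating is an in-memory equivalent included only to show what the $group and $sort stages compute.

```python
from collections import Counter

# Group documents by their "rating" value, count each group, then sort by rating.
RATING_PIPELINE = [
    {"$group": {"_id": "$rating", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]


def count_by_rating(docs):
    """In-memory equivalent of RATING_PIPELINE, for illustration only."""
    counts = Counter(doc["rating"] for doc in docs)
    return [{"_id": rating, "count": counts[rating]} for rating in sorted(counts)]


sample = [{"rating": 5}, {"rating": 3}, {"rating": 5}, {"rating": 1}]
print(count_by_rating(sample))
# With PyMongo and a hypothetical collection: list(db.reviews.aggregate(RATING_PIPELINE))
```

Running the pipeline on the server avoids shipping every document to the client just to count it.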
MongoDB from Python:
Replication is key to MongoDB’s fault tolerance. It can also be used for data locality across data centers, scaled-out reads, offsite backups, reporting without performance degradation, and more. PyMongo makes working with replica sets easy. Here we’ll launch a new replica set and show how to handle both initialization and normal connections with PyMongo.
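PyMongo connects to a replica set through a connection string that lists one or more seed members and names the set with the replicaSet option. A minimal sketch of building such a URI; replica_set_uri is a hypothetical helper, and the host names and the rs0 set name are placeholders.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, **options):
    """Build a mongodb:// URI that lists seed hosts and names the replica set."""
    if not hosts:
        raise ValueError("list at least one seed host")
    query = urlencode({"replicaSet": replica_set, **options})
    return "mongodb://" + ",".join(hosts) + "/?" + query


uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    "rs0",
    readPreference="secondaryPreferred",  # allow reads to go to secondaries
)
print(uri)
# With PyMongo installed: client = pymongo.MongoClient(uri)
```

The listed hosts are only seeds: the driver discovers the rest of the set, and which member is primary, from the servers themselves.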
MongoDB from Python:
MongoDB is a high performance database even in single-server mode. However, to truly leverage MongoDB’s performance potential, you will need to use sharding. This technique allows you to run a cluster of MongoDB servers working in concert to each hold some portion of the data and share some portion of the queries. It is sharding that gives MongoDB the ability to scale horizontally on commodity hardware.

To actually add the shards to the cluster, go through the query routers, which are now configured to act as our interface with the cluster. You can do this by connecting to any of the query routers like this:

mongo --host --port 27017

This will connect to the appropriate query router and open a mongo prompt. You will add all of the shard servers from this prompt.

To add the first shard, type:

sh.addShard( "" )

You can then add your remaining shard droplets in this same interface. You do not need to log into each shard server individually. 

sh.addShard( "" )

sh.addShard( "" )

sh.addShard( "" )

If you are configuring a production cluster, complete with replica sets, you instead have to specify the replica set name and a replica set member to establish each set as a distinct shard. The syntax looks something like this:

sh.addShard( "rep_set_name/rep_set_member:27017" )
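The same step can also be issued from Python, since addShard is an admin command run against a mongos query router. In this sketch, shard_spec and add_shards are hypothetical helpers and all host names are placeholders; add_shards expects a PyMongo MongoClient connected to a query router and is not executed here.

```python
def shard_spec(replica_set, members):
    """Build the '<set name>/<host:port>,<host:port>' string that addShard accepts."""
    if not members:
        raise ValueError("list at least one replica set member")
    return replica_set + "/" + ",".join(members)


def add_shards(client, specs):
    """Run addShard for each spec through a mongos query router."""
    return [client.admin.command("addShard", spec) for spec in specs]


specs = [
    shard_spec("rs1", ["shard1a.example.com:27017", "shard1b.example.com:27017"]),
    shard_spec("rs2", ["shard2a.example.com:27017"]),
]
print(specs)
# With a MongoClient connected to a query router: add_shards(client, specs)
```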

Getting Started with Python and MongoDB

You can get started with MongoDB and your favorite programming language by leveraging one of its drivers, many of which are maintained by MongoDB engineers, and others which are maintained by members of the community. MongoDB has a native Python driver, PyMongo, and a team of Driver engineers dedicated to making the driver fit to the Python community’s needs.

In this article, which is aimed at Python developers who are new to MongoDB, you will learn how to do the following:

  • Create a free hosted MongoDB database using MongoDB Atlas
  • Install PyMongo, the Python Driver
  • Connect to MongoDB
  • Explore MongoDB Collections and Documents
  • Perform basic Create, Retrieve, Update and Delete (CRUD) operations using PyMongo

Let’s get started!

You can start working immediately with MongoDB by using a free MongoDB cluster via MongoDB Atlas. MongoDB Atlas is a hosted database service that allows you to choose your database size and get a connection string! If you are interested in using the free tier, follow the instructions in the Appendix section at the end of this article.

Install the Python Driver

For this article we will install the Python driver called “PyMongo”.

Although there are other drivers written by the community, PyMongo is the official Python driver for MongoDB. For detailed documentation of the driver, see the official PyMongo documentation.

The easiest way to install the driver is through the pip package management system. Execute the following on a command line:

python -m pip install pymongo

Note: If you are using the Atlas M0 (Free Tier) cluster, you must use Python 2.7.9+ or Python 3.4+. You can check which versions of Python and PyMongo you have installed by running python --version and pip list, respectively.

For variations of the driver installation, check out the complete PyMongo installation documentation.

Once PyMongo is installed we can write our first application that will return information about the MongoDB server. In your Python development environment or from a text editor enter the following code.

from pymongo import MongoClient
# pprint library is used to make the output look more pretty
from pprint import pprint
# connect to MongoDB, change the << MONGODB URL >> to reflect your own connection string
client = MongoClient("<<MONGODB URL>>")
db = client.admin
# Issue the serverStatus command and print the results
serverStatusResult = db.command("serverStatus")
pprint(serverStatusResult)

Replace <<MONGODB URL>> with your connection string to MongoDB. Save this file and run it from the command line with python followed by the file name.

An example output appears as follows:

{u'asserts': {u'msg': 0,
              u'regular': 0,
              u'rollovers': 0,
              u'user': 0,
              u'warning': 0},
 u'connections': {u'available': 96, u'current': 4, u'totalCreated': 174L},
 u'extra_info': {u'note': u'fields vary by platform', u'page_faults': 0},
 u'host': u'',
 u'localTime': datetime.datetime(2017, 4, 4, 0, 18, 45, 616000),

Note that the u prefix comes from the Python 2 output and means that the strings are Unicode strings; Python 3 prints strings without it. This example also uses the pprint library, which is not related to MongoDB but is used here only to make the output structured and visually appealing in a console.

In this example we connect to our MongoDB instance and issue the serverStatus command (db.serverStatus() in the mongo shell). This command returns information about our MongoDB instance and is used in this example as a way to execute a command against MongoDB.

If your application runs successfully, you are ready to continue!

Exploring Collections and Documents

MongoDB stores data in documents. Documents are not like Microsoft Word or Adobe PDF documents, but rather JSON documents based on the JSON specification.

An example of a JSON document would be as follows:

Notice that documents are not just key/value pairs, but can include arrays and subdocuments. The data itself can be different data types like geospatial, decimal, and ISODate to name a few. Internally MongoDB stores a binary representation of JSON known as BSON. This allows MongoDB to provide data types like decimal that are not defined in the JSON specification. For more information on the BSON spec check out its site.
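To make the BSON idea concrete, here is a deliberately minimal BSON encoder written against the BSON specification: each document is a 4-byte little-endian length, a list of typed elements, and a trailing NUL byte. It handles only strings, 32/64-bit integers, doubles, booleans, subdocuments and arrays, and it is a teaching sketch, not a replacement for PyMongo's bson.encode().

```python
import struct


def _cstring(s: str) -> bytes:
    return s.encode("utf-8") + b"\x00"


def _element(name: str, value) -> bytes:
    """Encode one key/value pair: type byte, key as cstring, then the value."""
    if isinstance(value, bool):  # check bool before int (bool is an int subclass)
        return b"\x08" + _cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + _cstring(name) + struct.pack("<i", value)  # int32
        return b"\x12" + _cstring(name) + struct.pack("<q", value)  # int64
    if isinstance(value, float):
        return b"\x01" + _cstring(name) + struct.pack("<d", value)  # double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"  # length prefix counts the NUL
        return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + _cstring(name) + encode_document(value)  # subdocument
    if isinstance(value, list):
        # Arrays are encoded as documents whose keys are "0", "1", "2", ...
        as_doc = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + _cstring(name) + encode_document(as_doc)
    raise TypeError(f"type not handled in this sketch: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    body = b"".join(_element(k, v) for k, v in doc.items())
    # Total length = 4-byte length prefix + elements + trailing NUL byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode_document({"hello": "world"}))
```

Types such as Decimal128 and dates get their own type bytes in the full specification, which is how BSON can carry values that plain JSON cannot.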

A collection in MongoDB is a container for documents. A database is the container for collections. This grouping is similar to relational databases and is pictured below:

Relational concept    MongoDB equivalent
Database              Database
Tables                Collections
Rows                  Documents
Index                 Index

There are many advantages to storing data in documents. While a deeper discussion is out of the scope of this article, some of the advantages like dynamic, flexible schema, and the ability to store arrays can be seen from our simple Python scripts. For more information on MongoDB document structure take a look at the online documentation.

Let’s take a look at how to perform basic CRUD operations on documents in MongoDB using PyMongo.

Performing Basic CRUD Operations Using PyMongo

To establish a connection to MongoDB with PyMongo you use the MongoClient class.

from pymongo import MongoClient
client = MongoClient('<<MongoDB URL>>')

The <<MongoDB URL>> is a placeholder for the connection string to MongoDB. See the connection string documentation for detailed information on how to create your MongoDB connection string. If you are using Atlas for your MongoDB database, refer to the “testing your connection” section for more information on obtaining the connection string for MongoDB Atlas.

We can now create a database object referencing a new database, called business, as follows:

db = client.business

Once we create this object we can perform our CRUD operations. Since we want something useful to query let’s start by building a sample data generator application.

Generating Sample Data Code Example

Create a new Python file using your development tool or a command-line text editor and copy the following code:

from pymongo import MongoClient
from random import randint
# Step 1: Connect to MongoDB - Note: Change connection string as needed
client = MongoClient(port=27017)
db = client.business
# Step 2: Create sample data
names = ['Kitchen','Animal','State', 'Tasty', 'Big','City','Fish', 'Pizza','Goat', 'Salty','Sandwich','Lazy', 'Fun']
company_type = ['LLC','Inc','Company','Corporation']
company_cuisine = ['Pizza', 'Bar Food', 'Fast Food', 'Italian', 'Mexican', 'American', 'Sushi Bar', 'Vegetarian']
for x in range(1, 501):
    business = {
        'name' : names[randint(0, (len(names)-1))] + ' ' + names[randint(0, (len(names)-1))]  + ' ' + company_type[randint(0, (len(company_type)-1))],
        'rating' : randint(1, 5),
        'cuisine' : company_cuisine[randint(0, (len(company_cuisine)-1))]
    }
    # Step 3: Insert business object directly into MongoDB via insert_one
    result = db.reviews.insert_one(business)
    # Step 4: Print to the console the ObjectID of the new document
    print('Created {0} of 500 as {1}'.format(x, result.inserted_id))
# Step 5: Tell us that you are done
print('finished creating 500 business reviews')

Be sure to change the MongoDB client connection URL to one that points to your MongoDB database instance. Once you run this application, 500 randomly named businesses with their corresponding ratings will be created in the MongoDB database called business. All of these businesses are created in a single collection called reviews. Notice that we do not have to explicitly create a database beforehand in order to use it. This is different from other databases that require statements like CREATE DATABASE to be performed first.

The command that inserts data into MongoDB in this example is the insert_one() function. As the name suggests, insert_one inserts one document into MongoDB. It returns an InsertOneResult whose inserted_id attribute holds the ObjectId that was created. This is one of a few methods that insert data. If you want to insert multiple documents in one call, you can use the insert_many function.

In addition to an acknowledgement of the insertion, the result returned by insert_many includes a list of the ObjectIds that were created. For more information on insert_many, see the PyMongo documentation.

For details on the InsertManyResult object that insert_many returns, check out the PyMongo API documentation as well.
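Since insert_many accepts a list of documents, the 500 sample documents can also be sent in a single call instead of 500 separate insert_one round trips. A sketch, assuming the same name lists as the generator above; make_businesses and insert_businesses are hypothetical helper names, and random.Random(seed) is used so that runs can be reproduced.

```python
import random

NAMES = ["Kitchen", "Animal", "State", "Tasty", "Big", "City", "Fish",
         "Pizza", "Goat", "Salty", "Sandwich", "Lazy", "Fun"]
COMPANY_TYPES = ["LLC", "Inc", "Company", "Corporation"]
CUISINES = ["Pizza", "Bar Food", "Fast Food", "Italian", "Mexican",
            "American", "Sushi Bar", "Vegetarian"]


def make_businesses(count, seed=None):
    """Build `count` random business documents without touching the database."""
    rng = random.Random(seed)  # a fixed seed makes runs reproducible
    return [
        {
            "name": f"{rng.choice(NAMES)} {rng.choice(NAMES)} {rng.choice(COMPANY_TYPES)}",
            "rating": rng.randint(1, 5),
            "cuisine": rng.choice(CUISINES),
        }
        for _ in range(count)
    ]


def insert_businesses(collection, docs):
    """One insert_many call instead of one insert_one round trip per document."""
    result = collection.insert_many(docs)
    return result.inserted_ids


docs = make_businesses(500, seed=42)
print(len(docs))  # 500
# With PyMongo: ids = insert_businesses(MongoClient(port=27017).business.reviews, docs)
```

Separating data generation from insertion also makes the generator easy to test without a running server.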

We are now ready to explore querying and managing data in MongoDB using Python. To guide this exploration we will create another application that will manage our business reviews.

Exploring Business Review Data

Now that we have a good set of data in our database let’s query for some results using PyMongo.

In MongoDB the find_one method is used to query for a single document, much like SELECT statements are used in relational databases. To use find_one in PyMongo, we pass a Python dictionary that specifies the search criteria. For example, let’s find a single business with a review score of 5 by passing the dictionary {'rating': 5}.

fivestar = db.reviews.find_one({'rating': 5})

The result will contain data similar to the following:

{u'rating': 5,
 u'_id': ObjectId('58e65383ea0b650c867ef195'),
 u'name': u'Fish Salty Corporation',
 u'cuisine': u'Sushi Bar'}

Given that we created 500 sample documents, there is almost certainly more than one business with a rating of 5. The find_one method is just one of a series of find methods that support querying MongoDB data. Another method, called find, returns a cursor over all documents that match the search criteria. Older code calls count() on that cursor to get the number of matches, but Cursor.count() was deprecated in PyMongo 3.7 and removed in PyMongo 4.0. To find the total number of businesses rated 5, call count_documents() on the collection with the same filter:

fivestarcount = db.reviews.count_documents({'rating': 5})

Your results may vary since the data was randomly generated, but in one test run the value 103 was returned.
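A hedged sketch of count_documents used alongside a find cursor, assuming the reviews collection created by the generator. five_star_summary is a hypothetical helper; the projection {'name': 1, '_id': 0} asks the server to return only the name field, and limit() caps how many documents the cursor yields.

```python
def five_star_summary(collection, limit=5):
    """Count five-star reviews and list the names of a few of them.

    `collection` is assumed to be a PyMongo Collection such as client.business.reviews.
    """
    # count_documents takes the same filter document that find() would
    total = collection.count_documents({'rating': 5})
    # find() returns a lazy cursor; documents are fetched as we iterate over it
    cursor = collection.find({'rating': 5}, {'name': 1, '_id': 0}).limit(limit)
    names = [doc['name'] for doc in cursor]
    return total, names
```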

MongoDB can easily perform these straightforward queries. However, consider the scenario where you want to count how many times each rating occurs across the entire data set. In MongoDB you could run 5 separate count queries and present the results, or you could simply issue a single query using the MongoDB aggregation pipeline as follows:

from pymongo import MongoClient
# Connect to MongoDB; change the connection string to match your MongoDB environment
client = MongoClient(port=27017)
# Set the db object to point to the business database
db = client.business
# Count the total number of 5 ratings (count_documents replaces the removed Cursor.count())
print('The number of 5 star reviews:')
fivestarcount = db.reviews.count_documents({'rating': 5})
print(fivestarcount)
# Now let's use the aggregation framework to count the occurrences of each rating across the entire data set
print('\nThe count of each rating across all data, grouped by rating:')
stargroup = db.reviews.aggregate(
    # The aggregation pipeline is defined as a list of stages
    [
        # The first stage groups the documents by rating and counts each group
        {'$group': {'_id': '$rating', 'count': {'$sum': 1}}},
        # The second stage sorts the groups by rating in ascending order
        {'$sort': {'_id': 1}}
    ]
)
# Print the result
for group in stargroup:
    print(group)

A deep dive into the aggregation framework is outside the scope of this article; if you are interested in learning more about it, see the aggregation pipeline section of the MongoDB documentation.
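To make the pipeline's behavior concrete, the sketch below runs it through PyMongo and also computes the same histogram in plain Python. rating_histogram and rating_histogram_local are hypothetical names, and the local version assumes every document has an integer rating; it only illustrates what $group, $sum and $sort compute and is not a replacement for running the pipeline on the server.

```python
from collections import Counter

# The same two-stage pipeline used above
RATING_PIPELINE = [
    {'$group': {'_id': '$rating', 'count': {'$sum': 1}}},  # one output doc per rating
    {'$sort': {'_id': 1}},                                   # ascending by rating
]


def rating_histogram(collection):
    """Run the pipeline on the server and return {rating: count}.

    `collection` is assumed to be a PyMongo Collection such as client.business.reviews.
    """
    return {group['_id']: group['count'] for group in collection.aggregate(RATING_PIPELINE)}


def rating_histogram_local(docs):
    """Compute the same result client-side to show what the pipeline does.

    Assumes every document has an integer 'rating' field.
    """
    counts = Counter(doc['rating'] for doc in docs)          # like $group + $sum: 1
    return {rating: counts[rating] for rating in sorted(counts)}  # like $sort on _id
```

Pulling every document to the client, as rating_histogram_local does, would be wasteful on real data; the server-side pipeline returns only one small document per rating.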

Updating Data with PyMongo

Similar to insert_one and insert_many, there are functions to help you update your MongoDB data, including update_one, update_many, and replace_one. The update_one method updates a single document that matches a query. For example, let’s assume that our business review application now lets users “like” a business. To illustrate updating a document with this new “likes” field, let’s first look at an existing document from our previous application’s insertion into MongoDB. Next, let’s update the document, query it again, and see the change.

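from pymongo import MongoClient
# Include pprint for readability of the printed documents
from pprint import pprint

# Change the MongoClient connection string to point to your MongoDB database instance
client = MongoClient(port=27017)
db = client.business

# Fetch any single review to work with
ASingleReview = db.reviews.find_one({})
print('A sample document:')
pprint(ASingleReview)

# Increment "likes" by 1; $inc creates the field if it does not exist yet
result = db.reviews.update_one({'_id': ASingleReview.get('_id')}, {'$inc': {'likes': 1}})
print('Number of documents modified : ' + str(result.modified_count))

# Query the same document again to see the change
UpdatedDocument = db.reviews.find_one({'_id': ASingleReview.get('_id')})
print('The updated document:')
pprint(UpdatedDocument)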

When running the sample code above you may see results similar to the following:

A sample document:
{'_id': ObjectId('58eba417ea0b6523b0fded4f'),
 'cuisine': 'Pizza',
 'name': 'Kitchen Goat Corporation',
 'rating': 1}

Number of documents modified : 1

The updated document:
{'_id': ObjectId('58eba417ea0b6523b0fded4f'),
 'cuisine': 'Pizza',
 'likes': 1,
 'name': 'Kitchen Goat Corporation',
 'rating': 1}

Notice that the original document did not have the “likes” field, and the update allowed us to add the field to the document easily. This ability to add keys dynamically, without the hassle of costly ALTER TABLE statements, is the power of MongoDB’s flexible data model, and it makes rapid application development a reality.
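Because documents created before the “like” feature have no likes field, you might want to initialize it in bulk with update_many. A minimal sketch, assuming the reviews collection from this article; initialize_likes is a hypothetical helper, while $exists and $set are standard MongoDB query and update operators.

```python
def initialize_likes(collection):
    """Give every review that lacks a 'likes' field a starting value of 0.

    `collection` is assumed to be a PyMongo Collection (e.g. client.business.reviews).
    Returns the (matched, modified) counts reported by the UpdateResult.
    """
    result = collection.update_many(
        {'likes': {'$exists': False}},  # filter: only documents without the field
        {'$set': {'likes': 0}},         # update: add the field in place
    )
    return result.matched_count, result.modified_count
```

Running it a second time would match zero documents, because every review would then already have a likes field.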

If you want to replace all the fields of a document while keeping the same ObjectId, use the replace_one function. For more details on replace_one, see the PyMongo documentation.
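A minimal replace_one sketch, assuming the review schema used throughout this article; replace_review is a hypothetical helper. Unlike update_one with $inc, the replacement document overwrites the whole stored document apart from its _id, so a likes field added earlier disappears unless you include it.

```python
def replace_review(collection, review_id, name, rating, cuisine):
    """Replace every field of one review while keeping its _id.

    `collection` is assumed to be a PyMongo Collection. The replacement must be a
    plain document (no update operators such as $set); any field not listed here,
    for example 'likes', is removed from the stored document.
    """
    replacement = {'name': name, 'rating': rating, 'cuisine': cuisine}
    result = collection.replace_one({'_id': review_id}, replacement)
    return result.modified_count  # 1 if a document was replaced, otherwise 0
```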

The update functions also support an option called “upsert”. With upsert you can tell MongoDB to create a new document if the document you are trying to update does not exist.
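A sketch of upsert with update_one, assuming businesses are looked up by name (a simplification, since names in the generated sample data are not guaranteed to be unique). like_business is a hypothetical helper; upserted_id is the UpdateResult attribute that holds the _id of a newly inserted document.

```python
def like_business(collection, name):
    """Increment the likes of the business called `name`, creating it if missing.

    `collection` is assumed to be a PyMongo Collection. With upsert=True, when no
    document matches, MongoDB inserts one built from the filter ({'name': name})
    plus the effect of the update ({'likes': 1}).
    Returns the new document's _id, or None if an existing document was updated.
    """
    result = collection.update_one({'name': name}, {'$inc': {'likes': 1}}, upsert=True)
    return result.upserted_id
```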

Deleting Documents

Much like the other commands discussed so far, delete_one and delete_many take a query that matches the documents to delete as their first parameter. For example, if you wanted to delete all documents in the reviews collection where the cuisine was “Bar Food”, you would issue the following:

result = db.reviews.delete_many({'cuisine': 'Bar Food'})

If you want to delete every document in a collection, it may be more efficient to drop the collection with db.reviews.drop() instead of calling delete_many({}). Note that dropping a collection also removes its indexes.
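The sketch below wraps both delete methods, assuming the cuisine field from the sample data; delete_by_cuisine is a hypothetical helper. Both methods return a DeleteResult whose deleted_count reports how many documents were removed.

```python
def delete_by_cuisine(collection, cuisine, just_one=False):
    """Delete reviews with the given cuisine and return how many were removed.

    `collection` is assumed to be a PyMongo Collection (e.g. client.business.reviews).
    """
    query = {'cuisine': cuisine}
    if just_one:
        result = collection.delete_one(query)   # removes at most one matching document
    else:
        result = collection.delete_many(query)  # removes every matching document
    return result.deleted_count
```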

Where to Go Next

There are lots of options when it comes to learning about MongoDB and Python. MongoDB University is a great place to start learning about administration, development, and other topics such as analytics with MongoDB. One course in particular is MongoDB for Developers (Python), which covers the topics of this article in much more depth, including a discussion of the MongoDB aggregation framework.

Appendix: Creating a free tier MongoDB Atlas database

MongoDB Atlas is a hosted database service that allows you to choose your database size and get a connection string! Follow the steps below to start using your free database.

Build your cluster for free

Follow the steps below to create a free MongoDB database:

  1. Go to MongoDB Atlas.
  2. Click the “Start Free” button.
  3. Fill out the form to create an account. You will use this information to log in to and manage your MongoDB deployment.

Once you fill out the form, the website will create your account and you will be presented with the “Build Your New Cluster” pop up as shown:

To use the free tier, scroll down and select “M0”. When you do this, the regions panel will be disabled: the free tier has some restrictions, and the ability to select a region is one of them. Your database size will also be limited to 512 MB of storage. When you are ready to use MongoDB for more than some simple operations, you can easily create another instance by choosing a size from the “Instance Size” list. Before you click “Confirm & Deploy”, scroll down the page and notice the additional options shown here:

From the “Build Your New Cluster” pop-up, you can see that other options are available, including a 3, 5, or 7 node replica set and up to a 12 shard cluster. Note that the free tier does not allow you to choose anything beyond the 3 node cluster, but these options become available if you move to other sizes.

At this point we are almost ready; the last thing to address is the admin username and password. You can type a password yourself or have a random one generated for you by clicking the “Autogenerate Secure Password” button. Finally, click the “Confirm & Deploy” button to create your Atlas cluster.

Setting up your IP Whitelist

While Atlas is creating your database you will need to define which IPs are allowed access to your new database, since MongoDB Atlas does not allow access from the internet by default. This list of granted IP addresses is called the “IP Whitelist”. To add the IP of your machine to this list, click on the “Security” tab, then “IP Whitelist”, and then click the “+ ADD IP ADDRESS” button. This will pop up another dialog shown below.

You can click the “Add current IP Address” button to add your IP or provide a specific IP address, or enable access to the world by not restricting IPs at all (not a fantastic idea, but there in case you have no other choice and need to allow authentication from any IP).

Once you have filled out this dialog click “Confirm” and this will update the firewall settings on your MongoDB Atlas cluster. Next, click on the “Clusters” tab and you should see your new MongoDB database ready for action!

Testing your connection

We want to make sure the MongoDB database is accessible from our development box before we start typing in code. A quick way to test is to make a connection using the Mongo Shell command line tool. Be sure to have your MongoDB connection information available. If you are using MongoDB Atlas, you can obtain the connection information by clicking on the “Connect” button on the Clusters tab as shown below.

The Connect button will launch a dialog that provides connection information. At the bottom of this dialog, you will see a prepared command line ready for you to simply copy and paste into a command prompt.

Note that if you copy the connection text as-is, you will have to replace the password placeholder with the password for the admin user, and the database placeholder with the name of the database to which you wish to connect.

The command text that comes from this dialog is lengthy. For clarity, let’s take a look at each of the parameters individually.

 --authenticationDatabase admin 
--username myadmin 
--password S$meComPLeX1!

The first parameter is a string containing the list of all the nodes in our cluster, including the definition of a replica set called Cluster0-shard-0. The next parameter, --authenticationDatabase, tells the shell which database contains the user we want to authenticate as.

The --ssl option forces the connection to be encrypted via the SSL/TLS protocol. Finally, we provide the username and password, and we are connected! Note that if you are not using MongoDB Atlas, your MongoDB deployment may not have security enabled or require SSL. In that case, connecting to it could be as simple as typing “mongo” into the command prompt.
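Once the shell connects, you can run an equivalent check from Python. A minimal sketch, assuming a seed-list connection string like the one the shell command uses (Atlas also offers a shorter mongodb+srv:// form); build_mongo_uri and can_connect are hypothetical helpers, and the host names in any example are placeholders. Credentials are percent-escaped with quote_plus because characters such as $ or ! in a password would otherwise break URI parsing.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, hosts, auth_db='admin', replica_set=None, tls=True):
    """Build a standard mongodb:// connection string from its parts."""
    credentials = f'{quote_plus(username)}:{quote_plus(password)}'
    options = [f'authSource={auth_db}']            # like --authenticationDatabase
    if tls:
        options.append('tls=true')                 # like --ssl
    if replica_set:
        options.append(f'replicaSet={replica_set}')
    return f"mongodb://{credentials}@{','.join(hosts)}/?{'&'.join(options)}"


def can_connect(client):
    """Return True if the server answers the 'ping' command.

    `client` is assumed to be a pymongo.MongoClient built from the URI above.
    """
    try:
        reply = client.admin.command('ping')
    except Exception:  # e.g. pymongo.errors.ServerSelectionTimeoutError or OperationFailure
        return False
    return reply.get('ok') == 1
```

For example, passing serverSelectionTimeoutMS=5000 to MongoClient along with the generated URI makes can_connect give up after five seconds instead of the default 30 when the server is unreachable, which is a quick way to spot a missing IP whitelist entry.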

You are now ready to use MongoDB!