Python and MongoDB Kevin Swingler Why? Python is becoming the scripting language of choice in big data It has a library for connecting to a MongoDB: PyMongo Nice mapping betwenpython data structures and JSON objects Reduces relational impedance as database objects = dictionary objects 1
Python Dictionary b = {'one': {'a':'a','b':'b'}, 'two': 2, 'three': [3,2,1]} dict b= 'one'= 'a'='a' 'b'='b' b['one']['b']='b' b['two']=2 b['three'][3]=1 'two'=2 'three'= [3,2,1] dict list Let s jump right in PyMongo # connect and authorise (see later) d=db.col.find_one() for k,v in one.iteritems(): # Do something with k (the key) # and v (the value) 2
Connect and Authenticate <% import pymongo from pymongo import MongoClient client = MongoClient( mqr1.cs.stir.ac.uk") client.kms.authenticate(username,pwd) db = client.name colc=db.colcname %> # Database # Collection Python Server Pages (PSP) You can use Python to drive the output of a web page This can connect to a MongoDB Much like using PHP and MySQL Many methods exist from simple scripts to large frameworks We will illustrate things with PSP 3
Make an HTML form Simple Form to DB Run the Python script when the form is submitted Extract the values and put them into a dictionary object Put the dictionary object into the database First, the form in HMTL: : <form action=form.psp method=post> <input type=text name=key> <input type=text name=val><br> <input type=submit value=add> </form> : 4
Extract Values and Add to DB <% key=form.getfirst('key')# Get key value val=form.getfirst('val')# Get row value if key and val: # If both filled in dct[key] = val# Add to dictionary if len(dct) > 0: colc.insert(dct) # Add to db %> Search and Display <% f=colc.find() # Find every entry for a in f: # Each key,value pair for k,v in a.iteritems(): if k!= "_id": # Ignore _id req.write( %s = %s<br> %(k,v)) %> 5
Some Details d=db.col.find_one()returns a single document the first match f=db.col.find()returns a cursor which iterates through documents for example: for d in f: Specific Search db.col.find({name:"bill"}) # Find all documents where Name = "Bill" db.col.find({name: "Bill", Age: 18}) # Find all documents where Name= "Bill " AND Age=18 is the same as db.col.find({$and: [{Name: "Bill"}, {Age: 18}]}) but we must be explicit if we want to use OR: db.col.find({$or: [{Name: "Bill"}, {Age: 17}]}) $Notdoes as you'd expect and$normeans not and not. 6
Regular Expression Search Find names that contain "ev" anywhere find({name: /ev/}) Names that start with "K", ignoring case (K or k) find({name: /^K/i}) Names that contain "a" or "b" or "c" anywhere find({name: /[abc]/}) Find names that end with some numbers find({name: /\w{1,}\d{1,}/}) Bulk Inserts You can use colc.insert(docs)where docsis iterable That is, it is a list of dictionaries that can be iterated names=[{"name":"bill", " Age":18}, {"Name":"Ted", "Age":18}] colc.insert(names) 7
Counting colc.count(query) Sorting colc.find(query).sort("name") 8
Aggregation In SQL we use Count(), Sum().. GROUP BY In MongoDBthere are a number of aggregation options: Aggregate() method Map Reduce Group() method Aggregate db.things.insert({"x": 1, "tags":["dog", "cat"]}) db.things.insert({"x": 2, "tags":["cat"]}) db.things.insert({"x": 2, "tags":["mouse", "cat", "dog"]}) db.things.insert({"x": 3, "tags":[]}) Then from bson.son import SON db.things.aggregate([ {"$unwind": "$tags"}, {"$group": {"_id": "$tags", "count": {"$sum": 1}}}, {"$sort": SON([("count", -1), ("_id", -1)])} ]) Gives [{'count': 3, '_id': 'cat'}, {'count': 2, '_id': 'dog'}, {'count': 1, '_id': 'mouse'}]} http://api.mongodb.org/python/current/examples/aggregation.html 9
In Detail -unwind {"$unwind": "$tags"} $unwind operator turns a single document with and array value into several documents, each with one value from the array in place of the array unwind({"x": 1, "tags":["dog", "cat"]}) Produces {"x": 1, "tags":"dog"}, {"x": 1, "tags":"cat"} In Detail -group {"$group": {"_id":"$tags", "count":{"$sum":1}}} Means create documents like: {"_id": "cat", "count":3} 10
In Detail -sort {"$sort": SON([("count", -1), ("_id", -1)])} Means sort all the documents descending by count. Python dictionaries don t maintain order, so we use SONto sort SONis Serialized OcumentNotation like a dictionary, but ordered Map Reduce PyMongo sapi supports all of the features of MongoDB s map/reduce engine map_reduce(map, reduce, out, full_response=false, **kwargs) map: map function (as a JavaScript string) reduce: reduce function (as a JavaScript string) out: output collection name or out object(dict). SON can be used to specify multiple options. e.g. SON([( replace, <collection name>), ( db, <database name>)]) full_response(optional): if True, return full response to this command -otherwise just return the result collection **kwargs(optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method db.test.map_reduce(map, reduce, "results", limit=2) 11
Group Write the reducer code in Javascript reducer = Code(""" function(obj, prev){ prev.count++; }""") Run a query providing the key to group by, initial value for the reducer, and the name of the reducer function results = db.things.group(key={"x":1}, condition={}, initial={"count": 0}, reduce=reducer) Iterate through the documents in the result for doc in results: print doc {'count': 1.0, 'x': 1.0}{'count': 2.0, 'x': 2.0}{'count': 1.0, 'x': 3.0} Cursors You might not want your program to retrieve every match to a query at once Perhaps you want to show 20 per page, for example Cursors are how to do this db.test.find()[50] returns an entry db.test.find()[20:25] returns a cursor (contains 5 entries from 20) 12
Cursors Find() returns a cursor: cur = db.test.find() cur.sort( field :1) PyDev Eclipse plug-in In all machines in CS labs Python development environment Includes interactive console 13
PyDev Interactive Console Nice alternative to the Mongo shell PyDev 14
Cursor Example 15