Moving Apache CouchDB Data to Cloudant


Moving Apache CouchDB Data to Cloudant
The path to scalable, always-on, and managed CouchDB as a service
February 2013

Moving CouchDB Data to the Cloudant DBaaS

Cloudant Overview

Cloudant provides a managed, cloud database as a service (DBaaS) that is based on Apache CouchDB. Cloudant is a fast, always-on, scalable database that the big data experts at Cloudant operate and grow for you so you can stay focused on new development and not on database administration. Samsung, Microsoft, Salesforce.com, DHL, Hothead Games, Flurry, and thousands of other developers of large-scale or fast-growing Web and mobile applications use Cloudant.

The Cloudant DBaaS features:

- A schema-less (NoSQL) JSON document store
  - For operational data optimized for concurrent reads & writes, high availability, and data durability
  - Monitored, scaled, and managed by big-data experts at Cloudant
  - Accessed via an Apache CouchDB-compatible, RESTful API
- APIs for specialized data management features:
  - Data replication & sync with mobile devices or local data centers
  - Full-text indexing & search (Apache Lucene powered)
  - Geo-spatial indexing and analytics
  - Incremental MapReduce for real-time analytics
- Global data distribution
  - Across a network of data centers in North America, Europe, and Asia
  - Fault-tolerance via cross-data center data distribution
  - Multiple hosting options: AWS, Azure, Joyent, Rackspace, SoftLayer
  - Geo-load balancing connects users to the closest data source for lower data access latency

Why Transition to Cloudant?

Developers choose CouchDB for a variety of reasons: schema freedom, ease of development, and replication & sync, to name a few. But CouchDB can be difficult to scale out to handle larger workloads. The Cloudant DBaaS is based on Apache CouchDB, and has been enhanced with a horizontal scaling framework, Lucene full-text search, geo-spatial indexing, fault tolerance, and other features not found in CouchDB so that you can:

Build More
Cloudant enhances the CouchDB development experience with built-in scaling, fault tolerance, Lucene-powered full-text indexing and search, and geo-spatial indexing. Having these built into your data layer makes it easier to enrich your apps with advanced data management features.

Grow More
Growing a CouchDB database to hold more data or support many more users is hard to do. Cloudant includes a horizontal scaling and fault-tolerance framework that makes this easy; it was initially developed to manage the petabytes of data that the Large Hadron Collider generates every second so that it could be accessed by physics researchers around the world.

Sleep More
Keeping CouchDB running smoothly is a 24x7 operation, and we do that for you. We monitor, grow (reconfigure, repartition/rebalance clusters), protect and administer your data layer around the clock so you can get a good night's sleep.

Moving Your Data to Cloudant

Migrating data from CouchDB to Cloudant is conceptually straightforward. It involves:

1. Replicating data from your CouchDB database to Cloudant
2. Optionally, adjusting your CouchDB design docs

The process generally takes a day or two depending on the scope of your application.

Taking a Phased Approach

Migrating your current data layer to Cloudant can be done in phases; it does not have to be an all-or-nothing process. You can start by migrating a single database to Cloudant while other data continues to reside on other servers. Good candidates for migration include databases that need to be scaled out. Rather than configuring your own CouchDB cluster and partitioning your CouchDB data across it, consider moving your data to Cloudant. Your data will be scaled out by Cloudant as part of the process.

Importing Your Data into Cloudant

If Cloudant will hold your data in the same database and JSON structure as your existing CouchDB database does, you can simply replicate data from your CouchDB database to Cloudant. Otherwise, you'll need to output your CouchDB data to a file containing an array of JSON objects and then perform one or more HTTP POST requests to bulk load the docs from your export file into Cloudant.

API and Data Design Doc Changes

Cloudant is based on CouchDB, and its API is largely compatible with CouchDB. Cloudant has had to make a few changes to the CouchDB API in order to make it faster, richer, and possible to use as a hosted and managed, cluster-based service. These differences might require that you change your design documents or application code:

- View docs must be written in JavaScript, unlike CouchDB, which permits these to be written in other languages.
- Temp views are disabled. They are not a best practice for production systems in CouchDB because of performance.
- Changes feed might be unordered. In Cloudant, items in the changes feed are collected from nodes in the cluster independently, so they might not be reported in order as they are in CouchDB. The since parameter works as expected, though. Sequence numbers are integers in CouchDB and are ordered. In Cloudant they are opaque JSON tokens that include cluster state information.
- Server configuration commands are disabled. We have disabled server configuration commands and server shutdown; they aren't applicable within a hosted service.
- Authorization & Authentication differences: Cloudant permits you to share database access across Cloudant user accounts. Cloudant supports the same authentication methods as CouchDB, except for OAuth. Full support for OAuth is currently under development.
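
The two import paths described above (replication, or export followed by bulk load) can be sketched in Python. This is a minimal illustration, not Cloudant's own tooling. It relies on the standard CouchDB endpoints `POST /_replicate` and `POST /{db}/_bulk_docs`. The helper names, the example URLs, the batch size, and the choice to strip `_rev` are assumptions made for illustration. Stripping `_rev` assumes the target database is new and that revision history does not need to be preserved.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator, Optional


def build_replication_request(source_url: str, target_url: str,
                              continuous: bool = False) -> Dict[str, Any]:
    """Build the JSON body for POST /_replicate on the source CouchDB server."""
    return {
        "source": source_url,
        "target": target_url,
        "create_target": True,   # ask the replicator to create the target db
        "continuous": continuous,
    }


def bulk_docs_batches(docs: Iterable[Dict[str, Any]],
                      batch_size: int = 500) -> Iterator[Dict[str, Any]]:
    """Yield request bodies for POST /{db}/_bulk_docs, batch_size docs each.

    `_rev` is dropped from each doc (assumption: the target database is
    empty, so every document is inserted as a brand-new document).
    """
    batch = []
    for doc in docs:
        batch.append({k: v for k, v in doc.items() if k != "_rev"})
        if len(batch) == batch_size:
            yield {"docs": batch}
            batch = []
    if batch:
        yield {"docs": batch}


def post_json(url: str, body: Dict[str, Any],
              username: Optional[str] = None,
              password: Optional[str] = None) -> Any:
    """POST a JSON body and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if username is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        headers["Authorization"] = "Basic " + token.decode("ascii")
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"),
        headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under these assumptions, replication would be started by posting `build_replication_request("http://localhost:5984/mydb", "https://USER:PASSWORD@ACCOUNT.cloudant.com/mydb")` to `http://localhost:5984/_replicate` with `post_json`. A bulk load would loop over `bulk_docs_batches(exported_docs)` and post each body to the target database's `_bulk_docs` URL. The database name, account, and credentials are placeholders.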

Interview with Stockr.com

Stockr.com is a real-time social networking site that connects financial investors and traders to track stocks and discuss public companies, without the spam that dominates most other stock sites and message boards. We spoke with Eugene Kashpureff Jr. of Stockr.com about his experience getting started with Cloudant.

Cloudant: Why did you move Stockr.com to Cloudant?

Eugene: I'm a volunteer firefighter at home, I'm a professional firefighter at Stockr. I have better things to do than to sit and mess with databases all day long. We use CouchDB to store our site's big data: stock info, user posts, statistical data and the like. We have a few tables of relational data (mostly user login info) that we keep in MySQL, but 95% of our data volume is in CouchDB. Offloading the management and worry of all that information is the core reason behind our move.

Cloudant: How much data, access, storage?

Eugene: Between our various development, test and production environments? About half a terabyte, and growing daily. But I don't watch that number; that's why I let you guys have our data.

Cloudant: How has the performance been on Cloudant?

Eugene: Since we moved to Cloudant we've had zero user complaints about our site's speed, when it used to be a constant nag. There are so many fewer problems that we have to deal with every single day. I can't say we've had NO problems since moving to Cloudant, but it's far fewer than we used to have.

Cloudant: What type of issues do you no longer have to deal with?

Eugene: Database sharding, disk compaction, running out of storage space, managing hardware & daemons, all the stuff I blindly wandered through the CouchDB documentation for -- now that's gone from my life.

Cloudant: What did you migrate from?

Eugene: We originally started with the Ubuntu CouchDB package running on one node. Then expanded to three nodes of BigCouch. Then we tried to add two nodes to that, and we couldn't get it to work, so we decided to just move to Cloudant. In addition, each of our developers had a separate environment, and we had numerous unpleasant surprises when a configuration or version difference was found.

Cloudant: How did the migration go?

Eugene: It took us a while to get replication set up, data imported over the weekend, and then we flipped the switch and it just worked. Only thing was we had to set up a proxy server to deal with SSL endpointing. Replication took a few days to get right because Cloudant was having an issue with the hardware SoftLayer provisioned; a new switch was installed and configured wrong. Just one of those set-up-new-hardware problems you always have. We moved over 40GB of raw data then re-generated indexes.

Cloudant: Did you have to make any code changes?

Eugene: Our CouchDB views were written in Python, but Cloudant requires those be written in JavaScript. Mike Miller did a lot of the work converting those view documents for us. I think he said it took him about an hour. The only thing we changed in our actual application was the address of the CouchDB server.

Cloudant: Is there anything we could have done to make moving to Cloudant easier?

Eugene: I'm not sure if there's much more you folks could have delivered. I was surprised at how quick and painless it was. Outside of Stockr, I use other services like AWS. It's painful. With Cloudant it was a couple of config problems, that was it. For an easier migration process, it would have been nice to have a tool to run against our BigCouch/CouchDB server to automatically load it all up into Cloudant, rather than having to log in as admin and assign all those relationships by hand. New customers would appreciate that.

Cloudant: Any closing thoughts?

Eugene: With Cloudant now, I can work on making the system faster rather than trying to keep the system up.

Getting More Help

If you need help getting started with Cloudant, visit the Cloudant Developer Resources Site or contact us for assistance:

#cloudant on
on Twitter
support@cloudant.com
129 South Street, Boston, MA
(857)
cloudant.com

About Cloudant

Cloudant provides developers of large-scale and fast-growing web and mobile applications with the world's first globally distributed database as a service (DBaaS) for loading, storing, analyzing, and distributing operational application data. As a managed service, Cloudant helps developers eliminate the delays, costs, and distractions inherent in working with databases and their administrators, while providing unmatched scalability, availability, and performance. The Cloudant service is available hosted on AWS, Joyent, Rackspace, SoftLayer, and Windows Azure. Cloudant customers include Samsung, Hothead Games, Microsoft Big Park Studios, Flurry, Salesforce.com, DHL and thousands of other developers worldwide.