MarkLogic 8: Developer Experience Node.js and Java Client APIs, Server-Side JavaScript, Native JSON Justin Makeig November 2014
MarkLogic 8 Feature Presentations
Topic | Product Manager
Developer Experience: Samplestack and Reference Architecture | Kasey Alderete
Developer Experience: Node.js and Java Client APIs, Server-Side JavaScript, and Native JSON | Justin Makeig
REST Management API, Flexible Replication, Sizing, and Reference Hardware Architectures | Caio Milani
Bitemporal | Jim Clark
Semantics | Stephen Buxton
SLIDE: 2
DEVELOPER EXPERIENCE
New and Enhanced features in MarkLogic 8 Samplestack: Reference architecture instance Client APIs: Extensible drivers for Java and Node.js Open development: Transparency and responsiveness with GitHub-first Server-Side JavaScript: Run JavaScript close to data Native JSON: Unified indexing, query for today's data SLIDE: 4
Reference (Software) Architecture
User Interface: Data views, user workflow
Middleware: Business rules, domain model, integration
Database: Persistent state, stored procedures
Cross-cutting: Security, Monitoring, Config Mgmt
Business Services: Resources (Customer, Approval, etc.), JSON over HTTP
Data Services: Documents, collections, elements, JSON/XML over HTTP
SLIDE: 5
CLIENT APIS
New and Enhanced features in MarkLogic 8 Samplestack: Reference architecture instance Client APIs: Extensible drivers for Java and Node.js Open development: Transparency and responsiveness with GitHub-first Server-Side JavaScript: Not XQuery Native JSON: Unified indexing, query for today's data SLIDE: 7
Diagram (application stack): Application: Application Logic, {Java, Node} Client API, Extensions | HTTP | Data Services: REST Client API | MarkLogic: {JavaScript, XQuery} Built-ins, Extensions | Legend: User code, Framework code SLIDE: 8
Client APIs key capabilities
Bulk write and read
Patch
Extensions: Resource, Transformation
Alert
Graphs
Query: By example, Structured, String
Projection: Snippets, Paths
Bitemporal
SLIDE: 9
Java Client API NoSQL agility in a pure Java interface Faster development and less custom code with out-of-the-box data management, search, and alerting Pure Java query builder and conveniences for POJOs, JSON, XML, and binary I/O Built-in extensibility for moving performance-critical code to the database Always open source and developed on GitHub Participate. Contribute. Fork it. SLIDE: 10
Deploy in every environment Increase flexibility by reusing existing skills, tools Minimize integration costs with a pure Java interface Maximize performance by bringing code to the data Scale up (or down) without modifying application code Build, test, instrument, debug with standard tools SLIDE: 11
Simpler data integration Reduce custom code for transactions, security, marshalling, orchestration Increase flexibility by mixing POJOs, JSON, XML, and triples React more quickly to change by using data in its natural format with less ETL Maximize performance by bringing code to the data SLIDE: 12
CODE DEMO
Bulk writes
    GenericDocumentManager docMgr = client.newDocumentManager();
    DocumentWriteSet writeSet = docMgr.newWriteSet();
    int i = 0;
    for (JsonNode json : myCollection) {
      i++;
      // One handle per document: the write set keeps a reference to each handle
      writeSet.add("/" + i + ".json", meta, new JacksonHandle(json));
      if (i % BATCH_SIZE == 0) {
        docMgr.write(writeSet);
        System.out.println("Wrote batch");
        writeSet.clear();
      }
    }
    // Flush the final, partially filled batch
    if (!writeSet.isEmpty()) { docMgr.write(writeSet); }
SLIDE: 14
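The batching pattern on this slide can be sketched independently of any client library. The following is a minimal illustration in plain JavaScript, not the Client API itself: `writeInBatches` and the `writeBatch` callback are hypothetical names, and `writeBatch` stands in for whatever bulk write call the application uses.

```javascript
// Groups documents into fixed-size batches and hands each batch to writeBatch.
// writeBatch is a hypothetical callback standing in for a real bulk write.
function writeInBatches(docs, batchSize, writeBatch) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  let batch = [];
  let written = 0;
  docs.forEach((doc, index) => {
    // URIs follow the slide's "/" + i + ".json" pattern, starting at 1
    batch.push({ uri: '/' + (index + 1) + '.json', content: doc });
    if (batch.length === batchSize) {
      writeBatch(batch);
      written += batch.length;
      batch = [];
    }
  });
  // Flush the final, partially filled batch
  if (batch.length > 0) {
    writeBatch(batch);
    written += batch.length;
  }
  return written;
}

// Usage: record the size of each batch instead of writing anywhere
const sizes = [];
const total = writeInBatches([{ a: 1 }, { a: 2 }, { a: 3 }, { a: 4 }, { a: 5 }], 2,
  (batch) => sizes.push(batch.length));
console.log(total, sizes);
```

The explicit flush after the loop matters: without it, any documents left over after the last full batch are never written.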
POJO façade
Manage and query POJOs. Inspired by Spring Data. "Cheap and cheerful": Not a Hibernate/JPA substitute
    PojoRepository<Product, Long> productRepo = client.newPojoRepository(Product.class, Long.class);
    PojoQueryBuilder qb = productRepo.getQueryBuilder(Product.class);
    QueryDefinition query = qb.containerQuery("company").value("name", prod1.getName());
    for (Product result : productRepo.search(query, 1)) {
      // process each product
    }
SLIDE: 15
Annotation-based range index creation for POJOs
Deployment automation lifecycle
1. Annotate domain class getters
2. Run included GenerateIndexConfig to generate config
3. Post to the Management REST API
    @PathIndexProperty(scalarType = ScalarType.DOUBLE)
    public Double getBalance() { return balance; }
SLIDE: 16
Eval and invoke
Eval ad hoc code, invoke server-side modules. JavaScript or XQuery. Type marshalling. Sharp tool: Lead with resource, transformation extensions
    ServerEvaluationCall exp = client.newServerEval()
        .javascript(javascript) // String of Server-Side JavaScript
        .addVariable("percent", 0.08);
    for (EvalResult result : exp.eval()) { }
SLIDE: 17
Java Client API or XCC? XCC is not going away: Hundreds of customer apps, mlcp, Hadoop Connector, .NET, etc. Start with the Java Client API: Easy to get going, built-in best practices, extensible Eval/invoke narrows the functionality gap. 8000 app server narrows the performance gap. SLIDE: 18
Node.js Client API Enterprise NoSQL database for Node.js applications Focus on application features rather than plumbing with out-of-the-box search, transactions, aggregates, alerting, geospatial, and more Move faster to production with proven reliability at scale Maximize performance and flexibility bringing code to the data Enable modern end-to-end JavaScript development Always open source on GitHub Participate. Contribute. Fork it. SLIDE: 19
Straightforward data integration React faster to change, using data in its most natural form with less ETL: JSON, XML, RDF, text, binary Transactional multi-document updates ensure consistency Async, promises, streams ensure seamlessness with Node Reduce data movement, duplication by moving code to the data and invoking from Node Reuse JSON data models, JavaScript code across app tiers SLIDE: 20
Deploy in every environment JavaScript is everywhere: Reuse skills, tools, investments Node.js as standard middleware for connecting JSON services over HTTP with JavaScript Scale up (or down) without modifying application code Build, test, instrument, debug with standard tools SLIDE: 21
What is Node.js and why is it important? Scripting environment for network services with JavaScript Event loop: Single thread, non-blocking I/O, and asynchronous events N in MEAN: JavaScript-JSON throughout the stack Large, growing ecosystem and significant developer pull SLIDE: 22
Key concepts Promises: Humane async chaining and error handling (using Bluebird) Streams: Observable data flow (think UNIX pipes) npm: Package manager SLIDE: 23
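A minimal sketch of the two concepts, using only Node.js built-ins. It uses native `Promise` rather than Bluebird, and `lookupRegion`, `describeCountry`, and `upperCaseAll` are hypothetical names invented for this illustration.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream');

// Hypothetical async lookup standing in for any I/O call.
function lookupRegion(country) {
  const regions = { Kenya: 'Africa', Peru: 'Americas' };
  return new Promise((resolve, reject) => {
    setImmediate(() => {
      if (Object.prototype.hasOwnProperty.call(regions, country)) {
        resolve(regions[country]);
      } else {
        reject(new Error('Unknown country: ' + country));
      }
    });
  });
}

// Promises: each then() receives the previous step's result,
// and a single catch() handles a failure from any earlier step.
function describeCountry(country) {
  return lookupRegion(country)
    .then((region) => country + ' is in ' + region)
    .catch((err) => 'Lookup failed: ' + err.message);
}

// Streams: values flow source -> transform -> sink, like a UNIX pipe.
function upperCaseAll(items) {
  return new Promise((resolve, reject) => {
    const out = [];
    const upper = new Transform({
      objectMode: true,
      transform(chunk, _encoding, done) { done(null, String(chunk).toUpperCase()); }
    });
    const sink = new Writable({
      objectMode: true,
      write(chunk, _encoding, done) { out.push(chunk); done(); }
    });
    pipeline(Readable.from(items), upper, sink, (err) => (err ? reject(err) : resolve(out)));
  });
}

describeCountry('Kenya').then(console.log);
upperCaseAll(['json', 'xml']).then(console.log);
```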
CODE DEMO
Quick Quiz: What's the order of the function calls?
    function A(callback) { }
    function B(callback) { }
    function C(stuffFromA) { }
    function D(thingsFromB) { }
    A(C);
    B(D);
SLIDE: 25
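One way to explore the quiz is to give A and B concrete asynchronous bodies. The sketch below assumes both use `setTimeout` with delays chosen purely for illustration (A is slower than B); the slide leaves the bodies empty, so the timers, the `log` array, and `runQuiz` are all assumptions.

```javascript
// Runs the quiz with hypothetical timer-based bodies and resolves with the call order.
function runQuiz() {
  const log = [];
  return new Promise((resolve) => {
    let pending = 2;
    const done = () => { if (--pending === 0) resolve(log); };

    // A and B return immediately; their callbacks fire later from the event loop.
    function A(callback) { log.push('A'); setTimeout(() => callback('stuff from A'), 30); }
    function B(callback) { log.push('B'); setTimeout(() => callback('things from B'), 1); }
    function C(stuffFromA) { log.push('C'); done(); }
    function D(thingsFromB) { log.push('D'); done(); }

    A(C);
    B(D);
  });
}

// With these delays: A -> B -> D -> C
runQuiz().then((log) => console.log(log.join(' -> ')));
```

A and B always start in source order, but C and D run in whatever order their asynchronous work completes, so swapping the two delays swaps C and D.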
    var marklogic = require('marklogic');
    var conn = require('./env.js').connection; // Host and auth details
    var db = marklogic.createDatabaseClient(conn);
    var q = marklogic.queryBuilder;
    db.documents.query(
      q.where(
        q.collection('countries'),
        q.value('region', 'Africa'),
        q.or( )
      )
    ).result(function(documents) {
      documents.forEach(function(document) { })
    });
SERVER-SIDE JAVASCRIPT
New and Enhanced features in MarkLogic 8 Samplestack: Reference architecture instance Client APIs: Extensible drivers for Java and Node.js Open development: Transparency and responsiveness with GitHub-first Server-Side JavaScript: Not XQuery Native JSON: Unified indexing, query for today's data SLIDE: 28
Server-Side JavaScript JavaScript runtime inside MarkLogic using Google's V8 Run code near the data for unparalleled power, efficiency Build applications faster from a growing pool of skills, tools Reduce risk with proven performance and reliability Decrease brittle ETL and lost fidelity and functionality from JSON data conversions Pair with Node.js to ease full-stack JavaScript development (Diagram: Front End, Middle Tier, Database Layer) SLIDE: 29
Intelligent data layer Maximize performance by bringing code to the data Parallel search and aggregates minimize data movement Built-in HTTP app server simplifies application architecture Reuse JSON data models, JavaScript code across tiers Async tasks, batch processing increase flexibility Intelligent ES6 iterators, generators improve productivity SLIDE: 30
Better answers from today's data Model and manipulate documents, relationships, and metadata combining JSON, XML, RDF, text, and binary Unified JavaScript interface for all indexes, data formats Text search, semantic inference, aggregates, geospatial, alerting Real-time consistency when milliseconds count SLIDE: 31
Simpler data integration with JavaScript Transactional multi-document updates ensure consistency React more quickly to change by using data in its natural format with less ETL Rich real-time indexes reduce custom code Ecosystem of data processing libraries ease development Streamline ETL with JavaScript built for JSON data SLIDE: 32
CODE DEMO
Hello, world!
    var q = cts.andQuery([cts.wordQuery( ), ]);
    var itr = fn.subsequence(cts.search(q, ), 1, 10);
    for (var result of itr) {
      var obj = result.toObject();
    }
SLIDE: 35
Built-in Types
Value: .toObject() and .valueOf()
ValueIterator: Lazy-loaded sequences, ES6 iterator
    for (var item of iterator) { }
    Eager? iterator.toArray()
Node: Document, ObjectNode, XMLNode, etc.
    cts.doc('/thundersnow.json') instanceof Node; // true
A few others and more coming SLIDE: 36
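The difference between lazy iteration and eager materialization can be shown with a plain ES6 generator. This is only an analogy for how a lazily loaded sequence behaves, not MarkLogic's ValueIterator: `lazySequence`, `take`, and the `log` array are hypothetical names for this sketch.

```javascript
// A generator that "loads" each item only when the consumer asks for it.
function* lazySequence(items, log) {
  for (const item of items) {
    log.push('loaded ' + item); // records when the item is actually produced
    yield item;
  }
}

// Consumes at most n values from any iterable, then stops pulling.
function take(iterable, n) {
  const out = [];
  if (n <= 0) return out;
  for (const value of iterable) {
    out.push(value);
    if (out.length >= n) break;
  }
  return out;
}

const lazyLog = [];
const firstTwo = take(lazySequence(['a', 'b', 'c', 'd'], lazyLog), 2);
console.log(firstTwo, lazyLog.length); // only two items were loaded

const eagerLog = [];
const all = Array.from(lazySequence(['a', 'b', 'c', 'd'], eagerLog));
console.log(all.length, eagerLog.length); // converting to an array loads everything
```

Iterating with `for...of` pulls one value at a time, so stopping early avoids loading the rest; converting to an array forces every item to load up front.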
Nodes vs. Objects
Nodes: What's in the database. Immutable (just like XQuery). JSON (object, array, number, …), XML (element, attribute, …), binary, text
Objects: What's in your code. Mutable: obj.fullname = "Nigel Tufnel"
Automatic translation, for your convenience: xdmp.documentInsert( )
SLIDE: 37
Nodes vs. Objects
    fn.collection() // ValueIterator
      .next()       // Iterate
      .value        // Document node
      // .root      // ObjectNode (not required)
      .toObject();  // Returns plain-old object
SLIDE: 38
Updates
    declareUpdate();
JS-DECLAREUPDATE: JavaScript updates must begin with declareUpdate()
    for (var item of fn.collection("accounts")) {
      var uri = xdmp.nodeUri(item); // URI of the document being updated
      var obj = item.toObject();
      obj.balance = obj.balance * 1.05;
      var collections = xdmp.documentGetCollections(uri);
      xdmp.documentInsert(uri, obj, xdmp.defaultPermissions(), collections);
    }
SLIDE: 39
Namespaces and modules
Global namespaces: xdmp, cts, sem, etc.
CommonJS-style modules: module.exports, require()
Same path resolution, precedence as XQuery (not Node.js)
Import XQuery, employ as JavaScript: Public functions and variables. Type mapping. Automatic camelCase conversion: my:do-something() becomes
    var my = require('my');
    my.doSomething()
SLIDE: 40
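The naming rule can be illustrated with a small helper. This is a simplified sketch of the hyphen-to-camelCase mapping described on the slide, not MarkLogic's actual conversion logic; `toCamelCase` is a hypothetical name and edge cases in the real mapping may differ.

```javascript
// Converts a hyphenated XQuery local name to a camelCase JavaScript name.
// Simplified: uppercases the character that follows each hyphen.
function toCamelCase(xqueryName) {
  return xqueryName.replace(/-([a-z0-9])/g, (_match, ch) => ch.toUpperCase());
}

console.log(toCamelCase('do-something'));    // doSomething
console.log(toCamelCase('document-insert')); // documentInsert
```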
Server-Side JavaScript != Node.js
Complementary, but very different. Both use V8, but are separate environments, processes. Share models, libraries, and patterns between them
MarkLogic Server-Side JavaScript: Sync interface, async below; xdmp.*request response* and http.*; Shares process
Node.js: Async throughout; require('http'); Remote service
SLIDE: 41
Coverage and Performance Comprehensive coverage of built-ins Import existing XQuery modules (admin, search, etc.) V8 is fast for computation on an E-node SLIDE: 42
JSON
MarkLogic Architecture SLIDE: 44
QUERY LAYER: JavaScript, SPARQL, SQL, XQuery
EVALUATION LAYER: Evaluator, Query cache, Broadcaster, Aggregator
DATA LAYER: Transaction controller, Data cache, Transaction journal, Indexes, Compressed storage
SLIDE: 45
New and Enhanced features in MarkLogic 8 Samplestack: Reference architecture instance Client APIs: Extensible drivers for Java and Node.js Open development: Transparency and responsiveness with GitHub-first Server-Side JavaScript: Not XQuery Native JSON: Unified indexing, query for today's data SLIDE: 46
JSON Unified indexing and query for today's web and SOA data Speed up development with powerful built-in search, transformation, and alerting capabilities designed for JSON Reduce lost fidelity and functionality from data model translations and brittle ETL Simplify architecture with data, metadata, and relationships managed consistently and securely together Ease modern, end-to-end JavaScript development
    {
      "_id": 1,
      "name": "MarkLogic",
      "supports": [
        { "datatype": "XML", "year": 2003 },
        { "datatype": "JSON", "year": 2014 }
      ]
    }
SLIDE: 47
Better answers from today's data Mix and match the most appropriate data formats without costly up-front schemas JSON: Data structures and hierarchies XML: Markup and rich text RDF Triples: Facts and relationships Decrease development time and governance costs with unified management, indexing, security across formats SLIDE: 48
Straightforward data integration Reduce the cost of translations and duplication working with data from multiple sources, likely already JSON React more quickly to change by using data in its natural format with less ETL Reduce risk and governance costs with fewer data silos Reuse existing JavaScript skills and tools for processing SLIDE: 49
JavaScript XQuery JSON XML SLIDE: 50
JSON data model
    { "name": "Oliver", "scores": [88, 67, 73], "isactive": true, "affiliation": null }
Node tree:
    Document (unnamed)
      Object (unnamed)
        Text "name": "Ol…"
        Array "scores"
          Number "scores": 88
          Number "scores": 67
          Number "scores": 73
        Boolean "isactive": true
        Null "affiliation"
SLIDE: 51
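The node kinds on this slide can be approximated by walking a parsed JSON value in plain JavaScript. This sketch only mirrors the slide's diagram (array members take the name of the enclosing property); it is not MarkLogic's internal representation, and `nodeKind` and `describeNodes` are hypothetical names.

```javascript
// Maps a parsed JSON value to the node kind shown on the slide.
function nodeKind(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  switch (typeof value) {
    case 'object': return 'object';
    case 'string': return 'text';
    case 'number': return 'number';
    case 'boolean': return 'boolean';
    default: throw new TypeError('Not a JSON value: ' + typeof value);
  }
}

// Walks the value depth-first and lists each node's kind and name.
// Array members are reported under the name of the property that holds the array.
function describeNodes(value, name = null, out = []) {
  out.push({ kind: nodeKind(value), name: name });
  if (Array.isArray(value)) {
    value.forEach((member) => describeNodes(member, name, out));
  } else if (value !== null && typeof value === 'object') {
    Object.keys(value).forEach((key) => describeNodes(value[key], key, out));
  }
  return out;
}

const doc = JSON.parse('{"name":"Oliver","scores":[88,67,73],"isactive":true,"affiliation":null}');
describeNodes(doc).forEach((n) => console.log(n.kind, n.name === null ? '(unnamed)' : n.name));
```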
Integrated
Full-text, value, scalar, geo, triple indexing
Strong typing (no tokenization) for numbers, booleans, null
Support for GeoJSON and ArcGIS points
Dates and Arrays just work
XPath and XQuery:
    doc.xpath('/node()[some $n in friends/name satisfies starts-with($n, "A")]');
Seamless in JavaScript:
    var b = cts.doc('/u485.json').root.currentbalance;
SLIDE: 52
JSON or XML?
JSON: Data structures. Types: string, number, boolean, null. JavaScript, AJAX, Jackson, …
XML: Markup. Text: Language, mixed content. XQuery, XPath, XSLT, Schema, SOAP, JAXB, …
SLIDE: 53
Migrating from the existing JSON façade Sample upgrade script ships with 8.0-1 Additional Code and configuration changes (e.g. xdmp.to-json()) JSON XML JSON SLIDE: 54