MarkLogic 8: Infrastructure Management API, Flexible Replication, Incremental Backup, and Sizing Recommendations
Caio Milani
November 2014
MarkLogic 8 Feature Presentations
Topic: Product Manager
- Developer Experience: Samplestack and Reference Architecture: Kasey Alderete
- Developer Experience: Node.js and Java Client APIs, Server-side JavaScript, and Native JSON: Justin Makeig
- REST Management API, Flexible Replication, Sizing, and Reference Hardware Architectures: Caio Milani
- Bitemporal: Jim Clark
- Semantics: Stephen Buxton
Agenda
- Flexible Replication
- Management API
- Incremental Backup
- Reference Hardware Architecture
FLEXIBLE REPLICATION
Flexible Replication
Customizable information sharing between systems
- Enable content collaboration across numerous systems
- Support directly connected or mobile users
- Provide the data users need through simple configurable parameters or queries
- Ensure data consistency and security with simple workflows
- Even better with Bitemporal and the Management API
Intelligent Data Layer Enabling Data Collaboration
- Data replicates across many databases
- No need for a master data store
- No need for continuous connectivity
- No need to replicate all data
- Consistency on edits can be handled by:
  - Simple versioning
  - Check-in/check-out/publish workflows
  - Conflict-resolution rules
  - Bitemporal collections
Users Get Only the Data They Need
- Data moves based on collections, URIs, or user-defined queries
- User changes to settings and queries update the replicated content on their laptops
- Data can be transformed and filtered before replication
- Security is consistent across all peers, ensuring reliable data access control
Choosing the Right Feature for the Job
- Flexible Replication is a document-centric solution aimed at information sharing
- Flexible Replication is not intended for DR and does not preserve transaction boundaries
- Database Replication makes a transactionally consistent copy of the primary data in another data center and is aimed at DR
How Documents Are Replicated
- Flexible Replication is an asynchronous solution built on top of the Content Processing Framework (CPF), running on a task queue
- Any time a target document changes, its properties fragment is updated
- Document updates can be pushed (to the replica) or pulled (by the replica)
- For push targets, an immediate push is attempted; for pull targets, the properties are updated to reflect that the document needs to be replicated
- Query-based targets typically use pull; for scalability reasons, query-based push targets also skip the immediate push attempt
- If the task server queue is more than half full, the master will not push documents to the replica and instead leaves them for the scheduled push task
Scheduled Tasks
- Regardless of whether you configure replication as push or pull, you must create a scheduled task to periodically replicate updated content
- A scheduled replication task does the following:
  - Moves zero-day content that existed before replication was configured
  - Provides a retry mechanism in the event the initial replication fails
  - Replicates deletes on the master to the replica
- Replication retries are governed by a combination of the task frequency, documents per batch, and the minimum and maximum retry wait times
- Zero-day documents replicate after documents that have failed replication
Choose What to Replicate
- Documents are replicated based on a domain or on serialized queries
- A domain may be a document, a collection of documents, or a directory
- A query works as if you were replicating the results of a search
- Users can manage their queries to control what gets replicated
- Users can also pause and restart replication to preserve bandwidth
Query-Based Replication Based on Alerting
Start from a FlexRep configuration and create a query-based target by passing in a user ID:

  let $cfg := flexrep:configuration-create(...)
  flexrep:target-create(...)
  admin:group-add-scheduled-task(...)
  flexrep:configuration-target-set-user-id(...)

Then use the Alerting API to manage the user's queries, and any matching documents will be replicated to the target:

  alert:make-rule(..., xdmp:user("me"), ..., cts:word-query("apple"), ...)
  flexrep:pull-create(...)
Modify Documents Before and After Replication
- Flexible Replication supports filters that can modify the content, URI, properties, collections, permissions, or anything else about the document
- Filters can help decide which documents to replicate, which not to, and which documents should have only pieces replicated
- Filters can even wholly transform the content as part of replication, for example using an XSLT stylesheet to automatically adjust from one schema to another
- Filters work on master outbound data and/or replica inbound data
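To make this concrete, here is a minimal sketch of the kind of redaction a filter might apply before a document leaves the master. The local:redact function and the ssn element are illustrative only; the actual interface for attaching a filter module to a FlexRep target is described in the Flexible Replication Guide.

  xquery version "1.0-ml";

  (: Illustrative transform: drop sensitive elements, copy everything else. :)
  declare function local:redact($node as node()) as node()?
  {
    typeswitch ($node)
      case element(ssn) return ()   (: hypothetical sensitive element :)
      case element() return
        element { fn:node-name($node) } {
          $node/@*,
          for $child in $node/node() return local:redact($child)
        }
      default return $node
  };

  local:redact(fn:doc("/employees/1001.xml")/*)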
Multi-Master
- Each database can be a master for its own document sets and transmit updates to remote servers
- A database can be a master for some content and a replica for other content
- A database can transitively replicate to additional data centers
(Diagram: domain/query application replication, with updates flowing between masters and reads served at replicas.)
Ownership and Conflicts
- In cases of conflict, the master wins by default, but filters and custom code can assist with more sophisticated conflict handling
- Example implementation: virtual-lock logic built with custom code in outbound/inbound filters
- Filters can modify a document's properties to create virtual locks
- Filters can also move documents through collections such as pending, merging, and conflicted to enable automatic or manual resolution
- This is a proven solution deployed in critical operations
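As a sketch of the virtual-lock idea (the property name, namespace, URI, and owner below are illustrative; only xdmp:document-set-property and xdmp:document-properties are real builtins):

  xquery version "1.0-ml";

  declare namespace my = "http://example.com/flexrep-locks";

  (: Stamp a hypothetical lock property on a document before editing it. :)
  let $uri := "/contracts/42.xml"
  return xdmp:document-set-property($uri,
    <my:virtual-lock owner="alice" acquired="{fn:current-dateTime()}"/>)

  (: An inbound filter on the other side could then park a conflicting
     update whenever xdmp:document-properties($uri)//my:virtual-lock
     shows a different owner. :)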
Scale and Collaborate
- Scalability to thousands of systems can be achieved with a tiered architecture
- Core clusters replicate to regional clusters, which replicate to personal databases
- Modifications on personal databases can be cascaded back to core clusters and redistributed globally
MANAGEMENT API
Management API
REST-based API to manage all MarkLogic capabilities
- Increase efficiency and agility by automating time-consuming, repetitive tasks across production, testing, and development
- Reduce setup time and admin error by orchestrating multi-step configurations and deployments
- Fit more seamlessly into IT environments by using REST interfaces rather than a CLI or proprietary APIs
- Perform automated testing and monitor performance using market tools that support REST
- Even better with the Client REST API and Elasticity
Adaptive to Every Environment
- Stateless HTTP calls adapt to changing datacenter topologies, unlike CLI and socket-based APIs
- Use filtering and property parameters to scope endpoint calls and reduce client-side code
- Format payloads and outputs as HTML, JSON, or XML, adapting to different scripting techniques
- Control access to endpoints with the manage-user (GET, HEAD) and manage-admin roles
- Manage simultaneous requests with built-in concurrency and lock control, avoiding partial or erroneous updates
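For example, a minimal sketch of a scoped, formatted Management API call from XQuery, assuming a default local install on port 8002 with digest authentication (host and credentials are placeholders):

  xquery version "1.0-ml";

  (: List databases as JSON; format=json overrides any Accept header. :)
  xdmp:http-get(
    "http://localhost:8002/manage/v2/databases?format=json",
    <options xmlns="xdmp:http">
      <authentication method="digest">
        <username>admin</username>
        <password>admin</password>
      </authentication>
    </options>)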
Script All Operations in MarkLogic 8
- Topologies: databases, forests, groups, application servers, cluster coupling and decoupling
- Security: users, roles, amps, privileges, and external security
- HA/DR: local failover, database and flexible replication
- Backup and Storage: backup and restore, tiered storage, CPF configuration
- Configuration: SQL views, re/indexing, merge, bitemporal, and inference operations
- Deployment: host bootstrap manipulation, restart and shutdown operations, packaging
From Read-Only to Full Control
- MarkLogic 5: exposed read-only APIs for status and configuration information
- MarkLogic 7: exposed cluster-, host-, and forest-level interfaces sufficient for standing up a cluster
- MarkLogic 8: exposes almost all other configuration and management tasks that can be accomplished via the GUI, with minor exceptions
General Pattern of Endpoints (base: /manage/(v2|latest)/)

HTTP    Endpoint                         Description                                                  JSON or XML In/Out
GET     resource-type                    Returns a list of the resources                              Yes
POST    resource-type                    Accepts a properties payload and creates a resource          Yes
GET     resource-type/name               Returns a description of the resource                        Yes
DELETE  resource-type/name               Deletes the resource                                         N/A
POST    resource-type/name               Performs an operation on that resource                       Yes
GET     resource-type/name/properties    Returns the resource in a properties flavor (generally       Yes
                                         replayable)
PUT     resource-type/name/properties    Accepts a properties payload and modifies the resource       Yes
                                         accordingly
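Following this pattern, a sketch of creating a resource via POST to the resource-type endpoint (the database name and credentials are illustrative):

  xquery version "1.0-ml";

  (: POST /manage/v2/databases creates a database from a properties payload. :)
  xdmp:http-post(
    "http://localhost:8002/manage/v2/databases",
    <options xmlns="xdmp:http">
      <authentication method="digest">
        <username>admin</username>
        <password>admin</password>
      </authentication>
      <headers>
        <content-type>application/json</content-type>
      </headers>
    </options>,
    text { '{"database-name": "test-db"}' })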
General Pattern of Parameters and Headers
- Request parameter: format
- Request headers: Accept, Content-type
- Response header: Content-type
- Acceptable formats: JSON, XML
- Acceptable content types: application/xml, application/json, application/x-www-form-urlencoded
- On endpoints that support both content negotiation via Accept headers and a format parameter, the format parameter overrides the Accept header
Example: Payloads for POST

JSON:

  {
    "admin-username": "adminuser",
    "admin-password": "mypassword",
    "realm": "public"
  }

XML:

  <instance-admin xmlns="http://marklogic.com/manage">
    <admin-username>adminuser</admin-username>
    <admin-password>mypassword</admin-password>
    <realm>public</realm>
  </instance-admin>
Example: Checking Backup Status
POST to resource-type/name using an XQuery script with a JSON payload:

  (: $backup-job-id and $backup-host-name are defined elsewhere :)
  let $payload-status := fn:concat(
    '{"operation": "backup-status", "job-id": "', $backup-job-id,
    '", "host-name": "', $backup-host-name, '"}')
  let $status-response := xdmp:http-post(
    "http://localhost:8002/manage/v2/databases/test-db?format=json",
    <options xmlns="xdmp:http">
      <data>{$payload-status}</data>
      <headers>
        <content-type>application/json</content-type>
        <accept>application/json</accept>
      </headers>
    </options>)
  return $status-response
Example: Adding a Host to a Cluster

  curl -X POST -d "" http://${joining_host}:8001/admin/v1/init

  joiner_config=`curl -s -S -X GET -H "Accept: application/xml" \
    http://${joining_host}:8001/admin/v1/server-config`

  curl -s -S --digest --user admin:password -X POST -o cluster-config.zip \
    -d "group=Default" \
    --data-urlencode "server-config=${joiner_config}" \
    -H "Content-type: application/x-www-form-urlencoded" \
    http://${bootstrap_host}:8001/admin/v1/cluster-config

  curl -s -S -X POST -H "Content-type: application/zip" \
    --data-binary @./cluster-config.zip \
    http://${joining_host}:8001/admin/v1/cluster-config
Example: Adding a FlexRep Configuration on a Master
POST to the database's flexrep-configs endpoint using an XQuery script with a JSON payload:

  let $payload := '{"domain-name": "marklogic-com-domain-2", "alerting-uri": "http://marklogic.com/org/uri"}'
  let $response := xdmp:http-post(
    "http://localhost:8002/manage/v2/databases/flexrep-master-db/flexrep-configs?format=json",
    <options xmlns="xdmp:http">
      <data>{$payload}</data>
      <headers>
        <content-type>application/json</content-type>
        <accept>application/json</accept>
      </headers>
    </options>)
  return $response
INCREMENTAL BACKUP
Incremental Backup
Faster backups while using less storage
- Store only changes since the previous full or incremental backup
- Consume less storage for backup copies
- Reduce the backup window
- Improve availability with multiple daily backups
- Works with Journal Archiving to enable fine-grained point-in-time recovery
(Timeline: weekly full backups on Sunday, with daily delta/differential incremental backups in between.)
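As a sketch, an ad hoc backup with incremental backup and journal archiving enabled might look like the following from XQuery. The paths are illustrative, and the trailing arguments assume the MarkLogic 8 xdmp:database-backup signature for journal archiving and incremental options, so verify the exact parameter order against the current documentation:

  xquery version "1.0-ml";

  (: Back up every forest of test-db; deltas land under /abc/incremental. :)
  xdmp:database-backup(
    xdmp:database-forests(xdmp:database("test-db")),
    "/abc/backups",
    fn:true(),           (: journal archiving :)
    "/abc/backups",      (: journal archive path :)
    15,                  (: lag limit :)
    fn:true(),           (: incremental backup :)
    "/abc/incremental")  (: incremental backup directory :)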
Uncompromised Data Resiliency
- Reduce the Recovery Point Objective (RPO) with incremental backup and journal archiving
- Perform point-in-time recovery to overcome garbage-in problems
- The operation is simple: the server restores the backup set and replays the journal starting from the given timestamp
(Diagram: journal frames with timestamps stream from the active journal into archived journals alongside full and incremental backups; a restore timestamp in the journal marks the point just before the garbage came in.)
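A companion sketch of the point-in-time restore, replaying archived journal frames up to a wall-clock time just before the bad data arrived. The timestamp and paths are illustrative, and the argument order again assumes the MarkLogic 8 xdmp:database-restore signature, which is worth checking against the docs:

  xquery version "1.0-ml";

  xdmp:database-restore(
    xdmp:database-forests(xdmp:database("test-db")),
    "/abc/backups",
    xs:dateTime("2014-08-03T09:00:00-07:00"),  (: restore to this point in time :)
    fn:true(),           (: journal archiving :)
    "/abc/backups",      (: journal archive path :)
    fn:true(),           (: incremental backup :)
    "/abc/incremental")  (: incremental backup directory :)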
Smaller, Faster, and More Consistent
- Delta/differential incrementals store changes since the last incremental, for faster backups and less space
- Cumulative incrementals store only the data that changed since the last full backup, for faster restores
- Validation is shorter because subsequent incrementals do not examine the full backup
- Backup and restore are transactional and guarantee a consistent view of the data
(Diagram: weekly timelines contrasting delta/differential and cumulative incremental schedules between full backups, plus the phases of a backup job over time: validation phase, begin transaction, copy phase, end transaction, sync phase.)
Distributed Backups and Restores
- Database backup and restore operations are distributed: all data nodes in a cluster participate
- Backup and restore provide consistent database-level backups and restores
Backup Directory Structure
- When you back up a database, you specify a backup directory
- Incremental backups are stored in their own directory
- Supports either a shared or unshared directory (the same path must exist on each data node)

Example: the backup directory is /abc/backups and the incremental backup directory is /abc/incremental

  /abc/backups
    20140801-1223942093224           (full backup on 8/1)
  /abc/incremental
    20140801-1223942093224
      20140802-331006226070          (incremental backup on 8/2)
      20140803-341007528950          (incremental backup on 8/3)
Flexibility to Select Data to Back Up
- By default you back up everything:
  - The configuration files
  - The Security database, including all of its forests
  - The Schemas database, including all of its forests
  - All of the forests of the database you are backing up
- If you back up all forests, you will have a backup that you can restore to the exact same state as when the backup began copying files
- You can also back up individual forests, choosing the ones you need; forest-level backups are consistent for the data in the forest
Consistent Database-Level Backups and Restores
- Backup and restore operations are transactional and guarantee a consistent view of the data
- Data changes after the copy begins are not reflected in the backup or restore set
- Backup and restore operations do not lock the database
- Database and forest administrative tasks such as drop, clear, and delete cannot take place during a backup; any such operation is queued up and will initiate after the backup transaction has completed
Phases of a Backup or Restore Operation
Validation phase
- Checks that needed files and directories exist and are writable and valid
- For backup operations, checks for sufficient disk space
Copy phase
- The files are actually copied to or from the backup directory
- The configuration files are copied at the beginning and a timestamp is written
- Starts a transaction; if the transaction fails on a restore, the database remains unchanged
Synchronization phase
- Deletes temporary files
- Leaves the database in a consistent state
- On a restore, also takes the old version of the database offline and replaces it with the newly restored version
Summary of Incremental Backup
- Since an incremental backup takes less time than a full backup, it is possible to schedule frequent incremental backups (for example, hourly)
- A full backup plus a series of incremental backups can allow you to recover from a situation where a database has been lost
- Incremental backup can be used with or without journal archiving
- If you enable both incremental backup and journal archiving, you can replay the journal starting from the last incremental backup timestamp
- Incremental backups are recommended for large databases that would take a long time to back up in full
Backup/Restore Operations with Journal Archiving
- Journal archiving enables restore to a specific point in time between backups, given a wall-clock time as input
- When journal archiving is enabled, journal frames are written to backup directories by near-synchronous streaming from the active journal
- When journal archiving is enabled, you will experience longer restore times and slightly increased system load as a result of the streaming of journal frames
- Performance can be tuned by adjusting the lag limit: the amount of time by which journal frames in the backup journal are allowed to trail those in the active journal
- After an incremental backup, journals can be automatically purged to save space
REFERENCE HARDWARE ARCHITECTURE
Reference Hardware Architecture
With some direct recommendations, you will know exactly how many nodes you need for your data, ensuring optimal application performance at the lowest cost.
(Diagram: sizing spectrum from performance to capacity, and from 100% indexed to 1% indexed.)
Sizing Forests of Indexed Content
High performance: 100 GB per forest, 8 M docs per forest
- Many facets/range indexes (~10)
- Sub-second queries
- High number of concurrent requests
- Positions enabled
High capacity: 500 GB per forest, 100 M docs per forest
- Fewer concurrent requests
- Archive/repository/analytics workloads
Indexed Content Versus Non-Indexed Content
- Database records and small text files: 100% indexed
- Media binaries: metadata only, roughly 1% indexed
Ready to Wear: High Performance/High Capacity
- The minimum number of hosts and forests per host remains constant: a 3-host cluster with 6 primary forests and 6 replica forests per host on commodity hardware
- The size of the forests shifts depending on where you are on the high-performance/high-capacity spectrum; see the quick arithmetic check below
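A quick sanity check using the per-forest numbers from the sizing slide: 3 hosts x 6 primary forests = 18 primary forests per cluster, so roughly 18 x 100 GB = 1.8 TB of indexed content at the high-performance end and 18 x 500 GB = 9 TB at the high-capacity end.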
Ready to Wear: High Performance
Storage: 20 x 2.5-inch 15K 600 GB drives, RAID 10 (striping plus mirroring)
Use case: search application
- Multiple facets (range indexes)
- Large number of concurrent users
- Sub-second queries
- Will require smaller forests with fewer documents per forest
Ready to Wear: High Capacity
Storage: 20 x 2.5-inch 10K 1200 GB drives, RAID 50 (striping plus parity)
Use case: data warehouse, large-scale analytics
- Smaller number of concurrent users
- Batch report processing that can run offline
- Forests can get much larger
Hardware/Sizing Recommendations
- 2U 25-SFF chassis
- 2-socket, 8-core / 2.8 GHz
- 10 Gb network
- 128-256 GB RAM
- 2 x 2 GB RAID cards
- 22 x 10K 900 GB data drives
Hardware/Sizing Recommendations
- 32 threads @ 2 GHz
- 4 GB to 8 GB RAM per thread
- 2U 25-SFF chassis
- 1 GB/sec I/O to network
- 1 GB/sec I/O to disks
- 300 GB per forest, plus temp space, binaries, and logs
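These two slides line up: 32 threads at 4 GB per thread gives the 128 GB RAM floor, and at 8 GB per thread the 256 GB ceiling.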
Example 3-Node Clusters (All HA)
- Archival (RAID 50): 22 TB indexed
- eDiscovery (RAID 50): 9 TB indexed
- Metadata search, media store: 6 TB online, 16 TB nearline, 20 TB binaries
- Mid-density database (RAID 10): 4 TB indexed
- High-performance (RAID 10): 2 TB indexed
Best Practices: Ancillary Database Placement
- Replicate Security, Triggers, Modules, Schemas, and Meters
- It is critical to replicate Security and Modules; multiple copies are good
- When upgrading, the masters should all be on one host in the cluster
- Meters needs multiple forests at scale
Best Practices: Huge Pages
- Transparent Huge Pages (THP) are enabled by default in RHEL 6; disable THP and configure static huge pages instead
- Huge pages should be set to 3/8 of physical memory
- Swap should equal the size of physical memory minus huge pages
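For example, on a 128 GB host that means 3/8 x 128 GB = 48 GB of huge pages, with swap sized at 128 GB - 48 GB = 80 GB.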
Best Practices: Local-Disk Replicas
- 6 replicas per host
- Ingestion: background merges are still needed for the replicas
- Replication essentially doubles the footprint: every document also lives in a replica forest, so you have 2x the forest storage and 2x the number of forests
- Put another way, non-HA is half the cost of HA; but don't do that
Design Patterns for High Availability
- 6 primary and 6 replica forests per host
- Distribute replicas across hosts; you don't want a failover situation where the load is not shared evenly
- It is easiest to add 3 hosts at a time and reuse the same distribution pattern; you can add one or two hosts, but then you will need to use forest migration
Three-Host Cluster: Starting Configuration

Host 1: primaries f1-f6; replicas r-7, r-8, r-9, r-16, r-17, r-18; ancillary masters: Security, Modules, Triggers, Schemas, Meters
Host 2: primaries f7-f12; replicas r-13, r-14, r-15, r-4, r-5, r-6; plus r-sec1, r-mod1
Host 3: primaries f13-f18; replicas r-1, r-2, r-3, r-10, r-11, r-12; plus r-sec2, r-mod2

(fN is a primary forest; r-N is the replica of fN, always placed on a different host.)