Databricks Cloud Platform Native REST API 1.1




Overview

This document describes the Databricks Native API that can be used by third-party applications to interact with the Spark clusters managed by Databricks Cloud. This API is similar to the one used by the Databricks Workspace (i.e., notebooks and dashboards) to interact with Spark, and it offers tight integration of third-party applications with the Databricks Workspace: the same tables (cached or not) can be shared between third-party applications and the Databricks Workspace.

The API described in this document covers several aspects:

- Cluster management: describing and selecting existing clusters
- Execution contexts: creating unique variable namespaces in which Spark commands can be called (in Scala or Python)
- Command execution: executing commands within a specific execution context
- Libraries: uploading third-party libraries that can be used in the submitted commands

This REST API runs over HTTPS. HTTP GET is used for retrieving information and HTTP POST for state modification. The response content type is JSON. For requests, multipart/form-data is used for file upload; otherwise application/x-www-form-urlencoded is used.

In version 1.1, basic authentication is used to authenticate the user for every API call. The user's credentials are base64-encoded and sent in the HTTP header of every API call, for example: Authorization: Basic YWRtaW46YWRtaW4=

Next, we present a list of all the API calls, and then we show examples and more details for each call.

Update History

version 1.0, Dec 2014
- First version

version 1.1, March 2015
- Library management is updated
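The Authorization header above can be produced with a few lines of standard-library Python. A minimal sketch; the admin:admin credentials are the illustrative pair from the example, not real ones:

```python
import base64

def basic_auth_header(user, password):
    # HTTP basic auth: join user and password with ":", base64-encode,
    # and prefix with "Basic ".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}

print(basic_auth_header("admin", "admin"))
# {'Authorization': 'Basic YWRtaW46YWRtaW4='}
```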

- Uploaded libraries are visible in the UI
- Libraries persist across cluster restarts
- Option to attach a library to new clusters
- Attach and detach a library to/from clusters
- Added ability to create, resize, and delete Spark clusters

List of API calls

Cluster Management

https://dbc_api_url/api/1.1/clusters/list - lists all Spark clusters, including id, name, and state
https://dbc_api_url/api/1.1/clusters/status - retrieves information about a single Spark cluster
https://dbc_api_url/api/1.1/clusters/restart - restarts a Spark cluster
https://dbc_api_url/api/1.1/clusters/create - creates a new Spark cluster
https://dbc_api_url/api/1.1/clusters/resize - adds or removes memory from an existing Spark cluster
https://dbc_api_url/api/1.1/clusters/delete - deletes a Spark cluster

Execution Context

https://dbc_api_url/api/1.1/contexts/create - creates an execution context on a specified cluster for a given programming language
https://dbc_api_url/api/1.1/contexts/status - shows the status of an existing execution context
https://dbc_api_url/api/1.1/contexts/destroy - destroys an execution context

Command Execution

https://dbc_api_url/api/1.1/commands/execute - runs a given command
https://dbc_api_url/api/1.1/commands/cancel - cancels one command
https://dbc_api_url/api/1.1/commands/status - shows one command's status or result

Known limitation: command execution does not support %run.

Library Upload

https://dbc_api_url/api/1.1/libraries/list - shows all uploaded libraries
https://dbc_api_url/api/1.1/libraries/status - shows library statuses
https://dbc_api_url/api/1.1/libraries/upload - uploads a Java jar, Python egg, or Python PyPI library file
https://dbc_api_url/api/1.1/libraries/delete - deletes a library
https://dbc_api_url/api/1.1/libraries/attach - schedules an action to attach an uploaded library to a cluster or all clusters

https://dbc_api_url/api/1.1/libraries/detach - schedules an action to detach a library from a cluster or all clusters

Note: the library API is more experimental than the rest of the API and could undergo bigger changes in future versions of the API. The behavior is undefined if two libraries containing the same class file are added.

Error Handling

If the API call is successful, the returned HTTP status code is 200. In case of internal errors or bad parameters, the returned code is 500 and the response also contains a JSON error message, for example:

  500, {"error": "RuntimeException: Unknown cluster id 1"}

or

  500, {"error": "NumberFormatException: For input string: None"}

For the library manager, the API checks the request parameters and returns the following errors:

- unknown library id error: {"error": "NoSuchElementException: key not found: 2915"}
- unknown folder error

Libraries with the same name in the same folder are allowed. If two libraries with the same class are attached to the same cluster, the behavior is undefined. To update a library jar, the library has to be deleted first. Uploading a bad library still returns an id; the user can find out about the error via a status call.

API Call Examples and Details

We now give examples for each of the existing API calls and describe their parameters.

Cluster Management

https://dbc_api_url/api/1.1/clusters/list

Response:

  [
    {
      "status": "Running",
      "id": "peacejam",
      "name": "test",
      "driverIp": "10.12.34.56",
      "jdbcPort": "10000"
    },
    {
      "status": "Running",
      "id": "peacejam2",
      "name": "test2",
      "driverIp": "10.12.34.56",
      "jdbcPort": "10000"
    }
  ]

The cluster status is one of [Pending, Running, Reconfiguring, Terminating, Terminated, Error].

https://dbc_api_url/api/1.1/clusters/status?clusterId=peacejam

Response:

  {
    "id": "peacejam",
    "name": "spark-it-up",
    "driverIp": "12.34.56.78",
    "jdbcPort": "10000"
  }

https://dbc_api_url/api/1.1/clusters/restart

  data = { "clusterId": "peacejam" }

Response:

  { "id": "peacejam" }

Restarts the given Spark cluster, identified by its id.

https://dbc_api_url/api/1.1/clusters/create

  data = { "name": "spark-it-up", "memoryGB": 500, "useSpot": true }

Response:

  { "id": "peacejam" }

The cluster will enter a Pending and then a Running state. The id returned in the response can be used in future calls to resize or delete the cluster.

https://dbc_api_url/api/1.1/clusters/resize
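A third-party client typically starts by finding a cluster that can accept work. A minimal sketch that parses the sample clusters/list response with Python's standard library; the body below is the reconstructed example from this document, not live output:

```python
import json

# Reconstructed sample body from the clusters/list example above.
body = """
[
  {"status": "Running", "id": "peacejam",  "name": "test",
   "driverIp": "10.12.34.56", "jdbcPort": "10000"},
  {"status": "Running", "id": "peacejam2", "name": "test2",
   "driverIp": "10.12.34.56", "jdbcPort": "10000"}
]
"""

def running_cluster_ids(response_body):
    # A cluster's status is one of Pending, Running, Reconfiguring,
    # Terminating, Terminated, or Error; only Running clusters accept commands.
    return [c["id"] for c in json.loads(response_body) if c["status"] == "Running"]

print(running_cluster_ids(body))  # ['peacejam', 'peacejam2']
```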

  data = { "clusterId": "peacejam", "memoryGB": 1000 }

Response:

  { "id": "peacejam" }

The response is returned immediately. Note that the actual operation may take a while if more instances need to be acquired. The cluster will enter a Reconfiguring state and then return to Running once the new capacity has been added.

https://dbc_api_url/api/1.1/clusters/delete

  data = { "clusterId": "peacejam" }

Response:

  { "id": "peacejam" }

The cluster will enter a Terminating state before transitioning to Terminated.

Execution Context

https://dbc_api_url/api/1.1/contexts/create

  data = { "language": "scala", "clusterId": "peacejam" }

Response:

  { "id": "1793653964133248955" }

The two parameters to this call are the programming language, which can be one of [scala, python, sql], and the clusterId, which must be obtained through the cluster management API.

https://dbc_api_url/api/1.1/contexts/status?clusterId=peacejam&contextId=1793653964133248955

Response:

  { "status": "Running", "id": "1793653964133248955" }

If the contextId does not exist, HTTP 500 is returned with a message:

  { "error": "ContextNotFound" }

The status is one of ["Pending", "Running", "Error"]. The first status means the execution context is warming up, the second means that commands can be submitted to the execution context, and the last means an error occurred while creating the execution context.

https://dbc_api_url/api/1.1/contexts/destroy

  data = { "contextId": "1793653964133248955", "clusterId": "peacejam" }

Response:

  { "id": "1793653964133248955" }

This API always echoes the id back, even if it is not a valid contextId.

Command Execution

https://dbc_api_url/api/1.1/commands/execute

  data = { "language": "scala", "clusterId": "peacejam",
           "contextId": "5456852751451433082",
           "command": "sc.parallelize(1 to 10).collect" }

Response:

  { "id": "5220029674192230006" }

The response contains the id of the command being executed. Note that the language parameter specifies the language of the command. Currently, a scala execution context supports scala and sql commands, a python execution context supports python and sql commands, while a sql execution context supports only sql commands.

With a command file:

https://dbc_api_url/api/1.1/commands/execute

  data = { "language": "sql", "clusterId": "peacejam",
           "contextId": "5456852751451433082" }
  files = { "command": "./wiki.sql" }

Response:

  { "id": "2245426871786618466" }

https://dbc_api_url/api/1.1/commands/status?clusterId=peacejam&contextId=5456852751451433082&commandId=5220029674192230006

Example responses:

1. Text success result:

  {
    "status": "Finished",
    "id": "5220029674192230006",
    "results": {
      "resultType": "text",
      "data": "res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)"
    }
  }

2. Table success result:

  {
    "status": "Finished",
    "id": "2245426871786618466",
    "results": {
      "resultType": "table",
      "truncated": "False",
      "data": [
        [ 1847917, "& Yet& Yet" ],
        [ 28894499, "'Amorte'eCarnevale" ],
        [ 16496086, "'Ayim" ],
        [ 2231045, "'Hours...'" ],
        [ 4410030, "'Sgudi'Snaysi" ]
      ],
      "schema": [
        { "type": "int", "name": "id" },
        { "type": "string", "name": "title" }
      ]
    }
  }

3. Command still running:

  { "status": "Running", "id": "8370632921745693416" }

4. Finished with errors:

  {
    "status": "Finished",
    "id": "3881634436378952468",
    "results": { "resultType": "error", "cause": "InnerExecutionException" }
  }

The status is one of ["Queued", "Running", "Cancelling", "Finished", "Cancelled", "Error"]. The data field will be present for commands with status Finished or Cancelled. Note that in this version of the API (1.1), command results can only be retrieved once; after the result has been returned, subsequent status calls will return an error. Table results are returned when the result of the command is a (part of a) table, e.g., select * from table1. In version 1.1, the API does not support direct retrieval of binary data; the data in a text result is encoded using UTF-8. However, third-party application developers can easily implement a shim layer to encode/decode binary data inside their commands/application (e.g., using base64 encoding).

https://dbc_api_url/api/1.1/commands/cancel

  data = { "clusterId": "peacejam", "contextId": "5456852751451433082",
           "commandId": "2245426871786618466" }

Response:

  { "id": "2245426871786618466" }
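Because results can be retrieved only once, a client should poll commands/status and hold on to the terminal document rather than re-fetching it. A sketch of such a loop, with the status fetch injected as a callable (in a real client it would wrap an authenticated GET on /api/1.1/commands/status); the helper name is ours:

```python
import time

TERMINAL_STATES = {"Finished", "Cancelled", "Error"}

def wait_for_command(get_status, poll_interval=1.0, max_polls=120):
    # Poll until the command leaves Queued/Running/Cancelling. Keep the
    # returned document: in API 1.1 a result can be fetched only once.
    for _ in range(max_polls):
        doc = get_status()
        if doc["status"] in TERMINAL_STATES:
            return doc
        time.sleep(poll_interval)
    raise TimeoutError("command did not reach a terminal state")

# Demonstrated with canned status documents from the examples above:
_canned = iter([
    {"status": "Running", "id": "5220029674192230006"},
    {"status": "Finished", "id": "5220029674192230006",
     "results": {"resultType": "text",
                 "data": "res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)"}},
])
final = wait_for_command(lambda: next(_canned), poll_interval=0)
print(final["status"])  # Finished
```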

Library Management

https://dbc_api_url/api/1.1/libraries/list

Response:

  [
    { "id": "1234", "name": "mylib", "folder": "/" },
    { "id": "5678", "name": "jar1_2.10-1.0.jar", "folder": "/" }
  ]

https://dbc_api_url/api/1.1/libraries/status?id=1234

Response:

  {
    "id": "1234",
    "name": "mytestjar",
    "folder": "/",
    "files": ["7fbcf1de_3a36_4628_8173_172413292907-jar4_2.10_1.0.jar"],
    "libType": "java-jar",
    "attachAllClusters": "true",
    "statuses": [
      {
        "clusterId": "0223-221237-tract",
        "status": "Attached",
        "message": "errorMessage"
      }
    ]
  }

The library status is one of ["Attaching", "Attached", "Error", "Detaching", "Detached"].

With a library file:

https://dbc_api_url/api/1.1/libraries/upload

  data = { "libType": "python-egg", "name": "mylib.egg",
           "folder": "/mylibraries", "attachToAllClusters": "false" }
  files = { "uri": "./spark/python/test_support/userlib-0.1-py2.7.egg" }

Response:

  { "id": "12345" }

The libType can be one of ["java-jar", "python-pypi", "python-egg"].

With a Python PyPI package name:

https://dbc_api_url/api/1.1/libraries/upload

  data = { "libType": "python-pypi", "name": "fuzzywuzzy",
           "folder": "/mylibraries", "attachToAllClusters": "false" }
  files = { "uri": "fuzzywuzzy" }

Response:

  { "id": "12345" }

https://dbc_api_url/api/1.1/libraries/delete

  data = { "libraryId": "1234" }

Response:

  { "id": "12345" }

https://dbc_api_url/api/1.1/libraries/attach

  data = { "libraryId": "1234", "clusterId": "0223-221237-tract" }

Response:

  { "id": "12345" }

The clusterId can be a specific cluster id, or "__ALL_CLUSTERS". The latter will schedule an attach action for the library to all clusters, including new clusters.

https://dbc_api_url/api/1.1/libraries/detach

  data = { "libraryId": "1234", "clusterId": "__ALL_CLUSTERS" }

Response:

  { "id": "12345" }

The clusterId can be a specific cluster id, or "__ALL_CLUSTERS". The latter will disable attaching to all clusters; unless the library is specifically attached to a cluster, this will schedule a detach action for that cluster.
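The upload call differs from the others in using multipart/form-data: the library fields go in the form data, while the uri entry travels in the multipart files section. A sketch that assembles the two pieces in the shape a requests-style client expects; the helper name is ours, and the field names follow the examples in this document:

```python
def build_upload_request(libtype, name, folder, uri, attach_to_all=False):
    # libType is one of java-jar, python-pypi, python-egg. For a jar or
    # egg the uri is a local file path; for a PyPI library it is the
    # package name.
    data = {"libType": libtype,
            "name": name,
            "folder": folder,
            "attachToAllClusters": "true" if attach_to_all else "false"}
    files = {"uri": uri}
    return data, files

data, files = build_upload_request("python-pypi", "fuzzywuzzy", "/mylibraries", "fuzzywuzzy")
print(data["libType"], files["uri"])  # python-pypi fuzzywuzzy
```

A real client would pass these straight through, e.g. `requests.post(url, auth=..., data=data, files=files)`.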