docs.rackspace.com/api
Rackspace Cloud Big Data Getting Started
API v2.0 (2015-06-30)
2015 Rackspace US, Inc.

This guide is intended for software developers interested in developing applications using the Rackspace Cloud Big Data Application Programming Interface (API).

The document is for informational purposes only and is provided AS IS. RACKSPACE MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, AS TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS DOCUMENT AND RESERVES THE RIGHT TO MAKE CHANGES TO SPECIFICATIONS AND PRODUCT/SERVICES DESCRIPTION AT ANY TIME WITHOUT NOTICE. RACKSPACE SERVICES OFFERINGS ARE SUBJECT TO CHANGE WITHOUT NOTICE. USERS MUST TAKE FULL RESPONSIBILITY FOR APPLICATION OF ANY SERVICES MENTIONED HEREIN. EXCEPT AS SET FORTH IN RACKSPACE GENERAL TERMS AND CONDITIONS AND/OR CLOUD TERMS OF SERVICE, RACKSPACE ASSUMES NO LIABILITY WHATSOEVER, AND DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO ITS SERVICES INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT.

Except as expressly provided in any written license agreement from Rackspace, the furnishing of this document does not give you any license to patents, trademarks, copyrights, or other intellectual property. Rackspace, Rackspace logo and Fanatical Support are registered service marks of Rackspace US, Inc. All other product names and trademarks used in this document are for identification purposes only and are property of their respective owners.
Table of Contents

1. Overview
   1.1. Cloud Big Data concepts
   1.2. Use cases
   1.3. Prerequisites for running examples
   1.4. Pricing and service level
2. Service access endpoints
3. Sending requests to Cloud Big Data
   3.1. Using curl
      3.1.1. Sending API requests by using curl
      3.1.2. Copying and pasting curl request examples into a terminal window
   3.2. Setting up python-lavaclient CLI
      3.2.1. Prerequisites
      3.2.2. Installing the CLI
4. Generating an authentication token using curl
5. Generating an authentication token using the lavaclient
6. Creating and managing credentials
   6.1. Creating a credential
      6.1.1. curl example
      6.1.2. Client example
   6.2. Listing all credentials
      6.2.1. curl example
      6.2.2. Client example
   6.3. Updating credentials
      6.3.1. curl example
      6.3.2. Client example
   6.4. Deleting credentials
      6.4.1. curl example
      6.4.2. Client example
7. Viewing resource limits
   7.1. curl example
   7.2. Client example
8. Creating and managing Hadoop clusters
   8.1. Listing flavors
      8.1.1. curl example
      8.1.2. Client example
   8.2. Listing available distros
      8.2.1. curl example
      8.2.2. Client example
   8.3. Listing available stacks
      8.3.1. curl example
      8.3.2. Client example
   8.4. Creating a cluster
      8.4.1. curl example
      8.4.2. Client example
   8.5. Listing clusters
      8.5.1. curl example
      8.5.2. Client example
   8.6. Viewing node details
      8.6.1. curl example
      8.6.2. Client example
   8.7. Resizing clusters
      8.7.1. curl example
      8.7.2. Client example
   8.8. Creating a script
      8.8.1. curl example
      8.8.2. Client example
   8.9. Listing all scripts
      8.9.1. curl example
      8.9.2. Client example
   8.10. Deleting clusters
      8.10.1. curl example
      8.10.2. Client example
9. Additional resources
10. Document change history
Glossary
List of Tables

2.1. Regionalized service endpoints
3.1. curl command-line options
List of Examples

3.1. curl authenticate request: JSON
4.1. curl authenticate request: JSON
4.2. Authenticate response: JSON
5.1. Authentication response using CLI utility
5.2. Export environment variables
6.1. curl create a credential - ssh_keys request
6.2. Create a credential - ssh_keys request: JSON body
6.3. Create a credential - ssh_keys response: JSON
6.4. curl create a credential - cloud_files request
6.5. Create a credential - cloud_files request: JSON body
6.6. Create a credential - cloud_files response: JSON
6.7. Create a SSH credential using the CLI
6.8. Create a Cloud Files credential using the CLI
6.9. curl list all credentials request
6.10. List all credentials using the CLI
6.11. curl update a credential request
6.12. Update a credential request: JSON body
6.13. Update a credential response: JSON
6.14. Update a credential using the CLI
6.15. curl delete a credential request
6.16. Delete a credential using the CLI
7.1. curl view resource limits request: JSON
7.2. View resource limits response: JSON
7.3. View resource limits
8.1. curl list flavors request: JSON
8.2. List flavors response: JSON
8.3. List flavors and associated resources by using the flavors list command with the CLI
8.4. curl list available distros request: JSON
8.5. List available distros response: JSON
8.6. View available distros with the CLI
8.7. curl list all stacks request: JSON
8.8. List all stacks response: JSON
8.9. View available stacks with the CLI
8.10. curl create cluster request
8.11. Create cluster request: JSON body
8.12. Create cluster response: JSON
8.13. Create a cluster with the CLI
8.14. curl list clusters request: JSON
8.15. List clusters response: JSON
8.16. List clusters with the CLI
8.17. curl list cluster nodes request: JSON
8.18. List cluster nodes response: JSON
8.19. Query the details of a cluster by using the show and nodes commands with the CLI
8.20. curl resize cluster request: JSON
8.21. Resize cluster request: JSON body
8.22. Resize cluster response: JSON
8.23. Increase cluster size by using the resize command with the CLI
8.24. curl create a script
8.25. Create a script request: JSON body
8.26. Create a script response: JSON
8.27. Create a script with the CLI
8.28. curl list all scripts request
8.29. List all scripts response: JSON
8.30. List available scripts with the CLI
8.31. curl delete cluster request: JSON
8.32. Remove clusters by using the delete command
1. Overview

Rackspace Cloud Big Data is an on-demand Apache Hadoop service for the Rackspace open cloud. The service supports a RESTful API and alleviates the pain associated with deploying, managing, and scaling Hadoop clusters. Cloud Big Data is just as flexible and feature-rich as Hadoop. With Cloud Big Data, you benefit from on-demand servers, utility-based pricing, and access to the full set of Hadoop features and APIs. However, you do not have to worry about provisioning, growing, or maintaining your Hadoop infrastructure. The Cloud Big Data service uses an environment that is specifically optimized for Hadoop, which ensures that your jobs run efficiently and reliably. Note that you are still responsible for developing, troubleshooting, and deploying your applications.

The primary use cases for Cloud Big Data are as follows:

- Create on-demand infrastructure for applications in production where physical servers would be too costly and time-consuming to configure and maintain.
- Develop, test, and pilot data analysis applications.

Cloud Big Data provides the following benefits:

- Create or resize Hadoop clusters in minutes and pay only for what you use.
- Access the Hortonworks Data Platform (HDP), an enterprise-ready distribution that is 100 percent Apache open source.
- Provision and manage Hadoop through an easy-to-use Control Panel and a RESTful API.
- Seamlessly access data in Cloud Files containers.
- Gain interoperability with any third-party software tool that supports HDP.
- Access Fanatical Support on a 24x7x365 basis via chat, phone, or ticket.
This guide provides examples for the following ways to use the Cloud Big Data API:

- Using the API directly with curl
- Using the python-lavaclient command-line client (CLI)

Examples for both ways to make requests to Cloud Big Data are provided for authentication (Chapter 4, Generating an authentication token using curl [9] and Chapter 5, Generating an authentication token using the lavaclient [14]) and for creating and managing clusters (Chapter 8, Creating and managing Hadoop clusters [22]).

1.1. Cloud Big Data concepts

To use the Cloud Big Data API effectively, you should understand the following terminology:
Credentials: Credentials allow you to set up SSH keys and other connector credentials (for example, Cloud Files credentials) for use with clusters.

Distros: Distros provide a list of supported distributions and their corresponding versions, as well as a list of supported services and components per distribution.

Stacks: Stacks are high-level building blocks of software that compose a Big Data architecture. Stacks are composed of services, which in turn are composed of components. A stack is specific to a distribution because of the differences in services that are supported across distributions.

Clusters: A cluster is a group of servers (nodes). Cloud Big Data supports both virtual and OnMetal servers.

Nodes: A node is either a virtual or an OnMetal server that serves a particular role in the cluster. A node runs one or more components in the Hadoop ecosystem.

Scripts: You can create a custom script that runs during various phases of the cluster's life cycle. The script is invoked on all nodes of the cluster. The script type currently supported is POST_INIT, which runs after the cluster is completely set up. The script must be executable. Preferably, the script should be a bash script, but it could be a Python script or a self-contained executable that works with the base libraries of the installed OS.

Flavor: A flavor is an available configuration for each node in a Cloud Big Data cluster. Each flavor has a unique combination of memory capacity, priority for CPU time, and storage space.

Resource limits: Resource limits include items such as remaining node count, available RAM, and remaining disk space for the user.

For the definitions of additional terminology related to Cloud Big Data, see the Glossary [47].

1.2. Use cases

Use cases for Cloud Big Data include, but are not limited to, the following examples:

Clickstream analysis
    Analyze click stream data in order to segment users and understand user preferences.
    Advertisers can also analyze click streams and advertising impression logs to deliver more effective ads.

Log analysis
    Process logs generated by web and mobile applications. Cloud Big Data helps customers turn petabytes of unstructured or semi-structured data into useful insights about their applications or users.

Sentiment analysis
    Examine a corpus of text to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document.
1.3. Prerequisites for running examples

To run the examples in this guide, you must have the following prerequisites:

- A Rackspace Cloud account
- A Rackspace Cloud username and password, as specified during registration
- Prior knowledge of HTTP/1.1 conventions
- Basic familiarity with Cloud and RESTful APIs
- Prior knowledge of Hadoop or a third-party tool that works with Hadoop
- Ability to work with the Hortonworks Data Platform (HDP)

By using the Cloud Big Data API, you understand and agree to the following limitations and conditions:

- Cloud Big Data includes a Swift integration feature so that Hadoop, MapReduce, Pig, Hive, and Spark jobs can directly reference Cloud Files containers.

1.4. Pricing and service level

Cloud Big Data is part of the Rackspace Cloud, and your use through the API is billed according to the pricing schedule at http://www.rackspace.com/cloud/big-data/pricing. The Service Level Agreement (SLA) for Cloud Big Data is available at http://www.rackspace.com/cloud/legal/sla.
2. Service access endpoints

The Cloud Big Data service is a regionalized service. The user of the service is therefore responsible for appropriate replication, caching, and overall maintenance of Cloud Big Data data across regional boundaries to other Cloud Servers.

The endpoints to use for your Cloud Big Data API calls are summarized in the following table. To help you decide which regionalized endpoint to use, read the Knowledge Center article about special considerations for choosing a data center at About Regions.

Table 2.1. Regionalized service endpoints

Region                   Endpoint
Chicago (ORD)            https://ord.bigdata.api.rackspacecloud.com/v2/youraccountid/
Dallas/Ft. Worth (DFW)   https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/
London (LON)             https://lon.bigdata.api.rackspacecloud.com/v2/youraccountid/
Northern Virginia (IAD)  https://iad.bigdata.api.rackspacecloud.com/v2/youraccountid/

Replace the youraccountid placeholder with your actual account number, which is returned as part of the authentication service response, after the final / in the publicurl field.

Note: All examples in this guide assume that you are operating against the DFW data center. If you are using a different data center, be sure to use the associated endpoint from the table above.

When you perform a Cloud Big Data API operation, place the endpoint at the beginning of the request URL. For example: https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/.
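The endpoint-plus-path construction described above can be sketched in a few lines of Python. This is a hypothetical helper, not part of any Rackspace SDK; the region codes and the youraccountid placeholder come from Table 2.1.

```python
# Map each region code from Table 2.1 to its Cloud Big Data endpoint.
ENDPOINTS = {
    "ORD": "https://ord.bigdata.api.rackspacecloud.com/v2/",
    "DFW": "https://dfw.bigdata.api.rackspacecloud.com/v2/",
    "LON": "https://lon.bigdata.api.rackspacecloud.com/v2/",
    "IAD": "https://iad.bigdata.api.rackspacecloud.com/v2/",
}

def request_url(region, account_id, path=""):
    """Build a request URL by placing the regional endpoint before the path."""
    return ENDPOINTS[region] + str(account_id) + "/" + path

print(request_url("DFW", "1100111", "flavors"))
# https://dfw.bigdata.api.rackspacecloud.com/v2/1100111/flavors
```

In a real script you would substitute the account number returned by the authentication response for the second argument.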
3. Sending requests to Cloud Data

You have several options for sending requests to Cloud Big Data:

- You can use curl, the command-line tool from http://curl.haxx.se/. With curl, you can send HTTP requests and receive responses back from the command line.
- You can use the python-lavaclient CLI.
- If you prefer a more graphical interface, you can use the Rackspace Cloud Control Panel at https://mycloud.rackspace.com.

3.1. Using curl

You can use curl, the command-line tool from http://curl.haxx.se/. With curl, you can send HTTP requests and receive responses back from the command line.

3.1.1. Sending API requests by using curl

curl is a command-line tool that is available in most UNIX system-based environments and Apple Mac OS X systems, and can be downloaded for Microsoft Windows to interact with REST interfaces. For more information about curl, visit http://curl.haxx.se/.

curl enables you to transmit and receive HTTP requests and responses from the command line or from within a shell script. As a result, you can work with the REST API directly without using one of the client APIs.

The following curl command-line options are used in this guide to run the examples:

Table 3.1. curl command-line options

Option  Description

-d      Sends the specified data in a POST request to the HTTP server. Use this option to send a JSON request body to the server.

-H      Specifies an extra HTTP header in the request. You can specify any number of extra headers. Precede each header with the -H option. Common headers in Rackspace API requests are as follows:

        Content-Type. Required for operations with a request body. Specifies the format of the request body. The syntax for the Content-Type header is as follows: Content-Type: application/format, where format is json.

        X-Tenant-Id. Optional. Specifies the tenant ID, which is your account number.

        Accept. Optional. Specifies the format of the response body. The syntax for the Accept header is as follows: Accept: application/format, where format is json. The default is json.

        X-Auth-Token. Required. Specifies the authentication token.

-i      Includes the HTTP header in the output.

-s      Silent or quiet mode. Does not show progress or error messages. Makes curl mute. Note: If your curl command is not generating any output, try replacing the -s option with -i.

-T      Transfers the specified local file to the remote URL.

-X      Specifies the request method to use when communicating with the HTTP server. The specified request is used instead of the default method, which is GET.

About json.tool

For commands that return a response, you can append the following code to the command to call json.tool to pretty-print output:

python -m json.tool

To use json.tool, import the json module. For information about json.tool, see json JSON encoder and decoder. If you do not want to pretty-print JSON output, omit this code.

3.1.2. Copying and pasting curl request examples into a terminal window

To run the curl request examples shown in this guide on Linux or Mac systems, perform the following actions:

1. Copy and paste each example from the HTML version of this guide into an ASCII text editor (for example, vi or TextEdit). You can click the small document icon to the right of each request example to select it.
2. Modify each example with your required account information and so on, as detailed in this guide.
3. After you are finished modifying the text for the curl request example with your information (for example, your username and your API key), paste the command into your terminal window.
4. Press Enter to run the curl command.
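The json.tool pretty-printing mentioned above can be illustrated with Python's standard json module, which is what python -m json.tool uses under the hood. The sample response body here is invented for illustration.

```python
import json

# An invented sample response body, compacted as an API would return it.
raw = '{"limits": {"absolute": {"node_count": {"limit": 100, "remaining": 50}}}}'

# Piping through "python -m json.tool" applies the same transformation
# as json.loads followed by json.dumps with indentation.
pretty = json.dumps(json.loads(raw), indent=4)
print(pretty)
```

The data is unchanged; only the layout differs, so parsing the pretty-printed text yields the original structure.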
Note: The carriage returns in the curl request examples that are part of the curl syntax are escaped with a backslash (\) to avoid prematurely terminating the command. However, you should not escape carriage returns inside the JSON message within the command.

Consider the following curl authenticate request: JSON example, which is described in detail in Chapter 4, Generating an authentication token using curl [9].

Example 3.1. curl authenticate request: JSON

curl -i -d \
'{
    "auth": {
       "RAX-KSKEY:apiKeyCredentials": {
          "username": "yourusername",
          "apikey": "yourapikey"
       }
    }
}' \
-H 'Content-Type: application/json' \
'https://identity.api.rackspacecloud.com/v2.0/tokens'

Notice that the lines that are part of the curl command syntax have been escaped with a backslash (\) to indicate that the command continues on the next line:

curl -i -d \
(... lines within the JSON portion of the message are not shown in this example)
(... the example only shows lines that are part of curl syntax)
-H 'Content-Type: application/json' \
'https://identity.api.rackspacecloud.com/v2.0/tokens'

However, the lines within the JSON portion of the message are not escaped with a backslash, to avoid issues with the JSON processing:

'{
    "auth": {
       "RAX-KSKEY:apiKeyCredentials": {
          "username": "yourusername",
          "apikey": "yourapikey"
       }
    }
}' \

The final line of the JSON message is escaped because the backslash lies outside the JSON message and continues the curl command to the next line.
Tip: If you have trouble copying and pasting the examples as described, try typing the entire example on one long line, removing all the backslash line continuation characters.

3.2. Setting up python-lavaclient CLI

Another way to send requests to Cloud Big Data is to use the python-lavaclient CLI. This section provides the prerequisites for using the client and installation instructions.

3.2.1. Prerequisites

Following are the requirements for using the python-lavaclient CLI:

- Linux or Mac OS X
- Python 2.7.2 or later
- Rackspace Cloud account and access to Rackspace Cloud Big Data

3.2.2. Installing the CLI

Perform the following steps to install the CLI.

1. Install the python-lavaclient from PyPI by using pip.

   $ pip install lavaclient

2. Run the help command to ensure that the client has been installed correctly, and note the usage information.

   $ lava help
4. Generating an authentication token using curl

Whether you use curl or a REST client to interact with the Cloud Big Data API, you must generate an authentication token. You provide this token in the X-Auth-Token header in each Cloud Big Data API request.

Example 4.1, curl authenticate request: JSON [9] demonstrates how to use curl to obtain the authentication token as well as your account number. You must provide both when making subsequent Cloud Big Data API requests.

Remember to replace the placeholders in the following authentication request examples with your information:

yourusername: Your common Cloud Big Data user name, as supplied during registration.

yourapikey: Your API access key. You can obtain the key from the Rackspace Cloud Control Panel in the Your Account / API Keys section.

Note: This guide uses yourusername and yourapikey for authentication. For information about other supported authentication methods, see Authentication tokens in the Cloud Identity Client Developer Guide.

Use the following global endpoint to access the Cloud Identity service for authentication:

https://identity.api.rackspacecloud.com/v2.0/

You authenticate by using the URL https://identity.api.rackspacecloud.com/v2.0/tokens for the Cloud Identity services. Note that the v2.0 component in the URL indicates that you are using version 2.0 of the Cloud Identity API.

Example 4.1. curl authenticate request: JSON

curl -s -d \
'{
    "auth": {
       "RAX-KSKEY:apiKeyCredentials": {
          "username": "yourusername",
          "apikey": "yourapikey"
       }
    }
}' \
-H 'Content-Type: application/json' \
'https://identity.api.rackspacecloud.com/v2.0/tokens'

In the authentication response (example follows), the authentication token id is returned with an expires attribute that specifies when the token expires. Remember to supply your authentication token wherever you see the placeholder yourauthtoken in the examples in this guide.
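For readers scripting this step, the same request can be assembled with Python's standard library. This sketch only builds the request object; the auth_request helper name is an assumption for illustration, and nothing is sent over the network.

```python
import json
import urllib.request

AUTH_URL = "https://identity.api.rackspacecloud.com/v2.0/tokens"

def auth_request(username, api_key):
    """Build (but do not send) the POST request from Example 4.1."""
    body = json.dumps({
        "auth": {
            "RAX-KSKEY:apiKeyCredentials": {
                "username": username,
                "apikey": api_key,
            }
        }
    }).encode()
    # A Request with a data payload defaults to the POST method.
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = auth_request("yourusername", "yourapikey")
print(req.get_method(), req.full_url)
# POST https://identity.api.rackspacecloud.com/v2.0/tokens
```

To actually authenticate, you would pass the request to urllib.request.urlopen with your real credentials substituted for the placeholders.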
Notes:

- The values that you receive in your responses vary from the examples shown in this document because they are specific to your account.
- The expires attribute denotes the time after which the token will automatically become invalid. A token might be manually revoked before the time identified by the expires attribute. The attribute predicts a token's maximum possible lifespan but does not guarantee that it will reach that lifespan. Clients are encouraged to cache a token until it expires. Applications should be designed to re-authenticate after receiving a 401 (Unauthorized) response from a service endpoint.
- The publicurl endpoints for Cloud Big Data (for example, https://dfw.bigdata.api.rackspacecloud.com/v2/1100111) are also returned in the response. Your actual account number is after the final slash (/) in the publicurl field. In the following examples, the account number is 1100111. You must specify your account number on most of the Cloud Big Data API operations, wherever you see the placeholder youraccountid specified in the examples in this guide.

After authentication, you can use curl to perform GET, DELETE, and POST requests for the Cloud Big Data API.

Example 4.2. Authenticate response: JSON

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 477
Date: Sat, 07 Dec 2013 18:45:13 GMT

{
    "access": {
        "token": {
            "expires": "2013-12-08T22:51:02.000-06:00",
            "id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        },
        "user": {
            "id": "123456",
            "name": "jsmith",
            "RAX-AUTH:defaultRegion": "DFW",
            "roles": [
                {
                    "description": "Admin Role.",
                    "id": "identity:admin",
                    "name": "identity:admin"
                },
                {
                    "description": "Default Role.",
                    "id": "identity:default",
                    "name": "identity:default"
                }
            ]
        },
        "servicecatalog": [
            {
                "endpoints": [
                    {
                        "publicurl": "https://dfw.bigdata.api.rackspacecloud.com/v2/1100111",
                        "region": "DFW",
                        "tenantid": "1100111"
                    }
                ],
                "name": "cloudbigdata",
                "type": "rax:bigdata"
            },
            {
                "endpoints": [
                    {
                        "publicurl": "https://dfw.loadbalancers.api.rackspacecloud.com/v1.0/1100111",
                        "region": "DFW",
                        "tenantid": "1100111"
                    },
                    {
                        "publicurl": "https://ord.loadbalancers.api.rackspacecloud.com/v1.0/1100111",
                        "region": "ORD",
                        "tenantid": "1100111"
                    }
                ],
                "name": "cloudloadbalancers",
                "type": "rax:load-balancer"
            },
            {
                "endpoints": [
                    {
                        "tenantid": "1100111",
                        "region": "DFW",
                        "publicurl": "https://dfw.servers.api.rackspacecloud.com/v2/1100111",
                        "versionid": "2",
                        "versioninfo": "https://dfw.servers.api.rackspacecloud.com/v2/",
                        "versionlist": "https://dfw.servers.api.rackspacecloud.com/"
                    },
                    {
                        "tenantid": "1100111",
                        "region": "ORD",
                        "publicurl": "https://ord.servers.api.rackspacecloud.com/v2/1100111",
                        "versionid": "2",
                        "versioninfo": "https://ord.servers.api.rackspacecloud.com/v2/",
                        "versionlist": "https://ord.servers.api.rackspacecloud.com/"
                    }
                ],
                "name": "cloudserversopenstack",
                "type": "compute"
            },
            {
                "endpoints": [
                    {
                        "tenantid": "1100111",
                        "publicurl": "https://servers.api.rackspacecloud.com/v1.0/1100111",
                        "versionid": "1.0",
                        "versioninfo": "https://servers.api.rackspacecloud.com/v1.0/",
                        "versionlist": "https://servers.api.rackspacecloud.com/"
                    }
                ],
                "name": "cloudservers",
                "type": "compute"
            },
            {
                "endpoints": [
                    {
                        "tenantid": "MossoCloudFS_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "publicurl": "https://storage101.dfw1.clouddrive.com/v1/mossocloudfs_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "internalurl": "https://snet-storage101.dfw1.clouddrive.com/v1/mossocloudfs_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "region": "DFW"
                    },
                    {
                        "tenantid": "MossoCloudFS_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "publicurl": "https://storage101.ord1.clouddrive.com/v1/mossocloudfs_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "internalurl": "https://snet-storage101.ord1.clouddrive.com/v1/mossocloudfs_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "region": "ORD"
                    }
                ],
                "name": "cloudfiles",
                "type": "object-store"
            },
            {
                "endpoints": [
                    {
                        "tenantid": "MossoCloudFS_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "publicurl": "https://cdn1.clouddrive.com/v1/MossoCloudFS_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "region": "DFW"
                    },
                    {
                        "tenantid": "MossoCloudFS_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "publicurl": "https://cdn2.clouddrive.com/v1/MossoCloudFS_aaaaaaaa-bbbb-cccc-dddd-eeeeeeee",
                        "region": "ORD"
                    }
                ],
                "name": "cloudfilescdn",
                "type": "rax:object-cdn"
            },
            {
                "endpoints": [
                    {
                        "tenantid": "1100111",
                        "publicurl": "https://dns.api.rackspacecloud.com/v1.0/1100111"
                    }
                ],
                "name": "clouddns",
                "type": "rax:dns"
            }
        ]
    }
}
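As the notes above describe, two values come out of the authentication response: the token id and the account number after the final slash in the publicurl field. A short Python sketch of that extraction, using a trimmed, invented stand-in for the full response:

```python
import json

# A trimmed stand-in for the authentication response (invented values).
response = json.loads("""
{
  "access": {
    "token": {"id": "xxxxxxxx", "expires": "2013-12-08T22:51:02.000-06:00"},
    "servicecatalog": [
      {"name": "cloudbigdata", "type": "rax:bigdata",
       "endpoints": [{"publicurl": "https://dfw.bigdata.api.rackspacecloud.com/v2/1100111",
                      "region": "DFW", "tenantid": "1100111"}]}
    ]
  }
}
""")

# The token id goes into the X-Auth-Token header of later requests.
token = response["access"]["token"]["id"]

# Your account number is everything after the final "/" in the publicurl field
# of the cloudbigdata service catalog entry.
bigdata = next(s for s in response["access"]["servicecatalog"]
               if s["name"] == "cloudbigdata")
account_id = bigdata["endpoints"][0]["publicurl"].rsplit("/", 1)[-1]

print(token, account_id)   # xxxxxxxx 1100111
```

The same two values are what the lavaclient exports as AUTH_TOKEN and embeds in LAVA2_API_URL in the next chapter.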
5. Generating an authentication token using the lavaclient

To authenticate your session by using the lavaclient, use the following steps. You need your Cloud username, API key, and tenant ID.

1. Run the authenticate command with the parameters shown below.

   $ lava --user [username] --tenant [tenant_id] --api-key [api_key] --region DFW authenticate

   If the command runs successfully, your authentication token is displayed, as shown in the following example.

   Example 5.1. Authentication response using CLI utility

   AUTH_TOKEN=692c2a14-39ad-4ee0-991d-06cd7331f3ca

2. Export the AUTH_TOKEN and LAVA2_API_URL environment variables as shown in the following example. Replace yourtenantid with your actual tenant ID.

   Example 5.2. Export environment variables

   $ export AUTH_TOKEN=692c2a14-39ad-4ee0-991d-06cd7331f3ca
   $ export LAVA2_API_URL=https://dfw.bigdata.api.rackspacecloud.com/v1.0/yourtenantid

   Note: The export commands are valid only for the current session. You need to rerun the export commands if, for example, you create a new console window.

3. To confirm that the client is running, run the distros list command.

   $ lava distros list
   +--------+---------------------------+---------+
   | ID     | Name                      | Version |
   +--------+---------------------------+---------+
   | HDP2.2 | HortonWorks Data Platform | 2.2     |
   +--------+---------------------------+---------+
6. Creating and managing credentials

Before you can create Hadoop clusters, you must create credentials. Credentials allow you to set up SSH keys and other connector credentials for use with clusters.

Note: Your Cloud Big Data credentials are different from your cloud account.

Your credentials have the following characteristics and requirements:

- A credential is the configuration for the administration and login account for the cluster.
- You can create any number of SSH credentials and attach them to a cluster.
- Each cluster can contain only one Cloud Files credential connector.

After you create a credential, you can attach that credential to clusters that you provision by using the API. This allows you to remotely SSH into a server to transfer data, run or troubleshoot jobs, and so on.

6.1. Creating a credential

Verb  URI                             Description
POST  /v2/tenant_id/credentials/type  Creates a credential.

This operation adds new credentials for a specific type. Based on the chosen type, ssh_keys or cloud_files, the request body varies. A general pattern is followed: a dict of the type that contains one or more credential-related fields.

6.1.1. curl example

The following examples show the curl request and corresponding response for creating a credential.

Example 6.1. curl create a credential - ssh_keys request

curl -i -X POST https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/credentials/ssh_keys -d \
-H "X-Auth-Token: yourauthtoken" \
-H "Accept: application/json" \
-H "Content-type: application/json"

Example 6.2. Create a credential - ssh_keys request: JSON body

{
    "ssh_keys": {
        "key_name": "cbdkey",
        "public_key": "ssh-rsa AAkphQZaDNi2Ij3DX...5twE62lerq7Xhaff foo@bar"
    }
}
Example 6.3. Create a credential - ssh_keys response: JSON

{
    "credentials": {
        "ssh_keys": {
            "key_name": "cbdkey"
        }
    }
}

Example 6.4. curl create a credential - cloud_files request

curl -i -X POST https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/credentials/cloud_files -d \
-H "X-Auth-Token: yourauthtoken" \
-H "Accept: application/json" \
-H "Content-type: application/json"

Example 6.5. Create a credential - cloud_files request: JSON body

{
    "cloud_files": {
        "username": "cfuser",
        "api_key": "samplekey"
    }
}

Example 6.6. Create a credential - cloud_files response: JSON

{
    "credentials": {
        "cloud_files": {
            "username": "cfuser"
        }
    }
}

6.1.2. Client example

Using the client, create credentials as shown in the following examples.

Example 6.7. Create a SSH credential using the CLI

$ lava credentials create_ssh_key cbdkey "ssh-rsa AAkphQZaDNi2Ij3DX...5twE62lerq7Xhaff foo@bar"
+------+---------+
| Type | SSH Key |
| Name | cbdkey  |
+------+---------+

Example 6.8. Create a Cloud Files credential using the CLI

$ lava credentials create_cloud_files cfuser samplekey
+----------+-------------+
| Type     | Cloud Files |
| Username | cfuser      |
+----------+-------------+
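The ssh_keys and cloud_files request bodies shown above follow the same pattern: a dict keyed by credential type whose value holds the credential fields. A small Python sketch of building those bodies (the helper names are hypothetical, not part of lavaclient):

```python
import json

def ssh_key_body(key_name, public_key):
    """Build the JSON body for POST /v2/tenant_id/credentials/ssh_keys."""
    return {"ssh_keys": {"key_name": key_name, "public_key": public_key}}

def cloud_files_body(username, api_key):
    """Build the JSON body for POST /v2/tenant_id/credentials/cloud_files."""
    return {"cloud_files": {"username": username, "api_key": api_key}}

print(json.dumps(ssh_key_body("cbdkey", "ssh-rsa AAkp... foo@bar"), indent=4))
print(json.dumps(cloud_files_body("cfuser", "samplekey"), indent=4))
```

Either dict, serialized with json.dumps, is what the curl -d option would carry in the corresponding request.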
6.2. Listing all credentials

Verb  URI                        Description
GET   /v2/tenant_id/credentials  Lists all user credentials.

6.2.1. curl example

This operation does not accept a request body. The following examples show the curl request and corresponding response for listing all user credentials.

Example 6.9. curl list all credentials request

curl -i -X GET https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/credentials -d \
-H "X-Auth-Token: yourauthtoken" \
-H "Accept: application/json" \
-H "Content-type: application/json"

{
    "credentials": {
        "cloud_files": [
            {
                "username": "cfuser"
            }
        ],
        "ssh_keys": [
            {
                "key_name": "cbdkey"
            }
        ]
    }
}

6.2.2. Client example

Using the client, list all credentials as shown in the following example.

Example 6.10. List all credentials using the CLI

$ lava credentials list
+-------------+--------+
| Type        | Name   |
+-------------+--------+
| SSH Key     | cbdkey |
| Cloud Files | cfuser |
+-------------+--------+
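The CLI table produced by lava credentials list is essentially a flattened view of the JSON response above. A Python sketch of that flattening, using the same response dict as the curl example:

```python
credentials = {
    "credentials": {
        "cloud_files": [{"username": "cfuser"}],
        "ssh_keys": [{"key_name": "cbdkey"}],
    }
}

# Flatten the per-type lists into (type, name) rows like the CLI table.
# Each entry names its credential with either key_name or username.
rows = []
for ctype, entries in sorted(credentials["credentials"].items()):
    for entry in entries:
        name = entry.get("key_name") or entry.get("username")
        rows.append((ctype, name))

print(rows)   # [('cloud_files', 'cfuser'), ('ssh_keys', 'cbdkey')]
```

This is only a sketch of the response shape; the actual CLI renders the friendlier labels "SSH Key" and "Cloud Files".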
6.3. Updating credentials

Verb  URI                                  Description
PUT   /v2/tenant_id/credentials/type/name  Updates the specified user credential.

The update marks clusters that already use the credential as out of sync.

6.3.1. curl example

The following examples show the curl request and corresponding response for updating a credential.

Example 6.11. curl update a credential request

curl -i -X PUT https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/credentials/ssh_keys/cbdkey -d \
-H "X-Auth-Token: yourauthtoken" \
-H "Accept: application/json" \
-H "Content-type: application/json"

Example 6.12. Update a credential request: JSON body

{
    "ssh_keys": {
        "key_name": "cbdkey",
        "public_key": "ssh-rsa AAkddddddddd3DX...5twE62lerq7Xhaff foo@bar"
    }
}

Example 6.13. Update a credential response: JSON

{
    "credentials": {
        "ssh_keys": {
            "key_name": "cbdkey"
        }
    }
}

6.3.2. Client example

Using the client, update a credential as shown in the following example.

Example 6.14. Update a credential using the CLI

$ lava credentials update_ssh_key cbdkey "ssh-rsa AAkphQZaDNi2Ij3DX...5twE62lerq7Xhaff foo@bar"
+------+---------+
| Type | SSH Key |
| Name | cbdkey  |
+------+---------+
6.4. Deleting credentials

Verb    URI                                  Description
DELETE  /v2/tenant_id/credentials/type/name  Deletes the specified user credential.

You can delete only credentials that are not used by any active clusters.

6.4.1. curl example

The following example shows the curl request for deleting a credential. This operation does not accept a request body and does not return a response body.

Example 6.15. curl delete a credential request

curl -i -X DELETE https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/credentials/ssh_keys/cbdkey -d \
-H "X-Auth-Token: yourauthtoken" \
-H "Accept: application/json" \
-H "Content-type: application/json"

6.4.2. Client example

Using the client, delete a credential as shown in the following example.

Example 6.16. Delete a credential using the CLI

$ lava credentials delete_ssh_key cbdkey
7. Viewing resource limits

Use of the Rackspace Cloud Big Data API is subject to resource limits. You can view the limits associated with your account, such as remaining node count, available RAM, and remaining disk space.

Verb  URI                   Description
GET   /v2/tenant_id/limits  Displays the resource limits for the user.

7.1. curl example

This operation does not accept a request body. The following examples show the curl request and corresponding response for viewing resource limits.

Example 7.1. curl view resource limits request: JSON

curl -i -X GET https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/limits \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-type: application/json"

Example 7.2. View resource limits response: JSON

{
    "limits": {
        "absolute": {
            "disk": {
                "limit": 100000,
                "remaining": 28882
            },
            "node_count": {
                "limit": 100,
                "remaining": 50
            },
            "ram": {
                "limit": 50000,
                "remaining": 34567
            },
            "vcpus": {
                "limit": 50,
                "remaining": 25
            }
        }
    }
}

7.2. Client example

Using the client, view the limits associated with your account by using the limits command.

20
Example 7.3. View resource limits

$ lava limits get
+-------------------------------+
|            Quotas             |
+----------+--------+-----------+
| Property | Limit  | Remaining |
+----------+--------+-----------+
| Nodes    | 20     | 18        |
| RAM      | 614400 | 599040    |
| Disk     | 115000 | 112500    |
| VCPUs    | 160    | 156       |
+----------+--------+-----------+

21
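Each quota in the limits document pairs a limit with a remaining value, so the fraction of a quota still available falls out directly. A small Python sketch using the values from Example 7.2; the helper name is our own:

```python
def remaining_fraction(limits_doc, resource):
    """Fraction of a quota still available, computed from the 'absolute'
    section of the GET /v2/{tenant_id}/limits response."""
    entry = limits_doc["limits"]["absolute"][resource]
    return entry["remaining"] / entry["limit"]

# The limits document from Example 7.2.
limits_doc = {
    "limits": {
        "absolute": {
            "disk": {"limit": 100000, "remaining": 28882},
            "node_count": {"limit": 100, "remaining": 50},
            "ram": {"limit": 50000, "remaining": 34567},
            "vcpus": {"limit": 50, "remaining": 25},
        }
    }
}

# Half of the node quota is still available in the sample document.
print(remaining_fraction(limits_doc, "node_count"))  # 0.5
```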
8. Creating and managing Hadoop clusters

Now you are ready to create and manage Hadoop clusters by using the Rackspace Cloud Big Data API. This chapter provides examples, using curl and the client, for some of the common operations against Cloud Big Data. For information about all of the operations available in Cloud Big Data, see the Cloud Big Data Developer Guide.

8.1. Listing flavors

A flavor is an available hardware configuration for a cluster. Each flavor has a unique combination of memory capacity and priority for CPU time. The larger the flavor, the more RAM and the higher the priority for CPU time your cluster receives. Use the list flavors operation to find the available configurations, and then decide which size you need before you create a cluster.

Verb  URI                    Description
GET   /v2/tenant_id/flavors  Lists all available flavors.

8.1.1. curl example

This operation does not require a request body. The following examples show the curl request and the corresponding response for listing flavors.

Example 8.1. curl list flavors request: JSON

curl -i -X GET https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/flavors \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-type: application/json"

Example 8.2. List flavors response: JSON

{
    "flavors": [
        {
            "disk": 10000,
            "id": "hadoop1-60",
            "name": "XLarge Hadoop Instance",
            "ram": 61440,
            "vcpus": 16,
            "class": "hadoop1"
        },
        {
            "disk": 1250,
            "id": "hadoop1-7",
            "name": "Small Hadoop Instance",
            "ram": 8192,
            "vcpus": 2,
            "class": "hadoop1"
        },
        {
            "disk": 3200,
            "id": "onmetal-io1",
            "name": "OnMetal IO 1",
            "ram": 122880,
            "vcpus": 40,
            "class": "onmetal"
        }
    ],
    "links": [
        {
            "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/flavors?limit=2&marker=hadoop1-7",
            "rel": "next"
        }
    ]
}

8.1.2. Client example

You can enumerate the flavors and associated resources by using the flavors list command, as shown in the following example.

Example 8.3. List flavors and associated resources by using the flavors list command with the CLI

$ lava flavors list
+------------+------------------------+-------+-------+-------+
| ID         | Name                   | RAM   | VCPUs | Disk  |
+------------+------------------------+-------+-------+-------+
| hadoop1-15 | Medium Hadoop Instance | 15360 | 4     | 2500  |
| hadoop1-30 | Large Hadoop Instance  | 30720 | 8     | 5000  |
| hadoop1-60 | XLarge Hadoop Instance | 61440 | 16    | 10000 |
| hadoop1-7  | Small Hadoop Instance  | 7680  | 2     | 1250  |
+------------+------------------------+-------+-------+-------+

8.2. Listing available distros

Distros provide a list of supported distributions and their corresponding versions, as well as a list of supported services and components per distribution. Use the list distros operation to see which distros are available.

Verb  URI                    Description
GET   /v2/tenant_id/distros  Lists available distros.

23
8.2.1. curl example

This operation does not accept a request body. The following examples show the curl request and the corresponding response for listing available distros.

Example 8.4. curl list available distros request: JSON

curl -i -X GET https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/distros \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-type: application/json"

Example 8.5. List available distros response: JSON

{
    "distros": [
        {
            "id": "HDP1.3",
            "name": "Hortonworks Data Platform",
            "version": "1.3",
            "links": [
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/distros/HDP1.3",
                    "rel": "self"
                },
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/distros/hdp1.3",
                    "rel": "bookmark"
                }
            ]
        },
        {
            "id": "HDP2.2",
            "name": "Hortonworks Data Platform",
            "version": "2.2",
            "links": [
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/distros/HDP2.2",
                    "rel": "self"
                },
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/distros/hdp2.2",
                    "rel": "bookmark"
                }
            ]
        },
        {
            "id": "CDH5",
            "name": "Cloudera Hadoop",
            "version": "5",
            "links": [
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/distros/CDH5",
                    "rel": "self"
                },
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/distros/cdh5",
                    "rel": "bookmark"
                }
            ]
        }
    ]
}

8.2.2. Client example

Using the client, view available distros using the distros list command as shown in the following example.

Example 8.6. View available distros with the CLI

$ lava distros list
+--------+---------------------------+---------+
| ID     | Name                      | Version |
+--------+---------------------------+---------+
| HDP2.2 | HortonWorks Data Platform | 2.2     |
+--------+---------------------------+---------+

8.3. Listing available stacks

Stacks are high-level building blocks of software that compose a Big Data architecture. Stacks are composed of services, which in turn are composed of components. A stack is specific to a distribution due to the differences in services that are supported across distributions. You can create a stack or use one of the preconfigured stacks.

Verb  URI                   Description
GET   /v2/tenant_id/stacks  Lists available stacks.

8.3.1. curl example

This operation does not accept a request body. The following examples show the curl request and corresponding response for listing all stacks.

Example 8.7. curl list all stacks request: JSON

curl -i -X GET https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/stacks \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-type: application/json"

25
Example 8.8. List all stacks response: JSON

{
    "stacks": [
        {
            "distro": "HDP2.2",
            "id": "HDP2.2_Hadoop",
            "name": "Core Hadoop",
            "description": "Core Hadoop Stack with Hive",
            "services": [
                {
                    "modes": ["HA"],
                    "name": "HDFS"
                },
                {
                    "name": "Yarn"
                },
                {
                    "name": "MapReduce"
                },
                {
                    "name": "Hive"
                },
                {
                    "name": "Pig"
                }
            ],
            "links": [
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/stacks/HDP2.2_Hadoop",
                    "rel": "self"
                },
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/stacks/hdp2.2_hadoop",
                    "rel": "bookmark"
                }
            ]
        },
        {
            "distro": "HDP2.2",
            "id": "HDP2.2_HBase",
            "name": "HBase",
            "description": "Core Hadoop Stack with HBase",
            "services": [
                {
                    "modes": ["HA"],
                    "name": "HDFS"
                },
                {
                    "name": "Yarn"
                },
                {
                    "name": "HBase"
                },
                {
                    "name": "MapReduce"
                },
                {
                    "name": "Hive"
                },
                {
                    "name": "Pig"
                }
            ],
            "links": [
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/stacks/HDP2.2_HBase",
                    "rel": "self"
                },
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/stacks/hdp2.2_hbase",
                    "rel": "bookmark"
                }
            ]
        }
    ],
    "links": [
        {
            "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/stacks?limit=2&marker=hdp2.2_hbase",
            "rel": "next"
        }
    ]
}

8.3.2. Client example

Using the client, view available stacks using the stacks list command as shown in the following example.

Example 8.9. View available stacks with the CLI

$ lava stacks list
+---------------+----------------+--------+---------------------------------------------------+----------------------------------+
| ID            | Name           | Distro | Description                                       | Services                         |
+---------------+----------------+--------+---------------------------------------------------+----------------------------------+
| HADOOP_HDP2_2 | Hadoop HDP 2.2 | HDP2.2 | Core batch processing systems and interactive     | [name=hdfs, modes=[secondary],   |
|               |                |        | querying with Hive.                               |  name=yarn, modes=[],            |
|               |                |        |                                                   |  name=mapreduce, modes=[],       |
|               |                |        |                                                   |  name=hive, modes=[],            |
|               |                |        |                                                   |  name=pig, modes=[],             |
|               |                |        |                                                   |  name=sqoop, modes=[],           |
|               |                |        |                                                   |  name=oozie, modes=[],           |
|               |                |        |                                                   |  name=flume, modes=[],           |
|               |                |        |                                                   |  name=zookeeper, modes=[]]       |
| KAFKA_HDP2_2  | Kafka HDP 2.2  | HDP2.2 | An individual Kafka stack serving as the backbone | [name=hdfs, modes=[secondary],   |
|               |                |        | of a distributed message queuing system.          |  name=kafka, modes=[],           |
|               |                |        |                                                   |  name=zookeeper, modes=[]]       |
| SPARK_HDP2_2  | Spark HDP 2.2  | HDP2.2 | Spark on Yarn supporting both batch and real-time | [name=hdfs, modes=[secondary],   |
|               |                |        | processing.                                       |  name=yarn, modes=[],            |
|               |                |        |                                                   |  name=mapreduce, modes=[],       |
|               |                |        |                                                   |  name=hive, modes=[],            |
|               |                |        |                                                   |  name=pig, modes=[],             |
|               |                |        |                                                   |  name=zookeeper, modes=[],       |
|               |                |        |                                                   |  name=spark, modes=[]]           |
+---------------+----------------+--------+---------------------------------------------------+----------------------------------+

8.4. Creating a cluster

This operation creates a cluster for your account.

Verb  URI                     Description
POST  /v2/tenant_id/clusters  Creates a cluster.

8.4.1. curl example

The following examples show the curl request followed by the JSON request body and the corresponding response for creating a cluster.

Example 8.10. curl create cluster request

curl -i -X POST https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/clusters \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-type: application/json"

Example 8.11. Create cluster request: JSON body

{
    "cluster": {
        "name": "test",
        "username": "cbduser",
        "ssh_keys": ["cbdkey"],
        "stack_id": "HDP2.1_Hadoop",
        "node_groups": [
            {
                "count": 10,
                "flavor_id": "hadoop1-7",
                "id": "slave"
            }
        ],
        "connectors": [
            {
                "type": "cloud_files",
                "credential": {
                    "name": "cfuser"
                }
            }
        ],
        "scripts": [
            {
                "id": "c5033423-97ff-4215-9552-19d425e1f9dd"
            }
        ]
    }
}

Example 8.12. Create cluster response: JSON

{
    "cluster": {
        "created": "2014-06-14T10:10:10Z",
        "id": "aaa-bbbb-cccc",
        "name": "test",
        "username": "cbduser",
        "ssh_keys": ["cbdkey"],
        "status": "BUILDING",
        "progress": "5",
        "links": [
            {
                "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/clusters/aaa-bbbb-cccc",
                "rel": "self"
            },
            {
                "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/clusters/aaa-bbbb-cccc",
                "rel": "bookmark"
            }
        ],
        "stack_id": "HDP2.1_Hadoop",
        "node_groups": [
            {
                "components": [
                    {"name": "Namenode"},
                    {"name": "ResourceManager"},
                    {"name": "YarnTimelineServer"},
                    {"name": "JobHistoryServer"}
                ],
                "count": 1,
                "flavor_id": "hadoop1-7",
                "id": "master"
            },
            {
                "components": [
                    {"name": "Namenode"}
                ],
                "count": 1,
                "flavor_id": "hadoop1-7",
                "id": "standby-namenode"
            },
            {
                "components": [
                    {"name": "JournalNode"}
                ],
                "count": 3,
                "flavor_id": "hadoop1-1",
                "id": "journalnodes"
            },
            {
                "components": [
                    {"name": "Datanode"},
                    {"name": "NodeManager"}
                ],
                "count": 10,
                "flavor_id": "hadoop1-7",
                "id": "slave"
            },
            {
                "components": [
                    {"name": "HiveServer2"},
                    {"name": "HiveMetastore"},
                    {"name": "HiveClient"},
                    {"name": "HiveAPI"},
                    {"name": "PigClient"}
                ],
                "count": 1,
                "flavor_id": "hadoop1-2",
                "id": "gateway"
            }
        ],
        "updated": "",
        "connectors": [
            {
                "type": "cloud_files",
                "credential": {
                    "name": "cfuser"
                }
            }
        ],
        "scripts": [
            {
                "id": "c5033423-97ff-4215-9552-19d425e1f9dd",
                "name": "Mongo Connector",
                "status": "PENDING"
            }
        ]
    }
}

8.4.2. Client example

Using the client, create a cluster using the clusters create command as shown in the following example.

Example 8.13. Create a cluster with the CLI

$ lava clusters create test KAFKA_HDP2_2 --node-groups='slave(flavor_id=hadoop1-7, count=3)' \
    --ssh-key cbdkey --username cbduser
+----------------------------------------------------+
|                      Cluster                       |
+-------------+--------------------------------------+
| ID          | c5444b98-f4b4-aaaa-bbbb-b6e9d3313da1 |
| Name        | test                                 |
| Status      | BUILDING                             |
| Stack       | KAFKA_HDP2_2                         |
| Created     | 2015-05-30 06:10:37+00:00            |
| CBD Version | 2                                    |
| Username    | cbduser                              |
| Progress    | 0.00                                 |
+-------------+--------------------------------------+
+------------------------------------------------------------+
|                        Node Groups                         |
+-----------+-----------+-------+----------------------------+
| ID        | Flavor    | Count | Components                 |
+-----------+-----------+-------+----------------------------+
| master    | hadoop1-4 | 1     | [name=namenode]            |
| secondary | hadoop1-4 | 1     | [name=secondarynamenode]   |
| slave     | hadoop1-7 | 3     | [name=datanode,            |
|           |           |       |  name=kafkabroker,         |
|           |           |       |  name=zookeeperclient]     |
| zookeeper | hadoop1-2 | 3     | [name=zookeeperserver,     |
|           |           |       |  name=zookeeperclient]     |
+-----------+-----------+-------+----------------------------+

31
8.5. Listing clusters

You use the list clusters operation to find the available clusters for your account.

Verb  URI                     Description
GET   /v2/tenant_id/clusters  Lists all clusters for your account.

8.5.1. curl example

This operation does not require a request body. The following examples show the curl request and the corresponding response for listing clusters.

Example 8.14. curl list clusters request: JSON

curl -i -X GET https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/clusters \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-type: application/json"

Example 8.15. List clusters response: JSON

{
    "clusters": [
        {
            "created": "2014-06-14T10:10:10Z",
            "id": "aaa-bbbb-cccc",
            "name": "test",
            "status": "ACTIVE",
            "stack_id": "HDP2.1_Hadoop",
            "updated": "",
            "links": [
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/clusters/aaa-bbbb-cccc",
                    "rel": "self"
                },
                {
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/clusters/aaa-bbbb-cccc",
                    "rel": "bookmark"
                }
            ]
        }
    ],
    "links": [
        {
            "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/clusters?limit=1&marker=aaa-bbbb-cccc",
            "rel": "next"
        }
    ]
}

32
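List responses are paginated: when more results remain, the top-level links array carries an entry with "rel": "next" whose href points to the next page. A Python sketch for walking that structure; the helper name is our own, and the sample data comes from Example 8.15:

```python
def next_page_url(list_response):
    """Return the href of the pagination link with rel 'next',
    or None when the last page has been reached."""
    for link in list_response.get("links", []):
        if link.get("rel") == "next":
            return link["href"]
    return None

# One page of the list-clusters response from Example 8.15.
page = {
    "clusters": [{"id": "aaa-bbbb-cccc", "name": "test", "status": "ACTIVE"}],
    "links": [
        {
            "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/clusters?limit=1&marker=aaa-bbbb-cccc",
            "rel": "next",
        }
    ],
}
print(next_page_url(page))
```

A client would GET the returned URL to fetch the next page, repeating until `next_page_url` returns None.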
8.5.2. Client example

Using the client, list clusters using the clusters list command as shown in the following example.

Example 8.16. List clusters with the CLI

$ lava clusters list
+--------------------------------------+------+--------+--------------+---------------------------+
| ID                                   | Name | Status | Stack        | Created                   |
+--------------------------------------+------+--------+--------------+---------------------------+
| c5444b98-f4b4-aaaa-bbbb-b6e9d3313da1 | test | ACTIVE | KAFKA_HDP2_2 | 2015-06-30 06:10:37+00:00 |
+--------------------------------------+------+--------+--------------+---------------------------+

8.6. Viewing node details

The get node details operation lists all server nodes for the specified cluster.

Verb  URI                                     Description
GET   /v2/tenant_id/clusters/clusterid/nodes  Lists all nodes for a cluster.

8.6.1. curl example

In the following example, the cluster has a master node and two slave nodes. Each node has a private IP address, which is used for backend (Hadoop) data transfer, and a public IP address, which enables you to access it over the public Internet. You can remotely SSH in to the master or slave nodes over the public IP address by using the username and ssh_key that you added during cluster creation.

The following examples show the curl request and corresponding response for listing all nodes for a cluster.

Example 8.17. curl list cluster nodes request: JSON

curl -i -X GET https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/clusters/ac111111-2d86-4597-8010-cbe787bbbc41/nodes \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json"

Example 8.18. List cluster nodes response: JSON

{
    "nodes": [
        {
            "created": "2014-06-14T10:10:10Z",
            "id": "111-222-444",
            "name": "master-1",
            "node_group": "master",
            "status": "ACTIVE",
            "updated": "",
            "addresses": {
                "public": [
                    {
                        "addr": "168.x.x.3",
                        "version": 4
                    }
                ],
                "private": [
                    {
                        "addr": "10.x.x.3",
                        "version": 4
                    }
                ]
            },
            "flavor_id": "hadoop1-4",
            "components": [
                {
                    "name": "Namenode",
                    "nice_name": "HDFS Namenode",
                    "uri": "http://master-1.local:50070"
                },
                {
                    "name": "ResourceManager",
                    "nice_name": "YARN Resource Manager",
                    "uri": "http://master-1.local:8088"
                },
                {
                    "name": "YarnTimelineServer",
                    "nice_name": "YARN Timeline History Server",
                    "uri": "http://master-1.local:8188"
                },
                {
                    "name": "JobHistoryServer",
                    "nice_name": "MapReduce History Server",
                    "uri": "http://master-1.local:19888"
                }
            ],
            "links": [
                {
                    "rel": "self",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/clusters/aaa-bbbb-cccc/nodes/111-222-444"
                },
                {
                    "rel": "bookmark",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/clusters/aaa-bbbb-cccc/nodes/111-222-444"
                }
            ]
        },
        {
            "created": "2014-06-14T10:10:10Z",
            "id": "111-222-333",
            "name": "slave-1",
            "node_group": "slave",
            "status": "ACTIVE",
            "updated": "",
            "addresses": {
                "public": [
                    {
                        "addr": "168.x.x.4",
                        "version": 4
                    }
                ],
                "private": [
                    {
                        "addr": "10.x.x.4",
                        "version": 4
                    }
                ]
            },
            "flavor_id": "hadoop1-7",
            "components": [
                {
                    "name": "Datanode",
                    "nice_name": "HDFS Datanode",
                    "uri": "http://slave-1.local:50075"
                },
                {
                    "name": "NodeManager",
                    "nice_name": "YARN Node Manager",
                    "uri": "http://slave-1.local:8042"
                }
            ],
            "links": [
                {
                    "rel": "self",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/clusters/aaa-bbbb-cccc/nodes/111-222-333"
                },
                {
                    "rel": "bookmark",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/clusters/aaa-bbbb-cccc/nodes/111-222-333"
                }
            ]
        },
        {
            "created": "2014-06-14T10:10:10Z",
            "id": "111-222-555",
            "name": "slave-2",
            "node_group": "slave",
            "status": "ACTIVE",
            "updated": "",
            "addresses": {
                "public": [
                    {
                        "addr": "168.x.x.5",
                        "version": 4
                    }
                ],
                "private": [
                    {
                        "addr": "10.x.x.5",
                        "version": 4
                    }
                ]
            },
            "flavor_id": "hadoop1-7",
            "components": [
                {
                    "name": "Datanode",
                    "nice_name": "HDFS Datanode",
                    "uri": "http://slave-2.local:50075"
                },
                {
                    "name": "NodeManager",
                    "nice_name": "YARN Node Manager",
                    "uri": "http://slave-2.local:8042"
                }
            ],
            "links": [
                {
                    "rel": "self",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/clusters/aaa-bbbb-cccc/nodes/111-222-555"
                },
                {
                    "rel": "bookmark",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/clusters/aaa-bbbb-cccc/nodes/111-222-555"
                }
            ]
        }
    ],
    "links": [
        {
            "rel": "next",
            "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/clusters/aaa-bbbb-cccc/nodes?limit=3&marker=111-222-555"
        }
    ]
}

8.6.2. Client example

Using the client, use the show and nodes commands to query your cluster, as shown in the following example.

Example 8.19. Query the details of a cluster by using the show and nodes commands with the CLI

$ lava nodes list cc5444b98-f4b4-aaaa-bbbb-b6e9d3313da1
+--------------------------------------+----------+--------+--------+----------------+----------------+-----------------------------------+
| ID                                   | Name     | Role   | Status | Public IP      | Private IP     | Components                        |
+--------------------------------------+----------+--------+--------+----------------+----------------+-----------------------------------+
| 057b24f1-6397-4c46-ba59-3649a32db23d | master-1 | master | ACTIVE | 166.78.133.67  | 10.190.241.50  | [nice_name=hdfs Namenode,         |
|                                      |          |        |        |                |                |  name=namenode,                   |
|                                      |          |        |        |                |                |  uri=http://master-1.local:50070] |
| 42bca320-9581-4321-b835-668216c3e3a9 | slave-1  | slave  | ACTIVE | 166.78.132.244 | 10.190.240.242 | [nice_name=hdfs Datanode,         |
|                                      |          |        |        |                |                |  name=datanode,                   |
|                                      |          |        |        |                |                |  uri=http://slave-1.local:50075,  |
|                                      |          |        |        |                |                |  nice_name=kafka Broker,          |
|                                      |          |        |        |                |                |  name=kafkabroker,                |
|                                      |          |        |        |                |                |  nice_name=zookeeper Client,      |
|                                      |          |        |        |                |                |  name=zookeeperclient]            |
| 4818bc5c-82e1-4392-800e-0667519b0129 | slave-2  | slave  | ACTIVE | 166.78.132.249 | 10.190.240.246 | [nice_name=hdfs Datanode,         |
|                                      |          |        |        |                |                |  name=datanode,                   |
|                                      |          |        |        |                |                |  uri=http://slave-2.local:50075,  |
|                                      |          |        |        |                |                |  nice_name=kafka Broker,          |
|                                      |          |        |        |                |                |  name=kafkabroker,                |
|                                      |          |        |        |                |                |  nice_name=zookeeper Client,      |
|                                      |          |        |        |                |                |  name=zookeeperclient]            |
+--------------------------------------+----------+--------+--------+----------------+----------------+-----------------------------------+

The example shows that the cluster has the following nodes:

One master node
Two slave nodes

Each server node has the following IP addresses:

A private IP address that is used for backend (Hadoop) data transfers
A public IP address that allows you to access the server over the public Internet

8.7. Resizing clusters

You can increase or decrease the size of an existing cluster by using the resize operation.

Verb  URI                               Description
PUT   /v2/tenant_id/clusters/clusterid  Resizes a cluster.

8.7.1. curl example

The following example resizes the previously created cluster, named test, to include 15 slave nodes. When you use the resize action, you specify the total number of slave nodes that you want the cluster to have, not just the number of nodes to add or remove. In the example, 15 is the total number of slave nodes that the cluster will have after the request is run. After you initiate the resize operation, you can use the list cluster nodes operation to confirm that your cluster has been resized.

The following examples show the curl request and corresponding response to resize a cluster.

37
Example 8.20. curl resize cluster request: JSON

curl -i -X PUT https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/clusters/yourclusterid \
    -H "Accept: application/json" \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Content-Type: application/json"

Example 8.21. Resize cluster request: JSON body

{
    "cluster": {
        "node_groups": [
            {
                "count": 15,
                "id": "slave"
            }
        ]
    }
}

Example 8.22. Resize cluster response: JSON

{
    "cluster": {
        "created": "2014-06-14T10:10:10Z",
        "id": "aaa-bbbb-cccc",
        "name": "test",
        "status": "UPDATING",
        "progress": "5",
        "links": [
            {
                "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/clusters/aaa-bbbb-cccc",
                "rel": "self"
            },
            {
                "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/clusters/aaa-bbbb-cccc",
                "rel": "bookmark"
            }
        ],
        "stack_id": "HDP2.1_Hadoop",
        "node_groups": [
            {
                "components": [
                    {"name": "Namenode"},
                    {"name": "ResourceManager"},
                    {"name": "YarnTimelineServer"},
                    {"name": "JobHistoryServer"}
                ],
                "count": 1,
                "flavor_id": "hadoop1-7",
                "id": "master"
            },
            {
                "components": [
                    {"name": "Namenode"}
                ],
                "count": 1,
                "flavor_id": "hadoop1-7",
                "id": "standby-namenode"
            },
            {
                "components": [
                    {"name": "JournalNode"}
                ],
                "count": 3,
                "flavor_id": "hadoop1-1",
                "id": "journalnodes"
            },
            {
                "components": [
                    {"name": "Datanode"},
                    {"name": "NodeManager"}
                ],
                "count": 15,
                "flavor_id": "hadoop1-7",
                "id": "slave"
            },
            {
                "components": [
                    {"name": "HiveServer2"},
                    {"name": "HiveMetastore"},
                    {"name": "HiveClient"},
                    {"name": "HiveAPI"},
                    {"name": "PigClient"}
                ],
                "count": 1,
                "flavor_id": "hadoop1-2",
                "id": "gateway"
            }
        ],
        "updated": "2014-06-25T10:10:10Z"
    }
}

39
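Because resize takes desired totals rather than deltas, the request body reduces to the target count per node group. A Python sketch matching the shape of Example 8.21; the helper name is our own:

```python
import json

def resize_body(node_group_counts):
    """Build the JSON body for PUT /v2/{tenant_id}/clusters/{cluster_id}.
    `node_group_counts` maps a node-group id to the TOTAL node count the
    cluster should have after the resize, not the number to add or remove."""
    return {
        "cluster": {
            "node_groups": [
                {"count": count, "id": group_id}
                for group_id, count in sorted(node_group_counts.items())
            ]
        }
    }

# Grow (or shrink) the slave group to exactly 15 nodes.
print(json.dumps(resize_body({"slave": 15}), indent=4))
```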
8.7.2. Client example

Using the client, you can increase or decrease the size of an existing cluster by using the resize command. The following example resizes the previously created cluster test to include 4 slave nodes.

Example 8.23. Increase cluster size by using the resize command with the CLI

$ lava clusters resize c5444b98-f4b4-aaaa-bbbb-b6e9d3313da1 --node-groups='slave(flavor_id=hadoop1-7, count=4)'
+----------------------------------------------------+
|                      Cluster                       |
+-------------+--------------------------------------+
| ID          | c5444b98-f4b4-aaaa-bbbb-b6e9d3313da1 |
| Name        | test                                 |
| Status      | UPDATING                             |
| Stack       | KAFKA_HDP2_2                         |
| Created     | 2015-05-30 06:10:37+00:00            |
| CBD Version | 2                                    |
| Username    | cbduser                              |
| Progress    | 0.00                                 |
+-------------+--------------------------------------+
+------------------------------------------------------------+
|                        Node Groups                         |
+-----------+-----------+-------+----------------------------+
| ID        | Flavor    | Count | Components                 |
+-----------+-----------+-------+----------------------------+
| master    | hadoop1-4 | 1     | [name=namenode]            |
| secondary | hadoop1-4 | 1     | [name=secondarynamenode]   |
| slave     | hadoop1-7 | 4     | [name=datanode,            |
|           |           |       |  name=kafkabroker,         |
|           |           |       |  name=zookeeperclient]     |
| zookeeper | hadoop1-2 | 3     | [name=zookeeperserver,     |
|           |           |       |  name=zookeeperclient]     |
+-----------+-----------+-------+----------------------------+

When you use the resize command, you specify the total number of nodes per node group that you want the cluster to have, not just the number of nodes to add or remove. In the example, 4 is the total number of slave nodes that the cluster will have after the command is run. After you initiate the resize operation, use the get or list commands to confirm that your cluster has been resized.

8.8. Creating a script

You can create a custom script that runs during various phases of the cluster's lifecycle. The script is invoked on all nodes of the cluster. The only script type currently supported is POST_INIT, which runs after the cluster is completely set up. The script must be executable. Preferably, the script should be a bash script, but it can also be a Python script or a self-contained executable that works with the base OS-installed libraries.

By default, a few product-provided scripts with the is_public flag set are available for use. You do not have the option to edit them.

40
Verb  URI                    Description
POST  /v2/tenant_id/scripts  Creates a script.

This operation creates a script.

8.8.1. curl example

The following examples show the curl request and corresponding response for creating a script.

Example 8.24. curl create a script

curl -i -X POST https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/scripts \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-type: application/json"

Example 8.25. Create a script request: JSON body

{
    "script": {
        "name": "ipython Notebooks",
        "type": "POST_INIT",
        "url": "https://example.com/ipynb_install.sh"
    }
}

Example 8.26. Create a script response: JSON

{
    "script": {
        "created": "2014-06-14T10:10:10Z",
        "updated": "",
        "id": "1111-aaaa-bbbbb",
        "name": "ipython Notebooks",
        "type": "POST_INIT",
        "url": "https://example.com/ipynb_install.sh",
        "is_public": false,
        "links": [
            {
                "rel": "self",
                "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/scripts/1111-aaaa-bbbbb"
            },
            {
                "rel": "bookmark",
                "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/scripts/1111-aaaa-bbbbb"
            }
        ]
    }
}

41
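The script resource is small enough that the request body can be built from the three required fields. A Python sketch following Example 8.25; the helper name and the type check are our own, and POST_INIT is the only type this guide documents:

```python
import json

# The only script type documented in this guide.
SUPPORTED_SCRIPT_TYPES = {"POST_INIT"}

def create_script_body(name, script_type, url):
    """Build the JSON body for POST /v2/{tenant_id}/scripts,
    following the shape of Example 8.25."""
    if script_type not in SUPPORTED_SCRIPT_TYPES:
        raise ValueError("unsupported script type: %s" % script_type)
    return {"script": {"name": name, "type": script_type, "url": url}}

print(json.dumps(create_script_body(
    "ipython Notebooks", "POST_INIT", "https://example.com/ipynb_install.sh")))
```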
8.8.2. Client example

Using the client, create a script as shown in the following example.

Example 8.27. Create a script with the CLI

$ lava scripts create sample http://example.com/sample.sh post_init
+---------+--------------------------------------+
| ID      | 44f31579-035c-4c63-9ebc-3670fc117506 |
| Name    | sample                               |
| Type    | POST_INIT                            |
| Public  | False                                |
| Created | 2015-06-30 17:03:12+00:00            |
| URL     | http://example.com/sample.sh         |
+---------+--------------------------------------+

8.9. Listing all scripts

Verb  URI                    Description
GET   /v2/tenant_id/scripts  Lists all scripts, both global product-provided scripts and user-created scripts.

This operation lists all scripts.

8.9.1. curl example

This operation does not accept a request body. The following examples show the curl request and corresponding response for listing all scripts.

Example 8.28. curl list all scripts request

curl -i -X GET https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/scripts \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Accept: application/json" \
    -H "Content-type: application/json"

Example 8.29. List all scripts response: JSON

{
    "scripts": [
        {
            "created": "2014-06-14T10:10:10Z",
            "updated": "",
            "id": "1111-aaaa-bbbbb",
            "name": "ipython Notebooks",
            "type": "POST_INIT",
            "url": "https://example.com/ipynb_install.sh",
            "is_public": false,
            "links": [
                {
                    "rel": "self",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/scripts/1111-aaaa-bbbbb"
                },
                {
                    "rel": "bookmark",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/scripts/1111-aaaa-bbbbb"
                }
            ]
        },
        {
            "created": "2014-06-14T10:10:10Z",
            "updated": "",
            "id": "1111-aaaa-cccc",
            "name": "ml_libs",
            "type": "POST_INIT",
            "url": "https://example.com/mllibs.sh",
            "is_public": false,
            "links": [
                {
                    "rel": "self",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/scripts/1111-aaaa-cccc"
                },
                {
                    "rel": "bookmark",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/scripts/1111-aaaa-cccc"
                }
            ]
        },
        {
            "created": "2014-06-14T10:10:10Z",
            "updated": "",
            "id": "aaa-bbbb-33333",
            "name": "Mongo DB Connector",
            "type": "POST_INIT",
            "url": "https://example.com/mongodb_connector.sh",
            "is_public": true,
            "links": [
                {
                    "rel": "self",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/v2/1234/scripts/aaa-bbbb-33333"
                },
                {
                    "rel": "bookmark",
                    "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/scripts/aaa-bbbb-33333"
                }
            ]
        }
    ],
    "links": [
        {
            "rel": "next",
            "href": "https://dfw.bigdata.api.rackspacecloud.com/1234/scripts?limit=3&marker=aaa-bbbb-33333"
        }
    ]
}

43
8.9.2. Client example

Using the client, list all scripts as shown in the following example.

Example 8.30. List available scripts with the CLI

$ lava scripts list
+--------------------------------------+--------+-----------+--------+---------------------------+------------------------------+
| ID                                   | Name   | Type      | Public | Created                   | URL                          |
+--------------------------------------+--------+-----------+--------+---------------------------+------------------------------+
| 44f31579-035c-4c63-9ebc-3670fc117506 | sample | POST_INIT | False  | 2015-06-30 17:03:12+00:00 | http://example.com/sample.sh |
+--------------------------------------+--------+-----------+--------+---------------------------+------------------------------+

8.10. Deleting clusters

Use the delete cluster operation to remove unused Hadoop clusters. This operation deletes any servers associated with the cluster and any data stored in the cluster. You cannot delete clusters that are in the process of being created or resized.

Verb    URI                        Description
DELETE  /v2/tenant_id/clusters/id  Deletes a cluster.

8.10.1. curl example

The following example shows the curl request to delete a cluster.

Example 8.31. curl delete cluster request: JSON

curl -i -X DELETE https://dfw.bigdata.api.rackspacecloud.com/v2/youraccountid/clusters/yourclusterid \
    -H "Accept: application/json" \
    -H "X-Auth-Token: yourauthtoken" \
    -H "Content-Type: application/json"

This operation does not accept a request body. This operation does not return a response body.

8.10.2. Client example

Using the client, use the delete command to remove unused Hadoop clusters as shown in the following example. This command deletes any servers associated with the cluster and any data stored in the cluster.

Example 8.32. Remove clusters by using the delete command

$ lava clusters delete c5444b98-f4b4-aaaa-bbbb-b6e9d3313da1

44
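Because clusters that are mid-build or mid-resize cannot be deleted, a client can check the status field of a cluster document before issuing the DELETE. A Python sketch; the status values BUILDING and UPDATING appear in Examples 8.12 and 8.22, while the guard itself is our own:

```python
def deletable(cluster):
    """Return True when a cluster is not being created or resized,
    the two states in which the API refuses deletion (section 8.10)."""
    return cluster.get("status") not in ("BUILDING", "UPDATING")

print(deletable({"id": "aaa-bbbb-cccc", "status": "ACTIVE"}))    # True
print(deletable({"id": "aaa-bbbb-cccc", "status": "UPDATING"}))  # False
```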
9. Additional resources

If you have any questions or concerns about using any of the steps in this guide, send an email to <cbdteam@rackspace.com>. For information about all Cloud Big Data API operations, version 2, see the Cloud Big Data Developer Guide at http://docs.rackspace.com/. All you need to get started with Cloud Big Data is this getting started guide and the developer guide. For more details about Rackspace Cloud Big Data, go to http://www.rackspace.com/cloud/big-data/. This site also offers links to official Rackspace support channels, including knowledge center articles, forums, phone, chat, and email. Visit the Product Feedback Forum and tell us what you think about Cloud Big Data. This API uses standard HTTP 1.1 response codes as documented at www.w3.org/protocols/rfc2616/rfc2616-sec10.html. 45
10. Document change history

This version of the guide replaces and obsoletes all earlier versions. The most recent changes are described in the following table:

Revision Date  Summary of Changes
June 30, 2015  Initial General Availability (GA) release of Cloud Big Data, v2.

46
Glossary Cluster A cluster consists of a group of servers running a distributed system, coordinating and functioning as one. A cluster consists of a single stack. Examples are a Hadoop cluster or an HBase cluster. Component A service can have one or more components. A component can be a specific configuration for a service or additional modes of operation. Examples are HDFS Secondary Namenode or HDFS HA. Credentials Credentials allow you to set up SSH keys and other connector credentials for use with clusters. Distro Distros provide a list of supported distributions and their corresponding versions, as well as a list of supported services and components per distribution. Flavor A flavor is an available configuration for Cloud Big Data. Each flavor has a unique combination of memory capacity and priority for CPU time. HDFS The Apache Hadoop Distributed File System. This is the default file system used in Cloud Big Data. MapReduce A framework for performing calculations on the data in the distributed file system. Map tasks run in parallel with each other. Reduce tasks also run in parallel with each other. Node A node is either a virtual or an OnMetal server that serves a particular role in the cluster. A node runs one or more components in the Hadoop ecosystem. Resource limits Resource limits include items such as remaining node count, available RAM, and remaining disk space for the user. Scripts You can create a custom script that runs during various phases of the cluster's lifecycle. The script is invoked on all nodes of the cluster. The only script type currently supported is POST_INIT, which runs after the cluster is completely set up. The script must be executable. Preferably, the script should be a bash script, but it can also be a Python script or a self-contained executable that works with the base OS-installed libraries. Service A service is any individual software component that can function on its own or depend on other services to provide added functionality. 
Examples are HDFS, Pig, and Hive. Service catalog Your service catalog is the list of services available to you, as returned along with your authentication token and an expiration date for that token. All the services in your service catalog should recognize your token as valid until it expires. 47
The catalog listing for each service provides at least one endpoint URL for that service. Other information, such as regions, versions, and tenants, is provided if it is relevant to your access to this service. Stack Stacks are high-level building blocks of software that compose a Big Data architecture. Stacks are composed of services, which in turn are composed of components. A stack is specific to a distribution due to the differences in services that are supported across distributions. Tenant A container used to group or isolate resources or identity objects. Depending on the service operator, a tenant could map to a customer, account, organization, or project. 48