Real-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment




Pon Prabakaran Shanmugam, Principal Consultant, Wipro Analytics practice

Table of Contents
- Abstract
- Introduction
- Internet of Things (IoT) and Big Data
- Emergence of Multi-Platform Data Warehouse Environment
- RESTful Web Services
- Building Real-Time Data Access with RESTful Framework
- Conceptual Representation of Using REST for Multi-Platform Data Warehouse Environment
- Data Extraction
- Advantages of RESTful Framework
- Conclusion

Abstract

IT departments in enterprises see considerable value in building a service-oriented architecture around their data warehouse environment to empower their internal customers. The arrival of the Internet of Things (IoT) introduced a new deluge of data to be processed and used for analytics. With more data being processed and stored, the need for a multi-platform data warehouse environment has emerged. The volume, velocity and variety of data, and its potential use for the organic growth of the business, have driven data platforms to grow bigger. Today, data warehouse environments in organizations are on the threshold of fulfilling diverse use cases and providing data to a broad spectrum of users: business applications, business intelligence, data analysts, data scientists and others. Real-time data ingestion and extraction need to be easier, with or without the involvement of IT. With features such as text analysis and pattern matching available in analytical platforms, REST as a framework is a good vehicle to carry data to, and retrieve data from, the data processing and storage engines. This paper describes how a RESTful framework can be a cost-effective way to meet the mounting need to serve data in real time.

Introduction

The heavy dependence on Extract, Transform and Load (ETL) and business intelligence tools has created some fatigue among business users. It takes multiple iterations and a long wait for businesses to get the data that they need. The emergence of simple but efficient open source frameworks built around REST enables fast movement of data using the most popular web protocols.

Internet of Things (IoT) and Big Data

Internet-enabled chips embedded in products and devices are used primarily for data gathering, offering enterprise-level details on everything from how efficiently machines are running to the purchasing habits of consumers. Without proper data gathering in place, it will be impossible for businesses to sort through all the information flowing in from these embedded sensors. In other words, without analytics on the Big Data being captured, the Internet of Things can offer an enterprise little more than noise [1].

Emergence of Multi-Platform Data Warehouse Environment

The 21st century marked the emergence of data warehousing as a science. The need to process and store data gained traction as businesses found uses for it. As more and more data was processed, data appliances became popular. With the arrival of the Internet of Things, data collection and processing took on a new meaning as the amount of data being collected increased exponentially. Organizations now face the need to build multiple platforms to process and store data.

With the introduction of architectural principles like the Teradata Unified Data Architecture (UDA), there are many options for building a true multi-platform data warehouse environment. It is possible to store data of any size. A data lake offers the option of storing data as it arrives, in any format. A combination of interconnected platforms makes it possible to move data between platforms, and it is now possible to derive insights from data in real time. Tools such as Teradata QueryGrid help move data between platforms and can also retrieve data from different platforms without the user knowing where the data is stored.
The volume and variety of data are directly correlated with the number of components needed to process them. Conventional batch processing and canned analytics no longer satisfy the new types of users who work with this data. That is why organizations are looking for less formal ways to integrate, store and access data. The open source RESTful framework is one of the technologies that ease data integration and extraction [2].

RESTful Web Services

REST defines a set of architectural principles by which one can design web services that focus on a system's resources [3]. Its basic design principles are:
- Use HTTP methods explicitly
- Be stateless
- Expose directory structure-like URIs
- Transfer XML, JavaScript Object Notation (JSON) or both

Building Real-Time Data Access with RESTful Framework

Because the platforms in a multi-platform data warehouse environment have different workload capabilities, real-time data ingestion and extraction become more difficult. Assume there is a requirement to load unstructured data into a multi-platform data warehouse environment, and to access it, in real time. Because the data is unstructured, it makes sense to load it into Hadoop first (which is primarily suited to batch processing). After the data is cleansed and ready for integration, it is worthwhile to load the cleansed data into the enterprise data warehouse (EDW) or integrated data warehouse (IDW), which serves real-time access more efficiently.

The WebHDFS feature of HDFS (Hadoop Distributed File System), available in Apache Hadoop and in the Hortonworks distribution, can be used for real-time data ingestion into HDFS. Through it, a REST client can communicate easily with Hadoop clusters. File read and file write calls are redirected to the corresponding data nodes, so WebHDFS uses the full bandwidth of the Hadoop cluster for streaming data [4].
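The redirect behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it relies on the documented WebHDFS conventions (the `/webhdfs/v1` path prefix, `op=CREATE`, and a 307 redirect from the NameNode to a DataNode), while the function names, the `/staging/...` path and the user name are hypothetical. The NameNode host and port must be supplied by the caller.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_request_path(hdfs_path, op, params=None):
    """Return the path and query string for one WebHDFS operation."""
    query = urlencode({"op": op, **(params or {})})
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + query


def write_file(namenode_host, namenode_port, hdfs_path, data, user):
    """Two-step write: ask the NameNode, then send the bytes to a DataNode."""
    # Step 1: PUT without a body; the NameNode replies 307 with a DataNode URL.
    conn = http.client.HTTPConnection(namenode_host, namenode_port, timeout=30)
    conn.request("PUT", webhdfs_request_path(hdfs_path, "CREATE", {"user.name": user}))
    resp = conn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    conn.close()
    if resp.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {resp.status}")

    # Step 2: PUT the file content to the DataNode named in the Location header.
    target = urlsplit(location)
    conn = http.client.HTTPConnection(target.hostname, target.port, timeout=30)
    conn.request(
        "PUT",
        target.path + "?" + target.query,
        body=data,
        headers={"Content-Type": "application/octet-stream"},
    )
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status  # WebHDFS answers 201 Created on success


# Building the request path needs no cluster (hypothetical path and user):
print(webhdfs_request_path("/staging/u1/sales.csv", "CREATE", {"user.name": "u1"}))
```

Because the second request goes straight to a DataNode, the file content never passes through the NameNode, which is what lets the approach use the cluster's full bandwidth.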

Conceptual Representation of Using REST for Multi-Platform Data Warehouse Environment

[Figure: Data In. Components shown: end users (U1, U2, U3); REST API (client, Java code); lookup of data before calling the dispatcher; REST dispatchers; TD REST API; HDFS REST API; metadata; Hive staging (all data in); MySQL database (audit tables); analytical platforms TD and Hadoop.]

End-user requests shown in the figure:
- U1, CSV files (structured data): HTTP POST request, Content-type: multipart/form-data, {UserID, datalabel, CSV File}
- U2, JSON (structured / unstructured data): HTTP POST request, Content-type: application/json, {UserID, datalabel, data}
- U3, machine logs (unstructured data): HTTP POST request, Content-type: multipart/form-data, {UserID, datalabel, Log File}
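As an illustration of the JSON ingestion request in the figure, the sketch below builds (but does not send) an HTTP POST with Content-type application/json and a body carrying UserID, datalabel and data. The field names come from the figure; the base URL, the `/ingest` path, the function name and the sample record are assumptions made for the example.

```python
import json
import urllib.request


def build_ingest_request(base_url, user_id, datalabel, records):
    """Build a JSON ingestion request shaped like the U2 request in the figure."""
    payload = {"UserID": user_id, "datalabel": datalabel, "data": records}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/ingest",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, user and datalabel, for illustration only.
request = build_ingest_request(
    "http://rest.example.com", "U2", "clickstream", [{"page": "/home"}]
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. The CSV and log-file requests in the figure use multipart/form-data instead, which would need a multipart encoder rather than `json.dumps`.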

Data Ingestion

The architecture diagram (Data In) shows how a variety of data from different sources can be ingested into multiple data storage platforms (such as UDA) with the help of the REST framework.

Data Extraction

Data access from a multi-platform environment is easy with a REST service, as it provides an abstraction on top of the storage environment. The sample architecture diagram (Data Out) shows how REST acts as a façade layer for data storage.

[Figure: Data Out. Components shown: end users (U1, U2); REST API (client, Java code); lookup of data before calling the dispatcher; REST dispatchers; metadata; MySQL database (audit tables); platforms TD and Hadoop.]

Requests shown in the figure:
- JSON (structured / unstructured data): HTTP GET request. Three request types are supported:
  /{user} : datalabels of the user are displayed
  /{user}/{datalabel}/metadata : meta information of a datalabel
  /{user}/{datalabel}* : data under the datalabel; ColumnNames is an optional parameter
- CSV (structured data): HTTP GET request with Request-Data: CSV
  /{user}/{datalabel}* : data under the datalabel; ColumnNames is an optional parameter

Advantages of RESTful Framework

There are many advantages to using REST in a multi-platform data warehouse environment:
- Being a public API, the REST API is very easy to adopt and develop
- It helps in workload balancing; there is no dependence on an ETL tool or ESB (Enterprise Service Bus) for real-time integration
- REST works on top of HTTP; thus any HTTP client, even a browser, can use it
- The REST API for Teradata provides driverless connectivity to read and write data in the Teradata database [5]. Similarly, the REST API for HDFS makes it easy to work with Hadoop clusters
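The three GET URI patterns listed under Data Extraction could be recognized by a dispatcher along the following lines. This is a sketch under stated assumptions rather than the paper's code: the function name and the returned dictionary layout are hypothetical, ColumnNames is assumed to be a comma-separated query parameter, and the trailing asterisk in `/{user}/{datalabel}*` (whose meaning the figure does not define) is treated as an exact datalabel match.

```python
from urllib.parse import parse_qs, urlsplit


def parse_extract_request(url):
    """Classify a GET request URL into one of the three extraction types."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query)
    # ColumnNames is optional; assumed here to be a comma-separated list.
    raw_columns = query.get("ColumnNames", [""])[0]
    columns = [c for c in raw_columns.split(",") if c]

    if len(segments) == 1:
        # /{user} : list the user's datalabels
        return {"kind": "datalabels", "user": segments[0]}
    if len(segments) == 3 and segments[2] == "metadata":
        # /{user}/{datalabel}/metadata : meta information of a datalabel
        return {"kind": "metadata", "user": segments[0], "datalabel": segments[1]}
    if len(segments) == 2:
        # /{user}/{datalabel} : data under the datalabel
        return {
            "kind": "data",
            "user": segments[0],
            "datalabel": segments[1],
            "columns": columns,
        }
    raise ValueError(f"unsupported request path: {parts.path}")


# Hypothetical user, datalabel and column names.
print(parse_extract_request("/u1/sales?ColumnNames=id,amount"))
```

Keeping the routing decision in one pure function means the caller never names a storage platform; the dispatcher can then look up where the datalabel lives and query Teradata or Hadoop accordingly.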

Conclusion

Real-time access to Hadoop along with other data warehouse platforms is promising because it provides a data pipeline not only for structured data but also for data types that the average data warehouse environment does not support. The REST API's natural support for JSON objects adds value when new platforms such as MongoDB or Cassandra are added to the data warehouse environment. The major benefits of using REST for real-time access are the low cost of development and the ease of deployment. Moreover, REST fits naturally into a world of diverse data storage, as it provides a façade layer through which data can be ingested into and extracted from different platforms. REST also lets data scientists and business analysts mix and match data on the fly without knowing where it resides, and they no longer have to wait a day or two for conventional data load jobs to complete.

Imagine moving machine data or web data in real time using REST into a data lake, processing it with analytical platforms like Aster or with in-memory analytical tools, and storing it in Teradata for business use. At the end, the processed, report-ready data can be accessed using REST. All this is possible without highly priced business intelligence or ETL tools. A RESTful framework can empower internal customers and provide a cost-effective way to integrate and access data in real time.

References

1. http://www.datamation.com/applications/why-big-data-and-the-internet-of-things-are-a-perfect-match.html
2. http://tdwi.org/articles/2014/04/01/executive-summary-evolving-data-warehouse-architectures.aspx
3. http://javadevhell.blogspot.com/2010/11/rest-ful-web-service-basics-with.html
4. http://hortonworks.com/blog/webhdfs-%e2%80%93-http-rest-access-to-hdfs/
5. http://blogs.teradata.com/tdmo/rest-api-enables-driverless-connectivity/

About the Author

Pon Prabakaran Shanmugam is a Principal Consultant with Wipro Analytics practice. He has extensive data architecture experience in the financial industry, with strong data modeling, integration and analytical skills, and is an enthusiastic proponent of agile modeling. He is also a strong believer in embracing open source technologies to make data architecture flexible and evolving.

About Wipro Ltd.

Wipro Ltd. (NYSE:WIT) is a leading Information Technology, Consulting and Business Process Services company that delivers solutions to enable its clients to do business better. Wipro delivers winning business outcomes through its deep industry experience and a 360 degree view of "Business through Technology", helping clients create successful and adaptive businesses. A company recognised globally for its comprehensive portfolio of services, a practitioner's approach to delivering innovation, and an organization-wide commitment to sustainability, Wipro has a workforce of over 150,000, serving clients in 175+ cities across 6 continents. For more information, please visit www.wipro.com

Wipro Ltd, Doddakannelli, Sarjapur Road, Bangalore - 560 035, India. Tel: +91 (80) 2844 0011, Fax: +91 (80) 2844 0256, E-mail: info@wipro.com

© Wipro Ltd 2015. No part of this booklet may be reproduced in any form by any electronic or mechanical means (including photocopying, recording and printing) without permission in writing from the publisher, except for reading and browsing via the world wide web. Users are not permitted to mount this booklet on any network server. IND/PMCS/WIPRO/NOV2015-JAN2016