PARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA

Harnessing the combined power of SAP HANA and PARC's HiperGraph graph analytics technology for real-time insights

Surendra Reddy (PARC), Cirrus Shakeri (SAP), Heinz Ulrich Roggenkemper (SAP), Hartmut Vogler (SAP), and Jens Doerpmund (SAP)

PARC and SAP Co-innovation, page 1
Table of Contents

Executive Overview
Introduction to Graph Analytics and PARC's Big Data Research
Real-Time Marketing and Big Data
How SAP HANA and PARC HiperGraph Disrupt the Way Business Insights are Delivered to Users
Real-world Case Study: Major Retailer Data with HiperGraph and HANA
Next Steps and the Innovation Edge of HANA and HiperGraph
Executive Overview

Graph analytics is a crucial element in extracting insights from Big Data because it helps discover hidden relationships by connecting the dots. A graph, meaning a network of nodes and relationships, treats the links between objects as equally important as the objects themselves. Social networks and supply chains are obvious examples, but graphs include any network of objects such as customers, products, purchase orders, customer support calls, and product inventory.

HiperGraph, PARC's breakthrough Big Data technology, is a high-performance graph analytics engine. Through a four-month research project with SAP, we added HiperGraph's analytics to SAP HANA to demonstrate a live, real-time marketing insights use case.

Graph reasoning technologies provide the ability to contextualize relational data within the broader tapestry of information, and can go beyond simplistic reporting and dashboards. This creates opportunities to rapidly experiment, gain new insights, and identify root causes. The demonstrated technology match between HANA and HiperGraph has great disruptive potential, especially in the identification of key patterns within datasets (e.g., via clustering). With HANA and HiperGraph we can finally deliver on the promise of a closed feedback loop in the enterprise, where transactions are analyzed and reacted to in real time. The intelligence that is implicit in large volumes of structured and unstructured data, from a variety of sources inside or outside the enterprise, can be delivered to users in the form of smart business applications.

We concluded that the existing commercial and open source algorithms either did not provide real-time response or were unable to scale to large volumes of data. Our customer, an online retailer, required real-time response from its Big Data system.
PARC's graph reasoning, versatile goal-directed clustering, egocentric recommendations, and real-time recommendation algorithms, combined with the power of HANA in-memory technologies, far exceeded expectations. Brand managers can use this solution to automatically find clusters of customers with similar purchases, clusters of products that are frequently bought together, clusters of products that tend to be purchased on sale versus at full price, and so on, and act on these insights during the customer's shopping experience.

There is a great opportunity for businesses to gain value by combining the HANA in-memory technology with HiperGraph reasoning, recommendation, matrix factorization, egocentric collaborative filtering, and versatile goal-directed clustering. With SAP and PARC co-innovation in Big Data analytics we can now reduce or eliminate the need for complex extract, transform, and load (ETL) processes; increase the speed of clustering; and give business users new ways to explore data clusters directly. We are democratizing data science for all business users in the enterprise.

Introduction to Graph Analytics and PARC's Big Data Research

During PARC's multi-year research effort on graph-based reasoning and graph analytics, we found many broad applications across industries beyond the popular use for mining Twitter associations or Facebook friends. We realized that most applications today handle data that is inherently deeply associative and increasingly graph-oriented in nature. Enterprise datasets are typically high-dimensional, with a rich tapestry of relationships, and require highly scalable machine learning and reasoning algorithms for advanced analytics. PARC has developed high-performance algorithms, called HiperGraph, for analyzing large graphs in real time.
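To make the graph view of enterprise data concrete, the following is a minimal sketch of how purchase baskets can be turned into a product co-purchase graph and grouped into clusters of products that are frequently bought together. It is not HiperGraph and does not use any HANA API: the function names, the sample baskets, and the `min_weight` threshold are hypothetical, and plain connected components stand in for PARC's goal-directed clustering, whose details are not described here.

```python
from collections import Counter
from itertools import combinations


def co_purchase_graph(baskets):
    """Map each unordered product pair to the number of baskets containing both."""
    edges = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(basket)), 2):
            edges[(a, b)] += 1
    return edges


def clusters(edges, min_weight):
    """Group products linked by edges of weight >= min_weight (connected components)."""
    neighbors = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    seen, result = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collects every product reachable from `start`
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Hypothetical sample data, for illustration only
baskets = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["coffee", "filters"],
    ["coffee", "filters", "lantern"],
]
edges = co_purchase_graph(baskets)
product_clusters = clusters(edges, min_weight=2)
print(product_clusters)
```

Raising `min_weight` keeps only the strongest co-purchase links, which is one simple way an analyst could trade cluster size against cluster tightness in a sketch like this.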
After six months of exploration with Hadoop + Hive, native Map/Reduce, R/MR, and Mahout under different execution environments such as multi-core, multi-threaded, and parallel computation, PARC found the optimal solution by integrating our reasoning and insight discovery algorithms with SAP HANA.

Real-Time Marketing and Big Data

Technology Drivers

The primary challenge is finding the right tools to reveal insights in a dataset that can sometimes take days to process. Organizations must have powerful hardware and nuanced software to produce actionable insights; otherwise, the analysis can hold little value.
Business Drivers

With the proliferation of digital channels like web, social, and mobile, today's consumers have more power and choice than ever before. Multi-channel campaign marketing is becoming highly important for brands that want to reach and engage these technically savvy customers and respond to them with relevant offers and campaigns in near real time. A real-time recommendation engine or service is one example of many real-time response approaches. Traditional batch-processing recommendation engines are a good start, but they are not sufficient. To make real-time marketing work, marketing managers should be able to discover and contextualize marketing insights in near real time and craft personalized, one-to-one messages for the targeted audience.

Real-time marketing is viewed as a business process in which an operational team stands ready to react and engage with consumers using messages that are relevant to current events, sports, television, or sometimes even natural disasters. The benefits of being a relevant and timely brand can be powerful, but real-time marketing needs to be executed at scale and delivered in a unique but consistent way to optimize the customer experience.

From PARC's experience working with customers over the past 18 months, we realized that the following four pillars form a strong foundation for the success of real-time marketing efforts:

Speed and Agility: Fundamental to achieving real-time insights is understanding the context of a customer at the very moment of truth when engaging with that consumer. This also needs to be an automated and dynamic process.

Personalization: Every message or touch point delivered by a brand should matter, providing a unique and memorable experience for each consumer, including personalized messages for anonymous prospects and known customers.
Marketers need an approach that not only relates to current events quickly but also places emphasis on the individual consumer and coordinates a consistent experience across multiple channels.

Scalability: The real-time insights process needs to automatically sift through large volumes of high-dimensional data for every differentiated consumer. Every consumer has their own tastes, interests, needs, and behaviors. This consumer data is not frozen in time either. As customers engage with brands in different channels, provide commentary, and share and exchange information, this context should be absorbed and used to immediately enhance the customer experience at every touch point.

Cross-Channel Optimization: Contextual information should travel with consumers wherever they go, across channels, so that information is reconciled and channel conflict is avoided.

How SAP HANA and PARC HiperGraph Disrupt the Way Business Insights are Delivered to Users

SAP HANA is a fast, massively parallel, ACID-compliant database platform for both analytical and transactional data processing. Both transactions and analytics are supported within the in-memory columnar engine, and all data processing and calculations take place in memory. HANA provides business, predictive, and advanced analytic libraries (e.g., rules engine, text processing, spatial analytics) that can be called from within a rich procedural language. What is unique about HANA is that it enables customers to perform complex analytical processing directly on top of the online transaction processing (OLTP) data structures, thus eliminating redundant data storage and batch-processed reporting. With HANA Live, customers have access to a large number of non-materialized business views for real-time reporting and application development.
HANA's real-time response combined with PARC's fast HiperGraph reasoning algorithms helped us generate qualitatively superior output (i.e., clusters with higher modularity, rapid discovery of hidden patterns, and insights). The match between HiperGraph and HANA's analytics is unique in turning the speed of computation into new ways of solving problems. For example, we can simulate the spread of diseases, optimize when and where vaccinations should be done, analyze viral marketing, determine the next best action, optimize supply chains with up-to-the-second transactions, and detect fraud on real-time input data.
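Since cluster quality is described above in terms of modularity, a short sketch may help show what that score measures. It assumes "modularity" is meant in the usual Newman sense for an undirected, unweighted graph, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The function name and the toy graph are hypothetical; this is not HiperGraph's scoring code.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (node, node) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1

    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge (c, d)
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(q_two, q_one)
```

In this toy case the partition that follows the two triangles scores higher than lumping every node together, which is the sense in which a clustering with higher modularity captures more of the graph's structure.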
This powerful combination of HANA and HiperGraph delivers three building blocks that disrupt the way business insights are delivered to business users:

#1 Real-Time Data: All sources of data from the enterprise, both at rest and in motion, are in an environment where enterprise data is at users' fingertips. No heavy ETL tools are needed. Access to data is as simple as point and click, and where the data lives is not a worry. Users can simply define what data is relevant for their function, be it OLTP, online analytical processing (OLAP), or knowledge bases. HANA plus HiperGraph fuses the relevant features and builds the best models through machine learning. It can provide the knobs and controls to drive objectives from these insights and act on them.

#2 Automated Domain Specific Models: Manual generation of models doesn't scale. Domain-specific use cases and business problems can empower business analysts to rapidly explore and discover new insights and act on them. The system should automatically generate and select models, and provide knobs and controls for business analysts to apply them to years of historic data, a stream of live real-time data, or both. Data should be seamlessly and easily configured into a format that an organization's algorithms can consume, and those algorithms should execute on infrastructure in a parallel, distributed, highly scalable way.

#3 Actionable Insights: Actionable insights require knowledge of how to apply analytics against data, through application business rules, to produce a positive impact on the business. Producing insights and reports is not sufficient. Business analysts should be able to fuse insights and business rules in a way that business users can actually consume and act upon. What's even more exciting is the ability to deploy insights operationally through an application that leverages individual domain expertise and an understanding of the business logic associated with the targeted use case.
Real-world Case Study: Major Retailer Data with HiperGraph and HANA

In a recent project for a large retail customer, PARC deployed HiperGraph and ego-centric collaborative filtering (the act of making predictions for a single individual based on the behaviors of others who have at least one commonality) with HANA to help shape a new approach to real-time big data analytics. The primary focus of the project was to improve and streamline contextual recommendations. Our goal was to learn from disparate data sources, including Internet usage histories, third-party CRM data, and click patterns, to help improve product layout and recommendations and to generate a smarter user experience based on algorithmic deductive reasoning and machine learning.

The dataset represented 50 million customers, 3 million products, 371 million e-commerce transactions, and 6.5 billion clickstreams. That is a massive number of data points and relationships to model. The real challenge is delivering tools that enable users to explore and discover new insights and then translate those insights into actions.
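The definition of ego-centric collaborative filtering given above can be sketched in a few lines. The example below is an illustration under stated assumptions, not PARC's algorithm: it treats "at least one commonality" as "at least one shared purchased product", weights each neighbor by Jaccard similarity (a choice made here for simplicity), and uses hypothetical customer names and purchases.

```python
from collections import defaultdict


def recommend(purchases, customer, top_k=3):
    """Recommend products for one customer from that customer's ego network.

    purchases: dict mapping customer id -> set of purchased products.
    Only customers sharing at least one product with `customer` contribute.
    """
    own = purchases[customer]
    scores = defaultdict(float)
    for other, items in purchases.items():
        if other == customer:
            continue
        shared = own & items
        if not shared:
            continue  # no commonality: outside the ego network
        # Jaccard similarity (assumed weighting): shared items / all items of both
        similarity = len(shared) / len(own | items)
        for item in items - own:
            scores[item] += similarity
    # Highest score first; ties broken alphabetically for a stable result
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


# Hypothetical sample data, for illustration only
purchases = {
    "ana": {"tent", "lantern"},
    "ben": {"tent", "lantern", "stove"},
    "cal": {"tent", "boots"},
    "dee": {"coffee", "filters"},
}
suggestions = recommend(purchases, "ana")
print(suggestions)
```

Because only the target customer's neighbors are visited, the work per recommendation depends on the size of that ego network rather than on the full customer base, which is one reason an approach of this shape suits real-time use.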
Upon completion, brand and CRM managers were able to:

Gather data from many external sources (including news) to gain insight into their risk position
Engage customers in interactive, personalized conversations in real time
Provide a consistent, cross-channel experience, including real-time touch points like web and mobile
Understand and respond to critical moments in the customer sales cycle, in the moment
Model and adjust campaigns based on customers' real-time activities

We validated the premise that the entirety of a real-time dataset, such as that generated by the retail industry, can be processed in real time to allow for immediate insights and influence.

Next Steps and the Innovation Edge of HANA and HiperGraph

The demonstrated technology match between HANA and HiperGraph has great disruptive potential, especially in the identification of key clusters within datasets. PARC has demonstrated using HANA to hold and extract arbitrary datasets while using Goal-Directed Clustering, running in remote accelerators, to identify clusters within those datasets. This innovation is about turning the fast in-memory computations of HANA and HiperGraph into business applications that are qualitatively different from, or superior to, the current generation of applications.

SAP and PARC are now entering a new phase of our partnership in order to bring this co-innovation to market. In the coming months, we will provide more details on the technology and product roadmap. Stay tuned!

Visit or follow for updates

Coyote Hill Road
Palo Alto, California USA

Palo Alto Research Center Incorporated

PARC, a Xerox company, is in The Business of Breakthroughs. Practicing open innovation, we provide custom R&D services, technology, expertise, best practices, and IP to Fortune 500 and Global 1000 companies, startups, and government agency partners. We create new business options, accelerate time to market, augment internal capabilities, and reduce risk for our clients.
SAP Labs, Bay Area
3410 Hillview Avenue
Palo Alto, CA USA

As market leader in enterprise application software, SAP (NYSE: SAP) helps companies of all sizes and industries run better. From back office to boardroom, warehouse to storefront, desktop to mobile device, SAP empowers people and organizations to work together more efficiently and use business insight more effectively to stay ahead of the competition. SAP applications and services enable more than 251,000 customers to operate profitably, adapt continuously, and grow sustainably.