TCS Big Data Conclave




White Paper

TCS Big Data Conclave
September 13, 2012, TCS Silicon Valley Customer Collaboration Center, Santa Clara

Foreword
Dr Gautam Shroff, Vice President and Chief Scientist, TCS

Big data is here to stay and will be an increasingly significant area of competitive differentiation. However, Big Data is in the incubation phase for most organizations. At TCS Research we see two fundamental aspects to Big Data: harnessing and harvesting. Unstructured data cannot be consumed in its raw form. It is essential to convert it into consumable structured form for useful interpretation. We are working on problems in this area. Fusion of unstructured and structured information is creating the need for a new science stream, Data Science, which requires both business context and hard science. The current innovation landscape is vast, with multiple products and offerings.

TCS's Big Data Conclave convened in September 2012 at TCS Santa Clara Customer Collaboration Center to begin a deep and frank discussion about emerging trends that are largely clubbed under the Big Data label, especially from a research perspective. There were four keynotes from academic researchers, as well as five breakout sessions on specific aspects chosen through a participative pre-event process. The sessions were interactive, with all participants contributing to the articulation of problem statements around Big Data technologies, as well as to directions for investigation.

This paper collates and summarizes some of the main ideas expressed at the conclave; some views were a reinforcement of what many already believed, others were indeed thought provoking; most importantly, shared experiences provided clues to all as to possible directions to move forward in. While capturing the essence of a whole day's discussions, we have tried to keep the format as open as the breakouts; some sessions were in a Q&A format; some were free discussion format. We have interlaced these excerpts with key ideas from keynote speeches. We hope you benefit from these insights. We welcome your responses to ideas expressed.

Table of Contents
Session Introduction
Session Excerpts
Panel Discussion: Big Data Analytical Techniques v/s Traditional
Panel Discussion: Business Impact of Big Data Analytics
Attribution and Measurement, Professor Ram Akella, University of California
Panel Discussion: Next Generation Enterprise Analytics Organization
MAD Skills to Re-think Enterprise Data Management, Professor Joe Hellerstein, Computer Science, University of California-Berkeley
Panel Discussion: Big Data Technology Landscape and Segmentation
Database Trends: Survival of the Biggest, Prof Emeritus Jeff Ullman, Stanford University
Panel Discussion: Big Data and Traditional DW/BI
Big Data and Traditional Methods - Professor Jerry Friedman, Dept. of Statistics, Stanford University
Conclusion

Session Introduction

Inaugurating the session, Dr Satya Ramaswami (Global Head, Mobility Solutions and Big Data CoE, TCS) made several observations based on his team's experience at the Santa Clara Center. Some of his observations on Big Data were:

1. Big Data turns market segmentation on its head. We now have computational firepower to actually go and serve each customer on an individual basis.
2. Big Data brings the computing power necessary to do exhaustive analysis as opposed to market sampling, reducing error margins.
3. Big Data gathers sensor data from the physical world, as well as digital data from various sources. It marries the physical world with the digital world. Such close modeling maximizes efficiency.
4. Big Data does not have to be limited to only analytics. The underlying power of Big Data is about harvesting the power of distributed computing. Big Data brings in interplay between analytics, mobility, social media and cloud computing.

"Our vision is to get insights from Big Data, and deliver it on time, any time, anywhere on mobile devices," he said.

Session Excerpts

Panel Discussion: Big Data Analytical Techniques v/s Traditional
Anchor: Dr Gautam Shroff, Vice President and Chief Scientist, TCS

Q: What is the expertise we need in data science? What is it going to be used for?

Two viewpoints were expressed regarding the role a data scientist should be able to play:

Data-driven Business Strategist: Often the business value of data is not clear. Many enterprises are actually still in this situation. They are storing data and retrieving data, but formulating how to extract their value is still a challenge. So the data scientist has to be part of a strategic team that is able to articulate what one can do with data and how value can be extracted from it. The role is a mix of data science coupled with business strategy.

Insight Architect: Able to orchestrate the derivation of insight from actual as well as potential information. Note that unlike the role of the data-driven business strategist, in many cases, business problems are clearly known.
In such scenarios, the data scientist has to be a part of a team that will design and orchestrate a data strategy to solve business problems. One aspect of this role is to determine exactly how to extract value from data that one has, making the whole exercise run like a software architect, but in the data domain. Apart from this rather technical function, there also remains the often undervalued function of figuring out how to acquire additional data where required, often through creative incentives and inventive data-brokering arrangements. In this latter aspect, the data scientist is also called upon to perform more strategic elements closer to the first role mentioned above.

Q: How do we use social data or what is also called exogenous data (data that is essentially from outside the enterprise; it may be customer reaction, weather, accidents, etc.)?

Three important needs were expressed:

Speed is the key: Once you get some information or insight from external data, how quickly can you get insight from that data?

Correlation to actions: Did the decisions the company took regarding a product or service correlate with the insights obtained from text mining?

Data integration and information fusion: Data is available at different levels of granularity. Often there is not enough data at the right level of detail. The assumptions that you can make about interpolating or extrapolating or even sometimes reducing the granularity of data allow one to draw insights at different levels in spite of lack of data.

The role of collaboration in data analytics: Experience shows that people will use collaborative social platforms within the organization, but such collaboration is far more purpose-specific than on public social networks. Discussion about data, queries about specific data and informal collaborations can be very useful in the data analytics cycle. Such discussions can help improve data quality, completeness, as well as suggest better data collation and fusion strategies. Collaboration on data can add domain insights into the analytics process, marrying the people who understand the domain with the people who understand data semantics and provenance. Advanced techniques themselves can contribute towards improving collaborative data analysis: e.g., case-based reasoning, semantic search and recommender systems.

Simulation and Statistical Models

There is an interesting trend in science and engineering in general, i.e., the marriage of simulation techniques used for physical, chemical, biological or even financial modeling with statistical techniques.
The best example of this coming together is in the bio-informatics area; and other domains are following this trend. There are limits to how large complex models can be, and what range of parameters they might remain valid for. However, multiple models can be integrated by statistical techniques, and additionally improved using statistical analysis of field data in addition to validations in an experimental setting. One of our participants from the oil and gas industry commented that his organization is already moving in this direction.
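
As a rough illustration of integrating multiple models by statistical techniques, the sketch below fits least-squares weights that blend the outputs of two simulation models against field measurements. This is only one possible reading of the idea; the function names, the two-model restriction and all numbers are assumptions made for the example, not something described at the conclave.

```python
from typing import List, Sequence, Tuple


def fit_blend_weights(pred_a: Sequence[float], pred_b: Sequence[float],
                      observed: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (w_a, w_b) so that w_a*pred_a + w_b*pred_b fits observed."""
    # Entries of the 2x2 normal equations for the design matrix [pred_a, pred_b].
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, observed))
    sby = sum(b * y for b, y in zip(pred_b, observed))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("model outputs are collinear; weights are not identifiable")
    # Cramer's rule on the normal equations.
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


def blend(pred_a: Sequence[float], pred_b: Sequence[float],
          w_a: float, w_b: float) -> List[float]:
    """Combined prediction from the two models using the fitted weights."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


# Hypothetical outputs of two simulators at four operating points.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [2.0, 1.0, 4.0, 3.0]
# Synthetic "field data", constructed as 0.7*a + 0.3*b purely for this demo.
field = [0.7 * a + 0.3 * b for a, b in zip(model_a, model_b)]

w_a, w_b = fit_blend_weights(model_a, model_b, field)
combined = blend(model_a, model_b, w_a, w_b)
print(round(w_a, 3), round(w_b, 3))
```

With real field data the fit would not be exact, and the residuals would indicate the parameter ranges in which neither model remains valid.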

Panel Discussion: Business Impact of Big Data Analytics
Anchor: Kevin Mulcahy, Head - Business Analytics Solutions, Global Consulting Practice, North America, TCS

Q: How can Big Data improve top line revenue?

By closing the loop between customer feedback and business operations, Big Data technologies should mine interactions with customers to provide contextual awareness, inform operations and create a closed-loop environment, where each interaction of the customer is relevant and is able to effectively address their needs. Such information should more effectively inform operations, and likewise, the information that comes out of operations, e.g., actions taken, should help anticipate and better handle the next customer interaction. At a more technical level, what this implies is that when one is able to successfully close the loop from an organizational perspective, the opportunity to exploit interactive, semi-supervised learning techniques becomes real, just as is already the case in the web-world.

Figure 1: Potential areas where Big Data can impact business

Value Ecosystem
Applications: Deeper insight into ecosystem capabilities; insight into economic, political, and other events affecting upstream/downstream processes; knowledge capture across the ecosystem; powerful insights from third party data.
Advantages: Deeper insight into new or extended value creation opportunities across the ecosystem; expanded risk insight for risk mitigation.

Products and Services
Product and service insight from new channels; monetization of data assets and insights; Big Data as a core component of products and services; insight to tailor experiences across products; insight to enhance product/service support; financial, market, product/service usage data and analytics to manage portfolio.

Go-to-Market
Insight from multiple channels; single view of the customer; new market segment insight; segmentation at the individual level; context awareness; early Big Data adopters create competitive advantage; new levels of customer insight: acquisition, retention, service, marketing effectiveness, innovation, reputation management.

Culture and Organization
Employees transforming to new levels of employee insight; new role: the Data Scientist; shift from silos of information to pooled knowledge; value shift from being isolated guru to openness and sharing; mitigating employee concerns: everything is recorded; insight as an enabler of next generation efficiency.

Attribution and Measurement, Professor Ram Akella, University of California

Case study: An advertising network analyzed 2,885 campaigns, 1,251 products and over six months of data. No reliable cookies relating ad-impressions to user actions were available. The challenge was to correlate advertising campaigns with the impact on actual sales. The network generated 50 terabytes of data every day, which was analyzed using a 1000 machine Hadoop cluster. Just a ballpark figure: if you have 15 campaigns, it takes about 75 minutes to compute; about 5 minutes a campaign (analysis using techniques such as Gibbs sampling, Bayesian Kalman Filtering, FFBS, etc.).

With all this data, approximately 100 marketing campaigns and some smart analytics people including PhDs from Stanford, what sort of success would you expect in predicting outcomes? About 1%; i.e., out of 100 campaigns, 20 appeared to be okay, and only one had a demonstrably high lift with statistically acceptable confidence.

Learning: There were millions of exposures, but little action. It is often not the size of the data, but its sparseness, that makes analysis difficult.

Message: Organizations have both processed and unprocessed data; understanding even the processed data is tough. Without contextual understanding, if you just want analytics, you are not going to get what you need. Apart from asking the right questions, you need the right measurement method. Then you need to validate it, to be sure that what the business is doing based on your analytics is working. If you can show that you have a statistically valid process to measure the business impact of analytics insights, the chance of business buy-in, and therefore, being able to close the loop, is much higher.

Panel Discussion: Next Generation Enterprise Analytics Organization
Anchor: Ajay Bhargava, Director, Analytics & Big Data, TCS

Big Data Maturity: There are pockets of analytics happening in every organization. It starts in a distributed fashion. One line of business has a dire need, and can afford to invest in analytics. Later, these analytics transition to a centralized model to concentrate expertise. Eventually we return to a federated model where there is close interaction with multiple business units. In this connection, the participants felt that it is often a lot easier for a business person to learn technology than the other way round.

Linking Silos: There is a big need to link people in silos in each line of business, so they can work together to derive organizational business value from data.

Academia-Industry Link: There is a huge gap between supply and demand for data scientists in terms of the academia versus industry requirements.

Figure 2: Demand for Data Scientists
Demand for deep analytical talent in the United States could be 50 to 60 percent greater than its projected supply by 2018.
Supply and demand of deep analytical talent by 2018 (thousand people):
2008 employment: 150
Graduates with deep analytical talent: 180
Others (1): 30
2018 supply: 300
Talent gap: 140-190 (50-60% gap relative to 2018 supply)
2018 projected demand: 440-490
(1) Other supply drivers include attrition (-), immigration (+), and reemploying previously unemployed deep analytical talent (+).
Source: US Bureau of Labor Statistics; US Census; Dun & Bradstreet; company interviews; McKinsey Global Institute analysis

MAD Skills to Re-think Enterprise Data Management, Professor Joe Hellerstein, Computer Science, University of California-Berkeley

Just as we talk about technologies, attitudes around data management need to change too. Data within enterprises is protected and guarded by database architects, the high priests of data. I agree with Bill Inmon who says, "There is no point in bringing data into the data warehouse environment without integrating it." Magnetic, Agile, Deep (MAD) skills are valuable to innovate and reap good ROI from data:

Magnetic: Make data warehouses magnetic; attract users and data. Share data and insights.

Agile: Get new data for analysis >> run analytics for insights >> change practices; close the loop and keep this agile.

Deep: Look deep into the data with a statistical approach. Focus on densities, not individual items; run massively parallel processes on Big Data.
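
To make the "Deep" point concrete, here is a minimal sketch of estimating a density (a histogram) rather than inspecting individual items. Each data partition is summarized independently and the summaries are then merged, which is the property that lets such a computation run in parallel. The function names and sample values are illustrative assumptions, and the partitions are processed sequentially here for simplicity.

```python
from collections import Counter
from typing import Dict, Iterable


def partial_histogram(values: Iterable[float], bin_width: float) -> Counter:
    """Per-partition step: count how many values fall into each bin index."""
    return Counter(int(v // bin_width) for v in values)


def merge_histograms(parts: Iterable[Counter]) -> Counter:
    """Merge step: bin counts are additive, so partial results simply sum."""
    total: Counter = Counter()
    for part in parts:
        total.update(part)
    return total


def to_density(hist: Counter, bin_width: float) -> Dict[float, float]:
    """Normalize counts so that the density integrates to 1 over the bins."""
    n = sum(hist.values())
    return {b * bin_width: c / (n * bin_width) for b, c in sorted(hist.items())}


# Two hypothetical partitions; in a cluster each would live on a different node.
partitions = [[1.0, 2.5, 3.0], [2.0, 2.9, 7.5]]
bin_width = 2.0

merged = merge_histograms(partial_histogram(p, bin_width) for p in partitions)
density = to_density(merged, bin_width)
print(density)
```

Because only the small per-bin counts travel between workers, the cost of the merge does not grow with the number of raw records.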

Panel Discussion: Big Data Technology Landscape and Segmentation
Anchor: Pratik Dhebri, Strategy & Business Development Lead, NextGen Solutions, TCS

The current Hadoop ecosystem

Price performance concerns: There was a feeling that transparency was lacking in the pricing of certain distributions since the price included both consulting dollars and software dollars, even though Hadoop is considered to be open source. Further, comparing product price alone with traditional BI platforms is not necessarily valid since, at least today, Big Data technology requires greater programming effort and, therefore, cost.

Features and security: The Hadoop ecosystem today lags in meta-data services and security. Some small players perform vendor-independent installations and offer security configuration as a service. But major distributions do not provide adequate security features out of the box.

Hardware upgrade challenges: Today if you deploy new hardware as a cluster, and next year Intel or AMD releases new chips, you would need to upgrade your hardware; in the case of a cluster it is tempting to add new nodes and live with a mix of configurations. How can this be done efficiently, while also ensuring that the most critical applications get the latest and best hardware resources? Also over time, using clusters can result in increased I/O seek times because disks are getting older. Thus there are challenges in managing the evolution of a cluster as compared to replacement of a high-end integrated system: without proper cluster management to handle evolution, performance will degrade.

Multi-tenancy: Should IT maintain multiple clusters or can Hadoop support multi-tenancy? It does look as if Hadoop is moving towards multi-tenancy in coming distributions. However, this is an evolving space and an adequate solution is yet to be in place.

Database technologies

We are currently witnessing a move towards NoSQL. There are traditional RDBMSs; and in a short time we have seen new systems like Cassandra, HBase, MongoDB, CouchDB, etc. becoming popular. At the same time, the performance that one can get from appliances such as Netezza remains attractive, especially if one's analytical users are sophisticated enough to use SQL rather than having to rely on DBAs. In such situations moving to a non-SQL system will most certainly require programmers to get involved, which can increase costs as well as degrade analysis cycle times. However, such situations are exceptional, e.g., in high-end financial services; most organizations see the benefit in moving to cheaper non-SQL environments, especially for deep analytical tasks that involve all data echoing. Data marts where the output of deep analytics is consumed should probably continue to be implemented on traditional relational databases, but the normal ones such as Oracle or MySQL would suffice for this end of the cycle. In short, the heavy-duty data warehouses such as Teradata are facing a challenge, be it from appliances or Big Data stacks.

Similarly, when it comes to ETL and data integration technologies, Map-Reduce-based implementations are able to bring price-performance advantages as compared to traditional ETL tools such as Informatica or IBM. Further, the ability to perform deep analytics on data in-flight during integration needs to be increasingly exploited.

Time series

What is the best way to analyze time series? There are a few open source platforms for time-series, but this largely remains an area that requires more innovative and complete solutions.

In-memory databases

In-memory databases such as SAP HANA, or H-Store in the open source world, are evolving to become attractive platforms for fast in-memory data visualization and Online Analytical Processing (OLAP) queries. Such queries are required at the datamart or output end of a deep analytics cycle, or for in-flight heavy analytics needed in some specialized high-throughput environments.

Cloud

Data centers need to get better integrated with cloud platforms, and analytics is possibly an example of where this might be appropriate. The analysis of tweets and social feeds is best done in the cloud, while internal data remains in a private data center or cloud. The advantage of such an approach is the elasticity available in the cloud, allowing one to process large volumes, after which the results of analyses can be pushed to the data center and fused with master/transactional data extracted from operational systems and/or a data warehouse.

Database Trends: Survival of the Biggest, Prof Emeritus Jeff Ullman, Stanford University

A DBMS from your favorite vendor is still the way to do large-scale transaction processing and simple queries. Vector supercomputers can beat map-reduce (MR) implementations on commodity clusters. But how do we overcome processing glitches due to hardware failure? How do we handle recursion and iteration? Resilience to hardware failures, which MR provides, is vital. Map-Reduce is also well designed for the most powerful SQL operations: join and group/aggregate.
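
The fit between Map-Reduce and the join and group/aggregate operations can be sketched with a toy, single-process version of the programming model. This is not Hadoop code: the `map_reduce` helper, the record layouts and the sample data are assumptions chosen for illustration, and a real framework would distribute the map, shuffle and reduce phases across machines and restart failed tasks.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_reduce(records: Iterable[Any],
               mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
               reducer: Callable[[Any, List[Any]], Any]) -> Dict[Any, Any]:
    """Toy map-reduce: map each record, group values by key (shuffle), then reduce."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Illustrative data: (order_id, customer_id, amount) and (customer_id, name).
orders = [("o1", "c1", 30.0), ("o2", "c2", 12.5), ("o3", "c1", 7.5)]
customers = [("c1", "Asha"), ("c2", "Ben")]

# Group/aggregate, as in:
#   SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
totals = map_reduce(orders,
                    mapper=lambda o: [(o[1], o[2])],
                    reducer=lambda _key, amounts: sum(amounts))


# Reduce-side join: tag every row with its source table, key it by customer_id.
def join_mapper(tagged):
    table, row = tagged
    key = row[1] if table == "orders" else row[0]
    yield key, (table, row)


def join_reducer(_key, tagged_rows):
    names = [row[1] for table, row in tagged_rows if table == "customers"]
    placed = [row for table, row in tagged_rows if table == "orders"]
    return [(name, order_id, amount)
            for name in names
            for order_id, _customer_id, amount in placed]


tagged_input = [("orders", o) for o in orders] + [("customers", c) for c in customers]
joined = map_reduce(tagged_input, join_mapper, join_reducer)
print(totals)
print(joined)
```

In both cases the grouping key does the work: the shuffle brings together everything that shares a key, which is exactly what GROUP BY and an equi-join need.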
So I would say, when you don't need transactions and you are processing large volumes of data, you should try Map-Reduce. Example: deep analytics where all the data needs to be touched, such as to learn a model. In fact an RDBMS can be a poor choice in such cases. Iterative map-reduce remains in the research domain though. When your problem requires a long sequence of MR jobs, the following research projects are worth exploring:

Iterated MR (HaLoop, Twister).
A dataflow system (Hyracks, Dryad, etc.).
A recursive system (Pregel, Giraph, S4).

And when you have simple processing, but way too much data, try:

Sawzall - a procedural domain-specific programming language that processes large numbers of individual log records.
Dremel (not yet open-source) is probably the next evolution in the big-data ecosystem, i.e. a scalable, interactive ad-hoc query system for analysis of read-only nested data.

Panel Discussion: Big Data and Traditional DW/BI
Anchor: Dr Santosh Mohanty, VP, Head of Components Engineering Group, TCS

Q: What is the role of traditional ETL technologies going forward: when should they be replaced, if at all; does the new stack offer a viable replacement?

Big Data ETL technology will eventually take over the largest share of ETL technology. However:
Big Data ETL will take another 2-3 years to gain product level maturity from an enterprise perspective (even though these technologies are in production in the web-world).
Service providers need to develop/mature process/methodology to adopt Big Data ETL.
People need to be skilled/re-skilled to tap the market opportunity.

Q: What is the role of traditional data warehousing technologies: should they be replaced, if at all; does the new stack offer a viable replacement and if so what is the price-performance tradeoff, if any?

Big Data technology will complement traditional data warehousing technology. However:
Trickle feed / real-time data input is yet to be proven on the Hadoop-related Big Data stack and hence, mixed workloads may create performance issues. Streaming databases remain the way to go for such situations; here Twitter Storm is an option to track.
For complex, but normalized, data structure and data size in order of terabytes, Big Data technology may not be the preferred option at present. However, where un-normalized, raw data needs to be processed or even fused with normalized data, the Big Data platforms are an option.

Q: What is the role of traditional queuing systems and emerging complex-event technologies: do the emerging Big Data tools, such as Twitter's Storm, provide complementing or replacement capabilities? If yes, for what purposes?

Both traditional and emerging Complex Event Processing (CEP) will co-exist and require further maturity/scalability. However, a CEP query does not respond in a millisecond, be it relational DW, Hadoop or Twitter Storm. To speed up, the database can be put into memory, but memory is not as large and cost-effective as HDD storage.

Q: What is the role of emerging in-memory databases: how do these complement or compete with traditional and other Big Data technologies, especially for analytics applications?

Big Data technology will complement in-memory based computing, such as SAP HANA, providing:
Easy to write and scale complex, real-time computations on a cluster of computers for real-time processing (e.g., Hadoop batch processing)
Horizontal scalability through multiple threads, processes and servers
In-memory databases to speed up database operations
Features such as parallel processing, fault tolerance and guaranteed delivery of messages, which give Twitter Storm an edge.
However, an in-memory database is not cost effective for very large, highly structured data when compared to a traditional relational database.

Q: How mature are emerging Big Data-based analytics libraries and tools versus those from traditional analytics vendors that connect with Big Data technologies?

Big Data-based analytical libraries are still maturing and are two to three years away from competing with traditional analytics vendors.
R and Mahout have been picked up by the industry for analysis as they provide rich processing functionality approaching that of commercial tools such as SAS or SPSS. At the same time, visualization remains a gap in these open source platforms.
SAS, SPSS and Pentaho are already building capability to support Big Data technologies.
New modeling techniques need to be built and tested in large data sets even as analytical libraries with strong statistical/mathematical models are emerging in certain domains, such as finance or marketing.
Certain emerging scenarios, such as the ability to mine large-scale time-series, remain unaddressed from a solution perspective.

Figure 3: Possible evolution of Big Data in the enterprise

Big Data technology applications: Low-cost, high volume and performance data warehouse which can easily scale out (cost savings > $100k per TB as compared to EDW appliances); massive parallel computing tool to divide and conquer workloads (significantly reduced computation time for complex analytics, from days to a few hours).
Industry agnostic horizontal use cases: CRM and customer loyalty; fraud detection; log analysis; text and media mining; applications like profile search; predictive analytics on machine data.
Industry specific use cases: Banking: risk monitoring and trade surveillance, compliance and regulatory reporting. Insurance: claims analysis, policy pricing, risk analysis. Telecom: network performance and optimization, Call Detail Record (CDR) analysis.

Big Data and Traditional Methods - Professor Jerry Friedman, Dept. of Statistics, Stanford University

In many cases Big Data, i.e., lots of data, does not necessarily improve the correctness of traditional methods, because you can hit model saturation. We need richer and more adaptive algorithms, many of which are currently still in the research domain. However, there are instances where having lots of data can help a lot:

Rare events such as frauds, where statistical accuracy matters, not the data size.
Simple counting methods on large data sets (these can be competitive with sophisticated methods on small data sets).
Real time analysis: Relationships change rapidly with time, and there is a need for enough recent data to capture model changes.
Assessing accuracy: Extra data can help validate the reliability of results.

Lastly, when one really has lots of data, it becomes feasible to just query the data for the required estimates rather than having to use many assumptions and statistical / machine-learning models. It is important to realize the relationship between querying data, which is essentially what OLAPs/ROLAPs do, and the statistical model building approach; Big Data provides a context to understand this relationship better.
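
The idea of querying the data directly for an estimate, and of judging its statistical accuracy, can be sketched as a plain count per group with a binomial standard error. This is a simplified illustration rather than a method presented in the keynote; the function name, the group labels and the counts are made-up assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def rate_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[float, float, int]]:
    """For each group return (observed rate, binomial standard error, sample size)."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for group, flagged in records:
        totals[group] += 1
        hits[group] += int(flagged)
    result = {}
    for group, n in totals.items():
        p = hits[group] / n
        se = math.sqrt(p * (1.0 - p) / n)  # shrinks like 1/sqrt(n)
        result[group] = (p, se, n)
    return result


# Made-up transactions: (channel, is_fraud). Both channels have a 1% fraud rate,
# but the "online" channel has twice as many records.
records = ([("online", True)] * 2 + [("online", False)] * 198
           + [("store", True)] * 1 + [("store", False)] * 99)

estimates = rate_by_group(records)
for channel, (p, se, n) in sorted(estimates.items()):
    print(channel, n, round(p, 4), round(se, 4))
```

For a rare event the standard error stays comparable to the rate itself until the count of observed events is large, which is one way to read the remark that statistical accuracy, not raw data size, is what matters.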

Conclusion

The TCS Big Data Conclave in September 2012 concretized several key ideas and issues around this emerging technology. The technology is just maturing. Most organizations are incubating Big Data in small measures. Research in linking internal and external data seamlessly has some way to go. Importantly, business has to take several key decisions regarding Big Data such as how much data to store and for how long. If insights are available, can the business be made agile enough to act on these? While questions remain, several clues to optimizing the technology have now emerged. The role of the Data Scientist was demystified to an extent. Many use cases of Big Data emerged from the discussions with the experts. Though the conclave ended after the fifth session, we would like to keep the conversation going. Please feel free to comment on the views discussed here on Big Data.

List of Abbreviations
BI: Business Intelligence
CEP: Complex Event Processing
CoE: Center of Excellence
DB: Database
DBA: Database Analysts
DBMS: Database Management System
DW/BI: Data Warehouse / Business Intelligence
ETL: Extract, transform, load
FB: Facebook
FFBS: Forward Filtering Backward Sampling
HDD: Hard Disk Drive
IBM: International Business Machines
I/O: Input/Output
MAD: Magnetic, Agile, Deep
MR: Map-Reduce
NextGen: Next Generation
OLAP: Online Analytical Processing
PhD: Doctor of Philosophy
RDBMS: Relational Data Base Management System
ROI: Return On Investment
ROLAP: Relational Online Analytical Processing
SAS: Statistical Analysis System
SPSS: Statistical Product and Service Solutions

Contact session anchors: Those who would like to know more about our Big Data Research or would like to join discussions like these, please mail us at innovation.info@tcs.com. We will keep you updated on Big Data-related events.

Subscribe to TCS White Papers
TCS.com RSS: http://www.tcs.com/rss_feeds/pages/feed.aspx?f=w
Feedburner: http://feeds2.feedburner.com/tcswhitepapers

About Tata Consultancy Services (TCS)
Tata Consultancy Services is an IT services, consulting and business solutions organization that delivers real results to global business, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and assurance services. This is delivered through its unique Global Network Delivery Model(TM), recognized as the benchmark of excellence in software development. A part of the Tata Group, India's largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock Exchange and Bombay Stock Exchange in India. For more information, visit us at www.tcs.com

All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

Copyright 2013 Tata Consultancy Services Limited