Towards the Next Generation Intelligent BPM In the Era of Big Data. Xiang Gao, China Mobile Communications Corporation

2 Outline: Background; From BPM to iBPM; A Big Data Perspective on BPM; Embrace the Idea of Big Data; Conclusion and Future Work

3 Brief Introduction of CMCC
As the leading Chinese telecommunications company, China Mobile Communications Corporation (CMCC) is also recognized as the world's largest mobile phone operator by subscribers, with about 740 million. In 2012, the company was again selected as one of the "FT Global 500" by the Financial Times and one of "The World's 2,000 Biggest Public Companies" by Forbes magazine.
(Figure: base stations, in thousands, and subscribers, in millions.)
Over 1 million base stations cover 99% of the national population, with roaming service to 237 countries and regions. Over 700 million subscribers, including over 100 million 3G subscribers. The world's largest telecom BASS data warehouse, over 1200 TB. The world's largest daily signaling/billing data, over 100/10 TB. The world's largest telecom business process repository, over ... processes.

4 State of the Art of BPM
CMCC has always recognized BPM as a holistic management approach. Borrowing the basic idea of Gartner's hype cycle, we give a qualitative graphic representation of the maturity, adoption, and social application of specific BPM technologies from CMCC's point of view.
(Figure: hype cycle plotting expectations over time through the phases Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, and Plateau of Productivity. Technologies placed on the curve: process fragmentization, cloud-enabled BPMS platform, process repository, business artifacts, BPM suite, fragment merging, process mining, ontology-based business behavior/rules modeling, distributed engine, BPMS evaluation, process model clustering, workflow patterns and applications, similarity analysis, modeling languages and mutual conversion, SOA, business process analysis.)
More attention has recently been focused on advanced analytical technologies, especially on interdisciplinary collaboration in both research and application.

5 Great Challenges Arise from IT Consolidation
In the context of IT systems consolidation, such a large number of business processes makes maintenance and evolution extremely arduous. Business process consolidation must therefore be carefully considered first.
Business understanding & raw process reconstruction: during more than 10 years of IT construction, a large number of business processes have been modified and updated many times, and they have thus deviated from the original design. To achieve consolidation, great attention should be paid to a deep understanding of business behavior and to efficient process reconstruction.
Complex business logic & recessive rules: the current processes contain much special and complex business logic, which makes the design and execution of processes extremely complicated. The formal design of this business logic deserves careful consideration.
Flexible modeling based on business semantics: business analysts understand the business deeply but cannot design process models independently without the support of IT staff. It is important to provide an efficient approach that helps business analysts design process models flexibly and agilely with the least IT effort.
Redundancy removal & process repository: different subsidiary organizations follow unified business specifications but design their own processes. Because of individual management requirements, their processes are usually not exactly the same, even when they express the same business behavior, although they are highly similar. Technology that reduces duplication and makes the differences between process models explicit is therefore essential.

6 Example: OA Systems Consolidation
Office Automation (OA) is one of the most important management information systems for governments and enterprises in China. It processes and communicates information for the daily work of all users. OA fundamentally supports document flow, approval, transfer, archiving, and other general enterprise management business processes. Each subsidiary organization has built CMCC's OA systems independently over more than 10 years, resulting in:
more than 3 kinds of OS; more than 4 kinds of RAID; more than 5 kinds of middleware; more than 49 system integrators; more than 240 external interfaces; more than 1000 application modules; more than 8190 business processes.
This has caused increasing architectural heterogeneity, high integration complexity, and especially high construction and maintenance costs. The processes are described differently, and each subsidiary may have its own descriptions, mainly based on informal user definitions or natural-language text documents.

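7 Challenge I: Diversity of OA
Different subsidiary organizations follow unified business specifications and design their own processes. Because of individual management requirements, their processes are usually not exactly the same, even when they express the same business behavior, although they are highly similar.
Statistics of OA samples (source: Jan Mendling. Metrics for Process Models: Empirical Foundations of Verification, Error Prediction, and Guidelines for Correctness, LNBIP 6, Springer, 2008). Let u_1, u_2, u_3 denote the normalized mean vectors of three different subsidiaries. The autocorrelation matrix R then has the form
$$R = \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix} = \begin{pmatrix} \langle u_1,u_1\rangle & \langle u_1,u_2\rangle & \langle u_1,u_3\rangle \\ \langle u_2,u_1\rangle & \langle u_2,u_2\rangle & \langle u_2,u_3\rangle \\ \langle u_3,u_1\rangle & \langle u_3,u_2\rangle & \langle u_3,u_3\rangle \end{pmatrix} = R^{T},$$
where
$$R_{ij} = \langle u_i, u_j\rangle = \frac{u_i^{T} u_j}{\lVert u_i\rVert\,\lVert u_j\rVert}.$$
The large off-diagonal entries show that processes belonging to different subsidiaries have high similarity.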
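The autocorrelation matrix above takes only a few lines to reproduce. This is a minimal sketch. It assumes each subsidiary is summarized by a mean vector of process features, and the vectors `u1`, `u2`, `u3` are made-up numbers, not CMCC data. The code divides by the norms, so it also works when the vectors are not pre-normalized.

```python
import math

def cosine(u, v):
    """<u, v> = u^T v / (||u|| ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def autocorrelation(vectors):
    """R[i][j] = <u_i, u_j> for every pair of mean vectors; R is symmetric."""
    return [[cosine(ui, uj) for uj in vectors] for ui in vectors]

# Hypothetical per-subsidiary mean feature vectors (e.g., averaged process metrics).
u1 = [4.0, 2.0, 1.0]
u2 = [4.2, 1.8, 1.1]
u3 = [3.9, 2.1, 0.9]

R = autocorrelation([u1, u2, u3])
for row in R:
    print(" ".join(f"{x:.3f}" for x in row))
```

With these inputs, every off-diagonal entry is above 0.99. That is the pattern the slide reads as "high similarity across subsidiaries".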

8 Challenge I: Diversity of OA
Statistics of OA samples: high graph sparsity (edge density = 0.07); high control-flow complexity (control-flow complexity = 16.41, high relative to the number of tasks); weakly structured (structuredness = 0.61, including 5 arbitrary cycles). Source: Jan Mendling. Metrics for Process Models: Empirical Foundations of Verification, Error Prediction, and Guidelines for Correctness, LNBIP 6, Springer, 2008.
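Two of these metrics are easy to compute. The first is density, the number of arcs divided by the maximum possible number of arcs. The second is Cardoso's control-flow complexity (CFC), which sums a contribution for each split connector. The sketch below applies both to a small, made-up approval graph. The graph encoding (`ProcessGraph`, node kinds as strings) is an assumption for illustration. Structuredness is not computed here.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessGraph:
    nodes: dict                               # node id -> "task", "xor", "or" or "and"
    arcs: list = field(default_factory=list)  # directed (source, target) pairs

    def fan_out(self, node):
        return sum(1 for src, _ in self.arcs if src == node)

def density(g):
    """Number of arcs divided by the maximum possible number n * (n - 1)."""
    n = len(g.nodes)
    return len(g.arcs) / (n * (n - 1))

def control_flow_complexity(g):
    """Cardoso's CFC over split connectors (fan-out >= 2):
    XOR-split adds its fan-out, OR-split adds 2**fan-out - 1, AND-split adds 1."""
    cfc = 0
    for node, kind in g.nodes.items():
        out = g.fan_out(node)
        if out < 2:
            continue  # not a split: nothing to add
        if kind == "xor":
            cfc += out
        elif kind == "or":
            cfc += 2 ** out - 1
        elif kind == "and":
            cfc += 1
    return cfc

# Hypothetical approval fragment: a decision, a revision loop, a parallel countersign.
g = ProcessGraph(
    nodes={"draft": "task", "x1": "xor", "approve": "task", "revise": "task",
           "a1": "and", "sign1": "task", "sign2": "task", "a2": "and", "archive": "task"},
    arcs=[("draft", "x1"), ("x1", "approve"), ("x1", "revise"), ("revise", "draft"),
          ("approve", "a1"), ("a1", "sign1"), ("a1", "sign2"),
          ("sign1", "a2"), ("sign2", "a2"), ("a2", "archive")],
)
print(round(density(g), 3), control_flow_complexity(g))  # 0.139 3
```

Low density combined with a CFC that is high relative to the task count is the combination the slide reports for the OA samples.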

9 Challenge I: Diversity of OA
Statistics of OA samples: within one subsidiary organization, different processes differ considerably from one another. Source: Jan Mendling. Metrics for Process Models: Empirical Foundations of Verification, Error Prediction, and Guidelines for Correctness, LNBIP 6, Springer, 2008.

10 Challenge II: Complex Business Logic of OA
OA processes usually contain a great deal of complex business logic, and both the theoretical and the practical issues of its analysis and realization need special attention. Current BPM technologies have several difficulties in handling such complex business logic.
(Figure: countersign fragment with the activities Registration, Approval by Dept. Manager, Department, and Carry Out (by) Department.)
Take the department COUNTERSIGN fragment as an example. It is a common part of most document approval processes and can be modeled as a parallel multi-instance plus recursive sub-process structure. This raises several questions:
1. How can the similarity between two such models be measured?
2. How can such models be mined from the corresponding execution logs?
3. How can the soundness of such models be analyzed?
4. How can such models be indexed and searched?
5. How can execution, monitoring, and conformance checking be implemented?
6. ...
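To make the "parallel multi-instance plus recursive sub-process" structure concrete, here is a hypothetical expansion of a countersign fragment over an invented department tree. The sketch assumes that each department countersigns and that the fragment is then instantiated once per sub-department. The sibling instances form the parallel block. The sketch shows why the fragment is hard to analyze statically: the number of task instances is only known once the department tree is known.

```python
from dataclasses import dataclass, field

@dataclass
class Department:
    name: str
    subdepartments: list = field(default_factory=list)

def countersign(dept, depth=0):
    """Expand a (hypothetical) countersign fragment into its task instances.

    Each department countersigns, then the same fragment is instantiated again
    for every sub-department (recursive sub-process). Sibling instances form a
    parallel multi-instance block, so their listed order is arbitrary."""
    tasks = ["  " * depth + f"countersign by {dept.name}"]
    for sub in dept.subdepartments:
        tasks.extend(countersign(sub, depth + 1))
    return tasks

org = Department("Head Office", [
    Department("Finance", [Department("Audit")]),
    Department("Legal"),
])
for task in countersign(org):
    print(task)
```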

11 Challenge III: Big Gap between Business & IT Realization
There is always a big gap: business analysts understand the business deeply but cannot design process models independently without the support of IT staff, even when a notation-based modeling language is used. An intelligent configuration tool for process model design is needed to help business analysts design process models easily and flexibly. Such a tool should:
provide business analysts with numerous process templates and fragment templates with specific business semantics, extracted automatically from existing process models;
provide intelligent analysis of process models and logs, e.g., similarity, search, merging, mining, and so forth;
provide statistics for all kinds of process data.
(Figure: scenario of modeling by business analysts, combining process templates with organization and role templates, e-form templates, rules, and external service models.)

12 Outline: Background; From BPM to iBPM; A Big Data Perspective on iBPM; Embrace the Idea of Big Data; Conclusion and Future Work

13 On the Way from BPM to Intelligent BPM
IT systems consolidation depends heavily on first consolidating the business processes. Intelligent BPM (iBPM) has been given new impetus by integrating analytical technologies into orchestrated processes.
(Figure: evolution from Workflow (1990s) to BPM (2000), BPMS (mid-2000s), and iBPM (2012~), where iBPMS = BPMS + cloud platform + advanced analytics + mobile, social, and external data.)
iBPM aims to meet the ongoing need for process agility, especially for regulatory changes and more dynamic exception handling. It aims to use the greater availability of data from inside and outside the enterprise as input to decision making. It also aims to facilitate interaction and collaboration in cross-boundary processes.

14 Definition of iBPM from an Industrial Point of View
"There are a thousand Hamlets in a thousand people's eyes." (popularly attributed to William Shakespeare)
Big data is famously defined by the three Vs. In the same way, from CMCC's perspective, four As (analytical, automatic, agile, and adaptive) may constitute a comprehensive definition of iBPM, and they bust the myth that iBPM is only about analytics. In addition, each of the four As has its own ramifications for analytics.
Analytical: the most prominent feature of iBPM is its capability for advanced analytics. It integrates state-of-the-art analytic technologies, including both pre-analytics and post-analytics. One kind is process-model-based analysis, such as model decomposition, clone detection, and similarity search. The other is analysis based on historical logs and other information, such as automatic business process discovery (i.e., process mining), social analysis, intelligent recommendation, and prediction.
Automatic: the enormous volumes of data require automated or semi-automated analysis techniques to detect patterns, identify anomalies, and extract knowledge. Take process consolidation as an example: iBPM should automate the procedure that reduces duplication and makes the differences between process models explicit, instead of relying on manual operation.

15 Definition of iBPM from an Industrial Point of View
(Figure: data sources such as blogs, social media, and smart meters.)
Agile: iBPM is expected to simplify procedures. For example, incorporating process fragments with business semantics into the design tool can significantly improve modeling efficiency, so that business analysts can carry out most procedures with the least IT effort.
Adaptive: iBPM should capture and respond flexibly to dynamic changes in business processes and in data inside and outside the enterprise. It can do so by adaptively adjusting the parameters of analysis algorithms and also by selecting appropriate algorithms on demand through configuration.
It is worth noting that the era of big data brings new opportunities for achieving the four A features.

16 Outline: Background; From BPM to iBPM; A Big Data Perspective on iBPM; Embrace the Idea of Big Data; Conclusion and Future Work

17 Big Data: Becoming Big Business
The birth and growth of big data was the defining characteristic of the 2000s. As obvious and ordinary as this might sound to us today, we are still unraveling the practical and inspirational potential of this new era.
(Figures: growth of global data, source: Oracle, 2012; trends of global data, sources: Cisco, 2011; Gartner, 2009 & 2011; IDC, 2012; global installed computation to handle information and global installed, optimally compressed storage, source: Hilbert and López, The World's Technological Capacity to Store, Communicate, and Compute Information, Science, 2011.)

18 Big Data: Advanced Analytical Techniques
Individuals and organizations need ongoing innovation in techniques that help them analyze the growing torrent of big data. A wide variety of technologies has been developed and adapted to aggregate, manipulate, and analyze big data: association rule learning, classification, clustering, crowdsourcing, optimization, data fusion, machine learning, ensemble learning, genetic algorithms, neural networks, pattern recognition, spatial analysis, natural language processing, predictive modeling, and network analysis. Source: Big Data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey Global Institute, June 2011.

19 Big Data: Advanced Analytical Techniques (Deep Insight)
The analysis of how applicable various algorithms and models are to specific large-scale data set scenarios has been given new impetus in practice. It rests mainly on evaluating time complexity and space complexity. Algorithms that are both data intensive and CPU intensive make up the primary computational mode for complicated big data analysis tasks.
(Figure: algorithms positioned by data intensity versus CPU intensity and grouped into classification, clustering, association, and recommendation: FP-growth, LVM, leader, K-means, CF, hierarchical model, Naïve Bayes, SVM, decision tree, Apriori, GA.)
For ultra-large-scale data sets and non-real-time batch processing, parallel computing is highly recommended once the original algorithm has been restructured in MapReduce style. For real-time tasks, in-memory computing, stream computing, or incremental computing is recommended to increase processing efficiency. Probabilistic methods such as Bloom filters and locality-sensitive hashing (LSH) can reduce time and space complexity.
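As an illustration of the probabilistic methods mentioned above, here is a minimal Bloom filter sketch. It could, for example, test whether a trace variant has already been seen in a huge event log without storing every variant. The bit-array size and the number of hash functions are arbitrary assumptions. Deriving the bit positions by salting SHA-256 with an index is one simple choice among many.

```python
import hashlib

class BloomFilter:
    """Set-membership sketch: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k bit positions by salting one cryptographic hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))

# Count distinct trace variants in a (hypothetical) log without storing them all.
seen = BloomFilter()
log = [["draft", "approve", "archive"], ["draft", "approve", "archive"], ["draft", "reject"]]
new_variants = 0
for trace in log:
    key = ",".join(trace)
    if not seen.might_contain(key):  # a false positive could hide a rare new variant
        seen.add(key)
        new_variants += 1
print(new_variants)
```

Memory stays fixed at `num_bits` regardless of log size. The price is a small false-positive rate, which grows as more items are added.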

20 Big Data: Catalyst for BPM Evolution
What does big data really mean for the evolution of BPM? A founding father and pioneer stated it elegantly long before the big data concept was introduced: "In God we trust; all others must bring data." (W. Edwards Deming)
Driven by process data and other related data (mobile, social, external), BPM can become a new platform for R&D of big-data-based intelligence, making Deming's maxim a reality in the operation of future iBPM systems. With advanced analytics, new methods and tools that embed information into business processes are making insights more understandable and actionable. Source: Big Data, Analytics and the Path From Insights to Value, MIT Sloan Management Review, 2011.

21 On the Path from Insight to Action (Data)
Finding the Needle in the Big Data BPM Haystack
The biggest misnomer comes from the name itself, that is, the idea that big data is simply about big data. When we talk about big data, we must put its size in relation to the available resources, the question asked, and the kind of data.
Where is business process data? From a specific point of view, the complete event log data, the process models in a centralized repository, and the process case data of a large corporation (e.g., CMCC) can all be treated and analyzed as big data in the BPM field. From a generalized point of view, any data describing a set of behaviors or tasks in a specific order can be treated as a process, such as a user clicking or searching on the web. Accordingly, large volumes of such data are also big data in the BPM field.
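The generalized view can be illustrated by turning raw click events into process traces. Events are grouped by a case identifier and ordered by time. In this sketch the events, the use of a session ID as the case ID, and the integer timestamps are all hypothetical.

```python
from collections import defaultdict

def to_traces(events):
    """Group (case_id, timestamp, activity) events by case and order each case by time."""
    by_case = defaultdict(list)
    for case_id, timestamp, activity in events:
        by_case[case_id].append((timestamp, activity))
    return {case: [act for _, act in sorted(evts)] for case, evts in by_case.items()}

# Hypothetical web clicks; the session ID plays the role of the process case ID.
clicks = [
    ("s1", 3, "checkout"), ("s2", 1, "home"), ("s1", 1, "home"),
    ("s1", 2, "search"), ("s2", 2, "search"), ("s2", 3, "search"),
]
print(to_traces(clicks))
# {'s1': ['home', 'search', 'checkout'], 's2': ['home', 'search', 'search']}
```

Once data is in this shape, the same mining and similarity tools used for OA event logs apply to it.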

22 On the Path from Insight to Action (Analysis)
Sparsity vs. Redundancy
The widespread use of traditional data mining and artificial intelligence algorithms has exposed their limitations under data sparsity in large-scale data sets and in problems with high dimensionality. For example, user-based collaborative filtering systems were very successful in the past, but their weakness has been revealed for large, sparse databases (source: Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, Analysis of Recommendation Algorithms for E-Commerce). Large amounts of process data, however, usually exhibit redundancy rather than sparsity.
(Figure: identification of highly reusable fragments, comparing the approval process of province A with that of province B; the reusable fragments are Draft & Approval.)
To deal with more than 8000 OA processes, technology that reduces duplication and makes the differences between process models explicit is essential.
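A bare-bones user-based collaborative filtering predictor shows where sparsity bites. When two users share no rated items, their similarity is undefined, and it is treated as 0 here. A user who overlaps with nobody therefore gets no prediction at all. The ratings are invented. Cosine similarity over co-rated items is one common choice, not necessarily the variant analyzed by Sarwar et al.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity over co-rated items; 0.0 when the users share none."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item (None if no evidence)."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine_sim(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical app ratings; "dave" shares no rated app with anyone (sparsity).
ratings = {
    "alice": {"a1": 5, "a2": 3},
    "bob":   {"a1": 4, "a2": 3, "a3": 4},
    "carol": {"a3": 1, "a4": 2},
    "dave":  {"a9": 5},
}
print(predict(ratings, "alice", "a3"))  # 4.0: only bob overlaps with alice
print(predict(ratings, "dave", "a1"))   # None: no overlapping users
```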

23 On the Path from Insight to Action (Analysis)
Sample vs. Population
Sample-based analysis is usually conducted to infer the behavior of the whole population. In the age of big data, however, the emphasis shifts from the sample to the population, since collecting and processing large amounts of data is now feasible. Process mining is an example, because there the completeness of the event log plays an extremely important role. For a limited event log (regarded as a sample), global completeness has to be evaluated through distribution fitting or at least bound estimation. For a complete event log (regarded as the population), global completeness seems to be guaranteed. Data quality still affects the effectiveness of mining algorithms, however, because population data also suffer from missing data and noise (figure label: Arbitrary Jump). Population-based analysis therefore brings unprecedented challenges that need more attention, such as noise cancellation, redundancy removal, and data quality improvement.
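The slide does not say which completeness estimator is used. One simple option, swapped in here for illustration, is the Good-Turing estimate. It gives the probability that the next trace is a variant not yet seen in the log: the number of variants observed exactly once divided by the number of traces. A value near 0 suggests the log is close to complete with respect to variants. The tiny log below is made up.

```python
from collections import Counter

def unseen_variant_mass(traces):
    """Good-Turing estimate of the probability that the next trace is a new variant:
    (number of variants seen exactly once) / (number of traces)."""
    variants = Counter(tuple(t) for t in traces)
    singletons = sum(1 for count in variants.values() if count == 1)
    return singletons / len(traces)

log = [
    ["register", "approve", "archive"],
    ["register", "approve", "archive"],
    ["register", "approve", "archive"],
    ["register", "reject"],
    ["register", "approve", "countersign", "archive"],
]
print(unseen_variant_mass(log))  # 2 singleton variants / 5 traces = 0.4
```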

24 On the Path from Insight to Action (Analysis)
Individual vs. Network
Big data is often associated with complex data networks. It offers a fresh perspective that is rapidly developing into a new discipline of network science, and it has also exerted a subtle influence on BPM research and applications. From a business process network point of view, process mining is redefined as the discipline that covers the discovery, extraction, and domain-specific analysis of relevant data from dynamic, distributed, and heterogeneous enterprise landscapes.
(Figure: the classic process mining framework, relating the real world, software systems, event logs, and (process) models through discovery, conformance, and enhancement, shown next to its business network counterpart, relating enterprise landscapes, middleware, the software stack, raw data, and network models; plus a sample cross-enterprise business network.)
Source: Daniel Ritter, From Network Mining to Large Scale Business Networks, WWW 2012 LSNA'12 Workshop, 2012.

25 On the Path from Insight to Action (Analysis)
Causality vs. Correlation
A major concern in big data research is that correlation plays a much more important role than causality. Google's founding philosophy is that we don't know why this page is better than that one: if the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. In the BPM field, however, causality and correlation are equally important. Process mining is strongly based on the rigorous deduction of activity causality from event logs. Similarity analysis and clustering-based technologies are typical cases where correlation is taken into consideration.
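In its classic form, the causality deduction mentioned above is the footprint step of the Alpha algorithm. Using the directly-follows pairs in the log, a -> b holds if b directly follows a somewhere but never the reverse. The sketch shows only this basic step. The improved alpha++ and Alpha# miners named in the consolidation procedure add further rules that are not shown. The log is hypothetical.

```python
from itertools import product

def footprint(log):
    """Alpha-algorithm footprint of an event log.
    a > b  : b directly follows a in some trace
    a -> b : a > b and not b > a   (causality)
    a || b : a > b and b > a       (parallelism)
    a # b  : neither               (no direct relation)"""
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    activities = sorted({a for t in log for a in t})
    relation = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and ba:
            relation[a, b] = "||"
        elif ab:
            relation[a, b] = "->"
        elif ba:
            relation[a, b] = "<-"
        else:
            relation[a, b] = "#"
    return relation

log = [["draft", "sign1", "sign2", "archive"],
       ["draft", "sign2", "sign1", "archive"]]
fp = footprint(log)
print(fp["draft", "sign1"], fp["sign1", "sign2"], fp["draft", "archive"])  # -> || #
```

The two signing steps appear in both orders, so the footprint classifies them as parallel rather than causal. This is exactly the kind of deduction that correlation-only analysis would not make.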

26 Outline: Background; From BPM to iBPM; A Big Data Perspective on iBPM; Embrace the Idea of Big Data; Conclusion and Future Work

27 Outline: Embrace the Idea of Big Data: OA Processes Consolidation; Social Network Analysis; Intelligent Recommendation & Matching; Test & Application of Distributed Cloud Storage

28 OA Consolidation Procedure
I. Mining Processes with Particular Business Logic. Algorithms: improved alpha++. Modeling language: standard BPMN 2.0. Implementation: tailor-made tool (to be shown in the video).
(Figure: consolidation pipeline across the raw process level, the process fragment level, and the refined process level. It contains legacy assets (XLS files, event logs), raw processes, a process mining tool, a fragmentization tool, a temporary repository, an analysis tool, and a designer producing models according to the BPMN 2.0 specification and templates. The pipeline is annotated with the challenge "business understanding & raw process reconstruction".)

29 OA Consolidation Procedure
II. Fragmentization and Reuse. Algorithms: clone detection based on RPST. Modeling language: standard BPMN 2.0. Implementation: tailor-made tool (to be shown in the video).
VI. Ontology-based Rule Modeling. Modeling languages: standard BPMN 2.0, OWL. Implementation: tailor-made tool (to be shown in the video). External tools: Protégé 4.1.
(Pipeline figure as on slide 28, additionally annotated with the challenge "complex business logic & recessive rules".)

30 OA Consolidation Procedure
III. Similarity & Clustering. Algorithms: clone detection, hierarchical clustering, SSDT-matrix-based behavioral similarity. Modeling languages: standard BPMN 2.0, Newick format. Implementation: tailor-made tool (to be shown in the video). External tools: FigTree, bpcd.
V. Differentiation Based on Change Operations. Modeling language: standard BPMN 2.0. Implementation: tailor-made tool (to be shown in the video).
(Pipeline figure as on slide 28, additionally annotated with the challenge "flexible modeling based on business semantics".)
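The clustering step can be sketched as agglomerative clustering that emits a Newick string for a tree viewer such as FigTree. The slide does not state the linkage criterion. This sketch assumes average linkage (UPGMA) on distance = 1 - similarity, with each branch length taken as the difference between merge heights. The pairwise similarities of the four processes `P1` to `P4` are invented.

```python
from itertools import combinations

def upgma_newick(names, dist):
    """Average-linkage (UPGMA) clustering that returns a Newick tree string.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    A merge at cluster distance d sits at height d / 2; each branch length
    is the parent's height minus the child's height."""
    # cluster key -> (Newick text, height, leaf names)
    clusters = {n: (n, 0.0, [n]) for n in names}

    def avg_dist(x, y):
        # Average of all leaf-to-leaf distances between the two clusters.
        lx, ly = clusters[x][2], clusters[y][2]
        return sum(dist[frozenset((a, b))] for a in lx for b in ly) / (len(lx) * len(ly))

    while len(clusters) > 1:
        x, y = min(combinations(sorted(clusters), 2), key=lambda pair: avg_dist(*pair))
        h = avg_dist(x, y) / 2
        tx, hx, lx = clusters.pop(x)
        ty, hy, ly = clusters.pop(y)
        clusters[f"({x},{y})"] = (f"({tx}:{h - hx:.2f},{ty}:{h - hy:.2f})", h, lx + ly)
    return next(iter(clusters.values()))[0] + ";"

def from_similarity(pairs):
    """Turn {(a, b): similarity} into the distance map expected above."""
    return {frozenset(p): 1 - s for p, s in pairs.items()}

sims = {
    ("P1", "P2"): 0.84, ("P3", "P4"): 0.80,
    ("P1", "P3"): 0.40, ("P1", "P4"): 0.40,
    ("P2", "P3"): 0.40, ("P2", "P4"): 0.40,
}
print(upgma_newick(["P1", "P2", "P3", "P4"], from_similarity(sims)))
# ((P1:0.08,P2:0.08):0.22,(P3:0.10,P4:0.10):0.20);
```

Rounding branch lengths to two decimals keeps the output readable. A production tool would keep full precision.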

31 OA Consolidation Procedure
IV. Merging. Algorithms: based on SPL (software product lines). Modeling language: standard BPMN 2.0. Implementation: tailor-made tool (to be shown in the video).
(Pipeline figure as on slide 28, additionally annotated with the challenge "redundancy removal & process repository".)

32 I. Mining Processes with Particular Business Logic
(Pipeline figure as on slide 28.)
The mining procedure for the OA log comprises the following stages: log parsing; pattern recognition (preemptive dispatch, countersign, arbitrary jump); pre-processing (combining events, filtering events, format conversion); a miner (Alpha#, fuzzy, genetic); and model conversion into a BPMN 2.0 model. Two particular patterns are handled. Preemptive dispatch is modeled as a parallel multi-instance. Countersign is modeled as a recursive sub-process, e.g., the "Carry Out by Department" fragment.
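The pre-processing stage names "combine events" and "events filtering" without details. A plausible reading, sketched below under that assumption, has two parts. The first merges consecutive repetitions of the same event. The second drops activities that occur in too few traces to be trusted. The threshold `min_share` and the log are hypothetical.

```python
from collections import Counter

def combine_repeats(trace):
    """Merge consecutive duplicate events, e.g. the same step logged twice."""
    merged = []
    for activity in trace:
        if not merged or merged[-1] != activity:
            merged.append(activity)
    return merged

def filter_infrequent(traces, min_share=0.3):
    """Drop activities that occur in fewer than min_share of all traces (likely noise)."""
    presence = Counter(a for t in traces for a in set(t))
    keep = {a for a, c in presence.items() if c / len(traces) >= min_share}
    return [[a for a in t if a in keep] for t in traces]

raw_log = [
    ["register", "register", "approve", "archive"],
    ["register", "approve", "approve", "archive"],
    ["register", "approve", "archive"],
    ["register", "manual_fix", "approve", "archive"],
    ["register", "approve", "archive"],
]
clean = filter_infrequent([combine_repeats(t) for t in raw_log])
print(clean)
```

Filtering this way is lossy. A rare but legitimate branch such as an arbitrary jump would also be removed, so the threshold has to be chosen per scenario.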

33 II. Fragmentization and Reuse
(Pipeline figure as on slide 28.)
Based on clone detection and the RPST, a procedure is proposed for process model decomposition that yields highly reusable fragments. First, process models are decomposed into fragments that are suitable for reuse. Second, frequently used fragments are selected. The selection can be made manually by experienced business analysts or automatically by thresholds (e.g., fragment size), depending on the specific scenario.
Motivation: fragment reuse can reduce duplication between processes. Fragment-based process modeling significantly improves process design efficiency: in our tests, the average time to model a process was reduced by 20% to 60% compared with the common approach without fragments. Fragments with specific business semantics help business analysts design processes directly with the least IT effort.
(Figure: fragmentization of the sending-document process model.)
Sources: Vanhatalo J., Völzer H., Koehler J. The Refined Process Structure Tree. Business Process Management, Springer Berlin Heidelberg, 2008. Uba R., Dumas M., García-Bañuelos L., et al. Clone Detection in Repositories of Business Process Models. Business Process Management, Springer Berlin Heidelberg, 2011. Gao X., Chen Y., Ding Z., et al. Process Model Fragmentization, Clustering and Merging: An Empirical Study.
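RPST decomposition and clone detection are too involved for a short example, but the selection step that follows them is easy to sketch. Each model is assumed to be already decomposed into fragments. Each fragment is represented by a tuple of activity labels, which stands in for a canonical fragment code. The thresholds `min_models` and `min_size` are hypothetical counterparts of the frequency and size thresholds mentioned above.

```python
from collections import Counter

def select_reusable(models, min_models=2, min_size=2):
    """models: model name -> list of fragments (tuples of activity labels).
    Keep fragments that occur in at least min_models models and contain
    at least min_size activities."""
    occurrences = Counter()
    for fragments in models.values():
        for frag in set(fragments):  # count each fragment once per model
            occurrences[frag] += 1
    return sorted(f for f, n in occurrences.items() if n >= min_models and len(f) >= min_size)

models = {
    "A": [("draft", "approve"), ("notify",), ("archive", "print")],
    "B": [("draft", "approve"), ("notify",), ("typeset", "archive")],
    "C": [("draft", "approve"), ("archive", "print")],
}
print(select_reusable(models))
# [('archive', 'print'), ('draft', 'approve')]
```

The single-activity fragment `("notify",)` is frequent but is rejected by the size threshold. This mirrors the idea that very small fragments are not worth turning into templates.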

34 III. Similarity & Clustering
(Pipeline figure as on slide 28.)
1. SSDT-matrix-based behavioral similarity: the SSDT matrices of M1 (approval process of province A) and M2 (approval process of province B) are calculated over the activities D, R, AI, A, and P and then compared. For this pair, Similarity = 1 - 4/25 = 0.84.
2. Newick format: a tree is written with leaf names and distances, e.g., (A:0.1,B:0.2,(C:0.3,D:0.4):0.5);
3. Hierarchical (agglomerative) clustering: the closer two elements are connected in the resulting tree, the more similar they are (one cluster in the figure is annotated "about 0.75"). Source: FigTree 1.4.
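The slide's arithmetic, 1 - 4/25 = 0.84, is consistent with counting how many of the 5 x 5 = 25 entries of the two SSDT matrices differ. The sketch below assumes that reading. It also takes SSDT to mean the shortest succession distance between tasks, i.e., the number of arcs on a shortest path from one task to another. The published measure may weight differences more finely. The matrices on the slide are not recoverable, so the two models here are hypothetical. M1 is a plain sequence, and in M2, A may follow R directly, so AI becomes optional.

```python
from collections import deque

def ssdt_matrix(tasks, edges):
    """Shortest succession distance between every ordered pair of tasks:
    the number of arcs on a shortest path from a to b, or None if b is unreachable."""
    successors = {t: [] for t in tasks}
    for a, b in edges:
        successors[a].append(b)
    matrix = {}
    for start in tasks:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search yields shortest path lengths
            node = queue.popleft()
            for nxt in successors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for t in tasks:
            matrix[start, t] = dist.get(t)
    return matrix

def ssdt_similarity(tasks, m1, m2):
    """1 - (number of differing entries) / n^2."""
    differing = sum(1 for a in tasks for b in tasks if m1[a, b] != m2[a, b])
    return 1 - differing / len(tasks) ** 2

tasks = ["D", "R", "AI", "A", "P"]
m1 = ssdt_matrix(tasks, [("D", "R"), ("R", "AI"), ("AI", "A"), ("A", "P")])
m2 = ssdt_matrix(tasks, [("D", "R"), ("R", "AI"), ("R", "A"), ("AI", "A"), ("A", "P")])
print(round(ssdt_similarity(tasks, m1, m2), 2))  # 0.84 (4 of 25 entries differ)
```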

35 IV. Merging
(Pipeline figure as on slide 28.)
The merging operation converts a set of similar fragments into one merged fragment (also referred to as the master fragment), borrowing some basic ideas from software product lines (SPL). Variation points in the merged diagrams represent locations where the input models disagree in their behavior. That is, a variation point occurs when several alternative flows belonging to different input processes leave a common activity. Blocking and hiding are the essential concepts of configuration.
(Figure: addition, weaving the input models into a configurable model; subtraction, deriving individual models through the configuration actions activating, hiding, and blocking.)
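A toy version of merging and configuration on linear fragments makes the terms concrete. Merging keeps the common prefix once. The point where the inputs diverge becomes a variation point with one branch per input model. Configuration then either blocks branches, which removes them, or hides activities, which skips them while the flow continues. Real SPL-style merging works on process graphs. The sequence representation and the function names here are assumptions for illustration.

```python
def merge(fragments):
    """fragments: model name -> activity sequence.
    The longest common prefix is kept once; where the inputs diverge, each
    model's remaining flow becomes one branch of a variation point."""
    prefix = []
    for step in zip(*fragments.values()):
        if len(set(step)) != 1:
            break
        prefix.append(step[0])
    branches = {name: seq[len(prefix):] for name, seq in fragments.items()}
    return prefix, branches

def configure(branches, blocked=frozenset(), hidden=frozenset()):
    """Blocking removes a branch (the flow can no longer take it);
    hiding keeps the branch but skips the hidden activities (silent steps)."""
    return {name: [a for a in seq if a not in hidden]
            for name, seq in branches.items() if name not in blocked}

prefix, branches = merge({
    "province_A": ["draft", "approve", "typeset", "archive"],
    "province_B": ["draft", "approve", "print", "archive"],
})
print(prefix)  # ['draft', 'approve']: the variation point leaves 'approve'
print(configure(branches, blocked={"province_B"}))
print(configure(branches, hidden={"typeset"}))
```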

36 V. Differentiation Based on Change Operations
(Pipeline figure as on slide 28.)
Objectives: always obtain a sound model from a sound model through change operations, so that the correctness of changing and maintaining OA process models and fragments is naturally guaranteed. Change operations carry more syntactic meaning than change primitives and are far fewer in number. Change operations should be recorded in a structured way.
Identifying change operations between models:
1. Match nodes between the two models (one-to-one matching).
2. Delete from the original model the nodes that have no matching node in the target model (Model A1).
3. Compute the (minimal) moves: movements applied to the original model so that the relationships between matched nodes agree with the target model (Model A2).
4. Insert into the original model, at the right place, the nodes that are in the target model but have no matching node in the original model (Model A3 = Model B).
(Figure: example models with the activities typeset, archive, and print.)
Source: Li C., Reichert M., Wombacher A. On Measuring Process Model Similarity Based on High-Level Change Operations. Conceptual Modeling (ER 2008), Springer Berlin Heidelberg, 2008.
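Take the special case of purely sequential models with unique activity labels. There the steps above reduce to simple set and sequence operations. The minimal set of moves is the set of matched nodes that lie outside a longest common subsequence of the two orderings. This is a sketch under those assumptions, not the block-structured algorithm of Li et al. The models are invented, although they reuse activity names from the slide.

```python
def lcs(a, b):
    """One longest common subsequence of sequences a and b (dynamic programming)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            dp[i][j] = dp[i + 1][j + 1] + 1 if a[i] == b[j] else max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def change_operations(source, target):
    """Derive delete, move and insert operations between two sequential models,
    matching nodes one-to-one by (unique) label."""
    deletes = [a for a in source if a not in target]   # no match in target
    inserts = [b for b in target if b not in source]   # no match in source
    matched_src = [a for a in source if a in target]   # matched nodes, source order
    matched_tgt = [b for b in target if b in source]   # matched nodes, target order
    keep = set(lcs(matched_src, matched_tgt))          # already in the right relative order
    moves = [a for a in matched_src if a not in keep]  # minimal set of nodes to move
    return {"delete": deletes, "move": moves, "insert": inserts}

model_a = ["draft", "print", "archive", "typeset"]
model_b = ["draft", "typeset", "archive", "notify"]
print(change_operations(model_a, model_b))
# {'delete': ['print'], 'move': ['archive'], 'insert': ['notify']}
```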

37 VI. Ontology-based Rule Modeling
(Pipeline figure as on slide 28.)
Motivation: business processes are always influenced by legal and regulatory constraints arising from managerial requirements, and these constraints are usually recessive (implicit). Ontology-based rule modeling can avoid redundancy and keep consistency, and it can make information sharable and exchangeable.
Implementation: the ontology is designed to model external and internal regulations as guidelines and constraints on the interaction between entities and on the states of process templates. The standard specification of a given language is provided as a descriptive facility, and OWL is adopted to describe business rules.

38 Prototype Demonstration
In the context of OA consolidation, a specialized configuration tool has been built that integrates all the aforementioned algorithms and tools. It aims to provide flexible and efficient process modeling for both business analysts and IT staff. Features: integrative modeling tool; user-friendly interface; open-source based; pluggable modules; intelligent analysis; complete statistics.
(A short video and a snapshot of the fragment-based process configuration tool.)

39 Embrace the Idea of Big Data: OA Processes Consolidation; Social Network Analysis; Intelligent Recommendation; Test & Application of Distributed Cloud Storage

40 Enterprise Social Network Analysis
China Mobile has built an SNA platform on big data frameworks, including Hadoop, graph databases, and NoSQL databases. The platform collects customer behavior from Sina Weibo and Tencent Weibo (microblogs). We proposed a number of analytic models for social media, such as propagation tracing, topic detection and tracking (TDT), and user and group portraits.
(Figure: platform architecture, with layers for enterprise applications (marketing & sales, social CRM, brand and opinion management, social media operation); visualization, portal & integration (big data visualization, traditional BI visualization); analytic models for social media (propagation tracing, public topic detection & tracing, user portrait, group portrait); big data processing & mining (semantic entity analysis, sentiment analysis, spam detection, social search, natural language processing, machine learning & data mining, graph analysis) on MapReduce, BSP (Hama), and traditional OLAP; and big data collection & storage (APIs & spider, graph database, NoSQL database, MySQL).)

41 Algorithms in SNA
The SNA platform involves many algorithms: 1) graph theory, including structural holes, clustering, and centrality computation; 2) text mining, including sentiment analysis, natural language processing, and topic detection; 3) link/attribute prediction and inference, including SVM, normalized conductance, and supervised random walks.
Topic Detection & Tracking: Latent Dirichlet Allocation. LDA is a generative probabilistic model for collections of discrete data such as text corpora; it is a three-level hierarchical Bayesian model. We have proposed a novel TDT method combined with clustering preprocessing, which improves precision and recall.
Inferring Missing Attributes: Supervised Random Walks. Inspired by Prof. Jure Leskovec of Stanford, we implemented a supervised random walk method to infer a user's missing attributes, such as school and company. The method combines the social structure with the known attributes and therefore achieves much higher accuracy than ordinary machine learning methods.
Sentiment Analysis: REN-CECps. To evaluate the sentiment of each short microblog post, we introduce an eight-emotion analysis method based on the REN-CECps dictionary. The emotions are Expectation, Happiness, Like, Surprise, Anxiety, Sad, Angry, and Hate.
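The dictionary-based eight-emotion method can be sketched as counting lexicon hits per emotion. A tiny English lexicon stands in here for REN-CECps, which is a Chinese resource. The lexicon entries and the simple word tokenization are assumptions for illustration. A production system would also need word segmentation, negation handling, and intensity weights.

```python
import re
from collections import Counter

EMOTIONS = ["Expectation", "Happiness", "Like", "Surprise",
            "Anxiety", "Sad", "Angry", "Hate"]

# Toy English lexicon standing in for REN-CECps (hypothetical entries).
LEXICON = {
    "hope": "Expectation", "glad": "Happiness", "love": "Like", "wow": "Surprise",
    "worried": "Anxiety", "sad": "Sad", "furious": "Angry", "hate": "Hate",
}

def emotion_profile(post):
    """Count lexicon hits per emotion and return them in EMOTIONS order."""
    words = re.findall(r"[a-z]+", post.lower())
    hits = Counter(LEXICON[w] for w in words if w in LEXICON)
    return [hits[e] for e in EMOTIONS]

profile = emotion_profile("Wow, I love the new plan but I'm worried about the price")
print(dict(zip(EMOTIONS, profile)))
```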

42 Enterprise Social Network Analysis (continued)
We have gathered about 20 million users and 1.2 billion microblogs. Using our SNA models, we have built useful tools for enterprise management, such as public opinion tracing, VIP discovery, satisfaction surveys, and novel CRM:
Public opinion tracing: mining public topics and opinions from one billion microblogs. Methods: text mining, spam trimming, TDT.
Satisfaction survey: a nationwide satisfaction survey of 100 thousand users. Methods: sentiment analysis, NLP.
Opinion leader and VIP discovery: 244 opinion leaders and VIPs discovered out of 100 thousand users. Methods: graph clustering, spam detection.
Novel CRM: a user-care map drawn from each user's satisfaction and influence. Methods: sentiment analysis, NLP, social influence.

43 Embrace the Idea of Big Data OA es Consolidation Social Network Analysis Intelligent Recommendation Test & Application of Distributed Cloud Storage

44 Background and Targets
MM (Mobile Market) of CMCC is similar to the Apple App Store: it provides not only more than 150,000 mobile apps but also a huge number of other digital commodities. The information overload problem in MM is becoming more and more challenging.
Truly massive data: more than 200,000,000 users; more than 150,000 apps; more than 1,500,000 audio tracks; more than 1,800,000 videos; more than 50,000 books; more than 60,000 comics; more than 100,000,000 PVs per day; more than 10,000,000 downloads per day. Daily generated log data is now up to 50 GB, resulting in an annual accumulation of more than 20 TB, and the rate of data growth is still rising.
Severe information overload: poor download rate; very hard for users to choose items of interest; very hard for developers to promote products effectively.
Targets of Intelligent Recommendation: to realize a three-way win for the users, developers and operators of MM. For users: better user experience. For developers: equitable opportunity to promote products efficiently. For operators: improved operating income.

45 System Architecture
We built the intelligent recommendation system for MM with open source big data tools such as Hadoop, MongoDB and Mahout.
[Figure: Intelligent Recommendation System Architecture. Layers: Data Storage; ETL & Modeling; Intelligent Rec Engine (Hybrid-CF, ALS-LFM, P-FP); Results Optimization. Inputs: MM Log Sys, MM Business Sys, User Feature & Performance. Output: MM UI.]
Pluggable recommendation algorithms: customized algorithm parameter sets are supported for different scenes. Three algorithms are integrated now: Hybrid-CF (Hybrid Collaborative Filtering), P-FP (association rule mining) and ALS-LFM.
Running Hadoop MapReduce on top of the MongoDB cluster achieves better performance: all offline computing based on 12 months of data completes in 5 hours, and each recommendation takes less than 10 ms under a concurrent load of up to 3,000 PVs per second.
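The pluggable design with per-scene parameter sets can be sketched as a registry that maps algorithm names to implementations and scene names to (algorithm, parameters) pairs. This is a hypothetical Python illustration: the class, scene and parameter names are assumptions, and a trivial popularity recommender stands in for Hybrid-CF, P-FP and ALS-LFM.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A recommender takes (user, download log, params) and returns ranked item ids.
Recommender = Callable[[str, List[Tuple[str, str]], dict], List[str]]


class RecEngine:
    """Hypothetical engine: algorithms plug in by name, scenes pick one plus params."""

    def __init__(self) -> None:
        self._algorithms: Dict[str, Recommender] = {}
        self._scenes: Dict[str, Tuple[str, dict]] = {}

    def register_algorithm(self, name: str, fn: Recommender) -> None:
        self._algorithms[name] = fn

    def configure_scene(self, scene: str, algorithm: str, **params) -> None:
        if algorithm not in self._algorithms:
            raise KeyError(f"unknown algorithm: {algorithm}")
        self._scenes[scene] = (algorithm, params)

    def recommend(self, scene: str, user: str, log: List[Tuple[str, str]]) -> List[str]:
        algorithm, params = self._scenes[scene]
        return self._algorithms[algorithm](user, log, params)


def popularity(user, log, params):
    """Stand-in algorithm: most downloaded items the user does not already have."""
    owned = {item for u, item in log if u == user}
    counts = Counter(item for _, item in log if item not in owned)
    return [item for item, _ in counts.most_common(params.get("top_n", 10))]


engine = RecEngine()
engine.register_algorithm("popularity", popularity)
engine.configure_scene("home_page", "popularity", top_n=2)
log = [("u1", "app_a"), ("u2", "app_a"), ("u2", "app_b"), ("u3", "app_c"), ("u3", "app_a")]
print(engine.recommend("home_page", "u1", log))  # ['app_b', 'app_c']
```

Keeping parameters on the scene rather than on the algorithm lets the same implementation serve, for example, a home page and a category page with different list lengths or weights.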

46 Deeper Insights into Hybrid-CF
Among the three algorithms currently integrated in the system, Hybrid-CF (Hybrid Collaborative Filtering) was designed and developed by our own research group. To improve on traditional item-based CF, Hybrid-CF takes into account not only user behavior but also the characteristics of both items and users.
[Figure: Hybrid-CF Architecture. MongoDB supplies user behavior (browse, download, review), used to calculate a basic item preference value; item attributes (category, subject, etc.) and user profiles (demographic & social information) are used to identify interest groups by similarity; a weighted item rating is stored in MongoDB and used to generate results.]
Comprehensive use of structured data (user behavior) and unstructured data (item attributes, user profiles). Customized parameter sets for different item categories and scenes. MapReduce-style parallel computing on top of MongoDB.
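A minimal Python sketch of the general idea behind a hybrid item-based CF, under stated assumptions: behavior events are turned into preference values with hypothetical weights (browse 1, download 3, review 5); item-item similarity blends, via a hypothetical parameter `alpha`, cosine similarity over user preference vectors with Jaccard similarity over item attributes; and a user's score for an unseen item is the similarity-weighted sum of their preferences. The group's actual weighting, the interest-group step based on user profiles, and the MapReduce/MongoDB execution are not reproduced.

```python
import math
from collections import defaultdict

# Hypothetical weights for turning behavior events into preference values.
EVENT_WEIGHTS = {"browse": 1.0, "download": 3.0, "review": 5.0}


def preference_matrix(events):
    """(user, item, event_type) triples -> {item: {user: preference}}."""
    prefs = defaultdict(lambda: defaultdict(float))
    for user, item, kind in events:
        prefs[item][user] += EVENT_WEIGHTS[kind]
    return prefs


def cosine(u, v):
    """Cosine similarity of two sparse {user: preference} vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Overlap of two attribute sets (category, subject, ...)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_scores(events, item_attrs, user, alpha=0.7):
    """Rank items the user has not touched by blended item-item similarity."""
    prefs = preference_matrix(events)
    seen = {item: users[user] for item, users in prefs.items() if user in users}
    scores = {}
    for cand, cand_attrs in item_attrs.items():
        if cand in seen:
            continue
        total = 0.0
        for item, pref in seen.items():
            behav = cosine(prefs[cand], prefs[item]) if cand in prefs else 0.0
            attr = jaccard(cand_attrs, item_attrs.get(item, set()))
            total += (alpha * behav + (1 - alpha) * attr) * pref
        scores[cand] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


events = [
    ("u1", "chess", "download"), ("u1", "sudoku", "browse"),
    ("u2", "chess", "download"), ("u2", "go", "download"),
    ("u3", "racing", "review"),
]
attrs = {
    "chess": {"game", "board"}, "sudoku": {"game", "puzzle"},
    "go": {"game", "board"}, "racing": {"game", "sports"}, "news": {"reading"},
}
print([item for item, _ in hybrid_scores(events, attrs, "u1")])  # ['go', 'racing', 'news']
```

The attribute term lets a brand-new item with no behavior history (like `news` here, or `racing` relative to `u1`'s items) still receive a score, which is the usual motivation for mixing content features into item-based CF.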

47 Reciprocal Recommendation
Traditional recommenders provide a USER with recommendations of ITEMS likely to be of interest, such as books, movies, mobile apps and pharmacy products. Beyond this, we also focus on another important class of recommendation, the reciprocal recommender, in which both the USER and the ITEM represent people. The two sides have similar standing, and both have preferences to be satisfied.
Scenario 1: Online Dating. Deals with the SMP (Stable Marriage Problem): finding a stable matching between two sets of elements representing men and women. Matching models such as the Gale-Shapley algorithm are applied.
Scenario 2: Social Networking. Recommending people on social networking sites, to help people create social and personal connections and expand their friend lists. IBM Beehive uses a content-plus-link model to recommend new colleagues.
Scenario 3: Job Hunting. Motivated by the matching of medical students to hospitals in the US, currently known as the NRMP (National Resident Matching Program): a bilateral recommendation approach to matching people and jobs.
Scenario 4: Stable Roommates. Known as Roommate Finder or roommate matching networks: helping students find roommates they are satisfied with. Similar to the stable marriage problem, but all participants belong to a single pool.
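For the online dating scenario, the classic proposer-optimal Gale-Shapley algorithm can be sketched in a few lines of Python, together with a check for blocking pairs. The preference lists are hypothetical, and the sketch assumes two equal-sized sides with complete, strict preference lists.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Return a stable matching {acceptor: proposer} (proposer-optimal)."""
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better)
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    free = list(proposer_prefs)
    engaged = {}
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p
            free.append(current)   # displaced proposer will try his next choice
        else:
            free.append(p)         # rejected: p proposes again later
    return engaged


def is_stable(matching, proposer_prefs, acceptor_prefs):
    """True if no proposer and acceptor both prefer each other to their partners."""
    partner = {p: a for a, p in matching.items()}
    for p, prefs in proposer_prefs.items():
        for a in prefs[: prefs.index(partner[p])]:  # acceptors p ranks above his partner
            if acceptor_prefs[a].index(p) < acceptor_prefs[a].index(matching[a]):
                return False                        # (p, a) is a blocking pair
    return True


men = {"A": ["x", "y", "z"], "B": ["y", "x", "z"], "C": ["x", "y", "z"]}
women = {"x": ["B", "A", "C"], "y": ["A", "B", "C"], "z": ["A", "B", "C"]}
match = gale_shapley(men, women)
print(match)  # {'x': 'A', 'y': 'B', 'z': 'C'}
```

Each proposer moves down his list at most once per entry, so the loop ends after at most n² proposals, and the result is always stable regardless of the order in which free proposers are processed.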

48 Algorithms in Reciprocal Recommenders
We divide reciprocal recommendation problems into two groups: bilateral matching and multilateral matching. We have successfully modeled these problems and found solutions using algorithms such as Gale-Shapley matching and the Exact Cover Dancing Links X model.
Bilateral Matching: Gale-Shapley Matching Model. The Gale-Shapley algorithm is applied in online dating to find a stable matching between men and women, with full consideration of all members' preferences. It can be used in many different scenarios to pair items from two sets. In 2012, the Nobel Memorial Prize in Economic Sciences was awarded to Lloyd Shapley and Alvin Roth for the theory of stable allocations and the practice of market design.
Multilateral Matching: Exact Cover Dancing Links X Model. We have established a dormitory roommate recommendation system and carried out empirical studies in universities. We treat the problem as an exact cover problem, solved with the Dancing Links X algorithm described by Knuth. The model has good universality and replicability and can address multi-type multilateral matching problems.
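One way to reduce roommate assignment to exact cover, under assumptions of ours: every student is a column, every acceptable room group (students who have all accepted each other, sized to the room capacity) is a row, and an exact cover picks groups so that each student lands in exactly one room. The Python sketch below uses Knuth's Algorithm X with a dictionary-of-sets representation rather than the pointer-based Dancing Links structure; the search is the same, only the data structure differs. Student names and groups are hypothetical.

```python
def exact_cover(columns, rows):
    """Knuth's Algorithm X. columns: items to cover; rows: {name: items}.
    Yields lists of row names that cover every column exactly once."""
    Y = {name: sorted(items) for name, items in rows.items()}
    X = {c: set() for c in columns}
    for name, items in Y.items():
        for c in items:
            X[c].add(name)
    yield from _search(X, Y, [])


def _search(X, Y, partial):
    if not X:                               # every column covered: a solution
        yield list(partial)
        return
    col = min(X, key=lambda c: len(X[c]))   # branch on the fewest candidate rows
    for r in sorted(X[col]):
        partial.append(r)
        removed = _select(X, Y, r)
        yield from _search(X, Y, partial)
        _deselect(X, Y, r, removed)
        partial.pop()


def _select(X, Y, r):
    """Cover the columns of row r and drop every row that clashes with it."""
    removed = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed


def _deselect(X, Y, r, removed):
    """Undo _select in reverse order, restoring the search state."""
    for j in reversed(Y[r]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)


students = ["s1", "s2", "s3", "s4"]
groups = {  # hypothetical mutually acceptable two-person rooms
    "g1": ["s1", "s2"], "g2": ["s3", "s4"], "g3": ["s1", "s3"],
    "g4": ["s2", "s4"], "g5": ["s2", "s3"],
}
print(list(exact_cover(students, groups)))  # [['g1', 'g2'], ['g3', 'g4']]
```

Group `g5` never appears in a solution: once `s2` and `s3` share a room, no acceptable group can house `s1` and `s4` together, and the search discovers this by backtracking.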

49 Embrace the Idea of Big Data OA es Consolidation Social Network Analysis Intelligent Recommendation Test & Application of Distributed Cloud Storage

50 China Mobile Cloud Benchmarking Framework
The open source world offers many different types of NoSQL databases and cloud storage systems. To determine which one performs better in a given scenario, CMCC developed a novel distributed cloud benchmarking framework (CM-CBF) based on open source benchmarking and analysis software.
Structure: a four-layer architecture focused on tuning, benchmarking, surveillance and visualization. Framework development is based on widely used open source tools (Nmon, YCSB++ & IOzone, Ganglia, rrdtool).
Capability: self-developed adapters support different types of cloud databases and serving systems. Evaluation scenarios include scaling, availability, elasticity, replication and consistency.
Parameter optimization: CM-CBF delivers a set of recommended parameter settings for different user scenarios.
[Figure: China Mobile Cloud Benchmarking Framework (CM-CBF). Modules: test framework instruction management; test visualization management; test data generation; coordinated interface; load generator; framework configuration management; multitask management; result display; test parameter adjustment; initial parameter management; multi-framework parameter adapter; network monitor; operating system monitor; data statistic analysis; document storage statistics; table storage data access management; table storage statistics; test result analysis; distributed document access management. Adapters: HBase, MongoDB, Cassandra, GPFS, BigInsights HDFS, SWIFT.]
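The adapter layer is what lets one load generator drive HBase, MongoDB, Cassandra and other back ends. A hypothetical Python sketch of that pattern: an abstract adapter interface, an in-memory adapter standing in for a real database client, and a load generator that issues a configurable read/update mix, in the spirit of YCSB workloads, and reports throughput and mean latency. None of these names come from CM-CBF or YCSB; real adapters would wrap each database's own client library.

```python
import random
import time
from abc import ABC, abstractmethod
from statistics import mean


class StoreAdapter(ABC):
    """Uniform interface the load generator uses for every back end."""

    @abstractmethod
    def insert(self, key: str, value: str) -> None: ...

    @abstractmethod
    def read(self, key: str): ...

    @abstractmethod
    def update(self, key: str, value: str) -> None: ...


class InMemoryAdapter(StoreAdapter):
    """Stand-in back end; a real adapter would call a database client here."""

    def __init__(self):
        self.data = {}

    def insert(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

    def update(self, key, value):
        self.data[key] = value


def run_workload(adapter, record_count=1000, op_count=5000, read_ratio=0.5, seed=42):
    """Load records, then run a random read/update mix; return summary metrics."""
    rng = random.Random(seed)
    keys = [f"user{i}" for i in range(record_count)]
    for k in keys:
        adapter.insert(k, "v0")
    latencies, reads = [], 0
    start = time.perf_counter()
    for _ in range(op_count):
        key = rng.choice(keys)
        t0 = time.perf_counter()
        if rng.random() < read_ratio:
            adapter.read(key)
            reads += 1
        else:
            adapter.update(key, "v1")
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "ops": op_count,
        "reads": reads,
        "updates": op_count - reads,
        "throughput_ops_per_s": op_count / elapsed if elapsed else float("inf"),
        "mean_latency_s": mean(latencies),
    }


result = run_workload(InMemoryAdapter(), read_ratio=0.95)  # a read-heavy mix
print(result["reads"], result["updates"], round(result["throughput_ops_per_s"]))
```

Swapping `InMemoryAdapter` for another `StoreAdapter` subclass leaves the workload definition untouched, which is what makes side-by-side comparisons across databases meaningful.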

51 CM-CBF Output and Further Development
CM-CBF helps determine the most suitable candidate database for each business, based on its business scenarios. Selection principles:
(P1) IF balanced read and write operations, THEN Cassandra or MongoDB;
(P2) IF frequent read operations, THEN Cassandra or MongoDB;
(P3) IF large data throughput, THEN MongoDB;
(P4) IF frequent insert operations with few reads, THEN HBase;
(P5) IF very high scalability is required, THEN HBase;
(P6) IF strong consistency must be maintained, THEN SQL DB, MongoDB or HBase;
(P7) IF high availability is required, THEN Cassandra (P2P structure);
(P8) IF more flexible data types and complex data structures are needed, THEN MongoDB;
(P9) IF native MapReduce support is needed, THEN MongoDB or HBase;
(P10) IF strong management and maintenance functions are needed, THEN MongoDB or Cassandra.
Next stage of development for CM-CBF:
Supporting big data warehouses: benchmark the SQL execution ability of big data warehouses such as Hive and Impala.
Supporting cloud management systems: compare the functional abilities of cloud management systems such as OpenStack, CloudStack and OpenNebula.
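The ten selection principles read naturally as a rule table. A minimal Python sketch that encodes them and intersects the candidate sets of all requirements that apply; the requirement keys are our own hypothetical labels for P1 to P10, and combining several requirements by intersection (rather than, say, a vote) is an assumption, since the principles do not say how they combine.

```python
# Requirement label -> candidate databases, transcribed from principles P1-P10.
SELECTION_RULES = {
    "balanced_read_write": {"Cassandra", "MongoDB"},        # P1
    "read_heavy": {"Cassandra", "MongoDB"},                 # P2
    "large_throughput": {"MongoDB"},                        # P3
    "insert_heavy_few_reads": {"HBase"},                    # P4
    "very_high_scalability": {"HBase"},                     # P5
    "strong_consistency": {"SQL DB", "MongoDB", "HBase"},   # P6
    "high_availability": {"Cassandra"},                     # P7
    "flexible_complex_data": {"MongoDB"},                   # P8
    "native_mapreduce": {"MongoDB", "HBase"},               # P9
    "strong_management": {"MongoDB", "Cassandra"},          # P10
}


def candidate_databases(requirements):
    """Intersect the candidate sets of every stated requirement."""
    candidates = None
    for req in requirements:
        allowed = SELECTION_RULES[req]
        candidates = set(allowed) if candidates is None else candidates & allowed
    return sorted(candidates or [])


print(candidate_databases(["read_heavy", "native_mapreduce"]))            # ['MongoDB']
print(candidate_databases(["high_availability", "strong_consistency"]))   # []
```

An empty result signals requirements that no single candidate satisfies under these principles, so the business has to decide which requirement to relax.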

52 Metrics and Evaluation of NoSQL Databases
CM-CBF aims to test the suitability of cloud storage systems for different scenarios. To benchmark performance, CM-CBF covers six aspects: basic function, basic performance, elasticity, flexibility, high availability and consistency. The systems compared are MySQL, Cassandra, MongoDB and HBase.
Test terms and test content:
Basic function: data definition, data operation, control, management & maintenance, functions and interfaces.
Basic performance: system performance with full read; with full insert; with upgrade; with frequent read; with frequent update.
Elasticity: system expansion under stable frequent reads; system expansion under stable frequent updates.
Flexibility: ability to extend nodes dynamically while the service stays online.
High availability: impact on the group of a failed management node, a failed data node, a failed router node and a failed distributed node.
Consistency: group node synchronization time, used to estimate system consistency.

53 Benchmarking Results
Based on CM-CBF, the function, performance, expansion, flexibility and consistency of the main distributed document storage systems and NoSQL databases were tested. The results demonstrate the effectiveness of the open source frameworks and point out the main aspects they need to focus on for deployment.
Distributed document storage: these systems have standard read & write interfaces and can conveniently replace traditional storage; some commercial products have better performance and reliability.
NoSQL databases: all open source distributed file systems and NoSQL DBs need more detailed parameter optimization; otherwise they may show quite different performance. MongoDB has the operation interface most similar to SQL, and good performance and reliability in most cases.

54 Outline Background From BPM to ibpm A Big Data Perspective on ibpm Embrace the Idea of Big Data Conclusion and Future Work

55 Conclusions and Future Work
"The best way to predict the future is to create it." (Peter F. Drucker)
Multi-discipline collaboration: social network analysis, machine learning, semantic web, data visualization, distributed cloud storage.
Platform & Commercialization: build a cloud-enabled ibpm platform and provide on-demand analytical services; integrate various advanced technologies and tools into the process engine and accelerate the evolution to ibpm.
Open R&D Ecosystem: build an open and harmonious environment for both academia and industry; promote and encourage open source tools and prototypes for technology innovation and incubation.
We sincerely hope to strengthen our relationship with academia, share our ideas, and devote ourselves to advanced research as well as its realization.

56 Thank you for your attention

Xiang Gao, Department of Management Information System, China Mobile Communications Corporation, Beijing 100033, China (gaoxiang@chinamobile.com)

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information
