Standards in Predictive Analytics
The role of R, Hadoop and PMML in the mainstreaming of predictive analytics.

James Taylor, CEO, Decision Management Solutions (2014)

CONTENTS
Predictive Analytics Today
Broadening The Analytic Ecosystem With R
Managing Big Data with Hadoop
Moving to Real-Time with PMML
Future Developments
Conclusions
Revolution Analytics
Zementis
About the Research

Standards play a central role in creating an ecosystem that supports current and future needs for broad, real-time use of predictive analytics in an era of Big Data. Just a few years ago it was common to develop a predictive analytic model using a single proprietary tool against a sample of structured data. The model would then be applied in batch, storing scores for future use in a database or data warehouse. Recently this model has been disrupted. There is a move to real-time scoring: calculating the value of a predictive analytic model when it is needed rather than looking it up in a database. At the same time the variety of model execution platforms has expanded, with in-database execution, columnar and in-memory databases, and MapReduce-based execution all becoming increasingly common. Modeling too has changed: the open source analytic modeling language R has become extremely popular, with up to 70% of analytic professionals using it at least occasionally. The range of data types being used in models has expanded along with the approaches used for storage, and modelers increasingly want to analyze all their data, not just a sample, to build a model. This increasingly complex and multi-vendor environment has increased the value of standards, both published standards and open source standards. In this paper we will explore the growing role of standards for predictive analytics in expanding the analytic ecosystem, handling Big Data and supporting the move to real-time scoring.

Sponsored by Revolution Analytics and Zementis
Predictive Analytics Today

Predictive analytics is becoming increasingly mainstream, with most companies and organizations making it a part of their overall strategy. Whether it is focused on improving customer engagement, managing risk, reducing fraud or optimizing the supply chain, predictive analytics is turning organizations' data into useful, actionable insight. The results of one recent survey (Taylor, 2013) are shown in Figure 1: two thirds of respondents have seen a real, positive impact from predictive analytics, while fully 43% of respondents report a significant or transformative impact.

Figure 1: Overall impact from Predictive Analytics

These are significant numbers, and a comparison to the same survey from 2011 shows that the number of organizations seeing a positive impact has increased significantly. As the impact of predictive analytics has grown in organizations, the focus has shifted. In the past the primary use case might have been to occasionally build a predictive model to influence a decision. Today there is a focus on operationalizing analytics: organizations are building more models and applying these models in their day-to-day operations. In fact the same study shows that this increase in impact is highly correlated with this shift to a more operational focus, as those reporting a transformative impact were much more likely to also report that they tightly integrated predictive analytics into operations. Those reporting predictive analytics as a primary driver for decision-making also outperformed those regularly or occasionally using predictive analytics. As Figure 2 shows, the more tightly respondents integrate predictive analytics into operations, the more likely they are to report transformative impact from those predictive analytics. It should be noted that this need not imply automation of the decision that uses the predictive analytics.
While embedding predictive analytic models into automated decisions is a very effective operational deployment technique, it is not the only one. Well defined manual decision-making approaches that integrate predictive analytics
can be supported in operational systems. Many organizations find this more palatable and evolve gradually towards increased automation.

Figure 2: Different integration approaches and their impact

The increased impact of predictive analytics, and increased awareness of this impact, leads to increased demand for both technology and resources. The range of available tools has increased dramatically, as has the number of training courses and academic programs for producing data scientists. As in other industries, this explosion of interest has put a premium on standard approaches that will allow the ecosystem to expand to meet demand. In particular, the open source analytic language R has come to be an essential part of the predictive analytic landscape. Meanwhile another trend has increasingly merged with the growth of predictive analytics: Big Data. Driven by increased digitization and the Internet, the amount of data available, the range of data types and the speed at which that data arrives have all increased. Commonly described as the 3 Vs, based on an initial Gartner Group paper, Big Data is about handling the increased Volume, Variety and Velocity of data. While Big Data is associated with many trends, it is increasingly clear that its real value comes from being able to apply it to business decision-making using data mining and predictive analytics. Simply trying to adapt reporting and dashboard infrastructure to show more data is of limited utility. Organizations are finding that to really exploit Big Data they must turn it into analytic insight. One of the primary consequences of this is that the data available for predictive analytics is no longer all structured and it is no longer all stored in traditional databases and data warehouses. Open source technology has evolved to meet this demand by providing a collection of highly scalable approaches to storing and managing data under the Hadoop label.
Consisting primarily of the Hadoop Distributed File System and the MapReduce programming framework, Hadoop has rapidly become the platform of choice for storing the large amount of unstructured or semi-structured data that is available to organizations. The role of this open source stack in the predictive analytics market is evolving rapidly as the need to bring this data into the predictive analytics mainstream has grown.
When those building predictive analytics are asked about using the more unusual data types covered by Big Data, they don't report much usage: most predictive analytic models are still built using structured data. However, when those who have seen a transformative impact from predictive analytics are compared to those who have yet to see any impact, or when those who already have cloud-based solutions deployed are compared to those who don't, there is a significant increase in the importance of Big Data types, as shown in Figure 3.

Figure 3: Impact of experience on Big Data usage

The clear implication is that as more organizations get more experience with predictive analytics, the rate at which these Big Data types are used to build predictive analytic models is likely to grow rapidly. At the same time there has been an explosion of new technologies for data storage, including columnar and in-memory databases as well as Massively Parallel Processing (MPP) appliances. Combined with the growth of Hadoop, this has put a premium on approaches that allow predictive analytics to be built and executed on a wide variety of proprietary and open source platforms. This, combined with the shift away from batch-oriented scoring and towards real-time scoring, has led to a surge in interest in PMML, the Predictive Model Markup Language. This XML format allows a predictive analytic model to be developed in one tool or language, against data stored in a particular format, and easily migrated to a new environment for execution. This paper will examine three core themes:

- The role of R in broadening the predictive analytic ecosystem
- The role of Hadoop in handling Big Data for predictive analytics
- The role of PMML in moving to real-time predictive analytics
Broadening The Analytic Ecosystem With R

Many factors are putting pressure on the predictive analytic ecosystem to expand and grow:

- The use of predictive analytics is rising in companies of every size and in every industry.
- The problems are shifting from those where high cost/high ROI approaches are acceptable, such as risk and fraud, to customer opportunity-facing problems where a different model is required. As smaller companies tackle predictive analytics, this pressure to lower the cost of solutions is increasing.
- Organizations using predictive analytics are using more models now than in the past, with hundreds or thousands of models becoming increasingly common.

The net result of these pressures is increased demand for analytic professionals, data scientists and data miners, as well as demand for cheaper software alternatives. R is being driven by, and driving, this broadening ecosystem.

Introducing R

R is fundamentally an interpreted language for statistical computing and for the graphical display of the results associated with these statistics. Highly extensible, it is available as free and open source software. The core environment provides standard programming capabilities as well as specialized capabilities for data ingestion, data handling, mathematical analysis and visualization. The core contains support for linear and generalized linear models, nonlinear regression, time series, clustering, smoothing and more. The language has been in development and use since 1997, with the 1.0 release coming in 2000; the core is now at release 3.0. New capabilities can be added by creating packages, typically written in the R language itself. Over 5,000 such packages have been added through the open source community.

The Opportunity For R

The biggest opportunity for R is the number of people using it:

- It is widely used in academic programs, creating a pool of new resources every year.
- Because it is free and open source it is widely used in not-for-profit and government projects, further increasing the pool of potential users.
- As more business and technical professionals see data mining and analytics in their future, being freely downloadable makes it appealing as a tool to learn with.

All this creates a large community of users, both new and increasingly experienced.
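The core statistical capabilities mentioned under Introducing R, linear models in particular, can be illustrated outside R as well. As a minimal sketch (written in Python, since the paper itself contains no code; all names are illustrative), this is the closed-form calculation behind a one-predictor linear model of the kind R's lm(y ~ x) fits:

```python
# Closed-form ordinary least squares for one predictor: the computation
# behind a simple linear model such as R's lm(y ~ x).
# Illustrative sketch only; not a library API.

def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# A tiny noise-free example: y = 2x + 1, so OLS recovers it exactly.
intercept, slope = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # -> 1.0 2.0
```

The same few lines hint at why R's extensibility matters: anything beyond this one-variable case (multiple predictors, regularization, diagnostics) is exactly what the core language and its packages provide ready-made.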
Furthermore, the rate of increase in the adoption of R as an analytic model development language has really taken off. It has risen steadily in usage in the Rexer Analytics survey every year since the survey first started asking about it: 70% of respondents now report using it, while 24% say it is their primary tool. Because R is open and designed to be extensible, the number of algorithms available for it is huge. Over 5,300 packages have been developed that extend R in some way, and it is hard to imagine an algorithm that is not available for R. While proprietary platforms may have algorithms with unique features or better performance characteristics, the likelihood is that R will have something that can solve the same problem. R also emphasizes cross-vendor compatibility through broad support for the PMML standard (see Moving to Real-Time with PMML below). Related to this huge body of existing material is the fact that R has a large and growing research community. Those developing new algorithms and new ways to process data in universities and research institutes are far more likely to do so in R than in any other language. This creates a pipeline of new ideas and new approaches that feeds continually into the R community. The opportunity for R is clear: using R gives an organization access to a pipeline of research, a huge body of proven algorithms and an active community, while making it possible to recruit at every level of experience from new college graduates to experienced data miners. R has broadened the analytic ecosystem and continues to do so.

Challenges With R

It can seem sometimes that the future of analytics lies entirely with R, and that there is no opportunity for any non-R based approaches going forward. From the perspective of commercial predictive analytics, however, R has a number of challenges. First, R is an open source project.
While this makes it freely accessible, it also means that projects can get started, and even commit a company to R's use, without the support they need to succeed. As with most successful open source projects, this challenge has been mitigated by the rise of companies that provide commercial support and training services around the product. Parallelism, scalability and performance are also an issue, particularly for the base algorithms. This is mitigated by some of the additional packages that have been developed, but it remains a challenge given that one of the benefits of R is the constant supply of new packages, which may or may not be scalable. Commercial vendors are mitigating this by providing their own implementations that take R functions and execute them in-database or in-Hadoop as well as on massively parallel infrastructures or appliances. Tooling is also an issue, with the basic R environment being script-based. While many data miners like to be able to write code, script management and reuse are a
problem for R scripts, as they are with any programming language. Commercial implementations are integrating R into integrated development environments (IDEs) so that less technical users can develop scripts and to improve the management of reuse. Improved editors, often integrated with source code control systems and testing environments, are likewise improving development practices. It can be challenging to deploy R on a production system: the base core package has a large tree of dependencies that must be installed with it. The dependent packages include compiler-related code as well as packages that are not generally required for any other purpose on back-end servers. R deployments can also involve compiled third-party packages that have other dependencies. This can be difficult to manage when the analytic team does not have full control over the analytic deployment and must request all this from IT. The use of PMML to manage deployment, as discussed in Moving to Real-Time with PMML below, can mitigate this for both batch and real-time deployments. Finally, batch scoring of models is an issue, as much of R's strength comes from its use in batch updates of databases with scores. The increasing use of real-time scoring puts pressure on models developed in R as a result. This is addressed partly by growing support for PMML in the R community (see Moving to Real-Time with PMML) and by vendor support for real-time scoring services that can be created from R models.

Recommendations For R

Commercial, not just academic, use of R is increasingly practical and well proven. Organizations should make R part of their predictive analytics adoption and roll-out strategy. This might be to focus all analytic development on R, or to take what is an increasingly common approach and mix and match R with proprietary environments, especially where performance, scalability and/or support are of prime concern.
Even for organizations already committed to a commercial platform it makes sense to take advantage of R at some level, and organizations should explore their platform's support for integrating R. Whether R is the core of their strategy or just a part, organizations should plan on working with a commercial vendor that has a solid plan for R in terms of providing scalable implementations of the algorithms they care about. Organizations should also look for either a better development environment than the default or integration with graphical modeling tools, especially if they want to include more non-data scientists in the process. Make sure also to investigate deployment options, though this can be mitigated by adopting technology that supports PMML.
Managing Big Data with Hadoop

The era of Big Data has arrived. As anyone who reads the IT or business press knows, there is more data now than ever before. Dealing with that data, and getting value from it, is top of mind in commercial, government and not-for-profit organizations of every size. Big Data is best described using the 3 Vs first identified by Gartner Group:

- Volume: there is simply more data available today than in the past, as more of our lives are digitized and as organizations' digital history reaches back further.
- Variety: data is no longer just structured data suitable for storing in a relational database. Unstructured and semi-structured data such as emails, sensor logs and text, as well as audio and video, are becoming increasingly common.
- Velocity: this data is becoming available, arriving at organizations, and changing more rapidly than ever. Last week's data, even yesterday's data, is increasingly being complemented by today's data or even live streaming data.

These three challenges are putting pressure on traditional data infrastructures. Simply trying to expand databases or data warehouses (and their associated processes) to cope is difficult and expensive. Hadoop has emerged as a key technology for dealing with Big Data.

Introducing Hadoop

Hadoop consists of two core elements: the Hadoop Distributed File System, or HDFS, and the MapReduce programming framework. An open source project, Hadoop development started in 2004, inspired by earlier work at Google, and it became an official top-level Apache project in 2008. HDFS is a highly fault-tolerant distributed file system that runs on low-cost commodity hardware, allowing very large amounts of data to be stored cheaply. The data uses a file metaphor, so the structure of the data needs to be specified only at read time rather than at write time. HDFS maintains multiple copies of the data, and the software handles management tasks such as keeping enough copies on different nodes and responding to node failure.
Name nodes and data nodes run on dedicated machines and communicate over TCP/IP. MapReduce is a programming framework that breaks large data processing problems into pieces so they can be executed in parallel on many machines, close to the data they need to process. First the input data is broken up into pieces. Then the programming logic runs as multiple parallel processes on the nodes that store their share of the data. This Map step creates key-value pairs that are then shuffled to group pairs with the same key, so they can be processed (Reduced) into an answer by aggregating all the pairs with the same key in some way. Solving a given problem may require multiple rounds of MapReduce coding, especially when the function being performed is complex.

The Opportunity For Hadoop

Hadoop provides a distributed, robust, fault-tolerant data storage and manipulation environment that is well suited to the challenges of Big Data:

- The use of commodity hardware allows it to scale at low cost and so handle the Volume involved.
- Because a file metaphor is used, allowing the schema of data to be applied only when it is read (rather than being pre-determined when it is written to storage), Hadoop is flexible enough to handle the wide Variety of data involved.
- The storage and processing are streaming-centric, in that they assume data will be continuously added, and this enables the environment to handle the Velocity involved.

Some newer organizations, web 2.0 companies for instance, are using a pure Hadoop strategy and putting all their data in Hadoop. More common is a mixed database/data warehouse/Hadoop approach. Several distinct use cases are emerging for Hadoop in such a mixed environment:

- Hadoop is used as a landing zone for data, where it can be pre-processed (cleaned, aggregated) before being moved to a data warehouse. This can include analytic processing using algorithms that run in MapReduce. This approach allows new data sources to be rapidly added to an existing environment.
- Hadoop is used as an active archive, where data used less often is moved out of the data warehouse and onto Hadoop. This allows older data that might have been archived inaccessibly in the past to be available for analysis, for example to build predictive models or forecasts where several years of data might be useful.
- Hadoop is used somewhat standalone for rapid data exploration. It is easy to bring data in from many sources (including from databases or data warehouses), and various tools exist to explore the combined data to see what might be learned.
What is learned might be applied to data in Hadoop going forward, or used to determine what new data might be usefully integrated into a more traditional environment.

The explosive growth of the Hadoop community in recent years means that this open source project is supported by several commercial organizations that provide training, supported distributions and consulting. In addition, database and data warehouse vendors have increasingly integrated their storage with Hadoop, while appliance vendors are delivering Hadoop-based appliances that have integrated security and management capabilities.
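The Map, shuffle, Reduce flow described under Introducing Hadoop can be simulated in a few lines. This single-process sketch (Python here; real Hadoop jobs are typically written in Java and run distributed across nodes) counts words, the canonical MapReduce example:

```python
from collections import defaultdict

# Single-process simulation of the MapReduce flow: Map emits key-value
# pairs, shuffle groups them by key, Reduce aggregates each group.

def map_phase(lines):
    # Map: emit a (key, value) pair per word in each input split.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key (Hadoop does this across the cluster).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a single answer.
    return {key: sum(values) for key, values in groups.items()}

data = ["big data big models", "big data"]
counts = reduce_phase(shuffle(map_phase(data)))
print(counts)  # -> {'big': 3, 'data': 2, 'models': 1}
```

In a real cluster the same three steps run in parallel, with the Map processes placed on the nodes that already hold their slice of the data, which is what lets the framework scale with data Volume.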
Challenges With Hadoop

Hadoop has many features that make it appealing in an era of Big Data, but it has its challenges too. Like R, Hadoop is an open source project, so projects can get started, and even become central to a company's data strategy, without the support they really need. This has been mitigated for Hadoop, as it has for R and other successful open source projects, by companies providing commercial support and training services around Hadoop. In addition, Hadoop has some technical challenges:

- Hadoop is a programmer-centric environment, and there is no support for SQL in the base environment, making it difficult to use existing analytic and development tools against data stored in Hadoop. Additional projects such as Hive and, more recently, Cloudera Impala have added support for SQL, while open source frameworks allow more traditional development approaches to be used to build data processing applications by abstracting the MapReduce programming framework and allowing developers to write SQL or Java.
- Hadoop structures are better at batch processing than at interactive systems. A Hadoop/MapReduce approach will handle great scale in batch processing but is less effective at processing a single record. Most use cases for Hadoop focus on its ability to process large numbers of records in parallel rather than on, for instance, being able to retrieve the information about a single customer to drive a next-best-action script for an interaction with that customer.
- From a decision management perspective Hadoop also lacks any specific data mining or predictive analytics support. Combined with the lack of support for SQL, this means most advanced analytic tools won't work on data stored in Hadoop. This has been largely addressed, however, by the rapid increase in the number of analytic companies that offer direct access to data in HDFS.
In addition, several companies offer data mining and analytic algorithms that can execute as MapReduce jobs, and there is growing support for scoring HDFS data by applying analytic models defined in PMML. Scoring against Hadoop means writing MapReduce jobs that pre-process, score and post-process the data. Writing a model as MapReduce code takes significant time and effort, delaying implementation and increasing cost. Better development approaches and improved vendor support mitigate this. Various vendors also allow analytic models, in PMML or other formats, to be executed in a Hadoop environment for training and for scoring, though this too is batch-centric.
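The pre-process, score, post-process pattern described above can be sketched as the map side of such a job. Everything here is hypothetical, the model coefficients, field names and logistic form included; it only illustrates the shape of map-side scoring, where each mapper applies the model to the records in its data split:

```python
import math

# Sketch of map-side scoring: each mapper applies the model to the
# records in its split. Model, coefficients and fields are hypothetical.

COEFFS = {"intercept": -2.0, "visits": 0.3, "complaints": 1.1}

def preprocess(record):
    # Pre-process: coerce the model's input fields, defaulting missing
    # values to zero.
    return {f: float(record.get(f, 0)) for f in ("visits", "complaints")}

def score(features):
    # Score: a logistic regression, a model type PMML commonly carries.
    z = COEFFS["intercept"] + sum(COEFFS[f] * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def map_score(record):
    # Post-process: emit (customer id, rounded score) as the map output.
    return (record["id"], round(score(preprocess(record)), 3))

records = [{"id": "c1", "visits": 10, "complaints": 0},
           {"id": "c2", "visits": 1, "complaints": 2}]
print([map_score(r) for r in records])  # -> [('c1', 0.731), ('c2', 0.622)]
```

Hand-writing this per model is exactly the cost the paper describes; a PMML-aware execution engine replaces the custom scoring code with a model file the engine can run directly.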
Recommendations For Hadoop

The biggest mistake organizations make with Hadoop is beginning by focusing on the technology. Hadoop has a lot of potential for companies adopting predictive analytics, but it must be applied in context. Instead of beginning with a Hadoop installation and loading data into it, begin with a business problem: a decision that must be made. By determining the analytics that will be required to make that decision more accurately or more precisely, organizations can see what kind of data will be required and on what terms. This will lead naturally to identifying needed data that exists already in the traditional data infrastructure, as well as potentially useful data that does not. This creates a use case for Hadoop, as it identifies a business problem that requires data not already available in the existing infrastructure. For organizations that are familiar with and engaged in open source, Hadoop should be no different from other technologies they have adopted. Where organizations lack this familiarity, they should consider one of the commercial organizations that support Hadoop, provide a distribution that includes the full range of capabilities, and have plenty of experience. This might be a Hadoop pure play or an existing vendor offering Hadoop services. Finally, once Hadoop becomes part of an organization's data infrastructure, it is important that it is supported by the rest of the decision management infrastructure. The analytic tools used must be able to access data in HDFS, while any in-database approach should be extensible to in-Hadoop too. It is possible to create Hadoop-sized problems by focusing on batch scoring. If an organization has many customers to score, and thinks in terms of batch scoring in which every customer must be scored every day, then this can sound like a Hadoop problem.
However, the scoring could be done on demand, in real time, for the much smaller number of customers who interact with the company on a typical day. This real-time scoring problem might be much smaller in scale and so not demand a Hadoop infrastructure.
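The sizing argument above can be made concrete with back-of-envelope numbers. All figures here are hypothetical, chosen only to show how much smaller the on-demand workload can be:

```python
# Back-of-envelope comparison (all numbers hypothetical): scoring the
# whole customer base nightly versus scoring on demand at interaction time.

customers = 10_000_000        # total customer base
daily_interactions = 150_000  # customers actually seen on a typical day

batch_scores_per_day = customers          # batch scores everyone, every day
on_demand_scores_per_day = daily_interactions  # on-demand scores only the active

ratio = batch_scores_per_day / on_demand_scores_per_day
print(f"Batch computes {ratio:.0f}x more scores than on-demand")  # -> 67x
```

At ratios like this, what looked like a Hadoop-scale nightly job becomes a modest real-time scoring service.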
Moving to Real-Time with PMML

In the past most analytic models were built and applied with a batch mindset. Once a model had been designed and tested it was applied to all the customers, all the claims or all the products in the database. This scoring step might be repeated every day or every week so that a reasonably up-to-date score was always available in the database. In recent years this approach has been challenged. Increasingly the focus for companies is on scoring a transaction or a customer right when a decision is required: if an offer is to be made, then the customer churn score, for instance, is calculated as part of determining the offer to be made. The rise of Decision Management as an approach for handling real-time decisions has only increased this focus. This move to real-time scoring of predictive analytics has put pressure on many traditional analytic environments. When scoring was done in batch it was generally done using the same technology as was used to build the model: once the final script was ready it could be used to process all the records, and these could then be loaded into the marketing system or stored back into the database. With real-time scoring this becomes impractical. Not only are these analytic environments batch-oriented, they are often only loosely attached to the production environment. It has become essential to be able to move models from their development environment to a more real-time, interactive scoring environment. Many platforms have added such a capability. Some organizations want to use a different production environment: they need a way to move models, built in a variety of analytic tools, into their production environments, business rules management systems and so on. PMML has emerged as the primary way to do this. It should be noted that there is nothing inherently real-time about PMML; PMML can be, and is, used in batch scoring scenarios too.
The focus of the standard is on interoperability and on replacing custom code when deploying models. This interoperability is a key benefit of PMML, as noted below, independent of an organization's need to move to real-time scoring.

Introducing PMML

PMML is an XML standard for the interchange of predictive analytic models, developed by the Data Mining Group. The basic structure is an XML format document that contains:

- A header
- A data dictionary containing both continuous and categorical variables
- Data transformations
- One or more models, each consisting of a mining schema based on the type of model, a target, and other outputs such as measures of accuracy

PMML started in 1998 with version 0.7, moving to a 1.0 release soon after. Since then the standard has seen multiple releases, with 4.1 being the most recent (in 2011). The 4.x releases marked a major milestone with support for pre- and post-processing, time series, explanations and ensembles. Support for PMML is widespread and growing, with an active community. Thousands of analytic professionals participate in the Data Mining Group's LinkedIn group and many analytic vendors are either members or provide support. Members of the PMML consortium include IBM, MicroStrategy, SAS, Actian, Experian, Zementis, Equifax, FICO, KNIME, Open Data Group, RapidMiner, Togaware Pty Ltd, Angoss, KXEN, Microsoft, Oracle, Portrait Software, Prudsys, Salford Systems, SAP, StatSoft, and Tibco. In addition, organizations including BigML, Predixion, Revolution Analytics and Teradata, as well as open source projects such as Weka and R, also provide support for the standard.

The Opportunity For PMML

PMML offers an open, standards-based approach to operationalizing predictive analytics. This is a critical need for organizations looking to maximize the value of predictive analytics: unless predictive analytic models can be effectively operationalized, injected into operational systems, the danger is that they will sit on the shelf and add no value. Similarly, if it takes too long to operationalize them, if it takes weeks or months, then the value of the model will degrade even as the cost to implement it rises. As the results of a model are increasingly needed in real time, this is a critical need for organizations. Support for PMML is increasingly broad-based:

- Scoring servers and execution engines running on a wide range of production environments can bring in PMML models and execute them to provide a real-time score.
- Business Rules Management Systems can import PMML models and either execute them as models or translate them into business rules that can be executed as part of a rules-based decision.
- Databases and data warehouses can execute PMML models in-database, allowing scores to be applied without moving the data from the operational datastore in which it resides.
- PMML models can be executed on Hadoop nodes, e.g. through Hive using a SQL-like approach, or by being imported into commercial execution engines that provide in-Hadoop execution.
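The structure described under Introducing PMML (header, data dictionary, model with mining schema) is what makes all of these deployment options possible: any consumer that understands the markup can execute the model. Below is a deliberately simplified, PMML-style document for a one-variable regression, built and consumed with Python's standard XML library; real PMML files declare an XML namespace and considerably more detail, and the field name and coefficients here are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A simplified PMML-style document for a one-variable linear regression.
# Real PMML adds a namespace declaration, fuller headers and more; this
# sketch keeps only the structural skeleton the paper describes.
PMML_DOC = """
<PMML version="4.1">
  <Header description="toy model"/>
  <DataDictionary>
    <DataField name="income" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="income"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="income" coefficient="0.02"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

def score(pmml_text, row):
    # A consumer needs no knowledge of the tool that built the model:
    # it reads the intercept and coefficients and applies them to the row.
    root = ET.fromstring(pmml_text)
    table = root.find("./RegressionModel/RegressionTable")
    result = float(table.get("intercept"))
    for pred in table.findall("NumericPredictor"):
        result += float(pred.get("coefficient")) * row[pred.get("name")]
    return result

print(round(score(PMML_DOC, {"income": 100.0}), 6))  # -> 2.5
```

The point of the standard is precisely this separation: the document above could have been produced by R, SAS or any other exporter, and the scoring side neither knows nor cares.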
In addition, it is possible to import PMML models into many analytic development environments, allowing new models or ensembles to be based on models built in other tools. Some model management environments also support PMML, allowing their model reporting and monitoring capabilities to be applied to a PMML model too. This wide range of deployment options for PMML models also means that organizations can relax their concerns about multiple development environments. If models can be generated as PMML, and that PMML can be executed widely, then it is possible to create an environment in which models are developed with any analytic tool and run anywhere. As deployment and execution become more focused on operational environments, and less tied to the model development environment, this is becoming an important capability for analytic organizations.

Challenges For PMML

The primary challenge for PMML, as for any standard, is to get the vendor community to regard support for it as more than just a check-the-box capability. Once a standard becomes established, as PMML has, organizations will ask vendors about their support for it. However, it is easy for this support to be focused on checking the box on the RFP and nothing more. The result is minimalist implementations that do not provide the depth of support required for a real project. This has been a challenge for PMML in the past, but the market's adoption has recently reached a tipping point where organizations are relying on PMML in critical production environments. Large organizations, where heterogeneous environments are the norm, realize the benefits of an open, vendor-neutral standard. These customers are demanding PMML support from their suppliers, which in turn is putting a great deal of pressure on vendors to provide solid support. Standards such as PMML also struggle to get vendors to stay current and support the latest release.
For PMML this is particularly an issue for the support in PMML 4.x of pre- and post-processing. According to the Data Mining Group's list of supporting vendors, only about half support PMML 4.x, for instance. When an organization has different parts of its ecosystem supporting different versions of a standard, this can undermine the value proposition for that standard. For PMML this is somewhat mitigated by the availability of tools for converting between versions. In addition many organizations remain uncertain how best to handle pre- and post-processing. Many do not actually want their analytic platform to handle this automatically, for fear that the way the analytic team designed it will not match the way the production environment can best support it. Finally, not everything that an analytic team might wish to do is supported in PMML. Most mainstream model types are covered by PMML, but new, research-type algorithms may not be. Projects may therefore have to choose between using an alternative algorithm or losing the ability to generate a PMML model for their project. Some vendors with specific additional functionality in their models try to extend PMML to avoid this problem for their customers, but this of course also means that their products must be used at both ends, as no other vendor will understand their extensions.

All of these challenges are typical for a standard and, as PMML continues to build support in the user community and as more organizations commit to it, these problems will typically be mitigated. PMML is in the fortunate position of being the only standard for predictive models that is widely accepted and supported across commercial and open source tools. With no real competition and broad support, the long-term benefits of adopting PMML seem likely to greatly outweigh the minor challenges in adopting the standard.

Recommendations For PMML

All organizations approaching predictive analytics should include PMML in their list of requirements for products. Selecting analytic tools that do a good job of generating and consuming PMML, and identifying operational platforms that can consume and execute PMML, just makes sense. While organizations committed to a single-vendor stack may be able to avoid this requirement, even there the ability to bring models developed by a consortium or third party into that environment may well prove critical, while partners may need to execute models but not share the same vendor stack.
Future Developments

R, PMML and Hadoop are all well-established standards that can and should be part of a predictive analytics strategy. There are also some future developments that are worth considering: the emergence of the Decision Model and Notation standard, growing acceptance of Hadoop 2 and planned updates to PMML.

DMN

The Object Management Group recently accepted a new standard, the Decision Model and Notation standard. DMN, as it is known, is now in beta and is expected to be finalized in 2014. DMN provides a common modeling notation, understandable by both business and technical users, that allows decision-making approaches to be precisely defined. These decision models can include the specification of detailed decision logic and can model the role of predictive analytics both at a requirements level and at an implementation level, through the inclusion of PMML models as functions. DMN will allow the modeling of automated and manual decisions in a way that shows how business rules and predictive analytics are being combined. It can also be used to describe the requirements for predictive analytic models.

Hadoop 2.x

Technically this is Apache Hadoop 2.2, and it was released in October of 2013. It is considered a future development in this paper because most Hadoop users are not using it yet. Hadoop 2.x is really all about YARN, a resource management system that manages load across Hadoop nodes and allows approaches other than MapReduce to be used. This allows Hadoop resources to be shared across jobs using different approaches and, in particular, will allow more real-time processing. For instance, new engines that support SQL, stream processing and graph processing can be, and are being, developed and integrated into the Hadoop stack using this approach. Other changes include better support for federation.
PMML

PMML Release 4.2 is expected to be released in the first half of 2014. As with 4.1, release 4.2 is expected to improve support for post-processing, model types and model elements. 4.2 is particularly focused on improving support for predictive scorecards (especially those with complex partial scores), adding regular expressions as built-in functions, and continuing to expand support for different data types, such as continuous input fields in Naïve Bayes models. In addition the DMG continues to consider ways to make PMML more adaptable and relevant to its growing community of users.
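To make the scorecard discussion concrete, a skeletal PMML Scorecard is sketched below: each characteristic contributes a partial score depending on which predicate matches, and the partial scores are summed with the initial score. The field name, bands and score values are invented for illustration.

```xml
<Scorecard modelName="example_risk_scorecard" functionName="regression"
           initialScore="100" useReasonCodes="false">
  <MiningSchema>
    <MiningField name="age"/>
  </MiningSchema>
  <Characteristics>
    <Characteristic name="age_band">
      <!-- constant partial score per band; a record with age 25 would
           score 100 + 25 = 125 -->
      <Attribute partialScore="25">
        <SimplePredicate field="age" operator="lessThan" value="30"/>
      </Attribute>
      <Attribute partialScore="50">
        <SimplePredicate field="age" operator="greaterOrEqual" value="30"/>
      </Attribute>
    </Characteristic>
  </Characteristics>
</Scorecard>
```

The complex partial scores mentioned above extend this pattern so that a partial score can be computed by an expression rather than fixed as a constant.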
Conclusions

These standards (R, Hadoop and PMML) are proven tools for broadening the use of predictive analytics, supporting Big Data and operationalizing analytics. All three should be part of an organization's predictive analytic strategy.

The use of R to develop at least some of an organization's predictive analytics offers clear benefits in terms of staff acquisition and retention. It also plugs an organization into an innovation pipeline. Working with a commercial vendor that supports R provides the scalable implementations, development and deployment tools, and support that will be required.

Big Data is going to be increasingly important in predictive analytics as a complement to traditional data types. The use of Hadoop to store some of this data makes sense. Making sure this data storage is integrated with existing data infrastructure, and working with commercial vendors that can provide support, is essential, as is ensuring that this data can be accessed for both developing and deploying predictive analytic models.

Finally, all organizations approaching predictive analytics should include PMML in their strategy. Analytic tools and operational infrastructure that support PMML should be prioritized. While organizations should not always be constrained to models supported by PMML, it is generally both a good idea to do so and not much of a limitation.

The future of these standards, and of new standards like the Decision Model and Notation, is bright. Increasing vendor support and growing awareness mean that organizations will have excellent choices that support these standards, allowing them to benefit from a broader pool of resources, work better with service providers and more easily integrate the products they want to use.

The following content is provided by our sponsors, Revolution Analytics and Zementis.
Revolution Analytics

Champions of an Emerging Programming Standard: Revolution Analytics Delivers R for Big Data Analytics in the Enterprise

The open source statistical programming language R is in use by over 2 million people worldwide. It is supported by a vibrant and talented community with over 5,000 published packages. R continues to grow in use and popularity across academia and multiple industries. Its use in corporate environments often started with an isolated group that would develop and test hypotheses before having to re-code their work for deployment on a legacy platform for production. However, with the adoption of R by some of the largest and most recognizable names in the technology industry, including Google and Facebook, R has emerged as a de facto standard for statistical programming and production deployment in diverse data environments.

Revolution Analytics has helped organizations pave the way to the adoption of R as a programming standard through its initial offerings of support and consulting. Today, with its Revolution R Enterprise 7 (RRE 7) software platform, the company delivers a fully tested and supported version of R that scales beyond its open source origins to deliver fully parallelized deployment of statistical algorithms.

As companies augment and update their data architectures with new technologies, including enterprise data warehouse appliances, Hadoop, cloud storage and real-time streaming, the potential for latency, data loss and compromised security increases. RRE 7 allows companies to use their data without having to move it or being restricted to using only a small sample of it.
Write Once, Deploy Anywhere

The RRE 7 stack is purpose-built to deliver high-performance, Big Data-ready analytics to any data platform so you can:

- Eliminate out-of-memory limitations
- Analyze data without moving it
- Build models using large data sets
- Build models using ensemble techniques
- Deploy models without recoding

RRE 7 empowers data scientists to build predictive models that perform on the platform where the data is stored. This ensures that investments in data architecture are protected.

RRE 7: The Big Data, Big Analytics Platform

As Revolution Analytics continues to deliver better performance for a growing number of the most used and effective models, RRE 7 also delivers these powerful packages to a broader, less technical audience. Integration with BI and visualization platforms, including market leaders Tableau and Alteryx, puts the power and flexibility of R directly into the hands of business managers and analysts.

For more information on R and Revolution R Enterprise, please visit the Revolution Analytics website.
Zementis

PMML, the Predictive Model Markup Language, allows predictive models to be easily moved into production and operationally deployed on-site, in the cloud, in-database or on Hadoop. Zementis offers a range of products that enable the deployment of predictive solutions and data mining models built in IBM SPSS, SAS, StatSoft STATISTICA, KNIME, KXEN, R, etc. Our products include the ADAPA Scoring Engine and the Universal PMML Plug-in (UPPI).

SOLUTIONS FOR REAL-TIME SCORING AND BIG DATA

ADAPA, named after the Babylonian god of wisdom, is the first PMML-based, real-time predictive scoring engine available on the market, and the first scoring engine accessible on the Amazon Cloud as a service. ADAPA on the Cloud combines the benefits of Software as a Service (SaaS) with the scalability of cloud computing. ADAPA is also available as a traditional software license for deployment on site.

As even the god of wisdom knows, not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in these cases is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace.

The Universal PMML Plug-in (UPPI) is the Zementis solution for Hadoop and in-database scoring. UPPI is available for IBM PureData for Analytics (powered by Netezza), SAP Sybase IQ, Pivotal/Greenplum, Teradata and Teradata Aster. It is also available for Hive/Hadoop and Datameer.
BROAD SUPPORT FOR PREDICTIVE ANALYTICS AND PMML

ADAPA and UPPI consume model files that conform to the PMML standard, versions 2.0 through 4.1. If your model development environment exports an older version of PMML, our products will automatically convert your file into a 4.1-compliant format. Supporting an extensive collection of statistical and data mining algorithms, ADAPA and UPPI deliver on the promise of a universal deployment capability that is vendor-neutral and platform-independent.

"By working with Zementis, a PMML innovator, we are able to offer a vendor-agnostic solution for efficiently moving enterprise-level predictive analytics into the FICO Analytic Cloud. Customers, application developers and FICO partners will be able to extract value and insight from their predictive models and data right away, using open standards. This will result in quicker time to innovation and value on their analytic applications."
Stuart Wells, Chief Technology Officer, FICO

"By partnering with Zementis, we are able to offer high-performance, enterprise-level predictive analytics scoring for the major analytics tools that support PMML. With Zementis and PMML, the de facto standard for representing data mining models, we are eliminating the need for customers to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today."
Chris Twogood, VP Product & Services Marketing, Teradata

Contact us today!
Zementis, Inc.
Carmel Mountain Road, Suite 300
San Diego, CA
T: x2000
Visit us on the web, follow us, or send us an email at [email protected]
About the Research

Research was conducted by James Taylor of Decision Management Solutions in 2014. The opinions expressed are those of the author and not necessarily of the sponsors of the research or of the groups responsible for these standards. We would like to acknowledge our research sponsors, without whom this research would not have been possible: Revolution Analytics, Zementis and the Data Mining Group.

References

Apache Hadoop.
Guazzelli, Alex; Lin, Wen-Ching; Jena, Tridivesh. PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics (2nd Ed.). CreateSpace.
Rexer, Karl. Rexer Analytics 2013 Data Miner Survey.
Taylor, James. Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics. New York, NY: IBM Press.
Taylor, James. Decision Management Systems Platform Technologies Report. Palo Alto, CA: Decision Management Solutions.
Taylor, James. Predictive Analytics in the Cloud: Opportunities, Trends and the Impact of Big Data. Palo Alto, CA: Decision Management Solutions.
The Data Mining Group.
The R Project for Statistical Computing.

Contact Us

If you have any questions about Decision Management Solutions or would like to discuss engaging us, we would love to hear from you. Email works best but feel free to use any of the methods below.
Email: [email protected]
Phone:
Web:
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
6 Steps to Faster Data Blending Using Your Data Warehouse
6 Steps to Faster Data Blending Using Your Data Warehouse Self-Service Data Blending and Analytics Dynamic market conditions require companies to be agile and decision making to be quick meaning the days
COULD VS. SHOULD: BALANCING BIG DATA AND ANALYTICS TECHNOLOGY WITH PRACTICAL OUTCOMES
COULD VS. SHOULD: BALANCING BIG DATA AND ANALYTICS TECHNOLOGY The business world is abuzz with the potential of data. In fact, most businesses have so much data that it is difficult for them to process
VIEWPOINT. High Performance Analytics. Industry Context and Trends
VIEWPOINT High Performance Analytics Industry Context and Trends In the digital age of social media and connected devices, enterprises have a plethora of data that they can mine, to discover hidden correlations
Virtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
How to leverage SAP HANA for fast ROI and business advantage 5 STEPS. to success. with SAP HANA. Unleashing the value of HANA
How to leverage SAP HANA for fast ROI and business advantage 5 STEPS to success with SAP HANA Unleashing the value of HANA 5 steps to success with SAP HANA How to leverage SAP HANA for fast ROI and business
Microsoft Big Data. Solution Brief
Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve
BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE
BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE Current technology for Big Data allows organizations to dramatically improve return on investment (ROI) from their existing data warehouse environment.
OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT
WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve
A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY
A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY Analytics for Enterprise Data Warehouse Management and Optimization Executive Summary Successful enterprise data management is an important initiative for growing
Tap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
BIG DATA-AS-A-SERVICE
White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
Extend your analytic capabilities with SAP Predictive Analysis
September 9 11, 2013 Anaheim, California Extend your analytic capabilities with SAP Predictive Analysis Charles Gadalla Learning Points Advanced analytics strategy at SAP Simplifying predictive analytics
Ubuntu and Hadoop: the perfect match
WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms
Hadoop Data Hubs and BI. Supporting the migration from siloed reporting and BI to centralized services with Hadoop
Hadoop Data Hubs and BI Supporting the migration from siloed reporting and BI to centralized services with Hadoop John Allen October 2014 Introduction John Allen; computer scientist Background in data
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization
