Managing big data without breaking the bank


By George Gilbert
April 29, 2013

This report was underwritten by RainStor.

Table of contents

Executive summary
A short look at the present and future of big data analytics
The two main application scenarios that form the sweet spot for online analytic archive databases
    The challenge of keeping data accessible when volumes are growing 50 percent to 100 percent per annum
    The column store approach
    The online analytic archive approach
    The two most popular usage scenarios for online analytic archives
Business impact of big data deployment: Major telco case study
Making disk work as the new tape for an online archive: Major financial services institution case study
RainStor's done for archives what Splunk's done for logs
Technology considerations and choices: What RainStor is good for and why
    Traditional SQL DBMS performance constraints not relevant to online analytic archive
    Data compression
    Greater analytic query performance
    Deployment flexibility and administrative efficiency
    Greater ingest speed of new data
Conclusion
About George Gilbert
About GigaOM Research

Executive summary

In the tsunami of experimentation, investment, and deployment of systems that analyze big data, vendors have seemingly been trying approaches at two extremes: either embracing the Hadoop ecosystem or building increasingly sophisticated query capabilities into database management system (DBMS) engines.

At one end of the spectrum, the scale-out Hadoop distributed file system (HDFS) has become a way to collect volumes and types of data on commodity servers and storage that would otherwise overwhelm traditional enterprise data warehouses (EDWs). The Hadoop ecosystem has a variety of ways to query data in HDFS, with SQL-based approaches growing in variety and maturity.

At the other end of the spectrum are both traditional and NewSQL DBMS vendors, with IBM, Microsoft, and Oracle among the former and Greenplum, Vertica, Teradata Aster, and many others emerging among the latter. What all these seem to have in common is unprecedented innovation and growth in analytic query sophistication. Accessing tables stored on disks organized in rows via SQL is no longer enough. Vendors have been adding the equivalent of new DBMS engine plug-ins, including in-memory caches for performance, column storage for data compression and faster queries, advanced statistical analysis, and even machine learning technology. While the NewSQL vendors have introduced much lower price points than the traditional vendors, as well as greater flexibility in using commodity storage, they haven't made quite as much progress on shrinking the growth in storage hardware required relative to the growth in data volumes.

For some use cases, there appears to be room for a third approach that lies between the extremes and borrows from the best of each. RainStor in particular, and the databases focusing on column storage more generally, have carved out a very sizable set of data storage and analytics scenarios that have been mostly ignored. Much of the explosion in data volumes that needs to be analyzed doesn't need to be updated. In other words, it can be stored as an archive in a deeply compressed format while still remaining online for query and analysis. Databases with column store technology, such as Vertica and Greenplum, have taken important steps in this direction, and the incumbent vendors are also making progress in offering this as an option. In early April, IBM announced a statement of direction in which it intends to add in-memory and column store technology to DB2. Organizing data storage in columns makes it easier to compress, typically by a factor of four to six, with IBM claiming a factor of 10 in its statement of direction. Column stores can accelerate queries by scanning just the relevant, and now smaller, columns in parallel on multiple CPU cores.

But the storage and database engine overhead of mediating potentially simultaneous updates to and reads from the data still remains. In other words, the column stores are a better data warehouse. They are not optimized to serve as archives, however. An online archive can compress its data by a factor of 30 to 40 because it will never have to be decompressed for updates; new data only gets appended. Without the need to support updates, it's much easier to ingest new data at very high speed, and without the need to mediate updates, it's much easier to distribute the data on clusters of low-cost storage.

This paper is written for two audiences. One is the business buyer who is evaluating databases and trying to reconcile data volumes growing at 50 percent to 100 percent per annum with an IT budget growing in the single digits. Of particular value to this audience are the generic use cases and the customer case studies, since they show how others are tackling the problem. Some of the world's largest banks and telcos are already putting these solutions into production. Also relevant is the price comparison with Oracle Exadata, which shows not just the capital cost of a traditional data warehouse solution but also the hidden running costs.

The other audience is the IT infrastructure technologist who is tasked with evaluating the proliferation of database technologies. For this audience, the more technical sections of the paper will be valuable. These sections focus on the different technology approaches to creating online analytic databases. The paper will compare mainstream data warehouse technologies, and column stores in particular, with a database that focuses more narrowly on serving as an online analytic archive. In order to use a concrete example of an existing analytic archive, the paper will explain how RainStor's database works.

When purpose-built, an online analytic archive can achieve as much as an order of magnitude in data storage savings relative to traditional solutions, run on commodity clusters of storage, support standard SQL, ODBC, and business intelligence (BI) tool access, and make more assumptions about administrative requirements so that operational costs are much lower. But when evaluating the two approaches, one has to remember that archives can't refresh existing data with new information; they can only add more recent or older data. They are read-only solutions.

A short look at the present and future of big data analytics

The rapidly growing interest in all things labeled big data is grounded in two major catalysts.

1) Relative to compute and networking, the price and performance of getting data to and from storage has improved exponentially. This is true whether it's through storage area network (SAN)/network-attached storage (NAS) solid-state drive (SSD) tiers or even the commodity clusters of direct-attached storage (DAS) hard disk drives (HDD) that new databases can exploit. That improvement has meant whole new classes of systems could be built cost-effectively and centered on large-scale data capture and analysis.

2) A new branch of data-intensive science has emerged: "informatics" is being appended to just about every science as a new discipline.

Within commercial environments, these changes have shown up in data growth rates of 50 percent to 100 percent per annum, and just about every area in the value chain is becoming more data-intensive and data-driven. The ironic twist unearthed in some of GigaOM's research is that the problem of managing data growth has become so urgent that customers and vendors have been tackling it from two extremes.

At the very high end, the approaches feature advanced scale-out architectures and parallel processing of advanced ways to manipulate data within the database itself, as opposed to using tools once the data is retrieved by the end user. Doing the work in the database makes it possible to process data with much higher performance because it doesn't need to be moved around the network. Some of these new features include user-defined functions, predictive analytics, machine learning, and other algorithms. Accompanying this high sophistication are high software license fees, storage costs that grow at least several times the rate of the IT budget, and hidden operational expenses to make all these pieces fit together and stay well-oiled.

At the other end of the spectrum is the proliferation of technologies in the Hadoop ecosystem. Here customers and vendors are applying the technologies of web-scale consumer online services to more traditional IT challenges. While much lower cost than the high-end approach, the Hadoop ecosystem has required a completely different skill set, particularly when it comes to using standard SQL for interactive data manipulation, though that is beginning to change.

The industry is moving in the direction of combining the best of the high-end databases with the Hadoop ecosystem's support for commodity storage software and hardware infrastructure, but there are still user scenarios that don't need some of the more high-end database features. Instead, these scenarios need to use disk as the new tape. When data volumes move into the petabytes (PB), customers need to store data extremely cost-effectively while leveraging disk's ability to remain online, forever if need be, and support traditional SQL- or BI-driven analytics cost-effectively. This paper will look at several approaches to this problem and contrast them with the way column stores have started to become a popular approach to the problem.

The two main application scenarios that form the sweet spot for online analytic archive databases

The challenge of keeping data accessible when volumes are growing 50 percent to 100 percent per annum

Increasingly, data can't be discarded just because it bumps up against storage capacity, whether for compliance reasons or because of the growing need to have analytic access to all data. Archiving it to tape makes the data just marginally more accessible than the disaster recovery scenario of putting tapes in salt mines, especially if the data's structure has evolved. In the past, the solution was to keep the most recent data, for example the last 12 months, online in a traditional data warehouse; as the data got older, it would be migrated off to progressively less expensive but harder-to-access solutions, with tape archives being the lowest cost but highest volume.

The column store approach

The popular column store databases are relevant solutions. They improve on traditional data warehouses built with OLTP database engines by compressing individual columns instead of tables of rows. It's easier to compress data in columns because all the items stored together are much more likely to be similar or repetitive. As a result, current solutions appear to compress data by a factor of six; IBM's announced DB2 claims it can reach a factor of 10.

The online analytic archive approach

An online analytic archive is an alternate approach: a database designed to bridge the need for online access to current and historical data sets via standard SQL by leveraging data compression of 30 to 40 times relative to raw data. It can achieve higher compression ratios than column stores because it can look for common values not just within individual columns but also across tables within an individual partition of a million rows of data, as the sketch below illustrates. Data can be loaded in a variety of formats, including logs or tabular data from another relational database management system (RDBMS). As an example, RainStor has an online analytic archive database designed for this approach.
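To make the cross-column deduplication idea concrete, here is a deliberately simplified sketch (not RainStor's actual storage format, which is not public in this level of detail): every distinct value in a partition goes into one shared pool, and rows become small integer references into that pool, so a value repeated in any column of any table in the partition is stored only once.

```python
# Simplified illustration of cross-column value deduplication within one partition.
def encode_partition(rows):
    pool, index = [], {}          # shared value pool for the whole partition
    encoded = []
    for row in rows:
        refs = []
        for value in row:
            if value not in index:        # store each distinct value exactly once,
                index[value] = len(pool)  # regardless of which column it came from
                pool.append(value)
            refs.append(index[value])
        encoded.append(refs)
    return pool, encoded

rows = [("NYC", "NYC"), ("NYC", "LON"), ("LON", "NYC")]   # repeats across both columns
pool, encoded = encode_partition(rows)
print(pool)      # ['NYC', 'LON'] -> 2 stored values instead of 6
print(encoded)   # [[0, 0], [0, 1], [1, 0]]
```

A per-column encoder could not share the pool between columns; once the partition is known to be immutable, nothing ever has to be unpacked and re-encoded, which is what makes this depth of compression practical.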

The two most popular usage scenarios for online analytic archives

So far, two mainstream use cases appear uniquely well suited to this approach. A graphical illustration of the scenarios appears in Figure 1 below. The Y-axis captures data recency, with the most recent data at the top. As data ages, it accumulates and drives greater volume on the X-axis.

The first scenario uses an online analytic archive as the primary analytic repository. In this case, the data being loaded is immediately historical and will never change. This could be raw machine or log data, whether call detail records from a switch, network packet flows, or high-volume sensor data. Because the data is continuous and high-velocity, the load volumes are huge. In some situations, the archive may load data from existing applications. Here, entire applications might be decommissioned, but their data is migrated to the online archive for analytic access.

The second scenario uses an online archive to replace tape with an immutable medium that allows for querying and has essentially infinite capacity.

Figure 1. The two most popular usage scenarios for online analytic archives (Source: GigaOM Research)

Economic drivers of big data management and analytics: Managing multiple tiers of price and performance points

(Chart source: GigaOM Research)

Every prospective customer is going to have a different mix of software, hardware, and operational expense requirements. Even so, a snapshot cost comparison is useful if only to show the magnitude of the difference between an online archive such as RainStor and traditional approaches. Traditional enterprise data warehouses (EDWs) or high-end engineered systems are focused on the most demanding and sophisticated analytics.

Oracle and others have been adding capabilities that go well beyond the traditional SQL queries needed for an online analytic archive. Much attention from vendors and industry analysts has been focused on embedding these advanced data manipulation features in the nodes of the DBMS cluster itself for maximum performance. These features include statistical analysis via parallel implementations of the language R, similar implementations of specialized machine learning algorithms, and user-defined functions, among others. Some usage scenarios for an online archive could make use of these features: when developing sophisticated predictive models, both query sophistication and the largest possible dataset are critically valuable. However, many online analytic archive scenarios don't need all these features and the associated high cost of software, servers, storage, and administrative overhead.

The tables below detail the price differences and illustrate how traditional EDWs are targeted at different price and performance points. At every line item, from server cost to storage cost and especially to software and administrative costs, approaches such as RainStor's, used here as an example, are dramatically more cost-effective as an online analytic archive.

Oracle Exadata (totals include a 20% discount on hardware and software)

Item | Product | Unit price | Quantity | Total
Database software license | Oracle EE + RAC (cluster) | $70,500 per core | 4 servers x 12 cores = 48 cores; x 0.5 Intel Core Adjustment Factor = 24 licensed cores | $1,353,600
Storage software license | Exadata storage server software | $10,000 per disk | 84 x 600GB disks | $672,000
Server hardware | 4 servers x 12 Xeon cores = 48 cores (X5675 3GHz, 384GB); 84 x 600GB 10K RPM SAS disks = 50TB raw + 2.6TB flash | $625,000 per half-rack appliance | Half-rack Exadata appliance with high-performance storage (minimal effective compression based on HA replication, denormalization, indexing) | $500,000
System administration cost | Sys admin labor | $250 per hour | 6.4 man hours / TB x 50 TB = 320 man hours | $80,000
Database administration cost | DBA labor | $250 per hour | 41 man hours / TB x 50 TB = 2,080 man hours | $520,000
Total | | | | $3,125,600

*Source: Oracle-published price comparison between Exadata and an IBM Power system (dated 10/12; retrieved 4/2/13)

RainStor (totals include a 20% discount)

Item | Product | Unit price | Quantity | Total
Database software license | RainStor | Tiered: $3,000-$4,000 per raw terabyte** | 100 TB raw | $300,000
Server hardware | HP 12-core Xeon (X5675 3GHz, 384GB) | $12,000 | 2 servers | $24,000
Storage | NetApp NAS | $3,000 per TB | 100 TB raw (compressed volume is 2.5 to 5 TB) | $15,000
System administration cost | Sys admin labor | $250 per hour | 13 man hours / TB x 4.5 TB = 58.5 man hours | $14,625
Total | | | | $353,625

*Assumption: RainStor is designed to minimize the need for a database administrator (DBA). To compensate, the comparison doubles the amount of system administration labor required per TB.
**RainStor doubles the storage capacity at less than one-tenth the cost of Oracle Exadata.
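As a sanity check, the short script below recomputes the headline totals from the line items quoted in the two tables. The only added assumptions are that the NAS line is priced on roughly 5 TB of compressed data and that labor is billed at the quoted hourly rates; every other figure is taken directly from the tables.

```python
# Illustrative arithmetic only: recompute the comparison totals from the line items above.
exadata = {
    "database_software": 70_500 * 24 * 0.8,   # $70,500/core x 24 licensed cores, 20% off
    "storage_software": 10_000 * 84 * 0.8,    # $10,000/disk x 84 disks, 20% off
    "server_hardware": 625_000 * 0.8,         # half-rack appliance list price, 20% off
    "sys_admin_labor": 250 * 320,             # $250/hr x 320 man hours
    "dba_labor": 250 * 2_080,                 # $250/hr x 2,080 man hours
}
rainstor = {
    "database_software": 300_000,             # tiered license on 100 TB raw
    "server_hardware": 12_000 * 2,            # two commodity Xeon servers
    "storage": 3_000 * 5,                     # NAS priced on ~5 TB compressed (assumption)
    "sys_admin_labor": 250 * 58.5,            # $250/hr x ~58.5 man hours
}
print(f"Oracle Exadata: ${sum(exadata.values()):,.0f}")   # -> $3,125,600
print(f"RainStor:       ${sum(rainstor.values()):,.0f}")  # -> $353,625
```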

A company using the RainStor approach of maximum compression and traditional SQL queries can run on commodity hardware, with license and administrative costs that are materially lower. Because of the advanced compression, this approach needs significantly less storage and server hardware to process that smaller amount of data. It also requires less administrative overhead to go along with the smaller deployment, but for the purposes of comparison, we'll look just at the storage hardware cost savings.

The compression factor is well known, but using storage cost per gigabyte (GB) per month makes it possible to convert that compression into a dollar figure once the raw storage amount is figured into the equation. The chart below shows how SAN/NAS storage that would cost $2 to $4 per GB per month converts into its effective price on commodity storage in an online analytic archive. Applying 30 to 40 times compression, that storage cost becomes the equivalent of $0.05 to $0.13 per GB per month. What the chart does not show is that customers can also run their databases on lower-performance, lower-cost storage. Since an online analytic archive supports cloud or scale-out storage, it can run in environments that cost around $0.10 per GB per month. Applying 30-times compression to this storage equates to an effective cost of roughly $0.003 per GB per month, relative to the original $2 to $4 per GB per month.

Figure: For any given storage performance tier, RainStor compression saves a factor of 30x to 40x (Source: GigaOM Research)
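The conversion behind the chart is simple division; a minimal worked example using the per-GB prices quoted above:

```python
# Effective storage cost per GB per month once archive compression is applied.
def effective_cost(cost_per_gb_month, compression_factor):
    return cost_per_gb_month / compression_factor

print(effective_cost(2.0, 40))   # SAN/NAS at $2/GB with 40x compression -> $0.05
print(effective_cost(4.0, 30))   # SAN/NAS at $4/GB with 30x compression -> ~$0.13
print(effective_cost(0.10, 30))  # commodity scale-out tier at $0.10/GB  -> ~$0.003
```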

Business impact of big data deployment: Major telco case study

Looking at how a major telco solved its data storage and analysis problem illustrates the advantages of an online analytic archive approach. At the highest level, this telco used to take data from multiple customer touch points and network operations and carve it up into multiple timeframes, starting with the most recent data in the highest-price, highest-performance analytic tier, the EDW. Older data was archived off into progressively more inaccessible but less expensive repositories because of cost constraints.

By storing data 30 to 40 times more cost-effectively for SQL-92 and open database connectivity (ODBC)/Java database connectivity (JDBC) access, a company can now use an online analytic archive to keep an order of magnitude more data volume in a single repository for analysis. One 10-terabyte (TB) Oracle database with indexes and denormalized data contained only 2.5 TB of raw data. With the online analytic archive's compression, it occupied 116 GB. Although, in the case of RainStor, support is available for MapReduce, Pig, Hive, and search access on HDFS, this customer doesn't yet use that functionality.

In the past, this telco customer typically kept six months of data in an EDW, followed by 18 months in an online archive and seven years (or more, considering tape doesn't get discarded) in an offline tape archive. Cost constraints made it impossible to unify the repositories to analyze and query all the historical data. Now two or sometimes all three tiers can be unified because the telco is using a technology that enables an order-of-magnitude price/performance advantage over traditional EDW solutions.

Unifying multiple tiers of analytic databases makes it easier for business users to analyze data over much longer time periods, for example, 10 years instead of only the most recent year. It also reduces operational cost by an order of magnitude or more, because the reduced data footprint from compression means administrative overhead, which is measured per TB, comes down commensurately. Traditional EDWs require database administrators to analyze query patterns, set up and manage data partitions, and determine how to index the data. If requirements change, such as new views on the data, administrators have to iterate the whole process. The pipeline that moves data from transaction processing systems to analytic systems is traditionally very brittle.
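The footprint figures quoted in this case study imply the following ratios; this is illustrative arithmetic on the numbers above, not additional data from the customer.

```python
# Footprint ratios implied by the telco example above.
oracle_db_gb = 10 * 1000   # 10 TB Oracle database, including indexes and denormalization
raw_data_gb = 2.5 * 1000   # the raw data inside it
archive_gb = 116           # the same data in the online analytic archive
print(round(raw_data_gb / archive_gb))   # ~22x compression of the raw data
print(round(oracle_db_gb / archive_gb))  # ~86x reduction vs. the full Oracle footprint
```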

Making disk work as the new tape for an online archive: Major financial services institution case study

One of Wall Street's major financial services companies demonstrates a second use case: the ability to manage cold or historical data that can't be kept in a traditional data warehouse because of cost. Last year, this bank was accumulating one billion records per day, with a growth rate of 70 percent to 100 percent per annum. The total volume of data is on track to hit 75 PB by the end of the year. The bank retains 60 percent of this data because of the many regulatory and compliance requirements governing it. Currently, the total cost of tier 1 SAN storage is $3 to $4 per GB per month, which equates to $3 to $4 million per PB per month.

This compliance-related data has limited business value, and individual departments were beginning to build their own trade archives. However, these departmental archives would only accelerate the accumulation of information as well as the operational costs of managing it. Overall spending on these regulatory requirements already consumes 40 percent of the entire IT budget.

For Nasdaq trading data, these same compliance rules mandate retention of all historical data indefinitely, even though the data is only trade logs of who bought what from whom, for how much, and when. The two traditional choices for managing it were unacceptable. Keeping all 13 years of it in a traditional data warehouse managed by Sybase IQ on tier 1 EMC VMAX storage at $3 to $4 per GB per month would have been orders of magnitude too expensive. And the old method of keeping just the most recent month of data in the data warehouse, with the balance archived to tape, made that data essentially inaccessible. Keeping all the historical data on tape made it too difficult to meet service level agreements (SLAs) with auditors and regulatory mandates when reports, lookups, or audits required timely access.

Using RainStor as its online analytic archive solution, the bank expects that it will eventually shrink 30 PB of data down to 1 PB. In addition, it expects to further reduce costs by moving the information off tier 1 SAN storage to tier 3 NAS.

RainStor's done for archives what Splunk's done for logs

Splunk has gained a wide following by carving off simplified and cost-effective management of the log files that systems and applications generate. The bank implementing RainStor sees it accomplishing the same result for application archive data and data warehouses, based on its ability to compress trade history data by 40 times.

The trade execution engine is an in-memory database; after each day, the trade data goes through a feed into the online analytic archive database. All 13 years of historical trade data have been moved to the online repository, which has become the single source for all historical online queries and reports. The results of the most demanding queries in the main data warehouse also get archived with the raw data once executed. The archive is accessible via the bank's existing BI tools.

The trading data archive is compressed at a 40-to-1 ratio relative to the raw data. Since Sybase IQ achieved a 1.5-to-1 ratio, this compression delivered close to a 30-time reduction in storage, not just in capital costs but also in overall operational expenditures. The compression actually accelerates query performance as well: since storage bandwidth is generally the constraining factor in query performance, moving less data from storage to server accelerates queries. The smaller footprint also means the bank can use tier 3 NAS storage with slower SATA drives. Tier 3 storage costs $1 to $2 per GB per month, about half of what its tier 1 storage costs. So the total savings was roughly 50 times when combining price, performance, and capacity.

An online analytic archive (such as RainStor's) can take advantage of scale-out storage running on commodity servers, but the bank kept the NAS setup in order to avoid any changes to its storage administration and the data pipeline from the main trading system.

From the bank's perspective, columnar databases can only compress data within the scope of one column, and the data still must be formatted to allow refreshes of updated data. A read-only archive can compress data far more deeply because it never has to be decompressed and changed. As a result, it can physically organize the data to maximize compression across columns and tables within each partition or container, assuming the data will never be changed, only added to. Rather than directly expose what is so densely packed, the database presents a relational interface to any administrator or BI tool that looks at it. Although columnar databases such as Sybase IQ and hybrids such as Oracle 12c on Exadata can achieve five to six times compression relative to raw data, there are compromises in some situations. For this particular customer, when the data was formatted to accelerate some complex queries through denormalization and indexing, or replicated for high availability, compression ratios came down to 1.5 times in practice.
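A back-of-the-envelope view of where the storage reduction comes from, using only the ratios and prices quoted above; the reconciliation to the roughly 50x overall figure is an interpretation, not a number supplied by the bank.

```python
# Net storage reduction: replacing a 1.5x-compressed warehouse with a 40x-compressed archive.
warehouse_compression = 1.5   # Sybase IQ ratio achieved in practice
archive_compression = 40.0    # archive ratio for the trade history data
print(archive_compression / warehouse_compression)  # ~26.7, i.e. "close to a 30-time reduction"
# Moving from tier 1 storage ($3-$4/GB/month) to tier 3 ($1-$2/GB/month) roughly halves the
# per-GB price again, which is how the combined saving approaches the ~50x cited above.
```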

The bank's implementation of the online analytic archive generated additional, unanticipated performance and operational cost benefits. The bank needed no changes to its database administration scripts to get data from the online trading system to the new archive. Security was simpler, since the data is unchangeable by the very design of the database. And the daily load time went down from eight to 12 hours to two hours. In the future, this bank expects to simplify the process further so that it uses a simple extract and load.

More surprisingly, the queries written for the previous warehouse (Sybase IQ) can run unmodified on the archive. That's a strong statement of this technology's support for standard SQL and BI queries. Despite all the focus on its being a space-efficient repository, it's also a database.

The bank has significant plans for this solution. It plans to use it as the online archive for all its tier 1 online databases. Departments will no longer have their own archive landfills. Part of the appeal is that many individual archive databases can run under a single instance of the archive DBMS engine, with common compliance enforced, running on the cheapest storage that still performs. That will create a shared service with far fewer administrative costs, and not just for the original trading system. The bank expects that 30 percent of all new information will ultimately go into this online archive.

Technology considerations and choices: What RainStor is good for and why

So far, several SQL DBMSs have served as generic examples for EDW and transaction-processing usage scenarios. In order to take a concrete look at online analytic archives, this section examines RainStor in detail.

Traditional SQL DBMS performance constraints not relevant to online analytic archive

Understanding how an online analytic archive works starts with understanding its intended use. Because it is meant to be both an analytic database and an online archive, it must be able to store petabytes of data with maximum efficiency, load and append new data with commensurate speed, and read large data sets via SQL with high performance. The crucial difference between this usage scenario and a traditional DBMS is that once data is in the archive, it never gets updated; it can only be appended. Therefore, the core value-add of traditional DBMS technology, mediating reads and writes to common data records via transaction processing, is not relevant.

Traditional databases are optimized to use memory to mediate random reads and writes to data on physical disks that store and fetch data best when reading or writing sequentially. This type of mediation is known as transaction processing. Traditional SQL DBMSs have adhered to rigorous transaction processing principles known as ACID (atomicity, consistency, isolation, and durability). These principles account for much of the code and processing overhead of traditional databases. Without this requirement, a whole range of analytic archive features becomes possible. It's hard to overstate how much that means for the features on which analytic archives in general, and RainStor in particular, focus.

Data compression

Once the design assumption of an analytic archive is made, all the traditional assumptions about how to organize and store data can be reconsidered. One of the first considerations is the size of the block of data on which all operations are made. Transactional databases typically feature small block sizes of 64KB in order to minimize the likelihood that multiple transactions will be trying to access data that another is modifying at the same time.

RainStor, by contrast, works with a block size of 50MB, because it assumes it will sequentially scan the data and never has to worry about an update getting in the way of multiple read requests. Trying to update data stored this way would be extremely slow.

Since the data sets will never need to be updated, it becomes possible to store everything without repeating anything. RainStor looks across all the values and patterns within each container or partition so that everything is stored only once. If the data were going to be updatable, it would have to go through a process that consumes a lot of computational and disk I/O resources to unpack this dense collection, make the changes, and then recalculate how to compress everything again. Even columnar databases like Sybase IQ or Vertica have to make accommodations for updates to previously stored data, limiting how much they can compress it. In addition, a traditional DBMS needs indexes on its tables to speed the process of finding the right records on which to operate. These indexes also take up space, sometimes as much as or more than the underlying data, and they consume CPU resources to be recalculated when the underlying data changes.

Once the large 50MB block sizes are combined with a 40-time compression ratio, RainStor is effectively dealing with 2GB (2,000MB) blocks of raw data. That greatly accelerates query performance.

Greater analytic query performance

Most databases bottleneck on moving data to and from disk because they're reading and writing small blocks of data, typically the 64KB mentioned above. Because RainStor is working with 50MB blocks that actually contain 2GB of raw data, it can process large queries with very high performance. It only has to unpack the data when presenting the final result set; all intermediate operations work on the compressed data.

In addition, query optimizers in a DBMS that features updatable data can't always pinpoint the data they're seeking as quickly. Since updates can change the data, the query optimizer must assume the indexes that point to the data are changing as well. That means the optimizer can only make estimates ahead of time about where data is located in the database. RainStor's equivalent of a query optimizer doesn't have to make those same fuzzy estimates, because the data never changes. RainStor puts one million records in a container and then distributes the containers across the file system, whether HDFS, other scale-out file systems, or high-end SAN and NAS storage. Every data container holds one million immutable records no matter what the total size of the database, and each container has exact statistics on every record it holds. A 10-billion-record database would require 10,000 containers. If the database were distributed on a cluster of servers with local storage, each node would contain the metadata for the entire database. The metadata approximates 2 percent of the size of the database itself, so all of it can typically stay in memory to accelerate performance. RainStor uses the metadata to apply a filter that tells it which containers on which storage nodes to read, as sketched below. Indexes on traditional DBMSs fulfill a similar role but typically occupy 50 percent to 200 percent of the size of the raw data, so that metadata generally can't stay in memory.
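The following is a minimal sketch, not RainStor's implementation, of how exact per-container statistics let a planner skip most of an immutable, partitioned archive before touching storage. The names (ContainerStats, containers_to_scan, the timestamp column) are illustrative.

```python
# Sketch: pruning immutable containers with exact per-container statistics.
from dataclasses import dataclass

@dataclass
class ContainerStats:
    path: str      # where the ~1M-record container lives (HDFS, NAS, local disk, ...)
    min_ts: int    # exact min/max of a column; exact because the data never changes
    max_ts: int

def containers_to_scan(catalog, lo, hi):
    """Return only the containers whose timestamp range overlaps [lo, hi]."""
    return [c.path for c in catalog if c.max_ts >= lo and c.min_ts <= hi]

# The catalog is small (about 2 percent of the data in RainStor's case), so every node
# can keep a full copy in memory and prune before reading anything from storage.
catalog = [ContainerStats(f"/archive/part-{i:05d}", i * 1_000, (i + 1) * 1_000 - 1)
           for i in range(10_000)]                       # e.g. a 10-billion-row archive
print(len(containers_to_scan(catalog, 42_000, 43_999)))  # -> 2 of 10,000 containers read
```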

As previously mentioned, RainStor supports SQL-92 queries with ODBC/JDBC APIs that enable mainstream BI tools to access it. While RainStor supports managing large data sets with high-performance queries, it is not going down the road of complex in-database functions that many other vendors are pursuing. Those products are for different use cases, in which customers are typically trying to build predictive models offline on large datasets that can then be applied in real time via online applications. RainStor and other databases that rely primarily on SQL for data manipulation are more focused on trends and anomalies in historical data.

Deployment flexibility and administrative efficiency

One of the most prominent trends in SQL databases over the last few years is the effectiveness of recent entrants in exploiting the scalability of massively parallel processing (MPP) on commodity hardware. Greenplum and Vertica are only among the most visible of these products. Products such as Hadapt and Cloudera's Impala are moving toward combining the Hadoop ecosystem with SQL query engines. SQL DBMSs and Hadoop-based systems are converging in a direction that combines the best of both approaches.

But Oracle still represents the mainstream, and it hasn't yet mastered MPP. It can run multiple database nodes using its Real Application Clusters (RAC), but these share storage in the form of a SAN or NAS. The need for shared storage goes back to the discussion of block sizes and transactions. Since the DBMS has to mediate multiple potential transactions trying to read or change the same data records, it needs to work with small blocks of data. The whole process bottlenecks on storage, and managing it is much easier when the storage is in one place rather than distributed across all the database server nodes. Oracle can squeeze more performance out of its database when deployed on its Exadata hardware, but it is still using highly specialized and highly expensive storage hardware, this time linking storage nodes with an InfiniBand network so they look like a single storage node.

The economic drivers section above covers the cost implications of this type of deployment.

RainStor can run with one or more database nodes on a SAN or NAS when that is how an organization administers storage, but it also has the flexibility to run on MPP database servers, each with local storage. Most databases that run on this type of scale-out hardware use a configuration called shared nothing, since each node operates independently; one node generally has to act as master in order to organize the output of the others. RainStor actually stores the metadata for the entire database on each node, and because it communicates with 50MB blocks that are effectively 2GB of raw data, its performance on a commodity cluster of servers looks as if it were a single expensive shared-everything machine. Shared-everything architectures are technically easier to scale, but they are the most expensive approach. With RainStor, each node knows exactly where all the other data is stored, so each can call on any other node to process all or part of a query or to load data. Since RainStor operates on large blocks of data, it works well with scale-out file systems, such as S3 and EBS on AWS and HDFS, among others. The design center also targets an administrator closer in skills to a system administrator than to a more specialized database administrator.

Greater ingest speed of new data

Just as RainStor deals in large block sizes for each operation, it also loads data into the file system in sets of a million records at a time, since the database never has to worry about reading or writing out-of-date data. Transactions are more meaningful for RainStor when it is ingesting data at high volume across many servers in parallel: if one node fails during this process, the other nodes won't improperly save related data. Transactions are less meaningful or appropriate for RainStor in cases like the classic update example, in which cash dispensed at an ATM must simultaneously be deducted from the customer's checking account; if either operation fails, both must abort.
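The sketch below shows the general shape of append-only, batch-sealed ingest described above; it is an assumed design for illustration, not RainStor's code. Only complete, sealed batches ever become visible, so a node failure during a load simply discards the partial buffer rather than leaving inconsistent data behind.

```python
# Sketch of append-only ingest: accumulate records, seal them into immutable containers.
BATCH_SIZE = 1_000_000

class ArchiveWriter:
    def __init__(self):
        self.buffer = []
        self.sealed_containers = []   # only sealed containers are ever queryable

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) == BATCH_SIZE:
            self._seal()

    def _seal(self):
        # In a real system this step would compress the whole batch and publish it
        # atomically; if the node fails before it completes, the partial batch is
        # simply discarded and re-loaded, so no out-of-date or partial data is read.
        self.sealed_containers.append(tuple(self.buffer))
        self.buffer = []
```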

Conclusion

The explosion in data volumes and velocity requires a specialization in databases that we haven't seen in the decades since traditional SQL DBMSs became universal. There is much innovation at the high end in transaction performance, especially in scale-out and in-memory architectures. Within the database engine itself, analytic databases are supporting emerging applications by providing new and advanced functionality, such as predictive analytics, machine learning, and other programmability options.

Within this race to the top, RainStor in particular, and the databases focusing on column storage more generally, have carved out a very sizable set of data storage and analytics scenarios that have been mostly ignored. Granted, some enterprises might rightly ignore these scenarios because they aren't applicable: if analytic functionality beyond SQL is required, the high-end DBMS vendors represent the best solutions today. But those who adhere strictly to the design center of an online analytic archive can achieve 30 to 40 times improvements in storage efficiency. That allows companies trying to manage the exponential growth in data to keep their archives online and accessible forever via standard SQL, Hive, Pig, MapReduce, and BI tools.

About George Gilbert

George Gilbert is the co-founder and partner of TechAlpha, a management consulting and research firm that advises clients in the technology, media, and telecommunications industries. He is recognized as a thought leader on the future of cloud computing, data center automation, and Software-as-a-Service (SaaS) economics, and he has contributed to many publications, including the Economist. Previously Gilbert was the lead enterprise software analyst for Credit Suisse First Boston, one of the leading investment banks for the technology sector. Prior to being an analyst, Gilbert worked at Microsoft as a product manager on Windows Server and spent four years in product management and marketing at Lotus Development. He received his B.A. in economics from Harvard University.

About GigaOM Research

GigaOM Research gives you insider access to expert industry insights on emerging markets. Focused on delivering highly relevant and timely research to the people who need it most, our analysis, reports, and original research come from the most respected voices in the industry. Whether you're beginning to learn about a new market or are an industry insider, GigaOM Pro addresses the need for relevant, illuminating insights into the industry's most dynamic markets.

Visit us at: pro.gigaom.com

© 2012 Giga Omni Media, Inc. All Rights Reserved. This publication may be used only as expressly permitted by license from GigaOM and may not be accessed, used, copied, distributed, published, sold, publicly displayed, or otherwise exploited without the express prior written permission of GigaOM. For licensing information, please contact us.


Einsatzfelder von IBM PureData Systems und Ihre Vorteile. Einsatzfelder von IBM PureData Systems und Ihre Vorteile demirkaya@de.ibm.com Agenda Information technology challenges PureSystems and PureData introduction PureData for Transactions PureData for Analytics

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database

Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database White Paper Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database Abstract This white paper explores the technology

More information

Understanding the Value of In-Memory in the IT Landscape

Understanding the Value of In-Memory in the IT Landscape February 2012 Understing the Value of In-Memory in Sponsored by QlikView Contents The Many Faces of In-Memory 1 The Meaning of In-Memory 2 The Data Analysis Value Chain Your Goals 3 Mapping Vendors to

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

Proact whitepaper on Big Data

Proact whitepaper on Big Data Proact whitepaper on Big Data Summary Big Data is not a definite term. Even if it sounds like just another buzz word, it manifests some interesting opportunities for organisations with the skill, resources

More information

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture Ron Weiss, Exadata Product Management Exadata Database Machine Best Platform to Run the

More information

Il mondo dei DB Cambia : Tecnologie e opportunita`

Il mondo dei DB Cambia : Tecnologie e opportunita` Il mondo dei DB Cambia : Tecnologie e opportunita` Giorgio Raico Pre-Sales Consultant Hewlett-Packard Italiana 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject

More information

Exadata Database Machine

Exadata Database Machine Database Machine Extreme Extraordinary Exciting By Craig Moir of MyDBA March 2011 Exadata & Exalogic What is it? It is Hardware and Software engineered to work together It is Extreme Performance Application-to-Disk

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

EMC SOLUTION FOR SPLUNK

EMC SOLUTION FOR SPLUNK EMC SOLUTION FOR SPLUNK Splunk validation using all-flash EMC XtremIO and EMC Isilon scale-out NAS ABSTRACT This white paper provides details on the validation of functionality and performance of Splunk

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

How to Choose Between Hadoop, NoSQL and RDBMS

How to Choose Between Hadoop, NoSQL and RDBMS How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

BIG DATA-AS-A-SERVICE

BIG DATA-AS-A-SERVICE White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers

More information

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D. Big Data Technology ดร.ช ชาต หฤไชยะศ กด Choochart Haruechaiyasak, Ph.D. Speech and Audio Technology Laboratory (SPT) National Electronics and Computer Technology Center (NECTEC) National Science and Technology

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Cisco Solutions for Big Data and Analytics

Cisco Solutions for Big Data and Analytics Cisco Solutions for Big Data and Analytics Tarek Elsherif, Solutions Executive November, 2015 Agenda Major Drivers & Challengs Data Virtualization & Analytics Platform Considerations for Big Data & Analytics

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Oracle Database 12c Plug In. Switch On. Get SMART.

Oracle Database 12c Plug In. Switch On. Get SMART. Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.

More information

Converged storage architecture for Oracle RAC based on NVMe SSDs and standard x86 servers

Converged storage architecture for Oracle RAC based on NVMe SSDs and standard x86 servers Converged storage architecture for Oracle RAC based on NVMe SSDs and standard x86 servers White Paper rev. 2015-11-27 2015 FlashGrid Inc. 1 www.flashgrid.io Abstract Oracle Real Application Clusters (RAC)

More information

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload

More information

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router HyperQ Hybrid Flash Storage Made Easy White Paper Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com sales@parseclabs.com

More information

Big + Fast + Safe + Simple = Lowest Technical Risk

Big + Fast + Safe + Simple = Lowest Technical Risk Big + Fast + Safe + Simple = Lowest Technical Risk The Synergy of Greenplum and Isilon Architecture in HP Environments Steffen Thuemmel (Isilon) Andreas Scherbaum (Greenplum) 1 Our problem 2 What is Big

More information

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server White Paper EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server Abstract This white paper addresses the challenges currently facing business executives to store and process the growing

More information

Why DBMSs Matter More than Ever in the Big Data Era

Why DBMSs Matter More than Ever in the Big Data Era E-PAPER FEBRUARY 2014 Why DBMSs Matter More than Ever in the Big Data Era Having the right database infrastructure can make or break big data analytics projects. TW_1401138 Big data has become big news

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved. Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

NetApp Big Content Solutions: Agile Infrastructure for Big Data

NetApp Big Content Solutions: Agile Infrastructure for Big Data White Paper NetApp Big Content Solutions: Agile Infrastructure for Big Data Ingo Fuchs, NetApp April 2012 WP-7161 Executive Summary Enterprises are entering a new era of scale, in which the amount of data

More information

High Performance IT Insights. Building the Foundation for Big Data

High Performance IT Insights. Building the Foundation for Big Data High Performance IT Insights Building the Foundation for Big Data Page 2 For years, companies have been contending with a rapidly rising tide of data that needs to be captured, stored and used by the business.

More information

SUN ORACLE EXADATA STORAGE SERVER

SUN ORACLE EXADATA STORAGE SERVER SUN ORACLE EXADATA STORAGE SERVER KEY FEATURES AND BENEFITS FEATURES 12 x 3.5 inch SAS or SATA disks 384 GB of Exadata Smart Flash Cache 2 Intel 2.53 Ghz quad-core processors 24 GB memory Dual InfiniBand

More information

In-memory computing with SAP HANA

In-memory computing with SAP HANA In-memory computing with SAP HANA June 2015 Amit Satoor, SAP @asatoor 2015 SAP SE or an SAP affiliate company. All rights reserved. 1 Hyperconnectivity across people, business, and devices give rise to

More information

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance

More information

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution Jonathan Halstuch, COO, RackTop Systems JHalstuch@racktopsystems.com Big Data Invasion We hear so much on Big Data and

More information

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS WHITE PAPER Successfully writing Fast Data applications to manage data generated from mobile, smart devices and social interactions, and the

More information