Microsoft Big Data and Analytics

Executive Summary

Microsoft has established a firm foothold in the world of traditionally structured data with Microsoft SQL Server* and an even firmer foothold in the world of data analysis with tools such as Microsoft Excel*. However, the big data era requires solutions to store, query, and analyze data beyond what is traditionally structured in relational databases or spreadsheets. Microsoft has responded to this challenge not only by offering a new big data solution, but also by describing a broad solution for comprehensive data management and analysis that is supported by a combination of new and existing Microsoft products.

The big data trend in recent years has been largely driven by the popular, open-source software framework Apache Hadoop*. Apache Hadoop allows massive amounts of data that are not structured into relational databases to be stored in clusters of commodity servers and then analyzed for correlations, trends, and other potentially valuable information. Apache Hadoop has become so popular as a big data solution that, to many, the terms "big data" and "Apache Hadoop" have become synonymous.

Microsoft is offering an Apache Hadoop component with Microsoft HDInsight*, a set of services built on Hortonworks Data Platform* (HDP*) for Windows*. More specifically, HDInsight can refer to either of two separate Microsoft products, both still in preview and months away from general release: HDInsight Server, an on-premises solution, and Windows Azure HDInsight Service*, a completely cloud-based solution. Although Microsoft does offer these two new Apache Hadoop products for storing and mining both semi-structured and unstructured data, the company has also been keen to steer the big data conversation away from the need for big data solutions per se and toward the need for a universal data management and analysis solution.
Until recently, in fact, Microsoft used the term "big data" to refer to this universal vision, but its most recent messaging distinguishes the big data of Apache Hadoop from other forms of data. Microsoft's broader vision is supported in part by Microsoft SQL Server 2012 Parallel Data Warehouse* (PDW), a data-warehouse hardware appliance that stores only structured data but that also supports queries of both structured and unstructured data through Microsoft's proprietary PolyBase technology. Microsoft also positions SQL Server Analysis Services (SSAS), Excel, and Microsoft SharePoint Server* as part of its "all data" tool set, along with optional analysis add-ons for Microsoft Office* such as PowerPivot, Power View, Power Map, and Power Query.

Contents
Executive Summary
Evaluating the Microsoft Data Platform
  Is Microsoft Really Democratizing Big Data?
  Does Microsoft Offer a Truly Comprehensive Data-Management Solution?
  Conclusion
Microsoft's Big Data Vision
  Microsoft's General Claims about its Comprehensive Data Solution
    Claim: The Microsoft big data solution offers an integrated platform for managing data of any type or size
    Claim: Microsoft's big data solution gives you the power to enable anyone in your organization to easily glean insight from your data so they can make smarter decisions
Microsoft HDInsight*: Microsoft's Apache Hadoop* Solution
  Creating HDInsight Service Clusters
  HDInsight Storage Options
  HDInsight Management
  Getting Data in and out of HDInsight
  Technical Notes about HDInsight
  Microsoft's Claims about HDInsight
    Claim: [HDInsight lets you] accelerate the deployment with the cloud by deploying an Apache Hadoop cluster on Windows Azure* in just 10 minutes
    Claim: Microsoft simplifies programming on Apache Hadoop
    Claim: [Microsoft big data lets you] seamlessly extend privileges across HDInsight with Active Directory*
    Claim: HDInsight is 100% compatible with Apache Hadoop
SQL Server 2012* Parallel Data Warehouse: An (Almost) All-in-One Data Solution
  PDW Hardware Specifications
    Dell Parallel Data Warehouse Appliance
    HP AppSystem for Microsoft SQL Server 2012 Parallel Data Warehouse
  How PDW Works
  Comparison of Data Warehousing Appliances
  Big Data Integration
  PolyBase
    CREATE EXTERNAL TABLE Statement
    CREATE TABLE AS SELECT Statement
    Querying the Data
    Pushing Data to Apache Hadoop from PDW
    Roadmap for PolyBase
  ETL in PDW
  Microsoft's Claims about PDW
    Claim: PolyBase for PDW provides seamless integration of Apache Hadoop data with the data warehouse in a single query
    Claim: HDFS Bridge in PolyBase enable[s] direct communication between HDFS data nodes and PDW compute nodes
Business Intelligence and Analytics
  Apache Hive* ODBC Driver
  PowerPivot
  Power Query
  Power View
  Power Map
  Microsoft's Claims about BI
    Claim: HDInsight democratizes the power of big data BI
    Claim: [Microsoft lets you] analyze big data with familiar tools
Conclusion
Notes

Evaluating the Microsoft Data Platform

Microsoft makes two alluring pitches for its suite of data products. The first is that its solution can bring the power of big data to the masses, making queries easier to submit and data easier to analyze with tools that are already ubiquitous. The second is that the Microsoft solution offers a single, comprehensive way to manage all enterprise data regardless of size, structure, or speed.

Is Microsoft Really Democratizing Big Data?

Despite the near-exuberant rhetoric about bringing big data analysis to the masses, Microsoft's progress on this count has been somewhat modest. Microsoft is indeed lowering the barrier to entry for big data, but only incrementally. Its clearest success comes in deployment and management. Whether on premises or in the cloud, HDInsight is easy to set up and manage compared to other big data solutions, especially for IT personnel who lack Linux* expertise. This solid innovation, however, does not simplify deriving value from the data in the cluster. For that ultimate purpose, HDInsight only modestly reduces the difficulty of searching, analyzing, and mining Apache Hadoop data compared to other Apache Hadoop solutions. Microsoft's unique contribution toward simplifying data mining from Apache Hadoop clusters is a set of programming libraries that allow programmers to run operations against Apache Hadoop data in simpler programming languages, such as JavaScript* and .NET* languages such as C# and F#. It also offers an interactive JavaScript console that allows programmers to run JavaScript commands against data in Apache Hadoop files one line of code at a time. As a comparison, the classic WordCount program in Apache Hadoop requires approximately 60 lines of code in Java*, but only 15 in JavaScript. Such advancements will allow more people to gain insights from data stored in Apache Hadoop files, but that wider group must still be programmers.
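To show why WordCount shrinks so much outside Java, here is a minimal single-process Python sketch of its logic: a map step, the shuffle (grouping by key) that Apache Hadoop performs between map and reduce, and a reduce step. It is not the HDInsight JavaScript or .NET API and does not run on a cluster; the function names are hypothetical. The algorithm itself is only a few lines. Much of the length of the Java version comes from framework plumbing such as imports, class declarations, and job configuration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated, lowercased word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(sample))
```

On a real cluster, the map and reduce functions would run in parallel on many nodes, and the framework would handle the shuffle across the network. That distribution is the part the higher-level languages hide.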
One area where Microsoft is truly democratizing data analysis and visualization is on the client end, in Excel. Excel can take data stored in individual Apache Hadoop files, run traditional database queries against it, perform analysis on tables in it, and present the results in impressive visualizations that can provide valuable insights. However, it is essential to understand that Excel can load data from any Apache Hadoop source, not just from HDInsight. Excel imports Apache Hadoop data by means of a special add-on driver (the Apache Hive* Open Database Connectivity [ODBC] driver from Microsoft). As much as Microsoft is attempting to present Excel and HDInsight as parts of a single solution, there is no substantial advantage to choosing HDInsight as the particular backend source of Apache Hadoop data in Excel. Moreover, Excel does not allow users to perform complex operations, such as machine learning, that analyze or mine vast amounts of data from an Apache Hadoop cluster in the way the term "big data" suggests. With Excel, information workers can merely import individual Apache Hadoop files and perform analyses and visualizations on tables stored in these files.

Does Microsoft Offer a Truly Comprehensive Data-Management Solution?

The closest Microsoft comes to a comprehensive data solution today is its PDW hardware appliance, which includes SQL Server 2012 to store structured data and which can also connect to Apache Hadoop data from an external source. PDW thus enables unified access to both structured and unstructured data. However, PDW does not currently favor any particular Apache Hadoop solution as the external source of unstructured data, which makes it a big data solution that is far from specific to Microsoft. Microsoft's current big data solution is also limited in that none of its components can handle streaming unstructured data, such as data from social media or user clickstreams.
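PolyBase itself is proprietary and available only in PDW, so the following is only an analogy, built on Python's standard sqlite3 module, of what "one query over both kinds of data" means. Rows parsed from a delimited text file (standing in for a file in HDFS) are exposed as a table next to an ordinary relational table, and a single SQL statement joins the two. This is not PolyBase syntax. Unlike PolyBase, the sketch copies the rows into an in-memory database first. The table names, column names, and sample data are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

# Hypothetical delimited file standing in for data that would live in HDFS.
CLICKSTREAM_FILE = """customer_id,page,clicks
1,/home,5
2,/pricing,3
1,/pricing,2
"""


def load_delimited(conn: sqlite3.Connection, text: str) -> None:
    """Parse delimited text and expose it as a table (a loose stand-in for an external table)."""
    conn.execute("CREATE TABLE clickstream (customer_id INTEGER, page TEXT, clicks INTEGER)")
    reader = csv.DictReader(io.StringIO(text))
    conn.executemany(
        "INSERT INTO clickstream VALUES (?, ?, ?)",
        ((int(row["customer_id"]), row["page"], int(row["clicks"])) for row in reader),
    )


def clicks_per_customer() -> List[Tuple[str, int]]:
    conn = sqlite3.connect(":memory:")
    # Structured, relational data, as it would sit in the data warehouse.
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")])
    load_delimited(conn, CLICKSTREAM_FILE)
    # A single SQL statement that spans both sources.
    rows = conn.execute(
        """
        SELECT c.name, SUM(k.clicks) AS total_clicks
        FROM customers AS c
        JOIN clickstream AS k ON c.customer_id = k.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(clicks_per_customer())
```

The appeal for database staff is that only the final SELECT needs to be written by an analyst. The work of making the Hadoop-side data look like a table is what PolyBase is meant to take over.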
PDW might have clear limitations today, but in the future the appliance is likely to fulfill Microsoft's promise of delivering truly comprehensive data management on premises, if at a high price. This comprehensive vision is set to be realized with the next version of PDW, which will likely include a preinstalled version of HDInsight Server (at least as an option). The ability to perform real-time queries of unstructured data streams is also likely to be incorporated into future versions of HDInsight, making Microsoft's data-handling capabilities truly comprehensive.

Microsoft has not confirmed that it will include PolyBase in a future general release of SQL Server outside of PDW, but such a move is also plausible. Adding PolyBase broadly to SQL Server would bring the ability to handle both structured and unstructured data to the wider database market. A cloud-based, on-demand solution that meets Microsoft's promise of comprehensive data management is also likely to arrive eventually. Windows Azure* already allows users to create, store, and manage databases online in Windows Azure SQL Database*, so all of the components of a comprehensive data solution will be available through Windows Azure when Windows Azure HDInsight Service matures. It is unclear, however, whether a wider distribution of PolyBase in SQL Server would extend to a cloud-based version. Even if it does, upload speeds for large, proprietary data sets could limit the usefulness of the cloud-only option for some data-heavy firms. The cloud-only solution will nonetheless be an attractive option for firms that generate data online or that work with public data files. More fundamentally, Microsoft's solution for unstructured big data has not yet been released, and only time and general usage will reveal its strengths and its faults. Despite these reservations, there are reasons to be optimistic about Microsoft's chances of bringing big data to the masses in the future. Compared to other companies, Microsoft has more of the components in place for a comprehensive data solution, including popular database management software in SQL Server, a rapidly maturing cloud provider in Windows Azure, widely used business intelligence tools, and the resources to invest in this comprehensive vision for the long term.
Conclusion

Microsoft provides a vision for big data within a larger context of all data, structured and unstructured. While this vision is tantalizing for the future, it ultimately lacks substance today. Democratizing big data would hold some of the same revolutionary promise that personal computing and later the Internet realized in the last three decades, yet it is far from clear that Microsoft will ultimately bring this revolution about. PolyBase shows potential for managing and analyzing structured and semi-structured enterprise data by using familiar database skills, but it is currently available only in a high-end data-warehouse appliance. Using Excel as a frontend for big data analysis is another alluring vision, but it too is limited to dealing with structured and semi-structured data. Moreover, if Excel continues to be agnostic about the big data backend supporting it, it does not give companies a reason to pick HDInsight over any other Apache Hadoop solution. Most […]

Microsoft's Big Data Vision

Microsoft is currently developing a big data solution whose main components are likely to be released over the next year. These products have not yet been finalized, but their features have been made public, and Microsoft's own statements about its soon-to-be-released big data tools provide insight not only into the company's big data strategy, but also into its broader data strategy in general. This paper provides an overview of this broader strategy and an analysis of Microsoft's big data claims.

Big data as a trend relies technically on the open-source software framework Apache Hadoop. Originally created at Yahoo!, Apache Hadoop allows nearly unlimited amounts of unstructured or semi-structured data (such as is found in log files) to be stored in clusters of inexpensive servers and then analyzed for correlations, trends, causal relationships, and other insights. Apache Hadoop has become the industry standard for big data, and for many, the terms "big data" and "Apache Hadoop" have become synonymous. For many companies selling a big data solution, the conversation about big data begins and ends with Apache Hadoop.

Microsoft's vision of big data differs from many others in that it has publicly positioned Apache Hadoop as only one component of a more comprehensive data strategy. This strategy includes not only the unstructured and semi-structured data that are the accepted mainstays of big data, but also structured data (such as data in traditional database tables, as in a data warehouse), along with the business intelligence tools used to analyze all data, whether unstructured, semi-structured, or structured. This broader "all data" vision allows Microsoft to draw the company's existing strengths in products such as SQL Server and Excel into the big data conversation.
By re-imagining the business staples SQL Server and Excel as having a role in a big data solution, Microsoft is targeting its suite of big data products toward the many businesses that have already invested heavily in these tools and accumulated large amounts of potentially useful data in them. Microsoft is also targeting the many companies whose staff are highly skilled in common software tools but lack the specialized knowledge of data scientists and dedicated Apache Hadoop experts. The central component of Microsoft's big data strategy is HDInsight, an Apache Hadoop solution built from a particular Apache Hadoop distribution, namely HDP for Windows. (HDP for Windows, developed by Hortonworks, Inc., is in fact the first distribution of Apache Hadoop that runs natively on Windows, and it is already publicly available as a free tool.) HDInsight can refer to either of two separate Apache Hadoop products, both still available only as preview versions: HDInsight Server, an on-premises solution, and Windows Azure HDInsight Service, a completely cloud-based solution. Both options are touted as versions of Apache Hadoop that are easier to set up and use than the Apache Hadoop products offered by competitors. More recently, Microsoft has also described HDInsight as a solution for analyzing semi-structured data in particular, such as data sourced from smartphones, websites, RFID tags, and Twitter feeds. Microsoft has also hinted that its search engine technologies, Bing* and Microsoft FAST Search, will serve as the solutions for interacting with completely unstructured data, such as documents. Code from both products was in fact incorporated into the search function in Microsoft SharePoint 2013*. However, Microsoft has not elaborated on the particular role it sees for its search engines within its comprehensive data strategy.
A second cornerstone of Microsoft's big data vision is SQL Server 2012 PDW, a hardware appliance that supports queries of data stored both in SQL tables and in Apache Hadoop files through Microsoft's proprietary PolyBase technology. PDW is already available at a price of approximately $1.5M. (Data warehouses commonly cost as much as $30M, so although high, the cost of PDW is actually low relative to that of the competition.) The third, and currently final, component of Microsoft's big data solution is its business intelligence (BI) and visualization tools. These tools include, most importantly, Excel, but also Microsoft SharePoint Server and Microsoft Office 365*, along with optional analysis add-ons such as PowerPivot, Power View, Power Map, and Power Query.

Future components might be added to this suite of big data products as they become available. For example, the next version of SQL Server will include an in-memory online transaction processing (OLTP) engine, currently code-named Hekaton. Hekaton will allow new products based on it to efficiently process data captured in real time, such as from data streams. It is plausible that Microsoft's big data strategy will eventually reflect this new functionality and include a real-time data analysis tool.

Although Microsoft is describing these various products as components of an integrated big data solution, they do not function cohesively today. It is more accurate to view them as a list of separate tools that might slowly become integrated over time. Another limitation to keep in mind is that the central component of Microsoft's big data solution, HDInsight, is still a work in progress and many months away from release. Moreover, there is even a question about whether HDInsight will be outdated when it is finally released. HDInsight is currently based on HDP for Windows 1.x, which in turn is based on the Linux-only Apache Hadoop 1.0. The next version of Apache Hadoop for Linux, version 2.0, is currently in community preview and is scheduled for general release in late summer 2013; it offers an architectural overhaul that promises to dramatically improve performance and extensibility.
Hortonworks's port of Apache Hadoop 2.0 to HDP for Windows 2.0 is currently targeted for late […]. Any future version of HDInsight that incorporates the updates in Apache Hadoop 2.0 can only be built after HDP for Windows 2.0 is finalized in late […].

Microsoft's General Claims about its Comprehensive Data Solution

Microsoft's claims surrounding its all-data solution fall into three broad categories: that Microsoft provides a platform to manage data of any type and size, that the Microsoft solution provides a way to analyze all data, and that the Microsoft solution enables generalist information workers to glean insights from big data. While these claims are generally accurate, careful examination of each claim yields a more nuanced picture.

Claim: The Microsoft big data solution offers an integrated platform for managing data of any type or size. [1]

In discussing its comprehensive data solution, Microsoft places data into two broad categories for management: structured data (managed by SQL Server) and semi-structured and unstructured data (managed by HDInsight). The fact that Microsoft points to two products actually hints at the lack of an integrated platform for data management: data for SQL Server and data for Apache Hadoop are not integrated into a single platform. Even within each discrete product, data is not necessarily integrated. On the one hand, it is true that SQL Server is the management tool for structured data. On the other hand, managing data in HDInsight is more complex. For companies choosing the cloud-based Windows Azure HDInsight Service as their Apache Hadoop option, both semi-structured and unstructured data are likely to be stored and managed in Windows Azure blob storage. For firms choosing the on-premises HDInsight Server option, semi-structured and unstructured data are likely to be managed in separate locations. Semi-structured data will likely be stored and managed in the Apache Hadoop Distributed File System (HDFS). Unstructured data, such as documents, spreadsheets, presentations, videos, and audio recordings, will likely be managed not in Apache Hadoop but in SharePoint, at least in an IT deployment centered on Microsoft products.
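The data placement described above can be summarized as a small lookup. This Python sketch encodes only what this paper expects (every branch is a "likely," not a documented product behavior). The enum and function names are hypothetical. Its purpose is to show how many separate stores the "integrated platform" currently spans, not how any Microsoft product routes data.

```python
from enum import Enum


class DataKind(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class Deployment(Enum):
    CLOUD = "Windows Azure HDInsight Service"
    ON_PREMISES = "HDInsight Server"


def likely_store(kind: DataKind, deployment: Deployment) -> str:
    """Return the store this paper expects a given kind of data to be managed in."""
    if kind is DataKind.STRUCTURED:
        # Structured data is managed by SQL Server in either deployment.
        return "SQL Server"
    if deployment is Deployment.CLOUD:
        # Cloud: semi-structured and unstructured data both likely sit in blob storage.
        return "Windows Azure blob storage"
    # On premises: semi-structured data in HDFS, documents and media in SharePoint.
    return "HDFS" if kind is DataKind.SEMI_STRUCTURED else "SharePoint"


if __name__ == "__main__":
    for deployment in Deployment:
        stores = sorted({likely_store(kind, deployment) for kind in DataKind})
        print(f"{deployment.value}: {stores}")
```

Even in the simpler cloud case, the sketch yields two stores. On premises, it yields three, which is the gap between the claim and the current reality.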

The technology that currently comes closest to realizing the claim of an integrated platform is PolyBase. PolyBase should not be viewed as a silver bullet, however. Beyond being locked away in a specialized, expensive data warehouse appliance, it is unclear to what extent PolyBase will integrate with Microsoft's principal tool for querying unstructured data, Bing, or with Microsoft FAST Search for queries in SharePoint. As with so many other aspects of the Microsoft data vision, only time will tell how, and to what degree, organizations can put it into practice. In general, Microsoft does not currently offer a comprehensive data management solution, but rather a set of tools and products that allows organizations to handle structured, semi-structured, and unstructured data.

Claim: Microsoft's big data solution gives you the power to enable anyone in your organization to easily glean insight from your data so they can make smarter decisions. [2]

This claim exaggerates the democratizing power of the Microsoft big data solution. Microsoft's integration of ubiquitous, well-understood tools (particularly Excel) into big data analytics should not be confused with making big data queries and analysis inherently easier. Using laymen's tools for big data work is not the same as putting big data insights within reach of all laymen. That said, this integration is a key part of Microsoft's competitive advantage in the big data arena, particularly given Excel's saturation of the enterprise productivity market. Many more knowledge workers are familiar with Excel than with even SQL queries, so Excel opens direct examination of big data sets to a larger pool of analysts who previously had to work through intermediaries such as data scientists. Moreover, Excel add-ins such as PowerPivot, Power View, Power Map, and Power Query clearly put more analytical power in the hands of end users than before.
IT organizations looking at these solutions, however, should keep their eyes wide open to the behind-the-scenes work that can go into preparing data sets for wider use within a company. A sample data set of household electricity usage in two Dallas suburbs, used to demonstrate Power Map and Power View in Excel, provides a telling example. The Microsoft team loaded Dallas County Appraisal District flat-file records into SQL Server, converted the geographical coordinates in them from a planar to an ellipsoidal projection with a third-party tool, and calculated the centroid of each land parcel in SQL Server to obtain a longitude and latitude for each plot before exporting the data to Excel. (All of this happened before details such as simulated rates of electricity usage were added to the data set.) The result was a rich data set that information workers could dissect across a variety of dimensions, including time. The route to get there, however, was anything but trivial.

Microsoft HDInsight*: Microsoft's Apache Hadoop* Solution

HDInsight is the brand Microsoft has assigned to its two upcoming Apache Hadoop products: the cloud-based Windows Azure HDInsight Service and the on-premises HDInsight Server. Both are built on a core of HDP for Windows. In both cases, HDInsight thus refers to a product composed of this basic Hortonworks Apache Hadoop distribution plus extensive software customizations added by Microsoft. (In other words, HDInsight and HDP for Windows do not refer to distinct components that communicate with each other.) Of the two versions of HDInsight, Microsoft has promoted the cloud-based Windows Azure HDInsight Service to a much greater degree. This product, hosted on Windows Azure, is also expected to be released first, most likely in Q[…]. The emphasis on the cloud-based HDInsight suggests that this version of the product aligns more closely with the market positioning Microsoft has chosen for HDInsight in general.

The HDInsight Service web page (found at windowsazure.com/en-us/services/hdinsight/) describes the service with words and phrases such as "gain insight from any data, any size, anywhere," "provides simplicity," "ease of management," "simplicity of Windows Azure," "simple and straightforward," "seamless scale," "quickly create," "cost savings only possible on a cloud environment," "glean insights on all your data with familiar tools," and "analyze all your data easily." The messaging is clear: HDInsight Service is simple, cost-efficient, and takes advantage of existing knowledge.

Simple as it might be, HDInsight Service is not the only cloud-based Apache Hadoop solution. Other such products include Amazon's Elastic MapReduce*, Joyent Solution for Apache Hadoop*, and InfoChimps Cloud::Hadoop*. Microsoft's offering differs from these most obviously in that it runs on Windows and is integrated into the Windows Azure platform. Another idiosyncrasy of HDInsight is that it is currently based on Apache Hadoop and HDP for Windows 1.1.0,[3] even though (as of August 2013) the most recent stable releases of Apache Hadoop for Linux are Apache Hadoop […] and HDP […]. Even the most recent version of HDP for Windows is a later version: […]. Because Apache Hadoop is a quickly maturing platform, the differences between incremental updates can be significant. For example, HDP features improvements to the Hive query language from the Stinger Initiative that support 50 times faster performance and increased compatibility with the SQL query language, but this technology is not currently included in HDInsight. In addition, the next full version of Apache Hadoop, Apache Hadoop 2.0, is expected to be released in Q[…] and to be incorporated into HDP for Windows in Q4. Apache Hadoop 2.0 is an important update that will dramatically improve the efficiency and extensibility of the platform, but it is not clear when these updates will reach HDInsight.
Creating HDInsight Service Clusters

Clusters created in HDInsight are intended to be disposable as a way to minimize costs. HDInsight was designed with the expectation that users will create an HDInsight cluster, load the data needed, run the desired analyses, and then destroy the cluster. HDInsight promises to be simple, and as far as the procedure to create a new cluster is concerned, it lives up to this promise. With the Quick Create option in particular, the user merely chooses the cluster size (as defined by the number of nodes) and then assigns a name, password, and storage account for the cluster. Once the user clicks the option to create the cluster, the process takes 15 to 20 minutes.

HDInsight Storage Options

HDInsight allows data to be stored in the local HDFS file system, as does any Apache Hadoop distribution. However, an option unique to HDInsight Service is the Azure Storage Vault (ASV) protocol, which builds on the HDFS API to map Apache Hadoop operations to Windows Azure blob storage instead of to local HDFS. Through ASV, customers can keep their Apache Hadoop data in an inexpensive Windows Azure blob storage account and avoid having to import this data into the physical compute nodes of the HDInsight cluster. Because the data accessed through ASV isn't physically stored in the HDInsight cluster, it remains in Windows Azure blob storage before clusters are created and after they are destroyed. After users spin up an HDInsight cluster, they can point operations such as Hive queries at data stored in Windows Azure blob storage by using a URI beginning with asv:// or asvs://. The drawback to ASV is that, because this data is not stored in the Apache Hadoop cluster itself, performance is not always optimal.
However, write performance on Windows Azure blob storage is much faster than it is on HDFS, and jobs that read large files can perform so many temporary writes that ASV can actually deliver better overall performance than local HDFS storage. Figure 1 shows the setting to configure ASV for HDInsight.
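Jobs reach data in blob storage through an ASV URI. The sketch below builds and parses such URIs in Python. It assumes the form asv[s]://container@account.blob.core.windows.net/path used in the HDInsight preview, with asvs:// as the HTTPS variant. That format is an assumption to check against current Windows Azure documentation. The helper names and the example container and account names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse

# Assumed host suffix for Windows Azure blob storage accounts.
BLOB_HOST_SUFFIX = ".blob.core.windows.net"


def asv_uri(container: str, account: str, path: str, secure: bool = True) -> str:
    """Build an ASV URI; asvs:// is assumed to be the HTTPS variant of asv://."""
    scheme = "asvs" if secure else "asv"
    return f"{scheme}://{container}@{account}{BLOB_HOST_SUFFIX}/{path.lstrip('/')}"


def parse_asv_uri(uri: str) -> Tuple[str, str, str]:
    """Split an ASV URI into (container, storage account, path within the container)."""
    parts = urlparse(uri)
    if parts.scheme not in ("asv", "asvs"):
        raise ValueError(f"not an ASV URI: {uri}")
    container, sep, host = parts.netloc.partition("@")
    if not sep or not host.endswith(BLOB_HOST_SUFFIX):
        raise ValueError(f"expected container@account{BLOB_HOST_SUFFIX}, got {parts.netloc!r}")
    account = host[: -len(BLOB_HOST_SUFFIX)]
    return container, account, parts.path.lstrip("/")


if __name__ == "__main__":
    uri = asv_uri("logs", "mystorage", "2013/08/")
    print(uri)
    print(parse_asv_uri(uri))
```

For example, asv_uri("logs", "mystorage", "2013/08/") yields asvs://logs@mystorage.blob.core.windows.net/2013/08/. Because the URI names the storage account rather than the cluster, the same location stays valid across clusters that are created and destroyed, which is the property that makes disposable clusters practical.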

Figure 1. HDInsight cluster management screen

If optimal performance is important, it is advisable to run tests with data stored in both ASV and local HDFS and compare the results. Note, however, that the cost of storing data in HDFS on HDInsight node instances is much higher than the cost of storing a comparable amount of data in Windows Azure blob storage. Another drawback of HDFS compared with ASV is that data stored in HDFS is removed when the cluster is destroyed. Figure 2 illustrates the relationship between an HDInsight cluster, HDFS, and ASV.

Figure 2. Relationship between an HDInsight cluster, HDFS, and Windows Azure Blob Storage

HDInsight Management

Windows Azure HDInsight Service and its on-premises counterpart HDInsight Server share the same web-based management interface, shown in Figure 3. The graphical user interface (GUI) provides options such as an interactive JavaScript and Hive console to a cluster, a remote desktop connection to the name (main) node, and monitoring data.

Figure 3. HDInsight cluster dashboard

For more fine-grained management of HDInsight clusters and their associated storage, Windows PowerShell* is available. Windows PowerShell cmdlets for HDInsight are currently in version 0.9 and are available through the Microsoft .NET SDK For Apache Hadoop web site on CodePlex (codeplex.com/releases/view/109811).

Beyond these current tools, Microsoft has stated that in the future, Microsoft System Center* will provide tools to manage HDInsight. Given this statement, it seems most likely that System Center integration will become available in the first full release of System Center after the official public release of HDInsight.

Getting Data in and out of HDInsight

HDInsight offers a number of standard Apache Hadoop ecosystem tools for loading and unloading data, such as the hadoop command or, if the source is a relational database, the Apache Sqoop* tool (included in all Apache Hadoop distributions).
To load log file data, the standard Apache Hadoop ecosystem tool Apache Flume* can be used. To load data into or out of Windows Azure blob storage (as opposed to HDFS), users have more options. For example, one can use any number of tools that work directly with Windows Azure blob storage, such as the free graphical tools Azure Storage Explorer* and CloudXplorer* or the command-line tool AzCopy*. One can also use JavaScript via the interactive console, the Apache Hadoop command line (using the hadoop command), or a .NET language such as C#. Yet another option is Windows PowerShell.

After data has been unloaded, it is typically necessary to clean it before it can be consumed, analyzed, or displayed in a visualization. Such cleaning is usually performed as part of extract, transform, and load (ETL) operations. For ETL with HDInsight, the standard Apache Hadoop tool Apache Pig* can be used. Microsoft also makes ETL for Apache Hadoop possible through SQL Server Integration Services (SSIS) by means of the Hive ODBC Driver, which allows external applications such as Excel and SQL Server to connect to Apache Hadoop data.

Technical Notes about HDInsight

HDInsight was developed with ease of use in mind and has not been optimized for other qualities such as performance. In addition, it is unlikely that HDInsight will ever be built on the very latest version of Apache Hadoop, because new versions are developed on Linux first. As a result, HDInsight will be late to adopt cutting-edge features and frameworks such as Intel's Project Rhino, which provides a common security framework for Apache Hadoop; Intel Advanced Encryption Standard New Instructions (Intel AES-NI), which accelerates encryption; and cell-level security in Apache Hadoop, such as is being developed in the Apache Accumulo* project. Regarding security, the only claims Microsoft makes about HDInsight relate to its integration with Active Directory* Domain Services.

Microsoft's Claims about HDInsight

Microsoft's main claims about HDInsight usually suggest that the product makes Apache Hadoop easier. What follows are some representative examples of Microsoft's claims, each followed by a brief analysis.
Claim: "[HDInsight lets you] accelerate the deployment with the cloud by deploying an Apache Hadoop cluster on Windows Azure* in just 10 minutes."[7]

The claim is specific and easy to verify, but it also suggests something general: that creating an HDInsight cluster in Windows Azure is a trivial exercise, far easier than setting up one's own hardware cluster. It actually takes closer to 20 minutes to set up an HDInsight cluster. Even so, it is true that Windows Azure HDInsight Service makes the narrowly defined process of setting up a cluster quick and easy. However, the statement is essentially misleading because it ignores the necessary step of uploading data into the cloud. This upload is required unless the enterprise data destined for Apache Hadoop is already stored in Windows Azure blob storage (an uncommon scenario). Uploading 1 TB of uncompressed data at a rate of 1 MB/second would take approximately 12 days. Compression can reduce transfer times by 80 to 90 percent. Even so, at a brisk 1 TB per day, uploading 100 TB would still take 100 days. (Windows Azure does not yet allow customers to ship physical disks to speed the loading of data, but this service is planned before the end of )

In addition, however complicated or time-consuming it might be to deploy an Apache Hadoop cluster, the difficulty of deployment is not a major deterrent to the sound use of Apache Hadoop. In the broader scheme, ease of installation is a nice-to-have feature of HDInsight that does not help businesses derive any value whatsoever from an Apache Hadoop cluster. Note that for the on-premises version of this product, ease of installation cannot yet be verified, because the current preview of HDInsight Server can be installed only as a single node.
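The upload arithmetic above can be checked with a few lines of Python. This is a rough sketch that assumes decimal units (1 TB = 1,000,000 MB), a constant sustained transfer rate, and no protocol overhead. The function name is illustrative.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # decimal units, assumed for this estimate


def upload_days(terabytes: float, mb_per_second: float, reduction: float = 0.0) -> float:
    """Days to upload `terabytes` of data at a sustained rate.

    `reduction` is the fraction by which compression shrinks the data
    (0.8 means the compressed data is 20 percent of the original size).
    """
    megabytes = terabytes * MB_PER_TB * (1.0 - reduction)
    return megabytes / mb_per_second / SECONDS_PER_DAY


# 1 TB, uncompressed, at 1 MB/second: about 11.6 days ("approximately 12 days")
print(f"{upload_days(1, 1):.1f} days")

# The same 1 TB with 80 and 90 percent compression
for r in (0.8, 0.9):
    print(f"{r:.0%} smaller: {upload_days(1, 1, reduction=r):.1f} days")

# 1 TB per day is roughly 11.6 MB/second; 100 TB then takes 100 days
rate = MB_PER_TB / SECONDS_PER_DAY
print(f"{rate:.1f} MB/second -> {upload_days(100, rate):.0f} days for 100 TB")
```

Compression shrinks the amount of data to move, while the 1 TB per day scenario is a separate assumption about available bandwidth. The two effects can be combined by passing both a higher rate and a `reduction`.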

Claim: "Microsoft simplifies programming on Apache Hadoop."[9]

The claim that a procedure has been simplified can mean either that it has been made simple or merely that it has been made simpler. In this case, Microsoft has made programming on Apache Hadoop a little simpler, but it has not made it simple. Microsoft's programmatic additions to Apache Hadoop have three parts:
- a .NET software development kit (SDK) for HDInsight;
- a set of JavaScript libraries for HDInsight;
- an interactive JavaScript console for Apache Hadoop.

The .NET SDK allows programmers to write Apache Hadoop MapReduce jobs in any .NET language, such as C# or F#. In principle, these additions should make programming for Apache Hadoop easier for the many programmers who are not Java specialists. However, programming MapReduce jobs will remain fundamentally complex even in these other languages. For the IT decision maker, the takeaway is that developers comfortable in any .NET language or JavaScript will now be able to program MapReduce jobs and quickly run queries in a console against data stored in Apache Hadoop.

Claim: "[Microsoft big data lets you] seamlessly extend privileges across HDInsight with Active Directory*."[10]

The implication of this claim is that the integration of HDInsight with Active Directory Domain Services makes managing HDInsight easier. HDInsight is in fact integrated with Active Directory Domain Services, but not yet to the degree the claim suggests. The integration currently centers on user accounts, authentication, and authorization. Windows accounts are used to manage Apache Hadoop, and it is not necessary to create user accounts within HDInsight itself. In fact, with HDInsight, no aspect of authentication and authorization remains siloed in Apache Hadoop. Security is handled by Windows Azure, Active Directory Domain Services, or local Windows security. In addition, HDInsight creates a special Windows user account named hadoop. The services related to Apache Hadoop use this account for their logon credentials. These services, and the hadoop logon account, are shown in Figure 4.

Figure 4. HDInsight services

In general, IT should not expect dramatic improvements in the manageability of Apache Hadoop soon, because its integration with Active Directory Domain Services is still loose. However, Apache Hadoop and Active Directory Domain Services are likely to become more tightly integrated over time. This could bring, for example, HDInsight-specific Group Policy Objects (GPOs) and other administrative benefits. HDInsight will likely need some years to mature before that happens.

Claim: "HDInsight is 100% compatible with Apache Hadoop."[11]

Buried within Microsoft's general claim that HDInsight makes Apache Hadoop easier is the implicit claim that HDInsight really is Apache Hadoop. Is it? In general, yes. Apache Hadoop runs inside HDInsight, and Apache Hadoop files from other Apache Hadoop distributions are 100 percent compatible with it. In addition, one can download an Apache Hadoop component such as Apache Mahout* straight from the Apache website, and it will run on an HDInsight cluster without errors.

However, the claim might also be read to mean that HDInsight has the same features as all standard versions of Apache Hadoop. That is not true. At the time of this writing, for example, HDP and Apache Hadoop support features that have not yet appeared in HDInsight. This lag between Apache Hadoop versions is likely to persist indefinitely. It remains to be seen whether it could, in some cases, lead to file or code incompatibilities. For the IT decision maker, the takeaway is that HDInsight is likely to run a slightly outdated version of standard Apache Hadoop. Today, code and syntax are 100 percent portable from standard Apache Hadoop, but future exceptions cannot be ruled out. Ultimately, however, Microsoft has made clear that it wants to remain 100 percent compatible with Apache Hadoop. If such an incompatibility should arise, it will therefore likely be temporary.

SQL Server 2012* Parallel Data Warehouse: An (Almost) All-in-One Data Solution

Another pillar in Microsoft's all-data product lineup is SQL Server 2012 PDW. PDW is a massively parallel processing (MPP) data warehousing appliance that combines custom software built on SQL Server 2012 with commodity hardware. Currently, the appliance is sold in various scalable configurations only by Dell and Hewlett-Packard. At the lowest end, both vendors sell a one-quarter rack version (of a standard 42U rack). The Dell appliance can scale up to 6 racks, and the HP counterpart can scale up to 7 racks.

A key concept in understanding PDW is that it is a scale-out solution, as opposed to a scale-up solution. When users run T-SQL queries against PDW, the queries are broken down and distributed among all required nodes. The processing itself is therefore distributed rather than centralized. As nodes are added to the appliance, the raw processing power of PDW increases in an essentially linear manner. Storage in the PDW appliance is both replicated and distributed. Smaller tables (approximately 5 GB or smaller) are replicated to all nodes for improved performance. Larger tables are broken up and distributed across nodes.
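The replicate-or-distribute rule described above can be sketched as follows. This is only an illustration, not PDW's implementation. The 5 GB threshold comes from the text. The CRC-32 hash, the node count, and the helper names (`placement`, `node_for`, `distribute`) are assumptions chosen for the example, since the appliance uses its own internal hashing and storage layout.

```python
import zlib
from collections import defaultdict

REPLICATE_MAX_GB = 5  # approximate threshold from the text


def placement(table_size_gb: float) -> str:
    """Small tables are copied to every node; large ones are split up."""
    return "replicated" if table_size_gb <= REPLICATE_MAX_GB else "distributed"


def node_for(key: str, node_count: int) -> int:
    """Map a distribution-column value to a node with a stable hash (assumed CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, key_column: str, node_count: int):
    """Assign each row to exactly one node based on the hash of its key column."""
    shards = defaultdict(list)
    for row in rows:
        shards[node_for(row[key_column], node_count)].append(row)
    return shards


rows = [
    {"url": "/home", "user_IP": "10.0.0.1"},
    {"url": "/cart", "user_IP": "10.0.0.2"},
    {"url": "/home", "user_IP": "10.0.0.3"},
]
print(placement(2), placement(500))  # replicated distributed

shards = distribute(rows, "url", node_count=4)
for node in sorted(shards):
    print(node, [r["user_IP"] for r in shards[node]])
```

Hashing on a column means that all rows sharing a value (here, the same url) land on the same node. Each node can then aggregate or join on that column locally, without first exchanging rows with other nodes.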
PDW Hardware Specifications

The Dell and Hewlett-Packard versions of PDW are not identical, but they share some common specifications:
- Both vendors assign 256 GB of RAM to each physical node in the appliance.
- For both Dell and HP, the first rack in the appliance (or the only rack, if there is just one) includes one node assigned control and management responsibilities.
- Microsoft specifies that each rack should include one extra node for failover, which remains essentially unused. Both vendors include this node.
- In both the Dell and HP solutions, nodes are connected with InfiniBand* and Ethernet, both implemented with redundancy.

These control and failover nodes, along with the redundant networking components, occupy 6U in the first (or only) rack and 5U in all subsequent racks (because the control node is needed only in the first rack).

Dell Parallel Data Warehouse Appliance

Dell's PDW product is officially called the Dell Parallel Data Warehouse Appliance. The following list provides additional hardware specifications for the Dell PDW configuration options, beyond the elements described above:
- Basic scale unit of 10U: 3 servers in a 2U enclosure, and two 4U drive arrays
- Basic scale unit = 3 Dell PowerEdge R620* compute nodes, 2 Dell PowerVault MD3060e* JBOD SAS arrays (102 drives)
- Up to 3 scale units (9 compute nodes) per rack
- ¼–6 racks
- 3–54 compute nodes total
- 1, 2, or 3 TB storage capacity per drive
- ,223.1 TB raw free storage space
- 79–6,116 TB user storage (with compression)
- 6U available for customer space on first rack, 7U on other racks
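Simple rack-unit arithmetic shows that the Dell figures are internally consistent. The sketch below assumes fully populated 42U racks and uses the 6U (first rack) and 5U (other racks) overhead stated earlier. It does not model quarter-rack or partially filled configurations, and the function name is illustrative.

```python
RACK_U = 42  # standard rack height


def rack_budget(scale_unit_u: int, nodes_per_unit: int, units_per_rack: int,
                racks: int, first_overhead_u: int = 6, other_overhead_u: int = 5):
    """Return (total compute nodes, free U in first rack, free U in other racks)."""
    used_u = scale_unit_u * units_per_rack        # space taken by scale units
    free_first = RACK_U - first_overhead_u - used_u
    free_other = RACK_U - other_overhead_u - used_u
    total_nodes = nodes_per_unit * units_per_rack * racks
    return total_nodes, free_first, free_other


# Dell: 10U scale units of 3 compute nodes, 3 units per rack, up to 6 racks
print(rack_budget(10, 3, 3, 6))  # (54, 6, 7)
```

The same function accepts the HP AppSystem values: 7U scale units of 2 nodes, 4 units per rack, and up to 7 racks. It returns (56, 8, 9), matching the 56 compute nodes and the 8U/9U of customer space listed for HP.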

HP AppSystem for Microsoft SQL Server 2012 Parallel Data Warehouse

HP's PDW product is called the HP AppSystem for Microsoft SQL Server 2012 Parallel Data Warehouse. The HP AppSystem offers a different range of hardware options:
- Basic scale unit of 7U: two 1U servers and one 5U drive array
- Basic scale unit = 2 HP ProLiant DL360 Gen8 compute nodes, 1 HP P6000 JBOD SAS array (70 drives)
- Up to 4 scale units (8 compute nodes) per rack
- ¼–7 racks
- 2–56 compute nodes
- 1, 2, or 3 TB storage capacity per drive
- ,268.4 TB raw free storage space
- 53–6,342 TB user storage (with compression)
- 8U available for customer space on first rack, 9U on other racks

These different hardware specifications for the first rack from each vendor are shown in Figure 5.

Figure 5. Comparison of SQL Server 2012 Parallel Data Warehouse hardware specifications between Dell and HP

How PDW Works

Despite the many components included in PDW, to external clients the appliance looks just like a single instance of SQL Server. Clients direct T-SQL queries to the PDW control node, which eventually responds to the client with the results of the query. To answer the query, the control node uses its metadata to break the original query into smaller parts. It sends these component queries to the appropriate nodes, compiles the results it receives from them into one response, and sends this response to the client.

PDW virtualizes all servers on its physical nodes and uses failover clustering to protect these virtualized workloads. No single node (including the control node) represents a single point of failure. Figure 6 shows a view of PDW from the perspective of an administrator.

Figure 6. SQL Server 2012 Parallel Data Warehouse management portal
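The control node's divide-and-combine step can be illustrated with an average computed over data spread across compute nodes. This is a simplified sketch, not PDW's actual query plan. The shard contents and function names are hypothetical, and real component queries are generated from the control node's metadata.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    total: float
    count: int


def compute_node_query(local_rows):
    """Component query run on one compute node: SUM and COUNT of its local rows."""
    return Partial(total=sum(local_rows), count=len(local_rows))


def control_node_avg(shards):
    """Send the component query to every node, then combine the partial results."""
    partials = [compute_node_query(rows) for rows in shards]
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count if count else None


shards = [[10, 20], [30], [40, 50, 60]]  # rows stored on three compute nodes
print(control_node_avg(shards))  # 35.0
```

Breaking the query up this way matters. Averaging each node's own average (15, 30, and 50) would give about 31.7, not 35. The control node must therefore ask the nodes for sums and counts rather than finished averages.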

Comparison of Data Warehousing Appliances

Within the playing field of data warehousing appliances, Microsoft makes essentially three pitches in favor of PDW: that it offers great value, that it has excellent performance, and that it connects seamlessly to Apache Hadoop. Table 1 compares hardware specifications for full-rack implementations of data warehousing appliances from various vendors.[12] Table 2 compares input/output (I/O) rates among three data warehouse appliances.[13]

Table 1. Comparison of hardware specifications for full-rack implementations of data warehousing appliances from several vendors
Columns: Vendor and Appliance | Memory (GB) | Total Cores | Compression | User Storage (TB, Compressed) | List Price
Appliances compared: EMC Greenplum Data Computing Appliance*; IBM PureData System for Analytics N*; Microsoft SQL Server 2012 Parallel Data Warehouse (Dell)*; Oracle Exadata Database Machine X3-2*; Teradata Data Warehouse Appliance 2690*

Table 2. Comparison of input/output (I/O) rates among three data warehouse appliances
Vendor | I/O Bandwidth (GB/sec) | Price per GB/sec of I/O Bandwidth
EMC | 24 | $83,333
Microsoft | 108 | $14,537
Oracle | 100 | $136,440

With respect to value, Table 1 highlights one advantage: compared to other solutions, SQL Server 2012 PDW has a low cost per unit of storage. Microsoft achieves these cost reductions mainly by using direct-attached storage (DAS) with its nodes instead of storage area network (SAN) storage. This option is made possible by a Windows Server 2012 feature called Storage Spaces, which allows flexible, SAN-like storage provisioning from a JBOD SAS array attached to only one node. With respect to performance, Table 2 shows that the I/O throughput of PDW compares favorably with that of the EMC and Oracle solutions. (Data from IBM and Teradata are not available.)
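Table 2 expresses cost as price per GB/sec of I/O bandwidth. Multiplying the two columns therefore recovers the approximate full-rack price behind each figure, and dividing by Microsoft's figure gives a relative cost. The sketch below uses only the Table 2 values. The implied prices are approximate because the per-GB/sec figures are rounded.

```python
# Table 2: vendor -> (I/O bandwidth in GB/sec, price per GB/sec in USD)
TABLE_2 = {
    "EMC": (24, 83_333),
    "Microsoft": (108, 14_537),
    "Oracle": (100, 136_440),
}


def implied_price(bandwidth_gbps: int, price_per_gbps: int) -> int:
    """Price per GB/sec = price / bandwidth, so price = bandwidth * price per GB/sec."""
    return bandwidth_gbps * price_per_gbps


baseline = TABLE_2["Microsoft"][1]
for vendor, (bandwidth, per_gbps) in TABLE_2.items():
    print(f"{vendor:<10} ~${implied_price(bandwidth, per_gbps):,} "
          f"({per_gbps / baseline:.1f}x Microsoft's cost per GB/sec)")
```

By this measure, EMC's appliance costs roughly 5.7 times as much as PDW per GB/sec of I/O bandwidth, and Oracle's roughly 9.4 times as much.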
Microsoft claims that PDW can also speed I/O performance by more than 10 times through columnstore indexing and batch processing. Both technologies belong to the xVelocity* family of memory-optimized technologies in SQL Server.

Regarding the integration of PDW and Apache Hadoop, Microsoft is careful not to claim that PDW is unique among data warehouses in offering this capability. In fact, all of the data warehouse appliance vendors listed in Table 1 have presented product roadmaps involving some integration with Apache Hadoop. Of these, however, the PolyBase roadmap is distinctive in its plan to deeply integrate Apache Hadoop processing with PDW processing. The next section provides more detail about PolyBase and its product roadmap.

Big Data Integration: PolyBase

PolyBase is a PDW-only feature that provides a means to integrate Apache Hadoop data with SQL Server and to make this data accessible through T-SQL queries. The manner in which PolyBase integrates T-SQL with Apache Hadoop is illustrated in Figure 7.

Figure 7. PolyBase integration of T-SQL with Apache Hadoop

To achieve this integration, PDW must first be connected to an Apache Hadoop source. Administrators can then integrate the external Apache Hadoop data into SQL data on PDW with either a CREATE EXTERNAL TABLE statement or a CREATE TABLE AS SELECT (CTAS) statement. Administrators can also push data from PDW to Apache Hadoop by means of a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement.

CREATE EXTERNAL TABLE Statement

When an external table is created from Apache Hadoop data, PDW frames a SQL structure around the external data. Users can then query the external table as if it were a normal table residing in a SQL database. If the data is updated in the Apache Hadoop source, query results will show the updated data. However, query performance isn't optimized. Figure 8 shows an example of a CREATE EXTERNAL TABLE statement that creates a table called ClickStream from an Apache Hadoop file called employee.tbl.

Figure 8. Example of a CREATE EXTERNAL TABLE statement from Apache Hadoop

CREATE TABLE AS SELECT Statement

The CTAS statement can be run after an external table is created. When a PDW administrator runs a CTAS statement against an external table, the external data is physically copied into a SQL table that resides in PDW. In this case, PDW can perform parallel processing on the remote Apache Hadoop data. Once the table is created, the administrator can optimize its storage in PDW by distributing it across nodes. The imported Apache Hadoop data then persists in PDW until the new table is deleted. CTAS optimizes query response times. However, the imported data is not updated if the source data ever changes. The following example shows a basic CTAS statement:

    CREATE TABLE ClickStream_PDW
    WITH (DISTRIBUTION = HASH(url))
    AS SELECT url, event_date, user_IP FROM ClickStream;

Note that Apache Hadoop data does not need to persist as an isolated table. Imported data can also be mashed up with native relational data through JOIN statements.
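The trade-off between an external table and a CTAS copy is live data versus a faster local snapshot. It can be modeled in a few lines. The classes below are hypothetical stand-ins that mimic only the freshness behavior described above. They are not PolyBase APIs and say nothing about performance.

```python
class HdfsFile:
    """Hypothetical stand-in for a file stored in HDFS."""

    def __init__(self, rows):
        self.rows = list(rows)


class ExternalTable:
    """Like CREATE EXTERNAL TABLE: a schema over data that stays in Hadoop."""

    def __init__(self, source: HdfsFile):
        self.source = source

    def select(self):
        # Every query reads the source again, so later changes are visible.
        return list(self.source.rows)


def create_table_as_select(external: ExternalTable):
    """Like CTAS: physically copy the current rows into a local table."""
    return list(external.select())


hdfs = HdfsFile(["/home", "/cart"])
click_stream = ExternalTable(hdfs)
click_stream_pdw = create_table_as_select(click_stream)

hdfs.rows.append("/checkout")  # the source data changes later
print(click_stream.select())   # ['/home', '/cart', '/checkout']
print(click_stream_pdw)        # ['/home', '/cart']
```

The copy stays fixed until it is re-created, which mirrors the text's point: the imported data is not updated from its source. An administrator who needs fresh data would have to drop and rerun the CTAS statement.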

Querying the Data

After data is imported into a table in PDW, users can perform ordinary T-SQL queries on it, as shown in the three examples in Figure 9.

Figure 9. Examples of T-SQL queries performed on data imported to a SQL Server 2012 Parallel Data Warehouse table

Pushing Data to Apache Hadoop from PDW

Finally, PDW administrators also have the option of migrating data from PDW to an Apache Hadoop source. To achieve this, a CETAS statement is used, as in the following example:

    CREATE EXTERNAL TABLE ClickStream (url, event_date, user_IP)
    WITH (LOCATION = 'hdfs://myhadoop:5000/users/outputdir',
          FORMAT_OPTIONS (FIELD_TERMINATOR = ' '))
    AS SELECT url, event_date, user_IP FROM ClickStream_PDW;

Roadmap for PolyBase

Currently, PolyBase is in phase 1 of a multi-phase rollout. Phase 1 allows data to be imported directly from and exported directly to HDFS on Apache Hadoop. Because MapReduce is bypassed and parallel processing is used, import and export operations normally perform well.

Phase 2 goes beyond integrating Apache Hadoop data into PDW and will move toward integrating the processing power of Apache Hadoop clusters into PDW queries. This next phase will include a PDW query optimizer that makes a cost-based decision for every query that spans SQL and Apache Hadoop data. The optimizer will decide when to process a query with SQL and when to push it onto HDFS data as MapReduce jobs.

The goals of PolyBase phase 3 have not been finalized, but Microsoft has publicly stated that it is considering compatibility with Apache Hadoop MapReduce 2.0 (YARN) and more efficient alternatives to MapReduce. No dates have been given for the release of PolyBase phase 2 or phase 3. Besides this roadmap for planned functionality in PolyBase, Microsoft has occasionally hinted that the technology will eventually be integrated into its SQL Server product, perhaps as soon as the next release (SQL Server 2014).
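Microsoft has not published how the phase 2 optimizer will weigh its options. The following is therefore only a toy model of what a cost-based choice between the two strategies might look like. Every constant in `CostModel` is a made-up placeholder. The two plans (import everything versus filter in Hadoop first) are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostModel:
    # Hypothetical placeholder figures, not PolyBase internals.
    import_mb_per_sec: float = 500.0      # HDFS -> PDW transfer rate
    job_startup_sec: float = 30.0         # fixed MapReduce launch overhead
    mapreduce_mb_per_sec: float = 2000.0  # cluster-wide scan rate


def choose_plan(input_mb: float, selectivity: float,
                model: Optional[CostModel] = None) -> str:
    """Pick the cheaper way to evaluate a filter over data in HDFS.

    `selectivity` is the fraction of input rows the filter keeps.
    """
    m = model or CostModel()
    # Plan A: import every row into PDW and filter there.
    import_all = input_mb / m.import_mb_per_sec
    # Plan B: filter in Hadoop as a MapReduce job, then import only the matches.
    push_down = (m.job_startup_sec
                 + input_mb / m.mapreduce_mb_per_sec
                 + input_mb * selectivity / m.import_mb_per_sec)
    return "import into PDW" if import_all <= push_down else "push down as MapReduce"


print(choose_plan(1_000, 0.01))      # small input: job startup dominates
print(choose_plan(1_000_000, 0.01))  # large, selective input
```

The model turns on the fixed startup cost. For small inputs, launching a MapReduce job costs more than simply copying the data. For large, selective queries, filtering in Hadoop avoids moving most of the data into PDW.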
ETL in PDW

The Microsoft specifications for PDW do not include an ETL server, such as a dedicated instance of SQL Server loaded with SSIS. Both Dell and HP include SQL Server tools installed on the control node, but many firms are expected to use a pre-existing ETL server to connect to PDW. Using SSIS packages to import data is sensible if these packages already exist. Note, however, that in PDW, ordinary T-SQL queries offer much better performance as a way to import data.[15]

Microsoft's Claims about PDW

This paper focuses on Microsoft's comprehensive data strategy and how the various components of that strategy might work together. Microsoft makes claims about PDW's value and performance, but these claims do not relate to its big data strategy. One important claim that Microsoft makes about PDW, however, does relate to its comprehensive data strategy: that PDW integrates Apache Hadoop data with traditional relational data. We will look at two representative examples


More information

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload

More information

Your Data, Any Place, Any Time. Microsoft SQL Server 2008 provides a trusted, productive, and intelligent data platform that enables you to:

Your Data, Any Place, Any Time. Microsoft SQL Server 2008 provides a trusted, productive, and intelligent data platform that enables you to: Your Data, Any Place, Any Time. Microsoft SQL Server 2008 provides a trusted, productive, and intelligent data platform that enables you to: Run your most demanding mission-critical applications. Reduce

More information

WINDOWS AZURE DATA MANAGEMENT

WINDOWS AZURE DATA MANAGEMENT David Chappell October 2012 WINDOWS AZURE DATA MANAGEMENT CHOOSING THE RIGHT TECHNOLOGY Sponsored by Microsoft Corporation Copyright 2012 Chappell & Associates Contents Windows Azure Data Management: A

More information

Microsoft SQL Server 2012: What to Expect

Microsoft SQL Server 2012: What to Expect ASPE RESOURCE SERIES Microsoft SQL Server 2012: What to Expect Prepared for ASPE by Global Knowledge's Brian D. Egler MCITP-DBA, MCT, Real Skills. Real Results. Real IT. in partnership with Microsoft SQL

More information

SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box)

SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box) SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box) SQL Server White Paper Published: January 2012 Applies to: SQL Server 2012 Summary: This paper explains the different ways in which databases

More information

Updating Your SQL Server Skills to Microsoft SQL Server 2014

Updating Your SQL Server Skills to Microsoft SQL Server 2014 Course 10977A: Updating Your SQL Server Skills to Microsoft SQL Server 2014 Course Details Course Outline Module 1: Introduction to SQL Server 2014 This module introduces key features of SQL Server 2014.

More information

Big Data Processing: Past, Present and Future

Big Data Processing: Past, Present and Future Big Data Processing: Past, Present and Future Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst White Paper EMC s Enterprise Hadoop Solution Isilon Scale-out NAS and Greenplum HD By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst February 2012 This ESG White Paper was commissioned

More information

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW Roger Breu PDW Solution Specialist Microsoft Western Europe Marcus Gullberg PDW Partner Account Manager Microsoft Sweden

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

Big Data and Apache Hadoop Adoption:

Big Data and Apache Hadoop Adoption: Expert Reference Series of White Papers Big Data and Apache Hadoop Adoption: Key Challenges and Rewards 1-800-COURSES www.globalknowledge.com Big Data and Apache Hadoop Adoption: Key Challenges and Rewards

More information

How To Use Hp Vertica Ondemand

How To Use Hp Vertica Ondemand Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater

More information

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Better Together Writer: Bill Baer, Technical Product Manager, SharePoint Product Group Technical Reviewers: Steve Peschka,

More information

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Gokula Mishra Premjith Balakrishnan Business Analytics Product Group September 29, 2014 Copyright 2014, Oracle and/or its affiliates. All

More information

Updating Your SQL Server Skills to Microsoft SQL Server 2014

Updating Your SQL Server Skills to Microsoft SQL Server 2014 Course 10977B: Updating Your SQL Server Skills to Microsoft SQL Server 2014 Page 1 of 8 Updating Your SQL Server Skills to Microsoft SQL Server 2014 Course 10977B: 4 days; Instructor-Led Introduction This

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve

More information

WINDOWS AZURE AND WINDOWS HPC SERVER

WINDOWS AZURE AND WINDOWS HPC SERVER David Chappell March 2012 WINDOWS AZURE AND WINDOWS HPC SERVER HIGH-PERFORMANCE COMPUTING IN THE CLOUD Sponsored by Microsoft Corporation Copyright 2012 Chappell & Associates Contents High-Performance

More information

Course 10977A: Updating Your SQL Server Skills to Microsoft SQL Server 2014

Course 10977A: Updating Your SQL Server Skills to Microsoft SQL Server 2014 www.etidaho.com (208) 327-0768 Course 10977A: Updating Your SQL Server Skills to Microsoft SQL Server 2014 5 Days About this Course This five day instructor led course teaches students how to use the enhancements

More information

Introducing the Reimagined Power BI Platform. Jen Underwood, Microsoft

Introducing the Reimagined Power BI Platform. Jen Underwood, Microsoft Introducing the Reimagined Power BI Platform Jen Underwood, Microsoft Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor and manage user

More information

Updating Your SQL Server Skills from Microsoft SQL Server 2008 to Microsoft SQL Server 2014

Updating Your SQL Server Skills from Microsoft SQL Server 2008 to Microsoft SQL Server 2014 Course Code: M10977 Vendor: Microsoft Course Overview Duration: 5 RRP: 2,025 Updating Your SQL Server Skills from Microsoft SQL Server 2008 to Microsoft SQL Server 2014 Overview This five-day instructor-led

More information

In-Memory Analytics for Big Data

In-Memory Analytics for Big Data In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...

More information

Microsoft technológie pre BigData. Ľubomír Goryl Solution Professional

Microsoft technológie pre BigData. Ľubomír Goryl Solution Professional Microsoft technológie pre BigData Ľubomír Goryl Solution Professional Tradičný prístup Breaking points of traditional approach Breaking points of traditional approach Breaking points of traditional approach

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

BIG DATA AND MICROSOFT. Susie Adams CTO Microsoft Federal

BIG DATA AND MICROSOFT. Susie Adams CTO Microsoft Federal BIG DATA AND MICROSOFT Susie Adams CTO Microsoft Federal THE WORLD OF DATA IS CHANGING Cloud What s making this possible? Electrical efficiency of computers doubles every year and ½. Laptops and mobile

More information

WHAT S NEW IN SAS 9.4

WHAT S NEW IN SAS 9.4 WHAT S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

A Breakthrough Platform for Next-Generation Data Warehousing and Big Data Solutions

A Breakthrough Platform for Next-Generation Data Warehousing and Big Data Solutions A Breakthrough Platform for Next-Generation Data Warehousing and Big Data Solutions Writers: Barbara Kess and Dan Kogan Reviewers: Murshed Zaman, Henk van der Valk, John Hoang, Rick Byham Published: October

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Many corporations and Independent Software Vendors considering cloud computing adoption face a similar challenge: how should

More information

Why Big Data in the Cloud?

Why Big Data in the Cloud? Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP Pythian White Paper TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP ABSTRACT As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Microsoft SQL Server 2012 with Hadoop

Microsoft SQL Server 2012 with Hadoop Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING

ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING Enzo Unified Extends SQL Server to Simplify Application Design and Reduce ETL Processing CHALLENGES SQL Server does not scale out

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved. Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,

More information

The Role Polybase in the MDW. Brian Mitchell Microsoft Big Data Center of Expertise

The Role Polybase in the MDW. Brian Mitchell Microsoft Big Data Center of Expertise The Role Polybase in the MDW Brian Mitchell Microsoft Big Data Center of Expertise Program Polybase Basics Polybase Scenarios Hadoop for Staging Ambient data from Hadoop Export Dimensions to Hadoop Hadoop

More information

Proact whitepaper on Big Data

Proact whitepaper on Big Data Proact whitepaper on Big Data Summary Big Data is not a definite term. Even if it sounds like just another buzz word, it manifests some interesting opportunities for organisations with the skill, resources

More information

10977B: Updating Your SQL Server Skills to Microsoft SQL Server 2014

10977B: Updating Your SQL Server Skills to Microsoft SQL Server 2014 10977B: Updating Your SQL Server Skills to Microsoft SQL Server 2014 Course Details Course Code: Duration: Notes: 10977B 5 days This course syllabus should be used to determine whether the course is appropriate

More information

Microsoft s SQL Server Parallel Data Warehouse Provides High Performance and Great Value

Microsoft s SQL Server Parallel Data Warehouse Provides High Performance and Great Value Microsoft s SQL Server Parallel Data Warehouse Provides High Performance and Great Value Published by: Value Prism Consulting Sponsored by: Microsoft Corporation Publish date: March 2013 Abstract: Data

More information

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now

More information

Harnessing the Power of the Microsoft Cloud for Deep Data Analytics

Harnessing the Power of the Microsoft Cloud for Deep Data Analytics 1 Harnessing the Power of the Microsoft Cloud for Deep Data Analytics Today's Focus How you can operate your business more efficiently and effectively by tapping into Cloud based data analytics solutions

More information

Course 10977: Updating Your SQL Server Skills to Microsoft SQL Server 2014

Course 10977: Updating Your SQL Server Skills to Microsoft SQL Server 2014 Course 10977: Updating Your SQL Server Skills to Microsoft SQL Server 2014 Type:Course Audience(s):IT Professionals Technology:Microsoft SQL Server Level:300 This Revision:B Delivery method: Instructor-led

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information