WELCOME TO THE WORLD OF BIG DATA. NEW WORLD PROBLEMS, NEW WORLD SOLUTIONS
TECHNOLOGY
by Zachary Zeus

Data in our world has been exploding. According to IBM research, 90% of today's data was created in the last two years alone, and every day sees another 2.5 quintillion bytes. This is the world of Big Data: the mirror reflection of human life on the planet today, and a source of previously inconceivable insight into people's behaviours, decisions and daily transactions.

According to recent research by McKinsey, Big Data is the next frontier in global industry's quest for innovation, competition and productivity, and is already driving sweeping change in a diverse range of sectors, from medical research to crime prevention.

WHAT IS THE DEFINITION OF BIG DATA?

Big data refers to datasets that grow so large they become complicated to work with using on-hand database management tools. The difficulties include capturing, storing, searching, sharing, analysing and visualizing the data. Generally there are three big data types:

Transactional (reserved mainly for credit card companies and financial services)
Sub-transactional (typically the events leading to transactions)
Non-transactional (websites, blogs etc.)

Whichever the category, we tend to characterize big data as:

High Volume: because it's too big to be analyzed using traditional methods
High Velocity: in that much of this is real-time data and needs to be analyzed quickly to hold value
High Variety: typically unwieldy data that comes in many types and formats

UNDERSTANDING THE POTENTIAL FOR YOUR BUSINESS

It would be tempting to view the big data phenomenon as a social media-generated bubble, yet increasing numbers of global industries are uncovering compelling insights using big data analytics: online businesses, retail organisations and many media and marketing companies. Equally, a recent MGI study concluded that in the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data. 1

However, the key word here is "could", because for most industries today the ability to analyse these data sets is beyond reach. Big data cannot be analysed using standard database management tools and is beyond the capacity of traditional BI databases. To make sense of big data you need access to tools optimized for massive data crunching and, more importantly, access to analytics that can make sense of this data for your business.

1 Big data: The next frontier for innovation, competition, and productivity. MGI, May 2011, by James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, Angela Hung Byers.

BIG DATA SOLUTIONS

Unsurprisingly, a number of Big Data solutions have launched in the marketplace, and these vary depending on your big data requirements. The system for processing weblog data, for example, is very different to applications enabling corporate treasury departments to report on intraday cash balances. Two solutions gaining global attention are Hadoop, an
Apache-developed framework that allows for the processing of large data sets across clusters of computers using a simple programming model, available under a free licence; and NoSQL databases, non-relational databases that use clusters of commodity servers to manage huge data and transaction volumes. NoSQL databases are highly optimized for retrieval and record storage, scale elastically and allow you to store virtually any structure of data (examples include MongoDB, CouchDB and Cassandra).

CHALLENGES AND CONSTRAINTS

When explaining what big data solutions are, it's important to clarify what they are not. Firstly, big data solutions are not databases. They don't provide the capabilities that BI toolsets expect of a database and they're comparatively slow. Even the smallest possible query on Hadoop, for example, takes much longer to execute than it would on a database. Hadoop is optimized for executing very intensive data processing tasks on very large amounts of data, not for quick queries.

Equally, from a BI perspective, these solutions offer few facilities for ad hoc query and analysis. Even a simple query requires significant programming expertise, and most BI tools don't provide connectivity to big data sources. This is proving hugely problematic and frustrating for organizations wanting to use their data for intelligence gathering: firstly because it's not always possible to resource the highly technical users needed to program the queries, and secondly because even these skilled users are limited in the access and visibility these tools allow without huge time and project commitments.

THE OPEN SOURCE OPPORTUNITY

Interestingly, the historical move towards open source has created exactly that intelligence opportunity. Rather than committing to long-term, multimillion-dollar data projects, some organisations have been able to use open tools to set up short, experimental projects. These have enabled them to
explore the true value of their data and build tools and methods that make the big data mining process progressively easier.

Equally, Pentaho, a leader in Big Data analytics, now has the capability to significantly lower the technical barriers of Hadoop and NoSQL tools using an environment that's logical for users to understand. The system is designed to integrate data, leverage the full capabilities of each big data platform and enable users to access the information they need in a highly visual format. In this sense Pentaho is being deployed to make it easier for groups of users (not just the technically specialised) to conduct useful analytics by sitting on top of unstructured data sources and providing an end-to-end BI solution including reporting tools, ad hoc query options and genuinely interactive analysis.

A BUSINESS INTELLIGENCE APPROACH

This BI focus is important because without it your Big Data analysis is virtually worthless. As MIT senior lecturer Jonathan Byrnes warned recently in an article for Leading Company: "initiatives have to be co-ordinated and focused on the right long-term strategic goals to be effective. If the availability of big data encourages a massive flock of independent tactical initiatives, it will do more harm than good." 2

2 www.leadingcompany.com.au/big-data/big-data-big-opportunity-or-big-headache (11 March 2012)

It makes good business sense to secure a big data vendor who has business intelligence capabilities and can work closely with you to determine which data sets will have high-value analytical uses for your company. Equally, though, the industry needs to accept that this is new world terrain: you don't know what you need until you can explore the true value of your data,
and for that you need cost-effective tools that allow for short, experimental projects.

THE WAY FORWARD

The implications of big data, and the volume and detail of the information itself, will continue to multiply for the foreseeable future. Enterprise-driven data is predicted to grow by 650% over the next few years, and 80% of that will be unstructured. 3 Your customers will continue to generate this information, but how you access it (and how you make sense of it) could be the key differentiator between you and your competitors.

Finally, when you're choosing a vendor for your big data analytics, keep data security a high priority and make sure there's synergy between the big data solution and your existing infrastructure. Big data analysis is not a one-size-fits-all solution. Make sure your vendor understands your business goals and the nuances and implications of your data. Without that insight, your access to big data analysis will fall far short of the business intelligence it should provide.

To talk to BIZCUBED about our Pentaho Big Data solutions, email contact@bizcubed.com.au or call 02 9007 9887.

3 Gartner webinar, Technical Trends You Can't Afford to Ignore, January 2010.