We are building the next generation of Big Data and Analytics solutions!
Stephen Karl Ranson, CEO
Background: 26 years' experience in the IT industry
12 years - Solutions Architect - International Profile
10 years - IT Director - Private Banking, Oslo
4 years - CEO - Cloud Explorers - Big Data
Passionate about Technology, Genuine Interest in All Things Digital, Resourceful, International, Innovative, Out-of-the-Box Thinker, Disruption, Data Science
Big Data - Less Fluff! More Concrete! PEOPLE ARE DOING IT! NOW! IT'S TIME TO BEGIN - 2015!!! If you wait, you will be too late!
Big Data - A Brief History of Big Data
2001 - (During the BI boom) META Group (now Gartner) analyst Doug Laney wrote a report addressing the growth challenges and opportunities facing future Data Warehouse/BI projects in terms of three dimensions, the 3 Vs: (V)olume, (V)ariety and (V)elocity.
(In 2001 the Internet had 458,000,000 users (7.6% of the world population), with 29,254,370 sites online.)
(In 2015 the Internet has 3,074,220,500+ users (43.9% of the world population), roughly 6.7 times the 2001 figure, with 1,219,400,120+ sites online, roughly 42 times the 2001 figure.)
(The Internet is used for websites, mail, file transfer, online services (Cloud), streaming content, devices and VoIP, from devices that range from browsers, smartphones, cameras, cars, televisions and machinery to embedded devices and refrigerators.)
2004 - Doug Cutting, whilst working on the open source project Nutch, read two white papers from Google explaining their approaches to problems he was trying to solve. The papers described GFS (Google File System) and MR (MapReduce), and he implemented this initial thinking in Nutch.
2006 - Yahoo hired Doug and his team, and they branched their work out of Nutch into a new project to help Yahoo solve many of the same challenges that Google had faced and solved. They called it HADOOP <- V. Important!
2012 - Gartner revisited the 3 Vs with more perspective, added some more Vs and tried to give a clearer definition:
Business Intelligence - Uses descriptive statistics on data with high information density to measure things, detect trends, etc.
Big Data - Uses inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships and causal effects) from large sets of data with low information density, in order to reveal relationships and dependencies and to predict outcomes and behaviors.
Simply put, Big Data is a large volume of unstructured data which cannot be handled by standard data management systems such as a DBMS, RDBMS or ORDBMS.
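As a quick check on the scale of that growth, the arithmetic can be redone using only the user and site counts quoted in the history above. This is a minimal sketch; the helper name growth_percent is illustrative and not from the original slides. On these figures, users grew by roughly 570% and sites by over 4,000% between 2001 and 2015.

```python
def growth_percent(start, end):
    """Percentage growth from a starting value to an ending value."""
    return (end - start) / start * 100


# Figures as quoted on the slide (the 2015 values are lower bounds, "+").
users_2001, users_2015 = 458_000_000, 3_074_220_500
sites_2001, sites_2015 = 29_254_370, 1_219_400_120

user_growth = growth_percent(users_2001, users_2015)
site_growth = growth_percent(sites_2001, sites_2015)

print(f"Users: {users_2015 / users_2001:.1f}x the 2001 figure ({user_growth:.0f}% growth)")
print(f"Sites: {sites_2015 / sites_2001:.1f}x the 2001 figure ({site_growth:.0f}% growth)")
```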
Big Data Characteristics (The 6 Vs and a C)
Volume - The quantity of data generated matters: its size determines the value and potential of the data under consideration, and whether it can be considered Big Data at all. The name Big Data itself refers to size, hence this characteristic.
Variety - The category to which the data belongs is an essential fact for data analysts to know. It helps those who work closely with the data to use it effectively to their advantage.
Velocity - The speed at which data is generated and processed to meet the demands and challenges of growth and development.
Variability - The inconsistency the data can show at times, which hampers the ability to handle and manage it effectively and can be a problem for those who analyse it.
Veracity - The quality of the data being captured can vary greatly. The accuracy of any analysis depends on the veracity of the source data.
Value - Enabling business decisions and giving business insight and advantage.
Complexity - Data management can become a very complex process, especially when large volumes of data come from multiple sources. The data need to be linked, connected and correlated in order to grasp the information they are meant to convey.
Big Data Formats
Structured - High degree of organization, typically found in relational databases or spreadsheets. Maps easily to data types, or to user-defined types based on standard types. Can be searched using standard algorithms and manipulated in well-defined ways.
Semi-structured - (Such as log files) a little more difficult to understand. Normally stored as text files with some basic form of order, such as tab-delimited or comma-separated columns. Unlike a database, which returns a known meaning for each resulting column, each extracted data element has to be assigned a type and a meaning.
Unstructured - No advantage of having structure coded into the data set. (Still, can data stored in a computer ever be unstructured?) This is data with too little structure to make sense of easily. Traditional approaches to analysis are difficult and costly, and volumes are typically high with this class of data.
Five main types of data found today: Sentiment data, Clickstream data, Sensor or machine data, Geolocation data, Server logs
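The point about semi-structured data, that the reader rather than the file assigns a type and a meaning to each column, can be sketched as follows. Everything here is an assumption made for illustration: the sample log lines, the column names in SCHEMA and the parse_log helper are hypothetical and do not come from any particular log format.

```python
import csv
import io
from datetime import datetime

# Hypothetical tab-delimited server log: the text carries no column names or types.
RAW_LOG = (
    "2015-03-01T09:15:02\t192.0.2.10\t/products/42\t200\t512\n"
    "2015-03-01T09:15:07\t192.0.2.11\t/cart\t404\t0\n"
)

# The meaning and type of each column is supplied by the reader, not by the data.
SCHEMA = [
    ("timestamp", datetime.fromisoformat),
    ("client_ip", str),
    ("path", str),
    ("status", int),
    ("bytes_sent", int),
]


def parse_log(text):
    """Turn raw tab-delimited text into typed records using SCHEMA."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != len(SCHEMA):
            continue  # skip malformed lines rather than guess at their meaning
        records.append(
            {name: convert(value) for (name, convert), value in zip(SCHEMA, row)}
        )
    return records


records = parse_log(RAW_LOG)
errors = [r for r in records if r["status"] >= 400]
print(len(records), "records,", len(errors), "with an error status")
```

Once the columns have types, the data can be filtered and aggregated like structured data, which is the step a relational database would have done for us.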
Big Data Sources
Advanced Analytics
SINGLE VIEW OF ENTITY - The first of three common patterns in analytics applications. A single view of an entity (like a customer, a product or a machine) is now possible because platforms like Hadoop can store and organize previously unmanageable volumes and varieties of data.
DATA DISCOVERY - New, voluminous data types such as machine and sensor data, geolocation data, clickstream data and sentiment data are valuable when correlated with other data sets in a shared enterprise data lake. The patterns within the data lake can then fuel machine learning applications.
PREDICTIVE ANALYTICS - As data scientists and analysts reveal patterns and correlations inside massive data sets, new models emerge to explain business performance. Most importantly, these models can reliably predict future events based on previously dissociated data.
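As a rough illustration of the single-view pattern, the sketch below folds records from three separate sources into one record per customer. The sources, field names and the single_view helper are all hypothetical, and a real deployment would do this at scale on the cluster rather than in memory.

```python
from collections import defaultdict

# Hypothetical records from three separate systems, all keyed by customer_id.
crm = [{"customer_id": 1, "name": "Kari"}, {"customer_id": 2, "name": "Ola"}]
clickstream = [
    {"customer_id": 1, "page": "/offers"},
    {"customer_id": 1, "page": "/cart"},
]
sentiment = [{"customer_id": 2, "score": -0.4}]


def single_view(crm_rows, click_rows, sentiment_rows):
    """Combine per-source records into one dictionary per customer."""
    view = defaultdict(lambda: {"name": None, "pages": [], "sentiment": []})
    for row in crm_rows:
        view[row["customer_id"]]["name"] = row["name"]
    for row in click_rows:
        view[row["customer_id"]]["pages"].append(row["page"])
    for row in sentiment_rows:
        view[row["customer_id"]]["sentiment"].append(row["score"])
    return dict(view)


customers = single_view(crm, clickstream, sentiment)
print(customers[1])
```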
HADOOP
Hello, my name is Hadoop! I am named after the toy yellow elephant of Doug Cutting's son :-)
WHAT DO I DO?
STORAGE (Elastic / Reliable / Unlimited) + COMPUTATION (Framework: Scalable Data Crunching / Analysis) = Big Data!
(C) Copyright Cloud Explorers Solutions AS 2015
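To make the computation half a little more concrete, here is a minimal single-process sketch of the MapReduce idea using the classic word count. It is not the Hadoop API: the function names map_phase, shuffle and reduce_phase are illustrative, and on a real cluster each phase would run in parallel across many machines over data held in distributed storage.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of input splits."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Two hypothetical input splits standing in for blocks of a large file.
splits = ["big data is big", "data lake"]
counts = word_count(splits)
print(counts)
```

Because each map call only sees its own split and each reduce call only sees one key, the work can be spread over as many machines as there are splits and keys.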
HADOOP IS AN AQUARIUM? I provide a powerful environment for the Big Data Fishes :-)!
Let's Meet some of the FISH! (14+ and growing)
Ambari - HADOOP ADMIN TOOLS FOR MANAGING AND MAINTAINING A CLUSTER
Avro - FRAMEWORK FOR DATA SERIALIZATION INTO A COMPACT BINARY FORMAT
Flume - DATAFLOW SERVICE FOR MOVEMENT OF LARGE VOLUMES OF LOG FILES INTO HADOOP
HBase - DISTRIBUTED COLUMNAR DATABASE USING HDFS (LARGE TABLES)
HCatalog - PROVIDES A RELATIONAL VIEW OF DATA STORED IN HADOOP
Hive - DISTRIBUTED DATA WAREHOUSE FOR HDFS & SQL STYLE QUERY LANGUAGE (HIVEQL)
Solr - POWERFUL NOSQL SEARCH ENGINE WITH INDEXES, FACETS, PIVOTS, SEMANTIC
Hue - ADMINISTRATIVE INTERFACE FOR HADOOP (GUI)
Mahout - LIBRARY OF MACHINE LEARNING ALGORITHMS IMPLEMENTED AS MAP REDUCE ON HADOOP
Oozie - WORKFLOW MANAGEMENT TOOL HANDLING SCHEDULING AND CHAINING
Pig - PLATFORM FOR ANALYSIS OF VERY LARGE DATA SETS WITH ITS OWN QUERY LANGUAGE, PIG LATIN
Sqoop - TOOL FOR EFFICIENTLY MOVING LARGE AMOUNTS OF DATA FROM DBS TO HDFS
ZooKeeper - SIMPLE INTERFACE TO CENTRALIZED CO-ORDINATION OF SERVICES
Apache Storm - REAL TIME DATA STREAMING WITH REAL TIME SEARCH
THE BIG DATA - SHIPPING NOW! You Can Start Today!
Big Data - The Data Lake LET'S STOCK OUR LAKE! WITH OUR DATA FISH!
Big Data - The Data Lake: Enterprise Business Data
Profiles, Clients, Identity, Relationships, E-mail, Documents, Reports, Facts, Analysis/Mining, History, CRM, ERP/Accounting, Transactions, Content, Data Warehouses
Omni Channel/Contact Points - Marketing Conversation Channel Integration Channel Usability Channel Transparency Informative Brand Experience Awareness Convenient Web Coupons Vocabulary Business KPIs SMS Mobile E-Commerce Data Consistent Continuity Real time Social Big Data KPIs Email Response Data TV/Radio? Post Store Store In Sync Store Staff Knowledgeable & Informed
Omni Channel/Contact Points - Really Means - DATA! Omni Channel/Contact Points Relationship - Critical Relationship DATA! Web Coupons Vocabulary Business KPIs Your Mobile Social Big Data KPIs Business Email TV/Radio? Post Store
Quality Data Commercial Data
Free & Diverse Data OPEN Data GEO Data Geodata
Social Data Geodata Social Data
BIG DATA ACTIONS! Entirely New New Generation DATA Big Data DRIVEN KPIs Business BUSINESS! KPIs Richer Vocabulary Information Insight Knowledge
Big Data - Data Lake
Layers: BATCH LAYER, SPEED LAYER, PUBLISH LAYER
Open Data - Free public data: Environment, Infrastructure, Finance, Health, Education, Reference, Society
Commercial - Quality data: brokers, high quality lists, survey and panel data, clickstream, directories, telephone numbers, board and company roles...
Web Commerce - Existing web commerce sites: logs, trackers, purchases, abandoned carts, interest, usage...
Client Data - Facts, Analysis/Mining, History, Clients, Identity, Relationships, Profiles, Reports, Documents, E-mail, CRM, Transactions, Accounting, Content, Data Warehouses
Response Data - Response data from all channels: web logs, mail logs, telephony logs, social... A red thread across all channels...
Geo Data - Mapping and geospatial data: size of house, nearest shops, cafes...
Real Time Sensors (STORM) - iBeacon, Wi-Fi, web sessions, sensors, POS, payment terminals, temperature, weather, traffic, news events, IoT
Social - Social data: FB, Twitter, Instagram, Snapchat, Google, blogs, natural language processing, sentiment analysis, networks, likes, interests, trends, discussions, product awareness and feedback
Outputs: Search, Segment, Visualize, Analysis, Reports, Dashboards, Infographics, API/Events
Process (steps 1 2 3): EXTRACT, Import, Wash / Enrich, Segment / Search, Visualize / Analysis, Search / Dashboards, Reports / Infographics / Events, Export, Publish, Operationalize, ACTIONS
Seeing is believing!
Quick Demo! - BIG DATA IS NOW!
THANK YOU :-)