Age of
Presented by: Mohammad Iqbal
BCM-2014
Agenda
Big Data?
Big Data
Hadoop evolution from Google
Big Data?
Name        Symbol   Value (bytes)
Kilobyte    KB       10^3
Megabyte    MB       10^6
Gigabyte    GB       10^9
Terabyte    TB       10^12
Petabyte    PB       10^15
Exabyte     EB       10^18
Zettabyte   ZB       10^21
Yottabyte   YB       10^24
BIG DATA: data so large that it becomes difficult to process using traditional systems.
Big? Big
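The table uses decimal (SI) prefixes, so every step up multiplies by 10^3. The small converter below is a sketch that assumes these decimal units, not the binary KiB/MiB (2^10) variants. The function name `convert` is illustrative only.

```python
import math

# Decimal (SI) exponents from the unit table: 1 KB = 10^3 bytes, 1 MB = 10^6 bytes, ...
EXPONENTS = {"B": 0, "KB": 3, "MB": 6, "GB": 9, "TB": 12,
             "PB": 15, "EB": 18, "ZB": 21, "YB": 24}


def convert(value: float, src: str, dst: str) -> float:
    """Convert a quantity between two decimal byte units (hypothetical helper)."""
    return value * 10 ** (EXPONENTS[src] - EXPONENTS[dst])


if __name__ == "__main__":
    # 2.5 EB expressed in GB: 2.5 * 10^(18 - 9) = 2.5 billion GB
    print(convert(2.5, "EB", "GB"))
    # One zettabyte expressed in terabytes
    print(convert(1, "ZB", "TB"))
```

For example, `convert(2.5, "EB", "GB")` gives 2.5 billion. That matches the "2.5 exabytes, i.e. 2.5 billion GB" figure quoted from IBM in this deck.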
Difficult to process with traditional systems
Examples: a 100 MB, 100 GB, or 100 TB document that the system is unable to send, view, or edit.
Whether data is "too big" depends on the capability of the system.
Organization/context specific
500 TB of text, audio, and video data per day:
Company A: Big Data
Company B: NOT Big Data
It depends on the capabilities of the organization.
Areas of challenge: capture, search, curation, sharing, storage, transfer, analysis, visualization.
Big Data: the three Vs (V^3)
VOLUME: large and growing files.
VELOCITY: data comes in at high speed, resulting in large files.
VARIETY: these files come in various formats.
Structured vs. unstructured data
Structured (10%): used in decision making.
Unstructured (90%): mostly wasted.
Challenge/opportunity: to analyze the unstructured data and extract meaningful information.
Sources of large and growing files (Big Data files): users, applications, systems, and sensors.
Generation points (examples): mobile devices, machine sensors, microphones, cameras, readers/scanners, social media, science facilities, software/programs.
Sample events generating Big Data
Every day, we create 2.5 exabytes of data (2.5 billion GB). That is so much that 90% of the data in the world today has been created in the last few years alone.
CERN's atomic facility generates 40 TB of data per second.
Twitter generates 12 TB of data every day.
An Airbus A380 generates 10 TB every 30 minutes of flight; about 650 TB is generated in one flight.
In 2009, the total data in the world was estimated at 1 ZB; by 2020 it is estimated to reach 35 ZB.
(Source: IBM.com)
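The figures above mix daily totals, per-flight totals, and multi-year estimates. Normalizing them makes them easier to compare. The sketch below reuses only the deck's own numbers and assumes decimal units (1 TB = 10^12 bytes). The growth figure is a plain compound-annual-growth calculation on the 2009 and 2020 estimates, not a number reported by IBM.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def twitter_mb_per_second(tb_per_day: float = 12) -> float:
    """Average Twitter data rate in MB/s, from the 12 TB/day figure."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6


def a380_tb_per_hour(tb_per_interval: float = 10, interval_minutes: float = 30) -> float:
    """A380 data rate in TB/hour, from 10 TB every 30 minutes of flight."""
    return tb_per_interval * 60 / interval_minutes


def implied_annual_growth(start_zb: float = 1, end_zb: float = 35,
                          start_year: int = 2009, end_year: int = 2020) -> float:
    """Compound annual growth rate implied by the 1 ZB (2009) -> 35 ZB (2020) estimate."""
    return (end_zb / start_zb) ** (1 / (end_year - start_year)) - 1


if __name__ == "__main__":
    print(f"Twitter: about {twitter_mb_per_second():.1f} MB/s on average")
    print(f"A380: {a380_tb_per_hour():.0f} TB per hour of flight")
    print(f"World data: about {implied_annual_growth():.0%} growth per year")
```

These are averages. Real traffic is bursty, so peak rates are higher than the per-second figure suggests.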
Collect → Analyze → Understand
Applications
Companies gain an edge by collecting, analyzing, and understanding information.
Governments forecast events and take proactive action.
Traditional systems (e.g. RDBMS, SQL): not able to handle Big Data.
Newer tools (e.g. NoSQL): created to handle Big Data.
Traditional enterprise approach
One powerful computer processes all the data. It has a processing limit, so only so much data can be processed.
Modern approach (Hadoop)
The data is divided among several computations that run in parallel, and their outputs are merged into one combined result.
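The split-compute-combine idea above can be sketched in a few lines. The assumptions are deliberate simplifications. A thread pool on one machine stands in for the cluster's nodes. The workload (a sum of squares) is arbitrary. All function names are hypothetical, not part of Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Split data into at most n_chunks roughly equal slices."""
    if not data:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(chunk):
    """The 'computation' each worker performs on its own slice."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, workers=4):
    """Run the per-chunk computations concurrently, then combine their results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)  # the combined result


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(1_000))))
```

Threads keep the sketch self-contained, but in CPython they do not speed up CPU-bound work. A real cluster gains its speed by sending each chunk to a different machine, close to where that chunk is stored.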
Hadoop ecosystem projects: MapReduce, Hadoop Distributed File System (HDFS), Hive, HBase, Mahout, Pig, Oozie, Flume, Sqoop.
(Source: hortonworks.com/hadoop/hdfs/)
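Hadoop master/slave architecture
MASTER: Job Tracker (schedules MapReduce tasks) and Name Node (HDFS metadata). The application submits its job to the master.
SLAVES: four nodes, each running a Task Tracker and a Data Node. The DATA is stored on the Data Nodes.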
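In this architecture, the Job Tracker hands map and reduce tasks to the Task Trackers. The sketch below simulates the MapReduce programming model (map, shuffle, reduce) for a word count inside a single Python process. It illustrates the model only. It does not use Hadoop's Java API, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Big Data big", "Hadoop handles big data"]))
```

On a real cluster, each map task would run on a Task Tracker next to the Data Node that holds its input block. The shuffle would move intermediate pairs across the network to the reducers.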
The Name Node knows where the data is residing. The application asks the master for the location, and the DATA can then be taken directly from the slave Data Nodes.
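The Name Node's role as a directory of block locations can be sketched as follows. Several values are assumptions chosen for illustration rather than taken from the slides: a 128 MB block size, a replication factor of 3, and simple round-robin placement. Real HDFS uses a rack-aware placement policy. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes
REPLICATION = 3                 # assumed number of copies per block


@dataclass
class NameNode:
    """Hypothetical Name Node: stores only metadata (which Data Nodes hold which blocks)."""
    data_nodes: list
    block_map: dict = field(default_factory=dict)

    def add_file(self, path: str, size_bytes: int) -> None:
        """Split a file into blocks and record replica locations (round-robin placement)."""
        n_blocks = max(1, math.ceil(size_bytes / BLOCK_SIZE))
        copies = min(REPLICATION, len(self.data_nodes))
        self.block_map[path] = [
            [self.data_nodes[(b + r) % len(self.data_nodes)] for r in range(copies)]
            for b in range(n_blocks)
        ]

    def locate(self, path: str) -> list:
        """Return replica locations per block; the client then reads from those Data Nodes directly."""
        return self.block_map[path]


if __name__ == "__main__":
    nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
    nn.add_file("/logs/day1", 300 * 1024 * 1024)  # 300 MB -> 3 blocks
    for i, replicas in enumerate(nn.locate("/logs/day1")):
        print(f"block {i}: {replicas}")
```

Because the Name Node returns only locations, the bulk data never flows through the master. That separation is what lets the data be "taken directly" from the slaves.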
HDFS vs. GFS
HDFS is similar to the Google File System (GFS), and Hadoop MapReduce to Google's MapReduce.
Back in the 1990s, the leading search engines were Excite, AltaVista, Lycos, and Infoseek.
Google's victory
1995: Excite, AltaVista, Lycos
2000: Google
Hadoop evolution from Google
2003: GFS paper released by Google.
2004: Google released its paper on MapReduce.
2005: Hadoop created by Doug Cutting & Mike Cafarella at Yahoo! (Nutch search engine).
2006: Yahoo! donated the project to Apache.
(Source: Google & Nutch white papers)
Hadoop is here!
Data scientists with just two years' experience can earn between $200,000 and $300,000 a year (Wall Street Journal).
Anyone with "data science" in his or her job title on a LinkedIn page is going to get "100 recruiter emails a day" (Wall Street Journal).
Hadoop is a super hot up-and-coming "big data" technology (BusinessInsider.com).
Many other data scientists, especially at data-driven companies such as Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter, have added to and are looking to develop the Hadoop tool kit (Harvard Business Review).
"People are slapping buzzwords [such as Hadoop] on résumés and looking to get 50 or 100 percent more, and they're getting it," said Scott Gnau, president of Teradata Labs.
References
Dean, J. & Ghemawat, S. (2004). MapReduce: Simplified Data Processing on Large Clusters. google.com
Cutting, D. (2005). Nutch: A Flexible and Scalable Open-Source Web Search Engine. yahoo.com
Ghemawat, S., Gobioff, H. & Leung, S.-T. (2003). The Google File System. google.com
IBM. https://www.ibm.com/developerworks/vn/library/contest/dwfreebooks/tim_hieu_big_/understanding_big.pdf [Accessed 27 Nov 2014]
Business Insider. http://www.businessinsider.com/10-tech-skills-that-will-instantly-net-you-100000-salary-2012-8?op=1 [Accessed 27 Nov 2014]
Wall Street Journal. Big Data's High-Priests of Algorithms. http://online.wsj.com/articles/academicresearchers-find-lucrative-work-as-big-data-scientists-1407543088 [Accessed 27 Nov 2014]
Thank you for your attention.
Q/A