1

This - rather famous - image is a visualization of the data collected by NASA's WMAP mission. It shows the variation of the cosmic microwave background radiation across the universe. The image confirms that, with very high probability, the shape of the universe (https://en.wikipedia.org/wiki/shape_of_the_universe) is flat. Because of the general theory of relativity, the shape of the universe provides insight into its contents: it is composed of approx. 5% «normal» matter as described by our current understanding of physics, known as the Standard Model, about 67% «Dark Energy» and about 27% «Dark Matter». 2

Contrary to earlier assumptions, the composition of the universe's contents results in an ever-increasing expansion. The initial assumption was that the expansion would slow down until the gravitational pull of matter won over the forces of expansion, followed by an accelerating collapse of the universe. Given that our knowledge of the universe extends to only roughly 5% of its contents, mankind must clearly make an effort to better understand what the remaining contents of the universe are. 3

This image from 1955 shows the construction site of CERN, the European Organization for Nuclear Research. Founded in 1954, it pursued two main goals: 1: repair some of the terrible damage done to the European science landscape during the course of the Second World War; 2: research into the details of matter and energy to complete and verify the Standard Model and to find and study «strange» forms of energy and matter. 4

Particle accelerators are the method of choice: accelerating and colliding particles both yields elementary particles (by destroying the particle bonds in the colliding particles) and creates conditions similar to those during the first instants of the universe. As we believe that all forms of matter and energy, even dark matter and dark energy, evolved from these first instants of the universe, we hope to observe how they come into existence and what the properties of these strange forms of matter and energy are. This image shows CERN's first accelerator, the Synchrocyclotron, built in 1956. It had a key role in the early stages of our understanding of weak interactions, in particular with the fundamental observation of the rare pion decay into an electron and a neutrino by T. Fazzini, G. Fidecaro, A. Merrison, H. Paul and A. Tollestrup in 1958. 5

The Proton Synchrotron (PS) became available in 1959. One of its key achievements was the creation of antinuclei (nuclei composed of antiparticles). It was extended significantly during its lifetime: boosters, combination with linear accelerators, and later the Intersecting Storage Rings (head-on collisions instead of a stationary target). The reason protons are favored in particle acceleration is that they have a (relatively) high mass and contain a significant number of elementary particles, thus providing a great source for both these particles and their combinations (great presentation: https://www.youtube.com/watch?v=lranu_78scw). 6

Its successor accelerator, the SPS (Super Proton Synchrotron), was already 7 km in circumference (startup in 1976). With its 450 GeV of accelerator energy, it was designed to look for matter as it might have been in the first instants of the universe and searched for exotic forms of matter. A major highlight came in 1983 with the Nobel-prize-winning discovery of the W and Z particles, with the SPS running as a proton-antiproton collider (source: http://timeline.web.cern.ch/timelines/the-history-of-CERN?page=1). In 1988, construction of LEP (the Large Electron Positron Collider) commenced. 7

In its first phase of operation from 1989 to 1995, electrons and positrons collided in LEP at 91 GeV. The aim was to produce Z bosons. OPAL accumulated millions of these Z events for high-precision measurements. In LEP's second phase from 1996 to 2000, the collider's collision energy was increased to make pairs of W bosons, and to search for possible new particles and new physics. (Source: http://home.web.cern.ch/about/experiments/opal). 8

And in 1990, the first web server goes online at CERN (running on a NeXT machine). In 1993, CERN open-sourced the related software, and the rest, as they say, is history. Now this is rather strange: why would an organization for nuclear research come up with software, with HTTP, HTML, a web server? What was the problem Tim Berners-Lee was trying to solve? 9

Let's take a brief look at the history of computing at CERN. This is CERN's first computer, Willem Klein, a.k.a. «the human computer». On 27 August 1976, Klein calculated the 73rd root of a 500-digit number in 2 minutes and 43 seconds. 10

And this is CERN's first actual computer, the Ferranti Mercury, at CERN in June 1958. The storage units in the back each feature 20 32 Kbits of memory. 11

Skipping into the future: this is the IBM 7090. Each of the tape storage compartments in the back may hold up to 2.5 MB of data. 12

Same year, in October: The tape unit reel display system (RDS) shown mounted over tape units in the 6600 computing complex. 13

A bit further into the future, in 1974: this is part of the computer centre's magnetic tape storage. You can see tapes being ordered and sent upstairs 14

where they would be used in the computing center; this is a picture from 1983. 15

And this is the computing center in 1985. Trick question: how many computers are in this picture? Three: one in the front, two in the back. So what was the problem CERN was trying to solve with the Web? It must have been data, right? Wrong. Data surely was one of CERN's problems. It was rather challenging, but there were systems in place to cope with data management, such as the TMS, the Tape Management System, which allowed querying for data to know which tape to obtain. The greater problem, however, was... 16

Information. Notice how information is different from data: data is mostly a store, distribute and CRUD problem, but data is not information; it is a pile of bits! Information is what is derived from data, and, up until today, information is almost exclusively understood and exchanged between humans. This picture, taken at the beginning of June in a biology laboratory in Bern, shows the typical way in which human beings create, consume and organize information: there is almost no structure, lots of relations, and an unlimited number of formats and languages involved. CERN was losing valuable information because it was trapped in silos, references could not be traced, and people would leave and join the organization. Berners-Lee designed the Web as an 17

Information Management System. Its architecture, nodes and links, is what Berners-Lee observed to be the common denominator of structure in human information management: there are always the subjects people talk about, and relations between them («circles and arrows»). 17

So what was the relation between data and information at CERN in 1989? 18

Data was mostly a storage & distribution problem. 19

The data was then analyzed by the scientists 20

and the web could then be used to present and share the derived information. CERN was indeed doing «Big Data» science. 21

The Web as we know it started out exclusively as an information management system; this was the open-source part. 22

In 1995, Amazon started, and more and more organizations would join the web, each bringing with it its private silo of data and information, the latter being subject to semi-manual publication on the web. 23

At the same time, storage prices dropped exponentially. Around 1994-1995, digital storage became more cost effective than storage on paper. 24

In 2005 and 2006, Facebook and Twitter emerge: companies whose entire business model rests on obtaining and exchanging information between humans. 25

Also in 2006, Amazon launches the Elastic Compute Cloud (EC2). Although it met with a lot of skepticism at first, it quickly became clear that providing hardware resources like software was game-changing, as it effectively eliminated the up-front cost and complexity of provisioning computing hardware for serving information on the web or hosting data. It is now considered to be one of Amazon's most profitable lines of business. 26

In 2007, the iPhone is launched. Besides being a revolutionary phone, it also represents a new class of device, packed with environmental sensors and permanently connected to the internet, thus providing a great deal of sensor data. 27

In 2011, Hadoop 1.0.0 is released. It complements the available hardware for distributed computing (such as Amazon EC2) with the software needed to conveniently store and compute on distributed data. 28
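To make the "store and compute on distributed data" point concrete, here is a minimal sketch of the MapReduce programming model as it is typically used with Hadoop Streaming: a word count split into a mapper and a reducer, each an ordinary Python script reading stdin and writing stdout. The file names are my own; this is an illustration, not code from the talk.

```python
#!/usr/bin/env python
# mapper.py - Hadoop Streaming mapper sketch:
# reads raw text lines from stdin, emits one "word<TAB>1" pair per word.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))
```

```python
#!/usr/bin/env python
# reducer.py - Hadoop Streaming reducer sketch:
# input arrives sorted by key, so all counts for a word are adjacent.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))
```

The two scripts would then be wired together with Hadoop's streaming jar (roughly: hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py plus input and output paths in HDFS), with HDFS and the framework taking care of distributing data and work across the cluster.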

So where are we now, in 2015? We are in the same position in which CERN found itself in 1989, with some significant differences, though: we have more data, with much less structure, and the web has evolved into a very dynamic, highly sophisticated information management system, available on an almost infinite number of devices. In addition, access to hardware resources and distributed computing has become very easy thanks to cloud computing and software such as Apache Hadoop. Maybe this ease of access explains the recent hype around «Big Data» analysis: we now suddenly seem to believe that we can, somehow, leverage all this data to derive information that may be game-changing for the way we present our information on the web. Specifically, we hope to find correlations in the data that allow us to build predictive models. 29
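To illustrate the last sentence, here is a minimal, self-contained sketch of what "find a correlation and turn it into a predictive model" can look like in practice. The data is synthetic and the variable names are made up for the example.

```python
# Find a correlation in (synthetic) data and derive a simple predictive model.
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
page_views = rng.uniform(100, 10000, size=500)             # hypothetical traffic data
conversions = 0.02 * page_views + rng.normal(0, 20, 500)   # hypothetical outcome

# Is there a (linear) correlation at all?
r, p_value = stats.pearsonr(page_views, conversions)
print("Pearson r = %.2f (p = %.3g)" % (r, p_value))

# If the correlation is strong, even a linear fit already "predicts".
slope, intercept = np.polyfit(page_views, conversions, deg=1)
print("expected conversions for 5000 views: %.1f" % (slope * 5000 + intercept))
```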

However, our ability to leverage Artificial Intelligence for this purpose is greatly exaggerated. Specifically, there is no general-purpose AI able to understand arbitrary data and find relations in it. Quite the contrary: selecting, filtering and understanding the data domain is a science job. Specifically, it requires domain (business) knowledge, a solid mathematical foundation (statistics) and computer science know-how. After all, one has to write code and select algorithms and technologies suitable both to perform the data analysis and to build predictive models. What is available at the moment are two different classes of tools. Data analysis tools focus on data retrieval, filtering and data exploration, for instance using sampling and visualization, and allow building and training predictive models. This is the primary toolkit of a Data Scientist. 30
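A compressed sketch of that data-analysis workflow, assuming pandas and scikit-learn; the file name, column names and the conversion target are invented for the example.

```python
# Retrieval, filtering, exploration, and training a predictive model.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Retrieval and filtering
visits = pd.read_csv("visits.csv")            # hypothetical analytics export
visits = visits[visits["duration_s"] > 5]     # drop bounce visits

# Exploration: summary statistics and a quick random sample
print(visits.describe())
print(visits.sample(10))

# Building and training a predictive model (did the visit convert?)
features = visits[["duration_s", "pages_seen", "is_returning"]]
X_train, X_test, y_train, y_test = train_test_split(
    features, visits["converted"], test_size=0.3, random_state=42)
model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("accuracy on held-out data:", model.score(X_test, y_test))
```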

Smart tools are pre-fabricated solutions, for instance predictive models or pattern recognition, for a specific, well-understood domain. Well-known examples include image object recognition or voice recognition. It is these two categories of tools that are most likely to have a significant impact on AEM. So, can we already use something like that in AEM? The answer is: soon, and yes. 31
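To make the «smart tool» idea concrete, here is what consuming a hosted image-tagging service typically looks like from code. The endpoint, parameter and field names below are placeholders I invented for illustration; they are not the contract of AlchemyAPI or any other specific vendor.

```python
# Illustrative only: calling a hosted image-tagging "smart tool" over HTTP.
import requests

API_KEY = "your-api-key"                              # placeholder
ENDPOINT = "https://api.example.com/v1/image/tags"    # hypothetical endpoint

response = requests.get(ENDPOINT, params={
    "apikey": API_KEY,
    "url": "https://example.com/photos/mountain-bike.jpg",
    "output": "json",
})
response.raise_for_status()

# Hypothetical response shape: a list of tag/confidence pairs.
for tag in response.json().get("tags", []):
    print(tag["label"], tag["confidence"])
```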

One of the things that got the audience excited during the 2015 Adobe Summit in London was #SmartPic, a Smart Tool that automatically tags DAM images based on a set of tags obtained from a training set of images. Moreover, the tagged images can then be automatically assigned to campaigns. The data collected in the campaigns, such as click and conversion rates, can then be fed back as DAM asset metadata. This closes the information loop: the metadata can be used to automatically determine the more successful images and to select more images like them to be published in the campaigns, effectively creating self-optimizing, automated campaigns. #SmartPic is still in the labs, but will hopefully be available soon. 32
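Here is a toy simulation of that closed loop. It is not the actual #SmartPic implementation (which is still in the labs), just the shape of the idea with invented asset paths and simulated campaign results: tagged assets go into a campaign, performance comes back as metadata, and the best performers drive the next selection.

```python
# Toy sketch of a self-optimizing campaign loop (all data invented).
import random

assets = [
    {"path": "/content/dam/bike-red.jpg",  "tags": ["bike", "outdoor"], "ctr": None},
    {"path": "/content/dam/bike-blue.jpg", "tags": ["bike", "outdoor"], "ctr": None},
    {"path": "/content/dam/shoe.jpg",      "tags": ["shoe", "indoor"],  "ctr": None},
]

def run_campaign(asset):
    """Stand-in for a real campaign: returns a simulated click-through rate."""
    return round(random.uniform(0.01, 0.10), 3)

# 1. Select assets for the campaign by tag (auto-tagging is assumed already done).
candidates = [a for a in assets if "bike" in a["tags"]]

# 2. Run the campaign and write the results back as asset metadata.
for asset in candidates:
    asset["ctr"] = run_campaign(asset)

# 3. The loop closes: the best-performing asset drives the next selection.
best = max(candidates, key=lambda a: a["ctr"])
print("promote assets similar to:", best["path"], "ctr =", best["ctr"])
```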

Regarding data analytics tools, Adobe provides Adobe Big Data Analytics. We will surely see more integration of Adobe Analytics data collection and optimization code into AEM projects. 33

Besides Adobe's offerings, there is a mature and growing ecosystem of analytics and smart tools. Here is my selection of key players in both areas. 34

If you are interested in data science or smart tools, here is my personal recommendation of what to look at: the Data Science Toolbox (http://datasciencetoolbox.org/) is a great starting point for big data science and also illustrates the tremendous ease of access to data science tooling we are seeing today. SciPy (Scientific Python) is the open-source competitor to Matlab and pretty much «state of the art» for data science. If you are interested in leveraging Smart Tools, the Alchemy API (http://www.alchemyapi.com/) features very mature APIs; I especially recommend giving the Alchemy Vision API a try (http://www.alchemyapi.com/products/demo/alchemyvision). 35
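As a small taste of the SciPy recommendation, here is a sketch of fitting a model curve to noisy measurements with scipy.optimize.curve_fit; the decay model and the synthetic data are my own example, not from the talk.

```python
# Fit an exponential-decay model to noisy synthetic data with SciPy.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, amplitude, rate):
    return amplitude * np.exp(-rate * t)

t = np.linspace(0, 5, 50)
noisy = decay(t, 2.5, 1.3) + np.random.normal(0, 0.05, t.size)

params, covariance = curve_fit(decay, t, noisy)
print("fitted amplitude=%.2f rate=%.2f" % tuple(params))
```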

In closing, here is a slide from David Nüscheler's EVOLVE 2013 presentation on the future of AEM: a tight integration of Analytics data retrieval and content assembly, driven by predictive models provided by data scientists. It's up to us to make AEM get there. 36

37