This rather famous image is a visualization of the data collected by NASA's WMAP mission. It shows the variation of the cosmic microwave background radiation across the universe. The image confirms that, with very high probability, the shape of the universe (https://en.wikipedia.org/wiki/shape_of_the_universe) is flat. Because of the general theory of relativity, the shape of the universe provides insight into its contents: it consists of roughly 5% «normal» matter as described by our current understanding of physics, known as the Standard Model, of about 67% «Dark Energy» and about 27% «Dark Matter».
Contrary to earlier assumptions, this composition of the universe's contents results in an ever-accelerating expansion. The initial assumption was that the expansion would slow down until the gravity of matter won over the forces of expansion, followed by an accelerating collapse of the universe. Given that our knowledge of the universe covers only roughly 5% of its contents, mankind clearly must make an effort to better understand what the remaining contents of the universe are.
This image from 1955 shows the construction site of CERN, the European Organization for Nuclear Research. Founded in 1954, it pursued two main goals: first, to repair some of the terrible damage done to the European science landscape during the course of the Second World War; second, to research the details of matter and energy in order to complete and verify the Standard Model and to find and study «strange» forms of energy and matter.
Particle accelerators are the method of choice: accelerating and colliding particles both yields elementary particles (by breaking the bonds within the colliding particles) and creates conditions similar to those during the first instants of the universe. As we believe that all forms of matter and energy, even dark matter and dark energy, evolved from these first instants of the universe, we hope to observe how they come into existence and what the properties of these strange forms of matter and energy are. This image shows CERN's first accelerator, the Synchrocyclotron, built in 1956. It had a key role in the early stages of our understanding of weak interactions, in particular with the fundamental observation of the rare pion decay into an electron and a neutrino by T. Fazzini, G. Fidecaro, A. Merrison, H. Paul and A. Tollestrup in 1958.
The Proton Synchrotron (PS) became available in 1959. One of its key achievements was the creation of antinuclei (composites of antiparticles). It was extended significantly during its lifetime: boosters, combination with linear accelerators, and later the Intersecting Storage Rings (head-on collisions instead of a stationary target). The reason protons are favored in particle acceleration is that they have a (relatively) high mass and contain a significant number of elementary particles, thus providing a great source for both these particles and their combinations (great presentation: https://www.youtube.com/watch?v=lranu_78scw).
Its successor, the SPS (Super Proton Synchrotron), was already 7 km in circumference (startup in 1976). With its 450 GeV of accelerator energy, it was designed to look for matter as it might have been in the first instants of the universe and to search for exotic forms of matter. A major highlight came in 1983 with the Nobel-prize-winning discovery of the W and Z particles, with the SPS running as a proton-antiproton collider (source: http://timeline.web.cern.ch/timelines/the-history-of-CERN?page=1). In 1988, construction of the LEP (Large Electron Positron Collider) commenced.
In its first phase of operation, from 1989 to 1995, electrons and positrons collided in LEP at 91 GeV. The aim was to produce Z bosons. OPAL accumulated millions of these Z events for high-precision measurements. In LEP's second phase, from 1996 to 2000, the collider's collision energy was increased to make pairs of W bosons and to search for possible new particles and new physics (source: http://home.web.cern.ch/about/experiments/opal).
And in 1990, the first web server went online at CERN (on a NeXT machine). In 1993, CERN open-sourced the related software, and the rest, as they say, is history. Now this is rather strange: why would an organization for nuclear research come up with software, with HTTP, HTML, a web server? What was the problem Tim Berners-Lee was trying to solve?
Let's take a brief look at the history of computing at CERN. This is CERN's first computer: Willem Klein, aka «the human computer». On 27 August 1976, Klein calculated the 73rd root of a 500-digit number in 2 minutes and 43 seconds.
And this is CERN's first actual computer, the Ferranti Mercury, installed at CERN in June 1958. The storage cabinets in the back each feature 20 × 32 Kbits of memory.
Skipping into the future: this is the IBM 7090. Each of the tape storage units in the back may hold up to 2.5 MB of data.
The same year, in October: the tape-unit reel display system (RDS), shown mounted over the tape units in the 6600 computing complex.
A bit further into the future, in 1974: this is part of the magnetic tape computer center's storage. You can see tapes being ordered and sent upstairs.
Upstairs, they would be used in the computing center: this is a picture from 1983.
And this is the computing center in 1985. Trick question: how many computers are in this picture? Three: one in the front, two in the back. So what was the problem CERN was trying to solve with the Web? It must have been data, right? Wrong. Data surely was one of CERN's problems. It was rather challenging, but there were systems in place to cope with data management, such as the TMS, the Tape Management System, which allowed querying for data to know which tape to obtain. The greater problem, however, was...
Information. Notice how information is different from data: data is mostly a store, distribute and CRUD problem, but data is not information; it is a pile of bits! Information is what is derived from data, and, up until today, information is almost exclusively understood and exchanged between humans. This picture, taken at the beginning of June in a biology laboratory in Bern, shows the typical way in which human beings create, consume and organize information: there is almost no structure, lots of relations, and an unlimited number of formats and languages involved. CERN was losing valuable information because it was trapped in silos, references could not be traced, and people would leave and join the organization. Berners-Lee designed the Web as an
Information Management System. Its architecture, nodes and links, is what Berners-Lee observed to be the common denominator of structure in human information management: there are always the subjects people talk about and the relations between them («circles and arrows»).
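To make the «circles and arrows» idea concrete, here is a minimal sketch of such a node-and-link model in Python; the class and the example subjects are purely illustrative, not any actual CERN or W3C design.

```python
# A minimal node-and-link information model ("circles and arrows").
# Class and example names are illustrative only.

class Node:
    def __init__(self, title):
        self.title = title
        self.links = []                 # outgoing (relation, target) pairs

    def link(self, relation, target):
        self.links.append((relation, target))

# Subjects people talk about ...
experiment = Node("OPAL experiment")
physicist  = Node("Visiting physicist")
report     = Node("Z-event analysis note")

# ... and the relations between them.
physicist.link("works on", experiment)
report.link("describes", experiment)
report.link("written by", physicist)

# Following the links recovers information that would otherwise stay siloed.
for relation, target in report.links:
    print(f"{report.title} --{relation}--> {target.title}")
```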
So what was the relation between data and information at CERN in 1989?
Data was mostly a storage & distribution problem.
The data was then analyzed by the scientists,
and the web could then be used to present and share the derived information. CERN was indeed doing «Big Data» science.
The Web as we know it started out exclusively as an information management system; this was the part that was open-sourced.
In 1995, Amazon started, and more and more organizations joined the web, each bringing with it its private silo of data and information, the latter being subject to semi-manual publication on the web.
At the same time, storage prices dropped exponentially. Around 1994-1995, digital storage became more cost-effective than storage on paper.
In 2005 and 2006, Facebook and Twitter emerged: companies whose entire business model rests on obtaining and exchanging information between humans.
Also in 2006, Amazon launched the Elastic Compute Cloud (EC2). It met with a lot of skepticism at first, but it quickly became clear that providing hardware resources like software was a game-changer, as it effectively eliminated the up-front cost and complexity of provisioning hardware for providing information on the web or hosting data. It is now considered to be one of Amazon's most profitable lines of business.
In 2007, the iPhone was launched. Besides being a revolutionary phone, it also represents a new class of device, packed with environmental sensors and permanently connected to the internet, thus providing a great deal of sensor data.
In 2011, Hadoop 1.0.0 was released. It complements the available hardware for distributed computing (such as Amazon EC2) with the software to conveniently store and compute on distributed data.
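To give a feel for the kind of job Hadoop made convenient, here is a minimal word-count sketch written for Hadoop Streaming in Python; the file name and the invocation below are illustrative, not a fixed convention.

```python
#!/usr/bin/env python
# wordcount.py -- a minimal Hadoop Streaming job (word count), usable as
# both mapper and reducer depending on the first command-line argument.
import sys

def mapper():
    # Emit (word, 1) for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts the mapper output by key, so identical words arrive
    # consecutively; sum the counts per word.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Submitted (roughly) as: hadoop jar hadoop-streaming.jar -files wordcount.py -mapper "python wordcount.py map" -reducer "python wordcount.py reduce" -input /data/in -output /data/out; the exact path of the streaming jar depends on the installation.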
So where are we now, in 2015? We are in the same position CERN was in back in 1989, with some significant differences, though: we have more data, with much less structure, and the web has evolved into a very dynamic, highly sophisticated information management system, available on an almost infinite number of devices. In addition, access to hardware resources and distributed computing has become very easy thanks to cloud computing and software such as Apache Hadoop. Maybe this ease of access explains the recent hype around «Big Data» analysis: we now suddenly seem to believe that we can, somehow, leverage all this data to derive information that may be game-changing for the way we present our information on the web. Specifically, we hope to find correlations in the data that allow us to build predictive models.
However, our ability to leverage Artificial Intelligence for this purpose is greatly overestimated. Specifically, there is no general-purpose AI able to understand arbitrary data and find relations in it. Quite the contrary: selecting, filtering and understanding the data domain is a science job. It requires domain (business) knowledge, a solid mathematical foundation (statistics) and computer science know-how. After all, one has to write code and select algorithms and technologies suitable both to perform the data analysis and to build predictive models. What is available at the moment are two different classes of tools. Data analysis tools focus on data retrieval, filtering and data exploration, for instance using sampling and visualization, and allow building and training predictive models. This is the primary toolkit of a data scientist.
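A minimal sketch of that workflow, filter, explore by sampling, check a correlation, train a simple predictive model, using plain NumPy; the data is synthetic and the column meanings are made up for illustration.

```python
# Sketch of a data-analysis workflow: filter, sample, correlate, fit a model.
# The data is synthetic; in practice it would come from logs or a database.
import numpy as np

np.random.seed(42)

# Synthetic observations: page views and resulting conversions per campaign.
page_views  = np.random.uniform(100, 10000, size=5000)
conversions = 0.02 * page_views + np.random.normal(0, 20, size=5000)

# Filter: drop implausible rows (negative conversion counts).
mask = conversions >= 0
page_views, conversions = page_views[mask], conversions[mask]

# Explore: inspect a small random sample instead of the full data set.
sample = np.random.choice(len(page_views), size=5, replace=False)
print(np.column_stack([page_views[sample], conversions[sample]]))

# A correlation tells us whether a linear predictive model is worth trying.
print("correlation:", np.corrcoef(page_views, conversions)[0, 1])

# Train the model (least-squares line) and use it for a prediction.
slope, intercept = np.polyfit(page_views, conversions, deg=1)
print("predicted conversions for 5000 views:", slope * 5000 + intercept)
```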
Smart tools are pre-fabricated solutions, for instance predictive models or pattern recognition for a specific, well-understood domain. Well-known examples include object recognition in images or voice recognition. It is these two categories of tools that are most likely to have a significant impact on AEM. So, can we already use something like that in AEM? The answer is: soon, and yes.
One of the things that got the audience excited during the 2015 Adobe Summit in London was #SmartPic, a smart tool that automatically tags DAM images based on a set of tags obtained from a training set of images. Moreover, the tagged images can then be automatically assigned to campaigns. The data collected in the campaigns, such as click and conversion rates, can then be fed back as DAM asset metadata. This closes the information loop: the metadata can be used to automatically determine the more successful images and select more images like them to be published in the campaigns, effectively creating self-optimizing, automated campaigns. #SmartPic is still in the labs, but will hopefully be available soon.
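#SmartPic itself is not publicly available, so the following is only a sketch of the feedback loop just described; the asset structure and field names are hypothetical, not an actual AEM or #SmartPic API.

```python
# Hypothetical sketch of the self-optimizing campaign loop: campaign results
# are fed back into DAM asset metadata, and the best-performing images that
# carry a given auto-generated tag are selected for the next campaign run.
# Field names ("tags", "conversion_rate") are made up for illustration.

def next_campaign_images(assets, tag, top_n=3):
    candidates = [a for a in assets if tag in a["tags"]]
    ranked = sorted(candidates, key=lambda a: a["conversion_rate"], reverse=True)
    return [a["path"] for a in ranked[:top_n]]

assets = [
    {"path": "/content/dam/shoe-red.jpg",  "tags": ["shoe", "red"],  "conversion_rate": 0.031},
    {"path": "/content/dam/shoe-blue.jpg", "tags": ["shoe", "blue"], "conversion_rate": 0.054},
    {"path": "/content/dam/hat-green.jpg", "tags": ["hat", "green"], "conversion_rate": 0.012},
]

print(next_campaign_images(assets, tag="shoe", top_n=2))
# ['/content/dam/shoe-blue.jpg', '/content/dam/shoe-red.jpg']
```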
Regarding data analytics tools, Adobe provides Adobe Big Data Analytics. We will surely see more integrations of Adobe Analytics data collection and optimization code into AEM projects.
Besides Adobe's offerings, there is a mature and growing ecosystem of analytics and smart tools. Here is my selection of key players in both areas.
If you are interested in data science or smart tools, here is my personal recommendation of what to look at: the Data Science Toolbox (http://datasciencetoolbox.org/) is a great starting point for big data science and also illustrates the tremendous ease of access to data science tooling we are seeing today. SciPy (Scientific Python) is the open source competitor to Matlab and pretty much the «state of the art» for data science. If you are interested in leveraging smart tools, the AlchemyAPI (http://www.alchemyapi.com/) features very mature APIs; I especially recommend giving the Alchemy Vision API a try (http://www.alchemyapi.com/products/demo/alchemyvision).
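For a first experiment with Alchemy Vision, a call like the following is enough; the endpoint, parameter and response field names are taken from my reading of the public AlchemyAPI documentation and should be treated as assumptions to verify, and you need your own API key.

```python
# Sketch of tagging an image via the Alchemy Vision HTTP API.
# Endpoint, parameters and response fields follow the public AlchemyAPI
# docs as I recall them; verify against the current documentation.
import requests

API_KEY = "your-api-key"   # register at alchemyapi.com to obtain one
ENDPOINT = "http://access.alchemyapi.com/calls/url/URLGetRankedImageKeywords"

response = requests.get(ENDPOINT, params={
    "apikey": API_KEY,
    "url": "http://example.com/some-image.jpg",   # image to tag
    "outputMode": "json",
})
response.raise_for_status()

for keyword in response.json().get("imageKeywords", []):
    print(keyword["text"], keyword["score"])
```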
Closing, here is a slide from David Nüscheler's EVOLVE 2013 presentation on the future of AEM: a tight integration of Analytics data retrieval and content assembly, driven by predictive models provided by data scientists. It's up to us to get AEM there.