Open access to data and analysis tools from the CMS experiment at the LHC

Open access to data and analysis tools from the CMS experiment at the LHC. Thomas McCauley (for the CMS Collaboration and QuarkNet), University of Notre Dame, USA. thomas.mccauley@cern.ch. 5 Feb 2015

Outline. CMS at the LHC; 1st public release of CMS data; CMS masterclasses; large data release; open data portal; outlook and future plans.

CMS at the LHC. CMS (Compact Muon Solenoid) is one of the two general-purpose experiments at the LHC. Over 350 papers published, describing searches for SUSY and exotica; measurements of QCD, electroweak, top, b, forward, and heavy-ion physics; and the discovery of the Higgs boson and measurements of its properties. Collected ~28 fb^-1 of proton-proton collision data at center-of-mass energies up to 8 TeV. Nearly 3000 physicists and ~800 engineers from over 40 countries. http://cern.ch/cms

CMS public data (i). The CMS experiment has allowed the release of the following data to the public for use in education and outreach:
2000 events each of J/ψ → μμ and J/ψ → ee
2000 events each of Υ → μμ and Υ → ee
500 events each of Z → μμ and Z → ee
1000 events each of W → μν and W → eν
100,000 events each of di-muon, di-electron, and di-jet events in the energy range 2-110 GeV
19 Higgs candidate events in the mass range 120-130 GeV: 10 γγ, 1 2e2μ, 1 4e, 1 4μ, 2 bb, 2 ττ, 2 WW
~50 pb^-1 of single-muon data for top quark analysis
Bold indicates datasets already delivered and/or in use. These data form the core of the masterclasses.

CMS public data (ii)

Masterclasses. Students travel to nearby universities and research laboratories to listen to lectures, analyze real LHC data, and interact with other groups via videoconference. International masterclasses are organized under the auspices of IPPOG, the International Particle Physics Outreach Group (http://ippog.web.cern.ch), with central organization at TU Dresden and Notre Dame. In 2014 (from Feb 12 to Apr 12) there were 69 CMS masterclasses in 26 countries in 12 languages. The CMS masterclass was developed in collaboration with QuarkNet (http://quarknet.fnal.gov). Current CMS exercise: W+:W- ratio, and Z, J/ψ, and Υ invariant mass.

CMS masterclasses in 2014. https://quarknet.i2u2.org/content/running-cms-wzh-path-masterclass and http://cms.physicsmasterclasses.org/cms.html

CMS masterclasses

2014 CMS masterclass exercise. Students use up to 30 separate datasets, each with 100 events containing samples of W, Z, and di-lepton events (one 4-lepton and two di-photon Higgs candidate events included). Each group views up to 100 events in an event display and attempts to determine whether each one is a W or a Z (di-lepton) event. If a W, did it decay into an electron and a neutrino or into a muon and a neutrino? What is the charge of the lepton? If a Z, is it di-electron or di-muon? What is the invariant mass? What is the W+:W- ratio? What does it mean for the proton and its structure? What does the invariant mass spectrum look like? (There will be several unexpected peaks from the di-lepton background.) 2015: content the same; data analysis tools improved (covered later). What follows shows the 2014 exercise.
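
The invariant-mass question in the exercise comes down to a short calculation. The following is a minimal sketch in Python, not the masterclass tool itself. It assumes each lepton is available as a four-vector (E, px, py, pz) in GeV with natural units; the function name `invariant_mass` and the sample values are hypothetical.

```python
import math

def invariant_mass(p1, p2):
    """Invariant mass (GeV) of two particles given as (E, px, py, pz) in GeV."""
    e = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    m2 = e * e - (px * px + py * py + pz * pz)
    # Rounding errors can push m^2 slightly below zero for light systems.
    return math.sqrt(max(m2, 0.0))

# Hypothetical example (made-up values): two back-to-back, massless muons.
mu_plus = (45.0, 0.0, 45.0, 0.0)
mu_minus = (45.0, 0.0, -45.0, 0.0)
mass = invariant_mass(mu_plus, mu_minus)
```

Filling a histogram with this quantity for many di-lepton events is what produces the peaks the students are asked to interpret.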

After an introduction by the moderator covering HEP and the experiment, start by opening the event display, a browser-based event display written in JavaScript.

Select a set of 100 W, Z, J/ψ, and Υ events (each with a Higgs candidate included)

Electron? Significant MET? Therefore, it's a W → eν event? But is it an e+ or e-?

The electron seems to curve clockwise, therefore e+

Mark the answer on the spreadsheet (hosted on Google Docs): mark as a W+ → e+ν candidate
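
Once every event has been marked, the W+:W- ratio is a simple tally. The following is a minimal sketch in Python, assuming the classifications have been exported as a plain list of labels. The label strings, the function name `charge_ratio`, and the sample list are hypothetical and are not real masterclass results.

```python
from collections import Counter

def charge_ratio(labels):
    """Return (W+ count, W- count, W+/W- ratio) from a list of event labels."""
    counts = Counter(labels)
    n_plus = counts["W+"]
    n_minus = counts["W-"]
    # Avoid dividing by zero when no W- candidates were marked.
    ratio = n_plus / n_minus if n_minus else float("nan")
    return n_plus, n_minus, ratio

# Hypothetical classifications for ten events (made up for illustration).
labels = ["W+", "W-", "Z", "W+", "W+", "W-", "Z", "W+", "W-", "W+"]
n_plus, n_minus, ratio = charge_ratio(labels)
```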

Muon, muon. Therefore a Z → μ+μ- candidate?

In the 2014 masterclasses, students correctly identified an event as a Z candidate (i.e. an event with 2 leptons) 92% of the time; correctly identified an electron 90% of the time and a muon 93% of the time; and correctly identified an event as a W 91% of the time. When the students correctly identified an event as W → μν (W → eν), they correctly identified the charge 84% (81%) of the time; 11% (16%) of these events were assigned no charge.

2014 results. CMS value: http://cds.cern.ch/record/1646590

2014 student results

Videoconference. Students communicate and discuss results with other masterclass groups using Vidyo (http://cern.ch/vidyo), with support from CERN and FNAL IT. A recorded videoconference: http://cds.cern.ch/record/1693152

For 2015. Exercise to remain the same. New IPPOG masterclasses start next month. Masterclasses for CERN visitors start next week. A new browser-based tool developed by RWTH Aachen will replace Google spreadsheets and include on-the-fly creation of plots. New event display. Beyond 2015: a new opportunity to use open data from CMS to develop new exercises.

http://cern.ch/cms-masterclass/ispy-webgl

https://www.i2u2.org/elab/cms/cima/index.php Web-based data entry and histogram tool developed by RWTH Aachen

CMS Open Data policy. CMS has drafted and adopted a data preservation, re-use, and open-access policy which includes: a commitment to publication in open-access journals; release of data to the public; preservation and release of the software and documentation needed for reconstruction and analysis. In the future: a commitment to release data after a suitable embargo period. https://cms-docdb.cern.ch/cgi-bin/publicdocdb/showdocument?docid=6032

New release (i). The new release of CMS data is much larger and more extensive than previous releases: half of the reconstructed data from 2010 proton-proton collisions at 7 TeV (tens of pb^-1); ~30 TB in size; in CMS Analysis Object Data (AOD) format (ROOT files).

New release (ii)

CMS AOD. Contains the information needed for an analysis, such as physics objects, tracks, calo hits, vertices, trigger info, etc. A ROOT-based format that needs CMSSW in order to be read and analyzed. Q: How can/will the public handle such a dataset? A (partially): Initially focus on an already-proven, successful use case: education and outreach.

How does one get from...

...to

...or to

Open Data Portal. Data, tools, and resources for analysis have been made available via an open data portal. The portal is divided into two main areas: Education and Research. Datasets are distinguished as either primary or derived. Philosophy: include and build upon the previous and current success of public data in education and outreach, but also include the possibility for more in-depth, complex analysis. Built with Invenio digital library software: http://invenio-software.org. The portal is a collaboration between CERN, CMS, ATLAS, ALICE, and LHCb; what follows is a description of the CMS content.

http://opendata.cern.ch

http://press.web.cern.ch/press-releases/2014/11/cern-makes-public-first-data-lhc-experiments

Education

Derived dataset record. A derived dataset is a dataset that has been created from a primary dataset and contains reduced information (like four-vectors). Software with which to create the derived datasets is provided. Analysis of derived datasets does not require special CMS software (but production of derived datasets might).
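
Because a derived dataset holds only reduced quantities, it can be analyzed with nothing beyond a general-purpose language. The following is a minimal sketch using the Python standard library. It assumes the four-vectors of two leptons are stored in a CSV table; the column names (`E1`, `px1`, ...), the function names, the binning, and the two sample rows are all hypothetical and do not describe the actual portal files.

```python
import csv
import io
import math

def dilepton_masses(csv_text):
    """Yield the invariant mass (GeV) for each row of a two-lepton CSV table."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        e = float(row["E1"]) + float(row["E2"])
        px = float(row["px1"]) + float(row["px2"])
        py = float(row["py1"]) + float(row["py2"])
        pz = float(row["pz1"]) + float(row["pz2"])
        yield math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def histogram(values, low, high, nbins):
    """Count values in nbins equal-width bins on [low, high); ignore the rest."""
    counts = [0] * nbins
    width = (high - low) / nbins
    for v in values:
        if low <= v < high:
            # min() guards against a rounding error at the upper edge.
            counts[min(int((v - low) / width), nbins - 1)] += 1
    return counts

# Hypothetical two-event table; the values are made up for illustration.
sample = """E1,px1,py1,pz1,E2,px2,py2,pz2
45.0,0.0,45.0,0.0,45.0,0.0,-45.0,0.0
1.5,1.5,0.0,0.0,1.5,-1.5,0.0,0.0
"""
masses = list(dilepton_masses(sample))
counts = histogram(masses, 0.0, 120.0, 12)
```

The same two steps, computing a per-event quantity and then binning it, are what a browser-based histogram tool has to perform on such a table.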

Education: histogram tool

Education: event display

Education: example analysis

Research

CMS-specific CERN VM. Analysis of primary datasets requires the CMSSW environment; we provide it in a virtual machine image. The VM contains SLC5, the CMS software environment, and access to primary datasets via XRootD. Example code is also available via GitHub.

Primary dataset record (i)

Primary dataset record (ii)

CMS External Resources

Invenio and CERN support. The open data portal is built with Invenio (a familiar example of an application using Invenio is the CERN Document Server, http://cdsweb.cern.ch). Invenio provides document organization, search capability, and handling of metadata. The portal relies on CERN support and services for data storage, access to and distribution of data, and security and bandwidth restrictions.

Data re-use. Data are released under the Creative Commons CC0 waiver, essentially releasing them into the public domain: http://creativecommons.org/publicdomain/zero/1.0. Data are identified with digital object identifiers (DOIs), and it is expected that third parties will access the data using these.

Outlook. CMS public data has reached thousands of students all over the world via CMS masterclasses. Re: open data portal: "We can conclude that about ~82k distinct users visited our site since the launch, out of which ~600 people downloaded EOS files over HTTP, ~5k read About pages, ~21k viewed collections, ~16k used event display, ~3k used histogramming, ~21k viewed records, and ~10k used search." (T. Simko, Invenio team, 19 Dec 2014) Next: improve tools and, with the new, large data release, develop new E&O programs.

Thank you