14 October 2010 ATL-DAQ-SLIDE-2010-397 TDAQ Analytics Dashboard A real time analytics web application
Outline: Messages in the ATLAS TDAQ infrastructure; Importance of analysis; A dashboard approach; Architecture; Data distribution over HTTP; Conclusion
ATLAS TDAQ: The ATLAS Trigger and Data Acquisition (TDAQ) infrastructure is responsible for filtering data and transferring it from the detectors to the mass storage system. It relies on a large computing environment with thousands of software applications running concurrently and interacting with each other.
Message analysis: Message analysis is fundamental for controlling application behavior, error reporting, and operational monitoring. For example: << ERROR- Application SFI-53 - Problem with data integrity...>>
TDAQ Log Service: Permanent archive of messages, grouped by run. Graphical interface to browse the messages. Well suited to detailed analysis of problems. See poster PO-WED-005, << A new design and implementation of the Log Service package for the ATLAS experiment>>, by Murillo Garcia Raul.
How can we improve? What is difficult now: following the flow of messages; extracting meaningful information; detecting global system behavior. What we can do: create a tool that helps system analysis and error detection, visualizes the flow of messages effectively, is easy to access and use, is easy to customize per user, and shows real-time information as well as historical data.
Some work later... TDAQ Analytics Dashboard (http://atlasdaq.cern.ch/dashboard): web based; built-in analytics view; effective client-side analysis; real-time data; historical data.
Analytics Graphs: Built-in analytics graphs, structured in panels and widgets. Graphs are interactive: they display information on demand.
Analysis Criteria: Use LIVE data, or select the interval of interest from the time graphs. Each graph provides an options panel for filtering (per severity, application name, etc.) and for options such as the ranking size (Top 10, Top 50).
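The criteria above (time interval, severity filter, Top N ranking) can be illustrated with a minimal Python sketch. It is not the dashboard's code: the function name `top_applications`, the dictionary keys, and the sample messages are assumptions made for this example (only the application name SFI-53 appears in the slides).

```python
from collections import Counter


def top_applications(messages, n=10, severity=None, start=None, end=None):
    """Rank applications by message count, after optional filtering.

    `messages` is an iterable of dicts with the hypothetical keys
    "time", "severity" and "app". The interval is [start, end).
    """
    selected = (
        m for m in messages
        if (severity is None or m["severity"] == severity)
        and (start is None or m["time"] >= start)
        and (end is None or m["time"] < end)
    )
    # most_common(n) returns the n largest (app, count) pairs.
    return Counter(m["app"] for m in selected).most_common(n)


# Made-up sample data for illustration only.
sample = [
    {"time": 1, "severity": "ERROR", "app": "SFI-53"},
    {"time": 2, "severity": "ERROR", "app": "SFI-53"},
    {"time": 3, "severity": "WARNING", "app": "SFI-53"},
    {"time": 4, "severity": "ERROR", "app": "SFI-12"},
    {"time": 50, "severity": "ERROR", "app": "SFI-12"},
]

ranking = top_applications(sample, n=10, severity="ERROR", start=0, end=10)
print(ranking)
```

Changing `n` from 10 to 50 corresponds to switching between the Top 10 and Top 50 views; the filter arguments play the role of the options panel.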
Challenges: The main challenges we faced in building the dashboard. Produce analytics data: collect and correlate the messages sent in the system and produce analytics summaries over windows of time. Distribute data: make the analytics results available to client-specific requests; express and process analysis criteria. Visualize data: aggregate the analytics results in an easy-to-read, immediate view.
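As an illustration of "analytics summaries over windows of time", the sketch below groups messages into fixed windows and counts them per severity. It is a simplified stand-in, not the engine's Java implementation; the `Message` fields, the 60-second window, and the sample messages are assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    timestamp: float  # seconds, hypothetical time base
    severity: str
    application: str


def summarize_by_window(messages, window_seconds=60):
    """Return {window_start: {severity: count}} for fixed-size windows."""
    summary = defaultdict(Counter)
    for msg in messages:
        # Align each timestamp to the start of the window that contains it.
        window_start = int(msg.timestamp // window_seconds) * window_seconds
        summary[window_start][msg.severity] += 1
    return {start: dict(counts) for start, counts in sorted(summary.items())}


# Made-up sample data for illustration only.
messages = [
    Message(0.0, "ERROR", "SFI-53"),
    Message(12.5, "WARNING", "SFI-53"),
    Message(59.9, "ERROR", "SFI-12"),
    Message(61.0, "ERROR", "SFI-53"),
]

result = summarize_by_window(messages)
print(result)
```

Summaries of this kind are small compared with the raw message flow, which is what makes them cheap to archive and to serve to many clients.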
How it works
Technologies: Engine: a Java application that gathers and correlates messages; it connects to the TDAQ infrastructure and uses MySQL as the DB backend to archive data. Dashboard: a Java Google Web Toolkit (GWT) v1.7 project; GWT translates Java applications into light, fast, browser-compatible HTML and JS pages. Graphs: the Google Charts project, a set of JS graphs (pie, column, bar charts, etc.) that are interactive, customizable, and compliant with the Google Visualization Wire protocol.
Data distribution 1/2: Google Charts introduces the idea of separating visualization from the data provider. Data sources expose the Google Visualization Wire protocol. Clients make an HTTP GET request to the data source URL. Google Charts JS graphs are compliant with the output data schema.
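A rough Python sketch of the client side of this exchange follows. The `tqx` request parameter with its `reqId` and `out` keys, and the `google.visualization.Query.setResponse(...)` wrapper around the response, come from the Google Visualization wire protocol. The data source URL, the function names, and the sample response are hypothetical, and the parser assumes the wrapped payload is strict JSON, which a given data source version may not guarantee.

```python
import json
from urllib.parse import urlencode

# Hypothetical data source URL, used only for illustration.
DATA_SOURCE_URL = "http://example.org/dashboard/datasource"


def build_request_url(base_url, request_id=0, out="json"):
    """Build the GET URL; `tqx` holds key:value pairs separated by ';'."""
    params = {"tqx": f"reqId:{request_id};out:{out}"}
    return f"{base_url}?{urlencode(params)}"


def parse_response(body):
    """Strip the JavaScript callback wrapper and return (labels, rows).

    Assumption: the text between the outermost parentheses is strict JSON.
    """
    payload = json.loads(body[body.index("(") + 1:body.rindex(")")])
    if payload.get("status") != "ok":
        raise ValueError(f"data source returned status {payload.get('status')!r}")
    table = payload["table"]
    labels = [col.get("label", col.get("id")) for col in table["cols"]]
    rows = [[cell["v"] if cell else None for cell in row["c"]]
            for row in table["rows"]]
    return labels, rows


url = build_request_url(DATA_SOURCE_URL)

# Made-up response body, shaped like a wire protocol JSON response.
sample_body = (
    'google.visualization.Query.setResponse({"version":"0.6","reqId":"0",'
    '"status":"ok","table":{"cols":['
    '{"id":"severity","label":"Severity","type":"string"},'
    '{"id":"count","label":"Messages","type":"number"}],'
    '"rows":[{"c":[{"v":"ERROR"},{"v":42}]},{"c":[{"v":"WARNING"},{"v":7}]}]}});'
)
labels, rows = parse_response(sample_body)
print(url)
print(labels, rows)
```

Because the response carries its own column schema, any client that understands the table layout can consume it, whether or not it is a Google chart.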
Data distribution 2/2: SQL-like capability: select, group by, filter. Multiple output formats supported (JSON, HTML, CSV). Standard: easy to integrate in non-Google projects.
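The SQL-like capability can be sketched as a query string passed in the `tq` request parameter, with the output format chosen through `out` in `tqx`. The clause keywords (`select`, `where`, `group by`) belong to the Google Visualization query language; the column names, the helper functions, and the URL below are assumptions for illustration, not the dashboard's actual schema.

```python
from urllib.parse import urlencode


def build_query(select, where=None, group_by=None):
    """Compose a query string; clauses must appear in this order."""
    clauses = ["select " + ", ".join(select)]
    if where:
        clauses.append("where " + where)
    if group_by:
        clauses.append("group by " + ", ".join(group_by))
    return " ".join(clauses)


def build_url(base_url, query, out="json"):
    """Attach the query (`tq`) and the requested output format (`tqx`)."""
    return base_url + "?" + urlencode({"tq": query, "tqx": "out:" + out})


# Hypothetical column names: "severity" and "application".
query = build_query(
    select=["severity", "count(application)"],
    where="severity != 'INFORMATION'",
    group_by=["severity"],
)
csv_url = build_url("http://example.org/dashboard/datasource", query, out="csv")
print(query)
print(csv_url)
```

Requesting `out="csv"` or `out="html"` instead of JSON is what allows the same data source to feed spreadsheets, scripts, or plain web pages.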
Along the way... Problems we encountered and, where possible, fixed. Analytics data archiving: performance was poor with CSV files, so we switched to a MySQL DB for the production instance; the change was seamless, only the engine was involved! GWT - Google Charts integration: good, but far from perfect; attaching and detaching a widget from the page fails to work, and an annoying page refresh is needed. Google Charts JavaScript is not available offline: the user's browser needs access to google.com/jsapi to download it.
Conclusion: The dashboard is now used in production for ATLAS. A weekly snapshot keeps system behavior monitored. It helps to increase the quality and decrease the number of messages in the system. The dashboard approach works: new widgets can easily be created to fulfill user requirements. The Wire protocol is an inspiring idea: we are investigating a similar approach to expose data from several TDAQ services in a uniform fashion.