Complex Event Processing (CEP) - A Primer

In a nutshell, CEP systems query events on the fly (with no to very light storage requirements). In a CEP setup, the focus is on event streams. An event stream reflects a logical sequence of events that are available for processing within a certain time window. To illustrate, a stock event stream may consist of events that describe the price change over time. Simplified, the users execute queries against the CEP engine (that contains the CEP logic) while the CEP engine matches the queries against the events that are flowing through via the event streams. It has to be pointed out that a CEP system differs from simple event processing or filtering environments, as a CEP setup supports temporal queries that focus on temporal concepts (time windows) or a before/after relationship. So a simple CEP query would be one that states: if the Chevron (CVX) stock value increased by more than 5% over the last 15 minutes, submit a notification.

In general, CEP queries can be characterized as follows:

- They are perpetual and emit events when the events match the condition that is given via the query.
- They operate on the fly and only store a minimal number of events (sometimes no storage is needed).
- They normally respond to changing conditions/events with millisecond granularity (an actual performance target has to be designed/built into the solution).

As outlined in Figure 3 and Table 3, many CEP components are available (some for free) today. Some solutions are focused on larger SMP systems (as the HW setup) and hence have some degree of vertical scalability designed into the product. The Big Data movement, though, is all about horizontal scalability, and hence some of today's CEP solutions scale rather well horizontally as well. To really address CEP scalability, a few different workload scenarios have to be discussed.
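To make the 15-minute price-change query above concrete, here is a minimal sketch in plain Python (not tied to any specific CEP product; the event shape and the tick-based window eviction are assumptions for this illustration):

```python
from collections import deque

WINDOW_SECONDS = 15 * 60   # 15-minute sliding time window
THRESHOLD = 0.05           # notify on a > 5% increase

class PriceChangeMonitor:
    """Keeps a sliding window of (timestamp, price) ticks for one symbol
    and fires when the price rose more than THRESHOLD within the window."""
    def __init__(self):
        self.window = deque()  # oldest tick on the left

    def on_event(self, ts, price):
        # evict ticks that fell out of the time window
        while self.window and ts - self.window[0][0] > WINDOW_SECONDS:
            self.window.popleft()
        self.window.append((ts, price))
        lowest = min(p for _, p in self.window)
        return price > lowest * (1 + THRESHOLD)  # True -> submit notification
```

A real CEP engine expresses this declaratively (a time window plus a condition) rather than as hand-written state management, which is precisely the value such engines provide.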
In this primer, the following CEP workload scenarios are elaborated on:

- a CEP workload that executes a very large number of queries
- a CEP workload that constitutes a very large memory footprint
- a CEP workload where the complex queries cannot be processed by a single SMP server
- a CEP workload that has to process a very large number of events

Workload Scenario 1: A CEP workload that executes a very large number of queries

With this workload scenario, a shared-nothing architecture may be appropriate. In other words, n CEP engine instances can be deployed on n nodes and each node executes 1/nth of the queries (the goal is load balancing among the n nodes). From an actual event distribution perspective, depending on the tangible event (stream) workload, some environments are deployed in a brute-force manner where all the events are submitted to all the nodes. While such an approach may be feasible with smaller event sets, most larger production environments utilize a message broker (such as Kafka or RabbitMQ) as the buffer entity that is situated in front of the CEP engine nodes (from a dataflow perspective). Hence, each CEP engine node only analyzes the outcome of the queries based on the event streams that the node subscribes to. From an implementation perspective, the CEP environment matches each event stream to a topic that is available in the broker tier (such as in a Kafka message broker environment). It is very common these days to delegate the event distribution to a streams processing system such as Apache Storm, S4, or Spark. In other words, for scalability purposes it is rather common to have (as an example) a Kafka cluster (the message broker tier) that as a backend utilizes a streams cluster setup (such as Storm) where a CEP engine (such as Esper) is deployed/integrated.
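The stream-to-topic mapping idea can be sketched in a few lines of Python (the topic naming scheme and the static partitioning are assumptions for this sketch; a real deployment would use Kafka topics and consumer groups instead of a hand-rolled hash):

```python
import zlib

NUM_NODES = 3  # number of CEP engine nodes (an assumption)

def topic_for(event):
    # one topic per stock symbol - a hypothetical naming scheme
    return "stock." + event["symbol"]

def node_for(topic):
    # stable hash so a given topic always maps to the same CEP engine node
    return zlib.crc32(topic.encode()) % NUM_NODES
```

The key property is that all events of one stream land on the same node, so each node can evaluate its share of the queries without cross-node coordination.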
Workload Scenario 2: A CEP workload that constitutes a very large memory footprint

This case assumes a workload that triggers a long-running, complex query that operates on a rather large time window where the events require a substantial amount of memory to operate on. There is no clear-cut answer to how to set up a CEP solution for a workload like this, but some form of distributed caching seems like a potential option. The best approach is to expand on the potential workload scenario and highlight a few more distinct cases:

1. Read and write to and from a large cache subsystem. That reflects the easiest setup and can be used if the memory footprint can be handled by a single node (aka design the CEP nodes accordingly).
2. With really large streaming windows, the CEP engine may require some local disk space (aka the application will not run at 100% memory speed and hence, the IO latency behavior has to be clearly understood to quantify the performance potential of such a CEP solution).
3. Very large streaming windows where the CEP solution is distributed (aka the events are shared among n CEP nodes). Such a setup may also provide some HA capabilities.
4. Workload scenarios that integrate streaming data with (cached) reference data to perform continuous join operations among the streamed and the cached reference data, respectively. In this case, the CEP engine has to provide an abstraction layer that properly supports this architecture via the EPL - event processing language (such as join keys).

From a CEP perspective, solutions such as Esper or Drools cover cases 1, 2, and 4, while for case 3 some combination of Storm/Esper or Spark/Drools may be considered (maybe even with some HA spin designed into the solution).

Workload Scenario 3: A CEP workload that cannot be processed by a single server system

If the workload cannot be processed by a single node, actual workload distribution across n cluster nodes is required.
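Returning to case 4 above (joining streamed events with cached reference data), the core of such a continuous join can be sketched as follows (the join key, event shape, and drop-on-miss policy are assumptions for this sketch):

```python
# Cached reference data, keyed by the join key (here: the stock symbol).
reference_cache = {
    "CVX": {"name": "Chevron", "sector": "Energy"},
    "IBM": {"name": "IBM", "sector": "Technology"},
}

def enrich(event):
    """Join one streamed event with its cached reference record."""
    ref = reference_cache.get(event["symbol"])
    if ref is None:
        return None          # no reference data -> drop (a design choice)
    return {**event, **ref}  # enriched event flows further down the pipeline
```

In a real CEP engine, this lookup is expressed as an EPL join between the stream and a (cached) table rather than hand-coded.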
Such a scenario may be due to either not being able to process a given event rate on a single node or the required memory footprint exceeding the capabilities of a single-server solution. To solve workload scenarios like this, the queries have to be decomposed into k steps (forming a pipeline) where the events are matched against some conditions and, based on the outcome, the matching events get processed further down the pipeline. The various steps (k) of the query that constitute the CEP solution can be executed on separate cluster nodes. To illustrate, the Chevron (CVX) stock-price query depicted below is used as the discussion blueprint. The query reports a match if, within 60 seconds, 2 events report a stock price > $110 and the actual delta between these 2 events is > 5%.

    select evn1.symbol, evn1.price, evn2.price
    from every evn1=stockstream[price > 110 and symbol = 'CVX']
          -> evn2=stockstream[price > 110 and symbol = 'CVX'
                              and evn2.price > 1.05 * evn1.price]
          [within.time = 60]

As depicted in Figure 1, the query can easily be distributed across 3 nodes where each node processes a subpart of the query and forwards the events as defined by the application.

Figure 1: CEP Simple Pipeline
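A minimal Python sketch of such a pipeline decomposition (the stage boundaries and the event shape are assumptions derived from the query text, not a specific product's API):

```python
from collections import deque

def stage1(event):                      # stateless: symbol filter
    return event if event["symbol"] == "CVX" else None

def stage2(event):                      # stateless: price filter
    return event if event["price"] > 110 else None

class Stage3:                           # stateful: pair match within 60s
    def __init__(self):
        self.pending = deque()          # earlier matching events

    def process(self, event):
        # evict events outside the 60-second window
        while self.pending and event["ts"] - self.pending[0]["ts"] > 60:
            self.pending.popleft()
        hit = any(event["price"] > 1.05 * old["price"] for old in self.pending)
        self.pending.append(event)
        return hit

def run_pipeline(events, stage3):
    """Feed events through the 3 stages; count the query matches."""
    hits = 0
    for ev in events:
        ev = stage1(ev)
        if ev:
            ev = stage2(ev)
        if ev and stage3.process(ev):
            hits += 1
    return hits
```

Because each stage only consumes the previous stage's output, the three stages can live on three different cluster nodes connected by a message broker.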
It has to be pointed out that many queries contain other properties that allow further optimization scenarios. To illustrate, while the last step in the pipeline (matching the stock-price increase) is stateful, the other 2 steps outlined in Figure 1 are stateless. While stateful operations retain information after processing an event (in a way so that earlier events - on the timeline - can affect the processing of later events), stateless operations only depend on the current event (on the timeline) that is being processed. Hence, for all these stateless operations, multiple instances arranged in a shared-nothing cluster setup can be utilized (see Figure 2).

Figure 2: CEP Partitioned Setup

As depicted in Figure 2, the stateless operations are distributed across 6 nodes while the stateful operation is processed after the events are consolidated from nodes 2, 4, and 6, respectively. Figure 2 basically depicts a directed acyclic graph (DAG) that describes the data (event) flow. As CEP processing normally occurs via some form of filtering, the number of events being processed further down the pipeline diminishes (aka filtering adds value at the source and reduces the number of events that have to be further processed). Hence, from a design perspective, it is paramount to identify the filter operations that have the biggest impact on the workload reduction and execute them as early as possible. In general, with any CEP project (or Big Data project in general), it is paramount that the actual data flow is entirely understood so that an appropriate systems design is feasible. Performance, scalability, capacity, security, or reliability requirements have to be designed (not re-engineered) into the solution as well. As illustrated in Figure 1, a filter operation such as symbol=CVX may significantly reduce the working set of what is firehosed into the CEP environment down the pipeline.
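The fan-out/fan-in structure of Figure 2 can be sketched as follows (the partition count, round-robin fan-out, and timestamp-ordered fan-in are assumptions for this sketch):

```python
NUM_PARTITIONS = 3  # parallel stateless filter instances (an assumption)

def partition(events):
    """Fan events out to NUM_PARTITIONS independent filter instances."""
    buckets = [[] for _ in range(NUM_PARTITIONS)]
    for i, ev in enumerate(events):
        buckets[i % NUM_PARTITIONS].append(ev)   # round-robin fan-out
    return buckets

def stateless_filter(events):
    """Each instance can run on its own node - no shared state needed."""
    return [ev for ev in events if ev["symbol"] == "CVX" and ev["price"] > 110]

def consolidate(buckets):
    """Fan-in: merge the filtered partitions, ordered by timestamp,
    before the single stateful operator processes them."""
    merged = [ev for b in buckets for ev in stateless_filter(b)]
    return sorted(merged, key=lambda ev: ev["ts"])
```

The stateless filters scale out freely; only the consolidation point in front of the stateful operator has to see the events in (timestamp) order.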
As depicted in Figure 2, partitioning the firehose data in a shared-nothing setup has the potential to significantly increase the scalability potential of the CEP environment and may hence achieve rather high throughput results. To illustrate, assuming that the stock spouts operate on an aggregate event stream of 500,000 events per second, but that only 1% of the data (event) set matches the filter CVX, only 5,000 events will be forwarded to the next stage in the pipeline. It has to be pointed out that the scenario discussed here only works if there is 1 or more stateless operations to be processed. If a query solely consists of a single stateful operation, a (distributed) pipelined approach will obviously not be applicable.
CEP Products - Status Quo

The word complex in CEP mainly refers to the complexity of state management over time while processing the events. Some examples are:

- Calculations over sliding windows
- Correlation of events along a timeline (such as to determine that event x occurs prior to event y within an exact timeframe, or to identify the non-occurrence of an event within a timeframe)

Most CEP implementations also provide advanced pattern detection, such as a non-deterministic finite state automaton (similar to a regular expression search over a flow of events - with a time influence in the search). Another key influence of time is referred to as timeliness. Timeliness refers to the ability to handle events and produce an output in a constrained time scenario. The target can be described as the end-to-end latency and can reach the millisecond or microsecond scale. CEP tools also provide the ability to arbitrate between guaranteed time and correctness of output (waiting or not on late or unordered events). CEP tools normally handle rather large event volumes (thousands of events per second). Other complexity factors that may motivate a company to consider a move towards CEP technologies are:

- Number and type of event sources
- Whether the application is expected to significantly change over time (such as new event sources, new interactions, and responses being introduced)
- Richness of information in output events (such as counts, averages, composition of events from different sources)
- Context-dependent situations (such as detection of events occurring within a defined spatial distance or within a defined group of customers - possibly querying external systems to determine the context)
- Correlation of real-time data with historical data
- Intelligence in event processing (such as inference or machine learning models)

Table 1: Fitting Event Processing scenarios

Event rates | Application Complexity | Timeliness
----------- | ---------------------- | ----------
High        | High                   | High
High        | High                   | Low
High        | Low                    | High
Low         | High                   | High

Application Complexity refers to time, state, and context. In other workload scenarios, more traditional messaging and/or transactional systems may be more appropriate than a CEP solution. Various CEP products that support different approaches/paradigms in their event processing cycle are available (see Figure 3 and Table 3). Some of the more common paradigms are:
Table 2: Some CEP Paradigms

Paradigm: Stream oriented, query based workload scenarios. The workload calls for perpetual queries that are operating on an infinite data flow.
Possible applications: Manipulation & aggregation of event data (may use SQL-like join logic). May operate among events and/or some external datastore.

Paradigm: ECA (event/condition/action) rule based systems. Works similar to triggers in some database solutions (a trigger is a special kind of stored procedure that automatically executes when an event occurs in the database server).
Possible applications: Users define event patterns by composing simple rules; the system has to take action when certain states are reached.

Paradigm: Inference rule systems (similar to business rule management systems - BRMS). A BRMS represents a SW system that is used to define, deploy, execute, monitor, and maintain the variety and complexity of decision logic used by operational systems within an organization or enterprise.
Possible applications: Business activity monitoring with real-time decision support.

Paradigm: Time-state based systems.
Possible applications: Monitoring systems with a well-defined finite state space.

Some CEP Functional Features

- Data reduction (filtering), projection (discarding some attributes), and/or aggregation based on a certain time window
- Modeling capabilities for event shape and payload (aka query logic)
- Transformation (enrichment or shape change), pattern detection - including the detection that an event is absent
- Time focus: event timestamps, intervals of occurrence (with respect to time windows and pattern detection, sliding time windows)
- Context awareness: the context in which the event occurs is being taken into account; the capability to query external systems (other datastores or a historical data repository)
- Logging and analysis for audit purposes or retrospective event processing (understanding precursor events that led to a particular output event)
- Prediction, learning, and adaptation: pattern discovery, scoring against a data-mining model or some other form of machine-learning capabilities
- Support capabilities similar to an integrated development environment (IDE)

Some CEP Non-Functional Features

- Input/output connectivity to event sources and event sinks
- Routing (statically or dynamically) and partitioning for workload distribution
- Performance enhancements (optimized end-to-end latency - has to be designed into the solution)
- Predictability, low latency variance (such as a 95th-percentile guaranteed latency behavior)
- Scalability and elasticity (mostly via encapsulation with some distributed streams solution)
- Availability and recoverability, fault tolerance, continuous operation
- Consistency and integrity via a distributed system setup, management of temporal granularity
- Security and privacy, segregation of event streams
- Simple usability, maintainability, manageability (good design is required though)
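One of the functional features listed above - detecting that an expected event is absent - can be sketched in a few lines of Python (the heartbeat semantics and the timeout value are assumptions for this sketch):

```python
TIMEOUT = 30  # seconds within which the next event is expected (an assumption)

def missed_heartbeats(timestamps):
    """Return the (prev_ts, next_ts) gaps where no event arrived in time.

    Each gap represents a non-occurrence: the absence of an expected
    event within the timeframe, which is itself a reportable complex event.
    """
    gaps = []
    for prev, nxt in zip(timestamps, timestamps[1:]):
        if nxt - prev > TIMEOUT:
            gaps.append((prev, nxt))
    return gaps
```

A CEP engine expresses the same idea declaratively (a pattern with a timer and a "not" clause) and can raise the alert while the gap is still open, rather than after the next event finally arrives.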
Figure 3: Some of the major CEP products (Figure courtesy of P. Vincent)
As depicted in Figure 3, a rather comprehensive set of commercial and open-sourced event processing solutions is available to design, architect, and develop event processing applications. In the literature, the solutions are sometimes labeled event processing platforms, complex-event processing (CEP) systems, event stream processing (ESP) systems, or distributed stream computing platforms (DSCPs). While Figure 3 is not entirely comprehensive, most of the major solutions are listed. The 3 main exceptions are:

- Amazon Kinesis (Cloud Based)
- Google DataFlow (Cloud Based)
- Stratio Streaming (Uses Spark Streaming & the Siddhi CEP Engine)

To complement Figure 3, some of the major solutions are listed again below (in a more readable form).

Table 3: Some Major CEP components

- Apache Samza - DSCP
- Apache Spark - DSCP
- Apache Storm - DSCP
- Apache S4 - DSCP
- Codehaus/EsperTech's Esper, NEsper - Native CEP
- RedHat Drools Fusion/JBoss Enterprise BRMS - Native CEP, comes with a Rule Engine
- DataTorrent RTS (Real-Time Streaming) - DSCP/CEP Hybrid
- FeedZai Pulse - Operational Intelligence
- SQLstream s-Server - Operational Intelligence
- Vitria Technology's Operational Intelligence Analytic Server - Operational Intelligence
- IBM InfoSphere Streams - Commercial, DSCP/CEP Hybrid
- IBM Operational Decision Manager (ODM) - Commercial, comes with a Rule Engine
- SAP Event Stream Processor (ESP) - Commercial
- Software AG Apama Event Processing Platform - Commercial
- Tibco BusinessEvents - Commercial, comes with a Rule Engine
- Tibco StreamBase - Commercial

Note: Operational intelligence (OI) refers to a type of real-time, dynamic business analytics solution that provides visibility/insight into the data, streaming events, and business operations. OI solutions execute queries against streaming data feeds and event data to deliver real-time analytic results that are referred to as operational instructions.
The SW systems depicted in Figure 3 and Table 3 provide a wide range of different features and techniques, but they do all perform CEP in a technical sense. To reiterate, CEP (or ESP) describes a practice where incoming data (event data) is processed almost instantly to generate higher-level, useful/valuable, summarized information (labeled the complex events). Event processing platforms provide embedded capabilities for filtering the incoming data, storing windows of event data, computing aggregates, and detecting patterns. In a more formal context, CEP SW reflects any application that is capable of generating, reading, discarding, or performing calculations on complex events. A complex event is an abstraction of 1 or more base (input) events. Complex events may signify threats or opportunities that require a response. One complex event may be the result of performing some form of calculation on either a few or on millions of base events that originate from 1 or more input streams. Some of the more popular commercial products are IBM's ODM and InfoSphere Streams, SAP's ESP, Software AG's Apama, or TIBCO's BusinessEvents and StreamBase. These products reflect comprehensive
development and runtime solutions that include adapters to integrate the CEP environment with various event sources, dashboards, and alert systems or administration tools. Other event processing platforms are combined with features such as query capabilities, reporting, interactive analytics, alerts, or key performance indicator tools and are specifically directed at operational intelligence. Examples include FeedZai Pulse, SQLstream s-Server, or Vitria Technology's Operational Intelligence Analytic Server. On the DSCP side, Apache products such as Samza, Spark, Storm, or S4 represent general-purpose streaming platforms that do not provide a native CEP (analytic function) engine. These solutions, though, are highly scalable (horizontally), and developers have the opportunity to add the necessary (CEP) logic to address various types of streams processing related problems. The DSCPs can be merged with native CEP solutions such as Esper/NEsper or RedHat's Drools Fusion/JBoss Enterprise BRMS (both of which are open-sourced native CEP systems) to deploy comprehensive, distributed, scalable CEP solutions. Depending on the source, DataTorrent's RTS and IBM's InfoSphere Streams are classified as either a DSCP or a CEP solution (aka a hybrid). Further, some event processing platforms are bundled with an actual rule engine (such as IBM's ODM, RedHat Drools Fusion/JBoss Enterprise BRMS, or TIBCO BusinessEvents). It has to be stated that the increasing popularity of CEP solutions is mainly due to the fact that this approach basically reflects the only way to absorb and process information from various event streams in real or near-real time. Any CEP target solution has to process the event data as the data arrives so that appropriate actions can swiftly be taken. When it comes to CEP, there really is no one-size-fits-all approach.
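The notion discussed above - a complex event as an abstraction over 1 or more base events - can be sketched as a simple summarization step (the event shape and the chosen aggregates are assumptions for this sketch):

```python
def summarize(base_events):
    """Derive one higher-level (complex) event from a window of base events."""
    prices = [ev["price"] for ev in base_events]
    return {
        "symbol": base_events[0]["symbol"],  # assumes one symbol per window
        "count": len(prices),
        "low": min(prices),
        "high": max(prices),
        "avg": sum(prices) / len(prices),
    }
```

Downstream consumers then act on the summarized (complex) event instead of the raw base-event firehose.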
All the solutions discussed in this article reflect general-purpose development and runtime tools that can be used by developers to design and implement customized event processing applications. What is provided is basically the core algorithms to process event streams. Some companies use CEP as part of a larger application solution. Some companies acquire a packaged application or subscribe to SaaS services that have CEP embedded under the hood. So a company may purchase a solution that happens to require event processing, and the company may not even be aware that CEP is being used. For example, supply chain visibility products, security information and event management (SIEM) solutions, fraud detection applications, governance, risk, and compliance (GRC) products, systems and network monitoring solutions, or business activity monitoring (BAM) tools do implement CEP to a certain degree. But an ever-growing number of companies do need the kind of event processing subsystems discussed in this article to support their high-throughput and low-latency applications where the customized event processing logic is paramount to their unique business problems. In any case, a comprehensive understanding of the actual data flow is a necessity to even consider commencing a CEP project. Further, the business and systems (including performance, security, and scalability) goals and objectives have to be fully understood and have to be designed into the solution.

References

1. J. Krumeich, B. Weis, D. Werth and P. Loos: "Event-Driven Business Process Management: Where Are We Now? A Comprehensive Synthesis and Analysis of Literature", Business Process Management Journal, 2014
2. David C. Luckham: Event Processing for Business: Organizing the Real-Time Enterprise, John Wiley & Sons, Hoboken, New Jersey, 2012
3. P. Vincent: CEP Blog, http://www.tibco.com/blog/author/paul-vincent/, 2015
4. Mulesoft: Twitter Complex Event Processing (CEP) with Esper and Drools, 2015
5.
Thomas Vial: The Esper CEP Ecosystem, 2012
6. Run Esper with Storm, http://stackoverflow.com/questions/9164785/how-to-scale-out-with-esper, 2011
7. Storm & Esper, http://tomdzk.wordpress.com/2011/09/28/storm-esper, 2011
8. Distributed Cache to Scale CEP, http://magmasystems.blogspot.com/2008/02/cep-engines-and-objectcaches.html, 2008
9. Srinath Perera: CEP Blog, 2012
10. Mathieu Despriee: CEP, 2011
11. Paul Vincent: CEP Market Players, 2014
12. CEP Wikis, 2015
13. World Wide Web CEP resources in general, 2015