Comparing Solace's Broker-Based Messaging Architecture with Peer-to-Peer Architecture

When considering messaging middleware technologies, it's important to understand the business requirements of a given application and to weigh them against the characteristics and strengths of the technology's underlying architecture. This paper compares peer-to-peer messaging architecture with broker-based architecture, and in particular with Solace's hardware-based message router. It explains why peer-to-peer architecture is inappropriate for the vast majority of scenarios: it sacrifices mission-critical capabilities in an attempt to reduce latency well beyond the point where it matters.

Table of Contents
1 Introduction
2 Architecture Descriptions
2.1 Peer-to-Peer Architecture
2.2 High Availability and use of TCP
2.3 Multi-site Architectures
2.4 Solace Hardware-Based Broker Architecture
3 Critical Messaging Characteristics
3.1 System Simplicity
3.2 Policy Control, Status Monitoring, Capacity Planning and Troubleshooting
3.3 System Robustness, Predictability and Real-Time Decoupling
3.4 Security
3.5 System Scalability (while maintaining simplicity)
3.6 Enterprise-grade, robust and feature-rich persistent queuing
3.7 Lower application development costs and fast time to market
4 Summary

Copyright Solace Systems, Inc. http://www.solacesystems.com
1 Introduction

The peer-to-peer architecture optimizes one particular characteristic, latency, to a degree that is not important to most applications.[1] This singular focus on latency leads it to sacrifice other characteristics that matter much more to the vast majority of applications. These include financial services applications such as order management systems (OMS), middle office, global reference data distribution, many market data distribution use cases, single dealer platforms, and global trade and risk aggregation. Outside of finance, they include applications such as ESB, SOA and EDA backbones, e-commerce, betting and gaming platforms, real-time control platforms, and automated processing systems. All of these applications value performance to some degree, but other features matter more to them than achieving latency below 10 microseconds.

That level of performance is important for algorithmic and high-frequency trading applications. This paper does not address those ultra-low-latency scenarios, where it's critical to shave off every possible microsecond, because the stakes in algo/HFT trading call for different assumptions and compromises. Instead, it focuses on the overwhelming majority of applications that value, and can't sacrifice, the following characteristics:

1. Simplicity
2. Policy control, status monitoring, capacity planning and troubleshooting
3. System robustness, predictability and real-time decoupling
4. Security
5. Scalability (while maintaining simplicity)
6. Enterprise-grade, robust and feature-rich persistent queuing
7. Lower application development costs and fast time to market

Solace's hardware-based message broker meets many key needs that trump the theoretical advantages of peer-to-peer systems, many of which don't hold up in real-world implementations anyway. Each of these points will be discussed in detail, but first we describe both the typical peer-to-peer architecture and the Solace hardware-based broker architecture as necessary background.

[1] Some claim that providing large fanout of data using multicast is an advantage of peer-to-peer over broker-based architectures, but broker-based solutions can also support multicast delivery. In fact, when brokers provide multicast delivery, they isolate publishers from retransmit requests (NACKs) sent by misbehaving or slow consumers.
2 Architecture Descriptions

2.1 Peer-to-Peer Architecture

While products vary, the basic peer-to-peer "nothing in the middle" architecture is depicted in the figure below. It shows servers housing publisher/subscriber applications at the bottom and servers housing messaging infrastructure functionality at the very top. These servers communicate over a multicast network, so all communicating parties must be on the same layer 2 subnet; otherwise IP multicast routing, which is not typical, must be used between subnets.

Since there's nothing in the middle, any functionality beyond direct application-to-application non-persistent messaging must be implemented at the edge. These additional components are shown in the figure as orange server icons and, in summary, include the following functions:

2.1.1 Store Processes

Store processes provide persistence by storing messages and their delivery state on a local or remote disk. In most cases, store processes are deployed in odd numbers of three or more and use quorum consensus or some other dynamic election mechanism to determine whether the persistence service is operational. Message delivery occurs directly from publisher to subscriber, with persistence occurring in parallel to minimize latency. Each message and every subscriber ACK must be processed and stored by all (three) store processes. When messages are lost or subscribers are slow or offline, subscribers contact a store process to retrieve their queued messages and rejoin the real-time delivery stream if and when they catch up. When a store process fails or becomes too slow, it must be resynchronized with the other store processes. Such synchronization is a complex process that products handle differently, if at all. Products also differ in whether and how they replicate and activate configuration changes on each store process.
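To make the quorum rule concrete, here is a minimal Python sketch. It assumes a simple strict-majority quorum over an odd-sized group of store processes, and that every message and every subscriber ACK is written to each live store. The `StoreProcess` class and the `has_quorum`, `persist` and `record_ack` helpers are hypothetical; real products differ in their election and synchronization details.

```python
from dataclasses import dataclass, field


@dataclass
class StoreProcess:
    """Hypothetical store process recording messages and subscriber ACKs."""
    name: str
    alive: bool = True
    messages: dict = field(default_factory=dict)  # seq -> payload
    acks: set = field(default_factory=set)        # (subscriber, seq) pairs


def has_quorum(stores):
    """The persistence service is operational only while a strict
    majority (more than n // 2) of the n store processes is alive."""
    alive = sum(1 for s in stores if s.alive)
    return alive > len(stores) // 2


def persist(stores, seq, payload):
    """Every live store performs the same write for each message."""
    if not has_quorum(stores):
        raise RuntimeError("persistence service unavailable: no quorum")
    for s in stores:
        if s.alive:
            s.messages[seq] = payload


def record_ack(stores, subscriber, seq):
    """Every subscriber ACK is also processed by every live store."""
    for s in stores:
        if s.alive:
            s.acks.add((subscriber, seq))


stores = [StoreProcess(f"store-{i}") for i in range(3)]
persist(stores, 1, b"order-42")
record_ack(stores, "risk-engine", 1)
stores[0].alive = False      # one failure: 2 of 3 still form a majority
print(has_quorum(stores))    # True
stores[1].alive = False      # second failure: quorum is lost
print(has_quorum(stores))    # False
```

Note that the same write is repeated on every store: with three stores, each message and each ACK costs three times the persistence work.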
It's frequently claimed that because peer-to-peer message delivery does not require the store process to filter and deliver messages to subscribers, store processes offer better performance than brokers. This may be true when all subscribers are online and behaving well. However, for any subscriber that falls far enough behind the real-time message flow to need recovery from a store process, the architecture degrades to a store-and-forward broker architecture, and the performance advantage is negated.
2.1.2 Inter-subnet Gateways

Since peer-to-peer solutions count on multicast to perform one-to-many message replication, the "nothing in the middle" advantage applies only to applications whose publishers and subscribers are on the same IP subnet. To reach applications in other subnets or other datacenters, TCP gateways must be deployed at the edge. These gateways receive the multicast traffic, filter it by topic and then forward it over TCP to one or more remote gateways, much as a broker would. They must be deployed redundantly for high availability and must be continuously monitored.

2.1.3 Desktop Distribution Gateways

Similarly, gateways are required to distribute data to desktops over TCP, since multicast is confined to the local server subnet.

2.2 High Availability and use of TCP

Each of the additional components described above must be deployed redundantly, on a separate server, to provide high availability. Operationally, each component must be monitored, configured, managed and upgraded.

Even though most applications don't send each message to many subscribers (aka "high fanout"), multicast is frequently used to enable communication between applications. This forces networking teams to manage multicast traffic and makes application teams deal with allocating and engineering multicast groups. Avoiding multicast would require an N² mesh of TCP connections between applications, as shown here. Such a mesh introduces issues such as TCP connection scaling, topic resolution maps and the message replication burden on publishers, to name a few.

2.3 Multi-site Architectures

Because messaging is typically required in multiple subnets or in multiple datacenters, this infrastructure must be replicated and interconnected with gateways, as shown here.
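The N² mesh mentioned in Section 2.2 is easy to quantify. If N applications must all reach each other directly, they need N(N-1)/2 TCP connections, while a broker needs one connection per application. The sketch below uses hypothetical helper names, simply computes both counts, and ignores the extra connections needed for redundancy.

```python
def full_mesh_connections(n_apps: int) -> int:
    """Point-to-point TCP connections needed so that every application
    can reach every other application directly: n(n-1)/2."""
    return n_apps * (n_apps - 1) // 2


def brokered_connections(n_apps: int) -> int:
    """Each application keeps a single TCP connection to the broker."""
    return n_apps


for n in (10, 100, 1000):
    print(n, full_mesh_connections(n), brokered_connections(n))
# 10 45 10
# 100 4950 100
# 1000 499500 1000
```

The mesh count grows quadratically while the brokered count grows linearly. In a mesh, a publisher also has to send a separate copy of each message over every connection that needs it.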
So the only situation in which peer-to-peer solutions are advantageous is when latency under 10 microseconds is mission-critical and the most important factor.
2.4 Solace Hardware-Based Broker Architecture

Solace's purpose-built message router provides the important characteristics and functionality of a broker without the performance penalty introduced by software-based brokers. Solace can route millions of non-persistent messages per second and hundreds of thousands of persistent messages per second. With this broker-based architecture, these high rates come with low latency and without sacrificing the many other key technical characteristics most applications require.

This diagram shows the same application deployment as above with a Solace broker-based architecture. Each application uses a software API to connect to a Solace message router via TCP, with no multicast involved. The message router filters, routes, queues, replicates and retransmits messages toward subscriber applications. This frees publishers and subscribers from responsibilities that, in a peer-to-peer architecture, are performed within the applications' messaging APIs. The message router performs all the messaging functions described in Section 2.1 without any additional components. Here's how:

2.4.1 Persistence

Solace's patented solution handles persistence with dedicated cards called Assured Delivery Blades (ADBs), which are integrated into the message router. These ADBs enable failsafe persistent message queuing at very high rates with low, consistent latency. An external disk array accessed via Fibre Channel or SAN serves as overflow, storing messages when slow or offline consumers are unable to receive them or keep up with the message flow. High availability is provided by a second message router, which maintains the same delivery state as the active message router in RAM and is ready to take over automatically within a few seconds if the active message router fails. In the unlikely event of a datacenter-level failure, all messages and their delivery state are flushed to non-volatile memory.
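The HA behavior described above, in which a mate router holds the same delivery state as the active router, can be modelled with a toy sketch. It assumes the active router mirrors each queued message and each completed delivery to its mate before acknowledging the publisher. The `Router` class and its methods are hypothetical, and the sketch says nothing about how the Assured Delivery Blades actually implement this.

```python
from collections import deque


class Router:
    """Hypothetical router keeping queued messages (delivery state) in RAM."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()   # (msg_id, payload) awaiting delivery
        self.peer = None       # mate router in the HA pair

    def publish(self, msg_id, payload):
        """Queue locally and mirror to the mate, then confirm to the publisher."""
        self.queue.append((msg_id, payload))
        if self.peer is not None:
            self.peer.queue.append((msg_id, payload))
        return "ack"           # publisher may now discard its copy

    def deliver_one(self):
        """Deliver the oldest message and remove it on both routers."""
        msg = self.queue.popleft()
        if self.peer is not None:
            self.peer.queue.remove(msg)
        return msg


active, standby = Router("active"), Router("standby")
active.peer = standby
active.publish(1, b"trade-1")
active.publish(2, b"trade-2")
active.deliver_one()          # trade-1 delivered; state updated on both routers
# If the active router now fails, the standby already holds identical state.
print(list(standby.queue))    # [(2, b'trade-2')]
```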
To learn more, go to http://solacesystems.com/products/guaranteed-messaging

2.4.2 Inter-subnet messaging

Because applications communicate with the message router over TCP, they don't need to be in the same subnet as the Solace message router. They can connect to any message router across subnets and even over the WAN. Communication with message routers in other datacenters is established by configuring TCP neighbor adjacencies between message routers. No additional components are required, and messages flow between the message routers using Solace dynamic topic routing protocols.
Solace also supports datapath features that optimize the efficiency and performance of communication over the WAN, such as streaming compression, parallel TCP connections and large TCP windows. No additional inter-subnet gateways are required.

2.4.3 Desktop Distribution

Similarly, messages can be fanned out over TCP to thick .NET or Java desktop clients without additional fanout gateways or daemons.

2.4.4 Internet/Mobile Streaming

The same message router also supports communication with HTML5, Silverlight and Flex Rich Internet Applications (RIAs), and with tablets and mobile devices, over an HTTP infrastructure of proxies and firewalls. No additional HTTP push gateways are needed.

2.4.5 Inter-Datacenter Connectivity

The following figure shows the same type of inter-datacenter connectivity described in Section 2.3. No additional gateways are required, because the message routers that connect applications within a datacenter also provide the gateway functionality needed to link remote datacenters in a true subscription-based pub/sub manner.
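Subscription-based routing, both within a router and across neighbor adjacencies, can be sketched as follows. The topic syntax is an assumption made for illustration: slash-separated levels, `*` for exactly one level and a trailing `>` for one or more remaining levels. The topic syntax and the `TopicRouter` model are hypothetical and are not Solace's actual matching rules or routing protocol. The key point is that a message crosses to a neighbor only when that neighbor has a matching subscription.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """'*' matches exactly one level; a trailing '>' matches one or more
    remaining levels; any other level must match literally."""
    sub_levels = subscription.split("/")
    top_levels = topic.split("/")
    for i, sub in enumerate(sub_levels):
        if sub == ">" and i == len(sub_levels) - 1:
            return len(top_levels) > i      # at least one level remains
        if i >= len(top_levels):
            return False
        if sub != "*" and sub != top_levels[i]:
            return False
    return len(sub_levels) == len(top_levels)


class TopicRouter:
    """Hypothetical router with local clients and neighbor routers."""

    def __init__(self, name):
        self.name = name
        self.local = {}        # client name -> list of subscriptions
        self.neighbors = []    # routers reachable over TCP adjacencies

    def subscribe(self, client, subscription):
        self.local.setdefault(client, []).append(subscription)

    def has_interest(self, topic):
        return any(topic_matches(s, topic)
                   for subs in self.local.values() for s in subs)

    def publish(self, topic, from_neighbor=False):
        """Deliver to matching local clients, then forward to neighbors that
        have a matching subscription (no flooding, no re-forwarding)."""
        delivered = [f"{self.name}:{c}" for c, subs in self.local.items()
                     if any(topic_matches(s, topic) for s in subs)]
        if not from_neighbor:
            for n in self.neighbors:
                if n.has_interest(topic):
                    delivered += n.publish(topic, from_neighbor=True)
        return delivered


ny, ldn = TopicRouter("NY"), TopicRouter("LDN")
ny.neighbors.append(ldn)
ldn.neighbors.append(ny)
ny.subscribe("oms", "orders/*/new")
ldn.subscribe("risk", "orders/>")
print(ny.publish("orders/eq/new"))   # ['NY:oms', 'LDN:risk']
print(ny.publish("prices/eq/IBM"))   # []
```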
3 Critical Messaging Characteristics

3.1 System Simplicity

Application architectures must be designed from the beginning to be as simple as possible, since they tend to become more complex as they evolve. High system complexity translates into measurable expense: higher operational costs, lower reliability (more production outages) and longer time to repair production problems (and thus longer outages per incident). Complex systems are also less flexible. This hampers the deployment and modification of applications, which reduces business agility and slows time to market.

Two things make the peer-to-peer architecture inherently complex. The first is the number of distributed messaging components. The second is the tight coupling among client applications and between client applications and the underlying network. Furthermore, the complexity of a peer-to-peer system increases more rapidly than that of a Solace brokered architecture as the system scales and applications are added, amplifying the operational cost, reliability and flexibility issues.

System simplicity can be described in terms of operational simplicity for the messaging infrastructure and for the network infrastructure.

3.1.1 Operational simplicity of the messaging infrastructure

The peer-to-peer "nothing in the middle" architecture results in a messaging infrastructure composed of loosely coupled, distributed components for different functions deployed at the edge.

o Complexity of Components: Each functional component (such as store processes, desktop and inter-subnet gateways, and web/mobile gateways) must provide high availability and scale for capacity. As a result, many instances of each must be deployed on separate servers so that a given function can survive any single fault. This requires managing more servers, operating systems and networking in addition to the messaging components.
o Capacity Planning: If different functions are combined on the same server for consolidation (e.g. a store process and a gateway process), the infrastructure team must ensure that there are sufficient resources for both functions to operate properly, and must monitor this over time.
o Troubleshooting: Identifying and addressing system faults is very difficult in such a distributed environment, where each function is accomplished by multiple component instances.
o Redundancy Across Subnets: Since peer-to-peer draws heavily on multicast, which is typically confined to a subnet, the same infrastructure of gateways and stores must be repeated in each subnet where server applications reside.
o Duplication for Separate Applications: Providing messaging services to many independent application groups, which must be isolated from each other in terms of performance and topic namespace, requires duplicating the above infrastructure.
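Because the components multiply, a back-of-the-envelope count makes the sprawl concrete. The per-function instance counts below are assumptions: three store processes, as described in Section 2.1.1, and a redundant pair for each gateway type. The helper names are hypothetical, so substitute your own deployment's figures. The brokered count also assumes one HA router pair is enough for capacity.

```python
# Assumed instances per function needed for HA (hypothetical figures).
PEER_TO_PEER_INSTANCES = {
    "store process": 3,
    "inter-subnet gateway": 2,
    "desktop gateway": 2,
    "web/mobile gateway": 2,
}


def peer_to_peer_components(subnets: int, isolated_apps: int) -> int:
    """Edge components are repeated for every subnet and for every
    application group that needs its own isolated infrastructure."""
    per_copy = sum(PEER_TO_PEER_INSTANCES.values())
    return per_copy * subnets * isolated_apps


def brokered_components(router_pairs: int) -> int:
    """An HA pair of routers covers all functions; isolation between
    application groups is provided by virtual brokers, not extra servers."""
    return 2 * router_pairs


print(peer_to_peer_components(subnets=3, isolated_apps=4))  # 108
print(brokered_components(router_pairs=1))                   # 2
```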
The end result with peer-to-peer is many moving software components on many servers in each subnet. That means a great deal of distributed state and status to deal with, often without a centralized management function that can correlate and roll up overall status and coordinate functions such as configuration changes and software upgrades/rollbacks.

In contrast, the Solace architecture provides:

o Integrated Solution: All functional components are integrated into a message router instead of being distributed throughout the datacenter. There are no separate persistent store or gateway processes and servers.
o Unified Platform: Connectivity over the LAN, the WAN, or the Web to browsers and mobile devices is provided by the same infrastructure from a single device.
o Integrated HA: High availability is provided for all functions by redundant components within the message router and at the device level. These HA functions use hard-wired communications between message routers (e.g. fiber-optic links on the Assured Delivery Blade) and proven IETF networking standards for maximum reliability and simplicity.
o Shareable Platform: The high performance of the Solace message router, combined with built-in virtualization, allows a single message router to be shared by multiple applications, each with its own private message broker. This reduces datacenter sprawl and the number of managed entities.
o Integrated Disaster Recovery: Persistent message replication to a disaster recovery site for business continuity is performed directly between the message routers. It requires no additional equipment and no coordination with storage replication. Both synchronous and asynchronous replication are supported.

3.1.2 Operational simplicity for networking

Peer-to-peer architectures rely on Ethernet multicast for replication rather than a full mesh of TCP connections among all clients, and each publisher must perform its own message routing and replication.
Multicast may be useful within the peer-to-peer messaging infrastructure itself, so that the many store and gateway processes receive each message. However, it is not appropriate for the vast majority of applications, since typically only a very small proportion of subscribers receive any particular message. Still, every client that joins a multicast group must receive and filter all messages on that group. Multicast also brings well-known problems:
o multicast group management and engineering
o network congestion and discards due to the lack of flow control and to non-uniform link speeds
o retransmission requirements and tuning
o "crying baby" syndrome, and so on

In the Solace architecture, all clients connect to the message router using TCP, avoiding all the pitfalls of multicast. The message router handles network congestion, link speed mismatches, slow consumers and misbehaving clients gracefully. It isolates them to protect the network and clients from overload and collapse. The message router performs per-client message filtering based on subscriptions and sends each client only the messages it has subscribed to, at a rate it can handle.
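The filtering difference can be illustrated with a small Python sketch. The topic names and message counts are made up, and topics are matched exactly for simplicity. A member of a multicast group receives everything sent on the group and discards what it did not ask for. A broker that filters per client sends only the subscribed topics.

```python
# Hypothetical traffic carried on one multicast group in some interval.
GROUP_TRAFFIC = {
    "prices/eq": 9000,
    "prices/fx": 800,
    "orders/new": 150,
    "orders/fill": 50,
}


def wanted(subscriptions):
    """Messages on topics the client actually subscribed to (exact match)."""
    return sum(n for topic, n in GROUP_TRAFFIC.items() if topic in subscriptions)


def multicast_load(subscriptions):
    """Every group member receives all traffic and discards what it did not ask for."""
    received = sum(GROUP_TRAFFIC.values())
    return {"received": received, "discarded": received - wanted(subscriptions)}


def brokered_load(subscriptions):
    """The broker filters per client, so only subscribed topics cross the wire."""
    return {"received": wanted(subscriptions), "discarded": 0}


oms = {"orders/new", "orders/fill"}
print(multicast_load(oms))   # {'received': 10000, 'discarded': 9800}
print(brokered_load(oms))    # {'received': 200, 'discarded': 0}
```

In this made-up mix, the order management client would spend 98% of its receive processing on messages it throws away.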
3.2 Policy Control, Status Monitoring, Capacity Planning and Troubleshooting

Being able to monitor, capacity plan, troubleshoot and control a messaging environment is a critical requirement of any production system. Operational excellence hinges on planning proactively to avoid outages and on resolving production problems quickly when they do occur.

In a peer-to-peer architecture, messages don't flow through a messaging-aware intermediary, which makes it impossible to do things like:

o Monitor per-client queue depths and packet loss to/from each client, and determine which clients are slow (and why), which publishers are publishing too fast, and which subscribers are subscribing to which topics
o Capacity plan by looking at per-client send/receive message rates, both instantaneous and averaged
o Enforce per-client policy controls such as rate limiting, resource limitations and security
o Know which applications are up and connected

Without a broker, administrators have two options. They can snoop and decode multicast traffic using a span port to pinpoint a problem, or they can instrument all applications to respond to management requests sent from a management station. Managing via protocol decodes requires extremely specialized technical staff and is feasible only for the most sophisticated organizations. Instrumenting applications to aid management is expensive and fundamentally flawed. It assumes that the network, servers and applications are all sane enough to provide valid information, yet that very lack of sanity is usually the problem that needs to be isolated and resolved. If an application is hung or in a loop, if the CPU is at 100%, or if a networking problem is affecting the server, there is no way to gather status.
Even if you did determine the cause of the problem and wanted to take action (such as disconnecting an application from the messaging system), you'd need to build your own method for doing so. Any mechanisms built into the messaging system wouldn't work when there are networking, server or application problems.

In the Solace architecture, all inter-application traffic flows through the Solace message router, so it becomes the central nervous system of application monitoring. The message router provides per-client visibility at all layers. Administrators can see, to name but a few:
o packet retransmits in each direction
o instantaneous queuing levels and high-water marks
o in/out message rates
o connection setup attempts
o subscription lists
o round-trip latency
o a list of the fastest publishers/subscribers

Solace message routers can enforce per-client policies such as message rate limiting and can also restrict or limit access to various resources. If a rogue application is misbehaving, administrators can pinpoint and disconnect it. The application, its server and the network do not need to be sane for the problem to be detected and corrected.

The Solace message router is managed via a dedicated management network port, which supports a separate command and control network. Since the control plane of the message router is separate from the hardware datapath, the message router is always responsive to management requests. It can always gather detailed status information, regardless of how much messaging traffic is being routed. This type of detailed and reliable management is simply not possible in a peer-to-peer architecture.

3.3 System Robustness, Predictability and Real-Time Decoupling

System robustness is the ability of a system as a whole to function in the face of component-level failures. This means containing the impact of faults so they don't spread throughout the system, and being able to find and fix faults quickly. In messaging systems, typical faults include:
o misbehaving consumers (slow or crashing consumers, or consumers that cannot keep up with bursts)
o networking faults (at a server or within the network)
o intermittent network congestion

In the peer-to-peer architecture, there is no intermediary to act as a "shock absorber". This creates a hard-wired real-time coupling between client applications, and between clients and the network, which makes the overall system fragile and prone to collapse. The coupling exists because, when a fault occurs, the peers in the system must work together to overcome it; there is no intermediary to do so.

In the peer-to-peer architecture, all consumers and the network must keep up with both the packet rate and the (unfiltered) message rate of every multicast group they have joined, whether or not they want those messages. Packet loss due to a slow subscriber requires NACK processing and retransmission directly from the publisher, which increases the load on the publisher. In some products, retransmissions are multicast to many subscribers, so a single misbehaving client can affect other, properly operating clients.
During packet recovery, applications must buffer all newly arriving packets until the lost one is received. This places an additional memory and processing burden on an already taxed subscriber. The subscriber may then drop more packets, affecting more publishers; hence the potential for collapse.

Packet loss in the network, typically many packets in bursts, is even more problematic. It affects all downstream consumers, which all send NACKs requesting retransmission to the publishers of the dropped packets, further increasing the load on every affected publisher. These situations become more likely as systems scale with more endpoints, as networks become more complex, and as networks and servers become less homogeneous. Less homogeneous networks and servers give rise to speed mismatches that cause packet loss.

With peer-to-peer, there is no way to flow control the publisher, since that would delay traffic to all other consumers. Nor can you isolate applications, because there is nothing to provide the isolation. Publishers publish as fast as they can and generate more traffic to fill gaps caused by packet loss, whether or not this traffic is helping. Many messaging systems provide configurable parameters for these behaviours. However, tuning them for all conditions without creating a "fix here, break there" situation can be difficult. Because there is no centralized management, it's also difficult to determine where and why packet loss is occurring and where the traffic is coming from. Even worse, such storms make the network unreliable, which makes troubleshooting harder still.
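The recovery behavior described above can be sketched from the subscriber's side. With sequence-numbered packets, a gap triggers NACKs to the publisher, and every later packet must be held in memory until the gap is filled. The `SequencedReceiver` class is a simplified, hypothetical receiver. Real reliable-multicast protocols add timers, NACK suppression and retransmission windows.

```python
class SequencedReceiver:
    """Hypothetical reliable-multicast receiver that delivers packets in order."""

    def __init__(self):
        self.next_seq = 1
        self.buffer = {}      # out-of-order packets held until the gap closes
        self.nacks = []       # sequence numbers requested from the publisher
        self.delivered = []

    def on_packet(self, seq, payload):
        if seq < self.next_seq or seq in self.buffer:
            return                                  # duplicate retransmission
        self.buffer[seq] = payload
        # NACK every missing sequence number below the one just received.
        for missing in range(self.next_seq, seq):
            if missing not in self.buffer and missing not in self.nacks:
                self.nacks.append(missing)
        # Deliver the contiguous run that now starts at next_seq, if any.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            if self.next_seq in self.nacks:
                self.nacks.remove(self.next_seq)
            self.next_seq += 1


rx = SequencedReceiver()
for seq in (1, 2, 5, 6, 7):        # packets 3 and 4 were lost in the network
    rx.on_packet(seq, f"m{seq}")
print(rx.nacks, len(rx.buffer))    # [3, 4] 3
rx.on_packet(3, "m3")              # retransmissions from the publisher
rx.on_packet(4, "m4")
print(rx.delivered)                # ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']
```

While the NACKs are outstanding, packets 5 to 7 sit in the subscriber's memory, and each further burst of loss grows that buffer.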
All the above mechanisms lead to a system that performs well in a small, uniform, highly managed environment. Its performance and stability deteriorate quickly when rates increase, the system scales, communication patterns become more chaotic, or the environment becomes less uniform or less closely managed. Especially where most messages go to only a small number of subscribers, the problems associated with multicast far exceed its benefits.

In the Solace architecture, message delivery is tailored to the needs and abilities of each client. Subscribers are sent only messages on topics they have subscribed to, not all messages in a multicast group. This means that:

o An application only needs to scale with the messages it needs to process, not with the rate someone chose for a multicast group.
o Server bandwidth and CPU processing are reduced, since message filtering is performed by the message router.
o Slow consumer handling is more orderly and controlled, with no impact on other clients in the system.

This leads to a system with higher performance that is much more predictable as system load increases and the system scales. The Solace message router handles misbehaving consumers in an orderly, controlled and isolated way, as follows:

o When the offered load is too high for a subscriber for a period of time, the TCP window from the message router to the consumer closes and no more messages are sent. Neither the network nor the client is bombarded with traffic it can't handle.
o Messages for the slow client are queued by the message router to guard against message loss. Queue lengths are configurable. The message router can also apply additional features, such as rate controls and message overwriting (called "eliding"), to help that consumer.
o In addition to pre-filtering messages, the message router makes slow client processing more efficient by bundling many small messages into fewer, larger packets for that client, reducing per-packet processing overhead.
o In the case of packet loss in the network, the message router interacts with the client to fill the gap, so the publisher is never affected. Also, if N packets are lost within the network, only the N applications they were destined for are affected, not all applications that have joined the multicast group the packet was sent on. Again, this contains the scope of impact.
o Because the message router queues messages and delivers them point-to-point to each application, it also provides an infrastructure point at which to monitor message rates, queuing levels and packet loss for each client. It can notify the operations team of anomalies asynchronously, without any application code being developed.

Peer-to-peer unavoidably imposes a hard-wired coupling between clients. Solace's hardware brokered architecture instead provides a real-time shock absorber and a true decoupling mechanism for inter-application communication. The result is an architecture that is more forgiving of faults. It provides the isolation required to contain a fault and avoid systemic collapse, along with critical management visibility into the location and cause of the fault.

3.4 Security

The security requirements of applications vary greatly depending on their regulatory environment and organizational policies, but most mission-critical deployments require one or more of the following:

o Authentication: Restrict access so that not just any application or user can connect to the message-level communication bus. Authentication can be as simple as username/password or more complex, requiring X.509 certificates or one-time passwords.
o Authorization: Maintain entitlement control over which applications can publish and receive certain content.
o Encryption: Ensure that no data between applications is in the clear on the network. Such precautions may be required to prevent internal deployment mistakes or to guard against malicious or illegal actions.

In many applications, the lack of one or more of these capabilities makes a technology ineligible for use. Peer-to-peer architectures can't meet these requirements because there is no messaging intermediary to enforce them. Network-level mechanisms such as IEEE 802.1X can provide server-level authenticated access to the network. However, they do not provide application-level authentication or multicast group access controls, much less per-message topic-based publish/subscribe controls. Similarly, transport-level encryption standards such as SSL and TLS are not supported by multicast. Moving to a full mesh of TCP connections between applications does not avoid the issues of multicast either. A mesh would require every peer to support server-side functions such as authentication, authorization and certificate management.

In the Solace broker-based architecture, client applications must connect via TCP to the Solace message router to access the message bus in a client-server model.
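Per-message, topic-based authorization of the kind an intermediary can enforce might look like the sketch below. The rule format is a hypothetical simplification: per-client allow lists of topic prefixes for publishing and subscribing, with default deny. It is not Solace's actual ACL syntax. The `ClientAcl` class, the `ACLS` table and `authorize` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class ClientAcl:
    """Hypothetical per-client entitlements: allowed topic prefixes per action."""
    publish: list = field(default_factory=list)
    subscribe: list = field(default_factory=list)


ACLS = {
    "oms-gateway": ClientAcl(publish=["orders/"], subscribe=["fills/"]),
    "risk-engine": ClientAcl(subscribe=["orders/", "fills/"]),
}


def authorize(client: str, action: str, topic: str) -> bool:
    """Default-deny check that a broker could run on every publish or subscribe."""
    acl = ACLS.get(client)
    if acl is None:                     # unknown or unauthenticated client
        return False
    prefixes = getattr(acl, action, [])  # unknown actions get no permissions
    return any(topic.startswith(p) for p in prefixes)


print(authorize("oms-gateway", "publish", "orders/eq/new"))   # True
print(authorize("risk-engine", "publish", "orders/eq/new"))   # False
print(authorize("unknown", "subscribe", "fills/eq"))          # False
```

Because every message passes through the broker, a check like this runs on each publish and subscription. In a peer-to-peer design there is no single point at which to run it.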
Per-client authentication is supported, and per-client topic-based publish/subscribe access control rules can be enforced in real time on a per-message basis. Transport-level security of various strengths is supported using SSL/TLS.

3.5 System Scalability (while maintaining simplicity)

Messaging deployments typically scale in many dimensions over time: number of client endpoints, message rates, number of separate applications, number of sites, number of subnets, and so on. When selecting a messaging infrastructure, it is important to consider whether each dimension can be scaled cost-effectively with minimal operational complexity.

In the peer-to-peer architecture, the only way to scale the system is to add more multicast groups, gateways, store processes and the servers they require. In general, for each function of the messaging system, more hardware and software infrastructure components are added at the edges to accommodate scale, and yet more for high availability. For example, adding a second application that requires persistence and isolation from the first application typically means deploying 3 more store processes on 3 additional servers. Other peer-to-peer considerations as a system scales include:

o Messaging mechanisms that went unnoticed in small deployments become problematic. Examples include topic resolution cache sizes, retransmission cache sizes, topic-to-ID binding refresh rates and aging, and retransmission parameters. You need to monitor and tune these parameters, and you also need to deploy new settings to all client applications, which can be a management nightmare.
o Maintaining a uniform hardware environment becomes more difficult and expensive as the system scales, and failing to do so leads to a fragile system and the need for messaging tuning. For example, a heterogeneous 10GE/1GE network is more likely to cause packet loss at the points where link speed drops by a factor of 10. Such network-based packet loss affects all downstream consumers. It is more likely with UDP traffic than with TCP traffic, because UDP/multicast lacks flow control.

In the Solace architecture, scalability is achieved as follows:

o Virtualization: Each message router can support new independent applications through the easy creation of virtual message brokers. This provides significant savings and lets you offer messaging as a service in the network.
o Supporting clients in different subnets does not require additional gateways or servers. Because those clients use TCP instead of multicast, they can connect to a common Solace message router.
o Connecting to other sites doesn't require additional gateways or servers either, since message routers can be configured with inter-device adjacencies regardless of location.
o Each message router can support thousands of concurrent connections and can handle throughput of 24 million non-persistent messages per second and 450,000 persistent messages per second.
o It's easy to add more capacity by installing a new message router, or a pair of message routers for HA. On a per-virtual-broker basis, these message router pairs can be neighbored or federated so that clients within the same application can communicate with each other even when they are connected to different message routers.
o Issues such as topic discovery and resolution mechanisms on each client do not exist in the Solace architecture, since the message router maintains this state and brokers this communication, thereby decoupling publishers and subscribers from each other and dramatically reducing the amount of client-based configuration required.
o The network and server environments need not be as uniform as with peer-to-peer multicast, since the message router performs per-client filtering and uses unicast TCP, with its flow control, to navigate heterogeneous networks in a controlled manner.

Thanks to high capacity and virtualization, it's easy to scale Solace-based messaging systems to support new applications, messaging functionality or capacity requirements. In many dimensions, because of the hardware datapath and virtualization capabilities, scaling a Solace-based system requires no additional investment, hardware deployment or complexity, and where additional message routers are required, the inter-device routing capabilities make federation operationally simple to achieve.
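To make the brokered model described in this section more concrete, the following is a minimal sketch, not Solace's implementation, of how a broker can hold subscription state centrally and filter messages per client, so that publishers and subscribers never perform topic discovery or resolution themselves. The "/"-delimited topic levels and the wildcard rules ("*" matches exactly one level, a trailing ">" matches one or more levels) are a simplified scheme assumed for illustration, and `SubscriptionTable`, `topic_matches` and the client names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def topic_matches(subscription: str, topic: str) -> bool:
    """Return True if a '/'-delimited topic matches a subscription.

    Assumed, simplified wildcard rules (illustrative only):
      '*' matches exactly one level;
      '>' as the last level matches one or more remaining levels.
    """
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == ">" and i == len(sub_levels) - 1:
            # Trailing '>' requires at least one remaining topic level.
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        if level != "*" and level != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


class SubscriptionTable:
    """Hypothetical broker-side store of per-client subscriptions."""

    def __init__(self) -> None:
        self._subs: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, client_id: str, subscription: str) -> None:
        self._subs[client_id].add(subscription)

    def unsubscribe(self, client_id: str, subscription: str) -> None:
        self._subs[client_id].discard(subscription)

    def route(self, topic: str) -> list[str]:
        """Return the clients that should receive a message on `topic`.

        Filtering happens here, in the broker, so each client receives only
        what it subscribed to, and publishers need no knowledge of subscribers.
        """
        return sorted(
            client
            for client, subs in self._subs.items()
            if any(topic_matches(s, topic) for s in subs)
        )


if __name__ == "__main__":
    table = SubscriptionTable()
    table.subscribe("risk-engine", "trades/*/equities")
    table.subscribe("audit", "trades/>")
    table.subscribe("fx-desk", "trades/london/fx")
    print(table.route("trades/london/equities"))  # ['audit', 'risk-engine']
    print(table.route("trades/london/fx"))        # ['audit', 'fx-desk']
```

Because the table lives in one place, adding a subscriber or changing a subscription is a single update at the broker rather than a configuration change pushed to every client.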
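The earlier point that a mixed 10GE/1GE network tends to drop packets at its 10:1 link-speed reduction points follows from simple arithmetic. The sketch below assumes a sustained burst arriving at line rate on a 10GE link and draining through a 1GE egress port whose buffer has a fixed size; the 1 MB buffer is a hypothetical figure chosen for illustration, not a measurement of any particular switch.

```python
def time_to_overflow_ms(ingress_gbps: float, egress_gbps: float, buffer_bytes: int) -> float:
    """Milliseconds until an egress buffer overflows during a sustained burst.

    The buffer fills at (ingress - egress). With no flow control (UDP/multicast),
    every packet arriving after the buffer is full is dropped.
    """
    fill_rate_bytes_per_s = (ingress_gbps - egress_gbps) * 1e9 / 8
    if fill_rate_bytes_per_s <= 0:
        return float("inf")  # egress keeps up, so the buffer never fills
    return buffer_bytes / fill_rate_bytes_per_s * 1e3


def loss_fraction_after_overflow(ingress_gbps: float, egress_gbps: float) -> float:
    """Fraction of arriving traffic dropped once the buffer is full."""
    return max(0.0, 1 - egress_gbps / ingress_gbps)


if __name__ == "__main__":
    # Hypothetical: 1 MB of buffering on the 1GE egress port.
    ms = time_to_overflow_ms(10, 1, 1_000_000)
    print(f"buffer overflows after {ms:.3f} ms")  # buffer overflows after 0.889 ms
    print(f"then drops {loss_fraction_after_overflow(10, 1):.0%} of packets")  # then drops 90% of packets
```

Under these assumptions, a burst lasting under a millisecond is enough to start losing multicast packets, and every consumer downstream of that port sees the gap. With TCP, flow and congestion control instead slow that one sender and retransmit any dropped segments.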
3.6 Enterprise-grade, robust and feature rich persistent queuing

Other than market data and sensor networks, the vast majority of distributed applications require message persistence in their messaging infrastructure, since message loss for any reason is not acceptable. Many require high message rates, many require sophisticated queuing capabilities and some require low latency, but all of them require a fault-tolerant, highly available system that doesn't lose messages, and this is usually far more important than driving latency below 10 microseconds.

In the peer-to-peer architecture, persistence is handled using a technique known as parallel persistence, by store processes that are special subscribers to the multicast bus, as described in Section 2.1. Parallel persistence is designed to minimize latency within a small deployment; it does not address the needs of an enterprise messaging system, for the following reasons:
o Parallel persistence only works within a subnet. Beyond that, if it is supported at all, gateways are required between the subnets, which starts to look a lot like a brokered solution, except that it involves both store processes and gateway processes, making it more complex than a broker solution.
o Ensuring storage integrity and synchronization across three independently running store processes is fragile, especially in the face of store process outages and networking problems.
o Replication to remote datacenters for disaster recovery is an afterthought in most peer-to-peer systems, since the active site already has three (or more) copies of the data.
o Increasing the number of independent applications using persistence can require deploying another set of store processes for each new application to provide appropriate isolation, creating much larger datacenter sprawl than with software brokers.
o Important messaging capabilities such as transactions, selectors, queue overflow handling and sophisticated queue servicing algorithms, which exist to facilitate application architectures, are completely at odds with parallel delivery, and in some cases impossible to implement with it. For example, messages published in a transaction can't be delivered until the transaction is committed, since the application might abort it.

While it is true that many peer-to-peer products also let the store process receive messages from publishers and forward them to subscribers, doing so makes it a brokered architecture with all the associated software performance issues, but without the management capabilities of a full broker-based system.

The Solace persistence architecture uses patented techniques and custom-built hardware designed specifically for high-rate, low-latency, never-lose-a-message, enterprise-wide persistent queuing, overcoming the bottlenecks associated with file systems and with rotating or solid-state storage systems. By providing such a high-performance broker, the Solace architecture offers all the benefits of a brokered architecture without the performance and real-time drawbacks of a software implementation:
o Messages and message delivery state are synchronously stored on two message routers and kept synchronized across message router activity switches at up to 450,000 msgs/sec.
o Messages can be synchronously or asynchronously replicated, and queue status synchronized, to a remote site to provide disaster recovery functionality without involving storage replication such as SRDF.
o Store-and-forward queuing enables support for transactions, selectors and various queue servicing disciplines.
o Message routers can be federated to allow producers and consumers of persistent messages to be in different datacenters or on different continents.
o Management, monitoring and high availability are all built in, and since all messages flow through the message router, message rates and queue levels can be monitored, with asynchronous events generated when thresholds are exceeded, for maximum visibility.

3.7 Lower application development costs and fast time to market

Due to a combination of the many points above, application development, deployment and operations costs are significantly lower with a Solace hardware broker-based architecture than with peer-to-peer. The principal reasons are fewer moving pieces in production, lower overall system complexity, fewer parameters to tune, ubiquitous reach, and an integrated platform for management and high availability. Solace delivers a completely integrated messaging solution rather than the components of a solution that must be integrated, engineered and made manageable by the customer and then maintained on an ongoing basis. As a result, Solace customers can focus their energy on their differentiators and what makes them great rather than on building infrastructure.

4 Summary

Messaging systems that have been deployed at enterprise scale have historically had some type of broker or daemon between client applications, for many of the reasons listed in this paper.
Recent interest in peer-to-peer architectures for high-performance applications has been driven by the poor performance and predictability of software broker messaging products, and has led some to consider deploying peer-to-peer products at larger than departmental scale. Solace's hardware-based solution changes the game by providing all the benefits of a brokered architecture with the high rate and consistently low latency of a hardware solution purpose-built to meet the needs of high-performance, enterprise-wide messaging.

To learn more visit solacesystems.com or call +1 613-271-1010.