Using Oracle TimesTen to Deploy Low Latency VOIP Applications in Remote Sites
Thomas Lynn
Comcast National Engineering & Technical Operations
October 2009
Comcast Digital Voice (CDV)
- Comcast Digital Voice has become the third-largest residential phone service provider in the U.S.
- 7 million customers
- 80+ VOIP switches deployed across the United States

VOIP Application Background
- Comcast needed to build a VOIP application server to support the needs of multiple value-added applications
- Universal Caller ID (UCID) is first on the deployment list
- UCID provides a link between Comcast's Digital Voice, Digital Cable and High-Speed Internet offerings
- UCID provides caller ID on televisions through Digital Cable and on PCs with a small downloadable client
- UCID is deployed on a general-purpose Communications Application Server infrastructure

Communications Application Server (CAS)
- Some CAS applications, including UCID, need to be part of the Session Initiation Protocol (SIP) call flow
- Applications that are part of the SIP call flow can increase call connect times
- Call connection times relate directly to customer perception of CDV quality
- Low internal application latency was deemed extremely important to maintaining customer satisfaction
Applications In The SIP Call Flow

Simplified VOIP Call Flow
1. The calling party's VOIP phone contacts the Originating VOIP Switch
2. The Originating VOIP Switch determines the destination and contacts the Terminating VOIP Switch
3. The Terminating VOIP Switch contacts the destination party's VOIP phone

Simplified VOIP Call Flow with UCID
1. The calling party's VOIP phone contacts the Originating VOIP Switch
2. The Originating VOIP Switch determines the destination and contacts the Terminating VOIP Switch
3. The Terminating VOIP Switch determines the call should be routed to CAS and contacts CAS
4. CAS determines the customer's services and responds to the Terminating VOIP Switch
5. The Terminating VOIP Switch contacts the destination party's CDV phone
6. Once the destination phone rings, UCID contacts the destination party's TV and PC

Application transactions in the SIP call flow must be fast, to add value without detracting from quality.
Communications Application Server Topology
- Multiple site deployments near customers were deemed necessary to eliminate the possibility of network latency
- Many other steps were taken to reduce application latency; one of them was database architecture
- An application response time of less than 100 ms is necessary to maintain high-quality connect times

Database Requirements
- To reduce application latency and drive call quality, a local data source at each site makes sense
- Centralized data for ease of administration
- Geo-redundancy
- High availability
- Internal site redundancy
- Response time of less than 10 ms

Meeting The Requirements
- TimesTen meets the needs of real-time queries
- Cache Connect to Oracle meets the centralized and localized strategy at the same time
- Oracle Data Guard serves the geo-redundancy requirements
- Oracle RAC provides high availability

Data In The Application Layer Reduces Latency
- Observed local query time: 0.0074 sec/query
- Observed WAN query time: 0.50009 sec/query
[Diagram: application sites caching locally from East and West RAC databases]
Meeting Internal Site Redundancy Requirements
- Once data is replicated into the site, a second strategy was required to meet site redundancy requirements
- A TimesTen Active Standby Pair with Subscribers was chosen because it maintains replicas of the cached data within the site
- As data enters the site from Cache Connect, it is committed to the Standby Master first; this ensures that the Standby is always up to date and can take over for the Master without a complete cache reload
- The Master is committed next
- Finally, the Subscribers replicate via the Standby, since it contains the most accurate cache in the site
[Diagram: Oracle Database feeding the Active and Standby Masters via Cache Connect, with TimesTen replication to the Subscribers]

Application Data Source Strategy
- To maintain speed in the application layer and add redundancy, applications must have built-in data source failover with priority
- One of the reasons TimesTen is so fast is that the applications reside on the same server as the database
- No network protocol overhead for local queries
- Applications should prioritize data sources and always select local sources over remote
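The in-site topology described above can be declared with TimesTen's replication DDL. A minimal sketch, assuming a data store named casdb and illustrative host names (none of these identifiers come from the deck, and details vary by TimesTen release):

```sql
-- One Active Master, one Standby Master, and two Subscribers per site.
-- Data enters the site via Cache Connect; Subscribers replicate via the
-- Standby, per the design described above.
CREATE ACTIVE STANDBY PAIR
  casdb ON "cas-host1",      -- Active Master (illustrative host name)
  casdb ON "cas-host2"       -- Standby Master
  SUBSCRIBER
  casdb ON "cas-host3",      -- Subscriber
  casdb ON "cas-host4";      -- Subscriber

-- On cas-host1, declare this store the Active and start replication:
call ttRepStateSet('ACTIVE');
call ttRepStart;
```

The same scheme is then instantiated on the other stores (typically by duplicating the Active with the ttRepAdmin utility) before their replication agents are started.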
Table Structure Cache Replication
- Cache Connect to Oracle allows for simple and accurate replication strategies
- Cache groups can follow the primary key / foreign key table design of the Oracle source database tables
- Cache groups can be built with a tiered structure that allows multiple tables to replicate together in single transactions; this allows TimesTen to mimic constraints in Oracle and replicate data without constraint errors
- If data must differ from site to site, cache groups offer a filtering ability that allows data from all sites to be stored centrally but filtered by Cache Connect
- If cache group filters are applied on parent tables in the cache, only data linked to that parent will be replicated, based on the PK/FK relationship
Cache Connect Replication With Where Clause
A where clause allows data from all sites to be stored in the same central tables but replicated only to the desired sites.

Oracle Source Tables

SITE
SITEID  SITENAME
1       Philadelphia
2       San Francisco

CUSTOMERS
TEL           ADDRESS     ZIP  SITEID
555-555-5551  11 nowhere       1
555-555-5552  22 nowhere       2
555-555-5553  33 nowhere       1
555-555-5554  44 nowhere       2
555-555-5555  55 nowhere       1
555-555-5556  66 nowhere       2

TimesTen (Philadelphia site)
SITE: SITEID 1, SITENAME Philadelphia
CUSTOMERS: 555-555-5551 (11 nowhere), 555-555-5553 (33 nowhere), 555-555-5555 (55 nowhere)

TimesTen (San Francisco site)
SITE: SITEID 2, SITENAME San Francisco
CUSTOMERS: 555-555-5552 (22 nowhere), 555-555-5554 (44 nowhere), 555-555-5556 (66 nowhere)

Cache Tables Creation Syntax (Philadelphia site shown)
CALLID.SITE (
  SITEID NUMBER(38) NOT NULL,
  SITENAME VARCHAR(30 BYTE) INLINE NOT NULL,
  PRIMARY KEY (SITEID))
  WHERE (CALLID.SITE.SITEID = 1),
CALLID.CUSTOMERS (
  TEL VARCHAR(20 BYTE) INLINE NOT NULL,
  ADDRESS VARCHAR(30 BYTE) INLINE,
  ZIP VARCHAR(10 BYTE) INLINE,
  SITEID NUMBER(38),
  PRIMARY KEY (TEL),
  FOREIGN KEY (SITEID) REFERENCES CALLID.SITE (SITEID))
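The column list above is the FROM clause of a cache group definition. A hedged sketch of a complete read-only cache group built from it (the cache group name and the autorefresh interval are assumptions, not from the deck):

```sql
-- Illustrative: a tiered READONLY cache group with incremental
-- autorefresh from Oracle. The WHERE clause on the parent table
-- restricts both tables to the Philadelphia site (SITEID = 1)
-- through the PK/FK relationship.
CREATE READONLY CACHE GROUP ucid_site_cg   -- name is an assumption
  AUTOREFRESH MODE INCREMENTAL INTERVAL 5 SECONDS
  FROM
    CALLID.SITE (
      SITEID   NUMBER(38) NOT NULL,
      SITENAME VARCHAR(30 BYTE) INLINE NOT NULL,
      PRIMARY KEY (SITEID))
    WHERE (CALLID.SITE.SITEID = 1),
    CALLID.CUSTOMERS (
      TEL     VARCHAR(20 BYTE) INLINE NOT NULL,
      ADDRESS VARCHAR(30 BYTE) INLINE,
      ZIP     VARCHAR(10 BYTE) INLINE,
      SITEID  NUMBER(38),
      PRIMARY KEY (TEL),
      FOREIGN KEY (SITEID) REFERENCES CALLID.SITE (SITEID));
```

Because both tables live in one cache group, parent and child rows refresh together in a single transaction, which is what lets TimesTen honor the Oracle constraints during replication.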
Putting It All Together In Clusters Enhances Built-in Redundancy
- To ease maintenance, CAS has been separated into multiple functional units, or clusters, per site
- Each unit can stand alone to serve SIP requests during maintenance windows
- The following failures can be overcome with little effect when running in this mode:
  - Loss of network connection to the Oracle source DB
  - Loss of the Active Master
  - Loss of the Standby Master
  - Loss of a Subscriber
[Diagram: redundant clusters per site behind a Load Balancer; Oracle source databases linked by Data Guard]
Built-in Redundancy: Loss of Network Connection to Oracle DB
- During a network outage to the source database, the site loses only incoming updates
- The site can run as a standalone entity for hours or days if necessary
- Triggers created by Cache Connect maintain change records in intermediate tables on the source database
- Once the connection is restored, TimesTen receives the incremental changes from Oracle
Built-in Redundancy: Loss of Active Master
- Active Master loss can be overcome by elevating the Standby to Master
- Since the Standby is committed before the Master, it is always the most accurate copy in the site, so only incremental changes are required
- Applications that were locally connected to the failed Master must gracefully fail over to the promoted Standby
- In this case minimal query latency is introduced on the failed-over server due to network protocol overhead and LAN latency
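A hedged sketch of the promotion sequence, reusing the illustrative store and host names from earlier (the full store path is also illustrative; exact arguments vary by TimesTen release):

```sql
-- Run on the surviving Standby Master.
-- First declare the lost Active failed, then take over as Active.
call ttRepStateSave('FAILED', '/var/timesten/casdb', 'cas-host1');
call ttRepStateSet('ACTIVE');
```

The failed store is later rebuilt as the new Standby by duplicating the current Active (typically with ttRepAdmin's duplicate operation), after which only incremental changes flow to it.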
Built-in Redundancy: Loss of Standby
- Standby failure can be overcome by replicating to the Subscribers directly from the Master
- When the Standby is inactive, the Master commits first
- Applications that were locally connected to the failed Standby must gracefully fail over to another data store in the site
- In this case minimal query latency is introduced on the failed-over server due to network protocol overhead and LAN latency
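A sketch of the corresponding step, under the same illustrative names: declaring the Standby failed is what lets the Active replicate straight to the Subscribers.

```sql
-- Run on the Active Master after the Standby is lost.
-- Store path and host name are illustrative, not from the deck.
call ttRepStateSave('FAILED', '/var/timesten/casdb', 'cas-host2');
```

Once the Standby is repaired and duplicated from the Active, replication reverts to the normal Active → Standby → Subscriber path.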
Built-in Redundancy: Loss of Subscriber
- Subscriber loss is the simplest case, since these data stores only pull data incrementally from the Master or Standby
- Subscribers have no interaction with Oracle, so there are no concerns with the incremental state of Cache Connect
- Applications that were locally connected to the failed instance must fail over, and minimal latency is introduced on the affected server
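If a Subscriber is lost for good, it can be removed from (and later re-added to) the replication scheme. A hedged sketch, with the same illustrative names; TimesTen typically requires the replication agents to be stopped while the pair is altered:

```sql
-- Run on the Active Master, with replication agents stopped.
-- Store and host names are assumptions, not from the deck.
ALTER ACTIVE STANDBY PAIR DROP SUBSCRIBER casdb ON "cas-host3";

-- Later, after the replacement server is built:
ALTER ACTIVE STANDBY PAIR ADD SUBSCRIBER casdb ON "cas-host3";
```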
Built-in Redundancy: Ease of Maintenance
- To perform maintenance on a site, load can be redirected to one cluster; this creates an application snapshot in that cluster, which continues to serve subscribers while the other cluster is modified
- Schema changes could occur in the Oracle source tables
- Complete rebuilds of cache groups could occur
- New applications could be installed
- If source table changes occur in Oracle, full cache table rebuilds must occur so that triggers can be validated
Built-in Redundancy: Ease of Maintenance
- Once the initial maintenance is complete, traffic can be repointed to the updated cluster while the other cluster is modified
Issues Encountered / Fixes
- Cache groups should always be dropped when shutting down sites for extended periods or decommissioning them; if this is not done, the intermediate tables tracking incremental changes in Oracle will grow until the tablespace is full
- Tracking down TimesTen sites that are causing intermediate table growth is relatively simple:
  - Determine the intermediate table tt_03_{number}_l causing DB load
  - select * from tt_03_agent_status where object_id = {same number as table} order by bookmark;
  - The lowest bookmark should be the site with issues
- Reconnecting Cache Connect will cause intermediate table cleanup
- Even minor modifications to the source table schema require a complete cache group rebuild, or else the logs will complain about validity

Conclusion
- TimesTen with Cache Connect to Oracle allows us to meet our strict latency requirements
- Simplifies redundancy
- Provides ease of site-specific data replication
- TimesTen with Cache Connect provides an excellent way to maintain a centralized database with data sources at the edge