Data Challenges of Telcos and Service Providers
October 2014

HawkEye AP Delivers Impressive Results
Proven Deployment

In a recent deployment at a mobile Service Provider, Hexis delivered impressive results:

- CDR average streamloading frequency: 12 minutes
- Loading CDRs in online mode (concurrent with queries) showed only a 10% impact on system performance
- 150 concurrent users were able to run call detail queries against three billion records with an average response time of 1.6 seconds
- 15,000 call detail queries were performed in only 70 minutes

Understanding the Ins and Outs of CDR Investigations

Increasingly, Telcos and Service Providers are at the center of investigations ranging from fraud, cyberterrorism and online bullying to international drug trafficking and child pornography. This is because many crimes revolve around Internet and mobile device activity. Law enforcement officials and government agencies realize that clues to past or future crimes can be uncovered within ISP transactional logs. It was this realization that drove the European Parliament to approve rules forcing telephone companies to retain Call Detail Records (CDRs) and Internet Protocol Detail Records (IPDRs) for use in anti-terrorism investigations. The specific data retention requirements are mandated in Directive 2006/24/EC to provide for National Security and Serious Crime, and require the retention of data generated or processed in connection with the provision of publicly available electronic communications services or of public communications networks.
The Directive requires Member States to ensure that communications providers retain, for a period of between six months and two years, necessary data as specified in the Directive, in order to:

- trace and identify the source of a communication
- trace and identify the destination of a communication
- identify the date, time and duration of a communication
- identify the type of communication
- identify the communication device
- identify the location of mobile communication equipment

The Directive also specifies collection of data from ALL telecom services (fixed, mobile and Internet) as well as all Internet access, including email access. The data is required to be available to competent national authorities in specific cases, for the purpose of the investigation, detection and prosecution of serious crime, as defined by each Member State in its national law.

Similarly, a new bill was introduced in the U.S. House of Representatives that, if passed, will require Internet Telco or Service Providers to retain subscriber information for up to 18 months to assist federal law enforcement in investigations into online child pornography and child exploitation cases.

These broad mandates force Telco or Service Providers to maintain a view of data generated across the various systems and platforms that support their operations: network security systems (including Internet gateways and firewalls), endpoint and mobile device management solutions, and billing and customer relationship management systems. This leads to complex data collection, retention and analysis requirements that are just beginning to emerge.

Understanding the Big Data Problem

In order to size the challenge, let's first start with the mobile device services that have to be tracked: Voice, SMS (text), MMS, Internet and Internet telephony.
The following are average mobile per-subscriber statistics to give a sense of the scale:

- Average voice calls: 300 per month
- Average SMS (text): 300 per month
- Average mobile Internet data requests: 150 per month

For an operator with 20 million subscribers, just these three categories of traffic will generate over one billion detailed records daily which must be made available for queries. The need to retain this volume of data for two years, much less perform detailed traffic analysis at the request of authorized parties, is a significant challenge. The complexity is compounded when the requirement to correlate those records against billing information is added. This is driven by the need to answer questions like:

- Who owns the device being used?
- Does the billing address correspond with where the device activity is originating?
- Is there an unusual amount of activity, as compared to previous billing cycles?
- Based on call logs, are there sudden variances in voice versus text usage?

In some cases, operators have tried to solve these challenges by deploying systems to process the CDRs from Billing Mediation Servers (which handle CDRs for completed calls). The weakness in this approach is that a significant portion of the criminal behavior of interest involves calls that are not completed, or may only be placed once. Examples include calls used as signals to other parties (ring twice, then hang up) or as triggers for explosive devices. In order to detect these and other non-standard suspicious events, operators must have the ability to process CDRs and IPDRs generated by all nodes in the operator's network, including the Base Station Controllers (BSCs). This covers all placed calls, including unanswered calls that went to voicemail as well as those that were terminated before going to voicemail.
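As a rough check on the sizing above, the per-subscriber averages can be multiplied out directly. The sketch below uses only the figures quoted in the text plus one assumption, a 30-day month. It gives about 500 million subscriber-level events per day; the quoted figure of over one billion records daily is about twice that. One possible reading, which is an assumption here and not something the text states, is that each event is logged by more than one network node.

```python
# Per-subscriber monthly averages quoted in the text.
MONTHLY_EVENTS_PER_SUBSCRIBER = {
    "voice_calls": 300,
    "sms": 300,
    "mobile_data_requests": 150,
}
SUBSCRIBERS = 20_000_000  # operator size used in the text
DAYS_PER_MONTH = 30       # assumption: a 30-day month


def daily_events(subscribers, monthly_per_subscriber, days=DAYS_PER_MONTH):
    """Subscriber-level events per day across all listed traffic categories."""
    return subscribers * sum(monthly_per_subscriber.values()) / days


def records_per_event_needed(target_daily_records, events_per_day):
    """How many records each event would have to produce to reach a daily target."""
    return target_daily_records / events_per_day


events = daily_events(SUBSCRIBERS, MONTHLY_EVENTS_PER_SUBSCRIBER)
multiplier = records_per_event_needed(1_000_000_000, events)

print(f"Events per day: {events:,.0f}")
print(f"Records per event needed for one billion records daily: {multiplier:.1f}")
```

Whatever the exact multiplier, two years of retention at these rates means several hundred billion records that must remain queryable.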
Once the right set of data is collected, there is the issue of analyzing it coherently and quickly. Investigations often require teams to run complex, ad-hoc queries in order to discover new patterns of usage or behaviors. Therefore, it is not a realistic option to make data accessible only to certain experts in the organization or to provide analysts with only basic, pre-defined reports.

Hexis Capabilities

Hexis provides a scalable, high performance event data management and analysis platform that makes compliance with the EU Data Retention Directive and other Communications Data Record management requirements feasible and cost effective.

Requirements to Address These Challenges

There are four main stages to consider in the security event lifecycle:

- Collect
- Retain
- Analyze
- Dispose

Structured and Raw Data

Flexibility in storage capabilities is a key benefit to Hexis customers. In many cases, Telco and Service Providers need to store both structured and raw data for different business purposes. This is an example of one Hexis customer's data storage in a single event data warehouse:

- 2987 million CDRs collected per day
- Average size of structured CDR data: 500 bytes
- Average size of original/raw CDR data: 200 bytes
- Total of 851 GB loaded each day

Hexis for Telco or Service Providers

Today, Hexis is deployed at mobile, fixed-line and cable Telco and Service Provider customers in the United States, Europe, Africa and Asia. The Hexis solution is purpose-built for this market and selected because of its unique ability to:

- handle high volume and load rates of data
- store the data in a highly compressed format
- retain large volumes of raw data indefinitely (e.g. ten years or more, if needed)
- perform complex analysis, utilizing Sparse Query Optimization

The Hexis Data Management Platform was built from the ground up to store extremely large volumes of structured and unstructured data (multiple terabytes to petabytes) while providing the ability to run queries across the entire database.
These basic characteristics of the system make it possible to ingest data from any source and to execute the business logic necessary to tackle the biggest challenges in the security domain. Hexis delivers the following benefits:

- Diverse Collection: Captures and centrally aggregates all event records from all relevant sources. This includes telephony, email messaging, web traffic and custom applications, in both ASCII (processed) and binary (unprocessed) formats.
- Efficient Management: Parses and stores event data in a highly compressed format to reduce storage requirements. Utilization of intelligent/active archive platforms also reduces storage management overhead.
- High-speed, Online Analysis: Rapid, pinpoint search through terabytes of data, correlating across event source types.
- Scalable Performance: Exceptional data load and query performance that can be easily expanded, built upon a patented columnar database structure.
- Open Access to Stored Data: Allows direct access from standard Business Intelligence tools via a standard ODBC/JDBC interface.
Integration with Third Party Applications

While Hexis provides a fully functional management and retrieval system for Communications Data, many Telcos and Service Providers have existing workflow systems or enhanced requirements for local Law Enforcement Agencies (LEAs). In these circumstances, Hexis provides tight integration with external applications via either a Perl DBI interface or standard ODBC/JDBC calls. Examples of such systems are Web Portals for direct LEA access or ETSI compliant applications.

The diagram below provides a high-level overview of a standard Hexis deployment within the Telco or Service Provider's network:

[Diagram: standard Hexis deployment within the Telco or Service Provider's network]

Additional Telco or Service Provider Use Cases

In addition to addressing the law enforcement/compliance aspect of Telco or Service Provider data, there are other data management cases Hexis can address. Below are a few of the advanced use cases driven by proactive operators:

1. Consolidation of CDRs into an event operational data store
2. Fraud/SPAM/Phishing detection
3. Interconnect, roaming/call termination causes
4. Billing audits/revenue assurance
1. Consolidation of CDRs into an Event Operational Data Store

A key driver for Telco Operators is managing the cost of data storage and access within an environment that is growing significantly. Subscriber events, source data from upstream sources (i.e. from the network), and provisioning and billing information create a staggering amount of data that must be collected, managed, stored, analyzed and disposed of. Today, this data is mediated by the Mediation system, which applies business logic to create subsets that are consumed by various downstream applications. These applications then create separate databases of CDRs and related data that grow independently of each other and contain multiple copies of elements of the source data, which can get out of sync. As data volumes grow, the utilization of Mediation for business logic processing increases, along with the volume of downstream databases.

Implementation of a Hexis Operational Data Store (ODS) allows Telco operators to create a centralized repository of static CDR and other source data which can be accessed by the downstream applications, thereby reducing the workload on the Mediation platform in terms of data manipulation and business logic. Benefits include:

- Reduced/consolidated database licensing costs
- A single source of verified CDR and other information that is shared between applications, thereby ensuring consistency of data between downstream processes
- Ability to scale in a controlled manner as the subscriber base grows

2. Fraud/SPAM/Phishing Detection

Mobile operators and their subscribers are susceptible to various fraud schemes. With the ability to look across large volumes of data and over extended periods of time, operators are able to detect a number of fraudulent patterns and take corrective actions to minimize liabilities and protect consumers:

- Fraudulent Use: There are a number of schemes that allow fraudulent users to run up a large unpaid bill on a post-paid plan. Looking at all calls for telltale calling patterns (e.g. roaming calls placed from regions where fraud has been detected, or calls to fraud-associated numbers) can provide an early warning that fraudulent behavior is underway. The account can be placed on hold and the subscriber contacted to determine whether the calling is legitimate.
- Unrequested Text Messages/Phishing Attacks: With the ability to detect abnormal SMS patterns, operators are able to identify suspect (and legitimate) senders and blacklist or whitelist them as appropriate, direct the throttling of traffic from suspect senders until an analysis is complete, and inspect the contents for phishing text keywords.
- SPAM/Virus/Trojan Reductions: Similarly, detecting abnormal patterns in IP traffic and suspect originating servers can help reduce the spread of viruses and Trojans. As an approach that complements real-time detection and throttling systems (like policy charging and enforcement nodes), Hexis can look back at systems affected by various attacks to detect nodes that need better protection or spare capacity to carry legitimate traffic during new attacks.
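The fraudulent-use check described in this section (roaming calls placed from regions where fraud has been detected, or calls to fraud-associated numbers) can be illustrated with a minimal sketch. This is not the Hexis implementation: the `Cdr` record, its field names and the `flag_suspect_subscribers` helper are all hypothetical, and a real deployment would run the equivalent logic as queries over the stored CDRs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cdr:
    """Hypothetical, simplified call detail record (field names are illustrative)."""
    subscriber_id: str
    called_number: str
    origin_region: str
    is_roaming: bool


def flag_suspect_subscribers(cdrs, fraud_regions, fraud_numbers):
    """Return {subscriber_id: [reasons]} for calls matching either telltale pattern."""
    flagged = {}
    for cdr in cdrs:
        reasons = []
        # Pattern 1: roaming call placed from a region where fraud has been detected.
        if cdr.is_roaming and cdr.origin_region in fraud_regions:
            reasons.append(f"roaming call from {cdr.origin_region}")
        # Pattern 2: call to a number already associated with fraud.
        if cdr.called_number in fraud_numbers:
            reasons.append(f"call to {cdr.called_number}")
        if reasons:
            flagged.setdefault(cdr.subscriber_id, []).extend(reasons)
    return flagged


# Made-up sample data for illustration only.
sample = [
    Cdr("sub-1", "555-0100", "region-A", True),
    Cdr("sub-2", "555-0199", "home", False),
    Cdr("sub-3", "555-0123", "home", False),
]
suspects = flag_suspect_subscribers(sample, {"region-A"}, {"555-0199"})
for subscriber, reasons in suspects.items():
    print(subscriber, reasons)
```

A flagged subscriber is only a candidate for review; as the text notes, the account can be placed on hold and the subscriber contacted before any conclusion is drawn.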
3. Interconnect, Roaming/Call Termination Causes

When a subscriber calls a number not on the carrier's network, the carrier must pay an interconnect fee to the callee's Telco or Service Provider to complete the call. Similarly, if a carrier's subscriber places a call when roaming (not on the primary carrier's network), the primary carrier's roaming partner charges a hefty fee to handle the call. The ability of Hexis to efficiently store and query all CDR data allows queries that identify:

- On/Off Network Calling Patterns: Understanding when off-network calls are placed, in which regions of the network, at what time of day, by which customer segment, for what length of call, etc. can allow the carrier to offer tariff plans that are designed to minimize cost and to negotiate better terms with roaming/interconnect partners. The end result is improved profitability and plans tailored to the needs of subscribers (increasing retention and reducing acquisition costs).
- Call Success/Termination Causes: Tracking call termination causes (e.g. coverage, saturated network, neighbor list errors, etc.) will help RF Engineering teams make tuning adjustments or plan for additional capacity/coverage to improve network performance statistics.
- Roaming Analysis: Analysis of roaming calling patterns (for their own subscribers and for roaming subscribers on their network) will permit operators to negotiate optimal roaming terms with partners. When this information is combined with call termination data and RF data from Radio Access Network nodes, operators can determine areas where a lack of coverage is increasing roaming costs, or even causing the loss of customers to competitors (users in low coverage areas are likely to switch to an operator with better coverage at the end of the contract term).

4. Billing Audits/Revenue Assurance

For many years, Telco and Service Providers (mobile and fixed) have been leaking revenue due to errors in the billing processing chain.
The size of the problem is significant, with average revenue leakage in the range of 1-12% of gross revenues, and in some cases up to 20% on some data services. Based on AT&T's FY2011 financials, even at 1%, this could be a $1.2B problem for them. Missing or incomplete CDRs generated by various service nodes in the network can cause billing errors. Detection of these errors is possible by reviewing events logged from all the nodes in the call path and in the billing cycle. These include the Serving or Gateway GPRS Support Nodes (SGSNs, GGSNs), Base Station Controllers (BSCs), Mobile Switching, Multimedia or Short Message Service Centers (MSCs, MMSCs, SMSCs), and mediation and billing servers. By looking for records generated by upstream nodes that do not map to complete/valid CDRs, Hexis is able to identify unbilled service delivery.

Greater Data Management, Lower Cost

In addition to addressing the critical requirements for these use cases, Hexis provides the added value of reducing storage costs. The following charts show the monthly CDR data load of a typical Hexis deployment:
[Chart legend: raw byte size (in TB); online storage used in Hexis; nearline storage used in Hexis]

Rapid Analysis over Large Data Sets

At another Hexis Telco customer, a Service Level Agreement is in place around query response time across massive volumes of data:

- Analysts must complete 524 queries in ten hours against 60 days of data
- Average processing time per request: 344 seconds
- CDR average load per day: 150 million
- IPDR average load per day: 95 million

Based on the Hexis event data warehouse technologies, including patented columnar storage and powerful compression, this is an example of a customer's storage savings:

Total number of loaded records in 24 months: 471,419,711,147
Total source size: 357 TB
Total storage used in Hexis deployment: 43 TB
Storage saved by using Hexis: 314 TB

With 314 TB of disk space saved, and using a figure of six to ten thousand dollars per TB in current SAN storage/operational cost, the return on investment is evident: the stored data occupies roughly 12% of the source size, or about one-eighth of the data being loaded.
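The storage figures above can be worked through in a few lines. The sketch below uses only numbers quoted in the text (357 TB of source data, 43 TB stored, six to ten thousand dollars per TB); the function and variable names are illustrative.

```python
SOURCE_TB = 357                    # total source size over 24 months
STORED_TB = 43                     # total storage used in the Hexis deployment
COST_PER_TB_USD = (6_000, 10_000)  # SAN storage/operational cost range from the text


def storage_summary(source_tb, stored_tb, cost_range):
    """Derive saved capacity, stored fraction, reduction factor and dollar savings."""
    saved_tb = source_tb - stored_tb
    low, high = cost_range
    return {
        "saved_tb": saved_tb,
        "stored_fraction": stored_tb / source_tb,
        "reduction_factor": source_tb / stored_tb,
        "savings_usd": (saved_tb * low, saved_tb * high),
    }


summary = storage_summary(SOURCE_TB, STORED_TB, COST_PER_TB_USD)
print(f"Saved: {summary['saved_tb']} TB")
print(f"Stored fraction of source: {summary['stored_fraction']:.1%}")
print(f"Reduction factor: {summary['reduction_factor']:.1f}x")
print(f"Savings: ${summary['savings_usd'][0]:,} to ${summary['savings_usd'][1]:,}")
```

On these figures the stored data is about 12% of the source size, a reduction of roughly 8.3 to 1, and the avoided storage cost falls between about $1.9 million and $3.1 million.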
Summary

Hexis provides a scalable, high performance event data management and analysis platform that makes the management and utilization of large volumes of CDR and IPDR data over time feasible and cost effective. Hexis delivers the following benefits:

- Diverse Collection: Captures and centrally aggregates all event records from all relevant sources, including telephony, email messaging, web traffic and custom applications
- Efficient Management: Parses and stores event data in a highly compressed format to reduce storage requirements
- High-speed, Online Analysis: Rapid, pinpoint search through terabytes of data, correlating across event source types
- Scalable Performance: Exceptional data load and query performance that can be easily expanded
- Integration with Nearline Archive Platforms: Considerably lowers the costs of managing operational network data

Hexis Cyber Solutions, Inc.
a wholly-owned subsidiary of The KEYW Holding Corporation
7740 Milestone Parkway, Suite 400
Hanover, MD 21076
info@hexiscyber.com
443.733.1900

About Hexis Cyber Solutions

Hexis Cyber Solutions, Inc., a wholly-owned subsidiary of The KEYW Holding Corporation (NASDAQ: KEYW) based in Hanover, Maryland, provides complete cybersecurity solutions for commercial companies and government agencies. Our mission is to ensure that business IT infrastructure is equipped with tools and capabilities to detect, engage, and remove both external and internal cyber threats. Cyber terrorists, organized crime, and foreign governments focus tremendous effort on commercial, government, and military interests as their prime targets. Hexis Cyber Solutions' HawkEye family of products offers active, multidisciplined approaches to achieve a higher standard of cybersecurity that is based on our expertise supporting advanced cybersecurity missions within the US, ensuring your business or organization operates at its maximum potential.
For more information contact Hexis Cyber Solutions, 7740 Milestone Parkway, Suite 400, Hanover, Maryland 21076; Phone 443-733-1900; Fax 443-733-1901; E-mail info@hexiscyber.com; or on the Web at www.hexiscyber.com.

Copyright 2013 Hexis Cyber Solutions, Inc. All rights reserved. Hexis Cyber Solutions and HawkEye are protected by U.S. and international copyright and intellectual property laws. Hexis Cyber Solutions and HawkEye are registered trademarks or trademarks of Hexis Cyber Solutions Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.

Rev. Oct. 22, 2014