An Informix TimeSeries based Telco Data Retention Solution: Lessons Learned




Alexander Koerner, IBM Germany (on behalf of Cedros Gesellschaft für Datenverarbeitung mbH, Germany)

Alexander Celebrating 25 Years of Informix
InfoWorld, Nov 13th, 1989 (https://books.google.de/books?id=staeaaaambaj&lpg=pt79&dq=informix%201989&pg=pt78#v=onepage&q&f=false)

My Informix@Home Project
An Informix TimeSeries, REST API, JSON & Raspberry Pi 2 based Weather Station...

Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
Some material in this presentation is Copyright IBM Corporation 2015. All rights reserved. U.S. Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.

Agenda
- Legal Background
- cedros: An Overview
- The cedros Telco Data Retention Solution
- Informix Time Series for CDRs
- Implementation Specifics
- Lessons Learned
- Summary

Telco Data Retention: Legal Background (Europe)
- On 15 March 2006, the European Union adopted the Data Retention Directive.
- It requires Member States to ensure that communications providers retain the necessary data as specified in the Directive for a period of between 6 months and 2 years in order to:
  - Trace and identify the source of a communication
  - Trace and identify the destination of a communication
  - Identify the date, time, and duration of a communication
  - Identify the type of communication
  - Identify the communication device
  - Identify the location of mobile communication equipment
- The Directive covers fixed telephony, mobile telephony, Internet access, email, and VoIP.
- On 8 April 2014, the Court of Justice of the European Union declared Directive 2006/24/EC invalid for violating fundamental rights.
- The Council's Legal Services have been reported to have stated in closed session that paragraph 59 of the European Court of Justice's ruling "suggests that general and blanket data retention is no longer possible".
Source: Wikipedia

Telco Data Retention: Legal Background (United States)
- The National Security Agency (NSA) commonly records Internet metadata for the whole planet for up to a year in its MARINA database, where it is used for pattern-of-life analysis. U.S. persons are not exempt because metadata are not considered data under US law. Its equivalent for phone records is MAINWAY. The NSA records SMS and similar text messages worldwide through DISHFIRE.
- Various United States agencies leverage the (voluntary) data retention practised by many U.S. commercial organizations through programs such as PRISM and MUSCULAR.
- The United States does not have any mandatory Internet Service Provider (ISP) data retention laws similar to the European Data Retention Directive. All attempts to create mandatory retention legislation have failed.
Source: Wikipedia


cedros: An Overview
- cedros is one of the oldest and biggest IBM Premier Business Partners in Germany and is authorized for resale in all software families.
- cedros develops information and communication technology that creates high value for customers. Efficiency and investment security are of crucial importance in this context.
- Their primary competencies are in four areas:
  - Enterprise Software Solutions
  - Telco Solutions & Services
  - Software Infrastructure & Services
  - IBM Software sales

cedros Telco Solutions: Legal Compliance
- Telecommunications operators and service providers are obligated by law to set up technical systems to monitor their infrastructures and, when asked to do so, to provide information on the stored data or communication content.
- cedros develops solutions that realize both the technical systems and the information process.
- Efficiency, convenience and data protection are of central importance along the way.


Data Retention Requirements According to ETSI TS 102 657 V1.14.1 (2014-03)
- Telephony / Mobile / VoIP
  - Originating (A) and destination (B) number/ID, forwarding (C) number
  - Start/end time, time zone for the call; for VoIP: IP address for A/B/C
  - For mobile: IMEI, IMSI, Cell-ID (for all involved subscribers)
  - Call type/service (Voice, Fax, Data, SMS, MMS, GPRS, LTE etc.)
  - Prepaid: first activation, cell ID
- Internet Access
  - Login name, assigned IP address (Radius/DHCP)
  - Line/port identifier, connection/service type
  - Start/end of the session
- E-Mail Service
  - E-mail in/out: subscriber ID, From/To/CC/BCC, Message-Id, date/time, servers, IP addresses
  - Mailbox access: subscriber ID, server, IP address of the client

cedros Data Retention Solution
- Data Storage Server: c.-drs
  - implements data retention according to legal requirements
  - uses IBM Informix Version 12 for the data store
  - utilizes the Informix Time Series feature
  - provides a Web UI for a simple retrieval process
- Information Process/Workflow: c.-ais
  - connects to all relevant backend systems (DR store, inventory data, ERP, LI system etc.)
  - implements the complete workflow including data processing/formatting, invoicing etc.
  - runs on IBM Lotus Notes or as a Web UI

Sample Infrastructure
[Figure: architecture diagram. Labels: Network element, Logfiles, Inventory Data, Radius, Mediation (CDR), c.-drs (Informix with Time Series: Retention Data), Approval by clerk, Case, Authorities, Case History, ERP System, Invoice]

A Customer Example
- A Swiss telecommunications provider
- Requirements based on Swiss data retention laws
- Implementation close to the European Telecommunications Standards Institute's (ETSI) standards
- CDR processing
  - Input format is CSV via SFTP
  - TSL-loaded into Informix
  - Query definition via Web interface
  - Result sets are generated in XML
- About 60 to 120 million CDRs/day
- Data retention period: 180 days
- Database size: 6 TB
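
As a rough plausibility check of these figures, the average stored size per CDR can be estimated from the database size and the retention period. The sketch below is a back-of-envelope calculation, not a measurement: it assumes the daily volume is a range of 60 to 120 million CDRs, that "TB" means 10**12 bytes, and that the 6 TB cover exactly the 180-day window including all overhead. The function name is made up for illustration.

```python
# Back-of-envelope sizing from the figures on the customer slide.
# Assumptions (not from the source): TB = 10**12 bytes, and the database
# holds exactly one full 180-day retention window.

RETENTION_DAYS = 180
DB_SIZE_BYTES = 6 * 10**12


def bytes_per_cdr(cdrs_per_day: int) -> float:
    """Average stored bytes per CDR implied by the total database size."""
    total_cdrs = cdrs_per_day * RETENTION_DAYS
    return DB_SIZE_BYTES / total_cdrs


if __name__ == "__main__":
    for daily in (60_000_000, 120_000_000):
        print(f"{daily:>11,} CDRs/day -> {bytes_per_cdr(daily):.0f} bytes/CDR")
```

Under these assumptions the stored footprint works out to a few hundred bytes per CDR, including indexes and container overhead.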


Informix Time Series for CDRs 1/2
- Informix provides an optimized storage for the retention data: the subscriber ID forms the primary key, and CDR data is stored as time series
- The Time Series Loader provides an optimized way to load huge amounts of time series data
- Data can be retrieved based on virtual tables (which look like classic SQL tables)
- High scalability, runs on Linux

Informix Time Series for CDRs 2/2
- CDR access requirements are typically based on the primary key(s)
  - "Provide me all communication activities for a given subscriber ID or a set of subscriber IDs, with an optional date range"
- The Informix Time Series rolling window container concept is a natural fit for legal data retention period requirements
- JSON-based time series can be utilized for flexible CDR data structures


Implementation Specifics
- The current cedros c.-drs server is based on Informix 12.10.FC5
  - We ran into some TSL resource issues in FC4W1, which have been addressed in FC5
  - cedros wanted to utilize some of the new FC5 TSL features
- CDR records can be provided in different formats (e.g. XML) and are converted into a suitable TSL format
- The original records are archived due to legal requirements

Structured TimeSeries vs. JSON TS
- Each class of CDRs may have its own elements
  - Telephony CDRs
  - Internet Access CDRs
  - Email CDRs
- One TimeSeries for each CDR class, or one TimeSeries for all CDR classes?
- A JSON TimeSeries provides complete data structure flexibility
  - Slight storage and processing (parsing) overhead
  - Maximum TS JSON element size (12.10.xC5): 4 KB per JSON document

Example for a CDR JSON TimeSeries

    -- Row type for one CDR element: event timestamp plus a BSON document
    create row type cdr_data_t (
        cdr_event   datetime year to fraction(5),
        cdr_element bson
    );

    -- Base table: one time series per subscriber key
    create table cdr_email_ts (
        subscriber_adr varchar(254),
        msgstore_id    varchar(32),
        subscriber_id  varchar(64),
        cdr_recs       timeseries(cdr_data_t),
        primary key (subscriber_adr, msgstore_id, subscriber_id)
    );

    -- Virtual table on top of the time series column
    -- (the calendar cdr_1sec and the container pool cdr_cont_pool must already exist)
    execute procedure TSCreateVirtualTab (
        'cdr_email_v', 'cdr_email_ts',
        'origin(2015-04-24 00:00:00.00000), calendar(cdr_1sec), container(tscontainerpoolroundrobin(cdr_cont_pool)), threshold(0),irregular',
        'putelem,scan_discreet', 'cdr_recs');

    -- Insert one e-mail CDR through the virtual table
    insert into cdr_email_v values (
        'akoerner@de.ibm.com', 'MSGSTORE001', 'Alexander Koerner',
        '2015-04-24 09:30:00'::DATETIME YEAR TO SECOND,
        '{ "seq_number":1, "ts_import":"2014-09-24 11:55:00", "client_id":"test client_id", "other_email_adr":"andreasw@de.ibm.com", "message_id":"message_id_01_02_03", "op_status":"s", "server_id": "server_id 001", "protocol":"i", "srcsrv_name":"sender_server_001", "dstsrv_name":"receiver_server_001", "operation":"t" }'::JSON
    );

Rolling Window Containers
- CDR data retention periods are defined by legal regulations
  - European Union guidance: between 6 and 24 months
  - Italy: ISP data needs to be stored for 12 months and telephony data for 24 months
  - Switzerland: ISP data needs to be retained for 6 months (180 days)
  - Germany (draft): 4 weeks (cell data), 10 weeks (phone numbers, call details, SMS/MMS time stamps for mobile phone telephony/messaging), 10 weeks for Internet ISP data
- Deleting single time series elements after the retention period (e.g. via the DelElem() or the DelRange() TS functions) is not very efficient
- The TSContainerPurge() TS function might be better for bulk deletes
- The Rolling Window Container feature (introduced with Informix 12.10.xC1) provides a fast and elegant way of purging old ("aged") time series data automatically
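
The idea behind a rolling window can be illustrated with a small toy model. This is only a conceptual sketch in Python, not a description of how Informix implements the feature: the class and attribute names are invented, and details such as dormant windows or how expired windows are physically dropped are deliberately left out. It shows why purging becomes cheap: whole windows fall out of the retained set as new ones arrive, so no element-by-element delete is needed.

```python
from collections import deque
from datetime import date, timedelta


class RollingWindows:
    """Toy model: day-sized windows, with a bounded number kept active."""

    def __init__(self, active_windows: int):
        self.active_windows = active_windows
        self.active = deque()   # retained windows, oldest on the left
        self.expired = []       # windows that rolled out, oldest first

    def add_day(self, day: date) -> None:
        """Open a window for `day` and roll out the oldest ones if needed."""
        self.active.append(day)
        while len(self.active) > self.active_windows:
            self.expired.append(self.active.popleft())


if __name__ == "__main__":
    windows = RollingWindows(active_windows=180)
    start = date(2015, 4, 24)
    for offset in range(182):
        windows.add_day(start + timedelta(days=offset))
    print("oldest retained window:", windows.active[0])
    print("rolled-out windows:", windows.expired)
```

With 180 active windows and 182 days of data, the two oldest days are the ones that roll out.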

Rolling Window Container Example

    -- Two rolling window containers with daily windows;
    -- 180 matches the 180-day retention period
    execute procedure TSContainerCreate (
        'cdr_cont1', 'cdr_cont1_dbs1, cdr_cont1_dbs2, cdr_cont1_dbs3',
        'cdr_data_t', 2000000, 500000,
        '2015-04-24 00:00:00.00000'::datetime year to fraction(5),
        'day', 180, 1, NULL, 1, 2048, 512 );

    execute procedure TSContainerCreate (
        'cdr_cont2', 'cdr_cont2_dbs1, cdr_cont2_dbs2, cdr_cont2_dbs3',
        'cdr_data_t', 2000000, 500000,
        '2015-04-24 00:00:00.00000'::datetime year to fraction(5),
        'day', 180, 1, NULL, 1, 2048, 512 );

    -- Put both containers into one pool
    execute procedure TSContainerSetPool ('cdr_cont1', 'cdr_cont_pool');
    execute procedure TSContainerSetPool ('cdr_cont2', 'cdr_cont_pool');

Resource Efficiency
- Carefully choose appropriate page sizes: CDRs come in large amounts and might contain many elements!
  - One can store up to 254 TS elements per logical page
  - Each TS has its own set of pages
  - Example: one CDR is on average 200 bytes, and there are about 10 CDRs per TS per day. 10 x 200 bytes = 2000 bytes, so a 2 KB page size fits
- For JSON CDR time series, consider using short JSON element identifiers
  - Maximum JSON document size in a time series: 4 KB
  - Shorter JSON identifiers mean less disk storage
- If you choose a page size that is too large...
  - ...you will have wasted space on disk
  - ...you might run into buffer pool memory size issues (due to unused memory page space)
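
The page size rule of thumb from this slide can be written down as a short calculation. The sketch below is illustrative only: the list of candidate page sizes is an assumption, the helper names are made up, and page header or slot overhead is ignored, so real sizing should be checked against the server documentation.

```python
# Candidate page sizes in bytes. Assumption for illustration only;
# check which page sizes your platform actually supports.
PAGE_SIZES = (2048, 4096, 8192, 16384)


def daily_bytes_per_ts(avg_cdr_bytes: int, cdrs_per_ts_per_day: int) -> int:
    """Payload one time series accumulates per day."""
    return avg_cdr_bytes * cdrs_per_ts_per_day


def smallest_fitting_page(payload_bytes: int) -> int:
    """Smallest candidate page that holds the payload (overhead ignored)."""
    for size in PAGE_SIZES:
        if payload_bytes <= size:
            return size
    return PAGE_SIZES[-1]  # payload spans several pages of the largest size


if __name__ == "__main__":
    payload = daily_bytes_per_ts(avg_cdr_bytes=200, cdrs_per_ts_per_day=10)
    print(payload, "bytes/day ->", smallest_fitting_page(payload), "byte pages")
```

Picking the smallest page that fits the typical daily payload avoids the wasted disk and buffer pool space described above.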

CDR Load Performance
- The VTI interface is suitable for small batches of CDRs and/or continuous loading of single records
- Use the TS Loader API for large amounts of CDR data
- Follow the usual recommendations for disk I/O optimization (e.g. separate the I/O of logical log chunks from that of dbspace chunks)
- Fragment the TS base table and distribute the TS across multiple containers
- Consider using parallel TSL sessions
- Utilize the new reduced TSL message logging capabilities in 12.10.xC5
- To simplify the loading process, you might want to use the new 12.10.xC5 TSL_SetNewTS() function to automatically create a new TS in the base table if that TS doesn't exist

CDR TSL Example

    -- Initialize a loader session for column cdr_recs of table cdr_email_ts
    execute function TSL_Init('cdr_email_ts','cdr_recs',3,4,NULL,'%Y-%m-%d %H:%M:%S','/tmp/cdr_rejected.log',NULL);

    -- The loader handle is the table name and column name joined by a pipe
    execute function TSL_SetLogMode('cdr_email_ts|cdr_recs',1,2,'/tmp/cdr_error.log');

    -- Automatically create a missing time series in the base table
    execute function TSL_SetNewTS('cdr_email_ts|cdr_recs',
        'origin(2015-04-24 00:00:00.00000), calendar(cdr_1sec), TSContainerPoolRoundRobin(cdr_cont_pool), threshold(0), irregular', 1);

    -- Load the records from a file, then flush them inside a transaction
    execute function TSL_Put('cdr_email_ts|cdr_recs',filetoclob('/tmp/test.mail','server'));
    begin work;
    execute function TSL_FlushAll('cdr_email_ts|cdr_recs');
    commit work;

    -- Close the session and shut down the loader
    execute function TSL_SessionClose('cdr_email_ts|cdr_recs');
    execute procedure TSL_Shutdown('cdr_email_ts|cdr_recs');


Lessons Learned 1/2
- High data volume and complexity of data require a careful design of the data model
  - Choose the optimal primary key(s) to avoid time series that are too small
  - Carefully choose the optimal page size and disk location for time series containers
  - Consider using rolling window containers for legal requirements and simplified data housekeeping (time series data deletion)
- Use the latest Informix TS loader features
  - Informix 12.10.xC5 provides some very helpful new TSL capabilities and better resource usage
  - Automatic creation of missing time series
  - Support for JSON time series
  - Load time series data from an external file

Lessons Learned 2/2
- Data volume can vary extremely, especially events created by the e-mail service (mobile IMAP!)
- Telcos don't know their data structures (source, volume, quality) in detail
- Be aware of different numbering plans (0, 0049, 49) and data quality problems
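
Because the subscriber number is typically part of the primary key, the same subscriber written as 0..., 0049... or 49... would end up in different time series unless the numbers are normalized before loading. The following is a minimal sketch of such a normalization step, assuming a German default country code of 49 and a simple prefix rule set; the function name is hypothetical and real numbering plans need more careful handling.

```python
def normalize_number(raw: str, country_code: str = "49") -> str:
    """Reduce common notations of one number to digits with country code.

    Assumed rules: a leading '+' or '00' marks an international number,
    a single leading '0' marks a national number. Anything else is taken
    to carry the country code already, which may be wrong for dirty data.
    """
    text = raw.strip()
    digits = "".join(ch for ch in text if ch.isdigit())
    if text.startswith("+"):
        return digits
    if digits.startswith("00"):
        return digits[2:]
    if digits.startswith("0"):
        return country_code + digits[1:]
    return digits


if __name__ == "__main__":
    samples = ["0171 1234567", "0049 171 1234567",
               "49 171 1234567", "+49 171 1234567"]
    for sample in samples:
        print(f"{sample!r:>20} -> {normalize_number(sample)}")
```

All four sample notations map to the same key, so their CDRs would be stored in one time series.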

Questions?

Alexander Koerner, Twitter: @AlexKoeMUC