An Informix TimeSeries based Telco Data Retention Solution: Lessons Learned
Alexander Koerner, IBM Germany
(On behalf of Cedros Gesellschaft für Datenverarbeitung mbH, Germany)
Alexander: Celebrating 25 Years of Informix
InfoWorld, Nov 13th, 1989 (https://books.google.de/books?id=staeaaaambaj&lpg=pt79&dq=informix%201989&pg=pt78#v=onepage&q&f=false)
My Informix@Home Project
An Informix TimeSeries, REST API, JSON & Raspberry Pi 2 based Weather Station...
Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
Some material in this presentation is Copyright IBM Corporation 2015. All rights reserved. U.S. Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Agenda
- Legal Background
- cedros: An Overview
- The cedros Telco Data Retention Solution
- Informix Time Series for CDRs
- Implementation Specifics
- Lessons Learned
- Summary
Telco Data Retention: Legal Background Europe
- On 15 March 2006, the European Union adopted the Data Retention Directive.
- It requires Member States to ensure that communications providers retain the necessary data as specified in the Directive for a period of between 6 months and 2 years in order to:
  - Trace and identify the source of a communication
  - Trace and identify the destination of a communication
  - Identify the date, time, and duration of a communication
  - Identify the type of communication
  - Identify the communication device
  - Identify the location of mobile communication equipment
- The Directive covers fixed telephony, mobile telephony, Internet access, email, and VoIP.
- On 8 April 2014, the Court of Justice of the European Union declared Directive 2006/24/EC invalid for violating fundamental rights.
- The Council's Legal Services have been reported to have stated in closed session that paragraph 59 of the European Court of Justice's ruling "suggests that general and blanket data retention is no longer possible".
Source: Wikipedia
Telco Data Retention: Legal Background United States
- The National Security Agency (NSA) commonly records Internet metadata for the whole planet for up to a year in its MARINA database, where it is used for pattern-of-life analysis. U.S. persons are not exempt because metadata are not considered data under US law. Its equivalent for phone records is MAINWAY. The NSA records SMS and similar text messages worldwide through DISHFIRE.
- Various United States agencies leverage the (voluntary) data retention practised by many U.S. commercial organizations through programs such as PRISM and MUSCULAR.
- The United States does not have any Internet Service Provider (ISP) mandatory data retention laws similar to the European Data Retention Directive. All attempts to create mandatory retention legislation have failed.
Source: Wikipedia
cedros: An Overview
- cedros is one of the oldest and biggest IBM Premier Business Partners in Germany and is authorized for resale in all software families
- cedros develops information and communication technology that is characterized by high creation of value for customers. Efficiency and investment security are of crucial importance in this context
- Their primary competencies are in four areas:
  - Enterprise Software Solutions
  - Telco Solutions & Services
  - Software Infrastructure & Services
  - IBM Software sales
cedros Telco Solutions: Legal Compliance
- Telecommunications operators or service providers are obligated by law to set up technical systems in order to monitor their infrastructures and, when asked to do so, to provide information on the stored data or communication content
- cedros develops solutions that realize both the technical systems and the information process
- Efficiency, convenience and data protection are of central importance along the way
Data Retention Requirements According to ETSI TS 102 657 V1.14.1 (2014-03)
Telephony / Mobile / VoIP
- Originating (A) and destination (B) number/id, forwarding (C) number
- Start/end time, time zone for the call; for VoIP: IP address for A/B/C
- For mobile: IMEI, IMSI, Cell-ID (for all involved subscribers)
- Call type/service (Voice, Fax, Data, SMS, MMS, GPRS, LTE etc.)
- Prepaid: first activation, cell id
Internet Access
- Login name, assigned IP address (Radius/DHCP)
- Line/port identifier, connection/service type
- Start/end of the session
E-Mail Service
- E-mail in/out: Subscriber Id, From/To/CC/BCC, Message-Id, Date/Time, servers, IP addresses
- Mailbox access: Subscriber Id, server, IP address of the client
cedros Data Retention Solution
Data Storage Server: c.-drs
- implements data retention according to legal requirements
- uses IBM Informix Version 12 for the data store
- utilizes the Informix Time Series feature
- provides a Web UI for a simple retrieval process
Information Process/Workflow: c.-ais
- connects to all relevant backend systems (DR store, inventory data, ERP, LI system etc.)
- implements the complete workflow including data processing/formatting, invoicing etc.
- runs on IBM Lotus Notes or as a Web UI
Sample Infrastructure
[Diagram. Labels: Network element, Logfiles, Inventory Data, Approval by clerk, Radius, Mediation (CDR), c.-drs, Informix with Time Series: Retention Data, Case, Authorities, Case History, ERP System, Invoice]
A Customer Example
- A Swiss telecommunications provider
- Requirements based on Swiss data retention laws
- Implementation close to the European Telecommunications Standards Institute's (ETSI) standards
- CDR processing
  - Input format is CSV via SFTP
  - TSL-loaded into Informix
  - Query definition via Web interface
  - Result sets are generated in XML
- About 60 to 120 million CDRs/day
- Data retention period: 180 days
- Database size: 6 TB
Informix Time Series for CDRs 1/2
- Informix provides an optimized storage for the retention data: the Subscriber Id builds the primary key, CDR data is stored as time series
- The Time Series Loader provides an optimized way to load huge amounts of time series data
- Data can be retrieved based on virtual tables (which look like classic SQL tables)
- High scalability, runs on Linux
Informix Time Series for CDRs 2/2
- CDR access requirements are typically based on the primary key(s)
  - "Provide me all communication activities for a given subscriber id or a set of subscriber ids with an optional date range"
- The Informix Time Series rolling window container concept is a very good fit for legal data retention period requirements
- JSON based Time Series can be utilized for flexible CDR data structures
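The access pattern above (one or more subscriber ids, optional date range) maps naturally onto a parameterized query against a virtual table. The following is a minimal Python sketch, not the c.-drs implementation: the table and column names are taken from the cdr_email_v example in this deck, while the function name, the half-open date range and the "?" placeholder style are assumptions that depend on the database driver in use.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple


def build_cdr_query(
    subscriber_ids: Sequence[str],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    table: str = "cdr_email_v",  # virtual table name from the deck's example
) -> Tuple[str, list]:
    """Return (sql, params) for a lookup by subscriber id(s) and optional date range."""
    if not subscriber_ids:
        raise ValueError("at least one subscriber id is required")
    # One placeholder per subscriber id; values are never pasted into the SQL text.
    placeholders = ", ".join("?" for _ in subscriber_ids)
    sql = (
        f"SELECT subscriber_id, cdr_event, cdr_element FROM {table} "
        f"WHERE subscriber_id IN ({placeholders})"
    )
    params: list = list(subscriber_ids)
    if start is not None:
        sql += " AND cdr_event >= ?"
        params.append(start)
    if end is not None:
        sql += " AND cdr_event < ?"  # assumption: half-open range [start, end)
        params.append(end)
    return sql + " ORDER BY subscriber_id, cdr_event", params
```

Keeping the subscriber id as the leading predicate mirrors the point above: lookups stay on the primary key, and the date range only narrows the elements read from each time series.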
Implementation Specifics
- The current cedros c.-drs server is based on Informix 12.10.FC5
  - We ran into some TSL resource issues in FC4W1 which have been addressed in FC5
  - cedros wanted to utilize some of the new FC5 TSL features
- CDR records can be provided in different formats (e.g. XML) and are converted into a suitable TSL format
- The original records are archived due to legal requirements
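As an illustration of the conversion step, here is a minimal Python sketch that turns one XML CDR into a delimited load line. Everything about the layout is an assumption: the XML element names are hypothetical, the key columns follow the cdr_email_ts example in this deck, and the pipe-delimited "keys, timestamp, JSON document" line format should be checked against the TSL_Put documentation for the Informix version in use.

```python
import json
import xml.etree.ElementTree as ET

# Key columns of the example base table in this deck (cdr_email_ts).
KEY_FIELDS = ("subscriber_adr", "msgstore_id", "subscriber_id")


def xml_cdr_to_line(xml_text: str, delimiter: str = "|") -> str:
    """Convert one flat XML CDR into 'key|key|key|timestamp|json|'.

    Assumes a hypothetical input layout: one child element per field,
    including the key fields and a 'cdr_event' timestamp element.
    """
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    keys = [fields.pop(name) for name in KEY_FIELDS]   # KeyError if a key is missing
    timestamp = fields.pop("cdr_event")
    # Remaining fields become the JSON document; compact separators save space.
    document = json.dumps(fields, separators=(",", ":"), sort_keys=True)
    if delimiter in document:
        raise ValueError("field value contains the delimiter; escape or reject the record")
    return delimiter.join(keys + [timestamp, document]) + delimiter
```

Records that fail conversion would be written to a reject file in practice, while the untouched original record goes to the archive.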
Structured TimeSeries vs. JSON TS
- Each class of CDRs might have its own elements
  - Telephony CDRs
  - Internet Access CDRs
  - Email CDRs
- One TimeSeries for each CDR class or one TimeSeries for all CDR classes?
- A JSON TimeSeries provides complete data structure flexibility
  - Slight storage and processing (parsing) overhead
  - Maximum TS JSON element size (12.10.xC5): 4 KB per JSON document
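Because of the 4 KB limit per JSON document, it can be useful to check document sizes before loading. The Python sketch below measures the compact UTF-8 JSON text; this is only an approximation, since the server stores the document as BSON and its size can differ. The function names are hypothetical.

```python
import json

MAX_JSON_BYTES = 4 * 1024  # limit quoted above for 12.10.xC5


def json_size_bytes(document: dict) -> int:
    """Size of the compact JSON text in bytes (approximation of the stored size)."""
    text = json.dumps(document, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def fits_in_timeseries(document: dict, limit: int = MAX_JSON_BYTES) -> bool:
    """True if the document is within the assumed size limit."""
    return json_size_bytes(document) <= limit
```

A pre-check like this lets oversized CDRs be rejected or split during conversion instead of failing during the load.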
Example for a CDR JSON TimeSeries
create row type cdr_data_t (
    cdr_event   datetime year to fraction(5),
    cdr_element bson
);
create table cdr_email_ts (
    subscriber_adr varchar(254),
    msgstore_id    varchar(32),
    subscriber_id  varchar(64),
    cdr_recs       timeseries(cdr_data_t),
    primary key (subscriber_adr, msgstore_id, subscriber_id)
);
execute procedure TSCreateVirtualTab (
    'cdr_email_v', 'cdr_email_ts',
    'origin(2015-04-24 00:00:00.00000), calendar(cdr_1sec), container(tscontainerpoolroundrobin(cdr_cont_pool)), threshold(0),irregular',
    'putelem,scan_discreet', 'cdr_recs');
insert into cdr_email_v values (
    'akoerner@de.ibm.com', 'MSGSTORE001', 'Alexander Koerner',
    '2015-04-24 09:30:00'::DATETIME YEAR TO SECOND,
    '{ "seq_number":1, "ts_import":"2014-09-24 11:55:00", "client_id":"test client_id", "other_email_adr":"andreasw@de.ibm.com", "message_id":"message_id_01_02_03", "op_status":"s", "server_id": "server_id 001", "protocol":"i", "srcsrv_name":"sender_server_001", "dstsrv_name":"receiver_server_001", "operation":"t" }'::JSON
);
Rolling Window Containers
- CDR data retention periods are defined by legal regulations
  - European Union guidance: between 6 and 24 months
  - Italy: ISP data needs to be stored for 12 months and telephony data for 24 months
  - Switzerland: ISP data needs to be retained for 6 months (180 days)
  - Germany (draft): 4 weeks (cell data), 10 weeks (phone numbers, call details, SMS/MMS time stamps for mobile phone telephony/messaging), 10 weeks for Internet ISP data
- Deleting single time series elements after the retention period is not very efficient (e.g. via the DelElem() or the DelRange() TS functions)
- The TSContainerPurge() TS function might be better for bulk deletes
- The Rolling Window Container feature (introduced with Informix 12.10.xC1) provides a fast and elegant way of purging old ("aged") time series data automatically
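To make the idea concrete, the following Python sketch models the window life cycle in a simplified way: new windows are opened per interval, the oldest active window becomes dormant once the active limit is exceeded, and the oldest dormant window is dropped once the dormant limit is exceeded. This is an illustrative model only, not the Informix implementation; the class and attribute names are hypothetical, and what the server actually does with dormant windows depends on the container's window control settings.

```python
from collections import deque
from datetime import date, timedelta


class RollingWindows:
    """Simplified model of a rolling window container with daily windows."""

    def __init__(self, active_size: int, dormant_size: int):
        self.active_size = active_size
        self.dormant_size = dormant_size
        self.active = deque()    # windows that can still be queried and written
        self.dormant = deque()   # aged windows kept before destruction
        self.destroyed = []      # windows whose data is gone

    def open_window(self, day: date) -> None:
        """Open the window for a new day and age out older windows."""
        self.active.append(day)
        while len(self.active) > self.active_size:
            self.dormant.append(self.active.popleft())
        while len(self.dormant) > self.dormant_size:
            self.destroyed.append(self.dormant.popleft())
```

With 180 active daily windows and 1 dormant window, the data of a whole day is dropped in one step, which is why this is cheaper than deleting individual elements.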
Rolling Window Container Example
execute procedure TSContainerCreate (
    'cdr_cont1',
    'cdr_cont1_dbs1, cdr_cont1_dbs2, cdr_cont1_dbs3',
    'cdr_data_t', 2000000, 500000,
    '2015-04-24 00:00:00.00000'::datetime year to fraction(5),
    'day', 180, 1, NULL, 1, 2048, 512 );
execute procedure TSContainerCreate (
    'cdr_cont2',
    'cdr_cont2_dbs1, cdr_cont2_dbs2, cdr_cont2_dbs3',
    'cdr_data_t', 2000000, 500000,
    '2015-04-24 00:00:00.00000'::datetime year to fraction(5),
    'day', 180, 1, NULL, 1, 2048, 512 );
execute procedure TSContainerSetPool ('cdr_cont1', 'cdr_cont_pool');
execute procedure TSContainerSetPool ('cdr_cont2', 'cdr_cont_pool');
Resource Efficiency
- Carefully choose appropriate page sizes: CDRs come in large amounts and might contain many elements!
  - One can store up to 254 TS elements per logical page
  - Each TS has its own set of pages
  - Example: One CDR is on average 200 bytes. There are about 10 CDRs per TS per day. 10 x 200 bytes = 2000 bytes, which suggests a 2 KB page size
- For JSON CDR time series, consider using short JSON element identifiers
  - Max JSON document size in a time series: 4 KB
  - Shorter JSON identifiers mean less disk storage
- If you choose a page size that is too large...
  - ...you will have wasted space on disk
  - ...you might run into buffer pool memory size issues (due to unused memory page space)
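The sizing example above can be written down as a small calculation. The Python sketch below is a rough estimate under stated assumptions: it ignores page header and per-element overhead, the list of candidate page sizes is an assumption, and the key mapping used for shortening JSON identifiers is hypothetical.

```python
PAGE_SIZES = (2048, 4096, 8192, 16384)  # assumed candidate page sizes in bytes


def daily_bytes(avg_cdr_bytes: int, cdrs_per_day: int) -> int:
    """Raw CDR bytes written per time series per day."""
    return avg_cdr_bytes * cdrs_per_day


def smallest_fitting_page(required_bytes: int, page_sizes=PAGE_SIZES) -> int:
    """Smallest candidate page size that holds the required bytes (largest if none does)."""
    for size in sorted(page_sizes):
        if size >= required_bytes:
            return size
    return max(page_sizes)


def wasted_fraction(required_bytes: int, page_size: int) -> float:
    """Share of allocated page space left unused (overheads ignored)."""
    pages = -(-required_bytes // page_size)  # ceiling division
    return 1 - required_bytes / (pages * page_size)


def shorten_keys(document: dict, mapping: dict) -> dict:
    """Rename JSON identifiers via a (hypothetical) long-to-short mapping."""
    return {mapping.get(key, key): value for key, value in document.items()}
```

For the numbers above, smallest_fitting_page(daily_bytes(200, 10)) selects the 2 KB page, while a 16 KB page would leave most of each page unused for the same data.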
CDR Load Performance
- The VTI interface is suitable for small batches of CDRs and/or continuous loading of single records
- Use the TS Loader API for large amounts of CDR data
  - Follow the usual recommendations for disk I/O optimization (e.g. I/O separation of logical log chunks from the dbspace chunks)
  - Fragment the TS base table and distribute the TS across multiple containers
  - Consider using parallel TSL sessions
  - Utilize the new reduced TSL message logging capabilities in 12.10.xC5
- To simplify the loading process, you might want to use the new 12.10.xC5 TSL_SetNewTS() function to automatically create a new TS in the base table if that TS doesn't exist
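One way to prepare input for parallel loader sessions is to split the records by primary key so that all records of one time series go to the same session. The Python sketch below shows such a partitioning step only; it does not call Informix. The pipe-delimited line layout with three leading key columns is an assumption, and whether keeping a series within one session is required or merely helpful should be verified for the version in use.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def session_for_key(primary_key: str, sessions: int) -> int:
    """Stable mapping from a primary key to a session number (0-based)."""
    return zlib.crc32(primary_key.encode("utf-8")) % sessions


def partition_lines(
    lines: Iterable[str],
    sessions: int,
    delimiter: str = "|",
    key_columns: int = 3,  # assumption: the first three fields form the primary key
) -> Dict[int, List[str]]:
    """Group load lines into one batch per loader session."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        key = delimiter.join(line.split(delimiter)[:key_columns])
        batches[session_for_key(key, sessions)].append(line)
    return dict(batches)
```

A checksum is used instead of Python's built-in hash() so that the assignment stays the same across runs and processes.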
CDR TSL Example
execute function TSL_Init('cdr_email_ts','cdr_recs',3,4,NULL,'%Y-%m-%d %H:%M:%S','/tmp/cdr_rejected.log',NULL);
execute function TSL_SetLogMode ('cdr_email_ts cdr_recs',1,2,'/tmp/cdr_error.log');
execute function TSL_SetNewTS('cdr_email_ts cdr_recs',
    'origin(2015-04-24 00:00:00.00000), calendar(cdr_1sec), TSContainerPoolRoundRobin(cdr_cont_pool), threshold(0), irregular', 1);
execute function TSL_Put ('cdr_email_ts cdr_recs',filetoclob('/tmp/test.mail','server'));
begin work;
execute function TSL_FlushAll('cdr_email_ts cdr_recs');
commit work;
execute function TSL_SessionClose('cdr_email_ts cdr_recs');
execute procedure TSL_Shutdown('cdr_email_ts cdr_recs');
Lessons Learned 1/2
- High data volume and complexity of data require a careful design of the data model
  - Choose the optimal primary key(s) to avoid time series that are too small
  - Carefully choose the optimal page size and disk location for time series containers
  - Consider the usage of rolling window containers for legal requirements and simplified data housekeeping (time series data deletion)
- Use the latest Informix TS loader features
  - Informix 12.10.xC5 provides some very helpful new TSL capabilities and better resource usage
  - Automatic creation of missing time series
  - Support for JSON time series
  - Load time series data from an external file
Lessons Learned 2/2
- Data volume can vary extremely, esp. events created by the e-mail service (mobile IMAP!)
- Telcos don't know their data structures (source, volume, quality) in detail
- Be aware of different numbering plans (0, 0049, 49) and data quality problems
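The numbering plan issue can be illustrated with a small normalization step applied before the subscriber number is used as a key. This Python sketch is a simplified assumption, not the cedros implementation: it treats "00" as the international prefix, a single leading "0" as a national trunk prefix, and uses 49 (Germany) as the default country code. Numbers that already lack any prefix are passed through unchanged, which is ambiguous in practice and needs source-specific rules.

```python
import re


def normalize_number(raw: str, country_code: str = "49") -> str:
    """Normalize a dialled number to digits with a leading country code."""
    digits = re.sub(r"\D", "", raw)  # drop '+', spaces, slashes, dashes
    if digits.startswith("00"):
        return digits[2:]                    # 0049 89 ... -> 49 89 ...
    if digits.startswith("0"):
        return country_code + digits[1:]     # 089 ...     -> 49 89 ...
    return digits                            # assumed to be international already
```

Without such a step, the same subscriber can end up in several time series, one per notation.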
Questions?
Alexander Koerner
Twitter: @AlexKoeMUC