Challenges of Big Platform
www.huawei.com
HUAWEI TECHNOLOGIES CO., LTD.
Contents
Categories of data in a carrier network
Network insight
Customer behavior insight
Society activity insight
Challenges
Five categories of data in a carrier network (business domain, volume, sources, characteristics)

Enterprise management (generated manually; 100TB~10TB)
Sources: E-Learning, ERP, Account, HR, CMS; systems include MRP, SCM, FRM, ERP and HRM.
Characteristics: structured (tables) and unstructured (graphics, text, video).

OSS (generated by machine; 1TB~100TB, growing xx GB/year)
Sources: NE parameters, NE configuration, NE logs, alerts, performance data.
Characteristics: structured (tables) and unstructured (time-series data).

Network elements (generated by machine; 10PB/year, accumulated over 1~3 years)
Sources: CHR, CDR, SDR, MR, counters, NE logs, collected through probes or NE integration from NodeB, RNC, SGSN and GGSN/DPI.
Characteristics: semi-structured (signaling and call records) and unstructured (time-series data).

BSS (generated manually; 100GB~10TB, growing xx TB/year)
Sources: billing, marketing reports, orders, user profiles, CRM.
Characteristics: structured (tables).

VAS (generated manually; 100TB~10PB, growing 100TB/year)
Sources: orders, usage, service content, GIS, ISP.
Characteristics: structured (tables, point sets), semi-structured (column clusters) and unstructured (graphics, text, video, time-series data).
Evolution of the data analytics business in the big data era

Past: the typical analytics business was operation analysis: statistics-based, offline, on isolated data.
Nowadays: new business such as network optimization and customer experience, with large volumes, real-time processing and many data types.

Existing analytic systems: operation reports and KPI reports (statistics) from CRM/Billing (BSS); network management performance statistics and network scheduling (statistics) from performance and alert data (OSS, NE data); HR and financial reports from HR/FRM/SRM (enterprise management). New applications: CEM, NPM/SQM, AD promotion, offer design, VAS orders.

Evolution: statistics, offline, isolated data -> large volume, real time, convergence of many data types.

Dimensions compared: volume (data sets > 100TB), velocity (flow rate and accumulation rate, >60% of scenarios) and variety (formats and sources).
Operation report: statistics data; offline statistics scenario with a low accumulation rate; no scale-out requirement; CRM and billing data, structured.
Billing verification: <100TB; offline; fixed volume; no scale-out requirement; billing data, structured.
Network optimization and customer experience indicators: network (equipment) data at the 10PB level; ~200Gbps ingress; ~100,000 packages/s from NEs such as RAN and PS; 1 year of data archived; elastic data processing clusters of over 100 servers handling 1PB of data; network signaling, xDR, traffic statistics and NE configuration data, mostly semi-structured.
Precise marketing: customer profiles of 100GB~300GB; fixed volume; in-memory computing; CRM, billing and xDR data, structured and semi-structured.
Evolution driven by carrier business

Three categories of big data business:
Network insight: analytics on network data, combined with user data, to adjust the network layout. The focus is network status (location, equipment workload), so that the network can be adjusted dynamically.
Customer insight: analytics on user data, combined with network equipment data, to recognize the characteristics of customer behavior: who is using the network and which services they consume, in order to optimize the business.
Society insight: analytics that uncover the laws (patterns) behind the data to extract its value, and use those laws to guide carriers in developing new, valuable business.
Categories and characteristics of carrier big data business

Network insight
Data: NE data (TS, DPI, MR, logs, xDR, dial tests, traffic tests).
Query: ad-hoc queries with real-time response; multi-dimensional visualization; rich and complex model representation and querying.
Storage and integration: raw data; large volume (10PB level) at low cost.
ETL: high-performance loading.
Key characteristics: high performance, low cost.

Customer insight
Data: summarized operational data (orders, accounts, UP, complaints, user accounts, user consumption, CRM, CBS, IPCC, VAS, network, marketing).
Query: simple queries with high concurrency.
Storage and integration: summarized data; low to moderate volume.
ETL: real-time updates; complex models.
Key characteristics: real time, high concurrency.

Society insight
Data: VAS and external data (LBS, VAS, Internet usage, user profiles, xDR, logs, traffic statistics).
Query: complex queries; complex data mining algorithms that need guidance from data scientists and industry experts.
Storage and integration: a mix of raw and summarized data; volume varies by domain, averaging the 10PB level; requires low cost.
ETL: cross-domain data integration.
Key characteristics: complex queries, complex models and algorithms.
Business requirements on network insight

For a carrier network serving 40M users, there are several challenges:
Volume: 120TB -> 5.6PB
Integration: 33 nodes -> 6 nodes
Query response time: 100s -> 15s
Multi-dimensional analytics

Ingress (PS, CS, NMS, EMS): currently 20M users, 25Gbps, 60 days of raw data (120TB) at 140k records/s; target 40M users, 200Gbps, 1 year of raw data (5.6PB) at 354k records/s.

Processing procedure: 1. archive and query raw data; 2. summarize raw data (DW preprocessing, summarization and storage); 3. statistics/analysis libraries (representation); 4. multi-dimensional analytics.

Requirements:
Feeding rate of 90,000 rows/s.
Stable query performance over 1 year of data (5.6PB), with a 10:1 compression ratio.
Support for a few ad-hoc queries, and for complex queries involving 10 tables.
20 concurrent reporting queries, each answered within 15 seconds.
Multi-dimensional analytics over 14 dimensions: general analytics combine 5 to 9 dimensions of SDR BKPIs, and BKPI analytics combine 10 to 14 dimensions, with second-level response time on 1.4 billion rows.
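The sizing figures above can be cross-checked with a little arithmetic. The Python sketch below assumes a constant ingest rate over the whole retention window and decimal units (1TB = 10^12 bytes); it derives the implied average record size for both scenarios, the on-disk footprint of 5.6PB at the stated 10:1 compression ratio, and the number of dimension combinations (group-by sets) that the 14-dimension analytics can touch. The function names are illustrative only.

```python
from math import comb

SECONDS_PER_DAY = 24 * 60 * 60
TB = 10 ** 12  # decimal units assumed
PB = 10 ** 15


def records_retained(records_per_second: float, retention_days: int) -> float:
    """Records accumulated over the retention window at a constant ingest rate."""
    return records_per_second * retention_days * SECONDS_PER_DAY


def stored_bytes(raw_bytes: float, compression_ratio: float) -> float:
    """On-disk footprint after compression (10.0 means a 10:1 ratio)."""
    return raw_bytes / compression_ratio


def cuboid_count(n_dims: int, min_k: int, max_k: int) -> int:
    """Number of distinct group-by sets using between min_k and max_k of n_dims dimensions."""
    return sum(comb(n_dims, k) for k in range(min_k, max_k + 1))


if __name__ == "__main__":
    current = records_retained(140_000, 60)   # 20M users, 60 days, 120TB
    target = records_retained(354_000, 365)   # 40M users, 1 year, 5.6PB
    print(f"current: {current:.3e} records, ~{120 * TB / current:.0f} bytes/record")
    print(f"target:  {target:.3e} records, ~{5.6 * PB / target:.0f} bytes/record")
    print(f"5.6PB raw at 10:1 -> {stored_bytes(5.6 * PB, 10) / TB:.0f}TB on disk")
    print("group-by sets, 5-9 of 14 dims:", cuboid_count(14, 5, 9))
    print("group-by sets, 10-14 of 14 dims:", cuboid_count(14, 10, 14))
```

The implied average record size is roughly 165 bytes in the current scenario and roughly 502 bytes in the target one, a difference worth reconciling before using either figure for capacity planning. The combination counts (13,442 group-by sets for 5 to 9 of 14 dimensions, 1,471 for 10 to 14) show why fully precomputing every aggregate is expensive.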
Business requirements on customer insight

Precise advertising based on user behavior information, with refined event-content requirements from suppliers. Examples:
Promote electronic magazines to people taking public transport.
Promote Wi-Fi offers to people in coffee shops without Wi-Fi service.
Promote cosmetics vouchers to women in shopping malls.

The Big Platform obtains each subscriber's location over time (for example, going to the office at 8 AM, on working days, weekends, holidays and vacations) and, based on this behavior, analyzes users' consumption characteristics and their favorite content and offers.
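The examples above amount to matching a subscriber's current context against campaign rules. A minimal rule-matching sketch in Python follows; the `SubscriberContext` fields, the place categories and the rule set are hypothetical, standing in for whatever the location and profile analysis actually produces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SubscriberContext:
    """Hypothetical snapshot of one subscriber's current context."""
    imsi: str
    gender: str           # "F" / "M", assumed to come from the user profile
    place_type: str       # assumed location class: "transit", "coffee_shop", "shopping_mall", ...
    venue_has_wifi: bool  # assumed attribute of the venue the subscriber is in


@dataclass(frozen=True)
class Rule:
    offer: str
    matches: Callable[[SubscriberContext], bool]


# Hypothetical rules mirroring the three examples on the slide.
RULES: List[Rule] = [
    Rule("electronic magazine", lambda c: c.place_type == "transit"),
    Rule("Wi-Fi offer", lambda c: c.place_type == "coffee_shop" and not c.venue_has_wifi),
    Rule("cosmetics voucher", lambda c: c.place_type == "shopping_mall" and c.gender == "F"),
]


def offers_for(ctx: SubscriberContext) -> List[str]:
    """Return every offer whose rule matches the subscriber's current context."""
    return [rule.offer for rule in RULES if rule.matches(ctx)]


if __name__ == "__main__":
    commuter = SubscriberContext("460001234567890", "M", "transit", False)
    print(offers_for(commuter))  # ['electronic magazine']
```

Keeping each rule as a predicate lets new campaigns be added without touching the matching loop; in practice the targeting would more likely be learned from behavior data than hand-coded.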
Business requirements on customer insight

Two general requirements on BI technologies: a high-performance, low-cost DW, and analysis and mining algorithms based on user behavior and value.

Processing procedure (architecture):
Application: customer insight, marketing management.
Service capabilities (information archiving and processing): item inquiry, dynamic policy, characteristic profiles, traffic analysis, performance assessment, retrieval, network analysis, finance analysis, text processing, content visualization, classification, location service, graphic service.
Infrastructure: distributed statistics analysis (mining and analysis: aggregation, classification, association, predicates), DBMS query engine, distributed platform (distributed file system, distributed database, distributed computation) and hardware.

Query requirements: point queries and analytic queries from RTD; exploratory queries such as customer segmentation, which require full table scans and multi-table joins; queries on 1024 predefined KPIs; tagging and labeling with 500+ indicators and 50+ graphic computations.
Mining requirements: customized models (user modeling); user/item/content/property similarity with MinHash (CF); behavior targeting; customer profiling based on behavior and value.

Pain points:
1. Poor OLAP performance: minute-level response times on a few hundred GB of data. The OLAP system is built on ROLAP solutions such as Cognos and DB2.
2. Poor DW performance and high cost: raw data storage and computation consume over 70% of the DW's capacity, reaching the maximum volume and capability of a traditional database.
3. High software and hardware cost: the solution consists of high-end servers, disk arrays and a commercial DBMS, with expensive licenses and hardware.
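The mining requirements name MinHash for collaborative filtering (CF): similarity between users (or items) is approximated from compact signatures instead of comparing full item sets. A minimal Python sketch, assuming users are represented as sets of consumed service or content IDs (the IDs below are made up) and using seeded BLAKE2b hashes as the family of hash functions:

```python
import hashlib
from typing import Iterable, List, Sequence


def _seeded_hash(seed: int, token: str) -> int:
    """64-bit hash of token; each seed acts as an independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: Iterable[str], num_hashes: int = 128) -> List[int]:
    """For each hash function, keep the minimum hash value over the item set."""
    tokens = set(items)
    if not tokens:
        raise ValueError("MinHash needs a non-empty set")
    return [min(_seeded_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Exact |A & B| / |A | B|, for comparison with the estimate."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    user_a = {"news", "video", "music", "maps", "games", "weather"}
    user_b = {"news", "video", "music", "maps", "shopping", "sports"}
    sig_a, sig_b = minhash_signature(user_a), minhash_signature(user_b)
    print(f"exact={exact_jaccard(user_a, user_b):.2f} "
          f"estimate={estimate_jaccard(sig_a, sig_b):.2f}")
```

Signatures have a fixed length regardless of how many items a user has consumed, so they are cheap to store and compare; in practice they are usually combined with locality-sensitive hashing (banding) to find similar pairs without comparing every pair.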
Business requirements on society insight

Focus on anonymous wireless users and location-based applications, for government, industry and enterprise use:
Traffic applications: congestion information derived from telco signaling data.
Population analytics: traffic planning, distribution of city resources, detection of abnormal events.
Business requirements on society insight

The goal is to uncover the laws of group activity by applying data mining algorithms to maps and dimensional data. The core is the data analysis layer.
Visualization: OD graph and matrix, population density, OD transport mode classification, traffic congestion detection.
Analysis (UniBI, reporting tools): population density, OD table, OD transportation mode classification, traffic congestion detection.
Preprocessing (HDFS + MapReduce, HDFS + HQL): map preprocessing (district segmentation to extract district coordinates, road segmentation to extract road coordinates); data cleaning, integration, exploration and selection.
Sources: MR records (Time, IMSI, Longitude, Latitude, RNCID, CellID).
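The preprocessing and analysis layers above turn raw MR records into district-level population density and origin-destination (OD) counts. The single-process Python sketch below shows that logic under loud assumptions: district segmentation is approximated by snapping coordinates to a square grid (`district_of` and `cell_deg` are hypothetical), and an OD movement is simply a change of district between a subscriber's consecutive records, with no stay-point or trip detection. On the platform described, the same grouping by IMSI and by district would run as MapReduce or HQL jobs over HDFS.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

District = Tuple[int, int]


@dataclass(frozen=True)
class MRRecord:
    """One measurement report with the fields listed on the slide."""
    time: int          # epoch seconds
    imsi: str
    longitude: float
    latitude: float
    rnc_id: int
    cell_id: int


def district_of(rec: MRRecord, cell_deg: float = 0.01) -> District:
    """Hypothetical district segmentation: snap coordinates to a square grid."""
    return (int(rec.longitude // cell_deg), int(rec.latitude // cell_deg))


def population_density(records: Iterable[MRRecord], cell_deg: float = 0.01) -> Counter:
    """Number of distinct subscribers (IMSIs) observed in each district."""
    seen: Dict[District, Set[str]] = defaultdict(set)
    for rec in records:
        seen[district_of(rec, cell_deg)].add(rec.imsi)
    return Counter({d: len(imsis) for d, imsis in seen.items()})


def od_matrix(records: Iterable[MRRecord], cell_deg: float = 0.01) -> Counter:
    """Count moves between consecutive, different districts for each subscriber."""
    by_imsi: Dict[str, List[MRRecord]] = defaultdict(list)
    for rec in records:
        by_imsi[rec.imsi].append(rec)
    od: Counter = Counter()
    for recs in by_imsi.values():
        recs.sort(key=lambda r: r.time)  # records may arrive out of order
        prev = None
        for rec in recs:
            here = district_of(rec, cell_deg)
            if prev is not None and here != prev:
                od[(prev, here)] += 1
            prev = here
    return od


if __name__ == "__main__":
    records = [
        MRRecord(600, "imsi-1", 116.45, 39.92, 1, 12),
        MRRecord(0, "imsi-1", 116.30, 39.90, 1, 11),
        MRRecord(60, "imsi-2", 116.30, 39.90, 2, 21),
    ]
    print(population_density(records, cell_deg=0.1))
    print(od_matrix(records, cell_deg=0.1))
```

Population density counts distinct IMSIs rather than records, so a subscriber who reports many times from one district is counted once there.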
Summary of big data business requirements

Huawei product lines are attempting to build new big data businesses and have various requirements on big data components, mainly MPP DB, in-memory analytics DB, streaming computation, MOLAP, parallel computation, and analytics and mining algorithms.

Storage and computation requirements:
MPP DB: 10PB-level volume; linear scalability to 100+ nodes; queries over 0.1 billion rows answered within 1 minute; 10:1 compression ratio.
Real-time analytics in-memory DB: 100TB; columnar; wide tables with 2,000-5,000 columns; 30,000 updates/s; ad-hoc queries answered within 3 seconds, to support real-time business policy adjustment and real-time KPI calculation.
Streaming processing: 1 million events per second; 1 microsecond latency per event.
Analytics (MOLAP): SQL and MDX support; <5s response time in 80~90% of scenarios; 1s response latency on TB-scale data with hundreds of dimensions; real-time dashboards.
Mining: high accuracy, a variety of algorithms, online data mining, quick response.
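The streaming and real-time KPI requirements come down to aggregating events incrementally as they arrive, rather than re-scanning stored data. A minimal Python sketch of a per-cell success-rate KPI over fixed (tumbling) time windows follows, assuming a hypothetical event schema of (timestamp, cell ID, success flag) and in-order arrival. It illustrates the windowing logic only; pure Python would not approach the stated 1 million events per second.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

WindowResult = Tuple[int, Dict[str, float]]  # (window index, success rate per cell)


class TumblingSuccessRate:
    """Per-cell success-rate KPI over fixed, non-overlapping time windows.

    Assumes events arrive in timestamp order; late events are not handled.
    """

    def __init__(self, window_s: float = 1.0) -> None:
        self.window_s = window_s
        self.current: Optional[int] = None  # index of the open window
        self.totals: Dict[str, int] = defaultdict(int)
        self.successes: Dict[str, int] = defaultdict(int)

    def _flush(self) -> Dict[str, float]:
        """Compute the KPI for the open window and reset its counters."""
        result = {cell: self.successes[cell] / n for cell, n in self.totals.items()}
        self.totals.clear()
        self.successes.clear()
        return result

    def add(self, ts: float, cell: str, ok: bool) -> Optional[WindowResult]:
        """Add one event; return the previous window's KPI if this event closes it."""
        idx = int(ts // self.window_s)
        closed = None
        if self.current is not None and idx != self.current:
            closed = (self.current, self._flush())
        self.current = idx
        self.totals[cell] += 1
        self.successes[cell] += int(ok)
        return closed

    def close(self) -> Optional[WindowResult]:
        """Flush the last open window at end of stream."""
        if self.current is None:
            return None
        result = (self.current, self._flush())
        self.current = None
        return result


if __name__ == "__main__":
    kpi = TumblingSuccessRate(window_s=1.0)
    events = [(0.1, "cell-A", True), (0.5, "cell-A", False),
              (0.9, "cell-B", True), (1.2, "cell-A", True)]
    for ts, cell, ok in events:
        done = kpi.add(ts, cell, ok)
        if done:
            print(done)  # (0, {'cell-A': 0.5, 'cell-B': 1.0})
    print(kpi.close())   # (1, {'cell-A': 1.0})
```

Only the counters of the open window are kept in memory, which is what makes the per-event cost constant; a production streaming engine would additionally handle late or out-of-order events (for example with watermarks) and partition the keyed state across nodes.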
Thank you

Copyright 2011 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.