Future Proof Analytics Techniques for Web 2.0 Applications
Near Real Time Support Techniques
Constantine J. Aivalis*, Technological Education Institute Crete
Anthony C. Boucouvalas**, University of Peloponnese, Tripoli
E-mails: *costis@teicrete.gr, **acb@uop.gr
Keywords
  Analytics, E-Commerce, Web Analytics, Real Time, Log File Analysis, Tagging Systems, RIA

Contents of the Presentation
  Introduction
  Web Analytics
  Comparison of Methodologies
  The Problem
  The Solution
  Architecture
  Functionality
  Results
  Customer Behavioral Model Graph
  Measurements
  Current Work
  Applications
  Conclusion
Introduction
  The WWW is today's common business platform.
  E-commerce infrastructure must be reliable, robust and scalable.
  Web systems produce huge amounts of user activity data that often stay unused.
  User activity data must be converted to information.
  Intelligent customer classification allows better customized services and increases sales.
Analytics as a Component

Basic Models of Analytics Technologies
  Log File Analysis Systems, Tagged Systems, Network Data Collection Devices and Hybrid Systems.
Web Analytics
  Analysis of log files: Log files contain very detailed information about each request. The data have to be carefully selected.
  Page Tagging: Page tagging requires an extra web server, to which the visitor's browser is automatically sent. This server collects the log data generated by the visit and stores it in a separate database for each site, identified by an account number.
  Network Data Collection Devices: Sniffers and black boxes that capture IP packets.
  Hybrid Methods: Combine log file analysis and tagging in order to reduce the disadvantages of each method.
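As a minimal sketch of what "carefully selecting" log data can mean, the JavaScript below parses access log entries and keeps only successful page requests. It assumes the Apache/NCSA Combined Log Format, which the slides do not confirm. The names parseLogLine and isPageRequest, the static-asset rule and the two sample entries are all hypothetical.

```javascript
// Assumption: entries follow the Apache/NCSA Combined Log Format.
const COMBINED =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

// Turn one log line into a record, or null if it does not match the format.
function parseLogLine(line) {
  const m = COMBINED.exec(line);
  if (!m) return null;
  return {
    host: m[1],
    user: m[3] === "-" ? null : m[3],
    time: m[4],
    method: m[5],
    path: m[6],
    protocol: m[7],
    status: Number(m[8]),
    bytes: m[9] === "-" ? 0 : Number(m[9]),
    referrer: m[10],
    userAgent: m[11],
  };
}

// Hypothetical selection rule: drop failed requests and static assets.
const STATIC_ASSET = /\.(css|js|png|jpe?g|gif|ico)(\?.*)?$/i;
function isPageRequest(entry) {
  return entry !== null && entry.status < 400 && !STATIC_ASSET.test(entry.path);
}

// Made-up sample entries, for illustration only.
const sample = [
  '192.0.2.10 - - [02/Jun/2011:12:55:20 +0300] "GET /shop/product.php?id=7 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
  '192.0.2.10 - - [02/Jun/2011:12:55:21 +0300] "GET /shop/style.css HTTP/1.1" 200 830 "http://example.com/shop/product.php?id=7" "Mozilla/5.0"',
];
const pages = sample.map(parseLogLine).filter(isPageRequest);
console.log(pages.map((e) => e.path));
```

Only the product page survives the filter. A real deployment would adapt the pattern to the server's configured log format.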
Available Vendors
  Google Analytics (Urchin)
  Microsoft adCenter Analytics (DeepMetrix)
  Yahoo Web Analytics (IndexTools)
  Clickstream.com
  Adobe Web Analytics (Omniture)
  IBM Unica NetInsight
  ChartBeat Inc.
  Hitmatic

Comparison of Methodologies
  Source: Brian Clifton, Web Traffic Data Sources & Vendor Comparison
Google Analytics JavaScript
<meta name="google-site-verification" content="xxxxxxxxxxxx" />
<script type="text/javascript">
  // Command queue: created here if the tracker has not defined it yet.
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXXXX-Y']);
  _gaq.push(['_trackPageview']);
  // Load ga.js asynchronously, matching the protocol of the current page.
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>
The Problem
  E-shops often operate blindfolded.
  Only successful sales transactions are visible to the administration and management.
  Most e-commerce systems have no built-in performance measuring mechanisms.
  Only registered-customer actions are taken into consideration.
  The majority of visitors may not be customers yet. Their behavior has to be analyzed in order to win them.
  Access log files include all interaction data details.
  Manually scrutinizing access log files is too inconvenient to be performed on a regular basis.
Rotating Access Log Files

Access Log File Sample

Initial Architecture of the System

Initial Functionality of the System

Extended System
GWT Applications Access Log
"GET /LogDB/ HTTP/1.1" 304 /LogDB/ 117
"GET /LogDB/LogDB.css HTTP/1.1" 304 /LogDB/LogDB.css 19
"GET /LogDB/com.art.logdb.LogDB/com.art.logdb.LogDB.nocache.js HTTP/1.1" 304 44
"GET /LogDB/com.art.logdb.LogDB/1D630481E14BBD07DE7ED3D963A012CE.cache.html HTTP/1.1" 304 23
"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500 450
"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500 371
"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500 449
"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500 378
"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500 885
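In the excerpt above, every RPC call of the GWT application arrives as a POST to the same service URL, so the access log alone cannot tell the individual user actions apart. The JavaScript sketch below makes that visible by counting entries per method, path and status. It parses only the quoted request and the status code, because the meaning of the remaining fields is not stated on the slide. The function name countByEndpoint is hypothetical.

```javascript
// Matches the quoted request line and the status code that follows it.
const REQUEST = /^"(\S+) (\S+) HTTP\/[\d.]+" (\d{3})/;

// Count log entries per "METHOD path status" combination.
function countByEndpoint(lines) {
  const counts = new Map();
  for (const line of lines) {
    const m = REQUEST.exec(line.trim());
    if (!m) continue; // skip lines that do not look like requests
    const key = `${m[1]} ${m[2]} ${m[3]}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Requests taken from the excerpt, trailing fields omitted.
const excerpt = [
  '"GET /LogDB/ HTTP/1.1" 304',
  '"GET /LogDB/LogDB.css HTTP/1.1" 304',
  '"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500',
  '"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500',
  '"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500',
  '"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500',
  '"POST /LogDB/com.art.logdb.LogDB/MySQLConnection HTTP/1.1" 500',
];
const endpointCounts = countByEndpoint(excerpt);
for (const [key, n] of endpointCounts) console.log(n, key);
```

All five POST entries collapse into a single key, which illustrates why additional data sources are needed for rich internet applications.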
MySQL General Log File
110602 12:55:20 3 Connect root@localhost on w2p
3 Query /* mysql-connector-java-5.1.15 ( Revision: ${bzr.revision-id} ) */SHOW VARIABLES WHERE Variable_name ='language' OR Variable_name = 'net_write_timeout' OR Variable_name = 'interactive_timeout' OR Variable_name = 'wait_timeout' OR Variable_name = 'character_set_client' OR Variable_name = 'character_set_connection' OR Variable_name = 'character_set' OR Variable_name = 'character_set_server' OR Variable_name = 'tx_isolation' OR Variable_name = 'transaction_isolation' OR Variable_name = 'character_set_results' OR Variable_name = 'timezone' OR Variable_name = 'time_zone' OR Variable_name = 'system_time_zone' OR Variable_name = 'lower_case_table_names' OR Variable_name = 'max_allowed_packet' OR Variable_name = 'net_buffer_length' OR Variable_name = 'sql_mode' OR Variable_name = 'query_cache_type' OR Variable_name = 'query_cache_size' OR Variable_name = 'init_connect'
110602 12:55:21 3 Query /* mysql-connector-java-5.1.15 ( Revision: ${bzr.revision-id} ) */SELECT @@session.auto_increment_increment
3 Query SHOW COLLATION
3 Query SET NAMES utf8mb4
3 Query SET character_set_results = NULL
3 Query SET autocommit=1
3 Query select * from user where username = 'asdasd' and password = 'aa'
1 Query show global status
110602 12:55:25 1 Query show global status
110602 12:55:30 3 Query select * from user where username = 'costis' and password = 'foobar'
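In the excerpt above a timestamp appears only on some entries, and the entries that follow inherit it. The JavaScript sketch below parses such lines, carries the last seen timestamp forward and groups entries by connection id. It assumes one entry per line and recognizes only a small subset of command names. The names parseGeneralLog and groupByConnection are hypothetical.

```javascript
// Optional "YYMMDD HH:MM:SS" prefix, then connection id, command, argument.
// The command list is a subset chosen for this sketch.
const ENTRY =
  /^(?:(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s+)?(\d+)\s+(Connect|Query|Quit|Init DB)\s*(.*)$/;

function parseGeneralLog(lines) {
  const entries = [];
  let date = null;
  let time = null;
  for (const raw of lines) {
    const m = ENTRY.exec(raw.trim());
    if (!m) continue; // continuation line or unrecognized command
    if (m[1]) {
      // A new timestamp applies to this entry and to those that follow.
      date = m[1];
      time = m[2];
    }
    entries.push({ date, time, connectionId: Number(m[3]), command: m[4], argument: m[5] });
  }
  return entries;
}

function groupByConnection(entries) {
  const groups = new Map();
  for (const e of entries) {
    if (!groups.has(e.connectionId)) groups.set(e.connectionId, []);
    groups.get(e.connectionId).push(e);
  }
  return groups;
}

// A shortened selection of lines from the excerpt.
const logLines = [
  "110602 12:55:20 3 Connect root@localhost on w2p",
  "110602 12:55:21 3 Query /* mysql-connector-java-5.1.15 ( Revision: ${bzr.revision-id} ) */SELECT @@session.auto_increment_increment",
  "3 Query SET autocommit=1",
  "3 Query select * from user where username = 'asdasd' and password = 'aa'",
  "1 Query show global status",
  "110602 12:55:25 1 Query show global status",
  "110602 12:55:30 3 Query select * from user where username = 'costis' and password = 'foobar'",
];
const entries = parseGeneralLog(logLines);
const byConnection = groupByConnection(entries);
console.log(byConnection.get(3).map((e) => `${e.time} ${e.argument}`));
```

Grouping by connection id separates the application's session (connection 3 in the excerpt) from the monitoring queries on connection 1.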
Near Real Time Analytics Tools

Components & Data Flow
Trigger for Order Producer
CREATE TABLE `transmitted` (
  `transmittedid` int(11) NOT NULL AUTO_INCREMENT,
  `ordernumber` int(11) DEFAULT NULL,
  `transmitted` bit(1) DEFAULT NULL,
  PRIMARY KEY (`transmittedid`)
) ENGINE=InnoDB AUTO_INCREMENT=27 DEFAULT CHARSET=utf8;

USE `kona5502`;
DELIMITER $$
CREATE TRIGGER `fixorders` AFTER INSERT ON `orders`
FOR EACH ROW
-- Record every new order as not yet transmitted.
BEGIN
  INSERT INTO `kona5502`.`transmitted` VALUES (NULL, NEW.`orders_id`, 0);
END$$
DELIMITER ;
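The slide shows only the trigger. How the `transmitted` flag is consumed is not shown. One plausible reading is that a producer periodically picks up rows whose flag is 0, forwards them to a queue and then sets the flag. The JavaScript sketch below simulates that loop against an in-memory array instead of a database. The function drainUntransmitted, the send callback and the sample order numbers are all hypothetical.

```javascript
// In-memory stand-in for rows of the `transmitted` table (assumption:
// a real producer would SELECT rows with transmitted = 0 and UPDATE them).
function drainUntransmitted(rows, send) {
  const sent = [];
  for (const row of rows) {
    if (row.transmitted) continue; // already forwarded earlier
    send({ ordernumber: row.ordernumber });
    row.transmitted = true; // flag is set only after send() returned
    sent.push(row.ordernumber);
  }
  return sent;
}

// Made-up rows, for illustration only.
const table = [
  { transmittedid: 1, ordernumber: 101, transmitted: true },
  { transmittedid: 2, ordernumber: 102, transmitted: false },
  { transmittedid: 3, ordernumber: 103, transmitted: false },
];
const queue = [];
const firstPass = drainUntransmitted(table, (msg) => queue.push(msg));
const secondPass = drainUntransmitted(table, (msg) => queue.push(msg));
console.log(firstPass, secondPass, queue.length);
```

Setting the flag only after a successful send means a failed transmission is retried on the next pass, at the cost of possible duplicates if the failure happens between the send and the flag update.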
Data Queuing Sequence Diagram

Customer Behavioral Model Graph

Measurements
Measurements
  Order values/numbers
  Visits
  Time spent per product or service
  Accesses per product or service
  Orders per product or service
  Bots visited
  Visitors
  Uncompleted ordering sessions
  Profitable customer groups
  Profitable products or services
  Overall profits
  Promotion impact
  Speed
  Performance metrics
Current Research
  Bot behavioral analysis
  Bot classification
  Use of iBeacons to allow analytics techniques to be applied to physical stores
  Predictive algorithms
Thank you
  Constantine J. Aivalis, costis@teicrete.gr
  Anthony Boucouvalas, acb@uop.gr