Research of Postal Data mining system based on big data



Similar documents
Database Marketing, Business Intelligence and Knowledge Discovery

Journal of Chemical and Pharmaceutical Research, 2015, 7(3): Research Article. E-commerce recommendation system on cloud computing

Data Mining Solutions for the Business Environment

The WAMS Power Data Processing based on Hadoop

Blog Post Extraction Using Title Finding

Database Construction of Real Estate Assessment Based on Big Data Liang Zhou 1,2,a, Liang Shi 3, Sijia Zhang

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

How To Use Neural Networks In Data Mining

Introduction to Data Mining

Cloud computing based big data ecosystem and requirements

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

SPATIAL DATA CLASSIFICATION AND DATA MINING

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

Application of Data Mining Techniques in Intrusion Detection

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

How To Make Sense Of Data With Altilia

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Role of Social Networking in Marketing using Data Mining

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Search and Data Mining: Techniques. Introduction Anna Yarygina Boris Novikov

locuz.com Big Data Services

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network

Transforming the Telecoms Business using Big Data and Analytics

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

COURSE RECOMMENDER SYSTEM IN E-LEARNING

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER

Random forest algorithm in big data environment

Research of Smart Space based on Business Intelligence

Questions to Ask When Selecting Your Customer Data Platform

The Application Research of Ant Colony Algorithm in Search Engine Jian Lan Liu1, a, Li Zhu2,b

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

Data Mining in Telecommunication

DATA MINING TECHNIQUES AND APPLICATIONS

Advanced In-Database Analytics

Sanjeev Kumar. contribute

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Course Syllabus For Operations Management. Management Information Systems

BIG DATA What it is and how to use?

Introduction. A. Bellaachia Page: 1

Medical Big Data Interpretation

Value of. Clinical and Business Data Analytics for. Healthcare Payers NOUS INFOSYSTEMS LEVERAGING INTELLECT

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Term extraction for user profiling: evaluation by the user

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

BIG DATA EXPERT KNOWLEDGE

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

The future of Big Data A United Hitachi View

How To Get A Computer Engineering Degree

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

A SURVEY ON WEB MINING TOOLS

Web analytics: Data Collected via the Internet

S.Thiripura Sundari*, Dr.A.Padmapriya**

Client Overview. Engagement Situation. Key Requirements

A Logistic Regression Approach to Ad Click Prediction

From Data to Insight: Big Data and Analytics for Smart Manufacturing Systems

The Scientific Data Mining Process

Spatial Data Mining Methods and Problems

Data Mining System, Functionalities and Applications: A Radical Review

DATA MINING AND WAREHOUSING CONCEPTS

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

Massive Cloud Auditing using Data Mining on Hadoop

Big Data: Study in Structured and Unstructured Data

Data Warehousing and Data Mining in Business Applications

Open source Google-style large scale data analysis with Hadoop

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Mobile Monetization Scenario Design & Big Data. Arther Wu Senior Director of Monetization and Business Operation

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

Search Result Optimization using Annotators

A Knowledge Management Framework Using Business Intelligence Solutions

Neural Networks in Data Mining

Vinay Parisa 1, Biswajit Mohapatra 2 ;

An Effective Analysis of Weblog Files to improve Website Performance

Exploration on Security System Structure of Smart Campus Based on Cloud Computing. Wei Zhou

Monitoring Web Browsing Habits of User Using Web Log Analysis and Role-Based Web Accessing Control. Phudinan Singkhamfu, Parinya Suwanasrikham

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Big Data Storage Architecture Design in Cloud Computing

Distributed Framework for Data Mining As a Service on Private Cloud

APPLICATION OF INTELLIGENT METHODS IN COMMERCIAL WEBSITE MARKETING STRATEGIES DEVELOPMENT

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Transcription:

3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication Technical College, 050021, Shijiazhuang, China 1 huxia-068@163.com, 1 yanfengjin@163.com KEYWORDS: big data, postal data, data mining, precision marketing ABSTRACT: With the rapid developing of cloud computing and big data technology, enterprises have accumulated vast amounts of data and all of them have got more effective treatment, so companies get greater value of market. This issue analyzed the data China Post has and proposed an architecture of China Post s data-digging system in large data environments and carried out a detailed design of precise process for China Post based on big data. Finally, we used China Post s data-digging system on the precise database marketing and personalized product recommendations and other aspects, and achieved good results. Introduction With the rising of mobile Internet, network, social network technologies and applications, the amount of data is growing rapidly in global scope. The time of big data has coming. With the rapid development of Direct Mail business, the demand of name and address data is increasing. China Post has experienced 10 years of information technology, a large number of postal service production data has accumulated, and is in urgent need of development. At the end of 2010, China Post established a data analysis team, which is responsible for the postal internal business data analysis and integration, to support the headquarters of marketing projects. Since its inception, the extraction of data samples for a variety of business is researched, the postal existing data resources was basically understanding, and the team also summarized some data analysis methods, constantly trying to build some models to produce some data products, for marketing application, and verification. But no integrated platform to store the full amount of the relevant business data, the data can not be processed, reconstructed, the early results can not be transformed into products. So a postal data mining platform based on big data structures need to be established, which establish business system data acquisition mechanism, and extract the full amount of data. And package part of common data preprocessing and data cleansing rules to the platform, and provides manufacturing capabilities and data platform for data analysis team. Postal enterprises are facing a critical period of transition, a large data processing system will be a key factor in the transformation of China Post. Postal data characteristics analysis of big data environments With Chinese postal deeply reform continued, postal services, courier logistics and finance and insurance business are booming. Integration within the plate, and the convergence between the plate become more and more prominent. The platform need a access to data sources, including postal class, express logistics, finance and so on. as follows: Feature analysis of postal data.postal service class data includes: Newspaper subscription data, savings SMS and delivering SMS data, customer data from the Post family, air ticket membership 2015. The authors - Published by Atlantis Press 643

data, Ule membership data, care package data, philately membership data, mail acceptance data, international packet customer data, packet acceptance data, account management system data, name and address date from system, organization data system. Express Data Characteristics. Express data contains courier SMS information and express acceptance data. Courier send about 4 million SMS data per month, just include two fields: e-mail and telephone numbers. Express acceptance produces 1.8 million data a day. The quality is not high before the full name and address data entry work carried out. Feature analysis of financial data. Financial data includes the Postal Savings Bank customer data, transaction data and electronic exchange savings SMS data. Electronic exchange transactions daily increasing about one million, mainly used for supporting Direct Mail. Postal Savings SMS data has 80 million customers a year, the size of transaction data about 3TB. Data mining process and methods under big data environment Postal big data mining system framework. Due to the characteristics of postal big data post, postal corporations can not apply the traditional data analysis techniques Both traditional data analysis and big data mining extract great useful information and show leanings from the data And they are the process of in-depth analysis and value-added exploitation of the data. However, there are essentially differences between them and mainly reflected in: 1The size of the data is different. The traditional data analysis and process the data which is usually stored in a database or file. The size of data is GB level or below, but the size of big data mining is generally PB level, even more massive stage. 2The types of data for these two analysis method are different. The traditional data analysis focused on static and structured data. Big data mining, on the other hand, is able to process either the structured data or semi-structured and unstructured data, often in real-time based. 3The analysis methods are also different. The main algorithm for traditional data analysis is based on the statistics. As shown in figure1, classification and Prediction are two common methods of data analysis. Big data mining need not only statistical method, but also machine learning and artificial intelligence algorithm for lots of situations. Fig.1 Big data mining framework of the postal corporations Postal big data mining process design. For the process of analysis, the traditional data analysis is relative simple. The data is usually is organized in a file or metadata in database, and then is carried on the sample selection. Meanwhile, we use the classification algorithm and prediction algorithm to predict the type of discrete and continuous values of the data object. 644

Different from traditional data analysis, big data mining is a process of knowledge discovery automatically. In the absence of defined goals, we get data from different data sources, preprocess data, and greatly use machine learning and artificial intelligence algorithm to analyze the observational data. The postal data mining focus on solving such a problem: in big data from different the postal services, valuable knowledge is from analysis of the characteristics of the user groups and users' personal characteristics further, to obtain commercial value. As shown in figure 2, the data mining process includes: data collection, data preparation, data conversion, data extraction, data mining and mining applications. Fig.2 Postal big data mining process Data collection. The postal data is collected from the postal service data, financial business data and express delivery and logistics business data, stored in the postal system platform. For the press data as an example, there is interaction( a overlap) between press data and customized address list data, and also in the data content. Therefore data should be classified on the basis of data categories and properties. Data preprocessing. Data preprocessing includes data preparation, data transformation and data extraction. Data preprocessing determines the quality of the mining results. To some extent, data preprocessing tend to affect the validity of data mining. Due to the noise data, redundant data and the missing value and so on, during the process of data preparation, data analysis, cleaning, refactoring and filling the missing value can improve the quality of the data mining. Then the data which is unstructured, semi-structured is processed into machine language or index, for example, converting natural language user reviews, log data into weighted term logic or fuzzy logic, and different words mapped to a standard of value. Structured data is filtered to extract meaningful data and eliminate the invalid data in order to improve the efficiency of analysis. The last step is data extraction, which detects the correlation and relevance of data. The relevant data showed more specific user activity characteristic. And those data itself can also be used for personalized services. Data mining and application. In the process of data mining, we select the mining model according to different application requirements for the depth of mining data. The main models include: Association Analysis, classification analysis, clustering analysis and so on. There are also some user models for data mining. These user models will group people with gender, race, age and 645

interest to get data mining results to interpreted and applied. The general mining applications, include the ranking and personalized recommendation, anomaly detection, Web data mining and search, and visual calculation and analysis of big data. The Postal data mining application under big data environment The postal data mining application is with in-depth analysis of data to dig out the user's behavioral characteristics, consumption habits and interest focus, so that postal clients can target the audiences more accurately to obtain greater business value. Postal Data mining can help postal clients to develop more accurate and effective marketing strategy. As shown in figure 3: Fig. 3 Personalized Recommendations based on Postal Data Mining Precise Marketing Based on Postal Data Mining. Postal Data Mining represents a more refined market, a more accurate prediction of user behaviors and more precise needs. By collecting, processing and handling large amounts of information related to the user's consumption behaviors, it determines interests, consumption habits, consumption trends and consumption demand for a particular user groups or individuals, and then to infer consumption behaviors for next step. And based on this, specific marketing content will target to the identified user group. Compared with the traditional mass marketing tool without distinguishing characteristics of the user targets, it saves marketing costs and improves marketing effectiveness. The Data Mining Application for Postal Clients. Understanding users key nodes in different consumer behaviors by postal data mining can provide a reference for online advertising strategies and targeted advertising to achieve personalized marketing for clients. Based on uses data, data mining can help on establishing the probabilistic knowledge base and fuzzy knowledge base, to do probability analysis on real-time online information. Through analyzing and dividing Ads visitors potential information features, marketers could decide who are the real customer to the business; through analyzing customer response to certain Ads, marketers could decide next delivery channels and timing; through cluster analysis, marketers could send out targeted ads to the targeted 646

audiences. When the accumulated data to a certain size, marketers can accurately calculate ROI of each keyword of ads for businesses through data mining and optimize advertising content. Conclusion With the development of cloud computing and data mining techniques, the inherent value of the data will be increasingly likely to be explored. With the rapid development of China's postal database Direct Mail business, the demand for lists is increasing. China Post has experienced 10 years of information technology construction and accumulated a large number of postal service data, which is need to be analyzed and used urgently. First, we analyzed China Postal existing data. Based on that we proposed the framework of postal data mining system under big data environment. Besides, the postal data mining process is designed and implemented in detail. Finally, postal big data mining can be applied to precise marketing and personalized product recommendations(referrals), etc. Applications of big data technologies will be very important for the development and transformation of postal services, And it will also provide a tremendous commercial value for China post. References [1] Qian Y, Liang J, Pedrycz W, et al. Positive approximation: An accelerator for attribute reduction in roughest theory[j].artificial Intelligence,2010,174(9-10):597-618. [2] Niu S, Guo J. Top-k learning to rank: Labeling, ranking and evaluation[c]//proceeding of the 35th international ACM SIGIR conference on Research and development in information retrieval, New York: ACM, 2012:751-760. [3] Mahony M W. Randomized algorithms for,matrices and data[j]. Foundations and Trends in machine Learning,2010,3(2):123-224. [4] Gasso G, Pappaioannou A, Spivak M, el al. Batch and online learning algorithms for non convex Ney man-pearson classification[j].acm Transaction on Intelligent System and Technologies, 2011,2(3):Article No 28. [5] Cheng X Q, The application and Scientific Problems of Big Data(Scientific Forum on math and Big Data)[R].Beijing: Chinese Academy of Sciences,2013(Ch). [6] Wang Y Z, Jin X L, Cheng X Q. Network big data: Present and future[j].chinese Journal of Computers,2013,36(6):1125-1138(Ch). [7] Gasso G, Pappaioannou A, Spivak M, et al. Batch and online learning algorithms for non convex Ney man-pearson classification[j].acm Transaction on Intelligent System and Technologies, 2011,2(3):Article No 28. 647