Index Terms Big Data, Machine Learning, Supervised learning, unsupervised learning. Reinforcement learning.
|
|
- Jonas Page
- 8 years ago
- Views:
Transcription
1 Volume 5, Issue 4, April 2015 ISSN: X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: Special Issue on Impact of Technology on Skill Development Conference Held at IETE Amravati Center, Maharashtra, India Machine Learning in Big Data Analytics: An Overview Prof. Kanchan M. Tarwani, Prof. Saleha S. Saudagar, Prof. Harshal D. Misalkar Abstract Big data is much more than storage of and access to data. Big data Analytics plays an important role in making sense of the data and exploiting its value. But it s a significant challenge to learn and develop new types of machine learning algorithms. Scaling up big data to proper dimensionality is an challenge that can encounter in machine learning algorithms plus there are challenges of dealing with velocity, volume and many more for all types of machine learning algorithms. Here, in this paper, we are first exploring big data concept, bringing with an urgent need for advanced data acquisition, management, and analysis mechanisms, we have presented the concept of big data and highlighted the four phases of big data that are generating data, acquisition of data, storing this large data, and then analyzing data. The next phase of this paper, focuses on dealing with big data using machine learning(ml), and highlighted the three ML methods: supervised learning, unsupervised learning and reinforcement learning and its impact on big data. Index Terms Big Data, Machine Learning, Supervised learning, unsupervised learning. Reinforcement learning. I. INTRODUCTION The emerging big-data paradigm, owing to its broader impact, has profoundly transformed our society and will continue to attract diverse attentions from both technological experts and the public in general. It is obvious that we are living a data deluge era, evidenced by the sheer volume of data from a variety of sources and its growing rate of generation. For instance, an IDC report [1] predicts that, from 2005 to 2020, the global data volume will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, representing a double growth every two years. The term of ``big-data'' was coined to capture the profound meaning of this data-explosion trend and indeed the data has been touted as the new oil, which is expected to transform our society. Machine learning can be defined as a Field of study that gives computers the ability to learn without being explicitly programmed. It s a science of algorithms says that, the algorithms learn from the dataset, identifying patterns or classifying trends for instance, and then automates output- whether that s sorting data into categories or making predictions on future outputs. Machine Learning is now quite a matured discipline and captures in general any activity that involves automated learning from data or experience. At the core of machine learning is the ability of a software or machine to improve the performance of certain tasks through being exposed to data and experiences. Machine learning model first learns the knowledge from the data it is exposed to and then applies this knowledge to deliver predictions about the new data which is previously unseen data [1] [7]. The quality of the predictions delivered by machine learning model depend on a number of factors: how well the relevant knowledge is represented by the module, how effectively complete and representative were the learning data, how easily we can forecast the problem in general. Most of the times, machine learning became a very good weapon at certain recognition, identification or categorization tasks like fingerprint, faces or voice recognition [1][4]. Likewise recent clustering algorithms became very good at automatically grouping people according to their profiles, identify market segments, forming communities, and even segment images or distinguish genes [1], [6]. The successful application of ML models to these problems was possible not only thanks to the learning model ingenuity but primarily thanks to the very careful data with the properties like transformative, identifiable, and generation of multidimensional that made the learning problem discriminative enough to make easy distinctions between the predicted targets based on the data [1], [2], [6]. Depending upon the depth of knowledge that is available for learning, machine learning models can be categorized into supervised, unsupervised and semisupervised learning algorithms [1], [3], [5], [7]. Big data is of two types: Structured data and Semi structured data. Structured data are numbers and words that can be easily categorized and analyzed. These data are generated by things like network sensors embedded in electronic devices, smart phones, and global positioning system (GPS) devices. Structured data also include things like account balances, sales figures, and transaction data. Unstructured data include more complex information such as reviews of customers and forums from commercial websites, photos and other multimedia, and comments on social networking sites. These data cannot easily be divided into categories or analyzed numerically. 2015, IJARCSSE All Rights Reserved Page 270
2 II. BIG DATA AND ITS IMPACT B. Big Data Phases A big-data system is complex, providing functions to deal with different phases in the digital data life cycle, ranging from its birth to its destruction. At the same time, the system usually involves multiple distinct phases for different applications [16], [17]. In this section, we focus on the chain for big data analytics. Specifically, we describe a big data chain that consists of four stages (generation, acquisition, storage, and processing). A. Layered Architecture for Big Data The big data system can be decomposed into a layered structure, as illustrated in Fig. 1. The layered structure is divisible into three layers, i.e., the infrastructure layer, the computing layer, and the application layer, from bottom to top. This layered view only provides a conceptual hierarchy to underscore the complexity of a big data system. The function of each layer is as follows. The infrastructure layer consists of a pool of ICT resources, which can be organized by cloud computing infrastructure and enabled by virtualization technology. These resources will be exposed to upper-layer systems in a _ne-grained manner with a special service-level agreement (SLA). Within this model, resources must be allocated to meet the big data demand while achieving resource efficiency by maximizing system utilization, energy awareness, operational simplification, etc. The computing layer encapsulates various data tools into a middleware layer that runs over raw ICT resources. In the context of big data, typical tools include data integration, data management, and the programming model. Data integration means acquiring data from disparate sources and integrating the dataset into a unified form with the necessary data pre-processing operations. Data management refers to mechanisms and tools that provide persistent data storage and highly efficient management, such as distributed _le systems and SQL or NoSQL data stores. The programming model implements abstraction application logic and facilitates the data analysis applications. Map Reduce [12], Dryad [13], Pregel [14], and Dremel [15] exemplify programming models. _ The application layer exploits the interface provided by the programming models to implement various data analysis functions, including querying, statistical analyses, clustering, and classification; then, it combines basic analytical methods to develop various field related applications. Fig. 1. Architecture for Big Data Fig. 2. Big Data Value Chain a. Data Generation Data Generation the first and most important phase of Big data chain. This section highlights The trends of big data generation that can be characterized by the data generation rate. Specifically, the data generation rate is increasing due to technological advancements. we roughly classify data generation patterns into three sequential stages: Stage I : The first stage began in the 1990s. As digital technology and database systems were widely adopted, many management systems in various organizations were storing large volumes of data, such as bank trading transactions, shopping mall records, and government sector archives. These datasets are structured and can be analyzed through database-based storage management systems. Stage II : The second stage began with the growing popularity of web systems. The Web 1.0 systems, characterized by web search engines and ecommerce businesses after the late 1990s, generated large amounts of semi-structured and/or unstructured data, including web pages and transaction logs. Since the early 2000s, many Web 2.0 applications created an abundance of user-generated content from online social networks, such as forums, online groups, blogs, social networking sites, and social media sites. Stage III : The third stage is triggered by the emergence of mobile devices, such as smart phones, tablets, sensors and sensor-based Internet-enabled devices. The mobile centric network has and will continue to create highly mobile, location-aware, person-centered, and context relevant data in the near future. With this classification, we can see that the data generation pattern is evolving rapidly, from passive recording in Stage I to active generation in Stage II and automatic production in Stage III. These three types of data constitute the primary sources of big data, of which the automatic production pattern will contribute the most in the near future. b. Data Acquisition As illustrated in the big data value chain, the task of the data acquisition phase is to aggregate information in a digital form for further storage and analysis. Intuitively, 2015, IJARCSSE All Rights Reserved Page 271
3 the acquisition process consists of three sub-steps, data instant the challenge came from all sides: accessing large collection, data transmission, and data pre-processing, as amounts of data, learning the models from it and carry out illustrated in Fig. 3. mass predictions all in a reasonable time. Data collection refers to the process of retrieving raw data from real-world objects. The process needs to be well Since big data processing requires decomposition, designed. Otherwise, inaccurate data collection would parallelism, modularity and/or recurrence, inflexible impact black-box type machine learning models failed in an the subsequent data analysis procedure and ultimately lead outset. In fact all machine learning algorithms with to computational complexity of O(n2) immediately become invalid results. intractable when faced with billions of data points. Data transmission refers to once we gather the raw data, we must transfer it into a data storage infrastructure, Depending on the depth of knowledge that is available for commonly in a data center, for subsequent processing. The learning, ML models can be categorized into supervised, transmission procedure can be divided into two stages, IP unsupervised and semi-supervised learning algorithms [1], backbone transmission and data center transmission. [3], [5], [7]. Fig. 3. Subtasks of Data Acquisition Data pre-processing is also of equal importance in data acquisition phase. Because of their diverse sources, the collected data sets may have different levels of quality in terms of noise, redundancy, consistency, etc. Transferring and storing raw data would have necessary costs. On the demand side, certain data analysis methods and applications might have strict requirements on data quality. As such, data preprocessing techniques that are designed to improve data quality should be in place in big data systems. c. Data Storage The data storage subsystem in a big data platform organizes the collected information in a convenient format for analysis and value extraction. For this purpose, the data storage subsystem should provide two sets of features: The storage infrastructure must accommodate information persistently and reliably. The data storage subsystem must provide a scalable access interface to query and analyze a vast quantity of data. This functional decomposition shows that the data storage subsystem can be divided into hardware infrastructure and data management. d. Data Analysis The last and most important stage of the big data chain is data analysis, the goal of which is to extract useful values, suggest conclusions and/or support decisionmaking. First, we discuss the purpose and classification metric of data analytics. Second, we review the application evolution for various data sources and summarize the six most relevant areas. Finally, we introduce several common methods that play fundamental roles in data analytics. III. MACHINE LEARNING FOR BIG DATA The core abilities of the data analytics and the machine learning to model, learn from, and predict data were very deeply affected by the emergence of big data. Almost in an A. Supervised Learning Supervised learning algorithms use labeled training examples. The input data and the target outputs are given explicitly for the model to learn the mapping or function between them. Once this is captured the model then uses the learned mapping and the new unseen input data to predict their outputs. Within supervised learning family we can further distinguish between classification models which focus on prediction of discrete (categorical) outputs or regression models which predict continues outputs. Among large number of models reported in the literature linear and nonlinear density based classifiers, decision trees, naïve Bayes, support vector machines, neural networks and nearest neighbor are the most frequently cited and applied in practical applications [1][5]. Decision trees despite their simplicity were particularly successful in various industrial applications due to their robustness, transparency and the portability of the model effectively resulting in a set of SQL queries. Despite their operational simplicity induction a decision tree is at best O(n log n) complex and still requires significant data reduction effort to handle very large data [1] [5]. Support vector machines (SVM) were also reported as very powerful in terms of achieved performance on the numerical data thanks to their explicit effort to maximise the margin of misclassification along the boundaries among the classes of data [1]- [5]. Traditional SVMs are quadratically complex O(n2), yet there are successful attempts to cleverly reduce their complexity to linear complexity [15]. We intend to expand on these developments. Neural networks (NN) are yet another example of very powerful, flexible and robust predictors for both classification and regression problems [1]- [5]. Its generic structure allows to learn virtually any function successfully with enough hidden layers of processing nodes and training data. Like for SVM, NNs are typically O(n3) or at best quadratic ally complex with the number of examples. Complex structures further expand computational demands of learning NN. There have been recent attempts to redesign NN into more scalable architectures [16], further work is required to make NNs fully scalable for big data processing. 2015, IJARCSSE All Rights Reserved Page 272
4 Fig. 4 captures a brief demonstration of a supervised TABLE I learning applied to the famous Iris species detection EXPONENTIAL GROWTH OF BIG DATA problem based on just 2-dimensional data of petal width 100 represnts Sr. No Terms the peak search content and petal length: 1 The Big Data Big Data Analytics 60 3 Data Analysis 55 4 Hadoop 45 5 Big Data Hadoop 45 6 GoopgleBig Data 30 7 IBM Big Data 20 8 Oracle Big Data 20 9 Big Data Cloud 20 This table shows the exponential growth of the big data google trends, along geographical distribution and the most popular related terms, source Fig.4 Demonstration of a supervised learning applied to the famous Iris species detection problem B. Unsupervised Learning Unsupervised learning typically referred to as clustering is concerned with recognizing natural grouping among the data i.e. separating similar from dissimilar data based on multidimensional data. Figure 2 visually depicts some examples of with k-means and mixture of Gaussians clustering methods. Many clustering methods from k-means through density based DBSCAN and hierarchical clustering models were reported for simple problems on small data sizes [6], [10] Traditional hierarchical clustering usually involves computation of the pair wise distances among the input data examples which normally involves quadratic ally complex computation step. Assigning data samples to different clusters based on the distance hierarchy poses yet another computational overhead. DBSCAN is one of the most cited clustering algorithms. It is based on the data density and uses the concept local reach ability among points to decide about cluster memberships. However, even if optimized with efficient indexing structure it is at best O(n log n) complex in the number of examples [6]. K-means clustering, on the other hand, is linearly complex in the number of examples and data dimensionality but assumes fixed number of algorithm iterations and fixed number of clusters. Due to its linear complexity it is however the most scalable clustering model to date which also evolved into a successful artificial neural network equivalent the Kohonen Nets [6]. Some recent research reported further improved scalability of k-means model when the average aggregation is replaced with the median [10]. We intend to expand on the promising scalable supervised and unsupervised learning models as well as develop the new models that flexibly manage the balance between the computational complexity of learning and the prediction performance through modularization and recurrence at different data granularity. C. Reinforcement learning Reinforcement learning lies somewhat in between supervised and unsupervised learning in a sense that learning is not done on the fully labeled or evaluated data but a hint in a form of local reward is provided to every actionable data. The objective of the reinforcement learning is to maximize the long term reward through exploration and learning the optimal action policy in response to the environmental data. Reinforcement learning found prime applications in robotics, agent technologies and many complex exploration systems requiring continuous response to the changing environment while maximizing strategic longer-term objectives IV. BIG DATA REVOLUTION The emergence of the notion of Big Data (BD) initiated the true revolution in the way we store, manage and process the data. Coming now in Exabyte per day big data and its popularity continue to grow exponentially as it can be seen from the Google Trend chart presented in the table 1. Big data are linked to the explosive growth in mobile devices and their ability to generate, collect, share and access massive amounts of textual, numeric, imagery and video data. Combined with a correlated expansion of internet resources, networked services and the cult of sharing on ever growing social networks what we are experiencing is a true data revolution. V. CONCLUSION The era of big data is upon us, bringing with it an urgent need for advanced data acquisition, management, and analysis mechanisms. In this paper, we have presented the concept of big data and highlighted the big data value chain, which covers the entire big data lifecycle. The big data value chain consists of four phases: data generation, data acquisition, data storage, and data analysis. The next part of this paper, focuses on dealing with big data using machine learning(ml), and highlighted the three ML methods: supervised learning, unsupervised learning and reinforcement learning and its impact on big data. 2015, IJARCSSE All Rights Reserved Page 273
5 REFERENCES AUTHORS Prof. Kanchan M. Tarwani Received MTech. in Information Technology from YCCE, Nagpur in Received B.E. degree in Computer Science and Engineering from PRMIT&R, Badnera, Sant Gadge baba Amravati university in [1]. T.M. Mitchell. Machine Learning. McGraw-Hill, New York, [2]. R.O. Duda, P.E. Hart and D.G. Stork. Pattern Classification. John Wiley & Sons, New York, [3]. C.M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, [4]. S. Marsland. Machine Learning, An Algorithmic Perspective. Chapman and Hall / CRC Press, Boca Raton, [5]. A. Mohri, A. Rostamizadeh and A. Talwalker. Foundations of Machine Learning. The MIT Press, Cambridge, [6]. C.C. Aggarwal and S.K. Reddy. Data Clustering, Algorithms and Applications. Chapman and Hall / CRC, Boca Raton, [7]. O. Chapelle, B. Cholkopf, A. Zien. Semi- Supervised Learning (Adaptive Computation and Machine Learning Series). MIT Press, Cambridge, [8]. J. Liebowitz. Big Data and Business Analytics. CRC Press, Boca Raton, [9]. V. Mayer-Schonberger and K. Cukier. Big Data: A Revolution That Will Transform How We Live, Work and Think. Houghton Mifflin Harcourt Publishing, New York, [10]. YC Kwon, D. Nunley, J.P. Gardner, M. Balazinska, B. Howe, S. Loebman. Scalable Clustering Algorithm for N-Body Simulations in a Shared-Nothing Cluster. Scientific and Statistical Database Management 6187: , [11] J. Gantz and D. Reinsel, ``The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east,'' in Proc. IDC iview, IDC Anal. Future, [12] J. Dean and S. Ghemawat, ``Mapreduce: Simpli_ed data processing on large clusters,'' Commun. ACM, vol. 51, no. 1, pp. 107_113, [13] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, ``Dryad: Distributed data-parallel programs from sequential building blocks,'' in Proc. 2nd ACM SIGOPS/EuroSys Eur. Conf. Comput. Syst., Jun. 2007, pp. 59_72. [14] G. Malewicz et al., ``Pregel: A system for largescale graph processing,'' in Proc. ACM SIGMOD Int. Conf. Manag. Data, Jun. 2010, pp. 135_ S. Melnik et al., ``Dremel: Interactive analysis of web-scale datasets,'' Proc. VLDB Endowment, vol. 3, nos. 1_2, pp. 330_339, [16 ] E. B. S. D. D. Agrawal et al., ``Challenges and opportunities with big data_a community white paper developed by leading researchers across the united states,'' The Computing Research Association, CRA White Paper, Feb [17] D. Fisher, R. DeLine, M. Czerwinski, and S. Drucker, ``Interactions with big data analytics,'' Interactions, vol. 19, no. 3, pp. 50_59, May Prof. Saleha S. Saudagar Received MTech. in Computer Science and Engineering from SBITM Betul University, Madhya Pradesh in Received B.E. degree in Computer Science and Engineering from Government College of Engineering, Amravati in Prof. Harshal D. Misalkar Received M.E. in Information Technology from PRMIT&R, Badnera, Sant Gadge baba Amravati university in 2014.Received B.E. in Information Technology from PRMIT&R Badnera, Sant Gadge baba Amravati university in , IJARCSSE All Rights Reserved Page 274
Is a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationIntroduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
More informationAssociate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
More informationSteven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg
Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual
More informationKeywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
More informationADVANCED MACHINE LEARNING. Introduction
1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures
More informationMachine Learning with MATLAB David Willingham Application Engineer
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
More informationA STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationCourse 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
More informationInternational Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: editor@ijermt.org November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering
More informationRobust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
More informationInternational Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com
More informationDATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
More informationKATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.
More informationA Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
More information203.4770: Introduction to Machine Learning Dr. Rita Osadchy
203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:
More informationChapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
More informationCOLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM COLLEGE OF SCIENCE John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining
More informationBIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics
BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are
More informationInternational Journal of Innovative Research in Computer and Communication Engineering
FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,
More informationData Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
More informationProcessing of Big Data. Nelson L. S. da Fonseca IEEE ComSoc Summer Scool Trento, July 9 th, 2015
Processing of Big Data Nelson L. S. da Fonseca IEEE ComSoc Summer Scool Trento, July 9 th, 2015 Acknowledgement Some slides in this set of slides were provided by EMC Corporation and Sandra Avila, University
More informationBig Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014
White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
More informationISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
More informationDatabase Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
More informationBig Data Analytic and Mining with Machine Learning Algorithm
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 1 (2014), pp. 33-40 International Research Publications House http://www. irphouse.com /ijict.htm Big Data
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More informationMS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
More informationRole of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)
More informationBig Data Racing and parallel Database Technology
EFFICIENT DATA ANALYSIS SCHEME FOR INCREASING PERFORMANCE IN BIG DATA Mr. V. Vivekanandan Computer Science and Engineering, SriGuru Institute of Technology, Coimbatore, Tamilnadu, India. Abstract Big data
More informationExploring Big Data in Social Networks
Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
More informationChapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
More informationNetwork Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016
Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00
More informationText Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
More informationCLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA
CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab
More informationData Isn't Everything
June 17, 2015 Innovate Forward Data Isn't Everything The Challenges of Big Data, Advanced Analytics, and Advance Computation Devices for Transportation Agencies. Using Data to Support Mission, Administration,
More informationAn Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationThe 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
More informationComparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
More informationBIG DATA CHALLENGES AND PERSPECTIVES
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
More informationUnderstanding traffic flow
White Paper A Real-time Data Hub For Smarter City Applications Intelligent Transportation Innovation for Real-time Traffic Flow Analytics with Dynamic Congestion Management 2 Understanding traffic flow
More informationBIG DATA-AS-A-SERVICE
White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers
More informationReconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets
Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets Aaditya Prakash, Infosys Limited aaadityaprakash@gmail.com Abstract--Self-Organizing
More informationIntroduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationSanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a
More informationBig Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
More informationOperations Research and Knowledge Modeling in Data Mining
Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp
More informationData Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationBig Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
More informationPacket Flow Analysis and Congestion Control of Big Data by Hadoop
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.456
More informationComparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
More informationTransforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
More informationResearch of Postal Data mining system based on big data
3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication
More informationCURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING
Journal homepage: http://www.journalijar.com INTERNATIONAL JOURNAL OF ADVANCED RESEARCH RESEARCH ARTICLE CURTAIL THE EXPENDITURE OF BIG DATA PROCESSING USING MIXED INTEGER NON-LINEAR PROGRAMMING R.Kohila
More informationDistributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
More informationWrangling Actionable Insights from Organizational Data
Wrangling Actionable Insights from Organizational Data Koverse Eases Big Data Analytics for Those with Strong Security Requirements The amount of data created and stored by organizations around the world
More informationIntroduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationMaster big data to optimize the oil and gas lifecycle
Viewpoint paper Master big data to optimize the oil and gas lifecycle Information management and analytics (IM&A) helps move decisions from reactive to predictive Table of contents 4 Getting a handle on
More informationStatistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
More informationFramework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b
International Conference on Materials Engineering and Information Technology Applications (MEITA 2015) Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b 1
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationSAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics
SAP Brief SAP HANA Objectives Transform Your Future with Better Business Insight Using Predictive Analytics Dealing with the new reality Dealing with the new reality Organizations like yours can identify
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationDetection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
More informationTesting Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
More informationResearch on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
More informationOpen Access Research of Massive Spatiotemporal Data Mining Technology Based on Cloud Computing
Send Orders for Reprints to reprints@benthamscience.ae 2244 The Open Automation and Control Systems Journal, 2015, 7, 2244-2252 Open Access Research of Massive Spatiotemporal Data Mining Technology Based
More informationA Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More informationRole of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
More informationA STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.
More informationNEURAL NETWORKS IN DATA MINING
NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,
More informationHow To Use Data Mining For Knowledge Management In Technology Enhanced Learning
Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationA New Approach For Estimating Software Effort Using RBFN Network
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.7, July 008 37 A New Approach For Estimating Software Using RBFN Network Ch. Satyananda Reddy, P. Sankara Rao, KVSVN Raju,
More informationName: Srinivasan Govindaraj Title: Big Data Predictive Analytics
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationCOMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
More informationOperating Stoop for Efficient Parallel Data Processing In Cloud
RESEARCH INVENTY: International Journal of Engineering and Science ISBN: 2319-6483, ISSN: 2278-4721, Vol. 1, Issue 10 (December 2012), PP 11-15 www.researchinventy.com Operating Stoop for Efficient Parallel
More informationThe Next Wave of Data Management. Is Big Data The New Normal?
The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationMachine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu
Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel
More informationMachine Learning. 01 - Introduction
Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer
More information