Index Terms Big Data, Machine Learning, Supervised learning, unsupervised learning. Reinforcement learning.

Size: px
Start display at page:

Download "Index Terms Big Data, Machine Learning, Supervised learning, unsupervised learning. Reinforcement learning."

Transcription

1 Volume 5, Issue 4, April 2015 ISSN: X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: Special Issue on Impact of Technology on Skill Development Conference Held at IETE Amravati Center, Maharashtra, India Machine Learning in Big Data Analytics: An Overview Prof. Kanchan M. Tarwani, Prof. Saleha S. Saudagar, Prof. Harshal D. Misalkar Abstract Big data is much more than storage of and access to data. Big data Analytics plays an important role in making sense of the data and exploiting its value. But it s a significant challenge to learn and develop new types of machine learning algorithms. Scaling up big data to proper dimensionality is an challenge that can encounter in machine learning algorithms plus there are challenges of dealing with velocity, volume and many more for all types of machine learning algorithms. Here, in this paper, we are first exploring big data concept, bringing with an urgent need for advanced data acquisition, management, and analysis mechanisms, we have presented the concept of big data and highlighted the four phases of big data that are generating data, acquisition of data, storing this large data, and then analyzing data. The next phase of this paper, focuses on dealing with big data using machine learning(ml), and highlighted the three ML methods: supervised learning, unsupervised learning and reinforcement learning and its impact on big data. Index Terms Big Data, Machine Learning, Supervised learning, unsupervised learning. Reinforcement learning. I. INTRODUCTION The emerging big-data paradigm, owing to its broader impact, has profoundly transformed our society and will continue to attract diverse attentions from both technological experts and the public in general. It is obvious that we are living a data deluge era, evidenced by the sheer volume of data from a variety of sources and its growing rate of generation. For instance, an IDC report [1] predicts that, from 2005 to 2020, the global data volume will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, representing a double growth every two years. The term of ``big-data'' was coined to capture the profound meaning of this data-explosion trend and indeed the data has been touted as the new oil, which is expected to transform our society. Machine learning can be defined as a Field of study that gives computers the ability to learn without being explicitly programmed. It s a science of algorithms says that, the algorithms learn from the dataset, identifying patterns or classifying trends for instance, and then automates output- whether that s sorting data into categories or making predictions on future outputs. Machine Learning is now quite a matured discipline and captures in general any activity that involves automated learning from data or experience. At the core of machine learning is the ability of a software or machine to improve the performance of certain tasks through being exposed to data and experiences. Machine learning model first learns the knowledge from the data it is exposed to and then applies this knowledge to deliver predictions about the new data which is previously unseen data [1] [7]. The quality of the predictions delivered by machine learning model depend on a number of factors: how well the relevant knowledge is represented by the module, how effectively complete and representative were the learning data, how easily we can forecast the problem in general. Most of the times, machine learning became a very good weapon at certain recognition, identification or categorization tasks like fingerprint, faces or voice recognition [1][4]. Likewise recent clustering algorithms became very good at automatically grouping people according to their profiles, identify market segments, forming communities, and even segment images or distinguish genes [1], [6]. The successful application of ML models to these problems was possible not only thanks to the learning model ingenuity but primarily thanks to the very careful data with the properties like transformative, identifiable, and generation of multidimensional that made the learning problem discriminative enough to make easy distinctions between the predicted targets based on the data [1], [2], [6]. Depending upon the depth of knowledge that is available for learning, machine learning models can be categorized into supervised, unsupervised and semisupervised learning algorithms [1], [3], [5], [7]. Big data is of two types: Structured data and Semi structured data. Structured data are numbers and words that can be easily categorized and analyzed. These data are generated by things like network sensors embedded in electronic devices, smart phones, and global positioning system (GPS) devices. Structured data also include things like account balances, sales figures, and transaction data. Unstructured data include more complex information such as reviews of customers and forums from commercial websites, photos and other multimedia, and comments on social networking sites. These data cannot easily be divided into categories or analyzed numerically. 2015, IJARCSSE All Rights Reserved Page 270

2 II. BIG DATA AND ITS IMPACT B. Big Data Phases A big-data system is complex, providing functions to deal with different phases in the digital data life cycle, ranging from its birth to its destruction. At the same time, the system usually involves multiple distinct phases for different applications [16], [17]. In this section, we focus on the chain for big data analytics. Specifically, we describe a big data chain that consists of four stages (generation, acquisition, storage, and processing). A. Layered Architecture for Big Data The big data system can be decomposed into a layered structure, as illustrated in Fig. 1. The layered structure is divisible into three layers, i.e., the infrastructure layer, the computing layer, and the application layer, from bottom to top. This layered view only provides a conceptual hierarchy to underscore the complexity of a big data system. The function of each layer is as follows. The infrastructure layer consists of a pool of ICT resources, which can be organized by cloud computing infrastructure and enabled by virtualization technology. These resources will be exposed to upper-layer systems in a _ne-grained manner with a special service-level agreement (SLA). Within this model, resources must be allocated to meet the big data demand while achieving resource efficiency by maximizing system utilization, energy awareness, operational simplification, etc. The computing layer encapsulates various data tools into a middleware layer that runs over raw ICT resources. In the context of big data, typical tools include data integration, data management, and the programming model. Data integration means acquiring data from disparate sources and integrating the dataset into a unified form with the necessary data pre-processing operations. Data management refers to mechanisms and tools that provide persistent data storage and highly efficient management, such as distributed _le systems and SQL or NoSQL data stores. The programming model implements abstraction application logic and facilitates the data analysis applications. Map Reduce [12], Dryad [13], Pregel [14], and Dremel [15] exemplify programming models. _ The application layer exploits the interface provided by the programming models to implement various data analysis functions, including querying, statistical analyses, clustering, and classification; then, it combines basic analytical methods to develop various field related applications. Fig. 1. Architecture for Big Data Fig. 2. Big Data Value Chain a. Data Generation Data Generation the first and most important phase of Big data chain. This section highlights The trends of big data generation that can be characterized by the data generation rate. Specifically, the data generation rate is increasing due to technological advancements. we roughly classify data generation patterns into three sequential stages: Stage I : The first stage began in the 1990s. As digital technology and database systems were widely adopted, many management systems in various organizations were storing large volumes of data, such as bank trading transactions, shopping mall records, and government sector archives. These datasets are structured and can be analyzed through database-based storage management systems. Stage II : The second stage began with the growing popularity of web systems. The Web 1.0 systems, characterized by web search engines and ecommerce businesses after the late 1990s, generated large amounts of semi-structured and/or unstructured data, including web pages and transaction logs. Since the early 2000s, many Web 2.0 applications created an abundance of user-generated content from online social networks, such as forums, online groups, blogs, social networking sites, and social media sites. Stage III : The third stage is triggered by the emergence of mobile devices, such as smart phones, tablets, sensors and sensor-based Internet-enabled devices. The mobile centric network has and will continue to create highly mobile, location-aware, person-centered, and context relevant data in the near future. With this classification, we can see that the data generation pattern is evolving rapidly, from passive recording in Stage I to active generation in Stage II and automatic production in Stage III. These three types of data constitute the primary sources of big data, of which the automatic production pattern will contribute the most in the near future. b. Data Acquisition As illustrated in the big data value chain, the task of the data acquisition phase is to aggregate information in a digital form for further storage and analysis. Intuitively, 2015, IJARCSSE All Rights Reserved Page 271

3 the acquisition process consists of three sub-steps, data instant the challenge came from all sides: accessing large collection, data transmission, and data pre-processing, as amounts of data, learning the models from it and carry out illustrated in Fig. 3. mass predictions all in a reasonable time. Data collection refers to the process of retrieving raw data from real-world objects. The process needs to be well Since big data processing requires decomposition, designed. Otherwise, inaccurate data collection would parallelism, modularity and/or recurrence, inflexible impact black-box type machine learning models failed in an the subsequent data analysis procedure and ultimately lead outset. In fact all machine learning algorithms with to computational complexity of O(n2) immediately become invalid results. intractable when faced with billions of data points. Data transmission refers to once we gather the raw data, we must transfer it into a data storage infrastructure, Depending on the depth of knowledge that is available for commonly in a data center, for subsequent processing. The learning, ML models can be categorized into supervised, transmission procedure can be divided into two stages, IP unsupervised and semi-supervised learning algorithms [1], backbone transmission and data center transmission. [3], [5], [7]. Fig. 3. Subtasks of Data Acquisition Data pre-processing is also of equal importance in data acquisition phase. Because of their diverse sources, the collected data sets may have different levels of quality in terms of noise, redundancy, consistency, etc. Transferring and storing raw data would have necessary costs. On the demand side, certain data analysis methods and applications might have strict requirements on data quality. As such, data preprocessing techniques that are designed to improve data quality should be in place in big data systems. c. Data Storage The data storage subsystem in a big data platform organizes the collected information in a convenient format for analysis and value extraction. For this purpose, the data storage subsystem should provide two sets of features: The storage infrastructure must accommodate information persistently and reliably. The data storage subsystem must provide a scalable access interface to query and analyze a vast quantity of data. This functional decomposition shows that the data storage subsystem can be divided into hardware infrastructure and data management. d. Data Analysis The last and most important stage of the big data chain is data analysis, the goal of which is to extract useful values, suggest conclusions and/or support decisionmaking. First, we discuss the purpose and classification metric of data analytics. Second, we review the application evolution for various data sources and summarize the six most relevant areas. Finally, we introduce several common methods that play fundamental roles in data analytics. III. MACHINE LEARNING FOR BIG DATA The core abilities of the data analytics and the machine learning to model, learn from, and predict data were very deeply affected by the emergence of big data. Almost in an A. Supervised Learning Supervised learning algorithms use labeled training examples. The input data and the target outputs are given explicitly for the model to learn the mapping or function between them. Once this is captured the model then uses the learned mapping and the new unseen input data to predict their outputs. Within supervised learning family we can further distinguish between classification models which focus on prediction of discrete (categorical) outputs or regression models which predict continues outputs. Among large number of models reported in the literature linear and nonlinear density based classifiers, decision trees, naïve Bayes, support vector machines, neural networks and nearest neighbor are the most frequently cited and applied in practical applications [1][5]. Decision trees despite their simplicity were particularly successful in various industrial applications due to their robustness, transparency and the portability of the model effectively resulting in a set of SQL queries. Despite their operational simplicity induction a decision tree is at best O(n log n) complex and still requires significant data reduction effort to handle very large data [1] [5]. Support vector machines (SVM) were also reported as very powerful in terms of achieved performance on the numerical data thanks to their explicit effort to maximise the margin of misclassification along the boundaries among the classes of data [1]- [5]. Traditional SVMs are quadratically complex O(n2), yet there are successful attempts to cleverly reduce their complexity to linear complexity [15]. We intend to expand on these developments. Neural networks (NN) are yet another example of very powerful, flexible and robust predictors for both classification and regression problems [1]- [5]. Its generic structure allows to learn virtually any function successfully with enough hidden layers of processing nodes and training data. Like for SVM, NNs are typically O(n3) or at best quadratic ally complex with the number of examples. Complex structures further expand computational demands of learning NN. There have been recent attempts to redesign NN into more scalable architectures [16], further work is required to make NNs fully scalable for big data processing. 2015, IJARCSSE All Rights Reserved Page 272

4 Fig. 4 captures a brief demonstration of a supervised TABLE I learning applied to the famous Iris species detection EXPONENTIAL GROWTH OF BIG DATA problem based on just 2-dimensional data of petal width 100 represnts Sr. No Terms the peak search content and petal length: 1 The Big Data Big Data Analytics 60 3 Data Analysis 55 4 Hadoop 45 5 Big Data Hadoop 45 6 GoopgleBig Data 30 7 IBM Big Data 20 8 Oracle Big Data 20 9 Big Data Cloud 20 This table shows the exponential growth of the big data google trends, along geographical distribution and the most popular related terms, source Fig.4 Demonstration of a supervised learning applied to the famous Iris species detection problem B. Unsupervised Learning Unsupervised learning typically referred to as clustering is concerned with recognizing natural grouping among the data i.e. separating similar from dissimilar data based on multidimensional data. Figure 2 visually depicts some examples of with k-means and mixture of Gaussians clustering methods. Many clustering methods from k-means through density based DBSCAN and hierarchical clustering models were reported for simple problems on small data sizes [6], [10] Traditional hierarchical clustering usually involves computation of the pair wise distances among the input data examples which normally involves quadratic ally complex computation step. Assigning data samples to different clusters based on the distance hierarchy poses yet another computational overhead. DBSCAN is one of the most cited clustering algorithms. It is based on the data density and uses the concept local reach ability among points to decide about cluster memberships. However, even if optimized with efficient indexing structure it is at best O(n log n) complex in the number of examples [6]. K-means clustering, on the other hand, is linearly complex in the number of examples and data dimensionality but assumes fixed number of algorithm iterations and fixed number of clusters. Due to its linear complexity it is however the most scalable clustering model to date which also evolved into a successful artificial neural network equivalent the Kohonen Nets [6]. Some recent research reported further improved scalability of k-means model when the average aggregation is replaced with the median [10]. We intend to expand on the promising scalable supervised and unsupervised learning models as well as develop the new models that flexibly manage the balance between the computational complexity of learning and the prediction performance through modularization and recurrence at different data granularity. C. Reinforcement learning Reinforcement learning lies somewhat in between supervised and unsupervised learning in a sense that learning is not done on the fully labeled or evaluated data but a hint in a form of local reward is provided to every actionable data. The objective of the reinforcement learning is to maximize the long term reward through exploration and learning the optimal action policy in response to the environmental data. Reinforcement learning found prime applications in robotics, agent technologies and many complex exploration systems requiring continuous response to the changing environment while maximizing strategic longer-term objectives IV. BIG DATA REVOLUTION The emergence of the notion of Big Data (BD) initiated the true revolution in the way we store, manage and process the data. Coming now in Exabyte per day big data and its popularity continue to grow exponentially as it can be seen from the Google Trend chart presented in the table 1. Big data are linked to the explosive growth in mobile devices and their ability to generate, collect, share and access massive amounts of textual, numeric, imagery and video data. Combined with a correlated expansion of internet resources, networked services and the cult of sharing on ever growing social networks what we are experiencing is a true data revolution. V. CONCLUSION The era of big data is upon us, bringing with it an urgent need for advanced data acquisition, management, and analysis mechanisms. In this paper, we have presented the concept of big data and highlighted the big data value chain, which covers the entire big data lifecycle. The big data value chain consists of four phases: data generation, data acquisition, data storage, and data analysis. The next part of this paper, focuses on dealing with big data using machine learning(ml), and highlighted the three ML methods: supervised learning, unsupervised learning and reinforcement learning and its impact on big data. 2015, IJARCSSE All Rights Reserved Page 273

5 REFERENCES AUTHORS Prof. Kanchan M. Tarwani Received MTech. in Information Technology from YCCE, Nagpur in Received B.E. degree in Computer Science and Engineering from PRMIT&R, Badnera, Sant Gadge baba Amravati university in [1]. T.M. Mitchell. Machine Learning. McGraw-Hill, New York, [2]. R.O. Duda, P.E. Hart and D.G. Stork. Pattern Classification. John Wiley & Sons, New York, [3]. C.M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, [4]. S. Marsland. Machine Learning, An Algorithmic Perspective. Chapman and Hall / CRC Press, Boca Raton, [5]. A. Mohri, A. Rostamizadeh and A. Talwalker. Foundations of Machine Learning. The MIT Press, Cambridge, [6]. C.C. Aggarwal and S.K. Reddy. Data Clustering, Algorithms and Applications. Chapman and Hall / CRC, Boca Raton, [7]. O. Chapelle, B. Cholkopf, A. Zien. Semi- Supervised Learning (Adaptive Computation and Machine Learning Series). MIT Press, Cambridge, [8]. J. Liebowitz. Big Data and Business Analytics. CRC Press, Boca Raton, [9]. V. Mayer-Schonberger and K. Cukier. Big Data: A Revolution That Will Transform How We Live, Work and Think. Houghton Mifflin Harcourt Publishing, New York, [10]. YC Kwon, D. Nunley, J.P. Gardner, M. Balazinska, B. Howe, S. Loebman. Scalable Clustering Algorithm for N-Body Simulations in a Shared-Nothing Cluster. Scientific and Statistical Database Management 6187: , [11] J. Gantz and D. Reinsel, ``The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east,'' in Proc. IDC iview, IDC Anal. Future, [12] J. Dean and S. Ghemawat, ``Mapreduce: Simpli_ed data processing on large clusters,'' Commun. ACM, vol. 51, no. 1, pp. 107_113, [13] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, ``Dryad: Distributed data-parallel programs from sequential building blocks,'' in Proc. 2nd ACM SIGOPS/EuroSys Eur. Conf. Comput. Syst., Jun. 2007, pp. 59_72. [14] G. Malewicz et al., ``Pregel: A system for largescale graph processing,'' in Proc. ACM SIGMOD Int. Conf. Manag. Data, Jun. 2010, pp. 135_ S. Melnik et al., ``Dremel: Interactive analysis of web-scale datasets,'' Proc. VLDB Endowment, vol. 3, nos. 1_2, pp. 330_339, [16 ] E. B. S. D. D. Agrawal et al., ``Challenges and opportunities with big data_a community white paper developed by leading researchers across the united states,'' The Computing Research Association, CRA White Paper, Feb [17] D. Fisher, R. DeLine, M. Czerwinski, and S. Drucker, ``Interactions with big data analytics,'' Interactions, vol. 19, no. 3, pp. 50_59, May Prof. Saleha S. Saudagar Received MTech. in Computer Science and Engineering from SBITM Betul University, Madhya Pradesh in Received B.E. degree in Computer Science and Engineering from Government College of Engineering, Amravati in Prof. Harshal D. Misalkar Received M.E. in Information Technology from PRMIT&R, Badnera, Sant Gadge baba Amravati university in 2014.Received B.E. in Information Technology from PRMIT&R Badnera, Sant Gadge baba Amravati university in , IJARCSSE All Rights Reserved Page 274

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg

Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual

More information

Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python. Vikram Kamath Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

More information

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6 International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: editor@ijermt.org November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering

More information

ADVANCED MACHINE LEARNING. Introduction

ADVANCED MACHINE LEARNING. Introduction 1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate

More information

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

203.4770: Introduction to Machine Learning Dr. Rita Osadchy

203.4770: Introduction to Machine Learning Dr. Rita Osadchy 203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM COLLEGE OF SCIENCE John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining

More information

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016 Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

International Journal of Innovative Research in Computer and Communication Engineering

International Journal of Innovative Research in Computer and Communication Engineering FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014 White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page

More information

Processing of Big Data. Nelson L. S. da Fonseca IEEE ComSoc Summer Scool Trento, July 9 th, 2015

Processing of Big Data. Nelson L. S. da Fonseca IEEE ComSoc Summer Scool Trento, July 9 th, 2015 Processing of Big Data Nelson L. S. da Fonseca IEEE ComSoc Summer Scool Trento, July 9 th, 2015 Acknowledgement Some slides in this set of slides were provided by EMC Corporation and Sandra Avila, University

More information

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Chapter 5. Warehousing, Data Acquisition, Data. Visualization Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives

More information

EFFICIENT DATA ANALYSIS SCHEME FOR INCREASING PERFORMANCE IN BIG DATA

EFFICIENT DATA ANALYSIS SCHEME FOR INCREASING PERFORMANCE IN BIG DATA EFFICIENT DATA ANALYSIS SCHEME FOR INCREASING PERFORMANCE IN BIG DATA Mr. V. Vivekanandan Computer Science and Engineering, SriGuru Institute of Technology, Coimbatore, Tamilnadu, India. Abstract Big data

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA Professor Yang Xiang Network Security and Computing Laboratory (NSCLab) School of Information Technology Deakin University, Melbourne, Australia http://anss.org.au/nsclab

More information

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Data Isn't Everything

Data Isn't Everything June 17, 2015 Innovate Forward Data Isn't Everything The Challenges of Big Data, Advanced Analytics, and Advance Computation Devices for Transportation Agencies. Using Data to Support Mission, Administration,

More information

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

More information

Operations Research and Knowledge Modeling in Data Mining

Operations Research and Knowledge Modeling in Data Mining Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Big Data Analytic and Mining with Machine Learning Algorithm

Big Data Analytic and Mining with Machine Learning Algorithm International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 1 (2014), pp. 33-40 International Research Publications House http://www. irphouse.com /ijict.htm Big Data

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Understanding traffic flow

Understanding traffic flow White Paper A Real-time Data Hub For Smarter City Applications Intelligent Transportation Innovation for Real-time Traffic Flow Analytics with Dynamic Congestion Management 2 Understanding traffic flow

More information

Packet Flow Analysis and Congestion Control of Big Data by Hadoop

Packet Flow Analysis and Congestion Control of Big Data by Hadoop Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.456

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics SAP Brief SAP HANA Objectives Transform Your Future with Better Business Insight Using Predictive Analytics Dealing with the new reality Dealing with the new reality Organizations like yours can identify

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

BIG DATA-AS-A-SERVICE

BIG DATA-AS-A-SERVICE White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers

More information

Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets

Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets Reconstructing Self Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets Aaditya Prakash, Infosys Limited aaadityaprakash@gmail.com Abstract--Self-Organizing

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b

Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b International Conference on Materials Engineering and Information Technology Applications (MEITA 2015) Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b 1

More information

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)

More information

Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

More information

Big Data. Fast Forward. Putting data to productive use

Big Data. Fast Forward. Putting data to productive use Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize

More information

A New Approach For Estimating Software Effort Using RBFN Network

A New Approach For Estimating Software Effort Using RBFN Network IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.7, July 008 37 A New Approach For Estimating Software Using RBFN Network Ch. Satyananda Reddy, P. Sankara Rao, KVSVN Raju,

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Operating Stoop for Efficient Parallel Data Processing In Cloud

Operating Stoop for Efficient Parallel Data Processing In Cloud RESEARCH INVENTY: International Journal of Engineering and Science ISBN: 2319-6483, ISSN: 2278-4721, Vol. 1, Issue 10 (December 2012), PP 11-15 www.researchinventy.com Operating Stoop for Efficient Parallel

More information

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Research of Postal Data mining system based on big data

Research of Postal Data mining system based on big data 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

More information

Machine Learning. 01 - Introduction

Machine Learning. 01 - Introduction Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge

More information

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

Wrangling Actionable Insights from Organizational Data

Wrangling Actionable Insights from Organizational Data Wrangling Actionable Insights from Organizational Data Koverse Eases Big Data Analytics for Those with Strong Security Requirements The amount of data created and stored by organizations around the world

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Master big data to optimize the oil and gas lifecycle

Master big data to optimize the oil and gas lifecycle Viewpoint paper Master big data to optimize the oil and gas lifecycle Information management and analytics (IM&A) helps move decisions from reactive to predictive Table of contents 4 Getting a handle on

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning.  CS 2750 Machine Learning. Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x-5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Open Access Research of Massive Spatiotemporal Data Mining Technology Based on Cloud Computing

Open Access Research of Massive Spatiotemporal Data Mining Technology Based on Cloud Computing Send Orders for Reprints to reprints@benthamscience.ae 2244 The Open Automation and Control Systems Journal, 2015, 7, 2244-2252 Open Access Research of Massive Spatiotemporal Data Mining Technology Based

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Role of Social Networking in Marketing using Data Mining

Role of Social Networking in Marketing using Data Mining Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.

More information

Data Mining for Knowledge Management in Technology Enhanced Learning

Data Mining for Knowledge Management in Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6. Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

NEURAL NETWORKS IN DATA MINING

NEURAL NETWORKS IN DATA MINING NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

More information

A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Chameleon: The Performance Tuning Tool for MapReduce Query Processing Systems

Chameleon: The Performance Tuning Tool for MapReduce Query Processing Systems paper:38 Chameleon: The Performance Tuning Tool for MapReduce Query Processing Systems Edson Ramiro Lucsa Filho 1, Ivan Luiz Picoli 2, Eduardo Cunha de Almeida 2, Yves Le Traon 1 1 University of Luxembourg

More information