LEVERAGING BIG DATA ANALYTICS THROUGH ANALYTICS-AS-A-SERVICE (AAAS) TOOL




INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES, Volume VI / Issue 2 / JAN 2016

1 SRAVAN RENTALA, 2 V UMA RANI
1 M.Tech Student, Department of CSE, School of Information Technology, JNTUH, Kukatpally, Hyderabad, Telangana State, India.
2 Assistant Professor, Department of CSE, School of Information Technology, JNTUH, Kukatpally, Hyderabad, Telangana State, India.

Abstract
Big Data has become the latest IT buzzword in today's competitive world. The term comes from the fact that we generate huge and growing volumes of data every day across different domains and platforms. It emerged from the domain of computer science. Big Data is more about how large the data is than about its complexity. In this paper, we briefly describe Big Data's progression from RDBMS, data mining, image mining, and computer vision, along with its various data storage techniques, which have created many challenging tasks both for software engineers and for infrastructure management services. This paper also illustrates how the interaction of Big Data, mobile, and cloud computing creates new opportunities and becomes a key enabler of, and source of demand for, bigger, better, and faster applications. Big Data has found applications in all sectors and is thus becoming a dominant class of applications deployed over virtualized environments.

Keywords: big data, analytics, mobile, RDBMS, virtualization, applications.

I. INTRODUCTION
Society is becoming more and more instrumented and, as a result, organisations are producing and storing vast amounts of data. Managing and gaining insights from the produced data is a challenge and key to competitive advantage.
Analytics solutions that mine structured and unstructured data are important because they can help organisations gain insights not only from their privately acquired data, but also from the large amounts of data publicly available on the Web. The ability to cross-relate private information on consumer preferences and products with information from tweets, blogs, product evaluations, and data from social networks opens a wide range of possibilities for organisations to understand the needs of their customers, predict their wants and demands, and optimise the use of resources. This paradigm is popularly termed Big Data.

Despite the popularity of analytics and Big Data, putting them into practice is still a complex and time-consuming endeavour. As Yu points out, Big Data offers substantial value to organisations willing to adopt it, but at the same time poses a considerable number of challenges for the realisation of such added value. An organisation willing to use analytics technology frequently acquires expensive software licences; employs large computing infrastructure; and pays for consulting hours of analysts who work with the organisation to better understand its business, organise its data, and integrate it for analytics. This joint effort of organisation and analysts often aims to help the organisation understand its customers' needs, behaviours, and future demands for new products or marketing strategies. Such effort, however, is generally costly and often lacks flexibility. Nevertheless, research and application of Big Data are being extensively explored by governments, as evidenced by initiatives from the USA and Britain; by academics, such as the bigdata@csail initiative from the Massachusetts Institute of Technology; and by companies such as Intel.

Cloud computing has been revolutionising the IT industry by adding flexibility to the way IT is consumed, enabling organisations to pay only for the resources and services they use. In an effort to reduce IT capital and operational expenditures, organisations of all sizes are using Clouds to provide the resources required to run their applications. Clouds vary significantly in their specific technologies and implementation, but often provide infrastructure, platform, and software resources as services. The most often claimed benefits of Clouds include offering resources in a pay-as-you-go fashion, improved availability and elasticity, and cost reduction. Clouds can prevent organisations from spending money on maintaining peak-provisioned IT infrastructure that they are unlikely to use most of the time. Whilst at first glance the value proposition of Clouds as a platform to carry out analytics is strong, there are many challenges that need to be overcome to make Clouds an ideal platform for scalable analytics.

In this article we survey approaches, environments, and technologies in areas that are key to Big Data analytics capabilities and discuss how they help in building analytics solutions for Clouds. We focus on the most important technical issues in enabling Cloud analytics, but also highlight some of the non-technical challenges faced by organisations that want to provide analytics as a service in the Cloud. In addition, we describe a set of gaps and recommendations for the research community on future directions in Cloud-supported Big Data computing.

II. RELATED WORK
Organisations are increasingly generating large volumes of data as a result of instrumented business processes, monitoring of user activity, website tracking, sensors, finance, and accounting, among other reasons.
With the advent of social network websites, users create records of their lives by daily posting details of activities they perform, events they attend, places they visit, pictures they take, and things they enjoy and want. This data deluge is often referred to as Big Data, a term that conveys the challenges it poses on existing infrastructure with respect to storage, management, interoperability, governance, and analysis of the data.

In today's competitive market, being able to explore data to understand customer behaviour, segment the customer base, offer customised services, and gain insights from data provided by multiple sources is key to competitive advantage. Although decision makers would like to base their decisions and actions on insights gained from this data, making sense of data, extracting non-obvious patterns, and using these patterns to predict future behaviour are not new topics. Knowledge Discovery in Data (KDD) aims to extract non-obvious information using careful and detailed analysis and interpretation. Data mining [133,84], more specifically, aims to discover previously unknown interrelations among apparently unrelated attributes of data sets by applying methods from several areas including machine learning, database systems, and statistics. Analytics comprises techniques of KDD, data mining, text mining, statistical and quantitative analysis, explanatory and predictive models, and advanced and interactive visualisation to drive decisions and actions.

A traditional analytics workflow for Big Data has the following common phases. Data from various sources, including databases, streams, marts, and data warehouses, are used to build models. The large volume and different types of the data can demand pre-processing tasks for integrating the data, cleaning it, and filtering it. The prepared data is used to train a model and to estimate its parameters.
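As a minimal sketch of the pre-processing and training steps just described, the Python example below integrates two hypothetical sources, cleans and filters the records, and estimates the parameters of a simple linear model by ordinary least squares. The record layout, the field names (`visits`, `spend`), and the choice of a linear model are assumptions made for illustration; the text does not prescribe any particular model.

```python
def integrate(*sources):
    """Merge records from several sources into one list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def clean(records):
    """Drop records with missing fields and coerce values to float."""
    cleaned = []
    for rec in records:
        if rec.get("spend") is None or rec.get("visits") is None:
            continue
        cleaned.append({"visits": float(rec["visits"]),
                        "spend": float(rec["spend"])})
    return cleaned


def keep_plausible(records, max_visits=1000):
    """Filter out records whose visit count is outside a plausible range."""
    return [r for r in records if 0 <= r["visits"] <= max_visits]


def fit_line(records):
    """Estimate slope and intercept of spend = slope * visits + intercept."""
    n = len(records)
    xs = [r["visits"] for r in records]
    ys = [r["spend"] for r in records]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical inputs standing in for a database extract and a stream.
database_rows = [{"visits": 1, "spend": 12.0}, {"visits": 2, "spend": "14"}]
stream_rows = [{"visits": 3, "spend": 16.0},
               {"visits": None, "spend": 5.0},   # removed by clean()
               {"visits": 5000, "spend": 1.0}]   # removed by keep_plausible()

prepared = keep_plausible(clean(integrate(database_rows, stream_rows)))
slope, intercept = fit_line(prepared)
```

Keeping each step as a separate function mirrors the phase boundaries of the workflow, so any one step can be swapped out without touching the others.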
Once the model is estimated, it should be validated before its consumption. Normally this phase requires the use of the original input data and specific methods to validate the created model. Finally, the model is consumed and applied to data as it arrives. This phase, called model scoring, is used to generate predictions, prescriptions, and recommendations. The results are interpreted and evaluated, used to generate new models or calibrate existing ones, or integrated into pre-processed data.

Analytics solutions can be classified as descriptive, predictive, or prescriptive, as illustrated in the accompanying figure. Descriptive analytics uses historical data to identify patterns and create management reports; it is concerned with modelling past behaviour. Predictive analytics attempts to predict the future by analysing current and historical data. Prescriptive solutions assist analysts in decisions by determining actions and assessing their impact regarding business objectives, requirements, and constraints.

Despite the hype about it, using analytics is still a labour-intensive endeavour. This is because current solutions for analytics are often based on proprietary appliances or software systems built for general purposes. Thus, significant effort is needed to tailor such solutions to the specific needs of the organisation, which includes integrating different data sources and deploying the software on the company's hardware (or, in the case of appliances, integrating the appliance hardware with the rest of the company's systems). Such solutions are usually developed and hosted on the customer's premises, are generally complex, and their operations can take hours to execute. Cloud computing provides an interesting model for analytics, where solutions can be hosted on the Cloud and consumed by customers in a pay-as-you-go fashion. For this delivery model to become reality, however, several technical issues must be addressed, such as data management, tuning of models, privacy, data quality, and data currency.

This work highlights technical issues and surveys existing work on solutions to provide analytics capabilities for Big Data on the Cloud. Considering the traditional analytics workflow presented in Fig. 1, we focus on key issues in the phases of an analytics solution.
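To make the three classes of analytics described in this section concrete, here is a small Python sketch on a hypothetical monthly sales series. The data, the naive trend forecast, and the ordering rule are illustrative assumptions, not methods taken from the surveyed work.

```python
monthly_sales = [100, 110, 120, 130]  # hypothetical historical data


def describe(history):
    """Descriptive: summarise past behaviour for a management report."""
    return {
        "total": sum(history),
        "mean": sum(history) / len(history),
        "best_month": history.index(max(history)),
    }


def predict_next(history):
    """Predictive: extrapolate the average month-on-month change."""
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    return history[-1] + sum(changes) / len(changes)


def prescribe_order(forecast, stock_on_hand, capacity):
    """Prescriptive: choose an action (units to order) under a constraint."""
    needed = max(0, forecast - stock_on_hand)
    return min(needed, capacity)


report = describe(monthly_sales)
forecast = predict_next(monthly_sales)
order = prescribe_order(forecast, stock_on_hand=50, capacity=80)
```

The three functions consume each other's output in sequence: the report looks backwards, the forecast looks forwards, and the order quantity turns the forecast into a decision bounded by a business constraint.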
With Big Data it is evident that many of the challenges of Cloud analytics concern data management, integration, and processing. Previous work has focused on issues such as data formats, data representation, storage, access, privacy, and data quality. Section 3 presents existing work addressing these challenges on Cloud environments. In Section 4, we elaborate on existing models to provide and evaluate data models on the Cloud. Section 5 describes solutions for data visualisation and customer interaction with analytics solutions provided by a Cloud. We also highlight some of the business challenges posed by this delivery model when we discuss service structures, service level agreements, and business models. Security is certainly a key challenge for hosting analytics solutions on public Clouds. We consider, however, that security is an extensive topic and would hence deserve a study of its own. Therefore, security and evaluation of data correctness are out of scope of this survey.

III. FRAMEWORK
A. Analytics as a Service
According to IBM Research, the data explosion is occurring both on the Internet and at client facilities. Being able to leverage such data, which is often unstructured, brings several opportunities to create new businesses and make existing ones more efficient. However, transforming this data into something that has business value requires processing power and specialists from several domains. Having such resources in-house is usually expensive, so outsourcing them is a key mechanism for extracting information from data at large scale. The goal of this project is to research and develop technology to provide Analytics-as-a-Service (AaaS).

Several challenges are involved in building a platform to provide AaaS, including SLA definitions, QoS monitoring techniques, pricing, analysis and management of unstructured data, and business models.

In addition to structured and unstructured data, there is also a third category: semi-structured data. Semi-structured data is information that does not reside in a relational database but that does have some organisational properties that make it easier to analyse. Examples of semi-structured data might include XML documents and NoSQL databases.

Fig. 1. The Analytics-as-a-Service provisioning as proposed by IBM

B. UNSTRUCTURED DATA MINING
a. Unstructured Data
Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. Note that while these kinds of files may have an internal structure, they are still considered "unstructured" because the data they contain does not fit neatly in a database. Experts estimate that 80 to 90 percent of the data in any organization is unstructured, and the amount of unstructured data in enterprises is growing significantly, often many times faster than structured databases are growing.

b. Unstructured Data and Big Data
As mentioned above, unstructured data is the opposite of structured data. Structured data generally resides in a relational database and, as a result, is sometimes called relational data. This type of data can be easily mapped into pre-designed fields. For example, a database designer may set up fields for phone numbers, zip codes and credit card numbers that accept a certain number of digits. Structured data has been or can be placed in fields like these. By contrast, unstructured data is not relational and does not fit into these kinds of pre-defined data models. The term Big Data is closely associated with unstructured data.
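The contrast between structured, semi-structured, and unstructured data can be sketched in Python as follows. The field rules (a 10-digit phone number, a 5-digit zip code), the XML snippet, and the e-mail text are hypothetical examples chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Structured: values must fit pre-designed fields with a fixed number of digits.
FIELD_RULES = {
    "phone": re.compile(r"\d{10}"),
    "zip": re.compile(r"\d{5}"),
}


def fits_schema(row):
    """Return True if every field matches its pre-defined format."""
    return all(rule.fullmatch(row.get(name, ""))
               for name, rule in FIELD_RULES.items())


# Semi-structured: no relational schema, but tags make fields easy to locate.
xml_doc = "<customer><name>Asha</name><zip>50008</zip></customer>"
zip_from_xml = ET.fromstring(xml_doc).findtext("zip")

# Unstructured: free text; information has to be searched for, not looked up.
email_body = "Hi, please ship my order to zip 50008 and call me on 9876543210."
digits_in_text = re.findall(r"\b\d{5}\b|\b\d{10}\b", email_body)
```

In the structured case the schema rejects anything that does not fit; in the semi-structured case the tag name is enough to locate the value; in the unstructured case a pattern search finds candidate numbers but cannot tell which one is the zip code without further analysis.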
Big Data refers to extremely large datasets that are difficult to analyse with traditional tools. Big Data can include both structured and unstructured data, but IDC estimates that 90 percent of Big Data is unstructured data. Many of the tools designed to analyse Big Data can handle unstructured data.

c. Mining Unstructured Data
Many organizations believe that their unstructured data stores include information that could help them make better business decisions. Unfortunately, it is often very difficult to analyse unstructured data. To help with the problem, organizations have turned to a number of different software solutions designed to search unstructured data and extract important information. The primary benefit of these tools is the ability to glean actionable information that can help a business succeed in a competitive environment. Because the volume of unstructured data is growing so rapidly, many enterprises also turn to technological solutions to help them better manage and store their unstructured data. These can include hardware or software solutions that enable them to make the most efficient use of their available storage space.

d. Unstructured Data Management
Organizations use a variety of different software tools to help them organize and manage unstructured data. These can include the following:
Big data tools: Software like Hadoop can process stores of both unstructured and structured data that are extremely large, very complex and changing rapidly.
Business intelligence software: Also known as BI, business intelligence is a broad category of analytics, data mining, dashboards and reporting tools that help companies make sense of their structured and unstructured data for the purpose of making better business decisions.
Data integration tools: These tools combine data from disparate sources so that they can be viewed or analyzed from a single application. They sometimes include the capability to unify structured and unstructured data.
Document management systems: Also known as enterprise content management systems, a DMS can track, store and share unstructured data that is saved in the form of document files.
Information management solutions: This type of software tracks structured and unstructured enterprise data throughout its lifecycle.
Search and indexing tools: These tools retrieve information from unstructured data files such as documents, web pages and photos.

IV. CONCLUSION
The current growth in large-scale transactions demonstrates the need for Big Data. For this reason, most business transactions have been converted into digital transactions. The number of users is increasing day by day because of the ease of use of these digital systems.
All things considered, Big Data has tremendous advantages, and at the same time it poses some challenges. Before Big Data, data mining techniques and the KDD process were designed mainly to work on schema-based databases. Current developments have forced data to become multifaceted and schema-less; stores of such data are known as NoSQL databases. However, few tools are available for working with these types of databases. Here we proposed a tool called the Analytics-as-a-Service (AaaS) tool, which is useful for performing mining and analytics on multiple data sources. Currently our tool supports data sources such as NoSQL and SQL databases, tag and flat file storage, and text documents. We are carrying out the proposal, design and development of the framework.
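The paper does not describe the internals of the proposed tool, so the following Python sketch is only one hypothetical way to put a uniform access layer over several of the source types named in the conclusion (SQL, schema-less documents, and flat files). All function names, the in-memory sample data, and the `count_by` analytic are assumptions introduced for illustration.

```python
import csv
import io
import json
import sqlite3


def rows_from_sql(connection, query):
    """Read rows from a SQL source as dictionaries."""
    cursor = connection.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def rows_from_json_documents(text):
    """Read schema-less documents, one JSON object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def rows_from_flat_file(text):
    """Read a comma-separated flat file with a header row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(rows, key):
    """A trivial analytic applied uniformly: frequency of each value of key."""
    counts = {}
    for row in rows:
        value = row.get(key)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical in-memory sources used only to exercise the sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (city TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("Hyderabad", 10.0), ("Delhi", 5.0)])

documents = '{"city": "Hyderabad", "tags": ["new"]}\n{"name": "no city field"}'
flat_file = "city,amount\nDelhi,7\nHyderabad,3\n"

all_rows = (rows_from_sql(db, "SELECT city, amount FROM orders")
            + rows_from_json_documents(documents)
            + rows_from_flat_file(flat_file))
city_counts = count_by(all_rows, "city")
```

Representing every record as a plain dictionary lets the same analytic run over schema-based and schema-less sources alike; documents that lack the requested field are simply skipped rather than rejected.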