A Requirements Acquisition Tool Architecture for the Decision Back Approach to Social Media Big Data Capturing


A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Software Engineering at the College of Computer and Information Sciences at Prince Sultan University

By: Mashail A. Alswilmi

May, 2015

This thesis was defended on 25th May 2015.

Supervisor: Prof. Dr. Ajantha Dahanayake

Members of the Exam Committee:
Prof. Dr. Ajantha Dahanayake (Chair)
Dr. Areej Alwabil (Member)
Dr. Sarab AlMuhaideb (Member)

ACKNOWLEDGMENTS

First and foremost, praise and thanks to Allah for His showers of blessings throughout my research work, which allowed me to complete it successfully.

I would like to express my deep and sincere gratitude to my research supervisor, Prof. Ajantha Dahanayake, for her continuous support of my master's study and research, for giving me the opportunity to do research, and for her invaluable guidance, patience, motivation, enthusiasm, and immense knowledge throughout this research. She has taught me the methodology to carry out the research and to present it as clearly as possible. It was a great privilege and honor to work and study under her guidance. Her dynamism, vision, sincerity, and motivation have deeply inspired me, and I am extremely grateful for what she has offered me.

I am extremely grateful to my parents for their love, prayers, care, and sacrifices in educating and preparing me for my future. I am very thankful to my husband and my son for their love, understanding, prayers, and continuing support to complete this research. I also wish to thank my sisters and brothers for their support and valuable prayers.

Finally, my thanks extend to all those who supported me in completing the research project, either directly or indirectly.

Abstract

This master's thesis utilizes a decision-back concept to optimize the process of social media data collection. Leveraging this type of Big Data extends the requirements of traditional data capturing techniques because of its large volume, velocity, variety, and veracity. Comprehensively analyzing the properties of the problem at hand and determining the analysis needs upfront, before data collection, eliminates the chance of being overwhelmed by masses of irrelevant data and helps users and businesses make management decisions and answer mission-critical questions in an efficient and timely manner. Therefore, this master's thesis develops an architecture for a requirements acquisition tool that applies a decision-back approach to capture social media data analyzing requirements. The tool captures the requirements through sets of questions in multiple phases. In the first phase, the Problem Domain set of questions, the system analyzes the user's answers using an NLP technique to extract keywords and time and location constraints. In the second phase, the Data Source set of questions, the system analyzes the user's selections using a data source recommendation system to recommend the most suitable data source. In the final phase, the Analytical Tool set of questions, the system analyzes the user's selections using an analytical tool recommendation system to recommend the most suitable analytical tool. The tool's outputs are keywords, time and location constraints, a recommended data source, and a recommended analytical tool. The tool is validated for the correctness and efficiency quality factors through an experiment that compares data collection for social media analytics with and without the use of the tool. The experiment showed that the average rates of improvement for correctness and efficiency increased after using the tool.

The main contribution of this research is the design of a value-added and well-defined process to capture social media data analyzing requirements upfront, before data collection, in order to accelerate the analytics tasks. The requirements acquisition tool also contributes to: 1) the requirements engineering field, by building a tool that helps users capture their requirements prior to the data collection process in social media data analytics, and 2) the software engineering field, by providing a user-centered solution that captures the user's social media data analyzing needs within a user-friendly environment.

Research Summary

This master's thesis uses the "decision-back" concept to improve the process of collecting social media data, because leveraging this type of Big Data goes beyond what traditional data collection techniques provide, owing to its large volume, velocity, variety, and veracity. Comprehensive analysis following the decision-back approach, that is, analyzing the properties of the problem at hand and determining the analysis needs before data collection begins, reduces the chance of being submerged in masses of irrelevant data and helps users and businesses make management decisions and answer mission-critical questions effectively and in a timely manner. Therefore, this master's thesis develops the architecture of a requirements acquisition tool that applies the decision-back concept to the collection of social media data. The tool gathers the requirements through sets of questions in several phases. The first phase, the Problem Domain set of questions, provides the basis for analyzing the user's answers with Natural Language Processing (NLP) to extract search keywords and time and location constraints. In the second phase, the Data Source set of questions, the user's selections are analyzed by a data source recommendation system. In the final phase, the Analytical Tool set of questions, the user's selections are analyzed by an analytical tool recommendation system. The tool's outputs are search keywords, time and location constraints, the recommended data source, and the recommended analytical tool. The effectiveness of the tool was verified for the correctness and efficiency factors through an experiment that compares social media data analysis with and without the use of the tool. The experiment showed that the average rate of improvement of the correctness and efficiency factors increased after using the tool.

The main contribution of this research is the design of a value-added, well-defined process for gathering social media data analysis requirements before data collection, in order to accelerate the analysis tasks. The requirements acquisition tool also contributes to:
1) The field of software requirements engineering, by building a tool that provides a set of questions that help the user capture his needs before the data collection process in social media data analytics.
2) The field of software engineering, by providing a user-centered solution that serves users' needs in analyzing social media data within an easy and comfortable environment.

Table of Contents

Acknowledgment
Abstract
Abstract in Arabic
Table of Contents
List of Figures
List of Tables
List of Appendix Figures
List of Appendix Tables
List of Abbreviations

Chapter 1: Introduction
  1.1. Introduction
  1.2. Motivation
  1.3. Definition of Big Data
  1.4. Social Media Big Data
  1.4.1. Definition of Social Media
  1.4.2. Data Capturing and Analyzing Challenges of Social Media
  1.5. Requirements Engineering for Social Media Big Data Analytics
  1.6. Problem Statement
  1.7. Research Questions and Objectives
  1.8. Scope of the Thesis
  1.9. Related Published Paper
  1.10. Outline of the Thesis

Chapter 2: Research Methods
  Research Methods
  Research Design
  Research Participants
  Research Techniques and Data Analysis
  Research Work Packages
  Research Instruments and Procedures
  Social Mention
  Trackur

Chapter 3: Literature Review
  Big Data and Social Media Related Works
  Innovative Big Data and Data Capturing Approaches
  Literatures Analysis
  Innovative Social Media Data Collection and Analytics Approaches
  Literatures Analysis
  Software Engineering and Social Media Data Analytics
  Reverse Engineering
  Software Requirements Engineering
  Decision-back Data Capturing Approach
  Literatures Analysis
  3.4. Related Tools and Environments for Social Media Data Analytics
  Hadoop The Big Data Management Framework
  Apache Hadoop
  Literatures Analysis
  Theories and Frameworks
  W*H Conceptual Model for Services
  Stanford CoreNLP Framework
  Summary

Chapter 4: Social Media Types and Analytical Techniques
  Social Media Types
  Social Media Sites Categorizations
  Social Networking
  Microblogging
  Blogging
  Photo Sharing
  Video Sharing
  Social Media Sites Examples
  Facebook
  Twitter
  LinkedIn
  Google+
  Summary of Social Media Sites Characteristics
  Social Media Analytical Tools
  Social Listening Software/ Social Media Monitoring Software
  Social Conversation Software/ Social Media Engagement Software, Social Media Management Software
  Social Marketing Software/ Social Media Management Software
  Social Analytics Software
  Social Influencer Software
  Social Media Analytical Tools Examples

Chapter 5: Decision-Back Data Capturing Approach for Social Media Data
  Backward Analysis
  Capturing Social Media Data Plan
  The Conceptual Model
  Identification of the Problem Domain
  Identification of the Data Source and the Analytical Tool
  W*H Conceptual Model for Services
  Defining the Social Media Data Capturing Model
  Tool Architecture and Design
  Tool Layers
  Data Ingest Module (Presentation Layer)
  Data Analysis Module (Middle Layer)
  Database Layer
  Tool's User Interface Design
  Part 1: Problem Domain
  Part 2: Data Source
  Part 3: Analytical Tool

Chapter 6: Case Study
  Stanford CoreNLP Tool
  Part of Speech Tagger
  Named Entity Recognizer
  Case 1: Start On-Line Business Project
  Problem Description
  Tool Application
  Case 2: A Saving Lincoln Movie Promotion
  Problem Description
  Tool Application
  Case 3: YouTube Music Channel Promotion
  Problem Description
  Tool Application
  Case 4: Middle East Respiratory Syndrome Awareness
  Problem Description
  Tool Application
  Case 5: DAESH Terrorist Movement
  Problem Description
  Tool Application

Chapter 7: Tool Experiment and Validation
  Purpose of the Experiment
  Design and the Scope of the Experiment
  7.3 Experiment
  Case 1: Start On-Line Business Project
  Without Tool
  With Tool
  Results
  Case 4: Middle East Respiratory Syndrome Awareness
  Without Tool
  With Tool
  Results
  Case 5: Start On-Line Business Project
  Without Tool
  With Tool
  Results
  Results Comparison
  Rate of Improvements (ROI)
  Unpaired T-Test

Chapter 8: Discussion
  Analysis of Research Outcomes
  Resulting outcome of tool
  Tool Evaluation Case Studies
  Tool Validation Experiment

Chapter 9: Conclusion and Future Work
  Conclusion
  9.2 Limitations
  Limited Number of Cases
  Limited Databases tools and Social Media sites
  Experiment and Validation
  Lack of Generalizability
  Limited Quality Factors Validation
  The Limited use of NLP Tool
  Limited Number of Cases in the Experiment
  Future Work Directions

References

Appendices
  Appendix A. Glossary
  Appendix B. Hadoop Components
  Appendix C. Dimensions of the W*H Model for Services
  Appendix D. Analytical Tools Database
  Appendix E. Related published papers

List of Figures

Figure 1.1: The Interest for the Term "Big Data" on Google Feb,
Figure 1.2: Timeline of the Launch Dates of Many Major Social Networks Sites and Dates Until 2005 [21]
Figure 1.3: Social Media Analytics Life Cycle
Figure 2.1: Thesis Work Packages (WPs)
Figure 3.1: Hortonworks Data Platform
Figure 5.1: Decision Back Approach Applied in the Analysis Process
Figure 5.2: The Conceptual Model of the Decision Back Capturing Approach
Figure 5.3: The W*H Service Description Model [34]
Figure 5.4: Refined Model for Decision-Back Approach for Capturing Social Media Data Analyzing Requirements
Figure 5.5: The 4+1 View Model [80]
Figure 5.6: Requirements Acquisition Tool Architecture for Decision-Back Approach for Capturing Social Media Data Analyzing Requirements
Figure 5.7: NLP Analysis Subsystem
Figure 5.8: Stanford CoreNLP Example
Figure 5.9: Data Source Recommendation Subsystem
Figure 5.10: Analytical Tool Recommendation Subsystem
Figure 5.11: Tool Interface Design (Home Page)
Figure 5.12: Tool Interface Design (Process Part1)
Figure 5.13: Tool Interface Design (Process Part2)
Figure 5.14: Tool Interface Design (Process Part3)
Figure 5.15: Tool Interface Design (Result)
Figure 6.1: Annotation Guidelines [96]
Figure 6.2: Part of Speech NLP - Case
Figure 6.3: Named Entity Recognition NLP - Case
Figure 6.4: Part of Speech NLP - Case
Figure 6.5: Named Entity Recognition NLP - Case
Figure 6.6: Part of Speech NLP - Case
Figure 6.7: Named Entity Recognition NLP - Case
Figure 6.8: Part of Speech NLP - Case
Figure 6.9: Named Entity Recognition NLP - Case
Figure 6.10: Part of Speech NLP - Case
Figure 6.11: Named Entity Recognition NLP - Case
Figure 7.1: Experiment Time Recording Log
Figure 7.2: Snapshot of Trackur - Case 1 Without Tool
Figure 7.3: Time Recording Log Sheet - Case 1 Without Tool
Figure 7.4: Snapshot of Social Mention - Case 1 With Tool
Figure 7.5: Time Recording Log Sheet - Case 1 With Tool
Figure 7.6: Quality Factors Comparison - Case
Figure 7.7: Time Recording Log Sheet - Case 4 Without Tool
Figure 7.8: Time Recording Log Sheet - Case 4 With Tool
Figure 7.9: Quality Factors Comparison - Case
Figure 7.10: Time Recording Log Sheet - Case 5 Without Tool
Figure 7.11: Time Recording Log Sheet for Case 5 - With Tool
Figure 7.12: Quality Factors Comparison - Case
Figure 7.13: Quality Factors Comparison Chart Average Results
Figure 8.1: Experiment Summary - Correctness Factor Comparison
Figure 8.3: Experiment Summary - Efficiency Factor Comparison

List of Tables

Table 4.1: Summary of Social Media Sites Categorization Based on their Functionalities
Table 4.2: Facebook Information [70]
Table 4.3: Twitter Information [71]
Table 4.4: LinkedIn Information [71]
Table 4.5: Google+ Information [69]
Table 4.6: Social Media Sites Characteristics Summary
Table 4.7: Social Media Analytical Tools' Characteristics Example
Table 7.1: Keywords Relevant Feeds Numbers - Case 1 Without Tool
Table 7.2: Keywords Relevant Feeds Numbers - Case 1 With Tool
Table 7.3: Keywords Relevant Feeds Numbers - Case 4 Without Tool
Table 7.4: Keywords Relevant Feeds Numbers - Case 4 With Tool
Table 7.5: Keywords Relevant Feeds Numbers - Case 5 Without Tool
Table 7.6: Keywords Relevant Feeds Numbers - Case 5 With Tool
Table 7.7: Results Comparison
Table 7.8: Correctness and Efficiency Rate of Improvements (ROI)
Table 7.9: Experiment Data Sample

List of Appendix Figures

Figure C1: The W*H Inquiry Based Conceptual Model for Services

List of Appendix Tables

Table B1: Hadoop Ecosystem Components [2][16]
Table D1: Analytical Tools Database

List of Abbreviations

SDLC: Software Development Life Cycle
RE: Requirements Engineering
HDFS: Hadoop Distributed File System
HDP: Hortonworks Data Platform
YARN: Yet Another Resource Negotiator
NOSQL: Not Only SQL
SM: Social Media
ETL: Extract, Transform, and Load
NLP: Natural Language Processing
MIDIS: Multi-Intelligence Data Integration Services
SNAP: Stanford Network Analysis Platform
POS: Part Of Speech
MOH: Ministry of Health
MOI: Ministry Of Interior
CCC: Command & Control Center
SHC: Supreme Hajj Committee
NER: Named Entity Recognizer
MERS-COV: Middle East Respiratory Syndrome Coronavirus
WRM: Wholesale Revenue Management

CHAPTER 1: INTRODUCTION

1.1. Introduction

Social Media (SM) data is representative of Big Data, given its massive growth, its multiple channels, and the enormous scope of its content and subject matter [1]. In the business world, SM is a powerful marketing tool that is reshaping the way organizations engage with their customers and nurture their relationships with brands, products, and services [1] [2]. It can be used to share news from a corporate event in near real time, to create a buzz about a great new product within minutes of its launch, or to share the details of an unpleasant experience with customer services [3] [4]. It has many other innovative uses: political leaders try to influence public opinion through it [5], and it supports the creation of job applications, the organization of learning groups, online training sessions, and many others [2] [6].

When it comes to analyzing this powerful source of data, many organizations are concerned that the amount of collected data becomes so cumbersome that it is difficult to find the most valuable pieces of information. Many questions arise [3]:

- What if the data volume gets so large and varied that one does not know how to deal with it?
- How much data should be stored? All the data, or only a subset?
- How much data should be analyzed? All the data, or only a subset?
- How can one find out which data sets are really important?

Until recently, organizations were limited to using subsets of their data, or they were constrained to simplistic analyses, because the sheer volumes of data overwhelmed their processing platforms [7]. There are two choices in this context [4] [8]:

- Incorporate massive data volumes in the analysis. The needed answers may be better provided by analyzing all the data. High-performance technologies that extract value from massive amounts of data are here today. One approach is to apply high-performance analytics to analyze the massive amounts of data using technologies such as grid computing, in-database processing, and in-memory analytics.
- Determine upfront which data is relevant. Traditionally, the trend has been to store everything (some call it data hoarding), and only when querying the data does the analyst discover what is relevant. Applying analytics on the front end instead determines relevance based on the particular context. This type of analysis determines which data should be included in analytical processes and which can be placed in low-cost storage for later use if needed.

Gathering massive amounts of data is proving to be impractical in an SM world that is expanding with infinite amounts of user-generated data [5]. The consequence of this approach in the case of SM data is that users are often unable to obtain specific relevant information from large-scale, highly volatile, varied SM data collections. On the other hand, determining the relevant data upfront and specifying the analyzing requirements prior to data collection is the approach that should be followed in SM data analytics. It should not be a fishing expedition [8], because discovering patterns and information in this large and complex collection of datasets is not only challenging but also immensely time consuming. Due to advances in data acquisition and business computing, today's datasets are becoming increasingly complex [8]. Some authors and data analysts, such as [3], [8], [9] and many others, recommend the decision-back approach, which begins with answering the right questions and so provides a road map for more structured data collection and SM data analytics processes.
Therefore, a more structured plan for capturing SM data analyzing requirements is needed to avoid wasting time and resources on analyzing irrelevant data.

1.2. Motivation

Analyzing SM Big Data with low-latency updates, almost in real time, is a challenge for the near future [10]. SM Big Data has special characteristics and requires continuous investigation and analysis [11], since in real-life cases it is important to know what is happening now and to make decisions as quickly as possible [12]. Therefore, this thesis is motivated by the vision of ensuring access to the most valuable sources with minimal resources. It emphasizes the demand for a well-defined mechanism that supports an effective process, one that takes the maximum value from the available data and brings decision makers closer to extracting value out of SM data. A value-added and well-defined process to capture SM data analyzing requirements upfront, before data collection, is the main contribution of this research.

1.3. Definition of Big Data

There is no perfect definition of Big Data. The term is used by many companies and in the literature with varying definitions, and it has become more popular as a search keyword, as shown in Figure 1.1 using Google's tool, Google Trends.

Figure 1.1: The Interest for the Term "Big Data" on Google Feb, (horizontal axis: Year)

Big Data is defined by Gartner, the leading IT industry research group, as: "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization" [13]. Gartner characterized Big Data by three main elements, volume, velocity, and variety, which are known as the 3V's model [14]:

- Volume: The size of the data is very large, in the range of terabytes and petabytes.
- Velocity: The conventional understanding of velocity considers how quickly the data arrives and is stored, and its associated rates of retrieval.
- Variety: It extends beyond structured data to include unstructured data of all varieties: text, audio, video, posts, log files, etc.

Some researchers use a slightly modified 3V's model. Sam Madden describes Big Data as data that is "too big, too fast, or too hard" [15], where "too hard" refers to data that does not fit neatly into existing processing tools. "Too hard" is therefore very similar to data variety. Kaisler et al. define Big Data as "the amount of data just beyond technology's capability to store, manage and process efficiently", but mention variety and velocity as additional characteristics [16]. Tim Kraska moves away from the 3V's but still acknowledges that Big Data is more than just volume. He describes Big Data as data for which "the normal application of current technology doesn't enable users to obtain timely, cost-effective, and quality answers to data-driven questions" [17]. However, he leaves open which characteristics of this data go beyond the "normal application of current technology" [18].

IBM uses the 3V's model but introduces an additional V, veracity:

- Veracity: The uncertainty of data and its trustworthiness [19]. It signals that data keeps changing, so one cannot fully trust the data when making decisions.

The leader in analytics, the Statistical Analysis System (SAS) Institute, considers two additional dimensions [7]:

- Variability: In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent, with periodic peaks. Seasonal and event-triggered peak data loads can be challenging to manage, and even more so when unstructured data is involved.
- Complexity: Today's data comes from multiple sources, and it is still an undertaking to link, match, cleanse, and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies, and multiple data linkages, or the data can quickly spin out of control.

Overall, the 4V's model, or adaptations of it, seems to be the most widely used and accepted description of what the term Big Data means [20]: the Gartner 3V's model [14] plus IBM's additional V [19]. The model clearly describes characteristics that can be used to derive requirements for the respective technologies and products. Accordingly, the primary concerns of this thesis are volume, velocity, veracity, and variety, as they are the main barriers to an interoperable analytic platform [20]. Handling a given volume might be a really hard problem if the data arrives fast and needs to be processed within seconds. Likewise, handling volume gets harder as the data set to be processed becomes unstructured. This adds the necessity of pre-filtering steps, so that only the data that matters enters processing and analysis.

1.4. Social Media Big Data

1.4.1. Definition of Social Media

Nowadays, SM networks such as MySpace, Facebook, Cyworld, Twitter, Instagram, Bebo, Snapchat, LinkedIn, etc. (see Figure 1.2) have become increasingly popular, and they support a wide range of interests and practices. While their key technological features are fairly consistent, the cultures that emerge around social networks are varied. Most sites support the maintenance of pre-existing social networks, but others help strangers connect based on shared interests, political views, or activities. Some sites cater to diverse audiences, while others attract people based on a common language or shared racial, sexual, religious, or nationality-based identities. Sites also vary in the extent to which they incorporate new information and communication tools, such as mobile connectivity, blogging, and photo/video sharing [21].

Many such social networks are extremely rich, and they typically contain a tremendous amount of content and linkage data that can be leveraged for analysis. The linkage data is essentially the graph structure of the social network and the communications between entities, whereas the content data contains the text, images, and other multimedia data in the network [22] [23].
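As a minimal sketch of the distinction between linkage data and content data, the linkage can be held as a graph (here a plain adjacency mapping) and the content as records keyed by user. Every name below (`linkage`, `content`, `connect`, `post`, `friends_of_friends`, and the sample users and texts) is an illustrative assumption and is not taken from the thesis or from any SM site's API.

```python
from collections import defaultdict

# Linkage data: the graph structure (who is connected to whom).
linkage = defaultdict(set)

def connect(a, b):
    """Add an undirected connection between two users."""
    linkage[a].add(b)
    linkage[b].add(a)

# Content data: the text (or references to media) that each user posted.
content = defaultdict(list)

def post(user, text):
    """Store one piece of user-generated content."""
    content[user].append(text)

def friends_of_friends(user):
    """Walk the linkage graph two hops out, excluding the user
    and the user's direct connections."""
    direct = linkage[user]
    two_hop = set()
    for friend in direct:
        two_hop |= linkage[friend]
    return two_hop - direct - {user}

# Hypothetical sample data.
connect("alice", "bob")
connect("bob", "carol")
post("alice", "Launching our new product today")
post("carol", "Great customer service experience")
```

Questions about structure (for example, who is reachable through a shared connection) are answered from `linkage` alone, while questions about what is being said are answered from `content`; an analysis requirement usually states which of the two it needs.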

Figure 1.2: Timeline of the Launch Dates of Many Major Social Networks Sites and Dates Until 2005 [21]

Social networks have been defined by [24] as web-based services that allow individuals to:
(1) Construct a public or semi-public profile within a bounded system.
(2) Articulate a list of other users with whom they share a connection.
(3) View and traverse their list of connections and those made by others within the system.

These connections or relationships are often displayed in a diagram, where entities are the points (also called nodes) and connections are the lines [25]. This definition is the one used to define SM sites in this thesis, as it is widely used by many publications [24] [25].

1.4.2. Data Capturing and Analyzing Challenges of Social Media

Data generated from SM sites differs from the conventional attribute-value data of classical data mining. SM data is largely user-generated content on SM sites [2]. SM data is typically Big Data with special characteristics: it is volatile, noisy, distributed, unstructured, and vast. The main SM data challenges and issues, as a Big Data representative, are [26]:

- Privacy and Security: This is the most important issue with Big Data; it is sensitive and has conceptual, technical, and legal significance.
- Data Access and Sharing of Information: If data is to be used to make accurate decisions in time, it must be available in a precise, complete, and timely manner. This makes the data management and governance process more complex, adding the necessity to make data open and available to government agencies in a standardized manner, with standardized APIs, metadata, and formats, thus leading to better decision making, business intelligence, and productivity improvements.
- Storage and Processing Issues: The available storage cannot accommodate the large amount of data being produced, since SM sites are themselves a great contributor, along with sensor devices. Processing such enormous data sets is also time consuming: to find suitable elements, the whole data set needs to be scanned, which is practically infeasible.
- Analytical Issues: The main challenging questions are:

  (1) What if the data volume gets so unwieldy and varied that it is too difficult to manipulate?
  (2) Does all data need to be stored?
  (3) Does all data need to be analyzed?
  (4) Which data points are really important and for what reasons?
  (5) How can the data be used to the best advantage?
- Skill Requirements: Since Big Data is a fledgling and emerging technology, it needs to attract organizations and youth with diverse new skill sets. These skills should not be limited to technical ones but should also extend to research, analytical, interpretive, and creative ones.
- Technical Challenges: (1) Fault Tolerance. (2) Scalability. (3) Quality of Data. (4) Heterogeneous Data.

This thesis is motivated to address the storage and processing issues and the analytical issues. The lack of an effective process for capturing SM data analyzing requirements in organizations adopting SM data solutions can have a negative impact on finances as well as on the industry's reputation and credibility [27].

1.5. Requirements Engineering for Social Media Big Data Analytics

Requirements acquisition is recognized as one of the most important, albeit difficult, phases in software engineering [28]. The literature repeatedly cites the role of well-defined requirements and a requirements acquisition process in problem analysis and project management as beneficial to software development throughout the life cycle: during design, coding, testing, maintenance, and documentation of software [28] [29]. If SM Big Data collection and analytics is treated like the design of IT software systems, it needs to invest in a Requirements Engineering approach that specifies the requirements prior to data collection and provides the structure for gathering the user's analytics requirements. Therefore, a tool architecture for requirements acquisition is the supporting software solution for the requirements engineering phase of the SM data collection process. The guiding questions within the tool define a structured process for system analysts to elicit the SM data analyzing requirements in a more effective and user-friendly manner. This approach to Requirements Engineering is one of the main principles of the Software Development Life Cycle (SDLC) [28].

1.6. Problem Statement

As a result of SM's rapid growth, recent years have seen an accelerating shift in different domains away from traditional channels, such as print and broadcast, to digital channels [1]. This transformation is being driven by the cost advantages and precision offered by digital platforms, in particular the growing area of applications that manage the increasing volume and influence of SM [5]. Here are some statistics that offer an insight into the scope of the SM phenomenon [1]:

- 1.43 billion people worldwide visited a social networking site last year (2014).
- Nearly 1 in 8 people worldwide have their own Facebook page.
- In 2014, one million new accounts were added to Twitter every day.
- Three million new blogs come online every month.
- 65 percent of SM users say they use it to learn more about brands, products, and services.

The amount of information continues to increase at an enormous rate. Therefore, it is imperative that businesses, organizations, and associations find better approaches to information filtering and requirements capturing that effectively decrease the information overload and improve the precision of analytics results [30]. All things considered, SM data analytics can only be effective when the underlying data collection processes are able to leverage the information relevant to a particular domain [31]. It is critical to improve the usefulness of the analysis results and to accelerate SM data analytics. Therefore, a more powerful mechanism for guiding the capture of data analytics requirements is needed to reduce the time and resources consumed by analyzing irrelevant data.

1.7. Research Questions and Objectives

The study examines the decision-back approach for data capturing and its applicability to capturing SM data analytics requirements. Therefore, the research question is formulated as follows:

How can we define an architecture of an SM data requirements capturing tool that accelerates the analytics tasks?

This research defines a requirements acquisition tool architecture that captures SM data analytics requirements using the decision-back approach, which can play a role in the SM data capturing process. The main objectives of the study are:

- To examine SM sites and determine what makes them different from each other.
- To explore different SM data analytical tools, their techniques, and their main vendors.
- To examine the decision-back approach and how it can leverage SM data collection.
- To provide a well-planned tool architecture that eases the analyst's task of capturing SM analysis requirements. This tool applies the decision-back approach by determining what the output requirements are and then filtering the input data accordingly.
- To examine specific real-life cases from different problem domains using this tool to prove its worthiness.

33 Chapter1: Introduction To validate the tool for its correctness and efficiency to ensure that it answers the research question. Therefore, the goal of this thesis is to define coherent processes to acquiring user s analyzing requirements. Thus, data analytics can be done in smaller time frames, allowing decisions to be made faster and with higher precision, by improving the current data capturing process from where one can draw accurate and useful conclusions. Then it will contribute to changing the way people are collecting analyzing requirements and subsequently transform decision making in a way that gives businesses the required advantage Scope of the Thesis Figure 1.3 is a simplified adaptation to SM analytics life cycle. As presented, it has four main stages: Data Collection, Data Processing, Data Storage, and Data Analysis [2]. The first stage: Data Collection, is the phase that is concerned with collecting SM data from different SM sources e.g. blogs, microblogs, etc. The goal of this thesis is to reduce this tremendous amount of data by identifying analysis requirements prior to data collection phase in SM analytics solution. This is inspired by and is similar to the primary phase of a Software Development Life Cycle (SDLC), which is Requirements Engineering (RE) [28]. This study follows RE in providing a well-defined tool architecture to capture SM analyzing requirements to improve the data collection process and accelerate the analytics tasks. Investigating the other phases of SDLC or Big Data analytics process, and examining other SM analytics problems are beyond the scope of this thesis. 13 Page
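The decision-back idea stated in the objectives (first determine the output requirements, then filter the input data accordingly) can be illustrated with a minimal sketch. The `Requirement` fields, the sample posts, and the `capture` function are hypothetical and are not part of the tool architecture defined in this thesis; they only show how requirements fixed before collection can reduce what is captured.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """Analysis requirements fixed before collection (hypothetical fields)."""
    keywords: set = field(default_factory=set)   # topics the decision depends on
    sources: set = field(default_factory=set)    # SM sources worth capturing
    language: str = "en"                         # language of interest


def capture(posts, req):
    """Keep only posts that match the requirements (decision-back filtering)."""
    kept = []
    for post in posts:
        words = set(post["text"].lower().split())
        if (post["source"] in req.sources
                and post["lang"] == req.language
                and words & req.keywords):
            kept.append(post)
    return kept


# Illustrative input stream; real SM data would come from a collector.
posts = [
    {"source": "twitter", "lang": "en", "text": "New phone battery is great"},
    {"source": "blog", "lang": "en", "text": "Holiday photos from the beach"},
    {"source": "twitter", "lang": "fr", "text": "La batterie est bonne"},
    {"source": "forum", "lang": "en", "text": "Battery drains too fast"},
]

req = Requirement(keywords={"battery"}, sources={"twitter", "forum"})
relevant = capture(posts, req)
print(len(relevant), "of", len(posts), "posts captured")
```

In this sketch the requirements are stated once, before any data is inspected, so irrelevant posts are discarded at capture time rather than during analysis.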

Figure 1.3: Social Media Analytics Life Cycle

1.9. Related Published Paper

Published papers under this research:

[1] M. Alswilmi, A. Dahanayake, (2015), "A Requirements Acquisition Tool Architecture for the Decision Back Approach for Social Media Big Data Capturing", 5th Advances in Software Engineering Conference, Prince Sultan University.
[2] M. Alswilmi, N. Alnajran, A. Dahanayake, (2014), "Conceptual Framework for Big Data Analytics Solutions", Proceedings of the 24th International Conference on Information Modelling and Knowledge Bases, EJC.
[3] M. Alswilmi, N. Alnajran, A. Dahanayake, (2013), "Conceptual Framework for Big Data Analytics Solutions", 2nd Advances in Software Engineering Conference, Prince Sultan University.

Outline of the Thesis

Apart from the introduction, the remainder of this research is structured into eight chapters: Research Method; Literature Review; SM Types and Analytical Techniques; Decision-back Data Capturing Approach for SM Data; Case Study; Tool Experiment and Validation; Discussion; and Conclusion and Future Work.

Chapter 2 presents the research methods and demonstrates the methodology adapted to conduct this research. The literature review in Chapter 3 discusses the related works, including some available data reduction approaches, and highlights the innovativeness of this research. Additionally, an overview of the tools and frameworks that have been used to build the proposed tool is presented. Chapter 4 discusses SM site categorizations, SM data analytical tools, and different analytical techniques. The tool architecture, which is the core of this research, is built and proposed in Chapter 5 along with supporting materials. Chapter 6 applies the framework to five case studies from different problem domains. In Chapter 7, the framework is validated through a prototype and an experiment to prove its correctness and efficiency. Afterwards, the research analysis and a discussion of the tool prototype and its evaluation and validation are provided in Chapter 8. Finally, Chapter 9 contains the conclusion, the limitations of this research, and future research directions.

CHAPTER 2: RESEARCH METHODS

In this chapter, the research methods followed within this study are outlined, including the research design, the participants, the techniques and data analysis methods used, and the evaluation and validation approaches for the results. Moreover, the tools used to conduct the experimental work are also discussed.

Research Methods

Research Design

The major aim of this research is to apply the decision-back approach concept and to develop a requirements acquisition tool architecture to capture SM data analytics requirements. For this purpose, qualitative and, to some extent, quantitative research methods of investigation are chosen. The research is descriptive in nature and allows gathering a more in-depth contextual understanding of the topic. Initially, the inductive approach is followed to analyze the qualitative data. The research begins from general information about the decision-back capturing approach and SM analysis requirements, and moves towards more specific conclusions about how to apply this data capturing approach in SM Big Data analytics and how to build a requirements acquisition tool architecture.

Research Participants

In order to maximize the validity of findings, the research uses a hybrid access type [32] to gather the relevant data. The primary source of data collection is in-depth Internet access to SM sites and a review of various scientific publications and white papers that are of interest to this research. Supporting data is collected through traditional access, observing several leading companies that benefit from Big Data and SM data capture and analysis technologies. The choice of companies is determined by the availability of information, reputation, and level of involvement in this field; examples are IBM, Gartner, and SAS.

Research Techniques and Data Analysis

This is mixed-method research. It uses a variety of data collection techniques and analytical procedures to develop the foundation and to validate the tool architecture. In order to maximize the validity and trustworthiness of the findings, the research uses a hybrid access type to gather a richer set of data. The research advanced through multiple Work Packages (WPs) to develop the tool architecture and the tool's prototype, as explained below (see Figure 2.1).

Research Work Packages

1. WP1: Learning From Available Literature
1.1. The primary source of data collection is literature exploration, in-depth Internet access to SM sites and SM analytical tools, and perusal of various relevant publications and white papers that discuss the decision-back approach and SM data analytics.
1.2. Supporting data is collected through:
  • Traditional access and conversations with interested participants in scientific conferences such as the European Japanese Conference.
  • In addition, observations that involved document reviews of the data analytics solutions of several companies.
2. WP2: Developing the Conceptual Framework to Facilitate the Decision-Back SM Big Data Requirements Capturing
2.1. By examining the decision-back approach and how it has been used in the literature, general questions from the article [8] have been used to identify the main concepts for using this approach for analyzing SM data.
2.2. Each concept in the framework is examined to describe how it can be beneficial for capturing SM data within more efficient timelines and with less consumption of resources.
2.3. Connecting the framework with the SM analytics life cycle and showing its relevance to the SDLC.

3. WP3: Fine-tuning the Conceptual Framework
3.1. Examining the W*H Conceptual Model for Services [33] and customizing it to make the concepts in the framework more descriptive.
3.2. After relating the conceptual framework to the SDLC and showing how it works as a requirements acquisition phase in the Big Data analytics life cycle, the requirements framework is built and its components are described accordingly.
4. WP4: Design of a Tool Prototype and the Component Architecture of a Tool that Supports the Decision-Back SM Big Data Requirements Capturing
4.1. Based on the requirements acquisition framework, the tool architecture is designed.
4.2. Each model in the tool is described, showing how it supports capturing the SM data analyst's requirements.
5. WP5: Validation
5.1. Two types of validation tests are provided: theoretical and experimental.
  • Theoretical: by showing some case studies from different problem domains.
  • Experimental: by using actual analytical tools and measuring the correctness and efficiency quality factors.
6. WP6: Discussion, Conclusions and Future Research Directions
6.1. Discussing the worthiness of the provided tool architecture by comparing two results: analysis with the tool and analysis without the tool.
6.2. Concluding the research, discussing its limitations, and providing some future work directions for further improvements.

Figure 2.1: Thesis Work Packages (WPs)

2.2. Research Instruments and Procedures

This research attempts to build a requirements acquisition tool architecture for the decision-back approach to capturing SM data. In order to validate this tool for correctness and efficiency, a prototype consisting of a combination of tools is needed to support the tool evaluation process. These tools are described below.

Social Mention

Social Mention is a free Social Media search and analysis platform that aggregates user-generated content from across the universe into a single stream of information. It allows users to easily track and measure what people are saying about a person, a company, a new product, or any topic across the web's Social Media landscape in real time. Social Mention monitors 100+ Social Media properties directly, including Twitter, Facebook, FriendFeed, YouTube, Digg, and Google.

Trackur

Trackur is an SM monitoring tool designed to assist companies and public relations (PR) professionals in tracking what is said about brands on the Internet. It scans hundreds of millions of web pages, including news, blogs, videos, images, and forums, and alerts the user to anything that matches the monitored keywords. It costs at least $97 a month and offers a 10-day trial.
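The keyword-matching behavior described for such monitoring tools can be sketched as follows. This is only an illustration of the general idea; it does not use or reproduce the Social Mention or Trackur interfaces, and the `find_mentions` function and the sample pages are hypothetical.

```python
import re


def find_mentions(pages, keywords):
    """Return (url, keyword) pairs for every monitored keyword found in a page."""
    alerts = []
    for url, text in pages.items():
        for kw in keywords:
            # Whole-word, case-insensitive match of the monitored keyword.
            if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE):
                alerts.append((url, kw))
    return alerts


# Hypothetical pages; a real monitor would fetch news, blogs, forums, etc.
pages = {
    "news/1": "Acme launches a new tablet this week.",
    "blog/7": "My review of the ACME tablet and its rivals.",
    "forum/3": "Anyone tried the latest firmware?",
}

alerts = find_mentions(pages, ["Acme", "tablet"])
for url, kw in alerts:
    print(f"ALERT: '{kw}' mentioned in {url}")
```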

CHAPTER 3: LITERATURE REVIEW

3.1. Big Data and Social Media Related Works

Many software startups and research and development efforts are actively trying to harness the power of Big Data and SM and to create software with the potential to improve almost every aspect of human life. As these efforts continue to increase, full consideration needs to be given to the engineering aspects of Big Data and SM software. Since these systems exist to make predictions on complex, continuous and massive datasets, they pose unique problems during the collection, processing, and analysis of data, which must be delivered on time and within budget [34]. This research focuses on an SM requirements capturing approach, and therefore on studies that discuss SM and data capturing approaches.

Innovative Big Data and Data Capturing Approaches

IBM [35] provides a means of classifying Big Data business problems according to specified criteria. They have provided a pattern-based approach to facilitate the task of defining an overall Big Data architecture. Their idea of classifying data in order to map each problem to its suitable solution pattern provides an understanding of how a structured classification approach can lead to an analysis of the needs and a clear vision of what needs to be captured. Moreover, IBM has presented several real-life samples of Big Data case studies in [36].

The authors in [37] have studied different Big Data types and problems. They developed a conceptual framework that classifies Big Data problems according to the format of the data that must be processed. It maps the Big Data types to the appropriate combinations of data processing components. These components are the processing and analytic tools used to generate useful patterns from this type of data.

The Constraint-Driven Data Mining technique proposed by [38] identifies the following classes of constraints: database constraints, pattern constraints, and time constraints. Database constraints are used to specify the source dataset. Pattern constraints specify which patterns are interesting and should be returned by the query. Finally, time constraints influence the process of checking whether a given data sequence contains a given pattern. Although data mining can only be applied to structured data that can be stored in a relational database [39], this constraint-driven approach can provide an understanding of how these types of constraints can lead to more efficient data collection.

The article [40] proposes a novel approach for consistent collective evaluation of multiple continuous queries for filtering two different types of data streams: a relational stream and an XML stream. The proposed approach provides region-based selection constructs: an attribute selection construct for relational queries and a path selection construct for XPath queries. Both collectively evaluate the selection predicates of the same attribute (path), based on the precomputed matching results of the queries in each of the disjoint regions divided by the selection predicates. The performance experiments show that the proposed approach is more efficient and stable at run time.

C. Anne and B. Boury in [41] proposed a framework that facilitates the integration of heterogeneous unstructured and structured data, enabling Hard/Soft fusion and preparing for various analytics exploitation. It provides timely and relevant information to the analyst through intuitive search and discovery mechanisms. The authors described the design and implementation of a prototype for scalable Multi-Intelligence Data Integration Services (MIDIS), based on a flexible data integration approach that makes use of Semantic Web and Big Data technologies.

In [42], a white paper published by Intel walks through the challenge of extracting Big Data from multiple sources. It explains how the Hadoop infrastructure can contribute to the process of Big Data Extract, Transform and Load (ETL). It illustrates, from a technical point of view, the process of loading different data formats from multiple data sources into Hadoop's warehouse. However, it does not address the idea of reducing useless data capture or of producing real-time management decisions.
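The three constraint classes attributed to [38] in this section (database, pattern, and time constraints) can be made concrete with a small sketch. The interpretation below is an assumption made for illustration: the database constraint selects the source records, the pattern constraint names an ordered pair of events of interest, and the time constraint bounds the gap between them. The record layout and all names are hypothetical and are not taken from [38].

```python
from datetime import datetime, timedelta

# Hypothetical event records: (user, source, event, timestamp).
records = [
    ("u1", "twitter", "view", datetime(2015, 1, 1, 10, 0)),
    ("u1", "twitter", "share", datetime(2015, 1, 1, 10, 5)),
    ("u2", "twitter", "view", datetime(2015, 1, 1, 11, 0)),
    ("u2", "twitter", "share", datetime(2015, 1, 2, 9, 0)),
    ("u3", "blog", "view", datetime(2015, 1, 1, 12, 0)),
    ("u3", "blog", "share", datetime(2015, 1, 1, 12, 1)),
]


def users_matching(records, source, pattern, max_gap):
    """Return the users whose events satisfy all three constraints."""
    first, second = pattern
    matched = set()
    # Database constraint: restrict the source dataset.
    selected = sorted((r for r in records if r[1] == source), key=lambda r: r[3])
    last_first = {}  # user -> time of the most recent `first` event
    for user, _, event, ts in selected:
        if event == first:
            last_first[user] = ts
        # Pattern constraint: `second` must follow `first` for the same user.
        elif event == second and user in last_first:
            # Time constraint: the gap must not exceed max_gap.
            if ts - last_first[user] <= max_gap:
                matched.add(user)
    return matched


result = users_matching(records, "twitter", ("view", "share"), timedelta(hours=1))
print(sorted(result))
```

Applying the database constraint first means the pattern and time checks run only on the records that were selected, which is the sense in which such constraints can make collection more efficient.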

Literature Analysis

From the IBM contributions [35], [36] in the field of Big Data, the idea of the decision-back concept for a structured approach to SM data collection has emerged. Moreover, [37], [40], [41], and [42] discussed how classifying data according to some parameters can lead to a better understanding of the problem at hand, while [40] discussed the constraint-driven approach and how these types of constraints can lead to more efficient data collection.

Innovative Social Media Data Collection and Analytics Approaches

In [43], the authors present a multi-layered knowledge extraction approach for social networks, with a comprehensive survey of relevant notions and techniques from multiple disciplines. They analyzed the SM characteristics in multi-mode, multi-layer knowledge dimensions using Twitter as an example. They also improve the hypergraph model of social network behaviors based on the dimensions proposed in the model, with a case study in Twitter illustrating the multi-dimensional relations between Twitter users. Their main focus was to improve the understanding of social network services.

The authors in [23] studied the application of the concepts and techniques of web mining to online social networks, in terms of how to use web mining and a general process for its use in online social network analysis. They discussed several challenges in this research area; for example, data sampling is a big issue when using web mining for online social network analysis. In other web mining applications, data sampling is a simple task for reducing the amount of data. However, in online social network analysis, it becomes difficult to select suitable samples that are representative of the real social networks.

In [44], the authors empirically designed and developed the Real-time Twitter Trend Mining (RT²M) system, which allows in real time to: 1) crawl and store every textual tweet produced in Twitter into a local database; 2) keep track of social issues by temporal Topic Modeling; and 3) visualize mention-based user networks. They also demonstrated a


More information

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

How To Use Big Data To Help A Retailer

How To Use Big Data To Help A Retailer IBM Software Big Data Retail Capitalizing on the power of big data for retail Adopt new approaches to keep customers engaged, maintain a competitive edge and maximize profitability 2 Capitalizing on the

More information

How To Turn Big Data Into An Insight

How To Turn Big Data Into An Insight mwd a d v i s o r s Turning Big Data into Big Insights Helena Schwenk A special report prepared for Actuate May 2013 This report is the fourth in a series and focuses principally on explaining what s needed

More information

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities Technology Insight Paper Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities By John Webster February 2015 Enabling you to make the best technology decisions Enabling

More information

Turning Big Data into a Big Opportunity

Turning Big Data into a Big Opportunity Customer-Centricity in a World of Data: Turning Big Data into a Big Opportunity Richard Maraschi Business Analytics Solutions Leader IBM Global Media & Entertainment Joe Wikert General Manager & Publisher

More information

Beyond listening Driving better decisions with business intelligence from social sources

Beyond listening Driving better decisions with business intelligence from social sources Beyond listening Driving better decisions with business intelligence from social sources From insight to action with IBM Social Media Analytics State of the Union Opinions prevail on the Internet Social

More information

Addressing government challenges with big data analytics

Addressing government challenges with big data analytics IBM Software White Paper Government Addressing government challenges with big data analytics 2 Addressing government challenges with big data analytics Contents 2 Introduction 4 How big data analytics

More information

TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL TIME FOR LARGE DATABASES

TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL TIME FOR LARGE DATABASES Techniques For Optimizing The Relationship Between Data Storage Space And Data Retrieval Time For Large Databases TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

More information

ISSN:2321-1156 International Journal of Innovative Research in Technology & Science(IJIRTS)

ISSN:2321-1156 International Journal of Innovative Research in Technology & Science(IJIRTS) Nguyễn Thị Thúy Hoài, College of technology _ Danang University Abstract The threading development of IT has been bringing more challenges for administrators to collect, store and analyze massive amounts

More information

Apache Hadoop Patterns of Use

Apache Hadoop Patterns of Use Community Driven Apache Hadoop Apache Hadoop Patterns of Use April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Big Data: Apache Hadoop Use Distilled There certainly is no shortage of hype when

More information

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Traditional BI vs. Business Data Lake A comparison

Traditional BI vs. Business Data Lake A comparison Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

More information

A Strategic Approach to Unlock the Opportunities from Big Data

A Strategic Approach to Unlock the Opportunities from Big Data A Strategic Approach to Unlock the Opportunities from Big Data Yue Pan, Chief Scientist for Information Management and Healthcare IBM Research - China [contacts: panyue@cn.ibm.com ] Big Data or Big Illusion?

More information

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction

More information

QUICK FACTS. Implementing a Big Data Solution on Behalf of a Media House TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES

QUICK FACTS. Implementing a Big Data Solution on Behalf of a Media House TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES [ Communications, Services ] TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES Client Profile (parent company) Industry: Media, broadcasting and entertainment Revenue: Approximately $28 billion Employees:

More information

Real-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment

Real-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment www.wipro.com Real-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment Pon Prabakaran Shanmugam, Principal Consultant, Wipro Analytics practice Table of Contents 03...Abstract

More information

Integrating SAP and non-sap data for comprehensive Business Intelligence

Integrating SAP and non-sap data for comprehensive Business Intelligence WHITE PAPER Integrating SAP and non-sap data for comprehensive Business Intelligence www.barc.de/en Business Application Research Center 2 Integrating SAP and non-sap data Authors Timm Grosser Senior Analyst

More information

Why big data? Lessons from a Decade+ Experiment in Big Data

Why big data? Lessons from a Decade+ Experiment in Big Data Why big data? Lessons from a Decade+ Experiment in Big Data David Belanger PhD Senior Research Fellow Stevens Institute of Technology dbelange@stevens.edu 1 What Does Big Look Like? 7 Image Source Page:

More information

Resource 2.19 An Introduction to Social Media for Business Types of social media

Resource 2.19 An Introduction to Social Media for Business Types of social media Page 1 of 5 An Introduction to Social Media for Business Social media is the general term used to describe the growing number of websites and networks whose users can submit and share content, communicate,

More information

Getting the most out of big data

Getting the most out of big data IBM Software White Paper Financial Services Getting the most out of big data How banks can gain fresh customer insight with new big data capabilities 2 Getting the most out of big data Banks thrive on

More information

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress

More information

Annex: Concept Note. Big Data for Policy, Development and Official Statistics New York, 22 February 2013

Annex: Concept Note. Big Data for Policy, Development and Official Statistics New York, 22 February 2013 Annex: Concept Note Friday Seminar on Emerging Issues Big Data for Policy, Development and Official Statistics New York, 22 February 2013 How is Big Data different from just very large databases? 1 Traditionally,

More information

VIEWPOINT. High Performance Analytics. Industry Context and Trends

VIEWPOINT. High Performance Analytics. Industry Context and Trends VIEWPOINT High Performance Analytics Industry Context and Trends In the digital age of social media and connected devices, enterprises have a plethora of data that they can mine, to discover hidden correlations

More information

Big Data Discovery: Five Easy Steps to Value

Big Data Discovery: Five Easy Steps to Value Big Data Discovery: Five Easy Steps to Value Big data could really be called big frustration. For all the hoopla about big data being poised to reshape industries from healthcare to retail to financial

More information

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS Megha Joshi Assistant Professor, ASM s Institute of Computer Studies, Pune, India Abstract: Industry is struggling to handle voluminous, complex, unstructured

More information

Beyond the Single View with IBM InfoSphere

Beyond the Single View with IBM InfoSphere Ian Bowring MDM & Information Integration Sales Leader, NE Europe Beyond the Single View with IBM InfoSphere We are at a pivotal point with our information intensive projects 10-40% of each initiative

More information

Big Data. Fast Forward. Putting data to productive use

Big Data. Fast Forward. Putting data to productive use Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize

More information

Teradata s Big Data Technology Strategy & Roadmap

Teradata s Big Data Technology Strategy & Roadmap Teradata s Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any

More information

White Paper: Big Data and the hype around IoT

White Paper: Big Data and the hype around IoT 1 White Paper: Big Data and the hype around IoT Author: Alton Harewood 21 Aug 2014 (first published on LinkedIn) If I knew today what I will know tomorrow, how would my life change? For some time the idea

More information

How To Listen To Social Media

How To Listen To Social Media WHITE PAPER Turning Insight Into Action The Journey to Social Media Intelligence Turning Insight Into Action The Journey to Social Media Intelligence From Data to Decisions Social media generates an enormous

More information

hite News & Social Media Naroclips Instant Intelligence

hite News & Social Media Naroclips Instant Intelligence hite Papers News & Social Media at a Glance Naroclips Instant Intelligence The Essence & Types of Media In basic terms, media monitoring is the act of systematic reading, watching, listening and recording

More information

DRIVING THE CHANGE ENABLING TECHNOLOGY FOR FINANCE 15 TH FINANCE TECH FORUM SOFIA, BULGARIA APRIL 25 2013

DRIVING THE CHANGE ENABLING TECHNOLOGY FOR FINANCE 15 TH FINANCE TECH FORUM SOFIA, BULGARIA APRIL 25 2013 DRIVING THE CHANGE ENABLING TECHNOLOGY FOR FINANCE 15 TH FINANCE TECH FORUM SOFIA, BULGARIA APRIL 25 2013 BRAD HATHAWAY REGIONAL LEADER FOR INFORMATION MANAGEMENT AGENDA Major Technology Trends Focus on

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

How To Create A Data Science System

How To Create A Data Science System Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002

More information

The Rise of Industrial Big Data

The Rise of Industrial Big Data GE Intelligent Platforms The Rise of Industrial Big Data Leveraging large time-series data sets to drive innovation, competitiveness and growth capitalizing on the big data opportunity The Rise of Industrial

More information

Master big data to optimize the oil and gas lifecycle

Master big data to optimize the oil and gas lifecycle Viewpoint paper Master big data to optimize the oil and gas lifecycle Information management and analytics (IM&A) helps move decisions from reactive to predictive Table of contents 4 Getting a handle on

More information

IBM System x reference architecture solutions for big data

IBM System x reference architecture solutions for big data IBM System x reference architecture solutions for big data Easy-to-implement hardware, software and services for analyzing data at rest and data in motion Highlights Accelerates time-to-value with scalable,

More information

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches. Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference

More information

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014 Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

More information

Beyond Watson: The Business Implications of Big Data

Beyond Watson: The Business Implications of Big Data Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy Much higher Volumes. Processed with more Velocity. With much more Variety. Is Big Data so big? Big Data Smart Data Project HAVEn: Adaptive Intelligence

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Big Data and Open Data

Big Data and Open Data Big Data and Open Data Bebo White SLAC National Accelerator Laboratory/ Stanford University!! bebo@slac.stanford.edu dekabytes hectobytes Big Data IS a buzzword! The Data Deluge From the beginning of

More information

MCCM: An Approach to Transform

MCCM: An Approach to Transform MCCM: An Approach to Transform the Hype of Big Data into a Real Solution for Getting Better Customer Insights and Experience Muhammad Salman Sami Khan, Chief Research Analyst, Global Marketing Team, ZTEsoft

More information