A Requirements Acquisition Tool Architecture for the Decision Back Approach to Social Media Big Data Capturing
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Software Engineering
at the College of Computer and Information Sciences
at Prince Sultan University

By: Mashail A. Alswilmi
May 2015
This thesis was defended on 25th May 2015.

Supervisor: Prof. Dr. Ajantha Dahanayake

Members of the Exam Committee:
Prof. Dr. Ajantha Dahanayake (Chair)
Dr. Areej Alwabil (Member)
Dr. Sarab AlMuhaideb (Member)
ACKNOWLEDGMENTS

First and foremost, praises and thanks to Allah for his showers of blessings throughout my research work, which allowed me to complete it successfully.

I would like to express my deep and sincere gratitude to my research supervisor, Prof. Ajantha Dahanayake, for her continuous support of my master's study and research, for giving me the opportunity to do research, for providing invaluable guidance, and for her patience, motivation, enthusiasm, and immense knowledge throughout this research. She taught me the methodology to carry out the research and to present it as clearly as possible. It was a great privilege and honor to work and study under her guidance. Her dynamism, vision, sincerity, and motivation have deeply inspired me, and I am extremely grateful for what she has offered me.

I am extremely grateful to my parents for their love, prayers, care, and sacrifices in educating me and preparing me for my future. I am very thankful to my husband and my son for their love, understanding, prayers, and continuing support in completing this research. I also wish to thank my sisters and brothers for their support and valuable prayers.

Finally, my thanks extend to all those who supported me, directly or indirectly, in completing the research project.
Abstract

This master's thesis uses a decision-back concept to optimize the process of social media data collection. Leveraging this type of Big Data exceeds the capabilities of traditional data capturing techniques because of its large volume, velocity, variety, and veracity. Comprehensively analyzing the properties of the problem at hand and determining the analysis needs before data collection reduces the chance of being overwhelmed by masses of irrelevant data, and helps users and businesses reach management decisions and answer mission-critical questions in an efficient and timely manner. Therefore, this thesis develops an architecture for a requirements acquisition tool that applies a decision-back approach to capture social media data analysis requirements.

The tool captures the requirements through sets of questions in multiple phases. In the first phase, the Problem Domain set of questions, the system analyzes the user's answers with an NLP technique to extract keywords and time and location constraints. In the second phase, the Data Source set of questions, the system analyzes the user's selections with a data source recommendation system to recommend the most suitable data source. In the final phase, the Analytical Tool set of questions, the system analyzes the user's selections with an analytical tool recommendation system to recommend the most suitable analytical tool. The tool's outputs are keywords, time and location constraints, a recommended data source, and a recommended analytical tool.

The tool is validated for the correctness and efficiency quality factors through an experiment that compares data collection for social media analytics with and without the tool. The experiment showed that, on average, correctness and efficiency improved after using the tool.

The main contribution of this research is the design of a value-added, well-defined process that captures social media data analysis requirements before data collection in order to accelerate analytics tasks. The requirements acquisition tool also contributes to: 1) the requirements engineering field, by building a tool that helps the user capture his requirements prior to the data collection process during social media data analytics, and 2) the software engineering field, by providing a user-centered solution that captures the user's social media data analysis needs within a user-friendly environment.
Research Abstract (translated from the Arabic)

This master's thesis uses the "decision-back" concept to improve the process of collecting social media data. Exploiting this type of Big Data goes beyond what traditional data collection techniques offer, owing to its large volume, velocity, variety, and veracity. Comprehensive analysis in the decision-back manner, by analyzing the characteristics of the problem at hand and determining the analysis needs before data collection begins, reduces the chance of being swamped by masses of irrelevant data and helps users and companies make management decisions and answer mission-critical questions effectively and in a timely manner. Therefore, this master's thesis develops an architecture for a requirements acquisition tool that applies the decision-back concept to the collection of social media data.

The tool gathers requirements through a set of questions in several phases. First phase: the Problem Domain set of questions provides the basis for analyzing the user's answers with natural language processing (NLP) to extract search keywords and time and location constraints. Second phase: the Data Source set of questions, where the user's selections are analyzed by a data source recommendation system. Final phase: the Analytical Tool set of questions, where the user's selections are analyzed by an analytical tool recommendation system. The tool's outputs are search keywords, time and location constraints, the recommended data source, and the recommended analytical tool.

The effectiveness of the tool was verified with respect to the correctness and efficiency factors through an experiment that compares social media data analysis with and without the tool. The experiment showed that the average rate of improvement of the correctness and efficiency factors rose after using the tool.

The main contribution of this research is the design of a value-added, well-defined process for gathering social media data analysis requirements before data collection in order to accelerate analytics tasks. The requirements acquisition tool also contributes to:
1) The field of software requirements engineering, by building a tool that provides a set of questions helping the user capture his needs prior to data collection during social media data analytics.
2) The field of software engineering, by providing a user-centered solution that serves users' needs in analyzing social media data in an easy and comfortable environment.
Table of Contents

Acknowledgment
Abstract
Abstract in Arabic
Table of Contents
List of Figures
List of Tables
List of Appendix Figures
List of Appendix Tables
List of Abbreviations

Chapter 1: Introduction
  1.1. Introduction
  1.2. Motivation
  1.3. Definition of Big Data
  1.4. Social Media Big Data
    1.4.1. Definition of Social Media
    1.4.2. Data Capturing and Analyzing Challenges of Social Media
  1.5. Requirements Engineering for Social Media Big Data Analytics
  1.6. Problem Statement
  1.7. Research Questions and Objectives
  1.8. Scope of the Thesis
  1.9. Related Published Paper
  1.10. Outline of the Thesis

Chapter 2: Research Methods
  Research Methods
  Research Design
  Research Participants
  Research Techniques and Data Analysis
  Research Work Packages
  Research Instruments and Procedures
  Social Mention
  Trackur

Chapter 3: Literature Review
  Big Data and Social Media Related Works
  Innovative Big Data and Data Capturing Approaches
  Literatures Analysis
  Innovative Social Media Data Collection and Analytics Approaches
  Literatures Analysis
  Software Engineering and Social Media Data Analytics
  Reverse Engineering
  Software Requirements Engineering
  Decision-back Data Capturing Approach
  Literatures Analysis
  3.4. Related Tools and Environments for Social Media Data Analytics
  Hadoop The Big Data Management Framework
  Apache Hadoop
  Literatures Analysis
  Theories and Frameworks
  W*H Conceptual Model for Services
  Stanford CoreNLP Framework
  Summary

Chapter 4: Social Media Types and Analytical Techniques
  Social Media Types
  Social Media Sites Categorizations
    Social Networking
    Microblogging
    Blogging
    Photo Sharing
    Video Sharing
  Social Media Sites Examples
    Facebook
    Twitter
    LinkedIn
    Google+
  Summary of Social Media Sites Characteristics
  Social Media Analytical Tools
    Social Listening Software/ Social Media Monitoring Software
    Social Conversation Software/ Social Media Engagement Software, Social Media Management Software
    Social Marketing Software/ Social Media Management Software
    Social Analytics Software
    Social Influencer Software
  Social Media Analytical Tools Examples

Chapter 5: Decision-Back Data Capturing Approach for Social Media Data
  Backward Analysis
  Capturing Social Media Data Plan
  The Conceptual Model
  Identification of the Problem Domain
  Identification of the Data Source and the Analytical Tool
  W*H Conceptual Model for Services
  Defining the Social Media Data Capturing Model
  Tool Architecture and Design
  Tool Layers:
    Data Ingest Module (Presentation Layer)
    Data Analysis Module (Middle Layer)
    Database Layer
  Tool's User Interface Design
    Part 1: Problem Domain
    Part 2: Data Source
    Part 3: Analytical Tool

Chapter 6: Case Study
  Stanford CoreNLP Tool
    Part of Speech Tagger
    Named Entity Recognizer
  Case 1: Start On-Line Business Project
    Problem Description
    Tool Application
  Case 2: A Saving Lincoln Movie Promotion
    Problem Description
    Tool Application
  Case 3: YouTube Music Channel Promotion
    Problem Description
    Tool Application
  Case 4: Middle East Respiratory Syndrome Awareness
    Problem Description
    Tool Application
  Case 5: DAESH Terrorist Movement
    Problem Description
    Tool Application

Chapter 7: Tool Experiment and Validation
  Purpose of the Experiment
  Design and the Scope of the Experiment
  7.3 Experiment
  Case 1: Start On-Line Business Project
    Without Tool
    With Tool
    Results
  Case 4: Middle East Respiratory Syndrome Awareness
    Without Tool
    With Tool
    Results
  Case 5: Start On-Line Business Project
    Without Tool
    With Tool
    Results
  Results Comparison
  Rate of Improvements (ROI)
  Unpaired T-Test

Chapter 8: Discussion
  Analysis of Research Outcomes
  Resulting Outcome of Tool
  Tool Evaluation Case Studies
  Tool Validation Experiment

Chapter 9: Conclusion and Future Work
  Conclusion
  9.2 Limitations
    Limited Number of Cases
    Limited Databases Tools and Social Media Sites
    Experiment and Validation Lack of Generalizability
    Limited Quality Factors Validation
    The Limited Use of NLP Tool
    Limited Number of Cases in the Experiment
  Future Work Directions

References

Appendices
  Appendix A. Glossary
  Appendix B. Hadoop Components
  Appendix C. Dimensions of the W*H Model for Services
  Appendix D. Analytical Tools Database
  Appendix E. Related Published Papers
List of Figures

Figure 1.1: The Interest for the Term "Big Data" on Google Feb,
Figure 1.2: Timeline of the Launch Dates of Many Major Social Networks Sites and Dates Until 2005 [21]
Figure 1.3: Social Media Analytics Life Cycle
Figure 2.1: Thesis Work Packages (WPs)
Figure 3.1: Hortonworks Data Platform
Figure 5.1: Decision Back Approach Applied in the Analysis Process
Figure 5.2: The Conceptual Model of the Decision Back Capturing Approach
Figure 5.3: The W*H Service Description Model [34]
Figure 5.4: Refined Model for Decision-Back Approach for Capturing Social Media Data Analyzing Requirements
Figure 5.5: The 4+1 View Model [80]
Figure 5.6: Requirements Acquisition Tool Architecture for Decision-Back Approach for Capturing Social Media Data Analyzing Requirements
Figure 5.7: NLP Analysis Subsystem
Figure 5.8: Stanford CoreNLP Example
Figure 5.9: Data Source Recommendation Subsystem
Figure 5.10: Analytical Tool Recommendation Subsystem
Figure 5.11: Tool Interface Design (Home Page)
Figure 5.12: Tool Interface Design (Process Part1)
Figure 5.13: Tool Interface Design (Process Part2)
Figure 5.14: Tool Interface Design (Process Part3)
Figure 5.15: Tool Interface Design (Result)
Figure 6.1: Annotation Guidelines [96]
Figure 6.2: Part of Speech NLP - Case
Figure 6.3: Named Entity Recognition NLP - Case
Figure 6.4: Part of Speech NLP - Case
Figure 6.5: Named Entity Recognition NLP - Case
Figure 6.6: Part of Speech NLP - Case
Figure 6.7: Named Entity Recognition NLP - Case
Figure 6.8: Part of Speech NLP - Case
Figure 6.9: Named Entity Recognition NLP - Case
Figure 6.10: Part of Speech NLP - Case
Figure 6.11: Named Entity Recognition NLP - Case
Figure 7.1: Experiment Time Recording Log
Figure 7.2: Snapshot of Trackur - Case 1 Without Tool
Figure 7.3: Time Recording Log Sheet - Case 1 Without Tool
Figure 7.4: Snapshot of Social Mention - Case 1 With Tool
Figure 7.5: Time Recording Log Sheet - Case 1 With Tool
Figure 7.6: Quality Factors Comparison - Case
Figure 7.7: Time Recording Log Sheet - Case 4 Without Tool
Figure 7.8: Time Recording Log Sheet - Case 4 With Tool
Figure 7.9: Quality Factors Comparison - Case
Figure 7.10: Time Recording Log Sheet - Case 5 Without Tool
Figure 7.11: Time Recording Log Sheet for Case 5 - With Tool
Figure 7.12: Quality Factors Comparison - Case
Figure 7.13: Quality Factors Comparison Chart Average Results
Figure 8.1: Experiment Summary - Correctness Factor Comparison
Figure 8.3: Experiment Summary - Efficiency Factor Comparison
List of Tables

Table 4.1: Summary of Social Media Sites Categorization Based on their Functionalities
Table 4.2: Facebook Information [70]
Table 4.3: Twitter Information [71]
Table 4.4: LinkedIn Information [71]
Table 4.5: Google+ Information [69]
Table 4.6: Social Media Sites Characteristics Summary
Table 4.7: Social Media Analytical Tools' Characteristics Example
Table 7.1: Keywords Relevant Feeds Numbers - Case 1 Without Tool
Table 7.2: Keywords Relevant Feeds Numbers - Case 1 With Tool
Table 7.3: Keywords Relevant Feeds Numbers - Case 4 Without Tool
Table 7.4: Keywords Relevant Feeds Numbers - Case 4 With Tool
Table 7.5: Keywords Relevant Feeds Numbers - Case 5 Without Tool
Table 7.6: Keywords Relevant Feeds Numbers - Case 5 With Tool
Table 7.7: Results Comparison
Table 7.8: Correctness and Efficiency Rate of Improvements (ROI)
Table 7.9: Experiment Data Sample
List of Appendix Figures

Figure C1: The W*H Inquiry Based Conceptual Model for Services
List of Appendix Tables

Table B1: Hadoop Ecosystem Components [2][16]
Table D1: Analytical Tools Database
List of Abbreviations

SDLC      Software Development Life Cycle
RE        Requirements Engineering
HDFS      Hadoop Distributed File System
HDP       Hortonworks Data Platform
YARN      Yet Another Resource Negotiator
NOSQL     Not Only SQL
SM        Social Media
ETL       Extract, Transform, and Load
NLP       Natural Language Processing
MIDIS     Multi-Intelligence Data Integration Services
SNAP      Stanford Network Analysis Platform
POS       Part Of Speech
MOH       Ministry of Health
MOI       Ministry of Interior
CCC       Command & Control Center
SHC       Supreme Hajj Committee
NER       Named Entity Recognizer
MERS-COV  Middle East Respiratory Syndrome Coronavirus
WRM       Wholesale Revenue Management
CHAPTER 1: INTRODUCTION
1.1. Introduction

Social Media (SM) data is representative of Big Data, with its massive growth, its multiple channels, and the enormous scope of its content and subject matter [1]. In the business world, SM is a powerful marketing tool that is reshaping the way organizations engage with their customers and nurture their relationships with brands, products, and services [1] [2]. It can be used to share news from a corporate event in near real time, to create a buzz about a new product within minutes of its launch, or to share the details of an unpleasant experience with customer services [3] [4]. It has many other innovative uses: political leaders try to influence public opinion through it [5], and other uses include the creation of job applications, the organization of learning groups, online training sessions, and many others [2] [6].

When it comes to analyzing this powerful source of data, many organizations are concerned that the amount of collected data becomes so cumbersome that it is difficult to find the most valuable pieces of information. Many questions arise [3]:

- What if the data volume gets so large and varied that one does not know how to deal with it?
- How much data should be stored: all of it, or only a subset?
- How much data should be analyzed: all of it, or only a subset?
- How can one find out which data sets are really important?

Until recently, organizations were limited to using subsets of their data, or they were constrained to simplistic analyses, because the sheer volume of data overwhelmed their processing platforms [7]. There are two choices in this context [4] [8]:

- Incorporate massive data volumes in the analysis. If the needed answers are better provided by analyzing all the data, high-performance technologies that extract value from massive amounts of data are available today. One approach is to apply high-performance analytics to the massive amounts of data using technologies such as grid computing, in-database processing, and in-memory analytics.
- Determine upfront which data is relevant. Traditionally, the trend has been to store everything (some call it data hoarding), so that the analyst discovers what is relevant only when querying the data. Instead, applying analytics on the front end determines relevance based on the particular context. This type of analysis determines which data should be included in analytical processes and which can be placed in low-cost storage for later use if needed.

Gathering massive amounts of data is proving to be impractical in an SM world that is expanding with seemingly infinite amounts of user-generated data [5]. The consequence of this approach in the case of SM data is that users are often unable to obtain specific, relevant information from large-scale, highly volatile, varied SM data collections. On the other hand, determining the relevant data upfront and specifying the analysis requirements prior to data collection is the approach that should be followed in SM data analytics. It should not be a fishing expedition [8], because discovering patterns and information in such a large and complex collection of datasets is not only challenging but also immensely time consuming. Due to advances in data acquisition and business computing, today's datasets are becoming increasingly complex [8]. Some authors and data analysts, such as [3], [8], [9], and many others, recommend the decision-back approach, which begins with answering the right questions; those answers provide the road map for more structured data collection and SM data analytics processes.
Therefore, a more structured plan for capturing SM data analysis requirements is needed to avoid wasting time and resources on analyzing irrelevant data.

1.2. Motivation

Analyzing SM Big Data with low-latency updates, almost in real time, is a challenge for the near future [10]. SM Big Data has special characteristics and requires continuous investigation and analysis [11], since in real-life cases it is important to know what is happening now and to make decisions as quickly as possible [12]. Therefore, this thesis is motivated by the vision of ensuring access to the most valuable sources with minimal resources. It emphasizes the demand for a well-defined mechanism that yields an effective process, one that takes the maximum value from the available data and brings decision makers closer to extracting value from SM data. Meeting the need for a value-added, well-defined process that captures SM data analysis requirements before data collection is the main contribution of this research.

1.3. Definition of Big Data

There is no perfect definition of Big Data. The term is used with varying definitions by many companies and in the literature, and it has become more popular as a search keyword, as Google's tool Google Trends shows in Figure 1.1.

Figure 1.1: The Interest for the Term "Big Data" on Google Feb,

Big Data is defined by Gartner, the leading IT industry research group, as: "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization" [13]. Gartner characterized Big Data by three main elements, volume, velocity, and variety, which are known as the 3V's model [14]:

- Volume: The size of the data is very large, in the range of terabytes and petabytes.
- Velocity: A conventional understanding of velocity considers how quickly the data arrives and is stored, and its associated rates of retrieval.
- Variety: The data extends beyond structured data to include unstructured data of all varieties: text, audio, video, posts, log files, etc.

Some researchers use a slightly modified 3V's model. Sam Madden describes Big Data as data that is "too big, too fast, or too hard" [15], where "too hard" refers to data that does not fit neatly into existing processing tools. "Too hard" is therefore very similar to data variety. Kaisler et al. define Big Data as "the amount of data just beyond technology's capability to store, manage and process efficiently", but mention variety and velocity as additional characteristics [16]. Tim Kraska moves away from the 3V's but still acknowledges that Big Data is more than just volume. He describes Big Data as "data for which the normal application of current technology doesn't enable users to obtain timely, cost-effective, and quality answers to data-driven questions" [17]. However, he leaves open which characteristics of this data go beyond the normal application of current technology [18].

IBM uses the 3V's model but introduced an additional V, veracity:

- Veracity: The uncertainty and trustworthiness of data [19]. It signals that data keeps changing, so it cannot be fully trusted when making decisions.

SAS (Statistical Analysis System) Institute, a leader in analytics, considers two additional dimensions [7]:
- Variability: In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent, with periodic peaks. Seasonal and event-triggered peak data loads can be challenging to manage, and even more so when unstructured data is involved.
- Complexity: Today's data comes from multiple sources, and it is still an undertaking to link, match, cleanse, and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies, and multiple data linkages, or the data can quickly spiral out of control.

Overall, the 4V's model, or adaptations of it, seems to be the most widely used and accepted description of what the term Big Data means [20]: Gartner's 3V's model [14] plus IBM's additional V [19]. The model clearly describes characteristics that can be used to derive requirements for the respective technologies and products. Accordingly, the primary concerns of this thesis are volume, velocity, veracity, and variety, as they are the main barriers to an interoperable analytic platform [20]. Handling a given volume can be a really hard problem if the data arrives fast and needs to be processed within seconds. Likewise, handling volume gets harder as the data set to be processed becomes unstructured. This makes pre-filtering steps necessary, so that only the data that matters is processed and analyzed.

1.4. Social Media Big Data

1.4.1. Definition of Social Media

Nowadays, SM networks such as MySpace, Facebook, Cyworld, Twitter, Instagram, Bebo, Snapchat, and LinkedIn (see Figure 1.2) have become increasingly popular, and they support a wide range of interests and practices. While their key technological features are fairly consistent, the cultures that emerge around social networks are varied. Most sites support the maintenance of pre-existing social networks, but others help strangers connect based on shared interests, political views, or activities. Some sites cater to diverse audiences, while others attract people based on a common language or shared racial, sexual, religious, or nationality-based identities. Sites also vary in the extent to which they incorporate new information and communication tools, such as mobile connectivity, blogging, and photo/video sharing [21].

Many such social networks are extremely rich, and they typically contain a tremendous amount of content and linkage data that can be leveraged for analysis. The linkage data is essentially the graph structure of the social network and the communications between entities, whereas the content data contains the text, images, and other multimedia data in the network [22] [23].
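As a minimal sketch of the distinction drawn above, linkage data can be held as a graph (here an adjacency mapping) and content data as the items each user posts. The class, its method names, and the sample users and posts are hypothetical illustrations and are not taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class SocialNetwork:
    """Toy container that keeps linkage data and content data apart."""
    # Linkage data: who is connected to whom (undirected graph).
    links: dict[str, set[str]] = field(default_factory=dict)
    # Content data: the text items each user has posted.
    posts: dict[str, list[str]] = field(default_factory=dict)

    def connect(self, a: str, b: str) -> None:
        """Record an undirected connection between two users."""
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def post(self, user: str, text: str) -> None:
        """Store one piece of content for a user."""
        self.posts.setdefault(user, []).append(text)

    def degree(self, user: str) -> int:
        """Linkage question: how many connections does this user have?"""
        return len(self.links.get(user, set()))

    def posts_mentioning(self, term: str) -> list[str]:
        """Content question: which posts mention the term (case-insensitive)?"""
        term = term.lower()
        return [t for texts in self.posts.values() for t in texts if term in t.lower()]


# Hypothetical sample data.
net = SocialNetwork()
net.connect("alice", "bob")
net.connect("alice", "carol")
net.post("bob", "Launching a new online store next week")
net.post("carol", "Holiday photos")
```

Keeping the two kinds of data in separate structures mirrors the text: graph questions (such as a user's number of connections) use only `links`, while text questions use only `posts`.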
Figure 1.2: Timeline of the Launch Dates of Many Major Social Networks Sites and Dates Until 2005 [21]

Social networks have been defined by [24] as web-based services that allow individuals to:

(1) Construct a public or semi-public profile within a bounded system.
(2) Articulate a list of other users with whom they share a connection.
(3) View and traverse their list of connections and those made by others within the system.
These connections or relationships are often displayed in a diagram, where entities are the points (also called nodes) and connections are the lines [25]. This definition is the one used to define SM sites in this thesis, as it is widely used in many publications [24] [25].

1.4.2. Data Capturing and Analyzing Challenges of Social Media

Data generated from SM sites differs from the conventional attribute-value data of classical data mining. SM data is largely user-generated content on SM sites [2]. SM data is typically Big Data with special characteristics: volatile, noisy, distributed, unstructured, and vast. The main challenges and issues of SM data as a representative of Big Data are [26]:

- Privacy and security: This is the most important issue with Big Data; it is sensitive and has conceptual, technical, and legal significance.
- Data access and information sharing: If data is to be used to make accurate decisions in time, it must be available in a precise, complete, and timely manner. This makes the data management and governance process more complex, and adds the need to make data open and available to government agencies in a standardized manner, with standardized APIs, metadata, and formats, leading to better decision making, business intelligence, and productivity improvements.
- Storage and processing issues: The available storage cannot accommodate the large amount of data being produced, to which SM sites are themselves a great contributor along with sensor devices. Processing such enormous data sets is also time consuming. To find suitable elements, the whole data set needs to be scanned, which is practically impossible.
- Analytical issues: The main challenging questions are:
  (1) What if the data volume gets so unwieldy and varied that it is too difficult to manipulate?
  (2) Does all data need to be stored?
  (3) Does all data need to be analyzed?
  (4) Which data points are really important, and for what reasons?
  (5) How can the data be used to the best advantage?
- Skill requirements: Since Big Data is a fledgling and emerging technology, it needs to attract organizations and youth with diverse new skill sets. These skills should not be limited to technical ones but should also extend to research, analytical, interpretive, and creative ones.
- Technical challenges:
  (1) Fault tolerance.
  (2) Scalability.
  (3) Quality of data.
  (4) Heterogeneous data.

This thesis is motivated to address the storage and processing issues and the analytical issues. The lack of an effective process for capturing SM data analysis requirements in organizations adopting SM data solutions can have a negative financial impact and can damage reputation and credibility in the industry [27].

1.5. Requirements Engineering for Social Media Big Data Analytics

Requirements acquisition is recognized as one of the most important, albeit difficult, phases in software engineering [28]. The literature repeatedly cites the role of well-defined requirements and of the requirements acquisition process in problem analysis and project management as beneficial to software development throughout the life cycle: during design, coding, testing, maintenance, and documentation of software [28] [29]. Because SM Big Data collection and analytics resemble the design of IT software systems, they call for investment in a Requirements Engineering approach that specifies the requirements prior to data collection and provides a structure for gathering users' analytics requirements. Therefore, a tool architecture for requirements acquisition is the supporting software solution for the requirements engineering phase of the SM data collection process. The guiding questions within the tool define a structured process for system analysts to elicit the SM data analysis requirements in a more effective and user-friendly manner. This approach to Requirements Engineering is one of the main principles of the Software Development Life Cycle (SDLC) [28].

1.6. Problem Statement

As a result of SM's rapid growth, recent years have seen an accelerating shift in different domains away from traditional channels, such as print and broadcast, to digital channels [1]. This transformation is being driven by the cost advantages and precision offered by digital platforms. A particular growth area is applications that manage the increasing volume and influence of SM [5]. Here are some statistics that offer insight into the scope of the SM phenomenon [1]:

- 1.43 billion people worldwide visited a social networking site last year (2014).
- Nearly 1 in 8 people worldwide have their own Facebook page.
- In 2014, one million new accounts were added to Twitter every day.
- Three million new blogs come online every month.
- 65 percent of SM users say they use it to learn more about brands, products, and services.

The amount of information continues to increase at an enormous rate. Therefore, it is imperative that businesses, organizations, and associations find better approaches to information filtering and requirements capturing that effectively decrease the information overload and improve the precision of analytics results [30]. All things considered, SM data analytics can only be effective when the underlying data collection processes are able to leverage the information relevant to a particular domain [31]. It is critical to improve the usefulness of the analysis results and to accelerate SM data analytics. Therefore, a more powerful mechanism for guiding the capture of data analytics requirements is needed to reduce the time and resources spent on analyzing irrelevant data.

1.7. Research Questions and Objectives

The study examines the decision-back approach to data capturing and its applicability to capturing SM data analytics requirements. Therefore, the research question is formulated as follows:

How can we define an architecture of an SM data requirements capturing tool that accelerates the analytics tasks?

This research defines a requirements acquisition tool architecture that captures SM data analytics requirements using the decision-back approach and that can play a role in the SM data capturing process. Therefore, the main objectives of the study are:

- To examine SM sites and determine what makes them different from each other.
- To explore different SM data analytical tools, their different techniques, and their main vendors.
- To examine the decision-back approach and how it can benefit SM data collection.
- To provide a well-planned tool architecture that eases the analyst's task of capturing SM analysis requirements. This tool applies the decision-back approach by determining what the output requirements are and then filtering the input data accordingly.
- To examine specific real-life cases from different problem domains using this tool to demonstrate its worth.
- To validate the tool for its correctness and efficiency, to ensure that it answers the research question.

Therefore, the goal of this thesis is to define coherent processes for acquiring users' analysis requirements. By improving the current data capturing process, from which accurate and useful conclusions are drawn, data analytics can be done in shorter time frames, allowing decisions to be made faster and with higher precision. This will contribute to changing the way people collect analysis requirements and subsequently transform decision making in a way that gives businesses the required advantage.

Scope of the Thesis

Figure 1.3 is a simplified adaptation of the SM analytics life cycle. As presented, it has four main stages: Data Collection, Data Processing, Data Storage, and Data Analysis [2]. The first stage, Data Collection, is concerned with collecting SM data from different SM sources, e.g. blogs and microblogs. The goal of this thesis is to reduce this tremendous amount of data by identifying analysis requirements prior to the data collection phase of an SM analytics solution. This is inspired by, and similar to, the primary phase of the Software Development Life Cycle (SDLC), which is Requirements Engineering (RE) [28]. This study follows RE in providing a well-defined tool architecture to capture SM analysis requirements, in order to improve the data collection process and accelerate the analytics tasks. Investigating the other phases of the SDLC or of the Big Data analytics process, and examining other SM analytics problems, are beyond the scope of this thesis.
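The decision-back idea stated in the objectives (determine the output requirements first, then filter the input data accordingly) can be illustrated with a minimal Python sketch. It is not the tool architecture proposed in this thesis; the names `Post`, `AnalysisRequirements`, and `collect`, as well as the choice of keywords, sources, and a date range as requirement fields, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Post:
    """A single SM item (hypothetical minimal shape)."""
    source: str
    text: str
    posted_on: date


@dataclass(frozen=True)
class AnalysisRequirements:
    """Requirements captured before any data is collected (assumed fields)."""
    keywords: FrozenSet[str]  # lower-case terms the analysis is about
    sources: FrozenSet[str]   # SM sources the analyst cares about
    start: date
    end: date

    def accepts(self, post: Post) -> bool:
        # A post is kept only if it satisfies every captured requirement.
        text = post.text.lower()
        return (
            post.source in self.sources
            and self.start <= post.posted_on <= self.end
            and any(keyword in text for keyword in self.keywords)
        )


def collect(posts: Iterable[Post], requirements: AnalysisRequirements) -> List[Post]:
    """Filter the incoming stream so that only relevant posts are stored."""
    return [post for post in posts if requirements.accepts(post)]


# Illustrative data only.
posts = [
    Post("twitter", "Loving the new phone camera", date(2015, 3, 1)),
    Post("twitter", "Traffic is terrible today", date(2015, 3, 2)),
    Post("blog", "Phone review: battery life", date(2014, 12, 30)),
    Post("forum", "Best phone deals", date(2015, 3, 5)),
]
requirements = AnalysisRequirements(
    keywords=frozenset({"phone"}),
    sources=frozenset({"twitter", "blog"}),
    start=date(2015, 1, 1),
    end=date(2015, 12, 31),
)
kept = collect(posts, requirements)
```

In this sketch only the first post passes all three requirements, so the later processing, storage, and analysis stages would handle one item instead of four.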
Figure 1.3: Social Media Analytics Life Cycle

1.9. Related Published Paper

Published papers under this research:

[1] M. Alswilmi, A. Dahanayake (2015), "A Requirements Acquisition Tool Architecture for the Decision Back Approach for Social Media Big Data Capturing", 5th Advances in Software Engineering Conference, Prince Sultan University.
[2] M. Alswilmi, N. Alnajran, A. Dahanayake (2014), "Conceptual Framework for Big Data Analytics Solutions", Proceedings of the 24th International Conference on Information Modelling and Knowledge Bases, EJC.
[3] M. Alswilmi, N. Alnajran, A. Dahanayake (2013), "Conceptual Framework for Big Data Analytics Solutions", 2nd Advances in Software Engineering Conference, Prince Sultan University.
Outline of the Thesis

Apart from the introduction, the remainder of this thesis is structured into eight chapters: Research Method; Literature Review; SM Types and Analytical Techniques; Decision-back Data Capturing Approach for SM Data; Case Study; Tool Experiment and Validation; Discussion; and Conclusion and Future Work.

Chapter 2 presents the research methods and the methodology adapted to conduct this research. The literature review in Chapter 3 discusses related work, including some available data reduction approaches, and highlights the innovativeness of this research. Additionally, it gives an overview of the tools and frameworks that have been used to build the proposed tool. Chapter 4 discusses the categorization of SM sites, SM data analytical tools, and different analytical techniques. The tool architecture, the core of this research, is built and proposed in Chapter 5 along with supporting materials. Chapter 6 applies the framework to five case studies from different problem domains. In Chapter 7, the framework is validated through a prototype and an experiment to prove its correctness and efficiency. The research analysis, a discussion of the tool prototype, and its evaluation and validation are provided in Chapter 8. Finally, Chapter 9 contains the conclusion, the limitations of this research, and future research directions.
CHAPTER 2: RESEARCH METHODS
In this chapter, the research methods followed in this study are outlined, including the research design, the participants, the techniques and methods used for research data analysis, and the approaches used to evaluate and validate the results. The tools used to conduct the experimental work are also discussed.

Research Methods

Research Design

The major aim of this research is to apply the decision-back approach and to develop a requirements acquisition tool architecture to capture SM data analytics requirements. For this purpose, qualitative and, to some extent, quantitative research methods are chosen. The research is descriptive in nature and allows a more in-depth contextual understanding of the topic to be gathered. Initially, the inductive approach is followed to analyze the qualitative data. The research moves from general information about the decision-back capturing approach and SM analysis requirements towards more specific conclusions about how to apply this data capturing approach in SM Big Data analytics and how to build a requirements acquisition tool architecture.

Research Participants

In order to maximize the validity of the findings, the research uses a hybrid access type [32] to gather the relevant data. The primary source of data is in-depth Internet access to SM sites, together with a review of scientific publications and white papers that are of interest to this research. Supporting data is collected through traditional access, by observing several leading companies that benefit from Big Data and SM data capture and analysis technologies. The choice of companies is determined by the availability of information, reputation, and level of involvement in this field; examples are IBM, Gartner, and SAS.
Research Techniques and Data Analysis

This is mixed-methods research. It uses a variety of data collection techniques and analytical procedures to develop the foundation of the tool architecture and to validate it. In order to maximize the validity and trustworthiness of the findings, the research uses a hybrid access type to gather a richer set of data. The research advanced through multiple Work Packages (WP) to develop the tool architecture and the tool's prototype, as explained below (see Figure 2.1).

Research Work Packages

1. WP1: Learning From Available Literature
1.1. The primary source of data is literature exploration and in-depth Internet access to SM sites and SM analytical tools, together with a review of relevant publications and white papers that discuss the decision-back approach and SM data analytics.
1.2. Supporting data is collected through:
- Traditional access and conversations with interested participants at scientific conferences such as the European Japanese Conference.
- In addition, observations involving document reviews of the data analytics solutions of several companies.
2. WP2: Developing the Conceptual Framework to Facilitate the Decision-Back SM Big Data Requirements Capturing
2.1. By examining the decision-back approach and how it has been used in the literature, general questions from the article [8] are used to identify the main concepts for applying this approach to the analysis of SM data.
2.2. Each concept in the framework is examined to describe how it can help capture SM data within more efficient timelines and with less consumption of resources.
2.3. The framework is connected with the SM analytics life cycle, and its relevance to the SDLC is shown.
3. WP3: Fine-tuning the Conceptual Framework
3.1. The W*H Conceptual Model for Services [33] is examined and customized to make the concepts in the framework more descriptive.
3.2. After relating the conceptual framework to the SDLC, and showing how it works as a requirements acquisition phase in the Big Data analytics life cycle, the requirements framework is built and its components are described accordingly.
4. WP4: Design of a Tool Prototype and the Component Architecture of a Tool that Supports the Decision-Back SM Big Data Requirements Capturing
4.1. Based on the requirements acquisition framework, the tool architecture is designed.
4.2. Each model in the tool is described, showing how it supports capturing the SM data analyst's requirements.
5. WP5: Validation
5.1. Two types of validation tests are provided, theoretical and experimental:
- Theoretical, by showing some case studies from different problem domains.
- Experimental, by using actual analytical tools and measuring the correctness and efficiency quality factors.
6. WP6: Discussion, Conclusions and Future Research Directions
6.1. The worthiness of the provided tool architecture is discussed by comparing two results: analysis with the tool and analysis without the tool.
6.2. The research is concluded, its limitations are discussed, and some future work directions for further improvement are provided.
Figure 2.1: Thesis Work Packages (WPs)

2.2. Research Instruments and Procedures

This research attempts to build a requirements acquisition tool architecture for the decision-back approach to capturing SM data. To validate this tool for correctness and efficiency, a prototype consisting of a combination of tools is needed to support the evaluation process. These tools are described below.

Social Mention

Social Mention is a free Social Media search and analysis platform that aggregates user-generated content from across the universe into a single stream of information. It allows users to
easily track and measure what people are saying about a person, a company, a new product, or any topic across the web's Social Media landscape in real time. Social Mention directly monitors 100+ Social Media properties, including Twitter, Facebook, FriendFeed, YouTube, Digg, and Google.

Trackur

Trackur is an SM monitoring tool designed to assist companies and public relations (PR) professionals in tracking what is said about brands on the Internet. It scans hundreds of millions of web pages, including news, blogs, videos, images, and forums, and alerts the user to anything that matches the monitored keywords. It costs at least $97 a month and offers a 10-day trial.
CHAPTER 3: LITERATURE REVIEW
3.1. Big Data and Social Media Related Works

Many software startups and research and development efforts are actively trying to harness the power of Big Data and SM, and to create software with the potential to improve almost every aspect of human life. As these efforts continue to increase, full consideration needs to be given to the engineering aspects of Big Data and SM software. Since these systems exist to make predictions on complex, continuous, and massive datasets, they pose unique problems in collecting, processing, and analyzing data that need to be solved on time and within budget [34]. This research focuses on an approach to capturing SM requirements, and therefore reviews studies that discuss SM and data capturing approaches.

Innovative Big Data and Data Capturing Approaches

IBM [35] provides a means of classifying Big Data business problems according to specified criteria, using a pattern-based approach to facilitate the task of defining an overall Big Data architecture. The idea of classifying data in order to map each problem to its suitable solution pattern shows how a structured classification approach can lead to an analysis of the needs and a clear vision of what needs to be captured. Moreover, IBM has presented several real-life samples of Big Data case studies in [36].

The authors of [37] studied different Big Data types and problems. They developed a conceptual framework that classifies Big Data problems according to the format of the data that must be processed. It maps the Big Data types to the appropriate combinations of data processing components. These components are the processing and analytic tools needed to generate useful patterns from each type of data.

The constraint-driven data mining technique proposed in [38] identifies the following classes of constraints: database constraints, pattern constraints, and time constraints. 
Database constraints are used to specify the source dataset. Pattern constraints specify which patterns are interesting and should be returned by the query. Finally, time constraints influence the
process of checking whether a given data-sequence contains a given pattern. Although data mining can only be applied to structured data that can be stored in a relational database [39], this constraint-driven approach provides an understanding of how these types of constraints can lead to more efficient data collection.

The article [40] proposes a novel approach to the consistent collective evaluation of multiple continuous queries for filtering two different types of data streams: a relational stream and an XML stream. The proposed approach provides region-based selection constructs for both: an attribute selection construct for relational queries and a path selection construct for XPath queries. Both collectively evaluate the selection predicates of the same attribute (path), based on the precomputed matching results of the queries in each of the disjoint regions divided by the selection predicates. The performance experiments show that the proposed approach is more efficient and more stable at run-time.

C. Anne and B. Boury [41] proposed a framework that facilitates the integration of heterogeneous unstructured and structured data, enabling Hard/Soft fusion and preparing for various kinds of analytics exploitation. It provides timely and relevant information to the analyst through intuitive search and discovery mechanisms. The authors described the design and implementation of a prototype for scalable Multi-Intelligence Data Integration Services (MIDIS), based on a flexible data integration approach that makes use of Semantic Web and Big Data technologies.

The white paper published by Intel [42] walks through the challenge of extracting Big Data from multiple sources. It explains how the Hadoop infrastructure can contribute to the process of Big Data Extract, Transform & Load (ETL), and illustrates, from a technical point of view, the process of loading different data formats from multiple data sources into Hadoop's warehouse. 
However, the white paper does not address the idea of reducing the capture of useless data or of producing real-time management decisions.
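The three constraint classes attributed to [38] earlier in this section can be illustrated with a small Python sketch. It is not the algorithm of [38]: the function names, the representation of a data-sequence as time-stamped items, and the reading of the time constraint as a maximum allowed gap between consecutive matched items are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Event = Tuple[int, str]  # (timestamp, item); illustrative representation


def contains_pattern(sequence: Sequence[Event],
                     pattern: Sequence[str],
                     max_gap: int) -> bool:
    """Check whether `pattern` occurs in order in a time-sorted `sequence`,
    with consecutive matched events at most `max_gap` apart (assumed time
    constraint)."""

    def match(seq_idx: int, pat_idx: int, last_time: Optional[int]) -> bool:
        if pat_idx == len(pattern):
            return True  # every pattern item has been matched
        for i in range(seq_idx, len(sequence)):
            timestamp, item = sequence[i]
            if last_time is not None and timestamp - last_time > max_gap:
                break  # sequence is time-sorted, so later events are too far
            if item == pattern[pat_idx] and match(i + 1, pat_idx + 1, timestamp):
                return True  # otherwise backtrack and try a later occurrence
        return False

    return match(0, 0, None)


def constrained_query(database: Dict[str, List[Event]],
                      sources: Set[str],
                      patterns: Sequence[Tuple[str, ...]],
                      max_gap: int) -> Dict[str, List[Tuple[str, ...]]]:
    """Apply the database constraint (which sources), the pattern constraint
    (which patterns are of interest) and the time constraint (max_gap)."""
    selected = {name: seq for name, seq in database.items() if name in sources}
    return {
        name: [p for p in patterns if contains_pattern(seq, p, max_gap)]
        for name, seq in selected.items()
    }


# Illustrative data only.
database = {
    "twitter": [(1, "post"), (2, "reply"), (8, "reply"), (10, "share")],
    "blogs": [(1, "post"), (20, "share")],
    "forums": [(1, "post"), (2, "share")],
}
patterns = [("post", "reply", "share"), ("post", "share")]
result = constrained_query(database, {"twitter", "blogs"}, patterns, max_gap=7)
```

Here the "forums" source is never examined because of the database constraint, and the time constraint rejects patterns whose items lie too far apart, which is the sense in which such constraints can reduce the amount of data that has to be collected and checked.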
Literature Analysis

The idea of a decision-back concept for a structured approach to SM data collection emerged from the IBM contributions [35], [36] in the field of Big Data. Moreover, [37], [40], [41], and [42] discussed how classifying data according to some parameters can lead to a better understanding of the problem at hand, while [38] discussed the constraint-driven approach and how its types of constraints can lead to more efficient data collection.

Innovative Social Media Data Collection and Analytics Approaches

In [43], the authors present a multi-layered knowledge extraction approach for social networks, with a comprehensive survey of relevant notions and techniques from multiple disciplines. They analyzed SM characteristics in multi-mode, multi-layer knowledge dimensions using Twitter as an example. They also improved the hypergraph model of social network behaviors based on the dimensions proposed in the model, with a case study in Twitter illustrating the multi-dimensional relations between Twitter users. Their main focus was to improve the understanding of social network services.

The authors of [23] studied the application of the concepts and techniques of web mining to on-line social networks, in terms of how to use web mining and a general process for its use in on-line social network analysis. They discussed several challenges in this research area. For example, data sampling is a big issue when using web mining for on-line social network analysis. In other web mining applications, data sampling is a simple way to reduce the amount of data. However, in on-line social network analysis, it becomes difficult to select suitable samples that are representative of the real social networks. 
In [44], the authors empirically designed and developed the Real-time Twitter Trend Mining (RT²M) system, which makes it possible, in real time, to: 1) crawl and store every textual tweet produced in Twitter in a local database; 2) keep track of social issues by temporal Topic Modeling; and 3) visualize mention-based user networks. They also demonstrated a
More informationISSN:2321-1156 International Journal of Innovative Research in Technology & Science(IJIRTS)
Nguyễn Thị Thúy Hoài, College of technology _ Danang University Abstract The threading development of IT has been bringing more challenges for administrators to collect, store and analyze massive amounts
More informationApache Hadoop Patterns of Use
Community Driven Apache Hadoop Apache Hadoop Patterns of Use April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Big Data: Apache Hadoop Use Distilled There certainly is no shortage of hype when
More informationBIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
More informationBIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
More informationTraditional BI vs. Business Data Lake A comparison
Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses
More informationA Strategic Approach to Unlock the Opportunities from Big Data
A Strategic Approach to Unlock the Opportunities from Big Data Yue Pan, Chief Scientist for Information Management and Healthcare IBM Research - China [contacts: panyue@cn.ibm.com ] Big Data or Big Illusion?
More informationUnderstanding Your Customer Journey by Extending Adobe Analytics with Big Data
SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction
More informationQUICK FACTS. Implementing a Big Data Solution on Behalf of a Media House TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES
[ Communications, Services ] TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES Client Profile (parent company) Industry: Media, broadcasting and entertainment Revenue: Approximately $28 billion Employees:
More informationReal-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment
www.wipro.com Real-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment Pon Prabakaran Shanmugam, Principal Consultant, Wipro Analytics practice Table of Contents 03...Abstract
More informationIntegrating SAP and non-sap data for comprehensive Business Intelligence
WHITE PAPER Integrating SAP and non-sap data for comprehensive Business Intelligence www.barc.de/en Business Application Research Center 2 Integrating SAP and non-sap data Authors Timm Grosser Senior Analyst
More informationWhy big data? Lessons from a Decade+ Experiment in Big Data
Why big data? Lessons from a Decade+ Experiment in Big Data David Belanger PhD Senior Research Fellow Stevens Institute of Technology dbelange@stevens.edu 1 What Does Big Look Like? 7 Image Source Page:
More informationResource 2.19 An Introduction to Social Media for Business Types of social media
Page 1 of 5 An Introduction to Social Media for Business Social media is the general term used to describe the growing number of websites and networks whose users can submit and share content, communicate,
More informationGetting the most out of big data
IBM Software White Paper Financial Services Getting the most out of big data How banks can gain fresh customer insight with new big data capabilities 2 Getting the most out of big data Banks thrive on
More informationThe Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
More informationAnnex: Concept Note. Big Data for Policy, Development and Official Statistics New York, 22 February 2013
Annex: Concept Note Friday Seminar on Emerging Issues Big Data for Policy, Development and Official Statistics New York, 22 February 2013 How is Big Data different from just very large databases? 1 Traditionally,
More informationVIEWPOINT. High Performance Analytics. Industry Context and Trends
VIEWPOINT High Performance Analytics Industry Context and Trends In the digital age of social media and connected devices, enterprises have a plethora of data that they can mine, to discover hidden correlations
More informationBig Data Discovery: Five Easy Steps to Value
Big Data Discovery: Five Easy Steps to Value Big data could really be called big frustration. For all the hoopla about big data being poised to reshape industries from healthcare to retail to financial
More informationBIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS
BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS Megha Joshi Assistant Professor, ASM s Institute of Computer Studies, Pune, India Abstract: Industry is struggling to handle voluminous, complex, unstructured
More informationBeyond the Single View with IBM InfoSphere
Ian Bowring MDM & Information Integration Sales Leader, NE Europe Beyond the Single View with IBM InfoSphere We are at a pivotal point with our information intensive projects 10-40% of each initiative
More informationBig Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
More informationTeradata s Big Data Technology Strategy & Roadmap
Teradata s Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any
More informationWhite Paper: Big Data and the hype around IoT
1 White Paper: Big Data and the hype around IoT Author: Alton Harewood 21 Aug 2014 (first published on LinkedIn) If I knew today what I will know tomorrow, how would my life change? For some time the idea
More informationHow To Listen To Social Media
WHITE PAPER Turning Insight Into Action The Journey to Social Media Intelligence Turning Insight Into Action The Journey to Social Media Intelligence From Data to Decisions Social media generates an enormous
More informationhite News & Social Media Naroclips Instant Intelligence
hite Papers News & Social Media at a Glance Naroclips Instant Intelligence The Essence & Types of Media In basic terms, media monitoring is the act of systematic reading, watching, listening and recording
More informationDRIVING THE CHANGE ENABLING TECHNOLOGY FOR FINANCE 15 TH FINANCE TECH FORUM SOFIA, BULGARIA APRIL 25 2013
DRIVING THE CHANGE ENABLING TECHNOLOGY FOR FINANCE 15 TH FINANCE TECH FORUM SOFIA, BULGARIA APRIL 25 2013 BRAD HATHAWAY REGIONAL LEADER FOR INFORMATION MANAGEMENT AGENDA Major Technology Trends Focus on
More informationBig Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools
More informationHow To Create A Data Science System
Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002
More informationThe Rise of Industrial Big Data
GE Intelligent Platforms The Rise of Industrial Big Data Leveraging large time-series data sets to drive innovation, competitiveness and growth capitalizing on the big data opportunity The Rise of Industrial
More informationMaster big data to optimize the oil and gas lifecycle
Viewpoint paper Master big data to optimize the oil and gas lifecycle Information management and analytics (IM&A) helps move decisions from reactive to predictive Table of contents 4 Getting a handle on
More informationIBM System x reference architecture solutions for big data
IBM System x reference architecture solutions for big data Easy-to-implement hardware, software and services for analyzing data at rest and data in motion Highlights Accelerates time-to-value with scalable,
More informationDetecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.
Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference
More informationBig Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
More informationBeyond Watson: The Business Implications of Big Data
Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT
More informationThe 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
More informationWhat do Big Data & HAVEn mean? Robert Lejnert HP Autonomy
What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy Much higher Volumes. Processed with more Velocity. With much more Variety. Is Big Data so big? Big Data Smart Data Project HAVEn: Adaptive Intelligence
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
More informationBig Data and Open Data
Big Data and Open Data Bebo White SLAC National Accelerator Laboratory/ Stanford University!! bebo@slac.stanford.edu dekabytes hectobytes Big Data IS a buzzword! The Data Deluge From the beginning of
More informationMCCM: An Approach to Transform
MCCM: An Approach to Transform the Hype of Big Data into a Real Solution for Getting Better Customer Insights and Experience Muhammad Salman Sami Khan, Chief Research Analyst, Global Marketing Team, ZTEsoft
More information