Chapter 3 Data Warehouse - technological growth


Computing began with data storage in conventional file systems. In that era the data volume was small and easy to manage. As data volume increased, managing data in traditional file systems led to serious challenges with regard to data integrity. Research into this problem produced the database management system (DBMS). Database management systems have since grown considerably in capability and in the services they support. They could scale from small-volume OLTP systems to OLTP systems built on very large databases. The proliferation of networked systems extended databases from centralized systems to distributed database systems that manage data efficiently at multiple locations.

The highly dynamic market requires that data be used not only to generate timely operational reports but also to produce analytical reports and forecasts that help an organization withstand changes in the business environment. Business organizations hold large volumes of data but have found it increasingly difficult to access and analyze them. This is because the data are in different formats, exist on different platforms and reside in different databases developed by different vendors. To transform these data into a unified form for analysis and reporting, hundreds of programs are written and maintained. Once the initial findings are made, decision makers want to dig deeper into the data, which requires modifying existing programs or developing new ones. This process is inefficient and very time consuming. This led to online analytical processing (OLAP). OLAP is an analysis technique used to explore data. It needs not only current transactional data but also historical data. These requirements generated the need for the data warehouse.

Data Warehouse - a very large domain of database.
A data warehouse is a central repository in which large amounts of data from heterogeneous sources are stored. A data warehouse needs a well-designed analytical system. The purpose of the system is to support analysis with an integrated and consistent view of all the data relevant to the organization. A database is an application-oriented collection of data that is organized, structured and coherent, with minimal and controlled redundancy, and that is usually accessed by several users, while a data warehouse is a subject-oriented collection of data specifically designed for analysis in support of decision making. According to [69], the data modeling paradigm for a data warehouse must comply with requirements that are totally different from those of data models in OLTP environments. Data warehouse systems are designed to facilitate data analytics with visualization.

3.1 Data Warehouse

A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management's decision-making process [77, 36].

Subject-oriented. A data warehouse is used to analyze subject areas, so its data are organized around specific subjects, for example sales, customer and product.

Integrated. A data warehouse is built by integrating data from heterogeneous or homogeneous sources such as relational databases, flat files, etc. Integration takes place when data pass from one or more data sources into the data warehouse. It defines a single representation for data that come from different sources with data and attribute inconsistencies.

Non-volatile. Once loaded, data in a data warehouse are not changed or updated. Retaining the data makes analysis over long time periods possible.

Time-variant. A data warehouse stores historical data. Time variance implies that every unit of data in the data warehouse is accurate as of some moment in time.

The storage, access, usage, performance requirements and responsiveness to queries of a data warehouse differ from those of an OLTP environment. A data warehouse contains historical and summarized data covering a long period of time. The size of a data warehouse can vary from gigabytes to terabytes. Information in the data warehouse is organized around major subject areas and is modeled to allow precomputation of, and fast access to, summarized data [7].

Decision makers in the business domain generate regular predefined reports and execute ad hoc or complex queries to support decision making. These queries require numerous scan, join and aggregate operations across the data warehouse and access millions of records. As a result, query response time is a matter of concern. A data warehouse must therefore have an architecture that allows data to be gathered, organized, manipulated and presented quickly and efficiently. Figure 3.1 shows the generic data warehouse architecture.

A data warehouse is built by integrating large amounts of data from multiple heterogeneous or homogeneous sources. The source data can come from any operational system, data store, file or external source. The ETL process is used to extract these data from the multiple sources. After extraction, the data need to be integrated, cleaned and transformed into a format and structure compatible with the data warehouse. They are then loaded into the data warehouse. At this stage, the data are restructured for query optimization. They are then used for querying, reporting and data analysis by OLAP and data mining tools.

Figure 3.1 Generic architecture of Data Warehouse (components: Source System, ETL Process, Data Warehouse, End User Access Tool)
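The extract, transform and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture of any particular ETL tool: the two source layouts, the field names, the `sales` table and the functions `extract`, `transform`, `load` and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two hypothetical source systems that record the same kind of fact
# in different formats (assumed data, for illustration only).
SOURCE_A = [{"cust": "Asha", "amount": "120.50", "date": "2015-01-03"}]
SOURCE_B = [("RAVI ", 80, "03/01/2015")]  # (name, amount, dd/mm/yyyy)


def extract():
    """Extract: pull raw records from every source, tagged with their origin."""
    rows = [("A", rec) for rec in SOURCE_A]
    rows += [("B", rec) for rec in SOURCE_B]
    return rows


def transform(raw_rows):
    """Transform: clean each record and convert it to one unified layout."""
    unified = []
    for origin, rec in raw_rows:
        if origin == "A":
            name, amount, date = rec["cust"], rec["amount"], rec["date"]
        else:
            name, amount, raw_date = rec
            day, month, year = raw_date.split("/")
            date = f"{year}-{month}-{day}"  # normalize to ISO yyyy-mm-dd
        unified.append((name.strip().title(), float(amount), date))
    return unified


def load(conn, rows):
    """Load: write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps in order and return the warehouse connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn


if __name__ == "__main__":
    warehouse = run_etl()
    for row in warehouse.execute("SELECT * FROM sales ORDER BY customer"):
        print(row)
```

Keeping the three steps as separate functions mirrors the separation between source systems, staging and warehouse: a new source only needs a new branch in `transform`, while `load` stays unchanged.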

To build a data warehouse, Inmon proposed an architecture that follows a top-down approach and Kimball proposed an architecture that follows a bottom-up approach. A hybrid approach, which combines the two, is also used.

Inmon's top-down architecture uses an ETL tool to extract and transform data from the source systems. The transformed data are then loaded into the data warehouse. Once the data warehouse has been created, data marts can be created from it if necessary, either for a specific purpose or for a specific subject domain. The following figure shows the top-down approach.

Figure 3.2 Top-Down approach architecture (components: Source System, ETL Process, Data Warehouse, Data Mart)

Kimball's bottom-up architecture starts by building data marts from individual departments' data. An ETL tool is used to extract and transform data from the source systems and load them into the data marts. The individual data marts are then used to build the data warehouse. The idea of this architecture is to construct the data warehouse incrementally. The following figure shows the bottom-up approach.

Figure 3.3 Bottom-Up approach architecture (components: Source System, ETL Process, Data Mart, Data Warehouse)

3.2 Building the Data Warehouse

The generic architectural model for building a data warehouse can be described in three phases: Data Extraction and Integration, Data Modeling, and Data Analysis.

Data Extraction and Integration

Data for the data warehouse can be extracted from homogeneous or heterogeneous sources. These data need to be integrated, transformed and cleansed before they are stored in the data warehouse. In the ETL process, data are collected from different sources and stored in the data staging area for integration and transformation. After transformation, the data are loaded into the data warehouse. The following figure shows data extraction and integration from homogeneous and heterogeneous sources using ETL.

Figure 3.4 ETL process for Homogeneous and Heterogeneous sources (panels: Homogeneous Source, Heterogeneous Sources; labels: Data Source, DS1, DS2, DS3, ETL Tool, Data Warehouse)

Data Modeling Phase

To best support the needs of data warehouse users, the data warehouse database should be designed carefully. A well-designed data model allows the data warehouse to grow easily and provides good performance. Different levels of model are used to design the data warehouse database. Data modeling is a technique that records the inventory, shape, size, contents and rules of the data elements used in the scope of a business process [71]. It provides a kind of map that describes the data used in the process. To support its requirements, a data warehouse can be designed with three levels of model: conceptual, logical and physical. The following figure shows these levels.

Figure 3.5 Data Model for Data Warehouse (levels: Conceptual Data Model, Logical Data Model, Physical Data Model)

Conceptual Data Model

The conceptual data model gives a high-level view of the data warehouse. It is a brief description of the users' data requirements that does not take implementation details into account. This model is closer to the real world than to the implementation. Conceptual data models are typically expressed using the ER model or the Unified Modeling Language (UML) [37, 62, 43, 63].

Logical Data Model

The logical data model is built from the user requirements and is later translated into the physical data model. It shows entities and their relationships in a logically sound manner, to serve as a model for physical implementation. It includes all entities and the relationships among them, all attributes, the primary key of each entity and the associated foreign keys. The goal of this model is to describe the data in as much detail as possible.

Physical Data Model

The physical data model shows the physical tables as they are implemented in the database; it represents the actual design of the database. It also covers techniques such as indexes, materialized views and partitioning.

Dimensional Modeling

A data warehouse uses a dimensional modeling structure to store large volumes of integrated data in a form suitable for answering analytical queries. It stores data so that users can analyze them from multiple perspectives. Dimensional modeling uses the concepts of fact tables and dimension tables. A fact table contains measures and related data. Each fact table contains keys to its associated dimension tables; these are foreign keys in the fact table. A fact table usually has a small number of columns and a large number of rows compared to the dimension tables. Dimension tables contain attributes that describe the fact records; they give context to the numerical values in the fact table. Dimension tables have a large number of columns and a small number of rows compared to fact tables. Facts are considered the dynamic part of the warehouse, while dimensions are considered static entities because they are computed once during the ETL process.
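The relationship between a fact table and its dimension tables can be illustrated with a small example. The sketch below uses Python's built-in `sqlite3` module; the table names (`dim_product`, `dim_date`, `fact_sales`), the columns and the sample rows are assumptions made for illustration and do not come from the chapter.

```python
import sqlite3


def build_dimensional_model(conn):
    """Create two dimension tables and one fact table, then load sample rows."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- The fact table holds the measures plus one foreign key per dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            units      INTEGER,
            revenue    REAL);
    """)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery"),
         (3, "Lamp", "Furniture")])
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2014, 12), (2, 2015, 1)])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 10, 50.0), (2, 1, 4, 40.0), (3, 2, 1, 75.0), (1, 2, 6, 30.0)])


def revenue_by_category_and_year(conn):
    """Analytical query: aggregate a measure along two dimensions."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id    = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_dimensional_model(connection)
    for row in revenue_by_category_and_year(connection):
        print(row)
```

The query joins the fact table to each dimension once and groups by dimension attributes, which is the typical shape of an analytical query over a dimensional model.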

Multidimensional modeling uses a star schema or a snowflake schema to store data in the warehouse.

Star Schema

A star schema consists of a fact table surrounded by a single level of collapsed (consolidated) dimension tables. Each dimension is represented by a single table. The primary key of each dimension table is related to a foreign key in the fact table. The fact table is joined directly to every dimension table, so only a single join connects the fact table to each dimension table. This leads to better query performance. A star schema can be simple or complex: a simple star consists of one fact table, while a complex star can have more than one fact table.

Figure 3.6 Star schema (fact table F surrounded by dimension tables D1, D2, D3 and D4)

Snowflake Schema

The snowflake schema is an extension of the star schema. It consists of a central fact table surrounded by hierarchies of dimension tables. The dimensions usually relate to the facts in one-to-many relationships, and the snowflake schema exposes them as fully normalized structures, usually consisting of many entities with often complex intra-dimensional relationships.

Figure 3.7 Snowflake schema pattern (fact table F surrounded by dimension tables, each extended by further dimension tables such as D1.2 and D4.1)
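To make the normalization in a snowflake schema concrete, the following sketch moves a product's category out of the product dimension into its own table, so that reaching the category from the fact table takes two joins instead of one. It uses Python's `sqlite3` module; all table names, columns and rows are hypothetical.

```python
import sqlite3


def build_snowflake(conn):
    """Create a fact table whose product dimension is normalized one level."""
    conn.executescript("""
        CREATE TABLE dim_category (
            category_id INTEGER PRIMARY KEY, category TEXT);
        -- The product dimension no longer stores the category name itself,
        -- only a key into the outer dimension table.
        CREATE TABLE dim_product (
            product_id  INTEGER PRIMARY KEY,
            name        TEXT,
            category_id INTEGER REFERENCES dim_category(category_id));
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            revenue    REAL);
    """)
    conn.executemany(
        "INSERT INTO dim_category VALUES (?, ?)",
        [(1, "Stationery"), (2, "Furniture")])
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Pen", 1), (2, "Notebook", 1), (3, "Lamp", 2)])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?)",
        [(1, 50.0), (2, 40.0), (3, 75.0)])


def revenue_by_category(conn):
    """Walk the dimension hierarchy: fact -> product -> category."""
    return conn.execute("""
        SELECT c.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product  AS p ON p.product_id  = f.product_id
        JOIN dim_category AS c ON c.category_id = p.category_id
        GROUP BY c.category
        ORDER BY c.category
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_snowflake(connection)
    print(revenue_by_category(connection))
```

The extra join is the cost of normalization; the benefit is that each category name is stored once rather than repeated in every product row.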

3.2.3 Data Analysis Phase

Data from the data warehouse are retrieved and analyzed using ad hoc queries, OLAP tools, reporting tools or data mining tools. Data warehouses are used to store numeric and textual data for decision making, and most industry applications are designed to operate with data warehouses of this nature. The majority of data warehouse systems help in analyzing numeric data. Much research has been done on designing data warehouses that store, aggregate and summarize such data, and good performance has been achieved in accessing and analyzing them. Data warehouse technology for numeric data is considered mature [9].

3.3 Multimedia Analysis

In today's business scenario, data are not limited to numeric or textual types but include a wide variety of images, audio, video, etc. Multimedia data are widely used in science, engineering, medicine, modern biology, geography, biometrics, weather forecasting, digital libraries, manufacturing and retailing, art and entertainment, journalism, social sciences and distance learning. These data comprise various formats such as image, audio, video, text and signal data. As the usage of multimedia grows, users need multimedia data to be sorted, combined and analyzed in innovative ways, which leads to the building of multimedia data warehouses.

Multimedia analysis has focused on images, audio and video with different goals or objectives. Multimedia Information Retrieval (MIR) draws on computer vision, machine learning, digital image processing, pattern recognition, database management and information retrieval. It includes multimedia data analytics, feature extraction, information visualization and more. It is difficult to gather multimedia information from different sources and with different goals or objectives. Multimedia information retrieval started in the late 1970s.
In the 1980s, edge finding, boundary and curve detection, region growing, shape identification and feature extraction from individual images or image frames were studied. In the 1990s, Content Based Image Retrieval (CBIR) [10, 30] and Content Based Video Clip Retrieval (CBVR) were accomplished. During that era, multimedia data grew owing to the wide use of the WWW. The computer vision community uses a visual-based approach, while the database management community uses a text-based approach. In the visual-based approach, a program identifies visual characteristics of the multimedia object, and objects are searched on the basis of these characteristics. In the text-based approach, the multimedia object is annotated manually, and objects are then searched using text-based attributes in a database management system.

Traditional data management systems are designed for structured data and for queries that expect exact matches. Multimedia data are unstructured or semi-structured, and queries over them may expect exact or non-exact matches. Traditional database management systems are unable to keep up with these ever more demanding requirements. Effective and efficient methods are therefore needed for managing ever-growing multimedia data. The content-based approach is a promising way to store and retrieve multimedia data.

Multimedia Data Features

Multimedia data can be described by their content, characteristics or features. These features can be extracted from the data manually, semi-automatically or automatically. Features are also known as descriptors. Multimedia data features can be classified by level of abstraction into low-level, mid-level and high-level features.

Low-level Feature. Low-level features can be extracted by a computer program with minimal human intervention, that is, automatically or semi-automatically.

Mid-level Feature. Mid-level features lie between low-level features and high-level, domain-oriented, semantically rich features. The identification of objects in multimedia is regarded as a mid-level feature. These features can be extracted automatically to a certain extent within a particular domain. Mid-level features require less domain knowledge, but some general knowledge is needed. Shape matching and object recognition belong to this type.
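As an illustration of a low-level feature that a program can extract without human intervention, the sketch below computes a coarse color histogram from raw RGB pixels and compares two images by histogram intersection. It is a minimal example under stated assumptions: the function names, the choice of two bins per channel and the tiny hand-made "images" are hypothetical, and real systems normally decode image files with an imaging library first.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Return a normalized RGB histogram for a list of (r, g, b) pixels.

    Each channel value (0-255) is quantized into `bins_per_channel` bins,
    giving bins_per_channel ** 3 histogram cells in total.
    """
    if not pixels:
        raise ValueError("at least one pixel is required")
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map 0..255 onto 0..bins_per_channel-1 for each channel.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels)
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    # Two tiny hypothetical images given directly as pixel lists.
    red_image = [(255, 0, 0)] * 4
    mostly_red_image = [(250, 10, 10)] * 3 + [(0, 0, 255)]
    similarity = histogram_intersection(
        color_histogram(red_image), color_histogram(mostly_red_image))
    print(f"color similarity: {similarity:.2f}")
```

The histogram captures which colors are present but nothing about what the image depicts, which is why such features are classed as low-level.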

High-level Feature. High-level features carry semantically rich information that is close to human perception. These description-based descriptors are difficult to extract automatically and are usually specified manually, as text annotations, by domain experts. Domain concepts, text annotations, events, emotions, identification of objects and keywords are categorized as high-level features, which are also known as semantic features. Humans use high-level features to interpret, recognize or analyze multimedia data. The following table shows the levels of feature used to represent multimedia data:

Level of Feature        Description
Low level feature       Color, texture, shape
Medium level feature    Objects in image
High level feature      Keywords, event, emotion, identification of objects

Table 3.1 Feature Levels of Multimedia Data

Earlier methods for representing multimedia data used low-level features. Such features seldom represent the semantic content and have little or nothing to do with human perception. A representation based only on low-level features therefore lacks semantic meaning, and retrieval or analysis based only on them gives unsatisfactory results. In addition, high-level semantic features may be defined differently by different people, because each person interprets what they see from their own point of view. Calculated features, derived from the extracted features, can also be used.

An image and video information retrieval system is presented in [4]. It covers traditional video analytics, video parsing and video abstraction. Video analytics works from color, texture, shape and spatial similarities; video parsing includes temporal segmentation, object motion analysis, framing and scene analysis; and video abstraction includes skimming, key frame extraction, content-based retrieval of clips, indexing and annotation.
A taxonomy for image retrieval systems is described in [86]; it includes feature extraction using color, texture, shape, color layout and segmentation, and image indexing techniques such as dimensionality reduction and multidimensional indexing. [73] extracted features from an image to form a hierarchy of global features, salient features, signs, shapes and object features. For similarity matching they used machine learning and semantic interpretation. They also provided image indexing, storage and database querying. [50] focused on shot boundary detection in video. [86] focused on face detection. The video indexing process is described in [74] as a hierarchy that groups different index types, characterizes different genres and sub-genres according to their prominent layout and contents, and splits the hierarchy into named events and logical units. In [49], multimedia data are extracted from different sources and stored in a data archive for information retrieval. The authors tried to bridge the semantic gap between users and their multimedia information by translating computable low-level content-based media features into high-level concepts or terms that can be used from the user's perspective. [23] used data mining techniques for knowledge discovery in image databases for content-based image retrieval (CBIR). [45] proposed a semantic-based image retrieval system. [53] proposed a modeling approach for content-based image retrieval systems.

The study of multimedia data warehouses is rooted in the traditional areas of multimedia analysis and data warehousing and started in the late 1990s to early 2000s. Since then, new models, architectures and frameworks have continued to emerge in the multimedia data warehouse research and development (R&D) community to store, access and process multimedia data efficiently in a warehouse environment. There is still much to do with regard to complex, multimedia data warehousing [32]. Challenges remain in building a multimedia data warehouse that integrates multimedia databases holding static images, motion pictures, audio and video, and in achieving an optimum level of performance for data storage, access and analysis.
3.4 Multimedia Data Warehouse

Multimedia data can be stored in a data warehouse from homogeneous or heterogeneous sources. Multimedia databases have to be integrated, transformed and cleansed to develop a centralized multimedia repository, i.e. a multimedia data warehouse. A multimedia data warehouse should be designed so that data can be extracted and analyzed quickly and easily. Multimedia data management needs efficient storage and access mechanisms that support the multimedia data warehouse. Storage of and access to multimedia data are critical for the overall system's performance and functionality. Hence the deployment of new techniques to store, retrieve and process multimedia data is essential.

3.5 Building Multimedia Data Warehouse

Building a multimedia data warehouse involves three major aspects: multimedia feature extraction, dimensional modeling design and the use of techniques that help to achieve performance. Of the three, multimedia feature extraction is the key to accuracy, dimensional modeling design is critical for data storage and representation, and performance techniques are important for storage and retrieval performance.

Multimedia Data Extraction

To build a multimedia data warehouse, data must be integrated, transformed and cleansed, because they come from different data sources and in semi-structured or unstructured formats. It is therefore necessary either to integrate and transform these data into a structured form or to provide an environment in which they can be stored so that they can be accessed and analyzed efficiently later. Along with the multimedia data themselves, multimedia features should also be extracted from the source system. The representation of multimedia data is an essential task. Multimedia data are represented by features, which constitute the data to be analyzed. The features are specific to the domain, so the relevant characteristics of the multimedia data should be extracted according to the analysis goal.

To integrate multimedia data, [37, 62, 43] build a generic UML model that represents the multimedia data through low-level and semantic descriptors. [7, 8, 9] represent multimedia data through content-based and description-based descriptors. Researchers also perform content-based image clustering [2]. [62, 63, 15] use low-level features and meta information. [5, 62, 63, 82] use semantic data.
The described multimedia data are stored at the operational level or at the data staging level in relational and/or XML-native databases. The stored data can then be loaded into a dimensional structure in a data warehouse, data marts or data cubes. After feature extraction and data transformation, the dimensional modeling process takes place.

The type of feature extracted from multimedia data is specific to the application domain. For example, low-level features such as color and texture are not effective for face recognition, because many parameters affect a face image, such as the angle from which the image is taken and the lighting conditions. For geographic images, the parameters that affect the image are satellite altitude, angle of acquisition and climatic conditions. E-learning videos are based on specific course content and are accessed and recognized effectively through high-level features such as title, course content and keywords.

Biometrics Image Data

Face image data in biometrics can be described by features extracted at different levels. Low-level features alone are not effective for face recognition, since a human face is naturally described verbally in terms of high-level semantic features. Humans perceive facial images and compare their similarity using high-level features such as gender, hair color, race, etc. It is therefore essential to describe a face using high-level semantic features or a combination of both kinds of feature. [3] retrieved images by integrating CBIR and FERET, with semantic features and with eigenfaces used to extract low-level features. [67] used low-level features with high-level attributes and proposed attribute-enhanced sparse coding and attribute-embedded inverted indexing to retrieve images. [15] proposed methods that use automatically detected human attributes, which contain semantic cues of the face, by constructing semantic codewords together with low-level features. [89] used local and global features.

Geographic Image Data

Geographic image data provide information about objects in the real world. These objects can be represented by low-level and high-level semantic features. Low-level features include color in RGB or HSV, shapes, polygons and textures.
High-level semantic features include application-oriented semantic classes such as river, forest, desert, etc. [61] designed a region-based image retrieval system in which the similarity between two images is measured from individual region-to-region similarities, extended to image-to-image similarity over all segmented regions within the image. [51] retrieved images by classifying gray-scale images into predefined semantic classes: cloud, water, forest, farmland, bare soil, rock and urban area. The same concept is used to retrieve images using multispectral isolated images [46]. They used multi-band isolated JPEG 2000 codec images to retrieve an area of interest using the hue, saturation and value (HSV) color model. [84] retrieved predefined classes from isolated images and databases; the predefined classes were city, cloud, desert, field, forest and sea.

E-learning Video Data

E-learning data are video data used in online courses for teaching purposes. They combine audio, video and text data. E-learning videos are created for a specific course or topic and are therefore usually represented and retrieved through high-level semantic features. [90] proposed a system that uses a domain ontology, defining the academic elements as introduction, definitions and theorems, theory, diagrams discussions, review, question and answer, and subtopics.

Dimensional Modeling for Multimedia Data

Dimensional models for multimedia data warehouses use the star schema [81, 78, 79, 64, 58], the starflake schema [87, 42] and the snowflake schema [64, 9]. [78, 79] use a star schema together with features of object-relational databases to meet the requirements of integrating heterogeneous types of data. XML-based multidimensional approaches [78, 79, 44, 63, 43, 62, 88, 55] have also been used to store multimedia and complex data. [9] uses a snowflake schema in a model named the multiversion model, which stores measures in the fact table and descriptors of multimedia data in dimension tables. It presents the concept of a multiversion dimension, which is composed of several versions of a dimension, each with its own schema. The schemas of the various dimensions are described using hierarchical levels. [14] proposed a temporal data model for semi-structured data.
[8] uses a data mining technique, decision trees, to select the relevant data to be modeled according to the analysis goals. [82] designed a visual cube and proposed an algorithm for visual cube construction. They introduce a Multi-Dimension scheme, in which the cube has three dimension schemas, and a Single Dimension scheme, in which the cube has two dimension schemas. [5] presented a hierarchical way of structuring the data and the extracted information. The data model represents facts and dimensions according to the hierarchical structure of the entities captured in multimedia objects. XML-based approaches are also used to store data in XML data warehouses as XML documents, XML databases [24] and XML cubes [17, 29, 31, 32, 47, 48, 80]. Once data have been modeled in the data warehouse as data cubes, they are analyzed, or relevant information is extracted from them, using appropriate tools.

Multimedia Data Analysis

Multimedia data can be accessed and analyzed by specifying different dimensional criteria. To analyze multimedia data, [9] presented a prototype from which aggregated ECG data are calculated. Xin Jin et al. [82] constructed a visual cube and extracted and counted images using their prototype. [19] proposed a prototype for multidimensional image retrieval.

3.6 Performance Factors

Performance factors are core features that can be used to improve a warehouse's storage and query performance. The main performance pillars for the data warehouse are compression, indexing, partitioning and materialized views.

Compression

Data compression improves storage performance. Its aim is to minimize the amount of data to be stored and transmitted. Compressed files occupy less disk space than uncompressed files, so compression reduces storage costs. It also speeds up data transfer, because a smaller file is transferred faster than a larger one. In databases, compression improves system performance by reducing I/O cost. It also reduces the number of bits required to store and/or transmit digital media. Compression can be performed on a data warehouse at four levels [25]: file level, page level, record level and attribute level. File-level and page-level compression give better compression ratios, but they do not perform well for query processing, because an entire file or page has to be compressed or decompressed, which increases CPU overhead and degrades performance. Record-level and attribute-level compression perform well but do not give as good a compression ratio as the first two [25].

Image compression refers to the reduction of irrelevance and redundancy in image data. The redundancy and similarity among different regions of an image make compression feasible. Storing compressed data in the data warehouse is a convenient way to save disk storage [22]. Other reasons for storing data in compressed form are [52]: it reduces query execution time, as static data are stored in the data warehouse; it reduces CPU overhead, as data need to be searched in less space; it reduces data redundancy; and it reduces the probability of transmission errors, since fewer bits are transferred [54].

Two types of compression technique are used with multimedia data: lossless and lossy. Each uses several methods, which are adopted by different file formats and achieve different results. The classification of compression techniques is shown in the following figure.

Compression Techniques
Lossless Compression: a. Run Length Encoding, b. Huffman Encoding, c. Arithmetic Encoding, d. Entropy Encoding, e. Area Coding
Lossy Compression: a. Predictive Coding, b. Transform Coding (FT / DCT / DWT)

Figure 3.8 Classification of compression technique
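Run-length encoding, the first lossless technique listed in Figure 3.8, is simple enough to show in full. The sketch below is a minimal illustration rather than the encoder of any particular file format; the function names and the `(value, count)` run representation are choices made for this example.

```python
def rle_encode(data: bytes) -> list:
    """Collapse each run of identical bytes into a (value, count) pair."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            # Same byte as the previous one: extend the current run.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # A different byte starts a new run.
            runs.append((value, 1))
    return runs


def rle_decode(runs) -> bytes:
    """Expand (value, count) pairs back into the original byte string."""
    return b"".join(bytes([value]) * count for value, count in runs)


if __name__ == "__main__":
    # One row of a hypothetical black-and-white image: long runs of 255 and 0.
    row = bytes([255] * 6 + [0] * 2 + [255] * 4)
    encoded = rle_encode(row)
    print(encoded)
    # Lossless: decoding recovers exactly the original bytes.
    print(rle_decode(encoded) == row)
```

Run-length encoding pays off only when the data contain long runs, as in simple drawings or logos; on data with few repeated neighbours the list of runs can be larger than the input.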

Lossless Compression

A lossless compression algorithm reduces file size with no loss in quality. The file is compressed when it is saved, and the original data are recovered exactly when it is decompressed. Lossless compression uses run-length encoding, Huffman encoding [40], arithmetic encoding, entropy encoding or area coding. It is suited to drawings, logos, text and other simple images that would not look good when compressed with lossy compression. [11] discussed the Huffman and arithmetic algorithms for multimedia compression. Examples of lossless image compression formats are PNG and GIF.

Lossy Compression

A lossy compression algorithm permanently discards pixel information that it treats as redundant. This means that the original data cannot be recovered exactly when the file is decompressed. Lossy compression uses predictive and transform coding. As shown in Figure 3.8, transform coding uses the FT, DCT [1] or DWT method. This technique is suited to photographs and videos. Examples of lossy compression formats are JPEG and MPEG. The compression format can be chosen according to the type of image or video data and the application domain in which it is used.

Like images, videos can be compressed using lossless or lossy techniques. Lossless video codecs achieve only moderate compression, while lossy video compression provides a better compression ratio. The majority of video compression algorithms are lossy. At the same time, highly compressed video may show visible or distracting artifacts. Video compression uses different techniques to reduce redundancy in video data. Lossy video compression includes the MPEG formats and many others.

Mohd. Fraz et al. [59] proposed lossless and lossy compression techniques for relational databases. The proposed technique works at attribute level in a data warehouse, applying lossless compression to three types of attribute (string, integer and float) and lossy compression to image attributes. They used the JPEG coding algorithm to compress images.
They obtained a 13.5% compression ratio on images. P Singh et al [68] surveyed the features of image and video compression. For image compression, they concluded that JPEG is an excellent way to store 24-bit photographic images and was designed to compress color or gray scale continuous tone images, whereas other graphics, such as vector graphics, do not compress well under JPEG. [26] performed an analysis of lossy compression algorithms for medical images, while [28] performed an analysis of multimedia compression algorithms. [57] provides a comparative study of lossy image compression in the multimedia data warehouse.

Indexing

Indexing in a data warehouse environment reduces query execution time. Using too few indexes loads the data quickly, but the query response is slow. Using too many indexes loads the data slowly and increases the storage requirement, but the query response is good. Selecting the right index structure on the right columns improves query performance [42]. B-Tree index [21], Bitmap index [18, 60, 66] and Join index [34, 63] are indexing techniques used in data warehouses. Each technique is suitable for a particular situation.

B-Tree indexes should only be used for high cardinality data and predicted queries. They are used in the warehouse to enforce unique keys. The Bitmap index is best suited for columns having low cardinality and should only be considered for such data [72, 70]. Bitmap indexing is useful for low cardinality domains because comparison, join and aggregation operations are reduced to bit arithmetic, which reduces the processing time. It also reduces space and I/O.

The join index is useful for maintaining the relationship between a foreign key and its matching primary key. The star schema model of a data warehouse makes use of the join index through the linkage between a fact table and the corresponding dimension table. Join indexing maintains relationships between the attribute values of a dimension and the corresponding rows in the fact table.

Multimedia data, or the data that represents the multimedia data, can be indexed [20]. Multimedia data warehouse [15, 65, 79, 85, 16, 42] uses indexing to speed up query processing.
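The claim that bitmap indexing reduces comparisons and aggregations to bit arithmetic can be made concrete with a small sketch. The Python below assumes one bitmap per distinct column value, stored as an integer whose bit i is set when row i holds that value. The column names and data are hypothetical, and production systems typically use compressed bitmaps rather than plain integers.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def matching_rows(bitmap):
    """Return the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical low cardinality columns of a small fact table.
region = ["east", "west", "east", "north", "west", "east"]
channel = ["web", "web", "store", "web", "store", "web"]

region_idx = build_bitmap_index(region)
channel_idx = build_bitmap_index(channel)

# WHERE region = 'east' AND channel = 'web' becomes a single bitwise AND.
both = region_idx["east"] & channel_idx["web"]
print(matching_rows(both))    # [0, 5]
print(bin(both).count("1"))   # 2, i.e. COUNT(*) as a population count
```

Each column needs one bitmap per distinct value, so the index stays small only when cardinality is low, which matches the guidance above.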
Ankush Mittal et al [6] designed a system for indexing videos using audio, video and PowerPoint slides and segmenting them into various lecture components. The content server, which is a repository of multimedia content, maintains the indexes for metadata. The physical records are retrieved by comparing their indexes with the domain specific indexes stored in the content server [54].

Partitioning

A data warehouse houses tables that are sometimes millions of rows deep and thousands of columns wide. This increases access time and maintenance cost. Partitioning is done to enhance performance and make management easier. Partitioning can be done horizontally or vertically. A horizontal partition distributes the rows of a table into groups, which decreases maintenance cost because the number of index levels decreases as the number of rows per partition decreases. A vertical partition groups the columns of a table and divides them into a number of tables. It improves data access time because only the required columns are accessed. Partitioning optimizes hardware performance and simplifies the management of the data warehouse.

The fact table in a data warehouse can grow to many hundreds of gigabytes in size. A fact table this large is very hard to manage as a single entity. By partitioning the fact table into sets of data, query processing can be enhanced. Query performance improves because the query scans only the partitions that are relevant; it does not have to scan the entire table.

Materialized View

A materialized view contains aggregated data derived from a fact table in order to provide fast answers to user queries. It requires additional space to store the aggregated and pre-calculated data. Views have to be updated whenever the tables upon which they are built are updated. When working with large amounts of joined and aggregated data, a materialized view helps improve overall performance. A materialized view can also be created with a unique clustered index to improve query performance, because the view is then stored in the database in the same way as a table with a clustered index.
An indexed materialized view enhances performance for the following reasons:

Aggregations are pre-calculated and stored in the index.
Pre-joined tables are stored.

Materialized views can be built for, and often support, frequent queries. However, for unpredicted queries, the system must scan and access the actual data. [13] proposed several algorithms for optimized, cost effective selection of materialized views. In [33], materialized views are selected by query clustering in an XML data warehouse.
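The trade-off described in this section, fast answers from pre-aggregated data in exchange for extra storage and refresh work, can be sketched with Python's built-in sqlite3 module. SQLite has no native materialized views, so this sketch emulates one with a summary table that is rebuilt on demand. The table and column names are hypothetical, and the unique index is an ordinary SQLite index, not a clustered one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (product_id INTEGER, sale_year INTEGER, amount REAL);
    INSERT INTO sales_fact VALUES
        (1, 2013, 100.0), (1, 2013, 50.0), (1, 2014, 70.0),
        (2, 2013, 20.0),  (2, 2014, 30.0), (2, 2014, 40.0);
""")


def refresh_sales_summary(conn):
    """Rebuild the pre-aggregated summary table from the fact table (full refresh)."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_by_product_year;
        CREATE TABLE sales_by_product_year AS
            SELECT product_id, sale_year,
                   SUM(amount) AS total_amount, COUNT(*) AS n_sales
            FROM sales_fact
            GROUP BY product_id, sale_year;
        CREATE UNIQUE INDEX idx_summary
            ON sales_by_product_year (product_id, sale_year);
    """)


def total_for(conn, product_id, sale_year):
    """Answer a predicted query from the summary instead of scanning the fact table."""
    row = conn.execute(
        "SELECT total_amount FROM sales_by_product_year "
        "WHERE product_id = ? AND sale_year = ?",
        (product_id, sale_year),
    ).fetchone()
    return row[0] if row else None


refresh_sales_summary(conn)
print(total_for(conn, 1, 2013))

# The summary goes stale when the base table changes, so it is refreshed again.
conn.execute("INSERT INTO sales_fact VALUES (1, 2013, 25.0)")
refresh_sales_summary(conn)
print(total_for(conn, 1, 2013))
```

A query that the summary does not cover, for example totals per month, would still have to scan `sales_fact`, which is the unpredicted query case mentioned above.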

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

DATA WAREHOUSING AND OLAP TECHNOLOGY

DATA WAREHOUSING AND OLAP TECHNOLOGY DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

Modelling Architecture for Multimedia Data Warehouse

Modelling Architecture for Multimedia Data Warehouse Modelling Architecture for Warehouse Mital Vora 1, Jelam Vora 2, Dr. N. N. Jani 3 Assistant Professor, Department of Computer Science, T. N. Rao College of I.T., Rajkot, Gujarat, India 1 Assistant Professor,

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1 Slide 29-1 Chapter 29 Overview of Data Warehousing and OLAP Chapter 29 Outline Purpose of Data Warehousing Introduction, Definitions, and Terminology Comparison with Traditional Databases Characteristics

More information

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Chapter 5. Warehousing, Data Acquisition, Data. Visualization Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives

More information

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA OLAP and OLTP AMIT KUMAR BINDAL Associate Professor Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data,

More information

Data Warehousing Systems: Foundations and Architectures

Data Warehousing Systems: Foundations and Architectures Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository

More information

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 2. What is a Data warehouse a. A database application

More information

14. Data Warehousing & Data Mining

14. Data Warehousing & Data Mining 14. Data Warehousing & Data Mining Data Warehousing Concepts Decision support is key for companies wanting to turn their organizational data into an information asset Data Warehouse "A subject-oriented,

More information

Data Warehousing Concepts

Data Warehousing Concepts Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis

More information

When to consider OLAP?

When to consider OLAP? When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: erg@evaltech.com Abstract: Do you need an OLAP

More information

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28 Data Warehousing - Essential Element To Support Decision- Making Process In Industries Ashima Bhasin 1, Mr Manoj Kumar 2 1 Computer Science Engineering Department, 2 Associate Professor, CSE Abstract SGT

More information

Data Warehouse: Introduction

Data Warehouse: Introduction Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,

More information

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management

More information

Fluency With Information Technology CSE100/IMT100

Fluency With Information Technology CSE100/IMT100 Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999

More information

B.Sc (Computer Science) Database Management Systems UNIT-V

B.Sc (Computer Science) Database Management Systems UNIT-V 1 B.Sc (Computer Science) Database Management Systems UNIT-V Business Intelligence? Business intelligence is a term used to describe a comprehensive cohesive and integrated set of tools and process used

More information

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker

More information

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT BUILDING BLOCKS OF DATAWAREHOUSE G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT 1 Data Warehouse Subject Oriented Organized around major subjects, such as customer, product, sales. Focusing on

More information

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES MUHAMMAD KHALEEL (0912125) SZABIST KARACHI CAMPUS Abstract. Data warehouse and online analytical processing (OLAP) both are core component for decision

More information

Databases in Organizations

Databases in Organizations The following is an excerpt from a draft chapter of a new enterprise architecture text book that is currently under development entitled Enterprise Architecture: Principles and Practice by Brian Cameron

More information

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Describe how the problems of managing data resources in a traditional file environment are solved

More information

Part 22. Data Warehousing

Part 22. Data Warehousing Part 22 Data Warehousing The Decision Support System (DSS) Tools to assist decision-making Used at all levels in the organization Sometimes focused on a single area Sometimes focused on a single problem

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006 Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006 What is a Data Warehouse? A data warehouse is a subject-oriented, integrated, time-varying, non-volatile

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Course 103402 MIS. Foundations of Business Intelligence

Course 103402 MIS. Foundations of Business Intelligence Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:

More information

Lection 3-4 WAREHOUSING

Lection 3-4 WAREHOUSING Lection 3-4 DATA WAREHOUSING Learning Objectives Understand d the basic definitions iti and concepts of data warehouses Understand data warehousing architectures Describe the processes used in developing

More information

ISSN: 2319-5967 ISO 9001:2008 Certified International Journal of Engineering Science and Innovative Technology (IJESIT) Volume 2, Issue 4, July 2013

ISSN: 2319-5967 ISO 9001:2008 Certified International Journal of Engineering Science and Innovative Technology (IJESIT) Volume 2, Issue 4, July 2013 An Architecture for Creation of Multimedia Data Warehouse 1 Meenakshi Srivastava, 2 Dr. S.K.Singh, 3 Dr. S.Q.Abbas 1 Assistant Professor, Amity University,Lucknow Campus, India, 2 Professor, Amity University

More information

Week 3 lecture slides

Week 3 lecture slides Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically

More information

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of An Introduction to Data Warehousing An organization manages information in two dominant forms: operational systems of record and data warehouses. Operational systems are designed to support online transaction

More information

THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE

THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE Carmen Răduţ 1 Summary: Data quality is an important concept for the economic applications used in the process of analysis. Databases were revolutionized

More information

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration DW Source Integration, Tools, and Architecture Overview DW Front End Tools Source Integration DW architecture Original slides were written by Torben Bach Pedersen Aalborg University 2007 - DWML course

More information

Sterling Business Intelligence

Sterling Business Intelligence Sterling Business Intelligence Concepts Guide Release 9.0 March 2010 Copyright 2009 Sterling Commerce, Inc. All rights reserved. Additional copyright information is located on the documentation library:

More information

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION EXECUTIVE SUMMARY Oracle business intelligence solutions are complete, open, and integrated. Key components of Oracle business intelligence

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART A: Architecture Chapter 1: Motivation and Definitions Motivation Goal: to build an operational general view on a company to support decisions in

More information

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches Concepts of Database Management Seventh Edition Chapter 9 Database Management Approaches Objectives Describe distributed database management systems (DDBMSs) Discuss client/server systems Examine the ways

More information

Indexing Techniques for Data Warehouses Queries. Abstract

Indexing Techniques for Data Warehouses Queries. Abstract Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 sirirut@cs.ou.edu gruenwal@cs.ou.edu Abstract Recently,

More information

ETL-EXTRACT, TRANSFORM & LOAD TESTING

ETL-EXTRACT, TRANSFORM & LOAD TESTING ETL-EXTRACT, TRANSFORM & LOAD TESTING Rajesh Popli Manager (Quality), Nagarro Software Pvt. Ltd., Gurgaon, INDIA rajesh.popli@nagarro.com ABSTRACT Data is most important part in any organization. Data

More information

CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University

CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University Given today s business environment, at times a corporate executive

More information

SQL Server 2012 Business Intelligence Boot Camp

SQL Server 2012 Business Intelligence Boot Camp SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations

More information

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design COURSE OUTLINE Track 1 Advanced Data Modeling, Analysis and Design TDWI Advanced Data Modeling Techniques Module One Data Modeling Concepts Data Models in Context Zachman Framework Overview Levels of Data

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

More information

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria

More information

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

low-level storage structures e.g. partitions underpinning the warehouse logical table structures DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures

More information

BUILDING OLAP TOOLS OVER LARGE DATABASES

BUILDING OLAP TOOLS OVER LARGE DATABASES BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,

More information

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option The following is intended to outline our general product direction. It is intended for

More information

Data W a Ware r house house and and OLAP Week 5 1

Data W a Ware r house house and and OLAP Week 5 1 Data Warehouse and OLAP Week 5 1 Midterm I Friday, March 4 Scope Homework assignments 1 4 Open book Team Homework Assignment #7 Read pp. 121 139, 146 150 of the text book. Do Examples 3.8, 3.10 and Exercise

More information

Dimensional Modeling for Data Warehouse

Dimensional Modeling for Data Warehouse Modeling for Data Warehouse Umashanker Sharma, Anjana Gosain GGS, Indraprastha University, Delhi Abstract Many surveys indicate that a significant percentage of DWs fail to meet business objectives or

More information

Web 3.0 image search: a World First

Web 3.0 image search: a World First Web 3.0 image search: a World First The digital age has provided a virtually free worldwide digital distribution infrastructure through the internet. Many areas of commerce, government and academia have

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management 6.1 2010 by Prentice Hall LEARNING OBJECTIVES Describe how the problems of managing data resources in a traditional

More information

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,

More information

New Approach of Computing Data Cubes in Data Warehousing

New Approach of Computing Data Cubes in Data Warehousing International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Chapter 5 Foundations of Business Intelligence: Databases and Information Management 5.1 Copyright 2011 Pearson Education, Inc. Student Learning Objectives How does a relational database organize data,

More information

A Survey on Data Warehouse Architecture

A Survey on Data Warehouse Architecture A Survey on Data Warehouse Architecture Rajiv Senapati 1, D.Anil Kumar 2 1 Assistant Professor, Department of IT, G.I.E.T, Gunupur, India 2 Associate Professor, Department of CSE, G.I.E.T, Gunupur, India

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong gaocong@cs.aau.dk Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 asistithod@gmail.com

More information

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov Unlock your data for fast insights: dimensionless modeling with in-memory column store By Vadim Orlov I. DIMENSIONAL MODEL Dimensional modeling (also known as star or snowflake schema) was pioneered by

More information

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2 Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on

More information

CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS

CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS In today's scenario data warehouse plays a crucial role in order to perform important operations. Different indexing techniques has been used and analyzed using

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8 Enterprise Solutions Data Warehouse & Business Intelligence Chapter-8 Learning Objectives Concepts of Data Warehouse Business Intelligence, Analytics & Big Data Tools for DWH & BI Concepts of Data Warehouse

More information

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage Moving Large Data at a Blinding Speed for Critical Business Intelligence A competitive advantage Intelligent Data In Real Time How do you detect and stop a Money Laundering transaction just about to take

More information

Topics in basic DBMS course

Topics in basic DBMS course Topics in basic DBMS course Database design Transaction processing Relational query languages (SQL), calculus, and algebra DBMS APIs Database tuning (physical database design) Basic query processing (ch

More information

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc. PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions A Technical Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Data Warehouse Modeling.....................................4

More information

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT IT0457 Data Warehousing G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT Outline What is data warehousing The benefit of data warehousing Differences between OLTP and data warehousing The architecture

More information

Designing a Dimensional Model

Designing a Dimensional Model Designing a Dimensional Model Erik Veerman Atlanta MDF member SQL Server MVP, Microsoft MCT Mentor, Solid Quality Learning Definitions Data Warehousing A subject-oriented, integrated, time-variant, and

More information

Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments.

Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments. Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments Anuraj Gupta Department of Electronics and Communication Oriental Institute

More information
