B.Sc (Computer Science) Database Management Systems UNIT-V

Business Intelligence:
Business intelligence (BI) is a term used to describe a comprehensive, cohesive and integrated set of tools and processes used to capture, collect, integrate, store and analyze data with the purpose of generating and presenting information used to support business decision making. Business intelligence is about creating intelligence about a business. This intelligence is based on learning and understanding the facts about the business environment. Business intelligence is a framework that allows a business to transform data into information, information into knowledge and knowledge into wisdom, which in turn empowers users to make sound business decisions. Business intelligence is a comprehensive endeavor because it encompasses all business processes within an organization. Business processes are the central units of operation in a business. Implementing business intelligence in an organization involves capturing not only business data but also the metadata.
Business intelligence involves the following general steps:
1. Collecting and storing operational data.
2. Aggregating the operational data into decision support data.
3. Analyzing decision support data to generate information.
4. Presenting such information to the end user.
5. Making business decisions.
6. Monitoring results to evaluate the outcomes of the business decisions.

Business Intelligence Architecture:
BI architecture covers a range of technologies and applications to manage the entire data life cycle. BI functionality ranges from simple data gathering and extraction to very complex data analysis and presentation applications. BI architecture is composed of data, people, processes, technology and the management of such components. The main focus of business intelligence is to gather, integrate and store business data for the purpose of creating information.
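To make the gather, integrate, store and analyze flow concrete, here is a minimal Python sketch of general steps 1 to 4 using the standard-library `sqlite3` module. The order records, the `sales_summary` table and the function names are hypothetical, and an in-memory SQLite database merely stands in for a real decision support data store; production BI tools are far more elaborate.

```python
import sqlite3

# Hypothetical operational records: (order_id, region, product, amount, status)
OPERATIONAL_ORDERS = [
    (1, "North", "Laptop", 1200.0, "SHIPPED"),
    (2, "North", "Mouse", 25.0, "CANCELLED"),
    (3, "South", "Laptop", 1100.0, "SHIPPED"),
    (4, "South", "Mouse", 30.0, "SHIPPED"),
    (5, "North", "Mouse", 20.0, "SHIPPED"),
]


def extract(source):
    """Step 1: collect the operational data (here, simply copy the rows)."""
    return list(source)


def filter_and_transform(rows):
    """Keep only the relevant records (shipped orders) and normalize region names."""
    return [(region.upper(), product, amount)
            for _, region, product, amount, status in rows
            if status == "SHIPPED"]


def load_aggregated(conn, rows):
    """Step 2: aggregate into decision support data and load it into the data store."""
    conn.execute("CREATE TABLE sales_summary (region TEXT, product TEXT, total REAL)")
    totals = {}
    for region, product, amount in rows:
        totals[(region, product)] = totals.get((region, product), 0.0) + amount
    conn.executemany("INSERT INTO sales_summary VALUES (?, ?, ?)",
                     [(r, p, t) for (r, p), t in sorted(totals.items())])


def analyze(conn):
    """Steps 3-4: query the decision support data so it can be presented."""
    return conn.execute("SELECT region, SUM(total) FROM sales_summary "
                        "GROUP BY region ORDER BY region").fetchall()


conn = sqlite3.connect(":memory:")
load_aggregated(conn, filter_and_transform(extract(OPERATIONAL_ORDERS)))
for region, total in analyze(conn):
    print(f"{region}: {total:.2f}")
```

The cancelled order is dropped during filtering, so only shipped sales reach the summary table that analysts query.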
Business intelligence integrates people and processes using technology in order to add value to the business. BI architecture can be described in terms of the basic components that form its infrastructure. There are four basic components that all business intelligence environments should provide:
1. Data extraction, transformation and loading (ETL) tools:
This component is in charge of collecting, filtering, integrating and aggregating operational data to be saved into a data store optimized for decision support. As the name implies, this component extracts the data, filters the extracted data to select the relevant records, and transforms and loads the result into the data store.
2. Data store: The data store is optimized for decision support and is generally represented by a data warehouse or a data mart. The data store contains business data extracted from the operational database and from external data sources.
3. Data query and analysis tools: This component performs data retrieval, data analysis and data mining tasks using the data in the data store and a business data analysis model. It is used by data analysts to create queries that access the database.
4. Data presentation and visualization tools: This component is in charge of presenting the data to the end user in a variety of ways. It is used by the data analyst to organize and present the data.
Each BI component has generated a fast-growing market for specialized tools. Some sample BI tools are:
Decision support systems (DSS): A DSS is an arrangement of computerized tools used to assist managerial decision making within a business.
Dashboards and business activity monitoring: Dashboards use web-based technologies to present key business performance indicators or information in a single integrated view.
Portals: Portals provide a unified, single point of entry for information distribution. They are a web-based technology that uses a web browser to integrate data from multiple sources into a single web page.
Data analysis and reporting tools: Advanced tools used to query multiple diverse data sources to create single integrated reports.
Data mining: This tool provides advanced statistical analysis to uncover problems and opportunities hidden within business data.
Data warehouses: The data warehouse is the foundation on which the business intelligence infrastructure is built.
OLAP tools: Online analytical processing (OLAP) provides multidimensional data analysis.
Data visualization: This tool provides advanced visual analysis techniques to help understand business data.

Data Warehouse:
The data warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
1. Integrated: The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization and from multiple sources. Data integration implies that all business entities, data elements and data characteristics are described in the same way throughout the enterprise.
2. Subject-oriented: The data warehouse data are arranged and optimized to provide answers to questions coming from diverse functional areas within a company. Data warehouse data are organized and summarized by topic.
3. Time-variant: The warehouse data represent the flow of data through time. They can even contain projected data.
4. Non-volatile: Once data enter the data warehouse, they are never removed. The data warehouse is always growing.

Difference between data warehouses and operational databases:
Characteristic   | Operational database data                                     | Data warehouse data
Integrated       | Similar data can have different representations or meanings. | Provide a unified view of all data elements with a common definition.
Subject-oriented | Data are stored with a functional or process orientation.     | Data are stored with a subject orientation that facilitates multiple views of the data.
Time-variant     | Data are recorded as current transactions.                    | Data are recorded with a historical perspective in mind.
Non-volatile     | Data updates are frequent and common.                         | Data cannot be changed.

Twelve rules that define a data warehouse:
1. The data warehouse and operational environments are separated.
2. The data warehouse data are integrated.
3. The data warehouse contains historical data over a long time horizon.
4. The data warehouse data are snapshot data captured at a given point in time.
5. The data warehouse data are subject-oriented.
6. The data warehouse data are mainly read-only.
7. The data warehouse development life cycle differs from classical systems development.
8. The data warehouse contains current detail data, old detail data, lightly summarized data and highly summarized data.
9. The data warehouse environment is characterized by read-only transactions to very large data sets.
10. The data warehouse environment has a system that traces data sources, transformations and storage.
11. The data warehouse's metadata are a critical component of this environment.
12. The data warehouse contains a chargeback mechanism for resource usage.
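The time-variant and non-volatile properties can be imitated in an ordinary relational database. The sketch below assumes SQLite (through Python's `sqlite3` module) and a hypothetical `sales_fact` table: each row is stamped with a snapshot date, and triggers reject `UPDATE` and `DELETE`, so periodic loads can only append. Real warehouses usually enforce this through load procedures and access rights rather than triggers; this is only an illustration of the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Time-variant: every row carries the date of the snapshot it belongs to.
    CREATE TABLE sales_fact (
        snapshot_date TEXT NOT NULL,
        region        TEXT NOT NULL,
        total_sales   REAL NOT NULL
    );

    -- Non-volatile: reject any attempt to change or remove loaded rows.
    CREATE TRIGGER sales_fact_no_update BEFORE UPDATE ON sales_fact
    BEGIN
        SELECT RAISE(ABORT, 'warehouse data cannot be changed');
    END;

    CREATE TRIGGER sales_fact_no_delete BEFORE DELETE ON sales_fact
    BEGIN
        SELECT RAISE(ABORT, 'warehouse data cannot be removed');
    END;
""")

# Periodic loads only append new snapshots, so the warehouse keeps growing.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
    ("2023-12-31", "NORTH", 1000.0),
    ("2024-12-31", "NORTH", 1220.0),
])

try:
    conn.execute("UPDATE sales_fact SET total_sales = 0")
except sqlite3.DatabaseError as exc:  # the trigger aborts the statement
    print("rejected:", exc)

# History is preserved: both yearly snapshots remain available for trend analysis.
print(conn.execute("SELECT snapshot_date, total_sales FROM sales_fact "
                   "ORDER BY snapshot_date").fetchall())
```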
OLAP (Online Analytical Processing):
Online analytical processing is an advanced data analysis environment that supports decision making, business modeling and operations research.
Characteristics of OLAP:
a. Use multidimensional data analysis techniques.
b. Provide advanced database support.
c. Provide easy-to-use end-user interfaces.
d. Support client/server architecture.

Multidimensional data analysis techniques:
In multidimensional analysis, data are processed and viewed as part of a multidimensional structure. This multidimensional view allows end users to consolidate or aggregate data at different levels. It also allows a business analyst to easily switch business perspectives.
Additional functions:
1. Advanced data presentation functions: 3D graphics, pivot tables, crosstabs, 3D cubes and other such facilities that are compatible with desktop spreadsheets, statistical packages and query and report packages.
2. Advanced data aggregation, consolidation and classification functions: These include business-oriented variables, financial and accounting ratios, and statistical and forecasting functions.
3. Advanced data modeling functions: These provide support for linear programming, variable assessment, variable contributions to outcome and other modeling tools.

Advanced database support:
To deliver efficient decision support, OLAP tools must have advanced data access features, such as:
1. Access to many different kinds of DBMSs, flat files, and internal and external data sources.
2. Access to aggregated data warehouse data.
3. Advanced data navigation features such as drill-down and roll-up.
4. Rapid and consistent query response times.
5. The ability to map end-user requests to the appropriate data sources.
6. Support for very large databases.

Easy-to-use end-user interface:
Easy-to-use graphical user interfaces make sophisticated data extraction and analysis tools easily accepted and readily used.
Client/server architecture:
The client/server environment enables us to divide an OLAP system into several components that define its architecture.
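Roll-up and drill-down amount to aggregating the same measure at coarser or finer levels of a dimension hierarchy. The minimal sketch below assumes a hypothetical `sales` table in SQLite whose location dimension has the hierarchy region > city; plain `GROUP BY` queries at different levels play the role that an OLAP engine's cube navigation would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales data with dimensions location (region > city), year and product.
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, year INTEGER, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    ("North", "Delhi",   2023, "Laptop", 500.0),
    ("North", "Delhi",   2024, "Mouse",   50.0),
    ("North", "Jaipur",  2024, "Laptop", 400.0),
    ("South", "Chennai", 2023, "Laptop", 300.0),
    ("South", "Chennai", 2024, "Mouse",   70.0),
])


def aggregate(*dimensions):
    """Total the sales measure at the level of detail given by `dimensions`."""
    cols = ", ".join(dimensions)  # dimension names come from this script, not from user input
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


# Roll-up: consolidate city-level detail into region totals.
print(aggregate("region"))
# Drill-down: break each region total into its cities.
print(aggregate("region", "city"))
# Switching perspective: view the same measure by year and product instead.
print(aggregate("year", "product"))
```

A dedicated OLAP server would typically precompute or cache such aggregates instead of scanning the detail rows on every request.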
OLAP Architecture:
OLAP operational characteristics can be divided into three main modules:
1. Graphical user interface (GUI)
2. Analytical processing logic
3. Data processing logic
[Figure: OLAP architecture: the OLAP GUI, analytical processing logic and data processing logic, drawing on operational data and the data warehouse]
OLAP systems are designed to use both operational and data warehouse data. In the simplest arrangement, all OLAP system components are located on a single computer. One problem with this arrangement is that each data analyst must have a powerful computer to store the OLAP system and perform all data processing locally. In addition, each analyst uses a separate copy of the data. In other words, each end user must have a private copy of the data and programs, which does not provide the benefits of a single business image shared among all users.
In the more common client/server arrangement, the OLAP GUI runs on client workstations, while the OLAP engine, or server, composed of the OLAP analytical processing logic and OLAP data processing logic, runs on a shared computer. The OLAP server acts as the front end to the data warehouse's decision support data; this front end, or middle layer, accepts and processes the requests generated by the many end-user analytical tools. The end-user GUI might be a custom-made program or, more likely, a plug-in module.
The data warehouse is created and maintained by a process or software tool that is independent of the OLAP system. This independent software performs the data extraction, filtering and integration necessary to transform operational data into data warehouse data. In most implementations, the data warehouse and OLAP are interrelated: while the data warehouse holds integrated, subject-oriented, time-variant and non-volatile decision support data, the OLAP system provides the front end through which end users access and analyze such data. The OLAP system can also provide a multidimensional data store component.
[Figure: multi-user OLAP architecture: several client OLAP GUIs access a shared OLAP engine (analytical processing logic and data processing logic) that draws on operational data and the data warehouse]

Data as a Corporate Asset:-
Data is a valuable asset that requires careful management. Data is a valuable resource that can be translated into information. An organization is subject to a data-information-decision cycle; that is, the data user applies intelligence to data to produce information that is the basis of the knowledge used in decision making by the user. To manage data as a corporate asset, managers must understand the value of information, that is, processed data.

The Need for and Role of a Database in an Organization:-
Data is used by different people in different departments for different reasons. Therefore, data management must address the concept of shared data. In every organization, the main role of the database is to support managerial decision making at all levels of the organization while preserving data privacy and security. An organization's managerial structure might be divided into three levels: top level, middle level and operational level. Operational management makes daily operational decisions; operational decisions are short term and affect only daily operations. Middle-level management makes tactical decisions; tactical decisions involve a longer time frame and affect larger-scale operations. Top-level management makes strategic decisions; strategic decisions affect the long-term well-being of the company or even its survival.
At the top management level, the database must be able to do the following:
1. Provide the information necessary for strategic decision making, strategic planning and goals definition.
2. Provide access to external and internal data to identify growth opportunities and to chart the direction of such growth.
3. Provide a framework for defining and enforcing organizational policies.
4. Provide feedback to monitor whether the company is achieving its goals.
At the middle management level, the database must be able to do the following:
a) Deliver the data necessary for tactical decisions and planning.
b) Provide a framework for enforcing and ensuring the security and privacy of the data in the database. Security means protecting the data against unauthorized users. Privacy deals with the rights of individuals and organizations to determine who may use what data, and how.
At the operational management level, the database must do the following:
a) Represent and support the company operations as closely as possible.
b) Produce query results within specified performance levels.
c) Enhance the company's short-term operational ability.

Security:-
Security means ensuring the confidentiality, integrity and availability of an information system and its data. This requires securing the overall information system architecture: the hardware systems, software applications, the network and its devices.
The three security goals are:
(1) Confidentiality:- Confidentiality deals with ensuring that data is protected against unauthorized access and, if the data are accessed by an authorized user, that the data are used only for an authorized purpose. Data must be evaluated and classified according to their level of confidentiality. The three levels are:
a) Highly restricted: very few people have access.
b) Confidential: only certain groups have access.
c) Unrestricted: data can be accessed by all users.
(2) Integrity:- Within the data security framework, integrity means keeping data consistent and free of anomalies or errors. The DBMS plays an important role in ensuring the integrity of the data in the database. From a security point of view, integrity deals not only with the data in the database, but also with ensuring that organizational processes, users and usage patterns maintain such integrity.
(3) Availability:- Availability refers to the accessibility of data whenever required by authorized users and for authorized purposes. To ensure data availability, the entire system must be protected from service degradation or interruption caused by any source. System availability is an important goal of security.

Security Policies:-
A security policy is a collection of standards, policies and procedures created to guarantee the security of a system.

Security Vulnerabilities (or) Security Threats:-
A security vulnerability is a weakness in a system component that could be exploited to allow unauthorized access or cause service disruption. When a security vulnerability is left unchecked, it could become a security threat. A security threat is an imminent security violation. When a security breach has occurred, the database integrity is either preserved or corrupted. Preserved means that action is required to avoid the repetition of similar security problems, but data recovery may not be necessary. Corrupted means that action is required to avoid the repetition of similar security problems (such as database access by computer viruses or by hackers), and the database must be recovered to a consistent state.
The system components exposed to security threats are:
a) People
b) Workstations and servers
c) Operating systems
d) Applications
e) Networks
f) Data

Database Security:-
Protecting the data in the database is a function of authorization management. Authorization management defines procedures to protect database security and integrity. Those procedures include:
a. User access management
b. View definition
c. DBMS access control
d. DBMS usage monitoring
(a) User access management:- This function is designed to limit access to the database and includes the following procedures:
1. Defining each user to the database. This is achieved at the operating system level and at the DBMS level. At the operating system level, the DBA creates a user ID that allows the end user to log on to the computer system. At the DBMS level, the DBA creates a user ID that authorizes the end user to access the DBMS.
2. Assigning passwords to each user. This is also done at both the operating system and DBMS levels. Database passwords can be assigned with predetermined expiration dates.
3. Assigning access privileges. The DBA assigns access privileges, or access rights, to specific users to access specified databases. Access rights include READ, WRITE and DELETE privileges.
Access privileges in relational databases are assigned through the SQL GRANT and REVOKE commands.
4. Physical security can prevent unauthorized users from directly accessing the DBMS installation and facilities. Common physical security practices include electronic personnel badges, closed-circuit video and biometric technology.
(b) View definition:-
The DBA must define data views to protect and control the scope of the data that are accessible to an authorized user. A view is a logical representation of one or more tables. A view takes the output of a query and treats it as a table; therefore, a view can also be called a virtual table. The main advantage of a view is that it does not occupy storage space, because a view has no data of its own. Access rights on a view can be given to a user or a group of users. The SQL command CREATE VIEW is used in relational databases to define views. For example, to create a view for clerks, the following command is used:
SQL> create view emp_clerk as select * from emp where job = 'CLERK';
(c) DBMS access control:- Database access can be controlled by placing limits on the use of DBMS query and reporting tools.
(d) DBMS usage monitoring:- The DBA must audit the use of the data in the database. Several DBMS packages can create an audit log, which automatically records a brief description of the database operations performed by all users.

Database Administration Tools:-
The database administration tools are a) the data dictionary and b) CASE tools.
1. Data Dictionary:-
The data dictionary is a DBMS component that stores the characteristics and relationships of data; that is, it stores metadata, or data about the data. The two main types of data dictionaries are a) integrated and b) standalone. An integrated data dictionary is included with the DBMS. For example, all relational DBMSs include a built-in data dictionary, or system catalog, that is frequently accessed and updated by the RDBMS. Older DBMSs do not have a built-in data dictionary, so the DBA may use a standalone data dictionary system. Data dictionaries are also classified as active or passive. An active data dictionary is automatically updated by the DBMS with every database access. A passive data dictionary is not updated automatically; it requires running a batch process.
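SQLite offers a small, concrete example of an integrated, active data dictionary: its built-in catalog table `sqlite_master` and the `PRAGMA table_info` command reflect every `CREATE` statement automatically. The sketch assumes a hypothetical `emp` table with columns `empno`, `ename` and `job` so that the `emp_clerk` view from the example above can be created; note that the catalog records the view's definition while the view itself stores no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT NOT NULL, job TEXT)")
conn.execute("CREATE INDEX idx_emp_job ON emp (job)")
# The clerk view: only its defining query is stored, not any rows of its own.
conn.execute("CREATE VIEW emp_clerk AS SELECT * FROM emp WHERE job = 'CLERK'")

# sqlite_master is SQLite's built-in catalog; the CREATE statements above updated it.
for obj_type, name in conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type, name"):
    print(obj_type, name)

# Column-level metadata for each data element: name, declared type, NOT NULL flag.
for _cid, name, col_type, notnull, _default, _pk in conn.execute("PRAGMA table_info(emp)"):
    print(name, col_type, bool(notnull))
```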
There is no standard format for the information stored in the data dictionary. The data dictionary typically stores the following information:
1. The data elements that are defined in all tables of all databases. This includes the data element names, data types, display formats, internal storage formats and validation rules.
2. The tables defined in all databases. This includes the name of the table creator, the date of creation and the number of columns.
3. The indexes defined for each database table. This includes the index name, the attributes used, the location and the creation date.
4. The relationships among data elements. This includes which elements are involved and whether the relationships are mandatory or optional.
The DBA can use the data dictionary to support data analysis and design. Application programmers can also use the data dictionary to follow the naming standards for the data elements in the database. Thus, the data dictionary can support a wide range of data administration activities.

2. CASE Tools:-
CASE stands for Computer-Aided Systems Engineering. A CASE tool provides an automated framework for the systems development life cycle (SDLC). CASE tools play an important role in information systems development. CASE uses structured methodologies and powerful graphical interfaces. Front-end CASE tools provide support for the planning, analysis and design phases; back-end CASE tools provide support for the coding and implementation phases. The CASE data dictionary stores data flow diagrams (DFDs), structure charts, descriptions of all external and internal entities, data stores, data items, report formats and screen formats. A CASE data dictionary also describes the relationships among the components of the system. Database and application designers use CASE tools to store the description of the database schema, data elements, application processes, screens and reports.
A CASE environment improves the quality of communication among the DBA, application designers and end users. When the CASE tool indicates conflicts, rule violations and inconsistencies, it facilitates making corrections. A CASE tool provides five components:
i. Graphics designed to produce structured diagrams such as data flow diagrams, ER diagrams, class diagrams and object diagrams.
ii. Screen painters and report generators to produce the information system's input and output formats.
iii. An integrated repository for storing the system design data.
iv. An analysis segment to provide a fully automated check on system consistency and syntax.
v. A program documentation generator.
The DBA's Technical Role:-
The DBA's technical role includes the selection, installation, operation, maintenance and upgrading of the DBMS and utility software. It also includes the design, development and maintenance of the application programs that interact with the database. The DBA's technical activities are a logical extension of the DBA's managerial activities. The technical aspects of the DBA's job are rooted in the following areas of operation:
i. Evaluating, selecting and installing the DBMS and utilities.
ii. Designing and implementing databases and applications.
iii. Testing and evaluating databases and applications.
iv. Operating the DBMS, utilities and applications.
v. Training and supporting users.
vi. Maintaining the DBMS, utilities and applications.

i. Evaluating, Selecting and Installing the DBMS and Utilities:-
The DBA's first and most important technical responsibility is selecting the database management system, utility software and supporting hardware to be used in the organization. The DBA must develop and execute a plan for evaluating and selecting the DBMS, and that plan must be based on the organization's needs. To match DBMS capability to the organization's needs, the DBA has to develop a checklist of desired DBMS features. That checklist should address the following issues:
a) Which data model (for example, relational or object-oriented) best serves the company's needs?
b) What maximum disk and database size is required, and what are the other storage needs?
c) Which programming languages are supported, and what application development tools (such as database schema design tools and a data dictionary) are available?
d) Does the DBMS support referential and entity integrity rules and access rights, and does it support the use of audit trails to spot errors and security violations?
e) Does the DBMS provide automated backup and recovery tools?
f) Does the DBMS support multiple users?
g) How many transactions per second does the DBMS support?
h) Can the DBMS run on different operating systems and platforms?
i) What hardware does the DBMS require? Does the DBMS have a data dictionary, and does it support any CASE tools?
j) What costs are involved in the acquisition of the software and hardware, and how many additional personnel are required?
ii. Designing and Implementing Databases and Applications:-
The DBA function also provides data modeling and design services to end users. One of the primary activities of a DBA is to determine and enforce the standards and procedures to be used. The DBA must ensure that the database modeling and design activities use appropriate standards and procedures. The DBA then provides the necessary assistance and support during the design of the database at the conceptual, logical and physical levels. The DBA also works with application programmers to ensure the quality and integrity of database design and transactions. The implementation of the applications requires the implementation of the physical database. Therefore, the DBA must provide assistance during physical design, including storage space determination and creation, data loading and conversion. The DBA's implementation tasks also include the generation, compilation and storage of the application's access plan. An access plan is a set of instructions generated at application compilation time that predetermines how the application will access the database at run time. Before an application comes online, the DBA must develop, test and implement the operational procedures required by the new system. Such operational procedures include training, security, and backup and recovery plans.

iii. Testing and Evaluating Databases and Applications:-
The DBA must also provide testing and evaluation services for all database and end-user applications. Testing procedures and standards must already be in place before any application program can be approved for use in the company. Testing usually starts with loading the testbed database, which contains test data for the applications. The testbed database's purpose is to check the data definition and integrity rules of the database and application programs. The testing and evaluation of a database application cover all aspects of the system.
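A testbed database can be as small as an in-memory database loaded with a few deliberately invalid rows. In this sketch (SQLite through Python's `sqlite3`; the `dept` and `emp` tables, their `CHECK`, `NOT NULL` and foreign-key rules, and the test rows are all hypothetical), each test case records whether the DBMS accepts or rejects the row, which checks that the integrity rules in the data definition actually work.

```python
import sqlite3


def build_testbed():
    """Create a small in-memory testbed database with hypothetical integrity rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT NOT NULL);
        CREATE TABLE emp (
            empno  INTEGER PRIMARY KEY,
            ename  TEXT NOT NULL,
            salary REAL CHECK (salary > 0),
            deptno INTEGER REFERENCES dept (deptno)
        );
        INSERT INTO dept VALUES (10, 'ACCOUNTING');
    """)
    return conn


def violates_rules(conn, row):
    """Return True if the DBMS rejects the test row because it breaks an integrity rule."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


testbed = build_testbed()
test_cases = {
    "valid employee": (1, "SMITH", 800.0, 10),
    "negative salary": (2, "JONES", -5.0, 10),
    "unknown department": (3, "BLAKE", 900.0, 99),
    "missing name": (4, None, 700.0, 10),
}
for label, row in test_cases.items():
    print(f"{label}: {'rejected' if violates_rules(testbed, row) else 'accepted'}")
```

Only the first row should be accepted; each of the others exercises a different rule.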
The evaluation process covers the following:
a) Technical aspects of both the applications and the database: backup and recovery, security and integrity, use of SQL and application performance must be evaluated.
b) Evaluation of the written documentation to ensure that the documentation and procedures are accurate and easy to follow.
c) Observance of standards for naming, documenting and coding.
d) Enforcement of all data validation rules.
After thorough testing of all applications, the database and the procedures, the system is declared operational and can be made available to end users.

iv. Operating the DBMS, Utilities and Applications:-
DBMS operations can be divided into four main areas:
a. System support
b. Performance monitoring and tuning
c. Backup and recovery
d. Security auditing and monitoring
System support activities cover all tasks directly related to the day-to-day operations of the DBMS and its applications. These activities include filling out job logs and verifying the status of computer hardware, disk packs and emergency power sources.
Performance monitoring and tuning activities are designed to ensure that the DBMS, utilities and applications maintain satisfactory performance levels. Most DBMSs include performance-monitoring tools that allow the DBA to query database usage information. To produce satisfactory performance, the DBA may have to spend much time educating programmers and end users on the proper use of SQL statements. The DBA should also create indexes that can be used to improve system performance. Concurrency is important to the efficient operation of the system, so the DBA must be familiar with the factors that influence concurrency. During DBMS performance tuning, the DBA must also consider available storage resources in terms of both primary and secondary memory. The allocation of storage resources is determined when the DBMS is configured. Performance-monitoring issues are DBMS specific; therefore, the DBA must become familiar with the DBMS manuals to learn the technical details involved in the performance-monitoring task.
Backup and recovery activities are of primary concern during DBMS operation. The DBA must establish a schedule for backing up the database and log files at appropriate intervals.
All critical system components, such as the database, database applications and transaction logs, must be backed up periodically. Database recovery after a media or system failure requires application of the transaction log to the correct database copy.
Security auditing and monitoring involve creating users, assigning access rights, using SQL commands to grant and revoke access rights to users and database objects, and creating audit trails to discover security violations. The DBA must periodically generate an audit trail report to determine whether there have been attempted security violations and, if so, from what locations and by whom.

v. Training and Supporting Users:-
Another technical activity of the DBA is training people to use the DBMS and its tools. The DBA also provides technical training for application programmers on how to use the DBMS and its utilities. Application programmer training covers the use of the DBMS tools as well as the procedures and standards required for database programming.

vi. Maintaining the DBMS, Utilities and Applications:-
The maintenance activities of the DBA are an extension of the operational activities. DBMS maintenance includes management of the physical or secondary storage devices. One of the most common maintenance activities is reorganizing the physical location of data in the database. The reorganization of a database might be designed to allocate contiguous disk-page locations to the DBMS to increase performance. The reorganization process also might free the space allocated to deleted data, thus providing more disk space for new data. Maintenance activities also include upgrading the DBMS and utility software. The upgrade might require the installation of a new version of the DBMS software or an Internet front-end tool.
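In SQLite, the `VACUUM` command is one concrete form of this reorganization: it rebuilds the database file so that pages left free by deleted data are released. The sketch below assumes a hypothetical scratch database file and `log` table; `PRAGMA freelist_count` reports the number of unused pages before and after. Other DBMSs provide their own reorganization utilities, which behave differently.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")  # hypothetical scratch database file
conn = sqlite3.connect(path)
conn.execute("PRAGMA auto_vacuum = NONE")  # keep freed pages inside the file until VACUUM
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO log (payload) VALUES (?)",
                 [("x" * 500,) for _ in range(2000)])
conn.commit()

# Delete most rows: the pages they occupied become free pages inside the file.
conn.execute("DELETE FROM log WHERE id > 100")
conn.commit()
free_before = conn.execute("PRAGMA freelist_count").fetchone()[0]
size_before = os.path.getsize(path)

# Reorganize: VACUUM rebuilds the file compactly and gives the free pages back.
conn.execute("VACUUM")
free_after = conn.execute("PRAGMA freelist_count").fetchone()[0]
size_after = os.path.getsize(path)

print(f"free pages: {free_before} -> {free_after}")
print(f"file size: {size_before} -> {size_after} bytes")
conn.close()
```

Because `VACUUM` rewrites the whole file, a DBA would normally schedule it during a quiet period.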