Data Warehouses Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidations. Using Data Warehouse Information There are decision support technologies that help utilize the data available in a data warehouse. These technologies help executives to use the warehouse quickly and effectively. They can gather data, analyze it, and take decisions based on the information present in the warehouse. The information gathered in a warehouse can be used in any of the following domains: Tuning Production Strategies - The product strategies can be well tuned by repositioning the products and managing the product portfolios by comparing the sales quarterly or yearly. Customer Analysis - Customer analysis is done by analyzing the customer's buying preferences, buying time, budget cycles, etc. Operations Analysis - Data warehousing also helps in customer relationship management, and making environmental corrections. The information also allow
business operations. s us to analyze Data Mining Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a
number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost. Data, Information, and Knowledge Data Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes: operational or transactional data such as, sales, cost, inventory, payroll, and accounting nonoperational data, such as industry sales, forecast data, and macro economic data meta data - data about the data itself, such as logical database design or data dictionary definitions Information The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale
transaction data can yield information on which products are selling and when. Knowledge Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts. Problem Statement A prospecting study indicates that there is a potentiality of investment in a virgin mining project in India. An investor is conducting a feasibility study for the viability of the project so that he may invest. Assuming your own data, carry out an investment analysis, considering all the risk involved in this
investment on the mining project. Tasks are: 1. Clearly mention your (a) what type of data are needed and your assumptions, (b) what are the preliminary statistical analyses of the data you would conduct, (c) what type of data mining methods you would be using for this type of analysis. 2. Generate the data (mentioning the bounds) depending on your choice of the deposit. 3. Use any open source data-warehousing and data-mining tool(s) for conducting this analysis. The developed tool needs to be presented during the GREATSTEP event. (Hint: logically define your key performance parameters, and choose the dimensions for generating themultidimensional database, the update frequency, data volume, the granularity levels, etc.).