Transcription
1 Thank you for attending the ITWeb Event in Johannesburg! Enclosed are the slides from Ralph Kimball's presentations for your personal use. Please respect the Kimball Group copyright. Please do not share or distribute these slides or place them on a public file server. Thank you. Ralph P.S. During the presentations I mentioned some Kimball Group resources that might be useful. Please find these on our public website: my four most important white papers; Design Tip #156, containing the drill-across macro; and finally, if you would like to receive our design tips, please select Get the Latest Design Tip and follow the link to subscribe.
2 The Data Warehouse: The Essential Platform for Business Intelligence, Behavior Analysis, Predictive Analytics, Real Time Operational BI, A Simple and Powerful Approach #ITWebRalph Kimball Group, 2013 The Remarkably Durable Data Warehouse Mission Marshal all your available data assets Publish the assets in the most effective way Partner with the business Support decision making Be nimble, incremental, adaptive, distributed But, face the challenges before leaping to a solution 1
3 Marshaling Your Data Assets Hundreds of internal sources Thousands of external sources Varying grains, labels, business rules, quality Profound need for INTEGRATION Nationwide Insurance Customer Knowledge Store From: Delivering an On Your Side Experience, Teradata/Nationwide White Paper 2
4 11/16/13 Social Media Anyone? Publishing Effectively You are a publisher Collect inputs from a variety of sources Edit them for quality and consistency Publish on a regular basis You have professional responsibilities Hold the reader's trust Responsibility stops with you Protect the crown jewels Your success is measured by your end users 3
5 Partner With The Business Business sponsor drives the project Sponsor must be a thoughtful, sophisticated observer of the development project Embed IT in the business departments Expect disruptive technologies and approaches to originate in business departments Provide infrastructure Promote cross-department collaboration Support Decision Making Understand the BI cycle 1) Query and report → publish the right data 2) Find exceptions → visualize, threshold, compare, alert 3) Ask WHY → drill down, drill across, find what's correlated 4) Model potential decisions → what if; predictive analytics 5) Track the Decisions Made → Pay for the DW Understand the user Cognitive psychology User's conceptual model Willingness to program Count the clicks 4
6 Be Nimble, Incremental, Adaptive, Distributed Move quickly in an agile way with closely spaced deliverables Cut and try, backtrack and go forward Small teams, loosely coupled Deemphasize formal project overhead Hardware is tissue paper, discard after use Virtualization is good; Cloud is good A Solution Must: Maximize value of intellectual capital Apply domain knowledge Express intuition with simple gestures Make decisions Reach all decision makers While minimizing costs Design Admin (routine, little surprises, big surprises) Obvious money (personnel, hardware, software) Hidden money (lost opportunities, diversions) 5
7 The Engineer's Response Sort through the science Decompose the problem Choose a practical approach, emphasizing Re-usability; Simplicity; Symmetry; Technique Don't be distracted by ideal designs Don't assume perfect information or complete control Manage the project well Always keep the original design goals in mind But acknowledge that we have Non-negotiable design drivers: Simplicity as measured by end users Performance expectations of end users Somewhat negotiable design drivers: Overhead (design, routine admin, little and big surprises) Costs (personnel, software, hardware) Risk of no result or irrelevant result Risk of too much centralization 6
8 Decompose the Solution [diagram: the Data Warehouse (Business Requirement Interviews, Data Profiling Audits, the Bus Matrix, single-source processes such as Orders, and consolidated processes such as Profitability) feeding BI tools that support the five steps: Querying & Reporting (ad hoc query, standard report); Find Exceptions (triggers, thresholds, alerts, graphs, hot/cold graphs); Causal Analysis (statistics, decision trees, neural nets, clustering); Model Potential Decisions (cohort groups, mini dimensions, behavior scoring, case-based predictions, advanced analytics); and Track Decisions Made (ad hoc query, standard report); all shaped by User Preferences and User Interfaces, leading to Decisions] First Powerful Idea: Expose the Data as Dimensional Models 7
9 Second Powerful Idea: Integrate All Dim Models with DW BUS Use the BUS MATRIX to drive the design and implement integration Technical planning Team communication Executive communication How To Start a Dimensional Design Find the natural measurement processes Immediately profile to verify readiness Build a fact table for each measurement process at the lowest possible grain Decorate each fact table with a set of dimensions describing everything you know 8
10 How To Finish a Dimensional Design Conform the common dimensions and facts that appear in more than one design Complete the logical dimensional design of as many processes as you can identify Design consolidated fact tables, e.g., profitability, identifying first level dependencies Begin the implementation with the first level designs the users need the most Finish the implementation with consolidated designs the users need the most Why Are Schemas So Important? Supporting BI requires Powerful ETL tools → dimensional schemas Dimensional schemas → powerful BI tools Dimensional schemas Directly support simple, fast databases Can be continuously modified and extended Have standard designs for drilling down, drilling across, and handling time Support all five business intelligence steps 9
11 Meeting the Goal of Simplicity A dimensional approach is natural Humans chunk in 3 to 7 groupings Dimensions appear spontaneously in many UIs Data cubes OLAP Original star designs by Chris Date Challenge is to preserve the dimensional feel on the final screens We (IT designers) are genetically selected because we tolerate complexity Meeting the Goal of Performance The star schema is the simplest relational model Optimization devolves into only two strategies (IN and OUT) Indexing has become routine support many equal entry points with bitmap indexes In-memory and columnar DBMSs ideally suited for dimensional schemas OLAP cube technologies are known for astounding performance 10
12 Standard Design: Drilling Down Handling atomic data No separate archive layer! Atomic data remains in dimensional form as the foundation of each data mart! Atomic data is the most naturally dimensional data Atomic data erases the objection that a dimensional model is designed to answer a business question Drill down apps descend gracefully all the way Standard, reusable logic for drill down Standard Design: Drilling Across → INTEGRATION Handling separate data sources Combine into a single fact table in the rare case of identical dimensionality and simultaneity Otherwise, drill across at query time Conformed dimensions guarantee the structure of the final answer Conformed facts guarantee the meaning of the final answer Master Data Management! All drill across applications on a DW BUS have identical, re-usable logic Drill across architecture allows separate performance tuning of each data source 11
13 Standard Design: Handling Time Handling time variance in dimensions E.g., an evolving customer profile or product description Standard SCDs (Slowly Changing Dimensions) Type 1: Overwrite Type 2: Partition history for a physical change Type 3: Alternate reality for a label change Standard administration of keys for all 3 types: the surrogate key pipeline Standard approach simplifies ETL development: reuse modules Standard Design: Handling Time Handling time granularity in fact tables E.g., instantaneous measurement events, short and long running processes Standard choices of grain Transaction, Periodic snapshot, Accumulating snapshot Standard schemas for each type, regardless of application Standard ETL designs for each type 12
14 Standard Design: Reusable Logic for Application Development [diagram: a Sales fact table (time_key FK, product_key FK, store_key FK, promo_key FK, dollars, units, cost) surrounded by a Time dimension (time_key PK, SQL_date, day_of_week, week_number, month), a Store dimension (store_key PK, store_id, store_name, address, district, region), a Product dimension (product_key PK, SKU, description, brand, category, package_type, size, flavor), and a Promotion dimension (promotion_key PK, promotion_name, promotion_type, price_treatment, ad_treatment, display_treatment, coupon_type)] Ad hoc queries: drag and drop directly from the schema regardless of the subject matter, then compute, e.g.: District Brand Total Dollars Total Cost Gross Profit / Atherton Clean Fast $1,233 $1,058 $175 Predictable Administration Daily administration nearly identical for all applications Process dimensions (Types 1, 2, 3) Release dimensions centrally Process facts through the surrogate key pipeline Final QA with an end user report writer Release fact tables locally 13
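The "drag and drop, then compute" report on this slide can be sketched with Python's sqlite3; the table and column names follow the slide's star schema, but all sample rows (stores, products, sales figures) are invented so that the query reproduces the Atherton / Clean Fast example line.

```python
# A minimal sketch of the grocery star schema above, using sqlite3.
# Table and column names follow the slide; the rows are invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT, district TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER, product_key INTEGER, store_key INTEGER, promo_key INTEGER,
    dollars REAL, units INTEGER, cost REAL);
""")
con.execute("INSERT INTO store_dim VALUES (1, 'Store 12', 'Atherton')")
con.execute("INSERT INTO product_dim VALUES (1, 'Clean Fast 32oz', 'Clean Fast')")
con.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 0, 700.0, 350, 600.0)")
con.execute("INSERT INTO sales_fact VALUES (2, 1, 1, 0, 533.0, 266, 458.0)")

# The report: group by district and brand, sum the facts, and compute
# gross profit as dollars minus cost.
rows = con.execute("""
    SELECT s.district, p.brand,
           SUM(f.dollars) AS total_dollars,
           SUM(f.cost)    AS total_cost,
           SUM(f.dollars) - SUM(f.cost) AS gross_profit
    FROM sales_fact f
    JOIN store_dim s   ON f.store_key = s.store_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY s.district, p.brand
""").fetchall()
print(rows)  # [('Atherton', 'Clean Fast', 1233.0, 1058.0, 175.0)]
```

The point of the standard design is that this query shape never changes: any subject area with conformed dimensions is answered by the same join-group-aggregate pattern.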
15 Minimize $ Costs Standard design techniques Improve efficiency of consulting Improve precision of size and processing estimates Re-usable, virtually identical ETL modules Reduce ETL development Much less key administration with denormalized dimensions Surrogate key fact tables extremely tight Minimize disk storage Simple indexing and aggregation strategy Performance ultimately comes from clever data structures and software Avoid brute force hardware approach Reduce Risks of No Result or Wrong Result Agile, incremental, adaptive design approach avoids the Big Bang Graceful modification when confronted with Big Surprises keeps the project moving forward Bus Matrix used as a project management guide focuses on dependencies and allows individual data sources to be published early 14
16 Little Surprises Late arriving facts Look up old surrogate keys Insert into correct partition Late arriving dimensions Generate new surrogate key Update fact table if correcting history New attribute values Data, not structure! No recoding for apps and user interfaces Big Surprises New facts (at existing grain) Alter fact table to add measure field New dimensions Alter fact table to add foreign key field New dimension attributes Alter dimension to add field Increased granularity of a dimension E.g., Store sales → Cash register sales Add dimension attribute ALL OF THESE BIG SURPRISES HAVE NO EFFECT ON EXISTING APPLICATIONS OR PRIOR REPORTS! 15
17 Big Data Earthquakes Non-relational data Name-value pairs Unstructured and hyper-structured data No database: schema on read Complex branching and iterative logic NoSQL analytic programming Does this break the Data Warehouse model? Is there hope for the Data Warehouse? Come to the afternoon talk to find out! What Have We Accomplished? Cut the data warehouse problem down to size Techniques for implementing the parts Cost effective AND user effective Five-Step support for business intelligence To learn more: Come to the afternoon seminar Read our books Download free articles from Sign up for Design Tips at Come to our classes in USA, Europe, South Africa, Australia Data Warehouse Lifecycle in Depth Dimensional Modeling in Depth Data Warehouse ETL in Depth 16
18 THANK YOU! 17
19 Architecture and Techniques for Building the Dimensional Data Warehouse #ITWebRalph Kimball Group, 2013 Highest Level Design Principles Goal: Simple and fast (users' perspective!) Concentrate complexity in the ETL, not BI Resist schema compromises trying to address performance fears Expose final BI layer as dimensional Include lowest grain data in final BI layer Build a distributed EDW tied by conformed dimensions Rely on a generally agile approach 1
20 Fundamental Nature of Fact Tables A record for each numeric measurement from a process fully additive ==> semi-additive ==> non-additive measurement is true to the grain A many to many relationship, hence a fact table key is a composite of several foreign keys, usually a subset the key implements the grain but the grain is expressed in business terms Fundamental Nature of Dimension Tables A dimension record is a collection of text-like descriptors that provide the context for fact records Some of the descriptors are physical and some are labels we apply Normally there is exactly one dimension record associated with a given fact record Dimensions are independent 2
21 A Typical Dimensional Schema Fact Table Keys and Fields 2 or more foreign keys (FKs), each implementing a connection to a dimension table primary key (PK) Primary key of fact table can be all the FKs but is usually a subset Must a fact table have a primary key? What does it mean if it does not? May be one or more degenerate dimensions (DDs) left over from original parent keys Remaining measurement fields should all be numeric Goal: encode every fact table key into 4 bytes: Kimball's fact tables are the tightest in the industry 3
22 Dimension Table Keys and Fields A single primary key (PK) Primary key assigned by the data warehouse, therefore is a surrogate key Surrogate key is a meaningless integer assigned in sequence as dimension records are needed May have a natural key field, if supplied by production system, and is unique and durable Many other text and text-like fields, usually of medium and low cardinality Many complex 1:1, many:1, and many:many relationships among fields Many complex business rules implicitly embedded in the structure of the dimension and fact tables and their relationship to each other How the BI Layer Uses a Fact Table Subset of fact table is chosen as result of dimensional constraints Pattern of dimension constraints completely unpredictable, especially if # dims > 6 Best indexing strategy is usually a separate index on each FK Numeric facts are aggregated at query time, almost always by summing Facts are rarely constrained directly and therefore are rarely indexed 4
23 How the BI Layer Uses a Dimension Table Existence of attributes revealed by BI tool at query time, e.g., SELECT COLNAME FROM SYSTABLES WHERE TABLENAME = 'Product' Permissible values of an attribute revealed by BI tool at query time, e.g., SELECT DISTINCT brand_name FROM product Attribute values restricted by constraints on other fields in dimension table. Browsing Could you browse the attribute values in this dimension based on constraints on other dimensions? How? Drilling down means adding a row header Building a Fact Table: The Four Design Steps Choose the Process A uniform, single source of data Choose the Grain What does a fact table record mean? Choose the Dimensions All descriptive context that has a single value in the presence of the measurement, i.e., is true to the grain Choose the Facts Numeric additive measurements, true to the grain 5
24 Key Concept #1: Surrogate Keys Data warehouse has additional responsibilities compared to legacy systems Data warehouse must take over key assignment A surrogate key is a sequentially assigned integer, devoid of any meaning or order Except for the ordering of the date surrogate key Implications of Surrogate Keys Requires lookup and replacement of current value of production key in both dimension and fact tables Allows data warehouse to assign new key version for a slowly changing dimension Allows data warehouse to encode uncertain, not known, not recorded, and null record types Insulates data warehouse from unexpected administrative changes and mistakes in production Improves performance of joins May significantly shrink fact tables Extra ETL work, but all surrogate key pipelines similar 6
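The surrogate key idea above can be sketched in a few lines; this is a minimal in-memory sketch, and names such as `assign_surrogate` and the reserved "unknown" key 0 are invented for illustration.

```python
# A minimal sketch of surrogate key assignment: meaningless sequential
# integers, assigned by the warehouse, with low-numbered keys reserved for
# special rows ("not known", "not recorded") that legacy keys cannot express.

key_map = {}          # natural (production) key -> surrogate key
next_key = 1

SPECIAL = {"UNKNOWN": 0}   # reserved key for facts with no known dimension row

def assign_surrogate(natural_key):
    """Return the surrogate key for a natural key, assigning a new
    sequential integer the first time the key is seen."""
    global next_key
    if natural_key is None:
        return SPECIAL["UNKNOWN"]
    if natural_key not in key_map:
        key_map[natural_key] = next_key
        next_key += 1
    return key_map[natural_key]

print(assign_surrogate("SKU-1234"))  # 1
print(assign_surrogate("SKU-9999"))  # 2
print(assign_surrogate("SKU-1234"))  # 1  (repeat lookup, same key)
print(assign_surrogate(None))        # 0  ("not known" special row)
```

Because the keys carry no meaning, production re-keying or key re-use upstream never disturbs the warehouse.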
25 Processing Dimension Records with Diff Compare* [diagram: today's complete set of dimension records (product_id, description, brand, category, package_type, size, ...) is compared against yesterday's complete set in a diff compare and policy decisions step, producing the known changed dimension records that become the input to the loading process; a CRC column stored with yesterday's records flags a changed attribute] *Implement Diff Compare with a CRC algorithm. Adding a Changed Record to a Dimension [diagram: each known changed record is looked up in the most recent key map (product_id → product_key); the max product key is incremented to produce a newly inserted surrogate key, the key map is updated, and the surrogate key is added to the load record, yielding dimension records with the new surrogate key (product_key PK, product_id, description, brand, category, package_type, size, etc.) that are loaded to the DBMS dimension table] 7
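The diff-compare step can be sketched with a CRC, as the slide suggests: store a CRC of each record's attributes with yesterday's rows, then compare CRCs today instead of comparing every column. The record layout and product rows below are invented for illustration.

```python
# A sketch of diff compare via CRC: detect changed and new dimension
# records without column-by-column comparison. Data is invented.
import zlib

def record_crc(record):
    # Concatenate all descriptive attributes (excluding the key) and CRC them.
    payload = "|".join(str(record[f]) for f in sorted(record) if f != "product_id")
    return zlib.crc32(payload.encode("utf-8"))

yesterday = {
    "A1": {"product_id": "A1", "description": "Cola 12oz", "brand": "Fizzy"},
    "A2": {"product_id": "A2", "description": "Chips", "brand": "Crunch"},
}
today = {
    "A1": {"product_id": "A1", "description": "Cola 12oz", "brand": "Fizzy"},
    "A2": {"product_id": "A2", "description": "Chips", "brand": "MegaCrunch"},  # changed
    "A3": {"product_id": "A3", "description": "Salsa", "brand": "Hot"},         # new
}

old_crcs = {pid: record_crc(r) for pid, r in yesterday.items()}
changed = [pid for pid, r in today.items()
           if pid in old_crcs and record_crc(r) != old_crcs[pid]]
new = [pid for pid in today if pid not in old_crcs]

print(changed, new)  # ['A2'] ['A3']
```

The changed records then flow into the policy decision (Type 1, 2, or 3) and the surrogate key steps on this slide.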
26 Processing a Fact Table Record [diagram: incoming fact table records with production IDs (time_id, product_id, store_id, promotion_id, dollar_sales, unit_sales, dollar_cost) pass through the most recent key maps (time_id → time_key, product_id → product_key, store_id → store_key, promo_id → promo_key); each production ID is replaced with its surrogate key, yielding fact table records with surrogate keys (time_key, product_key, store_key, promotion_key, dollar_sales, unit_sales, dollar_cost), which are then loaded into the DBMS] Dimensions Shared by Fact Tables in a Value Chain 8
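The fact-row key substitution on this slide can be sketched as follows; all IDs, map contents, and amounts are invented for illustration.

```python
# A sketch of the surrogate key pipeline for fact rows: each incoming
# record carries production IDs, and every ID is replaced by a lookup in
# the corresponding most-recent key map before loading.

time_map    = {"2013-11-16": 101}
product_map = {"SKU-1": 7}
store_map   = {"S-22": 3}
promo_map   = {"P-0": 1}

def to_surrogates(row):
    """Replace every production ID with its surrogate key."""
    return {
        "time_key":     time_map[row["time_id"]],
        "product_key":  product_map[row["product_id"]],
        "store_key":    store_map[row["store_id"]],
        "promo_key":    promo_map[row["promotion_id"]],
        "dollar_sales": row["dollar_sales"],
        "unit_sales":   row["unit_sales"],
    }

incoming = {"time_id": "2013-11-16", "product_id": "SKU-1",
            "store_id": "S-22", "promotion_id": "P-0",
            "dollar_sales": 9.98, "unit_sales": 2}
surrogate_row = to_surrogates(incoming)
print(surrogate_row["product_key"])  # 7
```

Every fact table in the value chain runs the same pipeline, differing only in which key maps are consulted.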
27 A Bus Matrix For the Retail Value Chain Key Concept # 2: Conformed Dimensions Identical dimensions are obviously conformed A dimension that is a perfect subset of rows and/or columns is conformed The contents of common columns must perfectly match (drawn from the same domains) 9
28 Drilling Across Means Combining Row Headers Open a separate connection to each source Assemble each answer set Merge answer sets on the conformed row headers in the BI tool [example report, merged on the conformed Product dimension: rows Framis, Toggle, Widget; columns Manufacturing Shipments, Warehouse Inventory, Retail Sales Turns] What Is Centralized and What Is Not [diagram contrasting what is separately centralized and what is localized] 10
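The drill-across merge described above can be sketched as follows; the three answer sets stand in for the Manufacturing Shipments, Warehouse Inventory, and Retail Sales queries, and all numbers are invented.

```python
# A sketch of drill-across: query each fact source separately, then merge
# the answer sets on the conformed row header (here, product name).

shipments = {"Framis": 100, "Toggle": 250}
inventory = {"Framis": 40, "Toggle": 90, "Widget": 10}
sales     = {"Framis": 80, "Widget": 5}

# Outer-merge on the conformed row header; missing cells stay None.
products = sorted(set(shipments) | set(inventory) | set(sales))
report = [(p, shipments.get(p), inventory.get(p), sales.get(p)) for p in products]
for row in report:
    print(row)
# ('Framis', 100, 40, 80)
# ('Toggle', 250, 90, None)
# ('Widget', None, 10, 5)
```

The merge is only meaningful because the row header values are drawn from the same conformed Product dimension; without conformance, 'Framis' in one source need not mean 'Framis' in another.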
29 Master Data Management: A New Name for Familiar Themes Master Data Management is descended from 360 degree view of the customer Customer relationship management Supply chain product master Uniform chart of accounts → Conformed dimensions! Customer Dimension Integration: an MDM in Your Future? Master data management is an elegant solution if you can control your sources. Otherwise you need a downstream MDM 11
30 Key Concept # 3: Three Fundamental Grains Individual Transactions Periodic Snapshots Accumulating Snapshots Comparison of the Three Grains Transaction Grain A point in space and time Record exists only if something happens Highly dimensional Usually a single generic amount fact A Transaction dimension to explain the fact No need to change this record if correct 12
31 Comparison of the Three Grains Periodic Snapshot Grain A span of time, regularly repeated Record and all the FKs exist every period even if no activity Most dimensions common with transaction grain except for Transaction Open ended number of facts summarizing activity within period No need to change this record if correct Comparison of the Three Grains Accumulating Snapshot Grain All history from the beginning of time in a single record Single record exists because the entity was created Most dimensions common with other types but with multiple dates (4 to 8 or more typical) Open ended number of facts, similar to periodic snapshot Record is changed repeatedly as long as activity occurs Suitable for processes that have a definite beginning and end, such as orders or shipments and most work flows Full detail is always available in a companion transaction table 13
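The accumulating snapshot behavior described above (a single record per entity, revisited and overwritten as milestones occur) can be sketched like this; the order milestones, dates, and quantities are invented for illustration.

```python
# A sketch of an accumulating snapshot row for an order: one record per
# entity, created once and then updated repeatedly as the process advances.

def new_snapshot(order_id):
    return {"order_id": order_id,
            "ordered_date": None, "shipped_date": None,
            "delivered_date": None, "quantity": 0}

def record_milestone(snapshot, milestone, date, quantity=None):
    # Unlike the transaction and periodic grains, the SAME record is
    # revisited and overwritten each time the process moves forward.
    snapshot[milestone + "_date"] = date
    if quantity is not None:
        snapshot["quantity"] = quantity
    return snapshot

snap = new_snapshot("ORD-1")
record_milestone(snap, "ordered", "2013-11-01", quantity=3)
record_milestone(snap, "shipped", "2013-11-04")
record_milestone(snap, "delivered", "2013-11-08")
print(snap["delivered_date"])  # 2013-11-08
```

Milestones that have not yet occurred stay at their placeholder value, which is why accumulating snapshots suit processes with a definite beginning and end.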
32 Key Concept # 4: Slowly Changing Dimensions Type 1: Overwrite Correct mistakes, or expunge old data Type 2: Create a New Dimension Record Perfectly partition history, usually by a physically observable change Type 3: Add an Old Attribute to Existing Dim Record Allow alternate realities to exist simultaneously Usually observer-defined labels or categories Type 1: Overwrite 14
33 Type 1: Overwrite Update in place with SQL No need to change dimension key Rebuild affected aggregates, if any Replicate updated dimension to all affected data marts Type 2: Add a Dimension Row 15
34 Type 2: Add a Dimension Row Generate new surrogate key Update most recent key list Insert record into dimension No need to rebuild any aggregates, as long as change made today Replicate updated dimension to all affected data marts Type 3: Add a Dimension Column 16
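The Type 2 steps above (generate a new surrogate key, update the most-recent key map, insert the row) can be sketched as follows; the table structures and product data are invented for illustration.

```python
# A sketch of a Type 2 change: rather than overwriting, insert a new
# dimension row with a fresh surrogate key and update the most-recent key
# map so that future fact rows join to the new version of the product.

dimension = []              # all versions of all products
current_key = {}            # product_id -> most recent surrogate key
next_surrogate = [1]

def scd2_upsert(product_id, attributes):
    key = next_surrogate[0]
    next_surrogate[0] += 1
    dimension.append({"product_key": key, "product_id": product_id, **attributes})
    current_key[product_id] = key      # new facts will use this key
    return key

scd2_upsert("SKU-1", {"package_type": "bag"})
scd2_upsert("SKU-1", {"package_type": "box"})   # physical change observed

print(len(dimension), current_key["SKU-1"])  # 2 2
```

History is perfectly partitioned: fact rows loaded before the change keep the old surrogate key, so old reports are untouched and no aggregates need rebuilding.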
35 Type 3: Add a Dimension Column Alter dimension table to add Old Field, if needed Push existing field value into Old Field Now treat as Type 1 Rapidly Changing Monster Dimensions (SCD 4) Split apart stable and volatile data Generate demographic records as they occur New dimension called mini-dimension Customer_key Name Original_Addr Date_of_Birth First_Order_Dt Income Education Marital_Status Num_Children Credit_Score becomes: Customer_key Name Original_Addr Date_of_Birth First_Order_Dt Demog_key Income_Band Education Marital_Status Num_Children Credit_Band 17
36 One Dimension or Two? Dim A: 1000 unique values Dim B: 100 unique values Scenario 1: 1000 AB combinations observed A rolls up to B in a hierarchy: one dimension Scenario 2: 10,000 AB combinations observed Each A has an average of 10 B's Interesting to browse together One dimension or two? Scenario 3: 75,000 AB combinations observed A's and B's virtually uncorrelated: two dimensions One Dimension or Two? One dimension (AB together): Allows browsing of joint occurrences independent of any fact table Creates a larger dimension than either A or B except in the extreme case of a perfect hierarchy May be unduly sensitive to Type 2 changes Two dimensions (A and B separate): Browsing of joint occurrences only possible through a fact table A factless fact table can provide coverage, otherwise potential joint occurrences are missed 18
37 Advanced Concept: Many Valued Dimensions Individuals in an account Diagnoses for a patient Industry classifications (SICs) for a company Report additive measures both correctly allocated and overlapping Many Valued Diagnosis Dimension using Bridge Table Correctly weighted: group by customer attribute, using the weighting factor Impact (overlapping): group by customer attribute, ignoring the weighting factor 19
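The two report styles on this slide can be sketched as follows; the bridge table rows and charges are invented, with weights that sum to 1 per diagnosis group.

```python
# A sketch of the diagnosis bridge table: each fact row joins to a group
# of diagnoses with weighting factors. "Correctly weighted" multiplies by
# the weight so totals allocate exactly; "impact" ignores the weight and
# deliberately double-counts the overlapping diagnoses.

fact = [{"patient": "P1", "diag_group": 1, "charges": 1000.0}]
bridge = [  # (diag_group, diagnosis, weight) -- weights sum to 1 per group
    (1, "diabetes", 0.6),
    (1, "hypertension", 0.4),
]

weighted, impact = {}, {}
for f in fact:
    for group, diag, w in bridge:
        if group == f["diag_group"]:
            weighted[diag] = weighted.get(diag, 0.0) + f["charges"] * w
            impact[diag] = impact.get(diag, 0.0) + f["charges"]

print(weighted)  # diabetes ~600.0, hypertension ~400.0 -- sums to the fact
print(impact)    # each diagnosis sees the full 1000.0 -- overlapping
```

The weighted report is safe to total; the impact report answers "how many charge dollars touched this diagnosis at all" and must never be summed across diagnoses.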
38 Advanced Concept: Variable Depth Hierarchies [figure: an organization tree of blocks; counting every parent-to-subsidiary connection at every depth gives sum = 43 path records] Blocks represent individual customer entities, connected in an organization tree. Assume that revenue can be received from any or all of these entities. Variable Depth Hierarchies: Simple Recursive Pointers Have Limitations [diagram: a Commercial Customer dimension (customer_key PK, customer_id, customer_name (several fields), customer_address (several fields), customer_type, industry_group, date_of_first_purchase, purchase_profile, credit_profile, and a recursive parent_customer_key identifying the customer key of the parent) joined to any fact table containing customer_key as a foreign key, with additive facts] 20
39 Advanced Concept: Mapping Paths of a Hierarchy [diagram: a Customer dimension (customer_entity_key, name, address, type, ...), an Organization Map bridge (parent_entity_key, subsidiary_entity_key, depth_from_parent, level_name, lowest_subsidiary_flag, highest_parent_flag), and a Revenue fact (time_key, customer_entity_key, product_key, contract_key, revenue_dollars); an optional view definition can make the pair look like a normal fact table with single-valued keys] Connect the Organization Map as shown to find all revenue dollars at or below a specific customer. Optionally constrain on depth = 1 to restrict to immediate direct subsidiaries. Optionally constrain on lowest_subsidiary_flag to restrict to lowest level subsidiaries only. There is one record in the Organization Map for every connection from a customer to each of its subsidiaries at all depths; thus there are 43 records resulting from the graph on the preceding slide. The level_name attribute in the Organization Map can be managed to summarize subsidiaries across partially overlapping hierarchies, such as geographic hierarchies that contain states but may or may not contain counties. Using the Path Table Requires Non-Unique Parent Selections: SELECT c.state, c.county, SUM(f.revenue) FROM customer c, fact f, time t WHERE f.customer_key IN (SELECT DISTINCT b.subsidiary_key FROM customer d, bridge b WHERE d.state = 'California' AND d.customer_key = b.parent_key AND d.customer_key = c.customer_key) AND f.timekey = t.timekey AND t.month = 'September, 1999' GROUP BY c.state, c.county 21
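One way to see where the organization map's path records come from is to flatten simple recursive parent pointers into one record per connection at every depth. The small four-entity tree below is invented (it yields 8 path records rather than the preceding slide's 43, but the construction is the same).

```python
# A sketch of building the organization map (path/bridge) table from
# recursive parent pointers: one record for every parent-to-subsidiary
# connection at every depth, including depth 0 (each entity to itself).

parent = {"B": "A", "C": "A", "D": "B"}   # child -> parent; "A" is the root
entities = ["A", "B", "C", "D"]

org_map = []  # (parent_entity, subsidiary_entity, depth_from_parent)
for child in entities:
    node, depth = child, 0
    while node is not None:
        org_map.append((node, child, depth))
        node = parent.get(node)
        depth += 1

# Revenue "at or below B" = facts whose entity appears as a subsidiary of B.
below_b = {sub for par, sub, d in org_map if par == "B"}
print(sorted(below_b))  # ['B', 'D']
print(len(org_map))     # 8 path records for this 4-entity tree
```

The precomputed map turns an arbitrary-depth recursive walk into a single join, at the cost of the non-unique parent selections noted above.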
40 E: Getting the Data Into the DW First: Get the Data Into the DW Prepare to start: Comprehensive Requirements, Logical Data Map, Data Profiling (1) Judge data and isolate changes: Change Data Capture (2), Extract (3) Result: Extracted Tables incl Format Conversions 2006 Kimball University. All rights reserved. T: Clean and Conform Second: Clean and Conform Cleaning machinery and control: Cleansing System & Data Quality Screens (4), Error Event Schema (5) with Audit Dimension (6) Integration: Deduplicating (7) and Conforming System (8) Result: Cleaned Tables and Conformed Dimensions 22
41 L: Prepare for Presentation Third: Prepare for Delivery Time variance: SCD Manager (9) Fact types and late data: Fact Table Types (13), Late Arriving Data (16) Keys: Surrogate Key Generator (10), Pipeline (14) Admin: Dimension Manager (17), Fact Provider (18) Hierarchies and bridges: Hierarchy Table Manager (11) for Fixed, Variable, and Ragged hierarchies; Special Dimensions (12); Multi-Valued Dimensions (15) Aggregates, cubes, and data integration: Agg Tables (19), OLAP Cubes (20), DI Manager (21) Result: Fact & Dim Tables Ready for Delivery M: Manage All the Processes Fourth: Manage Job Scheduler (22), Backup (23), Recovery/Restart (24), Version Control (25) & Migration (26), Workflow Monitor (27), Sorting (28), Lineage & Dependency (29), Problem Escalation (30), Pipeline/Parallelize (31), Security (32), Compliance (33), Metadata Repository (34) [diagram labels: control, protect, source, respond, speed, guard, measure, comply, manage] 23
42 And (maybe) R: Adapt to Real Time Fifth: Adapt to Real Time [diagram: a streaming, real-time ETL system converting the existing batch systems] Fundamental Design #1: Classic Daily Item Sales Grain = item X store X day Should promotion be part of the grain? 24
43 Transaction Level Retail Sales Grain = line item on a sales ticket X promotion Size Estimator for Retail Sales Fact table: $6 billion gross revenue, $2 average line item → 3 billion fact records per year 18 fields X 4 bytes X 3 billion X 3 years = 648 GB Add 9 fields X 4 bytes X 3 billion X 3 years (324 GB) for indexes on keys = 972 GB Product dimension: 600,000 records X 50 fields X 10 bytes = 300 MB Add factor of 3 for indexes = 1.2 GB Customer dimension: 2,000,000 records X 8 fields X 4 bytes = 64 MB Add factor of 3 for indexes = .26 GB All other dimensions are tiny 25
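The size arithmetic above can be checked directly. This sketch uses the slide's own numbers and interprets "add factor of 3 for indexes" as index space equal to three times the base data, which matches the slide's 300 MB → 1.2 GB product-dimension figure.

```python
# The retail sales size estimate, reproduced as arithmetic.

line_items_per_year = 6_000_000_000 / 2          # $6B revenue / $2 avg line item
fact_bytes  = 18 * 4 * line_items_per_year * 3   # 18 4-byte fields, 3 years
index_bytes = 9 * 4 * line_items_per_year * 3    # indexes on the 9 key fields

GB = 10**9
print(fact_bytes / GB)                  # 648.0
print((fact_bytes + index_bytes) / GB)  # 972.0

product_dim  = 600_000 * 50 * 10        # 300 MB before indexes
customer_dim = 2_000_000 * 8 * 4        # 64 MB before indexes
print((product_dim + 3 * product_dim) / GB)    # 1.2
print((customer_dim + 3 * customer_dim) / GB)  # 0.256
```

The fact table dwarfs every dimension by three orders of magnitude, which is why the design effort concentrates on keeping fact keys tight.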
44 Interesting Retail Sales Design Issues Precise time stamp needs to be split into two dimensions Product and store dimensions each have > 50 attributes Is there a cash register dimension? Customer dimension could be a 2-tier dimension Employee dimension comes from HR Causal dimension tracks price treatments, media ads, store displays, and coupons Ticket # is a degenerate parent More Interesting Retail Sales Design Issues Sales dollars and sales units are obvious facts; where do you show average retail price? How would you do a price level analysis (sales at each of the prices)? If your data source also supplies cost dollars, how do you get gross profit and gross margin? If your stores span multiple currencies, how do you generalize the dollar sales and cost fields? 26
45 SQL for Price Point Analysis SELECT product_name, sales_dollars/sales_quantity, SUM(sales_dollars) FROM ... WHERE ... GROUP BY product_name, sales_dollars/sales_quantity In other words, you group by product and price point Resist The Urge to Snowflake Simple Hierarchies Snowflaking is converting from 2NF to 3NF Hierarchies are converted into branches Is there any difference in information content? [diagram: a flat Product Dimension (product_key, SKU_description, SKU_number, package_size_key, package_type, diet_type, weight, weight_unit_of_measure, storage_type_key, units_per_retail_case, and many more) snowflaked into outrigger tables: package_size (package_size_key, package_size), brand (brand_key, brand, subcategory_key), storage_type (storage_type_key, storage_type, shelf_life_type_key), shelf_life_type (shelf_life_type_key, shelf_life_type), subcategory (subcategory_key, subcategory, category_key), category (category_key, category, department_key)] 27
46 Fundamental Design #2: Inventory Periodic Snapshot Process = Weekly Inventory Snapshot, every product in every store Declare the lowest possible grain How do the dimensions in this design relate to the retail sales dimensions? How many facts do we include and what is the unit of measure? Simplest Inventory Snapshot Grain = week X item X store 28
47 Semi-Additive Facts Inventory levels and bank balances are measures of intensity Measures of intensity are generally semi-additive or non-additive E.g., temperatures are non-additive Semi-additive measures must be averaged across the combined cardinality of the constraints on all non-adding dimensions Fundamental Design #3: Full Profitability Schema Manufacturing shipment invoices Grain = invoice line item Activity based costs initially available at higher levels (such as month, product line, warehouse) Costs must be allocated down to the grain 29
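The averaging rule for semi-additive facts can be sketched as follows; the inventory snapshot rows are invented for illustration.

```python
# A sketch of the semi-additive rule: inventory levels add across products
# and stores, but across time (the non-adding dimension) they must be
# averaged over the number of distinct periods in the constraint.

# (week, store, item) -> quantity on hand
snapshots = {
    (1, "S1", "cola"): 100, (1, "S1", "chips"): 40,
    (2, "S1", "cola"): 60,  (2, "S1", "chips"): 20,
}

# Total on hand in week 1: safe to SUM across items and stores.
week1 = sum(q for (w, s, i), q in snapshots.items() if w == 1)
print(week1)  # 140

# "Average on-hand at store S1": sum, then divide by the number of
# distinct time periods constrained -- never just SUM across weeks.
weeks = {w for (w, s, i) in snapshots}
avg_on_hand = sum(snapshots.values()) / len(weeks)
print(avg_on_hand)  # 110.0
```

Summing the same balances across both weeks would report 220 units on hand, double-counting inventory that was simply still sitting on the shelf.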
48 Full Profitability Schema With Allocated Costs Fundamental Design #4: Incompatible Account Types Retail bank with 25 lines of business All account types share a few common facts (e.g., balance) and a few common attributes (e.g., primary account holder) Each account type has 5 to 20 incompatible facts and 5 to 20 incompatible attributes Requirement #1: Provide an efficient single schema for CRM, cross selling, and upselling Requirement #2: Allow specialists to probe deeply into special facts and attributes for each line of business 30
49 Super Type and Sub Type Designs for Incompatible Account Types Interesting Example: Student Applicant Pipeline Processes: All applicant transactions at a college Grain: End-to-end accumulating snapshot for each applicant/student Build a single tracking table that tracks timestamps and quantities for Preliminary Test Score Receipt Information Requested; Information Sent Interview Conducted On-Site Campus Visit Application Submitted; Transcript Received; Test Scores Received Recommendations Received Admissions First Pass Review Reviewed for Financial Aid Admissions Final Decision Applicant Decision Received Admitted; Enrolled 31
50 The Applicant Pipeline Accumulating Snapshot Confronting a Big Design: Web Retailer with 22 Processes All operational processes Build a BUS Matrix spanning Supplier transactions Parts and bills of materials for in house manuf. Customer facing processes Infrastructure, including Employee labor HR Facilities Web site operations 32
51 The Web Retailer BUS Matrix Upgrading Your Architecture: Real Time Data Warehousing Add a daily, in-memory hot partition to each type of fact table. The partition contains all the activity that has occurred since the last update of the static data warehouse. We assume that the static tables are updated each night at midnight. The hot partition: links as seamlessly as possible to the grain and content of the static data warehouse fact tables; if possible, is not indexed, so that incoming data can be continuously dribbled in; supports highly responsive queries 33
52 Hot Partition Design Details Identical dimensional structure to the static historical fact table, implemented as the current partition All transactions since midnight, dribbled continuously, possibly fed by message bus traffic Special exception handling for unavailable (delayed) dimension entities Transaction example: 400 MB Periodic snapshot example: 800 MB Converting Complex Behavior into a Text Fact Basic attributes for all customers: recency, frequency, intensity Divide each into quintiles Label the major clusters (A, B, C, ...) 34
53 A Behavior Tracking Example Identified clusters: A: High volume repeat customer, good credit, few product returns B: High volume repeat customer, good credit, many product returns C: Recent new customer, no established credit pattern D: Occasional customer, good credit E: Occasional customer, poor credit F: Former good customer, not seen recently G: Frequent window shopper, mostly unproductive H: Other A typical cluster evolution: John Doe: C C C D D A A A B B Design Alternatives for Cluster Evolution 1. Fact table record for each customer for each month, with the behavior tag as a textual fact. 2. Slowly changing customer dimension record (Type 2) with the behavior tag as a single field. A new customer record is created for each customer each month. Same impact as #1. 3. Single customer dimension record with a 24 month time series of tags as 24 attributes. #3 is the best choice in a relational database. Why? Moral: Text facts are constraint-intensive, not computation-intensive, so the design is different 35
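The constraint-intensive flavor of text facts can be sketched as follows: analysis means constraining on tag patterns, not computing aggregates. The tag series and the pattern below are invented for illustration:

```python
# Each customer carries a monthly series of behavior-cluster tags
# (design alternative #3). Names and data are hypothetical.
customers = {
    "John Doe": list("CCCDDAAABB"),
    "Jane Roe": list("AAAAABBFFF"),
    "Sam Poe":  list("GGGGGGGGGG"),
}

def matches(tags, pattern):
    # True if the tag series ever shows `pattern` as a contiguous run,
    # e.g. "AB" = a high-volume customer who started returning products.
    return pattern in "".join(tags)

# Constraint-style query: who drifted from cluster A into cluster B?
at_risk = [name for name, tags in customers.items() if matches(tags, "AB")]
```

No arithmetic is performed on the tags themselves, which is why a time series of textual attributes queries well in a relational database even though it would be useless as a numeric fact.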
54 What Have We Accomplished? Introduced the main techniques of dimensional modeling that support business intelligence Four steps in design of a star schema Conformed dimensions and facts Three types of fact tables Slowly changing dimensions (SCDs) Handling variable-depth hierarchies Designs for comprehensive customer-facing views, full profitability, real-time partitions and behavior tracking Kimball Group 1: Read the books, download 100+ free articles 2: Sign up for Kimball University design tips 3: Come to our classes! Data Warehouse Lifecycle in Depth Dimensional Modeling in Depth Data Warehouse ETL in Depth Articles, Design Tips, Class Schedules: www.kimballgroup.com 36
55 THANK YOU! 37
56 Big Data Can Drive Business and IT to Evolve and Adapt #ITWeb Ralph Kimball Group, 2013 Big Data Itself is Being Monetized Executives see the short path from data insights to revenue and profit Big data provides new insights, especially behavior Analytic sandbox results taken directly to management Data is becoming an asset on the balance sheet, valued by: cost to produce the data; cost to replace the data if it is lost; revenue or profit opportunity provided by the data; revenue or profit loss if data falls into competitors' hands; legal exposure from fines and lawsuits if data is exposed to the wrong parties 1
57 The DW: An Increasingly Sophisticated Platform Low latency operational data mixed with history New sources, new users, mixed workloads Customer behavior data Web traffic, session analysis, significant integration Analysis of big data Non-relational programming: complex branching & procedural processing, advanced analytics Many new data formats and data types: social media, device sensors, unstructured and hyper-structured data Mission unchanged: Publish the Right Data Big Data Analytic Use Cases Behavior tracking Search ranking Ad tracking Location and proximity tracking Causal factor discovery Social CRM Share of voice, audience engagement, conversation reach, active advocates, advocate influence, advocacy impact, resolution rate, resolution time, satisfaction score, topic trends, sentiment ratio, and idea impact Financial account fraud detection/intervention System hacking detection/intervention Online game gesture tracking 2
58 More Big Data Use Cases Non-numeric data and unique algorithms Document similarity testing Genomics analysis Cohort group discovery Satellite image comparison CAT scan comparisons Big science data collection Complex numeric data Smart utility meters Building sensors In-flight aircraft status Data bags: name/value pairs with ad hoc content Data Bags: Extremely Sparse With Unpredictable Content E.g., financial disclosure information on a loan application: Payload is an arbitrary list of name/value pairs No limit to the number of pairs or value data types Data bags may contain data bags recursively 3
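A data bag maps naturally onto a recursive dict of name/value pairs. The loan-application fields below are invented for illustration, as is the flattening helper, which shows one common way to make such a payload loadable:

```python
# A "data bag" is an arbitrary, possibly recursive list of name/value
# pairs. All field names and values here are hypothetical.
loan_disclosure = {
    "annual_income": 85000,
    "employer": "Acme Corp",
    "other_assets": {                       # a data bag inside a data bag
        "vehicle": {"make": "Ford", "year": 2009},
        "savings_balance": 12000,
    },
}

def flatten(bag, prefix=""):
    # Walk the bag recursively, yielding dotted-path/value pairs --
    # a typical first step before loading into a sparse column store.
    for name, value in bag.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value

pairs = dict(flatten(loan_disclosure))
```

Because there is no limit on the number of pairs or their data types, no fixed relational schema can anticipate the payload, which is exactly why these sources strain a pure row/column design.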
59 Big Data Requires Integration: Eliminating Cisco s Data Silos From: Houston: We Have a Problem The traditional pure relational data warehouse architecture can t handle ANY of these use cases. We need: Non-scalar data: vectors, arrays, data bags, structured text, free text, images, waveforms Iterative logic, complex branching, advanced statistics Petabyte data sources loaded at gigabytes/second Analysis in place across thousands of distributed processors, data often not in database format, full data scans often needed Schema on read Analysis while loading 4
60 Two Architectures to the Rescue: Zero Sum or Hybrid Coexistence? Extended relational Extend the current formidable RDBMS legacy Add features/functions to address big data analytics MapReduce/Hadoop Build new architecture for big data analytics Open source top level Apache project Thousands of participants Existing RDBMS Based Data Warehouse 5
61 Extended Relational Data Warehouse with Big Data Additions MapReduce/Hadoop Figure 2.3 from Tom White's book, Hadoop, The Definitive Guide, 2nd Edition (O'Reilly, 2010) 6
62 Hadoop Distributed File System HDFS is the distributed file system that supports a number of Apache Hadoop projects: Native MapReduce programming (Java, Ruby, C++) High level MapReduce compilers Pig - procedural interface, leverages DBMS skills Hive - SQL interface, described as a data warehouse HBase - open source, non-relational, column-oriented DB running directly on HDFS Cassandra - open source, distributed database based on the BigTable data model Comparing the Two Architectures Relational DBMSs: proprietary (mostly); expensive; data requires structuring; great for speedy indexed lookups; deep support for relational semantics; indirect support for complex data structures; indirect support for iteration and complex branching; deep support for transaction processing. MapReduce/Hadoop: open source; less expensive; data does not require structuring; great for massive full data scans; indirect support for relational semantics (e.g., Hive); deep support for complex data structures; deep support for iteration and complex branching; little or no support for transaction processing. 7
63 Hybrid Possibilities Data Warehouse Oriented Architecture Alternatives MapReduce/Hadoop as an ETL step Extract row/column results from Big Data HDFS as an ultra-low-cost archive $10,000 per TB vs. $300 per TB! Really? SQL compatibility proliferating in BD apps Hive of course, but even tools like Splunk HDFS nodes as actual MPP RDBMS nodes! Clever possibilities for Teradata, Greenplum, etc. Pseudo-HDFS and Pseudo-Apache apps Proprietary HW and SW configurations 8
64 Returning to the Integration Challenge Big data integration is familiar drilling across: Establish conformed attributes (e.g., Customer Category) in each database Fetch separate answer sets from different platforms grouped on the conformed attributes Sort-merge the answer sets at the BI layer Implementing Caches of Increasing Latency Each cache supports different class of applications Data quality improves left to right High value facts and dimensions pushed right to left 9
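The three drill-across steps above can be sketched at the BI layer: two answer sets, each grouped on the conformed attribute, merged outside either platform. The categories and numbers below are invented:

```python
# Answer set 1: sales totals from the relational DW, grouped on the
# conformed attribute Customer Category. Values are hypothetical.
warehouse_sales = {"Premium": 120000, "Standard": 80000}

# Answer set 2: visit counts from a Hadoop-side analysis, grouped on
# the SAME conformed attribute.
web_visits = {"Premium": 45000, "Standard": 310000, "Unknown": 9000}

# Sort-merge at the BI layer: a full outer join on the conformed
# attribute, so categories present in only one source still appear.
categories = sorted(set(warehouse_sales) | set(web_visits))
report = [
    (cat, warehouse_sales.get(cat), web_visits.get(cat))
    for cat in categories
]
```

The merge works only because both platforms agreed on the conformed attribute's values in advance; without that agreement the answer sets cannot be aligned, which is the whole point of conformed dimensions.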
65 Data Warehouse Disruptions The rise of the data scientist "newly discovered patterns have the most disruptive potential, and insights from them lead to the highest ROIs." Sandboxes Demand for low latency Thirst for exquisite detail Light touch data waits for relevance to be exposed Simple analysis of all the data trumps sophisticated analysis of some of the data Dozens of special purpose programming languages Have we killed the golden goose of DW integration? Whither the Data Warehouse? Corral and embrace the data scientists Scientists and IT must meet each other half way Insist on using shared data warehouse resources Conformed dimensions → integrate now, avoid data silos! Virtualized data sets → quick prototypes, migrate to production Build a cross department analytic community with IT partnership Ditch the waterfall approach, go agile Led by business but sophistication required Closely spaced deliverables, midcourse corrections 10
66 The Kimball Group Resource Best-selling data warehouse books NEW BOOK! DW Toolkit 3rd Edition → In depth data warehouse classes taught by primary authors Dimensional modeling (Ralph/Margy) ETL architecture (Ralph/Bob) Dimensional design reviews and consulting by Kimball Group principals Kimball Group White Papers on Integration, Data Quality, and Big Data Analytics THANK YOU! 11
Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances
INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
Tiber Solutions. Understanding the Current & Future Landscape of BI and Data Storage. Jim Hadley
Tiber Solutions Understanding the Current & Future Landscape of BI and Data Storage Jim Hadley Tiber Solutions Founded in 2005 to provide Business Intelligence / Data Warehousing / Big Data thought leadership
