Retail POS Data Analytics Using MS BI Tools
Business Intelligence White Paper
Introduction

Overview
There is no doubt that businesses today are driven by data. Companies, big or small, put considerable effort into collecting large amounts of data from a wide range of sources and media, including contact details, financial and operational data, buyer behavior, and even social media. With this data, companies try to understand their own strengths and weaknesses, as well as those of their competitors, in order to make better business decisions.

In the retail sector, retailers often have to rely on store-level POS data, which is voluminous, is generated daily or in real time, and is not well organized for analysis. To understand their customers and provide them with the best service and shopping experience, store and retail operators therefore need to convert raw store data into usable information. This paper describes how organizations can use business intelligence tools and technologies to derive meaningful information from large volumes of data.

Purpose
The purpose of this paper is to highlight an architectural and technical approach for optimizing the analysis of retail Point of Sale (POS) data.

Scope
The paper's scope is limited to the basic concepts, tools, and technologies of Microsoft Business Intelligence.

Intended Audience
This paper is intended for decision makers, strategists, and top-level management professionals who make critical decisions at the strategic, tactical, and operational levels of their organizations.

Contata Solutions 2015
Problem Statement
In today's competitive retail landscape, some of the major challenges faced by retail and store operators worldwide are:
- Aligning the speed of data capture: recording data and converting it into information quickly enough to take the right decisions at the right time
- Breaking down information silos
- Keeping data and information consistent at every level of the organization
- Analyzing trends and patterns to make tactical and strategic decisions

Since critical information directly affects sales and profitability, retail and store operators need to make quick strategic, marketing, and operational decisions. When such information is unavailable, the business impact is often severe:
- Ineffective decision making due to unprocessed and incorrect data
- Loss of time and money spent extracting and compiling information from multiple locations, systems, and subsystems
- A time gap between the availability of information and the communication needed to act on it
- Misalignment among strategic, tactical, and operational decisions

Microsoft DWBI Tools and Technologies

SQL Server
SQL Server 2014 Standard vs. Enterprise Edition: SQL Server is the relational database engine used to store the transactional database and to define and store the data warehouse. Opting for the Enterprise edition over the Standard edition gives access to features, such as table partitioning and columnstore indexes, that can improve performance significantly.

SQL Server Integration Services (SSIS)
SSIS provides Extract, Transform, and Load (ETL) capabilities for data import, data integration, and data warehousing. Its GUI tools help build workflows that extract data from various sources, query it, transform it, and convert the processed data into the required shape. It can be used in day-to-day business operations as well as for data mining and data warehouse applications.

SQL Server Analysis Services (SSAS)
SSAS adds OLAP and data mining capabilities to SQL Server databases.
SQL Server Reporting Services (SSRS)
SSRS provides a server-based platform designed to support a wide variety of reporting needs. It delivers relevant information across the enterprise and supports creating and managing both static and parameterized reports.
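As a rough illustration of the extract, transform, and load flow that SSIS provides, the sketch below runs the same three stages in plain Python, with an in-memory SQLite database standing in for a SQL Server staging database. The sample data, column names, and table name are assumptions made for the example, and this is not how SSIS packages themselves are built.

```python
import csv
import io
import sqlite3

# Hypothetical POS extract; a real feed would be read from CSV files on disk.
SAMPLE_CSV = """transaction_id,store_id,amount
1001,S01,19.99
1002,S02,not-a-number
1003,S01,5.50
"""


def extract(text):
    """Extract: parse CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and skip rows that fail conversion."""
    clean = []
    for row in rows:
        try:
            clean.append(
                (int(row["transaction_id"]), row["store_id"], float(row["amount"]))
            )
        except ValueError:
            continue  # a real pipeline would route bad rows to an error output
    return clean


def load(conn, rows):
    """Load: insert rows into a staging table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_transaction ("
        "transaction_id INTEGER PRIMARY KEY, store_id TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO staging_transaction VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM staging_transaction").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load(connection, transform(extract(SAMPLE_CSV))))
```

Keeping the three stages as separate functions mirrors the way an ETL workflow separates source, transformation, and destination steps, so each stage can be changed or tested on its own.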
Technical Solution
Contata Solutions undertook a project to help a retail store analyze its POS data. The data was in CSV format, and the analysis had to cover customer segmentation, geography, product consolidation, and seasonality / trend analysis. The project was executed against the following requirements.

Source Database
Source data was available in multiple CSV files.

Required Outcome
The following analyses were required:

Customer Analysis
- Customer segmentation on the basis of:
  o Number of days since last visit
  o Number of orders in the past 12 months
  o Dollar value of transactions
- Polarity between high-value and low-value customers
- Customer loyalty

Store Analysis
- Total number of store visits on a daily, weekly, and monthly basis
- Total sales on a daily, weekly, and monthly basis
- High-selling products

Product Analysis
- Products commonly bought together
- Sales by product category
- Product consolidation strategy on the basis of high-value, low-cost products

Seasonality / Trend Analysis
- Average order value by month
- Sales during the festival season
- Increase in sales of a particular product during a baseball or football series

Hardware Infrastructure
The following hardware infrastructure was used for the project:
- Server 1 (DB server): 8-core processor, 64 GB RAM, 1.5 TB storage
- Server 2 (SSIS server): 8-core processor, 64 GB RAM, 0.5 TB storage

Decision on SQL Server Edition

Case 1: SQL Server 2014 Standard Edition
SQL Server 2014 Standard Edition was used initially, but the following issues were encountered:
- After the CSV data was loaded into the SQL Server staging database, the database was approximately 100 GB in size, with 80% of the data in two main tables: daily transactions and transaction line item details.
- With over 200 million records, simply counting the records took 2 minutes.

Optimizing the database required techniques such as table partitioning and columnstore indexes.
However, Standard Edition did not offer those features, so it was decided to move to SQL Server Enterprise Edition.
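The customer segmentation criteria listed under Required Outcome (days since last visit, orders in the past 12 months, and dollar value of transactions) can be sketched in a few lines of Python. This is only an illustration: the thresholds, the segment labels, and the CustomerSummary type are assumptions made for the example, not values from the project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomerSummary:
    """Per-customer aggregates; the field names are hypothetical."""
    customer_id: str
    last_visit: date
    orders_12m: int    # number of orders in the past 12 months
    spend_12m: float   # dollar value of transactions in the past 12 months


def segment(customer, as_of, max_days=30, min_orders=12, min_spend=1000.0):
    """Score a customer on the three criteria and map the score to a label.

    The default thresholds are illustrative assumptions only.
    """
    days_since_visit = (as_of - customer.last_visit).days
    score = 0
    score += days_since_visit <= max_days       # visited recently
    score += customer.orders_12m >= min_orders  # orders frequently
    score += customer.spend_12m >= min_spend    # spends a lot
    if score == 3:
        return "high-value"
    if score == 0:
        return "low-value"
    return "mid-value"


if __name__ == "__main__":
    shopper = CustomerSummary("C1", date(2015, 1, 20), 15, 2500.0)
    print(segment(shopper, as_of=date(2015, 2, 1)))
```

In a warehouse setting the three aggregates would normally be computed by a query over the transaction tables; the function above only shows how the criteria could combine into a segment.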
Case 2: SQL Server 2014 Enterprise Edition
To improve performance, the following steps were taken:

1. SSIS optimization
For the best data load results, SSIS and the database ran on two separate servers. SQL Server uses user-mode cooperative multitasking and resource control that assume full ownership of the system, so it tends to consume all available memory. In addition, Lookups were cached to improve performance.

2. Table partitioning: tables were partitioned by month
a. A hard drive with 250 GB of storage was selected, allowing for future growth of both the transaction database and the data warehouse.
b. Filegroups were created for each month, with each quarter's filegroups mapped to the hard drive.
c. Data was partitioned by month, with the data for the first month of any year going to partition range 1 (see the diagram below).
d. A partition scheme was defined to map each partition range to a filegroup.
e. The table was associated with the partition scheme, on the month field, at table creation.

SSIS packages read the data for each partition from the staging database and transferred it to the data warehouse. The main tables of both the staging database and the data warehouse were partitioned.

3. Clustered columnstore index
Since reports had to be built from the data warehouse, a clustered columnstore index was used to improve query performance over traditional row-oriented storage. The gain comes from storing data in a columnar format and compressing it, which reduces the amount of data a query has to read. As a result, the time to count the total number of records dropped from 2 minutes to 2 seconds.
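The monthly partitioning in step 2 and the columnstore index in step 3 can be sketched as a small Python helper that generates the corresponding T-SQL. This is a sketch under stated assumptions rather than the project's actual scripts: every object name (pf_month, ps_month, FG_M01 to FG_M12, the table and its columns) is hypothetical, the filegroups are assumed to exist already, one filegroup per month is only one possible reading of the layout, and RANGE LEFT is chosen here so that month 1 lands in partition 1.

```python
from bisect import bisect_left

# Boundary values for a RANGE LEFT partition function on a month column (1-12).
# Eleven boundaries yield twelve partitions, one per calendar month.
MONTH_BOUNDARIES = list(range(1, 12))


def partition_number(month: int) -> int:
    """Return the 1-based partition a month value falls into under RANGE LEFT."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # RANGE LEFT: each boundary value belongs to the partition on its left.
    return bisect_left(MONTH_BOUNDARIES, month) + 1


def build_partition_ddl(table: str = "dbo.TransactionLine",
                        column: str = "TxnMonth") -> list[str]:
    """Build illustrative T-SQL for the partitioned table and its index.

    All object names are hypothetical and the filegroups must already exist.
    """
    boundaries = ", ".join(str(b) for b in MONTH_BOUNDARIES)
    filegroups = ", ".join(f"FG_M{m:02d}" for m in range(1, 13))
    return [
        # Partition function: defines the month ranges.
        f"CREATE PARTITION FUNCTION pf_month (tinyint) "
        f"AS RANGE LEFT FOR VALUES ({boundaries});",
        # Partition scheme: maps each range to a filegroup.
        f"CREATE PARTITION SCHEME ps_month "
        f"AS PARTITION pf_month TO ({filegroups});",
        # Table created on the scheme, partitioned on the month column.
        f"CREATE TABLE {table} ("
        f"TransactionId bigint NOT NULL, {column} tinyint NOT NULL, "
        f"Amount decimal(18, 2) NOT NULL) ON ps_month ({column});",
        # Clustered columnstore index aligned with the same scheme.
        f"CREATE CLUSTERED COLUMNSTORE INDEX cci_TransactionLine "
        f"ON {table} ON ps_month ({column});",
    ]


if __name__ == "__main__":
    for statement in build_partition_ddl():
        print(statement)
```

Generating the statements from one list of boundaries keeps the partition function and the filegroup list in step with each other, which matters because a scheme must name one filegroup for every partition the function defines.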
4. Query optimization
Queries were optimized as follows:
- Name the actual columns in the SELECT statement instead of using SELECT *
- Minimize the use of subqueries
- Create proper indexes on tables
- Avoid full table scans wherever possible
- Avoid GROUP BY over multiple keys
- Avoid retrieving data through multiple LEFT JOINs

5. Partial cube processing
To allow partial processing of the cube for incremental data, the cube was partitioned by month using views, with one view per month.

Summary
The following techniques were used to optimize the overall architecture and query performance:
- SSIS optimization by running SSIS on a separate server from the database server
- SSIS optimization using a lookup cache
- Query optimization
- Table partitioning
- Clustered columnstore index
- Cube partitioning

Contata Solutions 2015
For more information, visit: www.contata.com