HOW TO: Using Big Data Analytics to understand how your SharePoint Intranet is being used




About the author: Wim De Groote is Expert Leader Enterprise Collaboration Management at Sogeti. You can reach him at wim.de.groote@sogeti.com.

Summary: This article illustrates how Sogeti helps customers better understand how their intranet is being used. We collect log file data from the intranet, process and analyze it in the cloud, and then display the results back inside the intranet. To achieve this, we combine SharePoint with big data analytics: HDInsight (Hadoop) on Azure.

From Data to Insights

The background: Several of our customers are modernizing their collaboration tools. Sogeti is helping them migrate from SharePoint 2010 to SharePoint 2013, and even to SharePoint 2016. At the same time, these customers gain financial and operational benefits as we move their on-premises servers into the Microsoft Azure cloud. When organizations invest in building intranet and collaboration facilities, they want to have some idea of how much these facilities are being used, when, and by whom. SharePoint, like most packages, comes with a set of pre-canned reports that provide insights into usage, called audit and usage reports. These reports are very useful; however, some restrictions apply:

- Not all audit log reports that come with SharePoint 2010 are available in SharePoint 2013 out of the box.
- There are requests for information and reporting that are not answered by the pre-canned reports.

So we were looking at how we can provide additional audit log reports and usage information for SharePoint 2013 and beyond. After some brainstorming, we decided to use the Azure cloud for more than just running our SharePoint virtual machines: we also want to leverage the HDInsight (Hadoop) service in Azure to provide additional insight into how SharePoint is being used. So we set up a SharePoint + HDInsight environment. This article illustrates how we can use the Microsoft Power BI stack to provide insights into how end users are using SharePoint.

The business requirements

The first requirement our customers put to Sogeti is seemingly simple. Our customer publishes information about mandatory organizational procedures on their SharePoint intranet. Every one of their employees must visit the procedure pages once a month to ensure compliance. At the end of the month, we want a report showing which users visited which procedure pages. Unfortunately, as the diagram below shows, all the information we have at this point is raw log files.

The report we want to obtain: is everybody reading the procedure pages?

How can we bridge the gap and get from log files to a high-quality report? The first step we took was to configure SharePoint in such a way that procedure pages are clearly identified as procedure pages and can easily be distinguished from other types of pages. We'll be needing this clear separation later on, so that the report can zoom in on the procedure pages and leave other types of pages out of scope.

Identifying the procedure pages (in SharePoint) we want to report on

To make this report, we'll need two types of information from SharePoint:

1. The log files produced by the web server underneath SharePoint (IIS). These log files contain, in raw form, the page URLs and the account names of the users that visited each page (a sample appears below).
2. A file that contains a list of all the pages, along with the template type of each page. There are many page types, and we only want the pages of type "procedure".
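To give an idea of the raw material, here is what such IIS log lines can look like. This is a hypothetical excerpt in the W3C extended log format; the exact fields and their order depend on how logging is configured on the web front ends, and the Pig script shown later assumes a matching field selection.

    #Software: Microsoft Internet Information Services 8.5
    #Fields: date time cs-uri-stem cs-uri-query cs-username
    2016-01-12 14:58:12 /procedures/Safety_Procedure.aspx - CONTOSO\ally
    2016-01-12 14:58:40 /procedures/Operational_Procedure.aspx - CONTOSO\ally

Note the comment lines starting with #, which IIS writes at the top of every log file; they will have to be filtered out before analysis.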

Now that we have the source data, it is time to move on and apply some analytics. Azure gives us the possibility to create an infrastructure on the fly, just for this report. Creating an HDInsight cluster via the Azure portal is quick and easy. However, since we need to run this report regularly, manual steps tend to get cumbersome. So we invested quite some time in writing a PowerShell script that creates the cluster automatically. Microsoft offers some sample scripts centered around the New-AzureRmResourceGroupDeployment cmdlet that you can use as a starter.
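As a minimal sketch of that approach (not our production script): assuming you have an ARM template describing the HDInsight cluster, deployment boils down to creating a resource group plus one deployment call. The resource group name, template file, and parameter file below are hypothetical.

    # Minimal sketch: create an HDInsight cluster from an ARM template.
    # Resource group name, template path and parameter file are hypothetical.
    Login-AzureRmAccount
    New-AzureRmResourceGroup -Name "sp-analytics-rg" -Location "West Europe"
    New-AzureRmResourceGroupDeployment `
        -ResourceGroupName "sp-analytics-rg" `
        -TemplateFile ".\hdinsight-cluster.json" `
        -TemplateParameterFile ".\hdinsight-cluster.parameters.json"

Because everything is scripted, the cluster can be created just before the report runs and removed afterwards (for example with Remove-AzureRmResourceGroup), so you only pay for the compute you actually use.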

Once the cluster is created, we upload the SharePoint log files. Hadoop allows us to impose a schema on the data as we read it, so it is as if we can turn our log files into a virtual table on the fly. We do this using a Pig script.

A pig will eat anything, even log files

An extract from our Pig script:

    log = load '$input' using PigStorage(' ') as (
        date:chararray,
        time:chararray,
        cs_uri_stem:chararray,
        cs_uri_query:chararray,
        cs_username:chararray);

We filter out some additional data that does not concern us:

    nocomments = filter log by NOT (date matches '.*#.*');
    nosp_setup = filter nocomments by NOT (cs_username matches '.*sp_setup.*');

And finally we can launch a query to see who has been reading what page on our intranet:

    -- 'onlypages' comes from a step not shown in this extract: it keeps only
    -- the records for pages of type 'procedure', using the page list we
    -- exported from SharePoint.
    site = foreach onlypages generate
        FLATTEN(STRSPLIT(cs_username, '\\\\+', 2)) AS (domain:chararray, username:chararray),
        cs_uri_stem AS page, time;
    result = foreach site generate username, page, time;
    dump result;
    store result into '$output' using PigStorage('\t');

The output of the Pig job essentially gives us the answer we are looking for, in the form of a text file:

    User     Page                        Date
    Ally     Safety_Procedure.aspx       12 Jan 2016 14:58
    Ally     Operational_Procedure.aspx  12 Jan 2016 14:58
    Jane     Safety_Procedure.aspx       12 Jan 2016 15:01
    Jane     Operational_Procedure.aspx  12 Jan 2016 15:01
    Mickael  Safety_Procedure.aspx       12 Jan 2016 15:02
    Mickael  Operational_Procedure.aspx  12 Jan 2016 15:02
    Pete     Safety_Procedure.aspx       12 Jan 2016 15:04
    Pete     Operational_Procedure.aspx  12 Jan 2016 15:04
    Kate     Operational_Procedure.aspx  12 Jan 2016 15:06
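Since the report runs monthly, the Pig job itself can also be kicked off from PowerShell rather than from an interactive console. Here is a hedged sketch using the AzureRM.HDInsight cmdlets; the cluster name, script path, and parameter values are hypothetical:

    # Hypothetical sketch: submit the Pig script to the cluster and wait for it.
    $creds = Get-Credential -Message "HDInsight cluster login"
    $pigJob = New-AzureRmHDInsightPigJobDefinition `
        -File "wasb:///scripts/usage-report.pig" `
        -Arguments @("-param", "input=/logs/*.log", "-param", "output=/reports/usage")
    $job = Start-AzureRmHDInsightJob -ClusterName "sp-usage-hdi" `
        -JobDefinition $pigJob -HttpCredential $creds
    Wait-AzureRmHDInsightJob -ClusterName "sp-usage-hdi" `
        -JobId $job.JobId -HttpCredential $creds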

We need to perform some minor cosmetics on the Pig output, as it contains one entry for every time a user visits a page, and we only need to know whether a user visited the page at all, not how many times. So we import the data into SQL Server with an SSIS job and apply a GROUP BY to reduce multiple visits to a single row per user and page. Finally, to display the end result, we have several options, such as using SSRS to generate a report and surface it inside SharePoint.

Special thanks go to: Iny Willems, for working with me on the initial automation of deploying HDInsight clusters, and Bart Dresselaers, for working with me on the Pig script.

A late-breaking note: we started working on this solution in the middle of 2015 and invested quite a lot of time in automating the creation of analytics clusters via PowerShell. Now, at the beginning of 2016, Microsoft has released Azure Data Factory. If we were to start the same solution today, ADF would eliminate much of the need for custom scripting and would offer a more graphical way to configure and automate BI processing via pipelines.