Introduction to SQL and SQL in R. LISA Short Courses Xinran Hu




Laboratory for Interdisciplinary Statistical Analysis
LISA helps VT researchers benefit from the use of Statistics.
Collaboration: Visit our website to request personalized statistical advice and assistance with:
  Experimental Design
  Data Analysis
  Interpreting Results
  Grant Proposals
  Software (R, SAS, JMP, SPSS...)
LISA statistical collaborators aim to explain concepts in ways useful for your research.
Great advice right now: Meet with LISA before collecting your data.
Short Courses: Designed to help graduate students apply statistics in their research.
Walk-In Consulting: M-F 1-3 PM, GLC Video Conference Room; 11 AM-1 PM, Old Security Building Room 103. For questions requiring <30 mins.
All services are FREE for VT researchers. We assist with research, not class projects or homework.

Course Plan
Part I. SQL
  1. Basic Concepts in SQL
  2. Querying From One Table
  3. Nested Queries
  4. Querying From Multiple Tables
Part II. SQL in R
  1. sqldf Package
  2. Writing SQL Commands in R

Why SQL?
SQL is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS). (Wikipedia)
In the era of Big Data:
  (1) Simple Method + Massive Dataset
  (2) The time spent on data management often exceeds the time spent on analysis.

Basic Concepts: Table
A Table (or a Relation) is a set of records that have the same attributes.
  1. A table always has a unique table name in SQL.

  Name     Gender  Age  Occupation
  Jessica  Female  28   Student
  Michael  Male    28   Professor
  Emma     Female  34   Professor

Basic Concepts: Column
A Column (or Attribute or Field) is a characteristic of the objects.
  1. A column always has a unique column name within a table.
  2. A column is either numeric (age), categorical (gender) or one of the other pre-defined types in SQL, e.g. date.
  3. Primary Key: a column that has a unique value for each object.

  Name(PK)  Gender  Age  Occupation
  Jessica   Female  28   Student
  Michael   Male    28   Professor
  Emma      Female  34   Professor

Basic Concepts: Row
A Row (or a Tuple or Record) is an object represented by a list of attributes.
  1. A row always has a unique value in the Primary Key column.

  Name(PK)  Gender  Age  Occupation
  Jessica   Female  28   Student
  Michael   Male    28   Professor
  Emma      Female  34   Professor
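The table, column and primary-key ideas above can be tried out without a database server. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical table name `people` (the slides do not name the table); it loads the three example rows and shows that a second row with an existing primary-key value is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# `people` is a hypothetical table name; the rows are the ones shown on the slides.
conn.execute(
    "CREATE TABLE people ("
    "name TEXT PRIMARY KEY, gender TEXT, age INTEGER, occupation TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Jessica", "Female", 28, "Student"),
        ("Michael", "Male", 28, "Professor"),
        ("Emma", "Female", 34, "Professor"),
    ],
)

# Inserting another row whose primary-key value already exists raises an error.
try:
    conn.execute("INSERT INTO people VALUES ('Jessica', 'Female', 40, 'Staff')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT count(*) FROM people").fetchone()[0]
print(duplicate_rejected, row_count)
```

Age and Gender repeat across rows without any problem; only the column declared as the primary key has to be unique.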

Basic SQL Command
Example dataset: a world fact dataset (sqlzoo.net)
http://sqlzoo.net/wiki/select_basics
(Table name: world)

SELECT FROM
Function: Selecting one or more columns from a table.
Example:
  SELECT * FROM world
  SELECT name FROM world
  SELECT name, gdp FROM world
A SQL query always starts with a SELECT FROM command.
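To see what the three SELECT forms return, here is a small sketch using Python's built-in sqlite3 module. The `world` table below is a made-up miniature stand-in for the sqlzoo table: the country names and all numbers are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature `world` table; names and values are invented.
conn.execute(
    "CREATE TABLE world (name TEXT, continent TEXT, area REAL, population REAL, gdp REAL)"
)
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?, ?, ?)",
    [
        ("Alphaland", "europe", 100.0, 10.0, 500.0),
        ("Betaland", "europe", 300.0, 20.0, 400.0),
        ("Gammaland", "asia", 900.0, 50.0, 1000.0),
    ],
)

# * selects every column; a column list selects only the named columns.
all_columns = conn.execute("SELECT * FROM world").fetchall()
names = conn.execute("SELECT name FROM world").fetchall()
name_gdp = conn.execute("SELECT name, gdp FROM world").fetchall()

print(len(all_columns[0]), "columns per row with SELECT *")
print(names)
print(name_gdp)
```

Each result row comes back as a tuple with one element per selected column, so `SELECT name` yields one-element tuples.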

DISTINCT Clause
Function: Removing duplicate records in the query result.
Example:
  SELECT continent FROM world
  (This query includes duplicate results)
  SELECT DISTINCT continent FROM world
  (This query excludes duplicate results)

LIMIT clause
Function: Limiting the number of records returned.
Example:
  SELECT DISTINCT continent FROM world
  (This query returns all distinct results)
  SELECT DISTINCT continent FROM world LIMIT 3
  (This query returns three distinct results)
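The effect of DISTINCT and LIMIT is easy to check by counting rows. This is a sketch with Python's sqlite3 module and an invented one-column stand-in for the `world` table. Note that LIMIT is the spelling used by SQLite, MySQL and PostgreSQL; some other databases use a different keyword for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for `world`; only the continent column matters here.
conn.execute("CREATE TABLE world (name TEXT, continent TEXT)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?)",
    [
        ("Alphaland", "europe"),
        ("Betaland", "europe"),
        ("Gammaland", "asia"),
        ("Deltaland", "africa"),
    ],
)

with_duplicates = conn.execute("SELECT continent FROM world").fetchall()
distinct = conn.execute("SELECT DISTINCT continent FROM world").fetchall()
limited = conn.execute("SELECT DISTINCT continent FROM world LIMIT 2").fetchall()

print(len(with_duplicates), len(distinct), len(limited))
```

Without an ORDER BY, which distinct values survive the LIMIT is up to the database, so only the row count should be relied on.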

WHERE clause
Function: Filtering the records returned by a select command.
Example: Filtering based on a numeric attribute. Each WHERE line is an alternative condition that follows the SELECT line.
  SELECT name, area FROM world /* do not forget this line */
  WHERE area = 9596961 /* area is in km^2 */
  WHERE area <> 9596961
  WHERE area > 5000000
  WHERE area < 1
  WHERE area IN (0, 9596961)
  WHERE area BETWEEN 9000000 AND 10000000
  WHERE area NOT BETWEEN 1 AND 10000000

WHERE clause (cont.)
Example: Filtering based on a categorical attribute. String values must be enclosed in single quotes.
  SELECT name, continent FROM world
  WHERE continent = 'north america'
  WHERE continent <> 'north america'
  WHERE continent > 'north america'
  WHERE continent < 'north america'
  WHERE continent IN ('north america', 'south america')
  WHERE continent BETWEEN 'D' AND 'F'
  WHERE continent NOT BETWEEN 'D' AND 'F'
  WHERE continent LIKE '%america'

WHERE clause (cont.)
Example: Filtering based on two or more conditions.
  SELECT name, continent FROM world
  WHERE area > 9000000 AND area < 10000000
  WHERE continent = 'north america' AND area > 9000000
  WHERE continent = 'north america' OR area > 9000000
  WHERE (continent = 'north america' AND area > 500000) OR (continent = 'south america' AND area > 500000)
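The WHERE conditions above can be compared side by side on a small table. The sketch below uses Python's sqlite3 module; the table contents and the helper `names_where` are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for `world`; names and areas are invented.
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area REAL)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Alphaland", "north america", 9500000),
        ("Betaland", "north america", 400000),
        ("Gammaland", "south america", 8500000),
        ("Deltaland", "europe", 300000),
        ("Epsilonland", "asia", 9600000),
    ],
)


def names_where(condition):
    """Return the sorted names of rows matching a fixed WHERE condition."""
    rows = conn.execute("SELECT name FROM world WHERE " + condition).fetchall()
    return sorted(name for (name,) in rows)


in_range = names_where("area BETWEEN 9000000 AND 10000000")
americas = names_where("continent LIKE '%america'")
both = names_where("continent = 'north america' AND area > 9000000")
either = names_where("continent = 'north america' OR area > 9000000")

print(in_range)
print(americas)
print(both)
print(either)
```

The helper pastes the condition into the query text, which is acceptable for fixed strings in a demonstration; values that come from users should be passed as query parameters instead.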

ORDER BY clause
Function: Sorting the query result by a column.
Example:
  Sort ascending (the default; the ASC keyword is optional)
  SELECT name, area FROM world ORDER BY area ASC
  Sort descending
  SELECT name, area FROM world ORDER BY area DESC

Arithmetic operations
SQL permits arithmetic operations on columns.
Example:
  Query GDP:
  SELECT name, gdp FROM world ORDER BY gdp DESC
  Query GDP in billions:
  SELECT name, gdp/1000000000 FROM world ORDER BY gdp DESC
  Query GDP per capita:
  SELECT name, gdp/population FROM world ORDER BY gdp/population DESC
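Sorting and column arithmetic combine naturally, as in the GDP-per-capita query. Below is a sketch with Python's sqlite3 module on invented numbers; the alias `gdp_per_capita` is an addition for readability and is not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for `world`; populations and GDP values are invented.
conn.execute("CREATE TABLE world (name TEXT, population REAL, gdp REAL)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Alphaland", 10.0, 500.0),
        ("Betaland", 20.0, 400.0),
        ("Gammaland", 50.0, 1500.0),
    ],
)

# Ascending is the default sort order.
by_gdp = conn.execute("SELECT name FROM world ORDER BY gdp").fetchall()

# A computed column can be named with AS and then used in ORDER BY.
per_capita = conn.execute(
    "SELECT name, gdp / population AS gdp_per_capita "
    "FROM world ORDER BY gdp_per_capita DESC"
).fetchall()

print(by_gdp)
print(per_capita)
```

The columns are declared REAL on purpose: in SQLite, dividing one integer by another performs integer division and would silently drop the fractional part.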

Practice: SQLZOO: SELECT basics
Write the SQL queries in the first tutorial.
http://sqlzoo.net/wiki/select_basics

Aggregate Functions
Function: Producing descriptive statistics of a column.
Example: Query the average GDP of the world.
  SELECT avg(gdp) FROM world
More Aggregate Functions: max(), min(), sum(), count(), var(), stdev() (the names and availability of the last two vary by database)
Note: max() and min() operate on strings as well.

GROUP BY
Function: Stratifying an aggregate result by a categorical column.
Example: Query the average GDP of each continent.
  SELECT continent, avg(gdp) FROM world GROUP BY continent

HAVING Clause
Function: Filtering aggregate results.
Example: Find the continents whose average country area is greater than 1 million km^2.
  SELECT continent, avg(area) FROM world GROUP BY continent HAVING avg(area) > 1000000
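Aggregates, GROUP BY and HAVING build on one another, which the following sketch tries to make visible. It uses Python's sqlite3 module and an invented stand-in for the `world` table with deliberately small areas, so the HAVING threshold is 1000 rather than 1 million.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for `world`; areas are invented and deliberately small.
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area REAL)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Alphaland", "europe", 100.0),
        ("Betaland", "europe", 300.0),
        ("Gammaland", "asia", 900.0),
        ("Deltaland", "asia", 1500.0),
        ("Epsilonland", "africa", 2000.0),
    ],
)

# One aggregate over the whole table.
overall_avg = conn.execute("SELECT avg(area) FROM world").fetchone()[0]

# One aggregate per continent.
per_continent = conn.execute(
    "SELECT continent, avg(area) FROM world GROUP BY continent ORDER BY continent"
).fetchall()

# HAVING filters the groups after aggregation; WHERE would filter rows before it.
large_on_average = conn.execute(
    "SELECT continent, avg(area) FROM world "
    "GROUP BY continent HAVING avg(area) > 1000 ORDER BY continent"
).fetchall()

print(overall_avg)
print(per_continent)
print(large_on_average)
```

Selecting the grouping column alongside the aggregate is what labels each average with its continent.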

Nested Query
A query result can be used in the WHERE clause of another query. This is referred to as a Nested Query.
  SELECT name FROM world WHERE population > (SELECT population FROM world WHERE name = 'Russia')
If we run the inner query on its own, we get 146000000, so the above query is equivalent to
  SELECT name FROM world WHERE population > 146000000
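The equivalence between the nested query and its hard-coded counterpart can be checked directly. This is a sketch with Python's sqlite3 module; the countries and populations are invented, and `Betaland` plays the role that Russia has on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for `world`; populations are invented.
conn.execute("CREATE TABLE world (name TEXT, population REAL)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?)",
    [
        ("Alphaland", 10.0),
        ("Betaland", 20.0),
        ("Gammaland", 50.0),
        ("Deltaland", 30.0),
    ],
)

# Step 1: the inner query on its own returns a single value.
threshold = conn.execute(
    "SELECT population FROM world WHERE name = 'Betaland'"
).fetchone()[0]

# Step 2: the nested form lets the database do step 1 for us.
nested = conn.execute(
    "SELECT name FROM world WHERE population > "
    "(SELECT population FROM world WHERE name = 'Betaland') ORDER BY name"
).fetchall()

# The same filter with the value written out by hand.
hard_coded = conn.execute(
    "SELECT name FROM world WHERE population > ? ORDER BY name", (threshold,)
).fetchall()

print(threshold, nested, hard_coded)
```

The nested form stays correct when the data changes, whereas a hard-coded number has to be looked up again.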

UNION Command
Function: Concatenating two tables vertically.

First table:
  Name     Gender  Age  Occupation
  Jessica  Female  28   Student
  Michael  Male    28   Professor
  Emma     Female  34   Professor

Second table:
  Name     Gender  Age  Occupation
  James    Male    23   Student
  Anna     Female  22   Staff
  Andy     Male    45   Professor

Result of UNION:
  Name     Gender  Age  Occupation
  Jessica  Female  28   Student
  Michael  Male    28   Professor
  Emma     Female  34   Professor
  James    Male    23   Student
  Anna     Female  22   Staff
  Andy     Male    45   Professor

UNION Command
Example:
  SELECT * FROM world WHERE name = 'Germany'
  UNION
  SELECT * FROM world WHERE name = 'France'
Why is the UNION command useful if we can use a WHERE clause?
Answer: The two queries could come from different tables.
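The point that UNION can combine queries on different tables is sketched below with Python's sqlite3 module, using the two example tables of people from the UNION slide. The table names `group_a` and `group_b` are hypothetical. The sketch also shows that plain UNION drops duplicate rows, while UNION ALL keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# `group_a` and `group_b` are hypothetical names for the two slide tables.
for table in ("group_a", "group_b"):
    conn.execute(
        f"CREATE TABLE {table} (name TEXT, gender TEXT, age INTEGER, occupation TEXT)"
    )
conn.executemany(
    "INSERT INTO group_a VALUES (?, ?, ?, ?)",
    [
        ("Jessica", "Female", 28, "Student"),
        ("Michael", "Male", 28, "Professor"),
        ("Emma", "Female", 34, "Professor"),
    ],
)
conn.executemany(
    "INSERT INTO group_b VALUES (?, ?, ?, ?)",
    [
        ("James", "Male", 23, "Student"),
        ("Anna", "Female", 22, "Staff"),
        ("Andy", "Male", 45, "Professor"),
    ],
)

# Stacking the two tables vertically.
everyone = conn.execute(
    "SELECT * FROM group_a UNION SELECT * FROM group_b"
).fetchall()

# UNION removes duplicate rows; UNION ALL keeps every row.
occupations = conn.execute(
    "SELECT occupation FROM group_a UNION SELECT occupation FROM group_b"
).fetchall()
occupations_all = conn.execute(
    "SELECT occupation FROM group_a UNION ALL SELECT occupation FROM group_b"
).fetchall()

print(len(everyone), sorted(occupations), len(occupations_all))
```

Both SELECT statements must return the same number of columns for the UNION to be valid.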

JOIN Command
Function: Concatenating two tables horizontally.

Left table:
  Name     Gender  Age  Occupation
  Jessica  Female  28   Student
  Michael  Male    28   Professor
  Emma     Female  34   Professor

Right table:
  Name     Department
  Jessica  Math
  Michael  Statistics
  Emma     Art

Result of JOIN:
  Name     Gender  Age  Occupation  Department
  Jessica  Female  28   Student     Math
  Michael  Male    28   Professor   Statistics
  Emma     Female  34   Professor   Art

JOIN Command
JOIN is generally more complicated than UNION because you have to specify a common attribute that links the two tables. In the previous example, we joined the tables on the Name column.
If both tables contain a column with the same column name, we should distinguish the two columns by putting the table name in front:
  Table1Name.ColumnName vs Table2Name.ColumnName

JOIN Command
There are four different types of join:
  (1) INNER JOIN
  (2) LEFT JOIN
  (3) RIGHT JOIN
  (4) FULL OUTER JOIN

INNER JOIN
INNER JOIN produces a record if it exists in BOTH tables.

Left table:
  Name     Gender  Age  Occupation
  Jessica  Female  28   Student
  Michael  Male    28   Professor
  Emma     Female  34   Professor

Right table:
  Name     Department
  Jessica  Math
  John     CS

Result:
  Name     Gender  Age  Occupation  Department
  Jessica  Female  28   Student     Math

LEFT JOIN
LEFT JOIN produces a record if it exists in the left table. Columns with no match in the right table are filled with NULL.

Left table:
  Name     Gender  Age  Occupation
  Jessica  Female  28   Student
  Michael  Male    28   Professor
  Emma     Female  34   Professor

Right table:
  Name     Department
  Jessica  Math
  John     CS

Result:
  Name     Gender  Age  Occupation  Department
  Jessica  Female  28   Student     Math
  Michael  Male    28   Professor   NULL
  Emma     Female  34   Professor   NULL

RIGHT JOIN
RIGHT JOIN produces a record if it exists in the right table. Columns with no match in the left table are filled with NULL.

Left table:
  Name     Gender  Age  Occupation
  Jessica  Female  28   Student
  Michael  Male    28   Professor
  Emma     Female  34   Professor

Right table:
  Name     Department
  Jessica  Math
  John     CS

Result:
  Name     Gender  Age   Occupation  Department
  Jessica  Female  28    Student     Math
  John     NULL    NULL  NULL        CS

FULL OUTER JOIN
FULL OUTER JOIN produces a record if it exists in either table.
Note: FULL OUTER JOIN is not supported in MySQL.

Left table:
  Name     Gender  Age  Occupation
  Jessica  Female  28   Student
  Michael  Male    28   Professor
  Emma     Female  34   Professor

Right table:
  Name     Department
  Jessica  Math
  John     CS

Result:
  Name     Gender  Age   Occupation  Department
  Jessica  Female  28    Student     Math
  Michael  Male    28    Professor   NULL
  Emma     Female  34    Professor   NULL
  John     NULL    NULL  NULL        CS

JOIN syntax
Olympic Game Dataset
http://sqlzoo.net/wiki/select_.._join

Left Table Name: Games
  Year  City
  1896  Athens
  1948  London
  2004  Athens
  2008  Beijing
  2012  London

Right Table Name: City
  Name     Country
  Sydney   Australia
  Athens   Greece
  Beijing  China
  London   UK

JOIN Example
INNER JOIN
  SELECT * FROM games JOIN city ON games.city = city.name
LEFT JOIN
  SELECT * FROM games LEFT JOIN city ON games.city = city.name
RIGHT JOIN
  SELECT * FROM games RIGHT JOIN city ON games.city = city.name
FULL OUTER JOIN
  SELECT * FROM games FULL OUTER JOIN city ON games.city = city.name
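The four join types can be reproduced on the person and department tables used in the join-type slides. The sketch below uses Python's sqlite3 module with hypothetical table names `people` and `dept`. Because RIGHT JOIN and FULL OUTER JOIN are not available in every database (or in older SQLite versions), the sketch emulates them: a right join is written as a left join with the tables swapped, and a full outer join as the UNION of the two. This emulation is one common workaround, not the only one, and UNION would also merge rows that are exact duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# `people` and `dept` are hypothetical names for the two slide tables.
conn.execute("CREATE TABLE people (name TEXT, occupation TEXT)")
conn.execute("CREATE TABLE dept (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Jessica", "Student"), ("Michael", "Professor"), ("Emma", "Professor")],
)
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [("Jessica", "Math"), ("John", "CS")],
)

inner_sql = (
    "SELECT people.name, people.occupation, dept.department "
    "FROM people JOIN dept ON people.name = dept.name"
)
left_sql = (
    "SELECT people.name, people.occupation, dept.department "
    "FROM people LEFT JOIN dept ON people.name = dept.name"
)
# Emulated RIGHT JOIN: same join condition, tables swapped around LEFT JOIN.
right_sql = (
    "SELECT dept.name, people.occupation, dept.department "
    "FROM dept LEFT JOIN people ON people.name = dept.name"
)

inner_rows = conn.execute(inner_sql).fetchall()
left_rows = conn.execute(left_sql).fetchall()
right_rows = conn.execute(right_sql).fetchall()
# Emulated FULL OUTER JOIN: rows from either side, duplicates merged by UNION.
full_rows = conn.execute(left_sql + " UNION " + right_sql).fetchall()

for label, rows in [
    ("inner", inner_rows),
    ("left", left_rows),
    ("right", right_rows),
    ("full", full_rows),
]:
    print(label, sorted(rows, key=lambda r: r[0]))
```

SQL NULL values appear as `None` in the Python results, matching the NULL cells in the slide tables.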

SQL in R
Two ways of applying SQL queries in R:
  (1) A Local Way: Querying from a dataframe.
  (2) A Remote Way: Accessing a SQL database remotely.

Package sqldf
Package sqldf lets us write SQL-like commands to query a dataframe just like querying a relational table.
All we need to do is wrap a SQL command in double quotation marks and send it to the sqldf function, e.g. for a dataframe named hpc:
  library(sqldf)
  sqldf("select * from hpc")