How to Run an Apache HTTP Server With a Protocol



Similar documents
CTIS 256 Web Technologies II. Week # 1 Serkan GENÇ

1. When will an IP process drop a datagram? 2. When will an IP process fragment a datagram? 3. When will a TCP process drop a segment?

Computer Networks. Lecture 7: Application layer: FTP and HTTP. Marcin Bieńkowski. Institute of Computer Science University of Wrocław

People Data and the Web Forms and CGI CGI. Facilitating interactive web applications

Web. Services. Web Technologies. Today. Web. Technologies. Internet WWW. Protocols TCP/IP HTTP. Apache. Next Time. Lecture # Apache.

Web Development. Owen Sacco. ICS2205/ICS2230 Web Intelligence

APACHE WEB SERVER. Andri Mirzal, PhD N

HTTP. Internet Engineering. Fall Bahador Bakhshi CE & IT Department, Amirkabir University of Technology

reference: HTTP: The Definitive Guide by David Gourley and Brian Totty (O Reilly, 2002)

WWW. World Wide Web Aka The Internet. dr. C. P. J. Koymans. Informatics Institute Universiteit van Amsterdam. November 30, 2007

Working With Virtual Hosts on Pramati Server

World Wide Web. Before WWW


ICT 6012: Web Programming

People Data and the Web Forms and CGI. HTML forms. A user interface to CGI applications

Protocolo HTTP. Web and HTTP. HTTP overview. HTTP overview

Internet Technologies. World Wide Web (WWW) Proxy Server Network Address Translator (NAT)

DEERFIELD.COM. DNS2Go Update API. DNS2Go Update API

Web Server for Embedded Systems

Network Technologies

Exercises: FreeBSD: Apache and SSL: pre SANOG VI Workshop

Project #2. CSE 123b Communications Software. HTTP Messages. HTTP Basics. HTTP Request. HTTP Request. Spring Four parts

Chapter 27 Hypertext Transfer Protocol

Application layer Web 2.0

Lecture 11 Web Application Security (part 1)

How To Understand The History Of The Web (Web)

Table of Contents. Open-Xchange Authentication & Session Handling. 1.Introduction...3

Apache Server Implementation Guide

CONTENT of this CHAPTER

Modern Web Development From Angle Brackets to Web Sockets

CCM 4350 Week 11. Security Architecture and Engineering. Guest Lecturer: Mr Louis Slabbert School of Science and Technology.

Computer Networks 1 (Mạng Máy Tính 1) Lectured by: Dr. Phạm Trần Vũ MEng. Nguyễn CaoĐạt

APACHE. Presentation by: Lilian Thairu

Web Application Development

The World Wide Web: History

Cyber Security Workshop Ethical Web Hacking

Outline Definition of Webserver HTTP Static is no fun Software SSL. Webserver. in a nutshell. Sebastian Hollizeck. June, the 4 th 2013

Chapter 2: Interactive Web Applications

Lecture 2. Internet: who talks with whom?

The Web History (I) The Web History (II)

Xtreeme Search Engine Studio Help Xtreeme

Architecture of So-ware Systems HTTP Protocol. Mar8n Rehák

HTTP Protocol. Bartosz Walter

CS640: Introduction to Computer Networks. Applications FTP: The File Transfer Protocol

Internet Technologies Internet Protocols and Services

Chapter 6 Virtual Private Networking Using SSL Connections

The Web: some jargon. User agent for Web is called a browser: Web page: Most Web pages consist of: Server for Web is called Web server:

Playing with Web Application Firewalls

By Bardia, Patit, and Rozheh

Web Hosting Features. Small Office Premium. Small Office. Basic Premium. Enterprise. Basic. General

Dynamic Content. Dynamic Web Content: HTML Forms CGI Web Servers and HTTP

Short notes on webpage programming languages

Cache Configuration Reference

Forms, CGI Objectives. HTML forms. Form example. Form example...

APACHE HTTP SERVER 2.2.8

1. Introduction. 2. Web Application. 3. Components. 4. Common Vulnerabilities. 5. Improving security in Web applications

10. Java Servelet. Introduction

Sticky Session Setup and Troubleshooting

Instructor: Betty O Neil

Evolution of the WWW. Communication in the WWW. WWW, HTML, URL and HTTP. HTTP Abstract Message Format. The Client/Server model is used:

Perl/CGI. CS 299 Web Programming and Design

CloudOYE CDN USER MANUAL

URLs and HTTP. ICW Lecture 10 Tom Chothia

CSCI110: Examination information.

WHAT IS A WEB SERVER?

Web Pages. Static Web Pages SHTML

The Hyper-Text Transfer Protocol (HTTP)

COMP 112 Assignment 1: HTTP Servers

Playing with Web Application Firewalls

Welcome to Apache the number one Web server in

Research of Web Real-Time Communication Based on Web Socket

Securing the Apache Web Server

JASPERREPORTS SERVER WEB SERVICES GUIDE

WIRIS quizzes web services Getting started with PHP and Java

Setting Up Scan to SMB on TaskALFA series MFP s.

JavaScript By: A. Mousavi & P. Broomhead SERG, School of Engineering Design, Brunel University, UK

Cache All The Things

CatDV Pro Workgroup Serve r

PHP Authentication Schemes

Setup Guide Access Manager 3.2 SP3

Web Development. How the Web Works 3/3/2015. Clients / Server

7 Why Use Perl for CGI?

Painless Web Proxying with Apache mod_proxy

Web Programming. Robert M. Dondero, Ph.D. Princeton University

What is Web Security? Motivation

HTTP Response Splitting

Apache Tomcat. Load-balancing and Clustering. Mark Thomas, 20 November Pivotal Software, Inc. All rights reserved.

Introduction to Server-Side Programming. Charles Liu

Last update: February 23, 2004

LAMP Server A Brief Overview

Introduction to LAN/WAN. Application Layer (Part II)

Load Balancing Web Applications

Fachgebiet Technische Informatik, Joachim Zumbrägel

INTRODUCTION TO WEB TECHNOLOGY

Transcription:

HTTP Servers Jacco van Ossenbruggen CWI/VU Amsterdam 1

Learning goals Understand: Basis HTTP server functionality Serving static content from HTML and other files Serving dynamic content from software within a HTTP server from external software Security & privacy issues 2

HTTP: The Web s network protocol Early 90s: only a few HTTP servers, but many FTP servers helped bootstrapping the Web Example: ftp://ftp.gnu.org/gnu/aspell/dict/en/ HTTP servers based on the freely available httpd web server from NSCA NCSA stopped httpd support when the associated team left to start Netscape Webmasters started to send around software patches to further improve httpd Result was referred to as a patchy server Now the open source Apache server is one of the mostly used Web servers 3

HTTP server main loop HTTP Request HTTP Response HTTP Request HTTP Response 4

HTTP server main loop while(forever) listen to TCP port 80 and wait read HTTP request from client send HTTP response to client Seems not that complicated But: regular Apache HTTP server installation installs > 24Mb of software?! What makes real servers so complex? 5

Static content from files: HTML, CSS, JavaScript, images, 6

Example HTTP request.get / HTTP/1.0. 7

Example HTTP request.get / HTTP/1.1.Host: www.few.vu.nl. Why does the client need to tell the server the server s own hostname? because the server doesn t know its own name! www.cs.vu.nl is hosted on the same machine by the same server software server may need to send different responses for different host names Virtual host configuration allows web masters to tune server to do exactly this 8

Example HTTP request.get / HTTP/1.1.Host: www.few.vu.nl. Server needs to determine what resource is associated with / Also configurable, defaults to the file index.html in the server s document root directory, e.g. /var/www/www.few.vu.nl/html/index.html Security issues GET ~yourname/../../../passwd HTTP/1.1 GET ~yourname/../~yourlogin/mail HTTP/1.1 Webmaster needs to configure which directories in the local file system may be served by the web server Webmaster: Oops, that dir should not have been on the Web User: Oops, I didn t know this dir was on the Web too 9

Example HTTP request.get / HTTP/1.1.Host: www.few.vu.nl. Server needs to send content of file index.html to the client Along with length of the content the current time/date modification date expiration date MIME type of the content (e.g. text/html) character encoding (e.g. UTF-8) etc Most of these HTTP header values need to be looked up in a configurable way Results need to be logged in the server log for later analysis 10

Example: apache HTTP logs access_log.2:soling.few.vu.nl - - [11/Jan/2008:16:47:19 +0100] "GET /cgibin/wt-test?naam=&textarea=+ HTTP/1.0" 200 1341 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-us; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" access_log.2:soling.few.vu.nl - - [11/Jan/2008:16:47:48 +0100] "GET /cgibin/wt-test?naam=&textarea=+ HTTP/1.0" 200 1341 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-us; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" access_log.2:soling.few.vu.nl - - [11/Jan/2008:16:48:48 +0100] "GET /cgibin/wt-test?naam=&textarea=+ HTTP/1.0" 200 1341 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-us; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" access_log.2:soling.few.vu.nl - - [11/Jan/2008:16:55:59 +0100] "GET /cgibin/wt-test?naam=&radio=inhoudelijk&textarea=+vxfvsdfsdf%0d%0a HTTP/1.0" 200 1409 "- "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-us; rv:1.8.1.6) Gecko/20070725 Firfox/2.0.0.6" access_log.2:soling.few.vu.nl - - [11/Jan/2008:16:56:08 +0100] "GET /cgibin/wttest?naam=cjijij&radio=inhoudelijk&checkbox1=checkbox1&textarea=+vxfvsdfs df%0d0a%0d%0afsdfsdf HTTP/1.0" 200 1487 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1 en-us; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" access_log.2:soling.few.vu.nl - - [11/Jan/2008:16:58:25 +0100] "GET /cgibin/wt-test?naam=&radio=structuur1&textarea=+ HTTP/1.0" 200 1375 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-us; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" 11

12

13

Top N of Top 10 of 2094 Total Sites #Hits Files Kbytes Visits Hostname 1 28066 25.26% 27754 27.89% 529851 34.02% 50 0.97% *.search.live.com 2 14434 12.99% 13899 13.96% 206962 13.29% 7 0.14% *.googlebot.com 3 8963 8.07% 5779 5.81% 47864 3.07% 17 0.33% *.speedy.telkom.net.id 4 6142 5.53% 5871 5.90% 59502 3.82% 82 1.59% *.cwi.nl 5 1265 1.14% 1203 1.21% 6455 0.41% 3 0.06% ipxx.speed.planet.nl 6 1237 1.11% 1228 1.23% 10163 0.65% 18 0.35% soling.few.vu.nl 7 1169 1.05% 1026 1.03% 6181 0.40% 1 0.02% XX.demon.nl 8 1050 0.94% 972 0.98% 16429 1.05% 5 0.10% XXadsl.sinica.edu.tw 9 956 0.86% 904 0.91% 5634 0.36% 5 0.10% XX.adslsurfen.hetnet.nl 10 908 0.82% 889 0.89% 13028 0.84% 21 0.41% XX.wise-guys.nl Top 7 Search Strings 1 60 37.97% the scream 2 8 5.06% vu 3 6 3.80% scream 4 4 2.53% eculture 5 4 2.53% the scream painting 6 3 1.90% the scream paintings 7 2 1.27% *.gif 14

Example HTTP request.get / HTTP/1.1.Host: www.few.vu.nl. Server needs to send content of file index.html to the client Along with length of the content the current time/date modification date expiration date MIME type of the content (e.g. text/html) character encoding (e.g. UTF-8) etc Most of these HTTP header values need to be looked up in a configurable way Results need to be logged in the server log for later analysis Assume everything you do will be logged and will be traceable back to you 15

Example HTTP response HTTP/1.1 200 OK Date: Mon, 21 Jan 2008 10:18:49 GMT Server: Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.7d DAV/2 PHP/5.2.4 mod_python/3.3.1 Python/2.4.3 X-Powered-By: PHP/5.2.4 Expires: Mon, 21 Jan 2008 16:18:49 GMT Connection: close Content-Type: text/html <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> 16

Example HTTP response HTTP/1.1 200 OK Date: Mon, 21 Jan 2008 10:18:49 GMT Server: Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.7d DAV/2 PHP/5.2.4 mod_python/3.3.1 Python/2.4.3 X-Powered-By: PHP/5.2.4 Expires: Mon, 21 Jan 2008 16:18:49 GMT Connection: close Content-Type: text/html <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> 17

Example HTTP response HTTP/1.1 200 OK Date: Mon, 21 Jan 2008 10:18:49 GMT Server: Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.7d DAV/2 PHP/5.2.4 mod_python/3.3.1 Python/2.4.3 X-Powered-By: PHP/5.2.4 Expires: Mon, 21 Jan 2008 16:18:49 GMT Connection: close Content-Type: text/html <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> 18

Static vs dynamic content Not all requests are for static content stored in a file some data needs to be requested by the server from other applications (e.g. from an organisation s database) some data needs to be computed on the fly in response to the request (e.g. results of a query on a search engine) Need for dynamic content by programmable server behaviour Note: from the browser s perspective, static and dynamic content look syntactically exactly the same ( it s just a URI ) 19

REST Roy Fielding co-author of the HTTP specification co-founder of Apache described the key principles of WWW network architecture in his PhD thesis (UCI, 2000) He named these principles REST (REpresentational State Transfer) Implementations are called RESTful REST strongly influenced the early network architecture of the Web and still does: 15 Jan 2008: W3C published the SPARQL Recommendation, a web query language based on a RESTful design 20

REST: key principles All sources of information (files and applications) are resources that are uniquely addressable using a URI Clients and servers only need to know the URI of the resource (e.g. http://www.few.vu.nl/ ) the allowed actions (e.g. HTTP GET) the allowed representations (e.g. text/html ) Client does not need to know how the server generates the representation Server does not need to know how the client presents it Both client and server do not need to be aware of intermediate proxies or caches There is no communication state HTTP response does not depend on previous request Methods are idempotent: requesting the same resource multiply times will yield the same content Simplifies global design and improves performance but sometimes makes server programming more difficult 21

dynamic content computed by other software computed by the server 22

CGI: common gateway interface Commonly agreed upon way to run batch programs in response to a HTTP request HTTP server executes program server recognizes a CGI request and determines which program from the URL supplying details about the request to the program via (OS environment) variables returning program s output verbatim to the client (output needs to supply content and all required HTTP headers) 23

CGI Example: form URL you used in assignment 1 <form action="http://eculture.cs.vu.nl/cgi-bin/wt1-test" method="get"> #!/usr/bin/perl ## ## cgi-bin/wt1-test -- program which just prints its environment ## print "Content-type: text/plain\n\n"; foreach $var (sort(keys(%env))) { $val = $ENV{$var}; $val =~ s \n \\n g; $val =~ s " \\" g; print "${var}=\"${val}\"\n"; } 24

CGI response HTTP/1.1 200 OK Date: Fri, 18 Jan 2008 14:09:18 GMT Server: Apache/2.2.9 Connection: close Content-Type: text/plain DOCUMENT_ROOT="/export/data1/httpd/htdocs" GATEWAY_INTERFACE="CGI/1.1" HTTP_ACCEPT_LANGUAGE="en" HTTP_HOST= eculture.cs.vu.nl" QUERY_STRING="name=value" REMOTE_ADDR= 80.127.61.144" REMOTE_HOST= plan.xs4all.nl 25

CGI: pros & cons Very flexible can use programs written in any interpreted or compiled programming language easy way to reuse existing software in a Web context Creates a new process to re-execute program for every request very expensive: too slow for popular sites hard to maintain state between requests (we will look deeper into the concept of state later) Mixes program logic and HTML generation hard to maintain by programmers and designers Not convenient to get data from databases 26

CGI alternatives server-side scripting: server has a module that keeps the language interpreter running over multiple requests running little scripts at the server ( servlets ) is then relatively cheap Use general purpose scripting languages Apache comes standard with modules for many languages: mod_python, mod_perl, 27

Example HTTP response HTTP/1.1 200 OK Date: Fri, 18 Jan 2008 11:18:49 GMT Server: Apache/2.0.58 (Unix) mod_ssl/2.0.58 OpenSSL/0.9.7d DAV/2 PHP/5.2.4 mod_python/3.3.1 Python/2.4.3 X-Powered-By: PHP/5.2.4 Expires: Fri, 21 Jan 2008 17:18:49 GMT Connection: close Content-Type: text/html <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> 28

CGI alternatives: scripting Server-side scripting: server has a module that keeps the language interpreter running over multiple requests running little scripts at the server is then relatively cheap Use general purpose scripting languages: mod_python, mod_perl, need rules to determine which URLs are deferred to script module (e.g. http://www.example.org/file.py) Compiled Java bytecode programs server modules running a Java Virtual Machine are known as a web or servlet container (e.g. tomcat) servlets typically use standard Java extensions to simplify programming (javax.servlet.*) All these solutions result in files that look like programs HTML markup deeply hidden in print statements hard to maintain by non-programmers 29

Example: code with hidden HTML print <html> print <body> print <ul> for (i=1; i<n; i++) { data = get_item(i); print <li> + data +</li> } print </ul> 30

Dedicated frameworks Use dedicated scripting frameworks PHP: Hypertext Preprocessor Used to implement WordPress, MediaWiki mixes html, program code & database queries JSP: Java Server Pages mixes html & java These approaches typically result in files that look like HTML pages, with embedded code and custom tags processed by the server complex func. still requires programming but results are easier to reuse easier to maintain, also by non-programmers 31

Example: HTML with hidden code <html> <body> <ul> <? generate_items(n)?> </ul> </body> </html> 32

Typical problems in server programming Concurrency Session management & cookies Authentication & security Interfacing with other software (generating HTML from database content) 33

HTTP server main loop HTTP Request HTTP Request? HTTP Response HTTP Request HTTP Response 34

HTTP server concurrency HTTP Request HTTP Response HTTP Request HTTP Response HTTP Request HTTP Response HTTP Request HTTP Response 35

HTTP server concurrency HTTP Request HTTP Response HTTP Request HTTP Response HTTP Request HTTP Response HTTP Request HTTP Response 36

HTTP server concurrency Server-side software needs to be aware that other processes/threads processing other request may run at the same time ( multi-threading, MT-safe ) makes accessing global resources (variables, databases, files) more complicated and error prone 37

HTTP server sessions User A Request 1 User A Response 1 User B Request 2 User B Response 2 User B Request 1 User B Response 1 User A Request 2 User A Response 2 38

HTTP server sessions How to recognize which requests belong to the same user? look at client s IP address in first response, send client a small but unique piece of data ask client to send this back as part of the HTTP header of all following requests piece of data is known as a (magic) cookie 39

HTTP server sessions User A Request 1 User A Response 1 cookie: id=user00001 cookie: id=user00002 User B Request 1 User B Response 1 User B Request 2 cookie: id=user00002 cookie: id=user00001 User A Request 2 User B Response 2 cookie: id=user00002 cookie: id=user00001 User A Response 2 40

Cookie: bb.vu.nl response HTTP/1.1 302 Moved Temporarily Set-Cookie: ARPT=IZJNJNSbb3CYUQ; path=/ Date: Sun, 20 Jan 2008 20:24:23 GMT Server: Apache/1.3.33 (Unix) mod_ssl/2.8.21 OpenSSL/0.9.7e mod_jk/1.2.4 Pragma: no-cache Cache-Control: no-cache Set-Cookie: session_id=@@bccf1515b166a6be2ff476eb20e9774f Location: http://bb.vu.nl/nocookies.html Content-Length: 0 Connection: close Content-Type: application/octet-stream;charset=iso-8859-1 41

Cookies Introduced in Mosaic browser (1994) cookies were enabled by default users were not informed when a site set a cookie most users did not know about cookies at all Privacy issues became serious issue in 1996 after a publication in the Financial Times Now all major browsers allow users to delete cookies and to be alerted when cookies are set Many sites make privacy policies public on their site (P3P) 42

Cookies Handy Electronic shopping basket Personalisation user preferences user profile Authentication Tricky User tracking across websites Direct marketing Privacy issues Note: sites may set cookies without knowing it or even using them Check the cookies stored in your browser 43

Security issues see also guest lecture Thursday 44

Proxies & firewalls Some clients have no direct internet access to contact servers Browser can use a proxy server Content servers do not need to know Some servers have no direct internet access to be contacted (!) Server can use a reverse proxy server Clients do not need to know 45

Firewall responsibility of client s organization client proxy server responsibility of client server s organization reverse proxy server 46

Authentication & encryption HTTP 1.0 Basic Access Authentication username, password, content sent in plain text HTTP Digest Access Authentication username, password encrypted content still sent plain text HTTPS: HTTP entirely over secure layer public key encryption, also for content less vulnerable to man in the middle attacks 47

Man in the middle attack client fake mybank.com mybank.com HTTPS requires web site to authenticate itself using a certificate stating its identity How do you know how to trust certificate authority? many generally trusted authorities are known by your browser 48

Database connectivity client server SQL database All frameworks provide ways to simplify generating HTML out of database content Java Servlets, JSP PHP Content management systems 49

LAMP and the ubiquity of HTTP servers Typical web server needs: 1. Operating system with good TCP/IP support 2. HTTP server implementation 3. Database to store content 4. Framework for creating web pages from database content All these ingredients are currently commonly available (as open source software) and run on commodity PCs Frequently used combination is Linux, Apache, MySQL and PHP (LAMP) Many sites are served by LAMP software running on old PC hardware A web server is nothing special anymore! > 185 million servers (Netcraft, Jan 2009) 50

51

Learning goals Understand Basis HTTP server functionality Serving static HTML and other files Serving dynamic content from software within a HTTP server Serving dynamic content from external software Be aware of security & privacy issues 52