APACHE WEB SERVER
Andri Mirzal, PhD
N28-439-03
Introduction
Apache is an open-source web server program notable for playing a key role in the initial growth of the World Wide Web. Apache typically runs on a Unix-like operating system and was developed for use on Linux. It is available for a wide variety of operating systems, including Unix, FreeBSD, Linux, Solaris, Novell NetWare, OS X, Microsoft Windows, OS/2, TPF, and eComStation. Since April 1996, Apache has been the most popular HTTP server software in use. As of December 2012, Apache was estimated to serve 63.7% of all active websites and 58.49% of the top servers across all domains.
The Apache HTTP server is a program that runs in the background on a multitasking operating system and provides services to other applications that connect to it, such as client web browsers. Apache's original core is fairly basic and contains a limited number of features. Its power comes instead from the functionality added by its many modules, which programmers write and which can be installed to extend the server's capabilities. To add a new module, we install it, load it (for a shared module, with a LoadModule directive in httpd.conf), and restart the Apache server. Apache also supports third-party modules, some of which have been added to Apache 2 as permanent features. Apache integrates easily with other open-source applications, such as PHP and MySQL.
Every device connected to a network has an IP address through which others connect to and communicate with it. This IP address is much like the postal address we need in real life to call on or visit any of our contacts. A server machine can offer a number of services, each through a different port. Apache itself serves HTTP; the other services below are provided by separate server programs:
- hypertext transfer protocol (HTTP), typically through port 80
- simple mail transfer protocol (SMTP), typically through port 25
- domain name system (DNS), for mapping domain names to their corresponding IP addresses, generally through port 53
- file transfer protocol (FTP), for uploading and downloading files, usually through port 21
Apache Configuration File
Apache keeps all its configuration information in text files. The main file is httpd.conf. This file contains directives and containers that let us customize the Apache installation. Directives configure specific Apache settings, such as authorization, performance, and network parameters. Containers specify the context to which those settings refer.
We can edit the Apache httpd.conf file with any text editor, but the file must be saved as plain text. To get Apache started for the first time, we might need to change just two parameters: the name of the server, and the address and port on which it listens. Apache can usually work out its server name from the IP address of the machine. If the server does not have a valid DNS (Domain Name System) entry, we might need to specify one of the machine's IP addresses instead. If the server is not connected to a network (for example, when testing Apache on a standalone machine), we can use 127.0.0.1, the loopback address. The default port is 80. We might need to change it if another server is already listening on port 80 on the machine, or if we don't have administrator permissions (on Unix-like systems, only the administrator can bind to ports below 1024).
We can change both the listening address and the port with the Listen directive. Listen takes either a port number on its own, or an IP address and a port separated by a colon. If we specify only the port, Apache listens on that port at all available IP addresses of the machine. If we also give an IP address, Apache listens only on that address and port combination. For example:
Listen 80
tells Apache to listen for requests on port 80 at all IP addresses, while
Listen 10.0.0.1:443
tells Apache to listen only at 10.0.0.1 on port 443.
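The Listen rule above can be sketched in Python as a small parser. This is an illustrative assumption, not Apache's own code. A bare port returns None for the address ("all addresses"). Otherwise the address is everything before the last colon, and brackets are stripped from IPv6 literals. Apache's optional protocol argument is not handled.

```python
from typing import Optional, Tuple

def parse_listen(arg: str) -> Tuple[Optional[str], int]:
    """Split a Listen argument into (address, port); None means all addresses."""
    if ":" not in arg:
        return None, int(arg)                  # e.g. "80": every address on the machine
    address, _, port = arg.rpartition(":")     # split on the LAST colon (IPv6-safe)
    return address.strip("[]"), int(port)      # "[::1]" -> "::1"

for value in ["80", "10.0.0.1:443", "[::1]:8080"]:
    print(value, "->", parse_listen(value))
```

A None address corresponds to binding a socket to the wildcard address ("" or "0.0.0.0" in Python's socket module). A specific address restricts the listener to that interface.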
The ServerName directive defines the name the server reports in any self-referencing URLs. It accepts a DNS name and an optional port, separated by a colon, e.g.:
ServerName localhost:80
Make sure that ServerName has a valid value. Otherwise, Apache warns at startup that it cannot reliably determine the server's fully qualified domain name, and self-referencing URLs (such as redirects) may point to the wrong host.
Directives
The following rules apply to Apache directive syntax:
- The directive arguments follow the directive name.
- The directive arguments are separated by spaces.
- The number and type of arguments vary from directive to directive; some have no arguments.
- A directive occupies a single line, but we can continue it on the next line by ending the previous line with a backslash character (\).
- A comment starts with a pound sign (#) and must appear on its own line; it cannot follow a directive on the same line.
The Apache server documentation offers a quick reference for directives at http://httpd.apache.org/docs/2.4/mod/quickreference.html
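As a rough illustration of these syntax rules, the Python sketch below splits configuration text into (directive, arguments) pairs. It joins backslash-continued lines, skips comments and blank lines, splits arguments on spaces, and keeps double-quoted arguments whole. It is a simplified assumption: containers get no special treatment and escaped quotes are not supported. The sample text is illustrative.

```python
import re
from typing import Iterator, List, Tuple

# An argument is either a double-quoted string or a run of non-blank characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def logical_lines(text: str) -> Iterator[str]:
    """Join physical lines ending in a backslash into one logical line."""
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            buffer += line[:-1] + " "   # continuation: keep collecting
        else:
            yield buffer + line
            buffer = ""
    if buffer:                          # text ended in the middle of a continuation
        yield buffer

def parse_directives(text: str) -> List[Tuple[str, List[str]]]:
    """Return (name, arguments) for each directive, skipping blanks and comments."""
    result = []
    for line in logical_lines(text):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = [m.group(1) if m.group(1) is not None else m.group(2)
                  for m in _TOKEN.finditer(line)]
        result.append((tokens[0], tokens[1:]))
    return result

sample = '''# Comments sit on their own line
ServerRoot "c:/wamp/bin/apache/apache2.4.2"
Listen 80
DirectoryIndex index.html \\
    index.php
'''
for name, args in parse_directives(sample):
    print(name, args)
```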
Containers
Directive containers, also called sections, limit the scope in which directives apply. Directives that are not inside a container belong to the default server scope (server config) and apply to the server as a whole.
Some default Apache directive containers:
- <VirtualHost>: specifies a virtual server. Apache lets us host different websites with a single Apache installation, and directives inside this container apply to a particular website. The container accepts a domain name or IP address and an optional port as arguments.
- <Directory>, <DirectoryMatch>: apply directives to a certain directory or group of directories in the filesystem. Directory containers take a directory or directory pattern argument, and the enclosed directives apply to the specified directories and their subdirectories. DirectoryMatch accepts a regular expression as its argument.
- <Location>, <LocationMatch>: apply directives to certain requested URLs or URL patterns. They are similar to their Directory counterparts; LocationMatch takes a regular expression as an argument.
- <Files>, <FilesMatch>: similar to the Directory and Location containers, Files sections apply directives to certain files or file patterns.
The ServerRoot Directive
The ServerRoot directive takes a single argument: the path of the directory where the server lives. On my computer it is c:/wamp/bin/apache/apache2.4.2. All relative paths in other directives are resolved relative to the value of ServerRoot.
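Here is a minimal Python sketch of how a relative path in another directive could be resolved against ServerRoot. It assumes Unix-style paths. /usr/local/apache2 is used as an example ServerRoot, and logs/error_log is an illustrative relative path. Absolute paths pass through unchanged.

```python
from pathlib import PurePosixPath

def resolve_server_path(server_root: str, path: str) -> str:
    """Join a relative directive path to ServerRoot; leave absolute paths alone."""
    p = PurePosixPath(path)
    if p.is_absolute():
        return str(p)                           # already absolute: ServerRoot not used
    return str(PurePosixPath(server_root) / p)  # relative: resolved under ServerRoot

print(resolve_server_path("/usr/local/apache2", "logs/error_log"))
print(resolve_server_path("/usr/local/apache2", "/var/log/httpd/error_log"))
```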
Per-Directory Configuration Files
Apache uses per-directory configuration files to allow directives to exist outside the main configuration file, httpd.conf. These special files can be placed anywhere in the file system. Apache processes their content when a document is requested from the directory containing one of these files or from any of its subdirectories. For example, if Apache receives a request for the file /usr/local/apache2/htdocs/index.html, it looks for per-directory configuration files in /, /usr, /usr/local, /usr/local/apache2, and /usr/local/apache2/htdocs, in that order.
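The lookup order in this example can be reproduced with a short Python sketch. It only lists the candidate file paths from / down to the requested file's directory. It does not check whether those files exist or whether AllowOverride lets Apache read them.

```python
from pathlib import PurePosixPath
from typing import List

def htaccess_candidates(requested_file: str, file_name: str = ".htaccess") -> List[str]:
    """List the per-directory config files checked for a request, from / downwards."""
    directory = PurePosixPath(requested_file).parent
    # .parents runs from the nearest ancestor up to "/", so reverse it.
    chain = list(directory.parents)[::-1] + [directory]
    return [str(d / file_name) for d in chain]

for path in htaccess_candidates("/usr/local/apache2/htdocs/index.html"):
    print(path)
```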
Per-directory configuration files are called .htaccess by default. The AccessFileName directive lets us change the name of per-directory configuration files from .htaccess to something else; it accepts a list of filenames that Apache uses when looking for per-directory configuration files. To find out whether a directive can be overridden in a per-directory configuration file, check whether the Context: field of the directive's syntax definition contains .htaccess.
Apache directives belong to different groups, as specified in the Override field of the directive syntax description. Possible values of Override:
- AuthConfig: directives controlling authorization
- FileInfo: directives controlling document types
- Indexes: directives controlling directory indexing
- Limit: directives controlling host access
- Options: directives controlling specific directory features
We can control which of these directive groups may appear in per-directory configuration files with the AllowOverride directive. AllowOverride can also take an All or a None argument. All means that directives from every group can appear in the configuration file; None disables per-directory files in a directory and all of its subdirectories. Example (disable per-directory configuration files for cgi-bin):
<Directory "cgi-bin">
    AllowOverride None
</Directory>
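The AllowOverride check can be sketched in Python as a lookup from a directive's Override group to the groups that AllowOverride permits. The group assignments are taken from each directive's Override field in the Apache documentation, for a handful of example directives. The finer-grained forms such as Options=... are not modelled.

```python
# Override group of a few well-known directives (from each directive's
# "Override:" field in the Apache documentation).
OVERRIDE_GROUP = {
    "AuthType": "AuthConfig",
    "AddType": "FileInfo",
    "DirectoryIndex": "Indexes",
    "Allow": "Limit",
    "Options": "Options",
}

def allowed_in_htaccess(directive: str, allow_override: str) -> bool:
    """Would `directive` be accepted in .htaccess under this AllowOverride value?"""
    groups = set(allow_override.split())
    if "None" in groups:
        return False      # per-directory files are ignored entirely
    if "All" in groups:
        return True       # every group is permitted
    return OVERRIDE_GROUP.get(directive) in groups   # unknown directives -> False

print(allowed_in_htaccess("AuthType", "AuthConfig Indexes"))  # True
print(allowed_in_htaccess("Options", "AuthConfig Indexes"))   # False
print(allowed_in_htaccess("AddType", "None"))                 # False
```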
Apache log files
Apache includes two log files by default: access_log, which tracks client requests, and error_log, which records important events such as errors or server restarts. These files don't exist until Apache is started for the first time. On Windows platforms, the files are named access.log and error.log.
The access_log file
When a client requests a file from the server, Apache records several parameters associated with the request, including the IP address of the client, the document requested, the HTTP status code, and the time of the request. Each line lists the client IP address, two identity fields (shown as - when unavailable), the time, the request line, the status code, and the size of the response body in bytes. An example of access_log content:
127.0.0.1 - - [21/Apr/2013:15:29:36 +0800] "GET / HTTP/1.1" 200 4337
127.0.0.1 - - [21/Apr/2013:16:02:21 +0800] "GET /homepage/ HTTP/1.1" 200 12
127.0.0.1 - - [21/Apr/2013:16:02:21 +0800] "GET /favicon.ico HTTP/1.1" 404 209
127.0.0.1 - - [21/Apr/2013:16:03:45 +0800] "GET / HTTP/1.1" 200 4377
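Lines like these follow Apache's Common Log Format and can be parsed with a regular expression. The Python sketch below is illustrative and assumes well-formed lines. It turns one line into a dictionary with a timezone-aware timestamp and numeric status and size. A "-" size, which Apache logs when no body bytes were sent, becomes 0.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request line" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_access_line(line: str) -> dict:
    match = CLF.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])  # "-" = no body
    return entry

log = """127.0.0.1 - - [21/Apr/2013:15:29:36 +0800] "GET / HTTP/1.1" 200 4337
127.0.0.1 - - [21/Apr/2013:16:02:21 +0800] "GET /favicon.ico HTTP/1.1" 404 209"""
for line in log.splitlines():
    e = parse_access_line(line)
    print(e["time"].isoformat(), e["method"], e["path"], e["status"], e["size"])
```

Counting entries with status 404, for example, is a quick way to find broken links.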
The error_log file
The error_log file records error messages, startup messages, and any other significant events in the life cycle of the server. It is the first place to look when we have a problem with Apache. An example of error_log content:
[Sun Apr 21 15:29:24.919572 2013] [mpm_winnt:notice] [pid 4740:tid 404] AH00455: Apache/2.4.2 (Win64) PHP/5.4.3 configured -- resuming normal operations
[Sun Apr 21 15:29:24.919572 2013] [core:notice] [pid 4740:tid 404] AH00094: Command line: 'c:\\wamp\\bin\\apache\\apache2.4.2\\bin\\httpd.exe -d C:/wamp/bin/apache/apache2.4.2
[Sun Apr 21 15:29:25.135585 2013] [mpm_winnt:notice] [pid 4120:tid 280] AH00354: Child: Starting 64 worker threads.
[Sun Apr 21 17:24:36.413887 2013] [mpm_winnt:notice] [pid 4740:tid 404] AH00422: Parent: Received shutdown signal -- Shutting down the server.
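When hunting for problems, it helps to split each line into its bracketed fields. The Python sketch below assumes the layout seen in the sample above: [time] [module:level] [pid N:tid M] message. Lines in other layouts yield None. The helper filter_by_level is a hypothetical convenience, not part of Apache.

```python
import re

# [time] [module:level] [pid N:tid M] message   (layout of the sample lines)
ERROR_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] \[(?P<module>[^:\]]+):(?P<level>[^\]]+)\] '
    r'\[pid (?P<pid>\d+)(?::tid (?P<tid>\d+))?\] (?P<message>.*)'
)

def parse_error_line(line):
    """Return the line's fields as a dict, or None if it does not match."""
    m = ERROR_LINE.match(line)
    return m.groupdict() if m else None

def filter_by_level(lines, wanted):
    """Yield parsed entries whose level is in `wanted` (e.g. {"error", "crit"})."""
    for line in lines:
        entry = parse_error_line(line)
        if entry and entry["level"] in wanted:
            yield entry

sample = ("[Sun Apr 21 17:24:36.413887 2013] [mpm_winnt:notice] [pid 4740:tid 404] "
          "AH00422: Parent: Received shutdown signal -- Shutting down the server.")
entry = parse_error_line(sample)
print(entry["module"], entry["level"], entry["pid"], entry["message"])
```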
Apache-Related Commands
The Apache executable is named httpd on Linux/UNIX and Mac OS X, and httpd.exe on Windows. It accepts several command-line options. To get a complete list, type
httpd -h
at a command prompt in the folder that contains httpd.exe (C:\wamp\bin\apache\apache2.4.2\bin). On Windows, we can also send signals to Apache with httpd.exe. Some commands:
- httpd -k restart: tells Apache to restart
- httpd -k graceful: tells Apache to do a graceful restart (requests already in progress are allowed to finish)
- httpd -k stop: tells Apache to stop
How Apache Works
Apache's main role is communication over networks, and it uses the TCP/IP protocol suite (Transmission Control Protocol/Internet Protocol). TCP/IP lets devices with IP addresses communicate with one another, whether they are on the same network or on connected networks. It defines how data is transmitted, delivered, received, and acknowledged. On top of it, HTTP defines how clients make requests and how servers respond. Apache is set up through configuration files, in which directives control its behavior. In its idle state, Apache listens on the IP addresses and ports identified in its configuration file (httpd.conf). Whenever it receives a request, it analyzes the headers, applies the rules specified for it in the configuration file, and takes action.
One server can host many websites, which nevertheless appear separate from one another to the outside world. To achieve this, each website is assigned a different name, even if all the names eventually map to the same machine. This is accomplished with virtual hosts. Since IP addresses are difficult to remember, visitors usually type a site's domain name into the browser's address bar. The browser then queries a DNS server, which translates the domain name into an IP address, and connects to that address. The browser also sends a Host header with the request so that, if the server hosts multiple sites, it knows which one to serve.
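Name-based virtual hosting boils down to picking a site from the Host header. The Python sketch below is a simplified model: the site names and document roots are hypothetical, and ServerAlias and IP-based matching are left out. It strips an optional port, compares names case-insensitively, and falls back to the first listed virtual host, which is what Apache does when no name matches for a given address and port.

```python
from typing import List, Tuple

# Hypothetical name-based virtual hosts: (ServerName, DocumentRoot), in config order.
VHOSTS: List[Tuple[str, str]] = [
    ("www.example.com", "/var/www/example"),
    ("blog.example.org", "/var/www/blog"),
]

def pick_document_root(host_header: str) -> str:
    """Choose a site from the Host header; fall back to the first vhost."""
    name = host_header.split(":")[0].lower()   # drop an optional ":port"
    for server_name, doc_root in VHOSTS:
        if server_name == name:
            return doc_root
    return VHOSTS[0][1]   # no match: the first listed vhost acts as the default

print(pick_document_root("blog.example.org"))
print(pick_document_root("unknown.example.net"))
```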
For example, typing www.google.com into the browser's address field might send the following request to the server at that IP address:
GET / HTTP/1.1
Host: www.google.com
The first line contains several pieces of information. First comes the method (here, GET). Next is the URI, which specifies the page to retrieve or the program to run (here, the root directory, denoted by /). Last is the HTTP version (here, HTTP/1.1).
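The request above is plain text, so it is easy to build and take apart. The Python sketch below is illustrative; the function names are hypothetical. HTTP/1.1 separates lines with CRLF, and a blank line ends the header section. The request line splits into method, URI, and version.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Raw HTTP/1.1 GET request: CRLF line endings, blank line ends the headers."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"

def parse_request_line(request: str):
    """Split the first line into (method, URI, HTTP version)."""
    first_line = request.split("\r\n", 1)[0]
    method, uri, version = first_line.split(" ")
    return method, uri, version

req = build_get_request("www.google.com")
print(repr(req))
print(parse_request_line(req))  # ('GET', '/', 'HTTP/1.1')
```

To actually send it, the string could be encoded and written to a socket opened with socket.create_connection((host, 80)).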
HTTP is a stateless request/response protocol: a set of rules that govern communication between a client and a server. The client (usually, but not necessarily, a web browser) makes a request, the server sends back a response, and the exchange is over. Each request is handled independently. The server keeps no record of earlier requests, even when HTTP/1.1 reuses the same TCP connection for several of them, unlike protocols that hold a session state between exchanges.
If the request is successful, the server returns a 200 status code (OK, meaning the page was found), response headers, and the requested data. The response header of an Apache server might look like the following:
HTTP/1.1 200 OK
Date: Sun, 10 Jun 2012 19:19:21 GMT
Server: Apache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sun, 10 Jun 2012 19:19:21 GMT
Vary: Accept-Encoding,User-Agent
Content-Type: text/html; charset=utf-8
Content-Length: 7560
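A client reads a header block like this by splitting off the status line and then each "Name: value" pair. The Python sketch below is a minimal, hypothetical helper. It assumes a well-formed head with a reason phrase. Header names are lowercased because HTTP treats them case-insensitively, and repeated headers simply overwrite earlier ones.

```python
from typing import Dict, Tuple

def parse_response_head(head: str) -> Tuple[str, int, str, Dict[str, str]]:
    """Split a response head into (version, status code, reason, headers)."""
    lines = head.strip().splitlines()
    version, code, reason = lines[0].split(" ", 2)     # status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")           # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers

head = """HTTP/1.1 200 OK
Date: Sun, 10 Jun 2012 19:19:21 GMT
Server: Apache
Content-Type: text/html; charset=utf-8
Content-Length: 7560"""
version, code, reason, headers = parse_response_head(head)
print(code, reason, headers["content-type"], int(headers["content-length"]))
```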
The first line of the response header is the status line. It contains the HTTP version, the status code, and a short reason phrase. The date follows, and then some information about the host server and the retrieved data. The Content-Type header tells the client the type of the retrieved data so that it knows how to handle it, and Content-Length tells the client the size of the response body in bytes. If the request didn't go through, the client would instead get an error code and message, such as 404 Not Found when the requested page does not exist.
Apache's general structure
As mentioned earlier, Apache can be installed on a variety of operating systems. Regardless of the platform, an Apache installation typically has four main directories: htdocs, conf, logs, and cgi-bin. htdocs is the default Apache document directory: the public directory whose contents are usually available to clients connecting through the web. It contains the static pages and dynamic content to be served when an HTTP request for them is received.
conf is the directory where all server configuration files are located. Configuration files are plain text files in which directives control the web server's behavior and functionality. Each directive is usually placed on a separate line, and the hash (#) character marks a comment, so a line beginning with it is ignored.
logs is the directory where server logs are kept, including the Apache access and error logs.
cgi-bin is the directory where CGI scripts are kept. CGI (Common Gateway Interface) defines a way for a web server to interact with external content-generating programs, often referred to as CGI programs or CGI scripts. These are programs or shell scripts written to be executed by Apache on behalf of its clients.
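Here is a minimal sketch of a CGI program written in Python. It assumes a Unix-like system where Apache is configured to execute scripts in cgi-bin and Python 3 is on the PATH. The file name hello.py and the name parameter are hypothetical. Under CGI, Apache passes request data in environment variables such as QUERY_STRING and sends whatever the script writes to standard output back to the client. The output consists of header lines, a blank line, and the body.

```python
#!/usr/bin/env python3
# hello.py: hypothetical CGI script placed in cgi-bin.
import os
import sys
from urllib.parse import parse_qs

def cgi_response(environ) -> str:
    """Build the CGI output: header lines, a blank line, then the body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    return "Content-Type: text/plain\n\n" + f"Hello, {name}!\n"

if __name__ == "__main__":
    # Apache supplies the request in environment variables and relays stdout.
    sys.stdout.write(cgi_response(os.environ))
```

Keeping the response logic in a function that takes the environment as a parameter makes the script testable without a running server.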
Set up a personal home web server
If you are NOT behind a firewall, you can access your web server from other computers by typing your computer's IP address into a web browser's address bar. If you're not sure what your IP address is, visit http://www.whatismyip.com/ to find out. For example, if your IP address is 203.0.113.10, type http://203.0.113.10 into a browser's address bar. If you ARE behind a firewall (like a wireless router), you'll need to open port 80 on the firewall and forward it to your computer.
Access a server behind a router/firewall
For users on a home network with a router, home servers are not accessible from the Internet because of the firewall built into many modern routers. Enabling outside access to an internal computer on a home network requires setting up NAT (Network Address Translation) port forwarding. Forwarding sends requests for ports on the outside of your firewall to the right computer on the inside. For instance, someone on the outside requests a page from a web server at your router's IP address. With port forwarding set up, your router knows to forward requests for port 80 (a web server's default port) to the computer running the web server, and to none of the others on your network.
Port forwarding is only necessary when you want to expose a service to computers on the Internet outside your firewall. Some servers you might want to expose:
- a home web server
- a personal wiki
- a BitTorrent client that uploads as well as downloads
- a VNC server
- a home FTP server
Port forwarding is fairly simple. There are two steps to set it up:
1. Determine the server's internal IP address.
2. Configure the router.
Determine the server's internal IP address
All the computers on your internal network have an IP address that looks something like 192.168.0.XXX. To find out the internal IP address on Windows, open a command prompt and type ipconfig.
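To tell an internal address from a public one, Python's standard ipaddress module can help. The helper name is_internal is hypothetical. It relies on the module's is_private flag, which is True for private-use ranges such as 192.168.0.0/16 and also for other non-global ranges such as loopback. Malformed addresses, for example ones with an octet above 255, raise ValueError.

```python
import ipaddress

def is_internal(address: str) -> bool:
    """True for private/non-global addresses such as 192.168.0.12 or 127.0.0.1."""
    return ipaddress.ip_address(address).is_private   # ValueError if malformed

for addr in ["192.168.0.12", "10.0.0.1", "8.8.8.8"]:
    print(addr, is_internal(addr))
```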
Configure the router
Most routers have a web-based administrative interface located at http://192.168.0.1 (the address depends on the model; consult your router's user guide for more information). Once you've opened the router administration page and entered the password (if one is set up), there should be an area called "Port forwarding." There, you set the port number on which requests from the Internet will come in, and the internal computer that should fulfill those requests. Common services and their default port numbers:
- Web server: 80
- VNC (remote control): 5900
- Instiki wiki: 2500
- FTP: 21
- BitTorrent: 6881-6990
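A port-forwarding table is essentially a lookup from an incoming port, or port range, to an internal computer. The Python sketch below models that decision using some of the default ports listed above. The internal IP addresses are hypothetical, and real routers also match on protocol (TCP/UDP) and may rewrite the destination port.

```python
from typing import List, Optional, Tuple

# Hypothetical forwarding rules: (first port, last port, internal IP address).
RULES: List[Tuple[int, int, str]] = [
    (80, 80, "192.168.0.100"),       # web server
    (5900, 5900, "192.168.0.101"),   # VNC
    (6881, 6990, "192.168.0.102"),   # BitTorrent port range
]

def forward(port: int) -> Optional[str]:
    """Internal host that receives traffic for `port`, or None if no rule matches."""
    for first, last, internal_ip in RULES:
        if first <= port <= last:
            return internal_ip
    return None   # no rule: the router does not pass the connection inside

print(forward(80), forward(6900), forward(22))
```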