Web Programming
Robert M. Dondero, Ph.D.
Princeton University

Objectives
You will learn:
- The fundamentals of web programming...
  - The hypertext markup language (HTML)
  - Uniform resource locators (URLs)
  - The hypertext transfer protocol (HTTP)

HTML
- A simple application of the Standard Generalized Markup Language (SGML)
  - SGML: you define the tags
  - HTML: the set of tags is predefined

HTML Documents
An HTML document:
- Consists of plain text
- The text contains content and markup:
    <tag>...</tag>
    <tag attribute="value">...</tag>
    <tag>   <tag />   (empty tags, with no separate end tag)
- Markup describes presentation, i.e. how the text should be presented/rendered
- Tag names and attribute names are case insensitive

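One way to see the case insensitivity in action is Python's standard html.parser module, which reports tag and attribute names in lower case no matter how they were written. The sketch below is an illustration only; the class name TagLogger and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each start tag and its attributes as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names (not attribute values),
        # so <P CLASS="note"> and <p class="note"> look identical here.
        self.seen.append((tag, attrs))


parser = TagLogger()
parser.feed('<P CLASS="note">Hello</P><p class="note">Hello</p>')
parser.close()
print(parser.seen)
# [('p', [('class', 'note')]), ('p', [('class', 'note')])]
```

Attribute values keep their original case; only the names are normalized.
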
Rendering HTML Documents
- The renderer usually is a Web browser
- Popular browsers (approximate market share at the time of writing):
  - Microsoft Internet Explorer (53%)
  - Mozilla Firefox (29%)
  - Google Chrome (8%)
  - Apple Safari (6%)
  - Opera (2%)
  - (Several others)

Rendering HTML Documents
Browser notes:
- Each browser provides a way to examine the HTML document it has rendered
  - Firefox: View Page Source
  - A good learning/debugging tool
- There are substantial differences among browsers
  - Microsoft doesn't feel obliged to conform to standards
- Use Firefox for Web-related assignment(s)

Versions of HTML
- HTML 4.01 Strict
  - Requires complete adherence to the HTML 4.01 spec
- HTML 4.01 Transitional
  - Allows some deprecated elements and attributes
- We'll use

Variants of HTML
XML
- Stricter syntax
  - Every start tag must be closed by a matching end tag (or written as an empty-element tag, <tag/>)
  - All tags must be correctly nested...
- You define the tags!
  - Tags can be semantic in nature
- See upcoming "XML" lecture

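A quick way to check the stricter XML rules is to hand a fragment to an XML parser, which rejects anything that is not well formed. The sketch below uses Python's standard xml.etree.ElementTree; the helper is_well_formed and the made-up tags note and to are hypothetical, chosen to show that in XML you define the tags yourself.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<note><to>Alice</to></note>"))   # True
print(is_well_formed("<note><to>Alice</note></to>"))   # False: tags not correctly nested
print(is_well_formed("<note><br></note>"))             # False: <br> is never closed
print(is_well_formed("<note><br/></note>"))            # True: empty-element syntax
```

An HTML 4.01 browser would happily render the unclosed <br> case; an XML parser refuses it.
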
Variants of HTML
XHTML 1.0
- Same as HTML 4.01, but requires use of XML syntax

HTML Details
See showhtml.html
Document structure:
    <!DOCTYPE...>
    <html>
      <head>
        <title>...</title>
      </head>
      <body>
        ...
      </body>
    </html>

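The document structure above can be produced programmatically. This is a minimal sketch, assuming the HTML 4.01 Strict DOCTYPE (the Transitional DTD uses a different declaration); the function make_page is hypothetical. The title is escaped so that characters such as & do not break the markup.

```python
import html

# DOCTYPE for HTML 4.01 Strict (an assumption for this sketch).
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')


def make_page(title, body_html):
    """Wrap body_html (already-valid markup) in the standard document skeleton."""
    return "\n".join([
        DOCTYPE,
        "<html>",
        "<head>",
        "<title>" + html.escape(title) + "</title>",
        "</head>",
        "<body>",
        body_html,
        "</body>",
        "</html>",
    ])


print(make_page("Hello & Welcome", "<p>First paragraph.</p>"))
```
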
HTML Details
- Comments
- Heading tags
- Paragraph-level tags
- Empty tags
- Physical character formatting tags
- Logical character formatting tags
- Entity references
- Character references
- Lists
- Tables

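Entity references name a character (&lt;), while character references give its code point in decimal (&#60;) or hexadecimal (&#x3C;). Python's standard html module can decode both kinds and can produce entity references for the characters HTML treats as markup; the sample strings below are illustrative only.

```python
import html

# Entity reference, decimal character reference, hex character reference:
# all three denote the same character.
print(html.unescape("&lt; &#60; &#x3C;"))   # < < <
print(html.unescape("caf&eacute; &copy;"))  # café ©

# The reverse direction: replace markup characters with entity references.
print(html.escape('<a href="x">Tom & Jerry</a>'))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```
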
Uniform Resource Locators
Uniform Resource Locator (URL)
Format: protocol://host:port/file
- protocol
  - We'll use http
  - Others: file, https, ftp, mailto
  - See http://en.wikipedia.org/wiki/uri_scheme
- host
  - An IP address or domain name
  - Recall "Network Programming" lecture

Uniform Resource Locators
- port
  - A number
  - Recall "Network Programming" lecture
  - For the HTTP protocol, the default port is 80
- file
  - A filename
  - Can specify a path
  - The default file is specified in the web server settings
    - Often index.html or index.php

Uniform Resource Locators
Examples:
- http://www.cs.princeton.edu/~rdondero/index.html
- http://www.cs.princeton.edu:80/~rdondero/index.html

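The protocol://host:port/file breakdown can be checked with Python's standard urllib.parse.urlsplit. The sketch below fills in the default HTTP port when the URL omits it, which is why the two example URLs name the same resource. The helper url_parts is hypothetical, and the default file name is an assumption: in reality the web server, not the browser, substitutes it according to its own settings.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # well-known default ports
DEFAULT_FILE = "index.html"                 # assumption: really chosen by the server's settings


def url_parts(url):
    """Split a URL into (protocol, host, port, file), filling in defaults."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    path = parts.path or "/"
    if path.endswith("/"):
        # Simulates the server-side default file, for illustration only.
        path += DEFAULT_FILE
    return parts.scheme, parts.hostname, port, path


print(url_parts("http://www.cs.princeton.edu/~rdondero/index.html"))
print(url_parts("http://www.cs.princeton.edu:80/~rdondero/index.html"))
# Both print ('http', 'www.cs.princeton.edu', 80, '/~rdondero/index.html')
```
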
HTML (cont.)
See showhtml.html (again)
- Links and anchors
- Forms
- Each page link specifies a URL
- Each form specifies a URL
- The user commands the browser to fetch the page at a URL by:
  - Typing the URL into the browser
  - Clicking on a page link
  - Submitting a form

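To make "each page link and each form specifies a URL" concrete, the sketch below collects the href of every <a> element and the action of every <form> element, resolving relative URLs against the page's own URL the way a browser would. It uses Python's standard html.parser and urllib.parse.urljoin; the class UrlCollector and the form action /cgi-bin/search are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class UrlCollector(HTMLParser):
    """Collect the URLs named by page links (<a href>) and forms (<form action>)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.urls.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form" and attrs.get("action"):
            self.urls.append(urljoin(self.base_url, attrs["action"]))


page = ('<a href="index.html">Home</a>'
        '<form action="/cgi-bin/search" method="get"><input name="q"></form>')
collector = UrlCollector("http://www.cs.princeton.edu/~rdondero/")
collector.feed(page)
collector.close()
print(collector.urls)
# ['http://www.cs.princeton.edu/~rdondero/index.html',
#  'http://www.cs.princeton.edu/cgi-bin/search']
```
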
Hypertext Transfer Protocol
Hypertext Transfer Protocol (HTTP)
- A client/server protocol
- Server = web server
  - Apache web server
  - Apache Tomcat web server (written in Java; can run Java servlets and JavaServer Pages)
  - Microsoft Internet Information Services (IIS) web server (for MS Windows)
- Client = browser (usually, but not necessarily)

HTTP Details
Question: What happens when you:
- Type a URL which specifies the HTTP protocol?
- Click on a page link whose URL specifies the HTTP protocol?
- Submit a form whose URL specifies the HTTP protocol?
Answer...

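HTTP Details
Browser -> (socket) -> Web Server -> File system (file)
The browser sends this request through the socket:
    GET file HTTP/1.1        <- Or could be POST; see next lecture
    Host: host               <- Redundant. Why?
    <Blank line>
The web server then locates file in its file system.
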
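The request above is just text. A minimal sketch of building it in Python follows; the function build_get_request is hypothetical. Note that HTTP requires each line to end with a carriage return plus line feed (CRLF), and the empty line marks the end of the headers.

```python
def build_get_request(host, file):
    """Return the bytes for 'GET file HTTP/1.1' with a Host header."""
    lines = [
        "GET " + file + " HTTP/1.1",
        "Host: " + host,
        "",  # the blank line that ends the headers
        "",
    ]
    # Joining with CRLF yields "...\r\nHost: host\r\n\r\n".
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("www.cs.princeton.edu", "/~rdondero/index.html"))
# b'GET /~rdondero/index.html HTTP/1.1\r\nHost: www.cs.princeton.edu\r\n\r\n'
```
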
HTTP Details
File system -> Web Server -> (socket) -> Browser
The web server reads file from the file system and sends this response through the socket:
    HTTP/1.1 200 OK
    Date: date
    Server: server
    Content-Type: text/html
    (There are many other headers...)
    <Blank line>
    <Contents of file>       <- A "program" interpreted by the browser as per the content type

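On the receiving side, the browser splits the response at the first blank line: everything before it is the status line and headers, everything after it is the file's contents. The sketch below shows that split in Python; the function parse_response and the sample response are hypothetical. Header names are lower-cased because HTTP treats them as case insensitive.

```python
def parse_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<html><body>Hi</body></html>")
status, headers, body = parse_response(raw)
print(status)                   # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
print(body)                     # b'<html><body>Hi</body></html>'
```

A real browser would then use the Content-Type value to decide how to interpret the body.
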
HTTP Content Types
Content types:
- text/html
- text/plain
- image/gif
- image/jpeg
- audio/mp4
- ...
See this page: http://en.wikipedia.org/wiki/internet_media_type

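A server serving static files typically picks the Content-Type header from the file's extension. Python's standard mimetypes module performs the same kind of lookup, so it can illustrate the idea; the function content_type_for and the application/octet-stream fallback for unknown extensions are assumptions of this sketch, not a description of any particular server.

```python
import mimetypes


def content_type_for(filename):
    """Guess a Content-Type from a file name, falling back to generic binary data."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or "application/octet-stream"


for name in ["index.html", "notes.txt", "logo.gif", "photo.jpg"]:
    print(name, "->", content_type_for(name))
```
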
The Princeton CS Web Server
- Place HTML files in the CS Dept file system (penguins) in this directory:
    ~YourLoginid/public_html
- Change directory/file permissions:
    chmod 755 ~YourLoginid
    chmod 755 ~YourLoginid/public_html
    chmod 644 ~YourLoginid/public_html/yourFile.html
- Browse to the files using this URL (the file part is case sensitive):
    http://www.cs.princeton.edu/~YourLoginid/yourFile.html

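The octal modes passed to chmod are easier to read in the rwx form that ls -l prints. Python's standard stat.filemode converts one to the other: 755 on a directory lets everyone list and enter it, while 644 on a file lets everyone read it but only the owner write it. That is why these modes make the files accessible to the web server.

```python
import stat

# S_IFDIR / S_IFREG mark the entry as a directory or a regular file;
# the octal part is exactly what chmod receives.
print(stat.filemode(stat.S_IFDIR | 0o755))  # drwxr-xr-x
print(stat.filemode(stat.S_IFREG | 0o644))  # -rw-r--r--
```
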
The Princeton CS Web Server
Beware:
- The web server demands that directories/files be accessible to all
- Rules concerning plagiarism apply

HTTP via a Browser
Use a browser to visit:
    http://www.cs.princeton.edu
    http://www.cs.princeton.edu:80
    http://www.cs.princeton.edu/~rdondero/
    http://www.cs.princeton.edu/~rdondero/index.html

HTTP via Telnet
Using telnet:
    $ telnet www.cs.princeton.edu 80
    GET / HTTP/1.1
    Host: www.cs.princeton.edu
    <Enter>

    $ telnet www.cs.princeton.edu 80
    GET /~rdondero/ HTTP/1.1
    Host: www.cs.princeton.edu
    <Enter>

    $ telnet www.cs.princeton.edu 80
    GET /~rdondero/index.html HTTP/1.1
    Host: www.cs.princeton.edu
    <Enter>

HTTP via Python Code
See browser.py
Try:
    browser.py www.cs.princeton.edu 80 /
    browser.py www.cs.princeton.edu 80 /~rdondero/
    browser.py www.cs.princeton.edu 80 /~rdondero/index.html

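The course's browser.py is not reproduced here. As a hypothetical stand-in (the function fetch and the file name fetch.py are assumptions), the sketch below does in Python what the telnet session does by hand: open a socket to host:port, send a GET request with a Host header, and read the response until the server closes the connection. It adds a Connection: close header, which the slides do not show, so that an HTTP/1.1 server closes the socket after replying instead of keeping it open.

```python
import socket
import sys


def fetch(host, port, file):
    """Send 'GET file HTTP/1.1' to host:port and return the raw response bytes."""
    request = (
        f"GET {file} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # ask the server to close the socket when done
        "\r\n"
    )
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__" and len(sys.argv) == 4:
    host, port, file = sys.argv[1], int(sys.argv[2]), sys.argv[3]
    sys.stdout.write(fetch(host, port, file).decode("iso-8859-1"))
```

Usage, mirroring the slide's examples: python fetch.py www.cs.princeton.edu 80 /~rdondero/
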
HTTP via Java Code
See Browser.java
Try:
    java Browser www.cs.princeton.edu 80 /
    java Browser www.cs.princeton.edu 80 /~rdondero/
    java Browser www.cs.princeton.edu 80 /~rdondero/index.html

Summary
We have covered:
- The fundamentals of web programming...
  - The hypertext markup language (HTML)
  - Uniform resource locators (URLs)
  - The hypertext transfer protocol (HTTP)