1 Web Architecture I (www.tugraz.at)
2 Outline: Development of the Web; Quality Requirements; HTTP Protocol; Web Architecture; A Changing Web; Web Applications and State Management; Web n-tier Architecture; Web Data Management
3 Introduction
4 History of the web: Devised in 1989 to deliver static content. Hypermedia: documents linked into a web; users navigate by following links. Underlying standards: HTTP (Hypertext Transfer Protocol), HTML (Hypertext Markup Language), URL (Uniform Resource Locator). All underlying standards are simple and free of charge. Tim Berners-Lee [Wikipedia], Robert Cailliau [Wikipedia]
5 World Wide Web vs. Internet https://en.wikipedia.org/wiki/world_wide_web#mediaviewer/file:internet_key_layers.png [Wikipedia]
6 Growth of the Web I
7 Growth of the Web II: Time to reach 50 million people: telephone 75 years, radio 35 years, TV 13 years, WWW 4 years
8 Growth of the Web III: Chart "Global ICT developments, 2001-2014" (per 100 inhabitants; 2014 values are estimates; source: ITU World Telecommunication/ICT Indicators database). 2014 values: mobile-cellular telephone subscriptions 95.5; individuals using the Internet 40.4; active mobile-broadband subscriptions 32.0; fixed-telephone subscriptions 15.8; fixed (wired)-broadband subscriptions 9.8.
9 Quality Requirements
10 Quality attributes I: Usability. It must be very easy to use, i.e. very easy to create, structure and reference information. Participation was voluntary, and ease of use was the only way to attract users. It had to be very forgiving of errors in structuring and referencing, because many users had no technical background. Some of these choices may look different from today's point of view.
11 Quality attributes II: Technical simplicity. It must be very easy for developers to implement. All components are simple and text-based, e.g. in the first version of HTTP, servers only needed to respond to the GET method; HTML was very simple, so parsers and browsers were easy to write; URLs were extremely simple.
12 Quality attributes III: Extensibility. It must be easy to add new features. The first versions of the components (standards) were very simple, so improvements were needed. User requirements change even in a closed environment; on a global scale, change is the only constant. Examples: users wanted a search facility in addition to browsing, and interaction with the content, so HTML forms were introduced.
13 Quality attributes IV: Scalability. It needs to match Internet-scale, anarchic scalability (think about the growth rate). The Internet is not under the control of a single organization; it is totally decentralized. The system needs to keep operating under unanticipated load and when given malformed or maliciously constructed data. Examples: 40,000 Google search queries every second; https://en.wikipedia.org/wiki/list_of_most_viewed_YouTube_videos
14 Quality attributes V: Anarchic scalability, consequences. Clients cannot be expected to maintain knowledge of all servers: make it searchable! Servers cannot be expected to retain knowledge of state across requests: make it stateless! Documents cannot have back-links, because the number of references to a resource is proportional to the number of people interested in that information (cf. Google PageRank).
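Since documents store only forward links, importance derived from back-links has to be computed by someone who crawls the whole graph, which is the idea behind PageRank. A minimal power-iteration sketch, assuming a small in-memory link graph and the commonly cited damping factor 0.85; it illustrates the principle and is not Google's production algorithm.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to (forward links only);
    back-links are never stored, they are implied by the whole graph.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page passes its rank on, split evenly over its out-links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # c (it has the most incoming links)
```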
15 Development of the Web (conclusion of all the quality attributes): The original Web was not designed to meet all of the requirements and quality attributes defined above. It also lacked an architectural vision that would meet these ambitious requirements. The World Wide Web Consortium (W3C) was founded to solve these problems, and many researchers worked on defining an architecture to meet these needs. Security and encryption were not mentioned at all.
16 Web Protocols - HTTP
17 Overview: Content: HTML (Hypertext Markup Language). Identification: URL (Uniform Resource Locator). Communication / information exchange: HTTP (Hypertext Transfer Protocol), based on TCP connections, where TCP itself is based on IP. The original design was completely stateless.
18 HTTP Characteristics: Text-based, human-readable protocol. A request consists of a method, a number of headers (key-value pairs) and, for some methods, a payload. A response includes a status code, a number of headers (key-value pairs) and, depending on the request, a payload.
19 HTTP Examples
Request:
GET /webpage/index.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64)
Cookie: JSESSIONID=9C6694142332E65F0CB175BDF1758243
Response:
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=iso-8859-1
Content-Length: 64
Date: Wed, 03 Dec 2014 14:15:05 GMT

<content>
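Because HTTP/1.1 messages are plain text, a request like the one in the example can be taken apart with simple string operations. A minimal sketch of parsing the request line and headers, assuming a well-formed message with CRLF line endings; real servers must also handle duplicate headers, malformed input and the body.

```python
def parse_request(raw: str) -> tuple[str, str, str, dict[str, str]]:
    """Split a raw HTTP/1.1 request head into method, target, version and headers."""
    head, _, _body = raw.partition("\r\n\r\n")   # blank line separates head and body
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")      # split at the first colon only
        # Header names are case-insensitive, so normalise them to lower case
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers

raw = ("GET /webpage/index.html HTTP/1.1\r\n"
       "Host: example.com\r\n"
       "Cookie: JSESSIONID=9C6694142332E65F0CB175BDF1758243\r\n"
       "\r\n")
method, target, version, headers = parse_request(raw)
print(method, target, headers["host"])  # GET /webpage/index.html example.com
```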
20 HTTP Versions: HTTP/0.9, released in 1991. HTTP/1.0, released in 1996: one new TCP connection per request. HTTP/1.1, today's standard: reusing TCP connections (keep-alive) can increase throughput; a header specifying the content length is then needed so the client knows where each response ends. HTTP/2: different drafts are already being tested. Talks about HTTP/3 have already started.
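On a persistent (keep-alive) connection several responses travel over the same TCP stream, so the client needs `Content-Length` (or chunked transfer encoding) to find where one body ends and the next response begins. A minimal sketch that splits a byte stream of back-to-back responses, assuming every response carries `Content-Length`; chunked encoding is deliberately ignored.

```python
def split_responses(stream: bytes) -> list[tuple[int, bytes]]:
    """Return (status code, body) for each response in a keep-alive byte stream."""
    responses = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("incomplete response head")
        status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
        status = int(status_line.split(" ")[1])
        length = 0
        for line in header_lines:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value)
        # Content-Length says exactly where this body ends and the next response starts
        responses.append((status, rest[:length]))
        stream = rest[length:]
    return responses

stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
          b"HTTP/1.1 404 Not Found\r\nContent-Length: 9\r\n\r\nnot found")
print(split_responses(stream))  # [(200, b'hello'), (404, b'not found')]
```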
21 Web Architecture
22 Deriving the Web architecture: Constraints are introduced on the Web architecture to obtain an optimal solution to the requirements and quality attributes. Each constraint has advantages and disadvantages, so the whole design process is a balancing act: an optimisation to obtain the best match for the Web architecture.
23 Client-Server: Separation of concerns I
24 Client-Server: Separation of concerns II
25 Client-Server: Separation of concerns III: Separates user-interface concerns from data-manipulation concerns. Supports independent evolvability: clients and servers can be developed independently and across organizational boundaries, e.g. someone uses Google Maps on their own homepage. Supports the Internet-scale attribute.
26 Stateless I
27 Stateless II: Communication must be stateless in nature. Each request from the client must contain all the information needed to process that request, i.e. it cannot take advantage of session information stored on the server; session state is kept entirely on the client. Possible drawback: information might need to be sent multiple times. Important benefits: visibility, reliability and scalability.
28 Stateless III: Visibility: one only needs to look at a single request to determine its full nature. Reliability: it eases the task of recovering from partial failures. Scalability: the server can free resources after each request. It also simplifies the implementation, because servers do not need to manage information across multiple requests.
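The stateless constraint can be sketched in a few lines: the client sends the complete state with every request (here a hypothetical shopping cart), the server computes the result and hands the updated state back, keeping nothing in between. `handle_request` and the cart format are illustrative assumptions, not part of any framework.

```python
import json

def handle_request(body: str) -> str:
    """Stateless handler: all state arrives with the request and leaves with the response."""
    request = json.loads(body)
    cart = request["cart"]                  # full client-held state, sent every time
    cart.append(request["add_item"])        # apply this request's change
    total = sum(item["price"] for item in cart)
    # The updated state goes back to the client; the server keeps no session memory
    return json.dumps({"cart": cart, "total": total})

state = {"cart": []}                        # lives on the client only
for item in ({"name": "book", "price": 12}, {"name": "pen", "price": 3}):
    request = json.dumps({"cart": state["cart"], "add_item": item})
    response = json.loads(handle_request(request))
    state = {"cart": response["cart"]}
print(response["total"])  # 15
```

The growing `cart` payload on each request is exactly the drawback named above: the same information is sent repeatedly, in exchange for a server that any replica can answer.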
29 Cache I
30 Cache II: Information can be labeled (by servers) as cacheable. If a response is cacheable, a client cache is given the right to reuse that response data for later, equivalent requests. Advantage: improves efficiency, scalability and user-perceived performance. Disadvantage: decreases reliability if the cached data no longer matches the data on the server. Midway: ask the server whether the data has changed.
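The "midway" option corresponds to HTTP conditional requests: the cache keeps a validator such as an `ETag` and sends it back in `If-None-Match`; if nothing changed, the server answers `304 Not Modified` without a body. A minimal in-process sketch, assuming a toy server function and an ETag derived from a content hash; real caches also honour `Cache-Control` lifetimes.

```python
import hashlib
from typing import Optional

RESOURCES = {"/news": "Today: nothing happened"}   # toy server-side data

def server_get(path: str, if_none_match: Optional[str] = None) -> tuple[int, str, str]:
    """Toy server: return (status, ETag, body); 304 with no body if the copy is current."""
    body = RESOURCES[path]
    etag = '"' + hashlib.sha256(body.encode()).hexdigest()[:16] + '"'
    if if_none_match == etag:
        return 304, etag, ""
    return 200, etag, body

class ClientCache:
    """Client-side cache that revalidates every stored entry with If-None-Match."""
    def __init__(self) -> None:
        self.entries: dict[str, tuple[str, str]] = {}   # path -> (ETag, body)

    def get(self, path: str) -> tuple[int, str]:
        cached = self.entries.get(path)
        status, etag, body = server_get(path, cached[0] if cached else None)
        if status == 304 and cached:
            return status, cached[1]        # reuse the stored copy, no body transferred
        self.entries[path] = (etag, body)
        return status, body

cache = ClientCache()
print(cache.get("/news"))   # (200, 'Today: nothing happened')
print(cache.get("/news"))   # (304, 'Today: nothing happened')
RESOURCES["/news"] = "Today: something happened"
print(cache.get("/news"))   # (200, 'Today: something happened')
```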
31 Uniform interface I
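32 Uniform interface II: A uniform interface between components. Advantages: visibility of interactions is improved; simplifies the overall architecture; decouples implementations from the services they provide; improves Internet-scale. Disadvantage: degrades efficiency, because information is transferred in a standardized form rather than one specific to an application's needs.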
33 Uniform interface III: Prerequisites for a uniform interface: unambiguous identification of resources (URL); manipulation of resources through representations (in the beginning HTML, later Extensible Markup Language (XML, still widely used), now JavaScript Object Notation (JSON)); self-descriptive messages, where HTTP methods describe the action (GET, POST, PUT, DELETE).
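The prerequisites can be illustrated with an in-memory dispatcher: every resource is addressed by a URL path, state travels as a JSON representation, and the method alone says what to do. A minimal sketch, assuming a hypothetical `/users/...` resource space and simplified status codes; it is not a web framework, and a real server would return the new URL in a `Location` header.

```python
import itertools
import json

store: dict[str, dict] = {}      # URL path -> current resource state
_ids = itertools.count(1)        # hypothetical ID generator for POST

def handle(method: str, path: str, body: str = "") -> tuple[int, str]:
    """Apply GET, PUT, POST or DELETE to the resource identified by path."""
    if method == "GET":
        if path not in store:
            return 404, ""
        return 200, json.dumps(store[path])            # send a representation
    if method == "PUT":
        created = path not in store
        store[path] = json.loads(body)                 # replace state from a representation
        return (201 if created else 200), ""
    if method == "POST":
        # POST to a collection creates a new subordinate resource
        new_path = f"{path.rstrip('/')}/{next(_ids)}"
        store[new_path] = json.loads(body)
        return 201, new_path                           # new URL returned in the body here
    if method == "DELETE":
        if path not in store:
            return 404, ""
        del store[path]
        return 204, ""
    return 405, ""                                     # method not allowed

status, location = handle("POST", "/users", json.dumps({"name": "Ada"}))
print(status, location)            # 201 /users/1
print(handle("GET", location))     # (200, '{"name": "Ada"}')
print(handle("DELETE", location))  # (204, '')
print(handle("GET", location))     # (404, '')
```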
34 Layered system I https://upload.wikimedia.org/wikipedia/commons/c/c4/ip_stack_connections.svg [Wikipedia]
35 Layered system II: Improves Internet-scale. The application is composed of layers that are only aware of their neighbouring components, not of the complete system. This bounds complexity and promotes independence between components. Each layer uses the service of the underlying layer, provides a service to the layer above, and communicates with its peer layer in the neighbouring components.
36 Layered system III: Supports scalability through the introduction of proxies, shared caches and gateways, e.g. load balancing behind a gateway. Disadvantage: layers can reduce user-perceived performance because they add processing overhead.
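Load balancing behind a gateway can be sketched as one extra layer: the client only knows the gateway, and the gateway only knows the list of backends behind it. A minimal round-robin sketch, assuming backends are plain callables standing in for servers; real load balancers add health checks and handle network failures.

```python
from itertools import cycle
from typing import Callable

Backend = Callable[[str], str]

class Gateway:
    """Layer between clients and servers: forwards each request to the next backend."""
    def __init__(self, backends: list[Backend]) -> None:
        self._next = cycle(backends)

    def handle(self, path: str) -> str:
        backend = next(self._next)   # round-robin choice, invisible to the client
        return backend(path)

def make_backend(name: str) -> Backend:
    return lambda path: f"{name} served {path}"

gw = Gateway([make_backend("server-1"), make_backend("server-2")])
print([gw.handle("/index.html") for _ in range(3)])
# ['server-1 served /index.html', 'server-2 served /index.html', 'server-1 served /index.html']
```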
37 Code on demand I
38 Code on demand II: Client functionality is extended by downloading code. Advantages: improves extensibility; independent development. Be aware of security concerns! Technologies: JavaScript (by far the most important), Flash (losing ground fast), Java applets (already dead), Microsoft Silverlight (was that ever used?)
39 A Changing Web
40 The Web evolved as a platform I: It started out with simple homepages of static documents (1990s) and developed towards more and more interactivity (2000s). Now the Web is a complex system of different types of applications, services, etc. Two faces of the Web nowadays: the Web as an application platform, and the Web as a huge distributed database.
41 The Web evolved as a platform II
42 Web Applications and State Management
43 What are the issues when building Web applications? User requirements: user interface and usability; application state (managing state) and hypertext (navigation); addressability. Architecture: scalability; performance; fast development cycles.
44 Traditional Stack of Web Applications, example: Apache Tomcat. Diagram (top to bottom): Web Applications 1, 2 and 3: application logic, answer the requests, manage the sessions; servlets packaged in a WAR file. Web Application Server: support for session handling; servlet container Catalina (part of Tomcat). Web Server: stateless connection handling (HTTP); Coyote (part of Tomcat). Operating System. Virtual Machine / Hardware.
45 Modern Stack of Web Applications, example: Dropwizard. Diagram: Web Applications 1, 2 and 3, each bundling its own Web Server, on top of the Operating System (OS) and Virtual Machine / Hardware. Each Web application is an application at OS level. Web-server functionality is provided by a library / framework (e.g. as part of Dropwizard). The result is one complete Java application packaged as a JAR file. Session handling is done by the application (with a library / framework). This solves the scaling issues of the traditional stack and provides better isolation of applications.
46 Session Tracking: HTTP is stateless, so sessions are tracked by unique identifiers (session IDs). Session IDs are transmitted to and from the server either as part of the URL (URL rewriting, permalink) or in the HTTP header (cookies). Who should track sessions: the application, the client, or both?
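Both transport mechanisms can be illustrated with a helper that looks for the session ID first in the `Cookie` header and then in a rewritten URL. The `;jsessionid=` path parameter is the style Java servlet containers such as Tomcat use for URL rewriting; the function name and the lookup order are assumptions made for this sketch.

```python
from http.cookies import SimpleCookie
from typing import Optional

def find_session_id(url: str, cookie_header: str = "",
                    name: str = "JSESSIONID") -> Optional[str]:
    """Return the session ID from a cookie or, failing that, from a rewritten URL."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if name in cookies:
        return cookies[name].value
    # URL rewriting, e.g. /shop/cart;jsessionid=ABC123?item=7
    path = url.split("?", 1)[0]
    marker = f";{name.lower()}="
    if marker in path.lower():
        start = path.lower().index(marker) + len(marker)
        return path[start:].split(";", 1)[0]   # keep the original case of the ID
    return None

print(find_session_id("/shop/cart;jsessionid=ABC123?item=7"))  # ABC123
```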
47 Session Tracking on the server: Cookies or URL rewriting can be used. The Web server provides only low-level tracking, i.e. it provides the framework for session tracking, not the full logic. The application server has other responsibilities as well. This can lead to serious scalability problems: load balancing between servers becomes complicated, and handing a session over from one server to another becomes difficult or even impossible.
48 Session Tracking on the client: URL rewriting can be used. Part of the application logic is transferred to the client (code on demand) and the state is managed there with AJAX (Asynchronous JavaScript and XML). But other problems arise: how can state be recovered in a new session, when AJAX applications typically have a single URL? How can a previous state be recovered, i.e. the browser back-button problem?
49 Session Tracking on the client and server: The optimal solution is typically somewhere in the middle: manage only important state on the server, give each state its own URL, and use links to relate states to each other. Not all state is managed on the server: no scalability problems. Not all state is managed only on the client: no recovery problems.
50 Session Management - URL Rewriting: Advantages: meaningful, human-readable URLs; URLs can be bookmarked and shared with others; search engines can retrieve different parts and index them; advantages for service integration, as services can be linked to each other; different content representations become addressable (HTML for humans, XML or JSON for services). Disadvantages: links can become too long; browser limits are usually 2048 or 4096 characters.
51 Session Management - Cookies: Advantages: can store more data (the limit depends on the browser, 4 kB to 10 MB per domain); URLs stay short. Disadvantages: may be difficult for users to grasp, as nothing is visible; legal issues. Must be used with care; use URL rewriting whenever possible.
52 Session Management - Example: Google Maps uses AJAX and still maintains a permalink. Every action you perform changes the permalink. The permalink is kept as part of the HTML page; it is the equivalent of the address bar.
53 Session Management - Example
54 Session Management - Example: A little extra DOM/JavaScript work keeps the permalink up to date as you navigate. Every point on the map is a separate application state with its own URL. Application states were destroyed by AJAX but were restored by application design. This allowed communities to grow around the Google Maps application, only because application states were properly managed with URLs.
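Giving every application state its own URL boils down to serialising the state into the address and parsing it back. A minimal sketch in the spirit of the map permalink, assuming a hypothetical state of latitude, longitude and zoom level and a made-up base URL (not Google's actual URL format), using only `urllib.parse` from the standard library.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "https://maps.example.com/"  # hypothetical application URL

def state_to_url(lat: float, lng: float, zoom: int) -> str:
    """Encode the current map view as a bookmarkable permalink."""
    return BASE + "?" + urlencode({"lat": f"{lat:.5f}", "lng": f"{lng:.5f}", "z": zoom})

def url_to_state(url: str) -> tuple[float, float, int]:
    """Recover the map view from a permalink, e.g. after a reload or 'back'."""
    query = parse_qs(urlsplit(url).query)
    return float(query["lat"][0]), float(query["lng"][0]), int(query["z"][0])

link = state_to_url(47.05868, 15.45787, 15)
print(link)                # https://maps.example.com/?lat=47.05868&lng=15.45787&z=15
print(url_to_state(link))  # (47.05868, 15.45787, 15)
```

In the browser the same idea is implemented in JavaScript, for example by updating the address with the History API whenever the view changes.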
55 Web n-tier Architecture
56 Starting point - 2-layer applications: Everything runs on the server. One and the same script implements the application logic and the presentation (e.g. generating HTML). Diagram: Application / Presentation layer, implemented as scripts (e.g. PHP); Data Management layer, a relational database.
57 Problems of 2-layer applications: Application- and presentation-related functionality are mixed. Changes in application logic lead to changes in presentation functionality and vice versa, e.g. changing a table that presents some application data leads to changes in the return values of some application-specific functions. Even more dangerous, the presentation layer talks directly to the database via a data manipulation language (DML). Better modularity is achieved with a third layer.
58 Evolution - 3-layer applications: Separation between the application and presentation layers; no direct connection between presentation and data management. Decoupling the application and presentation layers makes it possible to exchange the presentation layer. Example: building a Web gateway to an existing application, where the old GUI (e.g. a standalone GUI) is replaced with a Web GUI.
59 3-layer applications - Architecture: Presentation tier: HTML, templates and scripts to generate HTML. Application logic tier: the actual application, i.e. the business logic. Data access tier: manages persistent application data. Diagram: User Interface, Process Logic, Data management.
60 3-layer applications - Surroundings: The user interacts via the Web browser. The complete work is done in the Web application: it provides the GUI, executes the actual logic, and loads and stores data. The persistence backend is realised with a relational database. Diagram: Web Browser, HTTP, Web Application (User Interface, Process Logic, Data management), SQL, Database.
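The three tiers can be sketched as separately replaceable pieces: the presentation only formats, the logic tier only decides, and only the data tier touches storage. A minimal sketch, assuming an in-memory dictionary as a stand-in for the SQL database, hypothetical class names and a hypothetical 20% VAT rule; in a real Web application the presentation would use HTML templates and the data tier would issue SQL.

```python
class DataTier:
    """Data management: the only layer that knows how data is stored."""
    def __init__(self) -> None:
        self._products = {1: {"name": "Lamp", "price": 40.0},
                          2: {"name": "Desk", "price": 120.0}}

    def load_product(self, product_id: int) -> dict:
        return dict(self._products[product_id])

class LogicTier:
    """Process logic: business rules, no storage or HTML details."""
    def __init__(self, data: DataTier) -> None:
        self._data = data

    def price_with_vat(self, product_id: int, vat: float = 0.20) -> tuple[str, float]:
        product = self._data.load_product(product_id)
        return product["name"], round(product["price"] * (1 + vat), 2)

class PresentationTier:
    """User interface: turns results into HTML, never talks to the data tier."""
    def __init__(self, logic: LogicTier) -> None:
        self._logic = logic

    def product_page(self, product_id: int) -> str:
        name, price = self._logic.price_with_vat(product_id)
        return f"<h1>{name}</h1><p>{price:.2f} EUR incl. VAT</p>"

app = PresentationTier(LogicTier(DataTier()))
print(app.product_page(1))  # <h1>Lamp</h1><p>48.00 EUR incl. VAT</p>
```

Swapping `PresentationTier` for one that emits JSON would leave the other two classes untouched, which is the exchangeability argued for 3-layer applications.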
61 3-layer applications - Client-side / Browser inclusion I: With the introduction of AJAX there are different possibilities for where to place the tiers, e.g. presentation in the browser (HTML + presentation JavaScript) with application logic and data access on the server, or presentation and application logic in the browser (HTML + presentation and application JavaScript) with data access on the server. Note: this may require additional security considerations if the application logic runs on the client.
62 3-layer applications - Client-side / Browser inclusion II: Diagram, variant A: Web Browser (User Interface, Process Logic) connected via HTTP to the Web Application (Data management), which uses an SQL Database. Variant B: Web Browser (User Interface) connected via HTTP to the Web Application (Process Logic, Data management), which uses an SQL Database.
63 3-layer applications - Model-View-Controller: There are numerous architecture variants built on top of N-tier architectures. In traditional software engineering, user-oriented database applications are built with an N-tier architecture. The most important variant for Web applications is the Model-View-Controller (MVC) architecture. It was invented in the early days of GUIs to decouple the graphical interface from the application data and logic, and it is also very useful for Web applications.
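The division of responsibilities in MVC can be sketched without any framework: the model holds data and notifies observers, the view renders it, and the controller turns user input (in a Web application, the parameters of an HTTP request) into model updates. A minimal sketch with hypothetical class names; Web frameworks distribute these roles differently between browser and server.

```python
import html
from typing import Callable

class TodoModel:
    """Model: application data and rules, unaware of any presentation."""
    def __init__(self) -> None:
        self.items: list[str] = []
        self._listeners: list[Callable[[], None]] = []

    def subscribe(self, listener: Callable[[], None]) -> None:
        self._listeners.append(listener)

    def add(self, item: str) -> None:
        self.items.append(item)
        for listener in self._listeners:
            listener()                       # notify views that the data changed

class TodoView:
    """View: renders the model as an HTML list whenever the model changes."""
    def __init__(self, model: TodoModel) -> None:
        self.model = model
        self.output = ""
        model.subscribe(self.render)

    def render(self) -> None:
        rows = "".join(f"<li>{html.escape(i)}</li>" for i in self.model.items)
        self.output = f"<ul>{rows}</ul>"

class TodoController:
    """Controller: maps user input (e.g. submitted form fields) to model operations."""
    def __init__(self, model: TodoModel) -> None:
        self.model = model

    def handle_form(self, params: dict[str, str]) -> None:
        text = params.get("item", "").strip()
        if text:                             # ignore empty submissions
            self.model.add(text)

model = TodoModel()
view = TodoView(model)
controller = TodoController(model)
controller.handle_form({"item": "write slides"})
print(view.output)   # <ul><li>write slides</li></ul>
```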
64 3-layer applications - Current state: Diagram: clients (User Interface in the browser, other Web applications) exchange HTML and JSON/XML over HTTP with the Web Application (Process Logic, Static Content, Data management), which uses an SQL database, a NoSQL database and, via HTTP, other Web applications. Browsers combine static content (HTML) with dynamic data. Other Web applications only use the dynamic data. The Web application provides different endpoints for static and dynamic content. Existing databases/services are combined with new ones.
65 Web Data Management
66 Data Backbone: Web applications often deal with relational databases, so relational data must be managed in object-oriented applications. Use design patterns like the Data Access Object (DAO), or object/relational mapping (ORM), e.g. the Hibernate framework or the Java Persistence API (JPA).
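A Data Access Object hides SQL behind an object-oriented interface, so the rest of the application works with objects rather than rows. The slide names Java tools (Hibernate, JPA); this sketch instead uses Python's built-in `sqlite3` module and a dataclass, with a hypothetical `Book` entity and a made-up ISBN. An ORM would generate much of this mapping automatically.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class Book:
    isbn: str
    title: str

class BookDao:
    """Data Access Object: the only place that knows the table layout and SQL."""
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute("CREATE TABLE IF NOT EXISTS book (isbn TEXT PRIMARY KEY, title TEXT)")

    def save(self, book: Book) -> None:
        # Parameterised query: values are never pasted into the SQL string
        self.conn.execute("INSERT OR REPLACE INTO book (isbn, title) VALUES (?, ?)",
                          (book.isbn, book.title))
        self.conn.commit()

    def find(self, isbn: str) -> Optional[Book]:
        row = self.conn.execute("SELECT isbn, title FROM book WHERE isbn = ?",
                                (isbn,)).fetchone()
        return Book(*row) if row else None   # map the relational row to an object

dao = BookDao(sqlite3.connect(":memory:"))
dao.save(Book("978-0-00-000000-0", "Example Title"))
print(dao.find("978-0-00-000000-0"))  # Book(isbn='978-0-00-000000-0', title='Example Title')
```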
67 Web as a database: The Web we use is full of data: book information, opinions, prices, arrival times, blogs, tags, tweets, etc. The data is organized around a simple data model, the node-link model. Each node is a data item that has a unique address and a representation; representation formats are e.g. HTML or PDF for humans, and XML or JSON for programs. Nodes can be interlinked using their unique addresses.
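The node-link model can be sketched as a mapping from unique addresses to nodes that hold data, links to other addresses and several representations. Picking the representation by requested media type mimics HTTP content negotiation via the `Accept` header, greatly simplified here (no quality values or wildcards). The URLs, data and the `get` helper are hypothetical.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Node:
    data: dict
    links: list[str] = field(default_factory=list)  # addresses of related nodes

    def represent(self, media_type: str) -> str:
        """Return the node as JSON for programs, otherwise as HTML for humans."""
        if media_type == "application/json":
            return json.dumps({**self.data, "links": self.links})
        items = "".join(f'<li><a href="{u}">{u}</a></li>' for u in self.links)
        return f"<h1>{self.data['title']}</h1><ul>{items}</ul>"

web = {
    "http://example.org/books/1": Node({"title": "Book 1", "price": 20},
                                       ["http://example.org/authors/7"]),
    "http://example.org/authors/7": Node({"title": "Author 7"}),
}

def get(url: str, accept: str = "text/html") -> str:
    """Look up a node by its unique address and return the requested representation."""
    return web[url].represent(accept)

print(get("http://example.org/books/1", "application/json"))
# {"title": "Book 1", "price": 20, "links": ["http://example.org/authors/7"]}
```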
68 Information retrieval: How do I find what I'm looking for (again)? The mainstream approach is search engines with full-text processing. Other approaches analyze links: links in databases, or within/between documents and sites. Mixed approach: full text and links, e.g. Google.
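Full-text search engines rest on an inverted index that maps each word to the documents containing it. A minimal sketch, assuming tiny in-memory documents, naive whitespace tokenisation and a hypothetical precomputed link score used only to order the hits, which hints at the mixed full-text-and-links approach; real engines add stemming, relevance scoring such as TF-IDF or BM25, and much more.

```python
from collections import defaultdict

docs = {
    "a.html": "web architecture and http",
    "b.html": "http caching explained",
    "c.html": "architecture of the web",
}
link_score = {"a.html": 0.2, "b.html": 0.5, "c.html": 0.3}  # hypothetical, e.g. from PageRank

index: dict[str, set[str]] = defaultdict(set)
for url, text in docs.items():
    for word in text.lower().split():
        index[word].add(url)                 # word -> documents containing it

def search(query: str) -> list[str]:
    """Return documents containing all query words, highest link score first."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits, key=lambda url: link_score[url], reverse=True)

print(search("web architecture"))  # ['c.html', 'a.html']
```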
69 Managing Metadata: Metadata is data about other data, often semi-structured. On the Web: tagging information items (everything you can access via a URL) in a structured manner, as in social Web 2.0 applications (http://del.icio.us or http://www.flickr.com); semantic annotation of Web content (microformats); searching inside metadata.