Content Transformation with Apache



Similar documents
What's new in httpd 2.2?

Painless Web Proxying with Apache mod_proxy

Apache 2.2 and mod_proxy_balancer

Apache HTTP Server. Load-Balancing with Apache HTTPD 2.2 and later. Erik Abele

10CS73:Web Programming

SiteCelerate white paper

Computer Networks. Lecture 7: Application layer: FTP and HTTP. Marcin Bieńkowski. Institute of Computer Science University of Wrocław

Multimedia Applications. Mono-media Document Example: Hypertext. Multimedia Documents

Internet Technologies_1. Doc. Ing. František Huňka, CSc.

Web. Services. Web Technologies. Today. Web. Technologies. Internet WWW. Protocols TCP/IP HTTP. Apache. Next Time. Lecture # Apache.

Varnish the Drupal way

Nginx 1 Web Server Implementation

Web Development. Owen Sacco. ICS2205/ICS2230 Web Intelligence

Accelerating Zope applications with Squid and ESI

DEPLOYMENT GUIDE Version 1.0. Deploying the BIG-IP LTM with Apache Tomcat and Apache HTTP Server

The mod_proxy Cookbook

Module 6 Web Page Concept and Design: Getting a Web Page Up and Running

Oracle9i Application Server: Options for Running Active Server Pages. An Oracle White Paper July 2001

Web Design for Programmers. Brian Hogan NAPCS Slides and content 2008 Brian P. Hogan Do not reproduce in any form without permission

Speed up your web site. Alan Seiden Consulting alanseiden.com

Blackbox Reversing of XSS Filters

LabVIEW Internet Toolkit User Guide

Advanced Webserver Setup for WebGUI. WebGUI User Conference Chicago, October 2004 Leendert Bottelberghs, United Knowledge

Step into the Future: HTML5 and its Impact on SSL VPNs

An Esri White Paper January 2010 Performance and Throughput Tips for ArcGIS Server Cached Map Services and the Apache HTTP Server

Binonymizer A Two-Way Web-Browsing Anonymizer

Content. Global Delivery Network: Folders

HTTP Caching & Cache-Busting for Content Publishers

What s new in Apache HTTP Server 2.2. Jim Jagielski jim@jagunet.com

Accelerating Rails with

Chapter 4: Networking and the Internet

LAMP Secure Web Hosting. A.J. Newmaster & Matt Payne 8/10/2005

7 Why Use Perl for CGI?

Data: Small and Big Data files and formats

Office Fax

CSE 135 Server Side Web Languages Lecture # 12. Web Performance Notes

1. Introduction. 2. Web Application. 3. Components. 4. Common Vulnerabilities. 5. Improving security in Web applications

Technical specification

CentraSite SSO with Trusted Reverse Proxy

Learning Web Design. Third Edition. A Beginner's Guide to (X)HTML, Style Sheets, and Web Graphics. Jennifer Niederst Robbins

Transport Layer Security Protocols

Protocolo HTTP. Web and HTTP. HTTP overview. HTTP overview

A bandwidth optimization company

Chapter 10: Multimedia and the Web

Basic Internet programming Formalities. Hands-on tools for internet programming

Web Pages. Static Web Pages SHTML

Novell Access Manager

Implementation of Web Application Firewall

Lehrstuhl für Informatik 4 Kommunikation und verteilte Systeme

FileMaker Server 15. Custom Web Publishing Guide

Fast track to HTML & CSS 101 (Web Design)

XML Processing and Web Services. Chapter 17

XML: ITS ROLE IN TCP/IP PRESENTATION LAYER (LAYER 6)

Taking your HTML s to the Next Level. Presented by: Joey Trogdon, Asst. Director of Financial Aid & Veterans Affairs Randolph Community College

Adding web interfaces to complex scientific computer models brings the following benefits:

FileMaker Server 9. Custom Web Publishing with PHP

FileMaker Server 13. Custom Web Publishing with PHP

Web application development landscape: technologies and models

Embracing Device Diversity. MobileAware Founding Member of W3C MWI

Moore s Law and Network Optimization

Introduction to Computer Security

Reseller Quick Start Guide 1. Domain & Name Servers...2. Payment Gateways Dedicated Servers Setup Your Landing Page/Website...

FileMaker Server 13. Custom Web Publishing with XML

Interspire Website Publisher Developer Documentation. Template Customization Guide

Content Management Systems: Drupal Vs Jahia

Core Feature Comparison between. XML / SOA Gateways. and. Web Application Firewalls. Jason Macy jmacy@forumsys.com CTO, Forum Systems

Basic & Advanced Administration for Citrix NetScaler 9.2

Lesson Review Answers

DEPLOYMENT GUIDE Version 1.0. Deploying the BIG-IP Edge Gateway for Layered Security and Acceleration Services

Credits: Some of the slides are based on material adapted from

JISIS and Web Technologies

Adding Advanced Caching and Replication Techniques to the Apache Web Server

Xtreeme Search Engine Studio Help Xtreeme

Chapter 1 Programming Languages for Web Applications

HTTP. Internet Engineering. Fall Bahador Bakhshi CE & IT Department, Amirkabir University of Technology

CTIS 256 Web Technologies II. Week # 1 Serkan GENÇ

Smithsonian Institution Archives Guidance Update SIA. ELECTRONIC RECORDS Recommendations for Preservation Formats. November 2004 SIA_EREC_04_03

Ricoh HotSpot Printer/MFP Whitepaper Version 4_r4

Joomla! Actions Suite

Terms and Definitions for CMS Administrators, Architects, and Developers

Working with FirePass Portal Access (Reverse Proxy)

DOCUMENTS ON WEB OBJECTIVE QUESTIONS

Course Name: Course in JSP Course Code: P5

PHP and XML. Brian J. Stafford, Mark McIntyre and Fraser Gallop

Localizing dynamic websites created from open source content management systems

PROXY SETUP WITH IIS USING URL REWRITE, APPLICATION REQUEST ROUTING AND WEB FARM FRAMEWORK OR APACHE HTTP SERVER FOR EMC DOCUMENTUM EROOM

Web Hosting. Comprehensive, scalable solutions for hosting dynamic websites, secure web services, and enterprise applications.

Developers Guide. Designs and Layouts HOW TO IMPLEMENT WEBSITE DESIGNS IN DYNAMICWEB. Version: English

Beginning Smartphone Web Development

Streaming Stored Audio & Video

ART 379 Web Design. HTML, XHTML & CSS: Introduction, 1-2

1. Introduction 2. Getting Started 3. Scenario 1 - Non-Replicated Cluster 4. Scenario 2 - Replicated Cluster 5. Conclusion

CIS 467/602-01: Data Visualization

P and FTP Proxy caching Using a Cisco Cache Engine 550 an

Integrating AJAX Approach into GIS Visualization Web Services

GUIDE TO CODE KILLER RESPONSIVE S

LifeSize UVC Video Center Deployment Guide

Web Intrusion Detection with ModSecurity. Ivan Ristic

Developing ASP.NET MVC 4 Web Applications MOC 20486

Transcription:

Content Transformation with Apache Nick Kew WebThing Limited http://www.webthing.com/ and Apache Software Foundation http://www.apache.org/

Content Transformation History Technology Foundations Projects, Applications and Modules

Prehistory: Apache 1 CGI PHP Scripting language modules Java/Tomcat Application Servers Advanced techs: Axkit, gd, etc.

History: Apache 2 2002: First production release of Apache 2 2002: First filter modules, XSLT, Accessibility Proxy & Content Assembly 2003: Modular filter applications architecture, Reverse Proxy, xmlns, ESI, WML gateway 2004: Accelerator Proxy, image processing proxy, smart filtering, mod_publisher, mod_annot, embedded SQL. 2005: Line editor. Apache 2.2.

Proxy Apache 1: simple all-in-one HTTP/1.0 proxy module Apache 2: HTTP/1.1; separate mod_proxy, mod_cache protocol framework (HTTP, FTP, AJP,...) Load Balancing Apache 2.2: proxy+cache matures

Cache History Apache 1: HTTP/1.0 Cache Apache 2: HTTP/1.1 Support Apache 2.2: Mature HTTP/1.1 Future (provisional): Cache Framework mod_cache becomes mod_cache_http Support other cacheing strategies such as ESI, CML, mod_carp

Apache 2: Filters

The Data Axis

Filter Types Content SSI, XML/XSLT Content Encoding Compression, Charset Protocol Headers, Byteranges Protocol Encoding Chunking Connection SSL, Bandwidth Network TCP, AcceptFilter

Content transformation mod_includes (SSI) mod_charset_lite (iconv) mod_deflate (compression) Many third-party modules XSLT filters mod_proxy_html etc

Filtering Proxy with Cache <VirtualHost 192.168.65.19> ServerName proxy.example.com ProxyRequests On Order Deny,Allow Deny from All Allow from 192.168.65 CacheEnable disk / CacheRoot /var/spool/www-cache FilterProvider transform accessibility $text/html FilterChain unpack transform repack </VirtualHost>

Reverse Proxy ProxyRequests Off ProxyPass /foo/ http://foo.example.com/ ProxyHTMLURLMap http://foo.example.com /foo <Location /foo/> ProxyPassReverse / ProxyHTMLURLMap / /foo/ FilterProvider transform proxy-html $text/html FilterChain unpack transform repack </Location> CacheEnable disk /foo/ CacheRoot /var/spool/www-cache

Content Assembly Filter Source Handler Source

Content Assembly SSI <!--#include... --> ESI <esi:include...> SQL <sql:select...> Scripting, Templating,... Data Discovery Aggregation Image Processing Caching often crucial to performance

Content Assembly At the browser: <img src...> <object data...> frame, applet, script,... No concern of Apache, except to ensure correct HTTP (caching, etc). But we need to consider where content assembly is best accomplished.

Multi-level Caching Cache Each Subrequest may include cache Assembly Cache Cache Cache Source Source Source

Accessibility BBC 'betsie' Perl script Accessibility Proxy (2002) mod_accessibility (2003) http://apache.webthing.com/mod_accessibility/betsie.html SAX parser -> speed and scalability any Content Generator

Reverse Proxy (2003)

XML namespaces (2003)

Accelerator Proxy (2004) Mobile Phones Low bandwidth, high latency Client capabilities string Goal: Aggressively minimise byte count for all media types. Client keepalive competes with parallelism

Reducing Byte Count Compress all compressible types (eg text) HTML: strip everything non-significant based on DTD representing the browser Images: Downsample to fit screen size & palette Many filters in a single proxy - problems Dispatching to the right filters Coping with malformed input

Issues Arising Many filters in a single proxy problems Dispatching to the right filters --> mod_filter Coping with malformed input --> correction Non-problems SSL and Compressed input work just fine within the existing filter framework (though an additional decompression filter was required in mod_deflate)

Text Transformation Steps (Decrypt) (Uncompress) (Transcode) (Error Correction) Application Filter (Transcode) (Compress) (Encrypt)

Image Transformation Steps (Decrypt) Unencode [gif/jpeg/png] --> pixels Image Processing Filters Encode pixels to [gif/jpeg/png] (Encrypt) Note: this works for arbitrary input types, including multi-layer, multi-source input types such as encountered in GIS or CAD.

Multi-type filter Origin Uncompress Decode GIF Reduce Image Decode PNG Decode JPEG Reduce HTML Reduce CSS Reduce XML Encode GIF Encode JPEG Encode PNG Compress Client

Dispatching: mod_filter

Multi-type filter FilterProvider unpack INFLATE $text FilterProvider unpack djpeg image/jpeg FilterProvider unpack gifunpack image/gif FilterProvider reduce htmlstrip $text/html FilterProvider reduce downsample $image FilterProvider repack DEFLATE $text FilterProvider repack cjpeg image/jpeg FilterProvider repack gifpack image/gif FilterChain unpack reduce repack

Multi-type filter Compress Text Client Repack jpeg gif cjpeg gifpack downsample Image Reduce HTML CSS htmlstrip cssstrip djpeg gifunpack jpeg gif Unpack Text Uncompress Origin

Generic Markup Filtering (2004)

Generic Text Filtering mod_line_edit: sed -like editor Line-oriented: overcomes the problem of arbitrary breaks whilst streaming Generic editing for non-markup text Can also be used in front of a markup filter to correct input errors

Example - Markup Fixup Normalisation to HTML 4 or XHTML 1 <p align= right >paragraph text <font...>different text</font> and some more text.</p> Text-only representation <img src= apachecon.png alt= ApacheCon >

Example Analysis: TOC <h1>... </h1> <p>bla bla bla</p> <h2>... </h2>... <table summary=... otherattributes> <caption>...</caption>... </table>

Example - Linearisation <table><tr> <td>first text presented in left column</td> <td><img src=... alt= image in middle ></td> <td>second text presented in right column</td> </tr></table> <div class= table > <p>first text...</p> <p><img src=... alt= image in middle ></p> <p>second text...</p> </div>

Example Data Discovery <p>to learn more, <a href= more.html >click here</a>.</p> <p>to learn more, <a href= more.html title= Data Discovery >click here</a>.</p>

Example - Templating var sponsor file /path/to/sponsors.inc startelement handler: <sponsor/> <p class= sponsor >Sponsored by...</p> Comment Handler <!--#include file= /path/to/sponsors.inc --> <p class= sponsor >Sponsored by...</p>

Example Sitewide markup <head>, </head>, <body>, </body> are implied, so we can hook site-wide templates on them. <link rel= stylesheet type= text/css href= style.css /> </head> <body> <div id= logo ><a href= / ><img src= logo.gif alt= My Site /></a></div> body text... <div id= navbar >... </div> <div id= footer >... </div> </body>

References Applications Development with Apache (the apache modules book). http://httpd.apache.org/docs/2.2/mod/mod_filter.html, mod_proxy.html, mod_cache.html http://apache.webthing.com/ (best selection of filters for text and markup). http://apache.webthing.com/xmlns.html http://www.apachetutor.org/apps/annot http://www.apacheweek.com/features/reverseproxies http://www.miswebdesign.com/resources/articles/mod_accessibility.html http://www.xml.com/pub/a/2004/12/15/apache-namespaces.html http://www.idealliance.org/xtech/05/call/xmlpapers/02-04-02.1566/.02-04-02.html