A Practical Guide to Content Delivery Networks
Second Edition
Gilbert Held
CRC Press
Taylor & Francis Group
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
AN AUERBACH BOOK
Contents
Preface xiii
Acknowledgments xvii
Chapter 1 Introduction to Content Delivery Networking 1
1.1 The Modern Content Delivery Network 1
1.1.1 Advantages 2
1.1.2 Disadvantages 4
1.2 Evolution 5
1.2.1 Client-Server Computing 6
1.2.1.1 Client-to-Mainframe Data Flow 6
1.2.1.2 Modern Client-Server Operations 9
1.2.2 Use of Video Servers 11
1.2.2.1 Video Length 11
1.2.2.2 Video Resolution 12
1.2.2.3 Frame Rate 12
1.2.2.4 Color Depth 12
1.2.2.5 Data Compression 13
1.2.3 Server Network Architecture 13
1.2.3.1 Two-Tier Architecture 14
1.2.3.2 Three-Tier Architecture 14
1.2.4 The Road to Push Technology 16
1.2.4.1 Teletext Systems 16
1.2.4.2 Videotext 17
1.2.5 Pull Technology 17
1.2.5.1 Role of Caching 18
1.2.5.2 Pull Limitations 22
1.2.6 Multicast 23
1.2.6.1 Advantages 24
1.2.6.2 Addresses 24
1.2.6.3 Limitations 25
1.2.7 Push Technology 26
1.2.7.1 Evolution 26
1.2.7.2 Crawling 27
1.2.7.3 Feeds 28
1.2.7.4 Advantages 30
1.2.7.5 Disadvantages 30
1.3 Content Delivery Networking 32
1.3.1 Client-Server Operations on the Internet 32
1.3.2 Client-Server Operating on the Same Network 33
1.3.3 Client-Server Operations on Different Networks 33
1.3.4 Peering Point 33
1.3.5 Video Considerations 38
Chapter 2 Client-Server Models 41
2.1 Overview 42
2.2 Client Operations 43
2.2.1 URLs 43
2.2.1.1 Absolute and Relative 46
2.2.1.2 Shortening URLs 47
2.2.2 HTML 47
2.2.2.1 Versions 47
2.2.2.2 HTML Documents 48
2.2.2.3 Font Control 49
2.2.2.4 Hypertext Links 50
2.2.2.5 Adding Images 50
2.2.2.6 Adding Video 52
2.2.3 HTTP 56
2.2.3.1 Versions 56
2.2.3.2 Operation 56
2.2.3.3 HTTP 1.1 59
2.2.3.4 State Maintenance 61
2.2.4 Browser Programs 62
2.2.4.1 Helpers 64
2.2.4.2 Plug-Ins 65
2.2.4.3 Java 65
2.2.4.4 VBScript 68
2.2.4.5 ActiveX 69
2.3 Server Operations 70
2.3.1 Evolution 70
2.3.2 Common Web Server Programs 71
2.3.2.1 Server Characteristics 71
2.3.3 Application Servers 74
2.3.3.1 Access 74
2.3.3.2 Java Application Servers 75
2.3.3.3 General Server Tools 76
2.3.3.4 Microsoft's .NET Framework 77
2.4 Distance Relationship 78
2.4.1 Using Ping 78
2.4.2 Using Traceroot 80
Chapter 3 Understanding TCP/IP 83
3.1 The TCP/IP Protocol Suite 83
3.1.1 Protocol Suite Components 83
3.1.2 Physical and Data-Link Layers 84
3.1.2.1 MAC Addressing 85
3.1.2.2 Layer 3 Addressing 85
3.1.2.3 ARP 88
3.1.3 The Network Layer 88
3.1.3.1 IP Header 89
3.1.4 The Transport Layer 91
3.1.4.1 TCP 91
3.1.4.2 UDP 93
3.1.4.3 Port Meanings 94
3.2 The Domain Name System 95
3.2.1 Need for Address Resolution 96
3.2.2 Domain Name Servers 96
3.2.3 Top-Level Domain 97
3.2.4 DNS Operation 98
3.2.5 Configuring Your Computer 98
3.2.6 Root Name Servers 100
3.2.7 The NSLookup Tool 101
3.2.8 Expediting the Name Resolution Process 102
3.2.9 DNS Resource Records 103
3.2.9.1 SOA Resource Record 103
3.2.9.2 Name Server (NS) Records 104
3.2.9.3 Address (A) Records 104
3.2.9.4 Host Information (HINFO) Record 104
3.2.9.5 Mail Exchange (MX) Records 105
3.2.9.6 Canonical Name (CNAME) Records 105
3.2.9.7 Other Records 105
Chapter 4 The CDN Model 107
4.1 Why Performance Matters 107
4.1.1 Economics of Poor Performance 108
4.1.2 Predictability 109
4.1.3 Customer Loyalty 110
4.1.4 Scalability 111
4.1.5 Flexibility 112
4.1.6 Company Perception 112
4.1.7 Summary 113
4.2 Examining Internet Bottlenecks 113
4.2.1 Entry and Egress Considerations 113
4.2.2 Access Delays 114
4.2.3 Egress Delays 121
4.2.4 Benefits of Edge Servers 124
4.2.5 Peering Points 125
4.2.5.1 Rationale 125
4.2.5.2 Peering and Transit Operations 126
4.2.5.3 Transit and Peering Operations 130
4.2.5.4 Global Structure of Peering Points 134
4.2.5.5 Representative Peering Points 135
4.2.5.6 Peering Point Delays 144
4.3 Edge Operations 148
4.3.1 CDN Operation 149
4.3.2 The Akamai Network 149
4.3.2.1 Type of Content Support 150
4.3.2.2 Centralized Web Site Access 150
4.3.2.3 Edge Server Model 151
4.3.2.4 Limitations 153
4.3.3 Edge Side Includes 153
4.3.3.1 ESI Support 156
4.3.3.2 Inclusion and Conditional Inclusion 157
4.3.3.3 Environmental Variables 157
4.3.3.4 Exception and Error Handling 157
4.3.3.5 Language Tags 158
4.3.3.6 The ESI Template 158
4.3.4 Edge Side Includes for Java 159
4.3.5 Statistics 160
4.3.6 Summary 161
4.4 The Akamai HD Network 161
4.4.1 Using the HD Network with Flash 162
4.4.1.1 Selecting the Client Population 163
4.4.1.2 Selecting Bit Rates 163
4.4.1.3 Selecting Frame Sizes 163
4.4.1.4 Profiles 164
4.4.1.5 Levels 165
4.4.1.6 Keyframes 165
Chapter 5 Caching and Load Balancing 167
5.1 Caching 167
5.1.1 Browser Cache 168
5.1.2 Other Types of Web Caches 169
5.1.2.1 Proxy Caches 169
5.1.2.2 Gateway Caches 172
5.1.2.3 Server Caches 173
5.1.3 Application Caching 173
5.1.4 Cache Operation 174
5.1.5 Cache Control Methods 175
5.1.5.1 META Tags 175
5.1.5.2 HTTP Headers 178
5.1.5.3 Cache-Control Header 180
5.1.5.4 Directive Application 182
5.1.5.5 Cache-Request Directives 182
5.1.5.6 Cache-Response Directives 185
5.1.6 Windows DNS Caching Problems 187
5.1.7 Viewing HTTP Headers 187
5.1.8 Considering Authentication 191
5.1.9 Enhancing Cacheability 191
5.2 Load Balancing 194
5.2.1 Types of Load Balancing 194
5.2.2 Rationale 195
5.2.3 Load Balancing Techniques 195
5.2.3.1 DNS Load Balancing 196
5.2.3.2 Load Balancing Methods 197
5.2.4 Hardware versus Software 198
5.2.5 DNS Load Balancing 199
5.2.6 DNS Load-Sharing Methods 199
5.2.6.1 Using CNAMES 200
5.2.6.2 Using A Records 200
5.2.7 Managing User Requests 201
5.2.7.1 Hidden Fields 202
5.2.7.2 Settings 202
5.2.7.3 URL Rewriting 203
Chapter 6 The CDN Enterprise Model 205
6.1 Overview 205
6.1.1 Rationale 206
6.1.1.1 Concentrated Customer Base 207
6.1.1.2 Distributed Locations Available for Use 207
6.1.1.3 Knowledgeable Staff 208
6.1.1.4 Control 208
6.1.1.5 Economics 209
6.1.2 Summary 209
6.2 Traffic Analysis 210
6.2.1 Using Web Logs 210
6.2.1.1 Apache Access Logs 211
6.2.1.2 Access Records 212
6.2.1.3 HTTP Response Codes 212
6.2.2 Using Logging Strings 214
6.2.3 Web-Log Analysis 215
6.2.4 Top Referring Domains 217
6.2.5 Considering Status Codes 218
6.2.6 Web-Log Statistics 220
6.2.7 Reverse Mapping 221
6.2.8 SOA Record Components 223
6.2.9 Origination Country 22^
6.2.10 Originating Time Zone 226
6.2.11 Other Statistics 227
6.2.12 Other Analysis Tools 22«
6.2.13 Cookies 232
6.2.13.1 Cookie Basics 233
6.2.13.2 Writing Cookies 235
6.2.13.3 How a Cookie Moves Data 235
6.2.13.4 How Web Sites Use Cookies 236
6.2.13.5 Problems with Cookies 237
6.2.14 Other Logging Information 238
6.2.15 Microsoft's Performance Monitor 238
6.2.15.1 Activating Performance Monitor 239
6.2.15.2 Adding Counters and Instances 240
6.2.15.3 Working with Performance Monitor 242
6.2.15.4 Summary 244
6.2.16 Using a Network Analyzer 246
6.2.17 Other Tools to Consider 248
6.3 Content Delivery Models 249
6.3.1 Single-Site, Single-Server Model 249
6.3.1.1 Advantages 249
6.3.1.2 Disadvantages 250
6.3.1.3 Considering Server Options 251
6.3.1.4 Considering Network Operations 251
6.3.2 Single-Site, Multiple-Server Model 252
6.3.2.1 Advantages 252
6.3.2.2 Disadvantages 252
6.3.3 Multiple-Sites, Single-Server per Site Model 254
6.3.3.1 Advantages 254
6.3.3.2 Disadvantages 255
6.3.4 Multiple-Site, Multiple-Server per Site Model 256
6.3.4.1 Advantages 257
6.3.4.2 Disadvantages 257
6.3.5 An In-Between Model 257
Chapter 7 Web-Hosting Options 259
7.1 Rationale 259
7.1.1 Cost Elements and Total Cost 260
7.1.2 Performance Elements 263
7.1.3 Server-Side Language Support 266
7.1.4 Web-Service Tools 266
7.1.5 The Importance of Images 266
7.1.6 Back-End Database Support 269
7.1.7 Facility Location(s) 269
7.2 Types of Web-Hosting Facilities 270
7.2.1 Dedicated Hosting 270
7.2.2 Shared Server Hosting 271
7.2.3 Colocated Hosting 272
7.3 Evaluation Factors 273
Index 277