REBOL [ Title: "REBOL HTTP Proxy" Date: 13-Jul-2001 Name: "Proxy Server" Version: 1.0.0 File: %proxy.r Author: "Sterling Newton" Purpose: {This script serves many purposes. 1. Act as an HTTP proxy 2. See what your broswer sends out as an HTTP request 3. Add data filters to remove Javascript pop-up windows, remove banner ads, and more... Uncomment line towards the bottom of the script marked for JavaScript to enable JavaScript popup window death!!} History: { ^-13-Jul-2001 New build. There's a couple of new filters in here so ^-take a look at it below where all the true/false assignments are. ^-Those flags turn all of the filters on and off. The big stuff is ^-ad filtering. Don't you hate it when your page doesn't load ^-because some ad server is bogged down? Yeah, me too until ^-recently. :) Most filters are off by default so turn on the ones ^-you want. ^-3-Aug-2000 Added many comments about the filters and added filter ^-flags to turn them on and off as desired. ^-27-July-2000 Added some chars to the URL parser for this script. ^-REBOL, as of Core 2.3 still strips all hex encoding in the scanner ^-so some data is lost for the URL parser. Special characters were ^-added to prevent blowouts but the solution is not optimal. ^-2-Dec-1999 Added some error catching to make the system more ^-stable. Added line to kill JavaScript popup windows. } Email: sterling@rebol.com library: [ level: 'advanced platform: none type: none domain: [web tcp] tested-under: none support: none license: none see-also: none ] ] Comment [{ To make this script work, you need to do the following: 1. configure your browser to have an HTTP proxy set to localhost on port XXXX, where XXXX is 9005 (set below) or whatever port number you change it to. 2. add the right network setup to this script so it can use any proxy you already have. The first thing you want to do is set up your network proxy (if you have one) using set-net. The next section of code simply checks out your proxy settings to use them if needed. There are two debug values that may be set in order to have the proxy print out the outgoing request from your browser and/or the incoming page response from the web. The main loop of the script is more simple than it appears. It waits for an incoming request from the browser, then connects to the target machine directly (if no proxy is set) or by going through the proxy. It then waits on both the port to the web and the port to the browser and passes all data that comes in one to the other. When it gets a read of zero bytes, the socket to the outside is closed and it closes its ports and cycles again.} ] ; !!! do network setup with set-net here system/console/busy: none ; turn off the spinner so we can page back ;;; set up the network ;;; add some chars to the url parsing scheme ;;; temporary hack since all hex escapes are gone by the time we see the url insert net-utils/url-parser/path-char "^-[]{};\<>" ;;; get proxy info so we can go through it if needed port-spec: make port! [ scheme: 'tcp port-id: 80 proxy: make system/schemes/default/proxy [] ] if any [system/schemes/default/proxy/host system/schemes/http/proxy/host] [ proxy-spec: make port! [scheme: 'http] port-spec/proxy: make proxy-spec/proxy copy [] ] serv: open tcp://:9005 size: 10000 ; size of read buffer data: make string! size conn-list: make block! 20 link-list: make block! 20 ;;; filters to run debug-request: true ; print out connection info - it's fun to "watch" the web! debug-all: false show-packets: true ; just prints out dots as each incoming "packet" is analyzed for ; HTML filters no-html-colors: false ; strips all color tags from HTML no-banner-ads: false ; changes in personal images for banner ads ; set logo-dir var below if true no-javascript-popups: true ; kills off javascript popup windows no-stat-cookies: false ; prints out all incoming cookies ; removes cookies from doubleclick (I hate them) ; not sure this is really working perfectly no-keep-alive: false ; removes the Proxy-Connection: Keep-Alive header from the requests no-web-trackers: false ; don't visit links that are from tracking sites like doubleclick.net -- I hate these guys no-adservers: false ; filters on sites like "adserver.*" "ads.*", etc. (add your own!) ;;; directory of image files to replace banner ads with ;;; all files in this directory are used at random and ;;; should all be banner ad sized (apporximately) ;;; banner ad replacement follows these criteria for ;;; image size (w = width; h = height): ;;; all [w > 325 w < 700 h > 30 h < 85 temp: w / h temp < 24 temp > 4] ;logo-dir: %some dir where your logos are insert conn-list serv while [true] [ if error? err: try [ conn: wait reduce conn-list ] [probe disarm err] if block? conn [conn: first conn] either conn = serv [ ; new connection so we need to connect it if debug-request [print "==================== NEW CONNECTION ===================="] conn: first serv read-io conn data size target: second parse copy/part data find data "^/" none if debug-request [print [tab "Connection target:" target]] replace/all target "!" "%21" ; hexify '!' character if error? err: catch [ port-spec/host: port-spec/path: port-spec/target: none tgt: net-utils/URL-Parser/parse-url port-spec target ] [print 'error if debug-request [print "DEATH!!!"]] if debug-request [print [tab "Parsed target:" port-spec/host port-spec/path port-spec/target]] either any [ ; filter out stupid webtrackers all [no-web-trackers find port-spec/host "doubleclick.net"] ; don't even read stuff that comes from ad servers... what a waste of time and bandwidth all [no-adservers any [ find/any port-spec/host "adserver.*" find/any port-spec/host "ads.*" ] ] ] [ insert conn {HTTP/1.0 200 OK Content-Type: text/html Content-Length: 29 Link filtered.} close conn clear data print ["** NOT reading an evil web-tracker or ad link:" target] ] [ either error? err: try [ all [system/schemes/http/proxy/type <> 'generic system/schemes/default/proxy/type <> 'generic tmp: find data "http://" remove/part tmp find find/tail tmp "//" "/"] Root-Protocol/open-proto port-spec if debug-request [print [tab "Opened port to:" port-spec/target]] partner: port-spec/sub-port ] [insert conn "HTTP/1.0 400 Bad Request^/^/" close conn clear data print "Death!" probe disarm err] [ if no-keep-alive [ if tmp: find data "Proxy-Connection" [remove/part tmp find/tail tmp newline] ] ; send the request if not empty? data [write-io partner data length? data if debug-request [probe data] clear data] ; add the pair of connections to the link list insert/only tail link-list reduce [conn partner] append conn-list conn ; add the connections to the connection list append conn-list partner ; add the connections to the connection list ] ] ] [ ; just data to transfer so do it ; find the match to the connection we're working with repeat x length? link-list [ any [ all [conn = link-list/:x/1 partner: link-list/:x/2 index: x break] all [conn = link-list/:x/2 partner: link-list/:x/1 index: x break] ] ] len: read-io conn data size if all [find/match data "HTTP/" tmp: find/tail data "Content-type: "] [ conn/user-data: copy/part tmp find tmp charset "^M^J" ] either len > 0 [ ;;; kill JavaScript popup windows (should get 99% of 'em) if no-javascript-popups [ if find data "window.open" [print "** Killing java window.open" replace/all data "window.open" "void"] ] ;;; print out incoming cookies and remove those from doubleclick if no-stat-cookies [ srch: data while [srch: find srch "Set-Cookie"] [ print mold srch print copy/part srch any [end: find srch newline tail srch] ; kill off all .doubleclick.net cookies (I hate being a statistic) if find/part srch ".doubleclick.net" end [print "^-Removing last cookie" remove/part srch next end end: srch] srch: end ] ] ;;; some sites will send back gzip'd data for HTML pages (Yahoo, for example) ;;; so we have to determine if this is an editable file... more tricky than one might assume if any [ all [conn/user-data find ["text/html"] conn/user-data] none? conn/target not find tmp: skip tail conn/target -5 #"." find tmp "htm" find tmp "asp" ".r" = skip tail tmp -2 conn/target = "/" find conn/target #"?" ] [ ; should we include CGI? sometimes is a binary, right? ; do html only mods (don't touch binaries or other files) if show-packets [prin #"."] ;;; kill all text, background, and link coloring if no-html-colors [ list: [["COLOR=" "XCLOR="] ["TEXT=" "XTXT="] ["BGCOLOR=" "NOCOLOR="] ["color:" "XCLOR:"] ["LINK=" "XLNK="] ["ALINK=" "XALNK="] ["VLINK=" "XVLNK="] [""] [ tag: parse copy/part next tmp end "=" h: to-integer select tag "height" w: to-integer select tag "width" all [w h w > 325 w < 700 h > 30 h < 85 temp: w / h temp < 24 temp > 4 change/part next tmp rejoin ["img src=file:" logo-dir pick read logo-dir random size? logo-dir] end ] tmp: skip tmp 30 ] [tmp: false] ] [tmp: false] ] ] ] ;;; now send the filtered data to the browser write-io partner data length? data if debug-all [print ["----------" newline data newline "----------" newline]] ] [ print ["closing ports..." divide (length? conn-list) - 1 2 "open"] if error? err: catch [ close conn close partner true ] [probe disarm err] remove skip link-list index - 1 remove find conn-list conn remove find conn-list partner ] clear data ] ]