Monday, December 3, 2012

Apache 2.2 Proxy

I may have some terminology wrong, but here's my understanding so far. Apache can be a lot of things. It can be a web server, serving static files. If you install a scripting engine like php, it can be an application server. Similarly, tomcat could just serve static files, but it was designed to be an application server. So maybe you set up apache on port 80 as your web server, then for some urls you pass the requests on to tomcat. If you've done that, then apache is functioning as a reverse proxy. If you have two instances of your tomcat app on separate boxes, apache can act as a load balancer, handing some requests to tomcat1 and some to tomcat2.

Reverse Proxy

Let us suppose that I want an apache box that for some urls serves up static files, for other urls, reverse proxies to internal box1, and for other urls reverse proxies to internal box2. How can I do it?

First you need three machines. I'm using a minimal install of centos6. Here's the quick and dirty.

New VM
Other Linux 32bit
5gb disk
512mb RAM
Power On
Connect to ISO
CTRL+ALT+INSERT
Install or Upgrade an existing system
SKIP the disk check
English, U.S. English
Basic Storage Devices, Re-initialize all
yourmachine.yourdomain.com (don't configure network)
Agree with time
Choose root password
Use All Space, Write changes to disk
Minimal, centos, Customize Later, Reboot
Disconnect ISO
vi /etc/hosts
127.0.0.1               yourmachine yourmachine.yourdomain.com localhost...
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
HWADDR=LEAVE_THIS_AS_IS
NM_CONTROLLED="yes"
ONBOOT="yes"
BOOTPROTO="dhcp"
DHCP_HOSTNAME="yourmachine.yourdomain.com"

We'll install some basic utilities, a timeserver so our clock is right, and apache and php. We also have to open port 80 in iptables. I'm using php as a simple way to see info about http requests. It's not required.

yum -y install unzip man dos2unix ntp httpd php

ntpdate pool.ntp.org
service ntpd start
chkconfig ntpd on

iptables -I INPUT 1 -p tcp --dport 80 -j ACCEPT
/sbin/service iptables save

service httpd start
chkconfig httpd on
vi /var/www/html/index.php
<?php
 if (isset($_GET['cooker']) && $_GET['cooker'] !== '')
 {
   setcookie('cooker','global-'.$_GET['cooker']);
   setcookie('cooker','app2-'.$_GET['cooker'],0,'/app2/');
   echo 'set cooker cookie to '.$_GET['cooker'].'<br/>';
 }
?>
<h2>boxName</h2>
<?php
 echo 'Request Headers:<br/>';
 $headers = apache_request_headers();
 foreach ($headers as $name => $value)
 { echo "$name : $value <br/>\n"; }

 echo '<br/>Post Parameters:<br/>';
 foreach ($_POST as $name => $value)
 { echo "$name : $value <br/>\n"; }

 echo '<br/>Get Parameters:<br/>';
 foreach ($_GET as $name => $value)
 { echo "$name : $value <br/>\n"; }
?>
<br/>
document.cookie: <script>document.write(document.cookie);</script>
<br/><br/>
<form method="post" action="<?php echo $_SERVER["SCRIPT_NAME"]; ?>?cat=felix">
 <input type="text" name="dog" value="ralph"/>
 <input type="submit" value="observe post with query string"/>
</form>
<form method="get" action="<?php echo $_SERVER["SCRIPT_NAME"]; ?>?cat=felix">
 <input type="text" name="cooker" value="jim"/>
 <input type="submit" value="set cookie"/>
</form>
<input type="button" value="home" 
       onclick="document.location='<?php echo $_SERVER["SCRIPT_NAME"]; ?>';"/>

The above demonstrates some important quirks. You'll notice that you don't have a referer when you navigate directly to the url or refresh, but you do when you post. You'll also notice that a cache-control header pops in and out of existence.

You'll see that a POST (the observe... button) can have both post parameters and query string parameters, but a GET cannot (the set cookie button). Any query string parameters in the action field of a form whose method is GET are ignored.
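The GET-versus-POST behaviour above can be sketched in a few lines of Python. This is only an approximation of what a browser does on submit, and form_submit_url is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

def form_submit_url(action, method, fields):
    """Return (url, body) roughly as a browser would build them on submit."""
    parts = urlsplit(action)
    encoded = urlencode(fields)
    if method.lower() == "get":
        # GET: the form data replaces whatever query string the action had
        return urlunsplit(parts._replace(query=encoded)), None
    # POST: the action url is kept as-is and the form data goes in the body
    return action, encoded

get_url, get_body = form_submit_url("/index.php?cat=felix", "get", {"cooker": "jim"})
post_url, post_body = form_submit_url("/index.php?cat=felix", "post", {"dog": "ralph"})
print(get_url)              # /index.php?cooker=jim
print(post_url, post_body)  # /index.php?cat=felix dog=ralph
```

The cat=felix parameter survives only in the POST case, which matches what the test page shows.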

Finally notice that when the server sets a cookie, it does so by a directive in the response header. This is immediately visible to javascript executed on the client side during that same response, but won't appear in the request headers until the next request is sent.

When the browser sends a request it only sends the cookies that pertain to that request, but javascript can only see the cookies that pertain to the current page, and these sets may be different. To better understand this, make our test page also accessible in an /app2 subfolder.

mkdir /var/www/html/app2
ln -s /var/www/html/index.php /var/www/html/app2/index.php

After clicking the set cookie button, a visit to /index.php will show only the global cookie in both the request header and the javascript, but a visit to /app2/index.php will show both the global and path-restricted cookies. This means that javascript on a page outside /app2 can't know what cookies will apply to a page inside /app2. This becomes more meaningful later on when /app2 may actually be on box2 with jsessionid=val2 and /app3 may actually be on box3 with jsessionid=val3.
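The path rule the browser applies can be sketched as follows. This is a simplified Python version of the path-match algorithm from RFC 6265, assuming the first cookie was set from /index.php (so it defaults to path /) and the second has the explicit path /app2/. The COOKIES table and function names are made up for illustration.

```python
def path_matches(cookie_path, request_path):
    """Simplified RFC 6265 path-match."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # a cookie path ending in "/" covers everything underneath it
        if cookie_path.endswith("/"):
            return True
        # otherwise the next character of the request path must be "/"
        if request_path[len(cookie_path)] == "/":
            return True
    return False

# assumed state after clicking "set cookie" with the value jim
COOKIES = {"/": "cooker=global-jim", "/app2/": "cooker=app2-jim"}

def cookies_for(request_path):
    """Return the cookies a browser would attach to a request for this path."""
    return [value for path, value in COOKIES.items()
            if path_matches(path, request_path)]

print(cookies_for("/index.php"))       # ['cooker=global-jim']
print(cookies_for("/app2/index.php"))  # ['cooker=global-jim', 'cooker=app2-jim']
print(cookies_for("/app2special"))     # ['cooker=global-jim']
```

The last case matters later: a url that merely starts with the characters /app2 does not receive the /app2/ cookie.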

Logs

To better understand what's going on, you can turn on logging of request headers.

vi /etc/httpd/conf/httpd.conf
# uncomment this line
LoadModule log_forensic_module modules/mod_log_forensic.so

# add this at end of file, or inside a VirtualHost block if you're using that
ForensicLog logs/forensic_log
service httpd restart
tail -f /etc/httpd/logs/forensic_log
+7232:5086aa09:0|GET / HTTP/1.1|Host:box1.yourdomain.com|User-Agent:Mozilla/5.0 (Windows NT 5.1; rv%3a15.0) Gecko/20100101 Firefox/15.0.1|Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8|Accept-Language:en-us,en;q=0.5|Accept-Encoding:gzip, deflate|Connection:keep-alive
-7232:5086aa09:0
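If you want to read these entries programmatically, here is a rough Python sketch. It assumes the layout seen in the sample above: a leading +, an id, the request line, then Name:Value headers, all separated by | and with special characters %-escaped. parse_forensic_line is a hypothetical helper, not an Apache tool.

```python
from urllib.parse import unquote

def parse_forensic_line(line):
    """Parse a '+' entry of the forensic log; return None for other lines."""
    line = line.rstrip("\n")
    if not line.startswith("+"):
        return None  # '-' lines only mark that the request completed
    fields = line[1:].split("|")
    if len(fields) < 2:
        return None
    headers = {}
    for field in fields[2:]:
        # split on the first colon only; later colons belong to the value
        name, _, value = field.partition(":")
        headers[name] = unquote(value)
    return {"id": fields[0], "request": unquote(fields[1]), "headers": headers}

sample = ("+7232:5086aa09:0|GET / HTTP/1.1|Host:box1.yourdomain.com"
          "|User-Agent:Mozilla/5.0 (Windows NT 5.1; rv%3a15.0) Gecko/20100101 Firefox/15.0.1"
          "|Connection:keep-alive")
entry = parse_forensic_line(sample)
print(entry["request"])
print(entry["headers"]["User-Agent"])
```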

Proxy

Here's a simple config that passes all /app2 requests to internal box2 and all /app3 requests to internal box3. The rewrite rule says: look for start of line, then /app2, then anything (captured for later reference as $1), then end of line. It's important not to require a trailing slash like /app2/(.*) because sometimes people won't enter a trailing slash in the url, but this means that invalid paths like /app2fake/ will also match the rule and be proxied.

vi /etc/httpd/conf/httpd.conf
# add this at end of file, or inside a VirtualHost block if you're using that
RewriteEngine On
RewriteLog logs/rewrite_log
RewriteLogLevel 6
RewriteRule ^/app2(.*)$ http://box2.yourdomain.com$1 [P]
RewriteRule ^/app3(.*)$ http://box3.yourdomain.com$1 [P]
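To see what the /app2 pattern captures, here is a small Python sketch that applies the same regex and does the same plain string substitution. It only imitates the matching; it is not how mod_rewrite is implemented, and proxy_target is a made-up name.

```python
import re

RULE = re.compile(r"^/app2(.*)$")

def proxy_target(path):
    """Return the substituted url for a path, or None if the rule doesn't match."""
    match = RULE.match(path)
    if match is None:
        return None
    # $1 in the RewriteRule corresponds to group 1 here
    return "http://box2.yourdomain.com" + match.group(1)

for path in ["/app2", "/app2/", "/app2/index.php", "/app2fake/", "/other"]:
    print(path, "->", proxy_target(path))
```

By pure string substitution, /app2fake/ comes out as http://box2.yourdomain.comfake/, so the captured text lands in the host name. I haven't traced what apache does with that target, but it shows why the missing slash is a trade-off.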

If you created the above index.php on all three boxes, then you should see the following mappings.

http://box1.yourdomain.com/
 -> /index.php from box1
 - global cookie sent in request
 - global cookie visible in javascript
 - host header indicates box1

http://box1.yourdomain.com/app2/
 -> /index.php from box2
 - global and path-restricted cookies sent in request
 - global and path-restricted cookies visible in javascript
 - host header rewritten to indicate box2
 - extra X-Forwarded headers sent in request

http://box1.yourdomain.com/app3/
 -> /index.php from box3
 - global cookie sent in request
 - global cookie visible in javascript
 - host header rewritten to indicate box3
 - extra X-Forwarded headers sent in request

Order matters. In the following example, any path at any depth is mapped to the root of box3 as long as it contains the special keyword. But that has to come before the app2 rule, or /app2/special will get mapped to http://box2.yourdomain.com/special instead of http://box3.yourdomain.com

RewriteRule ^/.*special$ http://box3.yourdomain.com [P]
RewriteRule ^/app2(.*)$ http://box2.yourdomain.com$1 [P]

With the above setup you should see the following mappings.

http://box1.yourdomain.com/
 -> /index.php from box1
 - global cookie sent in request
 - global cookie visible in javascript
 - host header indicates box1

http://box1.yourdomain.com/app2/
 -> /index.php from box2
 - global and path-restricted cookies sent in request
 - global and path-restricted cookies visible in javascript
 - host header rewritten to indicate box2
 - extra X-Forwarded headers sent in request

http://box1.yourdomain.com/app2/special
 -> /index.php from box3
 - global and path-restricted cookies sent in request
 - global and path-restricted cookies visible in javascript
 - host header rewritten to indicate box3
 - extra X-Forwarded headers sent in request

This means that we can configure box3 as an independent back-end appliance that sees requests, including cookies, as they would have been sent to box2. We just need some rewriting magic. Note that we have to be careful when choosing our special urls in order to ensure we have the same path matching logic for cookies. If you want to modify an /app2 url such that it will be picked up by the special rule, you just have to cut off the querystring and tack /special on the end. This protects you from the /app2special case and ensures your rule will fire.

http://box1.yourdomain.com/app2special
 -> /index.php from box3
 - global cookie sent in request (bad)
 - global cookie visible in javascript
 - host header rewritten to indicate box3
 - extra X-Forwarded headers sent in request 

http://box1.yourdomain.com/app2/index.php?x=yspecial
 -> /index.php from box2 (bad)
 - RewriteRule is only looking at the path portion of the url
 - i.e. /app2/index.php
 - Thus the special rule didn't apply

http://box1.yourdomain.com/app2/index.php/special
 -> /index.php from box3 (good)
 - global and path-restricted cookies sent in request (good)
 - global and path-restricted cookies visible in javascript
 - host header rewritten to indicate box3
 - extra X-Forwarded headers sent in request
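The ordering and path-only matching described above can be imitated in Python. This sketch assumes that a rule with [P] ends rule processing as soon as it matches, so the first matching rule wins. RULES and route are illustrative names only.

```python
import re

# same order as the config: the special rule first, then the /app2 rule
RULES = [
    (re.compile(r"^/.*special$"), "http://box3.yourdomain.com"),
    (re.compile(r"^/app2(.*)$"), r"http://box2.yourdomain.com\1"),
]

def route(path, rules=RULES):
    """Return the proxy target chosen by the first matching rule, else None."""
    for pattern, substitution in rules:
        match = pattern.match(path)
        if match:
            return match.expand(substitution)
    return None

print(route("/app2/special"))                         # http://box3.yourdomain.com
print(route("/app2/special", list(reversed(RULES))))  # http://box2.yourdomain.com/special
print(route("/app2/index.php"))                       # http://box2.yourdomain.com/index.php
print(route("/app2/index.php/special"))               # http://box3.yourdomain.com
```

The third call stands in for /app2/index.php?x=yspecial: the rule only sees the path, so the query string never gets a chance to match.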

Create the following symlinks on box3 and update the rewrite rules on box1.

mkdir /var/www/html/fixed
ln -s /var/www/html/index.php /var/www/html/fixed/job1.php
ln -s /var/www/html/index.php /var/www/html/fixed/job2.php
RewriteRule ^/.*special/(.*)$ http://box3.yourdomain.com/fixed/$1 [P]
RewriteRule ^/app2(.*)$ http://box2.yourdomain.com$1 [P]

With the above setup you should see the following mappings.

http://box1.yourdomain.com/
 -> /index.php from box1
 - global cookie sent in request
 - global cookie visible in javascript
 - host header indicates box1

http://box1.yourdomain.com/app2/
 -> /index.php from box2
 - global and path-restricted cookies sent in request
 - global and path-restricted cookies visible in javascript
 - host header rewritten to indicate box2
 - extra X-Forwarded headers sent in request

http://box1.yourdomain.com/app2/special/job1.php
http://box1.yourdomain.com/app2//special/job1.php
http://box1.yourdomain.com/app2/page.php/special/job1.php
 -> /fixed/job1.php from box3
 - global and path-restricted cookies sent in request
 - global and path-restricted cookies visible in javascript
 - host header rewritten to indicate box3
 - extra X-Forwarded headers sent in request

http://box1.yourdomain.com/app2/special/job2.php
 -> /fixed/job2.php from box3
 - global and path-restricted cookies sent in request
 - global and path-restricted cookies visible in javascript
 - host header rewritten to indicate box3
 - extra X-Forwarded headers sent in request

http://box1.yourdomain.com/app2/special/job3.php
 -> 404, because job3.php doesn't exist on box3

Redirects

We're good so far, but what if the back-end box references itself by name? Even if your code issues redirects with a relative path, in frameworks like tomcat the underlying implementation prefixes the redirect with the host from the request header, thus making it an absolute location.

Create an absolute redirect file on box2.

vi /var/www/html/bounce.php
<?php
  header("Location: http://box2.yourdomain.com/index.php");
?>

If you have firebug or "Live HTTP headers" in firefox, you can watch the traffic. When you visit http://box2.yourdomain.com/bounce.php, you should see the following reply.

HTTP/1.1 302 Found
Date: Fri, 26 Oct 2012 16:44:27 GMT
Server: Apache/2.2.15 (CentOS)
X-Powered-By: PHP/5.3.3
Location: http://box2.yourdomain.com/index.php
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8

The problem is that if you visit http://box1.yourdomain.com/app2/bounce.php, you'll see the same reply, thus directing you to the internal box name. We need box1 to rewrite hardcoded machine names in all responses. Let's start fresh on box1.

vi /etc/httpd/conf/httpd.conf

Comment out our rewrite rules.

#RewriteRule ^/.*special/(.*)$ http://box3.yourdomain.com/fixed/$1 [P]
#RewriteRule ^/app2(.*)$ http://box2.yourdomain.com$1 [P]

And replace them with these.

ProxyPass /app2 http://box2.yourdomain.com
ProxyPassReverse /app2 http://box2.yourdomain.com
service httpd restart

Poof. Problem solved. The above ProxyPass line is effectively the same as our earlier RewriteRule that says all requests for /app2 actually go to the root of box2. It's the [P] on our RewriteRule that causes it to behave like ProxyPass.

RewriteRule ^/app2(.*)$ http://box2.yourdomain.com$1 [P]

The ProxyPassReverse line says that box2 absolute urls in response headers such as Location should actually point to /app2 on box1. It adjusts headers only, not urls inside the response body.

HTTP/1.1 302 Found
Date: Fri, 26 Oct 2012 17:03:24 GMT
Server: Apache/2.2.15 (CentOS)
X-Powered-By: PHP/5.3.3
Location: http://box1.yourdomain.com/app2/index.php
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Connection: close
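Conceptually, the header rewrite amounts to a prefix swap. The Python sketch below is a simplification for the Location header only; proxy_pass_reverse is a hypothetical function, and a plain prefix test like this would also match look-alike hosts such as box2.yourdomain.community, which a real implementation has to handle more carefully.

```python
def proxy_pass_reverse(location, front_prefix, backend_url, front_origin):
    """Map a backend Location header back to the public url space."""
    if location.startswith(backend_url):
        # keep everything after the backend prefix and hang it under /app2
        return front_origin + front_prefix + location[len(backend_url):]
    # anything that doesn't point at the backend is passed through untouched
    return location

rewritten = proxy_pass_reverse(
    "http://box2.yourdomain.com/index.php",
    "/app2",
    "http://box2.yourdomain.com",
    "http://box1.yourdomain.com",
)
print(rewritten)  # http://box1.yourdomain.com/app2/index.php
```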

You can't do the same regex logic in the ProxyPass statement, so we'll keep our RewriteRules for the requests and introduce ProxyPassReverse to handle the responses.

RewriteRule ^/.*special/(.*)$ http://box3.yourdomain.com/fixed/$1 [P]
RewriteRule ^/app2(.*)$ http://box2.yourdomain.com$1 [P]
ProxyPassReverse /app2 http://box2.yourdomain.com
service httpd restart

Now all our requests should work as desired.

http://box1.yourdomain.com/
 -> /index.php from box1

http://box1.yourdomain.com/app2/
 -> /index.php from box2
 - global and path-restricted cookies visible

http://box1.yourdomain.com/app2/special/job1.php
 -> /fixed/job1.php from box3
 - global and path-restricted cookies visible

http://box1.yourdomain.com/app2/bounce.php
 -> /index.php from box2
 - url still reads box1

You can use ProxyPassMatch in place of RewriteRule. The following achieve the same result:

RewriteRule ^/app2(.*)$ http://box2.yourdomain.com$1 [P]
ProxyPassMatch ^/app2(.*)$ http://box2.yourdomain.com$1

ProxyPassMatch (and ProxyPass*) directives are more desirable than RewriteRule because they support connection pooling, but less desirable because they can't be used in htaccess files and don't provide detailed rewrite logs.

Also unlike ProxyPassReverse and ProxyPass, ProxyPassMatch doesn't support ProxyPassInterpolateEnv, so you can't embed variables like the following

ProxyPassInterpolateEnv on
ProxyPassReverse /${envVar1}/ http://${envVar2}/ interpolate

At least that's the way it looks from the docs.

Advanced Examples

I found the apache docs to be awfully vague. I determined the following mostly by experimentation.

Basic usage of ProxyPass.

# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
#
# Basic usage maps one path to a different root.
#
#    http://apache.box.com/app1/hello.html
# -> http://internal.box.com/hello.html
#
#    http://apache.box.com/app2/hello.html
# -- not matched
#
ProxyPass /app1/ http://internal.box.com/

Environment variables in ProxyPass. These aren't OS variables. They are request specific.

# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypassinterpolateenv
#
# Without ProxyPassInterpolateEnv, the interpolate flag is ignored and ${var1}
# is treated as literal characters which surprisingly browsers will support.
#
#    http://apache.box.com/${var1}/hello.html
# -> http://internal.box.com/hello.html
#
#    http://apache.box.com/app1/hello.html
# -- not matched
#
ProxyPass /${var1}/ http://internal.box.com/ interpolate
# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypassinterpolateenv
# http://httpd.apache.org/docs/2.2/env.html
#
# With ProxyPassInterpolateEnv enabled, the interpolate flag causes ${var1} to 
# be replaced by the contents of the var1 internal environment variable,
# which in this case is blank because the SetEnv directive runs late during 
# request processing, therefore resulting in the following error_log entry
#
# [warn] proxy: No protocol handler was valid for the URL /app1/hello.html.
#
#    http://apache.box.com/app1/hello.html
# -> ://internal.box.com/
# -- internal server error
#
SetEnv var1 http
ProxyPassInterpolateEnv On
ProxyPass /app1/ ${var1}://internal.box.com/ interpolate
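The substitution itself is easy to picture. Here is a tiny Python imitation, assuming (as the error above suggests) that an unset variable is replaced by an empty string. The interpolate function is only a stand-in for what the directive flag does.

```python
import re

def interpolate(template, env):
    """Replace each ${name} with env[name], or with nothing if it is unset."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: env.get(m.group(1), ""), template)

# var1 not yet set when ProxyPass runs, as with SetEnv
print(interpolate("${var1}://internal.box.com/", {}))
# var1 set early enough
print(interpolate("${var1}://internal.box.com/", {"var1": "http"}))
```

The first call yields ://internal.box.com/, which has no scheme, hence the "No protocol handler" warning.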

Setting Apache environment variables.

# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypassinterpolateenv
# http://httpd.apache.org/docs/2.2/mod/mod_setenvif.html#setenvif
#
# However, SetEnvIf does occur early enough to affect ProxyPass.
# ProxyPass itself can only interpolate from internal environment variables
# but SetEnvIf can set them based on request headers, certain aspects of the
# request or already set internal environment variables, and via regex
# backreferences it can pass those values to the variables and thus to 
# the ProxyPass directive.
#
# However, since Request_URI starts after http(s)://host:port it can't be  
# used to learn http(s), and can't be used to identify the server requested
# by the client (which may be different than this apache if multiple
# dns map here).
#
## request headers
#
#    http://apache.box.com/referer
# -> http://internal.box.com/Referer=http://apache.box.com/hello.html
#
#    http://apache.box.com/agent
# -> http://internal.box.com/User-Agent=Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Firefox/17.0
#
#    http://apache.box.com/accept
# -> http://internal.box.com/Accept-Encoding=gzip, deflate
#
#    http://apache.box.com/type
# -> http://internal.box.com/Content-Type=application/x-www-form-urlencoded
#
#    http://apache.box.com/length
# -> http://internal.box.com/Content-Length=17
#
## certain aspects of the request
#
#    http://apache.box.com/remotehost
# -> http://internal.box.com/Remote_Host=1.1.1.1
#
#    http://apache.box.com/remoteaddr
# -> http://internal.box.com/Remote_Addr=1.1.1.1
#
#    http://apache.box.com/serveraddr
# -> http://internal.box.com/Server_Addr=2.2.2.2
#
#    http://apache.box.com/method
# -> http://internal.box.com/Request_Method=POST
#
#    http://apache.box.com/protocol
# -> http://internal.box.com/Request_Protocol=HTTP/1.1
#
#    http://apache.box.com/uri
# -> http://internal.box.com/Request_URI=/uri
#
ProxyPassInterpolateEnv On

## request headers

SetEnvIf Referer "(.*)" referer=$1
ProxyPass /referer http://internal.box.com/Referer=${referer} interpolate

SetEnvIf User-Agent "(.*)" agent=$1
ProxyPass /agent http://internal.box.com/User-Agent=${agent} interpolate

SetEnvIf Accept-Encoding "(.*)" accept=$1
ProxyPass /accept http://internal.box.com/Accept-Encoding=${accept} interpolate

SetEnvIf Content-Type "(.*)" type=$1
ProxyPass /type http://internal.box.com/Content-Type=${type} interpolate

SetEnvIf Content-Length "(.*)" length=$1
ProxyPass /length http://internal.box.com/Content-Length=${length} interpolate

## certain aspects of the request

SetEnvIf Remote_Host "(.*)" remotehost=$1
ProxyPass /remotehost http://internal.box.com/Remote_Host=${remotehost} interpolate

SetEnvIf Remote_Addr "(.*)" remoteaddr=$1
ProxyPass /remoteaddr http://internal.box.com/Remote_Addr=${remoteaddr} interpolate

SetEnvIf Server_Addr "(.*)" serveraddr=$1
ProxyPass /serveraddr http://internal.box.com/Server_Addr=${serveraddr} interpolate

SetEnvIf Request_Method "(.*)" method=$1
ProxyPass /method http://internal.box.com/Request_Method=${method} interpolate

SetEnvIf Request_Protocol "(.*)" protocol=$1
ProxyPass /protocol http://internal.box.com/Request_Protocol=${protocol} interpolate

SetEnvIf Request_URI "(.*)" uri=$1
ProxyPass /uri http://internal.box.com/Request_URI=${uri} interpolate

Basic usage of ProxyPassMatch.

# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypassmatch
#
# ProxyPassMatch is like ProxyPass but does support regex, and although the
# docs don't mention it, it does support interpolate.
# 
#    http://apache.box.com/app1/hello.html
# -> http://internal.box.com/hello.html
#
#    http://apache.box.com/app2/hello.html
# -> http://internal.box.com/Accept-Encoding=gzip, deflate/hello.html
#
ProxyPassInterpolateEnv On
SetEnvIf Accept-Encoding "(.*)" accept=$1
ProxyPassMatch /app1/(.*) http://internal.box.com/$1
ProxyPassMatch /app2/(.*) http://internal.box.com/Accept-Encoding=${accept}/$1 interpolate

Basic usage of RewriteRule as a proxy.

# http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
# http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypassmatch
#
# The alternative to ProxyPass(Match) is RewriteRule.
# ProxyPass(Match) supports connection pooling, RewriteRule does not.
# ProxyPass(Match) cannot be defined in htaccess, RewriteRule can. (although the path it functions on becomes relative)
# ProxyPass(Match) is inherited by virtual hosts, RewriteRule is not.
# 
# When used with [P], RewriteRule functions like ProxyPass.
#
#    http://apache.box.com/app1/hello.html
# -> http://internal.box.com/hello.html
#
#    http://apache.box.com/app2/hello.html
# -- not matched
#
RewriteRule /app1/(.*) http://internal.box.com/$1 [P]

Getting data into ProxyPass output using environment variables set by RewriteRule.

# http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
# http://httpd.apache.org/docs/2.2/mod/directive-dict.html#Syntax
#
# However RewriteRule is more flexible than ProxyPass(Match) and doesn't have to 
# be used as a proxy. It can be used as a flexible means of setting environment 
# variables, but because rewrite configurations aren't inherited by virtual hosts
# you must duplicate your rewrite directives in the https virtual host.
#
#    http://apache.box.com/app1/hello.html
# -> http://internal.box.com/fake/app1/app1/hello.html
#
#    https://apache.box.com/app1/hello.html
# -> https://internal.box.com/fake/app1/app1/hello.html
#
# This is effectively a more complicated way of achieving the SetEnvIf
# examples above, and like SetEnvIf it can only read from the %-decoded 
# URL-path, which means it can't access information about http(s) or server 
# name.
#
# URL       http://apache.box.com/app1/hello.html
# URL-path                                  /app1/hello.html
# URL-path from .htaccess in /app1/               hello.html
#
<VirtualHost _default_:443>
  RewriteEngine On
  RewriteLog logs/rewrite_log
  RewriteLogLevel 4
  RewriteRule /(.+)/.* - [E=folder:$1]
</VirtualHost>
RewriteEngine On
RewriteLog logs/rewrite_log
RewriteLogLevel 4
RewriteRule /(.+)/.* - [E=folder:$1]
ProxyPassInterpolateEnv On
ProxyPass / http://internal.box.com/fake/${folder}/ interpolate

Inheritance for RewriteRule.

# http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteoptions
#
# Complexity can be reduced by setting the inherit option.
#
#    http://apache.box.com/app1/hello.html
# -> http://internal.box.com/fake/app1/app1/hello.html
#
#    https://apache.box.com/app1/hello.html
# -> https://internal.box.com/fake/app1/app1/hello.html
#
<VirtualHost _default_:443>
  RewriteEngine On
  RewriteOptions inherit
</VirtualHost>
RewriteEngine On
RewriteLog logs/rewrite_log
RewriteLogLevel 4
RewriteRule /(.+)/.* - [E=folder:$1]
ProxyPassInterpolateEnv On
ProxyPass / http://internal.box.com/fake/${folder}/ interpolate

Getting server variables into ProxyPass output using environment variables set by RewriteRule.

# http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritecond
#
# RewriteCond precedes an individual RewriteRule and determines
# whether or not that rule will fire. RewriteCond is very powerful and
# can see almost anything. Specifically, it can cause a RewriteRule to
# fire based on the %{HTTPS} server variable, and that rewrite rule can
# serve only to set an environment variable later used by ProxyPass.
#
# Note, proxying to https requires SSLProxyEngine on
#
#    http://apache.box.com/hello.html
# -> http://internal.box.com/hello.html
#
#    https://apache.box.com/hello.html
# -> https://internal.box.com/hello.html
#
<VirtualHost _default_:443>
  SSLProxyEngine on
  RewriteEngine On
  RewriteOptions inherit
</VirtualHost>
RewriteEngine On
RewriteLog logs/rewrite_log
RewriteLogLevel 4
RewriteCond %{HTTPS} =off
RewriteRule . - [E=protocol:http]
RewriteCond %{HTTPS} =on
RewriteRule . - [E=protocol:https]
ProxyPassInterpolateEnv On
ProxyPass / ${protocol}://internal.box.com/ interpolate

Available server variables.

# http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
# http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritecond
#
# Despite only vague mention in the RewriteRule docs, RewriteRule itself
# has direct access to the RewriteCond server variables. Here are some 
# examples.
#
#    http://apache.box.com/HTTP_ACCEPT/page.html?x=1
# -> http://internal.box.com/HTTP_ACCEPT=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8/page.html?x=1
# i.e. HTTP_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
#
#    http://apache.box.com/SCRIPT_FILENAME/page.html?x=1
# -> http://internal.box.com/SCRIPT_FILENAME=/SCRIPT_FILENAME/page.html/page.html?x=1
# i.e. SCRIPT_FILENAME = "/SCRIPT_FILENAME/page.html"
#
#    http://apache.box.com/PATH_INFO/page.html?x=1
# -> http://internal.box.com/PATH_INFO=/page.html?x=1
# i.e. PATH_INFO = ""
#
#    http://apache.box.com/QUERY_STRING/page.html?x=1
# -> http://internal.box.com/QUERY_STRING=x=1/page.html?x=1
# i.e. QUERY_STRING = "x=1"
#
#    http://apache.box.com/SERVER_NAME/page.html?x=1
# -> http://internal.box.com/SERVER_NAME=apache.box.com/page.html?x=1
# i.e. SERVER_NAME = "apache.box.com"
#
#    http://internal.box.com/SERVER_NAME/page.html?x=1
# -> http://internal.box.com/SERVER_NAME=internal.box.com/page.html?x=1
# i.e. SERVER_NAME = "internal.box.com"
# where the client's dns points internal.box.com to our apache box
# but our apache box's dns points internal.box.com to an internal box
#
#    http://apache.box.com/SERVER_ADDR/page.html?x=1
# -> http://internal.box.com/SERVER_ADDR=2.2.2.2/page.html?x=1
# i.e. SERVER_ADDR = "2.2.2.2"
#
#    http://apache.box.com/SERVER_PORT/page.html?x=1
# -> http://internal.box.com/SERVER_PORT=80/page.html?x=1 
# i.e. SERVER_PORT = "80"
#
#    http://apache.box.com/SERVER_PROTOCOL/page.html?x=1
# -> http://internal.box.com/SERVER_PROTOCOL=HTTP/1.1/page.html?x=1
# i.e. SERVER_PROTOCOL = "HTTP/1.1"
#
#    http://apache.box.com/HTTPS/page.html?x=1
# -> http://internal.box.com/HTTPS=off/page.html?x=1
# i.e. HTTPS = "off"
#
#    http://apache.box.com/REQUEST_URI/page.html?x=1
# -> http://internal.box.com/REQUEST_URI=/REQUEST_URI/page.html/page.html?x=1 
# i.e. REQUEST_URI = "/REQUEST_URI/page.html"
#
#    http://apache.box.com/THE_REQUEST/page.html?x=1
# -> http://internal.box.com/THE_REQUEST=GET /THE_REQUEST/page.html?x=1 HTTP/1.1/page.html?x=1
# i.e. THE_REQUEST = "GET /THE_REQUEST/page.html?x=1 HTTP/1.1"
#
RewriteEngine On
RewriteLog logs/rewrite_log
RewriteLogLevel 4
RewriteRule .* - [E=HTTP_ACCEPT:%{HTTP_ACCEPT}]
RewriteRule .* - [E=SCRIPT_FILENAME:%{SCRIPT_FILENAME}]
RewriteRule .* - [E=PATH_INFO:%{PATH_INFO}]
RewriteRule .* - [E=QUERY_STRING:%{QUERY_STRING}]
RewriteRule .* - [E=SERVER_NAME:%{SERVER_NAME}]
RewriteRule .* - [E=SERVER_ADDR:%{SERVER_ADDR}]
RewriteRule .* - [E=SERVER_PORT:%{SERVER_PORT}]
RewriteRule .* - [E=SERVER_PROTOCOL:%{SERVER_PROTOCOL}]
RewriteRule .* - [E=HTTPS:%{HTTPS}]
RewriteRule .* - [E=REQUEST_URI:%{REQUEST_URI}]
RewriteRule .* - [E=THE_REQUEST:%{THE_REQUEST}]
ProxyPassInterpolateEnv On
ProxyPass /HTTP_ACCEPT http://internal.box.com/HTTP_ACCEPT=${HTTP_ACCEPT} interpolate
ProxyPass /SCRIPT_FILENAME http://internal.box.com/SCRIPT_FILENAME=${SCRIPT_FILENAME} interpolate
ProxyPass /PATH_INFO http://internal.box.com/PATH_INFO=${PATH_INFO} interpolate
ProxyPass /QUERY_STRING http://internal.box.com/QUERY_STRING=${QUERY_STRING} interpolate
ProxyPass /SERVER_NAME http://internal.box.com/SERVER_NAME=${SERVER_NAME} interpolate
ProxyPass /SERVER_ADDR http://internal.box.com/SERVER_ADDR=${SERVER_ADDR} interpolate
ProxyPass /SERVER_PORT http://internal.box.com/SERVER_PORT=${SERVER_PORT} interpolate
ProxyPass /SERVER_PROTOCOL http://internal.box.com/SERVER_PROTOCOL=${SERVER_PROTOCOL} interpolate
ProxyPass /HTTPS http://internal.box.com/HTTPS=${HTTPS} interpolate
ProxyPass /REQUEST_URI http://internal.box.com/REQUEST_URI=${REQUEST_URI} interpolate
ProxyPass /THE_REQUEST http://internal.box.com/THE_REQUEST=${THE_REQUEST} interpolate

Infinite Loops.

# Therefore, it is possible to dynamically proxy some requests to the actual requested
# server when dns maps a certain domain to this apache, but this apache maps that
# same domain to a different internal server. Note this requires UseCanonicalName Off
#
#    http://internal.box.com/hello.html
# -> http://internal.box.com/hello.html
# where the client's dns points internal.box.com to our apache box
# but our apache box's dns points internal.box.com to an internal box
#
# But now we risk an infinite redirect loop for requests to this apache's real name.
#
#    http://apache.box.com/hello.html
# -> http://apache.box.com/hello.html
# -- [error] server reached MaxClients setting, consider raising the MaxClients setting
#
<VirtualHost _default_:443>
  SSLProxyEngine on
  RewriteEngine On
  RewriteOptions inherit
</VirtualHost>
UseCanonicalName Off
RewriteEngine On
RewriteLog logs/rewrite_log
RewriteLogLevel 4
RewriteRule . - [E=PROTOCOL:http]
RewriteCond %{HTTPS} =on
RewriteRule . - [E=PROTOCOL:https]
RewriteRule . - [E=SERVER_NAME:%{SERVER_NAME}]
ProxyPassInterpolateEnv On
ProxyPass / ${PROTOCOL}://${SERVER_NAME}/ interpolate

Bringing it all together: some requests for the internal box are proxied to the internal box and some are served locally, without the apache box config knowing the name of the internal box (i.e. it is set dynamically).

# http://httpd.apache.org/docs/2.2/vhosts/
# http://httpd.apache.org/docs/2.2/vhosts/examples.html
# http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
# http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritecond
#
# What I really want is to say that two domains map to this apache and some
# urls from one domain are meant for a different internal server. For example
# all pngs at internal.box.com, but ProxyPass(Match) can only match on the
# rooted URI, not the machine name. What about a virtual host?
#
# Maybe. But I don't think it's the best answer. What I really want
# is to say: if the request would infinite loop then don't let it.
# Which is to say that if the request was by IP or by a domain that
# this apache maps to self then don't loop it. And I will assume that
# this apache's name is known and can be hardcoded and its IP is available
# via %{SERVER_ADDR}, although that turns out not to be useful since
# it can't be used in the CondPattern portion of RewriteCond
#
#    https://apache.box.com/hello.html
# -> 403 Forbidden
#
#    http://apache.box.com/hello.html
# -> 403 Forbidden
#
#    http://apache/hello.html
# -> 403 Forbidden
#
#    http://2.2.2.2/hello.html
# -> 403 Forbidden
#
#    http://internal.box.com/hello.html
# -- not matched, served locally
#
#    http://internal.box.com/logo.png
# -> http://internal.box.com/logo.png
# -- which maps to another box and is proxied
#
<VirtualHost _default_:443>
  # behave like :80 but with SSL enabled
  SSLProxyEngine on
  RewriteEngine On
  RewriteOptions inherit
</VirtualHost>

# report the requested server name, not the hardcoded server name
UseCanonicalName Off

# enable rewrite directives
RewriteEngine On
RewriteLog logs/rewrite_log
RewriteLogLevel 4

# reject all requests made by ip, otherwise the proxy line below could loop forever
RewriteCond %{SERVER_NAME} ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$
RewriteRule . - [F]

# reject all requests made to this box's real name, otherwise the proxy line below could loop forever
RewriteCond %{SERVER_NAME} ^apache [NC]
RewriteRule . - [F]

# record the protocol on which this server was requested
RewriteRule . - [E=PROTOCOL:http]
RewriteCond %{HTTPS} =on
RewriteRule . - [E=PROTOCOL:https]

# record the name by which this server was requested
RewriteRule . - [E=SERVER_NAME:%{SERVER_NAME}]

# if the request ends with .png
#  proxy the unmodified request to SERVER_NAME (which must map to a different box to avoid an infinite loop)
# else
#  serve it locally
ProxyPassInterpolateEnv On
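# anchor the pattern and capture the path without its leading slash,
# so the proxied url does not end up with "//" after the server name
ProxyPassMatch ^/(.*\.png)$ ${PROTOCOL}://${SERVER_NAME}/$1 interpolate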

Although I couldn't find it documented anywhere, it seems you can make the ProxyPassMatch pattern case insensitive by prefixing it with (?i), as described on esri.com.
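
As a rough sketch of that trick, the .png rule might look like the following. Treat it as an assumption rather than a tested config: (?i) is a regex inline flag that apache happens to honor here, and the anchored pattern with the capture group placed after the leading slash is my own choice so that the target url gets a single slash.

```apache
ProxyPassInterpolateEnv On

# (?i) turns on case-insensitive matching for the whole pattern, so
# /logo.png, /logo.PNG and /Logo.Png would all be proxied.
# PROTOCOL and SERVER_NAME are the env vars set by the rewrite rules above.
ProxyPassMatch (?i)^/(.*\.png)$ ${PROTOCOL}://${SERVER_NAME}/$1 interpolate
```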

Warning: the system described above can be abused as an open proxy, because it proxies to whatever server name the client supplies. To prevent that, have the firewall restrict the apache box so it can only make outgoing connections to the internal box, or limit which names are accepted through careful use of virtual host blocks and server name/alias.
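
Another way to narrow that hole inside apache itself is to forbid .png requests for every name you don't trust. The sketch below assumes internal.box.com is the only name that should ever be proxied (a placeholder, substitute your own list). It relies on the same behavior as the [F] rules above, namely that the rewrite rules are evaluated before the proxy mapping. It is not a substitute for the firewall rule.

```apache
# Hypothetical whitelist: internal.box.com stands in for the names you trust.
# Any .png request whose requested server name is NOT on the list is refused,
# so it never reaches the ProxyPassMatch line.
RewriteCond %{SERVER_NAME} !^internal\.box\.com$ [NC]
RewriteRule \.png$ - [F,NC]
```

Requests for other file types are unaffected, since only .png requests are ever proxied in this setup.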

# Documentation:
#
# http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond
# http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
# http://httpd.apache.org/docs/2.2/vhosts/
# http://httpd.apache.org/docs/2.2/vhosts/name-based.html
# http://httpd.apache.org/docs/2.2/mod/core.html#namevirtualhost
# http://httpd.apache.org/docs/2.2/mod/core.html#virtualhost

# Run httpd as root initially then switch to this user and group.
#
User notroot
Group notroot

# Define the name (and optionally port) that this apache will use to identify
# itself, for example during server-generated redirections.
#
ServerName box.site.com

# Do not override SERVER_NAME and SERVER_PORT variables with the value from the
# ServerName directive. Use the Hostname and Port supplied by the client in the
# request.
#
UseCanonicalName Off

# Enable rewrites so we can redirect incoming requests. You can define some
# rewrite directives globally and instruct all <VirtualHosts> to inherit them
# and define more rewrite directives in the <VirtualHosts>
#
RewriteEngine On

# NameVirtualHost *:80 means that you can create multiple <VirtualHosts> on port
# 80. <VirtualHost _default_:80> means port 80 for any IP for which NameVirtualHost
# is not defined. When absent, ServerName is inherited from the main configuration.
# When absent, DocumentRoot is inherited from the main configuration.
#
# When a request arrives, it is served by the first <VirtualHost> that matches IP:PORT
# and that contains a ServerName or ServerAlias directive matching the host name to
# which the request was sent. If no ServerName or ServerAlias matches then the first
# <VirtualHost> that matches IP:PORT is used. The main configuration is only used
# when no <VirtualHost> matches IP:PORT.

Listen 80
NameVirtualHost *:80
<VirtualHost *:80>
  # This block inherits ServerName from the main config. Because it is listed
  # first, any host name not matched by the following block will go here.
  # (The address here must match the NameVirtualHost argument exactly, otherwise
  # apache 2.2 does not treat the blocks as name-based.)

  # The rewrite directives in the main config won't fire for requests destined
  # for this virtual host unless we explicitly inherit them.
  #
  RewriteEngine On
  RewriteOptions inherit

  #### do your stuff, namely kill the request
</VirtualHost>
<VirtualHost *:80>
  # Only requests for the following server names/aliases will enter this block. These are
  # expected to be internal machines for which the client and this apache resolve dns
  # differently (i.e. the client maps to this apache and this apache maps to the internal
  # machine). If not listed here flow will go to the above block. It is important that none
  # of the names listed here resolve on this box to itself, otherwise they may result in
  # infinite proxy loops. For example, if this box is box1.domain.com then it is not safe
  # to include *.domain.com in this list.
  #
  ServerName box1.site1.com
  ServerAlias box2.site2.com *.site3.com

  # The rewrite directives in the main config won't fire for requests destined
  # for this virtual host unless we explicitly inherit them.
  #
  RewriteEngine On
  RewriteOptions inherit

  #### do your stuff, namely proxy it elsewhere
</VirtualHost>
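
For illustration, here is one way the two "do your stuff" placeholders could be filled in. This is a sketch and not a tested config. It swaps in mod_rewrite's [P] flag instead of the ProxyPassMatch/interpolate approach from earlier, assumes mod_proxy and mod_proxy_http are loaded, and reuses the placeholder names from the blocks above. It also leaves out "RewriteOptions inherit" so that each block stands on its own.

```apache
NameVirtualHost *:80

<VirtualHost *:80>
  # Catch-all: listed first, so it receives any host name not matched below.
  RewriteEngine On
  # Refuse every request outright with 403 Forbidden.
  RewriteRule ^ - [F]
</VirtualHost>

<VirtualHost *:80>
  ServerName box1.site1.com
  ServerAlias box2.site2.com *.site3.com

  RewriteEngine On
  # Proxy .png requests to the same name the client asked for. On this box
  # that name must resolve to the internal machine, never to this apache,
  # otherwise the request loops. Everything else is served locally.
  RewriteRule ^/(.*\.png)$ http://%{SERVER_NAME}/$1 [P,L]
</VirtualHost>
```

Because only the listed names can reach the [P] rule, a client can't point this apache at an arbitrary host, which addresses the open proxy concern from the previous section.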