Fake Proxy w/ Nginx to aid Web Publishing

s243a


#1 Post by s243a »

Introduction

Say you are using a website that has a rich text editor but has no ability to create links. You can still get links into the site by writing an HTML document on your own machine, opening it in a browser, and copying and pasting the rendered text from the browser into the remote website's editor.

The problem is that when text is copied from the browser to the clipboard, all links become absolute and consequently will only work as links to one website. You can either write the links as absolute links pointing to the remote website, or write them as relative links, in which case they will point to your local computer when copied and by consequence won't work on the remote site.

One approach might be some kind of script that translates the links when pasted. However, the approach we will consider in this post is to make our local computer look like the remote website. This approach could be adapted to other applications, such as whitelisting sites to be routed through clearnet and routing the remainder through Tor.

The software we are going to use to do this is a web server called Nginx. This software is in some ways simpler to configure than other web server software due to its concise configuration files. However, depending on what you want to do, it may require some knowledge of regular expressions if you are not able to find an example close enough to your application.

You can either download a version of Nginx built for puppy or install one through the Puppy Package Manager from the repo of a compatible distro.

If you download one not built for puppy then you may have to tweak some of the start-up scripts and the configuration files may be more complex.

If I recall correctly the nginx pet that someone made used a single configuration file. This is nice and simple for beginners but is less maintainable if you are using multiple servers.

Nginx Configuration

The main configuration file for nginx is:

/etc/nginx/nginx.conf

If this file is located somewhere else in your system then you can find it by either using the find command [1] or looking in the ~/.packages folder [2] for the list of files that came with the nginx installation.
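As a rough sketch of the second approach, the function below searches the package file lists for any recorded nginx.conf path. It assumes that the package manager keeps one "*.files" list per installed package in ~/.packages, with one path per line; the function name is my own and purely illustrative.

```shell
#!/bin/sh
# List every nginx.conf path recorded in the package file lists.
# Assumption: one "<package>.files" list per installed package, one path per line.
find_nginx_conf_in_packages() {
    pkg_dir="${1:-$HOME/.packages}"
    # -h drops the file-name prefix so only the recorded paths are printed
    grep -h 'nginx\.conf$' "$pkg_dir"/*.files 2>/dev/null
}
```

Calling find_nginx_conf_in_packages with no argument searches ~/.packages; pass another directory as the first argument if your file lists live elsewhere.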

The first thing that one should check in this configuration file is that at the top of the file it says

Code: Select all

user spot;
When using nginx you must specify a user. If you put root as the user, there is another parameter that you must set to allow root. I won't cover this for now because user "spot" should be sufficient.
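If you would rather check this from a terminal than open the file, something like the following should do. It is only a sketch: the function name is made up, and it assumes the user directive sits on a line of its own as in the example above.

```shell
#!/bin/sh
# Print the account named by the first "user" directive in an nginx config.
nginx_user() {
    # $1: path to the config file (defaults to the path used in this post)
    conf="${1:-/etc/nginx/nginx.conf}"
    # Matches lines such as "user spot;" or "  user spot spot;"
    sed -n 's/^[[:space:]]*user[[:space:]]\{1,\}\([^[:space:];]\{1,\}\).*$/\1/p' "$conf" | head -n 1
}
```

You could then confirm that the account actually exists on your system with: id "$(nginx_user)"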

The next thing you have to do is to configure a server to listen for the remote site and return the webpage. In my example the remote website will end in "pearltrees.com". Here is a somewhat basic server block for my application:

Code: Select all

server {
    server_name ~^(?<serv_name1>.*pearltrees[.]com)$;
    listen      127.0.0.1:80;
    root        /root/spot/PT;
    index       index.html index.htm;

    location / {
        try_files $uri $uri/ /index.html;
    }
}
These blocks seem to work better if you use regular expressions because if the regular expression fails for the server name then the block won't be executed. If you actually put the name of the server e.g.

Code: Select all

    server_name www.pearltrees.com;    


this might not be the case.

The tilde sign "~" means to use regular expressions. The caret "^" at the beginning of the regular expression means to match the beginning of the string, and the dollar sign "$" at the end means to match the end of the string. The left and right round parentheses "(...)" mean that we are assigning the part of the server name matched by the expression inside the parentheses to a variable.

The question mark "?" following the left parenthesis "(" means that we give this variable the name specified inside the angle brackets, which follow the question mark "?", rather than using the automatic variables (e.g. $1). In our case the variable name that we assign to this part of the regular expression is serv_name1.

We are not using this result currently but it could be useful for debugging or other logic. The "." following the right angle bracket means to match any character, and the asterisk "*" following it denotes how many characters this "any character" symbol is to match. In our case the asterisk modifies the dot to say that we will match zero or more instances of this "any character" wildcard. Following this wildcard we must match exactly the text "pearltrees.com" in the server name. The dot is surrounded by square brackets because dot is a special character. The square brackets are used to indicate exactly which characters we want to match [4]. For instance if we wanted to match either "." or "_" then we could use [._] instead of [.].
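To get a feel for what this pattern accepts, you can try it against a few host names with grep. This is only an approximation of what nginx does: grep -E does not understand the "?<serv_name1>" naming syntax, so the sketch below uses the same pattern without the name, and the host names other than pearltrees.com are invented for the test.

```shell
#!/bin/sh
# Same pattern as the server_name line, minus the "?<serv_name1>" name,
# which grep -E does not support.
PATTERN='^(.*pearltrees[.]com)$'

matches_server_name() {
    # Exit status 0 when the host name in $1 matches PATTERN
    printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

for host in www.pearltrees.com pearltrees.com notpearltrees.com \
            pearltreesXcom www.pearltrees.com.evil.example; do
    if matches_server_name "$host"; then
        echo "$host: match"
    else
        echo "$host: no match"
    fi
done
```

The first three names match and the last two do not. Note that "notpearltrees.com" matches because ".*" accepts any leading text; if that matters for your application, a stricter pattern such as ^(.*[.])?pearltrees[.]com$ might be worth considering.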

This server block must be placed inside the http block of the nginx.conf file. I would also put it near the bottom of the block in case some of the other directives in the block must occur first. This code can also reside in a separate file if an include statement is used. For instance, on the Ubuntu version downloaded via the package manager on tahrpup there are the following include statements at the bottom of the http block:

Lines 71 & 72 of /etc/nginx/nginx.conf

Code: Select all

	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;
So in my case, rather than putting this code in the nginx.conf file, I created the file:
/etc/nginx/sites-enabled/pearltrees.conf

Code: Select all

server {
    server_name ~^(?<serv_name1>.*pearltrees[.]com)$;
    listen      127.0.0.1:80;
    root        /root/spot/PT;
    index       index.html index.htm;

    location / {
        try_files $uri $uri/ /index.html;
    }

    #location ~ ^(.*)$ {
    #    #return 200 "$scheme://$serv_name1$1";
    #    try_files $uri $uri/ /index.html;
    #    #proxy_pass "http:// /$1";
    #}
}

server {
    server_name ~^(?<serv_name1>.+)$;
    listen      127.0.0.1:80;
    root        /root/spot/PT;
    index       index.html index.htm;

    #access_log logs/domain1.access.log main;

    location /www.pearltrees.com/ {
        # trailing slash on alias mirrors the trailing slash on the location
        alias /root/spot/PT/;
        autoindex on;
        allow 127.0.0.1;
        #deny all;
    }

    location / {
        try_files $uri $uri/ /index.html;
    }

    location ~ ^(.*)$ {
        #return 200 "$scheme://$serv_name1$1";
        resolver 192.168.1.254; #https://stackoverflow.com/questions/17685674/nginx-proxy-pass-with-remote-addr
        proxy_pass "$scheme://$serv_name1$1";
    }
}
The first server block is for my fake pearltrees.com proxy. In this server block I also have some debugging code commented out. The second location block in this first server is commented out because, if it were enabled, the regular expression would take precedence: nginx first finds the best prefix match and then checks the regular expression locations, using the first regular expression that matches, so the "/" location block would never be used.

We might want to uncomment it for debugging purposes as follows:

Code: Select all

    location ~ ^(.*)$ {
        return 200 "$scheme://$serv_name1$1";
    }
The return 200 line lets us create a debugging statement where:

$scheme will either say http or https (nginx default variable)

$serv_name1 will be the server name (returned from a previous regular expression)

$1 will be the part of the URL after the server name. This $1 is an automatic variable assigned by the regular expression "^(.*)$".
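With the debugging location enabled, one way to see what it returns without reconfiguring a browser is to send a request through nginx with curl. This is a sketch that assumes curl is installed and that nginx is listening on 127.0.0.1:80 as in the listen directive above; the function name is hypothetical.

```shell
#!/bin/sh
# Fetch a URL through the local nginx instance, the way a browser with a
# manual proxy setting would.
fetch_via_fake_proxy() {
    url="$1"
    proxy="${2:-http://127.0.0.1:80}"
    # -s: no progress meter, -x: send the request through the given proxy
    curl -s -x "$proxy" "$url"
}
```

For example, fetch_via_fake_proxy http://www.pearltrees.com/some/page should print the text produced by the return 200 line, or your local page once the debugging location is commented out again.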

The next server block works as a pass-through proxy so that if pearltrees.com isn't matched we will go to the actual site on the Internet. This isn't strictly necessary for my setup because I'm using palemoon as my main web browser and seamonkey for my fake proxy. However, it adds some convenience and is also useful for other applications (e.g. Internet security).

In this second block the main passthrough proxy is the following block:

Code: Select all

    location ~ ^(.*)$ {
        #return 200 "$scheme//$serv_name1$1";
        resolver 192.168.1.254; #https://stackoverflow.com/questions/17685674/nginx-proxy-pass-with-remote-addr
		proxy_pass "$scheme://$serv_name1$1";
	}
and I'm not sure if the rest of the location blocks in this server block match anything or do anything [3]. I will follow up in this thread once I learn more. The "proxy_pass" directive is what passes the request to the actual server on the web, and to the right of proxy_pass we reconstruct the URL from both the default variables that nginx assigns and the variables assigned via previous regular expressions.

Nginx seems to want one to explicitly specify the DNS resolver rather than using those listed in /etc/resolv.conf. While this is slightly less convenient, it has some added security benefits and also lets us specify alternative addresses for those servers should we want to use a local version of the server instead, or if the server resides on a local intranet rather than the world wide web. It could even be a darknet address (e.g. ".onion" domain).

Nevertheless one can simply pick one of the addresses in resolv.conf and put it to the right of the resolver directive.

Typically in /etc/resolv.conf there will be two DNS resolver addresses. The first one is on the local network: usually it is the router, which typically has an address that starts with "192". The second one is on the Internet. If you select the router as the resolver it will be faster because the router does DNS caching. However, there is some risk that a router could be hacked and used to spoof websites.
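To see which addresses you have to choose from, you can print the nameserver entries from /etc/resolv.conf. A minimal sketch (the function name is my own):

```shell
#!/bin/sh
# Print the DNS servers listed in a resolv.conf-style file, one per line.
list_nameservers() {
    resolv_file="${1:-/etc/resolv.conf}"
    # Each entry looks like "nameserver 192.168.1.254"; print the second field
    awk '$1 == "nameserver" { print $2 }' "$resolv_file"
}
```

Any one of the printed addresses can be placed to the right of the resolver directive.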

Now for the fake proxy to work you would normally [5] set up a manual proxy for one of your browsers. For example in seamonkey go to:
edit -> preferences -> advanced -> proxies
Select "Manual Proxy Configuration"
Specify in the Proxy Field: 127.0.0.1
and Specify port 80

If you use a different address or port, change the lines in your config files that say:

Code: Select all

    listen     127.0.0.1:80;  
to match the manual proxy set-up in your browser.

The IP address must be one that your computer is able to bind to. This will usually be a private IP address or loopback address, unless maybe your gateway to the Internet is set to pass-through.

So in my case I have seamonkey set to the manual proxy. I can still access the local server via palemoon by typing in the URL "127.0.0.1" instead of "www.pearltrees.com". However, the links won't be set up so that they will copy and paste to pearltrees.com properly if I'm not using the fake proxy that I configured in seamonkey.

Starting, stopping and restarting nginx

You can stop and reload nginx directly by typing

Code: Select all

/usr/sbin/nginx -s signal
where signal is one of: stop, quit, reopen or reload. To start nginx directly, run /usr/sbin/nginx with no arguments.

However, most installations will come with a start-up script which will allow you to start nginx at boot. To start nginx at boot, in the puppy menu go to
setup -> Puppy Setup -> Services
and then make sure nginx is checked.

This should be the menu path for most versions of puppylinux but as I type I'm specifically using tahrpup at this time, so one expects that there could be slightly different naming in other versions of puppy.

Note that before you configure nginx to start at boot you should test the start-up script. Go to either:

Code: Select all

/etc/rc.d/init.d
or directly

Code: Select all

/etc/init.d
and type either

Code: Select all

./nginx start
or, from any directory, using the full path

Code: Select all

/etc/init.d/nginx start
In the case of the ubuntu version of the start-up script (at least for tahr) it works if you comment out line 25 of

/etc/init.d/nginx

as follows:

Code: Select all

#. /lib/init/vars.sh
Puppy has an equivalent of these variables but they are not required for this start-up script.
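After running the start-up script it is worth checking that nginx is really listening where the browser expects it. The following is a sketch only: it assumes netstat is available (it usually is, either as a standalone tool or through busybox), and the function name is invented.

```shell
#!/bin/sh
# Succeed if something is listening on the given TCP address:port.
is_listening() {
    addr="${1:-127.0.0.1:80}"
    # netstat -ltn lists listening TCP sockets numerically;
    # column 4 is the local address
    netstat -ltn 2>/dev/null |
        awk -v a="$addr" '$4 == a { found = 1 } END { exit !found }'
}
```

For example: is_listening 127.0.0.1:80 && echo "nginx is up". The comparison is exact, so if your listen line uses a different address or only a port number, pass the address as netstat prints it (e.g. 0.0.0.0:80).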

Debugging

As previously mentioned you can use debugging statements like the following

Code: Select all

return 200 "$scheme://$serv_name1$1";
which is discussed above. Also check your log files for messages, for example:

Code: Select all

/var/log/nginx/access.log
/var/log/nginx/error.log
After each change in the nginx configuration files you must reload or restart the server. The easiest way is to type

Code: Select all

nginx -s reload
but other methods of starting and stopping nginx are noted above. For example one could type

Code: Select all

/etc/init.d/nginx stop
/etc/init.d/nginx start
or alternatively go through the puppy menus to start and stop the service.
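Since a typo in a config file can stop nginx from coming back up, it may be safer to test the configuration before reloading. The sketch below uses the -t option of nginx, which checks the configuration files without applying them; the wrapper function name is my own.

```shell
#!/bin/sh
# Reload nginx only if the edited configuration parses cleanly.
safe_reload() {
    # nginx -t checks the configuration and reports errors
    # without touching the running server
    if nginx -t; then
        nginx -s reload
    else
        echo "configuration test failed; not reloading" >&2
        return 1
    fi
}
```

If the test fails, the error message from nginx -t normally names the file and line number to look at.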



Look Ahead

I want to expand upon these ideas for security applications using DNSCrypt and/or Tor. However, this will be in another thread since it is outside the topic of website development.

Notes
------------------
1. To find the nginx.conf file using the find command one can type

Code: Select all

find / -name 'nginx.conf'
2. The dot "." in the path name for the ~/.packages folder means that the directory is hidden. On Rox there is an eye symbol that one can click on to show hidden files and folders. Also in some file selection menus you can right click and select, "show hidden files". As a final note in a shell or terminal emulator you can use the "-a" option in

Code: Select all

ls -a
to list all files including hidden files.

3. As noted, regular expression locations in nginx server blocks often override other location blocks, because nginx prefers a matching regular expression over an ordinary prefix match. The following location block was created based on how I thought that an http proxy might work:

Code: Select all

    location /www.pearltrees.com/ {
        # trailing slash on alias mirrors the trailing slash on the location
        alias /root/spot/PT/;
        autoindex on;
        allow 127.0.0.1;
        #deny all;
    }
Basically I thought (think) the URL sent to nginx might look like "http://127.0.0.1:80/www.pearltrees.com/..." in the http proxy.

If this location block matches, then the alias directive sets the root for paths following http://127.0.0.1:80/www.pearltrees.com/ to start at the path to the right of the alias directive, in my case /root/spot/PT.

The alias directive is also useful if you want the server to look in folders outside of the www root folder. I'm not sure if nginx automatically translates http proxy requests, and if it does then this block might not be required. I'll have to add some more debugging statements to figure this out.

4. Note that if the first character inside the square brackets is a caret "^", it means to match anything except the characters which follow inside the square brackets. For instance, [^._] means to match anything except the characters "." or "_".

5. As an alternative to setting up a manual proxy on the browser you could configure iptables (i.e. the firewall) to redirect traffic from a particular browser to the proxy. One way to do this is to create separate user IDs for each browser and redirect the traffic for user (say) "seamonkey" to the fake proxy. This will be more secure as it will prevent a browser hijack from reconfiguring the proxy. This is perhaps a good topic for my next post in this thread.
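As a very rough sketch of what such a rule could look like, the function below redirects outgoing port 80 traffic from one local user to the local nginx port using the iptables owner match and REDIRECT target. I have not tested this on puppy: it assumes that your kernel has those iptables modules, that a user account called "seamonkey" exists and runs the browser, and that you run it as root. It only covers plain http.

```shell
#!/bin/sh
# Redirect outgoing HTTP traffic from one local user to the local nginx port.
# Must be run as root; "seamonkey" is a hypothetical account for the browser.
redirect_user_http() {
    browser_user="${1:-seamonkey}"
    proxy_port="${2:-80}"
    iptables -t nat -A OUTPUT -p tcp --dport 80 \
        -m owner --uid-owner "$browser_user" \
        -j REDIRECT --to-ports "$proxy_port"
}
```

Because the rule only matches traffic owned by the browser's user, requests that nginx itself passes on to the real site (running as user "spot") are not redirected back to the proxy.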
