#
# http_tarpit version 1.0
# Stephen Warren 2006/02/24
#
# I, the author of this work, hereby release it into the public domain.
# This applies worldwide.
#
# In case this is not legally possible:
# I grant anyone the right to use this work for any purpose, without any
# conditions, unless such conditions are required by law.
#

"""
Concept:

This module was developed to prevent automated (and rapid) password searching
against a password-protected HTTP resource.

The basic idea is that we detect when the client is being bad (i.e. attempting
to access the protected resource with invalid credentials), and then, after a
certain threshold of badness, we start throttling *all* requests from that
client. This prevents the client from making as many password search attempts,
and hopefully makes them give up and try cracking some other system.

Once a client starts being good again (i.e. by providing the correct
credentials), we stop throttling the client. This prevents too much
interference with valid users who simply fumble their password a large number
of times.

Finally, we ignore any historical status information that's past a certain
age. This way, if a cracker is using a dynamic IP and it gets recycled to a
legitimate user, the throttling will automatically expire after a short
period, and not affect the valid user.
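
As a minimal sketch of this expiry rule (MAX_AGE and is_expired are
illustrative names that do not exist in this module; the one-hour window is
taken from the check in bad_auth_count below):

```python
import time

# Hypothetical constant: history older than this many seconds is ignored.
MAX_AGE = 60 * 60

def is_expired(mtime, now=None):
    # True when a status file last modified at `mtime` is past the window.
    if now is None:
        now = time.time()
    return mtime < now - MAX_AGE
```

Because the file's modification time is refreshed by every logged bad
request, the window is measured from the most recent badness, not the first.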

Notes:

A client is an IP address. Any other scheme would rely on identifying the
client via some unique data present in the request (e.g. a cookie). Any
remotely clueful cracker would simply not send such identifying data in the
request, thus appearing to be a new identity in each request, thus avoiding
the tarpitting completely. Hopefully, nobody attacks the system using any
form of "botnet".

We detect badness/goodness by hooking into the "log" phase of HTTP request
handling (which always runs for all requests, denied or not). The HTTP status
code of the response is looked up in a table of bad and good status codes.

Bad status codes are always processed within the entire URL/directory
hierarchy where the handler is active. We do this to catch a variety of status
codes indicating bad clients, in addition to the basic authorization required
code.

Good status codes are only processed within defined portions of the URL
hierarchy. This ensures that in the following hierarchy:

/                   # http_tarpit enabled here
/password_protected

Accesses to the unprotected root directory can be configured not to reset
the tarpitting history. This prevents an attacker from attempting a few
passwords in /password_protected, then resetting their tarpit status simply
by retrieving an unprotected URL in /.
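
The asymmetry between bad and good status codes can be sketched as follows.
This is an illustration only: classify, BAD and GOOD are hypothetical names,
and the `protected` argument stands in for the per-location
http_tarpit_protected_location option.

```python
# Status codes copied from the bad_codes/good_codes tables in this module.
BAD = frozenset([400, 401, 403, 405, 407])
GOOD = frozenset([200, 201, 204, 206, 207, 304])

def classify(status, protected):
    # Bad codes count everywhere the handler is active.
    if status in BAD:
        return 'record'
    # Good codes only clear history inside a protected location.
    if status in GOOD and protected:
        return 'reset'
    return 'ignore'
```

With this rule, a 200 for an unprotected URL in / is ignored, while a 200
inside /password_protected clears the client's history.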

Badness is recorded in a log file (one per client). Each entry records the
HTTP status code that was designated as bad, along with a timestamp. Log
entries are fixed format (hence fixed size) - the importance of this is
explained later.
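
A small sketch of the fixed-size property, assuming the entry layout used by
loghandler below (3-character status, a space, a 19-character timestamp and
a newline). format_entry and ENTRY_SIZE are hypothetical names used only for
this illustration.

```python
import time

# Assumed entry size: 3 + 1 + 19 + 1 bytes.
ENTRY_SIZE = 24

def format_entry(status, when=None):
    # Build one log entry; `when` is a time tuple, or None for "now".
    if when is None:
        when = time.localtime()
    stamp = time.strftime('%Y/%m/%d %H:%M:%S', when)
    # chr(10) is the newline that terminates the entry.
    return '%3d %s' % (status, stamp) + chr(10)

entry = format_entry(401)
```

Every HTTP status code has three digits, so len(entry) should always equal
ENTRY_SIZE regardless of the status or the time of day.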

The log file is written in append mode, so that the kernel ensures that each
write is atomic, even in the presence of multiple Apache processes.
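
To illustrate the append-mode idea at the system call level, here is a
hedged sketch that writes one whole entry with a single write() on a
descriptor opened with O_APPEND. append_entry is a hypothetical helper, not
part of this module, and the guarantee is an assumption about a local POSIX
filesystem; it may not hold on network filesystems such as NFS.

```python
import os

def append_entry(path, entry):
    # Open for appending, creating the file if it does not exist yet.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT)
    try:
        # One write() call per entry, so entries are not interleaved.
        os.write(fd, entry.encode('ascii'))
    finally:
        os.close(fd)
```

The important design point is that a complete entry goes out in one call;
splitting an entry across several writes would allow another process to
append in between.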

We keep only a limited number of log entries per client, to prevent DOS
(Denial Of Service) attacks via filling up the whole partition with log files.

We implement tarpitting in the "header parser" phase of request processing.
This is late enough that per-directory/location configuration can be specified
in httpd.conf, but early enough to happen before credential verification.

It is important to tarpit before credential verification, to delay requests
that will ultimately fail that verification. If the verification does fail,
the rest of the HTTP request processing is aborted, and hence there's no
other handler phase that this code can hook into.

To implement the tarpitting, http_tarpit checks the client-specific log file
size, and divides it by the fixed size of each log entry. This is the number
of bad requests in recent history. A mapping function is used to derive the
number of seconds to delay the sending of a response to the client.
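
The arithmetic can be sketched as follows. entries_in and delay_for are
hypothetical names for this illustration; the 24-byte entry size and the
thresholds mirror bad_auth_count and sleep_time below.

```python
# Bytes per log entry, per this module's fixed entry format.
ENTRY_SIZE = 24

def entries_in(size_bytes):
    # Round up, so that a partially written entry still counts as one.
    return (size_bytes + ENTRY_SIZE - 1) // ENTRY_SIZE

def delay_for(count):
    # Map a bad-request count to a delay in seconds.
    if count < 4:
        return 0
    if count < 10:
        return 1
    return 5
```

For example, a 96-byte status file holds 4 entries, which maps to a delay of
1 second; a 240-byte file holds 10 entries, which maps to 5 seconds.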

The response delay is implemented via a sleep() call directly inline with the
request processing. This blocks the entire Apache process (or potentially, the
thread) from any form of processing. This is probably not an issue, since the
main Apache process will spawn more processes/threads as needed to handle any
other requests that are received at the same time. Still, one needs to ensure
that the Apache configuration allows a reasonable number of child processes to
be spawned, so as to prevent a DOS attack.

An extension to this module would be to trigger immediate rejection of
requests from clients that persist in being bad. Thus, a few bad requests
would simply be logged, a few more would be tarpitted, and a few more would
trigger immediate rejection of all requests.
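
One possible shape for that escalation, as a sketch only. action_for and
both thresholds are hypothetical; REJECT_AFTER is set to 15 on the
assumption that it should line up with the cap on logged entries in
loghandler below.

```python
# Hypothetical thresholds; neither constant exists in this module.
TARPIT_AFTER = 4
REJECT_AFTER = 15

def action_for(count):
    # Decide how to treat a client with `count` recent bad requests.
    if count >= REJECT_AFTER:
        return 'reject'
    if count >= TARPIT_AFTER:
        return 'tarpit'
    return 'log'
```

A real implementation would presumably act on 'reject' by returning an error
status (for example apache.HTTP_FORBIDDEN) from the header parser handler
instead of sleeping.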

Configuration:

Create a logfile directory:

    cd /var/www/logs
    mkdir http_tarpit
    chown apache:apache http_tarpit
    chmod 750 http_tarpit

Enable mod_python using the following code at the top level of httpd.conf:

    LoadModule python_module modules/mod_python.so

Enable http_tarpit for a given virtual server using this code:

    # NOTE: DO NOT use SetHandler/AddHandler to 'enable' mod_python here
    # It's only required IF you use plain PythonHandler, not the hooks
    # for the other phases. See:
    # http://www.modpython.org/pipermail/mod_python/2005-July/018700.html
    PythonOption http_tarpit_status_path /var/www/logs/http_tarpit
    PythonHeaderParserHandler http_tarpit
    PythonLogHandler http_tarpit
    #PythonDebug On
    PythonAutoReload On
    PythonPath "sys.path+['/var/www/modpython']"

Enable good status code processing using this code in location blocks where
authentication is required:

    PythonOption http_tarpit_protected_location On

Note that "On" is case-sensitive, and the only value that will work.
"""

import os
import os.path
import time
from mod_python import apache

# A "list" of HTTP status codes that indicates the client is being bad.
# It's an associative array, hopefully to enable faster lookup.
bad_codes = {
    400: 1, # Bad Request: RFC 2616 10.4.1 (HTTP/1.1)
    401: 1, # Unauthorized: RFC 2616 10.4.2 (HTTP/1.1)
    403: 1, # Forbidden: RFC 2616 10.4.4 (HTTP/1.1)
    405: 1, # Method Not Allowed: RFC 2616 10.4.6 (HTTP/1.1)
    407: 1  # Proxy Authentication Required: RFC 2616 10.4.8 (HTTP/1.1)
}

# A "list" of HTTP status codes that indicates the client was good.
good_codes = {
    200: 1, # OK: RFC 2616 10.2.1 (HTTP/1.1)
    201: 1, # Created: RFC 2616 10.2.2 (HTTP/1.1)
    204: 1, # No Content: RFC 2616 10.2.5 (HTTP/1.1)
    206: 1, # Partial Content: RFC 2616 10.2.7 (HTTP/1.1)
    207: 1, # Multi-Status: RFC 2518 10.2 (WebDAV)
    304: 1  # Not Modified: RFC 2616 10.3.5 (HTTP/1.1)
}

status_path = None

# Calculate the filename to store stats for a given IP address
def ip_path(req):
    global status_path

    if not status_path:
        o = req.get_options()
        status_path = o['http_tarpit_status_path']
    #

    return os.path.join(status_path, req.connection.remote_ip)
#

# Look at an IP's status file, and determine how bad it's been
# in the recent past
def bad_auth_count(req):
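    f = ip_path(req)
    try:
        st = os.stat(f)
    except OSError:
        # No status file (or another process just removed it): no history
        return 0
    #

    # Older than one hour? Expire it.
    if st.st_mtime < (time.time() - (60 * 60)):
        try:
            os.unlink(f)
        except OSError:
            # Another Apache process expired it first; nothing left to do
            pass
        #
        return 0
    #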

    # 24 == Length of log entry, including \n
    # This is affected by the code in loghandler below
    return (st.st_size + 23) // 24
#

# Map from the number of times an IP has been bad, to the amount of time
# we delay the request.
def sleep_time(req):
    c = bad_auth_count(req)
    if c < 4:
        return 0
    #
    if c < 10:
        return 1
    #
    return 5
#

# This handler gets run once for every single request, very early on in the
# request processing. We detect if the client has been previously bad, and
# if so, sleep some amount of time to slow them down.
def headerparserhandler(req):
    t = sleep_time(req)
    if t:
        time.sleep(t)
    #
    return apache.OK
#

# This handler generates the response text (web page body) sent back to the
# client. It's useful for debugging only, and typically isn't enabled.
def handler(req):
    req.content_type = 'text/html'

    req.write('Bad auth count: %d<br>\nSleep time this request: %d<br>\n'
              % (bad_auth_count(req), sleep_time(req)))

    return apache.OK
#

# This handler gets run once for every single request, no matter what
# happened. The response has already been sent; we simply record any badness
# on the part of the client, or forget badness in the face of goodness.
def loghandler(req):
    # Is the client being bad this request?
    if req.status in bad_codes:
        # And, has the client previously been bad less than 15 times
        # within the log history?
        if bad_auth_count(req) < 15:
            # If so, log this specific instance of badness
            fn = ip_path(req)
            # If you change the size of the data written to the file,
            # Remember to update bad_auth_count for the new log entry size.
            status_str = "%3d " % req.status
            time_str = time.strftime('%Y/%m/%d %H:%M:%S')
            # Write the whole entry in one call, so that the append is a
            # single write, and always close the file afterwards.
            f = open(fn, "a")
            try:
                f.write(status_str + time_str + '\n')
            finally:
                f.close()
            #
        #
    # Otherwise, has the client been good this request?
    elif req.status in good_codes:
        if req.get_options().get('http_tarpit_protected_location', 'Off') == 'On':
            # If so, clear out any badness history
            # They probably just fumbled their password a few times
            fn = ip_path(req)
            try:
                os.unlink(fn)
            except OSError:
                # Nothing to forget, or another process already removed it
                pass
            #
        #
    #

    return apache.OK
#


