A Do-It-Yourself Framework
++++++++++++++++++++++++++

:author: Ian Bicking <ianb@colorstudy.com>
:revision: $Rev$
:date: $LastChangedDate$

This tutorial has been translated `into Portuguese
<http://montegasppa.blogspot.com/2007/06/um-framework-faa-voc-mesmo.html>`_.

A newer version of this article is available `using WebOb
<http://pythonpaste.org/webob/do-it-yourself.html>`_.

.. contents::

.. comments:

   Explain SCRIPT_NAME/PATH_INFO better

Introduction and Audience
=========================

This short tutorial is meant to teach you a little about WSGI and, by
way of example, a bit about the architecture that Paste has enabled
and encourages.

This isn't an introduction to all the parts of Paste -- in fact, we'll
only use a few, and explain each part.  This isn't to encourage
everyone to go off and make their own framework (though honestly I
wouldn't mind).  The goal is that when you have finished reading this
you'll feel more comfortable with some of the frameworks built using
this architecture, and a little more secure that you will understand
the internals if you look under the hood.

What is WSGI?
=============

At its simplest, WSGI is an interface between web servers and web
applications.  We'll explain the mechanics of WSGI below, but a
higher-level view is that WSGI lets code pass around web requests in a
fairly formal way.  But there's more!  WSGI is more than just HTTP.
It might seem like it is just *barely* more than HTTP, but that little
bit is important:

* You pass around a CGI-like environment, which means data like
  ``REMOTE_USER`` (the logged-in username) can be securely passed
  about.

* A CGI-like environment can be passed around with more context --
  specifically, instead of just one path you get two: ``SCRIPT_NAME``
  (how we got here) and ``PATH_INFO`` (what we have left).

* You can -- and often should -- put your own extensions into the WSGI
  environment.  This allows for callbacks, extra information,
  arbitrary Python objects, or whatever you want.  These are things
  you can't put in custom HTTP headers.

This means that WSGI can be used not just between a web server and an
application, but at all levels for communication.  This allows web
applications to become more like libraries -- well encapsulated and
reusable, but still with rich functionality.
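
As a small illustration of that last bullet, here is a sketch of one
layer leaving a Python object in the environment for the next layer to
pick up.  The key ``myframework.visits`` and the ``VisitCounter`` class
are invented for this example, and the code assumes Python 3, where
response bodies are bytes::

    class VisitCounter(object):
        """A plain Python object; not something you could put in a header."""

        def __init__(self):
            self.count = 0

        def hit(self):
            self.count += 1
            return self.count


    def add_counter(wrapped_app, counter):
        # Returns a new WSGI application that stores ``counter`` in the
        # environment before handing the request to ``wrapped_app``.
        def app_with_counter(environ, start_response):
            environ['myframework.visits'] = counter  # hypothetical key
            return wrapped_app(environ, start_response)
        return app_with_counter


    def app(environ, start_response):
        # The inner application just picks the object back up.
        counter = environ['myframework.visits']
        body = ('Visit number %d' % counter.hit()).encode('utf-8')
        start_response('200 OK', [('content-type', 'text/plain')])
        return [body]


    wrapped = add_counter(app, VisitCounter())

Nothing here ever becomes an HTTP header; the counter travels alongside
the request as an ordinary object.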

Writing a WSGI Application
==========================

The first part is about how to use `WSGI
<http://www.python.org/peps/pep-0333.html>`_ at its most basic.  You
can read the spec, but I'll do a very brief summary:

* You will be writing a *WSGI application*.  That's an object that
  responds to requests.  An application is just a callable object
  (like a function) that takes two arguments: ``environ`` and
  ``start_response``.

* The environment looks a lot like a CGI environment, with keys like
  ``REQUEST_METHOD``, ``HTTP_HOST``, etc.

* The environment also has some special keys like ``wsgi.input`` (the
  input stream, like the body of a POST request).

* ``start_response`` is a function that starts the response -- you
  give the status and headers here.

* Lastly, the application returns an iterable with the response body
  (commonly this is just a list of strings, or just a list containing
  one string that is the entire body).

So, here's a simple application::

    def app(environ, start_response):
        start_response('200 OK', [('content-type', 'text/html')])
        return ['Hello world!']

Well... that's unsatisfying.  Sure, you can imagine what it does, but
you can't exactly point your web browser at it.

There are other, cleaner ways to do this, but this tutorial isn't
about *clean*; it's about *easy-to-understand*.  So just add this to
the bottom of your file::

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(app, host='127.0.0.1', port='8080')

Now visit http://localhost:8080 and you should see your new app.
If you want to understand how a WSGI server works, I'd recommend
looking at the `CGI WSGI server
<http://www.python.org/peps/pep-0333.html#the-server-gateway-side>`_
in the WSGI spec.
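
To get a feel for what a server does on each request, you can also play
the server's part by hand.  This is only a sketch: ``call_app`` is a
helper invented here, it leans on the standard library's
``wsgiref.util.setup_testing_defaults`` to fill in a plausible
environment, and it assumes Python 3, where the body must be bytes
(hence ``b'Hello world!'``)::

    from wsgiref.util import setup_testing_defaults


    def app(environ, start_response):
        start_response('200 OK', [('content-type', 'text/html')])
        return [b'Hello world!']


    def call_app(app, path='/'):
        # Build a fake environment, roughly as a server would from a
        # real request.
        environ = {}
        setup_testing_defaults(environ)
        environ['PATH_INFO'] = path
        result = {}

        def start_response(status, headers, exc_info=None):
            # A real server would send these; we just remember them.
            result['status'] = status
            result['headers'] = headers

        app_iter = app(environ, start_response)
        try:
            body = b''.join(app_iter)
        finally:
            # The spec asks servers to call close() if the iterable
            # has one.
            if hasattr(app_iter, 'close'):
                app_iter.close()
        return result['status'], result['headers'], body


    status, headers, body = call_app(app)
    print(status, headers, body)

Calling an application this way is also a handy way to test it without
starting a server at all.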

An Interactive App
------------------

That last app wasn't very interesting.  Let's at least make it
interactive.  To do that we'll present a form, and then parse the form
fields::

    from paste.request import parse_formvars

    def app(environ, start_response):
        fields = parse_formvars(environ)
        if environ['REQUEST_METHOD'] == 'POST':
            start_response('200 OK', [('content-type', 'text/html')])
            return ['Hello, ', fields['name'], '!']
        else:
            start_response('200 OK', [('content-type', 'text/html')])
            return ['<form method="POST">Name: <input type="text" '
                    'name="name"><input type="submit"></form>']

The ``parse_formvars`` function just takes the WSGI environment and
calls the `cgi <http://python.org/doc/current/lib/module-cgi.html>`_
module (the ``FieldStorage`` class) and turns that into a
``MultiDict``.
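
If you're curious what that parsing amounts to, here's a rough stand-in
that uses only the standard library.  It's a sketch for illustration,
not how Paste does it: ``simple_formvars`` is a made-up name, it only
understands URL-encoded forms (no file uploads), it returns a plain
dictionary of lists rather than a ``MultiDict``, and it assumes
Python 3::

    import io
    from urllib.parse import parse_qs


    def simple_formvars(environ):
        # Variables from the query string (everything after the "?"):
        fields = parse_qs(environ.get('QUERY_STRING', ''))
        if environ.get('REQUEST_METHOD') == 'POST':
            # The POST body lives in wsgi.input; CONTENT_LENGTH says
            # how much of it we are allowed to read.
            length = int(environ.get('CONTENT_LENGTH') or 0)
            body = environ['wsgi.input'].read(length).decode('utf-8')
            for key, values in parse_qs(body).items():
                fields.setdefault(key, []).extend(values)
        return fields


    # A hand-built environment standing in for a submitted form:
    post_body = b'name=Ian'
    environ = {
        'REQUEST_METHOD': 'POST',
        'QUERY_STRING': 'lang=en',
        'CONTENT_LENGTH': str(len(post_body)),
        'wsgi.input': io.BytesIO(post_body),
    }
    fields = simple_formvars(environ)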

Now For a Framework
===================

Now, this probably feels a bit crude.  After all, we're testing for
things like ``REQUEST_METHOD`` to handle more than one thing, and it's
unclear how you can have more than one page.

We want to build a framework, which is just a kind of generic
application.  In this tutorial we'll implement an *object publisher*,
which is something you may have seen in Zope, Quixote, or CherryPy.

Object Publishing
-----------------

In a typical Python object publisher you translate ``/`` to ``.``.  So
``/articles/view?id=5`` turns into ``root.articles.view(id=5)``.  We
have to start with some root object, of course, which we'll pass in...

::

    class ObjectPublisher(object):

        def __init__(self, root):
            self.root = root

        def __call__(self, environ, start_response):
            ...

    app = ObjectPublisher(my_root_object)

We define ``__call__`` to make instances of ``ObjectPublisher``
callable objects, just like a function, and just like WSGI
applications.  Now all we have to do is translate that ``environ``
into the thing we are publishing, then call that thing, then turn the
response into what WSGI wants.

The Path
--------

WSGI puts the requested path into two variables: ``SCRIPT_NAME`` and
``PATH_INFO``.  ``SCRIPT_NAME`` is everything that was used up
*getting here*.  ``PATH_INFO`` is everything left over -- it's
the part the framework should be using to find the object.  If you put
the two back together, you get the full path used to get to where we
are right now; this is very useful for generating correct URLs, and
we'll make sure we preserve this.
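
To watch that bookkeeping on its own, here's a small sketch using
``wsgiref.util.shift_path_info`` from the standard library, which moves
one path segment at a time from ``PATH_INFO`` over to ``SCRIPT_NAME``.
The starting path is made up for the example::

    from wsgiref.util import shift_path_info

    environ = {'SCRIPT_NAME': '', 'PATH_INFO': '/articles/view'}
    steps = []
    while environ['PATH_INFO'] not in ('', '/'):
        # Returns the segment it consumed and updates environ in place.
        name = shift_path_info(environ)
        steps.append((name, environ['SCRIPT_NAME'], environ['PATH_INFO']))

    for step in steps:
        print(step)

At every step ``SCRIPT_NAME + PATH_INFO`` is still the full original
path, which is the property we want to preserve.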

So here's how we might implement ``__call__``::

    def __call__(self, environ, start_response):
        fields = parse_formvars(environ)
        obj = self.find_object(self.root, environ)
        response_body = obj(**fields.mixed())
        start_response('200 OK', [('content-type', 'text/html')])
        return [response_body]

    def find_object(self, obj, environ):
        path_info = environ.get('PATH_INFO', '')
        if not path_info or path_info == '/':
            # We've arrived!
            return obj
        # PATH_INFO always starts with a /, so we'll get rid of it:
        path_info = path_info.lstrip('/')
        # Then split the path into the "next" chunk, and everything
        # after it ("rest"):
        parts = path_info.split('/', 1)
        next = parts[0]
        if len(parts) == 1:
            rest = ''
        else:
            rest = '/' + parts[1]
        # Hide private methods/attributes:
        assert not next.startswith('_')
        # Now we get the attribute; getattr(a, 'b') is equivalent
        # to a.b...
        next_obj = getattr(obj, next)
        # Now fix up SCRIPT_NAME and PATH_INFO...
        environ['SCRIPT_NAME'] += '/' + next
        environ['PATH_INFO'] = rest
        # and now parse the remaining part of the URL...
        return self.find_object(next_obj, environ)

And that's it, we've got a framework.

Taking It For a Ride
--------------------

Now, let's write a little application.  Put that ``ObjectPublisher``
class into a module ``objectpub``::

    from objectpub import ObjectPublisher

    class Root(object):

        # The "index" method:
        def __call__(self):
            return '''
            <form action="welcome">
            Name: <input type="text" name="name">
            <input type="submit">
            </form>
            '''

        def welcome(self, name):
            return 'Hello %s!' % name

    app = ObjectPublisher(Root())

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(app, host='127.0.0.1', port='8080')

Alright, done!  Oh, wait.  There are still some big missing features,
like how do you set headers?  And instead of giving ``404 Not Found``
responses in some places, you'll just get an attribute error.  We'll
fix those up in a later installment...
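
In the meantime, here's one way the ``404`` half of that could look.
Treat it as a sketch rather than the eventual fix: ``find_or_none`` and
``SafePublisher`` are names made up for this example, form and
query-string parsing are left out to keep it short, and the code
assumes Python 3 (so bodies are bytes)::

    def find_or_none(root, path_info):
        # Walk the path one segment at a time; give up (return None)
        # on private names or missing attributes instead of raising.
        obj = root
        for name in path_info.strip('/').split('/'):
            if not name:
                continue
            if name.startswith('_') or not hasattr(obj, name):
                return None
            obj = getattr(obj, name)
        return obj


    class SafePublisher(object):

        def __init__(self, root):
            self.root = root

        def __call__(self, environ, start_response):
            obj = find_or_none(self.root, environ.get('PATH_INFO', ''))
            if obj is None or not callable(obj):
                start_response('404 Not Found',
                               [('content-type', 'text/plain')])
                return [b'Not Found']
            start_response('200 OK', [('content-type', 'text/html')])
            return [obj().encode('utf-8')]


    class Root(object):

        def __call__(self):
            return 'index page'

        def about(self):
            return 'about page'


    app = SafePublisher(Root())

Unlike ``find_object`` above, this version doesn't update
``SCRIPT_NAME`` and ``PATH_INFO`` as it goes; a fuller version would
keep doing that.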

Give Me More!
-------------

You'll notice some things are missing here.  Most specifically,
there's no way to set the output headers, and the information on the
request is a little slim.

::

    # This is just a dictionary-like object that has case-
    # insensitive keys:
    from paste.response import HeaderDict

    class Request(object):
        def __init__(self, environ):
            self.environ = environ
            self.fields = parse_formvars(environ)

    class Response(object):
        def __init__(self):
            self.headers = HeaderDict(
                {'content-type': 'text/html'})

Now I'll teach you a little trick.  We don't want to change the
signature of the methods.  But we can't put the request and response
objects in normal global variables, because we want to be
thread-friendly, and all threads see the same global variables (even
if they are processing different requests).

But Python 2.4 introduced a concept of "thread-local values".  That's
a value that just this one thread can see.  This is in the
`threading.local <http://docs.python.org/lib/module-threading.html>`_
object.  When you create an instance of ``local`` any attributes you
set on that object can only be seen by the thread you set them in.  So
we'll attach the request and response objects here.
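
A quick way to convince yourself of this: in the sketch below (all
names are made up for the example, and ``threading.Barrier`` needs
Python 3) two threads each store a value on the same ``local`` object,
wait until both have done so, and then read back only what they stored
themselves::

    import threading

    local_data = threading.local()
    barrier = threading.Barrier(2)
    seen = {}


    def handle(name):
        # Each thread sets the attribute on the *same* object...
        local_data.request = 'request for %s' % name
        # ...waits until the other thread has set it too...
        barrier.wait()
        # ...and still reads back only the value it set itself.
        seen[name] = local_data.request


    threads = [threading.Thread(target=handle, args=(n,))
               for n in ('alice', 'bob')]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The main thread never set the attribute, so it doesn't have one.
    main_has_request = hasattr(local_data, 'request')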

So, let's remind ourselves of what the ``__call__`` method looked
like::

    class ObjectPublisher(object):
        ...

        def __call__(self, environ, start_response):
            fields = parse_formvars(environ)
            obj = self.find_object(self.root, environ)
            response_body = obj(**fields.mixed())
            start_response('200 OK', [('content-type', 'text/html')])
            return [response_body]

Let's update that::

    import threading
    webinfo = threading.local()

    class ObjectPublisher(object):
        ...

        def __call__(self, environ, start_response):
            webinfo.request = Request(environ)
            webinfo.response = Response()
            obj = self.find_object(self.root, environ)
            response_body = obj(**dict(webinfo.request.fields))
            start_response('200 OK', webinfo.response.headers.items())
            return [response_body]

Now in our method we might do::

    class Root(object):
        def rss(self):
            webinfo.response.headers['content-type'] = 'text/xml'
            ...

If we were being fancier we would do things like handle `cookies
<http://python.org/doc/current/lib/module-Cookie.html>`_ in these
objects.  But we aren't going to do that now.  You have a framework,
be happy!

WSGI Middleware
===============

`Middleware
<http://www.python.org/peps/pep-0333.html#middleware-components-that-play-both-sides>`_
is where people get a little intimidated by WSGI and Paste.

What is middleware?  Middleware is software that serves as an
intermediary.  So let's write one.  We'll write an authentication
middleware, so that you can keep your greeting from being seen by just
anyone.

Let's use HTTP authentication, which also can mystify people a bit.
HTTP authentication is fairly simple:

* When authentication is required, we give a ``401 Authentication
  Required`` status with a ``WWW-Authenticate: Basic realm="This
  Realm"`` header

* The client then sends back a header ``Authorization: Basic
  encoded_info``

* The "encoded_info" is a base-64 encoded version of
  ``username:password``
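
To make the encoding step concrete, here's a sketch of both directions
using the standard ``base64`` module.  The function names and the
credentials are made up for the example, and it assumes Python 3, where
the base-64 functions work on bytes::

    import base64


    def make_basic_header(username, password):
        # What the client sends as the value of the Authorization
        # header.
        raw = ('%s:%s' % (username, password)).encode('utf-8')
        return 'Basic ' + base64.b64encode(raw).decode('ascii')


    def parse_basic_header(header):
        # Returns (username, password), or None if it isn't Basic auth.
        auth_type, _, encoded = header.partition(' ')
        if auth_type.lower() != 'basic':
            return None
        decoded = base64.b64decode(encoded.strip()).decode('utf-8')
        # Only split on the first colon; passwords may contain colons.
        username, _, password = decoded.partition(':')
        return username, password


    header = make_basic_header('Aladdin', 'open sesame')
    print(header)

Note that base-64 is an encoding, not encryption: anyone who sees the
header can recover the password.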

So how does this work?  Well, we're writing "middleware", which means
we'll typically pass the request on to another application.  We could
change the request, or change the response, but in this case sometimes
we *won't* pass the request on (like, when we need to give that 401
response).

To give an example of a really really simple middleware, here's one
that capitalizes the response::

    class Capitalizer(object):

        # We generally pass in the application to be wrapped to
        # the middleware constructor:
        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            # We call the application we are wrapping with the
            # same arguments we get...
            response_iter = self.wrap_app(environ, start_response)
            # then change the response...
            response_string = ''.join(response_iter)
            return [response_string.upper()]

Technically this isn't quite right, because there are two ways to
return the response body (the iterable the application returns, and
the ``write`` callable that ``start_response`` returns), but we're
skimming over some details.
`paste.wsgilib.intercept_output
<http://pythonpaste.org/module-paste.wsgilib.html#intercept_output>`_
is a somewhat more thorough implementation of this.
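
For the curious, here is roughly what a ``Capitalizer`` that catches
both kinds of output involves.  This is a sketch under a few
assumptions, not what ``intercept_output`` actually does: it assumes
Python 3 (bodies are bytes), it buffers the whole response in memory,
and ``old_style_app`` is a made-up application that uses both ways at
once::

    class Capitalizer(object):

        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            captured = []   # the status, headers, exc_info the app gave
            written = []    # anything sent through the write() callable

            def fake_start_response(status, headers, exc_info=None):
                captured[:] = [status, headers, exc_info]
                # start_response must return a write(data) callable:
                return written.append

            app_iter = self.wrap_app(environ, fake_start_response)
            try:
                # Consume the iterable first; a generator application
                # may only call start_response once we start iterating.
                chunks = list(app_iter)
            finally:
                if hasattr(app_iter, 'close'):
                    app_iter.close()
            status, headers, exc_info = captured
            start_response(status, headers, exc_info)
            return [b''.join(written + chunks).upper()]


    def old_style_app(environ, start_response):
        write = start_response('200 OK',
                               [('content-type', 'text/plain')])
        write(b'hello, ')
        return [b'world']


    app = Capitalizer(old_style_app)

The price is that the real ``start_response`` is held back until the
wrapped application has produced its entire body.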

.. note::

   This, like a lot of parts of this (now fairly old) tutorial, is
   better, more thorough, and easier using `WebOb
   <http://pythonpaste.org/webob/>`_.  This particular example looks
   like::

       from webob import Request

       class Capitalizer(object):
           def __init__(self, app):
               self.app = app
           def __call__(self, environ, start_response):
               req = Request(environ)
               resp = req.get_response(self.app)
               resp.body = resp.body.upper()
               return resp(environ, start_response)

So here's some code that does something useful, authentication::

    class AuthMiddleware(object):

        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            if not self.authorized(environ.get('HTTP_AUTHORIZATION')):
                # Essentially self.auth_required is a WSGI application
                # that only knows how to respond with 401...
                return self.auth_required(environ, start_response)
            # But if everything is okay, then pass everything through
            # to the application we are wrapping...
            return self.wrap_app(environ, start_response)

        def authorized(self, auth_header):
            if not auth_header:
                # If they didn't give a header, they better login...
                return False
            # .split(None, 1) means split in two parts on whitespace:
            auth_type, encoded_info = auth_header.split(None, 1)
            if auth_type.lower() != 'basic':
                # Some other scheme; treat it as not logged in
                # rather than failing with an AssertionError:
                return False
            unencoded_info = encoded_info.decode('base64')
            username, password = unencoded_info.split(':', 1)
            return self.check_password(username, password)

        def check_password(self, username, password):
            # Not very high security authentication...
            return username == password

        def auth_required(self, environ, start_response):
            start_response('401 Authentication Required',
                [('Content-type', 'text/html'),
                 ('WWW-Authenticate', 'Basic realm="this realm"')])
            return ["""
            <html>
             <head><title>Authentication Required</title></head>
             <body>
              <h1>Authentication Required</h1>
              If you can't get in, then stay out.
             </body>
            </html>"""]

.. note::

   Again, here's the same thing with WebOb::

       from webob import Request, Response

       class AuthMiddleware(object):
           def __init__(self, app):
               self.app = app
           def __call__(self, environ, start_response):
               req = Request(environ)
               # .get() so a missing header gives None, not a KeyError:
               if not self.authorized(req.headers.get('authorization')):
                   resp = self.auth_required(req)
               else:
                   resp = self.app
               return resp(environ, start_response)
           def authorized(self, header):
               if not header:
                   return False
               auth_type, encoded = header.split(None, 1)
               if not auth_type.lower() == 'basic':
                   return False
               username, password = encoded.decode('base64').split(':', 1)
               return self.check_password(username, password)
           def check_password(self, username, password):
               return username == password
           def auth_required(self, req):
               return Response(
                   status=401,
                   headers={'WWW-Authenticate': 'Basic realm="this realm"'},
                   body="""\
               <html>
                <head><title>Authentication Required</title></head>
                <body>
                 <h1>Authentication Required</h1>
                 If you can't get in, then stay out.
                </body>
               </html>""")

So, how do we use this?

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(wrapped_app, host='127.0.0.1', port='8080')

Now you have middleware!  Hurrah!
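
If you'd rather check the wrapping without a browser, you can call the
wrapped application directly.  The sketch below is self-contained, so
it carries a condensed Python 3 variant of the middleware (using the
``base64`` module, since ``str.decode('base64')`` is Python 2 only);
``inner_app`` and ``status_for`` are stand-ins made up for the
example::

    import base64


    def inner_app(environ, start_response):
        start_response('200 OK', [('content-type', 'text/plain')])
        return [b'secret greeting']


    class AuthMiddleware(object):

        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            if not self.authorized(environ.get('HTTP_AUTHORIZATION')):
                start_response(
                    '401 Authentication Required',
                    [('content-type', 'text/plain'),
                     ('WWW-Authenticate', 'Basic realm="this realm"')])
                return [b'Authentication Required']
            return self.wrap_app(environ, start_response)

        def authorized(self, auth_header):
            if not auth_header:
                return False
            auth_type, _, encoded = auth_header.partition(' ')
            if auth_type.lower() != 'basic':
                return False
            try:
                decoded = base64.b64decode(encoded.strip()).decode('utf-8')
            except ValueError:
                # Not valid base-64 or not valid UTF-8.
                return False
            username, _, password = decoded.partition(':')
            # Same toy rule as above: the password is the username.
            return username == password


    wrapped_app = AuthMiddleware(inner_app)


    def status_for(header=None):
        # Call the wrapped app with a minimal environment and report
        # the status line it answered with.
        environ = {}
        if header is not None:
            environ['HTTP_AUTHORIZATION'] = header
        statuses = []

        def start_response(status, headers, exc_info=None):
            statuses.append(status)

        wrapped_app(environ, start_response)
        return statuses[0]


    good = 'Basic ' + base64.b64encode(b'ian:ian').decode('ascii')
    bad = 'Basic ' + base64.b64encode(b'ian:wrong').decode('ascii')
    print(status_for(), status_for(bad), status_for(good))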

Give Me More Middleware!
------------------------

It's even easier to use other people's middleware than to make your
own, because then you don't have to program.  If you've been following
along, you've probably encountered a few exceptions, and had to look
at the console to see the exception reports.  Let's make that a little
easier, and show the exceptions in the browser...

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)
    from paste.exceptions.errormiddleware import ErrorMiddleware
    exc_wrapped_app = ErrorMiddleware(wrapped_app)

Easy!  But let's make it *more* fancy...

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)
    from paste.evalexception import EvalException
    exc_wrapped_app = EvalException(wrapped_app)

So go make an error now.  And hit the little +'s.  And type stuff
into the boxes.
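
If you're wondering what an error-catching middleware does underneath,
here's a bare-bones sketch.  It is not Paste's ``ErrorMiddleware``
(which does a great deal more); ``ShowErrors`` and ``broken_app`` are
names invented for the example, it only catches exceptions raised
before the application returns, and it assumes Python 3::

    import sys
    import traceback


    class ShowErrors(object):

        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            try:
                return self.wrap_app(environ, start_response)
            except Exception:
                # Turn the traceback into the response body.  Passing
                # exc_info lets the server cope if the application had
                # already called start_response.
                report = traceback.format_exc()
                start_response('500 Internal Server Error',
                               [('content-type', 'text/plain')],
                               sys.exc_info())
                return [report.encode('utf-8')]


    def broken_app(environ, start_response):
        raise ValueError('something went wrong')


    app = ShowErrors(broken_app)

Showing tracebacks to visitors is fine while developing, but not
something to leave switched on for a public site.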

Conclusion
==========

Now you've created your framework and application (I'm sure it's much
nicer than the one I've given so far).  You might keep writing it
(many people have so far), but even if you don't, you should be able
to recognize these components in other frameworks now, and you'll have
a better understanding of how they probably work under the covers.

Also check out the version of this tutorial written `using WebOb
<http://pythonpaste.org/webob/do-it-yourself.html>`_.  That tutorial
includes things like **testing** and **pattern-matching dispatch**
(instead of object publishing).