A Do-It-Yourself Framework
++++++++++++++++++++++++++

:author: Ian Bicking <ianb@colorstudy.com>
:revision: $Rev$
:date: $LastChangedDate$

This tutorial has been translated `into Portuguese
<http://montegasppa.blogspot.com/2007/06/um-framework-faa-voc-mesmo.html>`_.

A newer version of this article is available `using WebOb
<http://pythonpaste.org/webob/do-it-yourself.html>`_.

.. contents::

.. comments:

   Explain SCRIPT_NAME/PATH_INFO better

Introduction and Audience
=========================

This short tutorial is meant to teach you a little about WSGI, and as
an example a bit about the architecture that Paste has enabled and
encourages.

This isn't an introduction to all the parts of Paste -- in fact, we'll
only use a few, and explain each part. This isn't to encourage
everyone to go off and make their own framework (though honestly I
wouldn't mind). The goal is that when you have finished reading this
you feel more comfortable with some of the frameworks built using this
architecture, and a little more secure that you will understand the
internals if you look under the hood.

What is WSGI?
=============

At its simplest WSGI is an interface between web servers and web
applications. We'll explain the mechanics of WSGI below, but a higher
level view is to say that WSGI lets code pass around web requests in a
fairly formal way. But there's more! WSGI is more than just HTTP.
It might seem like it is just *barely* more than HTTP, but that little
bit is important:

* You pass around a CGI-like environment, which means data like
  ``REMOTE_USER`` (the logged-in username) can be securely passed
  about.

* A CGI-like environment can be passed around with more context --
  specifically instead of just one path you get two: ``SCRIPT_NAME``
  (how we got here) and ``PATH_INFO`` (what we have left).

* You can -- and often should -- put your own extensions into the WSGI
  environment. This allows for callbacks, extra information,
  arbitrary Python objects, or whatever you want. These are things
  you can't put in custom HTTP headers.

This means that WSGI can be used not just between a web server and an
application, but can be used at all levels for communication. This
allows web applications to become more like libraries -- well
encapsulated and reusable, but still with rich reusable functionality.

Writing a WSGI Application
==========================

The first part is about how to use `WSGI
<http://www.python.org/peps/pep-0333.html>`_ at its most basic. You
can read the spec, but I'll do a very brief summary:

* You will be writing a *WSGI application*. That's an object that
  responds to requests. An application is just a callable object
  (like a function) that takes two arguments: ``environ`` and
  ``start_response``.

* The environment looks a lot like a CGI environment, with keys like
  ``REQUEST_METHOD``, ``HTTP_HOST``, etc.

* The environment also has some special keys like ``wsgi.input`` (the
  input stream, like the body of a POST request).

* ``start_response`` is a function that starts the response -- you
  give the status and headers here.

* Lastly the application returns an iterable with the response body
  (commonly this is just a list of strings, or just a list containing
  one string that is the entire body).

So, here's a simple application::

    def app(environ, start_response):
        start_response('200 OK', [('content-type', 'text/html')])
        return ['Hello world!']

Well... that's unsatisfying. Sure, you can imagine what it does, but
you can't exactly point your web browser at it.

There are other, cleaner ways to do this, but this tutorial isn't about
*clean*; it's about *easy-to-understand*.
So just add this to the
bottom of your file::

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(app, host='127.0.0.1', port='8080')

Now visit http://localhost:8080 and you should see your new app.
If you want to understand how a WSGI server works, I'd recommend
looking at the `CGI WSGI server
<http://www.python.org/peps/pep-0333.html#the-server-gateway-side>`_
in the WSGI spec.

An Interactive App
------------------

That last app wasn't very interesting. Let's at least make it
interactive. To do that we'll give a form, and then parse the form
fields::

    from paste.request import parse_formvars

    def app(environ, start_response):
        fields = parse_formvars(environ)
        if environ['REQUEST_METHOD'] == 'POST':
            start_response('200 OK', [('content-type', 'text/html')])
            return ['Hello, ', fields['name'], '!']
        else:
            start_response('200 OK', [('content-type', 'text/html')])
            return ['<form method="POST">Name: <input type="text" '
                    'name="name"><input type="submit"></form>']

The ``parse_formvars`` function just takes the WSGI environment and
calls the `cgi <http://python.org/doc/current/lib/module-cgi.html>`_
module (the ``FieldStorage`` class) and turns that into a MultiDict.

Now For a Framework
===================

Now, this probably feels a bit crude. After all, we're testing for
things like REQUEST_METHOD to handle more than one thing, and it's
unclear how you can have more than one page.

We want to build a framework, which is just a kind of generic
application. In this tutorial we'll implement an *object publisher*,
which is something you may have seen in Zope, Quixote, or CherryPy.

Object Publishing
-----------------

In a typical Python object publisher you translate ``/`` to ``.``. So
``/articles/view?id=5`` turns into ``root.articles.view(id=5)``.
We
have to start with some root object, of course, which we'll pass in...

::

    class ObjectPublisher(object):

        def __init__(self, root):
            self.root = root

        def __call__(self, environ, start_response):
            ...

    app = ObjectPublisher(my_root_object)

We override ``__call__`` to make instances of ``ObjectPublisher``
callable objects, just like a function, and just like WSGI
applications. Now all we have to do is translate that ``environ``
into the thing we are publishing, then call that thing, then turn the
response into what WSGI wants.

The Path
--------

WSGI puts the requested path into two variables: ``SCRIPT_NAME`` and
``PATH_INFO``. ``SCRIPT_NAME`` is everything that was used up
*getting here*. ``PATH_INFO`` is everything left over -- it's
the part the framework should be using to find the object. If you put
the two back together, you get the full path used to get to where we
are right now; this is very useful for generating correct URLs, and
we'll make sure we preserve this.

So here's how we might implement ``__call__``::

    def __call__(self, environ, start_response):
        fields = parse_formvars(environ)
        obj = self.find_object(self.root, environ)
        response_body = obj(**fields.mixed())
        start_response('200 OK', [('content-type', 'text/html')])
        return [response_body]

    def find_object(self, obj, environ):
        path_info = environ.get('PATH_INFO', '')
        if not path_info or path_info == '/':
            # We've arrived!
            return obj
        # PATH_INFO always starts with a /, so we'll get rid of it:
        path_info = path_info.lstrip('/')
        # Then split the path into the "next" chunk, and everything
        # after it ("rest"):
        parts = path_info.split('/', 1)
        next = parts[0]
        if len(parts) == 1:
            rest = ''
        else:
            rest = '/' + parts[1]
        # Hide private methods/attributes:
        assert not next.startswith('_')
        # Now we get the attribute; getattr(a, 'b') is equivalent
        # to a.b...
        next_obj = getattr(obj, next)
        # Now fix up SCRIPT_NAME and PATH_INFO...
        environ['SCRIPT_NAME'] += '/' + next
        environ['PATH_INFO'] = rest
        # and now parse the remaining part of the URL...
        return self.find_object(next_obj, environ)

And that's it, we've got a framework.

Taking It For a Ride
--------------------

Now, let's write a little application. Put that ``ObjectPublisher``
class into a module ``objectpub``::

    from objectpub import ObjectPublisher

    class Root(object):

        # The "index" method:
        def __call__(self):
            return '''
            <form action="welcome">
            Name: <input type="text" name="name">
            <input type="submit">
            </form>
            '''

        def welcome(self, name):
            return 'Hello %s!' % name

    app = ObjectPublisher(Root())

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(app, host='127.0.0.1', port='8080')

Alright, done! Oh, wait. There are still some big missing features,
like how do you set headers? And instead of giving ``404 Not Found``
responses in some places, you'll just get an attribute error. We'll
fix those up in a later installment...

Give Me More!
-------------

You'll notice some things are missing here. Most specifically,
there's no way to set the output headers, and the information on the
request is a little slim.

::

    # This is just a dictionary-like object that has case-
    # insensitive keys:
    from paste.response import HeaderDict
    from paste.request import parse_formvars

    class Request(object):
        def __init__(self, environ):
            self.environ = environ
            self.fields = parse_formvars(environ)

    class Response(object):
        def __init__(self):
            self.headers = HeaderDict(
                {'content-type': 'text/html'})

Now I'll teach you a little trick. We don't want to change the
signature of the methods. But we can't put the request and response
objects in normal global variables, because we want to be
thread-friendly, and all threads see the same global variables (even
if they are processing different requests).

But Python 2.4 introduced a concept of "thread-local values". That's
a value that just this one thread can see. This is in the
`threading.local <http://docs.python.org/lib/module-threading.html>`_
object. When you create an instance of ``local`` any attributes you
set on that object can only be seen by the thread you set them in. So
we'll attach the request and response objects here.

So, let's remind ourselves of what the ``__call__`` function looked
like::

    class ObjectPublisher(object):
        ...

        def __call__(self, environ, start_response):
            fields = parse_formvars(environ)
            obj = self.find_object(self.root, environ)
            response_body = obj(**fields.mixed())
            start_response('200 OK', [('content-type', 'text/html')])
            return [response_body]

Let's update that::

    import threading
    webinfo = threading.local()

    class ObjectPublisher(object):
        ...

        def __call__(self, environ, start_response):
            webinfo.request = Request(environ)
            webinfo.response = Response()
            obj = self.find_object(self.root, environ)
            response_body = obj(**dict(webinfo.request.fields))
            start_response('200 OK', webinfo.response.headers.items())
            return [response_body]

Now in our method we might do::

    class Root:
        def rss(self):
            webinfo.response.headers['content-type'] = 'text/xml'
            ...

If we were being fancier we would do things like handle `cookies
<http://python.org/doc/current/lib/module-Cookie.html>`_ in these
objects. But we aren't going to do that now. You have a framework,
be happy!

WSGI Middleware
===============

`Middleware
<http://www.python.org/peps/pep-0333.html#middleware-components-that-play-both-sides>`_
is where people get a little intimidated by WSGI and Paste.

What is middleware? Middleware is software that serves as an
intermediary.

So let's write one. We'll write an authentication middleware, so that
you can keep your greeting from being seen by just anyone.

Let's use HTTP authentication, which also can mystify people a bit.
HTTP authentication is fairly simple:

* When authentication is required, we give a ``401 Authentication
  Required`` status with a ``WWW-Authenticate: Basic realm="This
  Realm"`` header

* The client then sends back a header ``Authorization: Basic
  encoded_info``

* The "encoded_info" is a base-64 encoded version of
  ``username:password``

So how does this work? Well, we're writing "middleware", which means
we'll typically pass the request on to another application. We could
change the request, or change the response, but in this case sometimes
we *won't* pass the request on (like, when we need to give that 401
response).
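
To make the three steps above concrete, here is a minimal sketch of
both sides of the Basic exchange, using only the standard library. It
is written for Python 3 (unlike the Python 2 examples in the rest of
this tutorial), and the helper names ``make_basic_auth_header`` and
``parse_basic_auth_header`` are made up for illustration; they aren't
part of Paste or WSGI:

```python
import base64


def make_basic_auth_header(username, password):
    """Build the Authorization header value a client would send."""
    credentials = '%s:%s' % (username, password)
    encoded = base64.b64encode(credentials.encode('utf-8')).decode('ascii')
    return 'Basic ' + encoded


def parse_basic_auth_header(header):
    """Return (username, password), or None if the header is unusable."""
    if not header:
        return None
    # Split into the scheme ("Basic") and the encoded credentials:
    parts = header.split(None, 1)
    if len(parts) != 2 or parts[0].lower() != 'basic':
        return None
    try:
        decoded = base64.b64decode(parts[1]).decode('utf-8')
    except (ValueError, UnicodeDecodeError):
        # Not valid base-64, or not valid text once decoded
        return None
    if ':' not in decoded:
        return None
    # Only the first colon separates the two; passwords may contain colons
    username, password = decoded.split(':', 1)
    return username, password


if __name__ == '__main__':
    header = make_basic_auth_header('guido', 'secret')
    print(header)
    print(parse_basic_auth_header(header))
```

The parsing half is roughly what an authentication middleware has to
do with the ``HTTP_AUTHORIZATION`` value it finds in the WSGI
environment. Keep in mind that base-64 is an encoding, not encryption,
so anyone who can see the header can read the password.
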

To give an example of a really really simple middleware, here's one
that capitalizes the response::

    class Capitalizer(object):

        # We generally pass in the application to be wrapped to
        # the middleware constructor:
        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            # We call the application we are wrapping with the
            # same arguments we get...
            response_iter = self.wrap_app(environ, start_response)
            # then change the response...
            response_string = ''.join(response_iter)
            return [response_string.upper()]

Technically this isn't quite right, because there are two ways to
return the response body, but we're skimming over those bits.
`paste.wsgilib.intercept_output
<http://pythonpaste.org/module-paste.wsgilib.html#intercept_output>`_
is a somewhat more thorough implementation of this.

.. note::

   This, like a lot of parts of this (now fairly old) tutorial, is
   better, more thorough, and easier using `WebOb
   <http://pythonpaste.org/webob/>`_. This particular example looks
   like::

       from webob import Request

       class Capitalizer(object):
           def __init__(self, app):
               self.app = app
           def __call__(self, environ, start_response):
               req = Request(environ)
               resp = req.get_response(self.app)
               resp.body = resp.body.upper()
               return resp(environ, start_response)

So here's some code that does something useful, authentication::

    class AuthMiddleware(object):

        def __init__(self, wrap_app):
            self.wrap_app = wrap_app

        def __call__(self, environ, start_response):
            if not self.authorized(environ.get('HTTP_AUTHORIZATION')):
                # Essentially self.auth_required is a WSGI application
                # that only knows how to respond with 401...
                return self.auth_required(environ, start_response)
            # But if everything is okay, then pass everything through
            # to the application we are wrapping...
            return self.wrap_app(environ, start_response)

        def authorized(self, auth_header):
            if not auth_header:
                # If they didn't give a header, they better login...
                return False
            # .split(None, 1) means split in two parts on whitespace:
            auth_type, encoded_info = auth_header.split(None, 1)
            assert auth_type.lower() == 'basic'
            unencoded_info = encoded_info.decode('base64')
            username, password = unencoded_info.split(':', 1)
            return self.check_password(username, password)

        def check_password(self, username, password):
            # Not very high security authentication...
            return username == password

        def auth_required(self, environ, start_response):
            start_response('401 Authentication Required',
                [('Content-type', 'text/html'),
                 ('WWW-Authenticate', 'Basic realm="this realm"')])
            return ["""
            <html>
             <head><title>Authentication Required</title></head>
             <body>
              <h1>Authentication Required</h1>
              If you can't get in, then stay out.
             </body>
            </html>"""]

.. note::

   Again, here's the same thing with WebOb::

       from webob import Request, Response

       class AuthMiddleware(object):
           def __init__(self, app):
               self.app = app
           def __call__(self, environ, start_response):
               req = Request(environ)
               # .get() returns None when the header is missing,
               # instead of raising KeyError:
               if not self.authorized(req.headers.get('authorization')):
                   resp = self.auth_required(req)
               else:
                   resp = self.app
               return resp(environ, start_response)
           def authorized(self, header):
               if not header:
                   return False
               auth_type, encoded = header.split(None, 1)
               if not auth_type.lower() == 'basic':
                   return False
               username, password = encoded.decode('base64').split(':', 1)
               return self.check_password(username, password)
           def check_password(self, username, password):
               return username == password
           def auth_required(self, req):
               return Response(status=401, headers={'WWW-Authenticate': 'Basic realm="this realm"'},
                               body="""\
                <html>
                 <head><title>Authentication Required</title></head>
                 <body>
                  <h1>Authentication Required</h1>
                  If you can't get in, then stay out.
                 </body>
                </html>""")

So, how do we use this?

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)

    if __name__ == '__main__':
        from paste import httpserver
        httpserver.serve(wrapped_app, host='127.0.0.1', port='8080')

Now you have middleware! Hurrah!

Give Me More Middleware!
------------------------

It's even easier to use other people's middleware than to make your
own, because then you don't have to program. If you've been following
along, you've probably encountered a few exceptions, and had to look
at the console to see the exception reports. Let's make that a little
easier, and show the exceptions in the browser...

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)
    from paste.exceptions.errormiddleware import ErrorMiddleware
    exc_wrapped_app = ErrorMiddleware(wrapped_app)

Easy! But let's make it *more* fancy...

::

    app = ObjectPublisher(Root())
    wrapped_app = AuthMiddleware(app)
    from paste.evalexception import EvalException
    exc_wrapped_app = EvalException(wrapped_app)

So go make an error now. And hit the little +'s. And type stuff
into the boxes.

Conclusion
==========

Now you've created your framework and application (I'm sure it's
much nicer than the one I've given so far). You might keep writing it
(many people have so far), but even if you don't you should be able to
recognize these components in other frameworks now, and you'll have a
better understanding of how they probably work under the covers.

Also check out the version of this tutorial written `using WebOb
<http://pythonpaste.org/webob/do-it-yourself.html>`_. That tutorial
includes things like **testing** and **pattern-matching dispatch**
(instead of object publishing).