.. _pyporting-howto:

*********************************
Porting Python 2 Code to Python 3
*********************************

:author: Brett Cannon

.. topic:: Abstract

   With Python 3 being the future of Python while Python 2 is still in active
   use, it is good to have your project available for both major releases of
   Python. This guide is meant to help you figure out how best to support both
   Python 2 & 3 simultaneously.

   If you are looking to port an extension module instead of pure Python code,
   please see :ref:`cporting-howto`.

   If you would like to read one core Python developer's take on why Python 3
   came into existence, you can read Nick Coghlan's `Python 3 Q & A`_ or
   Brett Cannon's `Why Python 3 exists`_.

   For help with porting, you can view the archived python-porting_ mailing list.

The Short Explanation
=====================

To make your project single-source Python 2/3 compatible, the basic steps
are:

#. Only worry about supporting Python 2.7
#. Make sure you have good test coverage (coverage.py_ can help;
   ``python -m pip install coverage``)
#. Learn the differences between Python 2 & 3
#. Use Futurize_ (or Modernize_) to update your code (e.g. ``python -m pip install future``)
#. Use Pylint_ to help make sure you don't regress on your Python 3 support
   (``python -m pip install pylint``)
#. Use caniusepython3_ to find out which of your dependencies are blocking your
   use of Python 3 (``python -m pip install caniusepython3``)
#. Once your dependencies are no longer blocking you, use continuous integration
   to make sure you stay compatible with Python 2 & 3 (tox_ can help test
   against multiple versions of Python; ``python -m pip install tox``)
#. Consider using optional static type checking to make sure your type usage
   works in both Python 2 & 3 (e.g. use mypy_ to check your typing under both
   Python 2 & Python 3; ``python -m pip install mypy``).

.. note::

   Using ``python -m pip install`` guarantees that the ``pip`` you invoke
   is the one installed for the Python currently in use, whether it be
   a system-wide ``pip`` or one installed within a
   :ref:`virtual environment <tut-venv>`.

Details
=======

A key point about supporting Python 2 & 3 simultaneously is that you can start
**today**! Even if your dependencies are not supporting Python 3 yet that does
not mean you can't modernize your code **now** to support Python 3. Most changes
required to support Python 3 lead to cleaner code using newer practices even in
Python 2 code.

Another key point is that modernizing your Python 2 code to also support
Python 3 is largely automated for you. While you might have to make some API
decisions thanks to Python 3 clarifying text data versus binary data, the
lower-level work is now mostly done for you and thus you can at least benefit
from the automated changes immediately.

Keep those key points in mind while you read on about the details of porting
your code to support Python 2 & 3 simultaneously.


Drop support for Python 2.6 and older
-------------------------------------

While you can make Python 2.5 work with Python 3, it is **much** easier if you
only have to work with Python 2.7. If dropping Python 2.5 is not an
option then the six_ project can help you support Python 2.5 & 3 simultaneously
(``python -m pip install six``). Do realize, though, that nearly all the projects listed
in this HOWTO will not be available to you.

If you are able to skip Python 2.5 and older, then the required changes
to your code should continue to look and feel like idiomatic Python code. At
worst you will have to use a function instead of a method in some instances or
have to import a function instead of using a built-in one, but otherwise the
overall transformation should not feel foreign to you.

But you should aim for only supporting Python 2.7.
Python 2.6 is no longer
freely supported and thus is not receiving bugfixes. This means **you** will have
to work around any issues you come across with Python 2.6. There are also some
tools mentioned in this HOWTO which do not support Python 2.6 (e.g., Pylint_),
and this will become more commonplace as time goes on. It will simply be easier
for you if you only support the versions of Python that you have to support.


Make sure you specify the proper version support in your ``setup.py`` file
--------------------------------------------------------------------------

In your ``setup.py`` file you should have the proper `trove classifier`_
specifying what versions of Python you support. As your project does not support
Python 3 yet you should at least have
``Programming Language :: Python :: 2 :: Only`` specified. Ideally you should
also specify each major/minor version of Python that you do support, e.g.
``Programming Language :: Python :: 2.7``.


Have good test coverage
-----------------------

Once you have your code supporting the oldest version of Python 2 you want it
to, you will want to make sure your test suite has good coverage. A good rule of
thumb is that you want to be confident enough in your test suite that any
failures that appear after having tools rewrite your code are actual bugs in the
tools and not in your code. If you want a number to aim for, try to get over 80%
coverage (and don't feel bad if you find it hard to get better than 90%
coverage). If you don't already have a tool to measure test coverage then
coverage.py_ is recommended.


Learn the differences between Python 2 & 3
-------------------------------------------

Once you have your code well-tested you are ready to begin porting your code to
Python 3!
But to fully understand how your code is going to change and what
you want to look out for while you code, you will want to learn what changes
Python 3 makes relative to Python 2. Typically the two best ways of doing that
are reading the :ref:`"What's New" <whatsnew-index>` doc for each release of Python 3 and the
`Porting to Python 3`_ book (which is free online). There is also a handy
`cheat sheet`_ from the Python-Future project.


Update your code
----------------

Once you feel like you know what is different in Python 3 compared to Python 2,
it's time to update your code! You have a choice between two tools in porting
your code automatically: Futurize_ and Modernize_. Which tool you choose will
depend on how much like Python 3 you want your code to be. Futurize_ does its
best to make Python 3 idioms and practices exist in Python 2, e.g. backporting
the ``bytes`` type from Python 3 so that you have semantic parity between the
major versions of Python. Modernize_,
on the other hand, is more conservative and targets a Python 2/3 subset of
Python, directly relying on six_ to help provide compatibility. As Python 3 is
the future, it might be best to consider Futurize to begin adjusting to any new
practices that Python 3 introduces which you are not accustomed to yet.

Regardless of which tool you choose, they will update your code to run under
Python 3 while staying compatible with the version of Python 2 you started with.
Depending on how conservative you want to be, you may want to run the tool over
your test suite first and visually inspect the diff to make sure the
transformation is accurate. After you have transformed your test suite and
verified that all the tests still pass as expected, then you can transform your
application code knowing that any test which fails is a translation failure.
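
When inspecting that diff, it helps to know roughly what single-source code
looks like. The following is a hand-written sketch, not actual output from
Futurize or Modernize, and ``summarize()`` is a hypothetical function used only
for illustration; it sticks to constructs that behave the same under
Python 2.7 and Python 3::

    from __future__ import print_function


    def summarize(counts):
        """Return one "name=value" string per entry, sorted by name."""
        lines = []
        # items() exists on both major versions; iteritems() is Python 2 only
        for name, value in sorted(counts.items()):
            lines.append("{0}={1}".format(name, value))
        return lines


    try:
        summarize(None)
    except AttributeError as exc:  # the "except X, exc" spelling is Python 2 only
        # print is a function here thanks to the __future__ import
        print("bad input:", exc)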

Unfortunately the tools can't automate everything to make your code work under
Python 3 and so there are a handful of things you will need to update manually
to get full Python 3 support (which of these steps are necessary varies between
the tools). Read the documentation for the tool you choose to use to see what it
fixes by default and what it can do optionally to know what will (not) be fixed
for you and what you may have to fix on your own (e.g. using ``io.open()`` over
the built-in ``open()`` function is off by default in Modernize). Luckily,
though, there are only a couple of things to watch out for which can be
considered large issues that may be hard to debug if not watched for.


Division
++++++++

In Python 3, ``5 / 2 == 2.5`` and not ``2``; all division between ``int`` values
results in a ``float``. This change has actually been planned since Python 2.2
which was released in 2002. Since then users have been encouraged to add
``from __future__ import division`` to any and all files which use the ``/`` and
``//`` operators or to be running the interpreter with the ``-Q`` flag. If you
have not been doing this then you will need to go through your code and do two
things:

#. Add ``from __future__ import division`` to your files
#. Update any division operator as necessary to either use ``//`` to use floor
   division or continue using ``/`` and expect a float

The reason that ``/`` isn't simply translated to ``//`` automatically is that if
an object defines a ``__truediv__`` method but not ``__floordiv__`` then your
code would begin to fail (e.g. a user-defined class that uses ``/`` to
signify some operation but not ``//`` for the same thing or at all).


Text versus binary data
+++++++++++++++++++++++

In Python 2 you could use the ``str`` type for both text and binary data.
Unfortunately this conflation of two different concepts could lead to brittle
code which sometimes worked for either kind of data, sometimes not. It also
could lead to confusing APIs if people didn't explicitly state that something
that accepted ``str`` accepted either text or binary data instead of one
specific type. This complicated the situation especially for anyone supporting
multiple languages as APIs wouldn't bother explicitly supporting ``unicode``
when they claimed text data support.

To make the distinction between text and binary data clearer and more
pronounced, Python 3 did what most languages created in the age of the internet
have done and made text and binary data distinct types that cannot blindly be
mixed together (Python predates widespread access to the internet). For any code
that deals only with text or only binary data, this separation doesn't pose an
issue. But for code that has to deal with both, it does mean you might have to
now care about when you are using text compared to binary data, which is why
this cannot be entirely automated.

To start, you will need to decide which APIs take text and which take binary
(it is **highly** recommended you don't design APIs that can take both due to
the difficulty of keeping the code working; as stated earlier it is difficult to
do well). In Python 2 this means making sure the APIs that take text can work
with ``unicode`` and those that work with binary data work with the
``bytes`` type from Python 3 (which is a subset of ``str`` in Python 2, where
``bytes`` acts as an alias for ``str``). Usually the biggest issue is
realizing which methods exist on which types in Python 2 & 3 simultaneously
(for text that's ``unicode`` in Python 2 and ``str`` in Python 3, for binary
that's ``str``/``bytes`` in Python 2 and ``bytes`` in Python 3).
The following
table lists the **unique** methods of each data type across Python 2 & 3
(e.g., the ``decode()`` method is usable on the equivalent binary data type in
either Python 2 or 3, but it can't be used by the textual data type consistently
between Python 2 and 3 because ``str`` in Python 3 doesn't have the method). Do
note that as of Python 3.5 the ``__mod__`` method was added to the bytes type.

======================== =====================
**Text data**            **Binary data**
------------------------ ---------------------
\                        decode
------------------------ ---------------------
encode
------------------------ ---------------------
format
------------------------ ---------------------
isdecimal
------------------------ ---------------------
isnumeric
======================== =====================

Making the distinction easier to handle can be accomplished by encoding and
decoding between binary data and text at the edge of your code. This means that
when you receive text in binary data, you should immediately decode it. And if
your code needs to send text as binary data then encode it as late as possible.
This allows your code to work with only text internally and thus eliminates
having to keep track of what type of data you are working with.

The next issue is making sure you know whether the string literals in your code
represent text or binary data. You should add a ``b`` prefix to any
literal that represents binary data. For text you should add a ``u`` prefix to
the text literal. (There is a :mod:`__future__` import to force all unspecified
literals to be Unicode, but usage has shown it isn't as effective as adding a
``b`` or ``u`` prefix to all literals explicitly.)

As part of this dichotomy you also need to be careful about opening files.
Unless you have been working on Windows, there is a chance you have not always
bothered to add the ``b`` mode when opening a binary file (e.g., ``rb`` for
binary reading). Under Python 3, binary files and text files are clearly
distinct and mutually incompatible; see the :mod:`io` module for details.
Therefore, you **must** make a decision of whether a file will be used for
binary access (allowing binary data to be read and/or written) or textual access
(allowing text data to be read and/or written). You should also use :func:`io.open`
for opening files instead of the built-in :func:`open` function as the :mod:`io`
module is consistent from Python 2 to 3 while the built-in :func:`open` function
is not (in Python 3 it's actually :func:`io.open`). Do not bother with the
outdated practice of using :func:`codecs.open` as that's only necessary for
keeping compatibility with Python 2.5.

The constructors of both ``str`` and ``bytes`` have different semantics for the
same arguments between Python 2 & 3. Passing an integer to ``bytes`` in Python 2
will give you the string representation of the integer: ``bytes(3) == '3'``.
But in Python 3, an integer argument to ``bytes`` will give you a bytes object
as long as the integer specified, filled with null bytes:
``bytes(3) == b'\x00\x00\x00'``. A similar worry is necessary when passing a
bytes object to ``str``. In Python 2 you just get the bytes object back:
``str(b'3') == b'3'``. But in Python 3 you get the string representation of the
bytes object: ``str(b'3') == "b'3'"``.

Finally, the indexing of binary data requires careful handling (slicing does
**not** require any special handling). In Python 2,
``b'123'[1] == b'2'`` while in Python 3 ``b'123'[1] == 50``. Because binary data
is simply a collection of binary numbers, Python 3 returns the integer value for
the byte you index on.
But in Python 2 because ``bytes == str``, indexing
returns a one-item slice of bytes. The six_ project has a function
named ``six.indexbytes()`` which will return an integer like in Python 3:
``six.indexbytes(b'123', 1)``.

To summarize:

#. Decide which of your APIs take text and which take binary data
#. Make sure that your code that works with text also works with ``unicode`` and
   code for binary data works with ``bytes`` in Python 2 (see the table above
   for what methods you cannot use for each type)
#. Mark all binary literals with a ``b`` prefix, textual literals with a ``u``
   prefix
#. Decode binary data to text as soon as possible, encode text as binary data as
   late as possible
#. Open files using :func:`io.open` and make sure to specify the ``b`` mode when
   appropriate
#. Be careful when indexing into binary data


Use feature detection instead of version detection
++++++++++++++++++++++++++++++++++++++++++++++++++

Inevitably you will have code that has to choose what to do based on what
version of Python is running. The best way to do this is with feature detection
of whether the version of Python you're running under supports what you need.
If for some reason that doesn't work then you should make the version check be
against Python 2 and not Python 3. To help explain this, let's look at an
example.

Let's pretend that you need access to a feature of :mod:`importlib` that
is available in Python's standard library since Python 3.3 and available for
Python 2 through importlib2_ on PyPI. You might be tempted to write code to
access e.g. the :mod:`importlib.abc` module by doing the following::

    import sys

    if sys.version_info[0] == 3:
        from importlib import abc
    else:
        from importlib2 import abc

The problem with this code is what happens when Python 4 comes out?
It would
be better to treat Python 2 as the exceptional case instead of Python 3 and
assume that future Python versions will be more compatible with Python 3 than
Python 2::

    import sys

    if sys.version_info[0] > 2:
        from importlib import abc
    else:
        from importlib2 import abc

The best solution, though, is to do no version detection at all and instead rely
on feature detection. That avoids any potential issues of getting the version
detection wrong and helps keep you future-compatible::

    try:
        from importlib import abc
    except ImportError:
        from importlib2 import abc


Prevent compatibility regressions
---------------------------------

Once you have fully translated your code to be compatible with Python 3, you
will want to make sure your code doesn't regress and stop working under
Python 3. This is especially true if you have a dependency which is blocking you
from actually running under Python 3 at the moment.

To help with staying compatible, any new modules you create should have
at least the following block of code at the top of it::

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

You can also run Python 2 with the ``-3`` flag to be warned about various
compatibility issues your code triggers during execution. If you turn warnings
into errors with ``-Werror`` then you can make sure that you don't accidentally
miss a warning.

You can also use the Pylint_ project and its ``--py3k`` flag to lint your code
to receive warnings when your code begins to deviate from Python 3
compatibility. This also prevents you from having to run Modernize_ or Futurize_
over your code regularly to catch compatibility regressions. This does require
you only support Python 2.7 and Python 3.4 or newer as that is Pylint's
minimum Python version support.
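
As a rough illustration of what the ``__future__`` block recommended earlier in
this section changes, the sketch below should behave identically under
Python 2.7 and Python 3. The ``average()`` and ``midpoint_index()`` helpers are
hypothetical names chosen for this example::

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function


    def average(values):
        """Return the arithmetic mean of a non-empty sequence of numbers."""
        # With the division import, "/" is true division on Python 2 as well
        return sum(values) / len(values)


    def midpoint_index(values):
        """Return the index of the middle item, rounding down."""
        # "//" is floor division on both major versions
        return len(values) // 2


    # print() accepts keyword arguments such as sep on both major versions
    print(average([1, 2]), midpoint_index([1, 2, 3]), sep=", ")  # prints "1.5, 1"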


Check which dependencies block your transition
----------------------------------------------

**After** you have made your code compatible with Python 3 you should begin to
care about whether your dependencies have also been ported. The caniusepython3_
project was created to help you determine which projects
-- directly or indirectly -- are blocking you from supporting Python 3. There
is both a command-line tool as well as a web interface at
https://caniusepython3.com.

The project also provides code which you can integrate into your test suite so
that you will have a failing test when you no longer have dependencies blocking
you from using Python 3. This allows you to avoid having to manually check your
dependencies and to be notified quickly when you can start running on Python 3.


Update your ``setup.py`` file to denote Python 3 compatibility
--------------------------------------------------------------

Once your code works under Python 3, you should update the classifiers in
your ``setup.py`` to contain ``Programming Language :: Python :: 3`` and to not
specify sole Python 2 support. This will tell anyone using your code that you
support Python 2 **and** 3. Ideally you will also want to add classifiers for
each major/minor version of Python you now support.


Use continuous integration to stay compatible
---------------------------------------------

Once you are able to fully run under Python 3 you will want to make sure your
code always works under both Python 2 & 3. Probably the best tool for running
your tests under multiple Python interpreters is tox_. You can then integrate
tox with your continuous integration system so that you never accidentally break
Python 2 or 3 support.
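
The tests you run under each interpreter are most valuable when they exercise
behaviour that differs between Python 2 and 3. As a minimal sketch, assuming a
hypothetical ``first_byte()`` helper in your own code, a :mod:`unittest` test
like the following can run unchanged under every interpreter (for example via
``python -m unittest``)::

    from __future__ import absolute_import, division, print_function

    import unittest


    def first_byte(data):
        """Return the first byte of non-empty binary data as an integer."""
        # Slicing returns binary data on both major versions, and indexing a
        # bytearray yields an integer on both as well
        return bytearray(data[:1])[0]


    class FirstByteTests(unittest.TestCase):

        def test_returns_integer(self):
            # 49 is the ASCII code for the character "1"
            self.assertEqual(first_byte(b"123"), 49)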

You may also want to use the ``-bb`` flag with the Python 3 interpreter to
trigger an exception when you are comparing bytes to strings or bytes to an int
(the latter is available starting in Python 3.5). By default type-differing
comparisons simply return ``False``, but if you made a mistake in your
separation of text/binary data handling or indexing on bytes you wouldn't easily
find the mistake. This flag will raise an exception when these kinds of
comparisons occur, making the mistake much easier to track down.

And that's mostly it! At this point your code base is compatible with both
Python 2 and 3 simultaneously. Your testing will also be set up so that you
don't accidentally break Python 2 or 3 compatibility regardless of which version
you typically run your tests under while developing.


Consider using optional static type checking
--------------------------------------------

Another way to help port your code is to use a static type checker like
mypy_ or pytype_ on your code. These tools can be used to analyze your code as
if it's being run under Python 2, then you can run the tool a second time as if
your code is running under Python 3. By running a static type checker twice like
this you can discover if you're e.g. misusing a binary data type in one version
of Python compared to another. If you add optional type hints to your code you
can also explicitly state whether your APIs use textual or binary data, helping
to make sure everything functions as expected in both versions of Python.


.. _caniusepython3: https://pypi.org/project/caniusepython3
.. _cheat sheet: http://python-future.org/compatible_idioms.html
.. _coverage.py: https://pypi.org/project/coverage
.. _Futurize: http://python-future.org/automatic_conversion.html
.. _importlib2: https://pypi.org/project/importlib2
.. _Modernize: https://python-modernize.readthedocs.io/
.. _mypy: http://mypy-lang.org/
.. _Porting to Python 3: http://python3porting.com/
.. _Pylint: https://pypi.org/project/pylint

.. _Python 3 Q & A: https://ncoghlan-devs-python-notes.readthedocs.io/en/latest/python3/questions_and_answers.html

.. _pytype: https://github.com/google/pytype
.. _python-future: http://python-future.org/
.. _python-porting: https://mail.python.org/pipermail/python-porting/
.. _six: https://pypi.org/project/six
.. _tox: https://pypi.org/project/tox
.. _trove classifier: https://pypi.org/classifiers

.. _Why Python 3 exists: https://snarky.ca/why-python-3-exists