PyTorch Design Philosophy
=========================

This document is designed to help contributors and module maintainers
understand the high-level design principles that have developed over
time in PyTorch. These are not meant to be hard-and-fast rules, but to
serve as a guide to help trade off different concerns and to resolve
disagreements that may come up while developing PyTorch. For more
information on contributing, module maintainership, and how to escalate a
disagreement to the Core Maintainers, please see `PyTorch
Governance <https://pytorch.org/docs/main/community/governance.html>`__.

Design Principles
-----------------

Principle 1: Usability over Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle may be surprising! As one Hacker News poster wrote:
*PyTorch is amazing! [...] Although I’m confused. How can a ML framework be
not obsessed with speed/performance?* See `Hacker News discussion on
PyTorch <https://news.ycombinator.com/item?id=28066093>`__.

Soumith’s blog post on `Growing the PyTorch
Community <https://soumith.ch/posts/2021/02/growing-opensource/?fbclid=IwAR1bvN_xZ8avGvu14ODJzS8Zp7jX1BOyfuGUf-zoRawpyL-s95Vjxf88W7s>`__
goes into this in some depth, but at a high level:

- PyTorch’s primary goal is usability
- A secondary goal is to have *reasonable* performance

We believe that maintaining our flexibility to support
researchers who are building on top of our abstractions remains
critical. We can’t see what future workloads will be, but we
know we want them to be built first on PyTorch, and that requires
flexibility.

In more concrete terms, we operate in a *usability-first* manner and try
to avoid jumping to *restriction-first* regimes (for example, static shapes,
graph-mode only) without a clear-eyed view of the tradeoffs.
Often there
is a temptation to impose strict user restrictions upfront because it
can simplify implementation, but this comes with risks:

- The performance may not be worth the user friction, either because
  the performance benefit is not compelling enough or it only applies to
  a relatively narrow set of subproblems.
- Even if the performance benefit is compelling, the restrictions can
  fragment the ecosystem into different sets of limitations that can
  quickly become incomprehensible to users.

We want users to be able to seamlessly move their PyTorch code to
different hardware and software platforms, to interoperate with
different libraries and frameworks, and to experience the full richness
of the PyTorch user experience, not a least common denominator subset.

Principle 2: Simple Over Easy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, we borrow from `The Zen of
Python <https://peps.python.org/pep-0020/>`__:

- *Explicit is better than implicit*
- *Simple is better than complex*

A more concise way of describing these two goals is `Simple Over
Easy <https://www.infoq.com/presentations/Simple-Made-Easy/>`_. Let’s start with an example because *simple* and *easy* are
often used interchangeably in everyday English. Consider how one may
model `devices <https://pytorch.org/docs/main/tensor_attributes.html#torch.device>`__
in PyTorch:

- **Simple / Explicit (to understand, debug):** every tensor is associated
  with a device. The user explicitly specifies tensor device movement.
  Operations that require cross-device movement result in an error.
- **Easy / Implicit (to use):** the user does not have to worry about
  devices; the system figures out the globally optimal device
  placement.

In this specific case, and as a general design philosophy, PyTorch
favors exposing simple and explicit building blocks rather than APIs
that are easy for practitioners to use.
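
The simple, explicit device model described above can be sketched with a toy
class. This is a minimal illustration in plain Python, not PyTorch's
implementation: ``ToyTensor``, ``DeviceMismatchError``, and the device name
``"accel:0"`` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


class DeviceMismatchError(RuntimeError):
    """Raised when an operation mixes operands from different devices."""


@dataclass(frozen=True)
class ToyTensor:
    """Hypothetical stand-in for a tensor: a tuple of values plus a device tag."""

    values: tuple
    device: str = "cpu"

    def to(self, device: str) -> "ToyTensor":
        # Device movement only happens when the user asks for it.
        return ToyTensor(self.values, device)

    def __add__(self, other: "ToyTensor") -> "ToyTensor":
        # No implicit placement: mixing devices fails at the call site.
        if self.device != other.device:
            raise DeviceMismatchError(
                "expected both operands on one device, got "
                f"{self.device!r} and {other.device!r}"
            )
        summed = tuple(x + y for x, y in zip(self.values, other.values))
        return ToyTensor(summed, self.device)


a = ToyTensor((1.0, 2.0), device="cpu")
b = ToyTensor((3.0, 4.0), device="accel:0")  # made-up device name

try:
    a + b  # cross-device add: rejected where the operator is invoked
    mismatch_message = None
except DeviceMismatchError as exc:
    mismatch_message = str(exc)

c = a.to("accel:0") + b  # explicit move first, then the add succeeds
```

In this sketch the failure names both devices and surfaces on the line that
performs the addition, which is the property the explicit design is after.
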
The simple version is immediately
understandable and debuggable by a new PyTorch user: you get a clear
error if you call an operator requiring cross-device movement at the
point in the program where the operator is actually invoked. The easy
solution may let a new user move faster initially, but debugging such a
system can be complex: How did the system make its determination? What
is the API for plugging into such a system and how are objects
represented in its IR?

Some classic arguments in favor of this sort of design come from `A
Note on Distributed
Computing <https://dl.acm.org/doi/book/10.5555/974938>`__ (TLDR: Do not
model resources with very different performance characteristics
uniformly; the details will leak) and the `End-to-End
Principle <http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf>`__
(TLDR: building smarts into the lower layers of the stack can prevent
building performant features at higher layers in the stack, and often
doesn’t work anyway). For example, we could build operator-level or
global device movement rules, but the precise choices aren’t obvious and
building an extensible mechanism has unavoidable complexity and latency
costs.

A caveat here is that this does not mean that higher-level “easy” APIs
are not valuable; certainly there is value in, for example,
higher levels of the stack supporting efficient tensor computations
across heterogeneous compute in a large cluster. Instead, what we mean
is that focusing on simple lower-level building blocks helps inform the
easy API while still maintaining a good experience when users need to
leave the beaten path. It also allows space for innovation and the
growth of more opinionated tools at a rate we cannot support in the
PyTorch core library, but ultimately benefit from, as evidenced by
our `rich ecosystem <https://pytorch.org/ecosystem/>`__.
In other
words, not automating at the start allows us to potentially reach levels
of good automation faster.

Principle 3: Python First with Best In Class Language Interoperability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle began as **Python First**:

  PyTorch is not a Python binding into a monolithic C++ framework.
  It is built to be deeply integrated into Python. You can use it
  naturally like you would use `NumPy <https://www.numpy.org/>`__,
  `SciPy <https://www.scipy.org/>`__, `scikit-learn <https://scikit-learn.org/>`__,
  or other Python libraries. You can write your new neural network
  layers in Python itself, using your favorite libraries and use
  packages such as `Cython <https://cython.org/>`__ and
  `Numba <http://numba.pydata.org/>`__. Our goal is to not reinvent
  the wheel where appropriate.

One thing PyTorch has needed to deal with over the years is Python
overhead: we first rewrote the `autograd` engine in C++, then the majority
of operator definitions, then developed TorchScript and the C++
frontend.

Still, working in Python provides easily the best experience for our
users: it is flexible, familiar, and perhaps most importantly, has a
huge ecosystem of scientific computing libraries and extensions
available for use. This fact motivates a few of our most recent
contributions, which attempt to hit a Pareto optimal point close to the
Python usability end of the curve:

- `TorchDynamo <https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361>`__,
  a Python frame evaluation tool capable of speeding up existing
  eager-mode PyTorch programs with minimal user intervention.
- `torch_function <https://pytorch.org/docs/main/notes/extending.html#extending-torch>`__
  and `torch_dispatch <https://dev-discuss.pytorch.org/t/what-and-why-is-torch-dispatch/557>`__
  extension points, which have enabled Python-first functionality to be
  built on top of C++ internals, such as the `torch.fx
  tracer <https://pytorch.org/docs/stable/fx.html>`__
  and `functorch <https://github.com/pytorch/functorch>`__
  respectively.

These design principles are not hard-and-fast rules, but hard-won
choices that anchor how we built PyTorch to be the debuggable, hackable
and flexible framework it is today. As we have more contributors and
maintainers, we look forward to applying these core principles with you
across our libraries and ecosystem. We are also open to evolving them as
we learn new things and the AI space evolves, as we know it will.