PyTorch Design Philosophy
=========================

This document is designed to help contributors and module maintainers
understand the high-level design principles that have developed over
time in PyTorch. These are not meant to be hard-and-fast rules, but to
serve as a guide to help trade off different concerns and to resolve
disagreements that may come up while developing PyTorch. For more
information on contributing, module maintainership, and how to escalate a
disagreement to the Core Maintainers, please see `PyTorch
Governance <https://pytorch.org/docs/main/community/governance.html>`__.

Design Principles
-----------------

Principle 1: Usability over Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle may be surprising! As one Hacker News poster wrote:
*PyTorch is amazing! [...] Although I’m confused. How can a ML framework be
not obsessed with speed/performance?* See `Hacker News discussion on
PyTorch <https://news.ycombinator.com/item?id=28066093>`__.

Soumith’s blog post on `Growing the PyTorch
Community <https://soumith.ch/posts/2021/02/growing-opensource/?fbclid=IwAR1bvN_xZ8avGvu14ODJzS8Zp7jX1BOyfuGUf-zoRawpyL-s95Vjxf88W7s>`__
goes into this in some depth, but at a high level:

-  PyTorch’s primary goal is usability
-  A secondary goal is to have *reasonable* performance

We believe that maintaining the flexibility to support researchers who
build on top of our abstractions remains critical. We cannot predict
what future workloads will look like, but we know we want them to be
built first on PyTorch, and that requires flexibility.

In more concrete terms, we operate in a *usability-first* manner and try
to avoid jumping to *restriction-first* regimes (for example, static shapes,
graph-mode only) without a clear-eyed view of the tradeoffs. Often there
is a temptation to impose strict user restrictions upfront because it
can simplify implementation, but this comes with risks:

-  The performance may not be worth the user friction, either because
   the performance benefit is not compelling enough or it only applies to
   a relatively narrow set of subproblems.
-  Even if the performance benefit is compelling, the restrictions can
   fragment the ecosystem into different sets of limitations that can
   quickly become incomprehensible to users.

We want users to be able to seamlessly move their PyTorch code to
different hardware and software platforms, to interoperate with
different libraries and frameworks, and to experience the full richness
of the PyTorch user experience, not a least common denominator subset.

Principle 2: Simple Over Easy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, we borrow from `The Zen of
Python <https://peps.python.org/pep-0020/>`__:

-  *Explicit is better than implicit*
-  *Simple is better than complex*

A more concise way of describing these two goals is `Simple Over
Easy <https://www.infoq.com/presentations/Simple-Made-Easy/>`__. Let’s
start with an example because *simple* and *easy* are often used
interchangeably in everyday English. Consider how one may model
`devices <https://pytorch.org/docs/main/tensor_attributes.html#torch.device>`__
in PyTorch:

-  **Simple / Explicit (to understand, debug):** every tensor is associated
   with a device. The user explicitly specifies tensor device movement.
   Operations that require cross-device movement result in an error.
-  **Easy / Implicit (to use):** the user does not have to worry about
   devices; the system figures out the globally optimal device
   placement.

In this specific case, and as a general design philosophy, PyTorch
favors exposing simple and explicit building blocks rather than APIs
that are easy to use by practitioners. The simple version is immediately
understandable and debuggable by a new PyTorch user: you get a clear
error if you call an operator requiring cross-device movement at the
point in the program where the operator is actually invoked. The easy
solution may let a new user move faster initially, but debugging such a
system can be complex: How did the system make its determination? What
is the API for plugging into such a system and how are objects
represented in its IR?

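To make the contrast concrete, here is a minimal sketch of the explicit
model described above (the tensor shapes and the CUDA check are
illustrative, not prescribed by this document):

.. code-block:: python

    import torch

    x = torch.ones(3)                     # created on the CPU by default
    if torch.cuda.is_available():
        y = torch.ones(3, device="cuda")  # the device is stated explicitly
        # x + y would raise a RuntimeError right here, because no implicit
        # cross-device copy is ever made on the user's behalf.
        z = x.to("cuda") + y              # the movement is explicit and visible
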
Some classic arguments in favor of this sort of design come from `A
Note on Distributed
Computing <https://dl.acm.org/doi/book/10.5555/974938>`__ (TLDR: Do not
model resources with very different performance characteristics
uniformly; the details will leak) and the `End-to-End
Principle <http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf>`__
(TLDR: building smarts into the lower layers of the stack can prevent
building performant features at higher layers in the stack, and often
doesn’t work anyway). For example, we could build operator-level or
global device movement rules, but the precise choices aren’t obvious and
building an extensible mechanism has unavoidable complexity and latency
costs.

A caveat here is that this does not mean that higher-level “easy” APIs
are not valuable; certainly there is value in, for example, higher
levels of the stack supporting efficient tensor computations across
heterogeneous compute in a large cluster. Instead, what we mean is that
focusing on simple lower-level building blocks helps inform the easy
API while still maintaining a good experience when users need to leave
the beaten path. It also allows space for innovation and the growth of
more opinionated tools at a rate we cannot support in the PyTorch core
library, but that we ultimately benefit from, as evidenced by our
`rich ecosystem <https://pytorch.org/ecosystem/>`__. In other words,
not automating at the start allows us to potentially reach levels of
good automation faster.

Principle 3: Python First with Best In Class Language Interoperability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle began as **Python First**:

  PyTorch is not a Python binding into a monolithic C++ framework.
  It is built to be deeply integrated into Python. You can use it
  naturally like you would use `NumPy <https://www.numpy.org/>`__,
  `SciPy <https://www.scipy.org/>`__, `scikit-learn <https://scikit-learn.org/>`__,
  or other Python libraries. You can write your new neural network
  layers in Python itself, using your favorite libraries and packages
  such as `Cython <https://cython.org/>`__ and
  `Numba <http://numba.pydata.org/>`__. Our goal is to not reinvent
  the wheel where appropriate.

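To illustrate the point about writing layers in Python itself, here is a
minimal sketch (the ``ScaledResidual`` layer below is hypothetical, not
part of PyTorch):

.. code-block:: python

    import torch
    from torch import nn

    class ScaledResidual(nn.Module):
        """Compute x + scale * linear(x), written entirely in Python."""

        def __init__(self, dim):
            super().__init__()
            self.linear = nn.Linear(dim, dim)
            self.scale = nn.Parameter(torch.ones(1))

        def forward(self, x):
            return x + self.scale * self.linear(x)

    layer = ScaledResidual(16)
    out = layer(torch.randn(4, 16))  # autograd tracks all of this automatically
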
One thing PyTorch has needed to deal with over the years is Python
overhead: we first rewrote the ``autograd`` engine in C++, then the
majority of operator definitions, then developed TorchScript and the
C++ frontend.

Still, working in Python easily provides the best experience for our
users: it is flexible, familiar, and, perhaps most importantly, has a
huge ecosystem of scientific computing libraries and extensions
available for use. This fact motivates a few of our most recent
contributions, which attempt to hit a Pareto optimal point close to the
Python usability end of the curve:

-  `TorchDynamo <https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361>`__,
   a Python frame evaluation tool capable of speeding up existing
   eager-mode PyTorch programs with minimal user intervention (see the
   sketch after this list).
-  The `__torch_function__ <https://pytorch.org/docs/main/notes/extending.html#extending-torch>`__
   and `__torch_dispatch__ <https://dev-discuss.pytorch.org/t/what-and-why-is-torch-dispatch/557>`__
   extension points, which have enabled Python-first functionality to be
   built on top of C++ internals, such as the `torch.fx
   tracer <https://pytorch.org/docs/stable/fx.html>`__
   and `functorch <https://github.com/pytorch/functorch>`__,
   respectively.
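
A minimal sketch of the first item, assuming a PyTorch 2.x build where
TorchDynamo is exposed through ``torch.compile``; the function being
compiled is purely illustrative:

.. code-block:: python

    import torch

    def fn(x):
        # An ordinary eager-mode function; no graph annotations required.
        return torch.sin(x) ** 2 + torch.cos(x) ** 2

    compiled_fn = torch.compile(fn)     # TorchDynamo captures the Python frame
    print(compiled_fn(torch.randn(8)))  # same semantics as fn, potentially faster
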

These design principles are not hard-and-fast rules, but hard-won
choices that anchor how we built PyTorch to be the debuggable, hackable,
and flexible framework it is today. As we gain more contributors and
maintainers, we look forward to applying these core principles with you
across our libraries and ecosystem. We are also open to evolving them as
we learn new things and the AI space evolves, as we know it will.