• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1.. _sortinghowto:
2
3Sorting Techniques
4******************
5
6:Author: Andrew Dalke and Raymond Hettinger
7
8
9Python lists have a built-in :meth:`list.sort` method that modifies the list
10in-place.  There is also a :func:`sorted` built-in function that builds a new
11sorted list from an iterable.
12
13In this document, we explore the various techniques for sorting data using Python.
14
15
16Sorting Basics
17==============
18
19A simple ascending sort is very easy: just call the :func:`sorted` function. It
20returns a new sorted list:
21
22.. doctest::
23
24    >>> sorted([5, 2, 3, 1, 4])
25    [1, 2, 3, 4, 5]
26
27You can also use the :meth:`list.sort` method. It modifies the list
28in-place (and returns ``None`` to avoid confusion). Usually it's less convenient
29than :func:`sorted` - but if you don't need the original list, it's slightly
30more efficient.
31
32.. doctest::
33
34    >>> a = [5, 2, 3, 1, 4]
35    >>> a.sort()
36    >>> a
37    [1, 2, 3, 4, 5]
38
39Another difference is that the :meth:`list.sort` method is only defined for
40lists. In contrast, the :func:`sorted` function accepts any iterable.
41
42.. doctest::
43
44    >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
45    [1, 2, 3, 4, 5]
46
47Key Functions
48=============
49
50Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
51function (or other callable) to be called on each list element prior to making
52comparisons.
53
54For example, here's a case-insensitive string comparison:
55
56.. doctest::
57
58    >>> sorted("This is a test string from Andrew".split(), key=str.casefold)
59    ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
60
61The value of the *key* parameter should be a function (or other callable) that
62takes a single argument and returns a key to use for sorting purposes. This
63technique is fast because the key function is called exactly once for each
64input record.
65
66A common pattern is to sort complex objects using some of the object's indices
67as keys. For example:
68
69.. doctest::
70
71    >>> student_tuples = [
72    ...     ('john', 'A', 15),
73    ...     ('jane', 'B', 12),
74    ...     ('dave', 'B', 10),
75    ... ]
76    >>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
77    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
78
79The same technique works for objects with named attributes. For example:
80
81.. doctest::
82
83    >>> class Student:
84    ...     def __init__(self, name, grade, age):
85    ...         self.name = name
86    ...         self.grade = grade
87    ...         self.age = age
88    ...     def __repr__(self):
89    ...         return repr((self.name, self.grade, self.age))
90
91    >>> student_objects = [
92    ...     Student('john', 'A', 15),
93    ...     Student('jane', 'B', 12),
94    ...     Student('dave', 'B', 10),
95    ... ]
96    >>> sorted(student_objects, key=lambda student: student.age)   # sort by age
97    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
98
99Objects with named attributes can be made by a regular class as shown
100above, or they can be instances of :class:`~dataclasses.dataclass` or
101a :term:`named tuple`.
102
103Operator Module Functions and Partial Function Evaluation
104=========================================================
105
106The :term:`key function` patterns shown above are very common, so Python provides
107convenience functions to make accessor functions easier and faster. The
108:mod:`operator` module has :func:`~operator.itemgetter`,
109:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
110
111Using those functions, the above examples become simpler and faster:
112
113.. doctest::
114
115    >>> from operator import itemgetter, attrgetter
116
117    >>> sorted(student_tuples, key=itemgetter(2))
118    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
119
120    >>> sorted(student_objects, key=attrgetter('age'))
121    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
122
123The operator module functions allow multiple levels of sorting. For example, to
124sort by *grade* then by *age*:
125
126.. doctest::
127
128    >>> sorted(student_tuples, key=itemgetter(1,2))
129    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
130
131    >>> sorted(student_objects, key=attrgetter('grade', 'age'))
132    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
133
134The :mod:`functools` module provides another helpful tool for making
135key-functions.  The :func:`~functools.partial` function can reduce the
136`arity <https://en.wikipedia.org/wiki/Arity>`_ of a multi-argument
137function making it suitable for use as a key-function.
138
139.. doctest::
140
141    >>> from functools import partial
142    >>> from unicodedata import normalize
143
144    >>> names = 'Zoë Åbjørn Núñez Élana Zeke Abe Nubia Eloise'.split()
145
146    >>> sorted(names, key=partial(normalize, 'NFD'))
147    ['Abe', 'Åbjørn', 'Eloise', 'Élana', 'Nubia', 'Núñez', 'Zeke', 'Zoë']
148
149    >>> sorted(names, key=partial(normalize, 'NFC'))
150    ['Abe', 'Eloise', 'Nubia', 'Núñez', 'Zeke', 'Zoë', 'Åbjørn', 'Élana']
151
152Ascending and Descending
153========================
154
155Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
156boolean value. This is used to flag descending sorts. For example, to get the
157student data in reverse *age* order:
158
159.. doctest::
160
161    >>> sorted(student_tuples, key=itemgetter(2), reverse=True)
162    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
163
164    >>> sorted(student_objects, key=attrgetter('age'), reverse=True)
165    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
166
167Sort Stability and Complex Sorts
168================================
169
170Sorts are guaranteed to be `stable
171<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
172when multiple records have the same key, their original order is preserved.
173
174.. doctest::
175
176    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
177    >>> sorted(data, key=itemgetter(0))
178    [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
179
180Notice how the two records for *blue* retain their original order so that
181``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
182
183This wonderful property lets you build complex sorts in a series of sorting
184steps. For example, to sort the student data by descending *grade* and then
185ascending *age*, do the *age* sort first and then sort again using *grade*:
186
187.. doctest::
188
189    >>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
190    >>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
191    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
192
193This can be abstracted out into a wrapper function that can take a list and
194tuples of field and order to sort them on multiple passes.
195
196.. doctest::
197
198    >>> def multisort(xs, specs):
199    ...     for key, reverse in reversed(specs):
200    ...         xs.sort(key=attrgetter(key), reverse=reverse)
201    ...     return xs
202
203    >>> multisort(list(student_objects), (('grade', True), ('age', False)))
204    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
205
206The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
207does multiple sorts efficiently because it can take advantage of any ordering
208already present in a dataset.
209
210Decorate-Sort-Undecorate
211========================
212
213This idiom is called Decorate-Sort-Undecorate after its three steps:
214
215* First, the initial list is decorated with new values that control the sort order.
216
217* Second, the decorated list is sorted.
218
219* Finally, the decorations are removed, creating a list that contains only the
220  initial values in the new order.
221
222For example, to sort the student data by *grade* using the DSU approach:
223
224.. doctest::
225
226    >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
227    >>> decorated.sort()
228    >>> [student for grade, i, student in decorated]               # undecorate
229    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
230
231This idiom works because tuples are compared lexicographically; the first items
232are compared; if they are the same then the second items are compared, and so
233on.
234
235It is not strictly necessary in all cases to include the index *i* in the
236decorated list, but including it gives two benefits:
237
238* The sort is stable -- if two items have the same key, their order will be
239  preserved in the sorted list.
240
241* The original items do not have to be comparable because the ordering of the
242  decorated tuples will be determined by at most the first two items. So for
243  example the original list could contain complex numbers which cannot be sorted
244  directly.
245
246Another name for this idiom is
247`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
248after Randal L. Schwartz, who popularized it among Perl programmers.
249
250Now that Python sorting provides key-functions, this technique is not often needed.
251
252Comparison Functions
253====================
254
255Unlike key functions that return an absolute value for sorting, a comparison
256function computes the relative ordering for two inputs.
257
258For example, a `balance scale
259<https://upload.wikimedia.org/wikipedia/commons/1/17/Balance_à_tabac_1850.JPG>`_
260compares two samples giving a relative ordering: lighter, equal, or heavier.
261Likewise, a comparison function such as ``cmp(a, b)`` will return a negative
262value for less-than, zero if the inputs are equal, or a positive value for
263greater-than.
264
265It is common to encounter comparison functions when translating algorithms from
266other languages.  Also, some libraries provide comparison functions as part of
267their API.  For example, :func:`locale.strcoll` is a comparison function.
268
269To accommodate those situations, Python provides
270:class:`functools.cmp_to_key` to wrap the comparison function
271to make it usable as a key function::
272
273    sorted(words, key=cmp_to_key(strcoll))  # locale-aware sort order
274
275Odds and Ends
276=============
277
278* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
279  :func:`locale.strcoll` for a comparison function.  This is necessary
280  because "alphabetical" sort orderings can vary across cultures even
281  if the underlying alphabet is the same.
282
283* The *reverse* parameter still maintains sort stability (so that records with
284  equal keys retain the original order). Interestingly, that effect can be
285  simulated without the parameter by using the builtin :func:`reversed` function
286  twice:
287
288  .. doctest::
289
290    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
291    >>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
292    >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
293    >>> assert standard_way == double_reversed
294    >>> standard_way
295    [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
296
297* The sort routines use ``<`` when making comparisons
298  between two objects. So, it is easy to add a standard sort order to a class by
299  defining an :meth:`~object.__lt__` method:
300
301  .. doctest::
302
303    >>> Student.__lt__ = lambda self, other: self.age < other.age
304    >>> sorted(student_objects)
305    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
306
307  However, note that ``<`` can fall back to using :meth:`~object.__gt__` if
308  :meth:`~object.__lt__` is not implemented (see :func:`object.__lt__`
309  for details on the mechanics).  To avoid surprises, :pep:`8`
310  recommends that all six comparison methods be implemented.
311  The :func:`~functools.total_ordering` decorator is provided to make that
312  task easier.
313
314* Key functions need not depend directly on the objects being sorted. A key
315  function can also access external resources. For instance, if the student grades
316  are stored in a dictionary, they can be used to sort a separate list of student
317  names:
318
319  .. doctest::
320
321    >>> students = ['dave', 'john', 'jane']
322    >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
323    >>> sorted(students, key=newgrades.__getitem__)
324    ['jane', 'dave', 'john']
325
326Partial Sorts
327=============
328
329Some applications require only some of the data to be ordered.  The standard
330library provides several tools that do less work than a full sort:
331
332* :func:`min` and :func:`max` return the smallest and largest values,
333  respectively.  These functions make a single pass over the input data and
334  require almost no auxiliary memory.
335
336* :func:`heapq.nsmallest` and :func:`heapq.nlargest` return
337  the *n* smallest and largest values, respectively.  These functions
338  make a single pass over the data keeping only *n* elements in memory
339  at a time.  For values of *n* that are small relative to the number of
340  inputs, these functions make far fewer comparisons than a full sort.
341
342* :func:`heapq.heappush` and :func:`heapq.heappop` create and maintain a
343  partially sorted arrangement of data that keeps the smallest element
344  at position ``0``.  These functions are suitable for implementing
345  priority queues which are commonly used for task scheduling.
346