• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1.. _sortinghowto:
2
3Sorting HOW TO
4**************
5
6:Author: Andrew Dalke and Raymond Hettinger
7:Release: 0.1
8
9
10Python lists have a built-in :meth:`list.sort` method that modifies the list
11in-place.  There is also a :func:`sorted` built-in function that builds a new
12sorted list from an iterable.
13
14In this document, we explore the various techniques for sorting data using Python.
15
16
17Sorting Basics
18==============
19
20A simple ascending sort is very easy: just call the :func:`sorted` function. It
21returns a new sorted list:
22
23.. doctest::
24
25    >>> sorted([5, 2, 3, 1, 4])
26    [1, 2, 3, 4, 5]
27
28You can also use the :meth:`list.sort` method. It modifies the list
29in-place (and returns ``None`` to avoid confusion). Usually it's less convenient
30than :func:`sorted` - but if you don't need the original list, it's slightly
31more efficient.
32
33.. doctest::
34
35    >>> a = [5, 2, 3, 1, 4]
36    >>> a.sort()
37    >>> a
38    [1, 2, 3, 4, 5]
39
40Another difference is that the :meth:`list.sort` method is only defined for
41lists. In contrast, the :func:`sorted` function accepts any iterable.
42
43.. doctest::
44
45    >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
46    [1, 2, 3, 4, 5]
47
48Key Functions
49=============
50
51Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
52function (or other callable) to be called on each list element prior to making
53comparisons.
54
55For example, here's a case-insensitive string comparison:
56
57.. doctest::
58
59    >>> sorted("This is a test string from Andrew".split(), key=str.lower)
60    ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
61
62The value of the *key* parameter should be a function (or other callable) that
63takes a single argument and returns a key to use for sorting purposes. This
64technique is fast because the key function is called exactly once for each
65input record.
66
67A common pattern is to sort complex objects using some of the object's indices
68as keys. For example:
69
70.. doctest::
71
72    >>> student_tuples = [
73    ...     ('john', 'A', 15),
74    ...     ('jane', 'B', 12),
75    ...     ('dave', 'B', 10),
76    ... ]
77    >>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
78    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
79
80The same technique works for objects with named attributes. For example:
81
82.. doctest::
83
84    >>> class Student:
85    ...     def __init__(self, name, grade, age):
86    ...         self.name = name
87    ...         self.grade = grade
88    ...         self.age = age
89    ...     def __repr__(self):
90    ...         return repr((self.name, self.grade, self.age))
91
92    >>> student_objects = [
93    ...     Student('john', 'A', 15),
94    ...     Student('jane', 'B', 12),
95    ...     Student('dave', 'B', 10),
96    ... ]
97    >>> sorted(student_objects, key=lambda student: student.age)   # sort by age
98    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
99
100Operator Module Functions
101=========================
102
103The key-function patterns shown above are very common, so Python provides
104convenience functions to make accessor functions easier and faster. The
105:mod:`operator` module has :func:`~operator.itemgetter`,
106:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
107
108Using those functions, the above examples become simpler and faster:
109
110.. doctest::
111
112    >>> from operator import itemgetter, attrgetter
113
114    >>> sorted(student_tuples, key=itemgetter(2))
115    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
116
117    >>> sorted(student_objects, key=attrgetter('age'))
118    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
119
120The operator module functions allow multiple levels of sorting. For example, to
121sort by *grade* then by *age*:
122
123.. doctest::
124
125    >>> sorted(student_tuples, key=itemgetter(1,2))
126    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
127
128    >>> sorted(student_objects, key=attrgetter('grade', 'age'))
129    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
130
131Ascending and Descending
132========================
133
134Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
135boolean value. This is used to flag descending sorts. For example, to get the
136student data in reverse *age* order:
137
138.. doctest::
139
140    >>> sorted(student_tuples, key=itemgetter(2), reverse=True)
141    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
142
143    >>> sorted(student_objects, key=attrgetter('age'), reverse=True)
144    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
145
146Sort Stability and Complex Sorts
147================================
148
149Sorts are guaranteed to be `stable
150<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
151when multiple records have the same key, their original order is preserved.
152
153.. doctest::
154
155    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
156    >>> sorted(data, key=itemgetter(0))
157    [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
158
159Notice how the two records for *blue* retain their original order so that
160``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
161
162This wonderful property lets you build complex sorts in a series of sorting
163steps. For example, to sort the student data by descending *grade* and then
164ascending *age*, do the *age* sort first and then sort again using *grade*:
165
166.. doctest::
167
168    >>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
169    >>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
170    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
171
172This can be abstracted out into a wrapper function that can take a list and
173tuples of field and order to sort them on multiple passes.
174
175.. doctest::
176
177    >>> def multisort(xs, specs):
178    ...     for key, reverse in reversed(specs):
179    ...         xs.sort(key=attrgetter(key), reverse=reverse)
180    ...     return xs
181
182    >>> multisort(list(student_objects), (('grade', True), ('age', False)))
183    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
184
185The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
186does multiple sorts efficiently because it can take advantage of any ordering
187already present in a dataset.
188
189The Old Way Using Decorate-Sort-Undecorate
190==========================================
191
192This idiom is called Decorate-Sort-Undecorate after its three steps:
193
194* First, the initial list is decorated with new values that control the sort order.
195
196* Second, the decorated list is sorted.
197
198* Finally, the decorations are removed, creating a list that contains only the
199  initial values in the new order.
200
201For example, to sort the student data by *grade* using the DSU approach:
202
203    >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
204    >>> decorated.sort()
205    >>> [student for grade, i, student in decorated]               # undecorate
206    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
207
208This idiom works because tuples are compared lexicographically; the first items
209are compared; if they are the same then the second items are compared, and so
210on.
211
212It is not strictly necessary in all cases to include the index *i* in the
213decorated list, but including it gives two benefits:
214
215* The sort is stable -- if two items have the same key, their order will be
216  preserved in the sorted list.
217
218* The original items do not have to be comparable because the ordering of the
219  decorated tuples will be determined by at most the first two items. So for
220  example the original list could contain complex numbers which cannot be sorted
221  directly.
222
223Another name for this idiom is
224`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
225after Randal L. Schwartz, who popularized it among Perl programmers.
226
227Now that Python sorting provides key-functions, this technique is not often needed.
228
229
230The Old Way Using the *cmp* Parameter
231=====================================
232
233Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
234there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword
235arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to
236handle user specified comparison functions.
237
238In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to
239simplify and unify the language, eliminating the conflict between rich
240comparisons and the :meth:`__cmp__` magic method).
241
242In Py2.x, sort allowed an optional function which can be called for doing the
243comparisons. That function should take two arguments to be compared and then
244return a negative value for less-than, return zero if they are equal, or return
245a positive value for greater-than. For example, we can do:
246
247.. doctest::
248
249    >>> def numeric_compare(x, y):
250    ...     return x - y
251    >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP
252    [1, 2, 3, 4, 5]
253
254Or you can reverse the order of comparison with:
255
256.. doctest::
257
258    >>> def reverse_numeric(x, y):
259    ...     return y - x
260    >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP
261    [5, 4, 3, 2, 1]
262
263When porting code from Python 2.x to 3.x, the situation can arise when you have
264the user supplying a comparison function and you need to convert that to a key
265function. The following wrapper makes that easy to do:
266
267.. testcode::
268
269    def cmp_to_key(mycmp):
270        'Convert a cmp= function into a key= function'
271        class K:
272            def __init__(self, obj, *args):
273                self.obj = obj
274            def __lt__(self, other):
275                return mycmp(self.obj, other.obj) < 0
276            def __gt__(self, other):
277                return mycmp(self.obj, other.obj) > 0
278            def __eq__(self, other):
279                return mycmp(self.obj, other.obj) == 0
280            def __le__(self, other):
281                return mycmp(self.obj, other.obj) <= 0
282            def __ge__(self, other):
283                return mycmp(self.obj, other.obj) >= 0
284            def __ne__(self, other):
285                return mycmp(self.obj, other.obj) != 0
286        return K
287
288.. doctest::
289    :hide:
290
291    >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
292    [5, 4, 3, 2, 1]
293
294To convert to a key function, just wrap the old comparison function:
295
296.. testsetup::
297
298    from functools import cmp_to_key
299
300.. doctest::
301
302    >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
303    [5, 4, 3, 2, 1]
304
305In Python 3.2, the :func:`functools.cmp_to_key` function was added to the
306:mod:`functools` module in the standard library.
307
308Odd and Ends
309============
310
311* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
312  :func:`locale.strcoll` for a comparison function.
313
314* The *reverse* parameter still maintains sort stability (so that records with
315  equal keys retain the original order). Interestingly, that effect can be
316  simulated without the parameter by using the builtin :func:`reversed` function
317  twice:
318
319  .. doctest::
320
321    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
322    >>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
323    >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
324    >>> assert standard_way == double_reversed
325    >>> standard_way
326    [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
327
328* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons
329  between two objects. So, it is easy to add a standard sort order to a class by
330  defining an :meth:`__lt__` method:
331
332  .. doctest::
333
334    >>> Student.__lt__ = lambda self, other: self.age < other.age
335    >>> sorted(student_objects)
336    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
337
338* Key functions need not depend directly on the objects being sorted. A key
339  function can also access external resources. For instance, if the student grades
340  are stored in a dictionary, they can be used to sort a separate list of student
341  names:
342
343  .. doctest::
344
345    >>> students = ['dave', 'john', 'jane']
346    >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
347    >>> sorted(students, key=newgrades.__getitem__)
348    ['jane', 'dave', 'john']
349