1.. _sortinghowto: 2 3Sorting Techniques 4****************** 5 6:Author: Andrew Dalke and Raymond Hettinger 7 8 9Python lists have a built-in :meth:`list.sort` method that modifies the list 10in-place. There is also a :func:`sorted` built-in function that builds a new 11sorted list from an iterable. 12 13In this document, we explore the various techniques for sorting data using Python. 14 15 16Sorting Basics 17============== 18 19A simple ascending sort is very easy: just call the :func:`sorted` function. It 20returns a new sorted list: 21 22.. doctest:: 23 24 >>> sorted([5, 2, 3, 1, 4]) 25 [1, 2, 3, 4, 5] 26 27You can also use the :meth:`list.sort` method. It modifies the list 28in-place (and returns ``None`` to avoid confusion). Usually it's less convenient 29than :func:`sorted` - but if you don't need the original list, it's slightly 30more efficient. 31 32.. doctest:: 33 34 >>> a = [5, 2, 3, 1, 4] 35 >>> a.sort() 36 >>> a 37 [1, 2, 3, 4, 5] 38 39Another difference is that the :meth:`list.sort` method is only defined for 40lists. In contrast, the :func:`sorted` function accepts any iterable. 41 42.. doctest:: 43 44 >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) 45 [1, 2, 3, 4, 5] 46 47Key Functions 48============= 49 50Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a 51function (or other callable) to be called on each list element prior to making 52comparisons. 53 54For example, here's a case-insensitive string comparison: 55 56.. doctest:: 57 58 >>> sorted("This is a test string from Andrew".split(), key=str.casefold) 59 ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] 60 61The value of the *key* parameter should be a function (or other callable) that 62takes a single argument and returns a key to use for sorting purposes. This 63technique is fast because the key function is called exactly once for each 64input record. 65 66A common pattern is to sort complex objects using some of the object's indices 67as keys. For example: 68 69.. doctest:: 70 71 >>> student_tuples = [ 72 ... ('john', 'A', 15), 73 ... ('jane', 'B', 12), 74 ... ('dave', 'B', 10), 75 ... ] 76 >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age 77 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 78 79The same technique works for objects with named attributes. For example: 80 81.. doctest:: 82 83 >>> class Student: 84 ... def __init__(self, name, grade, age): 85 ... self.name = name 86 ... self.grade = grade 87 ... self.age = age 88 ... def __repr__(self): 89 ... return repr((self.name, self.grade, self.age)) 90 91 >>> student_objects = [ 92 ... Student('john', 'A', 15), 93 ... Student('jane', 'B', 12), 94 ... Student('dave', 'B', 10), 95 ... ] 96 >>> sorted(student_objects, key=lambda student: student.age) # sort by age 97 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 98 99Objects with named attributes can be made by a regular class as shown 100above, or they can be instances of :class:`~dataclasses.dataclass` or 101a :term:`named tuple`. 102 103Operator Module Functions and Partial Function Evaluation 104========================================================= 105 106The :term:`key function` patterns shown above are very common, so Python provides 107convenience functions to make accessor functions easier and faster. The 108:mod:`operator` module has :func:`~operator.itemgetter`, 109:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function. 110 111Using those functions, the above examples become simpler and faster: 112 113.. doctest:: 114 115 >>> from operator import itemgetter, attrgetter 116 117 >>> sorted(student_tuples, key=itemgetter(2)) 118 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 119 120 >>> sorted(student_objects, key=attrgetter('age')) 121 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 122 123The operator module functions allow multiple levels of sorting. For example, to 124sort by *grade* then by *age*: 125 126.. doctest:: 127 128 >>> sorted(student_tuples, key=itemgetter(1,2)) 129 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 130 131 >>> sorted(student_objects, key=attrgetter('grade', 'age')) 132 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 133 134The :mod:`functools` module provides another helpful tool for making 135key-functions. The :func:`~functools.partial` function can reduce the 136`arity <https://en.wikipedia.org/wiki/Arity>`_ of a multi-argument 137function making it suitable for use as a key-function. 138 139.. doctest:: 140 141 >>> from functools import partial 142 >>> from unicodedata import normalize 143 144 >>> names = 'Zoë Åbjørn Núñez Élana Zeke Abe Nubia Eloise'.split() 145 146 >>> sorted(names, key=partial(normalize, 'NFD')) 147 ['Abe', 'Åbjørn', 'Eloise', 'Élana', 'Nubia', 'Núñez', 'Zeke', 'Zoë'] 148 149 >>> sorted(names, key=partial(normalize, 'NFC')) 150 ['Abe', 'Eloise', 'Nubia', 'Núñez', 'Zeke', 'Zoë', 'Åbjørn', 'Élana'] 151 152Ascending and Descending 153======================== 154 155Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a 156boolean value. This is used to flag descending sorts. For example, to get the 157student data in reverse *age* order: 158 159.. doctest:: 160 161 >>> sorted(student_tuples, key=itemgetter(2), reverse=True) 162 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 163 164 >>> sorted(student_objects, key=attrgetter('age'), reverse=True) 165 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 166 167Sort Stability and Complex Sorts 168================================ 169 170Sorts are guaranteed to be `stable 171<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that 172when multiple records have the same key, their original order is preserved. 173 174.. doctest:: 175 176 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 177 >>> sorted(data, key=itemgetter(0)) 178 [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] 179 180Notice how the two records for *blue* retain their original order so that 181``('blue', 1)`` is guaranteed to precede ``('blue', 2)``. 182 183This wonderful property lets you build complex sorts in a series of sorting 184steps. For example, to sort the student data by descending *grade* and then 185ascending *age*, do the *age* sort first and then sort again using *grade*: 186 187.. doctest:: 188 189 >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key 190 >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending 191 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 192 193This can be abstracted out into a wrapper function that can take a list and 194tuples of field and order to sort them on multiple passes. 195 196.. doctest:: 197 198 >>> def multisort(xs, specs): 199 ... for key, reverse in reversed(specs): 200 ... xs.sort(key=attrgetter(key), reverse=reverse) 201 ... return xs 202 203 >>> multisort(list(student_objects), (('grade', True), ('age', False))) 204 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 205 206The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python 207does multiple sorts efficiently because it can take advantage of any ordering 208already present in a dataset. 209 210Decorate-Sort-Undecorate 211======================== 212 213This idiom is called Decorate-Sort-Undecorate after its three steps: 214 215* First, the initial list is decorated with new values that control the sort order. 216 217* Second, the decorated list is sorted. 218 219* Finally, the decorations are removed, creating a list that contains only the 220 initial values in the new order. 221 222For example, to sort the student data by *grade* using the DSU approach: 223 224.. doctest:: 225 226 >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] 227 >>> decorated.sort() 228 >>> [student for grade, i, student in decorated] # undecorate 229 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 230 231This idiom works because tuples are compared lexicographically; the first items 232are compared; if they are the same then the second items are compared, and so 233on. 234 235It is not strictly necessary in all cases to include the index *i* in the 236decorated list, but including it gives two benefits: 237 238* The sort is stable -- if two items have the same key, their order will be 239 preserved in the sorted list. 240 241* The original items do not have to be comparable because the ordering of the 242 decorated tuples will be determined by at most the first two items. So for 243 example the original list could contain complex numbers which cannot be sorted 244 directly. 245 246Another name for this idiom is 247`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\, 248after Randal L. Schwartz, who popularized it among Perl programmers. 249 250Now that Python sorting provides key-functions, this technique is not often needed. 251 252Comparison Functions 253==================== 254 255Unlike key functions that return an absolute value for sorting, a comparison 256function computes the relative ordering for two inputs. 257 258For example, a `balance scale 259<https://upload.wikimedia.org/wikipedia/commons/1/17/Balance_à_tabac_1850.JPG>`_ 260compares two samples giving a relative ordering: lighter, equal, or heavier. 261Likewise, a comparison function such as ``cmp(a, b)`` will return a negative 262value for less-than, zero if the inputs are equal, or a positive value for 263greater-than. 264 265It is common to encounter comparison functions when translating algorithms from 266other languages. Also, some libraries provide comparison functions as part of 267their API. For example, :func:`locale.strcoll` is a comparison function. 268 269To accommodate those situations, Python provides 270:class:`functools.cmp_to_key` to wrap the comparison function 271to make it usable as a key function:: 272 273 sorted(words, key=cmp_to_key(strcoll)) # locale-aware sort order 274 275Odds and Ends 276============= 277 278* For locale aware sorting, use :func:`locale.strxfrm` for a key function or 279 :func:`locale.strcoll` for a comparison function. This is necessary 280 because "alphabetical" sort orderings can vary across cultures even 281 if the underlying alphabet is the same. 282 283* The *reverse* parameter still maintains sort stability (so that records with 284 equal keys retain the original order). Interestingly, that effect can be 285 simulated without the parameter by using the builtin :func:`reversed` function 286 twice: 287 288 .. doctest:: 289 290 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 291 >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) 292 >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) 293 >>> assert standard_way == double_reversed 294 >>> standard_way 295 [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] 296 297* The sort routines use ``<`` when making comparisons 298 between two objects. So, it is easy to add a standard sort order to a class by 299 defining an :meth:`~object.__lt__` method: 300 301 .. doctest:: 302 303 >>> Student.__lt__ = lambda self, other: self.age < other.age 304 >>> sorted(student_objects) 305 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 306 307 However, note that ``<`` can fall back to using :meth:`~object.__gt__` if 308 :meth:`~object.__lt__` is not implemented (see :func:`object.__lt__` 309 for details on the mechanics). To avoid surprises, :pep:`8` 310 recommends that all six comparison methods be implemented. 311 The :func:`~functools.total_ordering` decorator is provided to make that 312 task easier. 313 314* Key functions need not depend directly on the objects being sorted. A key 315 function can also access external resources. For instance, if the student grades 316 are stored in a dictionary, they can be used to sort a separate list of student 317 names: 318 319 .. doctest:: 320 321 >>> students = ['dave', 'john', 'jane'] 322 >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'} 323 >>> sorted(students, key=newgrades.__getitem__) 324 ['jane', 'dave', 'john'] 325 326Partial Sorts 327============= 328 329Some applications require only some of the data to be ordered. The standard 330library provides several tools that do less work than a full sort: 331 332* :func:`min` and :func:`max` return the smallest and largest values, 333 respectively. These functions make a single pass over the input data and 334 require almost no auxiliary memory. 335 336* :func:`heapq.nsmallest` and :func:`heapq.nlargest` return 337 the *n* smallest and largest values, respectively. These functions 338 make a single pass over the data keeping only *n* elements in memory 339 at a time. For values of *n* that are small relative to the number of 340 inputs, these functions make far fewer comparisons than a full sort. 341 342* :func:`heapq.heappush` and :func:`heapq.heappop` create and maintain a 343 partially sorted arrangement of data that keeps the smallest element 344 at position ``0``. These functions are suitable for implementing 345 priority queues which are commonly used for task scheduling. 346