1.. _sortinghowto: 2 3Sorting HOW TO 4************** 5 6:Author: Andrew Dalke and Raymond Hettinger 7:Release: 0.1 8 9 10Python lists have a built-in :meth:`list.sort` method that modifies the list 11in-place. There is also a :func:`sorted` built-in function that builds a new 12sorted list from an iterable. 13 14In this document, we explore the various techniques for sorting data using Python. 15 16 17Sorting Basics 18============== 19 20A simple ascending sort is very easy: just call the :func:`sorted` function. It 21returns a new sorted list: 22 23.. doctest:: 24 25 >>> sorted([5, 2, 3, 1, 4]) 26 [1, 2, 3, 4, 5] 27 28You can also use the :meth:`list.sort` method. It modifies the list 29in-place (and returns ``None`` to avoid confusion). Usually it's less convenient 30than :func:`sorted` - but if you don't need the original list, it's slightly 31more efficient. 32 33.. doctest:: 34 35 >>> a = [5, 2, 3, 1, 4] 36 >>> a.sort() 37 >>> a 38 [1, 2, 3, 4, 5] 39 40Another difference is that the :meth:`list.sort` method is only defined for 41lists. In contrast, the :func:`sorted` function accepts any iterable. 42 43.. doctest:: 44 45 >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) 46 [1, 2, 3, 4, 5] 47 48Key Functions 49============= 50 51Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a 52function (or other callable) to be called on each list element prior to making 53comparisons. 54 55For example, here's a case-insensitive string comparison: 56 57.. doctest:: 58 59 >>> sorted("This is a test string from Andrew".split(), key=str.lower) 60 ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] 61 62The value of the *key* parameter should be a function (or other callable) that 63takes a single argument and returns a key to use for sorting purposes. This 64technique is fast because the key function is called exactly once for each 65input record. 66 67A common pattern is to sort complex objects using some of the object's indices 68as keys. For example: 69 70.. doctest:: 71 72 >>> student_tuples = [ 73 ... ('john', 'A', 15), 74 ... ('jane', 'B', 12), 75 ... ('dave', 'B', 10), 76 ... ] 77 >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age 78 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 79 80The same technique works for objects with named attributes. For example: 81 82.. doctest:: 83 84 >>> class Student: 85 ... def __init__(self, name, grade, age): 86 ... self.name = name 87 ... self.grade = grade 88 ... self.age = age 89 ... def __repr__(self): 90 ... return repr((self.name, self.grade, self.age)) 91 92 >>> student_objects = [ 93 ... Student('john', 'A', 15), 94 ... Student('jane', 'B', 12), 95 ... Student('dave', 'B', 10), 96 ... ] 97 >>> sorted(student_objects, key=lambda student: student.age) # sort by age 98 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 99 100Operator Module Functions 101========================= 102 103The key-function patterns shown above are very common, so Python provides 104convenience functions to make accessor functions easier and faster. The 105:mod:`operator` module has :func:`~operator.itemgetter`, 106:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function. 107 108Using those functions, the above examples become simpler and faster: 109 110.. doctest:: 111 112 >>> from operator import itemgetter, attrgetter 113 114 >>> sorted(student_tuples, key=itemgetter(2)) 115 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 116 117 >>> sorted(student_objects, key=attrgetter('age')) 118 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 119 120The operator module functions allow multiple levels of sorting. For example, to 121sort by *grade* then by *age*: 122 123.. doctest:: 124 125 >>> sorted(student_tuples, key=itemgetter(1,2)) 126 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 127 128 >>> sorted(student_objects, key=attrgetter('grade', 'age')) 129 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 130 131Ascending and Descending 132======================== 133 134Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a 135boolean value. This is used to flag descending sorts. For example, to get the 136student data in reverse *age* order: 137 138.. doctest:: 139 140 >>> sorted(student_tuples, key=itemgetter(2), reverse=True) 141 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 142 143 >>> sorted(student_objects, key=attrgetter('age'), reverse=True) 144 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 145 146Sort Stability and Complex Sorts 147================================ 148 149Sorts are guaranteed to be `stable 150<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that 151when multiple records have the same key, their original order is preserved. 152 153.. doctest:: 154 155 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 156 >>> sorted(data, key=itemgetter(0)) 157 [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] 158 159Notice how the two records for *blue* retain their original order so that 160``('blue', 1)`` is guaranteed to precede ``('blue', 2)``. 161 162This wonderful property lets you build complex sorts in a series of sorting 163steps. For example, to sort the student data by descending *grade* and then 164ascending *age*, do the *age* sort first and then sort again using *grade*: 165 166.. doctest:: 167 168 >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key 169 >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending 170 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 171 172This can be abstracted out into a wrapper function that can take a list and 173tuples of field and order to sort them on multiple passes. 174 175.. doctest:: 176 177 >>> def multisort(xs, specs): 178 ... for key, reverse in reversed(specs): 179 ... xs.sort(key=attrgetter(key), reverse=reverse) 180 ... return xs 181 182 >>> multisort(list(student_objects), (('grade', True), ('age', False))) 183 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 184 185The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python 186does multiple sorts efficiently because it can take advantage of any ordering 187already present in a dataset. 188 189The Old Way Using Decorate-Sort-Undecorate 190========================================== 191 192This idiom is called Decorate-Sort-Undecorate after its three steps: 193 194* First, the initial list is decorated with new values that control the sort order. 195 196* Second, the decorated list is sorted. 197 198* Finally, the decorations are removed, creating a list that contains only the 199 initial values in the new order. 200 201For example, to sort the student data by *grade* using the DSU approach: 202 203 >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] 204 >>> decorated.sort() 205 >>> [student for grade, i, student in decorated] # undecorate 206 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 207 208This idiom works because tuples are compared lexicographically; the first items 209are compared; if they are the same then the second items are compared, and so 210on. 211 212It is not strictly necessary in all cases to include the index *i* in the 213decorated list, but including it gives two benefits: 214 215* The sort is stable -- if two items have the same key, their order will be 216 preserved in the sorted list. 217 218* The original items do not have to be comparable because the ordering of the 219 decorated tuples will be determined by at most the first two items. So for 220 example the original list could contain complex numbers which cannot be sorted 221 directly. 222 223Another name for this idiom is 224`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\, 225after Randal L. Schwartz, who popularized it among Perl programmers. 226 227Now that Python sorting provides key-functions, this technique is not often needed. 228 229 230The Old Way Using the *cmp* Parameter 231===================================== 232 233Many constructs given in this HOWTO assume Python 2.4 or later. Before that, 234there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword 235arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to 236handle user specified comparison functions. 237 238In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to 239simplify and unify the language, eliminating the conflict between rich 240comparisons and the :meth:`__cmp__` magic method). 241 242In Py2.x, sort allowed an optional function which can be called for doing the 243comparisons. That function should take two arguments to be compared and then 244return a negative value for less-than, return zero if they are equal, or return 245a positive value for greater-than. For example, we can do: 246 247.. doctest:: 248 249 >>> def numeric_compare(x, y): 250 ... return x - y 251 >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP 252 [1, 2, 3, 4, 5] 253 254Or you can reverse the order of comparison with: 255 256.. doctest:: 257 258 >>> def reverse_numeric(x, y): 259 ... return y - x 260 >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP 261 [5, 4, 3, 2, 1] 262 263When porting code from Python 2.x to 3.x, the situation can arise when you have 264the user supplying a comparison function and you need to convert that to a key 265function. The following wrapper makes that easy to do: 266 267.. testcode:: 268 269 def cmp_to_key(mycmp): 270 'Convert a cmp= function into a key= function' 271 class K: 272 def __init__(self, obj, *args): 273 self.obj = obj 274 def __lt__(self, other): 275 return mycmp(self.obj, other.obj) < 0 276 def __gt__(self, other): 277 return mycmp(self.obj, other.obj) > 0 278 def __eq__(self, other): 279 return mycmp(self.obj, other.obj) == 0 280 def __le__(self, other): 281 return mycmp(self.obj, other.obj) <= 0 282 def __ge__(self, other): 283 return mycmp(self.obj, other.obj) >= 0 284 def __ne__(self, other): 285 return mycmp(self.obj, other.obj) != 0 286 return K 287 288.. doctest:: 289 :hide: 290 291 >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric)) 292 [5, 4, 3, 2, 1] 293 294To convert to a key function, just wrap the old comparison function: 295 296.. testsetup:: 297 298 from functools import cmp_to_key 299 300.. doctest:: 301 302 >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric)) 303 [5, 4, 3, 2, 1] 304 305In Python 3.2, the :func:`functools.cmp_to_key` function was added to the 306:mod:`functools` module in the standard library. 307 308Odd and Ends 309============ 310 311* For locale aware sorting, use :func:`locale.strxfrm` for a key function or 312 :func:`locale.strcoll` for a comparison function. 313 314* The *reverse* parameter still maintains sort stability (so that records with 315 equal keys retain the original order). Interestingly, that effect can be 316 simulated without the parameter by using the builtin :func:`reversed` function 317 twice: 318 319 .. doctest:: 320 321 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 322 >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) 323 >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) 324 >>> assert standard_way == double_reversed 325 >>> standard_way 326 [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] 327 328* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons 329 between two objects. So, it is easy to add a standard sort order to a class by 330 defining an :meth:`__lt__` method: 331 332 .. doctest:: 333 334 >>> Student.__lt__ = lambda self, other: self.age < other.age 335 >>> sorted(student_objects) 336 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 337 338* Key functions need not depend directly on the objects being sorted. A key 339 function can also access external resources. For instance, if the student grades 340 are stored in a dictionary, they can be used to sort a separate list of student 341 names: 342 343 .. doctest:: 344 345 >>> students = ['dave', 'john', 'jane'] 346 >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'} 347 >>> sorted(students, key=newgrades.__getitem__) 348 ['jane', 'dave', 'john'] 349