• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1
2:mod:`csv` --- CSV File Reading and Writing
3===========================================
4
5.. module:: csv
6   :synopsis: Write and read tabular data to and from delimited files.
7.. sectionauthor:: Skip Montanaro <skip@pobox.com>
8
9
10.. versionadded:: 2.3
11
12.. index::
13   single: csv
14   pair: data; tabular
15
16The so-called CSV (Comma Separated Values) format is the most common import and
17export format for spreadsheets and databases.  There is no "CSV standard", so
18the format is operationally defined by the many applications which read and
19write it.  The lack of a standard means that subtle differences often exist in
20the data produced and consumed by different applications.  These differences can
21make it annoying to process CSV files from multiple sources.  Still, while the
22delimiters and quoting characters vary, the overall format is similar enough
23that it is possible to write a single module which can efficiently manipulate
24such data, hiding the details of reading and writing the data from the
25programmer.
26
27The :mod:`csv` module implements classes to read and write tabular data in CSV
28format.  It allows programmers to say, "write this data in the format preferred
29by Excel," or "read data from this file which was generated by Excel," without
30knowing the precise details of the CSV format used by Excel.  Programmers can
31also describe the CSV formats understood by other applications or define their
32own special-purpose CSV formats.
33
34The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
35write sequences.  Programmers can also read and write data in dictionary form
36using the :class:`DictReader` and :class:`DictWriter` classes.
37
38.. note::
39
40   This version of the :mod:`csv` module doesn't support Unicode input.  Also,
41   there are currently some issues regarding ASCII NUL characters.  Accordingly,
42   all input should be UTF-8 or printable ASCII to be safe; see the examples in
43   section :ref:`csv-examples`.
44
45
46.. seealso::
47
48   :pep:`305` - CSV File API
49      The Python Enhancement Proposal which proposed this addition to Python.
50
51
52.. _csv-contents:
53
54Module Contents
55---------------
56
57The :mod:`csv` module defines the following functions:
58
59
60.. function:: reader(csvfile, dialect='excel', **fmtparams)
61
62   Return a reader object which will iterate over lines in the given *csvfile*.
63   *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
64   string each time its :meth:`!next` method is called --- file objects and list
65   objects are both suitable.   If *csvfile* is a file object, it must be opened
66   with the 'b' flag on platforms where that makes a difference.  An optional
67   *dialect* parameter can be given which is used to define a set of parameters
68   specific to a particular CSV dialect.  It may be an instance of a subclass of
69   the :class:`Dialect` class or one of the strings returned by the
70   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
71   can be given to override individual formatting parameters in the current
72   dialect.  For full details about the dialect and formatting parameters, see
73   section :ref:`csv-fmt-params`.
74
75   Each row read from the csv file is returned as a list of strings.  No
76   automatic data type conversion is performed.
77
78   A short usage example::
79
80      >>> import csv
81      >>> with open('eggs.csv', 'rb') as csvfile:
82      ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
83      ...     for row in spamreader:
84      ...         print ', '.join(row)
85      Spam, Spam, Spam, Spam, Spam, Baked Beans
86      Spam, Lovely Spam, Wonderful Spam
87
88   .. versionchanged:: 2.5
89      The parser is now stricter with respect to multi-line quoted fields. Previously,
90      if a line ended within a quoted field without a terminating newline character, a
91      newline would be inserted into the returned field. This behavior caused problems
92      when reading files which contained carriage return characters within fields.
93      The behavior was changed to return the field without inserting newlines. As a
94      consequence, if newlines embedded within fields are important, the input should
95      be split into lines in a manner which preserves the newline characters.
96
97
98.. function:: writer(csvfile, dialect='excel', **fmtparams)
99
100   Return a writer object responsible for converting the user's data into delimited
101   strings on the given file-like object.  *csvfile* can be any object with a
102   :func:`write` method.  If *csvfile* is a file object, it must be opened with the
103   'b' flag on platforms where that makes a difference.  An optional *dialect*
104   parameter can be given which is used to define a set of parameters specific to a
105   particular CSV dialect.  It may be an instance of a subclass of the
106   :class:`Dialect` class or one of the strings returned by the
107   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
108   can be given to override individual formatting parameters in the current
109   dialect.  For full details about the dialect and formatting parameters, see
110   section :ref:`csv-fmt-params`. To make it
111   as easy as possible to interface with modules which implement the DB API, the
112   value :const:`None` is written as the empty string.  While this isn't a
113   reversible transformation, it makes it easier to dump SQL NULL data values to
114   CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
115   Floats are stringified with :func:`repr` before being written.
116   All other non-string data are stringified with :func:`str` before being written.
117
118   A short usage example::
119
120      import csv
121      with open('eggs.csv', 'wb') as csvfile:
122          spamwriter = csv.writer(csvfile, delimiter=' ',
123                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
124          spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
125          spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
126
127
128.. function:: register_dialect(name[, dialect], **fmtparams)
129
130   Associate *dialect* with *name*.  *name* must be a string or Unicode object. The
131   dialect can be specified either by passing a sub-class of :class:`Dialect`, or
132   by *fmtparams* keyword arguments, or both, with keyword arguments overriding
133   parameters of the dialect. For full details about the dialect and formatting
134   parameters, see section :ref:`csv-fmt-params`.
135
136
137.. function:: unregister_dialect(name)
138
139   Delete the dialect associated with *name* from the dialect registry.  An
140   :exc:`Error` is raised if *name* is not a registered dialect name.
141
142
143.. function:: get_dialect(name)
144
145   Return the dialect associated with *name*.  An :exc:`Error` is raised if *name*
146   is not a registered dialect name.
147
148   .. versionchanged:: 2.5
149      This function now returns an immutable :class:`Dialect`.  Previously an
150      instance of the requested dialect was returned.  Users could modify the
151      underlying class, changing the behavior of active readers and writers.
152
153.. function:: list_dialects()
154
155   Return the names of all registered dialects.
156
157
158.. function:: field_size_limit([new_limit])
159
160   Returns the current maximum field size allowed by the parser. If *new_limit* is
161   given, this becomes the new limit.
162
163   .. versionadded:: 2.5
164
165The :mod:`csv` module defines the following classes:
166
167
168.. class:: DictReader(f, fieldnames=None, restkey=None, restval=None, \
169                      dialect='excel', *args, **kwds)
170
171   Create an object which operates like a regular reader but maps the
172   information read into a dict whose keys are given by the optional
173   *fieldnames* parameter.  The *fieldnames* parameter is a :ref:`sequence
174   <collections-abstract-base-classes>` whose elements are associated with the
175   fields of the input data in order. These elements become the keys of the
176   resulting dictionary.  If the *fieldnames* parameter is omitted, the values
177   in the first row of the file *f* will be used as the fieldnames.  If the
178   row read has more fields than the fieldnames sequence, the remaining data is
179   added as a sequence keyed by the value of *restkey*.  If the row read has
180   fewer fields than the fieldnames sequence, the remaining keys take the value
181   of the optional *restval* parameter.  Any other optional or keyword
182   arguments are passed to the underlying :class:`reader` instance.
183
184   A short usage example::
185
186       >>> import csv
187       >>> with open('names.csv') as csvfile:
188       ...     reader = csv.DictReader(csvfile)
189       ...     for row in reader:
190       ...         print(row['first_name'], row['last_name'])
191       ...
192       Baked Beans
193       Lovely Spam
194       Wonderful Spam
195
196
197.. class:: DictWriter(f, fieldnames, restval='', extrasaction='raise', \
198                      dialect='excel', *args, **kwds)
199
200   Create an object which operates like a regular writer but maps dictionaries
201   onto output rows.  The *fieldnames* parameter is a :ref:`sequence
202   <collections-abstract-base-classes>` of keys that identify the order in
203   which values in the dictionary passed to the :meth:`writerow` method are
204   written to the file *f*.  The optional *restval* parameter specifies the
205   value to be written if the dictionary is missing a key in *fieldnames*.  If
206   the dictionary passed to the :meth:`writerow` method contains a key not
207   found in *fieldnames*, the optional *extrasaction* parameter indicates what
208   action to take.  If it is set to ``'raise'`` a :exc:`ValueError` is raised.
209   If it is set to ``'ignore'``, extra values in the dictionary are ignored.
210   Any other optional or keyword arguments are passed to the underlying
211   :class:`writer` instance.
212
213   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
214   of the :class:`DictWriter` is not optional.  Since Python's :class:`dict`
215   objects are not ordered, there is not enough information available to deduce
216   the order in which the row should be written to the file *f*.
217
218   A short usage example::
219
220       import csv
221
222       with open('names.csv', 'w') as csvfile:
223           fieldnames = ['first_name', 'last_name']
224           writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
225
226           writer.writeheader()
227           writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
228           writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
229           writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
230
231
232.. class:: Dialect
233
234   The :class:`Dialect` class is a container class relied on primarily for its
235   attributes, which are used to define the parameters for a specific
236   :class:`reader` or :class:`writer` instance.
237
238
239.. class:: excel()
240
241   The :class:`excel` class defines the usual properties of an Excel-generated CSV
242   file.  It is registered with the dialect name ``'excel'``.
243
244
245.. class:: excel_tab()
246
247   The :class:`excel_tab` class defines the usual properties of an Excel-generated
248   TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.
249
250
251.. class:: Sniffer()
252
253   The :class:`Sniffer` class is used to deduce the format of a CSV file.
254
255   The :class:`Sniffer` class provides two methods:
256
257   .. method:: sniff(sample, delimiters=None)
258
259      Analyze the given *sample* and return a :class:`Dialect` subclass
260      reflecting the parameters found.  If the optional *delimiters* parameter
261      is given, it is interpreted as a string containing possible valid
262      delimiter characters.
263
264
265   .. method:: has_header(sample)
266
267      Analyze the sample text (presumed to be in CSV format) and return
268      :const:`True` if the first row appears to be a series of column headers.
269
270An example for :class:`Sniffer` use::
271
272   with open('example.csv', 'rb') as csvfile:
273       dialect = csv.Sniffer().sniff(csvfile.read(1024))
274       csvfile.seek(0)
275       reader = csv.reader(csvfile, dialect)
276       # ... process CSV file contents here ...
277
278
279The :mod:`csv` module defines the following constants:
280
281.. data:: QUOTE_ALL
282
283   Instructs :class:`writer` objects to quote all fields.
284
285
286.. data:: QUOTE_MINIMAL
287
288   Instructs :class:`writer` objects to only quote those fields which contain
289   special characters such as *delimiter*, *quotechar* or any of the characters in
290   *lineterminator*.
291
292
293.. data:: QUOTE_NONNUMERIC
294
295   Instructs :class:`writer` objects to quote all non-numeric fields.
296
297   Instructs the reader to convert all non-quoted fields to type *float*.
298
299
300.. data:: QUOTE_NONE
301
302   Instructs :class:`writer` objects to never quote fields.  When the current
303   *delimiter* occurs in output data it is preceded by the current *escapechar*
304   character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
305   any characters that require escaping are encountered.
306
307   Instructs :class:`reader` to perform no special processing of quote characters.
308
309The :mod:`csv` module defines the following exception:
310
311
312.. exception:: Error
313
314   Raised by any of the functions when an error is detected.
315
316
317.. _csv-fmt-params:
318
319Dialects and Formatting Parameters
320----------------------------------
321
322To make it easier to specify the format of input and output records, specific
323formatting parameters are grouped together into dialects.  A dialect is a
324subclass of the :class:`Dialect` class having a set of specific methods and a
325single :meth:`validate` method.  When creating :class:`reader` or
326:class:`writer` objects, the programmer can specify a string or a subclass of
327the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
328of, the *dialect* parameter, the programmer can also specify individual
329formatting parameters, which have the same names as the attributes defined below
330for the :class:`Dialect` class.
331
332Dialects support the following attributes:
333
334
335.. attribute:: Dialect.delimiter
336
337   A one-character string used to separate fields.  It defaults to ``','``.
338
339
340.. attribute:: Dialect.doublequote
341
342   Controls how instances of *quotechar* appearing inside a field should
343   themselves be quoted.  When :const:`True`, the character is doubled. When
344   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
345   defaults to :const:`True`.
346
347   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
348   :exc:`Error` is raised if a *quotechar* is found in a field.
349
350
351.. attribute:: Dialect.escapechar
352
353   A one-character string used by the writer to escape the *delimiter* if *quoting*
354   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
355   :const:`False`. On reading, the *escapechar* removes any special meaning from
356   the following character. It defaults to :const:`None`, which disables escaping.
357
358
359.. attribute:: Dialect.lineterminator
360
361   The string used to terminate lines produced by the :class:`writer`. It defaults
362   to ``'\r\n'``.
363
364   .. note::
365
366      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
367      end-of-line, and ignores *lineterminator*. This behavior may change in the
368      future.
369
370
371.. attribute:: Dialect.quotechar
372
373   A one-character string used to quote fields containing special characters, such
374   as the *delimiter* or *quotechar*, or which contain new-line characters.  It
375   defaults to ``'"'``.
376
377
378.. attribute:: Dialect.quoting
379
380   Controls when quotes should be generated by the writer and recognised by the
381   reader.  It can take on any of the :const:`QUOTE_\*` constants (see section
382   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
383
384
385.. attribute:: Dialect.skipinitialspace
386
387   When :const:`True`, whitespace immediately following the *delimiter* is ignored.
388   The default is :const:`False`.
389
390
391.. attribute:: Dialect.strict
392
393   When ``True``, raise exception :exc:`Error` on bad CSV input.
394   The default is ``False``.
395
396Reader Objects
397--------------
398
399Reader objects (:class:`DictReader` instances and objects returned by the
400:func:`reader` function) have the following public methods:
401
402
403.. method:: csvreader.next()
404
405   Return the next row of the reader's iterable object as a list, parsed according
406   to the current dialect.
407
408Reader objects have the following public attributes:
409
410
411.. attribute:: csvreader.dialect
412
413   A read-only description of the dialect in use by the parser.
414
415
416.. attribute:: csvreader.line_num
417
418   The number of lines read from the source iterator. This is not the same as the
419   number of records returned, as records can span multiple lines.
420
421   .. versionadded:: 2.5
422
423
424DictReader objects have the following public attribute:
425
426
427.. attribute:: csvreader.fieldnames
428
429   If not passed as a parameter when creating the object, this attribute is
430   initialized upon first access or when the first record is read from the
431   file.
432
433   .. versionchanged:: 2.6
434
435
436Writer Objects
437--------------
438
439:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
440the :func:`writer` function) have the following public methods.  A *row* must be
441a sequence of strings or numbers for :class:`Writer` objects and a dictionary
442mapping fieldnames to strings or numbers (by passing them through :func:`str`
443first) for :class:`DictWriter` objects.  Note that complex numbers are written
444out surrounded by parens. This may cause some problems for other programs which
445read CSV files (assuming they support complex numbers at all).
446
447
448.. method:: csvwriter.writerow(row)
449
450   Write the *row* parameter to the writer's file object, formatted according to
451   the current dialect.
452
453
454.. method:: csvwriter.writerows(rows)
455
456   Write all elements in *rows* (an iterable of *row* objects as described
457   above) to the writer's file object, formatted according to the current
458   dialect.
459
460Writer objects have the following public attribute:
461
462
463.. attribute:: csvwriter.dialect
464
465   A read-only description of the dialect in use by the writer.
466
467
468DictWriter objects have the following public method:
469
470
471.. method:: DictWriter.writeheader()
472
473   Write a row with the field names (as specified in the constructor).
474
475   .. versionadded:: 2.7
476
477
478.. _csv-examples:
479
480Examples
481--------
482
483The simplest example of reading a CSV file::
484
485   import csv
486   with open('some.csv', 'rb') as f:
487       reader = csv.reader(f)
488       for row in reader:
489           print row
490
491Reading a file with an alternate format::
492
493   import csv
494   with open('passwd', 'rb') as f:
495       reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
496       for row in reader:
497           print row
498
499The corresponding simplest possible writing example is::
500
501   import csv
502   with open('some.csv', 'wb') as f:
503       writer = csv.writer(f)
504       writer.writerows(someiterable)
505
506Registering a new dialect::
507
508   import csv
509   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
510   with open('passwd', 'rb') as f:
511       reader = csv.reader(f, 'unixpwd')
512
513A slightly more advanced use of the reader --- catching and reporting errors::
514
515   import csv, sys
516   filename = 'some.csv'
517   with open(filename, 'rb') as f:
518       reader = csv.reader(f)
519       try:
520           for row in reader:
521               print row
522       except csv.Error as e:
523           sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
524
525And while the module doesn't directly support parsing strings, it can easily be
526done::
527
528   import csv
529   for row in csv.reader(['one,two,three']):
530       print row
531
532The :mod:`csv` module doesn't directly support reading and writing Unicode, but
533it is 8-bit-clean save for some problems with ASCII NUL characters.  So you can
534write functions or classes that handle the encoding and decoding for you as long
535as you avoid encodings like UTF-16 that use NULs.  UTF-8 is recommended.
536
537:func:`unicode_csv_reader` below is a :term:`generator` that wraps :class:`csv.reader`
538to handle Unicode CSV data (a list of Unicode strings).  :func:`utf_8_encoder`
539is a :term:`generator` that encodes the Unicode strings as UTF-8, one string (or row) at
540a time.  The encoded strings are parsed by the CSV reader, and
541:func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode::
542
543   import csv
544
545   def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
546       # csv.py doesn't do Unicode; encode temporarily as UTF-8:
547       csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
548                               dialect=dialect, **kwargs)
549       for row in csv_reader:
550           # decode UTF-8 back to Unicode, cell by cell:
551           yield [unicode(cell, 'utf-8') for cell in row]
552
553   def utf_8_encoder(unicode_csv_data):
554       for line in unicode_csv_data:
555           yield line.encode('utf-8')
556
557For all other encodings the following :class:`UnicodeReader` and
558:class:`UnicodeWriter` classes can be used. They take an additional *encoding*
559parameter in their constructor and make sure that the data passes the real
560reader or writer encoded as UTF-8::
561
562   import csv, codecs, cStringIO
563
564   class UTF8Recoder:
565       """
566       Iterator that reads an encoded stream and reencodes the input to UTF-8
567       """
568       def __init__(self, f, encoding):
569           self.reader = codecs.getreader(encoding)(f)
570
571       def __iter__(self):
572           return self
573
574       def next(self):
575           return self.reader.next().encode("utf-8")
576
577   class UnicodeReader:
578       """
579       A CSV reader which will iterate over lines in the CSV file "f",
580       which is encoded in the given encoding.
581       """
582
583       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
584           f = UTF8Recoder(f, encoding)
585           self.reader = csv.reader(f, dialect=dialect, **kwds)
586
587       def next(self):
588           row = self.reader.next()
589           return [unicode(s, "utf-8") for s in row]
590
591       def __iter__(self):
592           return self
593
594   class UnicodeWriter:
595       """
596       A CSV writer which will write rows to CSV file "f",
597       which is encoded in the given encoding.
598       """
599
600       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
601           # Redirect output to a queue
602           self.queue = cStringIO.StringIO()
603           self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
604           self.stream = f
605           self.encoder = codecs.getincrementalencoder(encoding)()
606
607       def writerow(self, row):
608           self.writer.writerow([s.encode("utf-8") for s in row])
609           # Fetch UTF-8 output from the queue ...
610           data = self.queue.getvalue()
611           data = data.decode("utf-8")
612           # ... and reencode it into the target encoding
613           data = self.encoder.encode(data)
614           # write to the target stream
615           self.stream.write(data)
616           # empty queue
617           self.queue.truncate(0)
618
619       def writerows(self, rows):
620           for row in rows:
621               self.writerow(row)
622
623