• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5   :synopsis: Write and read tabular data to and from delimited files.
6
7.. sectionauthor:: Skip Montanaro <skip@pobox.com>
8
9**Source code:** :source:`Lib/csv.py`
10
11.. index::
12   single: csv
13   pair: data; tabular
14
15--------------
16
17The so-called CSV (Comma Separated Values) format is the most common import and
18export format for spreadsheets and databases.  CSV format was used for many
19years prior to attempts to describe the format in a standardized way in
20:rfc:`4180`.  The lack of a well-defined standard means that subtle differences
21often exist in the data produced and consumed by different applications.  These
22differences can make it annoying to process CSV files from multiple sources.
23Still, while the delimiters and quoting characters vary, the overall format is
24similar enough that it is possible to write a single module which can
25efficiently manipulate such data, hiding the details of reading and writing the
26data from the programmer.
27
28The :mod:`csv` module implements classes to read and write tabular data in CSV
29format.  It allows programmers to say, "write this data in the format preferred
30by Excel," or "read data from this file which was generated by Excel," without
31knowing the precise details of the CSV format used by Excel.  Programmers can
32also describe the CSV formats understood by other applications or define their
33own special-purpose CSV formats.
34
35The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
36write sequences.  Programmers can also read and write data in dictionary form
37using the :class:`DictReader` and :class:`DictWriter` classes.
38
39.. seealso::
40
41   :pep:`305` - CSV File API
42      The Python Enhancement Proposal which proposed this addition to Python.
43
44
45.. _csv-contents:
46
47Module Contents
48---------------
49
50The :mod:`csv` module defines the following functions:
51
52
53.. index::
54   single: universal newlines; csv.reader function
55
56.. function:: reader(csvfile, dialect='excel', **fmtparams)
57
58   Return a reader object which will iterate over lines in the given *csvfile*.
59   *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
60   string each time its :meth:`!__next__` method is called --- :term:`file objects
61   <file object>` and list objects are both suitable.   If *csvfile* is a file object,
62   it should be opened with ``newline=''``. [1]_  An optional
63   *dialect* parameter can be given which is used to define a set of parameters
64   specific to a particular CSV dialect.  It may be an instance of a subclass of
65   the :class:`Dialect` class or one of the strings returned by the
66   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
67   can be given to override individual formatting parameters in the current
68   dialect.  For full details about the dialect and formatting parameters, see
69   section :ref:`csv-fmt-params`.
70
71   Each row read from the csv file is returned as a list of strings.  No
72   automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
73   option is specified (in which case unquoted fields are transformed into floats).
74
75   A short usage example::
76
77      >>> import csv
78      >>> with open('eggs.csv', newline='') as csvfile:
79      ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
80      ...     for row in spamreader:
81      ...         print(', '.join(row))
82      Spam, Spam, Spam, Spam, Spam, Baked Beans
83      Spam, Lovely Spam, Wonderful Spam
84
85
86.. function:: writer(csvfile, dialect='excel', **fmtparams)
87
88   Return a writer object responsible for converting the user's data into delimited
89   strings on the given file-like object.  *csvfile* can be any object with a
90   :func:`write` method.  If *csvfile* is a file object, it should be opened with
91   ``newline=''`` [1]_.  An optional *dialect*
92   parameter can be given which is used to define a set of parameters specific to a
93   particular CSV dialect.  It may be an instance of a subclass of the
94   :class:`Dialect` class or one of the strings returned by the
95   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
96   can be given to override individual formatting parameters in the current
97   dialect.  For full details about dialects and formatting parameters, see
98   the :ref:`csv-fmt-params` section. To make it
99   as easy as possible to interface with modules which implement the DB API, the
100   value :const:`None` is written as the empty string.  While this isn't a
101   reversible transformation, it makes it easier to dump SQL NULL data values to
102   CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
103   All other non-string data are stringified with :func:`str` before being written.
104
105   A short usage example::
106
107      import csv
108      with open('eggs.csv', 'w', newline='') as csvfile:
109          spamwriter = csv.writer(csvfile, delimiter=' ',
110                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
111          spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
112          spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
113
114
115.. function:: register_dialect(name[, dialect[, **fmtparams]])
116
117   Associate *dialect* with *name*.  *name* must be a string. The
118   dialect can be specified either by passing a sub-class of :class:`Dialect`, or
119   by *fmtparams* keyword arguments, or both, with keyword arguments overriding
120   parameters of the dialect. For full details about dialects and formatting
121   parameters, see section :ref:`csv-fmt-params`.
122
123
124.. function:: unregister_dialect(name)
125
126   Delete the dialect associated with *name* from the dialect registry.  An
127   :exc:`Error` is raised if *name* is not a registered dialect name.
128
129
130.. function:: get_dialect(name)
131
132   Return the dialect associated with *name*.  An :exc:`Error` is raised if
133   *name* is not a registered dialect name.  This function returns an immutable
134   :class:`Dialect`.
135
136.. function:: list_dialects()
137
138   Return the names of all registered dialects.
139
140
141.. function:: field_size_limit([new_limit])
142
143   Returns the current maximum field size allowed by the parser. If *new_limit* is
144   given, this becomes the new limit.
145
146
147The :mod:`csv` module defines the following classes:
148
149.. class:: DictReader(f, fieldnames=None, restkey=None, restval=None, \
150                      dialect='excel', *args, **kwds)
151
152   Create an object that operates like a regular reader but maps the
153   information in each row to a :class:`dict` whose keys are given by the
154   optional *fieldnames* parameter.
155
156   The *fieldnames* parameter is a :term:`sequence`.  If *fieldnames* is
157   omitted, the values in the first row of file *f* will be used as the
158   fieldnames.  Regardless of how the fieldnames are determined, the
159   dictionary preserves their original ordering.
160
161   If a row has more fields than fieldnames, the remaining data is put in a
162   list and stored with the fieldname specified by *restkey* (which defaults
163   to ``None``).  If a non-blank row has fewer fields than fieldnames, the
164   missing values are filled-in with the value of *restval* (which defaults
165   to ``None``).
166
167   All other optional or keyword arguments are passed to the underlying
168   :class:`reader` instance.
169
170   .. versionchanged:: 3.6
171      Returned rows are now of type :class:`OrderedDict`.
172
173   .. versionchanged:: 3.8
174      Returned rows are now of type :class:`dict`.
175
176   A short usage example::
177
178       >>> import csv
179       >>> with open('names.csv', newline='') as csvfile:
180       ...     reader = csv.DictReader(csvfile)
181       ...     for row in reader:
182       ...         print(row['first_name'], row['last_name'])
183       ...
184       Eric Idle
185       John Cleese
186
187       >>> print(row)
188       {'first_name': 'John', 'last_name': 'Cleese'}
189
190
191.. class:: DictWriter(f, fieldnames, restval='', extrasaction='raise', \
192                      dialect='excel', *args, **kwds)
193
194   Create an object which operates like a regular writer but maps dictionaries
195   onto output rows.  The *fieldnames* parameter is a :mod:`sequence
196   <collections.abc>` of keys that identify the order in which values in the
197   dictionary passed to the :meth:`writerow` method are written to file
198   *f*.  The optional *restval* parameter specifies the value to be
199   written if the dictionary is missing a key in *fieldnames*.  If the
200   dictionary passed to the :meth:`writerow` method contains a key not found in
201   *fieldnames*, the optional *extrasaction* parameter indicates what action to
202   take.
203   If it is set to ``'raise'``, the default value, a :exc:`ValueError`
204   is raised.
205   If it is set to ``'ignore'``, extra values in the dictionary are ignored.
206   Any other optional or keyword arguments are passed to the underlying
207   :class:`writer` instance.
208
209   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
210   of the :class:`DictWriter` class is not optional.
211
212   A short usage example::
213
214       import csv
215
216       with open('names.csv', 'w', newline='') as csvfile:
217           fieldnames = ['first_name', 'last_name']
218           writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
219
220           writer.writeheader()
221           writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
222           writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
223           writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
224
225
226.. class:: Dialect
227
228   The :class:`Dialect` class is a container class whose attributes contain
229   information for how to handle doublequotes, whitespace, delimiters, etc.
230   Due to the lack of a strict CSV specification, different applications
231   produce subtly different CSV data.  :class:`Dialect` instances define how
232   :class:`reader` and :class:`writer` instances behave.
233
234   All available :class:`Dialect` names are returned by :func:`list_dialects`,
235   and they can be registered with specific :class:`reader` and :class:`writer`
236   classes through their initializer (``__init__``) functions like this::
237
238       import csv
239
240       with open('students.csv', 'w', newline='') as csvfile:
241           writer = csv.writer(csvfile, dialect='unix')
242                                        ^^^^^^^^^^^^^^
243
244
245.. class:: excel()
246
247   The :class:`excel` class defines the usual properties of an Excel-generated CSV
248   file.  It is registered with the dialect name ``'excel'``.
249
250
251.. class:: excel_tab()
252
253   The :class:`excel_tab` class defines the usual properties of an Excel-generated
254   TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.
255
256
257.. class:: unix_dialect()
258
259   The :class:`unix_dialect` class defines the usual properties of a CSV file
260   generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
261   all fields.  It is registered with the dialect name ``'unix'``.
262
263   .. versionadded:: 3.2
264
265
266.. class:: Sniffer()
267
268   The :class:`Sniffer` class is used to deduce the format of a CSV file.
269
270   The :class:`Sniffer` class provides two methods:
271
272   .. method:: sniff(sample, delimiters=None)
273
274      Analyze the given *sample* and return a :class:`Dialect` subclass
275      reflecting the parameters found.  If the optional *delimiters* parameter
276      is given, it is interpreted as a string containing possible valid
277      delimiter characters.
278
279
280   .. method:: has_header(sample)
281
282      Analyze the sample text (presumed to be in CSV format) and return
283      :const:`True` if the first row appears to be a series of column headers.
284      Inspecting each column, one of two key criteria will be considered to
285      estimate if the sample contains a header:
286
287        - the second through n-th rows contain numeric values
288        - the second through n-th rows contain strings where at least one value's
289          length differs from that of the putative header of that column.
290
291      Twenty rows after the first row are sampled; if more than half of columns +
292      rows meet the criteria, :const:`True` is returned.
293
294   .. note::
295
296      This method is a rough heuristic and may produce both false positives and
297      negatives.
298
299An example for :class:`Sniffer` use::
300
301   with open('example.csv', newline='') as csvfile:
302       dialect = csv.Sniffer().sniff(csvfile.read(1024))
303       csvfile.seek(0)
304       reader = csv.reader(csvfile, dialect)
305       # ... process CSV file contents here ...
306
307
308The :mod:`csv` module defines the following constants:
309
310.. data:: QUOTE_ALL
311
312   Instructs :class:`writer` objects to quote all fields.
313
314
315.. data:: QUOTE_MINIMAL
316
317   Instructs :class:`writer` objects to only quote those fields which contain
318   special characters such as *delimiter*, *quotechar* or any of the characters in
319   *lineterminator*.
320
321
322.. data:: QUOTE_NONNUMERIC
323
324   Instructs :class:`writer` objects to quote all non-numeric fields.
325
326   Instructs the reader to convert all non-quoted fields to type *float*.
327
328
329.. data:: QUOTE_NONE
330
331   Instructs :class:`writer` objects to never quote fields.  When the current
332   *delimiter* occurs in output data it is preceded by the current *escapechar*
333   character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
334   any characters that require escaping are encountered.
335
336   Instructs :class:`reader` to perform no special processing of quote characters.
337
338The :mod:`csv` module defines the following exception:
339
340
341.. exception:: Error
342
343   Raised by any of the functions when an error is detected.
344
345.. _csv-fmt-params:
346
347Dialects and Formatting Parameters
348----------------------------------
349
350To make it easier to specify the format of input and output records, specific
351formatting parameters are grouped together into dialects.  A dialect is a
352subclass of the :class:`Dialect` class having a set of specific methods and a
353single :meth:`validate` method.  When creating :class:`reader` or
354:class:`writer` objects, the programmer can specify a string or a subclass of
355the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
356of, the *dialect* parameter, the programmer can also specify individual
357formatting parameters, which have the same names as the attributes defined below
358for the :class:`Dialect` class.
359
360Dialects support the following attributes:
361
362
363.. attribute:: Dialect.delimiter
364
365   A one-character string used to separate fields.  It defaults to ``','``.
366
367
368.. attribute:: Dialect.doublequote
369
370   Controls how instances of *quotechar* appearing inside a field should
371   themselves be quoted.  When :const:`True`, the character is doubled. When
372   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
373   defaults to :const:`True`.
374
375   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
376   :exc:`Error` is raised if a *quotechar* is found in a field.
377
378
379.. attribute:: Dialect.escapechar
380
381   A one-character string used by the writer to escape the *delimiter* if *quoting*
382   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
383   :const:`False`. On reading, the *escapechar* removes any special meaning from
384   the following character. It defaults to :const:`None`, which disables escaping.
385
386
387.. attribute:: Dialect.lineterminator
388
389   The string used to terminate lines produced by the :class:`writer`. It defaults
390   to ``'\r\n'``.
391
392   .. note::
393
394      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
395      end-of-line, and ignores *lineterminator*. This behavior may change in the
396      future.
397
398
399.. attribute:: Dialect.quotechar
400
401   A one-character string used to quote fields containing special characters, such
402   as the *delimiter* or *quotechar*, or which contain new-line characters.  It
403   defaults to ``'"'``.
404
405
406.. attribute:: Dialect.quoting
407
408   Controls when quotes should be generated by the writer and recognised by the
409   reader.  It can take on any of the :const:`QUOTE_\*` constants (see section
410   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
411
412
413.. attribute:: Dialect.skipinitialspace
414
415   When :const:`True`, whitespace immediately following the *delimiter* is ignored.
416   The default is :const:`False`.
417
418
419.. attribute:: Dialect.strict
420
421   When ``True``, raise exception :exc:`Error` on bad CSV input.
422   The default is ``False``.
423
424Reader Objects
425--------------
426
427Reader objects (:class:`DictReader` instances and objects returned by the
428:func:`reader` function) have the following public methods:
429
430.. method:: csvreader.__next__()
431
432   Return the next row of the reader's iterable object as a list (if the object
433   was returned from :func:`reader`) or a dict (if it is a :class:`DictReader`
434   instance), parsed according to the current :class:`Dialect`.  Usually you
435   should call this as ``next(reader)``.
436
437
438Reader objects have the following public attributes:
439
440.. attribute:: csvreader.dialect
441
442   A read-only description of the dialect in use by the parser.
443
444
445.. attribute:: csvreader.line_num
446
447   The number of lines read from the source iterator. This is not the same as the
448   number of records returned, as records can span multiple lines.
449
450
451DictReader objects have the following public attribute:
452
453.. attribute:: csvreader.fieldnames
454
455   If not passed as a parameter when creating the object, this attribute is
456   initialized upon first access or when the first record is read from the
457   file.
458
459
460
461Writer Objects
462--------------
463
464:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
465the :func:`writer` function) have the following public methods.  A *row* must be
466an iterable of strings or numbers for :class:`Writer` objects and a dictionary
467mapping fieldnames to strings or numbers (by passing them through :func:`str`
468first) for :class:`DictWriter` objects.  Note that complex numbers are written
469out surrounded by parens. This may cause some problems for other programs which
470read CSV files (assuming they support complex numbers at all).
471
472
473.. method:: csvwriter.writerow(row)
474
475   Write the *row* parameter to the writer's file object, formatted according
476   to the current :class:`Dialect`. Return the return value of the call to the
477   *write* method of the underlying file object.
478
479   .. versionchanged:: 3.5
480      Added support of arbitrary iterables.
481
482.. method:: csvwriter.writerows(rows)
483
484   Write all elements in *rows* (an iterable of *row* objects as described
485   above) to the writer's file object, formatted according to the current
486   dialect.
487
488Writer objects have the following public attribute:
489
490
491.. attribute:: csvwriter.dialect
492
493   A read-only description of the dialect in use by the writer.
494
495
496DictWriter objects have the following public method:
497
498
499.. method:: DictWriter.writeheader()
500
501   Write a row with the field names (as specified in the constructor) to
502   the writer's file object, formatted according to the current dialect. Return
503   the return value of the :meth:`csvwriter.writerow` call used internally.
504
505   .. versionadded:: 3.2
506   .. versionchanged:: 3.8
507      :meth:`writeheader` now also returns the value returned by
508      the :meth:`csvwriter.writerow` method it uses internally.
509
510
511.. _csv-examples:
512
513Examples
514--------
515
516The simplest example of reading a CSV file::
517
518   import csv
519   with open('some.csv', newline='') as f:
520       reader = csv.reader(f)
521       for row in reader:
522           print(row)
523
524Reading a file with an alternate format::
525
526   import csv
527   with open('passwd', newline='') as f:
528       reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
529       for row in reader:
530           print(row)
531
532The corresponding simplest possible writing example is::
533
534   import csv
535   with open('some.csv', 'w', newline='') as f:
536       writer = csv.writer(f)
537       writer.writerows(someiterable)
538
539Since :func:`open` is used to open a CSV file for reading, the file
540will by default be decoded into unicode using the system default
541encoding (see :func:`locale.getpreferredencoding`).  To decode a file
542using a different encoding, use the ``encoding`` argument of open::
543
544   import csv
545   with open('some.csv', newline='', encoding='utf-8') as f:
546       reader = csv.reader(f)
547       for row in reader:
548           print(row)
549
550The same applies to writing in something other than the system default
551encoding: specify the encoding argument when opening the output file.
552
553Registering a new dialect::
554
555   import csv
556   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
557   with open('passwd', newline='') as f:
558       reader = csv.reader(f, 'unixpwd')
559
560A slightly more advanced use of the reader --- catching and reporting errors::
561
562   import csv, sys
563   filename = 'some.csv'
564   with open(filename, newline='') as f:
565       reader = csv.reader(f)
566       try:
567           for row in reader:
568               print(row)
569       except csv.Error as e:
570           sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
571
572And while the module doesn't directly support parsing strings, it can easily be
573done::
574
575   import csv
576   for row in csv.reader(['one,two,three']):
577       print(row)
578
579
580.. rubric:: Footnotes
581
582.. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
583   will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
584   on write an extra ``\r`` will be added.  It should always be safe to specify
585   ``newline=''``, since the csv module does its own
586   (:term:`universal <universal newlines>`) newline handling.
587