• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1:mod:`!csv` --- CSV File Reading and Writing
2============================================
3
4.. module:: csv
5   :synopsis: Write and read tabular data to and from delimited files.
6
7.. sectionauthor:: Skip Montanaro <skip.montanaro@gmail.com>
8
9**Source code:** :source:`Lib/csv.py`
10
11.. index::
12   single: csv
13   pair: data; tabular
14
15--------------
16
17The so-called CSV (Comma Separated Values) format is the most common import and
18export format for spreadsheets and databases.  CSV format was used for many
19years prior to attempts to describe the format in a standardized way in
20:rfc:`4180`.  The lack of a well-defined standard means that subtle differences
21often exist in the data produced and consumed by different applications.  These
22differences can make it annoying to process CSV files from multiple sources.
23Still, while the delimiters and quoting characters vary, the overall format is
24similar enough that it is possible to write a single module which can
25efficiently manipulate such data, hiding the details of reading and writing the
26data from the programmer.
27
28The :mod:`csv` module implements classes to read and write tabular data in CSV
29format.  It allows programmers to say, "write this data in the format preferred
30by Excel," or "read data from this file which was generated by Excel," without
31knowing the precise details of the CSV format used by Excel.  Programmers can
32also describe the CSV formats understood by other applications or define their
33own special-purpose CSV formats.
34
35The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
36write sequences.  Programmers can also read and write data in dictionary form
37using the :class:`DictReader` and :class:`DictWriter` classes.
38
39.. seealso::
40
41   :pep:`305` - CSV File API
42      The Python Enhancement Proposal which proposed this addition to Python.
43
44
45.. _csv-contents:
46
47Module Contents
48---------------
49
50The :mod:`csv` module defines the following functions:
51
52
53.. index::
54   single: universal newlines; csv.reader function
55
56.. function:: reader(csvfile, dialect='excel', **fmtparams)
57
58   Return a :ref:`reader object <reader-objects>` that will process
59   lines from the given *csvfile*.  A csvfile must be an iterable of
60   strings, each in the reader's defined csv format.
61   A csvfile is most commonly a file-like object or list.
62   If *csvfile* is a file object,
63   it should be opened with ``newline=''``. [1]_  An optional
64   *dialect* parameter can be given which is used to define a set of parameters
65   specific to a particular CSV dialect.  It may be an instance of a subclass of
66   the :class:`Dialect` class or one of the strings returned by the
67   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
68   can be given to override individual formatting parameters in the current
69   dialect.  For full details about the dialect and formatting parameters, see
70   section :ref:`csv-fmt-params`.
71
72   Each row read from the csv file is returned as a list of strings.  No
73   automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
74   option is specified (in which case unquoted fields are transformed into floats).
75
76   A short usage example::
77
78      >>> import csv
79      >>> with open('eggs.csv', newline='') as csvfile:
80      ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
81      ...     for row in spamreader:
82      ...         print(', '.join(row))
83      Spam, Spam, Spam, Spam, Spam, Baked Beans
84      Spam, Lovely Spam, Wonderful Spam
85
86
87.. function:: writer(csvfile, dialect='excel', **fmtparams)
88
89   Return a writer object responsible for converting the user's data into delimited
90   strings on the given file-like object.  *csvfile* can be any object with a
91   :meth:`~io.TextIOBase.write` method.  If *csvfile* is a file object, it should be opened with
92   ``newline=''`` [1]_.  An optional *dialect*
93   parameter can be given which is used to define a set of parameters specific to a
94   particular CSV dialect.  It may be an instance of a subclass of the
95   :class:`Dialect` class or one of the strings returned by the
96   :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
97   can be given to override individual formatting parameters in the current
98   dialect.  For full details about dialects and formatting parameters, see
99   the :ref:`csv-fmt-params` section. To make it
100   as easy as possible to interface with modules which implement the DB API, the
101   value :const:`None` is written as the empty string.  While this isn't a
102   reversible transformation, it makes it easier to dump SQL NULL data values to
103   CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
104   All other non-string data are stringified with :func:`str` before being written.
105
106   A short usage example::
107
108      import csv
109      with open('eggs.csv', 'w', newline='') as csvfile:
110          spamwriter = csv.writer(csvfile, delimiter=' ',
111                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
112          spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
113          spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
114
115
116.. function:: register_dialect(name[, dialect[, **fmtparams]])
117
118   Associate *dialect* with *name*.  *name* must be a string. The
119   dialect can be specified either by passing a sub-class of :class:`Dialect`, or
120   by *fmtparams* keyword arguments, or both, with keyword arguments overriding
121   parameters of the dialect. For full details about dialects and formatting
122   parameters, see section :ref:`csv-fmt-params`.
123
124
125.. function:: unregister_dialect(name)
126
127   Delete the dialect associated with *name* from the dialect registry.  An
128   :exc:`Error` is raised if *name* is not a registered dialect name.
129
130
131.. function:: get_dialect(name)
132
133   Return the dialect associated with *name*.  An :exc:`Error` is raised if
134   *name* is not a registered dialect name.  This function returns an immutable
135   :class:`Dialect`.
136
137.. function:: list_dialects()
138
139   Return the names of all registered dialects.
140
141
142.. function:: field_size_limit([new_limit])
143
144   Returns the current maximum field size allowed by the parser. If *new_limit* is
145   given, this becomes the new limit.
146
147
148The :mod:`csv` module defines the following classes:
149
150.. class:: DictReader(f, fieldnames=None, restkey=None, restval=None, \
151                      dialect='excel', *args, **kwds)
152
153   Create an object that operates like a regular reader but maps the
154   information in each row to a :class:`dict` whose keys are given by the
155   optional *fieldnames* parameter.
156
157   The *fieldnames* parameter is a :term:`sequence`.  If *fieldnames* is
158   omitted, the values in the first row of file *f* will be used as the
159   fieldnames and will be omitted from the results. If
160   *fieldnames* is provided, they will be used and the first row will be
161   included in the results.  Regardless of how the fieldnames are determined,
162   the dictionary preserves their original ordering.
163
164   If a row has more fields than fieldnames, the remaining data is put in a
165   list and stored with the fieldname specified by *restkey* (which defaults
166   to ``None``).  If a non-blank row has fewer fields than fieldnames, the
167   missing values are filled-in with the value of *restval* (which defaults
168   to ``None``).
169
170   All other optional or keyword arguments are passed to the underlying
171   :class:`reader` instance.
172
173   If the argument passed to *fieldnames* is an iterator, it will be coerced to a :class:`list`.
174
175   .. versionchanged:: 3.6
176      Returned rows are now of type :class:`OrderedDict`.
177
178   .. versionchanged:: 3.8
179      Returned rows are now of type :class:`dict`.
180
181   A short usage example::
182
183       >>> import csv
184       >>> with open('names.csv', newline='') as csvfile:
185       ...     reader = csv.DictReader(csvfile)
186       ...     for row in reader:
187       ...         print(row['first_name'], row['last_name'])
188       ...
189       Eric Idle
190       John Cleese
191
192       >>> print(row)
193       {'first_name': 'John', 'last_name': 'Cleese'}
194
195
196.. class:: DictWriter(f, fieldnames, restval='', extrasaction='raise', \
197                      dialect='excel', *args, **kwds)
198
199   Create an object which operates like a regular writer but maps dictionaries
200   onto output rows.  The *fieldnames* parameter is a :mod:`sequence
201   <collections.abc>` of keys that identify the order in which values in the
202   dictionary passed to the :meth:`~csvwriter.writerow` method are written to file
203   *f*.  The optional *restval* parameter specifies the value to be
204   written if the dictionary is missing a key in *fieldnames*.  If the
205   dictionary passed to the :meth:`~csvwriter.writerow` method contains a key not found in
206   *fieldnames*, the optional *extrasaction* parameter indicates what action to
207   take.
208   If it is set to ``'raise'``, the default value, a :exc:`ValueError`
209   is raised.
210   If it is set to ``'ignore'``, extra values in the dictionary are ignored.
211   Any other optional or keyword arguments are passed to the underlying
212   :class:`writer` instance.
213
214   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
215   of the :class:`DictWriter` class is not optional.
216
217   If the argument passed to *fieldnames* is an iterator, it will be coerced to a :class:`list`.
218
219   A short usage example::
220
221       import csv
222
223       with open('names.csv', 'w', newline='') as csvfile:
224           fieldnames = ['first_name', 'last_name']
225           writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
226
227           writer.writeheader()
228           writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
229           writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
230           writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
231
232
233.. class:: Dialect
234
235   The :class:`Dialect` class is a container class whose attributes contain
236   information for how to handle doublequotes, whitespace, delimiters, etc.
237   Due to the lack of a strict CSV specification, different applications
238   produce subtly different CSV data.  :class:`Dialect` instances define how
239   :class:`reader` and :class:`writer` instances behave.
240
241   All available :class:`Dialect` names are returned by :func:`list_dialects`,
242   and they can be registered with specific :class:`reader` and :class:`writer`
243   classes through their initializer (``__init__``) functions like this::
244
245       import csv
246
247       with open('students.csv', 'w', newline='') as csvfile:
248           writer = csv.writer(csvfile, dialect='unix')
249
250
251.. class:: excel()
252
253   The :class:`excel` class defines the usual properties of an Excel-generated CSV
254   file.  It is registered with the dialect name ``'excel'``.
255
256
257.. class:: excel_tab()
258
259   The :class:`excel_tab` class defines the usual properties of an Excel-generated
260   TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.
261
262
263.. class:: unix_dialect()
264
265   The :class:`unix_dialect` class defines the usual properties of a CSV file
266   generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
267   all fields.  It is registered with the dialect name ``'unix'``.
268
269   .. versionadded:: 3.2
270
271
272.. class:: Sniffer()
273
274   The :class:`Sniffer` class is used to deduce the format of a CSV file.
275
276   The :class:`Sniffer` class provides two methods:
277
278   .. method:: sniff(sample, delimiters=None)
279
280      Analyze the given *sample* and return a :class:`Dialect` subclass
281      reflecting the parameters found.  If the optional *delimiters* parameter
282      is given, it is interpreted as a string containing possible valid
283      delimiter characters.
284
285
286   .. method:: has_header(sample)
287
288      Analyze the sample text (presumed to be in CSV format) and return
289      :const:`True` if the first row appears to be a series of column headers.
290      Inspecting each column, one of two key criteria will be considered to
291      estimate if the sample contains a header:
292
293      - the second through n-th rows contain numeric values
294      - the second through n-th rows contain strings where at least one value's
295        length differs from that of the putative header of that column.
296
297      Twenty rows after the first row are sampled; if more than half of columns +
298      rows meet the criteria, :const:`True` is returned.
299
300   .. note::
301
302      This method is a rough heuristic and may produce both false positives and
303      negatives.
304
305An example for :class:`Sniffer` use::
306
307   with open('example.csv', newline='') as csvfile:
308       dialect = csv.Sniffer().sniff(csvfile.read(1024))
309       csvfile.seek(0)
310       reader = csv.reader(csvfile, dialect)
311       # ... process CSV file contents here ...
312
313
314.. _csv-constants:
315
316The :mod:`csv` module defines the following constants:
317
318.. data:: QUOTE_ALL
319
320   Instructs :class:`writer` objects to quote all fields.
321
322
323.. data:: QUOTE_MINIMAL
324
325   Instructs :class:`writer` objects to only quote those fields which contain
326   special characters such as *delimiter*, *quotechar* or any of the characters in
327   *lineterminator*.
328
329
330.. data:: QUOTE_NONNUMERIC
331
332   Instructs :class:`writer` objects to quote all non-numeric fields.
333
334   Instructs :class:`reader` objects to convert all non-quoted fields to type *float*.
335
336
337.. data:: QUOTE_NONE
338
339   Instructs :class:`writer` objects to never quote fields.  When the current
340   *delimiter* occurs in output data it is preceded by the current *escapechar*
341   character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
342   any characters that require escaping are encountered.
343
344   Instructs :class:`reader` objects to perform no special processing of quote characters.
345
346.. data:: QUOTE_NOTNULL
347
348   Instructs :class:`writer` objects to quote all fields which are not
349   ``None``.  This is similar to :data:`QUOTE_ALL`, except that if a
350   field value is ``None`` an empty (unquoted) string is written.
351
352   Instructs :class:`reader` objects to interpret an empty (unquoted) field
353   as ``None`` and to otherwise behave as :data:`QUOTE_ALL`.
354
355   .. versionadded:: 3.12
356
357.. data:: QUOTE_STRINGS
358
359   Instructs :class:`writer` objects to always place quotes around fields
360   which are strings.  This is similar to :data:`QUOTE_NONNUMERIC`, except that if a
361   field value is ``None`` an empty (unquoted) string is written.
362
363   Instructs :class:`reader` objects to interpret an empty (unquoted) string as ``None`` and
364   to otherwise behave as :data:`QUOTE_NONNUMERIC`.
365
366   .. versionadded:: 3.12
367
368The :mod:`csv` module defines the following exception:
369
370
371.. exception:: Error
372
373   Raised by any of the functions when an error is detected.
374
375.. _csv-fmt-params:
376
377Dialects and Formatting Parameters
378----------------------------------
379
380To make it easier to specify the format of input and output records, specific
381formatting parameters are grouped together into dialects.  A dialect is a
382subclass of the :class:`Dialect` class containing various attributes
383describing the format of the CSV file.  When creating :class:`reader` or
384:class:`writer` objects, the programmer can specify a string or a subclass of
385the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
386of, the *dialect* parameter, the programmer can also specify individual
387formatting parameters, which have the same names as the attributes defined below
388for the :class:`Dialect` class.
389
390Dialects support the following attributes:
391
392
393.. attribute:: Dialect.delimiter
394
395   A one-character string used to separate fields.  It defaults to ``','``.
396
397
398.. attribute:: Dialect.doublequote
399
400   Controls how instances of *quotechar* appearing inside a field should
401   themselves be quoted.  When :const:`True`, the character is doubled. When
402   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
403   defaults to :const:`True`.
404
405   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
406   :exc:`Error` is raised if a *quotechar* is found in a field.
407
408
409.. attribute:: Dialect.escapechar
410
411   A one-character string used by the writer to escape the *delimiter* if *quoting*
412   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
413   :const:`False`. On reading, the *escapechar* removes any special meaning from
414   the following character. It defaults to :const:`None`, which disables escaping.
415
416   .. versionchanged:: 3.11
417      An empty *escapechar* is not allowed.
418
419.. attribute:: Dialect.lineterminator
420
421   The string used to terminate lines produced by the :class:`writer`. It defaults
422   to ``'\r\n'``.
423
424   .. note::
425
426      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
427      end-of-line, and ignores *lineterminator*. This behavior may change in the
428      future.
429
430
431.. attribute:: Dialect.quotechar
432
433   A one-character string used to quote fields containing special characters, such
434   as the *delimiter* or *quotechar*, or which contain new-line characters.  It
435   defaults to ``'"'``.
436
437   .. versionchanged:: 3.11
438      An empty *quotechar* is not allowed.
439
440.. attribute:: Dialect.quoting
441
442   Controls when quotes should be generated by the writer and recognised by the
443   reader.  It can take on any of the :ref:`QUOTE_\* constants <csv-constants>`
444   and defaults to :const:`QUOTE_MINIMAL`.
445
446
447.. attribute:: Dialect.skipinitialspace
448
449   When :const:`True`, spaces immediately following the *delimiter* are ignored.
450   The default is :const:`False`.
451
452
453.. attribute:: Dialect.strict
454
455   When ``True``, raise exception :exc:`Error` on bad CSV input.
456   The default is ``False``.
457
458.. _reader-objects:
459
460Reader Objects
461--------------
462
463Reader objects (:class:`DictReader` instances and objects returned by the
464:func:`reader` function) have the following public methods:
465
466.. method:: csvreader.__next__()
467
468   Return the next row of the reader's iterable object as a list (if the object
469   was returned from :func:`reader`) or a dict (if it is a :class:`DictReader`
470   instance), parsed according to the current :class:`Dialect`.  Usually you
471   should call this as ``next(reader)``.
472
473
474Reader objects have the following public attributes:
475
476.. attribute:: csvreader.dialect
477
478   A read-only description of the dialect in use by the parser.
479
480
481.. attribute:: csvreader.line_num
482
483   The number of lines read from the source iterator. This is not the same as the
484   number of records returned, as records can span multiple lines.
485
486
487DictReader objects have the following public attribute:
488
489.. attribute:: DictReader.fieldnames
490
491   If not passed as a parameter when creating the object, this attribute is
492   initialized upon first access or when the first record is read from the
493   file.
494
495
496
497Writer Objects
498--------------
499
500:class:`writer` objects (:class:`DictWriter` instances and objects returned by
501the :func:`writer` function) have the following public methods.  A *row* must be
502an iterable of strings or numbers for :class:`writer` objects and a dictionary
503mapping fieldnames to strings or numbers (by passing them through :func:`str`
504first) for :class:`DictWriter` objects.  Note that complex numbers are written
505out surrounded by parens. This may cause some problems for other programs which
506read CSV files (assuming they support complex numbers at all).
507
508
509.. method:: csvwriter.writerow(row)
510
511   Write the *row* parameter to the writer's file object, formatted according
512   to the current :class:`Dialect`. Return the return value of the call to the
513   *write* method of the underlying file object.
514
515   .. versionchanged:: 3.5
516      Added support of arbitrary iterables.
517
518.. method:: csvwriter.writerows(rows)
519
520   Write all elements in *rows* (an iterable of *row* objects as described
521   above) to the writer's file object, formatted according to the current
522   dialect.
523
524Writer objects have the following public attribute:
525
526
527.. attribute:: csvwriter.dialect
528
529   A read-only description of the dialect in use by the writer.
530
531
532DictWriter objects have the following public method:
533
534
535.. method:: DictWriter.writeheader()
536
537   Write a row with the field names (as specified in the constructor) to
538   the writer's file object, formatted according to the current dialect. Return
539   the return value of the :meth:`csvwriter.writerow` call used internally.
540
541   .. versionadded:: 3.2
542   .. versionchanged:: 3.8
543      :meth:`writeheader` now also returns the value returned by
544      the :meth:`csvwriter.writerow` method it uses internally.
545
546
547.. _csv-examples:
548
549Examples
550--------
551
552The simplest example of reading a CSV file::
553
554   import csv
555   with open('some.csv', newline='') as f:
556       reader = csv.reader(f)
557       for row in reader:
558           print(row)
559
560Reading a file with an alternate format::
561
562   import csv
563   with open('passwd', newline='') as f:
564       reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
565       for row in reader:
566           print(row)
567
568The corresponding simplest possible writing example is::
569
570   import csv
571   with open('some.csv', 'w', newline='') as f:
572       writer = csv.writer(f)
573       writer.writerows(someiterable)
574
575Since :func:`open` is used to open a CSV file for reading, the file
576will by default be decoded into unicode using the system default
577encoding (see :func:`locale.getencoding`).  To decode a file
578using a different encoding, use the ``encoding`` argument of open::
579
580   import csv
581   with open('some.csv', newline='', encoding='utf-8') as f:
582       reader = csv.reader(f)
583       for row in reader:
584           print(row)
585
586The same applies to writing in something other than the system default
587encoding: specify the encoding argument when opening the output file.
588
589Registering a new dialect::
590
591   import csv
592   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
593   with open('passwd', newline='') as f:
594       reader = csv.reader(f, 'unixpwd')
595
596A slightly more advanced use of the reader --- catching and reporting errors::
597
598   import csv, sys
599   filename = 'some.csv'
600   with open(filename, newline='') as f:
601       reader = csv.reader(f)
602       try:
603           for row in reader:
604               print(row)
605       except csv.Error as e:
606           sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
607
608And while the module doesn't directly support parsing strings, it can easily be
609done::
610
611   import csv
612   for row in csv.reader(['one,two,three']):
613       print(row)
614
615
616.. rubric:: Footnotes
617
618.. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
619   will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
620   on write an extra ``\r`` will be added.  It should always be safe to specify
621   ``newline=''``, since the csv module does its own
622   (:term:`universal <universal newlines>`) newline handling.
623