Lines Matching full:csv
2 A tutorial for handling CSV data in Rust.
4 This tutorial will cover basic CSV reading and writing, automatic
5 (de)serialization with Serde, CSV transformations and performance.
24 1. [Reading CSV](#reading-csv)
29 1. [Writing CSV](#writing-csv)
38 * [CSV parsing without the standard library](#csv-parsing-without-the-standard-library)
43 In this section, we'll get you set up with a simple program that reads CSV data
56 `csv = "1.1"` to your `[dependencies]` section. At this point, your
66 csv = "1.1"
69 Next, let's build your project. Since you added the `csv` crate as a
85 Let's make our program do something useful. Our program will read CSV data on
97 // Create a CSV parser that reads data from stdin.
98 let mut rdr = csv::Reader::from_reader(io::stdin());
103 let record = result.expect("a CSV record");
118 some CSV data to play with! For that, we will use a random selection of 100
120 will use this same CSV data throughout the entire tutorial.) To get the data,
124 $ curl -LO 'https://raw.githubusercontent.com/BurntSushi/rust-csv/master/examples/data/uspop.csv'
127 And now finally, run your program on `uspop.csv`:
130 $ ./target/debug/csvtutor < uspop.csv
139 Since reading CSV data can result in errors, error handling is pervasive
165 errors. A non-existent file or invalid CSV data are examples of recoverable
183 not be found, or if some CSV data is invalid, is considered bad practice.
194 let mut rdr = csv::Reader::from_reader(io::stdin());
196 let record = result.expect("a CSV record");
208 invalid CSV data, then the program will panic:
217 thread 'main' panicked at 'a CSV record: Error(UnequalLengths { pos: Some(Position { byte: 24, line…
221 What happened here? First and foremost, we should talk about why the CSV data
222 is invalid. The CSV data consists of three records: a header and two data
224 data record has three fields. By default, the csv crate will treat inconsistent
232 (Note that the CSV reader automatically interprets the first record as a
242 let record = result.expect("a CSV record"); // this panics
271 Since this causes a panic if the CSV data is invalid, and invalid CSV data is
287 let mut rdr = csv::Reader::from_reader(io::stdin());
295 println!("error reading CSV from <stdin>: {}", err);
313 error reading CSV from <stdin>: CSV error: record 2 (line: 3, byte: 24): found record with 3 fields…
316 The second step for moving to recoverable errors is to put our CSV record loop
332 let mut rdr = csv::Reader::from_reader(io::stdin());
370 let mut rdr = csv::Reader::from_reader(io::stdin());
391 tutorial on writing command line programs that do CSV parsing, we will consider
393 a library that handles CSV data, then you should check out my
397 CSV transformations, then using methods like `expect` and panicking when an
401 # Reading CSV
404 what we came here to do: handle CSV data. We've already seen how to read
405 CSV data from `stdin`, but this section will cover how to read CSV data from
406 files and how to configure our CSV reader to read data formatted with different
425 let mut rdr = csv::Reader::from_reader(file);
455 $ ./target/debug/csvtutor uspop.csv
472 `File` in a buffer. The CSV reader does buffering internally, so there's
475 Now is a good time to introduce an alternate CSV reader constructor, which
476 makes it slightly more convenient to open CSV data from a file. That is,
482 let mut rdr = csv::Reader::from_reader(file);
489 let mut rdr = csv::Reader::from_path(file_path)?;
492 `csv::Reader::from_path` will open the file for you and return an error if
497 If you had a chance to look at the data inside `uspop.csv`, you would notice
506 CSV reader will interpret the first record in CSV data as a header, which
509 iterate over the records in CSV data.
511 The CSV reader does not try to be smart about the header record and does
514 as a header, you'll need to tell the CSV reader that there are no headers.
516 To configure a CSV reader to do this, we'll need to use a
518 to build a CSV reader with our desired configuration. Here's an example that
527 let mut rdr = csv::ReaderBuilder::new()
545 If you compile and run this program with our `uspop.csv` data, then you'll see
550 $ ./target/debug/csvtutor < uspop.csv
566 let mut rdr = csv::Reader::from_reader(io::stdin());
605 This converts it from a borrow of the CSV reader to a new owned value. This
611 In this section we'll temporarily depart from our `uspop.csv` data set and
612 show how to read some CSV data that is a little less clean. This CSV data
618 $ cat strange.csv
628 To read this CSV data, we'll want to do the following:
645 let mut rdr = csv::ReaderBuilder::new()
668 Now re-compile your project and try running the program on `strange.csv`:
672 $ ./target/debug/csvtutor < strange.csv
684 1. If you remove the `escape` setting, notice that no CSV errors are reported.
685 Instead, records are still parsed. This is a feature of the CSV parser. Even
688 of real world CSV data.
695 This covers most of the things you might want to configure on your CSV reader,
712 For example, let's take a look at some data from our `uspop.csv` file:
731 let mut rdr = csv::Reader::from_reader(io::stdin());
776 let mut rdr = csv::Reader::from_reader(io::stdin());
799 $ ./target/debug/csvtutor < uspop.csv
808 if your CSV data has a header record, since you might tend to think about each
828 let mut rdr = csv::Reader::from_reader(io::stdin());
849 $ ./target/debug/csvtutor < uspop.csv
855 This method works especially well if you need to read CSV data with header
857 However, in our case, we know the structure of the data in `uspop.csv`. In
872 code required to populate our struct from a CSV record. The next example shows
887 // the fields in the CSV data!
899 let mut rdr = csv::Reader::from_reader(io::stdin());
921 $ ./target/debug/csvtutor < uspop.csv
934 definition at compile time and generate code that will deserialize a CSV record
940 struct's field names to the header names in the CSV data. If you recall, our
954 $ ./target/debug/csvtutor < uspop.csv
955 CSV deserialize error: record 1 (line: 2, byte: 41): missing field `latitude`
1007 …url -LO 'https://raw.githubusercontent.com/BurntSushi/rust-csv/master/examples/data/uspop-null.csv'
1030 let mut rdr = csv::Reader::from_reader(io::stdin());
1050 $ ./target/debug/csvtutor < uspop-null.csv
1055 CSV deserialize error: record 42 (line: 43, byte: 1710): field 2: invalid digit found in string
1064 $ head -n 43 uspop-null.csv | tail -n1
1091 #[serde(deserialize_with = "csv::invalid_option")]
1098 let mut rdr = csv::Reader::from_reader(io::stdin());
1119 $ ./target/debug/csvtutor < uspop-null.csv
1130 #[serde(deserialize_with = "csv::invalid_option")]
1137 `None` value. This is useful when you need to work with messy CSV data.
1139 # Writing CSV
1141 In this section we'll show a few examples that write CSV data. Writing CSV data
1142 tends to be a bit more straightforward than reading CSV data, since you get to
1145 Let's start with the most basic example: writing a few CSV records to `stdout`.
1152 let mut wtr = csv::Writer::from_writer(io::stdout());
1161 // A CSV writer maintains an internal buffer, so it's important
1175 Compiling and running this example results in CSV data being printed:
1191 pub fn write_record<I, T>(&mut self, record: I) -> csv::Result<()>
1202 3. `record` is the CSV record we'd like to write. Its type is `I`, which is
1215 The CSV writer will take these bytes and write them as a single field.
1218 6. Finally, the method returns a `csv::Result<()>`, which is short-hand for
1219 `Result<(), csv::Error>`. That means `write_record` either returns nothing
1220 on success or returns a `csv::Error` on failure.
1239 # use csv;
1240 # let mut wtr = csv::Writer::from_writer(vec![]);
1246 wtr.write_record(&csv::StringRecord::from(vec!["a", "b", "c"]));
1248 wtr.write_record(&csv::ByteRecord::from(vec!["a", "b", "c"]));
1265 let mut wtr = csv::Writer::from_path(file_path)?;
1295 In the previous section, we saw how to write some simple CSV data to `stdout`
1305 You might wonder to yourself: what's the point of using a CSV writer if the
1306 data is so simple? Well, the benefit of a CSV writer is that it can handle all
1308 when to quote fields that contain special CSV characters (like commas or new
1309 lines) or escape literal quotes that appear in your data. The CSV writer can
1313 on a CSV writer. In particular, we'll write TSV ("tab separated values")
1314 instead of CSV, and we'll ask the CSV writer to quote all non-numeric fields.
1322 let mut wtr = csv::WriterBuilder::new()
1324 .quote_style(csv::QuoteStyle::NonNumeric)
1365 Just like the CSV reader supports automatic deserialization into Rust types
1366 with Serde, the CSV writer supports automatic serialization from Rust types
1367 into CSV records using Serde. In this section, we'll learn how to use it.
1376 let mut wtr = csv::Writer::from_writer(io::stdout());
1422 As with reading, we can also serialize custom structs as CSV records. As a
1426 To write custom structs as CSV records, we'll need to make use of Serde's
1457 let mut wtr = csv::Writer::from_writer(io::stdout());
1554 that take CSV data as input, and produce possibly transformed or filtered CSV
1556 reads and writes CSV data. Rust is well positioned to perform this task, since
1557 you'll get great performance with the convenience of a high level CSV library.
1561 The first example of CSV pipelining we'll look at is a simple filter. It takes
1562 as input some CSV data on stdin and a single string query as its only
1563 positional argument, and it will produce as output CSV data that only contains
1578 // Build CSV readers and writers to stdin and stdout, respectively.
1579 let mut rdr = csv::Reader::from_reader(io::stdin());
1580 let mut wtr = csv::Writer::from_writer(io::stdout());
1594 // CSV writers use an internal buffer, so we should always flush when done.
1607 If we compile and run this program with a query of `MA` on `uspop.csv`, we'll
1612 $ ./csvtutor MA < uspop.csv
1618 you've already learned about CSV readers and writers from previous sections.
1621 messy CSV data that might not be encoded correctly. One example you might come
1622 across is CSV data encoded in
1624 Unfortunately, for the examples we've seen so far, our CSV reader assumes that
1627 problems. But let's introduce a slightly tweaked version of our `uspop.csv`
1632 …l -LO 'https://raw.githubusercontent.com/BurntSushi/rust-csv/master/examples/data/uspop-latin1.csv'
1639 $ ./csvtutor MA < uspop-latin1.csv
1641 CSV parse error: record 3 (line 4, field: 0, byte: 125): invalid utf-8: invalid UTF-8 in field 0 ne…
1648 $ head -n4 uspop-latin1.csv | tail -n1
1654 CSV parser has choked on our data? You have two choices. The first is to go in
1655 and fix up your CSV data so that it's valid UTF-8. This is probably a good
1657 But if you can't or don't want to do that, then you can instead read CSV data
1666 Each of them represents a single record in CSV data, where a record is a sequence
1693 let mut rdr = csv::Reader::from_reader(io::stdin());
1694 let mut wtr = csv::Writer::from_writer(io::stdout());
1724 $ ./csvtutor MA < uspop-latin1.csv
1732 writes CSV data, but instead of dealing with arbitrary records, we will use
1778 // Build CSV readers and writers to stdin and stdout, respectively.
1781 let mut rdr = csv::Reader::from_reader(io::stdin());
1782 let mut wtr = csv::Writer::from_writer(io::stdout());
1802 // CSV writers use an internal buffer, so we should always flush when done.
1821 $ ./target/debug/csvtutor 100000 < uspop.csv
1830 In this section, we'll go over how to squeeze the most juice out of our CSV
1834 of the section will show how to do CSV parsing with as little allocation as
1852 such a dataset, we'll use the original source of `uspop.csv`. **Warning: the
1856 $ curl -LO http://burntsushi.net/stuff/worldcitiespop.csv.gz
1857 $ gunzip worldcitiespop.csv.gz
1858 $ wc worldcitiespop.csv
1859 3173959 5681543 151492068 worldcitiespop.csv
1860 $ md5sum worldcitiespop.csv
1861 6198bd180b6d6586626ecbf044c1cca5 worldcitiespop.csv
1876 to measure how long it takes to do CSV parsing.
1880 `worldcitiespop.csv`:
1887 let mut rdr = csv::Reader::from_reader(io::stdin());
1918 $ time ./target/release/csvtutor < worldcitiespop.csv
1934 (If validation fails, then the CSV reader will return an error.) If we remove
1943 let mut rdr = csv::Reader::from_reader(io::stdin());
1972 $ time ./target/release/csvtutor < worldcitiespop.csv
2007 by the `records` and `byte_records` methods on a CSV reader. These iterators
2014 by creating a *single* `ByteRecord` and asking the CSV reader to read into it.
2024 let mut rdr = csv::Reader::from_reader(io::stdin());
2025 let mut record = csv::ByteRecord::new();
2053 $ time ./target/release/csvtutor < worldcitiespop.csv
2068 fn read_byte_record(&mut self, record: &mut ByteRecord) -> csv::Result<bool>;
2071 This method takes as input a CSV reader (the `self` parameter) and a *mutable
2072 borrow* of a `ByteRecord`, and returns a `csv::Result<bool>`. (The
2073 `csv::Result<bool>` is equivalent to `Result<bool, csv::Error>`.) The return
2118 let mut rdr = csv::Reader::from_reader(io::stdin());
2147 $ ./target/release/csvtutor < worldcitiespop.csv
2185 let mut rdr = csv::Reader::from_reader(io::stdin());
2186 let mut raw_record = csv::StringRecord::new();
2216 $ ./target/release/csvtutor < worldcitiespop.csv
2231 allocation. In this case, our `&str` is borrowing from the CSV record itself.
2271 let mut rdr = csv::Reader::from_reader(io::stdin());
2272 let mut raw_record = csv::ByteRecord::new();
2302 $ ./target/release/csvtutor < worldcitiespop.csv
2314 fastest way to parse CSV since it necessarily needs to do more work.
2316 ## CSV parsing without the standard library
2318 In this section, we will explore a niche use case: parsing CSV without the
2319 standard library. While the `csv` crate itself requires the standard library,
2321 [`csv-core`](https://docs.rs/csv-core)
2323 depending on the standard library is that CSV parsing becomes a lot more
2326 The `csv-core` crate is structured similarly to the `csv` crate. There is a
2334 The `csv-core` crate has no record types or iterators. Instead, CSV data
2341 `csv-core` that counts the number of records in the state of Massachusetts.
2344 though `csv-core` doesn't technically require it. We do this for convenient
2366 // Attempt to incrementally read the next CSV field.
2400 // This case happens when the CSV reader has successfully exhausted
2433 $ time ./target/release/csvtutor < worldcitiespop.csv
2441 This isn't as fast as some of our previous examples where we used the `csv`
2454 2. If you wanted to build your own csv-like library, you could build it on top
2455 of `csv-core`.
2460 write so many words on something as basic as CSV parsing. I wanted this
2467 * The [API documentation for the `csv` crate](../index.html) documents all
2469 * The [`csv-index` crate](https://docs.rs/csv-index) provides data structures
2470 that can index CSV data that are amenable to writing to disk. (This library
2473 performance CSV Swiss Army knife. It can slice, select, search, sort, join,
2474 concatenate, index, format and compute statistics on arbitrary CSV data. Give