AWK CSV Parser

This is a bit of AWK code I wrote to parse CSV files. It was created for use in a CSV to SQL converter for shql, but might be useful to others.

New BETA version.
I've been slowly working on some major changes and bug fixes, but work has kept me too busy to finish them, so here is a BETA version instead. The parsing function has been renamed, and a new function has been added to turn arrays into a CSV string. BETA version 2A is available for download as a PKZip file.

This archive includes two files:
csv.awk
This is the main file with comments.
sparse_csv.awk
This has had all comments, blank lines and example code stripped out, so it can be included in your projects without taking up as much space.

Please read all the comments in the source file. They describe how everything works. There is also a bit of example code at the bottom of csv.awk.

Information on the current version follows.

See also my CSV utilities based on this code.

Usage:

  1. Add the function to your program. You might want to strip out the example code at the bottom.
  2. Read a line of text. This is only necessary if you are reading input manually (for example with getline) instead of using $0.
  3. Call parse_csv with the following parameters:
    1. The string to parse.
    2. The array to parse the string into.
    3. The separator character. (Normally ,)
    4. The quote character. (Normally ")
    5. The escape character. (Normally " or \)
    6. Indicate if embedded newlines should be handled. This should be either the proper line terminator for your file type or a string to replace the newline with. Leave this empty if embedded newlines should not be handled, in which case they will cause an error.

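The number of fields parsed is returned. If an error occurs, an error code is returned instead and an error message is placed in csverr. Originally the code was always -1; as of the 2008-04-13 update it indicates which error occurred (see the comments in the source).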

A short bit of example code is at the bottom of the file. It calls parse_csv on $0 then either prints the error if one occurs or prints $0, "->", the number of fields found then all the fields wrapped in the | character. Use the file test.csv for examples.
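
The driver described above can be sketched roughly as follows. Because csv.awk itself is not reproduced on this page, the sketch uses a simplified stand-in called split_csv_sketch. That name, its shorter parameter list (string, array, separator, quote), its 1-based array indexes, its single -1 error code and its error text are all assumptions made for illustration. It is not the author's parse_csv: it treats a doubled quote as an escaped quote and does not handle embedded newlines or trimming.

```shell
# csv_demo: hypothetical demo, NOT the code from csv.awk.
# Reads CSV lines from stdin (or from files given as arguments).
csv_demo() {
    awk '
    # Stand-in parser (hypothetical). Splits str into out[1..n] and returns n.
    # Returns -1 and sets csverr when a quoted field is never closed.
    function split_csv_sketch(str, out, sep, quote,    n, i, c, field, inq) {
        n = 0; field = ""; inq = 0
        for (i = 1; i <= length(str); i++) {
            c = substr(str, i, 1)
            if (inq) {
                if (c != quote) {
                    field = field c
                } else if (substr(str, i + 1, 1) == quote) {
                    field = field quote     # doubled quote -> literal quote
                    i++
                } else {
                    inq = 0                 # closing quote
                }
            } else if (c == quote) {
                inq = 1                     # opening quote
            } else if (c == sep) {
                out[++n] = field            # separator ends the field
                field = ""
            } else {
                field = field c
            }
        }
        if (inq) {
            csverr = "missing closing quote"
            return -1
        }
        out[++n] = field
        return n
    }

    # Driver: print the error, or the line, "->", the field count,
    # then every field wrapped in the | character.
    {
        num = split_csv_sketch($0, csv, ",", "\"")
        if (num < 0) {
            printf "ERROR: %s -> %s\n", csverr, $0
        } else {
            printf "%s -> %d", $0, num
            for (i = 1; i <= num; i++) printf "|%s", csv[i]
            printf "|\n"
        }
    }' "$@"
}
```

For example, feeding the line a,"b,c",d to csv_demo prints a,"b,c",d -> 3|a|b,c|d|. With the real csv.awk you would call parse_csv with its full parameter list in place of the stand-in.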

NOTE:
Embedded carriage returns cause problems when reading CSV files. When a line ends with a quoted string, the embedded carriage return causes an error. Otherwise it is included in the last field. It is best to strip carriage returns before processing. If your AWK implementation supports the gsub function, that should work for stripping them.
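
A minimal sketch of stripping carriage returns with gsub, assuming your AWK provides gsub and understands the \r escape in regular expressions. The helper name strip_cr is made up for this example.

```shell
# strip_cr: hypothetical helper that removes every carriage return
# from each input line before any CSV parsing is done.
strip_cr() {
    awk '{ gsub(/\r/, ""); print }' "$@"
}
```

Inside a script that uses the parser, the same gsub(/\r/, "") call can go at the top of the main block, before the line is parsed. Note that this removes all carriage returns, including any inside quoted fields. Using sub(/\r$/, "") instead would remove only one at the end of the line.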

It is available as PKZip, Gzip, or plain text.
Last updated 2008-04-13.

Sun Apr 2008-04-13

  • Added a new option to trim spaces from the beginning and end of fields. This option is off by default to match the behavior of Excel and OpenOffice more closely.
  • Cleaned the code up a bit and added more comments.
  • Made the return codes indicate the error that occurred. (See the comments in the source)

Thu Mar 2007-03-01
Fixed an error that caused it to bail when a newline immediately followed a quote character. The fix is to change the text "pos == length(string)" to "pos >= length(string)". Also added a note to the top of the file that it is in the public domain.

Mon Jun 2006-06-26
I forgot arrays are passed by reference. Added an array parameter to the function instead of using a global array named csv.
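
As a small illustration of why an array parameter works, here is a sketch showing that AWK passes arrays by reference. The names by_ref_demo, fill and fields are invented for this example and do not come from csv.awk.

```shell
# by_ref_demo: hypothetical demo of AWK passing arrays by reference.
by_ref_demo() {
    awk '
    # fill() writes into the array it is given, and the caller
    # sees those elements afterwards.
    function fill(arr) {
        arr[1] = "first"
        arr[2] = "second"
        return 2
    }
    BEGIN {
        n = fill(fields)    # fields is populated inside fill()
        print n, fields[1], fields[2]
    }'
}
```

Running by_ref_demo prints 2 first second, which is the same pattern the parser uses: the caller supplies the array and gets the field count back as the return value.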

Fri Jun 2006-06-23
While working on another program it occurred to me I could easily handle embedded newlines. While I was at it I reformatted the error messages and made sure quoted empty fields were handled correctly.

The file test.csv has examples of data that can be handled.
