CSV to JSON: Building Data Pipelines Between Spreadsheets and APIs
A deep guide to converting CSV data to JSON — type inference, schema design, handling edge cases, and building reliable pipelines between spreadsheet exports and APIs.
CSV to JSON is the conversion that data engineers and backend developers hit most often. It shows up whenever a stakeholder exports a spreadsheet and you need to load it into an application, or whenever you're building an import feature that accepts CSV uploads.
The conversion looks simple. It often isn't.
The CSV Data Model
CSV (Comma-Separated Values) is defined by RFC 4180 but widely implemented with variations. The core rules:
- First row is the header (column names)
- Each subsequent row is a record
- Fields are separated by commas
- Fields with commas, double-quotes, or newlines must be quoted
- Double-quote inside a quoted field is escaped as two double-quotes
name,email,role,"department"
Alice,"[email protected]",admin,"Engineering, Backend"
Bob,[email protected],editor,Product
The fourth field shows why quoting is required — "Engineering, Backend" contains a comma and must be quoted.
Why CSV to JSON Isn't Just String Splitting
Developers often write CSV parsers as a string split on commas. This breaks immediately on quoted fields. A proper CSV parser handles:
- Quoted fields with embedded commas
- Double-quote escaping within quoted fields
- Newlines within quoted fields (multi-line values)
- Trailing newlines in the file
- BOM (Byte Order Mark) at the start of Excel-exported CSVs
- CRLF (\r\n) vs LF (\n) line endings
Using a battle-tested CSV library rather than a homebrew split is strongly recommended.
The Type Inference Problem
CSV carries no type information. Every cell value is a string. "42" is a string. "true" is a string. "2024-01-01" is a string.
When converting to JSON, you face a decision: should "42" become the number 42 or the string "42"?
There's no universally correct answer. It depends on how the JSON will be used:
- If it's going into a database that expects integers for an id column, you want numbers
- If it's going into a system that cares about leading zeros (ZIP codes: "01234"), you want strings
- If it's going into a JavaScript frontend and dates need to be parsed, you might want ISO 8601 strings
Strategies:
- **No inference (safest):** All values stay strings. The consuming system handles types. This is DevConvert's default — it avoids silent errors from wrong type inference.
- **Infer from value:** Try to parse numbers, booleans, and null values. Risky for ZIP codes, phone numbers, IDs with leading zeros.
- **Schema-driven:** Provide a schema that says "column 'age' is integer, column 'zip' is string." Most production import pipelines use this.
Handling the Header Row
The header row becomes JSON object keys. This creates problems when headers:
- Have spaces ("First Name" → problematic JSON key, needs quoting or camelCase conversion)
- Have special characters ("%", "(", ")")
- Are duplicated (two columns named "Notes")
- Are empty
- Have inconsistent casing (Name vs name vs NAME)
For production pipelines, normalise headers before converting: trim whitespace, convert to snake_case or camelCase, remove special characters, deduplicate.
Nested Structures from Flat CSV
CSV is flat. If your JSON target is nested, you need a mapping strategy.
Flat CSV:
user_id,user_name,address_city,address_zip
1,Alice,Bangalore,560001
Desired JSON:
{"user": {"id": 1, "name": "Alice"}, "address": {"city": "Bangalore", "zip": "560001"}}
This requires a transform step after basic CSV → JSON conversion. Tools like jq, pandas, or a custom mapping function handle this. DevConvert's basic CSV → JSON produces the flat form — nesting requires post-processing.
Excel-Specific Gotchas
Excel-exported CSVs have quirks:
- **BOM prefix** — Windows Excel adds a UTF-8 BOM (\xEF\xBB\xBF) to CSV files. Parsers that don't strip it produce a key like "name" instead of "name".
- **Date formatting** — Excel may export dates as "01/15/2024" or "1/15/24" rather than ISO 8601
- **Number formatting** — Excel may export 1,234.56 with a comma thousand separator, which breaks CSV parsing
- **Encoding** — older Excel versions export Latin-1 encoding; parsers expecting UTF-8 will corrupt accented characters
Google Sheets is better behaved — it exports clean UTF-8 CSV with ISO dates.
Building a Reliable Import Feature
If you're building a CSV import feature for your application:
1. **Parse CSV strictly** — use a proven library, don't write your own
2. **Validate headers** — check required columns exist before processing rows
3. **Preview before import** — show the user the first 5 rows parsed as JSON and let them confirm
4. **Report errors by row** — "Row 42: invalid date format in column 'created_at'" is infinitely better than "Import failed"
5. **Use transactions** — if any row fails validation, roll back the entire import
6. **Handle large files** — stream the CSV rather than loading it all into memory; a 50MB spreadsheet is 100,000+ rows
The simple case (small, clean CSV from a known source) is trivial. The production case (user-uploaded files from unknown Excel versions) requires careful handling of every edge case.