In Data Engineer’s Lunch #32: Converting JSON to CSV, we discussed tools for converting JSON to CSV and went through a demo using NodeJS to do so. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!
Converting JSON to CSV
What is CSV?
CSV, which stands for character-separated values or comma-separated-values, has been a ubiquitous file format since it was first used in the 1970s. It is a text file format that contains one or more fields. Each line of the file is considered an individual record and each record will have identical fields. An example of what a typical CSV file structure would like is shown below.
id,setup,punchline
1,I slipped on a bra the other day...,It was a booby trap.
2,What do you call a large pile of cats?,A meowtain.
3,What is the best way to count cows?,A cowculator.
Why do we need CSV?
CSV is primarily used for exchanging data between programs. CSV tends to have wide support as an exporting file format as well as being able to be imported into a wide range of programs including Spark, Elasticsearch, Excel, Google Sheets, and an inexhaustible list of others.
What is JSON?
JSON stands for JavaScript Object Notation. It is a file format with an open standard that stores data in attribute-value pairs and arrays that are human-readable. It is also frequently used in data exchange especially in cases of communication between applications and servers. By comparison to CSV, JSON is a much younger file format with its first standardization coming in 2013. Below is an example of JSON object describing a person.
[
{
"firstname": "John",
"lastname": "Smith",
"age": 30,
"address": {
"line1": "123 N. Main St.",
"line2": "suite 10",
"city": "New York City",
"state": "NY"
},
"phoneNumbers": [
{
"type": "work",
"number": "555-555-5555"
},
{
"type": "cell",
"number": "555-123-4567"
}
],
"email": "JSmith@domain.com",
"employed": true
}
]
Why do we need JSON?
JSON is used extensively in web development using AJAX, or AJAJ, Asynchronous JavaScript and XML or Asynchronous JavaScript and JSON. This methodology allows the ability of a webpage to make new requests for data after it has loaded into a browser. JSON has also been used as a configuration language for specifying parameters and settings for computer programs. One such JSON superset you may be familiar with is YAML files.
Tools for Converting JSON to CSV
Jq: A JSON command-line processor, is likely the most popular option for converting JSON to CSV and a great blog post as an introduction can be found here.
json2csv: Customizable, lightweight node package that uses stream processing to scale for larger datasets.
Altova MapForce: An any-to-any data mapping tool with a graphical user interface with a library of functions for conversions that can be used together to make custom functions.
Python: The Python programming language supports methods to convert JSON to CSV using the built-in JSON package.
Demo
A walkthrough of the demo used in the video can be found here.
Along with the respective repo at https://github.com/vikobg/first-class-js/tree/master/nodejs-json-to-csv.
Cassandra.Link
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!