In Data Engineer’s Lunch #3: Scripting / Shell Automation for Data Engineering, we discuss a range of tools you can use for scripting and shell automation in data engineering. We cover different shells, cron, and various command-line tools, with resources and examples. If you want a more in-depth discussion, or were not able to attend live, be sure to watch the live recording of Data Engineer’s Lunch #3 embedded below, and don’t forget to like and subscribe while you watch it! If you would like to attend Data Engineer’s Lunch in person, it is hosted every Monday at 12 PM EST. Register here now!
Shell
- BASH
- PowerShell
- Perl
cron
- cron command in Linux with Examples – GeeksforGeeks
- Minute (a value from 0 to 59)
- Hour (a value from 0 to 23)
- Day of the month (a value from 1 to 31)
- Month of the year (a value from 1 to 12, or Jan to Dec, using the first three letters of the month’s name)
- Day of the week (a value from 0 to 6, or Sun to Sat, again using the first three letters of the day’s name)
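To make the five schedule fields concrete, here is a minimal sketch that takes one crontab entry and prints each field by name. The script path `/opt/etl/nightly_load.sh` and the helper `cron_field` are hypothetical, made up for this illustration.

```shell
#!/bin/sh
# Hypothetical crontab entry: run a made-up ETL script at 02:30 every Monday.
# Field order: minute, hour, day of month, month, day of week, then the command.
cron_entry='30 2 * * Mon /opt/etl/nightly_load.sh'

# Print the requested space-separated field(s) of the entry.
# Fields 1-5 are the schedule; everything from field 6 onward is the command.
cron_field() {
    printf '%s\n' "$cron_entry" | cut -d' ' -f"$1"
}

printf 'minute:       %s\n' "$(cron_field 1)"
printf 'hour:         %s\n' "$(cron_field 2)"
printf 'day of month: %s\n' "$(cron_field 3)"
printf 'month:        %s\n' "$(cron_field 4)"
printf 'day of week:  %s\n' "$(cron_field 5)"
printf 'command:      %s\n' "$(cron_field 6-)"
```

An asterisk in a field means "every value", so this entry fires on any day of the month and in any month, as long as the day of the week is Monday.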
Command-Line Tools
- Command-line Tools can be 235x Faster than your Hadoop Cluster – Adam Drake
- Grep
- Cut
- Diff/Cmp
- Uniq
- Sort
- TR
- Sed
- Sed Command in Linux/Unix with examples – GeeksforGeeks
- SED is a powerful text stream editor. It can perform insertion, deletion, and search and replace (substitution).
- The SED command in Unix supports regular expressions, which allow it to perform complex pattern matching.
- Awk
- AWK command in Unix/Linux with examples – GeeksforGeeks
- What can we do with AWK?
- 1. AWK Operations:
- (a) Scans a file line by line
- (b) Splits each input line into fields
- (c) Compares input line/fields to pattern
- (d) Performs action(s) on matched lines
- 2. Useful For:
- (a) Transform data files
- (b) Produce formatted reports
- 3. Programming Constructs:
- (a) Format output lines
- (b) Arithmetic and string operations
- (c) Conditionals and loops
- JQ
- Other Tools
- json2csv
- csv2json
- xml2json
- json2xml
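As a rough illustration of how several of the tools listed above fit together, here is a small sketch that runs grep, cut, sort, uniq, sed, and awk over a tiny CSV. The sample data, its column layout, and the function names are all assumptions invented for this example; they do not come from the talk.

```shell
#!/bin/sh
# Hypothetical sample data (columns: date,user,status,bytes), made up for illustration.
sample_data() {
cat <<'EOF'
date,user,status,bytes
2021-01-04,alice,ok,120
2021-01-04,bob,error,0
2021-01-05,alice,ok,300
2021-01-05,carol,ok,80
2021-01-05,bob,ok,50
EOF
}

# grep + cut + sort + uniq: count successful ("ok") events per user.
ok_counts() {
    sample_data | grep ',ok,' | cut -d',' -f2 | sort | uniq -c
}

# sed: regular-expression substitution, rewriting YYYY-MM-DD as YYYY/MM/DD.
slash_dates() {
    sample_data | sed 's|^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)|\1/\2/\3|'
}

# awk: split each line into fields, skip the header, and sum bytes per user.
bytes_per_user() {
    sample_data \
        | awk -F',' 'NR > 1 { total[$2] += $4 } END { for (u in total) print u, total[u] }' \
        | sort
}

ok_counts
slash_dates
bytes_per_user
```

Each function is a short pipeline, so the steps can be tested on their own and then recombined. The final `sort` after awk is there because awk does not guarantee the order in which it iterates over array keys.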
If you missed last week’s Data Engineer’s Lunch #2: Common ETL Frameworks, be sure to check it out! As mentioned above, the live recording of Data Engineer’s Lunch #3 is embedded below. Also, check out our YouTube page for more videos and the Data Engineer’s Lunch playlist here! Don’t forget to subscribe while you are there!
Cassandra.Link
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra, but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!