In Data Engineer’s Lunch #12: Introduction to sed for Data Engineering, we introduce sed, a stream editor, and show how we can use it for data engineering. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is embedded below in case you were not able to attend live. If you would like to attend Data Engineer’s Lunch in person, it is hosted every Monday at 12 PM EST. Register here now!
sed is a stream editor commonly used to take text input, perform operations on it, and output the modified text, which makes it a fast tool for data engineering. The input can come from a file or from an incoming data pipeline. sed also supports basic and extended regular expressions, which allow you to match complex patterns. In Data Engineer’s Lunch #12, we also walk through a demonstration that covers some basics of sed and then applies those basics to a potential real-world situation.
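To make that concrete, here is a minimal sketch of the three ideas above: reading from a file, reading from a pipeline, and matching with an extended regular expression. It is not taken from the demo itself. The sample rows and the variable names (`tmpfile`, `from_file`, `from_pipe`, `iso_dates`) are made up for illustration, and the `-E` flag assumes a GNU or BSD sed.

```shell
#!/bin/sh
# Hypothetical sample data, invented for this sketch (not from the demo repo).
tmpfile=$(mktemp)
printf 'id,name,signup\n1,alice,2021/03/01\n2,bob,2021/03/15\n' > "$tmpfile"

# 1) File input: replace every comma on every line with a pipe character.
from_file=$(sed 's/,/|/g' "$tmpfile")

# 2) Pipeline input: sed reads stdin, so it can sit in the middle of a pipe.
#    Here '1d' deletes the first line (the header row).
from_pipe=$(cat "$tmpfile" | sed '1d')

# 3) Extended regular expressions (-E): capture year, month, and day,
#    then rewrite YYYY/MM/DD as YYYY-MM-DD. Using '#' as the delimiter
#    avoids having to escape the slashes in the pattern.
iso_dates=$(sed -E 's#([0-9]{4})/([0-9]{2})/([0-9]{2})#\1-\2-\3#' "$tmpfile")

printf '%s\n\n' "$from_file"
printf '%s\n\n' "$from_pipe"
printf '%s\n' "$iso_dates"

rm -f "$tmpfile"
```

None of these commands modify the input file; sed writes the transformed text to standard output, which is what makes it easy to chain into a larger pipeline.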
For written instructions for the demo, you can check out this GitHub repo or this blog. You can open the repo directly in Gitpod by hitting the “Open in Gitpod” button, which we recommend so that you do not have to download any files or software to your local computer. If you would rather watch the sed walkthrough, the live recording of Data Engineer’s Lunch #12: Introduction to sed for Data Engineering is embedded below.
If you missed last week’s Data Engineer’s Lunch #11: MLFlow and Spark, be sure to check it out! Also, check out our YouTube page for more videos and the Data Engineer’s Lunch playlist here! Don’t forget to subscribe while you are there!
Resources
- https://github.com/adp8ke/Introduction-to-sed
- https://www.gnu.org/software/sed/manual/sed.html
- https://www.oracle.com/technical-resources/articles/dulaney-sed.html
- https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux
- http://sed.sourceforge.net/sed1line.txt
- http://sed.sourceforge.net/#scripts
- https://www.linuxtechi.com/20-sed-command-examples-linux-users/
Cassandra.Link
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!