Data Engineer's Lunch #20: DataOps vs. DevOps

In Data Engineer’s Lunch #20: DataOps vs DevOps, we discuss the definitions of DataOps and DevOps and the differences between them. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is embedded below in case you were not able to attend live. If you would like to attend Data Engineer’s Lunch in person, it is hosted every Monday at 12 PM EST. Register here now!

Dev Operations (DevOps)

  • DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.
  • DevOps is also characterized by operations staff making use of many of the same techniques as developers for their systems work.
  • Values
    • People over Processes over Tools
  • Principles
    • Infrastructure as Code
    • Do it Right / Do it Once
  • Practices
    • Source Control
    • Config management
    • Metrics
    • Monitoring
  • Tools
    • CI/CD (Jenkins, CircleCI, TeamCity, Azure DevOps, Google Cloud Build, AWS DevOps)
      • Continuous Integration (automated)
        • Pull down the code when code is committed
        • Build it
        • Unit Tests
        • Run it / run some tests
      • Continuous Delivery
        • One button deployment of the whole stack
        • Looks good?
        • Push to stage
        • Looks good?
        • Push to prod
    • Config Management
      • Vault, Consul, Git
      • Chef, Puppet, Ansible
    • Orchestration
      • Terraform, CloudFormation
      • VMware + Terraform
    • Virtualization
      • Virtual machines
    • Containerization
      • Docker
      • Kubernetes
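
The Continuous Integration and Continuous Delivery steps listed above can be read as an ordered series of gates, where a failure at any step stops everything after it. The following is a minimal sketch of that idea in plain Python, not how Jenkins or any other CI/CD tool is implemented. The `Pipeline` class, the `build_pipeline` function, and the `tests_pass` and `stage_looks_good` flags are all hypothetical names, and each lambda stands in for a real command such as a checkout, a build, a test run, or a deploy script.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Pipeline:
    """Runs named steps in order and stops at the first step that fails."""
    steps: List[Tuple[str, Callable[[], bool]]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def step(self, name: str, action: Callable[[], bool]) -> "Pipeline":
        # Register a step; returning self allows chained calls.
        self.steps.append((name, action))
        return self

    def run(self) -> bool:
        for name, action in self.steps:
            ok = action()
            self.log.append(f"{name}: {'ok' if ok else 'FAILED'}")
            if not ok:
                # A failed gate blocks every later step, including deployment.
                return False
        return True


def build_pipeline(tests_pass: bool, stage_looks_good: bool) -> Pipeline:
    # Hypothetical stand-ins: each lambda would be a real command in practice.
    return (
        Pipeline()
        .step("pull code", lambda: True)
        .step("build", lambda: True)
        .step("unit tests", lambda: tests_pass)
        .step("push to stage", lambda: True)
        .step("stage looks good?", lambda: stage_looks_good)
        .step("push to prod", lambda: True)
    )


if __name__ == "__main__":
    pipeline = build_pipeline(tests_pass=True, stage_looks_good=False)
    print("deployed to prod:", pipeline.run())
    for line in pipeline.log:
        print(line)
```

In this sketch the "Looks good?" check is just a boolean. In a real setup it would typically be a manual approval or an automated smoke test against the staging environment.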

Data Operations (DataOps)

  • DataOps (data operations) is an Agile approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production. The goal of DataOps is to create business value from big data.
  • Values
    • People over Processes over Tools (Agile)
  • Principles
    • Infrastructure as Code
    • Do it Right / Do it Once
  • Practices
    • Inherited from DevOps
      • Source Control
      • Config management
      • Metrics
      • Monitoring
    • Semantic Rules / Metadata
    • Feedback loops to Validate Data
    • Metrics for Execution
    • Automate as much of the data pipeline as possible
    • Data Profiling
  • Tools
    • Inherited from DevOps
      • “CI/CD”
      • Config Management
      • Orchestration
      • Virtualization
      • Containerization
    • Scheduling (beyond cron)
      • Airflow
      • Jenkins
      • Luigi
      • Cloud Run
      • Kubernetes Pods for Jobs/CronJobs
    • Data Catalogs
      • Giving the user one place to find the data
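
Two of the DataOps practices above, data profiling and feedback loops to validate data, can be illustrated with a small example. The sketch below uses only the Python standard library and assumes the data arrives as a list of dictionaries. The `profile` and `validate` functions, the `amount` column, and the thresholds are hypothetical and chosen only for illustration. A real pipeline would more likely run checks like these inside a scheduled job and report the results to a metrics or monitoring system.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def profile(rows: List[Row], column: str) -> Dict[str, Any]:
    """Summarize one column: row count, null count, distinct count, min, max."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def validate(stats: Dict[str, Any], max_null_ratio: float,
             min_value: Optional[float] = None) -> List[str]:
    """Compare a profile against simple rules and return any problems found."""
    problems = []
    if stats["rows"] and stats["nulls"] / stats["rows"] > max_null_ratio:
        problems.append("too many nulls")
    if (min_value is not None and stats["min"] is not None
            and stats["min"] < min_value):
        problems.append("value below minimum")
    return problems


if __name__ == "__main__":
    # Hypothetical sample batch: one missing amount and one negative amount.
    batch = [
        {"order_id": 1, "amount": 20.0},
        {"order_id": 2, "amount": None},
        {"order_id": 3, "amount": -5.0},
    ]
    stats = profile(batch, "amount")
    print(stats)
    # The feedback loop: a non-empty list would block or flag this batch.
    print(validate(stats, max_null_ratio=0.1, min_value=0))
```

Keeping profiling separate from validation means the same statistics can feed both the rule checks and the execution metrics mentioned in the list.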

If you missed last week’s Data Engineer’s Lunch #19: Introduction to jq for Data Engineering, be sure to check it out! As mentioned above, the live recording of Data Engineer’s Lunch #20 is embedded below. Also, check out our YouTube page for more videos and the Data Engineer’s Lunch playlist here! Don’t forget to subscribe while you are there!
