
From Pipelines to Purpose: Why I’m Sharing My Journey in Data Engineering


Even after a full week of work and family life, I spend my weekends tinkering with data pipelines—not because I have to, but because this is how I level up. I’m not just refining skills—I’m writing the next chapter of my career, in public.

Who I Am

Hey there—I’m Evan Rosa, a data engineer with nearly 15 years of experience designing, building, and scaling data systems across the public, private, and nonprofit sectors.

I’ve architected cloud-native solutions for:

  • Digital Turbine, where I helped drive over 3 billion monthly ad impressions
  • HHS/NIH (via Booz Allen Hamilton), supporting data integrity in critical health research
  • The American Chemical Society, building systems to track engagement with its 80+ peer-reviewed journals

My toolkit includes Python, SQL, Airflow, Spark, Kafka, and Flink; lately I’ve been getting my hands dirty with Apache Iceberg and Project Nessie to bring Git-style version control into the modern data stack. I focus on building flexible, connector-driven ETL frameworks that scale—because brittle pipelines simply don’t work.
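To make the Iceberg + Nessie idea concrete, here’s a minimal sketch of wiring a SparkSession to a Nessie-backed Iceberg catalog and using its Git-style branch commands. Treat it as a starting point, not a production config: the catalog name, URI, warehouse path, and table are placeholders, it assumes a local Nessie server on its default port, and the exact Iceberg runtime and Nessie extension jars you need vary by Spark and Nessie release.

```python
from pyspark.sql import SparkSession

# Sketch only: assumes the Iceberg Spark runtime and Nessie SQL extension jars
# are on the classpath, and a Nessie server is running locally on port 19120.
spark = (
    SparkSession.builder.appName("iceberg-nessie-sketch")
    # SQL extensions that enable Iceberg DDL and Nessie's branch/merge commands
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
    )
    # A catalog named "nessie" whose Iceberg tables are versioned by Nessie
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.uri", "http://localhost:19120/api/v1")
    .config("spark.sql.catalog.nessie.ref", "main")
    .config("spark.sql.catalog.nessie.warehouse", "file:///tmp/warehouse")  # placeholder path
    .getOrCreate()
)

# The Git-style part: branch, write, and only merge back once the data checks out
spark.sql("CREATE BRANCH IF NOT EXISTS etl_dev IN nessie FROM main")
spark.sql("USE REFERENCE etl_dev IN nessie")
spark.sql(
    "CREATE TABLE IF NOT EXISTS nessie.demo.events (id BIGINT, payload STRING) USING iceberg"
)
spark.sql("MERGE BRANCH etl_dev INTO main IN nessie")
```

If the branch’s data fails validation, you drop the branch instead of merging it—that’s the “Git-like rollback” story in miniature.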

Why I’m Writing This Blog

I’m not just sharing to show off what I’ve built—I’m sharing to connect.

This blog is my digital workbench: part journal, part lab notebook, part blueprint. I’ll walk you through:

  • Projects I’ve built—with architecture diagrams, code, and real-world lessons
  • Tools I’ve adopted (and abandoned), and why
  • Thoughts on what makes great data engineering
  • My personal roadmap toward an advanced data engineering role at a mission-driven company

I believe in building in public—because the best ideas get sharper when shared. If something I’ve wrestled with helps you debug your own pipeline, that’s a win.

What I’ve Built (So Far)

I’ve led and contributed to systems like:

  • Cost-efficient ETL frameworks with Airflow, Spark, and BigQuery, slashing cloud bills by over $100K
  • Streaming analytics pipelines using Kafka and Flink to support real-time dashboards in sports tech and subscription platforms
  • Composable, connector-based ingestion frameworks pulling data from APIs, GCS, S3, and live event streams (the pattern is sketched just after this list)
  • Self-serve analytics layers with Looker and Superset to help non-technical teams explore data confidently
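To give a flavor of that connector pattern, here’s a stripped-down sketch. The names (Connector, RestApiConnector, run_ingestion) and the endpoint are illustrative only, not the production framework; it simply shows how pipeline code can depend on an interface rather than any particular source.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable

import requests


class Connector(ABC):
    """One class per source; the rest of the pipeline never knows which one it got."""

    @abstractmethod
    def extract(self) -> Iterable[Dict[str, Any]]:
        """Yield raw records from the underlying source."""


class RestApiConnector(Connector):
    """Example source: a REST endpoint assumed to return a JSON list of records."""

    def __init__(self, url: str) -> None:
        self.url = url

    def extract(self) -> Iterable[Dict[str, Any]]:
        response = requests.get(self.url, timeout=30)
        response.raise_for_status()
        yield from response.json()


def run_ingestion(connector: Connector, sink) -> int:
    """The load step depends only on the Connector interface, so sources swap in freely."""
    count = 0
    for record in connector.extract():
        sink(record)  # sink could write to BigQuery, GCS, or Kafka; print works for debugging
        count += 1
    return count


if __name__ == "__main__":
    # Placeholder endpoint; adding a GCS or S3 source means one new class, zero pipeline changes
    rows = run_ingestion(RestApiConnector("https://example.com/api/events"), sink=print)
    print(f"ingested {rows} records")
```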

These aren’t portfolio projects—they’re solutions to business-critical challenges, delivered in production environments with real stakes.

How I Think About Modern Data Workflows

To me, the best pipelines aren’t fragile contraptions—they’re well-oiled machines built for change. They should be:

  • Modular: Swap in new tools without rewriting your stack
  • Version-controlled: Git-like rollbacks for data models using Nessie and Iceberg
  • Hybrid: Real-time and batch playing nicely together
  • Self-serve: So analysts aren’t left waiting on pull requests
  • Observable and testable: Because trust in data is earned, not assumed (a small example follows this list)
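As a tiny example of what I mean by testable, here’s the kind of assertion-style check I like to run before anything lands in a dashboard. The function name and the specific rules are placeholders; the point is that the checks are code, so they can fail loudly in CI or inside the DAG itself.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def check_output(df: DataFrame, key: str) -> None:
    """Illustrative data-quality gate: fail the pipeline before bad data ships."""
    assert df.count() > 0, "output is empty"

    nulls = df.filter(F.col(key).isNull()).count()
    assert nulls == 0, f"{nulls} rows are missing {key!r}"

    dupes = df.count() - df.dropDuplicates([key]).count()
    assert dupes == 0, f"{dupes} duplicate values of {key!r}"


if __name__ == "__main__":
    spark = SparkSession.builder.appName("dq-sketch").getOrCreate()
    sample = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    check_output(sample, key="id")
    print("checks passed")
```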

In short: Data engineers should be user experience designers—not just of dashboards, but of the pipelines themselves.

What You’ll Find Here

This isn’t another “hello world” blog.

Instead, I’ll offer:

  • Hard-won insights from real-world projects
  • Architectural patterns that stand up to scale
  • Thoughts on building resilient, elegant data systems
  • Ideas on career growth, tooling, and staying sharp in a fast-moving field

If you’re looking for practical, thoughtful takes on modern data engineering—you’re in the right place.

Let’s Build Together

Whether you’re a fellow builder, a hiring manager looking for a senior engineer, or someone just starting out—welcome.

Let’s share what we know, learn what we don’t, and build something better—together.

Follow along if you’re curious about how great pipelines are built—and what it means to build them with purpose.


Connect with me:
LinkedIn | Portfolio