Senior Site Reliability Engineer - Hiring Sprint

Python · Java · Kubernetes · Terraform · LLM · SF · Senior · Seed

Airbyte is the data and action layer for AI agents. We give agents fast, accurate, authenticated access to business data across hundreds of sources, so they can discover the entities that matter, reason over real-time context, and take action in the systems they read from, not just observe them.

We started as the open-source standard for data movement and proved the economics of data integration at scale, with hundreds of connectors used by thousands of companies. Since 2020, we have raised $181M from leading investors including Benchmark, Accel, Altimeter, Coatue, and Y Combinator. As our CEO Michel Tricot puts it, "the last ten years were all about structured data. The future is all about context." We're now building that context infrastructure for production-grade agents on the same open foundation, as agents become the primary consumers of enterprise data.

Our mission is unchanged: make data available and actionable to everyone, everywhere. That everyone now includes AI agents.

Engineering Hiring Sprint:

We're growing our engineering team and are accelerating hiring through a focused Engineering Hiring Sprint. Rather than stretching interviews over several weeks, we're bringing exceptional candidates through an expedited process and making hiring decisions quickly.

Interview process:

Apply
Technical Take-Home (Java or Python)
Hiring Manager Interview
In-Person Onsite (the week of July 20)
Hiring decision by the end of the week

We're hiring across multiple engineering teams, including:

⚙️ Platform Engineers
🗄️ Database Engineers
☁️ Site Reliability Engineers
🔌 Extensibility API Engineers
🤖 AI Agents Engineers
👤 Engineering Managers

If you enjoy solving complex technical problems, moving quickly, embracing AI, and taking ownership of your work, we'd love to meet you.

The Role:

You'll be the infrastructure and reliability engineer on the Data Replication team - a full-stack product team running over 3 million sync jobs a week that power thousands of data use cases across multiple regions and clouds. You'll build and maintain the infrastructure, set reliability standards, drive down incidents, and build tooling that makes it easier and safer for engineers to ship. You're equally comfortable in a Terraform file, a Kubernetes cluster, and a postmortem doc.

We expect engineers here to actively use AI as a force multiplier - agentic tools to automate toil, augment incident response, and build smarter internal tooling. If you're not already doing this, you should be excited to start. We care as much about how you work as what you build. Trust, directness, and craftsmanship matter here.

What You’ll Do:

Own the infrastructure underpinning the Data Replication platform - Kubernetes clusters, CI/CD pipelines, secrets management, networking, and cloud resource configuration across AWS and GCP.
Partner with product engineers to reliably integrate product features with infrastructure.
Maintain and enhance observability, alerting, and anomaly detection with an eye towards LLM automation.
Maintain and enhance AI-augmented release and internal tooling: canary deployments, progressive rollouts, automated release qualification, and rollback automation - with an eye towards LLM automation.
Set the infrastructure bar for the team - build self-serve tooling, write runbooks, and coach engineers to own more of their stack.

What You’ll Need:

7+ years in infrastructure, platform engineering, SRE, or DevOps.
Hands-on ownership of Kubernetes, Helm, and Terraform in production environments.
Deep experience with observability stacks (Prometheus, Grafana, Datadog) and on-call operations.
Experience with CI/CD pipeline ownership and developer tooling.
Ability and willingness to read backend code to understand how systems break and instrument them correctly.
Fluency with AI tools - LLMs and agentic frameworks to automate, debug faster, and reduce toil.
A startup-ready mindset: comfortable with ambiguity, moving fast, and owning problems end-to-end.

Nice To Have:

Data pipelines, replication systems, or ETL/ELT platforms.
Control plane / data plane architectures or internal developer platforms.
Experience with Airbyte, CDKs, or connector-based architectures.

Location:

Onsite 4 days/week in San Francisco, CA

Why You'll Love Working at Airbyte:

At Airbyte, we believe great work happens when people feel supported, trusted, and empowered to grow. Our market-leading Total Rewards package is designed to help you thrive professionally and personally. Our benefits and perks include:

Flexible PTO with a culture that encourages at least 25 days off annually
16 weeks fully paid parental leave for all parents
Comprehensive medical, dental, and vision coverage for employees and dependents
401(k) retirement plan
Professional development budget, conference sponsorship, and book reimbursement
Commuter benefits and monthly internet reimbursement
Breakfast and lunch in our San Francisco office
A collaborative, in-person culture focused on learning, growth, and impact

If you find this role exciting, we encourage you to apply even if you think you don’t meet all of the requirements!
