Staff Data Engineer
ABOUT MINERVA
Minerva builds AI for marketing leaders. Our platform lets marketers focus on telling the story of their brand while AI agents handle the operationally intensive work: data management, analytics, campaign generation, measurement and reporting.
Everything is built on Minerva's proprietary consumer graph: an identity and attribute layer covering 270M+ U.S. consumers across 2,000+ through-time attributes. On top of it sit two agentic systems built in partnership with OpenAI: an Agentic Data Engineer that unifies and standardizes a brand's first-party data in hours, and an Agentic Data Scientist that trains robust targeting models at scale. Together, our data and platform improve the quality of a brand's first-party data, lift campaign performance and give marketing teams their time back.
We work with leading consumer brands across categories, including the NBA, Capital One, Hard Rock Stadium Group / Miami Dolphins, Wander and Trust & Will. We've raised $20M from The General Partnership, 8VC, Lingotto, NBA Investments, Topology Ventures, Future Positive, Background Capital and many others. Our team brings together operators and investors from Citadel, Dentsu, Bridgewater, Meta Superintelligence and Lazard, alongside researchers from Berkeley, MIT, Stanford and Cambridge.
ABOUT THE ROLE
We run a multi-tenant transformation platform that unifies a portfolio of customer brands into a single golden data model across Shopify, HubSpot, Klaviyo, Google/Meta Ads and more.
You'll be the technical owner of this part of our broader data platform: the person who designs for the regime we're growing into rather than the one we're in. The defining question of the role is how to make the cost of the next tenant (and the next data source, the next entity in our ontology, the next golden model) flat instead of linear, while correctness, isolation and freshness hold as the system fans out.
Increasingly, that platform isn't operated only by people. In partnership with OpenAI, our Agentic Data Engineer (ADE) already standardizes bespoke client data into golden records, and downstream agents like our Agentic Data Scientist (ADS) build models on top of it. You own the substrate they stand on (the contracts, the golden model, the access layer) and the systems that expose our data to internal and external AI agents (MCP, vector search) as a first-class channel. We believe agentic access to our data is just as important as the traditional API and ETL paths, if not more, and this role builds for that.
This is a build-from-strength role. We expect data cleaning, modeling and warehouse fluency to be second nature, so your thinking is free for architecture and large-scale initiatives, especially given how much leverage modern AI coding tools give a strong engineer.
WHAT YOU'LL DO
Design the tenant-onboarding model so adding the Nth tenant is a config-and-metadata operation, not bespoke engineering: source routing, transformation, model membership and per-tenant view generation should all be driven from a single source of truth rather than assembled by hand
Own correctness guarantees across a wide fan-out of tenants and models. This is the class of problem where coverage gaps, contract drift and partial rollouts can hide unless the system makes them structurally impossible. Build the contracts, audits and CI gates that turn "is every tenant fully represented?" into a question the build answers, not a person
Own the org-identity and access layer (tenant provisioning, source-to-tenant routing, multi-tenant isolation and RBAC) so it holds at hundreds of tenants without per-tenant special cases
Build the platform machinery that lets our data ontology grow (new entities, attributes and behaviors) without breaking everything downstream. Own how the ontology is represented, versioned and propagated: schema evolution, contract migration and backward compatibility across hundreds of tenants and the models built on them
Partner with the field: as our Forward Deployed Engineers (FDEs) and customers surface new concepts, make adding them to the canonical model a safe, repeatable operation rather than a risky one-off
Build the systems that expose Minerva's data product to internal and external AI agents (MCP servers, vector search, semantic and metrics layers) so agents can query, reason over and act on our data safely. Treat agentic access as a first-class delivery channel alongside API, batch ETL and CRM integrations
Own the substrate our agentic workflows depend on: the golden-record contracts ADE produces and ADS consumes. Build the guardrails, validation and observability that let agent-generated transformations be trusted in production at scale. You build the reliability layer under the agents
Lead the evaluation and migration of our orchestration and transformation stack as throughput, concurrency and latency demands outgrow today's tooling, doing it incrementally and without halting the business that runs on top of it
Architect the path to real-time: keeping the analytical warehouse in sync with the Postgres/Elasticsearch product backends and standing up the infrastructure for the CDP (streaming audiences, identity resolution, reverse-ETL to ad platforms and CRMs)
Define how pipelines are built, tested, deployed and governed as the team and tenant count grow: isolation, privacy/opt-out enforcement and data-quality SLAs as first-class platform properties
WHAT YOU HAVE
5+ years in data/software engineering in a data-heavy context, with real ownership of a production data platform at scale (think XX+ TB, many sources, many consumers)
Expert Python and SQL, fluent enough that ingestion and modeling are reflexes, not projects
Deep experience with transformation and orchestration tooling (SQLMesh, dbt, Airflow or equivalent) and a point of view on where they break and what comes next
Strong grasp of analytical warehouses (Snowflake/Redshift/BigQuery) and how they interplay with transactional stores (Postgres/MySQL)
A track record of multi-tenant design: isolation, governance and secure access patterns as tenant count climbs
Demonstrated judgment on the tradeoff between stability and development speed: you've kept a platform reliable for its users while still letting a team ship fast on top of it, and you can articulate how you decided where each belonged
Experience evolving a schema, data model or ontology in production (versioning, migration and backward compatibility) without breaking downstream consumers
Interest or experience in exposing data to AI agents (MCP, vector search, semantic layers, retrieval) and a conviction that agentic access is a first-class way to deliver a data product
Comfort owning ambiguous, cross-cutting initiatives end-to-end in a lean, fast-changing environment. Willing to work onsite in NYC (relocation provided)
NICE TO HAVES
Lakehouse or Spark experience and comfort handling XX+ TB datasets
Built or operated a CDP, identity graph or reverse-ETL/activation system
Strong communication skills; experience interfacing with enterprise customers preferred
Data lineage and reactive-propagation systems; streaming (Kafka/Flink/Snowpipe)
Backend or ML/AI-eng exposure; feature engineering and training infra are adjacent to this role
Heavy and efficient use of AI coding tools (Claude Code, Cursor, etc.) as a force multiplier
Built infrastructure for agentic or LLM workflows on top of data (OpenAI Agents SDK, MCP servers, vector DBs, retrieval and eval harnesses)
Prior early-stage startup experience
Counterpart role: you own the technical side of evolving the ontology and platform; our Forward Deployed Engineer sits with the customer, expands the ontology from the field and feeds you the patterns worth productizing.
COMPENSATION
$200,000-$250,000 base salary, commensurate with experience. Meaningful equity. Performance bonus. Medical, dental and vision.
Apply: hiring@minerva.io | Subject: Staff Data Engineer Candidate