Data Scientist

Python · Postgres · Spark · NYC · Mid · Seed

About Minerva

Minerva builds AI for marketing leaders. Our platform lets marketers focus on telling their brand's story by delegating operationally intensive work to our AI agents, which handle data management, analytics, campaign generation, measurement, and reporting.

Everything is built on Minerva's proprietary consumer graph, an identity and attribute layer covering 270M+ U.S. consumers across 1,000+ temporal attributes. We have two agentic systems built through an OpenAI research partnership: an Agentic Data Engineer that unifies and standardizes a brand's first-party data in hours, and an Agentic Data Scientist that trains robust targeting models at scale. Together, these systems enhance the quality of first-party data, increase campaign performance, and give marketing teams back their time.

Our clients include leading consumer brands across categories: the NBA, Ramp, Capital One, Hard Rock Stadium Group / Miami Dolphins, Wander, and Trust & Will. We have raised $20M from The General Partnership, 8VC, Lingotto, NBA Investments, Topology Ventures, Future Positive, Background Capital, and others.

About the Role

As a Data Scientist at Minerva, you build the models and features that power our consumer graph and the agents that run on top of it. You sit at the intersection of heavy data engineering and applied modeling: you architect feature engineering pipelines that run over terabytes of data, train and sharpen the models that drive targeting and prediction, and ensure the outputs are robust enough to be consumed autonomously by our Minerva Agents and to power our world-class modeled attributes (e.g. income / wealth).

This role deploys to production constantly. The models you build are not handed off for someone else to deploy; you own the path from raw data to a feature or model that an agent can call reliably at scale. As we grow, your work becomes the foundation other systems are built on.

What You'll Do

Create new features for models and agents, expanding the predictive surface area of our consumer data lake and building the pipelines that turn raw signal into trusted attributes.
Improve existing models through rigorous feature engineering, including our income/wealth, home buyer, and home seller models.
Play a pivotal role in the buildout of our world-class data lake, shaping how terabytes of consumer data are stored, transformed, and made queryable for both humans and agents.
Build feature engineering pipelines that run efficiently at terabyte scale, with the data engineering rigor to make them reliable in production. This is a 70/30 split DS/DE role.
Ensure model and feature outputs are reliable enough to be consumed agentically, writing the validations and guardrails that let our agents act on your work without a human in the loop.

Our Data Stack

Dagster for all things orchestration
dbt-core within Dagster as the primary data transformation surface
Spark, Iceberg, Trino, AWS Glue for Lakehouse workloads
Modal for ML engineering
Frontier + OSS models & agent SDKs. We are heavy users of OpenAI/Anthropic batch APIs

Qualifications

2-4+ years working as a data scientist, a data engineer focused on applied machine learning, or a software engineer in a data-heavy context. Simply put, you live and breathe data.
Highly proficient in Python and SQL.
You are driven by first-principles thinking and are a go-getter. You reason about what datasets and features are necessary to solve a modeling problem, and are scrappy and clever enough to bring that to life.
Strong intuition for data engineering principles, especially around data cleaning/ingestion and data modeling. We prefer these core skills to be second nature, freeing up thinking for architecting and executing large-scale data initiatives, especially given the advancement of AI coding tools.
Strong engineering background. You are comfortable deploying complicated production pipelines and working within larger production systems, not just in sandboxed or research environments.
Willingness to work in office in NYC (we provide a relocation package).
Flexibility and openness to wearing several hats. We are lean and things are always changing.
Eagerness to learn and grow with the company and your coworkers.

Preferred

Experience building and training predictive models (e.g. lead scoring, LTV, propensity, lookalike modeling).
Experience with orchestration tools (e.g. Dagster, Airflow, Prefect) and SQL transformation tools (e.g. dbt, SQLMesh).
Experience with both transactional databases (e.g. Postgres, MySQL) and analytical databases (e.g. Snowflake, Redshift), with a bias toward the latter.
Familiarity with a cloud resource provider (e.g. AWS, GCP).
Familiarity with backend and ML/AI engineering.
Experience with AI coding tools (e.g. Cursor, Claude Code, OpenCode) as a force multiplier.
Prior work at an early-stage startup.

You don't need to tick every box. If you're strong on the engineering side and hungry to build models that matter, we want to hear from you.

Compensation

Base salary: $200,000 to $225,000, commensurate with experience. Competitive equity and a marquee benefits package.
