Forward Deployed ML Engineer
Job Description:
Build and deploy AI agent pipelines that extract structured oncology variables from unstructured patient documents for tailor-made use cases at pharmaceutical companies and cancer hospitals. You own the full cycle: understanding the customer's data dictionary, studying the source clinical documents, building extraction agents, evaluating accuracy, deploying to production, and iterating until it works. This role requires someone who can go deep into both the agentic layer and the clinical domain, coordinate across customer and internal teams, and deliver under deadline pressure.
Responsibilities
Design and build agentic extraction pipelines that process patient charts of 500+ pages (clinical notes, pathology reports, imaging reports, genomic panels) and output structured JSON per customer data dictionaries
Own accuracy end-to-end: define evaluation datasets, run precision/recall analysis per variable, identify failure modes, and improve through agent architecture changes, prompt engineering, fine-tuning, or rule-based post-processing
Go deep into the clinical source data: read the actual patient charts, understand how oncologists document, learn why certain data points are ambiguous, and use that understanding to improve extraction
Work with the clinical annotation team to build gold-standard datasets and resolve edge cases
Coordinate with customer data science and clinical teams to clarify dictionary definitions, review output quality, and close accuracy gaps
Coordinate with internal engineering and infrastructure teams to deploy, scale, and monitor pipelines in production
Deliver on customer timelines, which means intense sprint periods around customer deliveries followed by cycles of iteration and improvement
What Success Looks Like in the First 90 Days
Days 1-30: Learn the stack, the data, and the domain.
You should be reading real patient charts within your first week, not abstractions of them. Understand how oncologists document across clinical notes, pathology reports, imaging, and genomic panels. Learn why the same data point (e.g., disease stage, biomarker status, line of therapy) shows up differently across document types and why extraction is hard. Get hands-on with the existing extraction pipeline architecture: how agents are orchestrated, how documents are segmented and classified, how structured JSON is produced, and where the current system fails. Run the evaluation suite on an active customer dictionary and understand the per-variable accuracy breakdown: which variables are easy, which are hard, and why. By the end of month one, you should be able to explain the top 5 failure modes in the current extraction pipeline and have an opinion on which ones are fixable with prompt/agent changes and which require deeper architectural work.
Days 30-60: Own a customer delivery end-to-end.
Pick up an active customer workstream: a new dictionary, a new tumor type, or an accuracy improvement cycle on an existing delivery. Run it yourself: study the customer's data dictionary, map it to the source documents, build or modify the extraction agents, define the evaluation dataset with the annotation team, run precision/recall per variable, and iterate until accuracy targets are met. You should be coordinating directly with the customer's data science team on edge cases and definition ambiguities. At the same time, you should be identifying patterns across customer dictionaries.
Days 60-90: Ship improvements and have an opinion on every decision.
Deliver measurable accuracy improvements on your owned workstream: concrete numbers, not vibes. Document the pipeline architecture, evaluation methodology, and customer-specific decisions well enough that another engineer can pick up the work. You should have a point of view on how to standardize extraction pipelines across customers so that onboarding a new dictionary takes days, not weeks.
Requirements
2+ years building ML/AI systems in production
Built and deployed AI agents or multi-step LLM pipelines (not just single-call wrappers); you should have a clear point of view on agent architectures, tool use, orchestration frameworks, and where they break down
Strong Python: pipeline code, data processing, and infrastructure glue, not just model training scripts
Practical LLM experience: prompt engineering, fine-tuning, RAG, evaluation design
Built evaluation frameworks for LLM-based document extraction tasks (precision, recall, per-class analysis, error taxonomy)
Willingness to become a domain expert in oncology data: this role requires going deep into clinical documentation, not just treating it as generic text
Comfortable owning customer-facing communication alongside technical delivery; you'll talk regularly with customer data science teams, clinical teams, and internal engineering
Can operate in high-intensity delivery sprints and manage your own time across multiple workstreams
Preferred
Kept up with the agentic ML landscape - frameworks, patterns, and failure modes in production agent systems
Clinical or biomedical NLP experience is a plus but not required; what matters is willingness to go deep into the domain