Lead Data Scientist (NLP & LLM Focus)

  • Full-Time
  • Remote

Job Description:

The Mission
We are solving the "Context Loss" problem in financial data. While others provide raw PDFs, FinancialReports provides semantic understanding. We need a Lead Data Scientist to perfect our PDF-to-Markdown engines and build the next generation of RAG-ready financial datasets.


The Role
You will lead our research into unstructured data extraction. Your primary focus will be improving the accuracy of our parsing algorithms so that, for example, the complex tables in a German annual report are preserved exactly for vectorization.


Key Responsibilities

  • Algorithmic Extraction: Improve our proprietary models for detecting and parsing financial tables from unstructured PDFs.
  • LLM Pipeline Optimization: Design pipelines that prepare our 10M+ filings for large-scale LLM training and RAG applications.
  • Quality Assurance: Build automated benchmarks to verify data integrity across 30+ languages.


Who You Are

  • Deep NLP Background: Experience with Transformers, OCR correction, and document layout analysis (DLA).
  • Research to Production: You don't just write papers; you ship models that run at scale.
  • Detail Obsessed: You understand that in financial data, 99% accuracy is a failure.


Why Join?

  • Data Advantage: You will have access to one of the world's largest clean corpora of financial text.
  • Equity & Impact: You are building the brain of the company. Compensation includes significant equity.