  • 50K+ Indexed data points
  • 6,000+ Reverse-engineered transactions
  • 5.5M Pricing data points
  • 40+ Years of deal history
  • 9 Primary jurisdictions, daily
  • 5+ Languages parsed natively
Architecture

Three layers, end to end.

Ingestion brings primary documents in. Analysis turns documents into a structured knowledge graph and runs the valuation work on top of it. Reporting surfaces the outputs as scored briefs, hidden-connection alerts, new-stream detection, and benchmark queries.

Ingestion layer: automated daily ingestion

  • SEC filings: 10-K, 10-Q, 8-K, S-1
  • Court records: litigation, settlements
  • Regulatory: FDA, EMA, PMDA
  • Trials: outcomes, timelines
  • Patents: families, expirations
  • Pricing: 5.5M data points
  • Deal announcements: licensing terms
  • University TTO: tech transfer, spin-outs

Analysis layer: owned compute infrastructure (no third-party cloud APIs · full data sovereignty)

  • Graph neural networks: map entity relationships across drugs, patents, companies, deals, trials
  • Mixture-of-experts models: domain-specialised on clinical, patent, pricing
  • Structured knowledge graph: 50K+ data points, 6,000+ transactions
  • Continuous learning loop: every verified outcome retrains the models

Reporting layer

  • Daily intelligence briefs: scored against your investment thesis
  • Hidden connections surfaced: relationships invisible in any single filing
  • New royalty streams detected: buried in footnotes, litigation, amendments
  • Competitive landscape shifts: pipeline, formulary, pricing changes
  • Royalty rate calculator: rates by stage, indication, structure

Purpose-built AI for pharmaceutical royalty intelligence. Every daily ingestion makes the dataset stronger.

Under the hood

Four compute roles, one purpose.

The system is partitioned by workload. Ingestion runs in parallel across primary sources. Enrichment runs domain-trained models against a forty-year deal corpus. A dedicated role handles non-English markets continuously. A central data layer serves the rest of the system and powers the scoring engine. We do not publish hardware specifics.

Node 1: Ingest

Parallel extraction · multi-source

Parallel extraction from SEC EDGAR, clinical trial registries (ClinicalTrials.gov, EU CTR), patent databases (USPTO, EPO, JPO), press releases, and annual reports. Every new filing is automatically parsed, classified, and queued for enrichment. Coverage runs daily across the source set.
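
As an illustration only, the parse, classify, and queue step described above could be sketched as follows. The `Filing` type, the `FORM_CATEGORIES` mapping, and the category labels are hypothetical names chosen for this example, not the engine's actual schema, and the sketch runs sequentially rather than in parallel.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical mapping from SEC form type to an enrichment category.
FORM_CATEGORIES = {
    "10-K": "annual_report",
    "10-Q": "quarterly_report",
    "8-K": "material_event",
    "S-1": "registration",
}


@dataclass(frozen=True)
class Filing:
    source: str     # e.g. "SEC EDGAR"
    form_type: str  # e.g. "8-K"
    text: str       # raw document text


def classify(filing: Filing) -> str:
    """Return a category label, falling back to 'unclassified'."""
    return FORM_CATEGORIES.get(filing.form_type, "unclassified")


def ingest(filings, queue):
    """Classify each filing and append (category, filing) to the queue."""
    for filing in filings:
        queue.append((classify(filing), filing))
    return queue


# Placeholder documents, not real filings.
enrichment_queue = ingest(
    [
        Filing("SEC EDGAR", "8-K", "placeholder text"),
        Filing("USPTO", "patent", "placeholder text"),
    ],
    deque(),
)
```

Unknown form types are queued as `unclassified` rather than dropped, so nothing is lost before enrichment.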

Node 2: Enrich & Value

Domain-trained models · multi-pass enrichment

Runs large-parameter models purpose-trained on our deal corpus. Enrichment is multi-pass: scientific context, competitive landscape, patent analysis, and deal-precedent matching. A proprietary valuation engine runs DCF, Monte Carlo, and comparables analyses against the full forty-year database for every asset.
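
To make the DCF and Monte Carlo terms concrete, here is a minimal sketch of a discounted royalty stream with a simulated sales shock. It is not the proprietary valuation engine. The function names, the single lognormal shock per path, and all parameters are assumptions chosen for illustration.

```python
import random


def royalty_npv(sales, royalty_rate, discount_rate):
    """Present value of royalties; sales[t] is received at the end of year t + 1."""
    return sum(
        s * royalty_rate / (1 + discount_rate) ** (t + 1)
        for t, s in enumerate(sales)
    )


def simulate_npv(base_sales, royalty_rate, discount_rate,
                 sales_vol, n_paths=10_000, seed=0):
    """Monte Carlo over one lognormal sales shock per path (assumed model)."""
    rng = random.Random(seed)
    npvs = []
    for _ in range(n_paths):
        # mu = -sigma^2 / 2 gives the shock an expected value of 1,
        # so the simulated mean stays close to the deterministic DCF.
        shock = rng.lognormvariate(-0.5 * sales_vol ** 2, sales_vol)
        shocked_sales = [s * shock for s in base_sales]
        npvs.append(royalty_npv(shocked_sales, royalty_rate, discount_rate))
    return npvs
```

A call such as `simulate_npv([100.0, 120.0, 90.0], 0.05, 0.10, 0.3)` returns one NPV per path. The spread of that list, rather than a single point estimate, is what a Monte Carlo pass adds on top of a plain DCF.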

Node 3: Global Markets

Non-English filings · five-plus languages

Dedicated to non-English markets. Covers EDINET filings (Japan), the DART system (Korea), and EMA regulatory data (EU). A custom Japanese NER and OCR pipeline handles pharmaceutical deal extraction. Always-on coverage scans filings in five-plus languages. This is how we find deals no one else sees.

Node 4: Score & Surface

Central data layer · scoring & screening

The central data layer serves all nodes. The deal scoring engine rates every asset on royalty and licensing attractiveness. Distressed-biotech detection tracks cash runway, management changes, and declining market cap. Origination screening estimates which holders are most likely to monetise next. Results are surfaced to the research team and the publication pipeline.
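
The three distress signals named above (cash runway, management changes, declining market cap) can be sketched as a simple rule-based screen. This is an assumption-laden illustration: the `Company` fields and both thresholds are hypothetical and are not the engine's actual scoring logic.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    cash_usd: float
    quarterly_burn_usd: float
    market_cap_change_12m: float  # e.g. -0.6 for a 60% decline
    recent_exec_departures: int


def runway_months(c: Company) -> float:
    """Months of cash left at the current quarterly burn rate."""
    if c.quarterly_burn_usd <= 0:
        return float("inf")  # not burning cash
    return 3 * c.cash_usd / c.quarterly_burn_usd


def distress_signals(c: Company,
                     min_runway_months: float = 12.0,   # hypothetical threshold
                     max_cap_decline: float = -0.5):    # hypothetical threshold
    """Return the list of distress signals that fire for a company."""
    signals = []
    if runway_months(c) < min_runway_months:
        signals.append("short_runway")
    if c.recent_exec_departures > 0:
        signals.append("management_change")
    if c.market_cap_change_12m <= max_cap_decline:
        signals.append("market_cap_decline")
    return signals
```

Returning the individual signals, instead of a single boolean, keeps the reason for a flag visible to whoever reviews the screen.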

Why owned infrastructure

Three reasons we do not run this on a cloud API.

01

Data sovereignty

Royalty research touches private deal documents, draft term sheets, and confidential disclosures from holders. None of that leaves our infrastructure. No third-party model provider trains on our corpus, and no inference call routes through anyone else's logging.

02

Domain training, not generic

Our models are continuously fine-tuned on our own 40-year deal corpus. A general-purpose frontier model is excellent at language but does not know what a step-down clause looks like in a 1998 Merck-Schering term sheet. The domain training is the moat.

03

Cost discipline at the long tail

Daily ingestion across nine jurisdictions and five-plus languages would be uneconomic on per-token pricing. Owned hardware drives marginal cost to near zero, which is what makes systematic coverage of sub-$100M positions tractable in the first place.

The output

What the engine surfaces, every day.

Surface | What it is
Daily intelligence briefs | Scored and ranked against your investment thesis. Asset, action, why now.
Hidden connections | Entity relationships invisible in any single filing. Same molecule, multiple counterparties; same counterparty, multiple molecules.
New royalty streams | Buried in footnotes, litigation, amendments. Detected the day they appear in a filing.
Competitive landscape shifts | Pipeline, formulary, and pricing changes that move existing royalty economics.
Deal structure patterns | Rate benchmarks by therapy area, stage, structure. Where the cohort is tight; where it is wide.
Royalty rate calculator | Query fair rates by stage, indication, deal structure against the corpus.
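
The "hidden connections" pattern (same molecule, multiple counterparties; same counterparty, multiple molecules) amounts to grouping deal records two ways. The sketch below assumes deals arrive as `(molecule, counterparty)` pairs, which is a simplification for illustration and not the engine's knowledge-graph schema. All names are placeholders.

```python
from collections import defaultdict


def hidden_connections(deals):
    """Find molecules with several counterparties, and the reverse.

    `deals` is an iterable of (molecule, counterparty) pairs.
    """
    by_molecule = defaultdict(set)
    by_counterparty = defaultdict(set)
    for molecule, counterparty in deals:
        by_molecule[molecule].add(counterparty)
        by_counterparty[counterparty].add(molecule)

    # Keep only the entities that link to more than one partner.
    shared_molecules = {
        m: sorted(cs) for m, cs in by_molecule.items() if len(cs) > 1
    }
    shared_counterparties = {
        c: sorted(ms) for c, ms in by_counterparty.items() if len(ms) > 1
    }
    return shared_molecules, shared_counterparties


# Placeholder records, not real deals.
example_deals = [
    ("molecule-A", "holder-X"),
    ("molecule-A", "holder-Y"),
    ("molecule-B", "holder-X"),
]
molecules, counterparties = hidden_connections(example_deals)
```

Each pair here could come from a different filing, which is why the link is not visible in any single document.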


Numbers and architecture descriptions reflect the state of the engine as of the date of publication. The engine is a research tool. Outputs are research and analysis, not investment advice. Capital for Cures does not manage capital, hold positions, or place securities.