Transform Data Chaos Into Precision
Scourr builds institutional data and research infrastructure for systematic funds. We turn noisy web-scale corpora into point-in-time, survivorship-bias-free features and event streams, delivered as APIs and notebooks for signal research, portfolio construction, and production deployment.
Advanced Technology Architecture
Cutting-edge infrastructure powering quantitative research and data intelligence
AI & Machine Learning
Data Infrastructure
Quantitative Analytics
Security & Compliance
Platform SLOs (30-day rolling)
Data Processing Pipeline
Data Ingestion
18M raw docs (2019–2024)
Processing
8.6M cleaned (48% retained), 21.6% dedup reduction
Insights
1.8M embeddings • real-time features
Quantitative Research
From corpus to tradable alpha: data engineering, feature generation, and rigorous out-of-sample validation
Agentic RAG Systems
Autonomous corpus-to-feature extraction with LLM agents: entity resolution, event labeling, deduplication, and timestamp normalization. Outputs clean, point-in-time features to a research feature store for robust cross-sectional modeling.
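Below is a minimal sketch of the kind of point-in-time record such an extraction stage can emit; the field names (entity_id, event_type, knowledge_ts) and the normalization helper are illustrative, not the production schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PointInTimeFeature:
    """One feature observation, stamped as of when it became knowable."""
    entity_id: str          # resolved entity, not the raw text mention
    event_type: str         # label assigned during event extraction
    value: float            # numeric feature value
    event_ts: datetime      # when the underlying event occurred
    knowledge_ts: datetime  # when the document became available (used for joins)

def normalize_timestamp(raw: str) -> datetime:
    """Normalize a vendor timestamp string to timezone-aware UTC."""
    dt = datetime.fromisoformat(raw)
    return dt.astimezone(timezone.utc) if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

record = PointInTimeFeature(
    entity_id="ACME_CORP",
    event_type="guidance_raise",
    value=1.0,
    event_ts=normalize_timestamp("2024-05-02T21:05:00+00:00"),
    knowledge_ts=normalize_timestamp("2024-05-02T21:12:00+00:00"),
)
print(record)
```

Joining research panels on knowledge_ts rather than event_ts is what keeps downstream backtests point-in-time.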
High-Throughput Systems
Asynchronous batched retrieval, Redis caching, and backfill orchestration to maintain lag-corrected, vendor-normalized panels. End-to-end RAG query latency: median 220ms, p95 226ms. Supports ~100 concurrent signal evaluations.
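As an illustration of the retrieval pattern, the sketch below batches cache misses and fetches them concurrently with asyncio; a plain dict stands in for Redis, and fetch_batch is a hypothetical downstream call, not part of the platform's API.

```python
import asyncio
from typing import Dict, List

_CACHE: Dict[str, str] = {}  # in-memory stand-in for a Redis cache

async def fetch_batch(keys: List[str]) -> Dict[str, str]:
    """Hypothetical downstream retrieval (vector store / document DB)."""
    await asyncio.sleep(0.05)  # simulated I/O latency
    return {k: f"payload-for-{k}" for k in keys}

async def retrieve(keys: List[str], batch_size: int = 64) -> Dict[str, str]:
    """Serve cache hits immediately; fetch misses in fixed-size concurrent batches."""
    hits = {k: _CACHE[k] for k in keys if k in _CACHE}
    misses = [k for k in keys if k not in _CACHE]
    batches = [misses[i:i + batch_size] for i in range(0, len(misses), batch_size)]
    for result in await asyncio.gather(*(fetch_batch(b) for b in batches)):
        _CACHE.update(result)
        hits.update(result)
    return hits

print(asyncio.run(retrieve([f"doc-{i}" for i in range(10)])))
```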
Advanced Backtesting
Walk-forward evaluation with 10-fold purged CV (120-day embargo), Monte Carlo path perturbations, and transaction-cost/risk models. Enhanced multi-signal model Sharpe 0.92 OOS vs 0.80 baseline (2019–2024); drift to production ~3 bps over 6 months.
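The split logic behind purged cross-validation is simple enough to show directly. The sketch below yields train/test index pairs for a daily-indexed panel, dropping training days within the embargo window on either side of the test block; the 10 folds and 120-day embargo mirror the figures above, while the label-overlap handling is deliberately simplified.

```python
import numpy as np

def purged_kfold_indices(n_days: int, n_folds: int = 10, embargo_days: int = 120):
    """Yield (train_idx, test_idx) pairs for time-ordered daily observations.

    Training days within `embargo_days` of the test block are dropped on both
    sides to limit leakage from labels that overlap the test period.
    """
    days = np.arange(n_days)
    for test_block in np.array_split(days, n_folds):
        test_start, test_end = test_block[0], test_block[-1]
        keep = (days < test_start - embargo_days) | (days > test_end + embargo_days)
        yield days[keep], test_block

for train_idx, test_idx in purged_kfold_indices(n_days=1500):
    print(f"train={len(train_idx):4d}  test={test_idx[0]:4d}..{test_idx[-1]:4d}")
```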
Alpha Generation Pipeline
Data Ingestion
Tens of millions of documents
AI Processing
Agentic RAG analysis
Signal Generation
Alpha extraction
Portfolio Alpha
Sharpe 0.92 OOS vs 0.80 baseline
Data-Driven Solutions
Production-grade data and research platform for systematic funds and private investors
Mass Data Scraping
Continuous web-scale acquisition with point-in-time snapshots, entity linking, and intelligent deduplication. Raw → cleaned: 18M → 8.6M (48% retained). Dedup reduction 21.6%.
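A toy version of the deduplication pass: normalize the text, hash it, and keep the first occurrence. Production deduplication would use fuzzier matching (shingling or MinHash), but the shape is the same.

```python
import hashlib
import re
from typing import Iterable, List

def fingerprint(text: str) -> str:
    """Hash of whitespace- and case-normalized text."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(docs: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each distinct normalized document."""
    seen, kept = set(), []
    for doc in docs:
        fp = fingerprint(doc)
        if fp not in seen:
            seen.add(fp)
            kept.append(doc)
    return kept

docs = ["Acme Q3 revenue up 12%", "acme  Q3 revenue up 12%", "Beta cuts guidance"]
print(deduplicate(docs))  # the two near-identical documents collapse to one
```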
Business Intelligence
Research workbench: feature store, APIs, notebooks, diagnostics, and leakage checks (10-fold purged CV, 120-day embargo). Factor library and audit trails to accelerate idea-to-production cycles for QR/PM teams.
Quantitative Trading
Execution-ready signals for equities/stat-arb: cross-sectional alphas, risk-neutralization hooks, universe/portfolio construction primitives, and monitoring to keep models stable in production.
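For intuition, here is a minimal example of turning one day's raw cross-sectional scores into dollar-neutral, unit-gross weights; sector or factor neutralization would add regression steps on top, and the tickers are placeholders.

```python
import pandas as pd

def cross_sectional_weights(scores: pd.Series) -> pd.Series:
    """Map raw per-asset scores on one date to dollar-neutral, unit-gross weights."""
    z = (scores - scores.mean()) / scores.std(ddof=0)  # z-score => zero net exposure
    return z / z.abs().sum()                           # scale to unit gross leverage

scores = pd.Series({"AAA": 0.8, "BBB": -0.2, "CCC": 0.1, "DDD": -0.7})
weights = cross_sectional_weights(scores)
print(weights.round(3), "net:", round(weights.sum(), 6))
```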
Efficiency Through Intelligence
Shorten research cycles and ship models faster with automation for data prep, feature generation, and backtesting.
Time Savings
Automated analysis frees time for strategic thinking
AI Automation
Self-learning agents adapt to market patterns
Enterprise Grade
Consulting-level competitive advantage
Strategic Insights
Deep market intelligence for better decisions
Ready to Transform Your Data?
We work with CIOs, PMs, and QRs under NDA to deploy private, customized data and research pipelines, on-prem or in your VPC, tailored to hedge funds and family offices.
Milestones
Our journey, visualized. A plan from pen to processor.
Genesis
The initiative began in September 2024 as a solo effort to test whether alternative data could be ingested, cleaned, and transformed into research-grade features. The initial months focused on building ingestion pipelines, implementing entity resolution, and developing leakage-aware backtesting. Despite being driven by a single developer, the work demonstrated that meaningful alpha could be extracted from messy, unstructured data.
Dream
- 2027: Expand coverage to about 50k instruments across global equities and macro series. Ingest about 250M documents per year with fully automated quality gates. Grow the vector index to about 30M embeddings while keeping p95 latency near 150ms through GPU-accelerated retrieval and aggressive caching.
- 2028: Build a cross-market knowledge graph that links entities, events, and features. Ship an internal feature marketplace and automated lineage and drift governance. Reach about 500k API calls per day with hybrid GPU and TPU inference that remains cost-aware.
- 2029+: Autopilot research loops that propose, test, and promote signals with human approval. Offer private on-premise deployments for funds with reproducibility guarantees and cryptographic audit trails. Process over one billion documents per year while keeping end-to-end latency under 100ms for common queries.
Scale
- 2025: Coverage to ~12k equities; docs to ~40M/yr; embeddings >3M; p95 latency under 250ms.
- 2026: Global ~20k universe; embeddings 8–10M; ~120k API calls/mo; <15% external dependency; hundreds of concurrent signals.
Synthesis
Over time, the pieces began to come together: robust pipelines, automated deduplication, timestamp-normalized features, and a walk-forward validation framework. This synthesis created a self-reinforcing loop where data flowed in, features were generated, and signals were tested, producing stable, out-of-sample alphas with Sharpe improvements over baseline models. It marked the transition from experimentation to a fully functioning research platform.
Ready to Extract Alpha?
Connect with our quantitative research team
scourr@proton.me
Responses vary by consumer volume

Location
Toronto, ON, Canada

Status
Pre-Seed
Actively growing