Open Source · Python 3.11+ · Local-first

Know before you invest.

ML signals, institutional flows, and volatility-aware sizing on a private data lake you control.

One ./run.sh, no Python install · ★ Star on GitHub

13+
Data sources synced
18+
ETFs scored daily
26
ClickHouse tables
10
Specialist AI agents
👆 Click a question to watch Mosaic route and answer it live
mosaic interactive chat
$ ./mosaic.sh
Local Ollama detected · ClickHouse healthy · loading agent…
✔ Agent ready mosaic-gemma4 @ ollama
ML running: LightGBM 5d forecast · GARCH volatility · Isolation Forest anomaly
────────────────────────────────────────────────────────
You:

A real interactive REPL: no slash command needed, just type. Every number is computed in Python/SQL from your live ClickHouse data, never invented by the LLM.

Three users, one platform

One window into every market

Launch with a single ./run.sh and the full Streamlit hub opens at localhost:8501, with 12 tabs over your live ClickHouse data. No Python, no notebooks, no glue code.

🔒 localhost:8501  ·  Mosaic Data Hub
📥 Import 🔍 SQL Query 📊 Explorer 🔬 Anomaly Detection 🕵️ Who Is Selling? 📦 MF Holdings 🏦 ETF Scanner 📰 Market News 🎛️ Signals 🪁 Kite Dashboard 🏢 Deep Dive 🌍 Intl ETFs
🔬 Anomaly Detection GOLDBEES
GARCH(1,1) + Isolation Forest · last 90 trading days · live from ClickHouse
GARCH Vol
16.5%
▲ above 15% target
Latest Regime
🔥 Breakout
Final Z = +3.81
Anomalies (90d)
4
fire rate 4.4%
ML 5d prob_up
64%
WATCH_LONG
[Chart: GOLDBEES series with flagged anomalies marked · y-axis 122–132 · x-axis Mar–Jun]

12 tabs: Import · SQL Query · Explorer · Anomaly Detection · Who Is Selling · MF Holdings · ETF Scanner · Market News · Signals · Kite Dashboard · Deep Dive · Intl ETFs

Why another tool?

Serious decisions need clean cross-asset data, models that quantify edge and risk, and context that explains why something moved. Retail tools give you charts. Terminals give you feeds. Neither gives you all three in one place on your own machine.

🏠

Your data, your machine

All market data stays in a local ClickHouse instance. No third-party cloud sync, no API leakage: your portfolio intelligence stays private.

🤖

LLM-grounded, not hallucinated

Every number the agent reports is first computed in Python or SQL. The LLM only narrates; it never calculates. A hard architectural rule, enforced everywhere.

🌐

Works offline with Ollama

Runs fully locally via Ollama (Gemma 4). The orchestrator auto-switches to compact prompts and data injection paths for low-context local models.

🔬

Forensic anomaly explanation

Every flagged date gets a full report: GARCH regime + Final-Z, news sentiment correlation, COMEX futures context, COT speculator positioning, and what the ML model predicted that day — all point-in-time, no future leakage.

🐋

Institutional flow intelligence

Reverse-engineer DSP, Nippon, and ICICI AMC conviction from monthly portfolio disclosures. 24+ months of cross-fund ownership = highest-quality single-name signal.

⚖️

Volatility-aware sizing

GARCH(1,1) conditional vol feeds inverse-vol + Kelly position sizing. High-stress regimes automatically reduce position weights before you need to think about it.

How it works

Data lands in a local lake, fans out to four quant engines, converges in a multi-agent orchestrator (backed by a SQLite LLM cache, so repeat questions are free), and surfaces wherever you work.
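To make the "repeat questions are free" idea concrete, here is a minimal sketch of a SQLite-backed answer cache with a 24h TTL. It is an illustration under stated assumptions, not Mosaic's actual cache: the `LLMCache` class, the `llm_cache` table, and the prompt-hash key are all hypothetical names chosen for this example.

```python
import hashlib
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # 24h TTL, matching the figure quoted for the cache


class LLMCache:
    """Tiny SQLite-backed cache keyed by a hash of the prompt (illustrative only)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "key TEXT PRIMARY KEY, answer TEXT NOT NULL, created_at REAL NOT NULL)"
        )

    @staticmethod
    def _key(prompt):
        # Hypothetical keying scheme: SHA-256 of the raw prompt text.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT answer, created_at FROM llm_cache WHERE key = ?",
            (self._key(prompt),),
        ).fetchone()
        if row is None or now - row[1] > TTL_SECONDS:
            return None  # cache miss, or entry older than the TTL
        return row[0]

    def put(self, prompt, answer, now=None):
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO llm_cache (key, answer, created_at) "
            "VALUES (?, ?, ?)",
            (self._key(prompt), answer, now),
        )
        self.conn.commit()
```

A caller would check `cache.get(prompt)` first and only invoke the LLM on `None`, then store the result with `cache.put(prompt, answer)`.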

① Ingest & Store
13+ data sources
Yahoo · NSE · CFTC COT
MFAPI · Morningstar · NewsAPI
Zerodha Kite · World Bank · IMF
🗄️ Local ClickHouse Lake · 26 tables
daily_prices · mf_holdings · fii_dii
cot_gold · fx_rates · ml_predictions
signal_composite · inav_snapshots
ReplacingMergeTree · watermark delta-sync · idempotent inserts
fan-out · read through one typed repository
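The watermark delta-sync and idempotent-insert ideas can be sketched in a few lines. This is a simplified model, not the project's loader: a plain dict keyed by (symbol, date) stands in for a ClickHouse table (ReplacingMergeTree collapses rows that share a sorting key during background merges), and the function names and sample prices are made up for illustration.

```python
from datetime import date


def watermarks_of(store):
    """Latest stored date per symbol: the 'watermark' for each series."""
    marks = {}
    for symbol, day in store:
        if symbol not in marks or day > marks[symbol]:
            marks[symbol] = day
    return marks


def delta_rows(fetched, watermarks):
    """Keep only rows strictly newer than the watermark for their symbol."""
    return [
        row for row in fetched
        if row["symbol"] not in watermarks
        or row["date"] > watermarks[row["symbol"]]
    ]


def upsert(store, rows):
    """Write rows keyed by (symbol, date); repeating a write changes nothing."""
    for row in rows:
        store[(row["symbol"], row["date"])] = row["close"]


# Sample values are invented purely to exercise the functions.
store = {}
batch = [
    {"symbol": "GOLDBEES", "date": date(2024, 1, 1), "close": 52.1},
    {"symbol": "GOLDBEES", "date": date(2024, 1, 2), "close": 52.4},
]
upsert(store, delta_rows(batch, watermarks_of(store)))   # first sync: 2 rows
second_delta = delta_rows(batch, watermarks_of(store))   # re-run: nothing new
upsert(store, second_delta)
```

Re-running the same sync leaves the store unchanged, which is the property "idempotent inserts" refers to.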
② Compute · four quant engines
🎛️

Signals

6 pillars → 0–100 composite score, run in parallel

📈

ML Forecast

LightGBM 5-day return + quantile confidence band

⚖️

Volatility

GARCH(1,1) vol → inverse-vol + Kelly sizing

🔬

Anomaly

MAD-Z → GARCH(1,1) → Isolation Forest → PELT change-point · 8 regime labels · corporate action suppression

converge

🧠 Multi-Agent Orchestrator (LangGraph ReAct)

LLM intent router → specialist sub-agent guild (10 agents) · budget + tracing middleware auto-attached

⚡ SQLite LLM cache
output/.cache · 24h TTL
repeat questions return instantly
surface
📊 Streamlit Dashboard
💬 Conversational ask
⚡ ./mosaic.sh CLI
🔌 Claude / Gemini MCP
📄 PDF Report Export
Python 3.11+
ClickHouse
LangGraph ReAct
LightGBM
GARCH + arch
Isolation Forest
Streamlit
Ollama / Gemma 4
OpenAI / Anthropic

Built for every layer of the decision

Signals

6-Pillar Composite Scoring

18+ ETFs scored 0–100 daily across six independent pillars, run in parallel via ThreadPoolExecutor.

  • Macro themes (8 live-news event clusters)
  • Capital flows (FII/DII 5-day rolling net)
  • Valuation (iNAV Z-score premium/discount)
  • Sentiment (NewsAPI + GNews ratio)
  • ML forecast (LightGBM expected return)
  • GARCH anomaly regime booster/dampener
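As a rough picture of how six pillars might be scored in parallel and blended, here is a minimal sketch using `concurrent.futures.ThreadPoolExecutor`. The pillar functions are placeholders returning fixed numbers, and the equal-weight average is an assumption for illustration; the real pipeline may weight pillars differently (the anomaly pillar, for example, is described as a booster/dampener rather than a plain average term).

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder scorers: each returns a made-up 0-100 score for a symbol.
PILLARS = {
    "macro": lambda symbol: 60.0,
    "flows": lambda symbol: 55.0,
    "valuation": lambda symbol: 40.0,
    "sentiment": lambda symbol: 70.0,
    "ml": lambda symbol: 64.0,
    "anomaly": lambda symbol: 50.0,
}


def composite_score(symbol, pillars=PILLARS, weights=None):
    """Run every pillar concurrently, then take a weighted average in [0, 100]."""
    weights = weights or {name: 1.0 for name in pillars}  # assumption: equal weights
    with ThreadPoolExecutor(max_workers=len(pillars)) as pool:
        futures = {name: pool.submit(fn, symbol) for name, fn in pillars.items()}
        scores = {name: fut.result() for name, fut in futures.items()}
    total_weight = sum(weights[name] for name in scores)
    blended = sum(scores[name] * weights[name] for name in scores) / total_weight
    return max(0.0, min(100.0, blended)), scores


score, breakdown = composite_score("GOLDBEES")
```

Threads suit this workload when each pillar is mostly waiting on database or network I/O, which is likely the case for news and flow lookups.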
Anomaly

4-Step Composite Anomaly Pipeline

Fires on ~8% of days vs. 21% for a naive Random Forest. Four independent methods vote; corporate action ex-dates are automatically suppressed so splits and bonuses never pollute your signal.

  • Step 1 — MAD Robust Z-score: rolling median + MAD resists outlier inflation that standard Z misses during trends
  • Step 2 — GARCH(1,1) Student-t: models conditional volatility σ_t so only moves extreme relative to current regime are flagged
  • Step 3 — Isolation Forest: cross-asset enrichment with USDINR FX + CFTC COT speculator crowding; boosts days suspicious to both algorithms
  • Step 4 — PELT change-point: detects structural vol-regime shifts (calm → turbulent), not just point shocks; confirmed breaks get a 1.15× Final-Z boost
  • 8 regime labels: Flash Crash · Volatile Breakout · Crowded Long · Blow-off Top · Strong Trend · Regime Shift · Corporate Action · Normal
  • Explanation layer: each flagged date correlated with news, COMEX futures, and point-in-time ML prediction — no future leakage
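Step 1 of the pipeline above can be sketched with the standard library alone. This is a minimal illustration of a rolling MAD robust Z-score, not the project's implementation: the 20-day window is an assumption, and the 0.6745 factor is the usual constant that puts MAD on the same scale as a standard deviation for normally distributed data. Each score uses only the preceding window, so no future data leaks in.

```python
from statistics import median


def mad_z_scores(returns, window=20):
    """Robust Z of each return vs. the median/MAD of the preceding `window` returns."""
    scores = []
    for i, r in enumerate(returns):
        history = returns[max(0, i - window):i]  # strictly past observations
        if len(history) < window:
            scores.append(None)  # not enough history yet
            continue
        med = median(history)
        mad = median(abs(x - med) for x in history)
        if mad == 0:
            scores.append(None)  # flat window: the Z-score is undefined
            continue
        scores.append(0.6745 * (r - med) / mad)
    return scores


# Twenty calm days of +/-1% followed by a +10% move (made-up numbers).
sample = [0.01, -0.01] * 10 + [0.10]
z = mad_z_scores(sample)
```

With this sample, the final day scores far above a typical 3.5 cutoff while the first twenty days return `None` for lack of history. Because the median and MAD barely move when a single outlier enters the window, one shock does not inflate the baseline the way it would with a mean and standard deviation.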
ML

LightGBM 5-Day Forecast

Walk-forward time-series CV with quantile regression for calibrated uncertainty bands.

  • 25+ alpha features (momentum, vol, COT, FX, seasonality)
  • 80% confidence band via quantile regression
  • AUC + hit-ratio from walk-forward CV
  • Kelly Criterion position sizing from prob_up
  • Model cache auto-invalidated on new data
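Walk-forward CV, as listed above, means every fold trains on the past and tests on the future. A minimal sketch of the index generation follows; the fold count, minimum training size, and the 5-row gap between train and test (so that 5-day forward labels from training rows do not overlap the test window) are assumptions made for this example, not settings taken from the project.

```python
def walk_forward_splits(n_samples, n_folds=5, min_train=100, horizon=5):
    """Yield (train_idx, test_idx) with an expanding train window and a label gap."""
    test_size = (n_samples - min_train - horizon) // n_folds
    if test_size <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_start = train_end + horizon  # skip rows whose labels peek into the test set
        test_end = test_start + test_size
        yield range(0, train_end), range(test_start, test_end)


splits = list(walk_forward_splits(500))
```

Each fold's metrics (AUC, hit ratio) would be computed on its test range only and then averaged, so the reported numbers never depend on data the model had already seen.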
Risk

Volatility-Aware Position Sizing

Real-time GARCH vol feeds an inverse-vol + Kelly blend. Regime overrides cut size during stress.

  • w(t) = vol_target / σ_t × regime_mult × score_gate
  • blended_50: 50% RG + 50% Kelly (recommended)
  • blended_30: conservative (70% RG + 30% Kelly)
  • Grid-search parameter optimizer included
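The sizing formula above can be written out as a short sketch. Several things here are assumptions, flagged in the comments: "RG" is read as the inverse-vol weight w(t) from the formula, the Kelly fraction uses an even payoff ratio (so f = 2p − 1), weights are long-only, and the inverse-vol weight is capped at 100%. None of these choices are confirmed by the source.

```python
def inverse_vol_weight(sigma, vol_target=0.15, regime_mult=1.0,
                       score_gate=1.0, cap=1.0):
    """w(t) = vol_target / sigma_t * regime_mult * score_gate, capped (assumed cap)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return min(cap, vol_target / sigma * regime_mult * score_gate)


def kelly_fraction(prob_up, payoff_ratio=1.0):
    """Kelly fraction f = p - (1 - p) / b, floored at zero (assumed long-only)."""
    return max(0.0, prob_up - (1.0 - prob_up) / payoff_ratio)


def blended_weight(sigma, prob_up, kelly_share=0.5, **kwargs):
    """Blend the inverse-vol weight ('RG', by assumption) with the Kelly fraction."""
    rg = inverse_vol_weight(sigma, **kwargs)
    return (1.0 - kelly_share) * rg + kelly_share * kelly_fraction(prob_up)


# Inputs borrowed from the dashboard example: GARCH vol 16.5%, prob_up 64%.
blended_50 = blended_weight(0.165, 0.64, kelly_share=0.5)
blended_30 = blended_weight(0.165, 0.64, kelly_share=0.3)
stressed = blended_weight(0.165, 0.64, kelly_share=0.5, regime_mult=0.5)
```

Passing a `regime_mult` below 1 is how a stress regime would shrink the position in this sketch; the actual multipliers per regime are not stated in the source.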
Research

Indian & US Equity Deep-Dives

One command pulls price, earnings, cashflow, promoter trends, MF cross-ownership, and news in parallel.

  • NSE/BSE: 3yr cashflow, QoQ shareholding delta
  • DSP/Nippon/ICICI AMC conviction cross-check
  • US: SEC 10-K/10-Q, XBRL, exec comp, job trends
  • Auto-symbol resolution with QIP dilution check
Flows

Institutional Whale Tracking

Reverse-engineer AMC tactical pivots from monthly portfolio disclosures across 7 multi-asset funds.

  • DSP: 31-month history, 60+ funds
  • Nippon India: dynamic URL discovery 2024+
  • CFTC COT: COMEX gold + silver speculator positioning
  • FII/DII: daily cash + F&O participant flows
Reports

PDF Report Export

Every analysis — anomaly breakdown, ML forecast, signal report, or equity deep-dive — can be exported as a shareable PDF with one command.

  • Full anomaly report with GARCH chart, regime table, and news correlation
  • GOLDBEES signal report with Kelly sizing and confidence band
  • Equity deep-dive: earnings, promoter trends, MF cross-ownership
  • Saved to output/reports/ — ready to share or archive

A guild of specialist analysts

Rather than one LLM with 80 tools, an intent router dispatches to 10 specialist sub-agents, each with a curated tool set, domain system prompt, and hard limits (20 tool calls / 30k tokens / 180s).
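The hard limits quoted above (20 tool calls / 30k tokens / 180s) lend themselves to a small budget guard. The sketch below is a hypothetical illustration of that idea: `AgentBudget`, `BudgetExceeded`, and `charge` are invented names, and the real middleware may track and enforce limits differently.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a sub-agent goes past one of its hard limits."""


@dataclass
class AgentBudget:
    max_tool_calls: int = 20
    max_tokens: int = 30_000
    max_seconds: float = 180.0
    tool_calls: int = 0
    tokens: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens=0, tool_call=False, now=None):
        """Record usage, then fail fast if any limit has been crossed."""
        now = time.monotonic() if now is None else now
        self.tokens += tokens
        self.tool_calls += 1 if tool_call else 0
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        if now - self.started_at > self.max_seconds:
            raise BudgetExceeded("time limit reached")
```

An orchestrator would create one budget per sub-agent run and call `charge(...)` after every model response or tool invocation, catching `BudgetExceeded` to return a partial answer instead of looping indefinitely.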

📊

SignalSubAgent

GOLDBEES ML pipeline, composite ETF scores, GARCH vol, anomaly explanation.

🇮🇳

IndianEquityResearchSubAgent

NSE/BSE stocks: parallel fetch of price, earnings, promoter, MF holdings, news.

🌐

MacroSubAgent

COMEX pre-market, FII/DII flows, macro theme scanner, whale tracker.

🇺🇸

DeepDiveSubAgent

SEC EDGAR 10-K/10-Q, XBRL financials, exec comp, Workday hiring trends.

🌍

IntlETFSubAgent

MAFANG, Hang Seng, Nasdaq ETFs: scarcity premium vs. NAV analysis.

📰

NewsSubAgent

Multi-source sentiment aggregation: NewsAPI + GNews per symbol.

💻

CodeSubAgent

Ad-hoc Python execution and raw ClickHouse SQL queries.

🗄️

DatabaseSubAgent

Schema inspection, watermark status, import freshness checks.

Two commands. That's it.

No Python, no virtualenv, no dependency hell. Install Docker Desktop, then ./run.sh for the dashboard and ./mosaic.sh for everything else.

📊  1 · Launch the dashboard

Builds the image, starts ClickHouse + UI, opens your browser.

# macOS / Linux
$ ./run.sh

# Windows
> run.bat

# Dashboard opens at →
http://localhost:8501

# Stop it anytime
$ ./stop.sh
⚡  2 · Run anything with ./mosaic.sh

Run any CLI command or script inside Docker, with zero local setup.

# Pre-market commodity signals
$ ./mosaic.sh comex

# Composite ETF signals (0–100)
$ ./mosaic.sh signals --save

# GOLDBEES ML pipeline
$ ./mosaic.sh src/scripts/goldbees_report.py

# Sync fresh data first
$ ./mosaic.sh import --category etfs
💬  Interactive chat (just type)

Bare ./mosaic.sh opens a REPL: no slash command needed, and it auto-routes to the right agent.

$ ./mosaic.sh
✔ Agent ready mosaic-gemma4 @ ollama

You: explain GOLDBEES anomalies
You: am I overexposed to IT?
You: /signals   # or slash commands

# one-shot mode also works
$ ./mosaic.sh ask "today's gold signal"
🐍  Prefer a manual install?

Developers can run natively without the wrappers.

$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
$ cp .env.example .env   # add keys
$ docker compose up clickhouse -d
$ python src/main.py ui

On first ./run.sh a .env is created; add OPENAI_API_KEY, NEWSAPI_KEY, GOLD_API_KEY, or point LLM_BASE_URL at Ollama for fully offline operation.

Learn the system