Open Source · Python 3.11+ · Local-first

Know before you invest.

ML signals, institutional flows, and volatility-aware sizing on a private data lake you control.

🚀 Launch the dashboard · See it first →

One ./run.sh, no Python install · ★ Star on GitHub

13+

Data sources synced

18+

ETFs scored daily

26

ClickHouse tables

10

Specialist AI agents

👆 Click a question and watch Mosaic route and answer it live

mosaic interactive chat

$ ./mosaic.sh

Local Ollama detected · ClickHouse healthy · loading agent…

✔ Agent ready mosaic-gemma4 @ ollama

ML running: LightGBM 5d forecast · GARCH volatility · Isolation Forest anomaly

────────────────────────────────────────────────────────

You:

A real interactive REPL: no slash command needed, just type. Every number is computed in Python/SQL from your live ClickHouse data, never invented by the LLM.

Who it's for

Three users, one platform

🧑‍💻

Retail investor

"What should I buy today? Am I overexposed?" One-click dashboard plus plain-English ask. Composite 0–100 scores, premium alerts, and clear BUY / HOLD / AVOID calls. No spreadsheets, no jargon.

Launch the dashboard →

🏦

Fund manager / institutional analyst

Track institutional flows, reverse-engineer DSP / Nippon / ICICI conviction, map macro themes to positioning, and size with the GARCH Risk Governor. Built by data enthusiasts with deep AMC domain knowledge and quant network backing.

See institutional signals →

⚡

Quant engineer

Walk-forward LightGBM, GARCH + Isolation Forest, Kelly sizing, raw ClickHouse SQL. Add a signal pillar in one class, backtest it, and ship it, all via ./mosaic.sh.

See the CLI & architecture →

The Dashboard

One window into every market

Launch with a single ./run.sh and the full Streamlit hub opens at localhost:8501, with 12 tabs over your live ClickHouse data. No Python, no notebooks, no glue code.

🔒 localhost:8501 · Mosaic Data Hub

📥 Import 🔍 SQL Query 📊 Explorer 🔬 Anomaly Detection 🕵️ Who Is Selling? 📦 MF Holdings 🏦 ETF Scanner 📰 Market News 🎛️ Signals 🪁 Kite Dashboard 🏢 Deep Dive 🌍 Intl ETFs

🔬 Anomaly Detection · GOLDBEES

GARCH(1,1) + Isolation Forest · last 90 trading days · live from ClickHouse

GARCH Vol

16.5%

▲ above 15% target

Latest Regime

🔥 Breakout

Final Z = +3.81

Anomalies (90d)

fire rate 4.4%

ML 5d prob_up

64%

WATCH_LONG

[Chart: price in ₹, y-axis 122 to 132, x-axis Mar to Jun; ● = flagged anomaly]

12 tabs: Import · SQL Query · Explorer · Anomaly Detection · Who Is Selling · MF Holdings · ETF Scanner · Market News · Signals · Kite Dashboard · Deep Dive · Intl ETFs

The Problem

Why another tool?

Serious decisions need clean cross-asset data, models that quantify edge and risk, and context that explains why something moved. Retail tools give you charts. Terminals give you feeds. Neither gives you all three in one place on your own machine.

🏠

Your data, your machine

All market data stays in a local ClickHouse instance. No third-party cloud sync, no API leakage: your portfolio intelligence stays private.

🤖

LLM-grounded, not hallucinated

Every number the agent reports is first computed in Python or SQL. The LLM only narrates; it never calculates. A hard architectural rule, enforced everywhere.
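One way to picture the "narrate, never calculate" rule is a guard that rejects any number in the model's text that was not computed beforehand. The sketch below is illustrative only: `build_prompt`, `ungrounded_numbers`, and the fact names are hypothetical and are not taken from the Mosaic codebase; the sample values reuse the dashboard figures shown earlier.

```python
import re

# Hypothetical helper: render pre-computed facts so the model only narrates.
def build_prompt(question: str, facts: dict[str, float]) -> str:
    lines = [f"- {name}: {value}" for name, value in facts.items()]
    return (
        "Answer using ONLY the facts below. Do not compute new numbers.\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

# Hypothetical guard: list numbers in the narration that match no computed fact.
def ungrounded_numbers(narration: str, facts: dict[str, float]) -> list[str]:
    allowed = {float(v) for v in facts.values()}
    return [tok for tok in NUMBER.findall(narration) if float(tok) not in allowed]

# Facts computed upstream in Python/SQL (values copied from the dashboard mock).
facts = {"garch_vol_pct": 16.5, "final_z": 3.81, "prob_up_pct": 64}

ok = "GARCH vol is 16.5% with a Final Z of 3.81; the model sees a 64% chance of a rise."
bad = "GARCH vol is 17.2%, so expect a 3.81 sigma move."

print(ungrounded_numbers(ok, facts))   # []
print(ungrounded_numbers(bad, facts))  # ['17.2']
```

A check like this is deliberately strict: a narration that rounds or rescales a fact would be flagged too, which is the safe direction to fail in.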

🌐

Works offline with Ollama

Runs fully locally via Ollama (Gemma 4). The orchestrator auto-switches to compact prompts and data injection paths for low-context local models.

🔬

Forensic anomaly explanation

Every flagged date gets a full report: GARCH regime + Final-Z, news sentiment correlation, COMEX futures context, COT speculator positioning, and what the ML model predicted that day — all point-in-time, no future leakage.

🐋

Institutional flow intelligence

Reverse-engineer DSP, Nippon, and ICICI AMC conviction from monthly portfolio disclosures. 24+ months of cross-fund ownership = highest-quality single-name signal.

⚖️

Volatility-aware sizing

GARCH(1,1) conditional vol feeds inverse-vol + Kelly position sizing. High-stress regimes automatically reduce position weights before you need to think about it.

Architecture

How it works

Data lands in a local lake, fans out to four quant engines, converges in a multi-agent orchestrator (backed by a SQLite LLM cache, so repeat questions are free), and surfaces wherever you work.

① Ingest & Store

13+ data sources

Yahoo · NSE · CFTC COT
MFAPI · Morningstar · NewsAPI
Zerodha Kite · World Bank · IMF

🗄️ Local ClickHouse Lake · 26 tables

daily_prices · mf_holdings · fii_dii
cot_gold · fx_rates · ml_predictions
signal_composite · inav_snapshots

ReplacingMergeTree · watermark delta-sync · idempotent inserts
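As a rough illustration of watermark delta-sync with idempotent inserts, the sketch below uses an in-memory dict as a stand-in for a ReplacingMergeTree table keyed on (symbol, day). The function names, the key choice, and the sample prices are assumptions for illustration, not Mosaic's actual schema or code.

```python
from datetime import date
from typing import Optional

# In-memory stand-in for a ReplacingMergeTree table: rows sharing the same
# (symbol, day) key collapse to the most recently written version.
Table = dict[tuple[str, date], float]

def watermark(table: Table, symbol: str) -> Optional[date]:
    """Latest day already stored for a symbol, or None if nothing is stored."""
    days = [day for (sym, day) in table if sym == symbol]
    return max(days) if days else None

def delta_sync(table: Table, symbol: str, source_rows: list[tuple[date, float]]) -> int:
    """Write only rows newer than the watermark; return how many were written."""
    mark = watermark(table, symbol)
    fresh = [(d, px) for d, px in source_rows if mark is None or d > mark]
    for d, px in fresh:
        table[(symbol, d)] = px  # keyed write: re-sending a row cannot duplicate it
    return len(fresh)

table: Table = {}
rows = [(date(2024, 6, 3), 128.1), (date(2024, 6, 4), 128.4)]  # made-up prices
first = delta_sync(table, "GOLDBEES", rows)
second = delta_sync(table, "GOLDBEES", rows + [(date(2024, 6, 5), 128.9)])
print(first, second, len(table))  # 2 1 3
```

The second sync re-sends the two rows already stored, yet only the one new day is written, which is the property that makes re-running an import safe.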

fan-out · read through one typed repository

② Compute · four quant engines

🎛️

Signals

6 pillars → 0–100 composite score, run in parallel

📈

ML Forecast

LightGBM 5-day return + quantile confidence band

⚖️

Volatility

GARCH(1,1) vol → inverse-vol + Kelly sizing

🔬

Anomaly

MAD-Z → GARCH(1,1) → Isolation Forest → PELT change-point · 8 regime labels · corporate action suppression

converge

🧠 Multi-Agent Orchestrator (LangGraph ReAct)

LLM intent router → specialist sub-agent guild (10 agents) · budget + tracing middleware auto-attached

⚡ SQLite LLM cache

output/.cache · 24h TTL
repeat questions return instantly
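A minimal sketch of how a SQLite-backed answer cache with a 24h TTL might look. The table layout, the `LLMCache` class, and the SHA-256 key over model plus prompt are assumptions; only the SQLite backing and the 24h TTL come from the description above.

```python
import hashlib
import sqlite3
import time
from typing import Optional

TTL_SECONDS = 24 * 60 * 60  # the 24h TTL described above

class LLMCache:
    """Hypothetical cache: one row per (model, prompt) pair, expiring after the TTL."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache "
            "(key TEXT PRIMARY KEY, answer TEXT NOT NULL, created REAL NOT NULL)"
        )

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()

    def get(self, model: str, prompt: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT answer, created FROM llm_cache WHERE key = ?",
            (self._key(model, prompt),),
        ).fetchone()
        if row is None or now - row[1] > TTL_SECONDS:
            return None  # miss, or entry older than the TTL
        return row[0]

    def put(self, model: str, prompt: str, answer: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO llm_cache (key, answer, created) VALUES (?, ?, ?)",
            (self._key(model, prompt), answer, now),
        )
        self.db.commit()

cache = LLMCache()  # in-memory here; a file path would persist between runs
cache.put("mosaic-gemma4", "today's gold signal", "WATCH_LONG", now=0.0)
hit = cache.get("mosaic-gemma4", "today's gold signal", now=3600.0)
expired = cache.get("mosaic-gemma4", "today's gold signal", now=TTL_SECONDS + 1.0)
print(hit, expired)  # WATCH_LONG None
```

Passing `now` explicitly keeps the expiry logic testable without waiting a day.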

surface

📊 Streamlit Dashboard

💬 Conversational ask

⚡ ./mosaic.sh CLI

🔌 Claude / Gemini MCP

📄 PDF Report Export

Python 3.11+

ClickHouse

LangGraph ReAct

LightGBM

GARCH + arch

Isolation Forest

Streamlit

Ollama / Gemma 4

OpenAI / Anthropic

Capabilities

Built for every layer of the decision

Signals

6-Pillar Composite Scoring

18+ ETFs are scored 0–100 daily across six independent pillars, which run in parallel via ThreadPoolExecutor.

Macro themes (8 live-news event clusters)
Capital flows (FII/DII 5-day rolling net)
Valuation (iNAV Z-score premium/discount)
Sentiment (NewsAPI + GNews ratio)
ML forecast (LightGBM expected return)
GARCH anomaly regime booster/dampener
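The parallel pillar run can be sketched as below. The pillar functions are placeholders returning fixed scores, and the equal-weight average is an assumption: the page does not state how the six pillars are weighted, and the GARCH pillar is described as a booster/dampener rather than a plain score.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Pillar = Callable[[str], float]  # takes a symbol, returns a score on a 0-100 scale

def composite_score(symbol: str, pillars: dict[str, Pillar]) -> tuple[float, dict[str, float]]:
    """Run every pillar concurrently, clamp each to 0-100, then average (assumed weighting)."""
    with ThreadPoolExecutor(max_workers=len(pillars)) as pool:
        futures = {name: pool.submit(fn, symbol) for name, fn in pillars.items()}
        scores = {name: min(100.0, max(0.0, fut.result())) for name, fut in futures.items()}
    return sum(scores.values()) / len(scores), scores

# Placeholder pillars with made-up outputs; real ones would query ClickHouse.
pillars: dict[str, Pillar] = {
    "macro": lambda s: 70.0,
    "flows": lambda s: 55.0,
    "valuation": lambda s: 40.0,
    "sentiment": lambda s: 65.0,
    "ml_forecast": lambda s: 64.0,
    "anomaly_regime": lambda s: 120.0,  # out of range on purpose, clamped to 100
}

composite, scores = composite_score("GOLDBEES", pillars)
print(round(composite, 1), scores["anomaly_regime"])  # 65.7 100.0
```

Threads suit this shape of work because each pillar is mostly waiting on I/O (database or HTTP), not on the CPU.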

Anomaly

4-Step Composite Anomaly Pipeline

Fires on ~8% of days vs. 21% for a naive Random Forest. Four independent methods vote; corporate action ex-dates are automatically suppressed so splits and bonuses never pollute your signal.

Step 1 — MAD Robust Z-score: rolling median + MAD resists outlier inflation that standard Z misses during trends
Step 2 — GARCH(1,1) Student-t: models conditional volatility σ_t so only moves extreme relative to current regime are flagged
Step 3 — Isolation Forest: cross-asset enrichment with USDINR FX + CFTC COT speculator crowding; boosts days suspicious to both algorithms
Step 4 — PELT change-point: detects structural vol-regime shifts (calm → turbulent), not just point shocks; confirmed breaks get a 1.15× Final-Z boost
8 regime labels: Flash Crash · Volatile Breakout · Crowded Long · Blow-off Top · Strong Trend · Regime Shift · Corporate Action · Normal
Explanation layer: each flagged date correlated with news, COMEX futures, and point-in-time ML prediction — no future leakage
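Step 1 and the Step 4 boost can be sketched in a few lines. This is an illustration under stated assumptions: the 0.6745 consistency constant is the usual choice for a MAD-based Z, while the lookback length, the exclusion of the current day from its own window, and the function names are not taken from the project. Only the 1.15× multiplier for confirmed breaks comes from the list above.

```python
from statistics import median
from typing import Optional

def mad_z(window: list[float], x: float) -> float:
    """Robust Z of x against a trailing window: 0.6745 * (x - median) / MAD."""
    med = median(window)
    mad = median(abs(v - med) for v in window)
    if mad == 0:
        return 0.0  # flat window: no dispersion to scale by
    return 0.6745 * (x - med) / mad

def rolling_mad_z(returns: list[float], lookback: int = 20) -> list[Optional[float]]:
    """Score each return against the `lookback` returns strictly before it."""
    out: list[Optional[float]] = []
    for i, r in enumerate(returns):
        out.append(None if i < lookback else mad_z(returns[i - lookback:i], r))
    return out

def apply_break_boost(z: float, break_confirmed: bool) -> float:
    """Apply the 1.15x boost described for confirmed change-points."""
    return z * 1.15 if break_confirmed else z

# Made-up daily returns (in %) ending with one obvious shock.
returns = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1, 0.2, 5.0]
z = rolling_mad_z(returns, lookback=10)
print(z[0], z[-1] > 3.5)  # None True
```

Because the median and MAD barely move when one extreme value enters the window, a trending or shock-heavy stretch does not inflate the baseline the way a mean and standard deviation would.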

LightGBM 5-Day Forecast

Walk-forward time-series CV with quantile regression for calibrated uncertainty bands.

25+ alpha features (momentum, vol, COT, FX, seasonality)
80% confidence band via quantile regression
AUC + hit-ratio from walk-forward CV
Kelly Criterion position sizing from prob_up
Model cache auto-invalidated on new data
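Walk-forward CV can be illustrated with a small split generator. The expanding training window, the fold sizes, and the 5-row gap (chosen here to match the 5-day horizon so training labels cannot overlap the test period) are assumptions for illustration; the page does not describe the project's exact scheme.

```python
from typing import Iterator

def walk_forward_splits(
    n_samples: int, n_folds: int, test_size: int, gap: int = 5
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges; each test block follows its training data in time."""
    for fold in range(n_folds):
        test_end = n_samples - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        train_end = test_start - gap  # leave `gap` rows unused between train and test
        if train_end <= 0:
            raise ValueError("not enough samples for the requested folds")
        yield range(0, train_end), range(test_start, test_end)

splits = list(walk_forward_splits(n_samples=100, n_folds=3, test_size=10))
for train, test in splits:
    print(train, test)
# range(0, 65) range(70, 80)
# range(0, 75) range(80, 90)
# range(0, 85) range(90, 100)
```

Metrics such as AUC and hit ratio would then be computed per fold on the test ranges only, so every score reflects data the model had not seen.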

Risk

Volatility-Aware Position Sizing

Real-time GARCH vol feeds an inverse-vol + Kelly blend. Regime overrides cut size during stress.

w(t) = vol_target / σ_t × regime_mult × score_gate
blended_50: 50% Risk Governor (RG) + 50% Kelly (recommended)
blended_30: conservative (70% RG + 30% Kelly)
Grid-search parameter optimizer included
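The sizing formula and the two blends can be worked through with the dashboard's example numbers (16.5% GARCH vol against a 15% target, prob_up of 64%). This is a sketch under assumptions: the even-odds Kelly form, the cap at full weight, and neutral values for regime_mult and score_gate are not specified on this page.

```python
def risk_governor_weight(
    sigma_t: float,
    vol_target: float = 0.15,
    regime_mult: float = 1.0,
    score_gate: float = 1.0,
    cap: float = 1.0,
) -> float:
    """w(t) = vol_target / sigma_t * regime_mult * score_gate, capped (cap is assumed)."""
    if sigma_t <= 0:
        raise ValueError("sigma_t must be positive")
    return min(cap, vol_target / sigma_t * regime_mult * score_gate)

def kelly_fraction(prob_up: float, payoff_ratio: float = 1.0) -> float:
    """Classic Kelly: f = p - (1 - p) / b, floored at zero; b = 1 is an assumption."""
    return max(0.0, prob_up - (1.0 - prob_up) / payoff_ratio)

def blended_weight(rg: float, kelly: float, kelly_share: float) -> float:
    """Linear blend: (1 - kelly_share) * RG + kelly_share * Kelly."""
    return (1.0 - kelly_share) * rg + kelly_share * kelly

rg = risk_governor_weight(sigma_t=0.165)      # vol above target, so weight < 1
kelly = kelly_fraction(prob_up=0.64)
blended_50 = blended_weight(rg, kelly, kelly_share=0.5)
blended_30 = blended_weight(rg, kelly, kelly_share=0.3)
print(round(rg, 3), round(kelly, 2), round(blended_50, 3), round(blended_30, 3))
```

A stress regime would enter through regime_mult: any value below 1 scales the Risk Governor leg down before blending.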

Research

Indian & US Equity Deep-Dives

One command pulls price, earnings, cashflow, promoter trends, MF cross-ownership, and news in parallel.

NSE/BSE: 3yr cashflow, QoQ shareholding delta
DSP/Nippon/ICICI AMC conviction cross-check
US: SEC 10-K/10-Q, XBRL, exec comp, job trends
Auto-symbol resolution with QIP dilution check

Flows

Institutional Whale Tracking

Reverse-engineer AMC tactical pivots from monthly portfolio disclosures across 7 multi-asset funds.

DSP: 31-month history, 60+ funds
Nippon India: dynamic URL discovery 2024+
CFTC COT: COMEX gold + silver speculator positioning
FII/DII: daily cash + F&O participant flows

Reports

PDF Report Export

Every analysis — anomaly breakdown, ML forecast, signal report, or equity deep-dive — can be exported as a shareable PDF with one command.

Full anomaly report with GARCH chart, regime table, and news correlation
GOLDBEES signal report with Kelly sizing and confidence band
Equity deep-dive: earnings, promoter trends, MF cross-ownership
Saved to output/reports/ — ready to share or archive

Multi-Agent System

A guild of specialist analysts

Rather than one LLM with 80 tools, an intent router dispatches to 10 specialist sub-agents, each with a curated tool set, domain system prompt, and hard limits (20 tool calls / 30k tokens / 180s).
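The per-agent hard limits could be enforced by a small budget object that every tool call is charged against. The class and method names below are hypothetical; only the three limits (20 tool calls, 30k tokens, 180s) come from the text above.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

class BudgetExceeded(RuntimeError):
    """Raised when a sub-agent crosses one of its hard limits."""

@dataclass
class AgentBudget:
    max_tool_calls: int = 20
    max_tokens: int = 30_000
    max_seconds: float = 180.0
    tool_calls: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, now: Optional[float] = None) -> None:
        """Record one tool call and its token cost; raise once any limit is crossed."""
        self.tool_calls += 1
        self.tokens += tokens
        elapsed = (time.monotonic() if now is None else now) - self.started
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        if elapsed > self.max_seconds:
            raise BudgetExceeded("time limit reached")

budget = AgentBudget(started=0.0)   # explicit clock values keep the demo deterministic
budget.charge(tokens=1200, now=2.5)
print(budget.tool_calls, budget.tokens)  # 1 1200
```

Middleware wrapping each sub-agent would call `charge` after every tool invocation and turn a `BudgetExceeded` into a graceful partial answer.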

📊

SignalSubAgent

GOLDBEES ML pipeline, composite ETF scores, GARCH vol, anomaly explanation.

🇮🇳

IndianEquityResearchSubAgent

NSE/BSE stocks: parallel fetch of price, earnings, promoter, MF holdings, news.

🌐

MacroSubAgent

COMEX pre-market, FII/DII flows, macro theme scanner, whale tracker.

🇺🇸

DeepDiveSubAgent

SEC EDGAR 10-K/10-Q, XBRL financials, exec comp, Workday hiring trends.

🌍

IntlETFSubAgent

MAFANG, Hang Seng, Nasdaq ETFs: scarcity premium vs. NAV analysis.

📰

NewsSubAgent

Multi-source sentiment aggregation: NewsAPI + GNews per symbol.

💻

CodeSubAgent

Ad-hoc Python execution and raw ClickHouse SQL queries.

🗄️

DatabaseSubAgent

Schema inspection, watermark status, import freshness checks.

Get Started

Two commands. That's it.

No Python, no virtualenv, no dependency hell. Install Docker Desktop, then ./run.sh for the dashboard and ./mosaic.sh for everything else.

📊 1 · Launch the dashboard

Builds the image, starts ClickHouse + UI, opens your browser.

# macOS / Linux
$ ./run.sh

# Windows
> run.bat

# Dashboard opens at →
http://localhost:8501

# Stop it anytime
$ ./stop.sh

⚡ 2 · Run anything with ./mosaic.sh

Runs any CLI command or script inside Docker, with zero local setup.

# Pre-market commodity signals
$ ./mosaic.sh comex

# Composite ETF signals (0–100)
$ ./mosaic.sh signals --save

# GOLDBEES ML pipeline
$ ./mosaic.sh src/scripts/goldbees_report.py

# Sync fresh data first
$ ./mosaic.sh import --category etfs

💬 Interactive chat (just type)

Bare ./mosaic.sh opens a REPL: no slash command needed, and it auto-routes to the right agent.

$ ./mosaic.sh
✔ Agent ready mosaic-gemma4 @ ollama

You: explain GOLDBEES anomalies
You: am I overexposed to IT?
You: /signals   # or slash commands

# one-shot mode also works
$ ./mosaic.sh ask "today's gold signal"

🐍 Prefer a manual install?

Developers can run natively without the wrappers.

$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
$ cp .env.example .env   # add keys
$ docker compose up clickhouse -d
$ python src/main.py ui

On first ./run.sh a .env is created. Add OPENAI_API_KEY, NEWSAPI_KEY, GOLD_API_KEY, or point LLM_BASE_URL at Ollama for fully offline operation.
