DataHub Project Roadmap for next 4 weeks (Feb 26th, 2026)
Strategic context: People are actively using and downloading data. North Star metric — Data extractions per 1,000 dataset page views — is already showing strong results (EPL: 714/1k, Country List: 428/1k). Focus is on driving more traffic and conversions, not onboarding publishers.
North Star Metric
Data Extractions per 1,000 Dataset Page Views
Current extraction breakdown:
- Raw data download: 64%
- Filtered download: 31%
- Copy to clipboard: 5%
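For reference, the metric is just extraction events normalised per 1,000 dataset page views. A minimal sketch of the calculation, with illustrative event counts (not our actual analytics numbers):

```python
# Minimal sketch: North Star metric from raw event counts.
# The event names and numbers are illustrative placeholders,
# not our actual analytics schema.
page_views = 12_400          # dataset page views in the period
extractions = {
    "raw_download": 5_650,   # direct file downloads
    "filtered_download": 2_740,
    "copy_to_clipboard": 440,
}

total_extractions = sum(extractions.values())
north_star = total_extractions / page_views * 1_000
print(f"Extractions per 1,000 views: {north_star:.0f}")

# Per-channel share of extractions (the 64% / 31% / 5% split above):
for channel, count in extractions.items():
    print(f"{channel}: {count / total_extractions:.0%}")
```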
Epic 1 — Improve Performance of Dataset Pages
Summary: Every core dataset page should be fully functional, well-structured, and optimised to convert visitors into data extractors. We achieve this by fixing layout issues, surfacing CTAs above the fold, and using the country-list page as a reference implementation to roll improvements across all core datasets.
Key Results (4 weeks):
- KR1: 100% of core dataset pages are fully functional (no broken downloads, missing previews, or layout issues)
- KR2: North Star metric (extractions per 1,000 views) improves by ≥10% across core dataset pages
Owner: Anu
Reference: Data Page Optimization Playbook · Design for Post Page
Issues
- Deep dive audit on country-list page — Use as reference implementation for page optimisation
- Add prominent CTAs above the fold — Download/Get Data (free), Subscribe, Read More, and Premium offering placement
- Restructure page layout — Show data table as prominently as possible; move secondary metadata (license, last updated) further down
- Fix download button visibility and positioning — Currently nearly invisible and poorly placed
- Optional signup gate for downloads — Prompt users to sign up before downloading (skippable initially)
- API key requirement for machine/programmatic access — Block bots from direct data access; require API keys (see the sketch after this list)
- Add developer/automated-use instructions section — Restore docs for CLI and API access patterns
- Add AI section with waitlist — "Explore data with AI" button with coming soon / subscribe now CTA 🔥
- Advertise premium/paid offerings on high-traffic pages — e.g. logistics offerings on country-list page
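A minimal sketch of what the API-key gate for programmatic access could look like, assuming a Flask-style endpoint; is_valid_key and the key store are hypothetical, not the current codebase:

```python
# Minimal sketch of an API-key gate for programmatic data access.
# Flask is used for illustration; is_valid_key() and VALID_KEYS are
# hypothetical placeholders, not part of the current codebase.
from flask import Flask, request, abort, send_file

app = Flask(__name__)

VALID_KEYS = {"demo-key-123"}  # placeholder; real keys live in a key store

def is_valid_key(key: str | None) -> bool:
    return key is not None and key in VALID_KEYS

@app.route("/api/datasets/<dataset_id>/data")
def get_data(dataset_id: str):
    # Browsers get the HTML page instead; this endpoint serves machine
    # access only, so an API key is required (which also blocks bots).
    if not is_valid_key(request.headers.get("X-API-Key")):
        abort(401, "API key required for programmatic access")
    return send_file(f"data/{dataset_id}.csv")
```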
Epic 2 — More Pages, More Traffic
Summary: The fastest path to more extractions is more high-quality data that people are already searching for. By doubling the number of core datasets and improving related dataset discovery, we create a compounding engine for organic traffic growth.
Key Results (4 weeks):
- KR1: Number of core datasets doubles (e.g. 100 → 200)
- KR2: Organic traffic increases by ≥25% (strong: 50%, dream: 100%)
Owner: Anu
Issues
- Audit and count current core datasets — Establish baseline number to track doubling goal
- Identify and prioritise high-demand dataset categories — Country lists, commodity prices (gold), pharmaceutical spending, energy data, etc.
- Publish new core datasets to hit doubling target — Statistical approach: volume of relevant data drives organic traffic
- Improve related dataset suggestions — Surface relevant datasets to reduce bounce rate and increase time on site
- Track and report on page view growth — Weekly reporting against traffic targets (25% / 50% / 100%)
- Investigate and filter bot traffic — Clean up North Star metric anomalies caused by bot activity
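As a starting point for the bot clean-up, a hedged sketch of user-agent filtering over raw page-view events; the field names are assumptions about our analytics export, not a confirmed schema:

```python
# Sketch: strip obvious bot traffic from page-view events before
# computing the North Star metric. The "user_agent" field name is an
# assumption about the analytics export, not a confirmed schema.
KNOWN_BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")

def is_probable_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)

def filter_human_views(events: list[dict]) -> list[dict]:
    # Keep only events whose user agent does not match a known bot marker.
    return [e for e in events if not is_probable_bot(e.get("user_agent", ""))]
```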
Epic 3 — Automate & Improve Publication Flow
Summary: Publishing new datasets must be fast, repeatable, and owned entirely by the internal team — not blocked by UI complexity or manual steps. A solid CLI/API workflow directly unlocks our ability to hit the dataset volume targets in Epic 2.
Key Results (4 weeks):
- KR1: Internal team can publish a new dataset end-to-end via CLI in under 30 minutes (documented, repeatable workflow)
- KR2: At least one 2-day publishing sprint completed with measurable dataset output
Owner: Luccas
Issues
- Define and document internal publishing workflow — End-to-end runbook from raw data → live dataset page
- Build/improve dhCLI tool for dataset publishing — CLI-first, API-backed publishing workflow for internal team (see the sketch after this list)
- Maintain GitHub integration — Keep git-backed publishing benefits while removing UI complexity
- Run a 2-day publishing sprint — Pick 1–2 high-leverage datasets, publish end-to-end, measure output
- Deprecate/disable publisher UI for non-approved users — Remove self-serve publishing dashboard from general access
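To make KR1 concrete, one possible shape for the publish flow dhCLI could wrap, sketched as API calls; the endpoint, payload, and auth header are placeholders, not the current DataHub API:

```python
# Hypothetical sketch of the publish flow dhCLI could wrap: validate a
# local datapackage.json, then push it via an internal API. The endpoint,
# payload shape, and auth header are placeholders, not the current API.
import json
import pathlib
import requests

API_BASE = "https://example.datahub.internal/api"  # placeholder URL
API_KEY = "..."  # internal team key

def publish_dataset(pkg_dir: str) -> str:
    pkg_path = pathlib.Path(pkg_dir) / "datapackage.json"
    descriptor = json.loads(pkg_path.read_text())
    assert "name" in descriptor and "resources" in descriptor, "invalid package"

    resp = requests.post(
        f"{API_BASE}/datasets",
        json=descriptor,
        headers={"X-API-Key": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["url"]  # live dataset page

# e.g. print(publish_dataset("./country-list"))
```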
Epic 4 — Improve User Signup & Dashboard Experience
Summary: The current signup flow is built for publishers, not data consumers. We need to remove that friction, give users a dashboard that reflects what they actually care about (datasets they follow), and start capturing baseline engagement signals.
Key Results (4 weeks):
- KR1: Publisher options removed from the standard signup flow; redesigned user dashboard shipped
- KR2: Baseline metrics established for signups and dataset likes (current numbers tracked and reported)
Issues
- Check and report actual signup numbers and dataset likes — Establish baseline before any changes
- Remove "Publisher" options from standard signup flow — Not relevant for most users right now
- Redesign user dashboard — Focus on: bookmarked/subscribed datasets, update streams for followed datasets
- Add per-dataset subscription — Users subscribe to individual datasets, not publications (data-model sketch after this list)
- Add optional user info prompt on signup — Skippable; capture use-case / user journey for segmentation
- Track signups and dataset likes as ongoing side metrics — Validate engagement signals beyond extractions
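A minimal sketch of the data model per-dataset subscription implies: a user-to-dataset link rather than user-to-publication. Names are assumptions, not the existing schema:

```python
# Sketch of a per-dataset subscription model (names are assumptions,
# not the existing schema): users follow individual datasets, and the
# dashboard renders from these rows plus each dataset's update feed.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DatasetSubscription:
    user_id: str
    dataset_id: str          # an individual dataset, not a publication
    subscribed_at: datetime
    notify_on_update: bool = True

def dashboard_feed(subs: list[DatasetSubscription],
                   updates: dict[str, list[str]]) -> list[str]:
    """Update stream for a user's followed datasets."""
    return [u for s in subs for u in updates.get(s.dataset_id, [])]
```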
Epic 5 — LinkedIn & Marketing Presence
Summary: DataHub needs its own voice and community presence, separate from Datopian. LinkedIn is the right channel to attract future publishers, showcase data, and build credibility with potential premium buyers over the coming months.
Key Results (4 weeks):
- KR1: DataHub LinkedIn page created and live with at least 3 posts published
- KR2: Initial list of 10+ potential publisher prospects identified via LinkedIn
Issues
- Create DataHub LinkedIn page — Separate from Datopian's existing 2,000-follower account
- Publish content showcasing high-value datasets — Regular posts linking to popular dataset pages
- Identify and reach out to potential publishers via LinkedIn — e.g. energy infrastructure visualisation publishers, open data orgs
- Begin publisher outreach pipeline (April onwards) — Structured outreach to interested data publishers (e.g. PUDL, energy sector)
Epic 6 — Queryless for Data Portals (Marketing Push) (not strictly a DataHub epic, but kept here for consolidation)
Summary: Queryless is ready to be positioned as the agentic interface for data portals — but the world doesn't know it yet. This sprint is about creating the marketing narrative, a compelling demo video, and publishing content that establishes Queryless as the future of data portal UX.
Key Results (4 weeks):
- KR1: Blog post "Queryless for Data Portals" published within week 1
- KR2: Demo video live showing Queryless against a real client portal (not a demo portal)
⭐ Priority marketing focus. Present as a prototype/demo, not a finished product.
Key message: "Queryless is an agentic interface to your data portal that lets anyone ask questions and get clear, trustworthy answers fast — in the browser, Telegram, WhatsApp, or Slack."
Issues
- Write blog post with Joanna — Target: published within 1 week; angle: "Queryless for Data Portals"
- Produce demo video — Show Queryless against an actual client portal (not a demo portal)
- Create LinkedIn marketing campaign — Around "agentic interface to your data portal" concept
- Document Queryless technical capabilities — Agent instructions, built-in DuckDB support, extensibility (Postgres, data lake, etc.); illustrative sketch after this list
- Define marketing positioning for Queryless variants:
- Queryless for Data Portals
- Queryless for DataHub.io (already working)
- Queryless for DuckDB
- Queryless for your data lake
- Queryless in the browser
- Queryless in Telegram / WhatsApp
- Build lightweight demos for each variant — Videos preferred for speed; live demos where feasible
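To illustrate the built-in DuckDB support mentioned above, a hedged sketch of the kind of query a Queryless agent could run under the hood; the CSV URL is a placeholder and this is not actual Queryless code:

```python
# Illustrative only: the kind of query a Queryless agent could run
# against portal data via DuckDB. The CSV URL is a placeholder and
# this is not the actual Queryless agent code.
import duckdb

con = duckdb.connect()
# httpfs lets DuckDB read a remote CSV directly over HTTP.
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

rows = con.execute("""
    SELECT Name, Code
    FROM read_csv_auto('https://example.org/country-list.csv')
    ORDER BY Name
    LIMIT 5
""").fetchall()
print(rows)
```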
Backlog / Asides
- Repository consolidation decision — Evaluate merging strategy repo + marketing repo + product repo; define what gets deprecated, merged, or archived
- Define deprecation policy — Reduce confusion across active docs and projects
- Explore energy infrastructure data publishers — Found via LinkedIn; potential future publisher pipeline