DataHub Project Roadmap for next 4 weeks (Feb 26th, 2026)
Strategic context: People are actively using and downloading data. North Star metric — Data extractions per 1,000 dataset page views — is already showing strong results (EPL: 714/1k, Country List: 428/1k). Focus is on driving more traffic and conversions, not onboarding publishers.
North Star Metric
Data Extractions per 1,000 Dataset Page Views
Current extraction breakdown:
- Raw data download: 64%
- Filtered download: 31%
- Copy to clipboard: 5%
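For reference, the metric is just extraction events normalised per 1,000 dataset page views. A minimal sketch of the calculation, with illustrative event counts (not our actual analytics numbers):

```python
# Minimal sketch: North Star metric from raw event counts.
# The event names and numbers are illustrative placeholders,
# not our actual analytics schema.
page_views = 12_400          # dataset page views in the period
extractions = {
    "raw_download": 5_650,   # direct file downloads
    "filtered_download": 2_740,
    "copy_to_clipboard": 440,
}

total_extractions = sum(extractions.values())
north_star = total_extractions / page_views * 1_000
print(f"Extractions per 1,000 views: {north_star:.0f}")

# Per-channel share of extractions (the 64% / 31% / 5% split above):
for channel, count in extractions.items():
    print(f"{channel}: {count / total_extractions:.0%}")
```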
Epic 1 — Improve Performance of Dataset Pages
Summary: Every core dataset page should be fully functional, well-structured, and optimised to convert visitors into data extractors. We achieve this by fixing layout issues, surfacing CTAs above the fold, and using the country-list page as a reference implementation to roll improvements across all core datasets.
Key Results (4 weeks):
- KR1: 100% of core dataset pages are fully functional (no broken downloads, missing previews, or layout issues)
- KR2: North Star metric (extractions per 1,000 views) improves by ≥10% across core dataset pages
Owner: Anu
Reference: Data Page Optimization Playbook · Design for Post Page
Issues
- Deep dive audit on country-list page — Use as reference implementation for page optimisation
- Add prominent CTAs above the fold — Download/Get Data (free), Subscribe, Read More, and Premium offering placement
- Restructure page layout — Show data table as prominently as possible; move secondary metadata (license, last updated) further down
- Fix download button visibility and positioning — Currently nearly invisible and poorly placed
- Optional signup gate for downloads — Prompt users to sign up before downloading (skippable initially)
- API key requirement for machine/programmatic access — Block bots from direct data access; require API keys (see the sketch after this list)
- Add developer/automated-use instructions section — Restore docs for CLI and API access patterns
- Add AI section with waitlist — "Explore data with AI" button with coming soon / subscribe now CTA 🔥
- Advertise premium/paid offerings on high-traffic pages — e.g. logistics offerings on country-list page
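A minimal sketch of what the API-key gate for programmatic access could look like, assuming a Flask-style endpoint; is_valid_key and the key store are hypothetical, not the current codebase:

```python
# Minimal sketch of an API-key gate for programmatic data access.
# Flask is used for illustration; is_valid_key() and VALID_KEYS are
# hypothetical placeholders, not part of the current codebase.
from flask import Flask, request, abort, send_file

app = Flask(__name__)

VALID_KEYS = {"demo-key-123"}  # placeholder; real keys live in a key store

def is_valid_key(key: str | None) -> bool:
    return key is not None and key in VALID_KEYS

@app.route("/api/datasets/<dataset_id>/data")
def get_data(dataset_id: str):
    # Browsers get the HTML page instead; this endpoint serves machine
    # access only, so an API key is required (which also blocks bots).
    if not is_valid_key(request.headers.get("X-API-Key")):
        abort(401, "API key required for programmatic access")
    return send_file(f"data/{dataset_id}.csv")
```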
Epic 2 — More Pages, More Traffic
Summary: The fastest path to more extractions is more high-quality data that people are already searching for. By doubling the number of core datasets and improving related dataset discovery, we create a compounding engine for organic traffic growth.
Key Results (4 weeks):
- KR1: Number of core datasets doubles (e.g. 100 → 200)
- KR2: Organic traffic increases by ≥25% (strong: 50%, dream: 100%)
Owner: Anu
Issues
- Audit and count current core datasets — Establish baseline number to track doubling goal
- Identify and prioritise high-demand dataset categories — Country lists, commodity prices (gold), pharmaceutical spending, energy data, etc.
- Publish new core datasets to hit doubling target — Statistical approach: volume of relevant data drives organic traffic
- Improve related dataset suggestions — Surface relevant datasets to reduce bounce rate and increase time on site
- Track and report on page view growth — Weekly reporting against traffic targets (25% / 50% / 100%)
- Investigate and filter bot traffic — Clean up North Star metric anomalies caused by bot activity
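As a starting point for the bot clean-up, a hedged sketch of user-agent filtering over raw page-view events; the field names are assumptions about our analytics export, not a confirmed schema:

```python
# Sketch: strip obvious bot traffic from page-view events before
# computing the North Star metric. The "user_agent" field name is an
# assumption about the analytics export, not a confirmed schema.
KNOWN_BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")

def is_probable_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)

def filter_human_views(events: list[dict]) -> list[dict]:
    # Keep only events whose user agent does not match a known bot marker.
    return [e for e in events if not is_probable_bot(e.get("user_agent", ""))]
```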
Epic 3 — Automate & Improve Publication Flow
Summary: Publishing new datasets must be fast, repeatable, and owned entirely by the internal team — not blocked by UI complexity or manual steps. A solid CLI/API workflow directly unlocks our ability to hit the dataset volume targets in Epic 2.
Key Results (4 weeks):
- KR1: Internal team can publish a new dataset end-to-end via CLI in under 30 minutes (documented, repeatable workflow)
- KR2: At least one 2-day publishing sprint completed with measurable dataset output
Owner: Luccas
Issues
- Define and document internal publishing workflow — End-to-end runbook from raw data → live dataset page
- Build/improve dhCLI tool for dataset publishing — CLI-first, API-backed publishing workflow for internal team (see the sketch after this list)
- Maintain GitHub integration — Keep git-backed publishing benefits while removing UI complexity
- Run a 2-day publishing sprint — Pick 1–2 high-leverage datasets, publish end-to-end, measure output
- Deprecate/disable publisher UI for non-approved users — Remove self-serve publishing dashboard from general access
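To make KR1 concrete, one possible shape for the publish flow dhCLI could wrap, sketched as API calls; the endpoint, payload, and auth header are placeholders, not the current DataHub API:

```python
# Hypothetical sketch of the publish flow dhCLI could wrap: validate a
# local datapackage.json, then push it via an internal API. The endpoint,
# payload shape, and auth header are placeholders, not the current API.
import json
import pathlib
import requests

API_BASE = "https://example.datahub.internal/api"  # placeholder URL
API_KEY = "..."  # internal team key

def publish_dataset(pkg_dir: str) -> str:
    pkg_path = pathlib.Path(pkg_dir) / "datapackage.json"
    descriptor = json.loads(pkg_path.read_text())
    assert "name" in descriptor and "resources" in descriptor, "invalid package"

    resp = requests.post(
        f"{API_BASE}/datasets",
        json=descriptor,
        headers={"X-API-Key": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["url"]  # live dataset page

# e.g. print(publish_dataset("./country-list"))
```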
Epic 4 — Improve User Signup & Dashboard Experience
Summary: The current signup flow is built for publishers, not data consumers. We need to remove that friction, give users a dashboard that reflects what they actually care about (datasets they follow), and start capturing baseline engagement signals.
Key Results (4 weeks):
- KR1: Publisher options removed from the standard signup flow; redesigned user dashboard shipped
- KR2: Baseline metrics established for signups and dataset likes (current numbers tracked and reported)
Issues
- Check and report actual signup numbers and dataset likes — Establish baseline before any changes
- Remove "Publisher" options from standard signup flow — Not relevant for most users right now
- Redesign user dashboard — Focus on: bookmarked/subscribed datasets, update streams for followed datasets
- Add per-dataset subscription — Users subscribe to individual datasets, not publications (data-model sketch after this list)
- Add optional user info prompt on signup — Skippable; capture use-case / user journey for segmentation
- Track signups and dataset likes as ongoing side metrics — Validate engagement signals beyond extractions
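A minimal sketch of the data model per-dataset subscription implies: a user-to-dataset link rather than user-to-publication. Names are assumptions, not the existing schema:

```python
# Sketch of a per-dataset subscription model (names are assumptions,
# not the existing schema): users follow individual datasets, and the
# dashboard renders from these rows plus each dataset's update feed.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DatasetSubscription:
    user_id: str
    dataset_id: str          # an individual dataset, not a publication
    subscribed_at: datetime
    notify_on_update: bool = True

def dashboard_feed(subs: list[DatasetSubscription],
                   updates: dict[str, list[str]]) -> list[str]:
    """Update stream for a user's followed datasets."""
    return [u for s in subs for u in updates.get(s.dataset_id, [])]
```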
Epic 5 — LinkedIn & Marketing Presence
Summary: DataHub needs its own voice and community presence, separate from Datopian. LinkedIn is the right channel to attract future publishers, showcase data, and build credibility with potential premium buyers over the coming months.
Key Results (4 weeks):
- KR1: DataHub LinkedIn page created and live with at least 3 posts published
- KR2: Initial list of 10+ potential publisher prospects identified via LinkedIn
Issues
- Create DataHub LinkedIn page — Separate from Datopian's existing 2,000-follower account
- Publish content showcasing high-value datasets — Regular posts linking to popular dataset pages
- Identify and reach out to potential publishers via LinkedIn — e.g. energy infrastructure visualisation publishers, open data orgs
- Begin publisher outreach pipeline (April onwards) — Structured outreach to interested data publishers (e.g. PUDL, energy sector)
Epic 6 — Queryless for Data Portals (Marketing Push) (not strictly a DataHub epic, but kept here for consolidation)
Summary: Queryless is ready to be positioned as the agentic interface for data portals — but the world doesn't know it yet. This sprint is about creating the marketing narrative, a compelling demo video, and publishing content that establishes Queryless as the future of data portal UX.
Key Results (4 weeks):
- KR1: Blog post "Queryless for Data Portals" published within week 1
- KR2: Demo video live showing Queryless against a real client portal (not a demo portal)
⭐ Priority marketing focus. Present as a prototype/demo, not a finished product.
Key message: "Queryless is an agentic interface to your data portal that lets anyone ask questions and get clear, trustworthy answers fast — in the browser, Telegram, WhatsApp, or Slack."
Issues
- Write blog post with Joanna — Target: published within 1 week; angle: "Queryless for Data Portals"
- Produce demo video — Show Queryless against an actual client portal (not a demo portal)
- Create LinkedIn marketing campaign — Around "agentic interface to your data portal" concept
- Document Queryless technical capabilities — Agent instructions, built-in DuckDB support, extensibility (Postgres, data lake, etc.); illustrative sketch after this list
- Define marketing positioning for Queryless variants:
- Queryless for Data Portals
- Queryless for DataHub.io (already working)
- Queryless for DuckDB
- Queryless for your data lake
- Queryless in the browser
- Queryless in Telegram / WhatsApp
- Build lightweight demos for each variant — Videos preferred for speed; live demos where feasible
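To illustrate the built-in DuckDB support mentioned above, a hedged sketch of the kind of query a Queryless agent could run under the hood; the CSV URL is a placeholder and this is not actual Queryless code:

```python
# Illustrative only: the kind of query a Queryless agent could run
# against portal data via DuckDB. The CSV URL is a placeholder and
# this is not the actual Queryless agent code.
import duckdb

con = duckdb.connect()
# httpfs lets DuckDB read a remote CSV directly over HTTP.
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

rows = con.execute("""
    SELECT Name, Code
    FROM read_csv_auto('https://example.org/country-list.csv')
    ORDER BY Name
    LIMIT 5
""").fetchall()
print(rows)
```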
Backlog / Asides
- Repository consolidation decision — Evaluate merging strategy repo + marketing repo + product repo; define what gets deprecated, merged, or archived
- Define deprecation policy — Reduce confusion across active docs and projects
- Explore energy infrastructure data publishers — Found via LinkedIn; potential future publisher pipeline