Purpose

Measure whether DataHub is improving successful value extraction from existing demand over a 90-day optimization phase.

Focus: use before growth.


Core Definition (used everywhere)

Data Extraction
A user action that produces a reusable data artifact.

Count exactly these events:

  • Full dataset download (CSV, XLS, etc.)

  • Sample / filtered dataset download

  • Visualization export (PNG, SVG, PDF)

  • Copy-to-clipboard of tabular data (if available)

Do not count:

  • Page views

  • Scrolls

  • Visualization views without export

  • Hover or passive interactions

This definition replaces “downloads” throughout the dashboard.


North-Star Metric

Data Extractions per 1,000 Dataset Page Views

Rationale: best proxy for successful task completion in data-seeking contexts.


Tier 1: Core Page-Level Metrics (must have)

Per dataset page:

  • Page views

  • Unique visitors

  • Total data extractions

  • Data extractions / 1,000 views

  • Bounce rate

  • Median time on page

Interpretation:

  • High views + low extractions → relevance or usability failure

  • Very low time on page (<10s) → scope mismatch or misleading search intent


Tier 2: Extraction Breakdown (diagnostic)

Tracked separately, not optimized separately (initially):

  • Raw dataset downloads

  • Sample / filtered downloads

  • Visualization exports

Purpose:

  • Understand how users extract value

  • Inform later product and pricing decisions


Tier 3: Engagement & Depth (supporting signals)

Per session:

  • Preview interactions (table scrolls, schema views, chart interactions)

  • Related-dataset clicks

  • Multi-dataset sessions (% sessions with ≥2 dataset pages)

These predict future monetization but are not the primary KPI.


Tier 4: Commitment Metrics (observed, not driven)

Tracked but explicitly secondary:

  • Account creations

  • Dataset subscriptions / follows

  • Purchases

These should lag extraction improvements.


Required Segment Filters

Dashboard must support filtering by:

  • Top 20 dataset pages by traffic

  • Free vs paid datasets

  • New vs returning visitors

Optional (later):

  • Device type

  • Traffic source (search / referral)


Reporting Cadence

  • Weekly review: page-level extraction performance

  • Monthly checkpoint: are extraction rates improving on top pages?

Success in this phase = higher extraction density, not more traffic.


One-Line Strategy Reminder (for the dashboard header)

“If users cannot extract value, nothing else matters.”


Next, we can revise the dataset-page audit checklist to align exactly with this extraction-based metric (e.g. every checklist item maps to improving extraction probability).