Below is a concise synthesis, separating product job stories from the technical trade-offs.

Job stories

Public-facing (dataset page)

  • When I land on a dataset site, I want to see total views so that I can quickly assess its visibility and popularity.
  • When I land on a dataset site, I want to see total downloads so that I can assess actual usage, not just attention.
  • When I compare datasets, I want views and downloads to be clearly distinguished, because downloads are a stronger signal of utility than page views (cf. Piwowar et al., 2007).

Publisher dashboard (internal)

  • When I open my dashboard, I want to see views and downloads over time (e.g. daily, last 30/90 days) so that I can understand trends and the impact of releases or promotion.
  • When I review my datasets, I want lightweight analytics (totals, recent period) without needing to leave Data Hub for an external analytics tool.

Recording views and downloads: options

Option A: Pull from Google Analytics

Description: Use GA pageview events for “views” and custom GA events for “downloads,” periodically ingesting aggregated counts into your own datastore and caching them for display.

Pros

  • Mature, well-documented, and robust at scale (Google, 2024).
  • Minimal engineering effort to get basic view counts.
  • Built-in bot filtering, though sampling and data thresholds can distort low-traffic counts.

Cons

  • GA is page-centric; your “one dataset per site” abstraction is implicit rather than explicit.
  • Download semantics require careful event instrumentation and can drift over time.
  • Privacy, consent, and regulatory overhead (GDPR/consent banners) apply (European Data Protection Board, 2020).
  • GA data latency and sampling can complicate “near-real-time” dashboards.
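The ingestion step for Option A can be sketched as a periodic job that folds already-fetched report rows into per-dataset totals. The row shape below (a `dataset_id` dimension, an `event` name, a `count`) is a hypothetical mapping for illustration; real GA Data API responses would need to be translated into it first.

```python
from collections import defaultdict

def aggregate_ga_rows(rows):
    """Fold GA-style report rows into per-dataset view/download totals.

    Each row is assumed to look like:
      {"dataset_id": "ds-1", "event": "page_view" | "file_download", "count": 12}
    (an illustrative shape, not the raw GA Data API response format).
    """
    totals = defaultdict(lambda: {"views": 0, "downloads": 0})
    for row in rows:
        key = "views" if row["event"] == "page_view" else "downloads"
        totals[row["dataset_id"]][key] += row["count"]
    return dict(totals)
```

Running this nightly and writing the result into your own datastore keeps the public page decoupled from GA's latency and sampling behavior.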

Option B: First-party logging (own system)

Description: Instrument your app to increment counters on dataset-site render (views) and on file fetch (downloads), storing aggregates per dataset per day.

Pros

  • Clean semantic alignment: counts are natively “per dataset,” not inferred from pages.
  • Downloads are straightforward and highly reliable when logged at the storage or API layer (e.g. signed URL issuance).
  • Full control over aggregation windows, caching, and privacy posture (no third-party tracking).

Cons

  • Engineering and operational overhead (storage, aggregation jobs, bot mitigation).
  • Requires explicit decisions about what counts as a “view” (unique vs. raw hits).
  • You must implement basic bot filtering yourself (see Bar-Yossef et al., 2008).
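A minimal first-party sketch of Option B, assuming an in-memory per-dataset, per-day store (a real deployment would use Redis or a database) and a crude substring-based user-agent filter; both are illustrative assumptions, not a prescribed design. It also keeps a set of hashed visitor IDs per day, since "unique vs. raw" views is one of the explicit decisions noted above.

```python
import datetime
from collections import defaultdict

# Naive bot heuristic for illustration; production systems need a
# maintained bot list or behavioral filtering.
BOT_MARKERS = ("bot", "crawler", "spider")

class DatasetStats:
    def __init__(self):
        # (dataset_id, iso_date) -> {"views": int, "downloads": int}
        self.daily = defaultdict(lambda: {"views": 0, "downloads": 0})
        # (dataset_id, iso_date) -> hashed visitor IDs, for unique views
        self.visitors = defaultdict(set)

    def record(self, dataset_id, kind, user_agent="", visitor_hash=None, when=None):
        ua = user_agent.lower()
        if any(marker in ua for marker in BOT_MARKERS):
            return  # drop obvious bots before counting
        day = (when or datetime.date.today()).isoformat()
        key = (dataset_id, day)
        self.daily[key][kind] += 1
        if kind == "views" and visitor_hash is not None:
            self.visitors[key].add(visitor_hash)

    def totals(self, dataset_id):
        views = sum(v["views"] for (d, _), v in self.daily.items() if d == dataset_id)
        downloads = sum(v["downloads"] for (d, _), v in self.daily.items() if d == dataset_id)
        return {"views": views, "downloads": downloads}
```

Storing daily aggregates rather than raw events keeps storage bounded while still supporting the 30/90-day dashboard windows described earlier.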

Practical synthesis (common pattern)

A hybrid is often optimal:

  • Record downloads first-party (they are high-signal and easy to define).
  • For views, either (a) log first-party lightweight counters, or (b) ingest GA aggregates nightly and cache them.
  • Expose only cached aggregates to the public UI; reserve detailed time-series for the dashboard.
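The "expose only cached aggregates" rule can be sketched as a thin read layer with a TTL cache in front of the aggregate store, so the public page never touches raw event data. The one-hour default TTL and the `load_totals` callback are illustrative assumptions.

```python
import time

class CachedTotals:
    """Serve public view/download totals from a TTL cache so public
    traffic never queries the underlying aggregate store directly."""

    def __init__(self, load_totals, ttl_seconds=3600):
        # load_totals: callback mapping dataset_id -> {"views": n, "downloads": n}
        self.load_totals = load_totals
        self.ttl = ttl_seconds
        self._cache = {}  # dataset_id -> (expires_at, totals)

    def get(self, dataset_id, now=None):
        now = time.time() if now is None else now
        hit = self._cache.get(dataset_id)
        if hit and hit[0] > now:
            return hit[1]  # fresh cache entry
        totals = self.load_totals(dataset_id)
        self._cache[dataset_id] = (now + self.ttl, totals)
        return totals
```

Stale-but-cheap reads are acceptable for a public counter; the dashboard can query the aggregate store directly for fresher time series.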

This mirrors practices in scholarly repositories, where download counts are treated as primary impact indicators and views as secondary context (Piwowar et al., 2007; COUNTER Code of Practice, 2018).

References

  • Google. (2024). Google Analytics documentation.
  • European Data Protection Board. (2020). Guidelines on consent under GDPR.
  • Piwowar, H., Day, R., & Fridsma, D. (2007). Sharing detailed research data is associated with increased citation rate. PLOS ONE.
  • COUNTER. (2018). Code of Practice Release 5.
  • Bar-Yossef, Z. et al. (2008). Counting user visits in web analytics. WWW Conference.