Path forward

Last updated: 2025-12-20 01:25

Overview

We propose advancing “Substack for Data” by extending the existing FlowerShow + DataHub Next stack rather than starting from scratch, leveraging the fact that it is already live, modular, and backed by proven workflows.

Why?

There are a few directions we could take “Substack for Data.” One is to build around the existing FlowerShow + DataHub Next system and improve it incrementally. The advantage is that we start from something already working. The open questions are which refactors to prioritize and how easily we can do the surrounding coding with AI assistance, which seems quite feasible.

Priorities and Sequencing

What's nice is that there is a set of largely orthogonal improvements available. Orthogonality is good: it means we can pursue them independently.

Sprint priorities would be as follows.

  • Primary — Avenue 3 and Avenue 4: Ship social and analytics first to create fast feedback loops with low implementation risk.
  • Secondary — Avenue 6: Add embedded data exploration to deliver visible reader-side differentiation early.
  • Exploratory — Avenue 2: Probe a native data post editor in parallel to de-risk future publishing UX.
  • Later — Avenue 5: Treat pipeline extensibility as a higher-leverage, slower-burn capability.

Primary priority: Avenue 3 and Avenue 4 (social features; analytics and dashboards)

Social features and analytics are the most feasible to ship quickly and the most immediately valuable. They build directly on existing content without requiring deep refactors, and they create fast feedback loops for publishers (views, likes, follows).

Crucially, they also enable outbound notifications (e.g. someone liked your post, your dataset was viewed), which supports re-engagement and habit formation with relatively low implementation risk.

Secondary priority: Avenue 6 (embedded data exploration and previews)

Adding automatic data explorers to data pages offers clear, user-visible differentiation and improves reader-side value without requiring changes to authoring workflows. This is especially compelling for CSV-like assets and aligns well with the idea of “data as something you can immediately interact with.”

Compared to pipeline extensibility, this delivers clearer experiential value earlier.

Exploratory priority: Avenue 2 (native data post editor and upload UI)

In parallel, it makes sense to explore the “Florist”-style native data post editor to assess how far it can be taken in a limited scope. This work is exploratory rather than sprint-critical, aimed at de-risking future shifts away from GitHub-centric publishing rather than delivering immediate platform effects.

Later priority: Avenue 5 (processing pipeline extensibility)

While strategically important, deeper pipeline customization (e.g. auto-generated formats, derived artifacts) is likely to have a longer payoff horizon. It is better positioned as a later-stage differentiator once engagement, feedback, and publishing flows are already active.

Summary of intended sequencing (without renumbering)

  • First: Avenue 3 and Avenue 4, for feasibility, engagement, and fast feedback loops
  • Next: Avenue 6, for visible differentiation and reader-side value
  • In parallel (exploratory): Avenue 2, to probe future publishing UX
  • Later: Avenue 5, as a higher-leverage but slower-burn capability

Avenues

Avenue 1: Publication model refactor

Shift from a project/dataset-centric model to a publication-centric one. A user can create one or more data publications, and each publication contains posts (datasets, data stories, or projects), closely analogous to Substack’s structure.

This refactor provides a clear conceptual container for content, identity, and subscriptions, and creates a stable anchor for later social and monetization features. While foundational, it is relatively contained at the data-model and routing level.
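
To make the refactor concrete, a publication-centric model might look like the sketch below. All type, field, and route names here are assumptions for illustration, not the actual DataHub schema.

```typescript
// Hypothetical sketch of a publication-centric content model.
// A user owns publications; each publication contains posts.

type PostKind = "dataset" | "data-story" | "project";

interface Publication {
  id: string;
  slug: string; // stable anchor for routing and subscriptions
  ownerId: string;
  title: string;
}

interface Post {
  id: string;
  publicationId: string; // every post belongs to exactly one publication
  kind: PostKind;
  title: string;
  publishedAt?: Date;
}

// Routing helper: a publication-centric URL scheme, analogous to
// Substack's publication/post structure (path shape is assumed).
function postPath(pub: Publication, post: Post): string {
  return `/${pub.slug}/posts/${post.id}`;
}
```

Because identity and subscriptions hang off `Publication` rather than individual datasets, later social and monetization features get a single stable anchor.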

Avenue 2: Native data post editor and upload UI

Introduce a first-class editor for creating data posts, including direct file upload (e.g. direct-to-R2). This removes GitHub from the critical path for most users and enables a simple “drop files + write context + publish” flow.

Technically, this can remain largely orthogonal to the publication model by interacting only with storage and content APIs, making it independently shippable and incrementally improvable.
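
The “drop files + write context + publish” flow reduces, on the client side, to deriving a storage key and validating the file before asking the server for a presigned R2 upload URL. A minimal sketch of those two steps follows; the key layout, allowed formats, and size cap are all assumptions, and the presigned-URL request itself is left to the storage API.

```typescript
// Assumed constraints for illustration only.
const MAX_UPLOAD_BYTES = 500 * 1024 * 1024; // e.g. a 500 MB cap
const ALLOWED_EXTENSIONS = ["csv", "json", "parquet", "xlsx"];

// Derive a URL-safe object key, namespaced per publication and post
// (key layout is a hypothetical convention, not the real one).
function objectKey(publicationSlug: string, postId: string, filename: string): string {
  const safe = filename.toLowerCase().replace(/[^a-z0-9._-]/g, "-");
  return `${publicationSlug}/${postId}/${safe}`;
}

// Returns null when the upload is acceptable, else a user-facing error.
// On success the caller would request a presigned PUT URL for objectKey(...).
function validateUpload(filename: string, sizeBytes: number): string | null {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) return `unsupported file type: .${ext}`;
  if (sizeBytes > MAX_UPLOAD_BYTES) return "file too large";
  return null;
}
```

Keeping this logic against storage and content APIs only is what keeps the avenue orthogonal to the publication-model refactor.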

Avenue 3: Social features

Add social mechanics such as following publications, liking posts, and possibly commenting. These features benefit conceptually from the publication abstraction but can be implemented in a generic way first and later attached more tightly to publications once that model is in place.

This avenue supports discovery, network effects, and eventual distribution incentives.
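
“Generic first, attached to publications later” can be expressed by keying reactions on a (target type, target id) pair, so likes and follows work on posts today and can point at publications once that model lands, without a schema change. A sketch, with an in-memory store standing in for the database:

```typescript
type TargetType = "post" | "dataset" | "publication";

interface Reaction {
  userId: string;
  targetType: TargetType;
  targetId: string;
  kind: "like" | "follow";
}

// In-memory stand-in for a reactions table with a unique compound key.
class ReactionStore {
  private reactions = new Map<string, Reaction>();

  private key(r: Reaction): string {
    return `${r.userId}:${r.kind}:${r.targetType}:${r.targetId}`;
  }

  // Toggle semantics: repeating an identical reaction removes it.
  // Returns true if the reaction is now active.
  toggle(r: Reaction): boolean {
    const k = this.key(r);
    if (this.reactions.has(k)) {
      this.reactions.delete(k);
      return false;
    }
    this.reactions.set(k, r);
    return true;
  }

  count(kind: Reaction["kind"], targetType: TargetType, targetId: string): number {
    let n = 0;
    for (const r of this.reactions.values()) {
      if (r.kind === kind && r.targetType === targetType && r.targetId === targetId) n++;
    }
    return n;
  }
}
```

The same `toggle` events are also the natural trigger points for the outbound notifications described above.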

Avenue 4: Analytics and dashboards

Enhance creator dashboards with basic analytics such as views, downloads, and engagement per data post. This can be layered on incrementally and does not depend strongly on other refactors, making it a good candidate for early value delivery.
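
The incremental layering can be as simple as folding raw events into per-post counters, which is all a first dashboard needs. A sketch under assumed event names:

```typescript
// Hypothetical event vocabulary; the real tracking schema may differ.
type EventKind = "view" | "download" | "like";

interface AnalyticsEvent {
  postId: string;
  kind: EventKind;
  at: Date;
}

// Fold raw events into per-post counters for the creator dashboard.
function summarize(events: AnalyticsEvent[]): Map<string, Record<EventKind, number>> {
  const out = new Map<string, Record<EventKind, number>>();
  for (const e of events) {
    const row = out.get(e.postId) ?? { view: 0, download: 0, like: 0 };
    row[e.kind]++;
    out.set(e.postId, row);
  }
  return out;
}
```

Because the aggregation only reads an event log, it can ship before any publication-model or editor refactor.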

Avenue 5: Processing pipeline extensibility and customization

Improve and modularize the internal processing pipeline so that it can be customized per publication or per post. Examples include automatically generating alternative file formats (e.g. CSV → Parquet, JSON, XLSX), lightweight validation steps, or derived artifacts.

This positions the system not just as a hosting layer but as a value-adding data publishing pipeline, while remaining largely orthogonal to identity, social, and UI concerns.
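
Per-publication customization suggests a small step-registry design: each publication (or post) selects which derivation steps run after upload. The stubs below only declare derived artifacts; real steps would invoke an actual converter (e.g. DuckDB for CSV → Parquet). Step names and interfaces are assumptions.

```typescript
interface Artifact {
  name: string;
  format: string;
}

// A pipeline step inspects one artifact and declares derived artifacts.
type PipelineStep = (input: Artifact) => Artifact[];

// Illustrative steps: declare Parquet and JSON siblings for CSV inputs.
const deriveParquet: PipelineStep = (a) =>
  a.format === "csv" ? [{ name: a.name.replace(/\.csv$/, ".parquet"), format: "parquet" }] : [];

const deriveJson: PipelineStep = (a) =>
  a.format === "csv" ? [{ name: a.name.replace(/\.csv$/, ".json"), format: "json" }] : [];

// Per-publication configuration is just the chosen list of steps;
// running the pipeline is a flatMap over them.
function runPipeline(steps: PipelineStep[], input: Artifact): Artifact[] {
  return [input, ...steps.flatMap((s) => s(input))];
}
```

Because steps only consume and produce artifacts, the pipeline stays orthogonal to identity, social, and UI concerns, as noted above.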

Avenue 6: Embedded data exploration and previews

Add automatic data explorers to data pages, particularly for tabular formats such as CSV. This could be backed by technologies such as DuckDB-in-the-browser or similar, enabling filtering, querying, and previewing data without downloads.

This avenue directly improves reader-side experience and comprehension, and can be introduced independently of the authoring and publication model.
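
A full explorer would hand the file to DuckDB-in-the-browser for filtering and querying; the simplest first increment is an inline preview of the leading rows. A minimal sketch of that preview step (naive CSV splitting only, no quoted-comma handling; names are illustrative):

```typescript
interface Preview {
  headers: string[];
  rows: string[][];
}

// Parse the first maxRows data rows of a CSV for an inline preview table.
// Deliberately naive: assumes no quoted fields containing commas.
function previewCsv(text: string, maxRows = 10): Preview {
  const lines = text.trim().split(/\r?\n/);
  const headers = (lines[0] ?? "").split(",");
  const rows = lines.slice(1, 1 + maxRows).map((l) => l.split(","));
  return { headers, rows };
}
```

Shipping this preview first gives readers the “immediately interact with data” feel while the heavier DuckDB-backed querying is integrated behind it.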

Appendix: Cross-cutting technical decision — GitHub vs direct publishing

A key architectural choice underlies all avenues: whether projects are primarily created from GitHub repositories or published directly to DataHub.

✅ Current hypothesis: Direct publishing should be the default user experience, with GitHub retained as an optional or secondary integration (including automated or CI-driven publishing into DataHub). This preserves power for advanced users while removing GitHub from the critical path for adoption.

Analysis

Option A: Preserve GitHub-first publishing

Pros:

  • Aligns with existing internal workflows and current DataHub usage.
  • Retains strong versioning, collaboration, and provenance guarantees.
  • Appeals to technically sophisticated users and open-data practitioners.

Cons:

  • High cognitive and operational overhead for non-technical users.
  • Distorts FlowerShow’s UX toward developer tooling rather than publishing.
  • Introduces friction that undermines a Substack-like experience.

Option B: Default to direct publishing on DataHub

Pros:

  • Dramatically simpler user experience; no GitHub required.
  • Better alignment with “Substack for Data” positioning.
  • Allows FlowerShow to focus on publishing semantics rather than repo orchestration.

Cons:

  • Requires re-implementing or approximating some Git-like affordances (history, diffs).
  • Risks alienating users who rely on GitHub-native workflows if not preserved as an option.