2026-02-13: DataHub improvement ideas

Tidy outline of what you’re asking for

1) North-star framing

  • Primary loop: publish more datasets → more traffic → stronger reputation → attract publishers
  • Success metrics you named:
    • # datasets published
    • views per dataset / total views
  • Strategic note to self: Do publisher outreach more directly; “no” is cheap.

2) Core product experiences you want (as a user)

2.1 Ultra-fast publishing modes (“paste-and-go”)

  • Drop a dataset link or API URL → becomes a post
  • Upload a single file (esp. CSV) → instantly published
  • “One markdown page + assets” as the canonical unit (simple authoring surface)
  • Optional lightweight commentary now; richer notes later

2.2 Instant dataset inspection (“the preview is the product”)

  • Make DataHub the fastest place to preview a dataset

  • Desired UX apex:

    • Upload CSV → immediate SQL-browsable interface
    • Ability to query and search within your data

2.3 Querying, search, and (maybe) cleaning

  • “Search my data” (across datasets and/or within a dataset)
  • “Query my data” (SQL interface; possibly saved queries/views)
  • “Clean data live” (you flagged as optional / maybe out-of-scope initially)
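
As a rough illustration of the “search my data” item within a single dataset, here is a minimal sketch that does a case-insensitive substring match across every cell of a CSV. The function name `search_csv` and the sample data are hypothetical, and nothing here reflects how DataHub actually implements search; a real version would likely need an index rather than a full scan.

```python
import csv
import io


def search_csv(csv_text, term):
    """Return (header, matching_rows) for rows where any cell contains term.

    Hypothetical helper for illustration: a plain full scan, case-insensitive.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line is assumed to be the header row
    needle = term.lower()
    matches = [row for row in reader if any(needle in cell.lower() for cell in row)]
    return header, matches


# Made-up sample data, only to show the call shape.
sample_csv = "city,country\nParis,France\nBerlin,Germany\nLyon,France\n"
header, matches = search_csv(sample_csv, "france")
print(matches)  # [['Paris', 'France'], ['Lyon', 'France']]
```

Searching across datasets could reuse the same function per file, though that is only one possible design.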

2.4 Dashboards as first-class publishing objects

  • You already have dashboards, but you want specific flagship ones:

    • State of the World dashboard
    • Wiser Metrics dashboard

3) Platform/architecture decisions you’re flagging (without deciding yet)

  • Core vs Commons content quality bar:

    • Should “core” be curated/high-quality, with bulk publishing elsewhere?
    • Or put large volumes in core and rely on filtering/UX?
  • Direct upload vs GitHub-based publishing:

    • Direct upload might simplify a lot and reduce friction
    • “FlowerShow-style flow” as a mental model: instant publish from minimal inputs
  • API surface:

    • You’re wondering how much to formalize an API early vs later

4) Candidate “one thing to build next” (your clearest single slice)

  • CSV → instant SQL browsing + publishable dataset page

    • Later: add markdown notes/commentary around it
    • This is your “Substack for CSV” primitive
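
To make the CSV → instant SQL browsing slice concrete, here is a minimal sketch using only the Python standard library: it loads CSV text into an in-memory SQLite table and runs a query against it. The helper name `load_csv_into_sqlite`, the table name `data`, and the sample rows are assumptions made for illustration, not a description of DataHub's stack. Every column is stored as TEXT to keep the sketch short.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text, table="data"):
    """Load CSV text into an in-memory SQLite table (all columns as TEXT).

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line is assumed to be the header row
    # Quote column names so headers with spaces or SQL keywords still work.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    # Skip rows whose field count does not match the header.
    good_rows = (row for row in reader if len(row) == len(header))
    conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, placeholders), good_rows)
    conn.commit()
    return conn


# Made-up sample data, only to show the flow end to end.
sample_csv = "country,year,population\nFrance,2020,67\nGermany,2020,83\nFrance,2021,68\n"
conn = load_csv_into_sqlite(sample_csv)
rows = conn.execute(
    "SELECT country, COUNT(*) FROM data GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('France', 2), ('Germany', 1)]
```

An in-memory database keeps the "drop a file, query immediately" feel; whether the real thing runs server-side or in the browser is still an open choice.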

5) Implied feature backlog (re-stated as crisp items)

  • Create post from dataset URL
  • Create post from API URL
  • Direct upload CSV (no GitHub required)
  • Auto-generate dataset page (preview + metadata + schema)
  • Built-in SQL explorer for the uploaded CSV
  • Dataset search and query
  • Dashboards: “State of the World”; “Wiser Metrics”
  • Decide core vs commons publishing posture (or defer via UX separation)
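
For the “auto-generate dataset page (preview + metadata + schema)” item, a small sketch of what the generated summary might contain: row count, a naive per-column type guess, and the first few rows. The names `infer_type` and `describe_csv`, the three type labels, and the sample data are all hypothetical; real type inference would need to handle dates, missing-value markers, and much larger files.

```python
import csv
import io


def infer_type(values):
    """Guess 'integer', 'number', or 'string' from a column's non-empty values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "string"
    # Try the strictest type first, then fall back.
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def describe_csv(csv_text, preview_rows=5):
    """Build a summary dict (row count, schema, preview) for a dataset page."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    schema = [
        {"name": f, "type": infer_type([r[f] or "" for r in rows])} for f in fields
    ]
    return {"row_count": len(rows), "schema": schema, "preview": rows[:preview_rows]}


# Made-up sample data, only to show the output shape.
sample_csv = "country,year,gdp\nFrance,2020,2.6\nGermany,2020,3.8\n"
summary = describe_csv(sample_csv)
print(summary["schema"])
```

The resulting dict could feed a markdown template for the dataset page, which would fit the “one markdown page + assets” idea.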

If you want, I can convert this into a single product brief with: user stories, “minimum lovable” scope for the CSV→SQL slice, and acceptance criteria—but the outline above is already faithful to what you said.

Raw transcript (lightly cleaned, same content)

I’m going to brain-dump features and ideas around DataHub improvements I’d like to see—things we’ve dreamed of for years and haven’t implemented. I’m going to be a bit more unleashed and imagine I’m the user.

I do think that by creating these things we’ll attract others, including other publishers. We should be proactive about reaching out to publishers and showcasing what we’re doing; that matters. But I don’t think there’s a specific outreach program right now. Maybe we should just reach out; people can tell us no. I might be being a bit shy about that, so I want to note it for myself.

The game is: publish more → generate more traffic. Almost like being the “Statista for data.” Use this platform to publish data quickly and build on the traffic and reputation I have. Our overall metric is probably the number of datasets—what we control is number of datasets and then the number of views they’re getting. We might also choose particular verticals (we’ve talked about logistics), but for now I want to outline features that would be cool and that I’d like to see, and try to think of them ourselves.

One obvious feature: dashboards (we’ve implemented), but I specifically want a “state of the world” dashboard and a “wiser metrics” dashboard.

I want quick posting of datasets: make it very quick for me to drop a link to a dataset or an API and turn it into a post. There’s also a question about how we create those, but to start we should just try and get this stuff up. I don’t know whether we put it in core or commons. I could get hung up on that—whether core needs to be relatively high quality or whether we just put loads of stuff in core.

I’m probably confusing a bunch of different things here; maybe I should just log them and outline later.

I like the idea of things being simpler: one markdown page, but with a load of assets.

I want to be able to publish a single data file: just shove it up, maybe add some commentary, but basically paste-and-go. Even posting a single data point could be interesting; I’m not sure what that would look like, but exploring these experiences quickly would be interesting—and most don’t require much infrastructure.

I wonder if we want to think about the API.

Other stuff we’ve talked about: I want to search my data; I want to query it. Even being able to clean up my data live would be cool. Not sure if that belongs inside DataHub, but making this the quickest way for me to preview a dataset would be very attractive.

I also wonder about direct upload rather than going via GitHub; a FlowerShow-style flow would be a good idea at this point and might simplify a lot.

If I home in on one particular case: dropping a CSV and immediately getting an SQL-browsable interface to that dataset would be so useful and cool. It’s what we’re always trying to do with FlowerShow (just publish a markdown file), but for a CSV. Later I can add notes and commentary around it if I want. I need to think more about what that experience would be like.

So: can you take that and create a raw transcript and then tidy up what I’m saying about the cool features I want in DataHub / “Substack for data” right now, for myself?


Raw

Okay, so I'm going to talk through here, just trying to like brain dump features and ideas around DataHub improvements that I would like to see, that we've dreamed of often for years and not been able to implement. And here I'm going to be a bit more unleashed and just imagine I am the user. I'm not, I do think that by creating these things, we will actually attract others and attract other publishers. And I think we do want to be bright throughout reaching out to those and about showcasing what we're doing. That is really important. But I'm not, I don't think at the moment there's a specific outreach program for publishers. Though, you know, maybe we should be just reaching, what's the problem? We can reach out, people can tell us no. You know, that's not, not an issue. So I guess it would, it would, you know, I may be being a bit shy about that, so I should just make a note about that for myself. So I think the kind of game here is by like, we're just trying to publish more, which will generate more traffic. And this is almost us being the people behind Statista for data. We're using this platform as a way to publish data quickly and build on the traffic reputation I have. And I guess our overall metric, I should say, is probably just the number of data sets. Like what we control is the number of data sets and then the number of views they are getting. And I think we do want to think of particular, you know, we might just want to publish in general, but we might want to think of particular verticals that are of interest to us. You know, we've talked about logistics. I think we might talk about this. Anyway, for now, I'm going to try and just outline some of the features that I think would be cool, that I would like to see. And I do think we want to try and think of them ourselves. 
So obviously one is… Dashboards, which we already have implemented, but I particularly have got like a state of the world dashboard that I would like to do, and a kind of wiser metrics dashboard I'd like to do. I think there's just going in quick posting datasets, like I'd like to make it very quick for me to drop a link to a dataset or an API and turn it into a post. I think there's also a question about how we create those, but I think to start with, we should just try and get this stuff up. I don't know whether we put it in core or commons. That's now something I could get hung up on, but no, I don't know whether the core needs to be relatively high quality or whether we just put loads of stuff in core. I think, I think the other question I have for myself, so I think there are things about, I like that, I also think there's a, maybe I'm getting kind of confusing a whole bunch of different things here, and maybe I'll just log them and I can outline them later. The idea, almost a point of having things simpler, that there's just one markdown page, but I have a load of assets. I think that seems interesting. I think the idea of just being able to publish a single data file, like just shove it up and maybe I add some commentary on it, but I want to just really paste and go, um, would be really helpful. Sometimes posting even a single data point would be kind of interesting. I don't know what that would look like, but I think creating, thinking about these experiences and playing with them quite quickly would be interesting. Um, and most of them don't require much infrastructure. I wonder if we want to think about the API. I think there's other stuff that we've talked about a lot, which is I want to be able to have, like, search my data. I want to be able to query. I think even be able to clear up my data live, it would be kind of cool. 
And I don't know if that's even Maybe I should just do that outside of it, but I, um, I wonder if just making this the quickest way for me to even preview a dataset would make it very attractive. Um, I don't know, I again wonder if I want also direct upload rather than going via GitHub, just as we're going to choose some flower show would be a good idea at this point. Um, it might simplify a lot of things. OK, if I were to hone down on one particular case at the moment, it would be something like dropping a CSV, immediately getting uh like an SQL browsable interface to that data set, would be so, would be so useful and cool. I'll think a bit more, but that's, I think we're always trying to do what we're doing for flower show, which is like just publish a markdown as published for a CSV. And then later I can add notes and commentary around it if I want. That would be kind of cool. I need to think more about what that experience would be like. So, OK, can you just take that and go and create a raw transcript and then try and tidy up what I'm saying about. I'm just going to think you through the cool features I want in DataHub slash Substack for data at the moment for myself.