7 Research Workflows

7.1 Treat projects as repeatable containers

Every project should feel familiar when you open it. That means using a consistent layout and a small set of conventions.

A useful project structure might separate:

  • data-raw/
  • data/
  • notebooks/
  • src/
  • results/
  • docs/
  • references/

The exact names matter less than the distinctions.
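
One way to keep the layout repeatable is to create it with a short script instead of by hand. The sketch below assumes the directory names from the list above. `PROJECT_DIRS` and `scaffold_project` are illustrative names, not part of any existing tool.

```python
from pathlib import Path

# Directory names taken from the list above. Rename them freely,
# as long as the distinctions between them survive.
PROJECT_DIRS = [
    "data-raw",
    "data",
    "notebooks",
    "src",
    "results",
    "docs",
    "references",
]


def scaffold_project(root):
    """Create any missing project directories under `root`.

    Returns the names of the directories that were newly created, so
    running it on an existing project is safe and reports nothing new.
    """
    root = Path(root)
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        if not path.exists():
            path.mkdir(parents=True)
            created.append(name)
    return created
```

Calling `scaffold_project("my-project")` a second time returns an empty list, which makes the script safe to rerun on a project that already exists.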

When others are meant to clone the project from GitHub, this consistency matters even more. The book can teach one lab layout, and each portable environment that follows the same layout reinforces it.

7.2 Write down the operating context

Each project benefits from a short README that answers:

  • what question is this project addressing
  • where did the data come from
  • what are the important scripts or notebooks
  • what environment is required
  • what outputs matter
  • what is incomplete or risky

This small habit makes it much easier to hand a project to a collaborator, and to pick it up again yourself after time away.

For distributed lab environments, the README should also answer one extra question: what is the fastest path to a first successful run?
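
A README convention is easier to keep when it can be checked. The sketch below assumes the README uses Markdown-style `#` headings and that each question above maps to one section. The section titles in `REQUIRED_SECTIONS` are hypothetical and should be adapted to your own wording.

```python
# Hypothetical section titles, one per question in the text above,
# plus a "Quick start" section for the fastest path to a first run.
REQUIRED_SECTIONS = [
    "Question",
    "Data sources",
    "Key scripts and notebooks",
    "Environment",
    "Outputs",
    "Known gaps and risks",
    "Quick start",
]


def missing_sections(readme_text):
    """Return the required section titles that have no heading in the README.

    A heading is any line starting with '#'. Matching ignores case and
    the heading level.
    """
    headings = {
        line.lstrip("#").strip().lower()
        for line in readme_text.splitlines()
        if line.startswith("#")
    }
    return [title for title in REQUIRED_SECTIONS if title.lower() not in headings]
```

An empty return value means every section is present. It says nothing about whether the sections are well written, which still needs a human reader.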

7.3 Separate exploration from production

Exploration is where you try ideas quickly. Production is where you stabilize what worked. Problems start when these blur together, for example when a reported figure depends on notebook cells that were run out of order.

Healthy projects often have:

  • exploratory notebooks or scratch scripts
  • cleaned scripts or pipelines
  • explicit output directories
  • versioned reports or manuscripts

Let notebooks be exploratory, but promote durable logic into scripts, packages, or documented workflows.
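
As an illustration of what promotion can look like, the sketch below shows a calculation that might start life in a notebook cell and then move into a module under src/. The function names, the grouping task, and the CSV layout are all assumptions made for the example. The point is that inputs and outputs become explicit arguments rather than notebook state.

```python
import csv
from pathlib import Path


def mean_by_group(rows, group_key, value_key):
    """Average `value_key` within each `group_key`.

    `rows` is any iterable of dicts, so the function can be tested
    without touching the file system.
    """
    totals = {}
    for row in rows:
        total, count = totals.get(row[group_key], (0.0, 0))
        totals[row[group_key]] = (total + float(row[value_key]), count + 1)
    return {group: total / count for group, (total, count) in totals.items()}


def run(input_csv, output_csv, group_key, value_key):
    """Read a CSV, summarize it, and write the result to an explicit path."""
    with open(input_csv, newline="") as f:
        result = mean_by_group(csv.DictReader(f), group_key, value_key)

    output_csv = Path(output_csv)
    output_csv.parent.mkdir(parents=True, exist_ok=True)
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([group_key, f"mean_{value_key}"])
        for group in sorted(result):  # sorted so reruns give identical files
            writer.writerow([group, result[group]])
    return output_csv
```

The notebook can still import and call `mean_by_group` while exploring, so nothing is lost by moving the logic out.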

7.4 Record provenance

For every important result, you should be able to answer:

  • which data produced it
  • which code produced it
  • which parameters mattered
  • when it was generated
  • where the result was saved

Perfect provenance may be unrealistic, but partial provenance is much better than none.
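
A lightweight way to capture partial provenance is to write a small record next to each result. The sketch below assumes the project lives in a git repository and that parameters are JSON-serializable. The function names and the `.provenance.json` suffix are illustrative choices, not an established standard.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_commit():
    """Return the current git commit hash, or None if git is unavailable
    or the working directory is not inside a repository with commits."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def write_provenance(result_path, data_paths, params):
    """Write a JSON sidecar file describing how `result_path` was produced."""
    result_path = Path(result_path)
    record = {
        "result": str(result_path),                               # where it was saved
        "data": {str(p): sha256_of(p) for p in data_paths},       # which data
        "code_commit": current_commit(),                          # which code
        "parameters": params,                                     # which parameters
        "generated_at": datetime.now(timezone.utc).isoformat(),   # when
    }
    sidecar = result_path.with_name(result_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

The commit hash only identifies the code if the working tree had no uncommitted changes when the result was generated. This sketch does not check for that.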

7.5 Design a proof of concept, not a monument

The first portable lab for a domain such as computational geography should be intentionally small. It should prove that:

  • the environment installs or starts cleanly
  • the open data can be accessed or fetched
  • the analysis completes
  • the outputs are easy to compare against the expected results

Once that proof exists, the lab can grow safely.
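
The four checks above can be wired into a single smoke test. The sketch below is one possible shape, with hypothetical names throughout. Each step is a plain function that raises an exception on failure, and outputs are assumed to be named numeric values that can be compared within a tolerance.

```python
import math


def outputs_match(actual, expected, rel_tol=1e-6):
    """Compare two dicts of named numeric outputs within a relative tolerance."""
    if actual.keys() != expected.keys():
        return False
    return all(
        math.isclose(actual[key], expected[key], rel_tol=rel_tol)
        for key in expected
    )


def run_smoke_test(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns a list of (name, ok, detail) tuples. Later steps are skipped
    after a failure because they usually depend on the earlier ones.
    """
    report = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            report.append((name, False, f"{type(exc).__name__}: {exc}"))
            break
        report.append((name, True, ""))
    return report
```

A lab would pass its own functions in the order given above, for example `[("environment", check_environment), ("data", fetch_data), ("analysis", run_analysis), ("outputs", compare_outputs)]`, where each of those four functions is something the lab author writes.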