7 Research Workflows
7.1 Treat projects as repeatable containers
Every project should feel familiar when you open it. That means using a consistent layout and a small set of conventions.
A useful project structure might separate:
- data-raw/
- data/
- notebooks/
- src/
- results/
- docs/
- references/
The exact names matter less than the distinctions.
When a project is also meant to be pulled by others from GitHub, this consistency becomes even more valuable. The book can teach one lab shape, and each portable environment can reinforce it.
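One way to make the layout repeatable is to create it with a small helper instead of by hand. The sketch below is only an illustration: the function name `scaffold_project` is hypothetical, and the directory names are simply the ones listed above, which you may well rename.

```python
from pathlib import Path

# Directory names taken from the layout above; rename to suit your lab.
PROJECT_DIRS = ["data-raw", "data", "notebooks", "src", "results", "docs", "references"]


def scaffold_project(root):
    """Create the standard directories under root and return their paths."""
    root = Path(root)
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        # exist_ok=True makes the helper safe to re-run on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `scaffold_project("my-project")` on a fresh or existing directory leaves existing files untouched and only adds whatever directories are missing.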
7.2 Write down the operating context
Each project benefits from a short README that answers:
- what question is this project addressing
- where did the data come from
- what are the important scripts or notebooks
- what environment is required
- what outputs matter
- what is incomplete or risky
This small habit makes handoff to a collaborator, and your own re-entry after time away, much easier.
For distributed lab environments, the README should also answer one extra question: what is the fastest path to a first successful run?
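The questions above can be turned into a README skeleton and a simple completeness check. This is a minimal sketch under stated assumptions: it assumes the README is written in Markdown, and the section headings and the function names `readme_skeleton` and `missing_sections` are hypothetical choices, not a standard.

```python
# Hypothetical section headings, one per question in the list above.
README_SECTIONS = [
    ("Question", "What question is this project addressing?"),
    ("Data sources", "Where did the data come from?"),
    ("Key scripts and notebooks", "What are the important scripts or notebooks?"),
    ("Environment", "What environment is required?"),
    ("Outputs", "What outputs matter?"),
    ("Known gaps and risks", "What is incomplete or risky?"),
    ("Quick start", "What is the fastest path to a first successful run?"),
]


def readme_skeleton(project_name):
    """Return a Markdown README skeleton with one section per question."""
    lines = [f"# {project_name}", ""]
    for heading, prompt in README_SECTIONS:
        lines += [f"## {heading}", "", f"TODO: {prompt}", ""]
    return "\n".join(lines)


def missing_sections(readme_text):
    """List the expected headings that do not appear in readme_text."""
    present = {
        line[3:].strip()
        for line in readme_text.splitlines()
        if line.startswith("## ")
    }
    return [heading for heading, _ in README_SECTIONS if heading not in present]
```

A check like `missing_sections` only confirms that a heading exists, not that the answer under it is useful, so it complements review and does not replace it.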
7.3 Separate exploration from production
Exploration is where you try ideas quickly. Production is where you stabilize what worked. Problems start when these blur together.
Healthy projects often have:
- exploratory notebooks or scratch scripts
- cleaned scripts or pipelines
- explicit output directories
- versioned reports or manuscripts
Let notebooks be exploratory, but promote durable logic into scripts, packages, or documented workflows.
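As an illustration of promoting notebook logic, the sketch below shows what a filtering step that started life in a notebook cell might look like as a small script in `src/`. Everything specific here is an assumption made for the example: the CSV input, the `filter_rows` logic, the `filtered.csv` output name, and the command-line flags.

```python
import argparse
import csv
from pathlib import Path


def filter_rows(rows, column, minimum):
    """Keep rows whose numeric value in column is at least minimum."""
    return [row for row in rows if float(row[column]) >= minimum]


def run(input_path, output_dir, column, minimum):
    """Read a CSV, filter it, and write the result to an explicit output directory."""
    with open(input_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    kept = filter_rows(rows, column, minimum)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / "filtered.csv"  # hypothetical output name
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return out_path


def main(argv=None):
    """Parse command-line arguments and run the filter step."""
    parser = argparse.ArgumentParser(description="Filter a CSV by a numeric column.")
    parser.add_argument("input_path")
    parser.add_argument("--output-dir", default="results")
    parser.add_argument("--column", required=True)
    parser.add_argument("--minimum", type=float, default=0.0)
    args = parser.parse_args(argv)
    return run(args.input_path, args.output_dir, args.column, args.minimum)
```

In a real script you would add the usual `if __name__ == "__main__": main()` guard. The design point is that inputs, parameters, and the output location are all explicit arguments, so the same logic can be called from a notebook, a pipeline, or the command line.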
7.4 Record provenance
For every important result, you should be able to answer:
- which data produced it
- which code produced it
- which parameters mattered
- when it was generated
- where the result was saved
Perfect provenance may be unrealistic, but partial provenance is much better than none.
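Partial provenance can be as simple as a small JSON file saved next to each result. The sketch below records the five answers listed above. The sidecar naming convention, the field names, and the function `write_provenance` are assumptions made for illustration, and the code version is supplied by the caller (for example, a commit hash) rather than detected automatically.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(result_path, data_paths, code_version, parameters):
    """Write <result>.provenance.json next to the result and return its path."""
    result_path = Path(result_path)
    record = {
        "result": str(result_path),                               # where it was saved
        "generated_at": datetime.now(timezone.utc).isoformat(),   # when
        "data": {str(p): sha256_of(p) for p in data_paths},       # which data
        "code_version": code_version,                             # which code
        "parameters": parameters,                                 # which parameters
    }
    sidecar = result_path.with_name(result_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

Hashing the input files means a later reader can tell whether the data on disk is still the data that produced the result, even if the file name has not changed.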
7.5 Design a proof of concept, not a monument
The first portable lab for a domain such as computational geography should be intentionally small. It should prove that:
- the environment installs or starts cleanly
- the open data can be accessed or fetched
- the analysis completes
- the outputs are easy to compare against the expected results
Once that proof exists, the lab can grow safely.
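The last check in the list above, comparing outputs, can be sketched as a small report function. This is one possible approach, not the only one: it assumes the expected outputs are recorded as SHA-256 digests keyed by file name, and the name `compare_outputs` is hypothetical.

```python
import hashlib
from pathlib import Path


def compare_outputs(results_dir, expected):
    """Compare files in results_dir against expected SHA-256 digests.

    expected maps a relative file name to its hex digest. Returns a dict
    mapping each name to "ok", "missing", or "changed".
    """
    results_dir = Path(results_dir)
    report = {}
    for name, digest in expected.items():
        path = results_dir / name
        if not path.is_file():
            report[name] = "missing"
        elif hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            report[name] = "changed"
        else:
            report[name] = "ok"
    return report
```

Byte-exact comparison is strict: outputs that contain timestamps or floating-point values that vary between platforms will show as "changed" even when the analysis is sound, so a real lab may prefer a tolerance-based comparison for those files.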