7 Research Workflows

7.1 Treat projects as repeatable containers

Every project should feel familiar when you open it. That means using a consistent layout and a small set of conventions.

A useful project structure might separate:

data-raw/
data/
notebooks/
src/
results/
docs/
references/

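The exact names matter less than the distinctions: raw inputs versus processed data, exploratory notebooks versus reusable code, and generated results versus written documentation.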
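One way to keep the layout consistent is to create it with a small script rather than by hand. The sketch below is a minimal illustration, not a prescribed tool: the function name scaffold_project is hypothetical, and the directory list simply mirrors the names above.

```python
from pathlib import Path

# Directory names copied from the layout above; rename them to suit your lab.
PROJECT_DIRS = ["data-raw", "data", "notebooks", "src", "results", "docs", "references"]


def scaffold_project(root):
    """Create the standard subdirectories under root and return their paths."""
    root = Path(root)
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        # exist_ok=True makes the script safe to re-run on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling scaffold_project("my-project") (a placeholder name) creates any directories that are missing and leaves existing ones untouched, so the same script can also be used to bring an older project into line.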

When others will also pull the project from GitHub, this consistency becomes even more valuable. The book can teach one lab layout, and each portable environment can reinforce it.

7.2 Write down the operating context

Each project benefits from a short README that answers:

what question is this project addressing?
where did the data come from?
what are the important scripts or notebooks?
what environment is required?
what outputs matter?
what is incomplete or risky?

This small habit makes it much easier to hand the project to someone else, or to return to it yourself after months away.

For distributed lab environments, the README should also answer one extra question: what is the fastest path to a first successful run?
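The questions above can be turned into a README skeleton and a simple completeness check. This is a sketch under stated assumptions: the section headings in REQUIRED_SECTIONS are hypothetical names chosen here, one per question, and the check only looks for a line matching each heading.

```python
# Hypothetical heading names, one per question the README should answer.
REQUIRED_SECTIONS = [
    "Question",
    "Data sources",
    "Key scripts and notebooks",
    "Environment",
    "Outputs",
    "Known gaps and risks",
    "Quick start",
]


def readme_skeleton():
    """Return a plain-text README skeleton with one placeholder per heading."""
    return "\n\n".join(f"{s}\n(to be written)" for s in REQUIRED_SECTIONS) + "\n"


def missing_sections(readme_text):
    """Return the required headings that do not appear as a line of their own."""
    # Leading '#' characters are ignored so markdown-style headings also match.
    present = {
        line.strip().lstrip("#").strip().lower()
        for line in readme_text.splitlines()
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in present]
```

The check does not judge whether a section is well written. It only flags a README that skips a question entirely, which is usually the more common failure.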

7.3 Separate exploration from production

Exploration is where you try ideas quickly. Production is where you stabilize what worked. Problems start when these blur together.

Healthy projects often have:

exploratory notebooks or scratch scripts
cleaned scripts or pipelines
explicit output directories
versioned reports or manuscripts

Let notebooks be exploratory, but promote durable logic into scripts, packages, or documented workflows.
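As an illustration of promoting notebook logic, the sketch below shows what a cell that computes summary statistics might look like once it has been moved into a module under src/. The function names, the single-column CSV input, and the summary.csv file name are all assumptions made for this example.

```python
import csv
import statistics
from pathlib import Path


def summarize(values):
    """Pure function: easy to import from a notebook and easy to test."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
    }


def run(input_csv, column, output_dir):
    """Read one numeric column, summarize it, and write the summary to output_dir."""
    with open(input_csv, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]

    # The output location is an explicit argument, not a path buried in a cell.
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / "summary.csv"

    summary = summarize(values)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(summary))
        writer.writeheader()
        writer.writerow(summary)
    return out_path
```

The notebook can still import summarize for quick experiments, while run gives the stabilized version explicit inputs and an explicit output directory.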

7.4 Record provenance

For every important result, you should be able to answer:

which data produced it?
which code produced it?
which parameters mattered?
when was it generated?
where was the result saved?

Perfect provenance may be unrealistic, but partial provenance is much better than none.
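A lightweight way to capture partial provenance is to write a small JSON file next to each important result. The sketch below assumes one data file and one code file per result. The function names and the ".provenance.json" suffix are hypothetical choices for this example, not an established convention.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(result_path, data_path, code_path, parameters):
    """Write <result>.provenance.json beside the result and return its path."""
    result_path = Path(result_path)
    record = {
        "result": str(result_path),                      # where it was saved
        "data": {"path": str(data_path), "sha256": sha256_of(data_path)},
        "code": {"path": str(code_path), "sha256": sha256_of(code_path)},
        "parameters": parameters,                        # must be JSON-serializable
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = result_path.with_name(result_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

Hashing the code file is a stand-in here. If the project is under version control, recording the commit identifier is likely a stronger answer to "which code produced it".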

7.5 Design a proof of concept, not a monument

The first portable lab for a domain such as computational geography should be intentionally small. It should prove that:

the environment installs or starts cleanly
the open data can be accessed or fetched
the analysis completes
the outputs are easy to compare against the expected results

Once that proof exists, the lab can grow safely.
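The four claims above map naturally onto a short smoke test. The sketch below is one possible shape, assuming each claim can be expressed as a function that raises an exception on failure. All function names are hypothetical, and the byte-for-byte output comparison is a simplification that will not suit outputs containing timestamps or floating-point noise.

```python
import importlib
from pathlib import Path


def check_imports(module_names):
    """Environment check: every required module can be imported."""
    for name in module_names:
        importlib.import_module(name)


def check_file_exists(path):
    """Data check: the fetched or bundled data file is present."""
    if not Path(path).is_file():
        raise FileNotFoundError(str(path))


def check_outputs_match(actual, expected):
    """Output check: the produced file is byte-identical to a reference copy."""
    if Path(actual).read_bytes() != Path(expected).read_bytes():
        raise AssertionError(f"{actual} differs from {expected}")


def run_smoke_checks(checks):
    """Run (name, callable) pairs in order; return (name, ok, detail) tuples."""
    report = []
    for name, check in checks:
        try:
            check()
            report.append((name, True, ""))
        except Exception as exc:  # report the failure and keep going
            report.append((name, False, f"{type(exc).__name__}: {exc}"))
    return report
```

The analysis step itself can be passed in as one more callable between the data check and the output check, so a newcomer gets a single report that says which of the four claims failed.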