9 Analysis Environments
9.1 Environments are agreements
An analysis environment is an agreement between your code and the software stack that executes it. Break that agreement and results become harder to reproduce.
9.2 Isolate by project when practical
Project-specific environments reduce dependency conflicts and make work easier to revive later. Depending on your stack, this may mean:
- Python virtual environments
- uv project management
- conda environments
- renv for R
- containerized workflows
The right choice depends on your language mix, platform, collaboration style, and tolerance for complexity.
In a portable-lab model, these choices also shape how easy it is for someone else to pull your environment from GitHub and get to a first result.
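As one concrete option, a project-local Python environment takes only a few commands to stand up. The `.venv` directory and `requirements.txt` file are common conventions, not requirements of the tooling; a minimal sketch:

```shell
# Create an isolated environment inside the project directory
python3 -m venv .venv

# Call the environment's own interpreter and pip directly,
# which works even without shell activation
.venv/bin/pip install --upgrade pip

# Install pinned dependencies if the project declares them
if [ -f requirements.txt ]; then
    .venv/bin/pip install -r requirements.txt
fi
```

Someone pulling the repo from GitHub can then run these same commands and arrive at the same package set.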
9.3 Capture what matters
At minimum, record:
- language versions
- key package versions
- external system dependencies
- environment creation instructions
- secrets handling approach
A working environment that cannot be described is a temporary accident.
For book-linked labs, keep this description close to the repo itself. Readers should not have to search through prose to discover how to start the environment.
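One lightweight way to keep that description current is to regenerate it from the environment itself rather than writing it by hand. The file names here (`ENVIRONMENT.md`, `requirements.lock`) are illustrative choices, not a standard; a sketch for a Python-based project:

```shell
# Record the interpreter version alongside the repo
python3 --version > ENVIRONMENT.md

# Freeze the exact package set the environment currently holds
python3 -m pip freeze > requirements.lock

# A reader can then rebuild with:
#   python3 -m pip install -r requirements.lock
```

External system dependencies and the secrets-handling approach still need a sentence or two of prose, since no freeze command captures them.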
9.4 Notebooks, scripts, and pipelines
Different analysis surfaces support different stages of work:
- notebooks for exploration and explanation
- scripts for repeatable tasks
- pipelines for multi-step production workflows
- packages for shared logic
The mature lab uses all of them, but with clear boundaries.
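Those boundaries can be made visible in the repository layout itself. The directory names below are illustrative, not prescribed; a sketch:

```shell
# One directory per analysis surface, so the boundary is explicit
mkdir -p notebooks     # exploration and explanation
mkdir -p scripts       # repeatable single tasks
mkdir -p pipelines     # multi-step production workflows
mkdir -p src/labtools  # shared logic, importable as a package
```

When logic starts being copied between notebooks, that is usually the signal to move it into the shared package directory.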
9.5 Containers are useful, not mandatory
Containers can improve portability and team consistency, but they are not always the right next step. In this book, they are best understood as a packaging option for a lab you already understand.
Use them when you need:
- deployment parity
- isolation from the host machine
- repeatable execution across systems
- controlled teaching or workshop environments
They are especially helpful when you want a GitHub-hosted lab environment to behave the same way across many machines.
Avoid them when they add complexity without solving a real problem.
9.6 Docker and OrbStack
Docker is the broad ecosystem and packaging model most readers will recognize. It is useful when you want to distribute a known-good runtime, define services, or give contributors one repeatable way to run the lab.
OrbStack is best understood as a practical macOS host layer for container and Linux-machine workflows. It can make local container use feel lighter and cleaner on macOS, especially for people who want Docker-compatible workflows without as much friction.
For this book, the distinction can be framed simply:
- use Docker concepts and files as the portable standard
- mention OrbStack as a strong macOS implementation choice
- keep native language environments as the default starting point
9.7 A minimal research container workflow
For a book-connected proof of concept, a container setup should probably do only a few jobs:
- provide the required runtime
- mount or include a small open dataset
- run a canonical script or notebook
- produce one expected result
If the container story requires orchestration, secrets, multiple services, and heavy data pulls on day one, it is too much for the proof-of-concept phase.
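A container that does only those four jobs can stay very small. The Dockerfile below is a sketch under assumptions: a Python runtime, a `data/` directory holding the small open dataset, and a canonical `analysis.py` script, all hypothetical names standing in for whatever the lab actually uses:

```dockerfile
FROM python:3.12-slim

WORKDIR /lab

# Pinned dependencies first, so this layer caches across code edits
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# The small open dataset and the canonical script
COPY data/ data/
COPY analysis.py .

# Running the container produces the one expected result
CMD ["python", "analysis.py"]
```

A reader then needs exactly two commands, `docker build -t lab .` and `docker run --rm lab`, to get from clone to first result.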
9.8 When not to containerize
Do not make containers the mandatory front door when:
- the workflow is already easy to reproduce with project-local environments
- readers are new to the command line
- the domain tools depend heavily on local graphics or desktop software
- the data is too large or too sensitive to bundle casually
Containers are most useful when they remove onboarding pain, not when they introduce a second curriculum.