9 Analysis Environments
9.1 Environments are agreements
An analysis environment is an agreement between your code and the software stack that executes it. Break that agreement and results become harder to reproduce.
9.2 Isolate by project when practical
Project-specific environments reduce dependency conflicts and make work easier to revive later. Depending on your stack, this may mean:
- Python virtual environments
- uv project management
- conda environments
- renv for R
- containerized workflows
The right choice depends on your language mix, platform, collaboration style, and tolerance for complexity.
In a portable-lab model, these choices also shape how easy it is for someone else to pull your environment from GitHub and get to a first result.
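As one concrete option, a project-local Python environment takes only a few commands to stand up. The `.venv` directory and `requirements.txt` file are common conventions, not requirements of the tooling; a minimal sketch:

```shell
# Create an isolated environment inside the project directory
python3 -m venv .venv

# Call the environment's own interpreter and pip directly,
# which works even without shell activation
.venv/bin/pip install --upgrade pip

# Install pinned dependencies if the project declares them
if [ -f requirements.txt ]; then
    .venv/bin/pip install -r requirements.txt
fi
```

Someone pulling the repo from GitHub can then run these same commands and arrive at the same package set.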
9.3 Capture what matters
At minimum, record:
- language versions
- key package versions
- external system dependencies
- environment creation instructions
- secrets handling approach
A working environment that cannot be described is a temporary accident.
For book-linked labs, keep this description close to the repo itself. Readers should not have to search through prose to discover how to start the environment.
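One lightweight way to keep that description current is to regenerate it from the environment itself rather than writing it by hand. The file names here (`ENVIRONMENT.md`, `requirements.lock`) are illustrative choices, not a standard; a sketch for a Python-based project:

```shell
# Record the interpreter version alongside the repo
python3 --version > ENVIRONMENT.md

# Freeze the exact package set the environment currently holds
python3 -m pip freeze > requirements.lock

# A reader can then rebuild with:
#   python3 -m pip install -r requirements.lock
```

External system dependencies and the secrets-handling approach still need a sentence or two of prose, since no freeze command captures them.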
9.4 Notebooks, scripts, and pipelines
Different analysis surfaces support different stages of work:
- notebooks for exploration and explanation
- scripts for repeatable tasks
- pipelines for multi-step production workflows
- packages for shared logic
The mature lab uses all of them, but with clear boundaries.
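Those boundaries can be made visible in the repository layout itself. The directory names below are illustrative, not prescribed; a sketch:

```shell
# One directory per analysis surface, so the boundary is explicit
mkdir -p notebooks     # exploration and explanation
mkdir -p scripts       # repeatable single tasks
mkdir -p pipelines     # multi-step production workflows
mkdir -p src/labtools  # shared logic, importable as a package
```

When logic starts being copied between notebooks, that is usually the signal to move it into the shared package directory.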
9.5 Containers are useful, not mandatory
Containers can improve portability and team consistency, but they are not always the right next step. In this book, they are best understood as a packaging option for a lab you already understand.
Use them when you need:
- deployment parity
- isolation from the host machine
- repeatable execution across systems
- controlled teaching or workshop environments
They are especially helpful when you want a GitHub-hosted lab environment to behave the same way across many machines.
Avoid them when they add complexity without solving a real problem.
9.6 Docker and OrbStack
Docker is the broad ecosystem and packaging model most readers will recognize. It is useful when you want to distribute a known-good runtime, define services, or give contributors one repeatable way to run the lab.
OrbStack is best understood as a practical macOS host layer for container and Linux-machine workflows. It can make local container use feel lighter and cleaner on macOS, especially for people who want Docker-compatible workflows without as much friction.
For this book, the distinction can be framed simply:
- use Docker concepts and files as the portable standard
- mention OrbStack as a strong macOS implementation choice
- keep native language environments as the default starting point
9.7 A minimal research container workflow
For a book-connected proof of concept, a container setup should probably do only a few jobs:
- provide the required runtime
- mount or include a small open dataset
- run a canonical script or notebook
- produce one expected result
If the container story requires orchestration, secrets, multiple services, and heavy data pulls on day one, it is too much for the proof-of-concept phase.
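A container that does only those four jobs can stay very small. The Dockerfile below is a sketch under assumptions: a Python runtime, a `data/` directory holding the small open dataset, and a canonical `analysis.py` script, all hypothetical names standing in for whatever the lab actually uses:

```dockerfile
FROM python:3.12-slim

WORKDIR /lab

# Pinned dependencies first, so this layer caches across code edits
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# The small open dataset and the canonical script
COPY data/ data/
COPY analysis.py .

# Running the container produces the one expected result
CMD ["python", "analysis.py"]
```

A reader then needs exactly two commands, `docker build -t lab .` and `docker run --rm lab`, to get from clone to first result.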
9.8 When not to containerize
Do not make containers the mandatory front door when:
- the workflow is already easy to reproduce with project-local environments
- readers are new to the command line
- the domain tools depend heavily on local graphics or desktop software
- the data is too large or too sensitive to bundle casually
Containers are most useful when they remove onboarding pain, not when they introduce a second curriculum.