10 Backup and Retention

10.1 Backup is a design requirement

If losing your machine would materially damage your work, backup strategy belongs near the beginning of your setup process, not the end.

10.2 The three copies principle

A durable baseline is often summarized as:

  • one working copy
  • one local or nearline backup
  • one offsite copy

The exact implementation varies, but the principle is simple: a single copy is not a backup, and copies that share one device or one location can be lost together.
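One way to check that the three copies really hold the same content is to compare file checksums right after a backup run. The following is a minimal sketch, not a prescribed tool: the function names manifest and copies_agree are hypothetical, and it assumes all three copies are reachable as ordinary directories and small enough to read file by file.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def copies_agree(working: Path, *backups: Path) -> bool:
    """Return True if every backup holds exactly the working copy's files."""
    reference = manifest(working)
    return all(manifest(backup) == reference for backup in backups)
```

A disagreement is expected whenever the working copy has changed since the last backup, so a check like this is most informative immediately after a backup completes.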

10.3 Sync is not the same as backup

Cloud sync tools are valuable, but they are not sufficient on their own. Sync propagates every change quickly, including bad ones such as accidental deletions, corruption, or unwanted edits.

Good research labs distinguish between:

  • synchronization for convenience
  • backup for recovery
  • archival retention for historical preservation
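The difference between synchronization and backup can be made concrete with a point-in-time snapshot: once taken, it does not follow later changes to the source, so a deletion in the working copy does not reach it. The sketch below is illustrative only. The function name snapshot and the folder naming scheme are assumptions, and a real setup would more likely rely on a dedicated backup tool with deduplication.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot(source: Path, backup_root: Path) -> Path:
    """Copy source into a new timestamped folder under backup_root.

    Each call creates a separate folder, so earlier snapshots are never
    overwritten by later changes to the source.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_root / f"{source.name}-{stamp}"
    # copytree creates missing parent folders and raises an error if the
    # target already exists, so an existing snapshot cannot be clobbered.
    shutil.copytree(source, target)
    return target
```

Full copies like this use far more storage than incremental backups; the point is only that a snapshot stays fixed while a synced folder keeps mirroring the source.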

10.4 Retention is a policy question

Decide what should be kept and for how long:

  • raw source data
  • cleaned analysis-ready data
  • intermediate files
  • figures and reports
  • manuscripts and submissions
  • lab notebooks and decisions

Retention should reflect scientific value, legal obligations, storage cost, and how difficult the material would be to rebuild.
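Writing the policy down as data makes it easier to review and to apply consistently. The sketch below is one possible shape. The category keys mirror the list above, but every retention period shown is a placeholder assumption, not a recommendation; actual values depend on your field, funder, and institution.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days; None means "keep indefinitely".
# These numbers are placeholders and must be replaced by a real policy.
RETENTION_DAYS = {
    "raw_data": None,
    "clean_data": 3650,
    "intermediate": 90,
    "figures_reports": 3650,
    "manuscripts": None,
    "lab_notebooks": None,
}


def is_expired(category: str, created: date, today: date) -> bool:
    """Return True if an item of this category has outlived its period."""
    period = RETENTION_DAYS[category]
    if period is None:
        return False
    return today - created > timedelta(days=period)
```

Intermediate files get the shortest placeholder period here because they can usually be regenerated from raw data and code, which is the rebuild-difficulty consideration in practice.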

10.5 Test recovery

A backup you have never restored from is partly hypothetical. Periodically test that you can recover:

  • a project directory
  • a note archive
  • a configuration file
  • a historical version of an important output

Recovery drills turn backup from optimism into evidence.
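A recovery drill can be partly automated: restore the backup into a scratch location and compare what comes back against checksums recorded when the backup was made. The sketch below assumes the backup is a zip archive and that a mapping of relative paths to SHA-256 digests was saved at backup time. Both are assumptions for illustration, and the names sha256 and restore_drill are hypothetical.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_drill(archive: Path, expected: dict[str, str]) -> list[str]:
    """Restore archive into a scratch directory and list files that fail.

    expected maps relative file paths to the digest recorded when the
    backup was made. A file fails if it is missing after the restore or
    if its digest differs from the recorded one.
    """
    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(scratch)
        for rel_path, digest in expected.items():
            restored = Path(scratch) / rel_path
            if not restored.is_file() or sha256(restored) != digest:
                failures.append(rel_path)
    return failures
```

An empty result means every listed file was restored intact. Restoring into a scratch directory keeps the drill from overwriting the working copy.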