Differences between revisions 2 and 14 (spanning 12 versions)
Revision 2 as of 2010-09-22 07:18:01
Size: 1481
Editor: root
Comment:
Revision 14 as of 2013-04-18 06:47:18
Size: 1868
Editor: IanRees
Comment:
Line 1: Line 1:
= Backup: Long Answer =
An EMEN2 database environment contains three types of files: database files, log files, and region files.
= EMEN2 Backups =
Line 4: Line 3:
Database files contain key/value pairs that comprise all the records in the database, as well as a number of database files used for indexes. Database files are contained in $DB_HOME/data and subdirectories. Log files contain data from all committed transactions, and are stored in $DB_HOME/log as log.XX, where XX are consecutive integers starting from 1.
An EMEN2 environment contains a number of things:
Line 6: Line 5:
To provide guarantees about transaction atomicity and durability, changes are first written to log files on stable storage before a transaction is marked as committed. The database files are not updated until this has been completed. In the event of a crash or hardware failure, the database files can be checked against the log files to correct any errors or missing data.
BerkeleyDB files:
 * _db.* (environment backing files)
 * data/ (databases)
 * journal/ (transaction journal)
Line 8: Line 10:
Because a cold backup copies the database files, the database must be stopped so they are not changed while the backup is in progress. Once a cold backup is made, it can be updated with a hot backup. A hot backup only copies new log files, which are append-only, and does not require the database files to be stable during the backup.
EMEN2-managed file attachments:
 * binary/ (file storage)
 * preview/ (thumbnails and other derived data)
 * tmp/ (temporary files)
Line 10: Line 15:
Configuration and application logs:
 * DB_CONFIG
 * config.json
 * log/ (EMEN2 application logs)
 * ssl/ (SSL certificates)
Line 11: Line 21:
== backup.py ==
= Cold Backups =
Line 13: Line 23:
This page is currently being rewritten to avoid displaying incorrect or out-of-date information. EMEN2 provides several mechanisms for backing up metadata and raw data -- please email me if you would like specifics while I rewrite this page.
The most "foolproof" way to back up EMEN2 is to stop all EMEN2 processes and archive the entire EMEN2DBHOME directory. Once the processes are stopped, everything can be backed up as normal files without any special consideration.

= Hot Backups =

If the database is currently open, the BerkeleyDB files cannot simply be copied, because they are likely to change during the operation. The mechanism I recommend for creating incremental backups is to first create a cold backup, then checkpoint the environment and copy updated BerkeleyDB transaction log files using:

{{{
emen2ctl archive -h <EMEN2DBHOME>
}}}

This command copies the journal/log.* files to the directory specified in the configuration (by default, EMEN2DBHOME/journal_archive). These files can then be copied to the journal directory of the cold backup and replayed using the BerkeleyDB recover command, "db_recover -c -h <backup directory>". Please email me if you have any questions or concerns about this operation.

= Non-BerkeleyDB Files =

Files that are not part of the BerkeleyDB environment (binary, preview, config, etc.) can be copied at any time using normal backup procedures; rsync is probably the most appropriate tool.

If you have changed your configuration to use directories outside of EMEN2DBHOME (most commonly, to place binary storage on a different disk), make sure you back these up as well! Again, rsync is fine.

EMEN2 Backups

An EMEN2 environment contains a number of things:

BerkeleyDB files:

  • _db.* (environment backing files)
  • data/ (databases)
  • journal/ (transaction journal)

EMEN2-managed file attachments:

  • binary/ (file storage)
  • preview/ (thumbnails and other derived data)
  • tmp/ (temporary files)

Configuration and application logs:

  • DB_CONFIG
  • config.json
  • log/ (EMEN2 application logs)
  • ssl/ (SSL certificates)

Cold Backups

The most "foolproof" way to back up EMEN2 is to stop all EMEN2 processes and archive the entire EMEN2DBHOME directory. Once the processes are stopped, everything can be backed up as normal files without any special consideration.
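
As a rough illustration of the archiving step, the sketch below wraps tar in a small helper. The function name cold_backup, its two arguments, and the archive naming scheme are assumptions made for this example, not part of EMEN2. It also assumes that all EMEN2 processes have already been stopped by whatever means your deployment uses, and that the destination lies outside EMEN2DBHOME.

```shell
#!/bin/sh
# Sketch only: archive a stopped EMEN2 environment into a dated tarball.
# cold_backup, its arguments, and the archive name are hypothetical.
# Stop all EMEN2 processes before calling it, and keep the destination
# directory outside EMEN2DBHOME so the archive does not include itself.
cold_backup() {
    db_home=$1      # the EMEN2DBHOME directory
    dest_dir=$2     # directory that will receive the archive
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/emen2-cold-$stamp.tar.gz"

    [ -d "$db_home" ] || { echo "no such directory: $db_home" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1

    # Archive relative to the parent so the tarball unpacks into one directory.
    tar -czf "$archive" -C "$(dirname "$db_home")" "$(basename "$db_home")" || return 1

    # Print the archive path so callers can record or verify it.
    echo "$archive"
}
```

A call such as cold_backup /path/to/EMEN2DBHOME /path/to/backups prints the path of the new archive; both paths here are placeholders.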

Hot Backups

If the database is currently open, the BerkeleyDB files cannot simply be copied, because they are likely to change during the operation. The mechanism I recommend for creating incremental backups is to first create a cold backup, then checkpoint the environment and copy updated BerkeleyDB transaction log files using:

emen2ctl archive -h <EMEN2DBHOME>

This command copies the journal/log.* files to the directory specified in the configuration (by default, EMEN2DBHOME/journal_archive). These files can then be copied to the journal directory of the cold backup and replayed using the BerkeleyDB recover command, "db_recover -c -h <backup directory>". Please email me if you have any questions or concerns about this operation.
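
The sequence described above (archive, copy, replay) could be scripted roughly as follows. The two external commands are the ones quoted in the text; the wrapper function update_backup, its arguments, and the assumption that the cold backup was unpacked as a plain directory are illustrative only.

```shell
#!/bin/sh
# Sketch only: roll a cold backup forward with archived transaction logs.
# update_backup and its arguments are hypothetical; the emen2ctl and
# db_recover invocations are the ones quoted in the text above.
update_backup() {
    db_home=$1       # live EMEN2DBHOME
    backup_home=$2   # directory holding the unpacked cold backup
    archive_dir=${3:-$db_home/journal_archive}   # default location per the text

    # Checkpoint the live environment and copy out its log files.
    emen2ctl archive -h "$db_home" || return 1

    # Add the archived logs to the backup's journal directory.
    mkdir -p "$backup_home/journal" || return 1
    for f in "$archive_dir"/log.*; do
        [ -e "$f" ] || continue        # the glob matched nothing
        cp -p "$f" "$backup_home/journal/" || return 1
    done

    # Replay the copied logs against the backup copy.
    db_recover -c -h "$backup_home"
}
```

The optional third argument is there for installations whose configuration points the archive somewhere other than the default journal_archive directory.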

Non-BerkeleyDB Files

Files that are not part of the BerkeleyDB environment (binary, preview, config, etc.) can be copied at any time using normal backup procedures; rsync is probably the most appropriate tool.

If you have changed your configuration to use directories outside of EMEN2DBHOME (most commonly, to place binary storage on a different disk), make sure you back these up as well! Again, rsync is fine.
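
One possible way to script the rsync copy is sketched below. The function name sync_plain_files and the choice of items are assumptions for this example: the list is taken from the directory layout given earlier on this page, with tmp/ left out on the guess that temporary files do not need to be kept. Any directories you have relocated outside EMEN2DBHOME would need their own rsync call.

```shell
#!/bin/sh
# Sketch only: copy the non-BerkeleyDB parts of an environment with rsync.
# sync_plain_files and the item list are hypothetical; adjust the list to
# match your own layout and configuration.
sync_plain_files() {
    db_home=$1   # the EMEN2DBHOME directory
    dest=$2      # backup destination directory
    mkdir -p "$dest" || return 1
    for item in binary preview log ssl DB_CONFIG config.json; do
        [ -e "$db_home/$item" ] || continue   # skip anything not present
        # No trailing slash on the source: rsync recreates the item under $dest.
        rsync -a "$db_home/$item" "$dest/" || return 1
    done
}
```

Because rsync only transfers what has changed, the same call can be repeated on a schedule while EMEN2 is running.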

EMEN2/Backups (last edited 2013-04-18 06:47:18 by IanRees)