Differences between revisions 3 and 4
Revision 3 as of 2010-09-22 07:19:09
Size: 1530
Editor: root
Comment:
Revision 4 as of 2010-12-07 10:54:03
Size: 1584
Editor: root
Comment:
Line 1: Line 1:
## page was renamed from EMEN2/BackupDiscussion
= Backup: Long Answer =
An EMEN2 database environment contains three types of files: database files, log files, and region files.
# EMEN2 Backups
Line 5: Line 3:
Database files contain the key/value pairs that comprise all the records in the database; a number of additional database files hold the indexes. Database files are stored in $DB_HOME/data and its subdirectories. Log files contain data from all committed transactions and are stored in $DB_HOME/log as log.XX, where XX are consecutive integers starting from 1.

An EMEN2 environment contains a number of things:
Line 7: Line 5:
To provide guarantees about transaction atomicity and durability, changes are first written to log files on stable storage before a transaction is marked as committed. The database files are not updated until this has been completed. In the event of a crash or hardware failure, the database files can be checked against the log files to correct any errors or missing data.

BerkeleyDB files:
* __db.* (BDB backing files)
* home/ (BDB registration)
* log/ (BDB log files)
* data/ (EMEN2 databases)
Line 9: Line 11:
Because a cold backup copies the database files, the database must be stopped so they are not changed while the backup is in progress. Once a cold backup is made, it can be updated with a hot backup. A hot backup only copies new log files, which are append-only, and does not require the database files to be stable during the backup.

EMEN2-managed file attachments:
* emen2data/ (file storage)
* tiles/ (thumbnails and other derived data)
* tmp/ (temporary files)
Line 11: Line 16:
Configuration and application logs:
* DB_CONFIG
* config.json
* applog/ (EMEN2 application logs)
* ssl/ (encryption keys)
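
The three groups above can be sketched as a small classifier over the top-level entries of an environment directory. This is only an illustration: the names come from the lists above, while the group labels ("berkeleydb", "attachments", "config") and both function names are hypothetical and are not part of EMEN2.

```python
from pathlib import Path

# Names taken from the lists above; the group labels are illustrative only.
BDB_DIRS = {"home", "log", "data"}
ATTACHMENT_DIRS = {"emen2data", "tiles", "tmp"}
CONFIG_NAMES = {"DB_CONFIG", "config.json", "applog", "ssl"}


def classify_entry(name):
    """Return the group a top-level environment entry belongs to."""
    if name.startswith("__db.") or name in BDB_DIRS:
        return "berkeleydb"
    if name in ATTACHMENT_DIRS:
        return "attachments"
    if name in CONFIG_NAMES:
        return "config"
    return "unknown"


def classify_environment(db_home):
    """Group the entries found directly under db_home by category."""
    groups = {}
    for entry in sorted(Path(db_home).iterdir()):
        groups.setdefault(classify_entry(entry.name), []).append(entry.name)
    return groups
```

Anything reported as "unknown" is worth a manual look before deciding how it should be backed up.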
Line 12: Line 22:
== backup.py ==
# Cold Backups
Line 14: Line 24:
This page is currently being rewritten to avoid displaying incorrect or out-of-date information. EMEN2 provides several mechanisms for backing up metadata and raw data -- please email me if you would like specifics while I rewrite this page.

The most "foolproof" way to back up EMEN2 is to stop all emen2 processes and archive the entire EMEN2DBHOME directory. At this point, everything can be backed up as normal files without any special consideration.

If you have specified paths outside EMEN2DBHOME, e.g. for binary attachment storage, you will also need to archive these directories.
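
As a rough sketch of the cold backup described above, the following archives the whole environment directory plus any external paths into a single tarball using Python's standard tarfile module. It assumes all emen2 processes have already been stopped; it does not check for or stop them. The function name, the "external/" prefix inside the archive, and the example paths are hypothetical.

```python
import tarfile
from pathlib import Path


def cold_backup(db_home, archive_path, extra_paths=()):
    """Archive db_home and any external paths into one gzip-compressed tarball.

    Assumption: every emen2 process is already stopped, so the BerkeleyDB
    files cannot change while they are being read.
    """
    db_home = Path(db_home)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # The whole environment goes in under its own directory name.
        tar.add(str(db_home), arcname=db_home.name)
        # Paths configured outside EMEN2DBHOME (e.g. attachment storage).
        for extra in map(Path, extra_paths):
            tar.add(str(extra), arcname="external/" + extra.name)
    return Path(archive_path)
```

For example, `cold_backup("/srv/EMEN2DBHOME", "/backup/emen2-cold.tar.gz", ["/data/attachments"])` would write one archive; write it somewhere outside the environment directory, and note that two external paths with the same final name would collide under "external/".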

# Hot Backups

If the EMEN2 environment is currently open, the BerkeleyDB files cannot simply be copied, because they are likely to change during the operation. The mechanism I recommend for creating incremental backups is to first create a cold backup, then copy updated BerkeleyDB log files using:

{{{
emen2control.py --log_archive
}}}

This command will copy the EMEN2DBHOME/log/log.* files to the configuration-specified directory, and you can use these to bring a cold backup up to date using "db_recover -c" (BerkeleyDB catastrophic recovery).
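
The restore side of this procedure might look like the sketch below: copy the archived log.* files into the log/ directory of a restored cold backup, then run recovery by hand. Several things here are assumptions rather than documented EMEN2 behaviour: that the restored environment keeps its logs in log/ as listed earlier, that the recovery tool is BerkeleyDB's db_recover utility run with -c (catastrophic recovery) and -h (environment home), and the function name itself, which is hypothetical. The db_recover binary may also be named differently on some installations.

```python
import shutil
from pathlib import Path


def stage_archived_logs(archive_dir, restored_home):
    """Copy archived log.* files into restored_home/log.

    Returns the copied file names and the recovery command to run next.
    Nothing is executed here; the command is only constructed.
    """
    log_dir = Path(restored_home) / "log"
    log_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(archive_dir).glob("log.*")):
        shutil.copy2(str(src), str(log_dir / src.name))
        copied.append(src.name)
    # Assumed recovery step: BerkeleyDB catastrophic recovery on the restored home.
    recover_cmd = ["db_recover", "-c", "-h", str(restored_home)]
    return copied, recover_cmd
```

Returning the command instead of running it keeps the sketch safe to try; once the copied list looks right, the command can be passed to `subprocess.run(recover_cmd, check=True)` while no emen2 process has the restored environment open.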

# Non-BerkeleyDB Files

Files that are not part of the BerkeleyDB environment (emen2data, tiles, config, etc.) can be copied at any time using normal backup procedures; "rsync" is probably the most appropriate tool.
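
One way to script the rsync step is sketched below. It only builds the rsync command lines (using the standard -a archive flag) for whichever non-BerkeleyDB entries exist; the list of names is taken from this page, tmp/ is left out on the assumption that temporary files need no backup, and the function name and destination layout are hypothetical.

```python
from pathlib import Path

# Non-BerkeleyDB entries named on this page; tmp/ is deliberately omitted.
NON_BDB_NAMES = ["emen2data", "tiles", "applog", "ssl", "DB_CONFIG", "config.json"]


def rsync_commands(db_home, dest, names=NON_BDB_NAMES):
    """Build one rsync command per existing non-BerkeleyDB entry."""
    db_home, dest = Path(db_home), Path(dest)
    cmds = []
    for name in names:
        src = db_home / name
        if not src.exists():
            continue
        # A trailing slash makes rsync copy a directory's contents into dest/name.
        source = str(src) + "/" if src.is_dir() else str(src)
        cmds.append(["rsync", "-a", source, str(dest / name)])
    return cmds
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`; the destination directory is assumed to exist already.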

EMEN2/Backups (last edited 2013-04-18 06:47:18 by IanRees)