Written by Ray Ballisti, 25th July 1997
Mod. 1.8.97; 13.08.97
Overview:
=========

  - Goal
  - Concept of incremental backup
  - Usage
  - Directories structure in ~ifhbup
  - Controlling files and dumping scheme
  - Names of the files generated (logs and dumps)
  - Stamp files on the partitions
  - Variables used for the logic
  - Structure of the procedure "incr_backup"


GOAL
====

The goal of this procedure is to execute an incremental backup of some
disk partitions from a machine in our network to the machine "jabba" of
our department. On jabba the dumps are then copied automatically to a
magnetic tape cartridge (using a robot).

On jabba the user "ifhbup" has an account with two links in its home
directory, "backup" and "archive":

    archive -> /usr/jabba/archive/ifh/ifhbup
    backup  -> /usr/jabba/backup/ifh/ifhbup

The function of the two directories is obvious from their names.
We mount those directories via NFS, using the automounter facility, on:

    /net/jabba/usr/jabba/backup/ifh/ifhbup

The transfer speed is usually one MB per second.


CONCEPT OF INCREMENTAL BACKUP
=============================

The dump of level 0 copies all data of a given disk partition to the
dump file. Each subsequent dump of a higher level dumps only those files
which have been modified after the last dump of a lower level.
If we execute, say, a dump of level 0 followed by one of level 6, then
the following dump of level 4 will consider all files modified after the
last dump of level 0. The one of level 6 can now be disregarded.

Our dump strategy is such that we never need more than three dump files
in order to reconstruct a lost file. As shown later on, we use a
four-week cycle here, i.e. one dump of level 0 every four weeks (about
once a month). See "Controlling files and dumping scheme".


USAGE
=====

The procedure must be run as user "ifhbup", which belongs to group
"system" (ID-grp = 3), and it has to run on every machine from which we
want to dump a disk partition to jabba. We require that the machine runs
the automounter.
The procedure does NOT take any parameters, as everything is determined
by the content of the directory ~ifhbup/bkup_data/. Usually it is run as
a cron job like this:

    18 4 * * * /home/ifh/ifhbup/bin/incr_backup

Notice that cron jobs are stored on a "per machine" basis and not in the
user's home directory, so that we can have different crontabs for the
same user on different machines.


DIRECTORIES STRUCTURE:
======================

One of the properties of this procedure is that everything needed to
tell the procedure what to do is stored under the directory
sirius:~ifhbup/bkup_data. It is very important to understand the
structure of this directory, because the behavior of the procedure
depends on it:

  - The first level of directories under ~ifhbup/bkup_data consists of
    the names of the machines from which we want to dump some partitions
    to jabba.

  - The second level of directories has the names of the partitions to
    be dumped. In the former version of the procedure the name was that
    of the disk, i.e. c0t2d0s6, c1t1d0s3, c3t1d0s5 etc. Later on
    (1.07.98) I changed it to the "basename" of the mount point. The
    assumption is that this name is unique in the file /etc/vfstab of
    the machine on which incr_backup is executed. For examples of such
    names see below.

In the directory ~ifhbup/bkup_data/"mach_name"/ we have to find
directories with the names of the partitions that we want to dump. The
procedure looks there to get the information about which partitions have
to be dumped. Obviously, only the system manager should create those
directories.

                     sirius:/home/ifh/ifhbup
                                |
                                |
        admin       doc     bkup_data      bin
                                |
                                |
      ...  black-hole   sirius   nausikaa   regulus  ....
                          |
                          |
 default_day_scheme ... sirius134  black43  sirius427 ... general_rule
                                               |
                                               |
                              bkup_day_scheme  110-*  126-*  135-*  ....
                              family_root


CONTROLLING FILES AND DUMPING SCHEME
====================================

The information about which dump level has to be done, and when, is
stored in two files:

    ~ifhbup/bkup_data/"mach_name"/general_rule
and
    ~ifhbup/bkup_data/"mach_name"/"part_name"/bkup_day_scheme

It is important to understand that the file "general_rule" is defined on
a "per machine" basis, while the file "bkup_day_scheme" is defined on a
"per partition" basis.

IMPORTANT: this version of the procedure supposes that we are using a
dumping scheme which is not longer than FOUR weeks!!

The content of the file ~ifhbup/bkup_data/sirius/general_rule is (as an
example):

In directory /home/ifh/ifhbup/bkup_data/sirius:

{sirius:[sirius]:43}% cat general_rule
0 9 8 7 6 5 4     ==> will be stored in variable WEEK_ONE
3 9 8 7 6 5 4     ==> will be stored in variable WEEK_TWO
2 9 8 7 6 5 4     ==> will be stored in variable WEEK_THREE
1 9 8 7 6 5 4     ==> will be stored in variable WEEK_FOUR
^

It is recommended that the numbers in the first column be decreasing
(3 2 1, as shown) and not increasing (1 2 3). In this way we always need
only three dumps to restore a file: the "0", the first of the week and
the one of the day considered.

At least one dump_level MUST be zero ! (usually WEEK_ONE, index 1 = 0).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a central point of the whole scheme: a dump of level 0 must be
done as the first dump of the whole series. We call it the "parent" of
the whole "family" of incremental dumps.

It would be nice to unmount the corresponding partition at least for the
dump of level 0. This is not yet implemented. The problem is that user
"ifhbup" does not have the privilege necessary to use the commands
"umount" and "mount".
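
A minimal sketch of how the four rows of a "general_rule" file could be
read into WEEK_ONE ... WEEK_FOUR, together with a check of the level 0
requirement. This is not taken from incr_backup: the function names
read_general_rule and has_level_zero and the temporary file are
hypothetical, and the sketch assumes that the file holds only the seven
numbers per line (the "==>" remarks in the listing being annotations).

```shell
#!/bin/sh
# Hypothetical sketch: read a four-line general_rule file into
# WEEK_ONE .. WEEK_FOUR. Assumes each line holds seven dump levels only.

read_general_rule() {
    # $1 = path to the general_rule file
    {
        read -r WEEK_ONE
        read -r WEEK_TWO
        read -r WEEK_THREE
        read -r WEEK_FOUR
    } < "$1"
}

has_level_zero() {
    # Succeeds if at least one of the 28 dump levels is 0.
    for level in $WEEK_ONE $WEEK_TWO $WEEK_THREE $WEEK_FOUR; do
        [ "$level" = 0 ] && return 0
    done
    return 1
}

# Example with the rule shown above, written to a temporary file.
RULE_FILE=$(mktemp)
cat > "$RULE_FILE" <<'EOF'
0 9 8 7 6 5 4
3 9 8 7 6 5 4
2 9 8 7 6 5 4
1 9 8 7 6 5 4
EOF

read_general_rule "$RULE_FILE"
rm -f "$RULE_FILE"

echo "WEEK_ONE=$WEEK_ONE"
if has_level_zero; then
    echo "rule contains a level 0 dump"
else
    echo "ERROR: no level 0 dump in general_rule" >&2
fi
```

Keeping each row as one space-separated string mirrors the way the
WEEK_"nr" variables are described in this document; a row can later be
split into its seven levels by leaving the variable unquoted.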
The content of the file bkup_day_scheme is, for instance:

In directory /home/ifh/ifhbup/bkup_data/sirius:

{sirius:[sirius]:67}% cat sirius427/bkup_day_scheme
Sun Mon Tue Wed Thu Fri Sat     ==> will be stored in variable DAYSCHEME

It is the responsibility of the system manager to choose the
bkup_day_scheme in such a way that the dumps of level 0 for the
different partitions and machines are distributed over the whole week.

In version 3.0 we introduced a file named "family_root" which contains
information about the root dump, i.e. the dump of level 0 upon which the
current series of incremental dumps is based. This dump is called the
"parent" of the whole "family" of incremental dumps:

    ~ifhbup/bkup_data/"mach_name"/"part_name"/family_root

The content of the file "family_root" is:

    ROOT_TAG   "corresponding-0th-level-dump-file-name-on-jabba"

The tag ROOT_TAG allows us to identify all dumps which belong to the
same family, i.e. which have the same 0th level dump as "parent".


NAMES OF FILES ( LOGS AND DUMPS )
=================================

There are at present two log files:

  - one, called "global_log" in the following, which keeps a record of
    what is done for a specific machine (output of the cron job); in the
    procedure its name is stored in the variable GLOBALOG;

  - one called "log_file", which shows what is done for each single
    partition and whose name is in the variable LOGFILE.

The first one is saved in a special directory: ~ifhbup/admin.
The second one is stored in the directory with the corresponding
partition name. In the example above, this will be for instance the
directory ~ifhbup/bkup_data/sirius/sirius427.

An important element in creating the names of those files is the tag,
which is built as follows:

    TAG="${WEEK_LEVEL_NOW}${DAY_INDEX}${DMP_LEVEL}"
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

WEEK_LEVEL_NOW shows in which of the four weeks we are supposed to be
(1,2,3,4).
DAY_INDEX shows on which day of this week we are (1,2,3,4,5,6,7).
(Not the real day of our calendar, but the one related to this
partition; see variable DAYSCHEME.)
TAG is thus an integer with three digits. We usually expect "110" as the
first TAG in a scheme.

A general_rule whose four rows are "0 6 5 4 9 8 7", "3 6 5 4 9 8 7",
"2 6 5 4 9 8 7" and "1 6 5 4 9 8 7" (the rule used in the example under
"Variables used for the logic") produces the following TAGs:

    110 126 135 144 159 168 177
    213 226 235 244 259 268 277
    312 326 335 344 359 368 377
    411 426 435 444 459 468 477

You see that all 28 numbers are different. We are mostly free to set the
last digit, while the first two digits are fixed.
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The name of the global_log file is in the variable GLOBALOG, defined as
follows:

    GLOBALOG=${HOME_DIR}/admin/dump-${HOSTNAME}-${DATENR}

where HOME_DIR=/home/ifh/ifhbup and HOSTNAME is the name of the host on
which the procedure runs. DATENR is defined as

    "day_of_the_week"-"Year""Month""day"-"Hour"-"Minutes"

We avoided using the character ":" in a file name in order not to have
problems with the syntax of the csh later on.
Example for DATENR: "Tue-97Jul08-03-13"

The name of the log file related to a particular partition is stored in
the variable LOGFILE, defined as follows:

    LOGFILE=${LOG_DIRS}/${PART}/${TAG}-${ROOT_TAG}-${DATENR}
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

where LOG_DIRS=${HOME_DIR}/bkup_data/${HOSTNAME} and PART is the name of
the partition considered at this point of the execution. In the examples
shown before, PART was "sirius427". Note that "DATENR" also contains the
time, which helps to identify the file in a unique way.
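
The naming rules above can be sketched as follows. This is only an
illustration under stated assumptions: the helper functions make_tag and
make_datenr are hypothetical, the value given to ROOT_TAG is a
placeholder (the real one comes from the file "family_root"), and the
date format string is my reading of the DATENR definition, not a copy of
the one used in incr_backup.

```shell
#!/bin/sh
# Hypothetical sketch of how TAG, DATENR, GLOBALOG and LOGFILE fit together.

make_tag() {
    # $1 = WEEK_LEVEL_NOW (1..4), $2 = DAY_INDEX (1..7), $3 = DMP_LEVEL (0..9)
    echo "${1}${2}${3}"
}

make_datenr() {
    # "day_of_the_week"-"Year""Month""day"-"Hour"-"Minutes",
    # e.g. Tue-97Jul08-03-13. LC_ALL=C keeps day and month names in English.
    LC_ALL=C date '+%a-%y%b%d-%H-%M'
}

HOME_DIR=/home/ifh/ifhbup
HOSTNAME=sirius            # example host from this document
PART=sirius427             # example partition from this document
ROOT_TAG=example-root      # placeholder, NOT a real family_root value
LOG_DIRS=${HOME_DIR}/bkup_data/${HOSTNAME}

TAG=$(make_tag 1 3 5)      # week 1, day index 3, dump level 5
DATENR=$(make_datenr)

GLOBALOG=${HOME_DIR}/admin/dump-${HOSTNAME}-${DATENR}
LOGFILE=${LOG_DIRS}/${PART}/${TAG}-${ROOT_TAG}-${DATENR}

echo "TAG=$TAG"
echo "GLOBALOG=$GLOBALOG"
echo "LOGFILE=$LOGFILE"
```

Because the TAG comes first in the log file name, a pattern such as
135-* lists all logs written for one position in the four-week scheme.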

A very important file is the real dump file, stored on jabba, whose name
is built according to the following rule and stored in the variable
DUMP_FILE:

    "mach_name"-"part_name"-"root_tag"-"date_stamp"."tag"

    DUMP_FILE="${HOSTNAME}-${PART}-${ROOT_TAG}-${DATENR}.${TAG}"
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This file will be copied to the directory:

    /net/jabba/usr/jabba/backup/ifh/ifhbup
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

which is mounted by the automounter, and it can also be retrieved from
there. It should be clear that in this directory we will find ALL dumps
from all machines of our institute. This is not a problem, as every name
is unique.

NOTE that the "TAG" on jabba is written in the last position, after a
".", while for the log files I preferred to have it in front, followed
by a hyphen. This has to do with the different kinds of searching
needed.


STAMP FILES ON PARTITIONS
=========================

Before we dump a partition we put a time marker on it, to show in the
dump what it is, when it was done and on what (jabba or tape). This is
an empty file with a significant name. We do the same with an empty file
which shows the mount point of the partition:

    touch $MNTPT/.jabba-${TAG}-${ROOT_TAG}-${DATENR}
    touch $MNTPT/.part-${HOSTNAME}-${PART}-`basename ${MNTPT}`


VARIABLES USED FOR THE LOGIC
============================

The content of the variables WEEK_"nr" is obtained from the file
"general_rule", which is generally the same for all partitions of one
machine. The content of the variable DAYSCHEME is obtained from the file
"bkup_day_scheme", which is usually different for each partition.

Variable DAYSCHEME contains the rule for the days of the week for the
partition considered.

Today's name is in TODAY->  (as an example suppose it is Thursday, in short Thu)
                        |
                        |
DAYSCHEME      Tue Wed Thu Fri Sat Sun Mon
                        |
day_index -->   1   2   3   4   5   6   7    ==> in variable DAY_INDEX
                        ^
Index 3 will be used to find the dump_level out of the four week_levels
rules.
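
The search for today's position in DAYSCHEME can be sketched as follows.
The function name day_index is hypothetical, and the way TODAY is
obtained in a real run (shown in a comment) is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch: find the position of today's name in DAYSCHEME.

day_index() {
    # $1 = day name (e.g. Thu); remaining arguments = the names in DAYSCHEME.
    # Prints the 1-based position of the day, or fails if it is not found.
    today=$1
    shift
    index=1
    for day in "$@"; do
        if [ "$day" = "$today" ]; then
            echo "$index"
            return 0
        fi
        index=$((index + 1))
    done
    return 1
}

DAYSCHEME="Tue Wed Thu Fri Sat Sun Mon"
TODAY=Thu                  # in a real run perhaps: TODAY=$(LC_ALL=C date +%a)

# DAYSCHEME is left unquoted on purpose so that it splits into seven words.
DAY_INDEX=$(day_index "$TODAY" $DAYSCHEME)
echo "DAY_INDEX=$DAY_INDEX"
```

With the scheme of the example, Thursday is found in the third position,
so the sketch prints DAY_INDEX=3.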

WEEK_"nr" describes the dump_levels for each day of the 4 weeks
considered:

DAY_INDEX=3 --------->|
                      |
WEEK_ONE        0  6  5  4  9  8  7    ==> dump_level = 5
WEEK_TWO        3  6  5  4  9  8  7    ==> dump_level = 5
WEEK_THREE      2  6  5  4  9  8  7    ==> dump_level = 5
WEEK_FOUR       1  6  5  4  9  8  7    ==> dump_level = 5

Once we know in which week we are, we can find the dump level requested
for this day and put it into the variable DMP_LEVEL. In the example
above it happens that all four weeks require a dump of level 5, but this
does not have to be the case.

Some variables have to be taken into consideration when taking
decisions:

    MUST_BE_DONE=false : usual case: we should execute an incremental dump
    MUST_BE_DONE=true  : we NEED to execute a 0th level dump.


STRUCTURE OF THE PROCEDURE "incr_backup"
========================================

  - setting parameters in the corresponding variables
  - checking the connection to machine "jabba.ethz.ch"
  - checking the mounting of the jabba directory with automount
  - checking that dir ~ifhbup/bkup_data/"mach_name" exists
  - reading and parsing the "general_rule" for this machine
  = starting a loop over all partitions for this machine
      - read the "bkup_day_scheme" for this partition (DAYSCHEME)
      - find how many dumps were done each week (DD_WK_"nr")
      - find out in which week we are now (WEEK_LEVEL_NOW and => DMP_LEVEL)
      - find which is the day_index for this partition (DAY_INDEX)
      - state the allowed (requested) dump levels for each week (ALLOWED)  (*)
      - decide if there is anything to do and what to do
      - define TAG and LOGFILE for this partition
      - write a time stamp and other information on the partition
      - write to LOGFILE the top directories of the partition
      - execute the dump
  = loop for the next partition
  - end procedure

(*) this is the only critical and more complicated part of the
    procedure. All others are explained in comments in the procedure
    itself.
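
The lookup of the dump level described under "Variables used for the
logic" can be sketched as follows. The helper names nth_word and
dump_level are hypothetical and the values of WEEK_LEVEL_NOW and
DAY_INDEX are simply set by hand; how incr_backup really determines the
current week (from the dumps already done) is not shown here.

```shell
#!/bin/sh
# Hypothetical sketch: pick DMP_LEVEL from the WEEK_* rows.

WEEK_ONE="0 6 5 4 9 8 7"
WEEK_TWO="3 6 5 4 9 8 7"
WEEK_THREE="2 6 5 4 9 8 7"
WEEK_FOUR="1 6 5 4 9 8 7"

nth_word() {
    # $1 = 1-based position; remaining arguments = words.
    # Prints the chosen word, or fails if the position is out of range.
    n=$1
    shift
    [ "$n" -ge 1 ] && [ "$n" -le "$#" ] || return 1
    shift $((n - 1))
    echo "$1"
}

dump_level() {
    # $1 = WEEK_LEVEL_NOW (1..4), $2 = DAY_INDEX (1..7)
    case $1 in
        1) row=$WEEK_ONE ;;
        2) row=$WEEK_TWO ;;
        3) row=$WEEK_THREE ;;
        4) row=$WEEK_FOUR ;;
        *) return 1 ;;
    esac
    # $row is left unquoted on purpose so that it splits into seven words.
    nth_word "$2" $row
}

WEEK_LEVEL_NOW=2           # example values, set by hand
DAY_INDEX=3

DMP_LEVEL=$(dump_level "$WEEK_LEVEL_NOW" "$DAY_INDEX")
TAG="${WEEK_LEVEL_NOW}${DAY_INDEX}${DMP_LEVEL}"
echo "DMP_LEVEL=$DMP_LEVEL TAG=$TAG"
```

For week 2 and day index 3 the sketch selects level 5 and builds the
TAG 235.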