Documentation for the "incr_backup" procedure

Written by Ray Ballisti, 25th July 1997
Mod. 1.8.97; 13.08.97

Overview:
=========

	- Goal
	- Concept of incremental backup
	- Usage
	- Directories structure in ~ifhbup
	- Controlling files and dumping scheme
	- names of the files generated (logs and dumps)
	- stamp files on the partitions
	- Variables used for the logic
	- structure of the procedure "incr_backup"



GOAL
====

The goal of this procedure is to execute an incremental backup of some disk
partitions from a machine in our network to the machine "jabba" of our
department. On jabba the dumps are then copied automatically to a
magnetic tape cartridge (using a robot).
On jabba the user "ifhbup" has an account with two links in
his home directory: "backup" and "archive". 

 archive -> /usr/jabba/archive/ifh/ifhbup
 backup  -> /usr/jabba/backup/ifh/ifhbup

The function of the two directories is obvious from their names.
We mount those directories via NFS using the automounter facility on:
/net/jabba/usr/jabba/backup/ifh/ifhbup

The transfer speed is usually one MB per second.



CONCEPT OF INCREMENTAL BACKUP
=============================

A dump of level 0 copies all data of a given disk partition to the dump file.
Each subsequent dump of a higher level only dumps those files which have
been modified since the last dump of a lower level.
If we execute, say, a dump of level 0 followed by one of level 6, then a
following dump of level 4 will consider all files modified since the last
dump of level 0. The one of level 6 can now be disregarded.
Our dump strategy is such that we never need more than three dump files in
order to reconstruct a lost file.
As shown later on, we use a four-week cycle here, i.e. one dump of
level 0 every four weeks (roughly once a month).
See "Controlling files and dumping scheme".
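
To illustrate the rule, here is a minimal sketch which computes the dumps
needed to restore the state of the most recent dump. The helper name
"restore_chain" is hypothetical and not part of incr_backup; it only assumes
the behavior of leveled dumps as described above.

```shell
# restore_chain LEVEL...  (hypothetical helper, not part of incr_backup)
# Arguments: the dump levels in the order in which the dumps were made.
# Prints the positions (1-based, oldest first) of the dumps needed to
# restore the state of the most recent dump.
restore_chain() {
    chain=""
    need=10                     # levels are 0..9, so the last dump always counts
    pos=$#
    while [ "$pos" -ge 1 ]; do
        eval "lvl=\${$pos}"     # level of the dump at position $pos
        if [ "$lvl" -lt "$need" ]; then
            chain="$pos $chain" # this dump is the base of the one found before
            need=$lvl
        fi
        pos=$((pos - 1))
    done
    echo $chain                 # unquoted on purpose: trims the trailing blank
}
```

For the sequence of the text, "restore_chain 0 6 4" prints "1 3": the dump of
level 6 (position 2) is not needed.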



USAGE
=====

The procedure must be run as user "ifhbup", which belongs to group "system"
(group ID 3), and it has to run on every machine from which we want to dump a
disk partition to jabba. We require that the machine runs the automounter.
The procedure does NOT take any parameters, as everything is determined by
the content of the directory ~ifhbup/bkup_data/.
Usually it is run as a cron job like this:

18 4 * * * /home/ifh/ifhbup/bin/incr_backup

Notice that cronjobs are stored on a "per machine" basis and not in the
user's home directory, so that we can have different crontabs for the same
user on different machines.


DIRECTORIES STRUCTURE:
======================

One of the properties of this procedure is that everything needed to tell
the procedure what to do is stored under the directory
sirius:~ifhbup/bkup_data. It is very important to understand the structure
of this directory, because the behavior of the procedure depends on it:

The first level of directories under ~ifhbup/bkup_data consists of the names
of the machines from which we want to dump some partitions to jabba.

The second level of directories carries the names of the partitions to be
dumped. In the former version of the procedure the name was that of the disk,
i.e. c0t2d0s6  c1t1d0s3  c3t1d0s5   etc. Later on (1.07.98) I changed it to
the "basename" of the mount point. The assumption is that this name is
unique in the file /etc/vfstab of the machine on which incr_backup is
executed. For examples of such names see below.

In the directory ~ifhbup/bkup_data/"mach_name"/ there must be one directory
for each partition that we want to dump, named after that partition. The
procedure looks there to find out which partitions have to be dumped.
Obviously, only the system manager should create those directories.

          sirius:/home/ifh/ifhbup
                            |
                            |
         admin     doc    bkup_data     bin
                            |
                            |
    ...    black-hole     sirius    nausikaa   regulus  ....
                            |
                            |
 default_day_scheme ...  sirius134  black43  sirius427  ... general_rule 
                            |
                            |
   bkup_day_scheme  110-*  126-*  135-*  ....  family_root
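
As a rough sketch of how this structure can drive the procedure, the
following helper lists the partitions configured for one machine. The name
"list_partitions" and its arguments are assumptions made for illustration;
the real procedure may collect the names differently.

```shell
# list_partitions BKUP_DATA MACH_NAME  (hypothetical helper)
# Prints one partition name per line: every sub-directory of
# BKUP_DATA/MACH_NAME. Plain files such as "general_rule" are skipped.
list_partitions() {
    for entry in "$1/$2"/*; do
        [ -d "$entry" ] && basename "$entry"
    done
    return 0
}
```

A call like "list_partitions /home/ifh/ifhbup/bkup_data sirius" would then
print names such as sirius134 and sirius427, one per line.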



CONTROLLING FILES AND DUMPING SCHEME
====================================

The information about what dump level and when it has to be done is stored
in two files:

	 ~ifhbup/bkup_data/"mach_name"/general_rule
and
	 ~ifhbup/bkup_data/"mach_name"/"part_name"/bkup_day_scheme

It is important to understand that the file "general_rule" is defined
on a "per machine" basis while the file "bkup_day_scheme" is defined
on a "per partition" basis.

IMPORTANT: this version of the procedure assumes that we are using
	   a dumping scheme which is not longer than FOUR weeks!!

 Content of file ~ifhbup/bkup_data/sirius/general_rule is (as an example):
	In directory /home/ifh/ifhbup/bkup_data/sirius:
	{sirius:[sirius]:43}% cat general_rule
	0  9  8  7  6  5  4     ==> will be stored in variable WEEK_ONE
	3  9  8  7  6  5  4     ==> will be stored in variable WEEK_TWO
	2  9  8  7  6  5  4     ==> will be stored in variable WEEK_THREE
	1  9  8  7  6  5  4     ==> will be stored in variable WEEK_FOUR
        ^
	It is recommended that the numbers in the first column decrease
	(3 2 1 as shown) rather than increase (1 2 3).
	In this way we never need more than three dumps to restore a file:
	the "0", the first of the week and the one of the day considered.

At least one dump_level MUST be zero ! (usually WEEK_ONE, Index 1  = 0 ).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is a central point of the whole scheme: a dump of level 0 must be done
as first dump of the whole series. We call it the "parent" of the whole
"family" of incremental dumps.
It would be nice to unmount the corresponding partition at least for 
the dump of level 0. This is not yet implemented.
The problem is that user "ifhbup" does not have the privilege necessary
to use the commands "umount" and "mount".
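
Reading the file "general_rule" and checking for the mandatory level 0 can
be sketched as follows. The helper name "read_general_rule" is hypothetical,
and we assume that the file holds exactly four lines of seven numbers each
(the "==>" remarks in the listing above are explanations only, not file
content).

```shell
# read_general_rule FILE  (hypothetical helper)
# Reads the first four lines of FILE into WEEK_ONE ... WEEK_FOUR.
# Returns non-zero if fewer than four lines were found or if none of
# the dump levels is 0.
read_general_rule() {
    WEEK_ONE= WEEK_TWO= WEEK_THREE= WEEK_FOUR=
    { read -r WEEK_ONE; read -r WEEK_TWO
      read -r WEEK_THREE; read -r WEEK_FOUR; } < "$1"
    [ -n "$WEEK_FOUR" ] || return 1
    for lvl in $WEEK_ONE $WEEK_TWO $WEEK_THREE $WEEK_FOUR; do
        [ "$lvl" -eq 0 ] && return 0    # found the "parent" dump level
    done
    echo "general_rule: no dump of level 0 defined" >&2
    return 1
}
```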
 
 Content of file bkup_day_scheme is, for instance:
	In directory /home/ifh/ifhbup/bkup_data/sirius :
	{sirius:[sirius]:67}% cat sirius427/bkup_day_scheme
	Sun Mon Tue Wed Thu Fri Sat    ==> will be stored in variable DAYSCHEME

It is the responsibility of the system manager to choose the bkup_day_scheme
in such a way that dumps of level 0 for the different partitions and
machines are distributed all over the week.

In version 3.0 we introduced a file named "family_root" which contains
information about the root dump, i.e. the dump of level 0 upon which the
current series of incremental dumps is based. This is called the "parent"
of the whole "family" of incremental dumps:

~ifhbup/bkup_data/"mach_name"/"part_name"/family_root

 content of the file "family_root" is:

 ROOT_TAG  "corresponding-0th-level-dump-file-name-on-jabba" 

 The tag ROOT_TAG allows us to identify all dumps which belong to the
 same family, i.e. have the same 0th level dump as a "parent".


NAMES OF FILES ( LOGS AND DUMPS )
=================================

There are at present two log files. The first one, called "global_log" in
the following, records what is done for a specific machine (the output of
the cron job); in the procedure its name is stored in the variable GLOBALOG.
The second one, called "log_file", shows what is done for each single
partition; its name is stored in the variable LOGFILE.
The first one is saved in a special directory: ~ifhbup/admin.
The second one is stored in the directory with the corresponding partition
name. In the example above, this will be for instance the directory
~ifhbup/bkup_data/sirius/sirius427.

An important element in creating the names of those files is the tag, which
is built as follows:

    TAG="${WEEK_LEVEL_NOW}${DAY_INDEX}${DMP_LEVEL}
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

WEEK_LEVEL_NOW shows in which of the four weeks we are supposed to be (1,2,3,4).
DAY_INDEX shows which day of this week it is (1,2,3,4,5,6,7).
(Not the real day of our calendar, but the one related to this partition,
 see variable DAYSCHEME.)

TAG is therefore an integer with three digits. We usually expect "110" as the
first TAG in a scheme. A general_rule whose four lines are "0 6 5 4 9 8 7",
"3 6 5 4 9 8 7", "2 6 5 4 9 8 7" and "1 6 5 4 9 8 7" (the example used under
"Variables used for the logic") produces the following TAGs:

		110  126  135  144  159  168  177
		213  226  235  244  259  268  277
		312  326  335  344  359  368  377
		411  426  435  444  459  468  477

You see that all 28 numbers are different. We are mostly free to set
the last digit, while the first two digits are fixed.
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
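
The construction of the TAGs can be checked with a small sketch. The helper
"week_tags" is hypothetical; it simply concatenates the week number, the day
index and the dump level, as in the definition of TAG above.

```shell
# week_tags WEEK_NR LEVEL...  (hypothetical helper)
# Prints the TAGs of one week: week number, day index, dump level.
week_tags() {
    week=$1; shift
    day=1
    tags=""
    for lvl in "$@"; do
        tags="$tags ${week}${day}${lvl}"
        day=$((day + 1))
    done
    echo $tags                  # unquoted on purpose: trims the leading blank
}

week_tags 1 0 6 5 4 9 8 7       # prints: 110 126 135 144 159 168 177
week_tags 2 3 6 5 4 9 8 7       # prints: 213 226 235 244 259 268 277
```

The two calls reproduce the first two rows of the table above.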

Name of the global_log file is in the variable GLOBALOG, defined as follows:

GLOBALOG=${HOME_DIR}/admin/dump-${HOSTNAME}-${DATENR}

where:
HOME_DIR=/home/ifh/ifhbup     and
HOSTNAME is the name of the host on which the procedure runs.
DATENR is defined as "day_of_the_week"-"Year""Month""day"-"Hour"-"Minutes".
We avoided the character ":" in file names in order not to
have problems with the syntax of the csh later on.
Example for DATENR:  "Tue-97Jul08-03-13"
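
A string of this form can be produced with the standard conversion
specifiers of date(1). This is only a sketch of one possible way; how the
procedure itself builds DATENR and HOSTNAME is not shown in this document,
and the use of "uname -n" for the host name is an assumption.

```shell
# Sketch only: build DATENR and GLOBALOG as described above.
# LC_ALL=C keeps the day and month abbreviations in English.
DATENR=$(LC_ALL=C date '+%a-%y%b%d-%H-%M')   # e.g. Tue-97Jul08-03-13
HOME_DIR=/home/ifh/ifhbup
HOSTNAME=$(uname -n)                         # assumption: node name = host name
GLOBALOG=${HOME_DIR}/admin/dump-${HOSTNAME}-${DATENR}
echo "$GLOBALOG"
```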


The name of the log file related to a particular partition is stored in
the variable LOGFILE, defined as follows:

   	LOGFILE=${LOG_DIRS}/${PART}/${TAG}-${ROOT_TAG}-${DATENR}
  	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
where
LOG_DIRS=${HOME_DIR}/bkup_data/${HOSTNAME}  and
PART is the name of the partition considered at this point of the execution.
In the examples shown before, PART was "sirius427".

Note that "DATENR" also contains the time, which helps to
identify the file uniquely.

A very important file is the real dump file, stored on jabba, whose name is
built according to the following rule and stored in the variable DUMP_FILE:

     "mach_name"-"part_name"-"root_tag"-"date_stamp"."tag"

       DUMP_FILE="${HOSTNAME}-${PART}-${ROOT_TAG}-${DATENR}.${TAG}"
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This file will be copied to the directory:

/net/jabba/usr/jabba/backup/ifh/ifhbup
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

which is mounted by the automounter; the file can also be retrieved from there.

It should be clear that in this directory we will find ALL dumps from all our
institute's machines. This is not a problem, as every name is unique.

NOTE that the "TAG" on jabba is written in the last position after a "."
while for the log files I preferred to have it in front followed by a hyphen.
This has to do with the different kind of searching needed.
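
The two positions of the TAG can be seen in the following sketch. All values
are examples; ROOT_TAG in particular is a made-up placeholder, since its real
format is not specified in this document.

```shell
# Example values only; "rootA" is a hypothetical ROOT_TAG.
HOSTNAME=sirius
PART=sirius427
TAG=110
ROOT_TAG=rootA
DATENR=Tue-97Jul08-03-13

DUMP_FILE="${HOSTNAME}-${PART}-${ROOT_TAG}-${DATENR}.${TAG}"
LOG_NAME="${TAG}-${ROOT_TAG}-${DATENR}"     # last component of LOGFILE

echo "$DUMP_FILE"          # sirius-sirius427-rootA-Tue-97Jul08-03-13.110
echo "${DUMP_FILE##*.}"    # TAG taken from the end of the dump name: 110
echo "${LOG_NAME%%-*}"     # TAG taken from the front of the log name: 110
```

With these names, shell patterns such as *.110 (dumps) or 110-* (logs) would
select the files of a given TAG; whether this is the kind of searching meant
above is an assumption.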



STAMP FILES ON PARTITIONS
=========================

Before we dump a partition we put a time marker on it, so that the dump
itself shows what it is, when it was done and to what medium (jabba or tape).
This is an empty file with a significant name. We do the same with
an empty file which shows the mount point of the partition:

   touch $MNTPT/.jabba-${TAG}-${ROOT_TAG}-${DATENR}
   touch $MNTPT/.part-${HOSTNAME}-${PART}-`basename ${MNTPT}`



VARIABLES USED FOR THE LOGIC
============================

The content of the variables WEEK_"nr" is obtained from the file
"general_rule" which is generally the same for all partitions of one machine.
The content of the var. DAYSCHEME is obtained from the file "bkup_day_scheme"
which is usually different for each partition.

Variable DAYSCHEME contains the rule for the days of the week for the partition
considered.

Today's name is in TODAY->  (as an example suppose it is Thursday, in short Thu)
                         |
                         |
DAYSCHEME     Tue  Wed  Thu  Fri  Sat  Sun  Mon
                         |
day_index --> 1    2     3   4    5    6     7  ==> in variable DAY_INDEX
                         ^
Index 3 will be used to find the dump_level out of the four week_levels rules.

WEEK_"nr" describes the dump_levels for each day of the 4 weeks considered:

DAY_INDEX=3 --------->|
                      |
WEEK_ONE     0   6    5   4   9   8   7      ==> dump_level = 5
WEEK_TWO     3   6    5   4   9   8   7      ==> dump_level = 5
WEEK_THREE   2   6    5   4   9   8   7      ==> dump_level = 5
WEEK_FOUR    1   6    5   4   9   8   7      ==> dump_level = 5

Once we know in which week we are, we can find the dump level requested for
this day and put it into the variable DMP_LEVEL. In the example above it
happens that all four weeks require a dump of level nr. 5. But this
need not be the case.
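
The two lookups described above can be sketched as follows. The helper names
"day_index" and "dump_level" are hypothetical; the procedure itself may
implement the lookups differently.

```shell
# day_index TODAY DAYSCHEME  (hypothetical helper)
# Prints the 1-based position of TODAY in DAYSCHEME; fails if absent.
day_index() {
    idx=1
    for d in $2; do
        if [ "$d" = "$1" ]; then echo "$idx"; return 0; fi
        idx=$((idx + 1))
    done
    return 1
}

# dump_level DAY_INDEX WEEK_RULE  (hypothetical helper)
# Prints the DAY_INDEX-th level of a week rule such as "0 6 5 4 9 8 7".
dump_level() {
    set -- "$1" $2      # $1 = index, followed by the seven levels
    shift "$1"          # drop the index and the levels before DAY_INDEX
    echo "$1"
}

DAY_INDEX=$(day_index Thu "Tue Wed Thu Fri Sat Sun Mon")   # 3
DMP_LEVEL=$(dump_level "$DAY_INDEX" "0 6 5 4 9 8 7")       # 5
echo "DAY_INDEX=$DAY_INDEX DMP_LEVEL=$DMP_LEVEL"
```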

Some variables have to be taken into consideration when making decisions:
 
  MUST_BE_DONE=false : usual case: we should execute an incremental dump
  MUST_BE_DONE=true  : we NEED to execute a 0th level dump.
 
 


 
STRUCTURE OF THE PROCEDURE "incr_backup"
========================================

	- setting parameters in the corresponding variables
	- checking the connection to machine "jabba.ethz.ch"
	- checking the mounting of the jabba directory with automount
	- checking that dir ~ifhbup/bkup_data/"mach_name" exists
	- reading and parsing the "general_rule" for this machine

	= starting a loop over all partitions for this machine
	- read the "bkup_day_scheme" for this partition (DAYSCHEME)
	- find how many dumps were done each week (DD_WK_"nr")
	- find out in which week we are now (WEEK_LEVEL_NOW and => DMP_LEVEL)
	- find which is the day_index for this partition (DAY_INDEX)
	- state the allowed (requested) dump levels for each week (ALLOWED)
  (*)	- decide if there is anything to do and what to do
	- define TAG and LOGFILE for this partition
	- write a time stamp and other information on the partition
	- write to LOGFILE the top directories of the partition
	- execute the dump
	= loop for the next partition

	- end procedure

(*) This is the only critical and more complicated part of the procedure.
All other parts are explained in comments in the procedure itself.


 

Please mail suggestions (very welcome) or requests to ballisti@ifh.ee.ethz.ch