Cloning Digital Unix and Tru64

There are various places on The Net you can find how to clone a Tru64 system, but most of them are just discussions and hand waving. The complete process is documented here.

Process Overview

Here is a quick and dirty view of this overall process.

  1. Find out what filesystems, disklabels, and disks you are using for the source of your clone
  2. Find out the name of the target disk you are cloning to and make sure it's not in use for AdvFS, UFS, swap, or raw disk partitions for a database like Oracle or Sybase.
  3. Determine if your disks are exactly the same. If so, you can use the dd method, which is faster since it lets you skip the boot block, disklabel editing, and file-by-file copy steps.
  4. Make the target disk bootable by running the disklabel commands in the right sequence
  5. Edit the disklabel if needed to fix any mismatches in capacity between source and destination
  6. Put a file system (UFS) or a file domain and file set (AdvFS) on your target disk. You will usually need more than one.
  7. Copy the data over from the source disk, isolating only one file system at a time (cpio, vdump, pax, rsync, etc.)
  8. Fix the disk names in config files in /etc
  9. Fix the paths and links in /etc/fdmns if you are using AdvFS
  10. Reboot and test

Here are a few things you should not plan on doing as they will ruin your attempt.

  1. Do not plan on editing the disklabel with flags other than -e after you add the bootblocks.
  2. Do not try to skip the disklabel step (you won't get bootblocks).
  3. Do not try to eliminate the /usr file system for UFS and AdvFS. Startup scripts in /sbin/init.d will force you to have it and freak out (stopping the boot process) if you don't. Don't try to consolidate those unless you plan on manually editing every single startup script and fixing them, which isn't worth it.
  4. You might have read about sysman clone features. Forget that. That tool is garbage and will cause you much more pain than it saves you. Plan on doing it manually yourself or have an expert do it for you using this process.
  5. Do not plan on using the dd method unless you have completely identical disks. Otherwise it has several major drawbacks. The worst include performance problems (due to bad geometry alignment in the disklabel) and loss of space if the target disk is larger (or loss of data if the target disk is smaller). For completeness, here is the actual command: if you were cloning dsk1 onto an identical dsk2 in Tru64 5.x, it would be dd if=/dev/disk/dsk1c of=/dev/disk/dsk2c bs=1024k
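
Before reaching for dd, it is worth confirming that the two disks really report the same size. Here is a minimal sketch, assuming you have saved the disklabel -r output of each disk to a file (the file and function names are hypothetical). It compares only the sectors/unit field, so treat a match as necessary rather than sufficient.

```shell
# Extract the "sectors/unit" value from saved `disklabel -r` output.
sectors_per_unit() {
    awk -F': *' '$1 == "sectors/unit" { sub(/[ \t]+$/, "", $2); print $2 }' "$1"
}

# Succeeds (exit 0) only when both saved labels report the same sector count.
same_size() {
    src=`sectors_per_unit "$1"`
    dst=`sectors_per_unit "$2"`
    [ -n "$src" ] && [ "$src" = "$dst" ]
}

# Usage (hypothetical file names):
#   disklabel -r dsk1 > /tmp/dsk1.label
#   disklabel -r dsk2 > /tmp/dsk2.label
#   same_size /tmp/dsk1.label /tmp/dsk2.label && echo "sizes match"
```

If the sizes differ, fall back to the full disklabel and file copy procedure.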

Take Inventory

Cloning can be complicated because of the different filesystems and storage layouts on your system. Before you begin, you need to know what kind of storage you are using and how it's laid out. We need to answer the following questions:

  1. What type of disks do we use (SCSI, IDE, RAID, SAN, etc..)?
  2. How are the disks partitioned?
  3. Is LSM in play? What about AdvFS?
  4. What type of filesystem is on the disks?

How can you find out these things? Well, I'd use the following command sequence to tell me.

# for Digital Unix 
scu show edt

# for Tru64
hwmgr show scsi

# Find out the names of all AdvFS file domains and what disks are in them
ls -lR /etc/fdmns

# Find out what is currently mounted *now* and if it's LSM, AdvFS, or UFS
mount

# Find out what the system is setup with for the root partition and more
cat /etc/fstab

# Show the disklabel of the root disk to see how it was set up, where
# "MyDisk" is a disk device like 'dsk0' (Tru64) or 'rz0' (Digital Unix)
disklabel -r MyDisk

# If we are using LSM on our root disk, I panic and run away. Don't
# try to clone this setup; recreate it and restore backups to it.
volprint -ht 
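
It helps to keep the answers on file so you can refer back to them while editing the clone. The sketch below simply runs the same commands and saves each output. The function names and the default directory are my own choices, and a command that does not exist on your release (scu on Tru64, hwmgr on Digital Unix) is just recorded as failed.

```shell
# Run one inventory command and save its output. If the command fails or
# is not present on this OS release, note that in the output file.
save_output() {
    outfile=$1
    shift
    "$@" > "$outfile" 2>&1 || echo "(failed or not present: $1)" >> "$outfile"
}

# Collect the whole inventory into one directory (default path is hypothetical).
take_inventory() {
    dir=${1:-/var/tmp/clone_inventory}
    mkdir -p "$dir" || return 1
    save_output "$dir/scu.txt"      scu show edt
    save_output "$dir/hwmgr.txt"    hwmgr show scsi
    save_output "$dir/fdmns.txt"    ls -lR /etc/fdmns
    save_output "$dir/mount.txt"    mount
    save_output "$dir/fstab.txt"    cat /etc/fstab
    save_output "$dir/volprint.txt" volprint -ht
}
```

Run take_inventory once before you touch anything, and add a disklabel -r line for each disk you care about.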

About LSM

LSM is a volume management scheme which is pretty much identical to Veritas Volume Manager (VxVM). This is because, at the time, DEC was able to secure a one-shot licensed copy of VxVM, but they agreed to change its name. So, if you know VxVM, all you do is replace the string “vx” with the string “vol” in the commands and they will work fine. For example, instead of vxprint you use volprint, and so forth. Don't get me wrong, VxVM is a great volume manager. It's just that when they glued it onto Tru64, they made root disk encapsulation (putting your boot drives into VxVM and getting RAID-1 working so you can boot off both sides of the mirror) a HUGE pain in the neck to fix or maintain. It's pretty easy to install with it, but you end up with a giant white elephant. Most folks are just as likely to delete their surviving mirror as they are to resilver and fix their systems when using LSM.

Systems using LSM are extremely hard to clone unless you can use dd alone to clone a single disk. This would not be a very common LSM configuration (what would be the point of a single-disk LSM config?). So, my advice on cloning LSM-based systems is “do not try”. Instead, recreate the LSM RAID layout via a “shell” Tru64 installation and then use vdump and vrestore to get the data back over there.

The Boot Sector

In Tru64 and Digital Unix you need to make sure the disk has the proper boot blocks at the front of the disk. This is done using disklabel and it's not super-intuitive. You absolutely must use both the -rw and the -t flags when installing boot blocks. Without both sets of flags the procedure will fail. Also, the behavior of the tool is a bit odd sometimes and the boot blocks don't get properly installed or get clobbered later on. So, when starting a clone, I'd suggest zeroing out the disklabel and re-installing it from scratch using exactly the right syntax, then doing only edits (using -e to disklabel) after that point. Here are a couple of examples. I'll use a disk name of rz0 in this case, but you should alter that to fit your system. Also, don't do all the steps, just the ones that correspond with your file system type. Boot blocks have to be customized for the file system that you are using.

# Remove and zero the existing disklabel 
disklabel -z rz0

# install the boot blocks for an AdvFS root file system - ONLY FOR ADVFS
disklabel -rw -t advfs rz0

# install the boot blocks for UFS on root - ONLY FOR UFS
disklabel -rw -t ufs rz0 

# Setup your disklabel so that the partitions don't overlap
# and set the slice-type to UFS, AdvFS, and for swap
disklabel -e rz0

REMEMBER: You need to use UFS or AdvFS boot sectors, but not both. They are mutually exclusive.
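
Because the order matters and the two boot block types are mutually exclusive, I find it useful to print the sequence before running anything. This is a minimal sketch, and label_plan is a hypothetical helper name. It only echoes the commands shown above for one file system type, so you can review them and then run them by hand.

```shell
# Print (do not run) the disklabel sequence for one target disk.
# $1 = root file system type (advfs or ufs), $2 = disk name (e.g. rz0).
label_plan() {
    case "$1" in
        advfs|ufs) ;;
        *) echo "unknown file system type: $1" >&2; return 1 ;;
    esac
    echo "disklabel -z $2"          # zero the old label first
    echo "disklabel -rw -t $1 $2"   # then install label and boot blocks
    echo "disklabel -e $2"          # after that, edits only
}
```

For example, label_plan advfs rz0 prints the three commands for an AdvFS root disk, and any type other than advfs or ufs is rejected.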

You will find the boot sectors themselves under the /mdec directory. There are ones for UFS, AdvFS, and CDFS (iso9660 with rock ridge extensions). This is where disklabel goes to find them and thus you can see that none of them are mixed case or have any unexpected spelling.

# ls /mdec
bootblks      bootra.advfs  bootre        bootre.cdfs
bootrz.advfs  bootxx        bootxx.cdfs   raboot.advfs
reboot        reboot.cdfs   rzboot.advfs  xxboot        
xxboot.cdfs   bootra        bootra.cdfs   bootre.advfs
bootrz        bootrz.cdfs   bootxx.advfs  raboot
raboot.cdfs   reboot.advfs  rzboot        rzboot.cdfs   
xxboot.advfs

Editing the Disklabel

Tru64 and Digital Unix have a strong connection to BSD Unix. This is because OSF/1, the predecessor to Digital Unix (i.e. the 1.x - 3.x versions of the OS were called OSF/1), used BSD for the majority of its user space programs. Why re-invent all that good stuff when BSD set the de facto standard everyone was following for TCP/IP programs? Yes, the kernel is still mostly a microkernel and is just DEC's own thing (but resembles the Carnegie Mellon Mach operating system kernel a bit). DEC, IBM, and HP partnered to create OSF/1 but DEC was the only one who didn't eventually walk away from the partnership.

BSD disklabels are essentially just a text-file representation of the partition (slices) layout of your disk. Each block device (disks, SAN LUNs, floppies, etc..) can be sub-divided into smaller portions this way. Both Tru64 and Digital Unix use disklabels. It's just that in Tru64 you'll be working with disks in /dev/disk rather than the old “rz” and “re” style disks which lived in /dev under Digital Unix (4.x). The partitions determine how much space you'll have for your cloned disk's file domains or file systems. You've got to form some cogent plan on how you plan to get the data off the old disk and onto the new disk, and slice sizes are going to matter because they either directly determine the UFS filesystem size or they determine how much space you can add to an AdvFS file domain. Here is an example of a BSD disklabel.

# disklabel -r dsk0
# /dev/rdisk/dsk0c:
type: EIDE
disk: FIREBALLP AS10.
label: 
flags: dynamic_geometry
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 16383
sectors/unit: 20066251
rpm: 4500
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0 

8 partitions:
#            size       offset    fstype  fsize  bsize   cpg  # ~Cyl values
  a:       786432            0     AdvFS                      #      0 - 780*
  b:       786432       786432      swap                      #    780*- 1560*
  c:     20066251            0    unused      0      0        #      0 - 19906*
  d:      6164462      1572864    unused      0      0        #   1560*- 7675*
  e:      6164462      7737326    unused      0      0        #   7675*- 13791*
  f:      6164463     13901788    unused      0      0        #  13791*- 19906*
  g:      4300800      1572864     AdvFS                      #   1560*- 5827*
  h:     14192587      5873664    unused      0      0        #   5827*- 19906*

At first, this output can look intimidating, but trust me, it's simple. Each slice with a letter represents a potential section of your disk, with the exception of the c slice, which always refers to the whole disk (thus its name of “whole disk slice”). The sector offsets and sizes determine the start position of the slice as well as how much data it will contain. Slices which are not labeled as unused should not overlap with anything but the c slice. Those which are labeled unused are just ignored. I recommend using the bc command or a pocket calculator to do the arithmetic. It's just addition and subtraction, nothing fancy. One slice's starting offset should match the previous slice's size + starting offset to avoid any overlap. In the label above, for example, b starts at 786432, which is a's offset (0) plus a's size (786432).
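
If you would rather not do the overlap arithmetic by hand, the check can be sketched with awk. This assumes the partition table layout shown in the example label (slice, size, offset, fstype as the first four columns). The check_overlap name is hypothetical.

```shell
# Read `disklabel -r` output on stdin and report used slices that overlap.
# The c slice (whole disk) and slices marked "unused" are ignored.
check_overlap() {
    awk '
        $1 ~ /^[a-h]:$/ && $1 != "c:" && $4 != "unused" {
            n++
            name[n]  = substr($1, 1, 1)
            start[n] = $3 + 0          # offset column
            end[n]   = $2 + $3         # size + offset = first sector after slice
        }
        END {
            bad = 0
            for (i = 1; i <= n; i++)
                for (j = i + 1; j <= n; j++)
                    if (start[i] < end[j] && start[j] < end[i]) {
                        print name[i], "overlaps", name[j]
                        bad = 1
                    }
            if (!bad) print "no overlaps"
            exit bad
        }'
}

# Usage: disklabel -r dsk0 | check_overlap
```

In the example label, the used slices a, b, and g sit end to end (0 to 786432, 786432 to 1572864, and 1572864 to 5873664), so the check reports no overlaps.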

What you normally see is that slices a, b, g, and sometimes also h will normally be used. However, when you label disks for the first time all the slices will be marked as unused. It's important to review your current disklabel data with the command disklabel -r mydsk so that you know what you are dealing with.

In general, if you are cloning a UFS based system, then be very careful that your disklabel is going to give you enough space for the / and /usr file systems. If you are using AdvFS, make sure that the slices you set aside add up to the sizes you need (i.e. remember that AdvFS can do concatenation, mirroring, and striping between disk/block devices). This is an effort you need to make before you start copying over files, because by then it could be too late to correct a size mismatch, and you'll simply find out when the destination file system or file set fills up before your copy/sync operation completes.
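
To make the size check concrete, here is a minimal sketch. It assumes 512-byte sectors (as in the bytes/sector line of the example label) and that you measured the source in kilobytes with something like du -sk. The 20% headroom is purely my illustration, not a Tru64 rule, and the function names are hypothetical.

```shell
# Convert a slice size in 512-byte sectors (the size column of the
# disklabel) to whole megabytes: 2048 sectors = 1 MB.
sectors_to_mb() {
    expr "$1" / 2048
}

# Succeeds when a slice of $2 sectors can hold $1 kilobytes of data
# with 20% headroom. One kilobyte is two 512-byte sectors.
slice_fits() {
    need_sectors=`expr "$1" \* 2 \* 120 / 100`
    [ "$need_sectors" -le "$2" ]
}

# Usage:
#   sectors_to_mb 786432          # slice a in the example label
#   slice_fits 300000 786432 && echo "300000 KB fits in slice a"
```

For the example label, slice a (786432 sectors) works out to 384 MB and slice g (4300800 sectors) to 2100 MB.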

File Copy Steps

The UFS file system is BSD's native file system. It's very reliable and tough, but it also lacks features such as journaling, logging, and some other more esoteric stuff. Its maximums are also much lower than AdvFS. The upshot of UFS is that it's extremely reliable and stable, gives you reasonably high performance, and has a lot of tools in user space and by 3rd parties that do things like recovery. It's also free-as-in-beer insomuch that DEC/Compaq/HP don't charge you any extra $$$ for what you do with it, unlike AdvFS which costs money to do anything but host a basic OS installation.

UFS is easy to clone. You can use any file copy utility to move the data, as long as the tool you pick can do a few things at a minimum. AdvFS is also easy to clone, but I recommend using vdump for that exclusively as it's the easiest to remember and the fastest method I've seen. For whatever reason, it's significantly faster than using tar or cpio. Whatever tool you choose to use, make sure it has these capabilities.

  1. It must be able to exclude files and directories that are not on the file system you are cloning, even if they are sub-directories of it. For example, you can't have something trying to copy /usr while it copies /: one being a sub-directory of the other doesn't change the fact that they are separate file systems. This means you can't use tar or cpio by itself to clone. They have to be combined with the find command, which can limit itself to one file system at a time with the -mount flag (also called -xdev on other platforms).
  2. It must be able to restore the file system to a different directory than it was backed up from. With tar this only works if you remembered to make the archive with relative paths. It has a flag for this (-C), but it still fails if the input archive has absolute paths. The problem with cpio, besides the fact that the people who wrote it must have been on drugs that helped them make sense of its opaque options and horrible manpage, is that it has the same problem with absolute file names that tar has.

Here are some actually valid ways to copy one file system at a time in isolation. I use the /usr file system as an example and my clone disk filesystem is on /new/usr. However, don't forget, you need to do this for all of the operating system UFS file systems or AdvFS file sets (whichever you have).

  1. CPIO in two easy steps: cd /usr ; find . -mount -depth -print | cpio -pdm /new/usr/ Whatever you do, don't forget the trailing slash, and make sure you run find with a relative path like I show. If you give find an absolute path, the files will end up in /new/usr/usr/, which isn't what you want and will definitely ruin the effort.
  2. Don't try to use tar. It has no combination of flags (at least on Tru64's tar) that will keep other file systems out of your archive. It'll work fine for terminal directories like /usr, but won't work worth a darn for the root file system. It'll create a huge mess in that situation as it'll capture ALL other file systems unless you go on some crusade to manually create an exclude file. It's too painful. If we had GNU Tar, fine, but don't use DEC's tar command for it since it's not modern enough to pull it off easily.
  3. If you managed to get an rsync binary (it does not come with any version of OSF/1, Digital Unix, or Tru64, but is available as freeware), you could use the form rsync -Pxvra /usr/ /new/usr/ and it'd work great.
  4. You can use the AdvFS vdump and vrestore commands. They will also work on UFS, which is really nice since this method is by far the easiest and fastest. Example: vdump -0f - /usr | vrestore -xf - -D /new/usr

Here are examples of what you might be tempted to try, but which won't work.

## This won't work!!!! You'll get all the file systems, not just root
## because they are all sub-directories of / !!! OSF/1's tar does have -C
## but it doesn't have a -xdev or -mount parameter like other Unixes. 
cd /
tar cvpf - . | tar -f - -xpP -C /new
## Whoops, you just filled up /new with files from /usr 
## before you got a chance to even create /usr!!

## This will also fail for the root file system. Tru64's tar
## has no duplicate detection and will archive some directories and 
## files multiple times, resulting in a huge archive. The restore
## operation should be close, though, if you have enough space to store
## the archive on (in this case) /backup
find / -mount -print >/tmp/file_list.txt
tar -cvp -R /tmp/file_list.txt -f /backup/root_tar.tar 
cd /new
tar -xvf /backup/root_tar.tar

There is a big problem with the fact that you just can't limit Tru64's version of tar to a single file system. That becomes a showstopper for archiving root or any file system with sub-directories which are mount points for other filesystems. It'll just cause you pain. Use one of the other methods instead.

You should not use dump and restore commands through a pipe for UFS. It won't work because the UFS version of restore has no flag to allow it to change directories before it starts the restore. It will always try to restore to the same path and that blows it up for any use as a cloning mechanism. Note that this doesn't apply to vrestore which absolutely does have that option and works great.
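
Whichever copy method you settle on, a rough sanity check afterwards is to compare how many files and directories ended up on each side. The sketch below is only that, a rough check with hypothetical function names. Counts can legitimately differ a little (for example a lost+found directory on a fresh UFS file system, or files that changed on a live source), and it relies on the same -mount flag to find discussed above.

```shell
# Count the files and directories under $1 without crossing mount points.
count_entries() {
    ( cd "$1" && find . -mount -print | wc -l | tr -d ' ' )
}

# Succeeds when source and destination hold the same number of entries.
same_entry_count() {
    src_count=`count_entries "$1"`
    dst_count=`count_entries "$2"`
    [ -n "$src_count" ] && [ "$src_count" = "$dst_count" ]
}

# Usage:
#   same_entry_count /usr /new/usr || echo "entry counts differ, investigate"
```

A mismatch does not tell you what is missing, only that it is worth looking before you reboot.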

Post Copy Fixes

Once you've re-created your slices (partitions) and got your data copied over then you are ready to begin making the final tweaks that will make sure the system is bootable.

Fix the Fstab

The first fix you'll need to make is to your /etc/fstab, but make sure you edit the right one! It's easy to get confused. So, make sure you are editing the file on your destination file system and not the source! You will need to update this file with any changes you made, such as the swap device (in 4.0 only), disk paths (for UFS only), or names of AdvFS file domains (if you changed them). If you are using a stock Tru64 5.1B system and AdvFS there is a pretty good chance that you won't need to make any changes, as the names of the default file domains and file sets won't change (those are root_domain#root and usr_domain#usr). For UFS systems there is a 100% chance you need to edit the /etc/fstab. It's going to point to a new disk (the one you put the new disklabel on and copied your data over to).
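
For the UFS case, the edit boils down to swapping the old disk name for the new one on every line of the clone's fstab. Here is a minimal sketch with a hypothetical function name. It assumes device paths that end in the disk name followed by a slice letter a to h (such as /dev/rz0a), it keeps the untouched file as fstab.orig, and you should still read the result by eye.

```shell
# Rewrite every reference to the old disk in the clone's fstab.
# $1 = path to the CLONE's fstab, $2 = old disk name, $3 = new disk name.
retarget_fstab() {
    cp -p "$1" "$1.orig" || return 1
    # Match "/<old disk><slice letter>" so rz0a changes but rz10a does not.
    sed "s|/$2\([a-h]\)|/$3\1|g" "$1.orig" > "$1"
}

# Usage (clone's root mounted on /new):
#   retarget_fstab /new/etc/fstab rz0 rz1
```

Pointing it at /etc/fstab instead of /new/etc/fstab would edit the running system, which is exactly the mistake to avoid.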

So, the bottom line is that you /might/ not have to alter the /etc/fstab if you run AdvFS because it abstracts the name of the disk. The system startup scripts refer to root_domain and usr_domain so do not rename them.
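
What AdvFS does need is for the disk symlinks under /etc/fdmns on the clone to point at the new disk. A minimal sketch of that relinking follows. The function name and the example device names are hypothetical, and it assumes the usual layout of one directory per file domain holding a symlink per disk slice.

```shell
# Point an AdvFS domain directory on the clone at the new disk slice.
# $1 = domain directory on the clone, e.g. /new/etc/fdmns/root_domain
# $2 = old link name, e.g. dsk0a
# $3 = new device path, e.g. /dev/disk/dsk1a
relink_domain() {
    new_name=`basename "$3"`
    rm -f "$1/$2" || return 1
    ln -s "$3" "$1/$new_name"
}

# Usage (clone's root mounted on /new):
#   relink_domain /new/etc/fdmns/root_domain dsk0a /dev/disk/dsk1a
#   relink_domain /new/etc/fdmns/usr_domain  dsk0g /dev/disk/dsk1g
```

Check the result with ls -l on the domain directory. The directory names themselves (root_domain, usr_domain) stay as they are.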

Fix the rc.config

The /etc/rc.config file is the main configuration file for Tru64 and Digital Unix systems. This file may contain a reference to swap which may tie the system back to the old disk. This needs to be altered or removed. You should edit the file, but be aware of something else. You don't want to edit the rc.config file if it's the one in use on your booted system. For a running system, you need to use a tool called rcmgr to make changes. However, because the cloning process generally has an opportunity to edit the cloned files before they are in use, you don't have to worry about this fact. You can simply make edits to the file and when it's used by the system at the time when you try to boot the clone, your edits will all be baked in.

The main thing you are looking for is any reference to swap on the old disk. It will occur in some kind of variable name and you can simply remove the whole line, or edit the line to point to your new disk's swap slice. The name of the variable will be something like “SWAPDEVICE=/dev/rz0b”.

Both Tru64 and Digital Unix (but especially Tru64) have a hardware registry which will store the names of disk devices that are seen by the system. In most cases, once a disk is seen, its name will not change even on the cloned disk (the registry will be copied over at the same time during the file copy steps).

Fix the Sysconfigtab

Another file you might have to alter is your /etc/sysconfigtab. This isn't always needed. I believe it's a difference between Tru64 and Digital Unix. Some versions of the startup scripts refer to this file, again for a swap device. It would be present in the section called vm:. If you see a swap device listed in that section, alter it to point to the new disk or remove it.
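
A quick way to see what is left to fix is to search the clone's copies of these three files for the old disk name. This is a minimal sketch with a hypothetical function name. It is a plain substring search, so a name like rz1 will also match rz10.

```shell
# List lines in the clone's config files that still mention the old disk.
# $1 = mount point of the cloned root, $2 = old disk name.
# Output format is file:line-number:line, as printed by grep -n.
find_stale_refs() {
    grep -n "$2" "$1/etc/fstab" "$1/etc/rc.config" "$1/etc/sysconfigtab" 2>/dev/null
}

# Usage (clone's root mounted on /new):
#   find_stale_refs /new rz0
```

No output means none of the three files mention the old disk any more.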

Final Steps

Ensure that you have completed these steps.

  1. Install the boot loader using disklabel
  2. Edit the disklabel on your target disk
  3. Re-create UFS or AdvFS file systems
  4. Copy files over from the original
  5. Fix the /etc/rc.config, /etc/sysconfigtab, and of course, the /etc/fstab

You should have done all these steps before you attempt to boot the new disk.

Final Boot

Now the system is ready to reboot. You probably want to understand a bit of interaction with what we call the SRM console. The main thing you want to do is to check the values of the following.

  1. show dev This will show you all the devices (NICs, HBAs, and of course disks). You need to know which disk is your source versus destination disk. The device list should have clues like the manufacturer name and the device model.
  2. The boot command takes the disk name as an argument. For example: “boot dka0” or “boot dqa0” would boot each of those disks respectively. Also, if you'd like to try single user mode, add the “-fl s” argument (if you do, then remember to use the bcheckrc command to make single user mode usable).
  3. The show command is the complement of the set command. These allow you to view and alter the values of SRM variables which alter boot and system behavior.
  4. Understand the variables that matter most like BOOTDEF_DEV which points to the default boot device on the system. Another you might want to understand is AUTO_ACTION which governs if the system will automatically try to boot up the system or halts at the SRM chevron prompt. The action names are boot or halt.

So, what do you normally need to do? Try to boot the clone but don't yet change the default boot device until you are ready to completely switch over to the clone.

Troubleshooting

Cloning was something that DEC intended folks to use sysman for. Unfortunately, their process is too inflexible for most use. So, this more manual method is needed. It is, unfortunately, a fault-prone process. Here are some of the normal issues.

The Drive will not Boot

If you issue the boot command from the SRM console but you never see the kernel line saying “UNIX Boot” then you probably had an issue with the boot sector. Do the following.

  1. Re-mount the target disk and make double sure that you have the kernel on the root file system. These would be in the form of two files named vmunix and genvmunix. Without a kernel, you can't boot the system. They should be there as a result of your file copy effort.
  2. Unfortunately, the most likely cause is that you didn't do the disklabel steps in the proper order. Zero the disklabel with the -z flag and start over. Do it in the proper order and you'll have better luck.

It Hangs During Boot

Depends on why and where it hangs. The most common issues are these.

  1. You forgot to edit out some kind of reference to the swap device. Check the post-copy steps again. One of the startup scripts probably tried to activate swap on a device that isn't there. Remember that you can use single user mode to fix these issues without doing a full re-install. References to swap could be in /etc/fstab (usually on 4.x or older systems) or inside /etc/sysconfigtab, which is usually the case on 5.x systems.
  2. You are using UFS and you forgot to fix the reference in /etc/fstab for one of the file systems. You might also have to edit any reference for swap, especially on Digital Unix 4.x. Also watch for any other file systems that might have changed or gone away.
  3. Make sure your copy method preserved all the permissions, especially on /sbin and the scripts in /sbin/init.d which are critical. Those scripts should be executable and owned by the root or bin users.
  4. Do NOT try to eliminate one of the default AdvFS file domains (one for root and another for /usr). As mentioned earlier, the startup scripts reference both root_domain and usr_domain and if you change their names or eliminate one of them the startup scripts will fail.
  5. Your SRM variables boot_file and/or boot_osflags may be incorrect and have old VMS data or some other garbage in them. Your boot file should be your kernel, which is usually /vmunix or /genvmunix. Your boot flags should be A or S but not a number; if it's a number, it came from VMS and it's wrong. People who re-use VMS systems for Tru64 will run into these problems often.

It Boots but It's Horked Up

  1. Double check your swap is pointing to the right place and working (swapon -s)
  2. Make sure your filesystems are not showing up with weird or generic names. Double check your source and destination device and make sure that your old device name isn't still left over in a config file somewhere. Most commonly it's the /etc/fstab or bad disk-symlinks in /etc/fdmns. It's also worth perusing /etc/sysconfigtab on 5.x for mistakes or old device names.
  3. If you use a new system type, make sure that any kernel tuning you carry over makes sense. For example, if you take parameters from a system that has 4GB of RAM and try to use them on a big GS1280 with 64GB of RAM then you are almost certainly going to have some bad tuning in there. Double check your kernel attribute settings in /etc/sysconfigtab and with sysconfig -q (for example, sysconfig -q vm).

If you have problems beyond the ones documented, then consider contacting PARSEC for some consulting work to help you!

how_to_clone_tru64_and_digital_unix.txt · Last modified: 2019/06/25 06:40 by sgriggs