Cloning Digital Unix and Tru64
There are various places on The Net you can find how to clone a Tru64 system, but most of them are just discussions and hand waving. The complete process is documented here.
Process Overview
Here is a quick and dirty view of this overall process.
- Find out what filesystems, disklabels, and disks you are using for the source of your clone
- Find out the name of the target disk you are cloning to and make sure it's not in use for AdvFS, UFS, swap, or raw disk partitions for a database like Oracle or Sybase.
- Determine if your disks are exactly the same. If so, you can use the dd method, which is faster since it lets you skip installing boot blocks, editing the disklabel, and copying file systems one at a time.
- Make the target disk bootable by doing proper commands in the right sequence (disklabel)
- Edit the disklabel if needed to fix any mismatches in capacity between source and destination
- Put a file system (UFS) or file domain and file set (AdvFS) on your target disk (or more than one which is usually needed).
- Copy the data over from the source disk, isolating only one file system at a time (cpio, vdump, pax, rsync, etc..)
- Fix the disk names in config files in /etc
- Fix the paths and links in /etc/fdmns if you are using AdvFS
- Reboot and test
Here are a few things you should not plan on doing as they will ruin your attempt.
- Do not plan on editing the disklabel with flags other than -e after you add the bootblocks.
- Do not try to skip the disklabel step (you won't get bootblocks).
- Do not try to eliminate the /usr file system for UFS and AdvFS. Startup scripts in /sbin/init.d will force you to have it and freak out (stopping the boot process) if you don't. Don't try to consolidate those unless you plan on manually editing every single startup script and fixing them, which isn't worth it.
- You might have read about sysman clone features. Forget that. That tool is garbage and will cause you much more pain than it saves you. Plan on doing it manually yourself or have an expert do it for you using this process.
- Do not plan on using the dd method unless you have completely identical disks. Otherwise it has several major drawbacks. The worst include performance problems (due to bad geometry alignment in the disklabel) and loss of space if the target disk is larger (or loss of data if the target disk is smaller). For completeness, if you cloned an identical dsk1 onto dsk2 in Tru64 5.x, the command would be: dd if=/dev/disk/dsk1c of=/dev/disk/dsk2c bs=1024k Additionally, remember that the dd method only works if you put the clone back on the same SCSI target as the source was on before it failed (PITA).
Take Inventory
Cloning can be complicated because of the different filesystems and storage layouts on your system. Before you begin, you need to know what kind of storage you are using and how it's laid out. We need to answer the following questions:
- What type of disks do we use (SCSI, IDE, RAID, SAN, etc..)?
- How are the disks partitioned?
- Is LSM in play? What about AdvFS?
- What type of filesystem is on the disks?
How can you find out these things? Well, I'd use the following command sequence to tell me.
# for Digital Unix and OSF/1 (version 1.0 through 4.0g)
scu show edt

# for Tru64 5.0 and 5.1
hwmgr show scsi

# Find out the names of all AdvFS file domains and what disks are in them
ls -lR /etc/fdmns

# Find out what is currently mounted *now* and if it's LSM, AdvFS, or UFS
mount

# Find out what the system is set up with for the root partition and more
cat /etc/fstab

# Show the disklabel of the root disk to see how it was set up, where
# "MyDisk" is a disk device like 'dsk0' (Tru64) or 'rz0' (Digital Unix)
disklabel -r MyDisk

# If we are using LSM on our root disk, I panic and run away. Don't
# try to clone this setup; recreate it and restore backups to it. LSM is
# really VxVM in disguise (no, really) and it has a completely different
# way of cloning I might cover in a whole separate document.
volprint -ht

# If that last command gives you "lsm:volprint: ERROR: IPC failure:
# Configuration daemon is not accessible" then GOOD. That means
# you aren't using LSM.
About LSM
LSM is a volume management scheme which is pretty much identical to Veritas Volume Manager (VxVM). This is because, at the time, DEC was able to secure a one-shot licensed copy of VxVM, but they agreed to change its name. So, if you know VxVM, then all you do is replace the string “vx” with the string “vol” on all the commands and they will work fine. For example, instead of using vxprint you use volprint, and so forth. Don't get me wrong, VxVM is a great volume manager; it's just that when they glued it onto Tru64, they made root disk encapsulation (putting your boot drives into VxVM and getting RAID-1 working so you can boot off both sides of the mirror) a HUGE pain in the neck to fix or maintain. It's pretty easy to install with it, but you end up with a giant white elephant. Most folks are just as likely to delete their surviving mirror as they are to resilver and fix their systems when using LSM.
Systems using LSM can be cloned, but you need to use LSM (VxVM) semantics and methods. In a nutshell, you create and break a three-way mirror. This isn't covered by this document (yet). It's a fairly long rat-hole most folks don't have to worry about.
The Boot Sector
In Tru64 and Digital Unix you need to make sure the disk has the proper boot blocks at the front of the disk. This is done using disklabel and it's not super-intuitive. You absolutely must use both the -rw and the -t flags when installing boot blocks. Without both flags the procedure will fail. Also, the behavior of the tool is a bit odd sometimes and the boot blocks don't get properly installed or get clobbered later on. So, when starting a clone, I'd suggest zeroing out the disklabel and re-installing it from scratch using the exact-right syntax, then doing only edits (using -e to disklabel) after that point. Here are a couple of examples. I'll use a disk name of rz0 in this case, but you should alter that to fit your system. Also, don't do all the steps, but just the ones that correspond with your file system type. Boot blocks have to be customized for the file system that you are using.
# Remove and zero the existing disklabel
disklabel -z rz0

# Install the boot blocks for an AdvFS root file system - ONLY FOR ADVFS
disklabel -rw -t advfs rz0

# Install the boot blocks for UFS on root - ONLY FOR UFS
disklabel -rw -t ufs rz0

# Set up your disklabel so that the partitions don't overlap
# and set the slice type to UFS, AdvFS, or swap
disklabel -e rz0
REMEMBER: You need to use UFS or AdvFS boot sectors, but not both. They are mutually exclusive.
You will find the boot sectors themselves under the /mdec directory. There are ones for UFS, AdvFS, and CDFS (iso9660 with rock ridge extensions). This is where disklabel goes to find them, and as you can see from the listing below, none of them are mixed case or have any unexpected spelling. It's a bit interesting.
# ls /mdec
bootblks      bootra.advfs  bootre        bootre.cdfs   bootrz.advfs  bootxx        bootxx.cdfs
raboot.advfs  reboot        reboot.cdfs   rzboot.advfs  xxboot        xxboot.cdfs   bootra
bootra.cdfs   bootre.advfs  bootrz        bootrz.cdfs   bootxx.advfs  raboot        raboot.cdfs
reboot.advfs  rzboot        rzboot.cdfs   xxboot.advfs
Editing the Disklabel
Tru64 and Digital Unix have a strong connection to BSD Unix. This is because OSF/1, which was the predecessor to Digital Unix (ie.. the 1.x - 3.x versions of the OS were called OSF/1), used BSD for the majority of its user space programs. Why re-invent all that good stuff when BSD set the de facto standard everyone was following for TCP/IP programs? Yes, the kernel is still mostly a microkernel and is just DEC's own thing (but resembles the Carnegie Mellon Mach operating system kernel a bit). DEC, IBM, and HP partnered to create OSF/1, but DEC was the only one who didn't eventually walk away from the partnership.
BSD disklabels are essentially just a text-file representation of the partition (slice) layout of your disk. Each block device (disks, SAN LUNs, floppies, etc..) can be sub-divided into smaller portions this way. Both Tru64 and Digital Unix use disklabels. It's just that in Tru64 you'll be working with disks in /dev/disk rather than the old “rz” and “re” style disks which lived in /dev under Digital Unix (4.x). The partitions determine how much space you'll have for your cloned disk's file domains or file systems. You've got to form some cogent plan for getting the data off the old disk and onto the new disk, and slice sizes are going to matter because they either directly determine the UFS filesystem size or they determine how much space you can add to an AdvFS file domain. Here is an example of a BSD disklabel.
# disklabel -r dsk0
# /dev/rdisk/dsk0c:
type: EIDE
disk: FIREBALLP AS10.
label:
flags: dynamic_geometry
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 16383
sectors/unit: 20066251
rpm: 4500
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size    offset    fstype  fsize  bsize  cpg  # ~Cyl values
  a:   786432         0     AdvFS                     #      0 - 780*
  b:   786432    786432      swap                     #   780* - 1560*
  c: 20066251         0    unused      0      0       #      0 - 19906*
  d:  6164462   1572864    unused      0      0       #  1560* - 7675*
  e:  6164462   7737326    unused      0      0       #  7675* - 13791*
  f:  6164463  13901788    unused      0      0       # 13791* - 19906*
  g:  4300800   1572864     AdvFS                     #  1560* - 5827*
  h: 14192587   5873664    unused      0      0       #  5827* - 19906*
At first, this output can look intimidating, but trust me, it's simple. Each slice with a letter represents a potential section of your disk, with the exception of the c slice, which always refers to the whole disk (thus its name of “whole disk slice”). The sector offsets and sizes determine the start position of the slice as well as how much data it will contain. Slices which are not labeled as unused should not overlap with anything but the c slice. Those which are labeled unused are just ignored. I recommend using the bc command or a pocket calculator to do the arithmetic. It's just addition and subtraction, nothing fancy. One slice's starting offset should match the previous slice's size + starting offset to avoid any overlap.
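For example, checking the sample disklabel above, each in-use slice starts exactly where the previous one ends:

a: 0       +   786432 =   786432   -> offset of b (no overlap)
b: 786432  +   786432 =  1572864   -> offset of g (no overlap)
g: 1572864 +  4300800 =  5873664   -> offset of h (no overlap)
h: 5873664 + 14192587 = 20066251   -> equals sectors/unit, the size of the c slice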
What you normally see is that slices a, b, g, and sometimes also h will be used. However, when you label disks for the first time all the slices will be marked as unused. It's important to review your current root disklabel data with the command disklabel -r original_disk so that you know what you are dealing with.
In general, if you are cloning a UFS based system, then be very careful that your disklabel is going to give you enough space for the / and /usr file systems. If you are using AdvFS, make sure that the slices you set aside add up to the sizes you need (ie.. remember that AdvFS can do concatenation, mirroring, and striping between disk/block devices). This is an effort you need to make before you start copying over files, because by then it could be too late to correct a size mismatch, and you'll simply find out when the destination file system or file set fills up before your copy/sync operation completes.
Creating File Systems
Once you have your disklabel set up on the clone, what you have is essentially a partition table with no file systems. It's now up to us to actually create file systems inside of those extents we just defined. Obviously, this is a different process depending on whether you use UFS or AdvFS.
Re-Creating an AdvFS Layout
The AdvFS system is a combination of a disk volume manager with a journaling filesystem. Copy-on-write features are sprinkled in there as well (snapshots, cloning, and such). First you need to understand AdvFS, and I don't really want to write a novel on it. See the man page for AdvFS for a more complete discussion.
AdvFS has two main overarching concepts. Domains and Filesets. AdvFS domains are abstractions you don't use directly. They are containers for the disks and the configuration. The Filesets are the actual file systems you really mount on a directory. The proper nomenclature for AdvFS is “FileDomain#FileSet” or as an example “DatabaseDomain#AccountingFileset”. This nomenclature is used rather than a path to a disk DSF file in /dev when you are mounting AdvFS filesets or referring to them in configuration files such as /etc/fstab.
If you are cloning an AdvFS system you have two problems to solve. The first is getting the AdvFS configuration planned and created. The second is mounting it. Both old and new disks need to be mounted simultaneously to allow a copy, but we must avoid namespace collisions in our AdvFS file domains (especially the reserved domains “root_domain” and “usr_domain” which must exist and, like Highlander, “there can be only one”). In many cases, you may even need to create a whole set of scripts to mount and update the cloned disk periodically so that its contents are not too stale and out of date to be any use. So, in that case there is an additional reason to want to have the clone disk set up with its own AdvFS domains and filesets on the source disk side all the time (we have to frequently update it).
Let's assume we are cloning disk dsk0 onto disk dsk1. All we have done so far is to simply clone the disklabel from the source disk and make dsk1 match. Let's create a local (source disk environment side) configuration for our clone's AdvFS filesets.
# Use slice dsk1a to create the copy of the root domain
mkfdmn /dev/disk/dsk1a target-root_domain

# We need a target for our usr_domain copy
mkfdmn /dev/disk/dsk1g target-usr_domain

# Now we must make filesets in the domains. Note that they have no size;
# that is because they, by default, can use all the space in the domain.
mkfset target-root_domain root
mkfset target-usr_domain usr
mkfset target-usr_domain var

# We need a place to mount these file sets. Let's create three new directories
# and use them as target mount points for our file copy jobs. I abbreviate
# in order to make it easier to navigate, but "t" is for "target" and remember
# this will be the root (/) of your cloned target disk when it boots for real.
mkdir /t
mount target-root_domain#root /t
mkdir /t/usr
mkdir /t/var
mount target-usr_domain#usr /t/usr
mount target-usr_domain#var /t/var
Note that in this case, the filesets for the /var and /usr file systems (filesets) are both using the same AdvFS file domain called “target-usr_domain”. This is totally fine and tells us that the source disk only had two real data slices on there (being “a” and “g” in this case). BTW, don't try to get clever like I did once and collapse all the filesystems down to just root. Tru64 hates that and will panic. It wants separate filesets for root, usr, and var if using AdvFS. I've never tried it for UFS, though.
So, if you are using the same layout as above (source of dsk0, target of dsk1, with two AdvFS data slices “a” and “g”) then you'd now be ready to get those filesets mounted so you can start copying over your data. We use the “target-” prefix for the clone so that we can avoid a namespace collision with the source operating system disk we are cloning. However, this solution also creates a problem. If we reboot using the clone, there will be stale information in the AdvFS configuration. It will actually try to boot off the original disk unless we fix our configuration in the /etc/fdmns directory and clean up our cloned /etc/fstab by commenting out any potentially problematic entries.
Let me first elaborate on the issues you'll need to correct in /etc/fdmns. This directory is a very simple construct. In this directory you'll find a sub-directory for each AdvFS file Domain you have created. Within those sub-directories there should ONLY be soft symbolic links to specific disk DSF device slices (for example: /dev/disk/dsk99a) owned by that AdvFS domain. This way each domain knows what disk slices it “owns”.
So, do a thought experiment… you are rebooting from the clone; it comes up and checks /etc/fdmns/root_domain, and if it sees a link to your old drive things are going to fail, but if it's a link to your new drive slice, it'll succeed. The trouble is, if you simply copy the /etc directory over, you'll get an exact copy with not-gonna-work configuration entries baked into your clone. We need to fix these before D-Day comes and we have to actually use/rely on the clone! Fortunately, doing so is quite easy. All we need to do is mount the clone's root fileset after we copy over the data and go manually fix things in /etc/fdmns with simple directory and link management commands like rmdir and ln. I will show you examples after I show you ways to actually copy the data over.
UFS Filesystem Setup
For UFS users the process is much simpler. You simply need to run the newfs command on each UFS partition. It's no more complicated than that. Here is an example; in this case we have three filesystems, one on dsk0a, and then two more: dsk0g and dsk0h. This is a common configuration I find on many Tru64 5.1 systems.
# newfs /dev/rdisk/dsk0a
# newfs /dev/rdisk/dsk0g
# newfs /dev/rdisk/dsk0h
That puts down all the superblocks and metadata information the filesystem uses internally to operate. It doesn't mean you've mounted those filesystems anywhere for use, yet. It also doesn't mean you have proper entries in the clone's /etc/fstab reflecting this configuration. So, keep in mind that we still have to do that and I'll show you how later on in this guide.
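Here is a minimal sketch of mounting those freshly created UFS file systems so they are ready for the copy step. The /target-* mount points match the names used in the next section, and the dsk0 slices are just the example disk from above; substitute your actual clone target's device names.

# Create mount points for the clone and mount the new, empty UFS file systems
mkdir -p /target-root /target-usr /target-var
mount -t ufs /dev/disk/dsk0a /target-root
mount -t ufs /dev/disk/dsk0g /target-usr
mount -t ufs /dev/disk/dsk0h /target-var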
File Copying For The Impatient
This assumes you have the target filesystems mounted up at /target-root, /target-usr, and /target-var. (If you followed the AdvFS example above, you mounted them at /t, /t/usr, and /t/var instead; adjust the paths in these commands accordingly.)
# Copy over the root filesystem
vdump -0 -f - / | (cd /target-root; vrestore -x -f -)

# Copy over the /usr filesystem
vdump -0 -f - /usr | (cd /target-usr; vrestore -x -f -)

# Copy over the /var filesystem
vdump -0 -f - /var | (cd /target-var; vrestore -x -f -)
File Copy Steps in Detail with Troubleshooting
The UFS file system is BSD's native file system. It's very reliable and tough, but it also lacks features such as journaling, logging, and some other more esoteric stuff. Its maximums are also much lower than AdvFS's. The upshot of UFS is that it's extremely stable, gives you reasonably high performance, and has a lot of tools in user space and from 3rd parties that do things like recovery. It's also free-as-in-beer insofar as DEC/Compaq/HP don't charge you any extra $$$ for what you do with it, unlike AdvFS, which costs money to do anything but host a basic OS installation. AdvFS, on the other hand, has advanced features like snapshots (copy-on-write), cloning, resiliency, striping, and others. If you want to see the full power of AdvFS on a system with no license limits, try it out on Linux first. It was ported there many years ago. AdvFS may now be just another has-been 1990's Unix project, but it was the real thing when it comes to providing some of the most interesting and useful storage features still in demand today.
UFS is easy to clone. You can use any file copy utility to move the data. However, you must ensure that the tool you pick can, at a minimum, do a few things. AdvFS is also easy to clone, but I recommend using vdump for that exclusively as it's the easiest to remember and the fastest method I've seen, speed-of-copy wise. For whatever reason, it's significantly faster than using tar or cpio, and it works on UFS as well as AdvFS (see below). Whatever tool you choose to use, make sure it has these capabilities.
- It must be able to exclude files/directories not on the same file system you are cloning, even if they are sub-directories of the file system you are cloning. Ie.. you can't have something trying to copy over /usr at the same time it gets /; the fact that one is a sub-directory of the other doesn't change the fact that they are separate file systems. This means you can't just use either tar or cpio by itself to clone. They have to be combined with the find command, which can limit itself to just one file system at a time with the -mount flag (also called -xdev on other platforms).
- It must be able to restore the filesystem to a different directory than it was backed up from. This is something tar only manages if you remembered to make the archive with relative paths. It has a flag for it (-C), but it still fails if the input archive has absolute paths. The problem with cpio, besides the fact that the people who wrote it must have been on drugs that helped them make sense of its opaque options and horrible manpage, is that it has the same problem with absolute file names that tar has.
Here are some actually valid ways to copy one file system at a time in isolation. I use the /usr file system as an example and my clone disk filesystem is on /new/usr. However, don't forget, you need to do this for all of the operating system UFS file systems or AdvFS file sets (whichever you have).
- You can use the AdvFS vdump and vrestore commands. They will also work on UFS, which is really nice since this method is by far the easiest and fastest. Example: vdump -0f - /usr | vrestore -xf - -D /new/usr or another very similar syntax: vdump -0 -f - / | (cd /target-root; vrestore -x -f -) Note that it won't cross filesystem boundaries (whew!) which makes our job much easier.
- CPIO in two easy steps: cd /usr ; find . -print -depth -mount | cpio -pdm /new/usr/ Whatever you do, don't forget the trailing slash, and don't provide an absolute path to your find command or you will regret it, as the files will end up in /usr/usr/ which isn't what you want. Also, make sure you use find like I show, with a relative path. Using an absolute path for the find command will definitely ruin the effort. I don't like or trust cpio and never have. YMMV.
- Don't try to use tar; it has no combination of flags (at least on Tru64's tar) that won't capture other file systems in your archive. It'll work fine for terminal directories like /usr, but won't work worth a darn for the root file system. It'll create a huge mess in that situation as it'll capture ALL other file systems unless you go on some crusade to manually create an exclude file. It's too painful. If we had GNU Tar, fine, but don't use DEC's tar command for it since it's not modern enough to pull it off easily.
- If you managed to get the rsync binary (which does not come with any version of OSF/1, Digital Unix, or Tru64 but is available as freeware), you could use the form rsync -Pxvra /usr/ /new/usr/ and it'd work great.
Here is an example of what you might want to try, but surely won't work.
## This won't work!!!! You'll get all the file systems, not just root
## because they are all sub-directories of / !!! OSF/1's tar does have -C
## but it doesn't have a -xdev or -mount parameter like other Unixes.
cd /
tar cvpf - . | tar -f - -xpP -C /new
## Whoops, you just filled up /new with files from /usr
## before you got a chance to even create /usr!!

## This will also fail for the root file system. Tru64's tar
## has no duplicate detection and will archive some directories and
## files multiple times, resulting in a huge archive. The restore
## operation should be close, though, if you have enough space to store
## the archive on (in this case) /backup
find / -mount -print >/tmp/file_list.txt
tar -cvp -R /tmp/file_list.txt -f /backup/root_tar.tar
cd /new
tar -xvf /backup/root_tar.tar
There is a big problem with the fact that you just can't limit Tru64's version of tar to a single file system. That becomes a showstopper for archiving root or any file system with sub-directories which are mount points for other filesystems. It'll just cause you pain. Use one of the other methods instead.
It's complicated to use regular UFS dump and restore commands through a pipe and I don't recommend it. It's difficult because the UFS version of restore has no flag to allow it to change directories before it starts the restore. It will always try to restore to the same path, and that blows it up for any use as a cloning mechanism unless you use a shell-based parenthesis operation to fix it, which most people find confusing and indirect. Note that this doesn't apply to vrestore, which absolutely does have an explicit option and works great.
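If you do want to go down that road anyway, here is a minimal sketch of the parenthesis trick; the subshell changes directory before restore runs, so the files land under the clone's mount point (/new/usr here, as in the earlier examples). The exact flag style of dump can vary a little between releases, so check your man pages first.

# The subshell makes restore unpack into the clone's /usr rather than /usr itself;
# restore's -r flag rebuilds a full dump into the current directory
dump 0f - /usr | (cd /new/usr && restore -rf -)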
Post Copy Fixes
Once you've re-created your slices (partitions) and got your data copied over then you are ready to begin making the final tweaks that will make sure the system is bootable.
Dynamic Disk Reconfiguration in Tru64 5.x Horks AdvFS
There are a lot of situations in which your clone might fail to boot, or might boot up crippled in single user mode, after cloning a Tru64 system. For storage and other items, Tru64 has a dynamic hardware database it tries to keep updated on its own. Rather than assign names based on absolute hardware locations, like Digital Unix 4.x before it, it does something completely new. I'm not sure who thought it was a good idea or why, but it turns out to be pretty sub-optimal in practice. One of many negative side effects of adopting this system is that it causes problems for anyone wanting to clone a root disk. You're in a chicken and egg scenario. How do you know what links to create for “root_domain” and “usr_domain” in /etc/fdmns so that the proper disk slices will be symlinked there when the system boots up? You might know the slice letters such as “a”, “h”, or “g” but you don't know what disk name Tru64 will assign to your cloned root disk once the auto-configuration scripts kick off. What you thought might be dsk1 could turn out to be dsk3, etc…
There are two strategies for addressing this issue:
- If you clone the OS image from the workstation you use to do the vrestore operations, you can safely assume the disk name will stay the same. Why? Because you are copying the same hardware database that the system has booted up with. Therefore, you're going to get the same disk name as you had when you created the file domain, mounted the target file sets, and performed the restore. So, in this case, you should have a clean booting clone. This is especially true if you are booting the disk on the same physical system (ie.. your clone was for a backup root disk should your primary fail).
- In other cases, you are restoring vdumps from a non-local system. In this case, you're getting the hardware database that came from those vdumps (the database files are in /etc). In this situation, you have literally no idea what the disk number will be. The best strategy is to boot the system into single user mode, then observe whatever disk name the root disk came up with. Then, if possible, run bcheckrc from your single-user session, and fix your links in /etc/fdmns to point to the new disk name. If that fails then do the same process after booting optical media or a different hard disk drive. Once you know the disk name the system is going to pick, it's just a matter of fixing /etc/fdmns sub-directories and updating their symlinks.
Digital Unix users can ignore all that and just stick to bus, SCSI ID, and LUN identifiers with the confidence of knowing they will stay deterministic.
Fix AdvFS Domains in /etc/fdmns
Once you finish copying data for the root fileset in AdvFS you'll have the exact same layout as the running system. The trouble is, that's not what we need on the clone. It needs to have its own disk name reflected in the symbolic links present in /etc/fdmns. In Tru64 5.1 there are three required file systems. These are the root file system (/), /usr, and /var. They must be present in either two or three AdvFS file domains with specific names. If not, the boot scripts will halt/fail, so do not try to “simplify” down to just having one root filesystem or something like that. The system will not cooperate.
At a minimum, you need domains called “root_domain” and “usr_domain”; the reason is that these are hardcoded into the kernel and some of the system startup scripts. We now need to have the cloned disk slices in those required domains, rather than the originals.
We have a problem because the system we use to create and copy the cloned data will have already given the disk a name. However, once you reboot using the prepared target disk, it will be using the hardware database you just restored. So, where did you restore it from? That is the critical question. If you cloned the machine you used for the copy & restore operation, you can be sure the disk name will stay the same. For example, if it was 'dsk2' when you copied the data to it, it'll be dsk2 when you boot it up. Why? Because you copied the hardware database in /etc after the disk had already been named. If the disk is further cloned, say by copying a RAID-1 hardware mirror, it'll end up being renamed again by Tru64. This sucks because it'll cause problems on the first boot of your cloned system.
So, consider what might happen if you restored a vdump from another system, instead of your cloning workstation. You really don't know what the disk name will be, do you? The same is true of a RAID-1 based clone. So, here's what's going to happen. You'll need to boot up your system, observe what disk name the system believes its OS disks are called, then (if it's different from the link structure in the target /etc/fdmns directory) you'll need to go in and correct the disk symlinks in /etc/fdmns. Odds are, the first boot will drop you into single-user mode anyway and complain it cannot mount /usr.
Thus, before this clone ever boots, we need to fix this issue. Since I don't know your exact scenario, I can't give a one-size-fits-all solution. What you need is for the system to boot once, then see what the dynamic hardware configuration will name the “new” disk it sees. Then, you need to update the links in /etc/fdmns to reflect the new name. This is a bit of a chicken and egg scenario. However, what generally works is to boot the system up in single user mode, then run bcheckrc, and finally, check the output of hwmgr show scsi to find the new disk name, read the disklabel -r dskX output, and fix the links in /etc/fdmns.
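Here is a rough sketch of that single user mode session; dka100 and dsk3 are placeholder device names, so substitute whatever show dev and hwmgr actually report on your system.

# From the SRM console, boot the clone into single user mode
>>> boot -fl s dka100

# At the single user shell:
/sbin/bcheckrc          # check and mount local file systems so single user mode is usable
hwmgr show scsi         # see what name the clone's root disk was actually given
disklabel -r dsk3       # confirm the slices look like the disk you cloned
# ...then fix the symlinks in /etc/fdmns as shown in the example further down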
If you are building a cloned disk for the same system you are doing the cloning on, you can be pretty sure the disk name will remain constant with whatever you booted with, now.
I do not know exactly how Tru64 tracks disk devices. It might be by WWN or some other identifier. However, it takes very little to “invalidate” the hardware database entry for a disk and thus make it appear “new”. In this instance Tru64 gives the disk a different name. What might have originally been dsk0 may now become dsk3, for example. This is a downside of the way Tru64 5.x was designed and, frankly, I'd rather work with disks in Digital Unix any day just for its simple determinism.
I can at least give you detail on the process of fixing this. Start with a session like the one in my example.
# Change into the AdvFS configuration directory
cd /etc/fdmns

# See what you have right now. See which disks are linked in each sub-directory?
# Those sub-directories have the same names as all your AdvFS domains.
# ls -lR
total 32
drwxr-xr-x  2 root  system  8192 Dec 10  2019 root_domain
drwxr-xr-x  2 root  system  8192 Dec 10  2019 usr_domain
drwxr-xr-x  2 root  system  8192 Dec 11  2020 target-root_domain
drwxr-xr-x  2 root  system  8192 Dec 11  2020 target-usr_domain

./root_domain:
total 0
lrwxr-xr-x  1 root  system  15 Dec 10  2019 dsk0a -> /dev/disk/dsk0a

./target-root_domain:
total 0
lrwxr-xr-x  1 root  system  15 Dec 11  2020 dsk1a -> /dev/disk/dsk1a

./usr_domain:
total 0
lrwxr-xr-x  1 root  system  15 Dec 10  2019 dsk0g -> /dev/disk/dsk0g

./target-usr_domain:
total 0
lrwxr-xr-x  1 root  system  15 Dec 11  2020 dsk1g -> /dev/disk/dsk1g

# You can see from that structure that the system would try to use disk dsk0,
# which is our original. The clone is dsk1. So, your easiest option is to manually
# fix these links before the clone ever tries to boot off this disk.

# Out go the references to the never-to-be-seen-again dsk0;
# we need to link to the new target disk instead.
rm /etc/fdmns/root_domain/dsk0a
rm /etc/fdmns/usr_domain/dsk0g

# In go our cloned slices which are now promoted to production/running
# names "root_domain" and "usr_domain" instead of whatever you called
# them temporarily while you were copying the data over.
ln -s /dev/disk/dsk1a /etc/fdmns/root_domain
ln -s /dev/disk/dsk1g /etc/fdmns/usr_domain
In that situation, we just adjusted the disk names from using dsk0 to dsk1. We might have done this because we know the clone disk will be going back into the same system and the Tru64 disk name will stay static. We might also have done this after booting into single user mode, and discovering we had to fix the links because Tru64 decided it wanted a new name for the root disk (it happens for many reasons). In either case, the fix is the same:
- Start by learning what disks you have and which are “real” in the hardware database: run hwmgr show scsi
- Read the disklabel of each disk to understand what's on it. disklabel -r dskX
- Update the links for AdvFS in /etc/fdmns. Keep in mind the updated disk will need to replace the old disk name in all the symbolic soft links in any AdvFS file domain directories in /etc/fdmns.
- Check for usability with showfsets mydomain. If that works, your domain is usable (see the quick check after this list).
- Clean up any old/stale file domains that no longer exist. Simply remove the directories in /etc/fdmns.
- Remember to never remove root_domain or usr_domain. They can never be combined.
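A quick sanity check after the relinking, using the domain names from the example above:

# Each domain should list its filesets without errors
showfsets root_domain     # should show the "root" fileset
showfsets usr_domain      # should show the "usr" and "var" filesets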
What about the fact that /etc/fdmns is also cluttered with references to a clone that would actually be the root disk if it were booted on? The “clone” then gets promoted to being the OS disk and there is no clone. So, our clone's /etc/fdmns should have no reference to “target” or cloned AdvFS domains. You can simply remove those directories now, if you wish. If you don't, then worry not: they won't cause any harm. You can clean them up after your first successful boot if you are paranoid.
# Be careful with this command since it's recursive. Double check your syntax,
# but afterwards you will have no stale reference to the no-longer-clone.
rm -rf /etc/fdmns/target-root_domain /etc/fdmns/target-usr_domain
Fix the Fstab
The first fix you'll need to make is to your /etc/fstab, but make sure you edit the right one! It's easy to get confused. So, make sure you are editing the file on your destination file system and not the source! You will need to update this file with any changes you made such as the swap device (in 4.0 only), disk paths (for UFS only), or names of AdvFS file domains (if you changed them). If you are using a stock Tru64 5.1B system and AdvFS there is a pretty good chance that you won't need to make any changes, as the names of the default file domains and file sets won't change (those are root_domain#root and usr_domain#usr). For UFS systems there is a 100% chance you need to edit the /etc/fstab. It's going to point to a new disk (the one you put the new disklabel on and copied your data over to).
So, the bottom line is that you /might/ not have to alter the /etc/fstab if you run AdvFS because it abstracts the name of the disk. The system startup scripts refer to root_domain and usr_domain, so do not rename them. However, when you mounted up your clone disk's filesystems or filesets you might have made some changes to your source system's /etc/fstab that now got copied over to the clone. In many cases, we need to NOT have any reference to the original disk (duh! It's broken in this context). So, let's not have any references to any clone file sets or anything but a nice clean root, /usr, and /var setup. Remember that AdvFS doesn't have any disk names in its mounting device nomenclature. Instead, you'd have had to fix the symbolic links in /etc/fdmns to make sure the slices will be what your cloned disk expects to boot successfully.
# Here is a working, normal, ordinary, average, Tru64 /etc/fstab for AdvFS
root_domain#root    /       advfs   rw  0 1
/proc               /proc   procfs  rw  0 0
usr_domain#usr      /usr    advfs   rw  0 2
usr_domain#var      /var    advfs   rw  0 2
What about UFS? Here is an example of a valid fstab for UFS
# Here is a Tru64 5.1 /etc/fstab from a UFS based system.
# Note the sysadmin took advantage of a 2nd disk for his users'
# home directories. This would be ignored by our cloning process.
/dev/disk/dsk2a     /           ufs  rw  1 1
/dev/disk/dsk2g     /usr        ufs  rw  1 2
/dev/disk/dsk2h     /var        ufs  rw  1 2
/dev/disk/dsk3c     /usr/users  ufs  rw  1 2
Fix the rc.config
The /etc/rc.config file is the main configuration file for Tru64 and Digital Unix systems. This file may contain a reference to swap which may tie the system back to the old disk. This needs to be altered or removed. You should edit the file, but be aware of something else. You don't want to edit the rc.config file if it's the one in use on your booted system. For a running system, you need to use a tool called rcmgr to make changes. However, because the cloning process generally has an opportunity to edit the cloned files before they are in use, you don't have to worry about this. You can simply make edits to the file, and when the system reads it as the clone boots, your edits will all be baked in.
The main thing you are looking for is any reference to swap on the old disk. It will occur in some kind of variable name and you can simply remove the whole line, or edit the line to point to your new disk's swap slice. The name of the variable will be something like “SWAPDEVICE=/dev/disk/dsk2b” (or whatever your actual swap slice is).
Both Tru64 and Digital Unix (but especially Tru64) have a hardware registry which stores the names of disk devices that are seen by the system. In most cases, once a disk is seen, its name will not change, even on the cloned disk (the registry will be copied over at the same time during the file copy steps). However, in the event your disk name changes, don't forget to change the swap device, too. The system can hang if you specify a bogus device.
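As a minimal sketch, assuming the clone's root is still mounted at /t as in the AdvFS example and that your variable is named SWAPDEVICE (the exact name can vary by release, so match what is already in your file):

# Look for any swap reference in the clone's copy of rc.config
grep -i swap /t/etc/rc.config

# If it still points at the old disk, edit that line, for example changing
#   SWAPDEVICE=/dev/disk/dsk0b
# to the clone's swap slice
#   SWAPDEVICE=/dev/disk/dsk1b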
Fix the Sysconfigtab
Another file you might have to alter is your /etc/sysconfigtab. This isn't always needed. I believe it's a difference between Tru64 and Digital Unix. Some versions of the startup scripts will refer to this file, again for a swap device. It would be present in the section called vm:. If you see a swap device listed in that section, alter it to point to the new disk or remove it.
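For reference, a hedged example of what such a stanza can look like in /etc/sysconfigtab; attribute names can differ between releases, so match whatever is already in your file rather than copying this verbatim.

vm:
        swapdevice = /dev/disk/dsk0b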
Final Steps
Insure that you have completed these steps.
- Install the boot loader using disklabel
- Edit the disklabel on your target disk
- Re-create UFS or AdvFS file systems
- Copy files over from the original
- Fix the /etc/rc.config, /etc/sysconfigtab, and of course, the /etc/fstab
You should have done all these steps before you attempt to boot the new disk.
Final Boot
Now the system is ready to reboot. You probably want to understand a bit of interaction with what we call the SRM console. The main thing you want to do is to check the values of the following.
- show dev: This will show you all the devices (NICs, HBAs, and of course disks). You need to know which disk is your original versus your clone/destination disk. The device list should have clues like the manufacturer name and the device model.
- The boot command takes the disk name as an argument. For example: “boot dka0” or “boot dqa0” would boot each of those disks respectively. Also, if you'd like to try single user mode you'll want to use the “-fl s” argument (if you do, remember to run the bcheckrc command to make single user mode usable).
- The show command is the complement of the set command. These allow you to view and alter SRM variables which alter boot and system behavior.
- Understand the variables that matter most, like BOOTDEF_DEV, which points to the default boot device on the system. Another you might want to understand is AUTO_ACTION, which governs whether the system will automatically try to boot or will halt at the SRM chevron prompt. The action names are boot or halt.
So, what do you normally need to do? Try to boot the clone but don't yet change the default boot device until you are ready to completely switch over to the clone.
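A rough sketch of that first test boot from the SRM console follows; dka100 is a placeholder device name, so use whatever show dev reports for your clone disk.

>>> show dev                      # identify the clone disk among the devices
>>> show bootdef_dev              # note the current default so you can leave it alone
>>> boot -fl s dka100             # first try the clone in single user mode
>>> boot dka100                   # then try a full multi-user boot
>>> set bootdef_dev dka100        # only once you're ready to switch over for good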
Troubleshooting
Cloning was something that DEC intended folks to use sysman for. Unfortunately, that process is too inflexible for most uses, so this more manual method is needed. It is, however, a fault-prone process. Here are some of the common issues.
The Drive will not Boot
If you issue the boot command from the SRM console but you never see the kernel line saying “UNIX Boot” then you probably had an issue with the boot sector. Do the following.
- Re-mount the target disk and make double sure that you have the kernel on the root file system. These would be in the form of two files named vmunix and genvmunix. Without a kernel, you can't boot the system. They should be there as a result of your file copy effort.
- Unfortunately, the most likely cause is that you didn't do the disklabel steps in the proper order. Zero the disklabel with the -z flag and start over. Do it in the proper order and you'll have better luck (see the sketch after this list).
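To recap, a minimal recovery sketch for an AdvFS root, assuming your clone disk is dsk1 (use -t ufs instead if your root is UFS):

disklabel -z dsk1               # wipe the bad label
disklabel -rw -t advfs dsk1     # write a fresh label with AdvFS boot blocks
disklabel -e dsk1               # then only edit from here on out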
It Hangs During Boot
Depends on why and where it hangs. The most common issues are these.
- You forgot to edit out some kind of reference to the swap device. Check the post-copy steps again. One of the startup scripts probably tried to activate swap on a device that doesn't exist. Remember that you can use single user mode to fix these issues without doing a full re-install. References to swap could be in the /etc/fstab (usually on 4.x or older systems) and could also be inside your /etc/sysconfigtab, which is usually the case on 5.x systems.
- You are using UFS and you forgot to fix the reference to the /etc/fstab for one of the file systems. You might also have to edit any reference for swap, especially on Digital Unix 4.x. Also pay attention for any other filesystems that might have changed or gone away.
- Make sure your copy method preserved all the permissions, especially on /sbin and the scripts in /sbin/init.d which are critical. Those scripts should be executable and owned by the root or bin users.
- Do NOT try to eliminate one of the default AdvFS file domains (one for root and another for /usr). As mentioned earlier, the startup scripts reference both root_domain and usr_domain and if you change their names or eliminate one of them the startup scripts will fail.
- Make sure your SRM variables for boot_file and boot_osflags aren't incorrect, with old VMS data in there or some other garbage. Your boot file should be your kernel, which is usually /vmunix or /genvmunix. Your boot flags should be A or S but not a number; if it's a number, it came from VMS and it's wrong. People who re-use VMS systems for Tru64 will run into these problems often. Also check the OS_TYPE variable and make sure it's UNIX, not VMS.
- Are you absolutely sure you prepared your disk with the proper disklabel command, including the -t flag, so that it got the correct boot loader? If you missed this you wasted a lot of time and will need to start over from that step!
It Boots but It's Horked Up
- Double check your swap is pointing to the right place and working (swapon -s)
- Make sure your filesystems are not showing up with weird or generic names. Double check your source and destination devices and make sure that your old device name isn't still left over in a config file somewhere. Most commonly it's the /etc/fstab or bad disk symlinks in /etc/fdmns. It's also worth perusing /etc/sysconfigtab on 5.x for mistakes or old device names.
- Make sure, if you move to a new system type, that any kernel tuning you carry over makes sense. Ie.. if you take parameters from a system that has 4GB of RAM and try to use them on a big GS1280 with 64GB of RAM then you are almost certainly going to have some bad tuning in there. Double check your kernel subsystem settings with sysconfig -q (for example, sysconfig -q vm).
- In general, you should make sure that the symlinks in the /etc/fdmns sub-directories point to real disks that actually exist (check with disklabel -r dskX). Make sure file domains can show their file sets with the showfsets command to check for domain health. A combined sanity check appears after this list.
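Pulling those checks together into one quick post-boot pass (dsk1 is just the example clone disk from earlier; use your actual device name):

swapon -s               # is swap active on the slice you expect?
mount                   # are /, /usr, and /var mounted from the clone's devices?
ls -lR /etc/fdmns       # do the domain links point at the clone's slices?
disklabel -r dsk1       # confirm the slices really exist on the disk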
If you have problems beyond the ones documented, then consider contacting PARSEC for some consulting work to help you!
A Note About AlphaServer Compaq Smart Array Controllers
Early models of the Compaq Smart Array controller are available for Alpha hardware and feature RAID levels 1 through 10, including distributed parity. They are configured via the “ORCA” utility from the SRM console. If you have a graphics console and a real PS/2 keyboard you can enter ORCA by catching the boot console message which will urge you to hit F8 if you want to enter the array configuration tool. If that doesn't work, you can find the RAID card's identifier by doing a show dev on the SRM console; then you can manually load the RAID BIOS using the syntax load bios dky0 (replacing dky0 with whatever your actual RAID card's ID is).
These array controllers have RAID-1 features. If you're using a system with such a card, you might be tempted to copy RAID volumes by breaking RAID-1 mirrors and re-synchronizing them to disks you intend as clones. This way you can import the RAID configuration on the clone-target system and everything is supposed to work, right? After all, it's a bit-for-bit clone!
Well, yes and no. First of all there are some drawbacks to this method. Here are some absolute show stoppers for example:
- You cannot do this with anything other than RAID-1 or RAID-10 based arrays. RAID-5 is right out as you cannot split the array and maintain enough parity in two places at once. RAID-6 or RAID-DP is also unworkable due to this issue. Mirroring or bust.
- Your disks must be as large or larger than the original disks you are cloning. If you do use mismatched disks, your resulting logical drive will just waste the extra space on the larger drive.
Other potentially troublesome issues include:
- The copy time will run for the full capacity of the disk, not just the modified blocks on the array. Thus, if you already have some small vdump backup files, consider that you might be able to vrestore those faster than the RAID controller can mirror the entire drive.
- If you use a larger disk as the sync-target, you'll just lose whatever capacity it has beyond the size of the disk you are cloning. Clone a 36G disk onto a 300G disk and it's still going to appear as a 36G disk because of the disklabel.
- The target/cloned disk will definitely get a new disk ID if it's Tru64 version 5.0 or newer. This is because of its awful AIX-like dynamic hardware abstraction crapola layer. Not that I'm salty about this not being an issue whatsoever in Digital Unix 4.0 or anything like that.
- Even uglier: once you rebuild the mirror on the source machine's RAID controller you might have a system that boots into single user mode and freaks out. This is, again, due to the stupid changing disk name game caused by hardware abstraction and auto-naming. You can either go through the hassle of deleting the linkage to the name with the dsfmgr horror-show, or you can fix the symbolic links in the file domains found in /etc/fdmns, which I think is a bit of an easier way to go. Both work. This is only an issue in Tru64, not Digital Unix or OSF/1.