====== How To Clone Tru64 and Digital Unix ======

  - Do not try to eliminate the /usr file system for UFS __and__ AdvFS. Startup scripts in **/sbin/init.d** will force you to have it and freak out (stopping the boot process) if you don't. Don't try to consolidate those unless you plan on manually editing every single startup script and fixing them, which isn't worth it.
  - You might have read about **sysman** clone features. Forget that. That tool is garbage and will cause you much more pain than it saves you. Plan on doing it manually yourself or have an expert do it for you using this process.
  - Do not plan on using the **dd** method unless you have completely identical disks. Otherwise it has several major drawbacks. The worst include performance problems (due to bad geometry alignment on the disklabel) and loss of space if the target disk is larger (or loss of __data__ if the target disk is smaller). However, just so the actual command is on record: if you were cloning an identical dsk1 onto dsk2 in Tru64 5.x, it would be **dd if=/dev/disk/dsk1c of=/dev/disk/dsk2c bs=1024k**. Additionally, remember that the **dd** method only works if you put the clone back on the same SCSI target the source was on before it failed (PITA).
  
==== Take Inventory ====
  
<code>
# for Digital Unix and OSF/1 (version 1.0 through 4.0g)
scu show edt

# for Tru64 5.0 and 5.1
hwmgr show scsi
  
  
# If we are using LSM on our root disk, I panic and run away. Don't
# try to clone this setup; recreate it and restore backups to it. LSM is
# really VxVM in disguise (no, really) and it has a completely different
# way of cloning I might cover in a whole separate document.
volprint -ht

# If that last command gives you "lsm:volprint: ERROR: IPC failure:
# Configuration daemon is not accessible" then GOOD. That means
# you aren't using LSM.
</code>
  
LSM is a volume management scheme which is pretty much identical to Veritas Volume Manager (VxVM). This is because, at the time, DEC was able to secure a one-shot licensed copy of VxVM, but they agreed to change its name. So, if you know VxVM, all you do is replace the string "vx" with the string "vol" on all the commands and they will work fine. For example, instead of using **vxprint** you use **volprint** and so forth. Don't get me wrong, VxVM is a great volume manager; it's just that when they glued it onto Tru64, they made root disk encapsulation (putting your boot drives into VxVM and getting RAID-1 working so you can boot off both sides of the mirror) a HUGE pain in the neck to fix or maintain. It's pretty easy to install with it, but you end up with a giant white elephant. Most folks are just as likely to delete their surviving mirror as they are to resilver and fix their systems when using LSM.
  
Systems using LSM can be cloned, but you need to use LSM (VxVM) semantics and methods. In a nutshell, you create and break a three-way mirror. This isn't covered by this document (yet). It's a fairly long rat-hole most folks don't have to worry about.
  
==== The Boot Sector ====
__REMEMBER__: You need to use UFS or AdvFS boot sectors, but not both. They are mutually exclusive.
  
You will find the boot sectors themselves under the **/mdec** directory. There are ones for UFS, AdvFS, and CDFS (iso9660 with rock ridge extensions). This is where **disklabel** goes to find them, and thus you can see that none of them are mixed case or have any unexpected spelling. It's a bit interesting to go look at.
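
To make the slice arithmetic concrete, here is a purely illustrative (made-up) slice table of the kind **disklabel -r** prints; your real output will have different sizes, extra header fields, and possibly different fstype values:

<code>
8 partitions:
#          size     offset    fstype   [fsize bsize cpg]
  a:     524288          0     AdvFS                       # root
  b:    2097152     524288      swap                       # starts at 0 + 524288
  c:   17773524          0    unused        0     0        # the whole disk
  g:   15152084    2621440     AdvFS                       # starts at 524288 + 2097152
</code>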
  
At first, this output can look intimidating, but trust me, it's simple. Each slice with a letter represents a potential section of your disk, with the exception of the ''c'' slice, which always refers to the whole disk (thus its name of "whole disk slice"). The sector offsets and sizes determine the start position of the slice as well as how much data it will contain. Slices which are not labeled as ''unused'' should not overlap with anything but the ''c'' slice. Those which are labeled ''unused'' are just ignored. I recommend using the **bc** command or a pocket calculator to do the arithmetic. It's just addition and subtraction, nothing fancy. One slice's starting offset should match the previous slice's size + starting offset to avoid any overlap.
  
What you normally see is that slices ''a'', ''b'', ''g'', and sometimes also ''h'' are in use. However, when you label disks for the first time all the slices will be marked as ''unused''. It's important to review your current root disklabel data with the command **disklabel -r original_disk** so that you know what you are dealing with.
  
In general, if you are cloning a UFS based system, be very careful that your disklabel is going to give you enough space for the **/** and **/usr** file systems. If you are using AdvFS, make sure that the total slices you set aside can add up to the sizes you need (i.e. remember that AdvFS can do concatenation, mirroring, and striping between disk/block devices). This is an effort you need to make before you start copying over files, because by then it could be too late to correct a size mismatch and you'll simply find out because the destination file system or file set will fill up before your copy/sync operation completes.
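
Before moving on to file systems, the label itself has to get onto the clone. Here is a minimal sketch of one way to do that, assuming **dsk0** is the source, **dsk1** is the target, and you want AdvFS boot blocks; the ''-R'' flag (write a label from a prototype file) and the ''-t'' flag (pick which boot blocks from **/mdec** to install) are what I mean, but check the **disklabel** man page on your release before trusting my memory of the exact syntax:

<code>
# Save the source disk's label as an editable prototype file
disklabel -r dsk0 > /tmp/dsk0.label

# Review/edit the prototype if the target's geometry or sizes differ
vi /tmp/dsk0.label

# Write the label onto the clone target and install the AdvFS boot blocks
disklabel -R -r -t advfs dsk1 /tmp/dsk0.label
</code>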
  
==== Creating File Systems ====

Once you have your disklabel set up on the clone, what you have is essentially a partition table with no file systems. It's now up to us to actually create file systems inside of those extents we just defined. Obviously, this is a different process depending on whether you use UFS or AdvFS.

===== Re-Creating an AdvFS Layout =====

The AdvFS system is a combination of a disk volume manager with a journaling filesystem. Copy-on-write features are sprinkled in there as well (snapshots, cloning, and such). First you need to understand AdvFS, and I don't really want to write a novel on it. See the [[https://nixdoc.net/man-pages/tru64/advfs.4.html|man page for AdvFS]] for a fuller discussion.

AdvFS has two main overarching concepts: domains and filesets. AdvFS domains are abstractions you don't use directly. They are containers for the disks and the configuration. The filesets are the actual file systems you really mount on a directory. The proper nomenclature for AdvFS is "FileDomain#FileSet", for example "DatabaseDomain#AccountingFileset". This nomenclature is used rather than a path to a disk DSF file in /dev when you are mounting AdvFS filesets or referring to them in configuration files such as **/etc/fstab**.

If you are cloning an AdvFS system you have two problems to solve. The first is getting the AdvFS configuration planned and created. The second is mounting it. Both old and new disks need to be mounted simultaneously to allow a copy, but we must avoid namespace collisions in our AdvFS file domains (especially the reserved domains "root_domain" and "usr_domain" which **must** exist and, like Highlander, "there can be only one"). In many cases, you may even need to create a whole set of scripts to mount and update the cloned disk periodically so that its contents are not too stale and out of date to be any use. So, in that case there is an additional reason to want to have the clone disk set up with its own AdvFS domains & filesets on the **source** disk side all the time (we have to frequently update it).

Let's assume we are cloning disk **dsk0** onto disk **dsk1**. All we have done so far is to simply clone the disklabel from the source disk and make **dsk1** match. Let's create a local (source disk environment side) configuration for our clone's AdvFS filesets.

<code>
# Use slice dsk1a to create the copy of the root domain
mkfdmn /dev/disk/dsk1a target-root_domain

# We need a target for our usr_domain copy
mkfdmn /dev/disk/dsk1g target-usr_domain

# Now we must make filesets in the domains. Note that they have no size;
# that is because they, by default, can use all the space in the domain.
mkfset target-root_domain root
mkfset target-usr_domain usr
mkfset target-usr_domain var

# We need a place to mount these file sets. Let's create three new directories
# and use them as target mount-points for our file copy jobs. I abbreviate
# in order to make it easier to navigate, but "t" is for "target" and remember
# this will be the root (/) of your cloned target disk when it boots for real.
mkdir /t
mount target-root_domain#root /t
mkdir /t/usr
mkdir /t/var
mount target-usr_domain#usr /t/usr
mount target-usr_domain#var /t/var
</code>

Note that in this case, the filesets for the **/var** and **/usr** file systems are both using the same AdvFS file domain called "target-usr_domain". This is totally fine and tells us that the source disk only had two real data slices on there (being "a" and "g" in this case). BTW, don't try to get clever like I did once and collapse all the filesystems down to just root. Tru64 hates that and will panic. It wants separate filesets for root, usr, and var if using AdvFS. I've never tried it for UFS, though.

So, if you are using the same layout as above (source of **dsk0**, target of **dsk1**, with two AdvFS data slices "a" and "g") then you'd now be ready to get those filesets mounted so you can start copying over your data. We use the "target-" prefix for the clone so that we can avoid a namespace collision with the source operating system disk we are cloning. However, this solution also creates a problem. If we reboot using the clone, there will be stale information in the AdvFS configuration. It will actually try to boot off the original disk unless we fix our configuration in the **/etc/fdmns** directory and clean up our cloned **/etc/fstab** by commenting-out any potentially problematic entries.

Let me first elaborate on the issues you'll need to correct in **/etc/fdmns**. This directory is a very simple construct. In this directory you'll find a sub-directory for each AdvFS file domain you have created. Within those sub-directories there should ONLY be soft symbolic links to specific disk DSF device slices (for example: **/dev/disk/dsk99a**) owned by that AdvFS domain. This way each domain knows what disk slices it "owns".

So, do a thought experiment: you are rebooting from the clone; it comes up and checks /etc/fdmns/root_domain, and if it sees a link to your old drive things are going to fail, but if it's a link to your new drive slice, it'll succeed. The trouble is, if you simply copy the **/etc** directory over, you'll get an __exact__ copy with not-gonna-work configuration entries baked into your clone. We need to fix these before D-Day comes and we have to actually use/rely on the clone! Fortunately, doing so is quite easy. All we need to do is mount the clone's root fileset after we copy over the data and go manually fix things in /etc/fdmns with simple directory and link management commands like **rmdir** and **ln**. I will show you examples after I show you ways to actually copy the data over.

===== UFS Filesystem Setup =====

For UFS users the process is much simpler. You simply need to run the **newfs** command on each UFS partition. It's no more complicated than that. Here is an example; in this case the clone target (dsk1 in our running example) has three filesystems, one on dsk1a and then two more on dsk1g and dsk1h. This is a common layout I find on many Tru64 5.1 systems.

<code>
# newfs /dev/rdisk/dsk1a
# newfs /dev/rdisk/dsk1g
# newfs /dev/rdisk/dsk1h
</code>

That puts down all the superblocks and metadata information the filesystem uses internally to operate. It doesn't mean you've mounted those filesystems anywhere for use, yet. It also doesn't mean you have proper entries in the clone's /etc/fstab reflecting this configuration. So, keep in mind that we still have to do that, and I'll show you how later on in this guide.

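If you went the UFS route, here is a minimal sketch of mounting those freshly created filesystems so they line up with the mount points used in the next section. The device names assume **dsk1** is the clone target with the a/g/h layout from above; substitute your own:

<code>
# Create mount points for the clone's root, /usr, and /var
mkdir /target-root /target-usr /target-var

# Mount the new, empty UFS filesystems on them
mount /dev/disk/dsk1a /target-root
mount /dev/disk/dsk1g /target-usr
mount /dev/disk/dsk1h /target-var
</code>
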
==== File Copying For The Impatient ====

This assumes you have the clone's filesystems mounted up at /target-root, /target-usr, and /target-var (if you used the AdvFS example above, your mounts are /t, /t/usr, and /t/var; substitute accordingly).

<code>
# Copy over the root filesystem
vdump -0 -f - / | (cd /target-root; vrestore -x -f -)

# Copy over the /usr filesystem
vdump -0 -f - /usr | (cd /target-usr; vrestore -x -f -)

# Copy over the /var filesystem
vdump -0 -f - /var | (cd /target-var; vrestore -x -f -)
</code>

==== File Copy Steps in Detail with Troubleshooting ====
    
The UFS file system is BSD's native file system. It's very reliable and tough, but it also lacks features such as journaling, logging, and some other more esoteric stuff. Its maximums are also much lower than AdvFS. The upshot of UFS is that it's extremely stable, gives you reasonably high performance, and has a lot of tools in user space and by 3rd parties that do things like recovery. It's also free-as-in-beer insomuch as DEC/Compaq/HP don't charge you any extra $$$ for what you do with it, unlike AdvFS which costs money to do anything but host a basic OS installation. On the other hand, AdvFS has advanced features like snapshots (copy-on-write), cloning, resiliency, striping, and others. If you want to see the full power of AdvFS on a system with no license limits, try it out on Linux first. It was [[http://advfs.sourceforge.net/|ported]] there many years ago. AdvFS may now be just another has-been 1990's Unix project, but it was the real thing when it comes to providing some of the most interesting and useful storage features still in demand today.
  
UFS is easy to clone. You can use any file copy utility to move the data. However, you must ensure that the tool you pick can, at a minimum, do a few things. AdvFS is also easy to clone, but I recommend using **vdump** for that exclusively, as it's the easiest to remember and the fastest method I've seen speed-of-copy wise. For whatever reason, it's significantly faster than using **tar** or **cpio**, and, as noted below, it will handle UFS as well as AdvFS. Whatever tool you choose to use, make sure it has these capabilities.
  - __It must be able to restore the filesystem to a different directory than it was backed up from__. This is something that **tar** only works with if you remembered to make the archive with relative paths. It has a flag for it (''-C''), but it still fails if the input archive has absolute paths. The problem with **cpio**, besides the fact that the people that wrote it must have been on drugs that helped them make sense of its opaque options and horrible manpage, is that it has the same problem with absolute file names that **tar** has.
  
Here are some actually //valid// ways to copy one file system at a time in isolation. I use the ''/usr'' file system as an example, and my clone disk filesystem is on ''/new/usr''. However, don't forget, you need to do this for all of the operating system UFS file systems or AdvFS file sets (whichever you have).
  - You can use the AdvFS **vdump** and **vrestore** commands. They will also work on UFS, which is really nice since this method is by far the easiest and fastest. Example: ''vdump -0f - /usr | vrestore -xf - -D /new/usr''. Another very similar syntax is ''vdump -0 -f - / | (cd /target-root; vrestore -x -f -)''. Note that it won't cross filesystem boundaries (whew!), which makes our job much easier.
  - CPIO in two easy steps: ''cd /usr ; find . -print -depth -mount | cpio -pdm /new/usr/''  Whatever you do, don't forget the trailing slash, and don't provide an absolute path to your **find** command or you will regret it, as the files will end up in **/usr/usr/** which isn't what you want. Also, make sure you use find like I show with a relative path. Using an absolute path for the **find** command will definitely ruin the effort. I don't like or trust **cpio** and never have. YMMV.
  - Don't try to use **tar**; it has no combination of flags (at least on Tru64's tar) that won't capture other file systems in your archive. It'll work fine for terminal directories like ''/usr'', but won't work worth a darn for the root file system. It'll create a huge mess in that situation as it'll capture ALL other file systems unless you go on some crusade to manually create an exclude file. It's too painful. If we had GNU Tar, fine, but don't use DEC's **tar** command for it since it's not modern enough to pull it off easily.
  - If you managed to get the **rsync** binary (which does not come with any version of OSF/1, Digital Unix, or Tru64 but is [[https://www.parsec.com/tru64-packages/rsync-3.1.2-tru64-5.1-alpha.tar.gz|available as freeware]]), you could use the form ''rsync -Pxvra /usr/ /new/usr/'' and it'd work great.
  
Here is an example of what you might want to try, but surely **won't work**.
<code>
cd /
tar cvpf - . | tar -f - -xpP -C /new

## Whoops, you just filled up /new with files from /usr
## before you got a chance to even create /usr!!
</code>
There is a big problem with the fact that you just can't limit Tru64's version of **tar** to a single file system. That becomes a showstopper for archiving root or any file system with sub-directories which are mount points for other filesystems. It'll just cause you pain. Use one of the other methods instead.
  
It's complicated to use the regular UFS **dump** and **restore** commands __through a pipe__, and I don't recommend it. It's difficult because the UFS version of **restore** has no flag to allow it to change directories before it starts the restore. It will always try to restore to the same path, and that blows it up for any use as a cloning mechanism unless you use a parenthesized subshell to fix it, which most people find confusing and indirect. Note that this doesn't apply to **vrestore**, which absolutely __does__ have an explicit option and works great.
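
For completeness, the subshell workaround looks roughly like the sketch below; take it as illustrative (flag spellings vary a bit between releases, so confirm against the **dump** and **restore** man pages) and prefer **vdump**/**vrestore** anyway:

<code>
# Dump the /usr UFS filesystem to stdout and restore it into the clone's
# /usr by changing directory inside a subshell first
dump -0f - /usr | (cd /new/usr && restore -rf -)
</code>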
  
=== Post Copy Fixes ===
  
Once you've re-created your slices (partitions) and got your data copied over, you are ready to begin making the final tweaks that will make sure the system is bootable.

==== Dynamic Disk Reconfiguration in Tru64 5.x Horks AdvFS ====

There are a lot of situations in which your clone might fail to boot, or might boot up crippled in single user mode, after cloning a Tru64 system. For storage and other items, Tru64 has a dynamic hardware database it tries to keep updated on its own. Rather than assign names based on absolute hardware locations, like in Digital Unix 4.x before it, it does something completely new. I'm not sure who thought it was a good idea or why, but it turns out to be pretty sub-optimal in practice. One of many negative side effects of adopting this system was that it causes problems for anyone wanting to clone a root disk. You're in a chicken and egg scenario. How do you know what links to create for "root_domain" and "usr_domain" in /etc/fdmns so that the proper disk slices will be symlinked there when the system boots up? You might know the slice letters such as "a", "h", or "g", but you don't know what disk name Tru64 will assign to your cloned root disk once the auto-configuration scripts kick off. What you thought might be dsk1 could turn out to be dsk3, etc.

There are two strategies for addressing this issue:

  - If you clone the OS image from the workstation you use to do the vrestore operations, you can safely assume the disk name will stay the same. Why? Because you are copying the same hardware database that the system has booted up with. Therefore, you're going to get the same disk name as you had when you created the file domain, mounted the target file sets, and performed the restore. So, in this case, you should have a clean booting clone. This is especially true if you are booting the disk on the same physical system (i.e. your clone was for a backup root disk should your primary fail).
  - In other cases, you are restoring vdumps from a non-local system. In this case, you're getting the hardware database that came from those vdumps (the database files are in /etc). In this situation, you have literally no idea what the disk number will be. The best strategy is to boot the system into single user mode, then observe whatever disk name the root disk came up with. Then, if possible, run **bcheckrc** from your single-user session, and fix your links in /etc/fdmns to point to the new disk name. If that fails, then do the same process after booting from optical media or a different hard disk drive. Once you know the disk name the system is going to pick, it's just a matter of fixing the /etc/fdmns sub-directories and updating their symlinks.

Digital Unix users can ignore all that and just stick to bus, SCSI ID, and LUN identifiers with the confidence of knowing they will stay deterministic.
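
For reference, here is a sketch of why the old naming scheme is deterministic; the unit arithmetic below is the conventional rz numbering, so double check it against your own device nodes:

<code>
# Digital Unix / OSF/1 4.x encodes the SCSI location in the name itself:
#   rz<unit><slice>   where   unit = (bus * 8) + target_id
#
#   /dev/rz0a  -> bus 0, target 0, slice a   (often root)
#   /dev/rz9b  -> bus 1, target 1, slice b   (often swap)
#
# The name only changes if the disk physically moves to another bus or ID.
</code>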

==== Fix AdvFS Domains in /etc/fdmns ====

Once you finish copying data for the root fileset in AdvFS, you'll have the exact same layout as the running system. The trouble is, that's not what we need on the clone. It needs to have its own disk name reflected in the symbolic links present in /etc/fdmns. In Tru64 5.1 there are three required file systems: the root file system (/), /usr, and /var. They must be present in either two or three AdvFS file domains with specific names. If not, the boot scripts will halt/fail, so do not try to "simplify" down to just having one root filesystem or something like that. The system will not cooperate.

At a minimum, you need domains called "root_domain" and "usr_domain"; the reason is that these are hardcoded into the kernel and some of the system startup scripts. We now need to have the __cloned__ disk slices in those required domains, rather than the original.

We have a problem because the system we use to create and copy the cloned data will have already given the disk a name. However, once you reboot using the prepared target disk, it will be using the hardware database you just restored. So, __where did you restore it from?__ That is the critical question. If you cloned the machine you used for the copy & restore operation, you can be sure the disk name will stay the same. For example, if it was 'dsk2' when you copied the data to it, it'll be dsk2 when you boot it up. Why? Because you copied the hardware database in /etc after the disk had already been named. If the disk is further cloned, say by a RAID-1 hardware mirror being copied, it'll end up being renamed __again__ by Tru64. This sucks because it'll cause problems on the first boot of your cloned system.

So, consider what might happen if you restored a vdump from another system, instead of your cloning workstation. You really don't know what the disk name will be, do you? The same is true of a RAID-1 based clone. So, here's what's going to happen. You'll need to boot up your system, observe what disk name the system believes its OS disk has, then (if it's different from the link structure in the target /etc/fdmns directory) you'll need to go in and correct the disk symlinks in /etc/fdmns. Odds are, the first boot will drop you into single-user mode anyway and complain it cannot mount /usr.

Thus, before this clone ever boots, we need to fix this issue. Since I don't know your exact scenario, I can't give a one-size-fits-all solution. What you need is for the system to boot once, then see what the dynamic hardware configuration will name the "new" disk it sees. Then, you need to update the links in /etc/fdmns to reflect the new name. This is a bit of a chicken and egg scenario. However, what generally works is to boot the system up in single user mode, try to run **bcheckrc**, and finally check the output of **hwmgr show scsi** to find the new disk name, read the **disklabel -r dskX** output, and fix the links in /etc/fdmns.

If you are building a cloned disk for the same system you are doing the cloning on, you can be pretty sure the disk name will remain whatever it is on the running system now.

I do not know how Tru64 tracks disk devices. It might be by WWN or some other identifier. However, it takes very little to "invalidate" the hardware database entry for a disk and thus make it appear "new". In this instance Tru64 gives the disk a different name. What might have originally been dsk0 may now become dsk3, for example. This is a downside of the way Tru64 5.x was designed and, frankly, I'd rather work with disks in Digital Unix any day just for its simple determinism.

I can at least give you detail on the process of fixing this. Start with a session like the one in my example.

<code>
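# NOTE: this session assumes you are fixing the CLONE's copy of /etc/fdmns,
# either after booting the clone into single user mode, or with the clone's
# root fileset mounted somewhere (in which case prefix the paths below with
# your mount point, e.g. /t/etc/fdmns). Be careful not to remove the live
# source system's own domain links by mistake.
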
# Change into the AdvFS configuration directory
cd /etc/fdmns

# See what you have right now. Which disks are linked in each sub-directory?
# Those sub-directories have the same names as all your AdvFS domains.

$ ls -lR
total 32
drwxr-xr-x   2 root     system      8192 Dec 10  2019 root_domain
drwxr-xr-x   2 root     system      8192 Dec 10  2019 usr_domain
drwxr-xr-x   2 root     system      8192 Dec 11  2020 target-root_domain
drwxr-xr-x   2 root     system      8192 Dec 11  2020 target-usr_domain

./root_domain:
total 0
lrwxr-xr-x   1 root     system        15 Dec 10  2019 dsk0a -> /dev/disk/dsk0a

./target-root_domain:
total 0
lrwxr-xr-x   1 root     system        15 Dec 11  2020 dsk1a -> /dev/disk/dsk1a

./usr_domain:
total 0
lrwxr-xr-x   1 root     system        15 Dec 10  2019 dsk0g -> /dev/disk/dsk0g

./target-usr_domain:
total 0
lrwxr-xr-x   1 root     system        15 Dec 11  2020 dsk1g -> /dev/disk/dsk1g


# You can see from that structure that the system would try to use disk dsk0,
# which is our original. The clone is dsk1. So, your easiest option is to manually
# fix these links before the clone ever tries to boot off this disk.

# Out go the references to the never-to-be-seen-again dsk0;
# we need to link to the new target disk, instead.
rm /etc/fdmns/root_domain/dsk0a
rm /etc/fdmns/usr_domain/dsk0g

# In go our cloned slices, which are now promoted to the production/running
# names "root_domain" and "usr_domain" instead of whatever you called
# them temporarily while you were copying the data over.
ln -s /dev/disk/dsk1a /etc/fdmns/root_domain
ln -s /dev/disk/dsk1g /etc/fdmns/usr_domain

</code>

In that situation, we just adjusted the disk names from using dsk0 to dsk1. We might have done this because we know the clone disk will be going back into the same system and the Tru64 disk name will stay static. We might also have done this after booting into single user mode and discovering we had to fix the links because Tru64 decided it wanted a new name for the root disk (it happens for many reasons). In either case, the fix is the same:

  - Start by learning what disks you have and which are "real" in the hardware database: run **hwmgr show scsi**.
  - Read the disklabel of each disk to understand what's on it: **disklabel -r dskX**.
  - Update the links for AdvFS in /etc/fdmns. Keep in mind the new disk name will need to replace the old disk name in all the symbolic soft links in any AdvFS file domain directories in /etc/fdmns.
  - Check for usability with **showfsets mydomain**. If that works, your domain is usable.
  - Clean up any old/stale file domains that no longer exist. Simply remove the directories in /etc/fdmns.
  - Remember to never remove root_domain or usr_domain. They can never be combined.

What about the fact that **/etc/fdmns** is also cluttered with references to a "clone" that, once booted from, is actually the root disk? The "clone" then gets promoted to being the OS disk and there is no clone anymore. So, our clone's /etc/fdmns should have no reference to "target" or cloned AdvFS domains. You can simply remove those directories now, if you wish. If you don't, then worry not: they won't cause any harm. You can clean them up after your first successful boot if you are paranoid.

<code>
# Be careful with this command since it's recursive. Double check your syntax,
# but afterwards you will have no stale reference to the no-longer-clone.
rm -rf /etc/fdmns/target-root_domain /etc/fdmns/target-usr_domain
</code>
  
==== Fix the Fstab ====
The first fix you'll need to make is to your **/etc/fstab**, but make sure you edit the right one! It's easy to get confused. So, make sure you are editing the file on your __destination__ file system and not the source! You will need to update this file with any type of changes you made, such as the swap device (in 4.0 only), disk paths (for UFS only), or names of AdvFS file domains (if you changed them). If you are using a stock Tru64 5.1B system and AdvFS, there is a pretty good chance that you won't need to make any changes, as the names of the default file domains and file sets won't change (those are ''root_domain#root'' and ''usr_domain#usr''). For UFS systems there is a 100% chance you need to edit the **/etc/fstab**. It's going to point to a new disk (the one you put the new disklabel on and copied your data over to).
  
So, the bottom line is that you //might// not have to alter the **/etc/fstab** if you run AdvFS, because it abstracts the name of the disk. The system startup scripts refer to **root_domain** and **usr_domain**, so __do not rename them__. However, when you mounted up your clone disk's filesystems or filesets, you might have made some changes to your __source__ system's /etc/fstab that now got copied over to the clone. In many cases, we need to NOT have any reference to the original disk (duh! It's broken in this context). So, let's not have any references to any clone file-sets or anything but a nice clean root, /usr, and /var setup. Remember that AdvFS doesn't have any disk names in its mounting device nomenclature. Instead, you'd have had to fix the symbolic links in /etc/fdmns to make sure the slices will be what your cloned disk expects to boot successfully.
  

<code>
# Here is a working, normal, ordinary, average, Tru64 /etc/fstab for AdvFS
root_domain#root        /       advfs rw 0 1
/proc                   /proc   procfs rw 0 0
usr_domain#usr          /usr    advfs rw 0 2
usr_domain#var          /var    advfs rw 0 2
</code>

What about UFS? Here is an example of a valid fstab for UFS:

<code>
# Here is a Tru64 5.1 /etc/fstab from a UFS based system.
# Note the sysadmin took advantage of a 2nd disk for his users'
# home directories. This would be ignored by our cloning process.
/dev/disk/dsk2a       /       ufs rw 1 1
/dev/disk/dsk2g       /usr    ufs rw 1 2
/dev/disk/dsk2h       /var    ufs rw 1 2
/dev/disk/dsk3c       /usr/users ufs rw 1 2
</code>

==== Fix the rc.config ====
  
The **/etc/rc.config** file is the main configuration file for Tru64 and Digital Unix systems. This file may contain a reference to swap which may tie the system back to the old disk. This needs to be altered or removed. You should edit the file, but be aware of something else. You don't want to edit the rc.config file if it's the one in use on your booted system. For a running system, you need to use a tool called **rcmgr** to make changes. However, because the cloning process generally has an opportunity to edit the cloned files before they are in use, you don't have to worry about this fact. You can simply make edits to the file, and when it's used by the system at the time when you try to boot the clone, your edits will all be baked in.
  
The main thing you are looking for is any reference to swap on the old disk. It will occur in some kind of variable name, and you can simply remove the whole line or edit the line to point to your new disk's swap slice. The name of the variable will be something like "SWAPDEVICE=/dev/disk/dsk2b" (or whatever your actual swap slice is).
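
Here is a quick sketch of checking and fixing it, assuming the clone's root is still mounted at /t as in the earlier AdvFS example (adjust the path to wherever your clone is mounted):

<code>
# See what swap entries the clone's rc.config currently carries
grep -i swap /t/etc/rc.config

# Edit the offending line so it points at the clone's own swap slice,
# e.g. SWAPDEVICE=/dev/disk/dsk1b instead of the old disk's slice
vi /t/etc/rc.config
</code>

On a live, booted system the supported way to change it is **rcmgr** (for example ''rcmgr set SWAPDEVICE /dev/disk/dsk1b''), but for a clone that hasn't booted yet, editing the file directly is fine, as noted above.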
  
Both Tru64 and Digital Unix (but especially Tru64) have a hardware registry which will store the names of disk devices that are seen by the system. In most cases, once a disk is seen, its name will not change even on the cloned disk (the registry will be copied over at the same time during the file copy steps). However, in the event your disk name changes, don't forget to change the swap device, too. The system can hang if you specify a bogus device.
  
==== Fix the Sysconfigtab ====
  - Make sure your copy method preserved all the permissions, especially on **/sbin** and the scripts in **/sbin/init.d** which are critical. Those scripts should be executable and owned by the __root__ or __bin__ users.
  - Do NOT try to eliminate one of the default AdvFS file domains (one for root and another for /usr). As mentioned earlier, the startup scripts reference both **root_domain** and **usr_domain**, and if you change their names or eliminate one of them the startup scripts will fail.
  - Check that your SRM variables for __boot_file__ and __boot_flags__ are not incorrect; they may have old VMS data in there or some other garbage. Your boot file should be your kernel, which is usually __/vmunix__ or __/genvmunix__. Your boot flags should be **A** or **S** but not a number; if it's a number it came from VMS and it's wrong. People who re-use VMS systems for Tru64 will run into these problems often. Also check the os_type variable and make sure it's UNIX, not VMS (see the SRM sketch after this list).
  - Are you absolutely sure you prepared your disk with the proper **disklabel** command and the -t flag so that it got the correct boot loader? If you missed this, you wasted a lot of time and will need to start over from that step!
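
Here is a rough sketch of checking those from the SRM console (the ''>>>'' prompt). The variable names below (boot_file, boot_osflags, bootdef_dev, os_type) are the common SRM spellings, but firmware revisions differ a little, so treat this as illustrative; ''dka0'' is just a placeholder for your real boot disk, ''A'' means autoboot to multi-user, and ''S'' means stop in single user:

<code>
>>> show boot_file
>>> show boot_osflags
>>> show bootdef_dev
>>> show os_type

>>> set boot_osflags A
>>> set os_type UNIX
>>> set bootdef_dev dka0
</code>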

===== It Boots but It's Horked Up =====

  - Double check your swap is pointing to the right place and working (swapon -s).
  - Make sure your filesystems are not showing up with weird or generic names. Double check your source and destination devices and make sure that your old device name isn't still left over in a config file somewhere. Most commonly it's the **/etc/fstab** or bad disk symlinks in **/etc/fdmns**. It's also worth perusing **/etc/sysconfigtab** on 5.x for mistakes or old device names.
  - Make sure, if you move to a new system type, that any kernel tuning you carry over makes sense. I.e. if you take parameters from a system that has 4GB of RAM and try to use them on a big GS1280 with 64GB of RAM, then you are almost certainly going to have some bad tuning in there. Double check your kernel subsystem settings with **sysconfig** (see the sketch after this list).
  - In general, you should make sure that the symlinks in the /etc/fdmns sub-directories point to real disks that actually exist (check with disklabel -r dskX). Make sure file domains can show their file sets with the **showfsets** command to check for domain health.
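
Here is a minimal sketch of poking at kernel subsystem attributes with **sysconfig**; the subsystem and attribute names used below are only examples, so compare whatever you query against the clone's **/etc/sysconfigtab**:

<code>
# List the kernel subsystems the running kernel knows about
sysconfig -s

# Query the current attributes of one subsystem (virtual memory here)
sysconfig -q vm

# Query a single attribute
sysconfig -q proc max_proc_per_user
</code>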
  
If you have problems beyond the ones documented, then consider contacting PARSEC for some consulting work to help you!
  
===== A Note About AlphaServer Compaq Smart Array Controllers =====

Early models of the Compaq Smart Array controller are available for Alpha hardware and feature RAID levels 1 through 10, including distributed parity. They are configured via the "ORCA" utility from the SRM console. If you have a graphics console and a real PS/2 keyboard, you can enter ORCA by catching the boot console message which will urge you to hit F8 if you want to enter the array configuration tool. If that doesn't work, you can find the RAID card's identifier by doing a **show dev** on the SRM console; then you can manually load the RAID BIOS using the syntax **load bios dky0** (where you substitute dky0 for whatever your actual RAID card's ID is).

These array controllers have RAID-1 features. If you're using a system with such a card, you might be tempted to copy RAID volumes by breaking RAID-1 mirrors and re-synchronizing them to disks you intend as clones. This way you can import the RAID configuration on the clone-target system and everything is supposed to work, right? After all, it's a bit-for-bit clone!

Well, yes and no. First of all, there are some drawbacks to this method. Here are some absolute showstoppers, for example:

  - You cannot do this with anything other than RAID-1 or RAID-10 based arrays. RAID-5 is right out as you cannot split the array and maintain enough parity in two places at once. RAID-6 or RAID-DP is also unworkable due to this issue. Mirroring or bust.
  - Your disks must be as large or larger than the original disks you are cloning. If you do use mismatched disks, your resulting logical drive will just waste the extra space on the larger drive.

Other potentially troublesome issues include:

  - The copy time will run for the full capacity of the disk, not just the modified blocks on the array. Thus, if you already have some small vdump backup files, consider that you might be able to vrestore those faster than the RAID controller can mirror the entire drive.
  - If you use a larger disk as the sync target, you'll just lose whatever capacity it has beyond the size of the disk you are cloning. Clone a 36G disk onto a 300G disk and it's still going to appear as a 36G disk because of the disklabel.
  - The target/cloned disk will **definitely** get a new disk ID if it's Tru64 version 5.0 or newer. This is because of its awful AIX-like dynamic hardware abstraction crapola layer. Not that I'm salty about this not being an issue whatsoever in Digital Unix 4.0 or anything like that.
  - Even uglier: once you rebuild the mirror on the __source__ machine's RAID controller, you might have a system that boots into single user mode and freaks out. This is, again, due to the stupid changing-disk-name game caused by hardware abstraction and auto-naming. You can either go through the hassle of deleting the linkage to the name with the **dsfmgr** horror-show, or you can fix the symbolic links in the file domains found in /etc/fdmns, which I think is a bit of an easier way to go. Both work. This is only an issue in Tru64, not Digital Unix or OSF/1.