Sometimes a VMS problem manifests itself as a system crash. A VMS system crash is not an accident or a mistake: it is a deliberate response by the operating system (actual program code written by the VMS Engineering Team) to a severe programming error or hardware fault, intended to protect the data on disk from corruption.
A VMS system crash always stops all normal processing and writes out the complete contents of physical memory (or as much of it as possible) into the “crash dump file” (usually SYS$SYSTEM:SYSDUMP.DMP). It then either halts the system at the system console's P00>>> prompt (awaiting your manual action to power down, repair and/or reboot) or reboots the system, as configured by the console's settings.
After the next reboot, the contents of the crash dump file SYSDUMP.DMP will either “just sit there,” possibly to be overwritten by the next system crash (which could be minutes, or years, in the future), or, if the system's startup command files are properly configured, be copied into a secondary crash dump file. That copy preserves each crash instance, even during a cascading series of crashes only minutes apart.
As a VMS sys-admin, you should know precisely how your VMS system(s) are configured to handle the SYSDUMP.DMP file upon reboot after a crash. PARSEC can help you determine your systems' configuration, both in this wiki and with an MEP white paper (available through your PARSEC Account Representative).
Whether you configure VMS for “normal” on-system-disk dumps, dumps into the page file, DOSD (Dump Off System Disk), or other more exotic configurations (don’t!), there’s one last configuration step you must take to ensure that, after every VMS crash, you preserve this hard-won, valuable crash dump information for later analysis.
Remember that each system crash overwrites the previous contents of the system dump file. So, especially if your system is crashing repeatedly for some hardware-related reason, it’s essential that, upon reboot, your system's site-specific startup command file copies the contents of the dump file to another, alternative file. This alternative file can be stored on any other disk (preferably neither the VMS system disk nor the DOSD disk), and can be given any name you like.
For example, let’s assume that CLASS8’s DSA2: (shadow-set) disk has “plenty of free space”, and that we can copy several versions of our System Dump File to it before we have to worry about clean-ups or purges. Create a working directory on that drive:
$ CREATE /DIRECTORY /OWNER=PARENT /PROT=(S:RWE,O:RWE,G,W) DSA2:[LRICKER.CRASH_ANALYSIS]
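The example assumes DSA2: has “plenty of free space.” Before relying on that, a quick check along these lines compares the system dump file's allocated size with the free blocks on the target disk. This is a minimal sketch: the symbol names are illustrative, and because the size of an SDA copy need not match the dump file's allocation exactly, treat the result as a rough indication rather than an exact requirement.

```dcl
$ ! Rough pre-check (illustrative symbol names): compare the size of
$ ! the system dump file with the free space on the target disk.
$ dump_blocks = F$FILE_ATTRIBUTES("SYS$SYSTEM:SYSDUMP.DMP", "ALQ")
$ free_blocks = F$GETDVI("DSA2:", "FREEBLOCKS")
$ WRITE SYS$OUTPUT "SYSDUMP.DMP allocation: ''dump_blocks' blocks"
$ WRITE SYS$OUTPUT "Free on DSA2:           ''free_blocks' blocks"
$ IF free_blocks .LT. dump_blocks
$ THEN
$     WRITE SYS$OUTPUT "WARNING: DSA2: may not have room for another copy"
$ ENDIF
```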
Now this directory can become the catch-point for any subsequent crash-dump copy:
$ ANALYZE /CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> copy /collect /log DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
Copying: Headers...
Copying: PT space - 47 blocks...
Copying: S0/S1 space - 53922 blocks...
Copying: S2 space - 37568 blocks...
Copying: Page tables of key process "SWAPPER" - 3 blocks...
Copying: Memory of key process "SWAPPER" - 3 blocks...
Copying: Page tables of process "SYSINIT" - 8 blocks...
Copying: Memory of process "SYSINIT" - 1478 blocks...
Copying: Page tables of process "STACONFIG" - 8 blocks...
Copying: Memory of process "STACONFIG" - 1721 blocks...
%SDA-I-COLLECTING, collecting file and/or unwind data
Scanning: Process "SWAPPER" (PCB 8435CF48)...
Scanning: Process "SYSINIT" (PCB 85216740)...
Scanning: Process "STACONFIG" (PCB 85218D80)...
Scanning: Page and swap files...
%SDA-W-NOCOLLECT, no file and/or unwind data collected
Rewriting: Headers...
SDA> exit
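Before relying on the copy, it is worth confirming that it was actually written and that SDA can read it. A minimal sketch, using the directory from the example above; CRASH_DETAIL.LIS is a hypothetical listing name, and SHOW CRASH and SHOW STACK are just two of the SDA commands you might capture this way.

```dcl
$ DIRECTORY /SIZE=ALL /DATE DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
$ ANALYZE /CRASH_DUMP DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
SDA> SET OUTPUT DSA2:[LRICKER.CRASH_ANALYSIS]CRASH_DETAIL.LIS
SDA> SHOW CRASH
SDA> SHOW STACK
SDA> EXIT
```

Because SET OUTPUT redirects subsequent SDA output to the named file, the listing can be kept alongside the dump copy (or sent to PARSEC) without retyping the commands later.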
With this in mind, there are several points to consider regarding VMS system startup crash-dump/SDA processing:
On Alpha and I64 systems: SDA is invoked by default during startup and runs a fixed sequence of CLUE commands to create a CLUE list file. This file contains only an overview of the crash and might not provide enough information to determine its cause.
Always copy the system dump file to its alternative destination directory/file.
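The recommended method for SDA processing at system startup on Integrity and AlphaServer systems is to define the logical name CLUE$SITE_PROC to point to a site-specific SDA command file (here, SYS$STARTUP:SAVEDUMP.SDA), which SDA executes as part of its startup processing: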
$ DEFINE /SYSTEM /EXEC CLUE$SITE_PROC SYS$STARTUP:SAVEDUMP.SDA
! SAVEDUMP.SDA --
! SDA command file, executed as part of the system reboot.
! Used to save the dump file after a system bugcheck, and
! to execute any additional SDA commands.
!
READ /EXEC      ! Read in the executive images' symbol tables
SHOW STACK      ! Display the stack
COPY /COLLECT DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP   ! Copy/save system dump file
!
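A logical name defined at the DCL prompt does not survive a reboot, so the DEFINE for CLUE$SITE_PROC needs to live in a startup file that runs before SDA's startup processing; SYS$MANAGER:SYLOGICALS.COM is the usual place for site-wide system logical names. To check the setup without waiting for a crash, a sketch like the following confirms the logical name and then runs SAVEDUMP.SDA by hand against the current dump using SDA's @ command. Note that each such run writes another version of the dump copy, so purge afterwards.

```dcl
$ ! Confirm the site-specific SDA procedure is defined
$ SHOW LOGICAL CLUE$SITE_PROC
$ ! Run the SDA command file by hand against the current dump
$ ANALYZE /CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> @SYS$STARTUP:SAVEDUMP.SDA
SDA> EXIT
```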
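Consider using SET FILE /NOBACKUP (so that BACKUP saves only the file header, not the large contents) and PURGE /KEEP=n (so that only the n newest versions are kept) for those copied dump files. For example:
$ SET FILE /NOBACKUP DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
$ PURGE /KEEP=3 DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP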
On VAX/VMS systems: SDA crash-dump copy processing is (was) typically embedded in the SYS$STARTUP:SYSTARTUP_VMS.COM command procedure. If you look, you’ll likely see commands similar to these:
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
COPY DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
EXIT
$ SET FILE /NOBACKUP DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
$ PURGE /KEEP=2 DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
Of course, your system will have a different disk device, directory and filename for what’s shown as DSA2:[…]DUMPFILE_COPY.DMP above.
Note that, on VAX/VMS, these commands are simply edited in-line into the SYSTARTUP_VMS.COM command procedure and are executed when encountered; the lines without a leading $ are read by SDA as its input.
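On either platform, the SET FILE and PURGE commands assume the dump copy exists when they run. A slightly more defensive sketch for SYSTARTUP_VMS.COM skips them when no copy is found. The symbol name dump_copy is illustrative, the file name must be adjusted to your own disk and directory, and the fragment assumes the copy has already been made by the time it executes.

```dcl
$ ! Hypothetical fragment for SYSTARTUP_VMS.COM (adjust the file name).
$ ! Assumes the dump copy has already been written when this runs.
$ dump_copy = "DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP"
$ IF F$SEARCH(dump_copy) .NES. ""
$ THEN
$     SET FILE /NOBACKUP 'dump_copy'   ! BACKUP copies the header only
$     PURGE /KEEP=3 'dump_copy'        ! Keep the three newest versions
$ ENDIF
```

Keeping the file name in one symbol means a later move to another disk touches only one line of the startup procedure.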