How to Capture a VMS Crash Dump

or: What should be done to preserve the System Dump File after VMS reboots?

Sometimes a VMS problem manifests itself as a system crash. A VMS system crash is not an accident or a mistake: it is an intentional response by the operating system (actual program code written by the VMS Engineering Team) to protect data on disk from corruption caused by, for example, a severe programming error or a hardware fault.

A VMS system crash always stops all normal processing, writes the contents of physical memory (or as much of it as possible) into the “crash dump file” (usually SYS$SYSTEM:SYSDUMP.DMP), and then halts the system. The system then either stops at the console's P00>>> prompt (awaiting your manual action to power down, repair and/or reboot) or reboots, as configured by the console's settings.

After the next reboot, the contents of the crash dump file SYSDUMP.DMP will either “just sit there,” possibly to be overwritten by the next system crash (which could be minutes or years in the future), or, if the system's startup command files are properly configured, be copied into another, alternate crash dump file. Copying preserves each crash instance in the case of a cascading series of crashes only minutes apart.

As a VMS sys-admin, you should know precisely how your VMS system(s) are configured to handle your SYSDUMP.DMP file upon reboot after a crash. PARSEC can help you determine your systems' configuration, both in this wiki and with an MEP white-paper (available through your PARSEC Account Representative).
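
As a starting point for that review, the following sketch shows one way to inspect the dump-related system parameters and the current dump file. It assumes a suitably privileged account. DUMPBUG, DUMPSTYLE, BUGREBOOT and SAVEDUMP are standard SYSGEN parameters, but their values and exact meanings vary with VMS version and architecture, so consult the documentation for your release before changing anything. The commands below only display values.

```dcl
$ ! Display (do not change) the active dump-related parameters.
$ ! DUMPBUG   - whether a dump is written when the system bugchecks
$ ! DUMPSTYLE - full or selective dump, DOSD, and related options
$ ! BUGREBOOT - whether the system reboots automatically after a bugcheck
$ ! SAVEDUMP  - whether a dump written to the page file is retained
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW DUMPBUG
SYSGEN> SHOW DUMPSTYLE
SYSGEN> SHOW BUGREBOOT
SYSGEN> SHOW SAVEDUMP
SYSGEN> EXIT
$ ! Check the size and date of the current system dump file
$ DIRECTORY /SIZE=ALL /DATE SYS$SYSTEM:SYSDUMP.DMP
```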

Your Action Items

Whether you configure VMS for “normal” on-system-disk dumps, dumps into the Page File, DOSD (Dump Off System Disk), or other more exotic configurations (don’t!), there’s one last configuration step that you must take to ensure that, after every VMS crash, this hard-won, valuable crash dump information is preserved for subsequent analysis.

Remember that each system crash overwrites the previous contents of the System Dump File. Especially if your system is crashing repeatedly for some hardware-related reason, it’s essential that, upon reboot, your system's site-specific Startup Command File copies the contents of the Dump File to another, alternative file. This alternative file can be stored on any other disk (preferably not the VMS system disk or the DOSD disk), and can be named anything you’d like.

For example, let’s assume that CLASS8’s DSA2: (shadow-set) disk has “plenty of free space”, and that we can copy several versions of our System Dump File to it before we have to worry about clean-ups or purges. Create a working directory on that drive:

$ CREATE /DIRECTORY /OWNER=PARENT /PROT=(S:RWE,O:RWE,G,W) DSA2:[LRICKER.CRASH_ANALYSIS]

Now this directory can become the catch-point for any subsequent crash-dump copy:

$ ANALYZE /CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> copy /collect /log DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP

Copying: Headers...
Copying: PT space - 47 blocks...
Copying: S0/S1 space - 53922 blocks...
Copying: S2 space - 37568 blocks...
Copying: Page tables of key process "SWAPPER" - 3 blocks...
Copying: Memory of key process "SWAPPER" - 3 blocks...
Copying: Page tables of process "SYSINIT" - 8 blocks...
Copying: Memory of process "SYSINIT" - 1478 blocks...
Copying: Page tables of process "STACONFIG" - 8 blocks...
Copying: Memory of process "STACONFIG" - 1721 blocks...
%SDA-I-COLLECTING, collecting file and/or unwind data
Scanning: Process "SWAPPER" (PCB 8435CF48)...
Scanning: Process "SYSINIT" (PCB 85216740)...
Scanning: Process "STACONFIG" (PCB 85218D80)...
Scanning: Page and swap files...
%SDA-W-NOCOLLECT, no file and/or unwind data collected
Rewriting: Headers...

SDA> exit

With this in mind, there are several points to consider regarding VMS system startup crash-dump/SDA processing:

On Alpha and I64 systems: SDA is invoked by default during startup and runs a fixed sequence of commands to create a CLUE list file. This CLUE list file contains only an overview of the crash and might not provide enough information to determine its cause.
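
To see what the default startup processing has already produced, you can list the existing CLUE files. This is only a sketch: it assumes the CLUE$COLLECT logical name still has its default definition on your system (normally the error-log directory) and that the list files follow the usual CLUE$node_date_time.LIS naming pattern.

```dcl
$ ! Where does CLUE write its list files on this system?
$ SHOW LOGICAL CLUE$COLLECT
$ ! List the CLUE list files, one per analyzed crash
$ DIRECTORY /DATE /SIZE CLUE$COLLECT:CLUE$*.LIS
```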

Always copy the system dump file to its alternative destination directory/file.

  • Although you could use the DCL command COPY to copy the dump file, don’t. SDA’s internal COPY command is preferable because it copies only the blocks occupied by the dump and then marks the Dump File as copied.
  • The SDA COPY command is also preferable when the dump was written into the primary Page File, SYS$SYSTEM:PAGEFILE.SYS, because SDA COPY releases the dump page-blocks back to the pager after they’re copied.
  • Because a System Dump File can contain privileged and/or private information, always protect copies of dump files from world read access.
  • System Dump Files have the NOBACKUP attribute, so the BACKUP utility does not copy them unless you use the qualifier /IGNORE=NOBACKUP. When you use SDA COPY to copy the System Dump File to another file, the operating system does not automatically set the new file to NOBACKUP. If you want to set the NOBACKUP attribute on the copy, use SET FILE /NOBACKUP on the copied file(s).
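
The protection and NOBACKUP points above can be sketched as follows, reusing this page's example file name. The protection mask shown is an assumption modeled on the directory protection used earlier on this page; adjust it to your site's security policy.

```dcl
$ ! Deny group and world access to every version of the copied dump
$ SET FILE /PROTECTION=(S:RWED,O:RWED,G,W) DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP;*
$ ! Optionally exclude the copies from routine BACKUP operations
$ SET FILE /NOBACKUP DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP;*
$ ! Confirm the resulting protection and file attributes
$ DIRECTORY /FULL DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP;*
```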

The recommended method for SDA processing during system startup on Integrity and AlphaServer systems is:

  • Create a /SYSTEM /EXECUTIVE_MODE logical name CLUE$SITE_PROC in the SYS$STARTUP:SYLOGICALS.COM command file. This will name (refer to) your own site-specific “save-the-dump” file, here SAVEDUMP.SDA. This logical name must be (re)created each time your system reboots, so add this line to your SYLOGICALS.COM file:
$ DEFINE /SYSTEM /EXEC CLUE$SITE_PROC SYS$STARTUP:SAVEDUMP.SDA
  • Here’s an example of the contents of this site-specific command file, SYS$STARTUP:SAVEDUMP.SDA. Cut and paste these lines to create the file on your own system:
! SAVEDUMP.SDA --
! SDA command file, executed as part of the system reboot.
! Used to save the dump file after a system bugcheck, and
! to execute any additional SDA commands.
!
READ /EXEC  ! Read in the executive images' symbol tables
SHOW STACK  ! Display the stack
COPY /COLLECT DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP  ! Copy/save system dump file
!
  • Of course, you must replace DSA2:[…]DUMPFILE_COPY.DMP above with your own site-specific disk, directory and filename.
  • You should also include commands, perhaps in SYLOGICALS.COM, or even in SYSTARTUP_VMS.COM, to SET FILE /NOBACKUP and PURGE /KEEP=n for those copied Dump Files. For example:
$ SET FILE /NOBACKUP DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
$ PURGE /KEEP=3 DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
  • These commands will be executed well after VMS executes the SDA commands in SAVEDUMP.SDA (above) following any actual system crash, and it won’t matter much if these are re-executed on normal reboots as well.
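
After the next reboot, a quick check along these lines can confirm that the pieces are in place. It is a sketch that reuses the example names from this page; substitute your own disk, directory and file name.

```dcl
$ ! Is the logical name defined in the system table, in executive mode?
$ SHOW LOGICAL /SYSTEM /FULL CLUE$SITE_PROC
$ ! Which copied dumps exist, and how old are they?
$ DIRECTORY /DATE /SIZE=ALL DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP;*
$ ! Open the most recent copy and summarize the crash
$ ANALYZE /CRASH_DUMP DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
SDA> SHOW CRASH
SDA> CLUE CRASH
SDA> EXIT
```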

On VAX/VMS systems: SDA crash-dump copy processing is most likely embedded in the SYS$STARTUP:SYSTARTUP_VMS.COM command procedure. If you look, you’ll likely see commands similar to these:

$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
COPY DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
EXIT
$ SET FILE /NOBACKUP DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP
$ PURGE /KEEP=2 DSA2:[LRICKER.CRASH_ANALYSIS]DUMPFILE_COPY.DMP

Of course, your system will have a different disk device, directory and filename for what’s shown as DSA2:[…]DUMPFILE_COPY.DMP above.

Note that, on VAX/VMS, these commands simply appear (are edited) in-line in the SYSTARTUP_VMS.COM command procedure, and are executed when encountered.

how_to_capture_a_vms_crash_dump.1536608208.txt.gz · Last modified: 2018/09/10 19:36 (external edit)