
Troubleshooting Flaky RMC Connections (HMC to POWER Chassis / System Controller)

Note: RMC (Resource Monitoring and Control) is part of RSCT. It primarily handles HMC ↔ LPAR communication (DLPAR, LPM, etc.). The underlying HMC-to-chassis/system controller (FSP/eBMC) link uses the dedicated service network (HMC1/HMC2 ports). Flaky RMC often manifests as unstable LPAR connections in lspartition -dlpar or broader management instability.

Common Causes

  • Network issues (latency, drops, MTU mismatch)
  • Firewall/port blocks (especially 657 TCP/UDP)
  • Duplicate RSCT node IDs (common after cloning)
  • RSCT configuration drift
  • Outdated firmware / HMC / RSCT levels
  • Service processor overload or cabling problems

Basic Diagnostic Checks

1. HMC Managed System Status

  • lssyscfg -r sys -F name,type_model,serial_num,state
  • Look for No Connection, Incomplete, Recovery, or Error.
  • lspartition -dlpar: check that each partition shows Active:<1> and a non-zero DCaps value. DCaps:<0x0> means the HMC has no working RMC session to that LPAR.
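The state check above can be scripted. The following is a minimal sketch, not an official tool: flag_bad_systems is a hypothetical helper name, and it assumes the comma-separated output of the lssyscfg command shown above, with the state as the last field and no commas inside the system name.

```shell
# Hypothetical helper: print managed systems whose state suggests a
# connection problem. Reads "name,type_model,serial_num,state" on stdin.
flag_bad_systems() {
    awk -F, '
        $NF == "No Connection" || $NF == "Incomplete" ||
        $NF == "Recovery"      || $NF == "Error" { print $1 ": " $NF }
    '
}

# Usage from a workstation (the host name "hmc" is a placeholder):
#   ssh hscroot@hmc 'lssyscfg -r sys -F name,type_model,serial_num,state' \
#       | flag_bad_systems
```

Running the filter on a workstation rather than on the HMC avoids depending on what the HMC restricted shell allows.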

2. Network Connectivity

  • Ping HMC ↔ LPAR IPs and HMC ↔ FSP IPs.
  • Test RMC port: From LPAR → telnet <HMC_IP> 657
  • Verify no NAT (unsupported for RMC).
  • Check MTU consistency (1500 vs jumbo frames) across HMC, LPARs, and service network.
  • Confirm consistent IPv4/IPv6 (link-local IPv6 often used).
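For the MTU check, a common approach is to ping with the largest payload that fits in one unfragmented packet. The sketch below only does the arithmetic for IPv4 without IP options (20-byte IP header plus 8-byte ICMP header); icmp_payload_for_mtu is a hypothetical helper name.

```shell
# Largest ICMP echo payload that fits in a single IPv4 packet of the
# given MTU: MTU - 20 (IPv4 header, no options) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

icmp_payload_for_mtu 1500   # prints 1472
icmp_payload_for_mtu 9000   # prints 8972
```

On a Linux host with iputils ping, the result can be used as ping -M do -s 1472 <target>. The option that sets "don't fragment" differs between platforms, so check the local ping man page before relying on it.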

3. Firewall / Ports

  • HMC GUI: HMC Management → Change Network Settings → Firewall Settings → Allow RMC.
  • Ensure bidirectional TCP/UDP 657 is open (HMC ↔ LPARs).
  • No firewalls on the service network to FSP.

4. RMC/RSCT Status on LPAR (as root or padmin)

This assumes the LPAR is running and you can log in to it (usually via ssh or telnet). On VIOS, log in as padmin and run oem_setup_env first to get a root shell. None of this applies if you cannot even get the LPAR created.

lslpp -l rsct.core.rmc
lsrsrc IBM.MCP
lssrc -a | grep rsct
/usr/sbin/rsct/bin/ctsvhbac
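As a quick pass/fail on the lssrc output, the sketch below checks whether the ctrmc subsystem is reported as active. It assumes the usual lssrc column layout (Subsystem, Group, PID, Status); ctrmc_is_active is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) only if lssrc output on stdin
# contains a ctrmc line whose last column is "active".
ctrmc_is_active() {
    awk '$1 == "ctrmc" && $NF == "active" { found = 1 }
         END { exit !found }'
}

# Usage on the LPAR:
#   if lssrc -a | ctrmc_is_active; then
#       echo "ctrmc is active"
#   else
#       echo "ctrmc is NOT active"
#   fi
```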

Basic AIX commands for resetting the RMC subsystem on the LPAR:

/usr/sbin/rsct/bin/rmcctrl -z     # stop the RMC subsystem and all resource managers
/usr/sbin/rsct/bin/rmcctrl -A     # add and start the RMC subsystem
/usr/sbin/rsct/bin/rmcctrl -p     # enable remote client connections

Wait 5–10 minutes and recheck lspartition -dlpar.
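The recheck can be reduced to a list of partitions that still have no RMC session. This is a rough sketch: it assumes lspartition -dlpar prints a "Partition:<...>" line followed by a line containing "Active:<n>" and "DCaps:<0x...>" for each partition, so verify that against the output of your own HMC level. flag_inactive_rmc is a hypothetical helper name.

```shell
# Hypothetical helper: print the Partition:<...> contents for every
# partition reported with Active:<0> or DCaps:<0x0>.
# ASSUMPTION: two-line-per-partition layout as described in the lead-in.
flag_inactive_rmc() {
    awk '
        /Partition:</ {
            part = $0
            sub(/^.*Partition:</, "", part)   # drop everything up to "<"
            sub(/>[^>]*$/, "", part)          # drop the closing ">"
        }
        /Active:</ {
            if ($0 ~ /Active:<0>/ || $0 ~ /DCaps:<0x0>/) print part
        }
    '
}

# Usage from a workstation (the host name "hmc" is a placeholder):
#   ssh hscroot@hmc 'lspartition -dlpar' | flag_inactive_rmc
```

An empty result means every listed partition shows an active session with non-zero DCaps.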

Troubleshooting & Fix Steps (Non-Disruptive First)

On HMC (hscroot / hmcsuperadmin)

  • diagrmc --autocorrect -v (HMC v8+ highly recommended)
  • lspartition -dlparreset (older HMC)
  • runsig -s 101 (reconfigure trigger)
  • NUCLEAR OPTION: rebuild the connection with chsysstate -r sys -o rebuild -m <CEC_name>. Be careful with this: on a system with VIOS, a rebuild can easily lose some vSCSI or vSEA devices in the LPAR profiles. VIOS is very buggy (we are not fans of VIOS), and this is a very common complaint.

More Aggressive RSCT Reconfigure

  • On HMC or LPAR: /usr/sbin/rsct/install/bin/recfgct then rmcctrl -p
  • Caution: Avoid this on active PowerHA / CAA clusters without precautions. Cluster nodes can spontaneously reboot if you pull the rug out from under their RMC connections.
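One way to honor the caution above is to refuse to run recfgct while cluster subsystems are active. The sketch below is a conservative guard, not a complete safety check: it only looks for the clstrmgrES (PowerHA cluster manager) and cthags (RSCT group services) subsystems in lssrc output, and cluster_services_active is a hypothetical helper name.

```shell
# Hypothetical guard: succeed (exit 0) if lssrc output on stdin shows
# clstrmgrES or cthags as active. Other cluster software is NOT detected.
cluster_services_active() {
    awk '($1 == "clstrmgrES" || $1 == "cthags") && $NF == "active" { found = 1 }
         END { exit !found }'
}

# Usage on the LPAR, as root:
#   if lssrc -a | cluster_services_active; then
#       echo "cluster services active; do not run recfgct yet" >&2
#   else
#       /usr/sbin/rsct/install/bin/recfgct && /usr/sbin/rsct/bin/rmcctrl -p
#   fi
```

The guard errs on the side of refusing: clstrmgrES may show as active on a PowerHA node even when cluster services are stopped, in which case a manual decision is still needed.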

Other Quick Fixes

  • Restart ctrmc: stopsrc -s ctrmc; startsrc -s ctrmc
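  • Check for duplicate node IDs by comparing /usr/sbin/rsct/bin/lsnodeid output across cloned LPARs. recfgct (above) generates a fresh node ID; ctsvhbal lists the host identities RSCT uses for authentication.
  • Synchronize clocks (HMC + LPARs). This can sometimes help if the hardware clocks are way off.
  • Update HMC, FSP/microcode, and RSCT levels if you have HMC SWMA / IBM Support. We do not support HMCs.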
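For the duplicate node ID case, collecting the IDs from several LPARs and grouping them makes clones easy to spot. This is a minimal sketch: find_duplicate_node_ids is a hypothetical helper name, the input format ("hostname node_id" per line) is an assumption of this example, and the LPAR names in the usage comment are placeholders.

```shell
# Hypothetical helper: read "hostname node_id" pairs on stdin and print
# each node ID that appears more than once, followed by its hosts.
find_duplicate_node_ids() {
    awk '{ hosts[$2] = hosts[$2] " " $1; count[$2]++ }
         END { for (id in count) if (count[id] > 1) print id ":" hosts[id] }' \
        | sort
}

# Usage from a workstation with root ssh access to the LPARs:
#   for h in lpar1 lpar2 lpar3; do
#       echo "$h $(ssh root@"$h" /usr/sbin/rsct/bin/lsnodeid)"
#   done | find_duplicate_node_ids
```

Any line of output names LPARs that share a node ID; running recfgct on all but one of them should give each a unique ID, subject to the PowerHA / CAA caution above.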

Advanced Diagnostics

  • HMC: pedbg -c -q 4
  • LPAR: ctsnap -x runrpttr and snap -gtkc (or snap on VIOS)
  • Logs: HMC lssvcevents, LPAR /var/log/drmgr (Linux on Power LPARs), RSCT logs

Persistent issues: Open IBM case with HMC/FSP levels, network topology, and the above outputs. Common root causes include intermittent service network drops or firmware bugs.

Test DLPAR and LPM operations after fixes to confirm stability.

troubleshooting_rmc_connections_between_your_hmc_and_your_ibm_power_server_fsp.txt · Last modified: 2026/05/14 17:00 by sgriggs
