Troubleshooting Flaky RMC Connections (HMC to POWER Chassis / System Controller)

Note: RMC (Resource Monitoring and Control) is part of RSCT. It primarily handles HMC ↔ LPAR communication (DLPAR, LPM, etc.). The underlying HMC-to-chassis/system controller (FSP/eBMC) link uses the dedicated service network (HMC1/HMC2 ports). Flaky RMC often manifests as unstable LPAR connections in lspartition -dlpar or broader management instability.

Common Causes

Network issues (latency, drops, MTU mismatch)
Firewall/port blocks (especially 657 TCP/UDP)
Duplicate RSCT node IDs (common after cloning)
RSCT configuration drift
Outdated firmware / HMC / RSCT levels
Service processor overload or cabling problems

Basic Diagnostic Checks

1. HMC Managed System Status

lssyscfg -r sys -F name,type_model,serial_num,state
Look for No Connection, Incomplete, Recovery, or Error.
Then run lspartition -dlpar and check that DCaps is non-zero for each partition; DCaps:<0x0> means the HMC has no active RMC connection to that LPAR.
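With many managed systems attached, the state check above can be filtered instead of read by eye. This is a minimal sketch, assuming it runs on an admin workstation with ssh access to the HMC. The function name flag_unhealthy_systems and the host name hmc01 are hypothetical. It relies only on -F printing the requested attributes comma-separated in the order given, and it treats the four states listed above as unhealthy.

```shell
# Hypothetical helper: reads the output of
#   lssyscfg -r sys -F name,type_model,serial_num,state
# on stdin and prints every system whose state is one of the
# problem states named in this guide.
flag_unhealthy_systems() {
    awk -F',' '
        {
            state = $4
            if (state == "No Connection" || state == "Incomplete" ||
                state == "Recovery" || state == "Error")
                printf "%s (%s*%s): %s\n", $1, $2, $3, state
        }'
}

# Usage (hmc01 is a placeholder host name):
#   ssh hscroot@hmc01 'lssyscfg -r sys -F name,type_model,serial_num,state' | flag_unhealthy_systems
```

No output means none of the listed systems is in one of those four states.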

2. Network Connectivity

Ping HMC ↔ LPAR IPs and HMC ↔ FSP IPs.
Test RMC port: From LPAR → telnet <HMC_IP> 657
Verify no NAT (unsupported for RMC).
Check MTU consistency (1500 vs jumbo frames) across HMC, LPARs, and service network.
Confirm that the HMC and LPARs use the same protocol family (IPv4 or IPv6) for RMC; link-local IPv6 is often used.
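For the MTU check, a common approach is to ping with the "don't fragment" bit set and a payload sized to exactly fill the MTU. The sketch below only does the arithmetic: an IPv4 header without options is 20 bytes and the ICMP echo header is 8 bytes, so the payload is the MTU minus 28. The function name icmp_payload_for_mtu is hypothetical, and the ping flags in the comment are those of Linux iputils; AIX ping uses different options.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Example on a Linux host with iputils ping (flags differ on AIX):
#   ping -M do -c 3 -s "$(icmp_payload_for_mtu 9000)" <LPAR_IP>
```

If the full-size ping fails while a smaller one succeeds, some hop on the path has a smaller MTU than the endpoints assume.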

3. Firewall / Ports

HMC GUI: HMC Management → Change Network Settings → Firewall Settings → Allow RMC.
Ensure bidirectional TCP/UDP 657 is open (HMC ↔ LPARs).
Make sure there are no firewalls on the service network between the HMC and the FSP.
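Where telnet is not installed, the TCP side of the port 657 check can be approximated from the shell. This is a rough sketch, assuming a shell that provides the /dev/tcp pseudo-device (bash does; a stock AIX ksh may not). The function name tcp_port_open is hypothetical. It says nothing about UDP 657, and it has no timeout, so a silently dropped connection can hang until the TCP connect attempt gives up.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# Needs a shell with /dev/tcp support such as bash. Does not test UDP.
tcp_port_open() {
    # The subshell opens file descriptor 3 to the target and closes it on exit.
    ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null
}

# Usage:
#   if tcp_port_open <HMC_IP> 657; then
#       echo "657/tcp reachable"
#   else
#       echo "657/tcp blocked or closed"
#   fi
```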

4. RMC/RSCT Status on LPAR (as root or padmin)

This assumes the LPAR is running and you can log in to it (usually over ssh or telnet). It does not apply if the problem is that you cannot even create the LPAR.

lslpp -l rsct.core.rmc
lsrsrc IBM.MCP
lssrc -a | grep rsct
/usr/sbin/rsct/bin/ctsvhbac
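To use the subsystem check in a script, the lssrc output can be reduced to a yes/no answer. This is a minimal sketch with a hypothetical function name, ctrmc_is_active. It assumes the usual lssrc layout in which the subsystem name is the first column and the status is the last.

```shell
# Hypothetical helper: reads `lssrc -s ctrmc` (or `lssrc -a`) output on stdin
# and succeeds only if the ctrmc subsystem is listed with status "active".
ctrmc_is_active() {
    awk '$1 == "ctrmc" && $NF == "active" { found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Usage on the LPAR:
#   lssrc -s ctrmc | ctrmc_is_active || echo "ctrmc is not active"
```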

Basic rmcctrl commands for resetting the RMC subsystem on AIX:

/usr/sbin/rsct/bin/rmcctrl -z     # stop the RMC subsystem and its resource managers
/usr/sbin/rsct/bin/rmcctrl -A     # add and start the RMC subsystem
/usr/sbin/rsct/bin/rmcctrl -p     # enable remote client connections

Wait 5–10 minutes and recheck lspartition -dlpar.
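The recheck can be scripted by looking for partitions that still report DCaps:<0x0>. This is a sketch under assumptions: the function name list_zero_dcaps and the host name hmc01 are hypothetical, and the parsing assumes each entry in the lspartition -dlpar output is a "Partition:<...>" line followed by a line containing "DCaps:<0x...>".

```shell
# Hypothetical helper: reads `lspartition -dlpar` output on stdin and prints
# the Partition line of every entry whose DCaps value is <0x0>.
# Assumes each entry is a "Partition:<...>" line followed by a line
# that contains "DCaps:<0x...>".
list_zero_dcaps() {
    awk '
        /Partition:</  { part = $0 }
        /DCaps:<0x0>/  { print part }'
}

# Usage: recheck once a minute for up to 10 minutes (hmc01 is a placeholder):
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#       [ -z "$(ssh hscroot@hmc01 'lspartition -dlpar' | list_zero_dcaps)" ] && break
#       sleep 60
#   done
```

Partitions that are powered off also show a zero DCaps value, so an empty result is only expected when every listed LPAR is running.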

Troubleshooting & Fix Steps (Non-Disruptive First)

On HMC (hscroot / hmcsuperadmin)

diagrmc --autocorrect -v (HMC v8+ highly recommended)
lspartition -dlparreset (older HMC)
runsig -s 101 (reconfigure trigger)
NUCLEAR OPTION: Rebuild the connection with chsysstate -r sys -o rebuild -m <CEC_name>. Be careful with this: on a system with VIOS you can easily lose some vSCSI or vSEA devices in the LPAR profiles. VIOS is very buggy (we are not fans of VIOS), and this is a very common complaint.

More Aggressive RSCT Reconfigure

On HMC or LPAR: /usr/sbin/rsct/install/bin/recfgct then rmcctrl -p
Caution: Avoid on active PowerHA / CAA clusters without precautions. They can spontaneously reboot if you pull the rug out from under their RMC connections.
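One possible precaution is a guard that refuses to go further while the PowerHA cluster manager is running. This is only a first-pass sketch: the function name no_active_cluster_manager is hypothetical, and checking the clstrmgrES subsystem alone is an assumption that does not prove the absence of a CAA cluster. It errs on the cautious side, because it blocks whenever the subsystem is listed as active.

```shell
# Hypothetical guard: reads `lssrc -a` output on stdin and fails if the
# PowerHA cluster manager (clstrmgrES) is listed as active.
# Assumption: this single check is a first-pass test only. It does not
# prove that no CAA cluster is configured on the node.
no_active_cluster_manager() {
    awk '$1 == "clstrmgrES" && $NF == "active" { busy = 1 }
         END { exit (busy ? 1 : 0) }'
}

# Usage on the LPAR, before reconfiguring RSCT:
#   lssrc -a | no_active_cluster_manager || echo "cluster manager active, do not run recfgct"
```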

Other Quick Fixes

Restart ctrmc: stopsrc -s ctrmc; startsrc -s ctrmc
Fix duplicate node IDs with ctsvhbal or full rebuild.
Synchronize clocks (HMC + LPARs). This can sometimes help if the hardware clocks are way off.
Update HMC, FSP/microcode, and RSCT levels if you have HMC SWMA / IBM Support. We do not support HMCs.
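For the clock check in the list above, a rough skew figure is enough to tell whether the clocks are way off. The sketch below compares two epoch timestamps. The function name clock_skew_seconds and the host names are hypothetical, and the usage line assumes that date +%s is available on both ends. The ssh round trip adds a second or so of noise, so only large differences are meaningful.

```shell
# Hypothetical helper: absolute difference, in seconds, between two
# epoch timestamps given as arguments.
clock_skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( -d ))
    fi
    echo "$d"
}

# Usage (hmc01 and lpar1 are placeholders; assumes `date +%s` works on both):
#   clock_skew_seconds "$(ssh hscroot@hmc01 date +%s)" "$(ssh root@lpar1 date +%s)"
```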

Advanced Diagnostics

HMC: pedbg -c -q 4
LPAR: ctsnap -x runrpttr and snap -gtkc (or snap on VIOS)
Logs: HMC lssvcevents, LPAR /var/log/drmgr, RSCT logs

Persistent issues: Open IBM case with HMC/FSP levels, network topology, and the above outputs. Common root causes include intermittent service network drops or firmware bugs.

Test DLPAR and LPM operations after fixes to confirm stability.
