Troubleshooting Flaky RMC Connections (HMC to POWER Chassis / System Controller)
Note: RMC (Resource Monitoring and Control) is part of RSCT. It primarily handles HMC ↔ LPAR communication (DLPAR, LPM, etc.). The underlying HMC-to-chassis/system controller (FSP/eBMC) link uses the dedicated service network (HMC1/HMC2 ports). Flaky RMC often manifests as unstable LPAR connections in lspartition -dlpar or broader management instability.
Common Causes
- Network issues (latency, drops, MTU mismatch)
- Firewall/port blocks (especially 657 TCP/UDP)
- Duplicate RSCT node IDs (common after cloning)
- RSCT configuration drift
- Outdated firmware / HMC / RSCT levels
- Service processor overload or cabling problems
Basic Diagnostic Checks
1. HMC Managed System Status
lssyscfg -r sys -F name,type_model,serial_num,state
- Look for No Connection, Incomplete, Recovery, or Error.
lspartition -dlpar
- Check that DCaps is non-zero (and Active is 1) for each partition; that indicates an active RMC connection.
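If you check many managed systems, the two outputs above can be filtered with a little awk. This is a minimal sketch: the function names are hypothetical, the lssyscfg field order matches the -F list above, and the lspartition -dlpar layout (a Partition:<...> line followed by an Active:<...>, OS:<...>, DCaps:<...> line) is assumed from typical HMC output, so check it against your HMC level. Each function exits non-zero when it finds a problem.

```shell
# Hypothetical helpers; pipe HMC command output into them.

# Input: lssyscfg -r sys -F name,type_model,serial_num,state
# Prints "name: state" for systems whose state points at a broken
# HMC <-> FSP/eBMC link; exits 1 if any were found.
flag_bad_systems() {
  awk -F, '
    $4 ~ /^(No Connection|Incomplete|Recovery|Error)/ { print $1 ": " $4; bad = 1 }
    END { exit bad }'
}

# Input: lspartition -dlpar
# Assumed layout: a "Partition:<...>" line followed by an
# "Active:<n>, OS:<...>, DCaps:<0x...>, ..." line for each partition.
# Prints partitions that are not Active:<1> or show DCaps of zero; exits 1 if any.
flag_rmc_inactive() {
  awk '
    /Partition:</ {
      label = $0; sub(/.*Partition:</, "", label); sub(/>.*/, "", label)
    }
    /Active:</ {
      active = $0; sub(/.*Active:</, "", active); sub(/>.*/, "", active)
      dcaps = $0;  sub(/.*DCaps:</, "", dcaps);  sub(/>.*/, "", dcaps)
      if (active != "1" || dcaps == "0x0" || dcaps == "0") {
        print label " -> Active=" active ", DCaps=" dcaps
        bad = 1
      }
    }
    END { exit bad }'
}
```

Example from an admin workstation: ssh hscroot@<HMC_IP> 'lspartition -dlpar' | flag_rmc_inactive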
2. Network Connectivity
- Ping HMC ↔ LPAR IPs and HMC ↔ FSP IPs.
- Test the RMC port from the LPAR:
telnet <HMC_IP> 657
- Verify there is no NAT in the path (NAT is unsupported for RMC).
- Check MTU consistency (1500 vs jumbo frames) across HMC, LPARs, and service network.
- Confirm consistent IPv4/IPv6 (link-local IPv6 often used).
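For the MTU check, the largest ping payload that fits in one unfragmented packet is the MTU minus the IP and ICMP headers: 20 + 8 bytes for IPv4, 40 + 8 for IPv6, assuming no IP options or extension headers. A tiny helper for the arithmetic (the name is hypothetical):

```shell
# Hypothetical helper: largest ping (ICMP echo) payload that fits in one
# unfragmented packet for a given MTU. Assumes no IP options/extension headers.
#   IPv4: 20-byte IP header + 8-byte ICMP header
#   IPv6: 40-byte IP header + 8-byte ICMPv6 header
max_icmp_payload() {
  mtu=$1
  family=${2:-4}
  if [ "$family" = 6 ]; then
    overhead=48
  else
    overhead=28
  fi
  echo $((mtu - overhead))
}
```

So max_icmp_payload 1500 gives 1472 and max_icmp_payload 9000 gives 8972. Send a ping of that size with don't-fragment set, if your ping supports it (the option differs between platforms); if it fails while smaller pings succeed, something on the path has a smaller MTU.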
3. Firewall / Ports
- HMC GUI: HMC Management → Change Network Settings → Firewall Settings → Allow RMC.
- Ensure bidirectional TCP/UDP 657 is open (HMC ↔ LPARs).
- Ensure no firewall sits between the HMC and the FSP on the service network.
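telnet works, but a scriptable check is handy when you loop over many LPARs. This sketch assumes bash (for its /dev/tcp redirection) and the GNU coreutils timeout command; neither is installed on AIX by default, though the AIX Toolbox provides both. The function name is hypothetical, and it only proves that TCP 657 is reachable: UDP 657 cannot be tested this way.

```shell
# Hypothetical helper. Requires bash (for /dev/tcp) and the GNU "timeout"
# command. Tests TCP only; UDP 657 cannot be checked this way.
probe_rmc_port() {
  host=$1
  port=${2:-657}
  # Opening fd 3 on /dev/tcp/HOST/PORT makes bash attempt a TCP connect;
  # give up after 5 seconds so filtered ports do not hang the loop.
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port closed or filtered"
    return 1
  fi
}
```

Run it on each LPAR as probe_rmc_port <HMC_IP>.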
4. RMC/RSCT Status on LPAR (as root or padmin)
This assumes the LPAR is running and that you can log in to it (usually via ssh or telnet). It does not apply if you can't even get the LPAR created.
lslpp -l rsct.core.rmc
lsrsrc IBM.MCP
lssrc -a | grep rsct
/usr/sbin/rsct/bin/ctsvhbac
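To check the RSCT subsystems from lssrc -a without eyeballing them, this sketch reads the output on stdin and reports any named subsystem that is not active or not listed at all. The function name is hypothetical, and so is the choice of what to check: the default is just ctrmc, and adding names such as IBM.MgmtDomainRM is an assumption about what matters on your level. ctcas (group rsct_sec) is often inoperative and is not checked unless you ask for it.

```shell
# Hypothetical helper. Input: "lssrc -a" output on stdin.
# Argument: space-separated subsystem names to require (default: ctrmc).
# Prints "name: status" for each one that is not active; exits 1 on any problem.
check_rsct_subsystems() {
  want=${1:-ctrmc}
  awk -v want="$want" '
    BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) need[w[i]] = 1 }
    # Subsystem name is the first field, status is the last one.
    ($1 in need) { seen[$1] = 1; if ($NF != "active") { print $1 ": " $NF; bad = 1 } }
    END {
      for (s in need) if (!(s in seen)) { print s ": not listed"; bad = 1 }
      exit bad
    }'
}
```

Example: lssrc -a | check_rsct_subsystems "ctrmc IBM.MgmtDomainRM"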
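Basic set of AIX commands to restart the RMC subsystem on the LPAR:
/usr/sbin/rsct/bin/rmcctrl -z   # stop RMC and its resource managers
/usr/sbin/rsct/bin/rmcctrl -A   # add and start the RMC subsystem
/usr/sbin/rsct/bin/rmcctrl -p   # enable remote client connections (needed by the HMC)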
Wait 5–10 minutes and recheck lspartition -dlpar.
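The restart sequence can be wrapped so it is easy to repeat. This is a sketch: the function name is hypothetical, the RMCCTRL variable exists only so the rmcctrl path can be overridden (for example with a stub when testing), and the wait defaults to the 10-minute end of the range above.

```shell
# Hypothetical wrapper around the three rmcctrl calls above.
# RMCCTRL can be overridden (e.g. with a stub); defaults to the real path.
RMCCTRL=${RMCCTRL:-/usr/sbin/rsct/bin/rmcctrl}

restart_rmc() {
  wait_secs=${1:-600}                 # default: 10 minutes
  "$RMCCTRL" -z || return 1           # stop RMC and its resource managers
  "$RMCCTRL" -A || return 1           # add and start the RMC subsystem
  "$RMCCTRL" -p || return 1           # enable remote client connections (HMC)
  echo "RMC restarted; waiting ${wait_secs}s, then recheck lspartition -dlpar on the HMC"
  sleep "$wait_secs"
}
```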
Troubleshooting & Fix Steps (Non-Disruptive First)
On HMC (hscroot / hmcsuperadmin)
- diagrmc --autocorrect -v (HMC v8+, highly recommended)
- lspartition -dlparreset (older HMC)
- runsig -s 101 (reconfigure trigger)
- NUCLEAR OPTION: Rebuild the connection:
chsysstate -r sys -o rebuild -m <CEC_name>
Be careful with this: if you do it to a system with VIOS, you can easily lose some vSCSI or vSEA devices in the LPAR profiles. VIOS is very buggy (we are not fans of VIOS), and this is a very common complaint.
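Because the rebuild can cost you virtual adapter definitions, a guard that forces a typed confirmation and takes a profile backup first may save some pain. This sketch runs from an admin workstation with ssh access to the HMC as hscroot (an assumption); rebuild_cec, HMC_SSH, and the backup file name pre_rebuild_backup are hypothetical. bkprofdata is the HMC's profile-data backup command; check its options on your HMC level before relying on it.

```shell
# Hypothetical guard for the rebuild. HMC_SSH can be overridden (e.g. a stub).
rebuild_cec() {
  hmc=$1
  cec=$2
  ssh_cmd=${HMC_SSH:-ssh}
  printf 'Type the CEC name (%s) to confirm rebuild: ' "$cec"
  read -r answer
  if [ "$answer" != "$cec" ]; then
    echo "aborted"
    return 1
  fi
  # Back up partition profile data first so lost vSCSI / virtual Ethernet
  # definitions can be compared or restored afterwards.
  "$ssh_cmd" "hscroot@$hmc" "bkprofdata -m '$cec' -f pre_rebuild_backup" || return 1
  "$ssh_cmd" "hscroot@$hmc" "chsysstate -r sys -o rebuild -m '$cec'"
}
```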
More Aggressive RSCT Reconfigure
- On HMC or LPAR:
/usr/sbin/rsct/install/bin/recfgct
then
rmcctrl -p
- Caution: Avoid this on active PowerHA / CAA clusters unless you take precautions. Cluster nodes can spontaneously reboot if you pull the rug out from under their RMC connections.
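A simple guard for the caution above: skip recfgct when the LPAR appears to be in an active CAA cluster. It assumes lscluster -m exits non-zero when cluster services are not active on the node, which you should verify on your AIX level. The function and variable names are hypothetical; the variables only exist so the commands can be stubbed.

```shell
# Hypothetical guard. Command paths are overridable for testing.
LSCLUSTER=${LSCLUSTER:-lscluster}
RECFGCT=${RECFGCT:-/usr/sbin/rsct/install/bin/recfgct}
RMCCTRL=${RMCCTRL:-/usr/sbin/rsct/bin/rmcctrl}

safe_recfgct() {
  # Assumption: "lscluster -m" succeeds only on an active CAA cluster node.
  if command -v "$LSCLUSTER" >/dev/null 2>&1 && "$LSCLUSTER" -m >/dev/null 2>&1; then
    echo "CAA cluster looks active: not running recfgct" >&2
    return 1
  fi
  # Reconfigure RSCT, then re-enable remote client connections for the HMC.
  "$RECFGCT" && "$RMCCTRL" -p
}
```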
Other Quick Fixes
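- Restart ctrmc:
stopsrc -s ctrmc; startsrc -s ctrmc
- Fix duplicate node IDs with recfgct (which generates a new node ID) or a full rebuild. Note that ctsvhbal only displays host-based authentication identities; it does not change the node ID.
- Synchronize clocks (HMC + LPARs). This can sometimes help if the hardware clocks are way off.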
- Update HMC, FSP/microcode, and RSCT levels if you have HMC SWMA / IBM Support. We do not support HMCs.
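Duplicate node IDs are easy to spot once each LPAR's ID is in one list. This sketch takes lines of the form "<lpar> <node_id>" on stdin and prints every ID that appears more than once, along with the LPARs sharing it. The function name is hypothetical, and so is the collection step below: it assumes the node ID lives on a single line in /etc/ct_node_id, so confirm the location on your RSCT level.

```shell
# Hypothetical helper. Input: "<lpar> <node_id>" per line.
# Prints "<node_id>: <lpar> <lpar> ..." for every ID used more than once;
# exits 1 if any duplicates were found.
find_duplicate_node_ids() {
  awk '
    { hosts[$2] = hosts[$2] " " $1; count[$2]++ }
    END {
      for (id in count)
        if (count[id] > 1) { print id ":" hosts[id]; dup = 1 }
      exit dup
    }'
}
```

Example: for h in lpar01 lpar02 lpar03; do echo "$h $(ssh root@$h cat /etc/ct_node_id)"; done | find_duplicate_node_ids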
Advanced Diagnostics
- HMC:
pedbg -c -q 4
- LPAR:
ctsnap -x runrpttr and snap -gtkc (or snap on VIOS)
- Logs: HMC lssvcevents, LPAR /var/log/drmgr, RSCT logs
For persistent issues, open an IBM case with your HMC/FSP levels, network topology, and the outputs above. Common root causes include intermittent service network drops or firmware bugs.
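When opening a case, it helps to have the quick text outputs in one place alongside the ctsnap/snap archives. This runner (hypothetical name) saves each command's stdout and stderr to a numbered file. It does not replace ctsnap or snap, and the command list in the example is just the LPAR checks from earlier.

```shell
# Hypothetical helper: run each command string and save its stdout+stderr
# (prefixed with a "### command" header) to OUTDIR/01.txt, 02.txt, ...
collect_outputs() {
  outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  n=0
  for cmd in "$@"; do
    n=$((n + 1))
    { echo "### $cmd"; sh -c "$cmd"; } > "$outdir/$(printf '%02d' "$n").txt" 2>&1
  done
  echo "$n outputs saved in $outdir"
}
```

Example on an LPAR: collect_outputs /tmp/rmc_case "lslpp -l rsct.core.rmc" "lsrsrc IBM.MCP" "lssrc -a | grep rsct"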
Test DLPAR and LPM operations after fixes to confirm stability.
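One low-risk first check after a fix is an LPM validation (migrlpar -o v), which asks the HMC to check a migration without moving anything. This sketch runs it over ssh from an admin workstation; validate_lpm, HMC_SSH, and the hscroot login are assumptions. A clean validation is not a substitute for an actual DLPAR or LPM test.

```shell
# Hypothetical helper: validate (not perform) a partition migration.
# HMC_SSH can be overridden (e.g. a stub); defaults to ssh.
validate_lpm() {
  hmc=$1
  src=$2
  tgt=$3
  lpar=$4
  "${HMC_SSH:-ssh}" "hscroot@$hmc" "migrlpar -o v -m '$src' -t '$tgt' -p '$lpar'"
}
```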