sprig-mds1:
===========
Mar 25 08:57:25 sprig-mds1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 13:41:34 sprig-mds1 kernel: : Lustre: MGS: haven't heard from client 2e7c747b-56d2-fb23-8699-d02e25df780b (at 11.3.0.47@o2ib) in 228 seconds. I think it's dead, and I am evicting it. exp ffff881061955800, cur 1395754894 expire 1395754744 last 1395754666
Mar 25 13:41:34 sprig-mds1 kernel: : Lustre: Skipped 5 previous similar messages
Mar 25 13:41:34 sprig-mds1 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754894.10499
Mar 25 13:41:54 sprig-mds1 kernel: : Lustre: scratch-MDT0001: haven't heard from client a2a97724-fc34-d985-e97c-8a728c185d31 (at 11.3.0.47@o2ib) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8807fba40000, cur 1395754914 expire 1395754764 last 1395754682
Mar 25 13:41:54 sprig-mds1 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754914.10499
Mar 25 13:45:27 sprig-mds1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6

sprig-mds2:
===========
Mar 25 08:57:25 sprig-mds2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-mds2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 13:41:51 sprig-mds2 kernel: : Lustre: scratch-MDT0000: haven't heard from client a2a97724-fc34-d985-e97c-8a728c185d31 (at 11.3.0.47@o2ib) in 230 seconds. I think it's dead, and I am evicting it. exp ffff88084cf52400, cur 1395754911 expire 1395754761 last 1395754681
Mar 25 13:41:51 sprig-mds2 kernel: : Lustre: Skipped 4 previous similar messages
Mar 25 13:41:51 sprig-mds2 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754911.10856
Mar 25 13:45:27 sprig-mds2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6

sprig-oss1:
===========
Mar 25 08:57:25 sprig-oss1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 13:41:51 sprig-oss1 kernel: : Lustre: scratch-OST0000: haven't heard from client a2a97724-fc34-d985-e97c-8a728c185d31 (at 11.3.0.47@o2ib) in 229 seconds. I think it's dead, and I am evicting it. exp ffff880177d7d000, cur 1395754911 expire 1395754761 last 1395754682
Mar 25 13:41:51 sprig-oss1 kernel: : Lustre: Skipped 2 previous similar messages
Mar 25 13:41:51 sprig-oss1 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754911.12803
Mar 25 13:41:52 sprig-oss1 kernel: : Lustre: scratch-OST0004: haven't heard from client a2a97724-fc34-d985-e97c-8a728c185d31 (at 11.3.0.47@o2ib) in 230 seconds. I think it's dead, and I am evicting it. exp ffff88078d76dc00, cur 1395754912 expire 1395754762 last 1395754682
Mar 25 13:41:52 sprig-oss1 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754912.12803
Mar 25 13:41:52 sprig-oss1 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754912.12803
Mar 25 13:41:52 sprig-oss1 kernel: : LustreError: can't open /tmp/lustre-log.1395754912.12803 for dump: rc -17
Mar 25 13:41:54 sprig-oss1 kernel: : Lustre: scratch-OST0005: haven't heard from client a2a97724-fc34-d985-e97c-8a728c185d31 (at 11.3.0.47@o2ib) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8807e43be800, cur 1395754914 expire 1395754764 last 1395754682
Mar 25 13:41:54 sprig-oss1 kernel: : Lustre: Skipped 1 previous similar message
Mar 25 13:41:54 sprig-oss1 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754914.12803
Mar 25 13:41:54 sprig-oss1 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754914.12803
Mar 25 13:41:54 sprig-oss1 kernel: : LustreError: can't open /tmp/lustre-log.1395754914.12803 for dump: rc -17
Mar 25 13:41:54 sprig-oss1 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754914.12803
Mar 25 13:41:54 sprig-oss1 kernel: : LustreError: can't open /tmp/lustre-log.1395754914.12803 for dump: rc -17
Mar 25 13:45:27 sprig-oss1 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6

sprig-oss2:
===========
Mar 25 08:57:25 sprig-oss2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 08:57:25 sprig-oss2 kernel: : ib_send_cm_mra: cm_id_priv->id.state: 0x6
Mar 25 13:41:49 sprig-oss2 kernel: : Lustre: scratch-OST0006: haven't heard from client a2a97724-fc34-d985-e97c-8a728c185d31 (at 11.3.0.47@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88014a79dc00, cur 1395754909 expire 1395754759 last 1395754682
Mar 25 13:41:49 sprig-oss2 kernel: : Lustre: Skipped 2 previous similar messages
Mar 25 13:41:49 sprig-oss2 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754909.15735
Mar 25 13:41:51 sprig-oss2 kernel: : Lustre: scratch-OST000a: haven't heard from client a2a97724-fc34-d985-e97c-8a728c185d31 (at 11.3.0.47@o2ib) in 229 seconds. I think it's dead, and I am evicting it. exp ffff8808468f5c00, cur 1395754911 expire 1395754761 last 1395754682
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754911.15735
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754911.15735
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754911.15735
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: can't open /tmp/lustre-log.1395754911.15735 for dump: rc -17
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: can't open /tmp/lustre-log.1395754911.15735 for dump: rc -17
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754911.15735
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: can't open /tmp/lustre-log.1395754911.15735 for dump: rc -17
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: dumping log to /tmp/lustre-log.1395754911.15735
Mar 25 13:41:51 sprig-oss2 kernel: : LustreError: can't open /tmp/lustre-log.1395754666

sprig3:
=======
Mar 25 12:30:05 sprig3 syslog-ng[2130]: Configuration reload request received, reloading configuration;
Mar 25 12:30:05 sprig3 syslog-ng[2130]: New configuration initialized;
Mar 25 12:54:38 sprig3 intelremotemond[47904]: Main: daemon started
Mar 25 12:54:38 sprig3 intelremotemond[47904]: config file is /scratch/awe/opt/intel/ism/bin/intel64/../intelremotemond.conf
Mar 25 12:54:38 sprig3 intelremotemond[47904]: timeout value is 600
Mar 25 12:54:38 sprig3 intelremotemond[47904]: flushtime value is 10
Mar 25 12:54:38 sprig3 intelremotemond[47904]: limit value is 1000
Mar 25 12:54:38 sprig3 intelremotemond[47904]: stop value is 100000
Mar 25 12:54:38 sprig3 intelremotemond[47904]: Consumer: total events at start is 0
Mar 25 13:07:17 sprig3 intelremotemond[47904]: Consumer: timeout, exiting...
Mar 25 13:07:17 sprig3 intelremotemond[48932]: SendDataToBackend: path to agent is /opt/intel/ism/bin/intel64/intelremotemonagent
Mar 25 13:07:17 sprig3 intelremotemond[47904]: Main: try to delete semaphore, status 0
Mar 25 13:07:17 sprig3 intelremotemond[48932]: SendDataToBackend: start agent
Mar 25 13:07:17 sprig3 intelremotemond[47904]: Main: daemon stopped, force is 0
Mar 25 13:30:05 sprig3 syslog-ng[2130]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=62', processed='center(received)=58', processed='destination(messages)=58', processed='destination(mailinfo)=0', processed='destination(mailwarn)=0', processed='destination(localmessages)=0', processed='destination(newserr)=0', processed='destination(mailerr)=0', processed='destination(netmgm)=0', processed='destination(warn)=4', processed='destination(console)=0', processed='destination(null)=0', processed='destination(mail)=0', processed='destination(xconsole)=0', processed='destination(firewall)=0', processed='destination(acpid)=0', processed='destination(newscrit)=0', processed='destination(newsnotice)=0', processed='source(src)=58'
Mar 25 13:31:02 sprig3 intelremotemond[49468]: Main: daemon started
Mar 25 13:31:02 sprig3 intelremotemond[49468]: config file is /scratch/awe/opt/intel/ism/bin/intel64/../intelremotemond.conf
Mar 25 13:31:02 sprig3 intelremotemond[49468]: timeout value is 600
Mar 25 13:31:02 sprig3 intelremotemond[49468]: flushtime value is 10
Mar 25 13:31:02 sprig3 intelremotemond[49468]: limit value is 1000
Mar 25 13:31:02 sprig3 intelremotemond[49468]: stop value is 100000
Mar 25 13:31:03 sprig3 intelremotemond[49468]: Consumer: total events at start is 1

### After this point the node suffered a kernel panic ###

Console log for sprig3:
[90709.945927] LustreError: 50074:0:(ldlm_lock.c:851:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_readers > 0 ) failed:
[90709.983659] LustreError: 50074:0:(ldlm_lock.c:851:ldlm_lock_decref_internal_nolock()) LBUG
[90710.011706] Kernel panic - not syncing: LBUG
[90710.025734] Pid: 50074, comm: Attach API tear Tainted: GF NX 3.0.93-0.8-default #1