Details
Bug
Resolution: Not a Bug
Major
None
Lustre 2.4.3
None
kernel 2.6.32-358.23.2.el6
3
17618
Description
During SRP path failover, kmmpd hangs, and the OSS then requires a reboot.
Full details are in the log file.
The path failed at 14:46:50.
Feb 24 14:46:50 nbp9-oss5 OpenSM[4146]: SM port is down
Feb 24 14:46:50 nbp9-oss5 OpenSM[4146]: Entering DISCOVERING state
Feb 24 14:47:02 nbp9-oss5 run_srp_daemon[95911]: failed srp_daemon: [HCA=mlx4_1] [port=1] [exit status=110]. Will try to restart srp_daemon periodically. No more warnings will be issued in the next 7200 seconds if the same problem repeats
Feb 24 14:47:10 nbp9-oss5 run_srp_daemon[95917]: starting srp_daemon: [HCA=mlx4_1] [port=1]
Feb 24 14:47:18 nbp9-oss5 kernel: scsi host12: ib_srp: failed receive status 5
Feb 24 14:47:18 nbp9-oss5 kernel: scsi host12: ib_srp: failed receive status 5
.....
Feb 24 14:49:07 nbp9-oss5 kernel: INFO: task kmmpd-dm-20:20927 blocked for more than 120 seconds.
Feb 24 14:49:07 nbp9-oss5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 24 14:49:10 nbp9-oss5 kernel: kmmpd-dm-20 D 0000000000000000 0 20927 2 0x00000080
Feb 24 14:49:10 nbp9-oss5 kernel: ffff880aad5f7d20 0000000000000046 0000000000000000 ffffffffa001740c
Feb 24 14:49:10 nbp9-oss5 kernel: ffff880301b415c0 0000000000000008 0000000000007030 000000000fd00014
Feb 24 14:49:10 nbp9-oss5 kernel: ffff880aad45faf8 ffff880aad5f7fd8 000000000000fc40 ffff880aad45faf8
Feb 24 14:49:10 nbp9-oss5 kernel: Call Trace:
Feb 24 14:49:10 nbp9-oss5 kernel: [<ffffffffa001740c>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
Feb 24 14:49:10 nbp9-oss5 kernel: [<ffffffff811b2c60>] ? sync_buffer+0x0/0x50
Feb 24 14:49:10 nbp9-oss5 kernel: [<ffffffff8153fe63>] io_schedule+0x73/0xc0
Feb 24 14:49:10 nbp9-oss5 kernel: [<ffffffff811b2ca0>] sync_buffer+0x40/0x50
Feb 24 14:49:10 nbp9-oss5 kernel: [<ffffffff8154081f>] __wait_on_bit+0x5f/0x90
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffff811b2c60>] ? sync_buffer+0x0/0x50
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffff815408c8>] out_of_line_wait_on_bit+0x78/0x90
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffff81096350>] ? wake_bit_function+0x0/0x50
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffff811b2c56>] __wait_on_buffer+0x26/0x30
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffffa0c9d40a>] write_mmp_block+0x5a/0x80 [ldiskfs]
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffffa0c9d955>] kmmpd+0x1a5/0x3b0 [ldiskfs]
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffffa0c9d7b0>] ? kmmpd+0x0/0x3b0 [ldiskfs]
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffff81095fa6>] kthread+0x96/0xa0
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffff81095f10>] ? kthread+0x0/0xa0
Feb 24 14:49:11 nbp9-oss5 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
....
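For anyone triaging a similar log, a rough way to line up the path failure with the hung-task reports is to scan the syslog for the "blocked for more than N seconds" messages and measure how long after the failure each one appears. The Python below is only a sketch: the function names, the regular expressions, and the placeholder year (syslog timestamps carry no year) are assumptions made for illustration, not part of any Lustre or kernel tooling.

```python
import re
from datetime import datetime

# Syslog line shape assumed here: "Feb 24 14:49:07 host tag: message".
SYSLOG_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+([^:]+):\s(.*)$"
)
# Kernel hung-task report, e.g. "INFO: task kmmpd-dm-20:20927 blocked ...".
HUNG_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def parse_time(stamp, year=2000):
    """Parse a syslog timestamp; the year is an arbitrary placeholder."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_hung_tasks(lines):
    """Return (time, task, pid, threshold_seconds) for each hung-task report."""
    reports = []
    for line in lines:
        entry = SYSLOG_RE.match(line)
        if not entry:
            continue  # skip elided or continuation lines
        hung = HUNG_RE.search(entry.group(4))
        if hung:
            reports.append(
                (parse_time(entry.group(1)), hung.group(1),
                 int(hung.group(2)), int(hung.group(3)))
            )
    return reports


def seconds_after(failure_stamp, report_time):
    """Seconds elapsed between the path failure and a hung-task report."""
    return int((report_time - parse_time(failure_stamp)).total_seconds())
```

Fed the excerpt above with the failure time of 14:46:50, the first report for kmmpd-dm-20 at 14:49:07 comes out 137 seconds after the path went down, which is consistent with the 120-second hung-task threshold plus a short delay before I/O stalled.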