Details
Bug
Resolution: Duplicate
Minor
None
Lustre 1.8.7
None
RHEL5.7
3
6370
Description
At our customer site, some ll_ost_io_xx threads hang, and the affected OSTs' status changes to "DOWN" in the output of "lctl dl".
We saw the following call traces on these OSSs at that time.
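When checking many OSSs, it can help to script the "lctl dl" check. The following is a minimal sketch, assuming each device line of "lctl dl" is whitespace-separated with the index, status, type and name as its first four columns. The helpers parse_lctl_dl and down_devices are hypothetical names, not part of any Lustre tool, and the sample text is made up for illustration. Any status other than "UP" is reported, so the sketch does not depend on the exact status string.

```python
def parse_lctl_dl(text):
    """Parse `lctl dl` output into (index, status, type, name) tuples.

    Assumption: device lines look like "index status type name uuid refcount".
    """
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a device index.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        index, status, dev_type, name = fields[:4]
        devices.append((int(index), status, dev_type, name))
    return devices


def down_devices(text):
    """Return the devices whose status column is anything other than UP."""
    return [dev for dev in parse_lctl_dl(text) if dev[1] != "UP"]


if __name__ == "__main__":
    # Hypothetical sample output; names and UUIDs are illustrative only.
    sample = """\
  0 UP mgc MGC10.0.0.1@tcp mgc_uuid 5
  1 DOWN obdfilter lustre-OST0000 lustre-OST0000_UUID 9
  2 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 9
"""
    for index, status, dev_type, name in down_devices(sample):
        print(f"device {index} ({dev_type} {name}) is {status}")
```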
Jun 28 16:22:18 nos011i kernel: Pid: 32708, comm: ll_ost_io_188
Jun 28 16:22:18 nos011i kernel:
Jun 28 16:22:18 nos011i kernel: Call Trace:
Jun 28 16:22:18 nos011i kernel: [<ffffffff888a914d>] ldlm_cli_enqueue_local+0x4fd/0x520 [ptlrpc]
Jun 28 16:22:18 nos011i kernel: [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
Jun 28 16:22:18 nos011i kernel: [<ffffffff88bd7b29>] filter_destroy+0x969/0x1f90 [obdfilter]
Jun 28 16:22:18 nos011i kernel: [<ffffffff8876ecfd>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
Jun 28 16:22:18 nos011i kernel: [<ffffffff888d2cd2>] lustre_pack_reply_flags+0x8e2/0x950 [ptlrpc]
Jun 28 16:22:18 nos011i kernel: [<ffffffff888d2d69>] lustre_pack_reply+0x29/0xb0 [ptlrpc]
Jun 28 16:22:18 nos011i kernel: [<ffffffff88b89070>] ost_destroy+0x660/0x790 [ost]
Jun 28 16:22:18 nos011i kernel: [<ffffffff888ceef5>] lustre_msg_get_opc+0x35/0xf0 [ptlrpc]
Jun 28 16:22:18 nos011i kernel: [<ffffffff88b92a46>] ost_handle+0x1556/0x55b0 [ost]
Jun 28 16:22:18 nos011i kernel: [<ffffffff888de6d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 28 16:22:18 nos011i kernel: [<ffffffff8008cc1e>] __wake_up_common+0x3e/0x68
Jun 28 16:22:18 nos011i kernel: [<ffffffff888dfdc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 28 16:22:18 nos011i kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jun 28 16:22:18 nos011i kernel: [<ffffffff888dee60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 28 16:22:18 nos011i kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jun 28 16:22:18 nos011i kernel:
Jun 28 16:22:18 nos011i kernel: Lustre: Service thread pid 8870 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 28 16:22:18 nos011i kernel: LustreError: dumping log to /tmp/lustre-log.1340868138.32708
Jun 28 16:22:18 nos011i kernel: Pid: 8870, comm: ll_ost_io_37
Jun 28 16:22:20 nos011i kernel:
Jun 28 16:22:20 nos011i kernel: Call Trace:
Jun 28 16:22:21 nos011i kernel: [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
Jun 28 16:22:21 nos011i kernel: [<ffffffff88bd7b29>] filter_destroy+0x969/0x1f90 [obdfilter]
Jun 28 16:22:21 nos011i kernel: [<ffffffff8876ecfd>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
Jun 28 16:22:21 nos011i kernel: [<ffffffff888d2cd2>] lustre_pack_reply_flags+0x8e2/0x950 [ptlrpc]
Jun 28 16:22:21 nos011i kernel: [<ffffffff888d2d69>] lustre_pack_reply+0x29/0xb0 [ptlrpc]
Jun 28 16:22:21 nos011i kernel: [<ffffffff88b89070>] ost_destroy+0x660/0x790 [ost]
Jun 28 16:22:21 nos011i kernel: [<ffffffff888ceef5>] lustre_msg_get_opc+0x35/0xf0 [ptlrpc]
Jun 28 16:22:21 nos011i kernel: [<ffffffff88b92a46>] ost_handle+0x1556/0x55b0 [ost]
Jun 28 16:22:21 nos011i kernel: [<ffffffff888de6d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jun 28 16:22:21 nos011i kernel: [<ffffffff8008cc1e>] __wake_up_common+0x3e/0x68
Jun 28 16:22:21 nos011i kernel: [<ffffffff888dfdc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jun 28 16:22:21 nos011i kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jun 28 16:22:21 nos011i kernel: [<ffffffff888dee60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jun 28 16:22:21 nos011i kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jun 28 16:22:21 nos011i kernel:
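Both dumped threads are blocked in __down_write_nested called from filter_destroy. When an OSS logs many such dumps, grouping frames per thread makes it easier to see which service threads are stuck in the same path. The following is a minimal sketch, assuming the one-entry-per-line syslog format shown above; parse_traces and threads_in are hypothetical helper names, not part of Lustre.

```python
import re
import sys

# Matches the "Pid: N, comm: NAME" header that starts each dumped stack.
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+)")
# Matches one frame such as "[<ffffffff800645e3>] __down_write_nested+0x7a/0x92".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] ([A-Za-z0-9_.]+)\+0x")


def parse_traces(log_text):
    """Map (pid, comm) -> list of function names, in the order they were logged."""
    traces = {}
    current = None
    for line in log_text.splitlines():
        header = PID_RE.search(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            traces[current] = []
            continue
        frame = FRAME_RE.search(line)
        # Frames seen before any "Pid:" header are ignored.
        if frame and current is not None:
            traces[current].append(frame.group(1))
    return traces


def threads_in(traces, function):
    """Return the (pid, comm) pairs whose stack contains the given function."""
    return sorted(t for t, frames in traces.items() if function in frames)


if __name__ == "__main__":
    # Usage: python triage.py < /var/log/messages
    traces = parse_traces(sys.stdin.read())
    for pid, comm in threads_in(traces, "filter_destroy"):
        print(pid, comm)
```

Fed the log excerpt above, this should list both ll_ost_io_37 (pid 8870) and ll_ost_io_188 (pid 32708). Matching on a function name rather than a full stack is a deliberate simplification; it does not distinguish reliable frames from stale ones.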
Issue Links
Trackbacks
Lustre 1.8.x known issues tracker
While testing against the Lustre b18 branch, we would hit known bugs that had already been reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA