Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 1.8.6
-
None
-
Lustre 1.8.5 with OFED 1.5.3 and kernel 2.6.18-238.12.1.el5, at NASA Ames.
-
4
-
4769
Description
We started getting the following error, with the OSS reaching a high load and the filesystem becoming unusable.
Dec 26 11:52:16 service102 kernel: Lustre: Service thread pid 10832 was inactive for 506.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Dec 26 11:52:16 service102 kernel: Lustre: Skipped 1 previous similar message
Dec 26 11:52:16 service102 kernel: Pid: 10832, comm: ll_ost_323
Dec 26 11:52:16 service102 kernel:
Dec 26 11:52:16 service102 kernel: Call Trace:
Dec 26 11:52:16 service102 kernel: [<ffffffff887d0d1d>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
Dec 26 11:52:16 service102 kernel: [<ffffffff88b63d02>] start_this_handle+0x301/0x3cb [jbd2]
Dec 26 11:52:16 service102 kernel: [<ffffffff800a2f36>] autoremove_wake_function+0x0/0x2e
Dec 26 11:52:18 service102 kernel: [<ffffffff88b63e77>] jbd2_journal_start+0xab/0xdf [jbd2]
Dec 26 11:52:18 service102 kernel: [<ffffffff88ba4c25>] ldiskfs_journal_start_sb+0x55/0xa0 [ldiskfs]
Dec 26 11:52:18 service102 kernel: [<ffffffff88c16a72>] fsfilt_ldiskfs_start+0x4c2/0x590 [fsfilt_ldiskfs]
Dec 26 11:52:18 service102 kernel: [<ffffffff8002cc0e>] mntput_no_expire+0x19/0x88
Dec 26 11:52:18 service102 kernel: [<ffffffff887fca00>] push_ctxt+0x370/0x380 [lvfs]
Dec 26 11:52:18 service102 kernel: [<ffffffff88c31a08>] filter_client_add+0x508/0xc30 [obdfilter]
Dec 26 11:52:18 service102 kernel: [<ffffffff88c30de7>] filter_export_stats_init+0x117/0x650 [obdfilter]
Dec 26 11:52:18 service102 kernel: [<ffffffff88c32665>] filter_connect+0x535/0x8c0 [obdfilter]
Dec 26 11:52:18 service102 kernel: [<ffffffff88936107>] lustre_msg_add_op_flags+0x47/0x120 [ptlrpc]
Dec 26 11:52:18 service102 kernel: [<ffffffff88bf6500>] ost_handle+0x0/0x55b0 [ost]
Dec 26 11:52:18 service102 kernel: [<ffffffff88900976>] target_handle_connect+0x21c6/0x2e80 [ptlrpc]
Dec 26 11:52:19 service102 kernel: [<ffffffff8892ca48>] ptlrpc_send_reply+0x5e8/0x600 [ptlrpc]
Dec 26 11:52:19 service102 kernel: [<ffffffff88930f75>] lustre_msg_get_version+0x35/0xf0 [ptlrpc]
Dec 26 11:52:19 service102 kernel: [<ffffffff88931038>] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Dec 26 11:52:19 service102 kernel: [<ffffffff88bf6daf>] ost_handle+0x8af/0x55b0 [ost]
Dec 26 11:52:19 service102 kernel: [<ffffffff889405e9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Dec 26 11:52:19 service102 kernel: [<ffffffff88940d45>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Dec 26 11:52:19 service102 kernel: [<ffffffff8008ca4e>] __wake_up_common+0x3e/0x68
Dec 26 11:52:19 service102 kernel: [<ffffffff88941cd6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Dec 26 11:52:20 service102 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Dec 26 11:52:20 service102 kernel: [<ffffffff88940d70>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Dec 26 11:52:20 service102 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Dec 26 11:52:20 service102 kernel:
Dec 26 11:52:20 service102 kernel: Pid: 11020, comm: ll_ost_511
Dec 26 11:52:20 service102 kernel:
Dec 26 11:52:20 service102 kernel: Call Trace:
Dec 26 11:52:20 service102 kernel: [<ffffffff887d0d1d>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
Dec 26 11:52:20 service102 kernel: [<ffffffff801632b8>] list_add+0xc/0xe
Dec 26 11:52:20 service102 kernel: [<ffffffff88b63d02>] start_this_handle+0x301/0x3cb [jbd2]
Dec 26 11:52:20 service102 kernel: [<ffffffff800a2f36>] autoremove_wake_function+0x0/0x2e
Dec 26 11:52:20 service102 kernel: [<ffffffff88b63e77>] jbd2_journal_start+0xab/0xdf [jbd2]
Dec 26 11:52:20 service102 kernel: [<ffffffff88ba4c25>] ldiskfs_journal_start_sb+0x55/0xa0 [ldiskfs]
Dec 26 11:52:20 service102 kernel: [<ffffffff88c16a72>] fsfilt_ldiskfs_start+0x4c2/0x590 [fsfilt_ldiskfs]
Dec 26 11:52:20 service102 kernel: [<ffffffff8002cc0e>] mntput_no_expire+0x19/0x88
Dec 26 11:52:20 service102 kernel: [<ffffffff887fca00>] push_ctxt+0x370/0x380 [lvfs]
Dec 26 11:52:20 service102 kernel: [<ffffffff88c31a08>] filter_client_add+0x508/0xc30 [obdfilter]
Dec 26 11:52:20 service102 kernel: [<ffffffff88c30de7>] filter_export_stats_init+0x117/0x650 [obdfilter]
Dec 26 11:52:21 service102 kernel: [<ffffffff88c32665>] filter_connect+0x535/0x8c0 [obdfilter]
Dec 26 11:52:21 service102 kernel: [<ffffffff88936107>] lustre_msg_add_op_flags+0x47/0x120 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff88bf6500>] ost_handle+0x0/0x55b0 [ost]
Dec 26 11:52:21 service102 kernel: [<ffffffff88900976>] target_handle_connect+0x21c6/0x2e80 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff8892ca48>] ptlrpc_send_reply+0x5e8/0x600 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff88930f75>] lustre_msg_get_version+0x35/0xf0 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff88931038>] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff88bf6daf>] ost_handle+0x8af/0x55b0 [ost]
Dec 26 11:52:21 service102 kernel: [<ffffffff889405e9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff88940d45>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff8008ca4e>] __wake_up_common+0x3e/0x68
Dec 26 11:52:21 service102 kernel: [<ffffffff88941cd6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Dec 26 11:52:21 service102 kernel: [<ffffffff88940d70>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Dec 26 11:52:21 service102 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Dec 26 11:52:21 service102 kernel:
Dec 26 11:52:21 service102 kernel: Pid: 10674, comm: ll_ost_165
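To compare the dumped stacks without reading them by eye, the log can be grouped by service thread. The following is a minimal sketch, assuming the syslog format shown above. The names `parse_traces`, `PID_RE`, `FRAME_RE` and `SAMPLE` are hypothetical helpers written for this illustration and are not part of Lustre or any of its tools. It collects the frames listed under each `Pid: ..., comm: ...` header, so that it is easy to check whether several `ll_ost_*` threads list the same `jbd2` frames (in the two complete traces above, both list `jbd2_journal_start` and `start_this_handle`).

```python
import re
from collections import defaultdict

# Matches e.g. "... kernel: Pid: 10832, comm: ll_ost_323"
PID_RE = re.compile(r"kernel: Pid: (\d+), comm: (\S+)")
# Matches e.g. "... kernel: [<ffffffff88b63d02>] start_this_handle+0x301/0x3cb [jbd2]"
# The trailing "[module]" is optional because core kernel symbols have none.
FRAME_RE = re.compile(
    r"kernel: \[<[0-9a-f]+>\] (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def parse_traces(lines):
    """Return {(pid, comm): [(function, module_or_None), ...]} in log order."""
    traces = defaultdict(list)
    current = None
    for line in lines:
        m = PID_RE.search(line)
        if m:
            current = (int(m.group(1)), m.group(2))
            traces[current]  # register the thread even if its trace is cut off
            continue
        m = FRAME_RE.search(line)
        if m and current is not None:
            traces[current].append((m.group(1), m.group(2)))
    return dict(traces)


# A short excerpt of the log above, used only to exercise the parser.
SAMPLE = """\
Dec 26 11:52:16 service102 kernel: Pid: 10832, comm: ll_ost_323
Dec 26 11:52:16 service102 kernel: Call Trace:
Dec 26 11:52:16 service102 kernel: [<ffffffff887d0d1d>] libcfs_debug_vmsg2+0x70d/0x970 [libcfs]
Dec 26 11:52:16 service102 kernel: [<ffffffff88b63d02>] start_this_handle+0x301/0x3cb [jbd2]
Dec 26 11:52:16 service102 kernel: [<ffffffff800a2f36>] autoremove_wake_function+0x0/0x2e
Dec 26 11:52:21 service102 kernel: Pid: 10674, comm: ll_ost_165
""".splitlines()

traces = parse_traces(SAMPLE)
for (pid, comm), frames in traces.items():
    # List only the frames that belong to the jbd2 module for each thread.
    jbd2_frames = [fn for fn, mod in frames if mod == "jbd2"]
    print(pid, comm, len(frames), jbd2_frames)
```

To run it against a full log, pass the lines of the saved messages file to `parse_traces` instead of `SAMPLE`. Threads whose trace was truncated, like pid 10674 above, appear with an empty frame list.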