[LU-952] Hung thread with HIGH OSS load Created: 26/Dec/11 Updated: 09/May/12 Resolved: 26/Jan/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.2.0, Lustre 2.1.2, Lustre 1.8.8 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mahmoud Hanafi | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre 1.8.5 with OFED 1.5.3 and kernel 2.6.18-238.12.1.el5, at NASA Ames |
||
| Attachments: |
|
| Severity: | 4 |
| Epic: | server |
| Rank (Obsolete): | 4769 |
| Description |
|
We started getting the following error, with the OSS reaching a high load and the filesystem becoming unusable:
Dec 26 11:52:16 service102 kernel: Lustre: Service thread pid 10832 was inactive for 506.00s. The thread might be hung, or it |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 26/Dec/11 ] |
|
This must be another case of slow IO. We're tracking this issue on |
| Comment by Mahmoud Hanafi [ 27/Dec/11 ] |
|
Reading LU874 and LU410, I'm not sure whether this is a similar error condition. |
| Comment by Peter Jones [ 27/Dec/11 ] |
|
Niu, could you please check whether this is the same issue Jinshan suspected, and advise on the steps to take to restore the filesystem? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 27/Dec/11 ] |
|
Hi Mahmoud, I saw lots of:
Dec 26 12:27:36 service102 kernel: LustreError: 11933:0:(obd_config.c:1011:class_process_proc_param()) nbp6-OST0031: unknown param writethrough=0
in the service102 messages, so I'm not sure the writethrough cache was really disabled (the parameter should be writethrough_cache_enable=0). And what is service113? I saw lots of multipath errors in the service113 messages. Thanks. |
| Comment by Niu Yawei (Inactive) [ 28/Dec/11 ] |
|
OK, then the "unknown param writethrough=0" error messages might come from older, mistaken settings, but you eventually disabled the writethrough cache correctly. I just want to make sure that the writethrough cache was disabled permanently via 'lctl conf_param OSTNAME.ost.writethrough_cache_enable=0', so that the cache is also disabled during the recovery stage.

From the beginning of the stack trace, I saw some I/O threads waiting for journal commit:

Dec 26 11:43:03 service102 kernel: Pid: 9175, comm: ll_ost_io_41
Dec 26 11:43:03 service102 kernel:
Dec 26 11:43:03 service102 kernel: Call Trace:
Dec 26 11:43:03 service102 kernel: [<ffffffff88b6869a>] jbd2_log_wait_commit+0xa3/0xf5 [jbd2]
Dec 26 11:43:03 service102 kernel: [<ffffffff800a2f36>] autoremove_wake_function+0x0/0x2e
Dec 26 11:43:03 service102 kernel: [<ffffffff88c1394b>] fsfilt_ldiskfs_commit_wait+0xab/0xd0 [fsfilt_ldiskfs]
Dec 26 11:43:03 service102 kernel: [<ffffffff88c54214>] filter_commitrw_write+0x1e04/0x2dc0 [obdfilter]
Dec 26 11:43:03 service102 kernel: [<ffffffff8015ee31>] __next_cpu+0x19/0x28
Dec 26 11:43:03 service102 kernel: [<ffffffff88c4b105>] filter_commitrw+0x65/0x2c0 [obdfilter]
Dec 26 11:43:03 service102 kernel: [<ffffffff88bf5d19>] ost_brw_write+0x1c99/0x2480 [ost]
Dec 26 11:43:04 service102 kernel: [<ffffffff8892ca48>] ptlrpc_send_reply+0x5e8/0x600 [ptlrpc]
Dec 26 11:43:04 service102 kernel: [<ffffffff88930f75>] lustre_msg_get_version+0x35/0xf0 [ptlrpc]
Dec 26 11:43:04 service102 kernel: [<ffffffff8008eb34>] default_wake_function+0x0/0xf
Dec 26 11:43:04 service102 kernel: [<ffffffff88bf90ae>] ost_handle+0x2bae/0x55b0 [ost]
Dec 26 11:43:04 service102 kernel: [<ffffffff889405e9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Dec 26 11:43:04 service102 kernel: [<ffffffff88940d45>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Dec 26 11:43:04 service102 kernel: [<ffffffff8008eb34>] default_wake_function+0x0/0xf
Dec 26 11:43:04 service102 kernel: [<ffffffff88941cd6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Dec 26 11:43:04 service102 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Dec 26 11:43:05 service102 kernel: [<ffffffff88940d70>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Dec 26 11:43:05 service102 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

I don't see why the commit hadn't finished yet. How long does the path failover take (switching from the failed path to the standby one)? Is it possible that you switch paths so often that the I/O cannot finish for a very long time? |
| Comment by Mahmoud Hanafi [ 03/Jan/12 ] |
|
The multipath path failures happened after the issue had occurred. We had a second incident today and have taken system dumps. Where can we upload them? |
| Comment by Niu Yawei (Inactive) [ 03/Jan/12 ] |
|
You can upload them to ftp.whamcloud.com/uploads/, thanks. |
| Comment by Mahmoud Hanafi [ 04/Jan/12 ] |
|
Due to export controls, it turns out I cannot upload the crash files. But we are able to provide you with any info you may need from them; if you would like to see something specific, please let me know. Here is a bt from one of the hung threads, and I am uploading the /var/log/messages. |
| Comment by Niu Yawei (Inactive) [ 04/Jan/12 ] |
|
Thank you, Mahmoud. The attached messages file is quite helpful; I found a deadlock in it:

Jan 3 14:34:03 service103 kernel: Pid: 28262, comm: ll_ost_io_432
Jan 3 14:34:03 service103 kernel:
Jan 3 14:34:03 service103 kernel: Call Trace:
Jan 3 14:34:03 service103 kernel: [<ffffffff88b63d02>] start_this_handle+0x301/0x3cb [jbd2]
Jan 3 14:34:03 service103 kernel: [<ffffffff800a2f36>] autoremove_wake_function+0x0/0x2e
Jan 3 14:34:03 service103 kernel: [<ffffffff88b63e77>] jbd2_journal_start+0xab/0xdf [jbd2]
Jan 3 14:34:03 service103 kernel: [<ffffffff88ba4c25>] ldiskfs_journal_start_sb+0x55/0xa0 [ldiskfs]
Jan 3 14:34:03 service103 kernel: [<ffffffff88ba8ba4>] ldiskfs_acquire_dquot+0x64/0xb0 [ldiskfs]
Jan 3 14:34:04 service103 kernel: [<ffffffff80104a42>] dqget+0x286/0x2b6
Jan 3 14:34:04 service103 kernel: [<ffffffff80104e40>] dquot_initialize+0x7b/0xac
Jan 3 14:34:04 service103 kernel: [<ffffffff88c3cb7d>] filter_destroy+0x99d/0x1fb0 [obdfilter]
Jan 3 14:34:04 service103 kernel: [<ffffffff88908910>] ldlm_blocking_ast+0x0/0x2a0 [ptlrpc]
Jan 3 14:34:04 service103 kernel: [<ffffffff8890bff0>] ldlm_completion_ast+0x0/0x880 [ptlrpc]
Jan 3 14:34:04 service103 kernel: [<ffffffff88869403>] lh_read_lock+0x13/0x20 [obdclass]
Jan 3 14:34:04 service103 kernel: [<ffffffff88931ed4>] lustre_msg_add_version+0x34/0x110 [ptlrpc]
Jan 3 14:34:04 service103 kernel: [<ffffffff88934b9a>] lustre_pack_reply_flags+0x86a/0x950 [ptlrpc]
Jan 3 14:34:04 service103 kernel: [<ffffffff887c9309>] cfs_mem_cache_free+0x9/0x10 [libcfs]
Jan 3 14:34:04 service103 kernel: [<ffffffff888f33eb>] ldlm_resource_putref_internal+0x3ab/0x460 [ptlrpc]
Jan 3 14:34:04 service103 kernel: [<ffffffff888ee8d2>] ldlm_lock_put+0x372/0x3d0 [ptlrpc]
Jan 3 14:34:04 service103 kernel: [<ffffffff88bee080>] ost_destroy+0x660/0x790 [ost]
Jan 3 14:34:04 service103 kernel: [<ffffffff88930e85>] lustre_msg_get_opc+0x35/0xf0 [ptlrpc]
Jan 3 14:34:04 service103 kernel: [<ffffffff88bf7a56>] ost_handle+0x1556/0x55b0 [ost]
Jan 3 14:34:05 service103 kernel: [<ffffffff889405e9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jan 3 14:34:05 service103 kernel: [<ffffffff88940d45>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jan 3 14:34:05 service103 kernel: [<ffffffff8008ca4e>] __wake_up_common+0x3e/0x68
Jan 3 14:34:05 service103 kernel: [<ffffffff88941cd6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jan 3 14:34:05 service103 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jan 3 14:34:05 service103 kernel: [<ffffffff88940d70>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jan 3 14:34:05 service103 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

This thread holds sb->s_dquot->dqptr_sem and is trying to start a journal handle (dqptr_sem -> journal), whereas:

Jan 3 14:33:57 service103 kernel: Pid: 28173, comm: ll_ost_io_346
Jan 3 14:33:57 service103 kernel:
Jan 3 14:33:57 service103 kernel: Call Trace:
Jan 3 14:33:57 service103 kernel: [<ffffffff88b97adf>] ldiskfs_mb_free_blocks+0x64f/0x710 [ldiskfs]
Jan 3 14:33:57 service103 kernel: [<ffffffff80064691>] __down_read+0x7a/0x92
Jan 3 14:33:57 service103 kernel: [<ffffffff800a5073>] down_read+0x11/0x13
Jan 3 14:33:57 service103 kernel: [<ffffffff80105c25>] __dquot_free_space+0x3d/0x139
Jan 3 14:33:57 service103 kernel: [<ffffffff80105d3c>] dquot_free_space+0xb/0xd
Jan 3 14:33:57 service103 kernel: [<ffffffff88b7e1a3>] ldiskfs_free_blocks+0xa3/0xc0 [ldiskfs]
Jan 3 14:33:57 service103 kernel: [<ffffffff88b84e4a>] ldiskfs_ext_truncate+0x50a/0xa80 [ldiskfs]
Jan 3 14:33:57 service103 kernel: [<ffffffff8002d270>] wake_up_bit+0x1e/0x23
Jan 3 14:33:57 service103 kernel: [<ffffffff88b8c5c3>] ldiskfs_truncate+0xb3/0x5c0 [ldiskfs]
Jan 3 14:33:58 service103 kernel: [<ffffffff80019d9f>] __getblk+0x25/0x236
Jan 3 14:33:58 service103 kernel: [<ffffffff88b8000b>] __ldiskfs_handle_dirty_metadata+0xdb/0x110 [ldiskfs]
Jan 3 14:33:58 service103 kernel: [<ffffffff800cff96>] unmap_mapping_range+0x59/0x204
Jan 3 14:33:58 service103 kernel: [<ffffffff88b88fd5>] ldiskfs_mark_iloc_dirty+0x4a5/0x540 [ldiskfs]
Jan 3 14:33:58 service103 kernel: [<ffffffff800d09e6>] vmtruncate+0xa2/0xc9
Jan 3 14:33:58 service103 kernel: [<ffffffff80041afe>] inode_setattr+0x22/0x104
Jan 3 14:33:58 service103 kernel: [<ffffffff88b8f6ce>] ldiskfs_setattr+0x2de/0x3a0 [ldiskfs]
Jan 3 14:33:58 service103 kernel: [<ffffffff88c13c07>] fsfilt_ldiskfs_setattr+0x1a7/0x250 [fsfilt_ldiskfs]
Jan 3 14:33:58 service103 kernel: [<ffffffff88c2f511>] filter_version_get_check+0x91/0x2a0 [obdfilter]
Jan 3 14:33:58 service103 kernel: [<ffffffff800a502c>] up_write+0x9/0xb
Jan 3 14:33:58 service103 kernel: [<ffffffff88c3cf7b>] filter_destroy+0xd9b/0x1fb0 [obdfilter]
Jan 3 14:33:58 service103 kernel: [<ffffffff88908910>] ldlm_blocking_ast+0x0/0x2a0 [ptlrpc]
Jan 3 14:33:58 service103 kernel: [<ffffffff8890bff0>] ldlm_completion_ast+0x0/0x880 [ptlrpc]
Jan 3 14:33:59 service103 kernel: [<ffffffff88869403>] lh_read_lock+0x13/0x20 [obdclass]
Jan 3 14:33:59 service103 kernel: [<ffffffff88931ed4>] lustre_msg_add_version+0x34/0x110 [ptlrpc]
Jan 3 14:33:59 service103 kernel: [<ffffffff88934b9a>] lustre_pack_reply_flags+0x86a/0x950 [ptlrpc]
Jan 3 14:33:59 service103 kernel: [<ffffffff887c9309>] cfs_mem_cache_free+0x9/0x10 [libcfs]
Jan 3 14:33:59 service103 kernel: [<ffffffff888f33eb>] ldlm_resource_putref_internal+0x3ab/0x460 [ptlrpc]
Jan 3 14:33:59 service103 kernel: [<ffffffff888ee8d2>] ldlm_lock_put+0x372/0x3d0 [ptlrpc]
Jan 3 14:33:59 service103 kernel: [<ffffffff88bee080>] ost_destroy+0x660/0x790 [ost]
Jan 3 14:33:59 service103 kernel: [<ffffffff88930e85>] lustre_msg_get_opc+0x35/0xf0 [ptlrpc]
Jan 3 14:33:59 service103 kernel: [<ffffffff88bf7a56>] ost_handle+0x1556/0x55b0 [ost]
Jan 3 14:33:59 service103 kernel: [<ffffffff889405e9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Jan 3 14:33:59 service103 kernel: [<ffffffff88940d45>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Jan 3 14:33:59 service103 kernel: [<ffffffff8008eb34>] default_wake_function+0x0/0xf
Jan 3 14:33:59 service103 kernel: [<ffffffff88941cd6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Jan 3 14:34:00 service103 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jan 3 14:34:00 service103 kernel: [<ffffffff88940d70>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Jan 3 14:34:00 service103 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

This thread has started a journal handle and is waiting for dqptr_sem (journal -> dqptr_sem).

Actually, the correct locking order is journal_lock > dqptr_sem; however, we call ll_vfs_dq_init() before journal start in several places, which breaks the locking order. I'll post a patch to fix this soon. |
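For readers following the analysis above, here is a minimal user-space sketch of the lock-order inversion, not Lustre code: the two pthread mutexes stand in for dqptr_sem and the jbd2 journal handle, and the names destroy_path and truncate_path are hypothetical labels for the two traced threads. Started together, each thread grabs its first lock and then blocks on the lock held by the other, which is the ABBA deadlock seen on service103 (build with gcc -pthread; it hangs by design).

/* Illustration only: demonstrates the ABBA lock-order inversion. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t dqptr_sem    = PTHREAD_MUTEX_INITIALIZER; /* stand-in for sb->s_dquot->dqptr_sem */
static pthread_mutex_t journal_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for the jbd2 journal handle */

/* Mimics ll_ost_io_432: dquot_initialize() takes dqptr_sem, then
 * ldiskfs_acquire_dquot() tries to start a journal handle. */
static void *destroy_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&dqptr_sem);
        sleep(1);                          /* widen the race window */
        pthread_mutex_lock(&journal_lock);
        printf("destroy_path: got both locks\n");
        pthread_mutex_unlock(&journal_lock);
        pthread_mutex_unlock(&dqptr_sem);
        return NULL;
}

/* Mimics ll_ost_io_346: the truncate already runs inside a journal
 * handle, then __dquot_free_space() takes dqptr_sem. */
static void *truncate_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&journal_lock);
        sleep(1);
        pthread_mutex_lock(&dqptr_sem);
        printf("truncate_path: got both locks\n");
        pthread_mutex_unlock(&dqptr_sem);
        pthread_mutex_unlock(&journal_lock);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        /* Run concurrently: each thread acquires its first lock and then
         * waits forever on the other's lock -- the deadlock in the traces. */
        pthread_create(&a, NULL, destroy_path, NULL);
        pthread_create(&b, NULL, truncate_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}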
| Comment by Niu Yawei (Inactive) [ 05/Jan/12 ] |
|
patch for b1_8: http://review.whamcloud.com/1919 |
| Comment by Niu Yawei (Inactive) [ 06/Jan/12 ] |
|
patch for master: http://review.whamcloud.com/1923 |
| Comment by Niu Yawei (Inactive) [ 06/Jan/12 ] |
|
Hi Mahmoud, could you try the patch for b1_8 to see if it resolves your problem? Thanks. |
| Comment by Jay Lan (Inactive) [ 10/Jan/12 ] |
|
I saw Maloo passed on "patch for b1_8: http://review.whamcloud.com/1919". Thanks, |
| Comment by Peter Jones [ 10/Jan/12 ] |
|
Jay, that's the beauty of the system: click on the link in the comments of the changeset (under Maloo, https://maloo.whamcloud.com/test_sessions/e0e2890a-3793-11e1-bd0b-5254004bbbd3) and you can see for yourself. Peter |
| Comment by Mahmoud Hanafi [ 12/Jan/12 ] |
|
Installed the patch today. We will know in a few days if we run into the deadlock. Thanks, |
| Comment by Shuichi Ihara (Inactive) [ 16/Jan/12 ] |
|
Niu, what is the trigger of this issue? It seems that we also hit the same issue, which is filed on |
| Comment by Niu Yawei (Inactive) [ 16/Jan/12 ] |
|
Hi Ihara, it's good to hear that this patch fixes your problem. This is a deadlock, which can be triggered when quota is enabled and there are so many parallel I/Os that some I/O threads have to wait for journal space. |
| Comment by Peter Jones [ 26/Jan/12 ] |
|
Landed to b1_8 and 2.2 |