[LU-2173] Some sort of deadlock in lod_qos code Created: 14/Oct/12  Updated: 19/Apr/13  Resolved: 29/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Oleg Drokin Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 5205

 Description   

This has happened twice already.
Racer runs for a while, then one of the MDT threads gets stuck in the lod_qos code.
The most recent occurrence was on a crashdump-enabled node, so I dumped a kernel core for subsequent analysis:

[ 3000.572084] INFO: task mdt00_003:3541 blocked for more than 120 seconds.
[ 3000.572364] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3000.572831] mdt00_003     D 0000000000000001  2416  3541      2 0x00000080
[ 3000.573180]  ffff8800af919630 0000000000000046 ffff8800af9195d0 ffffffff810385d8
[ 3000.573620]  ffff8800ffffffff 000005900aedb3e6 ffff8800b0995df0 ffff8800af919630
[ 3000.574062]  ffff8800b48908b8 ffff8800af919fd8 000000000000fba8 ffff8800b48908b8
[ 3000.574500] Call Trace:
[ 3000.574696]  [<ffffffff810385d8>] ? pvclock_clocksource_read+0x58/0xd0
[ 3000.574959]  [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0
[ 3000.575230]  [<ffffffff814fad23>] rwsem_down_write_failed+0x23/0x30
[ 3000.575489]  [<ffffffff8127c133>] call_rwsem_down_write_failed+0x13/0x20
[ 3000.575751]  [<ffffffff814f9e85>] ? down_write+0x45/0x50
[ 3000.576002]  [<ffffffffa0e06d82>] lod_alloc_qos.clone.0+0x182/0x1210 [lod]
[ 3000.576271]  [<ffffffff814f970c>] ? __mutex_lock_slowpath+0x21c/0x2c0
[ 3000.576558]  [<ffffffffa0e08578>] lod_qos_prep_create+0x768/0x1af4 [lod]
[ 3000.576906]  [<ffffffffa0cc4ff2>] ? osd_declare_inode_qid+0x1a2/0x270 [osd_ldiskfs]
[ 3000.577364]  [<ffffffffa0e03a8b>] lod_declare_striped_object+0x14b/0x920 [lod]
[ 3000.577782]  [<ffffffffa0e04568>] lod_declare_object_create+0x308/0x4f0 [lod]
[ 3000.578062]  [<ffffffffa0beac4f>] mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
[ 3000.578519]  [<ffffffffa0bfbaca>] mdd_create+0x39a/0x1550 [mdd]
[ 3000.578790]  [<ffffffffa0d54bc9>] mdt_reint_open+0x1079/0x1860 [mdt]
[ 3000.579077]  [<ffffffffa0c0186e>] ? md_ucred+0x1e/0x60 [mdd]
[ 3000.579330]  [<ffffffffa0d1f655>] ? mdt_ucred+0x15/0x20 [mdt]
[ 3000.579582]  [<ffffffffa0d3f0a1>] mdt_reint_rec+0x41/0xe0 [mdt]
[ 3000.579839]  [<ffffffffa0d38483>] mdt_reint_internal+0x4e3/0x7e0 [mdt]
[ 3000.580107]  [<ffffffffa0d38a4d>] mdt_intent_reint+0x1ed/0x500 [mdt]
[ 3000.580369]  [<ffffffffa0d343fe>] mdt_intent_policy+0x38e/0x770 [mdt]
[ 3000.580694]  [<ffffffffa07c8dda>] ldlm_lock_enqueue+0x2ea/0x890 [ptlrpc]
[ 3000.581100]  [<ffffffffa07f00d7>] ldlm_handle_enqueue0+0x4e7/0x1010 [ptlrpc]
[ 3000.581411]  [<ffffffffa0d34936>] mdt_enqueue+0x46/0x130 [mdt]
[ 3000.581661]  [<ffffffffa0d281f2>] mdt_handle_common+0x932/0x1740 [mdt]
[ 3000.581926]  [<ffffffffa0d290d5>] mdt_regular_handle+0x15/0x20 [mdt]
[ 3000.582712]  [<ffffffffa081e483>] ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc]
[ 3000.583164]  [<ffffffffa049b66e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[ 3000.583440]  [<ffffffffa0817171>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
[ 3000.583707]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
[ 3000.583965]  [<ffffffffa082101a>] ptlrpc_main+0xb9a/0x1960 [ptlrpc]
[ 3000.584247]  [<ffffffffa0820480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3000.584542]  [<ffffffff8100c14a>] child_rip+0xa/0x20
[ 3000.584864]  [<ffffffffa0820480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3000.585226]  [<ffffffffa0820480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3000.585503]  [<ffffffff8100c140>] ? child_rip+0x0/0x20

The core and modules are in /exports/crashdumps/192.168.10.218-2012-10-14-02:15:42

Another occurrence happened a bit earlier on a different node without crashdump capability, so I only have these traces:

[ 2044.498003] Pid: 11763, comm: mdt01_013
[ 2044.498606]
[ 2044.498607] Call Trace:
[ 2044.499226]  [<ffffffff810a9ee2>] ? __module_text_address+0x12/0x60
[ 2044.500232]  [<ffffffffa01e8b22>] ? cfs_mem_cache_alloc+0x22/0x30 [libcfs]
[ 2044.501324]  [<ffffffff81044f4e>] ? kernel_map_pages+0xfe/0x110
[ 2044.502229]  [<ffffffff81161d3f>] ? kmem_cache_free+0x7f/0x280
[ 2044.503134]  [<ffffffff81161792>] ? cache_alloc_debugcheck_after+0x152/0x290
[ 2044.504245]  [<ffffffffa01e8b22>] ? cfs_mem_cache_alloc+0x22/0x30 [libcfs]
[ 2044.505351]  [<ffffffffa01e8afe>] ? cfs_mem_cache_free+0xe/0x10 [libcfs]
[ 2044.506395]  [<ffffffffa0bb04c1>] ? osp_object_free+0x181/0x190 [osp]
[ 2044.507418]  [<ffffffffa03abcb3>] ? lu_object_free+0x123/0x1b0 [obdclass]
[ 2044.508561]  [<ffffffffa03ac4ed>] ? lu_object_put+0xad/0x350 [obdclass]
[ 2044.509607]  [<ffffffffa0bb1903>] ? osp_declare_object_create+0x163/0x540 [osp]
[ 2044.510737]  [<ffffffffa0b7f6fc>] ? lod_qos_declare_object_on+0x16c/0x4c0 [lod]
[ 2044.511904]  [<ffffffffa0b819f7>] ? lod_alloc_qos.clone.0+0xdf7/0x1210 [lod]
[ 2044.513025]  [<ffffffffa0b82578>] ? lod_qos_prep_create+0x768/0x1af4 [lod]
[ 2044.514130]  [<ffffffffa0a59ff2>] ? osd_declare_inode_qid+0x1a2/0x270 [osd_ldiskfs]
[ 2044.515330]  [<ffffffffa0b7da8b>] ? lod_declare_striped_object+0x14b/0x920 [lod]
[ 2044.516492]  [<ffffffffa0b7e568>] ? lod_declare_object_create+0x308/0x4f0 [lod]
[ 2044.517638]  [<ffffffffa0498c4f>] ? mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
[ 2044.518870]  [<ffffffffa04a9aca>] ? mdd_create+0x39a/0x1550 [mdd]
[ 2044.519822]  [<ffffffffa0acebc9>] ? mdt_reint_open+0x1079/0x1860 [mdt]
[ 2044.520851]  [<ffffffffa04af86e>] ? md_ucred+0x1e/0x60 [mdd]
[ 2044.521751]  [<ffffffffa0a99655>] ? mdt_ucred+0x15/0x20 [mdt]
[ 2044.522649]  [<ffffffffa0ab90a1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
[ 2044.523601]  [<ffffffffa0ab2483>] ? mdt_reint_internal+0x4e3/0x7e0 [mdt]
[ 2044.524665]  [<ffffffffa0ab2a4d>] ? mdt_intent_reint+0x1ed/0x500 [mdt]
[ 2044.525708]  [<ffffffffa0aae3fe>] ? mdt_intent_policy+0x38e/0x770 [mdt]
[ 2044.526761]  [<ffffffffa04fadda>] ? ldlm_lock_enqueue+0x2ea/0x890 [ptlrpc]
[ 2044.527846]  [<ffffffffa05220d7>] ? ldlm_handle_enqueue0+0x4e7/0x1010 [ptlrpc]
[ 2044.529031]  [<ffffffffa0aae936>] ? mdt_enqueue+0x46/0x130 [mdt]
[ 2044.529986]  [<ffffffffa0aa21f2>] ? mdt_handle_common+0x932/0x1740 [mdt]
[ 2044.531032]  [<ffffffffa0aa30d5>] ? mdt_regular_handle+0x15/0x20 [mdt]
[ 2044.532070]  [<ffffffffa0550483>] ? ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc]
[ 2044.533340]  [<ffffffffa01e866e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[ 2044.534339]  [<ffffffffa0549171>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
[ 2044.535394]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
[ 2044.536212]  [<ffffffffa055301a>] ? ptlrpc_main+0xb9a/0x1960 [ptlrpc]
[ 2044.537257]  [<ffffffffa0552480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 2044.538225]  [<ffffffff8100c14a>] ? child_rip+0xa/0x20

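The two traces show the same pattern from opposite sides: the first thread is parked in down_write() right at the top of lod_alloc_qos(), while the second is already deep inside lod_alloc_qos(), presumably holding that same rwsem, and is stalled in the osp_declare_object_create() / lu_object_put() path, so the lock never gets released. Below is a minimal user-space model of that blocking pattern only; it is not Lustre code, and the rwsem name and the point where the holder stalls are hypothetical stand-ins.

/*
 * Minimal user-space model of the blocking pattern implied by the traces
 * above (not Lustre code): one "MDT thread" takes a write lock used for
 * QOS allocation and then stalls while declaring/freeing an OSP object;
 * every other thread then hangs in down_write() on the same lock, which
 * is exactly what the hung-task watchdog reports after 120 seconds.
 * All names here are hypothetical stand-ins for illustration only.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t qos_rwsem = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for the thread already inside lod_alloc_qos(): it took the
 * rwsem but never reaches the matching unlock because the declare path
 * stalls. */
static void *alloc_qos_holder(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&qos_rwsem);   /* down_write() succeeded        */
    printf("holder: inside lod_alloc_qos, declaring objects...\n");
    pause();                             /* stalls in the osp declare path */
    pthread_rwlock_unlock(&qos_rwsem);   /* never reached                  */
    return NULL;
}

/* Stand-in for mdt00_003 in the first trace: blocks in down_write()
 * forever. */
static void *alloc_qos_waiter(void *arg)
{
    (void)arg;
    printf("waiter: calling down_write() on the QOS rwsem\n");
    pthread_rwlock_wrlock(&qos_rwsem);   /* hangs here                     */
    pthread_rwlock_unlock(&qos_rwsem);
    return NULL;
}

int main(void)
{
    pthread_t holder, waiter;

    pthread_create(&holder, NULL, alloc_qos_holder, NULL);
    sleep(1);                            /* let the holder win the lock    */
    pthread_create(&waiter, NULL, alloc_qos_waiter, NULL);

    pthread_join(waiter, NULL);          /* never returns: same symptom    */
    return 0;
}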

 Comments   
Comment by Oleg Drokin [ 14/Oct/12 ]

And it just happened again, with both kinds of backtraces at once:

[ 3023.620919] Pid: 3295, comm: mdt01_002
[ 3023.621163] 
[ 3023.621163] Call Trace:
[ 3023.621555]  [<ffffffff8116172e>] ? cache_free_debugcheck+0x2ae/0x360
[ 3023.621818]  [<ffffffff811618df>] ? kmem_cache_free+0xff/0x250
[ 3023.622074]  [<ffffffff81161852>] ? kmem_cache_free+0x72/0x250
[ 3023.622325]  [<ffffffff811613e9>] ? cache_alloc_debugcheck_after+0x1f9/0x210
[ 3023.622608]  [<ffffffffa04b9b22>] ? cfs_mem_cache_alloc+0x22/0x30 [libcfs]
[ 3023.622889]  [<ffffffffa04b9afe>] ? cfs_mem_cache_free+0xe/0x10 [libcfs]
[ 3023.623165]  [<ffffffffa0e104c1>] ? osp_object_free+0x181/0x190 [osp]
[ 3023.623459]  [<ffffffffa0697d93>] ? lu_object_free+0x123/0x1b0 [obdclass]
[ 3023.623750]  [<ffffffffa06963bc>] ? lu_object_add_top+0x2c/0x30 [obdclass]
[ 3023.624049]  [<ffffffffa06985cd>] ? lu_object_put+0xad/0x350 [obdclass]
[ 3023.624322]  [<ffffffffa0e11903>] ? osp_declare_object_create+0x163/0x540 [osp]
[ 3023.624795]  [<ffffffffa0ddf6fc>] ? lod_qos_declare_object_on+0x16c/0x4c0 [lod]
[ 3023.625260]  [<ffffffffa0de19f7>] ? lod_alloc_qos.clone.0+0xdf7/0x1210 [lod]
[ 3023.625556]  [<ffffffffa0de2578>] ? lod_qos_prep_create+0x768/0x1af4 [lod]
[ 3023.625856]  [<ffffffffa0c9eff2>] ? osd_declare_inode_qid+0x1a2/0x270 [osd_ldiskfs]
[ 3023.626329]  [<ffffffffa0ddda8b>] ? lod_declare_striped_object+0x14b/0x920 [lod]
[ 3023.626789]  [<ffffffffa0dde568>] ? lod_declare_object_create+0x308/0x4f0 [lod]
[ 3023.627253]  [<ffffffffa0bc4c4f>] ? mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
[ 3023.627716]  [<ffffffffa0bd5aca>] ? mdd_create+0x39a/0x1550 [mdd]
[ 3023.628004]  [<ffffffffa0d2ebc9>] ? mdt_reint_open+0x1079/0x1860 [mdt]
[ 3023.628275]  [<ffffffffa0bdb86e>] ? md_ucred+0x1e/0x60 [mdd]
[ 3023.628563]  [<ffffffffa0cf9655>] ? mdt_ucred+0x15/0x20 [mdt]
[ 3023.628838]  [<ffffffffa0d190a1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
[ 3023.629125]  [<ffffffffa0d12483>] ? mdt_reint_internal+0x4e3/0x7e0 [mdt]
[ 3023.629419]  [<ffffffffa0d12a4d>] ? mdt_intent_reint+0x1ed/0x500 [mdt]
[ 3023.629710]  [<ffffffffa0d0e3fe>] ? mdt_intent_policy+0x38e/0x770 [mdt]
[ 3023.630031]  [<ffffffffa07e6dda>] ? ldlm_lock_enqueue+0x2ea/0x890 [ptlrpc]
[ 3023.630346]  [<ffffffffa080e0d7>] ? ldlm_handle_enqueue0+0x4e7/0x1010 [ptlrpc]
[ 3023.630804]  [<ffffffffa0d0e936>] ? mdt_enqueue+0x46/0x130 [mdt]
[ 3023.631088]  [<ffffffffa0d021f2>] ? mdt_handle_common+0x932/0x1740 [mdt]
[ 3023.631383]  [<ffffffffa0d030d5>] ? mdt_regular_handle+0x15/0x20 [mdt]
[ 3023.631691]  [<ffffffffa083c483>] ? ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc]
[ 3023.632168]  [<ffffffffa04b966e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[ 3023.632492]  [<ffffffffa0835171>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
[ 3023.632783]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
[ 3023.633068]  [<ffffffffa083f01a>] ? ptlrpc_main+0xb9a/0x1960 [ptlrpc]
[ 3023.633395]  [<ffffffffa083e480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3023.633679]  [<ffffffff8100c14a>] ? child_rip+0xa/0x20
[ 3023.633963]  [<ffffffffa083e480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3023.634265]  [<ffffffffa083e480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3023.634548]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
[ 3023.634806] 
[ 3023.635003] LustreError: dumping log to /tmp/lustre-log.1350198767.3295
[ 3023.652375] LNet: Service thread pid 3290 was inactive for 62.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 3023.653198] Pid: 3290, comm: mdt00_000
[ 3023.653433] 
[ 3023.653433] Call Trace:
[ 3023.653833]  [<ffffffff810385d8>] ? pvclock_clocksource_read+0x58/0xd0
[ 3023.654122]  [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0
[ 3023.654403]  [<ffffffff814fad23>] rwsem_down_write_failed+0x23/0x30
[ 3023.654681]  [<ffffffff8127c133>] call_rwsem_down_write_failed+0x13/0x20
[ 3023.654966]  [<ffffffff814f9e85>] ? down_write+0x45/0x50
[ 3023.655212]  [<ffffffffa0de0d82>] lod_alloc_qos.clone.0+0x182/0x1210 [lod]
[ 3023.655483]  [<ffffffff814f970c>] ? __mutex_lock_slowpath+0x21c/0x2c0
[ 3023.655745]  [<ffffffffa0de2578>] lod_qos_prep_create+0x768/0x1af4 [lod]
[ 3023.656035]  [<ffffffffa0c9eff2>] ? osd_declare_inode_qid+0x1a2/0x270 [osd_ldiskfs]
[ 3023.656513]  [<ffffffffa0ddda8b>] lod_declare_striped_object+0x14b/0x920 [lod]
[ 3023.656978]  [<ffffffffa0dde568>] lod_declare_object_create+0x308/0x4f0 [lod]
[ 3023.657280]  [<ffffffffa0bc4c4f>] mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
[ 3023.657750]  [<ffffffffa0bd5aca>] mdd_create+0x39a/0x1550 [mdd]
[ 3023.658047]  [<ffffffffa0d2ebc9>] mdt_reint_open+0x1079/0x1860 [mdt]
[ 3023.658333]  [<ffffffffa0bdb86e>] ? md_ucred+0x1e/0x60 [mdd]
[ 3023.658611]  [<ffffffffa0cf9655>] ? mdt_ucred+0x15/0x20 [mdt]
[ 3023.658885]  [<ffffffffa0d190a1>] mdt_reint_rec+0x41/0xe0 [mdt]
[ 3023.659159]  [<ffffffffa0d12483>] mdt_reint_internal+0x4e3/0x7e0 [mdt]
[ 3023.659429]  [<ffffffffa0d12a4d>] mdt_intent_reint+0x1ed/0x500 [mdt]
[ 3023.659702]  [<ffffffffa0d0e3fe>] mdt_intent_policy+0x38e/0x770 [mdt]
[ 3023.660006]  [<ffffffffa07e6dda>] ldlm_lock_enqueue+0x2ea/0x890 [ptlrpc]
[ 3023.660314]  [<ffffffffa080e0d7>] ldlm_handle_enqueue0+0x4e7/0x1010 [ptlrpc]
[ 3023.660655]  [<ffffffffa0d0e936>] mdt_enqueue+0x46/0x130 [mdt]
[ 3023.660939]  [<ffffffffa0d021f2>] mdt_handle_common+0x932/0x1740 [mdt]
[ 3023.661229]  [<ffffffffa0d030d5>] mdt_regular_handle+0x15/0x20 [mdt]
[ 3023.661535]  [<ffffffffa083c483>] ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc]
[ 3023.662013]  [<ffffffffa04b966e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[ 3023.662317]  [<ffffffffa0835171>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
[ 3023.662590]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
[ 3023.662848]  [<ffffffffa083f01a>] ptlrpc_main+0xb9a/0x1960 [ptlrpc]
[ 3023.663136]  [<ffffffffa083e480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3023.663397]  [<ffffffff8100c14a>] child_rip+0xa/0x20
[ 3023.663652]  [<ffffffffa083e480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3023.663938]  [<ffffffffa083e480>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[ 3023.664223]  [<ffffffff8100c140>] ? child_rip+0x0/0x20

This definitely seems to be some sort of race: I put more load on the host running this VM (and added two cores to the VM) and hit it much sooner, probably on the first racer.sh run.
crashdump is in /exports/crashdumps/192.168.10.218-2012-10-14-03:27:17
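
Since extra load makes it trigger much sooner, any workload that drives many MDT service threads through mdt_reint_open -> lod_qos_prep_create at once should do. A hedged sketch of that kind of concurrent create/unlink load is below; racer.sh is the actual reproducer, and the mount point, directory, and thread count here are assumptions.

/*
 * Hedged repro sketch (not racer.sh itself): hammer a directory on a
 * Lustre client mount with concurrent creates and unlinks so that many
 * MDT service threads enter the open-create path at the same time.
 * DIR and NTHREADS are hypothetical; adjust for the test setup.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 16                      /* scale with MDT thread count */
#define DIR      "/mnt/lustre/racedir"   /* hypothetical test directory */

static void *create_unlink_loop(void *arg)
{
    char path[128];
    long id = (long)arg;

    for (;;) {
        snprintf(path, sizeof(path), DIR "/f.%ld.%d", id, getpid());
        int fd = open(path, O_CREAT | O_RDWR, 0644); /* open-create intent */
        if (fd >= 0)
            close(fd);
        unlink(path);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, create_unlink_loop, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}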

Comment by Alex Zhuravlev [ 18/Oct/12 ]

http://review.whamcloud.com/#change,4300

Comment by Alex Zhuravlev [ 29/Oct/12 ]

Landed on master.
