[LU-3202] SWL Causes OST hang Created: 22/Apr/13  Updated: 26/Apr/13  Resolved: 26/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Cliff White (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: HB
Environment:

LLNL/Hyperion


Attachments: File dit32.console.gz     File srv.tar.gz    
Severity: 3
Rank (Obsolete): 7822

 Description   

Started SWL. Had immediate watchdog timeouts on the OSTs; clients are stuck in 'comp' state, are evicted by the servers, and are unable to reconnect because the servers return EBUSY. This appears to be some sort of watchdog storm on the OSTs; stack dump attached.



 Comments   
Comment by Cliff White (Inactive) [ 22/Apr/13 ]

sysrq -p from one OST

Comment by Cliff White (Inactive) [ 22/Apr/13 ]

Reformatted the filesystem and restarted the test; the failure occurred immediately. Attached are consoles from all OSTs and the console from the first client to be evicted.

Comment by Peter Jones [ 22/Apr/13 ]

Oleg is looking into this

Comment by Oleg Drokin [ 22/Apr/13 ]

This seems to be a pretty clear deadlock:

2013-04-22 11:21:58 ll_ost01_001  D 000000000000000a     0  5490      2 0x00000000
2013-04-22 11:21:58  ffff8803b0c4f9b0 0000000000000046 0000000000000000 ffff88032f4e8950
2013-04-22 11:21:58  ffff8803b0c4fa10 ffff8802ad507020 ffff8803b0c4f9a0 ffff88020a13c518
2013-04-22 11:21:58  ffff8803b0c49058 ffff8803b0c4ffd8 000000000000fb88 ffff8803b0c49058
2013-04-22 11:21:58 Call Trace:
2013-04-22 11:21:58  [<ffffffff8150f3be>] __mutex_lock_slowpath+0x13e/0x180
2013-04-22 11:21:58  [<ffffffff8150f25b>] mutex_lock+0x2b/0x50
2013-04-22 11:21:58  [<ffffffffa100265e>] osd_object_destroy+0x1be/0x580 [osd_ldiskfs]
2013-04-22 11:21:58  [<ffffffffa089ebcf>] ? ldiskfs_dirty_inode+0x4f/0x60 [ldiskfs]
2013-04-22 11:21:58  [<ffffffffa10dcd20>] ofd_object_destroy+0x380/0x680 [ofd]
2013-04-22 11:21:58  [<ffffffffa10d0056>] ofd_destroy_by_fid+0x266/0x610 [ofd]
2013-04-22 11:21:58  [<ffffffffa0aca7a0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
2013-04-22 11:21:58  [<ffffffffa0acbe00>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
2013-04-22 11:21:58  [<ffffffffa0af4345>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
2013-04-22 11:21:58  [<ffffffffa10d3bb7>] ofd_destroy+0x1a7/0x8b0 [ofd]

vs

2013-04-22 11:21:56  [<ffffffffa0873ce5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
2013-04-22 11:21:56  [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
2013-04-22 11:21:56  [<ffffffffa089714e>] ldiskfs_sync_file+0x15e/0x260 [ldiskfs]
2013-04-22 11:21:56  [<ffffffffa0fff854>] osd_object_sync+0x144/0x190 [osd_ldiskfs]
2013-04-22 11:21:56  [<ffffffffa10cfadb>] ofd_sync+0x36b/0x680 [ofd]
2013-04-22 11:21:56  [<ffffffffa109eb30>] ost_blocking_ast+0x640/0x10f0 [ost]
2013-04-22 11:21:56  [<ffffffffa0964baf>] ? lu_context_fini+0x2f/0xc0 [obdclass]
2013-04-22 11:21:56  [<ffffffffa0aab16c>] ldlm_cancel_callback+0x6c/0x1a0 [ptlrpc]
2013-04-22 11:21:56  [<ffffffffa0aab2fa>] ldlm_lock_cancel+0x5a/0x1e0 [ptlrpc]
2013-04-22 11:21:56  [<ffffffffa0aceee4>] ldlm_request_cancel+0x254/0x410 [ptlrpc]
2013-04-22 11:21:56  [<ffffffffa0acf1dd>] ldlm_handle_cancel+0x13d/0x240 [ptlrpc]
2013-04-22 11:21:56  [<ffffffffa0ad1218>] ldlm_cancel_handler+0x3f8/0x600 [ptlrpc]
2013-04-22 11:21:56  [<ffffffffa0b0539c>] ptlrpc_server_handle_request+0x40c/0xd90 [ptlrpc]

In fact, osd_object_destroy() taking i_mutex while it already has a transaction open is a big no-no, as it can lead to exactly this lock inversion: normally i_mutex is taken before a transaction is opened. ldiskfs_sync_file() is called with i_mutex held.

What's unclear is why we did not hit this any earlier; this code has apparently been present in our tree since the middle of last year.
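For illustration only, here is a minimal user-space sketch of the same lock-order inversion, using two pthread mutexes to stand in for inode::i_mutex and the jbd2 transaction. The names thread_destroy and thread_sync are hypothetical and do not correspond to real Lustre functions.

/* Hypothetical illustration of the inversion, not Lustre code.
 * "journal" stands in for the jbd2 transaction, "i_mutex" for inode->i_mutex. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t journal = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Destroy path: transaction first, then i_mutex (the problematic order). */
static void *thread_destroy(void *arg)
{
        pthread_mutex_lock(&journal);   /* transaction already started      */
        sleep(1);                       /* widen the race window            */
        pthread_mutex_lock(&i_mutex);   /* blocks forever once the sync     */
                                        /* thread already holds i_mutex     */
        pthread_mutex_unlock(&i_mutex);
        pthread_mutex_unlock(&journal);
        return NULL;
}

/* Sync path: i_mutex first, then waits on the journal (the usual order). */
static void *thread_sync(void *arg)
{
        pthread_mutex_lock(&i_mutex);   /* fsync path holds i_mutex         */
        sleep(1);
        pthread_mutex_lock(&journal);   /* waits for a commit that never    */
                                        /* completes                        */
        pthread_mutex_unlock(&journal);
        pthread_mutex_unlock(&i_mutex);
        return NULL;
}

int main(void)
{
        pthread_t a, b;
        pthread_create(&a, NULL, thread_destroy, NULL);
        pthread_create(&b, NULL, thread_sync, NULL);
        printf("both threads started; with the sleeps they deadlock\n");
        pthread_join(a, NULL);          /* never returns */
        pthread_join(b, NULL);
        return 0;
}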

Comment by Oleg Drokin [ 23/Apr/13 ]

Reassigned to FanYong as he's going to take it further with a fix.

Comment by nasf (Inactive) [ 23/Apr/13 ]

Originally, inode::i_mutex was used to control the race between the OI scrub inserting an OI mapping and osd_object_destroy() removing the OI mapping for the same target.

But the unlink thread that calls osd_object_destroy() has already started a transaction handle. That lock order is the opposite of the usual one, so it can deadlock between starting a transaction and obtaining inode::i_mutex.

So now osd_object_destroy() no longer takes inode::i_mutex. Instead, the OI scrub checks whether someone unlinked the inode while it was rebuilding the OI mapping, and removes the newly inserted OI mapping if the race happened (see the sketch after the patch link below).

This is the patch:
http://review.whamcloud.com/#change,6124
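A rough user-space sketch of that check-after-insert approach follows. All names here (oi_insert, oi_delete, inode_is_unlinked, fake_inode) are hypothetical stand-ins; the real logic lives in the patch above in osd-ldiskfs.

/* Hypothetical sketch of the revised race handling, not the actual patch.
 * The scrub inserts the OI mapping without holding i_mutex, then re-checks
 * whether the inode was unlinked concurrently and undoes the insert if so. */
#include <stdbool.h>
#include <stdio.h>

struct fake_inode {
        unsigned nlink;          /* 0 means the object was unlinked      */
        bool     oi_mapped;      /* stands in for the OI mapping entry   */
};

/* Hypothetical helpers standing in for real OI table operations. */
static void oi_insert(struct fake_inode *inode) { inode->oi_mapped = true; }
static void oi_delete(struct fake_inode *inode) { inode->oi_mapped = false; }
static bool inode_is_unlinked(const struct fake_inode *inode)
{
        return inode->nlink == 0;
}

/* Scrub side: rebuild the mapping, then verify no unlink raced with us. */
static void scrub_rebuild_mapping(struct fake_inode *inode)
{
        oi_insert(inode);
        if (inode_is_unlinked(inode)) {
                /* The destroy path won the race: drop the stale entry. */
                oi_delete(inode);
        }
}

/* Destroy side: no i_mutex needed any more; mark the inode unlinked and
 * remove its mapping inside the already-open transaction. */
static void object_destroy(struct fake_inode *inode)
{
        inode->nlink = 0;
        oi_delete(inode);
}

int main(void)
{
        struct fake_inode inode = { .nlink = 1, .oi_mapped = false };

        object_destroy(&inode);         /* unlink races with the scrub   */
        scrub_rebuild_mapping(&inode);  /* scrub notices and cleans up   */

        printf("oi_mapped after race: %s\n", inode.oi_mapped ? "yes" : "no");
        return 0;
}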

Comment by Jodi Levi (Inactive) [ 26/Apr/13 ]

Patch landed. Let me know if more work is needed and I will reopen the ticket.
