[LU-5527] compilebench hung in cl_lock_state_wait() when writing Created: 21/Aug/14  Updated: 27/Apr/15  Resolved: 27/Apr/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Li Wei (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Attachments: compilebench_hang-lola-24.log
Severity: 3
Rank (Obsolete): 15385

 Description   

A compilebench run on Lola, where we inject 0.1% message drops between clients and servers, hung during a write. The python process (PID 53350) on lola-24 was sleeping in cl_lock_state_wait():

Aug 20 20:29:41 lola-24 kernel: python        S 0000000000000001     0 53350  53330 0x00000080
Aug 20 20:29:41 lola-24 kernel: ffff8807c033dbd8 0000000000000082 0000000000000000 ffff8807b52a4a98
Aug 20 20:29:41 lola-24 kernel: ffff8807c033db78 ffffffffa0af7f8f ffff8807c033db78 ffff8807ef4d7cf8
Aug 20 20:29:41 lola-24 kernel: ffff880804e15098 ffff8807c033dfd8 000000000000fbc8 ffff880804e15098
Aug 20 20:29:41 lola-24 kernel: Call Trace:
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0af7f8f>] ? lov_sublock_unlock+0x5f/0x140 [lov]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05ed643>] cl_lock_state_wait+0x1d3/0x320 [obdclass]
Aug 20 20:29:41 lola-24 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05ede0b>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05ee97e>] cl_lock_request+0x7e/0x270 [obdclass]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05f3934>] cl_io_lock+0x3c4/0x560 [obdclass]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05f3b72>] cl_io_loop+0xa2/0x1b0 [obdclass]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0b7e1c2>] ll_file_io_generic+0x412/0x8f0 [lustre]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa05e3ca9>] ? cl_env_get+0x29/0x350 [obdclass]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0b7eee3>] ll_file_aio_write+0x133/0x2b0 [lustre]
Aug 20 20:29:41 lola-24 kernel: [<ffffffffa0b7f1b9>] ll_file_write+0x159/0x290 [lustre]
Aug 20 20:29:41 lola-24 kernel: [<ffffffff811892e8>] vfs_write+0xb8/0x1a0
Aug 20 20:29:41 lola-24 kernel: [<ffffffff81189cb1>] sys_write+0x51/0x90
Aug 20 20:29:41 lola-24 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Attachment compilebench_hang-lola-24.log contains the complete stack dump. This is likely to be difficult to reproduce.



 Comments   
Comment by Johann Lombardi (Inactive) [ 13/Mar/15 ]

Not seen for a while.
