[LU-5535] IOR hung during sys_write() Created: 22/Aug/14  Updated: 27/Apr/15  Resolved: 27/Apr/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Li Wei (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File ior-hung.log    
Severity: 3
Rank (Obsolete): 15410

 Description   

I came across an IOR process hanging like this on a Lola client:

Aug 22 03:22:22 lola-22 kernel: IOR           D 0000000000000001     0  9127      1 0x00000084
Aug 22 03:22:22 lola-22 kernel: ffff8808227199a8 0000000000000086 ffff880494694d20 ffff8805109dae58
Aug 22 03:22:22 lola-22 kernel: ffff880494694d20 ffff8805ccc58e80 ffffffff8100bb8e ffff8808227199a8
Aug 22 03:22:22 lola-22 kernel: ffff880823655ab8 ffff880822719fd8 000000000000fbc8 ffff880823655ab8
Aug 22 03:22:22 lola-22 kernel: Call Trace:
Aug 22 03:22:22 lola-22 kernel: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
Aug 22 03:22:22 lola-22 kernel: [<ffffffff8105545d>] ? mutex_spin_on_owner+0x8d/0xc0
Aug 22 03:22:22 lola-22 kernel: [<ffffffff8152ac2e>] __mutex_lock_slowpath+0x13e/0x180
Aug 22 03:22:22 lola-22 kernel: [<ffffffff8152aacb>] mutex_lock+0x2b/0x50
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0e9a87f>] cl_lock_mutex_get+0x6f/0xd0 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0e9d297>] cl_lock_enclosure+0x187/0x210 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0e9d363>] cl_lock_closure_build+0x43/0x150 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa138c8bb>] lov_sublock_lock+0x6b/0x2c0 [lov]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa138d148>] lov_lock_use+0x98/0x330 [lov]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0e9cb95>] cl_use_try+0x175/0x2e0 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0e9ce5d>] cl_enqueue_try+0x15d/0x300 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0e9dd1f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0e9e97e>] cl_lock_request+0x7e/0x270 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0ea3934>] cl_io_lock+0x3c4/0x560 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0ea3b72>] cl_io_loop+0xa2/0x1b0 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa14121c2>] ll_file_io_generic+0x412/0x8f0 [lustre]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa0e93ca9>] ? cl_env_get+0x29/0x350 [obdclass]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa1412ee3>] ll_file_aio_write+0x133/0x2b0 [lustre]
Aug 22 03:22:22 lola-22 kernel: [<ffffffffa14131b9>] ll_file_write+0x159/0x290 [lustre]
Aug 22 03:22:22 lola-22 kernel: [<ffffffff811892e8>] vfs_write+0xb8/0x1a0
Aug 22 03:22:22 lola-22 kernel: [<ffffffff81189cb1>] sys_write+0x51/0x90
Aug 22 03:22:22 lola-22 kernel: [<ffffffff810e212e>] ? __audit_syscall_exit+0x25e/0x290
Aug 22 03:22:22 lola-22 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

ior-hung.log contains the complete stack dump.
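
When reading a dump like the one above, it can help to reduce each line to its symbol and module. The following is a minimal sketch, assuming every frame follows the syslog format shown in this ticket; `parse_frames` and `first_reliable_frame` are hypothetical helper names, not part of Lustre or any kernel tooling. Frames prefixed with `?` are addresses found on the stack that the unwinder could not verify, so they are flagged as unreliable.

```python
import re

# Matches one frame of a kernel call trace as it appears in syslog, e.g.
# "[<ffffffffa138c8bb>] lov_sublock_lock+0x6b/0x2c0 [lov]".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"                      # "?" marks an unverified frame
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"  # symbol+offset/size
    r"(?:\s+\[(?P<module>\w+)\])?"                  # module name; absent for core kernel
)


def parse_frames(log_text):
    """Return (symbol, module, reliable) tuples, innermost frame first."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((
                m.group("symbol"),
                m.group("module") or "kernel",  # assumption: no module tag means core kernel
                m.group("unreliable") is None,
            ))
    return frames


def first_reliable_frame(frames, module):
    """Return the innermost verified symbol in the given module, or None."""
    for symbol, mod, reliable in frames:
        if reliable and mod == module:
            return symbol
    return None
```

Applied to the trace in this ticket, the innermost reliable core-kernel frame is `__mutex_lock_slowpath` and the innermost `lov` frame is `lov_sublock_lock`, which matches the reading that the write is blocked acquiring a cl_lock mutex.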



 Comments   
Comment by Jinshan Xiong (Inactive) [ 22/Aug/14 ]

How long had it been in this state?

From the backtrace, it looks like the lock cancellation started writeback that could not finish. Did you notice the state of the corresponding OST?

Comment by Li Wei (Inactive) [ 17/Mar/15 ]

IIRC, it was later discovered that an earlier bug had occurred before this and very likely led to the hang. Closing this ticket as "not a bug".

Generated at Sat Feb 10 01:52:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.