[LU-4627] Client deadlock on ll_setattr_raw Created: 13/Feb/14  Updated: 12/Jul/16  Resolved: 10/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.4

Type: Bug Priority: Blocker
Reporter: Henri Doreau (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-4710 Deadlock on lli_trunc_sem in ll_setat... Resolved
Severity: 3
Rank (Obsolete): 12657

 Description   

While investigating LU-3732 I've regularly bumped into what seems to be a deadlock on lli_trunc_sem, according to the trace below:

Call Trace:
[<ffffffff81511655>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff815117b3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff81284163>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff81510cb2>] ? down_write+0x32/0x40
[<ffffffffa0e03fe1>] ll_setattr_raw+0x191/0x10c0 [lustre]
[<ffffffff810758c7>] ? current_fs_time+0x27/0x30
[<ffffffffa0e04f6d>] ll_setattr+0x5d/0xf0 [lustre]
[<ffffffff8119ec18>] notify_change+0x168/0x340
[<ffffffffa06bb3bb>] ? libcfs_debug_vmsg2+0x50b/0xbb0 [libcfs]
[<ffffffff811197ef>] file_remove_suid+0x5f/0x90
[<ffffffff8111c2c0>] __generic_file_aio_write+0x220/0x490
[<ffffffffa06bbaa1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffff8111c5b8>] generic_file_aio_write+0x88/0x100
[<ffffffffa0e4353b>] vvp_io_write_start+0xdb/0x3d0 [lustre]
[<ffffffffa08cbcaa>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa08cfe24>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0de2a36>] ll_file_io_generic+0x2b6/0x710 [lustre]
[<ffffffffa08bfdf9>] ? cl_env_get+0x29/0x350 [obdclass]
[<ffffffffa0de3702>] ll_file_aio_write+0x142/0x2c0 [lustre]
[<ffffffffa0de39ec>] ll_file_write+0x16c/0x2a0 [lustre]
[<ffffffff811814b8>] vfs_write+0xb8/0x1a0
[<ffffffff81181e72>] sys_pwrite64+0x82/0xa0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

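It seems to me that the down_read() on lli_trunc_sem in ll_file_io_generic() should only be done if iot == CIT_READ, since in the write case the same thread calls down_write() on that semaphore later, in ll_setattr_raw(), while still holding it for read.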



 Comments   
Comment by Peter Jones [ 13/Feb/14 ]

Bobijam

Could you please advise on this one?

Thanks

Peter

Comment by Oleg Drokin [ 13/Feb/14 ]

What's the version of Lustre being used here?

Comment by Henri Doreau (Inactive) [ 13/Feb/14 ]

This is the current HEAD

Comment by Zhenyu Xu [ 14/Feb/14 ]

I think you are right about it. The patch is being tracked at http://review.whamcloud.com/9267

Comment by Peter Jones [ 10/Mar/14 ]

Landed for 2.6

Comment by James A Simmons [ 10/Oct/14 ]

Patch backport for b2_5 at http://review.whamcloud.com/#/c/12268. I'm hoping this will fix some performance issues with truncates I have seen.

Comment by Gerrit Updater [ 04/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12268/
Subject: LU-4627 llite: deadlock taking lli_trunc_sem during file write
Project: fs/lustre-release
Branch: b2_5
Current Patch Set:
Commit: 8a2fe616a959f18a928d4edf185467cbf905c355
