Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.6.0
-
None
-
The bug occurred during IOStress testing of code from master on SLES11 SP3. I assume it affects 2.6 because LU-3321 landed in that version.
-
3
-
12947
Description
Several application processes hang while trying to take a write lock on ll_inode_info.lli_trunc_sem in ll_setattr_raw(). Each process appears to be deadlocked on itself. Earlier in the same call stack, ll_file_io_generic() has already taken a read lock on that semaphore, so the write lock requested in ll_setattr_raw() can never be granted.
This bug was introduced by LU-3321, review.whamcloud.com/7893.
> crash> bt
> PID: 10475 TASK: ffff880837ae67f0 CPU: 0 COMMAND: "nsystst"
>  #0 [ffff88083cf05698] schedule at ffffffff8144947f
>  #1 [ffff88083cf057f0] rwsem_down_failed_common at ffffffff8144b6d5
>  #2 [ffff88083cf05860] rwsem_down_write_failed at ffffffff8144b783
>  #3 [ffff88083cf05870] call_rwsem_down_write_failed at ffffffff81219c43
>  #4 [ffff88083cf058d0] ll_setattr_raw at ffffffffa07ed590 [lustre]
>  #5 [ffff88083cf059b0] ll_setattr at ffffffffa07ee557 [lustre]
>  #6 [ffff88083cf059c0] notify_change at ffffffff8116e1f0
>  #7 [ffff88083cf05a30] file_remove_suid at ffffffff810fa3e1
>  #8 [ffff88083cf05ab0] __generic_file_aio_write at ffffffff810fcd29
>  #9 [ffff88083cf05b60] generic_file_aio_write at ffffffff810fcfc9
> #10 [ffff88083cf05ba0] vvp_io_write_start at ffffffffa0825cb0 [lustre]
> #11 [ffff88083cf05c00] cl_io_start at ffffffffa0365682 [obdclass]
> #12 [ffff88083cf05c30] cl_io_loop at ffffffffa0369204 [obdclass]
> #13 [ffff88083cf05c60] ll_file_io_generic at ffffffffa07c3062 [lustre]
> #14 [ffff88083cf05ce0] ll_file_aio_write at ffffffffa07c355e [lustre]
> #15 [ffff88083cf05d30] do_sync_readv_writev at ffffffff811539cb
> #16 [ffff88083cf05e40] do_readv_writev at ffffffff811548d4
> #17 [ffff88083cf05f30] vfs_writev at ffffffff81154a28
> #18 [ffff88083cf05f40] sys_writev at ffffffff81154b65
> #19 [ffff88083cf05f80] system_call_fastpath at ffffffff8145376b
> crash> files | egrep "PID|husk1"
> PID: 10475 TASK: ffff880837ae67f0 CPU: 0 COMMAND: "nsystst"
>   3 ffff880835e43bc0 ffff8808000206c0 ffff880837e05178 REG /dsl/lus/husk1/ostest.vers/CL_nsystst03.2672/nsys_base.2

lli_trunc_sem info:
> crash> eval 0xffff880837e05178 - 248 | grep hex
> hexadecimal: ffff880837e05080
> crash> ll_inode_info ffff880837e05080 | grep -A 15 trunc_sem
>     f_trunc_sem = {
>       count = -4294967295, = 0xffffffff00000001
>       wait_lock = {
>         {
>           rlock = {
>             raw_lock = {
>               slock = 2313
>             }
>           }
>         }
>       },
>       wait_list = {
>         next = 0xffff88083cf057f8,
>         prev = 0xffff88083cf057f8
>       }
>     },
> crash> semaphore_waiter 0xffff88083cf057f8
> struct semaphore_waiter {
>   list = {
>     next = 0xffff880837e05440,
>     prev = 0xffff880837e05440
>   },
>   task = 0xffff880837ae67f0,
>   up = 2
> }
> crash> ps | grep ffff880837ae67f0
>   10475 1 0 ffff880837ae67f0 UN 0.0 131484 5112 nsystst
LU-3321/7893 changed the logic in ll_file_io_generic() so that it always acquires the lli_trunc_sem semaphore in the IO_NORMAL case. Previously the semaphore was acquired only on the read path, where ll_setattr() is never called.
From lustre/llite/file.c:ll_file_io_generic:
>  case IO_NORMAL:
>          cio->cui_iov = args->u.normal.via_iov;
>          cio->cui_nrsegs = args->u.normal.via_nrsegs;
>          cio->cui_tot_nrsegs = cio->cui_nrsegs;
>          cio->cui_iocb = args->u.normal.via_iocb;
>          if ((iot == CIT_WRITE) &&
>              !(cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
>                  if (mutex_lock_interruptible(&lli->
> -                                       lli_write_mutex))
> -                        GOTO(out, result = -ERESTARTSYS);
> -                write_mutex_locked = 1;
> -        } else if (iot == CIT_READ) {
> -                down_read(&lli->lli_trunc_sem);
> -        }
> +                                       lli_write_mutex))
> +                        GOTO(out, result = -ERESTARTSYS);
> +                write_mutex_locked = 1;
> +        }
> +        down_read(&lli->lli_trunc_sem);
>          break;
>  case IO_SENDFILE:
>          vio->u.sendfile.cui_actor = args->u.sendfile.via_actor;
Issue Links
- duplicates: LU-4627 Client deadlock on ll_setattr_raw (Resolved)