[LU-4710] Deadlock on lli_trunc_sem in ll_setattr_raw()
Created: 04/Mar/14  Updated: 05/Mar/14  Resolved: 05/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Ann Koehler (Inactive)
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

The bug occurred during IOStress testing using code from master on SLES11 SP3. I assume the bug affects 2.6 because LU-3321 landed in that version.


Issue Links:
Duplicate: duplicates LU-4627, "Client deadlock on ll_setattr_raw" (Resolved)
Severity: 3
Rank (Obsolete): 12947

 Description   

Several application processes hang while trying to acquire a write lock on ll_inode_info.lli_trunc_sem in ll_setattr_raw(). Each process appears to be deadlocked against itself: ll_file_io_generic(), earlier in the same call stack, has already acquired a read lock on the same semaphore, so the write lock requested in ll_setattr_raw() can never be granted.

This bug was introduced by LU-3321, review.whamcloud.com/7893.

> crash> bt
> PID: 10475  TASK: ffff880837ae67f0  CPU: 0   COMMAND: "nsystst"
>  #0 [ffff88083cf05698] schedule at ffffffff8144947f
>  #1 [ffff88083cf057f0] rwsem_down_failed_common at ffffffff8144b6d5
>  #2 [ffff88083cf05860] rwsem_down_write_failed at ffffffff8144b783
>  #3 [ffff88083cf05870] call_rwsem_down_write_failed at ffffffff81219c43
>  #4 [ffff88083cf058d0] ll_setattr_raw at ffffffffa07ed590 [lustre]
>  #5 [ffff88083cf059b0] ll_setattr at ffffffffa07ee557 [lustre]
>  #6 [ffff88083cf059c0] notify_change at ffffffff8116e1f0
>  #7 [ffff88083cf05a30] file_remove_suid at ffffffff810fa3e1
>  #8 [ffff88083cf05ab0] __generic_file_aio_write at ffffffff810fcd29
>  #9 [ffff88083cf05b60] generic_file_aio_write at ffffffff810fcfc9
> #10 [ffff88083cf05ba0] vvp_io_write_start at ffffffffa0825cb0 [lustre]
> #11 [ffff88083cf05c00] cl_io_start at ffffffffa0365682 [obdclass]
> #12 [ffff88083cf05c30] cl_io_loop at ffffffffa0369204 [obdclass]
> #13 [ffff88083cf05c60] ll_file_io_generic at ffffffffa07c3062 [lustre]
> #14 [ffff88083cf05ce0] ll_file_aio_write at ffffffffa07c355e [lustre]
> #15 [ffff88083cf05d30] do_sync_readv_writev at ffffffff811539cb
> #16 [ffff88083cf05e40] do_readv_writev at ffffffff811548d4
> #17 [ffff88083cf05f30] vfs_writev at ffffffff81154a28
> #18 [ffff88083cf05f40] sys_writev at ffffffff81154b65
> #19 [ffff88083cf05f80] system_call_fastpath at ffffffff8145376b

> crash> files | egrep "PID|husk1"
> PID: 10475  TASK: ffff880837ae67f0  CPU: 0   COMMAND: "nsystst"
>   3 ffff880835e43bc0 ffff8808000206c0 ffff880837e05178 REG  /dsl/lus/husk1/ostest.vers/CL_nsystst03.2672/nsys_base.2

lli_trunc_sem info:

> crash> eval 0xffff880837e05178 - 248 | grep hex
> hexadecimal: ffff880837e05080  
> crash> ll_inode_info ffff880837e05080 | grep -A 15 trunc_sem
>       f_trunc_sem = {
>         count = -4294967295, = 0xffffffff00000001
>         wait_lock = {
>           {
>             rlock = {
>               raw_lock = {
>                 slock = 2313
>               }
>             }
>           }
>         }, 
>         wait_list = {
>           next = 0xffff88083cf057f8, 
>           prev = 0xffff88083cf057f8
>         }
>       }, 
> crash> semaphore_waiter 0xffff88083cf057f8
> struct semaphore_waiter {
>   list = {
>     next = 0xffff880837e05440, 
>     prev = 0xffff880837e05440
>   }, 
>   task = 0xffff880837ae67f0, 
>   up = 2
> }
> crash> ps | grep ffff880837ae67f0
>   10475      1    0 ffff880837ae67f0  UN   0.0  131484   5112  nsystst
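
The numbers in the crash output above can be checked with a little arithmetic. The following sketch assumes the x86_64 rwsem bias constants used by kernels of that era (active holders counted in the low 32 bits, waiting bias of -0x100000000). The constants are copied in as assumptions and the helper names are hypothetical. The offset 248 is the one used in the eval command above.

```c
#include <stdint.h>

/* Assumed x86_64 rwsem layout: low 32 bits count the active holders. */
#define RWSEM_ACTIVE_MASK   INT64_C(0xffffffff)
/* Assumed bias applied while tasks are queued on wait_list. */
#define RWSEM_WAITING_BIAS  (-RWSEM_ACTIVE_MASK - 1)

/* Number of active lock holders encoded in count. */
int64_t rwsem_active_holders(int64_t count)
{
    return count & RWSEM_ACTIVE_MASK;
}

/* Non-zero when count is negative, i.e. a waiting or writer bias is set. */
int rwsem_has_waiting_bias(int64_t count)
{
    return count < 0;
}

/* Address of an enclosing structure, given a member address and offset. */
uint64_t enclosing_struct_addr(uint64_t member_addr, uint64_t member_offset)
{
    return member_addr - member_offset;
}
```

Under these assumptions, count = -4294967295 decodes as one active holder plus the waiting bias. Since wait_list holds a single entry whose task is PID 10475, this is consistent with one read lock being held while the same task waits for the write lock.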

LU-3321/7893 changed ll_file_io_generic() to always take a read lock on lli_trunc_sem in the IO_NORMAL case. Previously the semaphore was taken only on the read path, where ll_setattr() is never called. On the write path, file_remove_suid() can call notify_change() and hence ll_setattr_raw(), as the backtrace shows.

From lustre/llite/file.c:ll_file_io_generic:
>                 case IO_NORMAL:
>                         cio->cui_iov = args->u.normal.via_iov;
>                         cio->cui_nrsegs = args->u.normal.via_nrsegs;
>                         cio->cui_tot_nrsegs = cio->cui_nrsegs;
>                         cio->cui_iocb = args->u.normal.via_iocb;
>                          if ((iot == CIT_WRITE) &&
>                              !(cio->cui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
>                                 if (mutex_lock_interruptible(&lli->
> -                                                               lli_write_mutex))
> -                                        GOTO(out, result = -ERESTARTSYS);
> -                                write_mutex_locked = 1;
> -                        } else if (iot == CIT_READ) {
> -                               down_read(&lli->lli_trunc_sem);
> -                        }
> +                                                       lli_write_mutex))
> +                                       GOTO(out, result = -ERESTARTSYS);
> +                               write_mutex_locked = 1;
> +                       }
> +                       down_read(&lli->lli_trunc_sem);
>                          break;
>                  case IO_SENDFILE:
>                          vio->u.sendfile.cui_actor = args->u.sendfile.via_actor;


 Comments   
Comment by Ann Koehler (Inactive) [ 04/Mar/14 ]

Dump uploaded to ftp.whamcloud.com:/uploads/LU-4710/LU-4710_lli_trunc_sem_hang.tgz
I used the dump from node c0-0cs8n0 for my analysis.

Comment by Zhenyu Xu [ 05/Mar/14 ]

dup of LU-4627
