  1. Lustre
  2. LU-16958

migrate vs regular ops deadlock


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.16.0

    Description

      PID: 350193  TASK: ffff9bd65af446c0  CPU: 0   COMMAND: "getfattr"
       #0 [ffff9bd63ffb7950] __schedule at ffffffffba5a232d
          /tmp/kernel/kernel/sched/core.c: 3109
       #1 [ffff9bd63ffb79d8] schedule at ffffffffba5a2748
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 84
       #2 [ffff9bd63ffb79e8] rwsem_down_write_slowpath at ffffffffba0f41a7
          /tmp/kernel/./arch/x86/include/asm/current.h: 15
       #3 [ffff9bd63ffb7a88] down_write at ffffffffba5a691a
          /tmp/kernel/./include/linux/err.h: 36
       #4 [ffff9bd63ffb7ac0] vvp_inode_ops at ffffffffc116d57f [lustre]
          /home/lustre/linux-4.18.0-305.25.1.el8_4/./arch/x86/include/asm/current.h: 15
       #5 [ffff9bd63ffb7ae0] cl_object_inode_ops at ffffffffc0454a50 [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_object.c: 442
       #6 [ffff9bd63ffb7b18] lov_conf_set at ffffffffc0aa36c4 [lov]
          /home/lustre/master-mine/lustre/lov/lov_object.c: 1465
       #7 [ffff9bd63ffb7b88] cl_conf_set at ffffffffc04542d8 [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_object.c: 299
       #8 [ffff9bd63ffb7bb8] ll_layout_conf at ffffffffc111d110 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 5995
       #9 [ffff9bd63ffb7c28] ll_layout_refresh at ffffffffc111dad3 [lustre]
          /home/lustre/master-mine/libcfs/include/libcfs/libcfs_debug.h: 155
      #10 [ffff9bd63ffb7cf0] vvp_io_init at ffffffffc116d019 [lustre]
          /home/lustre/master-mine/lustre/llite/vvp_io.c: 1870
      #11 [ffff9bd63ffb7d20] __cl_io_init at ffffffffc045e66f [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_io.c: 134
      #12 [ffff9bd63ffb7d58] cl_glimpse_size0 at ffffffffc11642ca [lustre]
          /home/lustre/master-mine/lustre/llite/glimpse.c: 204
      #13 [ffff9bd63ffb7da0] ll_getattr_dentry at ffffffffc111c65d [lustre]
          /home/lustre/master-mine/lustre/llite/llite_internal.h: 1677
      #14 [ffff9bd63ffb7e50] vfs_statx at ffffffffba1d4be9
          /tmp/kernel/fs/stat.c: 204
      

      Checking the stack of the process above, the inode was found at 0xffff9bd60367d350:

      crash> p *(struct ll_inode_info *)(0xffff9bd60367d350-0x150)
        lli_inode_magic = 287116773,
      ...
        lli_inode_lock_owner = 0xffff9bd68f51d380
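
The address arithmetic in the crash command above can be reproduced as a quick sanity check. This is only a sketch: it assumes that 0x150 is the offset of the embedded VFS inode inside struct ll_inode_info on this particular build, which the report implies but does not state. The variable names are illustrative.

```python
# Address of the VFS inode found on the getfattr stack (from the report).
inode_addr = 0xffff9bd60367d350
# Offset subtracted in the crash command; assumed here to be the offset of
# the embedded VFS inode within struct ll_inode_info on this build.
vfs_inode_offset = 0x150

lli_addr = inode_addr - vfs_inode_offset
print(hex(lli_addr))          # 0xffff9bd60367d200

# lli_inode_magic as printed by crash, shown in hex for easier recognition.
lli_inode_magic = 287116773
print(hex(lli_inode_magic))   # 0x111d0de5
```

A recognizable magic value at the computed address suggests that the subtraction landed on a real ll_inode_info. The offset itself is specific to this kernel and Lustre build and should be rechecked on any other.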
      

      Now check task 0xffff9bd68f51d380, the recorded lli_inode_lock_owner:

      crash> p *(struct task_struct *)0xffff9bd68f51d380|more
      ...
        pid = 348428,
      ...
      PID: 348428  TASK: ffff9bd68f51d380  CPU: 1   COMMAND: "lfs"
       #0 [ffff9bd613c37968] __schedule at ffffffffba5a232d
          /tmp/kernel/kernel/sched/core.c: 3109
       #1 [ffff9bd613c379f0] schedule at ffffffffba5a2748
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 84
       #2 [ffff9bd613c37a00] schedule_preempt_disabled at ffffffffba5a2a6c
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 79
       #3 [ffff9bd613c37a08] __mutex_lock at ffffffffba5a3a40
          /tmp/kernel/kernel/locking/mutex.c: 1038
       #4 [ffff9bd613c37ac8] ll_layout_refresh at ffffffffc111d577 [lustre]
          /home/lustre/master-mine/lustre/llite/llite_internal.h: 1536
       #5 [ffff9bd613c37b88] vvp_io_init at ffffffffc116d019 [lustre]
          /home/lustre/master-mine/lustre/llite/vvp_io.c: 1870
       #6 [ffff9bd613c37bb8] __cl_io_init at ffffffffc045e66f [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_io.c: 134
       #7 [ffff9bd613c37bf0] ll_ioc_data_version at ffffffffc110c665 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 3193
       #8 [ffff9bd613c37c28] ll_migrate at ffffffffc111b244 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 3227
       #9 [ffff9bd613c37ca8] ll_dir_ioctl at ffffffffc1105563 [lustre]
          /home/lustre/master-mine/lustre/llite/dir.c: 2277
      #10 [ffff9bd613c37e88] do_vfs_ioctl at ffffffffba1e3199
          /tmp/kernel/fs/ioctl.c: 48
      

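      This appears to be a lock ordering issue:
      ll_migrate() takes the inode lock and then lli_layout_mutex (in ll_layout_refresh()), while other operations (such as getfattr) take the two locks in the reverse order. In the traces above, getfattr is inside ll_layout_refresh() and blocks in down_write() on the inode lock, whose owner is the lfs task; the lfs task in turn blocks in __mutex_lock() inside ll_layout_refresh().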
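
The inversion described above can be modelled with a small pairwise lock-order check. This is a minimal sketch, not Lustre code and not a full cycle detector like the kernel's lockdep: the function find_order_conflicts and the path labels are hypothetical, and the lock orders are taken from the two backtraces in this report.

```python
from collections import defaultdict


def find_order_conflicts(paths):
    """Return lock pairs that different code paths take in opposite orders.

    paths maps a path label to the list of locks it takes, in order.
    Each result is (lock_a, lock_b, paths taking a then b, paths taking b then a).
    """
    edges = defaultdict(set)  # (held, acquired_later) -> labels of paths doing so
    for label, order in paths.items():
        for i, held in enumerate(order):
            for later in order[i + 1:]:
                edges[(held, later)].add(label)

    conflicts = []
    for (a, b), users in edges.items():
        # Report each inverted pair once, keyed by the sorted lock names.
        if a < b and (b, a) in edges:
            conflicts.append((a, b, sorted(users), sorted(edges[(b, a)])))
    return sorted(conflicts)


# Lock orders as described in this report.
observed = {
    "ll_migrate (lfs)": ["inode_lock", "lli_layout_mutex"],
    "ll_getattr_dentry (getfattr)": ["lli_layout_mutex", "inode_lock"],
}

# One hypothetical resolution: every path uses the same order.
consistent = {
    "ll_migrate (lfs)": ["lli_layout_mutex", "inode_lock"],
    "ll_getattr_dentry (getfattr)": ["lli_layout_mutex", "inode_lock"],
}

for a, b, forward, backward in find_order_conflicts(observed):
    print(f"{a} -> {b}: {forward}; {b} -> {a}: {backward}")
print(find_order_conflicts(consistent))
```

With the observed orders the check reports the inode lock / lli_layout_mutex pair; with a single consistent order it reports nothing. Which order the actual fix chose is not stated in this report, so the consistent table is only an illustration.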

            People

              bobijam Zhenyu Xu
              bzzz Alex Zhuravlev
