Lustre: LU-16308

cl_object_put_last() is stuck in wait_event(atomic_read(&header->loh_ref) == 1)


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      A client may get stuck in cl_object_put_last(), reached here through cl_inode_fini() while evicting an inode during unlink:

      [465342.626191] INFO: task nwchem:108679 blocked for more than 120 seconds.
      [465342.632934]       Tainted: G           OE    --------- -t - 4.18.0-193.el8.x86_64 #1
      [465342.640783] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [465342.648720] nwchem          D    0 108679 108659 0x00004082
      [465342.655115] Call Trace:
      [465342.658245]  ? __schedule+0x24f/0x650
      [465342.662607]  schedule+0x2f/0xa0
      [465342.666446]  cl_inode_fini+0x137/0x1e0 [lustre]
      [465342.671705]  ? wake_up_q+0x70/0x70
      [465342.675813]  ll_clear_inode+0x1b3/0x570 [lustre]
      [465342.681197]  ll_delete_inode+0x58/0x220 [lustre]
      [465342.686571]  evict+0xd2/0x1a0
      [465342.690291]  do_unlinkat+0x250/0x2e0
      [465342.694604]  do_syscall_64+0x5b/0x1a0
      [465342.698910]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [465342.704672] RIP: 0033:0x7fd72cf373cb
      [465342.708989] Code: Bad RIP value.
      [465342.712929] RSP: 002b:00007ffe0b19b5d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
      [465342.721236] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd72cf373cb
      [465342.729119] RDX: 0000000000000010 RSI: 0000000000000000 RDI: 00007ffe0b19ba10
      [465342.736982] RBP: 00007ffe0b19ba10 R08: 0000000000000000 R09: 0000000000000000
      [465342.744820] R10: 0000000000000011 R11: 0000000000000246 R12: 0000000000000000
      [465342.752638] R13: 00007ffe0b19dd70 R14: 00007ffe0b19beb0 R15: 00000000010a9e9d
      

      The waiter should be woken up by lu_object_free(), which does:

              if (waitqueue_active(wq))
                      wake_up(wq);
      

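      However, the comment documenting waitqueue_active() in include/linux/wait.h says the waker must either hold the wait-queue spinlock or issue an explicit smp_mb() between making the wait condition true and calling waitqueue_active(). Without that barrier, the waitqueue_active() load can be reordered before the store that changes the condition, so lu_object_free() may see an empty queue while the waiter, which is just adding itself, still sees the condition as false. The wakeup is then lost and the task stays blocked, as in the trace above.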
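      This is the classic store-buffering pattern: the waker stores the condition and then loads the queue state, while the waiter stores itself onto the queue and then loads the condition. Unless both sides have a full barrier between their store and their load, each can read the other's old value. The kernel's wq_has_sleeper() helper packages the waker half (smp_mb() followed by waitqueue_active()). The sketch below is a hypothetical user-space C11 model of the handshake, not the actual Lustre fix: a pthread mutex and condition variable stand in for the wait queue, an atomic counter nr_waiters stands in for waitqueue_active(), a global loh_ref stands in for header->loh_ref, and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(). All names are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>

/*
 * Hypothetical user-space model of the lu_object_free() /
 * cl_object_put_last() handshake. Nothing here is a Lustre or kernel API.
 */
struct model_waitq {
	pthread_mutex_t lock;   /* plays the role of the wait-queue spinlock */
	pthread_cond_t cond;    /* where the waiter actually sleeps */
	atomic_int nr_waiters;  /* non-zero ~ waitqueue_active() is true */
};

static struct model_waitq wq = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};
static atomic_int loh_ref;      /* models header->loh_ref */

/* Waiter side: models wait_event(wq, loh_ref == 1). */
static void *wait_for_last_ref(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&wq.lock);
	/* Register as a waiter first ("add self to the wait queue")... */
	atomic_fetch_add_explicit(&wq.nr_waiters, 1, memory_order_relaxed);
	/* ...then a full barrier, like the one set_current_state() implies... */
	atomic_thread_fence(memory_order_seq_cst);
	/* ...and only then test the condition. */
	while (atomic_load_explicit(&loh_ref, memory_order_relaxed) != 1)
		pthread_cond_wait(&wq.cond, &wq.lock);
	atomic_fetch_sub_explicit(&wq.nr_waiters, 1, memory_order_relaxed);
	pthread_mutex_unlock(&wq.lock);
	return NULL;
}

/* Waker side: models the reference drop followed by the conditional wake. */
static void put_ref(void)
{
	/* Make the wait condition true. */
	atomic_fetch_sub_explicit(&loh_ref, 1, memory_order_relaxed);
	/*
	 * Full barrier between that store and the "anyone waiting?" load,
	 * i.e. the smp_mb() in wq_has_sleeper(). Without it both sides can
	 * read stale values and the wakeup is skipped.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&wq.nr_waiters, memory_order_relaxed) > 0) {
		/* The lock keeps the broadcast from landing between the
		 * waiter's condition check and its pthread_cond_wait(). */
		pthread_mutex_lock(&wq.lock);
		pthread_cond_broadcast(&wq.cond);
		pthread_mutex_unlock(&wq.lock);
	}
}

/* Run the handshake n times; returns how many trials completed. */
static int run_trials(int n)
{
	int done = 0;

	for (int i = 0; i < n; i++) {
		pthread_t t;

		atomic_store(&loh_ref, 2);  /* one extra reference still held */
		if (pthread_create(&t, NULL, wait_for_last_ref, NULL) != 0)
			break;
		put_ref();                  /* drop it: 2 -> 1 */
		pthread_join(t, NULL);      /* would hang if a wakeup were lost */
		if (atomic_load(&loh_ref) == 1)
			done++;
	}
	return done;
}
```

      Built with cc -pthread and driven from a main() that calls run_trials(1000), every trial should complete. A lost wakeup would appear as a hang in pthread_join() rather than as a wrong count, which mirrors the hung-task report above. Taking the lock only when nr_waiters is non-zero keeps the common no-waiter path lock-free, which is the same reason the original code tests waitqueue_active() before calling wake_up().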


          People

            Assignee: laisiyao Lai Siyao
            Reporter: laisiyao Lai Siyao
