Lustre / LU-6085

racer stuck on mutex_lock in ll_setattr_raw()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.7.0
    • None
    • 3
    • 16934

    Description

      While running racer, a chmod process gets stuck in uninterruptible sleep (D state) with the following stack trace:

      chmod         D 0000000000000000     0 25015      1 0x00000000
       ffff880175afdca8 0000000000000086 ffff880175afdc88 ffffffffa077c842
       ffff880175afdc28 ffff880182b3d400 ffffffff8100b9ce ffff880175afdca8
       ffff88018162baf8 ffff880175afdfd8 000000000000fb88 ffff88018162baf8
      Call Trace:
       [<ffffffffa077c842>] ? __req_capsule_get+0x162/0x6d0 [ptlrpc]
       [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
       [<ffffffff810521eb>] ? mutex_spin_on_owner+0x9b/0xc0
       [<ffffffff8150fc5e>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff8150fafb>] mutex_lock+0x2b/0x50
       [<ffffffffa0e92e5c>] ll_setattr_raw+0x58c/0x1ae0 [lustre]
       [<ffffffff81192a72>] ? user_path_at+0x62/0xa0
       [<ffffffffa0e94415>] ll_setattr+0x65/0xd0 [lustre]
       [<ffffffff8119ead8>] notify_change+0x168/0x340
       [<ffffffff8117ee13>] sys_fchmodat+0xc3/0x100
       [<ffffffff81186fc6>] ? sys_newstat+0x36/0x50
       [<ffffffff8151171e>] ? do_device_not_available+0xe/0x10
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      It turned out that the inode mutex is already held by the current thread itself. The root cause is in ll_md_setattr(), which calls simple_setattr() even when setting the attribute on the MDT fails:

                      ptlrpc_req_finished(request);
                      if (rc == -ENOENT) {
                              clear_nlink(inode);
                              /* Unlinked special device node? Or just a race?
                               * Pretend we done everything. */
                              if (!S_ISREG(inode->i_mode) &&
                                  !S_ISDIR(inode->i_mode)) {
                                      ia_valid = op_data->op_attr.ia_valid;
                                      op_data->op_attr.ia_valid &= ~TIMES_SET_FLAGS;
                                      rc = simple_setattr(dentry, &op_data->op_attr);
                                      op_data->op_attr.ia_valid = ia_valid;
                              }
                      } else if (rc != -EPERM && rc != -EACCES && rc != -ETXTBSY) {
                              CERROR("md_setattr fails: rc = %d\n", rc);
                      }
                      RETURN(rc);
      

      racer may try to change a SOCK file into a regular file, which will always fail. If that file happens to have been deleted already, the request fails with ENOENT, so ll_md_setattr() calls simple_setattr() anyway. The file's mode is then changed to that of a regular file in the local inode, which in turn causes the thread to get stuck in mutex_lock().
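      The sequence above can be sketched as a small user-space model. This is only an illustration, not Lustre code: every model_* name is hypothetical, and it assumes that the caller drops and re-takes the inode mutex only for regular files, re-checking the mode after the setattr call. The ticket implies this but does not show it. The apply_on_enoent flag exists only to contrast the two behaviours; it is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* File-type bits laid out like the usual S_IFMT values (illustrative only). */
#define MODEL_IFMT   0170000u
#define MODEL_IFDIR  0040000u
#define MODEL_IFREG  0100000u
#define MODEL_IFSOCK 0140000u
#define MODEL_ENOENT 2

/* Hypothetical stand-in for an inode: a mode plus a non-recursive lock flag. */
struct model_inode {
	unsigned int mode;
	bool locked;	/* true while the "inode mutex" is held */
};

static bool model_is_type(const struct model_inode *ino, unsigned int type)
{
	return (ino->mode & MODEL_IFMT) == type;
}

/* Models the ENOENT branch quoted in the ticket: the server-side setattr
 * failed, yet the requested attributes are applied to the local inode when
 * it is neither a regular file nor a directory. */
static int model_md_setattr(struct model_inode *ino, unsigned int new_mode,
			    bool apply_on_enoent)
{
	int rc = -MODEL_ENOENT;	/* the file was already unlinked */

	if (apply_on_enoent &&
	    !model_is_type(ino, MODEL_IFREG) &&
	    !model_is_type(ino, MODEL_IFDIR))
		ino->mode = new_mode;	/* file type can change here */
	return rc;
}

/* Returns true if the thread would block on a mutex it already holds.
 * ASSUMPTION: the mutex is dropped only for regular files and re-taken
 * based on a fresh check of the (possibly changed) mode. */
static bool model_setattr_raw(struct model_inode *ino, unsigned int new_mode,
			      bool apply_on_enoent)
{
	ino->locked = true;			/* the caller enters holding the mutex */
	if (model_is_type(ino, MODEL_IFREG))
		ino->locked = false;		/* dropped around the request */

	(void)model_md_setattr(ino, new_mode, apply_on_enoent);

	if (model_is_type(ino, MODEL_IFREG)) {	/* mode re-checked afterwards */
		if (ino->locked)
			return true;		/* self-deadlock: already held */
		ino->locked = true;
	}
	return false;
}
```

      In this model, a socket inode whose mode is switched to a regular file on the ENOENT path reaches the second check with the mutex still held, so model_setattr_raw() reports the self-deadlock. With apply_on_enoent set to false the mode is left untouched and the call returns normally.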

      I will push a patch to fix this issue.

            People

              jay Jinshan Xiong (Inactive)
              jay Jinshan Xiong (Inactive)