Details
Bug
Resolution: Fixed
Blocker
None
3
16934
Description
With stack trace of:
chmod         D 0000000000000000     0 25015      1 0x00000000
 ffff880175afdca8 0000000000000086 ffff880175afdc88 ffffffffa077c842
 ffff880175afdc28 ffff880182b3d400 ffffffff8100b9ce ffff880175afdca8
 ffff88018162baf8 ffff880175afdfd8 000000000000fb88 ffff88018162baf8
Call Trace:
 [<ffffffffa077c842>] ? __req_capsule_get+0x162/0x6d0 [ptlrpc]
 [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff810521eb>] ? mutex_spin_on_owner+0x9b/0xc0
 [<ffffffff8150fc5e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150fafb>] mutex_lock+0x2b/0x50
 [<ffffffffa0e92e5c>] ll_setattr_raw+0x58c/0x1ae0 [lustre]
 [<ffffffff81192a72>] ? user_path_at+0x62/0xa0
 [<ffffffffa0e94415>] ll_setattr+0x65/0xd0 [lustre]
 [<ffffffff8119ead8>] notify_change+0x168/0x340
 [<ffffffff8117ee13>] sys_fchmodat+0xc3/0x100
 [<ffffffff81186fc6>] ? sys_newstat+0x36/0x50
 [<ffffffff8151171e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
It turned out that the inode mutex is already held by the current thread itself. The root cause is in ll_md_setattr(), which calls simple_setattr() even when setting the attribute on the MDT fails:
        ptlrpc_req_finished(request);
        if (rc == -ENOENT) {
                clear_nlink(inode);
                /* Unlinked special device node? Or just a race?
                 * Pretend we done everything. */
                if (!S_ISREG(inode->i_mode) &&
                    !S_ISDIR(inode->i_mode)) {
                        ia_valid = op_data->op_attr.ia_valid;
                        op_data->op_attr.ia_valid &= ~TIMES_SET_FLAGS;
                        rc = simple_setattr(dentry, &op_data->op_attr);
                        op_data->op_attr.ia_valid = ia_valid;
                }
        } else if (rc != -EPERM && rc != -EACCES && rc != -ETXTBSY) {
                CERROR("md_setattr fails: rc = %d\n", rc);
        }
        RETURN(rc);
In racer, a chmod may try to change a SOCK file into a regular file, which will always fail. If that file happens to have been deleted, the request returns -ENOENT and ll_md_setattr() calls simple_setattr() anyway. The inode's mode is then changed to that of a regular file, and the later mutex_lock() in ll_setattr_raw() gets stuck on the inode mutex that the thread already holds.
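The sequence above can be modelled in user space. The following is only a sketch under stated assumptions, not the Lustre code: the ticket does not show the body of ll_setattr_raw(), so the pattern "drop the lock only for regular files, retake it only for regular files" is an assumption made to illustrate how a mode change in the middle can unbalance the locking. struct model_inode, model_inode_init() and model_setattr() are hypothetical names, and an error-checking pthread mutex stands in for the inode mutex so that a self-relock reports EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for an inode: a mode plus a non-recursive lock. */
struct model_inode {
        mode_t mode;
        pthread_mutex_t lock;
};

static void model_inode_init(struct model_inode *ino, mode_t mode)
{
        pthread_mutexattr_t attr;
        int rc;

        rc = pthread_mutexattr_init(&attr);
        assert(rc == 0);
        /* ERRORCHECK turns a self-relock into EDEADLK instead of a hang. */
        rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        assert(rc == 0);
        rc = pthread_mutex_init(&ino->lock, &attr);
        assert(rc == 0);
        pthread_mutexattr_destroy(&attr);
        ino->mode = mode;
}

/*
 * Assumed shape of the setattr path (not taken from the ticket):
 * the caller holds the lock, the path drops it only for regular files
 * and retakes it only for regular files.
 * apply_on_failure models calling simple_setattr() after the remote
 * setattr has failed.
 * Returns 0, or EDEADLK if the relock would self-deadlock.
 */
static int model_setattr(struct model_inode *ino, mode_t new_mode,
                         bool apply_on_failure)
{
        int rc = 0;

        pthread_mutex_lock(&ino->lock);           /* taken by the caller */

        if (S_ISREG(ino->mode))
                pthread_mutex_unlock(&ino->lock); /* skipped for a socket */

        if (apply_on_failure)
                ino->mode = new_mode;             /* socket becomes regular */

        if (S_ISREG(ino->mode))
                rc = pthread_mutex_lock(&ino->lock); /* lock still held */

        pthread_mutex_unlock(&ino->lock);
        return rc;
}
```

In this model, calling model_setattr() on a socket inode with a regular-file mode and apply_on_failure set to true returns EDEADLK, which corresponds to the hang in mutex_lock() seen in the trace. With apply_on_failure set to false the mode is left alone and the lock and unlock calls stay balanced.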
I will push a patch to fix this issue.