Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.11.0
-
None
-
3
-
9223372036854775807
Description
Tiny writes to a regular file with setuid enabled results in a warning from notify_change() because the inode is not locked. As the write continues through ll_setattr_raw(), it sometimes hangs trying to lock the inode.
Warning in console log:
2019-01-25T19:30:35.591624-06:00 c0-0c1s10n1 ------------[ cut here ]------------ 2019-01-25T19:30:35.591630-06:00 c0-0c1s10n1 WARNING: CPU: 11 PID: 15121 at ../fs/attr.c:212 notify_change+0x398/0x408 2019-01-25T19:30:35.591644-06:00 c0-0c1s10n1 CPU: 11 PID: 15121 Comm: memfill2 Tainted: P W O 4.12.14-25.22_5.0.64-cray_ari_c #1 SLE15 (unreleased) 2019-01-25T19:30:35.591649-06:00 c0-0c1s10n1 Hardware name: Cavium Inc. Borg/Unknown, BIOS Cavium reference firmware version 6.3 05/01/2018 2019-01-25T19:30:35.591655-06:00 c0-0c1s10n1 task: ffff80be1a1be100 task.stack: ffff00001bd90000 2019-01-25T19:30:35.591660-06:00 c0-0c1s10n1 PC is at notify_change+0x398/0x408 2019-01-25T19:30:35.591665-06:00 c0-0c1s10n1 LR is at file_remove_privs+0xc4/0xf8 2019-01-25T19:30:35.591671-06:00 c0-0c1s10n1 pc : [<ffff000008271088>] lr : [<ffff0000082703cc>] pstate: 20000009 2019-01-25T19:30:35.591676-06:00 c0-0c1s10n1 sp : ffff00001bd93c20 2019-01-25T19:30:35.591676-06:00 c0-0c1s10n1 sp : ffff00001bd93c20 2019-01-25T19:30:35.591759-06:00 c0-0c1s10n1 Call trace: 2019-01-25T19:30:35.591823-06:00 c0-0c1s10n1 [<ffff000008271088>] notify_change+0x398/0x408 2019-01-25T19:30:35.591828-06:00 c0-0c1s10n1 [<ffff0000082703cc>] file_remove_privs+0xc4/0xf8 2019-01-25T19:30:35.591834-06:00 c0-0c1s10n1 [<ffff0000081cbe5c>] __generic_file_write_iter+0x5c/0x1d0 2019-01-25T19:30:35.591840-06:00 c0-0c1s10n1 [<ffff0000013867d0>] ll_file_write_iter+0x2c8/0x5b0 [lustre] 2019-01-25T19:30:35.591845-06:00 c0-0c1s10n1 [<ffff00000824b764>] __vfs_write+0xd4/0x130 2019-01-25T19:30:35.591850-06:00 c0-0c1s10n1 [<ffff00000824cc4c>] vfs_write+0xac/0x1b8 2019-01-25T19:30:35.591855-06:00 c0-0c1s10n1 [<ffff00000824e304>] SyS_write+0x54/0xb0
The warning code from notify_change is:
WARN_ON_ONCE(!inode_is_locked(inode));
The warning occurs on the ll_tiny_write path through ll_file_write_iter:
updates/kernel/fs/lustre > gdb lustre.ko
(gdb) list *(ll_file_write_iter+0x2c8)
0x1e7f8 is in ll_file_write_iter (/home/abuild/rpmbuild/BUILD/cray-lustre-2.11.0.200_cray_79_g4c42971/lustre/llite/file.c:1613).
1608 /home/abuild/rpmbuild/BUILD/cray-lustre-2.11.0.200_cray_79_g4c42971/lustre/llite/file.c: No such file or directory.
(gdb)
Source from lustre/llite/file.c
1597 static ssize_t ll_do_tiny_write(struct kiocb *iocb, struct iov_iter *iter)
1598 {
1599 ssize_t count = iov_iter_count(iter);
1600 struct file *file = iocb->ki_filp;
1601 struct inode *inode = file_inode(file);
1602 ssize_t result = 0;
1603
1604 ENTRY;
1605
1606 /* Restrict writes to single page and < PAGE_SIZE. See comment at top
1607 * of function for why.
1608 */
1609 if (count >= PAGE_SIZE ||
1610 (iocb->ki_pos & (PAGE_SIZE-1)) + count > PAGE_SIZE)
1611 RETURN(0);
1612
1613 result = __generic_file_write_iter(iocb, iter); <--- location on stack
....
Note that the other Lustre call to __generic_file_write_iter in vvp_io_write_start() is wrapped with lock/unlock calls that are not included in the tiny write case.
1023 static int vvp_io_write_start(const struct lu_env *env,
1024 const struct cl_io_slice *ios)
1025 {
...
1033 bool lock_inode = !lli->lli_inode_locked &&
1034 !IS_NOSEC(inode);
...
1092 if (lock_inode)
1093 inode_lock(inode);
1094 result = __generic_file_write_iter(&io->u.ci_rw.rw_iocb,
1095 &io->u.ci_rw.rw_iter);
1096 if (lock_inode)
1097 inode_unlock(inode);
The proposed fix is to add this same logic to ll_do_tiny_write().
Because the inode is not locked, tiny writes can race in ll_setattr_raw(). The inode rw_semaphore gets corrupted, so the down_write request waits forever.
crash_arm64> bt 15121
PID: 15121 TASK: ffff80be1a1be100 CPU: 11 COMMAND: "memfill2"
#0 [ffff00001bd939e0] __switch_to at ffff000008086500
#1 [ffff00001bd93a00] __schedule at ffff000008773cbc
#2 [ffff00001bd93a70] schedule at ffff000008774330
#3 [ffff00001bd93a80] rwsem_down_write_failed at ffff0000087774a4
#4 [ffff00001bd93b00] down_write at ffff000008776884
#5 [ffff00001bd93b40] ll_setattr_raw at ffff0000013af07c [lustre]
#6 [ffff00001bd93c00] ll_setattr at ffff0000013af6c4 [lustre]
#7 [ffff00001bd93c20] notify_change at ffff000008270f28
#8 [ffff00001bd93c70] file_remove_privs at ffff0000082703c8
#9 [ffff00001bd93cf0] __generic_file_write_iter at ffff0000081cbe58
#10 [ffff00001bd93d40] ll_file_write_iter at ffff0000013867cc [lustre]
#11 [ffff00001bd93db0] __vfs_write at ffff00000824b760
#12 [ffff00001bd93e40] vfs_write at ffff00000824cc48
#13 [ffff00001bd93e80] sys_write at ffff00000824e300
#14 [ffff00001bd93ff0] el0_svc_naked at ffff000008083c7c
crash_arm64> bt -FF | grep inode ==> yields inode and lli pointers:
ll_inode_info ffff809e6d14bb80
inode ffff809e6d14bc08
crash_arm64> struct -o inode | grep i_rwsem
[160] struct rw_semaphore i_rwsem;
crash_arm64> eval 0xffff809e6d14bc08 + 160 | grep hex
hexadecimal: ffff809e6d14bca8
crash_arm64> rw_semaphore ffff809e6d14bca8
struct rw_semaphore {
count = {
counter = -1
},
wait_list = {
next = 0xffff00001bd93ae0,
prev = 0xffff00001bd93ae0
},
wait_lock = {
raw_lock = {
owner = 1,
next = 1
}
},
osq = {
tail = {
counter = 0
}
},
owner = 0x0
}
1584 int ll_setattr_raw(struct dentry *dentry, struct iattr *attr, bool hsm_import)
1585 {
....
1646 if (S_ISREG(inode->i_mode)) {
1647 if (attr->ia_valid & ATTR_SIZE)
1648 inode_dio_write_done(inode);
1649 inode_unlock(inode);
1650 }
.....
1720 EXIT;
1721 out:
1722 if (op_data != NULL)
1723 ll_finish_md_op_data(op_data);
1724
1725 if (S_ISREG(inode->i_mode)) {
1726 inode_lock(inode);
1727 if ((attr->ia_valid & ATTR_SIZE) && !hsm_import)
1728 inode_dio_wait(inode);
1729 /* Once we've got the i_mutex, it's safe to set the S_NOSEC
1730 * flag. ll_update_inode (called from ll_md_setattr), clears
1731 * inode flags, so there is a gap where S_NOSEC is not set.
1732 * This can cause a writer to take the i_mutex unnecessarily,
1733 * but this is safe to do and should be rare. */
1734 inode_has_no_xattr(inode);
1735 }
I haven't figured out the exact scenario that causes i_rwsem to end up with the -1 value, but it's not too important because ll_setattr_raw() should never be called without the semaphore already being locked.