Details
Bug
Resolution: Fixed
Minor
Lustre 2.7.0
Description
When a regression test suite is run on an NFS client against an NFS-exported Lustre file system, the node acting as both NFS server and Lustre client slows down. Many of the nfsd threads are stuck in osc_extent_wait:
PID: 5989, 6017, 6018, 6022, 6023, 6024, 6025, 6026, 6027, 6028, 6029, 6030, 6031, 6032, 6033, 6034, 6035, 6036, 6037, 6038, 6039, 6040, 6041, 6042, 6043
TASKS: 25
schedule at ffffffff8161523e
osc_extent_wait at ffffffffa0ec96b0 [osc]
osc_cache_wait_range at ffffffffa0ecff5c [osc]
osc_io_fsync_end at ffffffffa0ebc7c6 [osc]
cl_io_end at ffffffffa09d6ac5 [obdclass]
lov_io_end_wrapper at ffffffffa0ca3314 [lov]
lov_io_fsync_end at ffffffffa0ca366e [lov]
cl_io_end at ffffffffa09d6ac5 [obdclass]
cl_io_loop at ffffffffa09da0dc [obdclass]
cl_sync_file_range at ffffffffa0d9aea5 [lustre]
ll_writepages at ffffffffa0dc1e83 [lustre]
do_writepages at ffffffff811519ae
__filemap_fdatawrite_range at ffffffff81146121
filemap_write_and_wait_range at ffffffff8114623a
ll_fsync at ffffffffa0d9b09a [lustre]
vfs_fsync_range at ffffffff811d925b
vvp_io_write_start at ffffffffa0df29f7 [lustre]
cl_io_start at ffffffffa09d6d0e [obdclass]
cl_io_loop at ffffffffa09da0ce [obdclass]
ll_file_io_generic at ffffffffa0d91f88 [lustre]
ll_file_write_iter at ffffffffa0d9257d [lustre]
do_iter_readv_writev at ffffffff811a988a
do_readv_writev at ffffffff811aa258
vfs_writev at ffffffff811aa50c
nfsd_vfs_write at ffffffff812e5e02
nfsd_write at ffffffff812e84f8
nfsd3_proc_write at ffffffff812ed523
nfsd_dispatch at ffffffff812e14ae
svc_process at ffffffff815ec536
nfsd at ffffffff812e0ef0
kthread at ffffffff81074376
ret_from_fork at ffffffff8161983f
These threads are waiting for the extent's oe_state to change to OES_INV, but there is no pending I/O that would cause the state to change. The ptlrpcd queues are empty, and no threads are performing synchronous I/O.
The problem was traced to a kernel change in generic_write_sync(). The new version checks for IOCB_DSYNC in the kiocb's ki_flags instead of checking O_DSYNC in the file's f_flags and IS_SYNC() on the inode. As a result, generic_write_sync() does not write anything, and the osc_extents are not released before the wait begins.
Old function:
int generic_write_sync(struct file *file, loff_t pos, loff_t count)
{
	if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
		return 0;
	return vfs_fsync_range(file, pos, pos + count - 1,
			       (file->f_flags & __O_SYNC) ? 0 : 1);
}
New function:
static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
{
	if (iocb->ki_flags & IOCB_DSYNC) {
		int ret = vfs_fsync_range(iocb->ki_filp,
				iocb->ki_pos - count, iocb->ki_pos - 1,
				(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
		if (ret)
			return ret;
	}
	return count;
}
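To make the difference between the two versions concrete, here is a minimal userspace sketch of the two checks. It is not kernel or Lustre code: the model_* types, the MODEL_* flag values, and every helper name are hypothetical stand-ins chosen for illustration. It shows that a file opened for synchronous writes passes the old check but fails the new one when the caller builds a kiocb without copying the sync state into ki_flags, and that the two versions compute the same byte range only when ki_pos has already been advanced past the written data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real kernel constants differ. */
#define MODEL_O_DSYNC    0x1u  /* stands in for O_DSYNC in file->f_flags */
#define MODEL_IOCB_DSYNC 0x2u  /* stands in for IOCB_DSYNC in iocb->ki_flags */

struct model_file {
	unsigned int f_flags;
	bool inode_is_sync;        /* stands in for IS_SYNC(inode) */
};

struct model_kiocb {
	const struct model_file *ki_filp;
	unsigned int ki_flags;
	long long ki_pos;          /* position after the write completed */
};

struct model_range {
	long long start;
	long long end;
};

/* Old check: decided from the file's open flags and the inode. */
static bool old_would_sync(const struct model_file *f)
{
	return (f->f_flags & MODEL_O_DSYNC) || f->inode_is_sync;
}

/* New check: decided only from the per-I/O flags in the kiocb. */
static bool new_would_sync(const struct model_kiocb *iocb)
{
	return (iocb->ki_flags & MODEL_IOCB_DSYNC) != 0;
}

/* Assumed translation a caller needs so the new check sees the file state. */
static unsigned int model_iocb_flags(const struct model_file *f)
{
	return old_would_sync(f) ? MODEL_IOCB_DSYNC : 0u;
}

/* Old range: pos is the offset where the write started. */
static struct model_range old_range(long long pos, long long count)
{
	return (struct model_range){ pos, pos + count - 1 };
}

/* New range: ki_pos is the offset where the write ended. */
static struct model_range new_range(const struct model_kiocb *iocb,
				    long long count)
{
	return (struct model_range){ iocb->ki_pos - count, iocb->ki_pos - 1 };
}

/* Walks through the mismatch for a 512-byte write at offset 4096. */
static void model_self_check(void)
{
	struct model_file f = { .f_flags = MODEL_O_DSYNC, .inode_is_sync = false };
	struct model_kiocb bare = { .ki_filp = &f, .ki_flags = 0, .ki_pos = 4608 };
	struct model_kiocb full = { .ki_filp = &f,
				    .ki_flags = model_iocb_flags(&f),
				    .ki_pos = 4608 };

	assert(old_would_sync(&f));     /* old check syncs */
	assert(!new_would_sync(&bare)); /* new check skips: flag never set */
	assert(new_would_sync(&full));  /* new check syncs once flag is set */

	/* Both versions cover bytes 4096..4607 when ki_pos == pos + count. */
	assert(old_range(4096, 512).start == new_range(&full, 512).start);
	assert(old_range(4096, 512).end == new_range(&full, 512).end);
}
```

Calling model_self_check() from a test program exercises all three cases. In this model, the bare kiocb corresponds to a caller that relies on the file's open flags alone, which is the situation the description attributes to the kernel change.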