Details
Bug
Resolution: Fixed
Minor
Lustre 2.7.0
Description
When a regression test suite is run on an NFS client against an NFS-exported Lustre file system, the NFS server (which is also the Lustre client) slows down. Many of the nfsd threads are stuck in osc_extent_wait:
PID: 5989, 6017, 6018, 6022, 6023, 6024, 6025, 6026, 6027, 6028, 6029, 6030, 6031, 6032, 6033, 6034, 6035, 6036, 6037, 6038, 6039, 6040, 6041, 6042, 6043
TASKS: 25
    schedule at ffffffff8161523e
    osc_extent_wait at ffffffffa0ec96b0 [osc]
    osc_cache_wait_range at ffffffffa0ecff5c [osc]
    osc_io_fsync_end at ffffffffa0ebc7c6 [osc]
    cl_io_end at ffffffffa09d6ac5 [obdclass]
    lov_io_end_wrapper at ffffffffa0ca3314 [lov]
    lov_io_fsync_end at ffffffffa0ca366e [lov]
    cl_io_end at ffffffffa09d6ac5 [obdclass]
    cl_io_loop at ffffffffa09da0dc [obdclass]
    cl_sync_file_range at ffffffffa0d9aea5 [lustre]
    ll_writepages at ffffffffa0dc1e83 [lustre]
    do_writepages at ffffffff811519ae
    __filemap_fdatawrite_range at ffffffff81146121
    filemap_write_and_wait_range at ffffffff8114623a
    ll_fsync at ffffffffa0d9b09a [lustre]
    vfs_fsync_range at ffffffff811d925b
    vvp_io_write_start at ffffffffa0df29f7 [lustre]
    cl_io_start at ffffffffa09d6d0e [obdclass]
    cl_io_loop at ffffffffa09da0ce [obdclass]
    ll_file_io_generic at ffffffffa0d91f88 [lustre]
    ll_file_write_iter at ffffffffa0d9257d [lustre]
    do_iter_readv_writev at ffffffff811a988a
    do_readv_writev at ffffffff811aa258
    vfs_writev at ffffffff811aa50c
    nfsd_vfs_write at ffffffff812e5e02
    nfsd_write at ffffffff812e84f8
    nfsd3_proc_write at ffffffff812ed523
    nfsd_dispatch at ffffffff812e14ae
    svc_process at ffffffff815ec536
    nfsd at ffffffff812e0ef0
    kthread at ffffffff81074376
    ret_from_fork at ffffffff8161983f
They are waiting for the extent's oe_state to change to OES_INV but there is no I/O pending that would cause the state to change. The ptlrpcd queues are empty; no threads are performing synchronous I/O.
The problem was traced to a kernel change in generic_write_sync(). The new version checks for IOCB_DSYNC in the iocb's ki_flags instead of checking O_DSYNC in the file's f_flags and IS_SYNC() on the inode. As a result, generic_write_sync() is not writing anything and osc_extents are not getting released before the wait begins.
Old function:
int generic_write_sync(struct file *file, loff_t pos, loff_t count)
{
        if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
                return 0;
        return vfs_fsync_range(file, pos, pos + count - 1,
                               (file->f_flags & __O_SYNC) ? 0 : 1);
}
New function:
static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
{
        if (iocb->ki_flags & IOCB_DSYNC) {
                int ret = vfs_fsync_range(iocb->ki_filp,
                                          iocb->ki_pos - count, iocb->ki_pos - 1,
                                          (iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
                if (ret)
                        return ret;
        }
        return count;
}
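The difference between the two checks can be illustrated with a small userspace model. This is only a sketch, not kernel or Lustre code: the struct model_write type, the MODEL_* flag values, and the two helper functions are hypothetical stand-ins introduced here for illustration. It shows that the old and new predicates disagree whenever a write is synchronous by virtue of the open flags or the inode, but the per-I/O ki_flags were never populated to match.

```c
#include <stdbool.h>

/* Hypothetical stand-in flag values for a userspace model.
 * The real kernel constants differ. */
#define MODEL_O_DSYNC    0x1u  /* models O_DSYNC in file->f_flags */
#define MODEL_IOCB_DSYNC 0x2u  /* models IOCB_DSYNC in iocb->ki_flags */

/* Hypothetical type: the three inputs the two versions look at. */
struct model_write {
        unsigned int f_flags;   /* open flags on the file */
        unsigned int ki_flags;  /* per-I/O flags on the kiocb */
        bool inode_is_sync;     /* models IS_SYNC(file->f_mapping->host) */
};

/* Old check: sync if the file was opened O_DSYNC or the inode is sync. */
static bool old_wants_sync(const struct model_write *w)
{
        return (w->f_flags & MODEL_O_DSYNC) || w->inode_is_sync;
}

/* New check: only the kiocb flags are consulted. */
static bool new_wants_sync(const struct model_write *w)
{
        return (w->ki_flags & MODEL_IOCB_DSYNC) != 0;
}
```

In this model, a write with f_flags containing MODEL_O_DSYNC (or with inode_is_sync set) and ki_flags left at zero makes old_wants_sync() return true while new_wants_sync() returns false. When ki_flags carries MODEL_IOCB_DSYNC, both agree.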