Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.8.0
Description
I hit this while running racer. It looks like LU-6227, but not in the directIO patch.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 6/27 pg: ffff8801efd6d640 off: 24576, count: 3272
LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) LBUG
Pid: 19811, comm: ptlrpcd_4
Call Trace:
[<ffffffffa116b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa116be77>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa08a4b55>] osc_brw_prep_request+0xc35/0x10a0 [osc]
[<ffffffffa08b4471>] ? osc_req_attr_set+0x1b1/0x740 [osc]
[<ffffffffa08a5859>] osc_build_rpc+0x899/0x15c0 [osc]
[<ffffffffa08c0eda>] osc_io_unplug0+0x115a/0x1b40 [osc]
[<ffffffffa08b9a83>] ? osc_ap_completion+0x213/0x600 [osc]
[<ffffffffa156d8bb>] ? lu_object_put+0x12b/0x310 [obdclass]
[<ffffffffa08c3e61>] osc_io_unplug+0x11/0x20 [osc]
[<ffffffffa08a746f>] brw_interpret+0x9bf/0x1fa0 [osc]
[<ffffffffa060eadc>] ? ptlrpc_free_committed+0x56c/0x770 [ptlrpc]
[<ffffffffa061bdb2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
[<ffffffffa0610772>] ? after_reply+0xcb2/0xeb0 [ptlrpc]
[<ffffffffa0614ab1>] ptlrpc_check_set+0x331/0x1c70 [ptlrpc]
[<ffffffff81087fdb>] ? try_to_del_timer_sync+0x7b/0xe0
[<ffffffffa0642393>] ptlrpcd_check+0x533/0x550 [ptlrpc]
[<ffffffffa06429cb>] ptlrpcd+0x35b/0x430 [ptlrpc]
[<ffffffff81064b90>] ? default_wake_function+0x0/0x20
[<ffffffffa0642670>] ? ptlrpcd+0x0/0x430 [ptlrpc]
[<ffffffff8109e66e>] kthread+0x9e/0xc0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109e5d0>] ? kthread+0x0/0xc0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Patrick, it might also happen if the user runs out of quota, I believe.
It isn't OK to merge multiple fragments because of LNet RDMA. Some RDMA implementations can't handle gaps in the middle of the data packed into the bulk RPC; only a fragment at the end is allowed (which is very common).
If the pages were cached on the client it would be possible to just expand the write to cover the whole page in the middle of the file, since the client needs to do a read-modify-write of the page. With O_DIRECT that isn't possible. I'm not knowledgeable enough about CLIO to know whether the out-of-space handling could be changed to allow this or not.
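For readers following the trace, the layout rule that the failed ASSERTION enforces can be sketched as a standalone check. This is only an illustration, not the Lustre code: `sketch_page`, `sketch_layout_ok`, and `SKETCH_PAGE_SIZE` are hypothetical names, the page size is assumed to be 4096 bytes, and `poff` is assumed to be the byte offset of the data within its page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PAGE_CACHE_SIZE; 4096 is an assumption. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical, simplified page descriptor (not a Lustre structure). */
struct sketch_page {
    unsigned int poff;  /* offset of the data within the page (assumed) */
    unsigned int count; /* number of data bytes in the page */
};

/*
 * Mirrors the three ergo() clauses of the assertion in the trace:
 *  - first page: data must run to the end of the page
 *  - middle pages: data must fill the whole page
 *  - last page: data must start at offset 0
 * A single-page request is always accepted.
 */
static bool sketch_layout_ok(const struct sketch_page *pg, size_t page_count)
{
    assert(pg != NULL && page_count > 0);

    if (page_count == 1)
        return true;

    for (size_t i = 0; i < page_count; i++) {
        unsigned int poff = pg[i].poff;
        unsigned int count = pg[i].count;

        if (i == 0 && poff + count != SKETCH_PAGE_SIZE)
            return false;
        if (i > 0 && i < page_count - 1 &&
            (poff != 0 || count != SKETCH_PAGE_SIZE))
            return false;
        if (i == page_count - 1 && poff != 0)
            return false;
    }
    return true;
}
```

Under those assumptions, the page reported in the trace (i: 6/27, off: 24576, count: 3272) is a middle page that starts on a page boundary but holds only 3272 bytes, so it fails the middle-page clause. That is the gap in the middle of the bulk data described in the comment above.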