[LU-16043] (osc_page.c:183:osc_page_delete()) LBUG Created: 25/Jul/22  Updated: 10/Jan/24  Resolved: 06/Sep/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Vladimir Saveliev Assignee: Vladimir Saveliev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[  418.712805] LustreError: 12744:0:(osc_page.c:182:osc_page_delete()) Trying to teardown failed: -16
[  418.715341] LustreError: 12744:0:(osc_page.c:183:osc_page_delete()) ASSERTION( 0 ) failed:
[  418.717901] LustreError: 12744:0:(osc_page.c:183:osc_page_delete()) LBUG
[  418.719937] Pid: 12744, comm: rm 3.10.0-1160.71.1.el7.x86_64 #1 SMP Sun Jul 24 17:13:36 MSK 2022
[  418.722849] Call Trace:
[  418.723845] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[  418.725936] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[  418.729036] [<0>] osc_page_delete+0x49b/0x500 [osc]
[  418.732641] [<0>] cl_page_delete0+0x85/0x320 [obdclass]
[  418.734915] [<0>] cl_page_delete+0x33/0x110 [obdclass]
[  418.736796] [<0>] ll_invalidatepage+0x7f/0x170 [lustre]
[  418.738250] [<0>] do_invalidatepage_range+0x7d/0x90
[  418.739724] [<0>] truncate_inode_page+0x77/0x80
[  418.741012] [<0>] truncate_inode_pages_range+0x1ea/0x750
[  418.742643] [<0>] truncate_inode_pages_final+0x4f/0x60
[  418.744691] [<0>] ll_truncate_inode_pages_final+0x21/0xe0 [lustre]
[  418.746819] [<0>] ll_delete_inode+0x38/0x150 [lustre]
[  418.748201] [<0>] evict+0xb4/0x180
[  418.749416] [<0>] iput+0xfc/0x190
[  418.750516] [<0>] do_unlinkat+0x1ae/0x2d0

The above happens on unlink when cl_io_start(), reached via
ll_delete_inode->cl_sync_file_range->cl_io_loop, does not process all slices:

cl_sync_file_range()
	cl_io_loop()
		cl_io_start()
			list_for_each_entry(scan, &io->ci_layers, cis_linkage) {
				/* for the osc layer, cio_start is osc_io_fsync_start() */
				result = scan->cis_iop->op[io->ci_type].cio_start(env, scan);
				if (result != 0)
					break;
			}

If osc_io_fsync_start() fails for the first stripe, the loop breaks and the second stripe does not get its dirty pages discarded, which makes ll_invalidatepage->osc_page_delete() fail on a page of the second stripe:

static void osc_page_delete(const struct lu_env *env,
...
        rc = osc_teardown_async_page(env, obj, opg);
        if (rc) {
                CL_PAGE_DEBUG(D_ERROR, env, slice->cpl_page,
                              "Trying to teardown failed: %d\n", rc);
                LASSERT(0);
        }
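
The failure mode described above can be modelled outside Lustre. The following is a minimal sketch in plain C, not Lustre code: toy_slice, toy_fsync_start(), toy_io_start() and toy_teardown() are hypothetical names invented for illustration, and the -5 start error is arbitrary. Only the loop shape (break on the first non-zero result) and the -16 teardown result (-EBUSY on Linux) are taken from the description and log above.

```c
/* Hypothetical stand-in for one per-stripe io slice; not a Lustre type. */
struct toy_slice {
	int fail_start;		/* make the start method fail for this stripe */
	int dirty_pages;	/* pages still dirty on this stripe */
};

/* Toy per-stripe start method: discards dirty pages unless told to fail. */
static int toy_fsync_start(struct toy_slice *s)
{
	if (s->fail_start)
		return -5;	/* arbitrary error, chosen for illustration */
	s->dirty_pages = 0;
	return 0;
}

/* Same loop shape as quoted above: stop at the first failing slice. */
static int toy_io_start(struct toy_slice *slices, int n)
{
	int result = 0;
	int i;

	for (i = 0; i < n; i++) {
		result = toy_fsync_start(&slices[i]);
		if (result != 0)
			break;	/* later slices are never started */
	}
	return result;
}

/* Toy teardown: refuses with -16 while the stripe still has dirty pages. */
static int toy_teardown(const struct toy_slice *s)
{
	return s->dirty_pages ? -16 : 0;
}
```

With two slices that each hold dirty pages and the first one set to fail, toy_io_start() returns the first slice's error without ever starting the second slice, so toy_teardown() on the second slice returns -16, which is the condition that trips the LASSERT(0) in the excerpt above.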



 Comments   
Comment by Gerrit Updater [ 25/Jul/22 ]

"Vladimir Saveliev <vladimir.saveliev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/48032
Subject: LU-16043 osc: allow error for write on CL_FSYNC_DISCARD
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: af53dd7d647404659b2445d26243c549fe50e34e

Comment by Gerrit Updater [ 06/Sep/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48032/
Subject: LU-16043 osc: allow error for write on CL_FSYNC_DISCARD
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 050c2fb23b1f98745305a3dfe3062ea5a66dfdb4

Comment by Peter Jones [ 06/Sep/23 ]

Landed for 2.16

Comment by Andreas Dilger [ 10/Jan/24 ]

This same "LASSERT(0)" was hit even with the "osc: allow error for write on CL_FSYNC_DISCARD" patch applied. Does it make sense to replace this "LASSERT()" with an error return? At worst that would leak some memory, and possibly the file could be cleaned up later through other means (lock cancellation, page cache flushing, whatever), whereas the LBUG is definitely disruptive to the user.
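
For illustration only, the shape of the "error return instead of LASSERT()" idea raised in this comment might look like the sketch below. This is not a proposed patch: toy_page, toy_page_teardown() and toy_page_delete() are hypothetical names, the leaked flag is an assumption standing in for "at worst it would leak some memory", and how a real caller would handle the returned error is not covered.

```c
#include <stdio.h>

/* Hypothetical page state; not a Lustre structure. */
struct toy_page {
	int busy;	/* teardown cannot proceed while set */
	int leaked;	/* set when deletion gave up on the page */
};

/* Toy teardown: fails with -16 (-EBUSY on Linux) while the page is busy. */
static int toy_page_teardown(const struct toy_page *pg)
{
	return pg->busy ? -16 : 0;
}

/*
 * Log the failure and hand it back to the caller instead of asserting.
 * The page is only marked as leaked; any later cleanup is out of scope.
 */
static int toy_page_delete(struct toy_page *pg)
{
	int rc = toy_page_teardown(pg);

	if (rc) {
		fprintf(stderr, "Trying to teardown failed: %d\n", rc);
		pg->leaked = 1;
	}
	return rc;
}
```

In this toy version a busy page yields -16 and is flagged as leaked while the caller keeps running, and an idle page is deleted with a return value of 0.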
