[LU-9479] sanity test 184d 244: don't instantiate PFL component when taking group lock Created: 09/May/17  Updated: 08/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Andreas Dilger Assignee: Zhenyu Xu
Resolution: Unresolved Votes: 0
Labels: pfl

Issue Links:
Related
is related to LU-13645 Various data corruptions possible in ... Resolved
is related to LU-9341 PFL: append should not instantiate fu... Resolved
is related to LU-8998 Progressive File Layout (PFL) Resolved
is related to LU-9344 sanity test_244: sendfile_grouplock t... Resolved
is related to LU-9349 PFL known issues tracking ticket Resolved
is related to LU-9429 parallel-scale test_parallel_grouploc... Open
is related to LU-17403 lfs migrate: cannot get group lock: N... Open
is related to LU-9756 sanity test 184d fails with ‘lovea *... Open
is related to LU-9793 sanity test 244 fail Resolved
is related to LU-10070 PFL self-extending file layout Resolved
is related to LU-12738 PFL: append of PFL file should not in... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for bobijam <bobijam.xu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/67af86be-2027-11e7-9073-5254006e85c2.

The sub-test test_244 failed with the following error:

test failed to respond and timed out

Info required for matching: sanity 244

sendfile_grouplock.c calls sendfile_copy(sourfile, 0, destfile, 98765), and sendfile_copy() calls llapi_group_lock(fd_out, dest_gid),

which goes through lov_io_init() and does atomic_inc(&lov->lo_active_ios).

sendfile_copy() then tries to write to the file. The write needs to fetch the layout, and ll_layout_refresh() finds an active I/O in flight (the one started by ll_get_grouplock()), so the write hangs.
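The failing sequence reduces to roughly the following (a minimal standalone sketch, not the actual test source; sendfile_grouplock.c drives the same steps through its sendfile_copy() helper):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <lustre/lustreapi.h>

    int main(int argc, char *argv[])
    {
        int fd_in, fd_out, gid = 98765;
        struct stat st;

        if (argc != 3) {
            fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
            return EXIT_FAILURE;
        }

        fd_in = open(argv[1], O_RDONLY);
        fd_out = open(argv[2], O_WRONLY | O_CREAT, 0644);
        if (fd_in < 0 || fd_out < 0 || fstat(fd_in, &st) < 0)
            return EXIT_FAILURE;

        /* Taking the group lock starts a cl_io on fd_out and bumps
         * lov->lo_active_ios. */
        if (llapi_group_lock(fd_out, gid) < 0)
            return EXIT_FAILURE;

        /* The write path needs a layout refresh (to instantiate the PFL
         * component being written), but ll_layout_refresh() waits for
         * active I/Os to drain -- including the group lock's own -- so
         * this call hangs. */
        if (sendfile(fd_out, fd_in, NULL, st.st_size) < 0)
            perror("sendfile");

        llapi_group_unlock(fd_out, gid);
        close(fd_in);
        close(fd_out);
        return EXIT_SUCCESS;
    }

The resulting hung task on the client: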

sendfile_grou S 0000000000000000     0  7394   7321 0x00000080
 ffff88000eb3f618 0000000000000082 ffff88000eb3f5e0 ffff88000eb3f5dc
 00001ce200000000 ffff88003f828400 0000005dce083b5f ffff880003436ac0
 00000000000005ff 0000000100017a1d ffff88002b57fad0 ffff88000eb3ffd8
Call Trace:
 [<ffffffffa0afa20b>] lov_layout_wait+0x11b/0x220 [lov]
 [<ffffffff810640e0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0afc11e>] lov_conf_set+0x37e/0xa30 [lov]
 [<ffffffffa040f471>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa059d888>] cl_conf_set+0x58/0x100 [obdclass]
 [<ffffffffa0fa5dd4>] ll_layout_conf+0x84/0x3f0 [lustre]
 [<ffffffffa040f471>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0fb0b9d>] ll_layout_refresh+0x96d/0x1710 [lustre]
 [<ffffffffa040f471>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0ff7d6f>] vvp_io_init+0x32f/0x450 [lustre]
 [<ffffffffa040f471>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa05a5148>] cl_io_init0+0x88/0x150 [obdclass]
 [<ffffffffa05a7caa>] cl_io_init+0x4a/0xa0 [obdclass]
 [<ffffffffa05a7dbc>] cl_io_rw_init+0xbc/0x200 [obdclass]
 [<ffffffffa0fa7213>] ll_file_io_generic+0x203/0xaf0 [lustre]
 [<ffffffffa0fa941d>] ll_file_aio_write+0x13d/0x280 [lustre]
 [<ffffffffa0fa969a>] ll_file_write+0x13a/0x270 [lustre]
 [<ffffffff81189ef8>] vfs_write+0xb8/0x1a0
 [<ffffffff811ba76d>] kernel_write+0x3d/0x50
 [<ffffffff811ba7da>] write_pipe_buf+0x5a/0x90
 [<ffffffff811b9342>] splice_from_pipe_feed+0x72/0x120
 [<ffffffff811ba780>] ? write_pipe_buf+0x0/0x90
 [<ffffffff811ba780>] ? write_pipe_buf+0x0/0x90
 [<ffffffff811b9d9e>] __splice_from_pipe+0x6e/0x80
 [<ffffffff811ba780>] ? write_pipe_buf+0x0/0x90
 [<ffffffff811b9e01>] splice_from_pipe+0x51/0x70
 [<ffffffff811b9e3d>] default_file_splice_write+0x1d/0x30
 [<ffffffff811b9fca>] do_splice_from+0xba/0xf0
 [<ffffffff811ba020>] direct_splice_actor+0x20/0x30
 [<ffffffff811ba256>] splice_direct_to_actor+0xc6/0x1c0
 [<ffffffff811ba000>] ? direct_splice_actor+0x0/0x30
 [<ffffffff811ba39d>] do_splice_direct+0x4d/0x60
 [<ffffffff8118a344>] do_sendfile+0x184/0x1e0
 [<ffffffff8118a3d4>] sys_sendfile64+0x34/0xb0
 [<ffffffff810e031e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b


 Comments   
Comment by Andreas Dilger [ 09/May/17 ]

This problem was worked around in patch https://review.whamcloud.com/26646 "LU-9344 test: hung with sendfile_grouplock test12()" by instantiating all of the file's components before taking the group lock. However, this is a problem for operations like lfs swap_layouts in sanity.sh test_184d or lfs migrate, since it instantiates all of the components even if the file is empty.

This is inefficient for the source file since it is about to be deleted, and for the destination file since it doesn't need all of those objects.

Per the comments in LU-9344 and https://review.whamcloud.com/26646, it makes sense not to instantiate the components of the source file, since it is not being modified. The file size is also known in advance for the writes done during migration/swap_layouts, so it would be possible to instantiate the required components up front and leave any later ones alone.
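For illustration, a helper along these lines could pre-instantiate only the components covering the known copy size before the group lock is taken (a sketch using the llapi_layout API; instantiate_to() is a hypothetical helper, and the one-byte write per component is just a simple way to force a layout-intent write):

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <lustre/lustreapi.h>

    /* Hypothetical helper: instantiate every PFL component whose extent
     * starts below 'size', leaving later components alone, so that the
     * subsequent group lock covers all objects the copy will write to. */
    static int instantiate_to(int fd, uint64_t size)
    {
        struct llapi_layout *layout;
        uint64_t start, end;
        uint32_t flags;
        char zero = 0;
        int rc;

        layout = llapi_layout_get_by_fd(fd, 0);
        if (layout == NULL)
            return -1;

        rc = llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_FIRST);
        while (rc == 0) {
            if (llapi_layout_comp_extent_get(layout, &start, &end) < 0 ||
                llapi_layout_comp_flags_get(layout, &flags) < 0) {
                rc = -1;
                break;
            }
            /* Writing one byte into an uninstantiated component forces
             * the MDS to instantiate it via a layout-intent write; the
             * byte is harmless here since the copy overwrites it. */
            if (start < size && !(flags & LCME_FL_INIT) &&
                pwrite(fd, &zero, 1, start) < 0) {
                rc = -1;
                break;
            }
            rc = llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_NEXT);
        }
        llapi_layout_free(layout);
        return rc < 0 ? -1 : 0;
    }

The source file would be left untouched, and llapi_group_lock() would be taken only after the needed destination components exist.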

Comment by Zhenyu Xu [ 23/May/17 ]

Jinshan,

Commit d25892863a30a385cdae2b1991ab0a9ca76fcd7d introduced lov->lo_active_ios, and taking a group lock also increments it:

    LU-1876 hsm: account for active IOs correctly
    
    We used to use refcount of lsm to account for active IOs but this
    turns out wrong because empty layout files don't have a lsm.
    
    In this patch, lov_object::lo_active_ios is invented to account for
    active IOs for both raid0 and empty layout;

I'm thinking of not accounting ll_get_grouplock() as an active I/O, so that a process holding the group lock won't hang itself in the first place. Do you think that's plausible?
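Roughly, the idea would be (kernel-side sketch only, not the actual lov code paths; the io->ci_grouplock flag is hypothetical and would have to be set by ll_get_grouplock() on the cl_io it uses for the enqueue):

    /* Sketch: skip active-I/O accounting for the group-lock enqueue
     * itself, so that ll_layout_refresh() from a later write on the
     * same file does not block forever in lov_layout_wait(). */
    static void lov_io_account(struct lov_object *lov, struct cl_io *io)
    {
        if (!io->ci_grouplock)      /* hypothetical flag */
            atomic_inc(&lov->lo_active_ios);
    }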

Comment by Zhenyu Xu [ 23/May/17 ]

Well, even if lo_active_ios is handled as in my last comment, there is still the problem that a component instantiated later is not protected by the previously granted group lock.

Comment by Andreas Dilger [ 31/Aug/17 ]

Bobijam, can you discuss with Oleg some non-racy way we could get group locks for newly-instantiated components? It seems that the workaround is not working properly, since multiple problems with group locks are now being seen.

Comment by Patrick Farrell (Inactive) [ 20/Feb/18 ]

Is it not possible for a group lock to prevent instantiation of further components by those not holding a group lock? It would imply a group lock also had to take a layout lock, though.

Comment by Andreas Dilger [ 22/Feb/18 ]

Patrick, the problem here is that we don't want to prevent instantiation of new components, but if new objects are instantiated they do not have a group lock on them (since group locking is handled between each OST and the client separately for each object).

Comment by Patrick Farrell (Inactive) [ 22/Feb/18 ]

Right, I understand - note "instantiation of further components by those not holding a group lock". I think that's necessary because otherwise the problem you note occurs: the new component doesn't get the old group lock applied to it. I don't see a way to solve that, so instead holding a group lock should prevent instantiation of a new component by those not holding the group lock; otherwise they could get around the group lock and write to that new component.

Those holding the group lock can do it, which gives them a chance to apply the group lock to the new component.

This is still potentially problematic, though: if the client that instantiated the new component releases its group lock, other clients holding the group lock will not have that portion locked, so it could become unlocked.

This is not fun.

Comment by Andreas Dilger [ 22/Feb/18 ]

The issue is that group locks are handled by the OSS, but object instantiation is done by the MDS, so there is no way for the MDS to even know that there is a group lock on the file...

Comment by Patrick Farrell (Inactive) [ 22/Feb/18 ]

Yup... No straightforward suggestions from me.

Comment by Patrick Farrell (Inactive) [ 18/May/18 ]

No intention to work on this right now, but I wanted to write this down here so it doesn't get forgotten:

As Andreas said, the issue is that group locks are handled by the OSS, but object instantiation is handled by the MDS.  As Oleg noted at LUG, given that, it seems like we need to move group locks to being layout locks.

A few thoughts on this.

  1. There is currently no provision for group lock IDs in IBITS locks.  There is enough room in the ldlm_wire_policy_data union to fit a gid in the IBITS case, but I'm not sure how much of the rest of the mdc/mdt/IBITS LDLM code would need to be modified to support them correctly.
  2. We still need data locks to cover the data I/O.  So a group lock could start by locking the layout exclusively, then taking locks on all of the existing data components (i.e. just lock from 0 to EOF and not worry about uninstantiated components).  Unlocking would go in reverse order: unlock data, then unlock layout.  (See the sketch below.)
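
A rough order-of-operations sketch for item 2 (all four helpers are hypothetical placeholders for the MDC/OSC lock requests; nothing like them exists today):

    /* Hypothetical placeholders -- no such interfaces exist today. */
    extern int mdt_layout_group_lock(int fd, int gid);
    extern int mdt_layout_group_unlock(int fd, int gid);
    extern int ost_extent_group_lock(int fd, int gid);
    extern int ost_extent_group_unlock(int fd, int gid);

    int grouplock_acquire(int fd, int gid)
    {
        int rc;

        /* 1. Take the layout (IBITS) lock on the MDT with the gid, so
         *    non-holders cannot instantiate components underneath us. */
        rc = mdt_layout_group_lock(fd, gid);
        if (rc != 0)
            return rc;

        /* 2. Take group extent locks covering [0, EOF] on the existing
         *    data components; uninstantiated components need no lock,
         *    since step 1 blocks their instantiation by non-holders. */
        rc = ost_extent_group_lock(fd, gid);
        if (rc != 0)
            mdt_layout_group_unlock(fd, gid);
        return rc;
    }

    int grouplock_release(int fd, int gid)
    {
        /* Unlock in reverse order: data first, then the layout lock. */
        ost_extent_group_unlock(fd, gid);
        return mdt_layout_group_unlock(fd, gid);
    }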

There may also be some ... interesting ... interoperability issues with clients that do not request/expect group locks on the layout.  That might be challenging to get right.

Comment by Andreas Dilger [ 22/Oct/21 ]

It is worthwhile to note that the "MDS holds IBITS group lock" issue is (partly) solved already via patch https://review.whamcloud.com/39406 "LU-13645 ldlm: group locks for DOM IBIT lock", which was commit v2_13_56-61-g0674044036 and as such is included in 2.14.0.

That would potentially allow group locks to always lock on the MDT first, and then lock only initialized components. Older clients that don't understand the MDT group lock would (unfortunately) keep their current behavior of always instantiating and locking all components.

The new MDS could potentially revoke the layout lock (and keep it) from older clients that do not have this functionality, to prevent non-group locks from instantiating uninitialized components, but there is unfortunately no OBD_CONNECT_* flag for this. In 2.15 it might be enough for a group lock to hold the MDS_INODEBITS_LAYOUT lock exclusively while there are uninitialized components, to block older clients; this wouldn't be needed if all components are already initialized, or if all clients are new and understand MDS group locking (a new OBD_CONNECT_* flag could be added for this).
