[LU-7198] vvp_io.c:701:vvp_io_fault_start()) binary changed while waiting for the page fault lock Created: 22/Sep/15  Updated: 14/Jun/18  Resolved: 25/Apr/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.4
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: D. Marc Stearman (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: llnl, llnlfixready
Environment:

Livermore branch: lustre-2.5.4-4chaos_2.6.32_504.16.2.1chaos.ch5.3.x86_64.x86_64


Issue Links:
Duplicate
Related
is related to LU-4308 MPI job causes errors "binary changed... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We continue to see messages like this:
<ConMan> Console [sierra562] log at 2015-07-20 15:00:00 PDT.
2015-07-20 15:56:56 Lustre: 13097:0:(vvp_io.c:701:vvp_io_fault_start()) binary [0x20004f83e:0x3bc8:0x0] changed while waiting for the page fault lock

<ConMan> Console [sierra562] log at 2015-07-20 16:00:00 PDT.
2015-07-20 16:13:50 Lustre: 13626:0:(vvp_io.c:701:vvp_io_fault_start()) binary [0x20004f83e:0x3bc8:0x0] changed while waiting for the page fault lock
2015-07-20 16:30:41 Lustre: 13679:0:(vvp_io.c:701:vvp_io_fault_start()) binary [0x20004f83e:0x3bc8:0x0] changed while waiting for the page fault lock

<ConMan> Console [sierra562] log at 2015-07-20 17:00:00 PDT.
2015-07-20 17:04:21 Lustre: 13807:0:(vvp_io.c:701:vvp_io_fault_start()) binary [0x20004f83e:0x3bc8:0x0] changed while waiting for the page fault lock
2015-07-20 17:38:08 Lustre: 14373:0:(vvp_io.c:701:vvp_io_fault_start()) binary [0x20004f83e:0x3bc8:0x0] changed while waiting for the page fault lock

Creating a new issue rather than reopening LU-4308.



 Comments   
Comment by Joseph Gmitter (Inactive) [ 23/Sep/15 ]

Hi Bobijam,
Can you take this issue?
Thanks.
Joe

Comment by Gerrit Updater [ 24/Sep/15 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/16622
Subject: LU-7198 llite: take/release lli_trunc_sem in vvp_io layer
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: efcd5d8ea9b6a241d588959a7a4688998105fcc5

Comment by Christopher Morrone [ 22/Oct/15 ]

Lustre 2.5.4-12chaos contains change 16650 targeting the b2_5_fe branch. That did not fix the stated problem:

2015-10-19 18:34:53 Lustre: 87030:0:(vvp_io.c:716:vvp_io_fault_start()) binary [0x200053b04:0xee27:0x0] changed while waiting for the page fault lock

<ConMan> Console [cab34] log at 2015-10-19 19:00:00 PDT.

<ConMan> Console [cab34] log at 2015-10-19 20:00:00 PDT.
2015-10-19 20:21:21 Lustre: 88635:0:(vvp_io.c:716:vvp_io_fault_start()) binary [0x200053b04:0xee27:0x0] changed while waiting for the page fault lock

Keep trying.

Comment by Christopher Morrone [ 22/Oct/15 ]

And, even worse, we now have applications hanging in sys_ftruncate all over the place. I can't help but think that this patch, which changes the locking around vvp_io_setattr_start(), is to blame for our current major center-wide disruption.

Here is where the code is hanging:

2015-10-21 17:31:35 XXXXXX       D 0000000000000006     0 28745  28734 0x00000004
2015-10-21 17:31:35  ffff8805f0befb98 0000000000000086 ffff88034ac159c0 ffff88033b277150
2015-10-21 17:31:35  00ff8805f0befc28 0000000000000000 ffff8805f0befb28 ffff88034ac159c0
2015-10-21 17:31:35  ffff8805eee3e0c0 0000000000000006 ffff8805f3043068 ffff8805f0beffd8
2015-10-21 17:31:35 Call Trace:
2015-10-21 17:31:35  [<ffffffff8153e475>] rwsem_down_failed_common+0x95/0x1d0
2015-10-21 17:31:35  [<ffffffff8153e5d3>] rwsem_down_write_failed+0x23/0x30
2015-10-21 17:31:35  [<ffffffff8129fac3>] call_rwsem_down_write_failed+0x13/0x20
2015-10-21 17:31:35  [<ffffffff8153daa5>] ? down_write+0x45/0x50
2015-10-21 17:31:35  [<ffffffffa0b0c229>] vvp_io_setattr_start+0x129/0x170 [lustre]
2015-10-21 17:31:35  [<ffffffffa0638c7a>] cl_io_start+0x6a/0x140 [obdclass]
2015-10-21 17:31:35  [<ffffffffa063d384>] cl_io_loop+0xb4/0x1b0 [obdclass]
2015-10-21 17:31:35  [<ffffffffa0b04948>] cl_setattr_ost+0x218/0x2f0 [lustre]
2015-10-21 17:31:35  [<ffffffffa0ad06f1>] ll_setattr_raw+0xa31/0x1060 [lustre]
2015-10-21 17:31:35  [<ffffffffa0ad0d85>] ll_setattr+0x65/0xd0 [lustre]
2015-10-21 17:31:35  [<ffffffff811b1de8>] notify_change+0x168/0x340
2015-10-21 17:31:35  [<ffffffff81190bd4>] do_truncate+0x64/0xa0
2015-10-21 17:31:35  [<ffffffff8153c771>] ? mutex_lock+0x31/0x60
2015-10-21 17:31:35  [<ffffffff81190f80>] sys_ftruncate+0x120/0x130
2015-10-21 17:31:35  [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
2015-10-21 17:31:35 XXXXXX       D 000000000000000b     0 28750  28734 0x00000004
2015-10-21 17:31:35  ffff88054385bd08 0000000000000082 0000000000000000 ffff8805dce716e0
2015-10-21 17:31:35  ffff88054385bcc8 0000000000000286 00005ab6b25e0bdc ffff88054385bd64
2015-10-21 17:31:35  ffffffffa0b3bb00 0000000105ec1b77 ffff8806395a05f8 ffff88054385bfd8
2015-10-21 17:31:35 Call Trace:
2015-10-21 17:31:35  [<ffffffff8153cc66>] __mutex_lock_slowpath+0x96/0x210
2015-10-21 17:31:35  [<ffffffff8153c77e>] mutex_lock+0x3e/0x60
2015-10-21 17:31:35  [<ffffffffa0acff9a>] ll_setattr_raw+0x2da/0x1060 [lustre]
2015-10-21 17:31:35  [<ffffffffa0ad0d85>] ll_setattr+0x65/0xd0 [lustre]
2015-10-21 17:31:35  [<ffffffff811b1de8>] notify_change+0x168/0x340
2015-10-21 17:31:35  [<ffffffff81190bd4>] do_truncate+0x64/0xa0
2015-10-21 17:31:35  [<ffffffff8153c771>] ? mutex_lock+0x31/0x60
2015-10-21 17:31:35  [<ffffffff81190f80>] sys_ftruncate+0x120/0x130
2015-10-21 17:31:35  [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b

Comment by Zhenyu Xu [ 22/Oct/15 ]

Would you mind resolving ll_setattr_raw+0x2da to the corresponding source line?

Comment by Jinshan Xiong (Inactive) [ 22/Oct/15 ]

Also, please upload the reproducer program so that we can use it to check whether the same problem exists in master.

Comment by Christopher Morrone [ 22/Oct/15 ]

I like your optimism, but no, we don't have a reproducer to upload.

I can get the line number later today when I'm free.

Comment by Christopher Morrone [ 22/Oct/15 ]

ll_setattr_raw+0x2da resolves to the obvious location, the mutex_lock() on line 1620 below:

1616			}
1617			ll_finish_md_op_data(op_data);
1618		}
1619		if (!S_ISDIR(inode->i_mode)) {
1620			mutex_lock(&inode->i_mutex);
1621			if ((attr->ia_valid & ATTR_SIZE) && !hsm_import)
1622				inode_dio_wait(inode);
1623		}
1624	
1625		ll_stats_ops_tally(ll_i2sbi(inode), (attr->ia_valid & ATTR_SIZE)

Comment by Zhenyu Xu [ 27/Oct/15 ]

Still investigating; I haven't found the root cause of the deadlock yet.

Comment by Zhenyu Xu [ 28/Oct/15 ]

Are there other kinds of blocked (D state) threads besides these two? I think thread 28750 was waiting for the i_mutex that 28745 was holding, while 28745 appeared to be waiting for a write lock, either lli->lli_trunc_sem (most likely) or i_alloc_sem (unlikely), so there should be another thread holding that lli_trunc_sem.
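
For illustration only, here is a minimal userspace sketch of the wait chain described above (pthreads, hypothetical names, not Lustre code): one thread stands in for 28745, which holds the inode mutex and blocks on the write side of the truncate semaphore; a second stands in for 28750, which blocks on the inode mutex; and the main thread stands in for the unidentified third thread pinning lli_trunc_sem.

/* Hypothetical sketch of the suspected wait chain -- not Lustre code. */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;     /* stands in for inode->i_mutex */
static pthread_rwlock_t trunc_sem = PTHREAD_RWLOCK_INITIALIZER; /* stands in for lli->lli_trunc_sem */

static void *thread_28745(void *arg)          /* setattr/truncate path */
{
	(void)arg;
	pthread_mutex_lock(&i_mutex);         /* acquires the inode mutex ... */
	pthread_rwlock_wrlock(&trunc_sem);    /* ... then blocks: the semaphore is held elsewhere */
	pthread_rwlock_unlock(&trunc_sem);
	pthread_mutex_unlock(&i_mutex);
	return NULL;
}

static void *thread_28750(void *arg)          /* second setattr caller */
{
	(void)arg;
	pthread_mutex_lock(&i_mutex);         /* blocks: i_mutex is held by "28745" */
	pthread_mutex_unlock(&i_mutex);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_rwlock_rdlock(&trunc_sem);    /* the unidentified third thread pinning lli_trunc_sem */
	pthread_create(&t1, NULL, thread_28745, NULL);
	pthread_create(&t2, NULL, thread_28750, NULL);
	sleep(1);                             /* both workers are now stuck, as in the console traces */
	pthread_rwlock_unlock(&trunc_sem);    /* releasing the semaphore lets everything drain */
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}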

Comment by Christopher Morrone [ 28/Oct/15 ]

Unfortunately, I do not have that information.

Comment by Zhenyu Xu [ 10/Nov/15 ]

Can you provide the 'bt -a' output, with thread names masked if necessary?

Comment by Zhenyu Xu [ 11/Nov/15 ]

I found the cause of the deadlock: this patch needs some preparatory patches that drop the i_mutex usage in the write path. Otherwise vvp_io_write_start() holds lli->lli_trunc_sem and then calls lustre_generic_file_write()->generic_file_aio_write(), which takes i_mutex, and this lock order is the opposite of the other IO paths (setattr, truncate, etc.).

So I need to push some preparatory patches before this one.
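
For illustration only, a hedged userspace sketch of the lock-order inversion described above (pthreads, hypothetical names, not Lustre code): the write path takes lli_trunc_sem before i_mutex, while the setattr/truncate path takes i_mutex before lli_trunc_sem; run concurrently, the two orders form a classic AB-BA deadlock. The sketch runs the two paths sequentially so it always terminates.

/* Hypothetical sketch of the inverted lock ordering -- not Lustre code. */
#include <pthread.h>

static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;     /* inode->i_mutex */
static pthread_rwlock_t trunc_sem = PTHREAD_RWLOCK_INITIALIZER; /* lli->lli_trunc_sem */

static void write_path(void)      /* vvp_io_write_start() -> generic_file_aio_write() */
{
	pthread_rwlock_rdlock(&trunc_sem);   /* 1st: lli_trunc_sem (read) */
	pthread_mutex_lock(&i_mutex);        /* 2nd: i_mutex */
	pthread_mutex_unlock(&i_mutex);
	pthread_rwlock_unlock(&trunc_sem);
}

static void setattr_path(void)    /* ll_setattr_raw() -> vvp_io_setattr_start() */
{
	pthread_mutex_lock(&i_mutex);        /* 1st: i_mutex */
	pthread_rwlock_wrlock(&trunc_sem);   /* 2nd: lli_trunc_sem (write) -- opposite order */
	pthread_rwlock_unlock(&trunc_sem);
	pthread_mutex_unlock(&i_mutex);
}

int main(void)
{
	/* Run sequentially here; if two threads ran these paths concurrently,
	 * one could hold trunc_sem while waiting for i_mutex and the other
	 * hold i_mutex while waiting for trunc_sem -- the deadlock seen above. */
	write_path();
	setattr_path();
	return 0;
}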

Comment by Zhenyu Xu [ 11/Nov/15 ]

Updated patch #16650, which depends on #17117 and #17116 (those two patches drop i_mutex usage in the write path).

Comment by Christopher Morrone [ 11/Nov/15 ]

We can't provide further information (e.g. 'bt -a') on the buggy patch 16650. We stopped using it in production as soon as we identified that patch as the problem.

Keep in mind that not only did this patch introduce a deadlock, it also did not fix the error messages which this ticket is about. If you have only fixed the deadlock, then you should really be tracking your 16650 backport in some other ticket.

If you think it will address our error message, we would appreciate an explanation of why the latest revision is expected to fix it when the previous one did not.

Comment by Jinshan Xiong (Inactive) [ 13/Nov/15 ]

After taking a further look, I tend to think the timestamp check in vvp_io_fault_start() is not atomic, so it is normal to get this kind of mismatch.

Looking at ll_merge_lvb() in b2_5, it first assigns the mtime from the MDS to the inode's i_mtime, and then merges the timestamps from the OST objects. However, the timestamp from the MDS can be stale, because mtime is maintained mainly on the OST objects (consider how Lustre writes are implemented). If the check in vvp_io_fault_start() happens before the OST objects' mtime has been merged, it will read the wrong timestamp from the inode; even the timestamp fetched in vvp_io_fault_init() could be wrong in the first place.

I tend to think we should remove the timestamp check in vvp_io_fault_start().
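
For illustration only, a minimal single-threaded walk-through of the interleaving described above (hypothetical names, not Lustre code): if the fault path samples the inode mtime after the possibly stale MDS value has been copied in, but before the OST values have been merged, the comparison against the value captured at fault-init time fails even though the binary never changed.

/* Hypothetical step-by-step sketch of the non-atomic mtime check -- not Lustre code. */
#include <assert.h>
#include <stdio.h>
#include <time.h>

struct fake_inode { time_t i_mtime; };

static void merge_step_mds(struct fake_inode *inode, time_t mds_mtime)
{
	inode->i_mtime = mds_mtime;          /* MDS value copied in first; may be stale */
}

static void merge_step_ost(struct fake_inode *inode, time_t ost_mtime)
{
	if (ost_mtime > inode->i_mtime)
		inode->i_mtime = ost_mtime;  /* OST objects carry the authoritative mtime */
}

int main(void)
{
	struct fake_inode inode = { .i_mtime = 2000 };  /* current, correct mtime */
	time_t fault_init_mtime = inode.i_mtime;        /* sampled at fault-init time */

	merge_step_mds(&inode, 1000);        /* stale MDS mtime temporarily overwrites i_mtime ... */

	/* ... and if the fault-start check runs at this point, it sees a mismatch: */
	if (inode.i_mtime != fault_init_mtime)
		printf("binary \"changed\" while waiting for the page fault lock (false positive)\n");

	merge_step_ost(&inode, 2000);        /* merging the OST mtime restores the real value */
	assert(inode.i_mtime == fault_init_mtime);
	return 0;
}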

Comment by Peter Jones [ 15/Dec/15 ]

Bobijam

Do you think that Jinshan's suggestion is viable? If so, can you please create a patch on that basis?

Thanks

Peter

Comment by Zhenyu Xu [ 16/Dec/15 ]

http://review.whamcloud.com/#/c/17334/ is the patch

Comment by Peter Jones [ 16/Dec/15 ]

Bobi

LLNL will need a fix targeted on b2_5_fe

Peter

Comment by Zhenyu Xu [ 16/Dec/15 ]

Port for b2_5_fe: http://review.whamcloud.com/17624

Comment by D. Marc Stearman (Inactive) [ 19/Feb/16 ]

Chris, can you pull in the latest patch please?

Comment by Christopher Morrone [ 19/Feb/16 ]

There are no reviews on the backported patch, and it failed automated testing.

Comment by D. Marc Stearman (Inactive) [ 19/Feb/16 ]

Peter, please add this to the work queue. When a patch is ready for us, please add the "llnlfixready" label.

Comment by Christopher Morrone [ 25/Mar/16 ]

We need a master branch version of this patch as well.

Comment by Gerrit Updater [ 26/Mar/16 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/19162
Subject: LU-7198 clio: remove mtime check in vvp_io_fault_start()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f35f69b866566ce8b7852facf8c914da297e9be0

Comment by Gerrit Updater [ 25/Apr/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19162/
Subject: LU-7198 clio: remove mtime check in vvp_io_fault_start()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d62f58c32074bc54055801ed0f919bfd5c277f6d

Comment by Peter Jones [ 25/Apr/16 ]

Landed for 2.9
