[LU-7860] LustreError: 19445:0:(ldlm_lock.c:2273:ldlm_lock_cancel()) ASSERTION( !(((( lock))->l_flags & (1ULL << 53)) != 0) ) failed Created: 09/Mar/16  Updated: 14/Jun/18  Resolved: 09/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: Ruth Klundt (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

servers: (zfs) v2_8_0_0_RC2--PRISTINE-3.10.0-327.0.0.1chaos.ch6.x86_64
clients: 2.5-5chaos-CHANGED-3.10.0-327.10.1.1chaos.ch6.x86_64


Issue Links:
Related
is related to LU-6416 Client evicted on lock cancel Resolved
Severity: 4
Rank (Obsolete): 9223372036854775807

 Description   

This is an OSS node error encountered while running a 56 node/260 process IOR.

This is not a server-client combo that is likely to see production, but I'd guess any ASSERT triggered would be of interest.

The IOR jobs were failing on one of the 56 clients with ENOTDIR on attempting to open the data file.



 Comments   
Comment by Joseph Gmitter (Inactive) [ 09/Mar/16 ]

Hi Ruth,
Do you intend this to be a severity 1 (meaning site is down)?
Thanks.
Joe

Comment by Peter Jones [ 09/Mar/16 ]

Ruth

You have flagged this ticket as severity 1, meaning a production filesystem is out of service. From the details supplied, this appears to relate to testing you are doing on the community 2.8 release, so I am wondering if your intention was to select the lowest severity (4)?

Peter

Comment by Ruth Klundt (Inactive) [ 09/Mar/16 ]

oops, sorry I meant lowest severity, so apparently I meant 5, 'not severe at all'

Comment by Peter Jones [ 09/Mar/16 ]

ok - no problem

Comment by Oleg Drokin [ 10/Mar/16 ]

Liang, apparently this is an assertion that you introduced, and apparently it's incorrect and the previous handling for it was correct.
It would be great if you could take another look.

Ruth, can you please upload a log from the crashed server if you have it, with a backtrace and all?

Comment by Joseph Gmitter (Inactive) [ 10/Mar/16 ]

Hi Liang,

Can you please have a look at the change?

Thanks.
Joe

Comment by Ruth Klundt (Inactive) [ 14/Mar/16 ]

The console log got very little info and no backtrace, and syslog got nothing. I may have time to try to reproduce this week.

<ConMan> Console [cs48] log at 2016-03-04 19:00:00 MST.
2016-03-04 19:12:26 [28608.194920] Lustre: scratch4-OST001c: haven't heard from client 15c8341e-dd93-ee2b-fcaa-2c59e0dea086 (at 172.17.70.108@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8800b63fc800, cur 1457143946 expire 1457143796 last 1457143719
2016-03-04 19:12:26 [28608.217293] Lustre: Skipped 3 previous similar messages
2016-03-04 19:35:46 [30009.200717] LustreError: 19445:0:(ldlm_lock.c:2273:ldlm_lock_cancel()) ASSERTION( !(((( lock))->l_flags & (1ULL << 53)) != 0) ) failed:
2016-03-04 19:35:46 [30009.212987] LustreError: 19445:0:(ldlm_lock.c:2273:ldlm_lock_cancel()) LBUG
2016-03-04 19:35:46 [30009.219959] Pid: 19445, comm: ldlm_cn03_003
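
For readers decoding the assertion text in the log above: the condition is a plain bit test on the lock's 64-bit l_flags word, and it fails (triggering the LBUG) whenever bit 53 is set at the time ldlm_lock_cancel() runs. The following is a minimal, standalone sketch of that test only, not the Lustre source. The macro name EXAMPLE_FLAG_BIT53 and the helper assertion_holds() are hypothetical names introduced for illustration; the ticket itself shows only the raw expression (1ULL << 53).

```c
#include <stdint.h>

/* Hypothetical name: the ticket only shows the raw mask (1ULL << 53),
 * which is 0x0020000000000000 as a 64-bit value. */
#define EXAMPLE_FLAG_BIT53 (1ULL << 53)

/* Mirrors the logged condition  !((l_flags & (1ULL << 53)) != 0).
 * Returns 1 when the assertion would pass (bit 53 clear),
 * 0 when it would fail (bit 53 set). */
static int assertion_holds(uint64_t l_flags)
{
    return !((l_flags & EXAMPLE_FLAG_BIT53) != 0);
}
```

Under this sketch, any flags value with bit 53 set makes assertion_holds() return 0 regardless of the other bits, which corresponds to the failure recorded at ldlm_lock.c:2273.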

Comment by Gerrit Updater [ 31/May/16 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/20509
Subject: LU-7860 ldlm: revert part of commit 79e81d22
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8fc1eb987b20d3323b91fc38fe3b1ef17f3b071e

Comment by Gerrit Updater [ 20/Jun/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20509/
Subject: LU-7860 ldlm: revert part of commit 657bbc49
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 62a859fade43e23636170d054a4385d5b669774c

Comment by Joseph Gmitter (Inactive) [ 22/Jun/16 ]

Patch has landed to master for 2.9.0

Comment by Ned Bass [ 09/May/17 ]

LLNL hit this on a 2.8 server today. Please consider landing the fix to b2_8_fe. Thanks

Comment by Peter Jones [ 09/May/17 ]

Ned

As far as the community releases are concerned, this is fixed in 2.9. If you want a separate support ticket to track fixing on an FE release then we can link to this one.

Peter

PS/ There is already a 2.8 FE port of this fix; it just needs to be integrated

Comment by Ned Bass [ 09/May/17 ]

Thanks Peter. I don't think we need a separate ticket. Let's just make sure that patch gets in the next 2.8 FE tag.

Comment by Peter Jones [ 09/May/17 ]

Sure. It was already flagged for inclusion.
