[LU-7678] LBUG on client reexporting on NFS with ASSERTION( it->d.lustre.it_status != 0 ) in mdc_finish_intent_lock() - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.1.6
Labels:
- p4b
Environment:
RHEL 6 kernel 2.6.32-504.bl6.Bull.59 (with bull patches), lustre version 2.1.6 + a few patches.

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

A Lustre client that re-exports the filesystem as an NFS server frequently hits this LBUG:
LustreError: 17502:0:(mdc_locks.c:797:mdc_finish_intent_lock()) ASSERTION( it->d.lustre.it_status != 0 ) failed:

The backtrace of the process is then:
crash> bt
PID: 17502 TASK: ffff8808556f4080 CPU: 19 COMMAND: "nfsd"
#0 [ffff88085aa535c8] machine_kexec at ffffffff81031dcb
#1 [ffff88085aa53628] crash_kexec at ffffffff810b5652
#2 [ffff88085aa536f8] panic at ffffffff814d4d5d
#3 [ffff88085aa53778] lbug_with_loc at ffffffffa0557deb [libcfs]
#4 [ffff88085aa53798] mdc_finish_intent_lock at ffffffffa0a1487a [mdc]
#5 [ffff88085aa53858] mdc_intent_lock at ffffffffa0a17c48 [mdc]
#6 [ffff88085aa53938] lmv_intent_open at ffffffffa0c8a920 [lmv]
#7 [ffff88085aa53a38] lmv_intent_lock at ffffffffa0c8b980 [lmv]
#8 [ffff88085aa53ac8] ll_intent_file_open at ffffffffa0b6e618 [lustre]
#9 [ffff88085aa53b58] ll_file_open at ffffffffa0b6faad [lustre]
#10 [ffff88085aa53c28] __dentry_open at ffffffff811784ca
#11 [ffff88085aa53c88] dentry_open at ffffffff81178762
#12 [ffff88085aa53cb8] nfsd_open at ffffffffa050f7ee [nfsd]
#13 [ffff88085aa53d08] nfsd_write at ffffffffa050fc93 [nfsd]
#14 [ffff88085aa53d68] nfsd3_proc_write at ffffffffa0518dbf [nfsd]
#15 [ffff88085aa53dd8] nfsd_dispatch at ffffffffa0509425 [nfsd]
#16 [ffff88085aa53e18] svc_process_common at ffffffffa03eff24 [sunrpc]
#17 [ffff88085aa53e98] svc_process at ffffffffa03f0560 [sunrpc]
#18 [ffff88085aa53eb8] nfsd at ffffffffa0509b52 [nfsd]
#19 [ffff88085aa53ee8] kthread at ffffffff8108912e
#20 [ffff88085aa53f48] kernel_thread at ffffffff810041ea

Looking at the associated lookup_intent structure (it), we get this:
crash> struct lookup_intent ffff88085aa53bc0
struct lookup_intent {
  it_op = 1,
  it_flags = 578846722,
  it_create_mode = 33587200,
  d = {
    lustre = {
      it_disposition = 6, # DISP_LOOKUP_EXECD | DISP_LOOKUP_NEG
      it_status = 0,
      it_lock_handle = 0,
      it_data = 0xffff88085eb6d400,
      it_lock_mode = 0
    }
  }
}
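For readers decoding a similar dump, here is a minimal Python sketch of how the numeric fields can be turned into flag names and hex values. The bit values in the table are assumptions recalled from Lustre's lustre_idl.h and should be checked against the source tree in use. The ticket itself only confirms that 6 corresponds to DISP_LOOKUP_EXECD | DISP_LOOKUP_NEG. The helper name decode_disposition is hypothetical.

```python
# Assumed disposition bit values (recalled from lustre_idl.h, not stated in
# this ticket). Only 6 == DISP_LOOKUP_EXECD | DISP_LOOKUP_NEG is confirmed here.
DISP_FLAGS = {
    0x01: "DISP_IT_EXECD",
    0x02: "DISP_LOOKUP_EXECD",
    0x04: "DISP_LOOKUP_NEG",
    0x08: "DISP_LOOKUP_POS",
    0x10: "DISP_OPEN_CREATE",
    0x20: "DISP_OPEN_OPEN",
}


def decode_disposition(value):
    """Return the names of the known bits set in value, lowest bit first.

    Any bits not present in DISP_FLAGS are reported as one trailing hex string.
    """
    names = []
    remaining = value
    for bit in sorted(DISP_FLAGS):
        if value & bit:
            names.append(DISP_FLAGS[bit])
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return names


# Values copied from the crash dump above.
intent = {
    "it_op": 1,
    "it_flags": 578846722,
    "it_create_mode": 33587200,
    "it_disposition": 6,
    "it_status": 0,
}

print(decode_disposition(intent["it_disposition"]))
print(hex(intent["it_flags"]), hex(intent["it_create_mode"]))
```

If the assumed bit values are right, DISP_IT_EXECD is not set in this dump while it_status is 0. This is only a reading of the dump, not something the ticket states.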

Ticket LU-3564, submitted in 2013, seems to describe exactly the same issue in 2.1.5, but it never made it out of triage.
Is there already a fix for this issue that we could backport?

I have a crash dump available if needed.

Issue Links

duplicates

LU-3564 LBUG exporting lustre 2.1.5 via NFS on RHEL

Resolved

is related to

LU-2523 ll_update_inode()) ASSERTION( lu_fid_eq(&lli->lli_fid, &body->fid1) ) failed: Trying to change FID

Resolved

Activity

Peter Jones added a comment - 26/Jan/16 12:58 PM

Thanks Sebastien!

Sebastien Piechurski added a comment - 26/Jan/16 11:48 AM

Great!
Thanks for the analysis.
This ticket can be closed then.

Lai Siyao added a comment - 26/Jan/16 8:38 AM

This is a duplicate of LU-2523, and the fix http://review.whamcloud.com/#/c/5417 is in 2.5.

Sebastien Piechurski added a comment - 22/Jan/16 3:40 PM

Dumps and debuginfo files have been uploaded to the ftp site under /uploads/LU-7678.

Sebastien Piechurski added a comment - 19/Jan/16 9:04 AM

Hi Peter,

The site is IT4Innovation in the Czech Republic.
To my knowledge they have no plans to upgrade, but if this issue is shown to be fixed in 2.5, that will be an argument for moving them off their current distribution, which will be out of support in March.

Peter Jones added a comment - 18/Jan/16 6:23 PM

Lai

I know that you have worked on a few NFS issues recently. Do you recognize this at all?

Sebastien

Which site is affected, and do they have plans to move to a more current release soon?

Peter

Bruno Faccini (Inactive) added a comment - 18/Jan/16 2:15 PM

Hello Seb,
Can you upload the crash dump for me, along with the kernel-[common-]debuginfo and lustre-debuginfo RPMs?

People

Assignee: Lai Siyao

Reporter: Sebastien Piechurski

Votes: 0

Watchers: 6

Dates

Created: 18/Jan/16 12:44 PM

Updated: 26/Jan/16 12:58 PM

Resolved: 26/Jan/16 12:58 PM