Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.1.6
RHEL 6 kernel 2.6.32-504.bl6.Bull.59 (with Bull patches), Lustre version 2.1.6 plus a few patches.
3
9223372036854775807
Description
A Lustre client that re-exports the filesystem over NFS (i.e. it also acts as an NFS server) frequently hits this LBUG:
LustreError: 17502:0:(mdc_locks.c:797:mdc_finish_intent_lock()) ASSERTION( it->d.lustre.it_status != 0 ) failed:
The backtrace of the crashing process (an nfsd thread) is:
crash> bt
PID: 17502 TASK: ffff8808556f4080 CPU: 19 COMMAND: "nfsd"
#0 [ffff88085aa535c8] machine_kexec at ffffffff81031dcb
#1 [ffff88085aa53628] crash_kexec at ffffffff810b5652
#2 [ffff88085aa536f8] panic at ffffffff814d4d5d
#3 [ffff88085aa53778] lbug_with_loc at ffffffffa0557deb [libcfs]
#4 [ffff88085aa53798] mdc_finish_intent_lock at ffffffffa0a1487a [mdc]
#5 [ffff88085aa53858] mdc_intent_lock at ffffffffa0a17c48 [mdc]
#6 [ffff88085aa53938] lmv_intent_open at ffffffffa0c8a920 [lmv]
#7 [ffff88085aa53a38] lmv_intent_lock at ffffffffa0c8b980 [lmv]
#8 [ffff88085aa53ac8] ll_intent_file_open at ffffffffa0b6e618 [lustre]
#9 [ffff88085aa53b58] ll_file_open at ffffffffa0b6faad [lustre]
#10 [ffff88085aa53c28] __dentry_open at ffffffff811784ca
#11 [ffff88085aa53c88] dentry_open at ffffffff81178762
#12 [ffff88085aa53cb8] nfsd_open at ffffffffa050f7ee [nfsd]
#13 [ffff88085aa53d08] nfsd_write at ffffffffa050fc93 [nfsd]
#14 [ffff88085aa53d68] nfsd3_proc_write at ffffffffa0518dbf [nfsd]
#15 [ffff88085aa53dd8] nfsd_dispatch at ffffffffa0509425 [nfsd]
#16 [ffff88085aa53e18] svc_process_common at ffffffffa03eff24 [sunrpc]
#17 [ffff88085aa53e98] svc_process at ffffffffa03f0560 [sunrpc]
#18 [ffff88085aa53eb8] nfsd at ffffffffa0509b52 [nfsd]
#19 [ffff88085aa53ee8] kthread at ffffffff8108912e
#20 [ffff88085aa53f48] kernel_thread at ffffffff810041ea
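For anyone comparing this trace against other dumps, a minimal sketch of how the frames could be parsed and reduced to the chain of modules involved. The regular expression, the helper names (parse_frames, module_path) and the "kernel" label for frames without a module suffix are illustrative choices, not part of crash or Lustre; the sample is a subset of the frames above.

```python
import re

# Matches a crash "bt" frame line such as:
#   #4 [ffff88085aa53798] mdc_finish_intent_lock at ffffffffa0a1487a [mdc]
# The trailing "[module]" is optional: core kernel frames do not have one.
FRAME_RE = re.compile(
    r"#(?P<idx>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<func>\S+)\s+at\s+"
    r"(?P<addr>[0-9a-f]+)(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frames(bt_text):
    """Return (index, function, module) tuples for every frame line found."""
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            # "kernel" is a placeholder label for frames with no module suffix.
            frames.append(
                (int(m.group("idx")), m.group("func"), m.group("module") or "kernel")
            )
    return frames


def module_path(frames):
    """Collapse consecutive frames from the same module, outermost caller first."""
    path = []
    for _, _, module in reversed(frames):
        if not path or path[-1] != module:
            path.append(module)
    return path


SAMPLE = """\
#3 [ffff88085aa53778] lbug_with_loc at ffffffffa0557deb [libcfs]
#4 [ffff88085aa53798] mdc_finish_intent_lock at ffffffffa0a1487a [mdc]
#5 [ffff88085aa53858] mdc_intent_lock at ffffffffa0a17c48 [mdc]
#6 [ffff88085aa53938] lmv_intent_open at ffffffffa0c8a920 [lmv]
#9 [ffff88085aa53b58] ll_file_open at ffffffffa0b6faad [lustre]
#10 [ffff88085aa53c28] __dentry_open at ffffffff811784ca
#12 [ffff88085aa53cb8] nfsd_open at ffffffffa050f7ee [nfsd]
"""

frames = parse_frames(SAMPLE)
print(" -> ".join(module_path(frames)))
# nfsd -> kernel -> lustre -> lmv -> mdc -> libcfs
```

The resulting chain makes the entry point explicit: the open that trips the assertion is issued by nfsd (from nfsd_write via nfsd_open), not by a local application.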
Looking at the associated lookup_intent structure (it), we get this:
crash> struct lookup_intent ffff88085aa53bc0
struct lookup_intent {
it_op = 1,
it_flags = 578846722,
it_create_mode = 33587200,
d = {
lustre =
}
}
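crash prints these fields in decimal, which hides the bit layout. As a rough aid, the sketch below converts the values above to hexadecimal and splits it_create_mode using the standard POSIX mode masks from Python's stat module. Only the generic file-type and permission bits are interpreted here; the meaning of the remaining bits is Lustre-internal and would need to be checked against the headers of the exact Lustre version in use.

```python
import stat

# Values copied from the crash output above (printed there in decimal).
it_op = 1
it_flags = 578846722
it_create_mode = 33587200

print(hex(it_flags))         # 0x22808002
print(hex(it_create_mode))   # 0x2008000

# Standard POSIX part of the mode: file type and permission bits.
file_type = stat.S_IFMT(it_create_mode)
perm_bits = stat.S_IMODE(it_create_mode)
print(oct(file_type), stat.S_ISREG(it_create_mode))  # 0o100000 True
print(oct(perm_bits))                                # 0o0

# Whatever is left lies outside the POSIX mode bits. Treating it as a
# Lustre-internal flag is an assumption, not something the dump states.
extra_mode_bits = it_create_mode & ~(0o170000 | 0o7777)
print(hex(extra_mode_bits))  # 0x2000000

# Low two bits of it_flags (access-mode style bits).
print(it_flags & 0x3)        # 2
```

The file-type bits indicate a regular file, and the low bits of it_flags being 2 would be consistent with the write-mode open issued from nfsd_write in the backtrace, although that reading is an interpretation rather than something the dump confirms.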
Ticket LU-3564, submitted in 2013, appears to describe exactly the same issue on 2.1.5, but it never made it out of triage.
Is there already a fix for this issue that we could backport?
I have a crash dump available if needed.