[LU-4633] exception RIP: qsd_entry_iter_cb+29 Created: 14/Feb/14  Updated: 19/Feb/14  Resolved: 18/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Zhenyu Xu
Resolution: Duplicate Votes: 0
Labels: None
Environment:

RELEASE: 2.6.32-358.23.2.el6.20140115.x86_64.lustre241


Issue Links:
Related
is related to LU-4249 exception RIP: lqe64_hash_keycmp+12 Resolved
Severity: 3
Epic: server
Rank (Obsolete): 12680

 Description   

We have seen a number of crashes with a GPF (general protection fault).

PID: 20770  TASK: ffff881cf939caa0  CPU: 24  COMMAND: "lquota_wb_nbp7-"
 #0 [ffff881f97119770] machine_kexec at ffffffff81035e8b
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/arch/x86/kernel/machine_kexec_64.c: 336
 #1 [ffff881f971197d0] crash_kexec at ffffffff810c0492
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/kernel/kexec.c: 1121
 #2 [ffff881f971198a0] kdb_kdump_check at ffffffff812858d7
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/kdb/kdbmain.c: 1214
 #3 [ffff881f971198b0] kdb_main_loop at ffffffff81288ac7
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/kdb/kdbmain.c: 1322
 #4 [ffff881f971199c0] kdb_save_running at ffffffff81282c2e
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/kdb/kdbsupport.c: 798
 #5 [ffff881f971199d0] kdba_main_loop at ffffffff81463988
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/arch/x86/kdb/kdba_support.c: 980
 #6 [ffff881f97119a10] kdb at ffffffff81285dc6
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/kdb/kdbmain.c: 2165
 #7 [ffff881f97119a80] kdba_entry at ffffffff814632a7
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/arch/x86/kdb/kdba_support.c: 1264
 #8 [ffff881f97119a90] notifier_call_chain at ffffffff81545255
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/kernel/notifier.c: 95
 #9 [ffff881f97119ad0] atomic_notifier_call_chain at ffffffff815452ba
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/kernel/notifier.c: 192
#10 [ffff881f97119ae0] notify_die at ffffffff8109c28e
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/kernel/notifier.c: 573
#11 [ffff881f97119b10] __die at ffffffff81543122
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/arch/x86/kernel/dumpstack.c: 288
#12 [ffff881f97119b40] die at ffffffff8100f288
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/arch/x86/kernel/dumpstack.c: 325
#13 [ffff881f97119b70] do_general_protection at ffffffff81542d02
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/arch/x86/kernel/traps.c: 400
#14 [ffff881f97119ba0] general_protection at ffffffff81542495
    /usr/src/debug/kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/arch/x86_64/kernel/entry.S
    [exception RIP: qsd_entry_iter_cb+29]
    RIP: ffffffffa0cde9bd  RSP: ffff881f97119c50  RFLAGS: 00010206
    RAX: 5a5a5a5a5a5a5a5a  RBX: ffff880db16a9d80  RCX: ffff881f97119d1c
    RDX: ffff880db16a9d80  RSI: ffff881f97119c80  RDI: ffff881ffcc303c0
    RBP: ffff881f97119c60   R8: 00000000fffffffb   R9: ffff881cc8f56c00
    R10: 0000000000000000  R11: 00000000000000be  R12: ffff881f97119d1c
    R13: 0000000000000024  R14: 5a5a5a5a5a5a5a5a  R15: ffffffffa0cde9a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#15 [ffff881f97119c68] cfs_hash_for_each_tight at ffffffffa04e31c5 [libcfs]
    /usr/src/debug/lustre-2.4.1/libcfs/libcfs/hash.c: 1473
#16 [ffff881f97119cc8] cfs_hash_for_each_safe at ffffffffa04e33e3 [libcfs]
    /usr/src/debug/lustre-2.4.1/libcfs/libcfs/hash.c: 1540
#17 [ffff881f97119cd8] qsd_start_reint_thread at ffffffffa0cdf497 [lquota]
    /usr/src/debug/lustre-2.4.1/lustre/quota/lquota_internal.h: 275
#18 [ffff881f97119d58] qsd_ready at ffffffffa0ce69f8 [lquota]
    /usr/src/debug/lustre-2.4.1/lustre/quota/qsd_handler.c: 264
#19 [ffff881f97119d88] qsd_adjust at ffffffffa0ce7654 [lquota]
    /usr/src/debug/lustre-2.4.1/lustre/quota/lquota_internal.h: 275
#20 [ffff881f97119e08] qsd_upd_thread at ffffffffa0ce3a1f [lquota]
    /usr/src/debug/lustre-2.4.1/lustre/quota/qsd_writeback.c: 413
#21 [ffff881f97119f48] kernel_thread at ffffffff8100c0ca
    /usr/src/debug////////kernel-lustre241-2.6.32-358.23.2.el6/linux-2.6.32-358.23.2.el6.20140115.x86_64/arch/x86/kernel/entry_64.S: 1213



 Comments   
Comment by Peter Jones [ 14/Feb/14 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 17/Feb/14 ]

Looks related to LU-4249

Comment by Zhenyu Xu [ 18/Feb/14 ]

dup of LU-4249

Comment by Mahmoud Hanafi [ 18/Feb/14 ]

Is this really a dup of LU-4249? The stack trace looks different.

Comment by Zhenyu Xu [ 19/Feb/14 ]

They are different occurrences of the same cause: a corrupted lqe hash entry (use after release), so we think it's a dup.
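Why a use-after-release surfaces as a GPF rather than a page fault can be sketched in a few lines. This assumes a 64-bit x86 build with 48-bit virtual addresses, and assumes the released entry's memory was overwritten with the byte 0x5a, which is the pattern visible in RAX and R14 in the register dump. `struct fake_entry`, `fake_poison()`, `stale_link_value()` and `is_canonical_48()` are hypothetical illustrations, not the real `lquota_entry` layout or any Lustre API. A pointer read back from such memory is 0x5a5a5a5a5a5a5a5a. That address is non-canonical, so dereferencing it would raise #GP, provided the faulting instruction at `qsd_entry_iter_cb+29` dereferenced a pointer loaded from the stale entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hash entry; NOT the real struct lquota_entry. */
struct fake_entry {
	struct fake_entry *next;   /* link the iterator would follow */
	uint64_t           id;
};

/* Assumed poison byte, matching the 5a5a... pattern seen in RAX and R14. */
#define FAKE_POISON_BYTE 0x5a

/* Simulate a free path that poisons the entry's memory before release. */
static void fake_poison(struct fake_entry *e)
{
	memset(e, FAKE_POISON_BYTE, sizeof(*e));
}

/*
 * Read the first 8 bytes (the 'next' link on a 64-bit build) of an entry
 * after it has been poisoned. The bits are copied out, never dereferenced.
 */
static uint64_t stale_link_value(void)
{
	struct fake_entry e = { .next = NULL, .id = 42 };
	uint64_t v;

	fake_poison(&e);
	memcpy(&v, &e, sizeof(v));
	return v;
}

/*
 * x86-64 with 48-bit virtual addresses: an address is canonical only if
 * bits 63..47 are all 0 or all 1. Accessing a non-canonical address
 * raises a general protection fault (#GP), not a page fault.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;   /* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```

For comparison, kernel addresses from the same trace such as RSP (`ffff881f97119c50`) pass `is_canonical_48()`, while the poisoned value fails it.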

Generated at Sat Feb 10 01:44:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.