[LU-8983] kernel BUG at fs/jbd2/transaction.c:1028! Created: 31/Dec/16  Updated: 25/Apr/17  Resolved: 25/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6722 sanity-lfsck test_1a: FAIL: (3) Fail ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   


We had an OSS crash with the following stack trace.


PID: 33457 TASK: ffff880c4bac2040 CPU: 1 COMMAND: "ll_ost00_032"
 #0 [ffff880945a7b3c0] machine_kexec at ffffffff8103d0cb
 #1 [ffff880945a7b420] crash_kexec at ffffffff810cc1c2
 #2 [ffff880945a7b4f0] kdb_kdump_check at ffffffff812a0c57
 #3 [ffff880945a7b500] kdb_main_loop at ffffffff812a3e17
 #4 [ffff880945a7b610] kdb_save_running at ffffffff8129dfac
 #5 [ffff880945a7b620] kdba_main_loop at ffffffff8148fe88
 #6 [ffff880945a7b660] kdb at ffffffff812a1146
 #7 [ffff880945a7b6d0] report_bug at ffffffff812b45a3
 #8 [ffff880945a7b700] die at ffffffff810110af
 #9 [ffff880945a7b730] do_trap at ffffffff81579ba4
#10 [ffff880945a7b790] do_invalid_op at ffffffff8100ce65
#11 [ffff880945a7b830] invalid_op at ffffffff8100c01b
 [exception RIP: jbd2_journal_dirty_metadata+269]
 RIP: ffffffffa0ced92d RSP: ffff880945a7b8e0 RFLAGS: 00010246
 RAX: ffff8801cc3b03b8 RBX: ffff88049d3262f0 RCX: ffff881b972b2eb0
 RDX: 0000000000000000 RSI: ffff881b972b2eb0 RDI: ffff88049d3262f0
 RBP: ffff880945a7b900 R8: 6010000000000000 R9: 0000000000000000
 R10: 0000000000000002 R11: d84156c5635688c0 R12: ffff880111a985b0
 R13: ffff881b972b2eb0 R14: ffff880c4bbb8108 R15: ffff881035518e80
 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#12 [ffff880945a7b908] __ldiskfs_handle_dirty_metadata at ffffffffa0d54fdb [ldiskfs]
#13 [ffff880945a7b948] ldiskfs_free_inode at ffffffffa0d5f277 [ldiskfs]
#14 [ffff880945a7b9d8] ldiskfs_delete_inode at ffffffffa0d67d80 [ldiskfs]
#15 [ffff880945a7ba18] generic_delete_inode at ffffffff811af07e
#16 [ffff880945a7ba48] generic_drop_inode at ffffffff811af1d5
#17 [ffff880945a7ba68] iput at ffffffff811ae022
#18 [ffff880945a7ba88] osd_object_delete at ffffffffa13a8394 [osd_ldiskfs]
#19 [ffff880945a7bad8] lu_object_free at ffffffffa05fcc51 [obdclass]
#20 [ffff880945a7bb58] lu_object_put at ffffffffa05fd38d [obdclass]
#21 [ffff880945a7bbc8] ofd_object_put at ffffffffa15b2052 [ofd]
#22 [ffff880945a7bbd8] ofd_destroy_by_fid at ffffffffa15ae291 [ofd]
#23 [ffff880945a7bcd8] ofd_destroy_hdl at ffffffffa15a7aba [ofd]
#24 [ffff880945a7bd48] tgt_request_handle at ffffffffa08848ae [ptlrpc]
#25 [ffff880945a7bda8] ptlrpc_main at ffffffffa0831b61 [ptlrpc]
#26 [ffff880945a7bee8] kthread at ffffffff810a07ee
#27 [ffff880945a7bf48] kernel_thread at ffffffff8100c28a

This looks like LU-4382, but we already have that patch applied. Quota is enabled on the OST, but enforcement is not.



 Comments   
Comment by Mahmoud Hanafi [ 31/Dec/16 ]

This should be filed as an LU Bug, not a Question.

Comment by Peter Jones [ 31/Dec/16 ]

Niu

Could you please advise?

Thanks

Peter

Comment by Mahmoud Hanafi [ 31/Dec/16 ]

Some additional info:
Lustre Version: 2.7.2-2nas
Kernel: 2.6.32-573.26.1.el6
OST size: 44TB
External Journal: 1GB
Nothing in the logs before the crash.

Comment by Niu Yawei (Inactive) [ 03/Jan/17 ]

I believe this is a duplicate of LU-4382. It reoccurred because of a regression introduced by the LU-6722 patch https://review.whamcloud.com/#/c/15334/, which mistakenly reverted part of the original LU-4382 fix. The regression was fixed in 2.8 by https://review.whamcloud.com/#/c/19732/ (this patch is tracked in LU-6722 as well).

Comment by Jay Lan (Inactive) [ 03/Jan/17 ]

Hi Niu, we need a backport of #19732 to b2_7_fe. There are conflicts for rhel7.2 and sles12.
Thanks!

Comment by Niu Yawei (Inactive) [ 04/Jan/17 ]

Port to b2_7_fe: https://review.whamcloud.com/#/c/22006/

Comment by Jay Lan (Inactive) [ 09/Jan/17 ]

This patch is in b2_9, but not in b2_8_fe.
We may skip b2_8_fe, but for consistency I think #19732 should also be backported to b2_8_fe.

Generated at Sat Feb 10 02:22:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.