[LU-8211] LustreError: 9676:0:(nrs_tbf.c:89:nrs_tbf_rule_fini()) ASSERTION( list_empty(&rule->tr_cli_list) ) failed: Created: 27/May/16 Updated: 22/Sep/16 Resolved: 22/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Mahmoud Hanafi | Assignee: | Emoly Liu |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2.7.1-fe |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Enabling TBF and then disabling it without creating any rules triggers an LBUG:

LustreError: 9676:0:(nrs_tbf.c:89:nrs_tbf_rule_fini()) ASSERTION( list_empty(&rule->tr_cli_list) ) failed:
LustreError: 9676:0:(nrs_tbf.c:89:nrs_tbf_rule_fini()) LBUG
Pid: 9676, comm: lctl

Call Trace:
[<ffffffffa040e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa040ee97>] lbug_with_loc+0x47/0xb0 [libcfs] |
| Comments |
| Comment by Peter Jones [ 27/May/16 ] |
|
Emoly Could you please advise? Thanks Peter |
| Comment by Qian Yingjin (Inactive) [ 29/May/16 ] |
|
I have tried the following commands: Thanks, |
| Comment by Li Xi (Inactive) [ 29/May/16 ] |
|
Hi Mahmoud, is the following patch merged in your branch?
|
| Comment by Emoly Liu [ 30/May/16 ] |
|
Mahmoud, as Qian and Li Xi said, could you please provide more information on how to reproduce this LBUG, and check your branch? Thanks. |
| Comment by Jay Lan (Inactive) [ 31/May/16 ] |
|
Hi Li Xi, since b2_7_fe is private, our nas-2.7.1 repo is a private repo as well. Either give me your GitHub ID and I will add you to the member list, or you can use Peter Jones' ID to access the tree. |
| Comment by Mahmoud Hanafi [ 31/May/16 ] |
|
Is there a way for us to get a complete list of landed NRS/TBF patches that are not in 2.7.1-fe? I don't want to keep opening LUs for patches that have already landed. For example, I just hit an issue where the server locks up. The backtrace showed this:

PID: 32437 TASK: ffff881032c35520 CPU: 3 COMMAND: "ldlm_cn02_023"
 #0 [ffff88085c426e90] crash_nmi_callback at ffffffff81032256
 #1 [ffff88085c426ea0] notifier_call_chain at ffffffff81568515
 #2 [ffff88085c426ee0] atomic_notifier_call_chain at ffffffff8156857a
 #3 [ffff88085c426ef0] notify_die at ffffffff810a44fe
 #4 [ffff88085c426f20] do_nmi at ffffffff8156618f
 #5 [ffff88085c426f50] nmi at ffffffff815659f0
    [exception RIP: _spin_lock+30]
    RIP: ffffffff8156525e RSP: ffff88085c423e78 RFLAGS: 00000097
    RAX: 0000000000004603 RBX: ffff880f5ea080e0 RCX: 0000000000000000
    RDX: 0000000000004602 RSI: ffff88085c430658 RDI: ffff880f5ea080e0
    RBP: ffff88085c423e78 R8: 0000000000000000 R9: 0000000000000001
    R10: 000000000000002c R11: 00000000000000b4 R12: ffff880f5ea08000
    R13: ffff88085c430600 R14: ffff88085c423f28 R15: ffffffffa07e4b50
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #6 [ffff88085c423e78] _spin_lock at ffffffff8156525e
 #7 [ffff88085c423e80] nrs_tbf_timer_cb at ffffffffa07e4b7a [ptlrpc]
 #8 [ffff88085c423ea0] __run_hrtimer at ffffffff810a29ae
 #9 [ffff88085c423ef0] hrtimer_interrupt at ffffffff810a2d46
#10 [ffff88085c423f70] local_apic_timer_interrupt at ffffffff8103433d
#11 [ffff88085c423f90] smp_apic_timer_interrupt at ffffffff8156ae25
#12 [ffff88085c423fb0] apic_timer_interrupt at ffffffff8100bc13
--- <IRQ stack> ---
#13 [ffff880d87207c68] apic_timer_interrupt at ffffffff8100bc13
    [exception RIP: nrs_resource_get_safe+88]
    RIP: ffffffffa07d7358 RSP: ffff880d87207d10 RFLAGS: 00000206
    RAX: 0000000000004602 RBX: ffff880d87207d40 RCX: 0000000000000000
    RDX: 0000000000004602 RSI: ffff880efb387778 RDI: ffff880fd2033140
    RBP: ffffffff8100bc0e R8: ffff880f5ea080e0 R9: 0000000000000000
    R10: 000000000000002c R11: 00000000000000b4 R12: ffff880d87207c90
    R13: ffff880e4c2fc3d8 R14: ffffffffa086b640 R15: ffff881000000007
    ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#14 [ffff880d87207d48] ptlrpc_nrs_req_initialize at ffffffffa07d9dab [ptlrpc]
#15 [ffff880d87207d68] ptlrpc_server_handle_req_in at ffffffffa079c647 [ptlrpc]
#16 [ffff880d87207da8] ptlrpc_main at ffffffffa07a484c [ptlrpc]
#17 [ffff880d87207ee8] kthread at ffffffff8109dc8e
#18 [ffff880d87207f48] kernel_thread at ffffffff8100c28a

PID: 32405 TASK: ffff880f2c04eab0 CPU: 5 COMMAND: "ldlm_cn02_004"
 #0 [ffff88085c446e90] crash_nmi_callback at ffffffff81032256
 #1 [ffff88085c446ea0] notifier_call_chain at ffffffff81568515
 #2 [ffff88085c446ee0] atomic_notifier_call_chain at ffffffff8156857a
 #3 [ffff88085c446ef0] notify_die at ffffffff810a44fe
 #4 [ffff88085c446f20] do_nmi at ffffffff8156618f
 #5 [ffff88085c446f50] nmi at ffffffff815659f0
    [exception RIP: _spin_lock+33]
    RIP: ffffffff81565261 RSP: ffff880af029dd00 RFLAGS: 00000293
    RAX: 0000000000004605 RBX: ffff880efb387478 RCX: 0000000000000000
    RDX: 0000000000004602 RSI: ffff880efb387478 RDI: ffff880f5ea080e0
    RBP: ffff880af029dd00 R8: ffff880f5ea080e0 R9: 0000000000000000
    R10: 0000000000000031 R11: 00000000000000b4 R12: ffff880efb387478
    R13: ffff880eb3ae53c0 R14: ffff880f5ea080e0 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #6 [ffff880af029dd00] _spin_lock at ffffffff81565261
 #7 [ffff880af029dd08] nrs_resource_get_safe at ffffffffa07d7341 [ptlrpc]
 #8 [ffff880af029dd48] ptlrpc_nrs_req_initialize at ffffffffa07d9dab [ptlrpc]
 #9 [ffff880af029dd68] ptlrpc_server_handle_req_in at ffffffffa079c647 [ptlrpc]
#10 [ffff880af029dda8] ptlrpc_main at ffffffffa07a484c [ptlrpc]
#11 [ffff880af029dee8] kthread at ffffffff8109dc8e |
| Comment by Li Xi (Inactive) [ 04/Jun/16 ] |
|
Hi Mahmoud, the dump stack looks like the following issue, for which a patch has already been merged into the master branch. Would you please check whether your branch has this patch? Also, please check whether the following patches are merged:
The following patches are still under review but are worth merging as improvements: Also, if client-side QoS is useful for your use case, please check: |
| Comment by Li Xi (Inactive) [ 04/Jun/16 ] |
|
Hi Jay, sorry for missing your message. My GitHub ID is ddn-lixi. Would you please add me to the group? Thanks! |
| Comment by Jay Lan (Inactive) [ 06/Jun/16 ] |
|
Our production systems are running 2.7.1-4.1nasS. That build does not have most of the NRS TBF patches. The next image on deck to be installed is 2.7.1-5nasS. It has all landed patches except Our 2.7.2-1nas images for testing also include |
| Comment by Jay Lan (Inactive) [ 06/Jun/16 ] |
|
Hi Li Xi, I am sure you have access privileges to b2_7_fe, but I just want to double-check before I add you to our lustre-nas-fe repo access list. |
| Comment by Mahmoud Hanafi [ 22/Sep/16 ] |
|
This can be closed. |