[LU-6849] sanity-quota test 22 LBUG with “ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST” Created: 14/Jul/15  Updated: 02/Nov/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Li Xi
Resolution: Unresolved Votes: 0
Labels: None
Environment:

review-zfs-part-1 test group in autotest


Attachments: Text File lu6849.txt    
Issue Links:
Duplicate
duplicates LU-16253 sanityn: ASSERTION( orro->oo_ref == 0... Resolved
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-quota test 22 failed with 'server shadow-14vm11 environments are insane!'. Logs for this failure are:
2015-07-13 16:36:38 - https://testing.hpdd.intel.com/test_sets/ba5b90c2-29a9-11e5-b0c2-5254006e85c2

From the OST console:

17:35:27:Lustre: DEBUG MARKER: umount -d -f /mnt/ost2
17:35:27:Lustre: 4055:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436808903/real 1436808903]  req@ffff88007da416c0 x1506591584180392/t0(0) o250->MGC10.1.4.170@tcp@10.1.4.170@tcp:26/25 lens 520/544 e 0 to 1 dl 1436808914 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
17:35:27:Lustre: server umount lustre-OST0001 complete
17:35:27:LustreError: 4041:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 1, with 1 refs
17:35:27:LustreError: 4041:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG
17:35:27:Pid: 4041, comm: obd_zombid
17:35:27:
17:35:27:Call Trace:
17:35:27: [<ffffffffa0709875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
17:35:27: [<ffffffffa0709e77>] lbug_with_loc+0x47/0xb0 [libcfs]
17:35:27: [<ffffffffa0ada51c>] nrs_trr_hop_exit+0x1cc/0x1d0 [ptlrpc]
17:35:27: [<ffffffffa071bf46>] cfs_hash_putref+0x1a6/0x480 [libcfs]
17:35:27: [<ffffffffa0721dbb>] ? cfs_binheap_destroy+0x3db/0x810 [libcfs]
17:35:27: [<ffffffffa0adb61f>] nrs_orr_stop+0x6f/0x440 [ptlrpc]
17:35:27: [<ffffffffa0ad11af>] nrs_policy_stop0+0x4f/0x1f0 [ptlrpc]
17:35:27: [<ffffffffa0ad14f8>] nrs_policy_stop_locked+0x1a8/0x230 [ptlrpc]
17:35:27: [<ffffffffa0ad2854>] nrs_policy_unregister+0xb4/0x670 [ptlrpc]
17:35:27: [<ffffffffa0ad2f32>] ptlrpc_service_nrs_cleanup+0x122/0x5b0 [ptlrpc]
17:35:27: [<ffffffffa0a9a617>] ptlrpc_unregister_service+0x417/0xe10 [ptlrpc]
17:35:27: [<ffffffff8105bd73>] ? __wake_up+0x53/0x70
17:35:27: [<ffffffffa0fdc12a>] ost_cleanup+0x6a/0x1b0 [ost]
17:35:27: [<ffffffffa084cf55>] class_decref+0x1e5/0x4d0 [obdclass]
17:35:27: [<ffffffffa0715c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
17:35:27: [<ffffffffa083753c>] obd_zombie_impexp_cull+0x61c/0xac0 [obdclass]
17:35:27: [<ffffffffa0837a45>] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
17:35:27: [<ffffffff81064be0>] ? default_wake_function+0x0/0x20
17:35:27: [<ffffffffa08379e0>] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
17:35:27: [<ffffffff8109e78e>] kthread+0x9e/0xc0
17:35:27: [<ffffffff8100c28a>] child_rip+0xa/0x20
17:35:27: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
17:35:27: [<ffffffff8100c280>] ? child_rip+0x0/0x20
17:35:27:
17:35:27:Kernel panic - not syncing: LBUG
17:35:27:Pid: 4041, comm: obd_zombid Tainted: P           ---------------    2.6.32-504.23.4.el6_lustre.g2382eb0.x86_64 #1
17:35:27:Call Trace:
17:35:27: [<ffffffff81529bbc>] ? panic+0xa7/0x16f
17:35:27: [<ffffffffa0709ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
17:35:27: [<ffffffffa0ada51c>] ? nrs_trr_hop_exit+0x1cc/0x1d0 [ptlrpc]
17:35:27: [<ffffffffa071bf46>] ? cfs_hash_putref+0x1a6/0x480 [libcfs]
17:35:27: [<ffffffffa0721dbb>] ? cfs_binheap_destroy+0x3db/0x810 [libcfs]
17:35:27: [<ffffffffa0adb61f>] ? nrs_orr_stop+0x6f/0x440 [ptlrpc]
17:35:27: [<ffffffffa0ad11af>] ? nrs_policy_stop0+0x4f/0x1f0 [ptlrpc]
17:35:27: [mory mce=off disable_cpu_apicid=0 memmap=exactmap memmap=627K@4K memmap=131449K@49779K elfcorehdr=181228K memmap=4K$0K memmap=9K$631K memmap=64K$960K memmap=12K$2097140K memmap=272K$4194032K
17:35:27:KERNEL supported cpus:


 Comments   
Comment by Oleg Drokin [ 05/Aug/15 ]

I just hit this too on current master in sanity test 77e:

<4>[53304.002633] Lustre: DEBUG MARKER: == sanityn test 77e: check TBF NID nrs policy == 06:44:27 (1438771467)
<0>[53304.068281] LustreError: 7480:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 1, with 1 refs
<0>[53304.069257] LustreError: 7480:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG
<4>[53304.069586] Pid: 7480, comm: lctl
<4>[53304.069913] 
<4>[53304.069914] Call Trace:
<4>[53304.070424]  [<ffffffffa0771885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[53304.070763]  [<ffffffffa0771e87>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[53304.071034]  [<ffffffffa0fa3956>] nrs_trr_hop_exit+0x126/0x130 [ptlrpc]
<4>[53304.071376]  [<ffffffffa0784076>] cfs_hash_putref+0x1a6/0x490 [libcfs]
<4>[53304.071714]  [<ffffffffa078a05b>] ? cfs_binheap_destroy+0x3db/0x810 [libcfs]
<4>[53304.072071]  [<ffffffffa0fa49bd>] nrs_orr_stop+0x6d/0x330 [ptlrpc]
<4>[53304.072360]  [<ffffffffa0f9b33e>] nrs_policy_stop0+0x4e/0x1f0 [ptlrpc]
<4>[53304.072598]  [<ffffffffa0f9b8a8>] nrs_policy_stop_primary+0x198/0x1d0 [ptlrpc]
<4>[53304.072988]  [<ffffffffa0f9bed8>] nrs_policy_start_locked+0x458/0x690 [ptlrpc]
<4>[53304.073510]  [<ffffffffa0f9c2fc>] nrs_policy_ctl+0x1ec/0x280 [ptlrpc]
<4>[53304.073755]  [<ffffffffa0f9dbd2>] ptlrpc_nrs_policy_control+0xe2/0x2a0 [ptlrpc]
<4>[53304.074265]  [<ffffffffa0f7f276>] ptlrpc_lprocfs_nrs_seq_write+0x3e6/0x600 [ptlrpc]
<4>[53304.074822]  [<ffffffffa0f7ee90>] ? ptlrpc_lprocfs_nrs_seq_write+0x0/0x600 [ptlrpc]
<4>[53304.075350]  [<ffffffff811fb8c5>] proc_reg_write+0x85/0xc0
<4>[53304.075629]  [<ffffffff8118f4a8>] vfs_write+0xb8/0x1a0
<4>[53304.075862]  [<ffffffff8118fde1>] sys_write+0x51/0x90
<4>[53304.076099]  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
<4>[53304.076380] 
<0>[53304.095801] Kernel panic - not syncing: LBUG

I have a crashdump and modules in /exports/crashdumps/192.168.10.224-2015-08-05-06\:44\:28/
tag in my source tree master-20150804

Comment by Oleg Drokin [ 08/Aug/15 ]

and I just hit it once again.

Comment by Oleg Drokin [ 30/Sep/15 ]

and just hit it once more

Comment by Oleg Drokin [ 07/Oct/15 ]

Hit once again

Comment by James Nunez (Inactive) [ 11/Jan/16 ]

More failures on master. This time in sanityn test_77e. Logs at:
2016-01-09 06:42:38 - https://testing.hpdd.intel.com/test_sets/b3b1e744-b6cb-11e5-9fda-5254006e85c2
2016-02-01 10:16:59 - https://testing.hpdd.intel.com/test_sets/2abcbe9e-c8fc-11e5-aaa9-5254006e85c2

Comment by Emoly Liu [ 05/Jul/16 ]

Hit this issue when running sanityn.sh 77e. lixi, could you please have a look? Thanks.

Comment by Li Xi (Inactive) [ 05/Jul/16 ]

Should we change oo_ref of struct nrs_orr_object to atomic_t? I feel that there are some places which could have race condition since the reference is not always protected by the same lock.

Comment by nasf (Inactive) [ 11/Dec/16 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/3c3d6282-be50-11e6-8111-5254006e85c2

Comment by Oleg Drokin [ 11/Jan/17 ]

I just hit this one again in master, note how it's closely located with the sleeping while atomic warning (I think there's a ticket about this, but cannot find it right away), possibly related?

[65131.137381] Lustre: DEBUG MARKER: == sanityn test 77d: check TRR nrs policy ============================================================ 17:59:22 (1484089162)
[65134.786924] BUG: sleeping function called from invalid context at /home/green/git/lustre-release/libcfs/libcfs/hash.c:1123
[65134.793042] in_atomic(): 1, irqs_disabled(): 0, pid: 25850, name: lctl
[65134.793574] INFO: lockdep is turned off.
[65134.794029] CPU: 7 PID: 25850 Comm: lctl Tainted: G        W  OE  ------------   3.10.0-debug #1
[65134.794924] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[65134.795507]  ffff880093aa8940 0000000076eac45a ffff880073a9bd18 ffffffff816fd400
[65134.796569]  ffff880073a9bd30 ffffffff810b0109 0000000000000004 ffff880073a9bd80
[65134.797496]  ffffffffa02989d3 ffffffffa029d486 ffff8800749b5f80 0000000000000004
[65134.798352] Call Trace:
[65134.799008]  [<ffffffff816fd400>] dump_stack+0x19/0x1b
[65134.799696]  [<ffffffff810b0109>] __might_sleep+0xe9/0x110
[65134.800431]  [<ffffffffa02989d3>] cfs_hash_putref+0x233/0x500 [libcfs]
[65134.801010]  [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[65134.801498]  [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[65134.802537]  [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[65134.803040]  [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[65134.803823]  [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[65134.804591]  [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[65134.805015]  [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[65134.805809]  [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[65134.806600]  [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80
[65134.806992]  [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0
[65134.807398]  [<ffffffff811fe3fd>] ? putname+0x3d/0x60
[65134.807785]  [<ffffffff811ede54>] SyS_write+0x84/0xf0
[65134.808180]  [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
[65134.808583] BUG: scheduling while atomic: lctl/25850/0x10000002
[65134.808958] INFO: lockdep is turned off.
[65134.809361] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea ata_generic sysfillrect sysimgblt pata_acpi ttm drm_kms_helper ata_piix drm i2c_piix4 libata pcspkr serio_raw virtio_balloon i2c_core virtio_console floppy virtio_blk nfsd ip_tables [last unloaded: libcfs]
[65134.814034] CPU: 7 PID: 25850 Comm: lctl Tainted: G        W  OE  ------------   3.10.0-debug #1
[65134.814810] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[65134.815505]  ffff880093aa8940 0000000076eac45a ffff880073a9bc90 ffffffff816fd400
[65134.816785]  ffff880073a9bca8 ffffffff816f9258 ffff880073a9bfd8 ffff880073a9bd08
[65134.817813]  ffffffff817041c7 ffff880093aa8940 ffff880073a9bfd8 ffff880073a9bfd8
[65134.818721] Call Trace:
[65134.819087]  [<ffffffff816fd400>] dump_stack+0x19/0x1b
[65134.819528]  [<ffffffff816f9258>] __schedule_bug+0x59/0x68
[65134.819984]  [<ffffffff817041c7>] __schedule+0x807/0x950
[65134.820387]  [<ffffffff810b4956>] __cond_resched+0x26/0x30
[65134.821032]  [<ffffffff817045fa>] _cond_resched+0x3a/0x50
[65134.821624]  [<ffffffffa02989d8>] cfs_hash_putref+0x238/0x500 [libcfs]
[65134.822213]  [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[65134.822837]  [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[65134.823464]  [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[65134.824058]  [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[65134.825188]  [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[65134.826299]  [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[65134.826910]  [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[65134.827989]  [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[65134.829059]  [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80
[65134.829648]  [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0
[65134.830389]  [<ffffffff811fe3fd>] ? putname+0x3d/0x60
[65134.830966]  [<ffffffff811ede54>] SyS_write+0x84/0xf0
[65134.831935]  [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
[65134.832463] LustreError: 25850:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 0, with 1 refs
[65134.834538] LustreError: 25850:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG
[65134.835210] Pid: 25850, comm: lctl
[65134.835825] 
Call Trace:
[65134.836973]  [<ffffffffa02857d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[65134.837671]  [<ffffffffa0285841>] lbug_with_loc+0x41/0xb0 [libcfs]
[65134.838296]  [<ffffffffa09516ab>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc]
[65134.838884]  [<ffffffffa0298982>] cfs_hash_putref+0x1e2/0x500 [libcfs]
[65134.839519]  [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[65134.840295]  [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[65134.840990]  [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[65134.841729]  [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[65134.842903]  [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[65134.843961]  [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[65134.844570]  [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[65134.845658]  [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[65134.846795]  [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80
[65134.847391]  [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0
[65134.847958]  [<ffffffff811fe3fd>] ? putname+0x3d/0x60
[65134.848539]  [<ffffffff811ede54>] SyS_write+0x84/0xf0
[65134.849076]  [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
[65134.849640] 
[65134.850122] Kernel panic - not syncing: LBUG
Comment by Oleg Drokin [ 29/Jul/17 ]

just hit this once again and again closely related to sleepign while atomic:

Hm this comment was really long and jira hated it.
let me see if I can add the full info as an attachment.

[22075.470864] Lustre: DEBUG MARKER: == sanityn test 77b: check CRR-N NRS policy ========================================================== 09:02:25 (1501333345)
[22085.359953] BUG: sleeping function called from invalid context at /home/green/git/lustre-release/libcfs/libcfs/hash.c:1120
[22085.362133] in_atomic(): 1, irqs_disabled(): 0, pid: 8995, name: lctl
[22085.362674] 3 locks held by lctl/8995:
[22085.363514]  #0:  (sb_writers#3){......}, at: [<ffffffff811ed45b>] vfs_write+0x1ab/0x1e0
[22085.364569]  #1:  (&nrs_core.nrs_mutex){......}, at: [<ffffffffa05e46eb>] ptlrpc_lprocfs_nrs_seq_write+0x38b/0x560 [ptlrpc]
[22085.365679]  #2:  (&(&nrs->nrs_lock)->rlock){......}, at: [<ffffffffa05ffb0a>] nrs_policy_ctl+0x3a/0x2b0 [ptlrpc]
...
[22120.733831] BUG: scheduling while atomic: lctl/12741/0x10000002
[22120.734746] INFO: lockdep is turned off.
[22120.735614] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix serio_raw i2c_core virtio_balloon libata virtio_console virtio_blk pcspkr floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[22120.763431] CPU: 9 PID: 12741 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
[22120.769006] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[22120.770687]  ffff880073434b80 00000000ca53a485 ffff88008057bc90 ffffffff816fd3e4
[22120.774839]  ffff88008057bca8 ffffffff816f9218 ffff88008057bfd8 ffff88008057bd08
[22120.777191]  ffffffff817041a7 ffff880073434b80 ffff88008057bfd8 ffff88008057bfd8
[22120.780385] Call Trace:
[22120.781840]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[22120.783524]  [<ffffffff816f9218>] __schedule_bug+0x59/0x68
[22120.785190]  [<ffffffff817041a7>] __schedule+0x807/0x950
[22120.786814]  [<ffffffff810b4936>] __cond_resched+0x26/0x30
[22120.788015]  [<ffffffff817045da>] _cond_resched+0x3a/0x50
[22120.790636]  [<ffffffffa02000a8>] cfs_hash_putref+0x238/0x500 [libcfs]
[22120.794345]  [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[22120.795397]  [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[22120.797393]  [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[22120.799836]  [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[22120.801265]  [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[22120.803191]  [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[22120.803997]  [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[22120.828023]  [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[22120.837729]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
[22120.848944]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
[22120.849644]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
[22120.850441]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
[22120.851186]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[22120.851897] LustreError: 12741:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 1, with 1 refs
[22120.856255] LustreError: 12741:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG
[22120.856969] Pid: 12741, comm: lctl
[22120.857575] 
Call Trace:
[22120.858750]  [<ffffffffa01ec7ce>] libcfs_call_trace+0x4e/0x60 [libcfs]
[22120.859460]  [<ffffffffa01ec85c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[22120.860231]  [<ffffffffa06055cb>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc]
[22120.861629]  [<ffffffffa0200052>] cfs_hash_putref+0x1e2/0x500 [libcfs]
[22120.862123]  [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[22120.862821]  [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[22120.863589]  [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[22120.864315]  [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[22120.865614]  [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[22120.866922]  [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[22120.872716]  [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[22120.874257]  [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[22120.875521]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
[22120.876212]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
[22120.900486]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
[22120.901959]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
[22120.920647]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[22120.921334] 
[22120.921903] Kernel panic - not syncing: LBUG
[22120.922521] CPU: 9 PID: 12741 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
[22120.923760] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[22120.924463]  ffffffffa020bed2 00000000ca53a485 ffff88008057bc70 ffffffff816fd3e4
[22120.925713]  ffff88008057bcf0 ffffffff816f8c34 ffffffff00000008 ffff88008057bd00
[22120.927552]  ffff88008057bca0 00000000ca53a485 00000000ca53a485 ffff88033e52d948
[22120.929142] Call Trace:
[22120.929738]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[22120.930405]  [<ffffffff816f8c34>] panic+0xd8/0x1e7
[22120.931066]  [<ffffffffa01ec874>] lbug_with_loc+0x64/0xb0 [libcfs]
[22120.931861]  [<ffffffffa06055cb>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc]
[22120.932566]  [<ffffffffa0200052>] cfs_hash_putref+0x1e2/0x500 [libcfs]
[22120.933266]  [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[22120.934002]  [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[22120.934708]  [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[22120.935433]  [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[22120.936713]  [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[22120.937994]  [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[22120.938714]  [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[22120.939994]  [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[22120.941251]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
[22120.941916]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
[22120.942587]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
[22120.943243]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
[22120.943899]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b

Please take a look at it.

Comment by Arshad Hussain [ 20/Oct/19 ]

Another instance. http://testing.linuxhacker.ru:3333/lustre-reports/3782/results.html

Comment by Andreas Dilger [ 06/Nov/19 ]

+1 on master in sanityn.sh test_77d https://testing.whamcloud.com/test_sets/580c04d6-f963-11e9-a0ba-52540065bddc

Generated at Sat Feb 10 02:03:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.