Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6849

sanity-quota test 22 LBUG with “ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST”

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.8.0, Lustre 2.9.0
    • None
    • review-zfs-part-1 test group in autotest
    • 3
    • 9223372036854775807

    Description

      sanity-quota test 22 failed with 'server shadow-14vm11 environments are insane!'. Logs for this failure are:
      2015-07-13 16:36:38 - https://testing.hpdd.intel.com/test_sets/ba5b90c2-29a9-11e5-b0c2-5254006e85c2

      From the OST console:

      17:35:27:Lustre: DEBUG MARKER: umount -d -f /mnt/ost2
      17:35:27:Lustre: 4055:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436808903/real 1436808903]  req@ffff88007da416c0 x1506591584180392/t0(0) o250->MGC10.1.4.170@tcp@10.1.4.170@tcp:26/25 lens 520/544 e 0 to 1 dl 1436808914 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      17:35:27:Lustre: server umount lustre-OST0001 complete
      17:35:27:LustreError: 4041:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 1, with 1 refs
      17:35:27:LustreError: 4041:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG
      17:35:27:Pid: 4041, comm: obd_zombid
      17:35:27:
      17:35:27:Call Trace:
      17:35:27: [<ffffffffa0709875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      17:35:27: [<ffffffffa0709e77>] lbug_with_loc+0x47/0xb0 [libcfs]
      17:35:27: [<ffffffffa0ada51c>] nrs_trr_hop_exit+0x1cc/0x1d0 [ptlrpc]
      17:35:27: [<ffffffffa071bf46>] cfs_hash_putref+0x1a6/0x480 [libcfs]
      17:35:27: [<ffffffffa0721dbb>] ? cfs_binheap_destroy+0x3db/0x810 [libcfs]
      17:35:27: [<ffffffffa0adb61f>] nrs_orr_stop+0x6f/0x440 [ptlrpc]
      17:35:27: [<ffffffffa0ad11af>] nrs_policy_stop0+0x4f/0x1f0 [ptlrpc]
      17:35:27: [<ffffffffa0ad14f8>] nrs_policy_stop_locked+0x1a8/0x230 [ptlrpc]
      17:35:27: [<ffffffffa0ad2854>] nrs_policy_unregister+0xb4/0x670 [ptlrpc]
      17:35:27: [<ffffffffa0ad2f32>] ptlrpc_service_nrs_cleanup+0x122/0x5b0 [ptlrpc]
      17:35:27: [<ffffffffa0a9a617>] ptlrpc_unregister_service+0x417/0xe10 [ptlrpc]
      17:35:27: [<ffffffff8105bd73>] ? __wake_up+0x53/0x70
      17:35:27: [<ffffffffa0fdc12a>] ost_cleanup+0x6a/0x1b0 [ost]
      17:35:27: [<ffffffffa084cf55>] class_decref+0x1e5/0x4d0 [obdclass]
      17:35:27: [<ffffffffa0715c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      17:35:27: [<ffffffffa083753c>] obd_zombie_impexp_cull+0x61c/0xac0 [obdclass]
      17:35:27: [<ffffffffa0837a45>] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
      17:35:27: [<ffffffff81064be0>] ? default_wake_function+0x0/0x20
      17:35:27: [<ffffffffa08379e0>] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
      17:35:27: [<ffffffff8109e78e>] kthread+0x9e/0xc0
      17:35:27: [<ffffffff8100c28a>] child_rip+0xa/0x20
      17:35:27: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
      17:35:27: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      17:35:27:
      17:35:27:Kernel panic - not syncing: LBUG
      17:35:27:Pid: 4041, comm: obd_zombid Tainted: P           ---------------    2.6.32-504.23.4.el6_lustre.g2382eb0.x86_64 #1
      17:35:27:Call Trace:
      17:35:27: [<ffffffff81529bbc>] ? panic+0xa7/0x16f
      17:35:27: [<ffffffffa0709ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      17:35:27: [<ffffffffa0ada51c>] ? nrs_trr_hop_exit+0x1cc/0x1d0 [ptlrpc]
      17:35:27: [<ffffffffa071bf46>] ? cfs_hash_putref+0x1a6/0x480 [libcfs]
      17:35:27: [<ffffffffa0721dbb>] ? cfs_binheap_destroy+0x3db/0x810 [libcfs]
      17:35:27: [<ffffffffa0adb61f>] ? nrs_orr_stop+0x6f/0x440 [ptlrpc]
      17:35:27: [<ffffffffa0ad11af>] ? nrs_policy_stop0+0x4f/0x1f0 [ptlrpc]
      17:35:27: [mory mce=off disable_cpu_apicid=0 memmap=exactmap memmap=627K@4K memmap=131449K@49779K elfcorehdr=181228K memmap=4K$0K memmap=9K$631K memmap=64K$960K memmap=12K$2097140K memmap=272K$4194032K
      17:35:27:KERNEL supported cpus:
      

      Attachments

        Issue Links

          Activity

            [LU-6849] sanity-quota test 22 LBUG with “ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST”
            adilger Andreas Dilger added a comment - +1 on master in sanityn.sh test_77d https://testing.whamcloud.com/test_sets/580c04d6-f963-11e9-a0ba-52540065bddc
            arshad512 Arshad Hussain added a comment - Another instance. http://testing.linuxhacker.ru:3333/lustre-reports/3782/results.html
            green Oleg Drokin added a comment -

            just hit this once again and again closely related to sleepign while atomic:

            Hm this comment was really long and jira hated it.
            let me see if I can add the full info as an attachment.

            [22075.470864] Lustre: DEBUG MARKER: == sanityn test 77b: check CRR-N NRS policy ========================================================== 09:02:25 (1501333345)
            [22085.359953] BUG: sleeping function called from invalid context at /home/green/git/lustre-release/libcfs/libcfs/hash.c:1120
            [22085.362133] in_atomic(): 1, irqs_disabled(): 0, pid: 8995, name: lctl
            [22085.362674] 3 locks held by lctl/8995:
            [22085.363514]  #0:  (sb_writers#3){......}, at: [<ffffffff811ed45b>] vfs_write+0x1ab/0x1e0
            [22085.364569]  #1:  (&nrs_core.nrs_mutex){......}, at: [<ffffffffa05e46eb>] ptlrpc_lprocfs_nrs_seq_write+0x38b/0x560 [ptlrpc]
            [22085.365679]  #2:  (&(&nrs->nrs_lock)->rlock){......}, at: [<ffffffffa05ffb0a>] nrs_policy_ctl+0x3a/0x2b0 [ptlrpc]
            ...
            [22120.733831] BUG: scheduling while atomic: lctl/12741/0x10000002
            [22120.734746] INFO: lockdep is turned off.
            [22120.735614] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix serio_raw i2c_core virtio_balloon libata virtio_console virtio_blk pcspkr floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
            [22120.763431] CPU: 9 PID: 12741 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
            [22120.769006] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [22120.770687]  ffff880073434b80 00000000ca53a485 ffff88008057bc90 ffffffff816fd3e4
            [22120.774839]  ffff88008057bca8 ffffffff816f9218 ffff88008057bfd8 ffff88008057bd08
            [22120.777191]  ffffffff817041a7 ffff880073434b80 ffff88008057bfd8 ffff88008057bfd8
            [22120.780385] Call Trace:
            [22120.781840]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
            [22120.783524]  [<ffffffff816f9218>] __schedule_bug+0x59/0x68
            [22120.785190]  [<ffffffff817041a7>] __schedule+0x807/0x950
            [22120.786814]  [<ffffffff810b4936>] __cond_resched+0x26/0x30
            [22120.788015]  [<ffffffff817045da>] _cond_resched+0x3a/0x50
            [22120.790636]  [<ffffffffa02000a8>] cfs_hash_putref+0x238/0x500 [libcfs]
            [22120.794345]  [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
            [22120.795397]  [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
            [22120.797393]  [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
            [22120.799836]  [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
            [22120.801265]  [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
            [22120.803191]  [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
            [22120.803997]  [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
            [22120.828023]  [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
            [22120.837729]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
            [22120.848944]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
            [22120.849644]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
            [22120.850441]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
            [22120.851186]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
            [22120.851897] LustreError: 12741:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 1, with 1 refs
            [22120.856255] LustreError: 12741:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG
            [22120.856969] Pid: 12741, comm: lctl
            [22120.857575] 
            Call Trace:
            [22120.858750]  [<ffffffffa01ec7ce>] libcfs_call_trace+0x4e/0x60 [libcfs]
            [22120.859460]  [<ffffffffa01ec85c>] lbug_with_loc+0x4c/0xb0 [libcfs]
            [22120.860231]  [<ffffffffa06055cb>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc]
            [22120.861629]  [<ffffffffa0200052>] cfs_hash_putref+0x1e2/0x500 [libcfs]
            [22120.862123]  [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
            [22120.862821]  [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
            [22120.863589]  [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
            [22120.864315]  [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
            [22120.865614]  [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
            [22120.866922]  [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
            [22120.872716]  [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
            [22120.874257]  [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
            [22120.875521]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
            [22120.876212]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
            [22120.900486]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
            [22120.901959]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
            [22120.920647]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
            [22120.921334] 
            [22120.921903] Kernel panic - not syncing: LBUG
            [22120.922521] CPU: 9 PID: 12741 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
            [22120.923760] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [22120.924463]  ffffffffa020bed2 00000000ca53a485 ffff88008057bc70 ffffffff816fd3e4
            [22120.925713]  ffff88008057bcf0 ffffffff816f8c34 ffffffff00000008 ffff88008057bd00
            [22120.927552]  ffff88008057bca0 00000000ca53a485 00000000ca53a485 ffff88033e52d948
            [22120.929142] Call Trace:
            [22120.929738]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
            [22120.930405]  [<ffffffff816f8c34>] panic+0xd8/0x1e7
            [22120.931066]  [<ffffffffa01ec874>] lbug_with_loc+0x64/0xb0 [libcfs]
            [22120.931861]  [<ffffffffa06055cb>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc]
            [22120.932566]  [<ffffffffa0200052>] cfs_hash_putref+0x1e2/0x500 [libcfs]
            [22120.933266]  [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
            [22120.934002]  [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
            [22120.934708]  [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
            [22120.935433]  [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
            [22120.936713]  [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
            [22120.937994]  [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
            [22120.938714]  [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
            [22120.939994]  [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
            [22120.941251]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
            [22120.941916]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
            [22120.942587]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
            [22120.943243]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
            [22120.943899]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
            

            Please take a look at it.

            green Oleg Drokin added a comment - just hit this once again and again closely related to sleepign while atomic: Hm this comment was really long and jira hated it. let me see if I can add the full info as an attachment. [22075.470864] Lustre: DEBUG MARKER: == sanityn test 77b: check CRR-N NRS policy ========================================================== 09:02:25 (1501333345) [22085.359953] BUG: sleeping function called from invalid context at /home/green/git/lustre-release/libcfs/libcfs/hash.c:1120 [22085.362133] in_atomic(): 1, irqs_disabled(): 0, pid: 8995, name: lctl [22085.362674] 3 locks held by lctl/8995: [22085.363514] #0: (sb_writers#3){......}, at: [<ffffffff811ed45b>] vfs_write+0x1ab/0x1e0 [22085.364569] #1: (&nrs_core.nrs_mutex){......}, at: [<ffffffffa05e46eb>] ptlrpc_lprocfs_nrs_seq_write+0x38b/0x560 [ptlrpc] [22085.365679] #2: (&(&nrs->nrs_lock)->rlock){......}, at: [<ffffffffa05ffb0a>] nrs_policy_ctl+0x3a/0x2b0 [ptlrpc] ... [22120.733831] BUG: scheduling while atomic: lctl/12741/0x10000002 [22120.734746] INFO: lockdep is turned off. [22120.735614] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix serio_raw i2c_core virtio_balloon libata virtio_console virtio_blk pcspkr floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs] [22120.763431] CPU: 9 PID: 12741 Comm: lctl Tainted: P W OE ------------ 3.10.0-debug #2 [22120.769006] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [22120.770687] ffff880073434b80 00000000ca53a485 ffff88008057bc90 ffffffff816fd3e4 [22120.774839] ffff88008057bca8 ffffffff816f9218 ffff88008057bfd8 ffff88008057bd08 [22120.777191] ffffffff817041a7 ffff880073434b80 ffff88008057bfd8 ffff88008057bfd8 [22120.780385] Call Trace: [22120.781840] [<ffffffff816fd3e4>] dump_stack+0x19/0x1b [22120.783524] [<ffffffff816f9218>] __schedule_bug+0x59/0x68 [22120.785190] [<ffffffff817041a7>] __schedule+0x807/0x950 [22120.786814] [<ffffffff810b4936>] __cond_resched+0x26/0x30 [22120.788015] [<ffffffff817045da>] _cond_resched+0x3a/0x50 [22120.790636] [<ffffffffa02000a8>] cfs_hash_putref+0x238/0x500 [libcfs] [22120.794345] [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs] [22120.795397] [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc] [22120.797393] [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc] [22120.799836] [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc] [22120.801265] [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc] [22120.803191] [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc] [22120.803997] [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [22120.828023] [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc] [22120.837729] [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80 [22120.848944] [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0 [22120.849644] [<ffffffff811fe3dd>] ? putname+0x3d/0x60 [22120.850441] [<ffffffff811ede34>] SyS_write+0x84/0xf0 [22120.851186] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b [22120.851897] LustreError: 12741:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 1, with 1 refs [22120.856255] LustreError: 12741:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG [22120.856969] Pid: 12741, comm: lctl [22120.857575] Call Trace: [22120.858750] [<ffffffffa01ec7ce>] libcfs_call_trace+0x4e/0x60 [libcfs] [22120.859460] [<ffffffffa01ec85c>] lbug_with_loc+0x4c/0xb0 [libcfs] [22120.860231] [<ffffffffa06055cb>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc] [22120.861629] [<ffffffffa0200052>] cfs_hash_putref+0x1e2/0x500 [libcfs] [22120.862123] [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs] [22120.862821] [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc] [22120.863589] [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc] [22120.864315] [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc] [22120.865614] [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc] [22120.866922] [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc] [22120.872716] [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [22120.874257] [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc] [22120.875521] [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80 [22120.876212] [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0 [22120.900486] [<ffffffff811fe3dd>] ? putname+0x3d/0x60 [22120.901959] [<ffffffff811ede34>] SyS_write+0x84/0xf0 [22120.920647] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b [22120.921334] [22120.921903] Kernel panic - not syncing: LBUG [22120.922521] CPU: 9 PID: 12741 Comm: lctl Tainted: P W OE ------------ 3.10.0-debug #2 [22120.923760] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [22120.924463] ffffffffa020bed2 00000000ca53a485 ffff88008057bc70 ffffffff816fd3e4 [22120.925713] ffff88008057bcf0 ffffffff816f8c34 ffffffff00000008 ffff88008057bd00 [22120.927552] ffff88008057bca0 00000000ca53a485 00000000ca53a485 ffff88033e52d948 [22120.929142] Call Trace: [22120.929738] [<ffffffff816fd3e4>] dump_stack+0x19/0x1b [22120.930405] [<ffffffff816f8c34>] panic+0xd8/0x1e7 [22120.931066] [<ffffffffa01ec874>] lbug_with_loc+0x64/0xb0 [libcfs] [22120.931861] [<ffffffffa06055cb>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc] [22120.932566] [<ffffffffa0200052>] cfs_hash_putref+0x1e2/0x500 [libcfs] [22120.933266] [<ffffffffa0204b66>] ? cfs_binheap_destroy+0x396/0x830 [libcfs] [22120.934002] [<ffffffffa06059c9>] nrs_orr_stop+0x69/0x270 [ptlrpc] [22120.934708] [<ffffffffa05fddd9>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc] [22120.935433] [<ffffffffa05fe748>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc] [22120.936713] [<ffffffffa05ffa84>] nrs_policy_start_locked+0x624/0x670 [ptlrpc] [22120.937994] [<ffffffffa05ffbfb>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc] [22120.938714] [<ffffffffa06021ef>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [22120.939994] [<ffffffffa05e4705>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc] [22120.941251] [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80 [22120.941916] [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0 [22120.942587] [<ffffffff811fe3dd>] ? putname+0x3d/0x60 [22120.943243] [<ffffffff811ede34>] SyS_write+0x84/0xf0 [22120.943899] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b Please take a look at it.
            green Oleg Drokin added a comment -

            I just hit this one again in master, note how it's closely located with the sleeping while atomic warning (I think there's a ticket about this, but cannot find it right away), possibly related?

            [65131.137381] Lustre: DEBUG MARKER: == sanityn test 77d: check TRR nrs policy ============================================================ 17:59:22 (1484089162)
            [65134.786924] BUG: sleeping function called from invalid context at /home/green/git/lustre-release/libcfs/libcfs/hash.c:1123
            [65134.793042] in_atomic(): 1, irqs_disabled(): 0, pid: 25850, name: lctl
            [65134.793574] INFO: lockdep is turned off.
            [65134.794029] CPU: 7 PID: 25850 Comm: lctl Tainted: G        W  OE  ------------   3.10.0-debug #1
            [65134.794924] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [65134.795507]  ffff880093aa8940 0000000076eac45a ffff880073a9bd18 ffffffff816fd400
            [65134.796569]  ffff880073a9bd30 ffffffff810b0109 0000000000000004 ffff880073a9bd80
            [65134.797496]  ffffffffa02989d3 ffffffffa029d486 ffff8800749b5f80 0000000000000004
            [65134.798352] Call Trace:
            [65134.799008]  [<ffffffff816fd400>] dump_stack+0x19/0x1b
            [65134.799696]  [<ffffffff810b0109>] __might_sleep+0xe9/0x110
            [65134.800431]  [<ffffffffa02989d3>] cfs_hash_putref+0x233/0x500 [libcfs]
            [65134.801010]  [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
            [65134.801498]  [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
            [65134.802537]  [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
            [65134.803040]  [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
            [65134.803823]  [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
            [65134.804591]  [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
            [65134.805015]  [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
            [65134.805809]  [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
            [65134.806600]  [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80
            [65134.806992]  [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0
            [65134.807398]  [<ffffffff811fe3fd>] ? putname+0x3d/0x60
            [65134.807785]  [<ffffffff811ede54>] SyS_write+0x84/0xf0
            [65134.808180]  [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
            [65134.808583] BUG: scheduling while atomic: lctl/25850/0x10000002
            [65134.808958] INFO: lockdep is turned off.
            [65134.809361] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea ata_generic sysfillrect sysimgblt pata_acpi ttm drm_kms_helper ata_piix drm i2c_piix4 libata pcspkr serio_raw virtio_balloon i2c_core virtio_console floppy virtio_blk nfsd ip_tables [last unloaded: libcfs]
            [65134.814034] CPU: 7 PID: 25850 Comm: lctl Tainted: G        W  OE  ------------   3.10.0-debug #1
            [65134.814810] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [65134.815505]  ffff880093aa8940 0000000076eac45a ffff880073a9bc90 ffffffff816fd400
            [65134.816785]  ffff880073a9bca8 ffffffff816f9258 ffff880073a9bfd8 ffff880073a9bd08
            [65134.817813]  ffffffff817041c7 ffff880093aa8940 ffff880073a9bfd8 ffff880073a9bfd8
            [65134.818721] Call Trace:
            [65134.819087]  [<ffffffff816fd400>] dump_stack+0x19/0x1b
            [65134.819528]  [<ffffffff816f9258>] __schedule_bug+0x59/0x68
            [65134.819984]  [<ffffffff817041c7>] __schedule+0x807/0x950
            [65134.820387]  [<ffffffff810b4956>] __cond_resched+0x26/0x30
            [65134.821032]  [<ffffffff817045fa>] _cond_resched+0x3a/0x50
            [65134.821624]  [<ffffffffa02989d8>] cfs_hash_putref+0x238/0x500 [libcfs]
            [65134.822213]  [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
            [65134.822837]  [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
            [65134.823464]  [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
            [65134.824058]  [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
            [65134.825188]  [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
            [65134.826299]  [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
            [65134.826910]  [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
            [65134.827989]  [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
            [65134.829059]  [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80
            [65134.829648]  [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0
            [65134.830389]  [<ffffffff811fe3fd>] ? putname+0x3d/0x60
            [65134.830966]  [<ffffffff811ede54>] SyS_write+0x84/0xf0
            [65134.831935]  [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
            [65134.832463] LustreError: 25850:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 0, with 1 refs
            [65134.834538] LustreError: 25850:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG
            [65134.835210] Pid: 25850, comm: lctl
            [65134.835825] 
            Call Trace:
            [65134.836973]  [<ffffffffa02857d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
            [65134.837671]  [<ffffffffa0285841>] lbug_with_loc+0x41/0xb0 [libcfs]
            [65134.838296]  [<ffffffffa09516ab>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc]
            [65134.838884]  [<ffffffffa0298982>] cfs_hash_putref+0x1e2/0x500 [libcfs]
            [65134.839519]  [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
            [65134.840295]  [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc]
            [65134.840990]  [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
            [65134.841729]  [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
            [65134.842903]  [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
            [65134.843961]  [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
            [65134.844570]  [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
            [65134.845658]  [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
            [65134.846795]  [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80
            [65134.847391]  [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0
            [65134.847958]  [<ffffffff811fe3fd>] ? putname+0x3d/0x60
            [65134.848539]  [<ffffffff811ede54>] SyS_write+0x84/0xf0
            [65134.849076]  [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
            [65134.849640] 
            [65134.850122] Kernel panic - not syncing: LBUG
            
            green Oleg Drokin added a comment - I just hit this one again in master, note how it's closely located with the sleeping while atomic warning (I think there's a ticket about this, but cannot find it right away), possibly related? [65131.137381] Lustre: DEBUG MARKER: == sanityn test 77d: check TRR nrs policy ============================================================ 17:59:22 (1484089162) [65134.786924] BUG: sleeping function called from invalid context at /home/green/git/lustre-release/libcfs/libcfs/hash.c:1123 [65134.793042] in_atomic(): 1, irqs_disabled(): 0, pid: 25850, name: lctl [65134.793574] INFO: lockdep is turned off. [65134.794029] CPU: 7 PID: 25850 Comm: lctl Tainted: G W OE ------------ 3.10.0-debug #1 [65134.794924] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [65134.795507] ffff880093aa8940 0000000076eac45a ffff880073a9bd18 ffffffff816fd400 [65134.796569] ffff880073a9bd30 ffffffff810b0109 0000000000000004 ffff880073a9bd80 [65134.797496] ffffffffa02989d3 ffffffffa029d486 ffff8800749b5f80 0000000000000004 [65134.798352] Call Trace: [65134.799008] [<ffffffff816fd400>] dump_stack+0x19/0x1b [65134.799696] [<ffffffff810b0109>] __might_sleep+0xe9/0x110 [65134.800431] [<ffffffffa02989d3>] cfs_hash_putref+0x233/0x500 [libcfs] [65134.801010] [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs] [65134.801498] [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc] [65134.802537] [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc] [65134.803040] [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc] [65134.803823] [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc] [65134.804591] [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc] [65134.805015] [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [65134.805809] [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc] [65134.806600] [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80 [65134.806992] [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0 [65134.807398] [<ffffffff811fe3fd>] ? putname+0x3d/0x60 [65134.807785] [<ffffffff811ede54>] SyS_write+0x84/0xf0 [65134.808180] [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b [65134.808583] BUG: scheduling while atomic: lctl/25850/0x10000002 [65134.808958] INFO: lockdep is turned off. [65134.809361] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea ata_generic sysfillrect sysimgblt pata_acpi ttm drm_kms_helper ata_piix drm i2c_piix4 libata pcspkr serio_raw virtio_balloon i2c_core virtio_console floppy virtio_blk nfsd ip_tables [last unloaded: libcfs] [65134.814034] CPU: 7 PID: 25850 Comm: lctl Tainted: G W OE ------------ 3.10.0-debug #1 [65134.814810] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [65134.815505] ffff880093aa8940 0000000076eac45a ffff880073a9bc90 ffffffff816fd400 [65134.816785] ffff880073a9bca8 ffffffff816f9258 ffff880073a9bfd8 ffff880073a9bd08 [65134.817813] ffffffff817041c7 ffff880093aa8940 ffff880073a9bfd8 ffff880073a9bfd8 [65134.818721] Call Trace: [65134.819087] [<ffffffff816fd400>] dump_stack+0x19/0x1b [65134.819528] [<ffffffff816f9258>] __schedule_bug+0x59/0x68 [65134.819984] [<ffffffff817041c7>] __schedule+0x807/0x950 [65134.820387] [<ffffffff810b4956>] __cond_resched+0x26/0x30 [65134.821032] [<ffffffff817045fa>] _cond_resched+0x3a/0x50 [65134.821624] [<ffffffffa02989d8>] cfs_hash_putref+0x238/0x500 [libcfs] [65134.822213] [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs] [65134.822837] [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc] [65134.823464] [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc] [65134.824058] [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc] [65134.825188] [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc] [65134.826299] [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc] [65134.826910] [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [65134.827989] [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc] [65134.829059] [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80 [65134.829648] [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0 [65134.830389] [<ffffffff811fe3fd>] ? putname+0x3d/0x60 [65134.830966] [<ffffffff811ede54>] SyS_write+0x84/0xf0 [65134.831935] [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b [65134.832463] LustreError: 25850:0:(nrs_orr.c:483:nrs_trr_hop_exit()) ASSERTION( orro->oo_ref == 0 ) failed: Busy NRS TRR policy object for OST with index 0, with 1 refs [65134.834538] LustreError: 25850:0:(nrs_orr.c:483:nrs_trr_hop_exit()) LBUG [65134.835210] Pid: 25850, comm: lctl [65134.835825] Call Trace: [65134.836973] [<ffffffffa02857d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs] [65134.837671] [<ffffffffa0285841>] lbug_with_loc+0x41/0xb0 [libcfs] [65134.838296] [<ffffffffa09516ab>] nrs_trr_hop_exit+0xfb/0x130 [ptlrpc] [65134.838884] [<ffffffffa0298982>] cfs_hash_putref+0x1e2/0x500 [libcfs] [65134.839519] [<ffffffffa029d486>] ? cfs_binheap_destroy+0x396/0x830 [libcfs] [65134.840295] [<ffffffffa0951aa9>] nrs_orr_stop+0x69/0x270 [ptlrpc] [65134.840990] [<ffffffffa0949e59>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc] [65134.841729] [<ffffffffa094a7c8>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc] [65134.842903] [<ffffffffa094bb04>] nrs_policy_start_locked+0x624/0x670 [ptlrpc] [65134.843961] [<ffffffffa094bc7b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc] [65134.844570] [<ffffffffa094e25f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [65134.845658] [<ffffffffa092ea35>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc] [65134.846795] [<ffffffff8125d3ed>] proc_reg_write+0x3d/0x80 [65134.847391] [<ffffffff811ed38d>] vfs_write+0xbd/0x1e0 [65134.847958] [<ffffffff811fe3fd>] ? putname+0x3d/0x60 [65134.848539] [<ffffffff811ede54>] SyS_write+0x84/0xf0 [65134.849076] [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b [65134.849640] [65134.850122] Kernel panic - not syncing: LBUG
            yong.fan nasf (Inactive) added a comment - +1 on master: https://testing.hpdd.intel.com/test_sets/3c3d6282-be50-11e6-8111-5254006e85c2

            Should we change oo_ref of struct nrs_orr_object to atomic_t? I feel that there are some places which could have race condition since the reference is not always protected by the same lock.

            lixi Li Xi (Inactive) added a comment - Should we change oo_ref of struct nrs_orr_object to atomic_t? I feel that there are some places which could have race condition since the reference is not always protected by the same lock.
            emoly.liu Emoly Liu added a comment -

            Hit this issue when running sanityn.sh 77e. lixi, could you please have a look? Thanks.

            emoly.liu Emoly Liu added a comment - Hit this issue when running sanityn.sh 77e. lixi , could you please have a look? Thanks.
            jamesanunez James Nunez (Inactive) added a comment - - edited

            More failures on master. This time in sanityn test_77e. Logs at:
            2016-01-09 06:42:38 - https://testing.hpdd.intel.com/test_sets/b3b1e744-b6cb-11e5-9fda-5254006e85c2
            2016-02-01 10:16:59 - https://testing.hpdd.intel.com/test_sets/2abcbe9e-c8fc-11e5-aaa9-5254006e85c2

            jamesanunez James Nunez (Inactive) added a comment - - edited More failures on master. This time in sanityn test_77e. Logs at: 2016-01-09 06:42:38 - https://testing.hpdd.intel.com/test_sets/b3b1e744-b6cb-11e5-9fda-5254006e85c2 2016-02-01 10:16:59 - https://testing.hpdd.intel.com/test_sets/2abcbe9e-c8fc-11e5-aaa9-5254006e85c2
            green Oleg Drokin added a comment -

            Hit once again

            green Oleg Drokin added a comment - Hit once again
            green Oleg Drokin added a comment -

            and just hit it once more

            green Oleg Drokin added a comment - and just hit it once more

            People

              lixi_wc Li Xi
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated: