Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9839

lov_object.c:879:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      Just hit this assertion on master-next that was immediately followed by a NULL pointer deref

      [60342.200382] Lustre: DEBUG MARKER: == sanity test 405: Various layout swap lock tests =================================================== 16:21:48 (1501964508)
      [60385.832096] LustreError: 10934:0:(lov_object.c:879:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed: 
      [60385.832103] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [60385.832111] IP: [<ffffffffa078decc>] lov_sub_get+0x1ec/0x760 [lov]
      [60385.832113] PGD 0 
      [60385.832114] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [60385.832130] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop mbcache jbd2 rpcsec_gss_krb5 syscopyarea sysfillrect ata_generic sysimgblt pata_acpi ttm virtio_balloon drm_kms_helper pcspkr ata_piix serio_raw virtio_console virtio_blk floppy drm i2c_piix4 libata i2c_core nfsd ip_tables [last unloaded: libcfs]
      [60385.832132] CPU: 2 PID: 5577 Comm: kworker/u16:0 Tainted: G        W  OE  ------------   3.10.0-debug #1
      [60385.832133] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [60385.832138] Workqueue: writeback bdi_writeback_workfn (flush-lustre-3)
      [60385.832138] task: ffff88009d1c88c0 ti: ffff880058194000 task.ti: ffff880058194000
      [60385.832143] RIP: 0010:[<ffffffffa078decc>]  [<ffffffffa078decc>] lov_sub_get+0x1ec/0x760 [lov]
      [60385.832145] RSP: 0018:ffff8800581978c0  EFLAGS: 00010297
      [60385.832146] RAX: ffff880014330f68 RBX: ffff880078ddedf0 RCX: 0000000000000000
      [60385.832146] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880014330f68
      [60385.832147] RBP: ffff880058197900 R08: 0000000000000001 R09: 0000000000000000
      [60385.832147] R10: 0000000000000000 R11: ffffffffffffffff R12: 0000000000000000
      [60385.832148] R13: ffff880078ddefd0 R14: ffff880078ddeda0 R15: ffff88000c9b7e10
      [60385.832149] FS:  0000000000000000(0000) GS:ffff8800bc680000(0000) knlGS:0000000000000000
      [60385.832149] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [60385.832150] CR2: 0000000000000000 CR3: 00000000ae6b8000 CR4: 00000000000006e0
      [60385.832152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [60385.832152] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [60385.832153] Stack:
      [60385.832155]  0000000000000000 0000000000000000 ffff880021d0fed0 0000000000000000
      [60385.832156]  0000000000000000 ffff88002e8451d0 0000000000000000 ffff880078ddeda0
      [60385.832157]  ffff8800581979b0 ffffffffa078ecda ffff88000c9b7e10 ffff880021dabfa0
      [60385.832157] Call Trace:
      [60385.832162]  [<ffffffffa078ecda>] lov_io_iter_init+0x1ca/0x8b0 [lov]
      [60385.832185]  [<ffffffffa037794c>] cl_io_iter_init+0x5c/0x120 [obdclass]
      [60385.832201]  [<ffffffffa0379c5c>] cl_io_loop+0x19c/0xb30 [obdclass]
      [60385.832213]  [<ffffffffa0dcae9b>] cl_sync_file_range+0x2db/0x380 [lustre]
      [60385.832222]  [<ffffffffa0dec5ba>] ll_writepages+0x7a/0x200 [lustre]
      [60385.832225]  [<ffffffff8117e8b1>] do_writepages+0x21/0x50
      [60385.832227]  [<ffffffff81218f50>] __writeback_single_inode+0x40/0x2b0
      [60385.832228]  [<ffffffff81219c11>] writeback_sb_inodes+0x2b1/0x4d0
      [60385.832230]  [<ffffffff81219ecf>] __writeback_inodes_wb+0x9f/0xd0
      [60385.832232]  [<ffffffff8121a76b>] wb_writeback+0x28b/0x340
      [60385.832233]  [<ffffffff8121c9cc>] bdi_writeback_workfn+0x20c/0x4e0
      [60385.832235]  [<ffffffff8109add6>] process_one_work+0x206/0x5b0
      [60385.832236]  [<ffffffff8109ad6b>] ? process_one_work+0x19b/0x5b0
      [60385.832238]  [<ffffffff8109b29b>] worker_thread+0x11b/0x3a0
      [60385.832239]  [<ffffffff8109b180>] ? process_one_work+0x5b0/0x5b0
      [60385.832240]  [<ffffffff810a2eda>] kthread+0xea/0xf0
      [60385.832242]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
      [60385.832245]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
      [60385.832246]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
      [60385.832257] Code: 85 bc 04 00 00 44 8b 82 08 01 00 00 8b 4d cc 48 8b 75 c0 44 39 c1 0f 83 aa 04 00 00 48 8b 92 10 01 00 00 48 89 c7 4a 8b 54 22 18 <48> 8b 0c f2 c7 83 b0 01 00 00 00 00 00 00 4c 89 7b 38 48 81 c1 
      [60385.832261] RIP  [<ffffffffa078decc>] lov_sub_get+0x1ec/0x760 [lov]
      [60385.832262]  RSP <ffff8800581978c0>
      

      modules and crashdump are in /exports/crashdumps/192.168.10.224-2017-08-05-16\:22\:38 on my node
      tag in my tree: master-20170804

      Attachments

        Activity

          [LU-9839] lov_object.c:879:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed

          "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54983
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: b2_15
          Current Patch Set: 1
          Commit: 57c968fdd2fc8dcf688fffeb6bc409674d8d6e98

          gerrit Gerrit Updater added a comment - "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54983 Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: 57c968fdd2fc8dcf688fffeb6bc409674d8d6e98

          "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54981
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 2855936ef0647e31884b2e470d56ea49b3226418

          gerrit Gerrit Updater added a comment - "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54981 Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 2855936ef0647e31884b2e470d56ea49b3226418

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51638/
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 5bc1dd825b700677b002a43463a463c3ccb665ec

          gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51638/ Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: master Current Patch Set: Commit: 5bc1dd825b700677b002a43463a463c3ccb665ec

          "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51638
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 72990b262c4dd763a109e7755f697acb0046c543

          gerrit Gerrit Updater added a comment - "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51638 Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 72990b262c4dd763a109e7755f697acb0046c543

          https://review.whamcloud.com/c/fs/lustre-release/+/51269
          Subject: LU-9839 clio: debug active_ios race

          crashed and gave additional info about a parallel i/o:

          [2577729.757681] LustreError: 11039:0:(lov_object.c:1286:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed: ios: 1, last 
          io: osc_lru_shrink
          [2577729.776963] LustreError: 11039:0:(lov_object.c:1286:lov_layout_change()) LBUG
          
          zam Alexander Zarochentsev added a comment - https://review.whamcloud.com/c/fs/lustre-release/+/51269 Subject: LU-9839 clio: debug active_ios race crashed and gave additional info about a parallel i/o: [2577729.757681] LustreError: 11039:0:(lov_object.c:1286:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed: ios: 1, last io: osc_lru_shrink [2577729.776963] LustreError: 11039:0:(lov_object.c:1286:lov_layout_change()) LBUG

          "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51269
          Subject: LU-9839 clio: debug active_ios race
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 8705d17265065eeab1f058218e1fa870cc6f5cb6

          gerrit Gerrit Updater added a comment - "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51269 Subject: LU-9839 clio: debug active_ios race Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 8705d17265065eeab1f058218e1fa870cc6f5cb6
          gerrit Gerrit Updater added a comment - - edited

          "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50800
          Subject: LU-9839 lov: ensure lo_waitq is drains for layout change
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 9bfd454ba335124f8b98dc51eb5f727f58d972a7

          gerrit Gerrit Updater added a comment - - edited "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50800 Subject: LU-9839 lov: ensure lo_waitq is drains for layout change Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 9bfd454ba335124f8b98dc51eb5f727f58d972a7

          This bug shouldn't happen. In lov_layout_change() we call llo_delete() which in turn calls lov_layout_wait() that empties lo_waitq. The reference count should be zero. Then the LASSERT happens a few steps later. How could it be possible to add new IO in that time?

          simmonsja James A Simmons added a comment - This bug shouldn't happen. In lov_layout_change() we call llo_delete() which in turn calls lov_layout_wait() that empties lo_waitq. The reference count should be zero. Then the LASSERT happens a few steps later. How could it be possible to add new IO in that time?

          Just ran into this on 2.15 LTS production system.

          simmonsja James A Simmons added a comment - Just ran into this on 2.15 LTS production system.
          green Oleg Drokin added a comment -

          Just hit it once more on current master-next

          green Oleg Drokin added a comment - Just hit it once more on current master-next
          green Oleg Drokin added a comment -

          Just hit this assertion again on master-next in sanity test 405:

          [ 6650.701477] Lustre: DEBUG MARKER: == sanity test 405: Various layout swap lock tests =================================================== 21:10:20 (1507857020)
          [ 6718.387820] LustreError: 7955:0:(lov_object.c:879:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed: 
          [ 6718.387821] LustreError: 7955:0:(lov_object.c:879:lov_layout_change()) LBUG
          [ 6718.387822] Pid: 7955, comm: swap_lock_test
          [ 6718.387822] 
          Call Trace:
          [ 6718.387836]  [<ffffffffa01d57fe>] libcfs_call_trace+0x4e/0x60 [libcfs]
          [ 6718.387843]  [<ffffffffa01d588c>] lbug_with_loc+0x4c/0xb0 [libcfs]
          [ 6718.387853]  [<ffffffffa08de974>] lov_conf_set+0xab4/0xac0 [lov]
          [ 6718.387869]  [<ffffffffa0f70001>] ? ll_file_flock+0x841/0xd80 [lustre]
          [ 6718.387898]  [<ffffffffa03931b0>] cl_conf_set+0x60/0x120 [obdclass]
          [ 6718.387908]  [<ffffffffa0f7d301>] ll_layout_conf+0x81/0x400 [lustre]
          [ 6718.387918]  [<ffffffffa0f7dc3d>] ll_layout_refresh+0x5bd/0xb10 [lustre]
          [ 6718.387933]  [<ffffffffa0fc3797>] vvp_io_init+0x347/0x440 [lustre]
          [ 6718.387957]  [<ffffffffa039acb8>] cl_io_init0.isra.17+0x88/0x160 [obdclass]
          [ 6718.387974]  [<ffffffffa039ae4a>] cl_io_init+0x3a/0x80 [obdclass]
          [ 6718.387986]  [<ffffffffa0fbbc2a>] cl_get_grouplock+0xca/0x2f0 [lustre]
          [ 6718.387995]  [<ffffffffa0f7e84e>] ll_get_grouplock+0x22e/0x6d0 [lustre]
          [ 6718.388004]  [<ffffffffa0f834b2>] ll_file_ioctl+0x47c2/0x48a0 [lustre]
          [ 6718.388008]  [<ffffffff81201985>] do_vfs_ioctl+0x305/0x520
          [ 6718.388011]  [<ffffffff81706487>] ? _raw_spin_unlock_irq+0x27/0x50
          [ 6718.388013]  [<ffffffff81201c41>] SyS_ioctl+0xa1/0xc0
          [ 6718.388016]  [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
          [ 6718.388017] 
          [ 6718.388018] Kernel panic - not syncing: LBUG
          

          Crashdump is in 192.168.10.224-2017-10-12-21\:11\:34 on my system

          green Oleg Drokin added a comment - Just hit this assertion again on master-next in sanity test 405: [ 6650.701477] Lustre: DEBUG MARKER: == sanity test 405: Various layout swap lock tests =================================================== 21:10:20 (1507857020) [ 6718.387820] LustreError: 7955:0:(lov_object.c:879:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed: [ 6718.387821] LustreError: 7955:0:(lov_object.c:879:lov_layout_change()) LBUG [ 6718.387822] Pid: 7955, comm: swap_lock_test [ 6718.387822] Call Trace: [ 6718.387836] [<ffffffffa01d57fe>] libcfs_call_trace+0x4e/0x60 [libcfs] [ 6718.387843] [<ffffffffa01d588c>] lbug_with_loc+0x4c/0xb0 [libcfs] [ 6718.387853] [<ffffffffa08de974>] lov_conf_set+0xab4/0xac0 [lov] [ 6718.387869] [<ffffffffa0f70001>] ? ll_file_flock+0x841/0xd80 [lustre] [ 6718.387898] [<ffffffffa03931b0>] cl_conf_set+0x60/0x120 [obdclass] [ 6718.387908] [<ffffffffa0f7d301>] ll_layout_conf+0x81/0x400 [lustre] [ 6718.387918] [<ffffffffa0f7dc3d>] ll_layout_refresh+0x5bd/0xb10 [lustre] [ 6718.387933] [<ffffffffa0fc3797>] vvp_io_init+0x347/0x440 [lustre] [ 6718.387957] [<ffffffffa039acb8>] cl_io_init0.isra.17+0x88/0x160 [obdclass] [ 6718.387974] [<ffffffffa039ae4a>] cl_io_init+0x3a/0x80 [obdclass] [ 6718.387986] [<ffffffffa0fbbc2a>] cl_get_grouplock+0xca/0x2f0 [lustre] [ 6718.387995] [<ffffffffa0f7e84e>] ll_get_grouplock+0x22e/0x6d0 [lustre] [ 6718.388004] [<ffffffffa0f834b2>] ll_file_ioctl+0x47c2/0x48a0 [lustre] [ 6718.388008] [<ffffffff81201985>] do_vfs_ioctl+0x305/0x520 [ 6718.388011] [<ffffffff81706487>] ? _raw_spin_unlock_irq+0x27/0x50 [ 6718.388013] [<ffffffff81201c41>] SyS_ioctl+0xa1/0xc0 [ 6718.388016] [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b [ 6718.388017] [ 6718.388018] Kernel panic - not syncing: LBUG Crashdump is in 192.168.10.224-2017-10-12-21\:11\:34 on my system

          People

            simmonsja James A Simmons
            green Oleg Drokin
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: