Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9839

lov_object.c:879:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      Just hit this assertion on master-next that was immediately followed by a NULL pointer deref

      [60342.200382] Lustre: DEBUG MARKER: == sanity test 405: Various layout swap lock tests =================================================== 16:21:48 (1501964508)
      [60385.832096] LustreError: 10934:0:(lov_object.c:879:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed: 
      [60385.832103] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [60385.832111] IP: [<ffffffffa078decc>] lov_sub_get+0x1ec/0x760 [lov]
      [60385.832113] PGD 0 
      [60385.832114] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [60385.832130] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop mbcache jbd2 rpcsec_gss_krb5 syscopyarea sysfillrect ata_generic sysimgblt pata_acpi ttm virtio_balloon drm_kms_helper pcspkr ata_piix serio_raw virtio_console virtio_blk floppy drm i2c_piix4 libata i2c_core nfsd ip_tables [last unloaded: libcfs]
      [60385.832132] CPU: 2 PID: 5577 Comm: kworker/u16:0 Tainted: G        W  OE  ------------   3.10.0-debug #1
      [60385.832133] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [60385.832138] Workqueue: writeback bdi_writeback_workfn (flush-lustre-3)
      [60385.832138] task: ffff88009d1c88c0 ti: ffff880058194000 task.ti: ffff880058194000
      [60385.832143] RIP: 0010:[<ffffffffa078decc>]  [<ffffffffa078decc>] lov_sub_get+0x1ec/0x760 [lov]
      [60385.832145] RSP: 0018:ffff8800581978c0  EFLAGS: 00010297
      [60385.832146] RAX: ffff880014330f68 RBX: ffff880078ddedf0 RCX: 0000000000000000
      [60385.832146] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880014330f68
      [60385.832147] RBP: ffff880058197900 R08: 0000000000000001 R09: 0000000000000000
      [60385.832147] R10: 0000000000000000 R11: ffffffffffffffff R12: 0000000000000000
      [60385.832148] R13: ffff880078ddefd0 R14: ffff880078ddeda0 R15: ffff88000c9b7e10
      [60385.832149] FS:  0000000000000000(0000) GS:ffff8800bc680000(0000) knlGS:0000000000000000
      [60385.832149] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [60385.832150] CR2: 0000000000000000 CR3: 00000000ae6b8000 CR4: 00000000000006e0
      [60385.832152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [60385.832152] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [60385.832153] Stack:
      [60385.832155]  0000000000000000 0000000000000000 ffff880021d0fed0 0000000000000000
      [60385.832156]  0000000000000000 ffff88002e8451d0 0000000000000000 ffff880078ddeda0
      [60385.832157]  ffff8800581979b0 ffffffffa078ecda ffff88000c9b7e10 ffff880021dabfa0
      [60385.832157] Call Trace:
      [60385.832162]  [<ffffffffa078ecda>] lov_io_iter_init+0x1ca/0x8b0 [lov]
      [60385.832185]  [<ffffffffa037794c>] cl_io_iter_init+0x5c/0x120 [obdclass]
      [60385.832201]  [<ffffffffa0379c5c>] cl_io_loop+0x19c/0xb30 [obdclass]
      [60385.832213]  [<ffffffffa0dcae9b>] cl_sync_file_range+0x2db/0x380 [lustre]
      [60385.832222]  [<ffffffffa0dec5ba>] ll_writepages+0x7a/0x200 [lustre]
      [60385.832225]  [<ffffffff8117e8b1>] do_writepages+0x21/0x50
      [60385.832227]  [<ffffffff81218f50>] __writeback_single_inode+0x40/0x2b0
      [60385.832228]  [<ffffffff81219c11>] writeback_sb_inodes+0x2b1/0x4d0
      [60385.832230]  [<ffffffff81219ecf>] __writeback_inodes_wb+0x9f/0xd0
      [60385.832232]  [<ffffffff8121a76b>] wb_writeback+0x28b/0x340
      [60385.832233]  [<ffffffff8121c9cc>] bdi_writeback_workfn+0x20c/0x4e0
      [60385.832235]  [<ffffffff8109add6>] process_one_work+0x206/0x5b0
      [60385.832236]  [<ffffffff8109ad6b>] ? process_one_work+0x19b/0x5b0
      [60385.832238]  [<ffffffff8109b29b>] worker_thread+0x11b/0x3a0
      [60385.832239]  [<ffffffff8109b180>] ? process_one_work+0x5b0/0x5b0
      [60385.832240]  [<ffffffff810a2eda>] kthread+0xea/0xf0
      [60385.832242]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
      [60385.832245]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
      [60385.832246]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
      [60385.832257] Code: 85 bc 04 00 00 44 8b 82 08 01 00 00 8b 4d cc 48 8b 75 c0 44 39 c1 0f 83 aa 04 00 00 48 8b 92 10 01 00 00 48 89 c7 4a 8b 54 22 18 <48> 8b 0c f2 c7 83 b0 01 00 00 00 00 00 00 4c 89 7b 38 48 81 c1 
      [60385.832261] RIP  [<ffffffffa078decc>] lov_sub_get+0x1ec/0x760 [lov]
      [60385.832262]  RSP <ffff8800581978c0>
      

      modules and crashdump are in /exports/crashdumps/192.168.10.224-2017-08-05-16\:22\:38 on my node
      tag in my tree: master-20170804

      Attachments

        Activity

          [LU-9839] lov_object.c:879:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed

          "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55199
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: b2_15
          Current Patch Set: 1
          Commit: acbbd73a68e70b93dddfd1bbac8b5cc1873cdda6

          gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55199 Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: acbbd73a68e70b93dddfd1bbac8b5cc1873cdda6

          "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54983
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: b2_15
          Current Patch Set: 1
          Commit: 57c968fdd2fc8dcf688fffeb6bc409674d8d6e98

          gerrit Gerrit Updater added a comment - "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54983 Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: 57c968fdd2fc8dcf688fffeb6bc409674d8d6e98

          "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54981
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 2855936ef0647e31884b2e470d56ea49b3226418

          gerrit Gerrit Updater added a comment - "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54981 Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 2855936ef0647e31884b2e470d56ea49b3226418

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51638/
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 5bc1dd825b700677b002a43463a463c3ccb665ec

          gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51638/ Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: master Current Patch Set: Commit: 5bc1dd825b700677b002a43463a463c3ccb665ec

          "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51638
          Subject: LU-9839 clio: lov active ios accounting fix
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 72990b262c4dd763a109e7755f697acb0046c543

          gerrit Gerrit Updater added a comment - "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51638 Subject: LU-9839 clio: lov active ios accounting fix Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 72990b262c4dd763a109e7755f697acb0046c543

          https://review.whamcloud.com/c/fs/lustre-release/+/51269
          Subject: LU-9839 clio: debug active_ios race

          crashed and gave additional info about a parallel i/o:

          [2577729.757681] LustreError: 11039:0:(lov_object.c:1286:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed: ios: 1, last 
          io: osc_lru_shrink
          [2577729.776963] LustreError: 11039:0:(lov_object.c:1286:lov_layout_change()) LBUG
          
          zam Alexander Zarochentsev added a comment - https://review.whamcloud.com/c/fs/lustre-release/+/51269 Subject: LU-9839 clio: debug active_ios race crashed and gave additional info about a parallel i/o: [2577729.757681] LustreError: 11039:0:(lov_object.c:1286:lov_layout_change()) ASSERTION( atomic_read(&lov->lo_active_ios) == 0 ) failed: ios: 1, last io: osc_lru_shrink [2577729.776963] LustreError: 11039:0:(lov_object.c:1286:lov_layout_change()) LBUG

          "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51269
          Subject: LU-9839 clio: debug active_ios race
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 8705d17265065eeab1f058218e1fa870cc6f5cb6

          gerrit Gerrit Updater added a comment - "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51269 Subject: LU-9839 clio: debug active_ios race Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 8705d17265065eeab1f058218e1fa870cc6f5cb6
          gerrit Gerrit Updater added a comment - - edited

          "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50800
          Subject: LU-9839 lov: ensure lo_waitq is drains for layout change
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 9bfd454ba335124f8b98dc51eb5f727f58d972a7

          gerrit Gerrit Updater added a comment - - edited "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50800 Subject: LU-9839 lov: ensure lo_waitq is drains for layout change Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 9bfd454ba335124f8b98dc51eb5f727f58d972a7

          This bug shouldn't happen. In lov_layout_change() we call llo_delete() which in turn calls lov_layout_wait() that empties lo_waitq. The reference count should be zero. Then the LASSERT happens a few steps later. How could it be possible to add new IO in that time?

          simmonsja James A Simmons added a comment - This bug shouldn't happen. In lov_layout_change() we call llo_delete() which in turn calls lov_layout_wait() that empties lo_waitq. The reference count should be zero. Then the LASSERT happens a few steps later. How could it be possible to add new IO in that time?

          Just ran into this on 2.15 LTS production system.

          simmonsja James A Simmons added a comment - Just ran into this on 2.15 LTS production system.

          People

            simmonsja James A Simmons
            green Oleg Drokin
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: