  Lustre / LU-12741

crash in osd_object_delete at end of sanity

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Labels: None
    • Severity: 3

    Description

      It looks like something broke in master/b2_12 relatively recently.

      Typical crash:

      [14473.601088] Lustre: DEBUG MARKER: == sanity test complete, duration 5433 sec =========================================================== 03:14:06 (1567926846)
      [14492.366277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000c80
      [14492.370164] IP: [<ffffffffa0b8e854>] osd_object_delete+0x1f4/0x2a0 [osd_ldiskfs]
      [14492.372007] PGD 0 
      [14492.372787] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
      [14492.373666] Modules linked in: dm_flakey dm_mod lustre(OE) mdt(OE) mdd(OE) mdc(OE) obdecho(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) mgc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common virtio_console pcspkr i2c_piix4 virtio_balloon binfmt_misc ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix drm_panel_orientation_quirks libata floppy virtio_blk serio_raw i2c_core [last unloaded: mdt]
      [14492.388104] CPU: 6 PID: 8302 Comm: ldlm_cn03_003 Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.6-debug #2
      [14492.389885] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [14492.390901] task: ffff88003c6209c0 ti: ffff88004139c000 task.ti: ffff88004139c000
      [14492.392652] RIP: 0010:[<ffffffffa0b8e854>]  [<ffffffffa0b8e854>] osd_object_delete+0x1f4/0x2a0 [osd_ldiskfs]
      [14492.394473] RSP: 0018:ffff88004139fa30  EFLAGS: 00010246
      [14492.395341] RAX: 0000000000000000 RBX: 0000000000000c80 RCX: 0000000000000000
      [14492.396537] RDX: ffff88006430ae00 RSI: ffffffffa0be0160 RDI: ffff880082322b40
      [14492.397291] RBP: ffff88004139fa60 R08: 0000000000000000 R09: d8c8000000000000
      [14492.398095] R10: ffff8800bb75e000 R11: ffff8800bb75e7c8 R12: 0000000000000000
      [14492.398993] R13: ffff880115978e00 R14: ffff880082322b40 R15: 0000000000000000
      [14492.399675] FS:  0000000000000000(0000) GS:ffff880139780000(0000) knlGS:0000000000000000
      [14492.401123] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [14492.402167] CR2: 0000000000000c80 CR3: 0000000130b34000 CR4: 00000000000006e0
      [14492.422237] Call Trace:
      [14492.423115]  [<ffffffffa036d0a5>] lu_object_free.isra.31+0x65/0x170 [obdclass]
      [14492.424986]  [<ffffffffa0370e42>] lu_object_put+0xc2/0x3c0 [obdclass]
      [14492.426016]  [<ffffffffa0d7fea0>] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt]
      [14492.427067]  [<ffffffffa0d7ff51>] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt]
      [14492.428115]  [<ffffffffa05f25c6>] ldlm_work_cp_ast_lock+0xa6/0x1d0 [ptlrpc]
      [14492.429196]  [<ffffffffa06393b0>] ptlrpc_set_wait+0x70/0x790 [ptlrpc]
      [14492.430253]  [<ffffffffa062fe6d>] ? ptlrpc_prep_set+0x5d/0x290 [ptlrpc]
      [14492.431457]  [<ffffffffa0350279>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [14492.432481]  [<ffffffff810b5a70>] ? __init_waitqueue_head+0x20/0x30
      [14492.433523]  [<ffffffffa062ff07>] ? ptlrpc_prep_set+0xf7/0x290 [ptlrpc]
      [14492.434526]  [<ffffffffa05f7e15>] ldlm_run_ast_work+0xd5/0x380 [ptlrpc]
      [14492.435592]  [<ffffffffa05f927f>] __ldlm_reprocess_all+0xff/0x340 [ptlrpc]
      [14492.436653]  [<ffffffffa05f94d0>] ldlm_reprocess_all+0x10/0x20 [ptlrpc]
      [14492.437708]  [<ffffffffa06219b4>] ldlm_handle_convert0+0x2f4/0x450 [ptlrpc]
      [14492.438727]  [<ffffffffa0621ffb>] ldlm_cancel_handler+0x29b/0x590 [ptlrpc]
      [14492.439812]  [<ffffffffa06524b6>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc]
      [14492.441715]  [<ffffffffa06564a1>] ptlrpc_main+0xb91/0x2110 [ptlrpc]
      [14492.442734]  [<ffffffff810c32ed>] ? finish_task_switch+0x5d/0x1b0
      [14492.443744]  [<ffffffff817b6cd0>] ? __schedule+0x410/0xa00
      [14492.445222]  [<ffffffffa0655910>] ? ptlrpc_register_service+0xfb0/0xfb0 [ptlrpc]
      [14492.447031]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
      [14492.447974]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
      [14492.448960]  [<ffffffff817c4c77>] ret_from_fork_nospec_begin+0x21/0x21
      [14492.449957]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
      [14492.450934] Code: e0 03 00 0f 1f 40 00 4d 85 ed 74 c0 4c 89 f7 48 c7 c6 60 01 be a0 e8 cc e7 7d ff 49 89 c4 44 89 f8 31 c9 49 8d 9c 24 80 0c 00 00 <49> 89 84 24 80 0c 00 00 4c 89 ee 4c 89 f7 48 89 da e8 86 16 f0 
      

      The crash is always in the DoM discard path (ldlm_dom_discard_cp_ast in the trace above).
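
A rough way to sanity-check the oops is to compare the fault address against the registers. The sketch below is only my reading of the dump, not part of the original report: it assumes the faulting instruction bytes `49 89 84 24 80 0c 00 00` from the Code: line decode to a 64-bit store to `0xc80(%r12)`, and it uses a few register lines copied from the trace. The helper name `parse_registers` is hypothetical.

```python
import re

# Excerpt copied from the oops above (only the lines needed here).
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000c80
RAX: 0000000000000000 RBX: 0000000000000c80 RCX: 0000000000000000
R10: ffff8800bb75e000 R11: ffff8800bb75e7c8 R12: 0000000000000000
CR2: 0000000000000c80 CR3: 0000000130b34000 CR4: 00000000000006e0
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in text."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(OOPS)

# CR2 holds the address that triggered the page fault.
fault_addr = regs["CR2"]

# Assumption: the last four bytes of the faulting instruction
# (80 0c 00 00) are a little-endian 32-bit displacement.
disp = int.from_bytes(bytes.fromhex("800c0000"), "little")

# If the store was to disp(%r12), the base pointer is fault_addr - disp.
base = fault_addr - disp

print(f"fault address: {fault_addr:#x}")
print(f"displacement:  {disp:#x}")
print(f"implied base:  {base:#x} (R12 = {regs['R12']:#x})")
```

Under that assumption the implied base equals R12, which is zero, so the crash looks like a write to a field at offset 0xc80 of a NULL structure pointer rather than a wild pointer. Which structure that is cannot be determined from the dump alone.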

      First master-next occurrence: githash f8c100f.

      List of patches:

      f8c100f LU-12575 build: add ibutils2 for MOFED build
      0ecb29f LU-12560 tests: Use full path for test-groups
      4f4f90b LU-12400 ptlrpc: Sun RPC changes for RCU locking
      5a19817 LU-12527 utils: Make lustre_user.h c++-legal
      f0ba5de LU-12472 tests: update sanity-krb5.sh
      adfa543 LU-4315 doc: split lctl get_param and set_param man pages
      a57ede6 LU-12355 ldiskfs: Remove old map blocks support
      87f6b68 LU-12405 lnet: Oracle OFED extensions default to on
      04cdd15 LU-12343 osc: Fix dom handling in weight_ast
      3d1920a LU-8066 mdt: migrate procfs files to sysfs
      bc02a4e LU-12075 mdt: commit migrate transaction with locks held
      27cd9fd LU-10070 test: llapi_layout_test enhancements
      2490ed4 LU-11617 mdc: fix possible deadlock in chlg_open()
      860dbcb LU-12559 ptlrpc: Hold imp lock for idle reconnect
      ce3ccbd LU-6142 tests: Fix style issues for write_disjoint.c
      6012e3e LU-6142 tests: Fix style issues for write_append_truncate.c
      16792c9 LU-6142 tests: Fix style issues for lp_utils.c
      ac153a9 LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
      b598d82 LU-12523 ptlrpc: Don't get jobid in body_v2
      b2f2bfc LU-6202 utils: remove obsolete l_ioctl2() wrapper
      42fdd2f LU-12440 lnet: Misleading error from lnet_is_health_check
      aec7b1a LU-12439 lnet: Convert noisy timeout error to cdebug
      93419c4 LU-11023 quota: remove quota pool ID
      

      First b2_12-next occurrence: githash 8ec5896. List of patches:

      3674d393d5 LU-12608 kernel: kernel update RHEL7.6 [3.10.0-957.27.2.el7]
      fe03ca414f LU-11761 fld: let's caller to retry FLD_QUERY
      316310cbb4 LU-12387 tests: Validate l_tunedisk max_sectors_kb tuning
      0edb0a6951 LU-8130 libcfs: don't include rhashtable if unavailable
      9ac11632fb LU-12660 kernel: kernel update SLES12 SP4 [4.12.14-95.29.1]
      3a35d97aee LU-12539 build: pass --with-o2ib when building deb packages
      e2ac8c3269 LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
      cbb6d8c8ef LU-12586 lov: Correct write_intent end for trunc
      a1e888dcbc LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions
      bbf40d8c71 LU-11537 osp: avoid nested transaction
      61c93c46c4 LU-12343 osc: Fix dom handling in weight_ast
      

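      Both lists contain LU-12343 and LU-10094. Since the crash is in the DoM discard path, I guess LU-12343 (osc: Fix dom handling in weight_ast) is the common factor here?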
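
The comparison of the two patch lists can be reproduced mechanically. This is a minimal sketch that intersects the LU ticket numbers from both lists as pasted in this report. The variable and function names are made up for illustration, and it only shows which tickets are shared, not which one causes the crash.

```python
import re

# Patch lists copied verbatim from the description above.
MASTER_NEXT = """\
f8c100f LU-12575 build: add ibutils2 for MOFED build
0ecb29f LU-12560 tests: Use full path for test-groups
4f4f90b LU-12400 ptlrpc: Sun RPC changes for RCU locking
5a19817 LU-12527 utils: Make lustre_user.h c++-legal
f0ba5de LU-12472 tests: update sanity-krb5.sh
adfa543 LU-4315 doc: split lctl get_param and set_param man pages
a57ede6 LU-12355 ldiskfs: Remove old map blocks support
87f6b68 LU-12405 lnet: Oracle OFED extensions default to on
04cdd15 LU-12343 osc: Fix dom handling in weight_ast
3d1920a LU-8066 mdt: migrate procfs files to sysfs
bc02a4e LU-12075 mdt: commit migrate transaction with locks held
27cd9fd LU-10070 test: llapi_layout_test enhancements
2490ed4 LU-11617 mdc: fix possible deadlock in chlg_open()
860dbcb LU-12559 ptlrpc: Hold imp lock for idle reconnect
ce3ccbd LU-6142 tests: Fix style issues for write_disjoint.c
6012e3e LU-6142 tests: Fix style issues for write_append_truncate.c
16792c9 LU-6142 tests: Fix style issues for lp_utils.c
ac153a9 LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
b598d82 LU-12523 ptlrpc: Don't get jobid in body_v2
b2f2bfc LU-6202 utils: remove obsolete l_ioctl2() wrapper
42fdd2f LU-12440 lnet: Misleading error from lnet_is_health_check
aec7b1a LU-12439 lnet: Convert noisy timeout error to cdebug
93419c4 LU-11023 quota: remove quota pool ID
"""

B2_12_NEXT = """\
3674d393d5 LU-12608 kernel: kernel update RHEL7.6 [3.10.0-957.27.2.el7]
fe03ca414f LU-11761 fld: let's caller to retry FLD_QUERY
316310cbb4 LU-12387 tests: Validate l_tunedisk max_sectors_kb tuning
0edb0a6951 LU-8130 libcfs: don't include rhashtable if unavailable
9ac11632fb LU-12660 kernel: kernel update SLES12 SP4 [4.12.14-95.29.1]
3a35d97aee LU-12539 build: pass --with-o2ib when building deb packages
e2ac8c3269 LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
cbb6d8c8ef LU-12586 lov: Correct write_intent end for trunc
a1e888dcbc LU-10756 ptlrpc: change IMPORT_SET_* macros into real functions
bbf40d8c71 LU-11537 osp: avoid nested transaction
61c93c46c4 LU-12343 osc: Fix dom handling in weight_ast
"""


def tickets(patch_list):
    """Return the set of LU-NNNN ticket IDs mentioned in a patch list."""
    return set(re.findall(r"\bLU-\d+\b", patch_list))


# Tickets present on both branches, sorted by ticket number.
common = sorted(tickets(MASTER_NEXT) & tickets(B2_12_NEXT),
                key=lambda t: int(t.split("-")[1]))

for ticket in common:
    print(ticket)
```

With the lists above, the intersection is LU-10094 and LU-12343. Only LU-12343 mentions DoM handling, which fits the suspicion stated in the description.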

            People

              Assignee: tappro Mikhail Pershin
              Reporter: green Oleg Drokin