LU-17048: Crash in lod_declare_update_extents

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 3

    Description

      After additional racer testing for migration and mirroring was added by patches https://review.whamcloud.com/13669 and https://review.whamcloud.com/41368 (not sure which of the two is the culprit here), a frequent crash cropped up:

       [ 6876.856207] BUG: unable to handle kernel NULL pointer dereference at 0000000000000006
      [ 6876.857009] IP: [<ffffffffa0f50aa6>] lod_declare_update_extents.isra.57+0x86/0x1910 [lod]
      [ 6876.857009] PGD 2aab7d067 PUD 322652067 PMD 0 
      [ 6876.857009] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 6876.857009] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) dm_flakey dm_mod libcfs(OE) loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) jbd2 mbcache crc32_generic crc_t10dif crct10dif_generic crct10dif_common virtio_console virtio_balloon pcspkr i2c_piix4 ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix drm_panel_orientation_quirks serio_raw virtio_blk libata i2c_core floppy [last unloaded: libcfs]
      [ 6876.866880] CPU: 11 PID: 5226 Comm: mdt05_000 Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.9-debug #2
      [ 6876.866880] Hardware name: Red Hat KVM, BIOS 1.16.0-3.module_el8.7.0+1218+f626c2ff 04/01/2014
      [ 6876.866880] task: ffff8800a5780010 ti: ffff88028bde8000 task.ti: ffff88028bde8000
      [ 6876.866880] RIP: 0010:[<ffffffffa0f50aa6>]  [<ffffffffa0f50aa6>] lod_declare_update_extents.isra.57+0x86/0x1910 [lod]
      [ 6876.866880] RSP: 0018:ffff88028bdeb738  EFLAGS: 00010246
      [ 6876.866880] RAX: 0000000000000000 RBX: ffff8802a92d9b40 RCX: ffff8802a28d6378
      [ 6876.866880] RDX: ffff880079bc3ef8 RSI: ffffffffa0f812e0 RDI: ffff880323d607d8
      [ 6876.866880] RBP: ffff88028bdeb830 R08: 0000000000000000 R09: 0000000000000000
      [ 6876.866880] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88026c882e98
      [ 6876.866880] R13: ffff8800a29c0ff8 R14: ffff880323d607d8 R15: ffff8802a92d9b40
      [ 6876.866880] FS:  0000000000000000(0000) GS:ffff880331cc0000(0000) knlGS:0000000000000000
      [ 6876.866880] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 6876.866880] CR2: 0000000000000006 CR3: 00000002ad280000 CR4: 00000000000007e0
      [ 6876.866880] Call Trace:
      [ 6876.882318]  [<ffffffffa0c0a1d0>] ? osd_trans_create+0xa0/0x690 [osd_ldiskfs]
      [ 6876.882318]  [<ffffffffa03ecf88>] ? lu_buf_alloc+0x58/0x320 [obdclass]
      [ 6876.882318]  [<ffffffffa0c0a1d0>] ? osd_trans_create+0xa0/0x690 [osd_ldiskfs]
      [ 6876.882318]  [<ffffffffa0f34098>] ? lod_striping_load+0x98/0x6c0 [lod]
      [ 6876.884562]  [<ffffffffa0f592aa>] lod_declare_update_plain+0x56a/0x980 [lod]
      [ 6876.884562]  [<ffffffff8122054f>] ? __kmalloc+0x1ef/0x370
      [ 6876.884562]  [<ffffffffa0f5b83e>] lod_declare_layout_change+0x69e/0xba0 [lod]
      [ 6876.884562]  [<ffffffffa03ecfa1>] ? lu_buf_alloc+0x71/0x320 [obdclass]
      [ 6876.884562]  [<ffffffffa0dd339b>] mdd_declare_layout_change+0x4b/0x100 [mdd]
      [ 6876.884562]  [<ffffffffa0dde491>] mdd_layout_change+0xd91/0x1bc0 [mdd]
      [ 6876.884562]  [<ffffffffa0e3db3f>] mdt_layout_change+0x2bf/0x450 [mdt]
      [ 6876.884562]  [<ffffffffa0e444f0>] mdt_intent_layout+0x910/0xeb0 [mdt]
      [ 6876.884562]  [<ffffffffa0e3b4fc>] mdt_intent_opc+0x1dc/0xc40 [mdt]
      [ 6876.884562]  [<ffffffffa0e43be0>] ? mdt_intent_open+0x480/0x480 [mdt]
      [ 6876.884562]  [<ffffffffa0e4134a>] mdt_intent_policy+0xfa/0x460 [mdt]
      [ 6876.884562]  [<ffffffffa06941b1>] ldlm_lock_enqueue+0x3b1/0xbb0 [ptlrpc]
      [ 6876.884562]  [<ffffffffa01c0f05>] ? cfs_hash_rw_unlock+0x15/0x20 [libcfs]
      [ 6876.884562]  [<ffffffffa01c4186>] ? cfs_hash_add+0xa6/0x180 [libcfs]
      [ 6876.884562]  [<ffffffffa06bc4e5>] ldlm_handle_enqueue+0x375/0x17d0 [ptlrpc]
      [ 6876.884562]  [<ffffffffa063ca00>] ? lustre_msg_buf_v2+0x1e0/0x1f0 [ptlrpc]
      [ 6876.884562]  [<ffffffffa06fe958>] tgt_enqueue+0x68/0x240 [ptlrpc]
      [ 6876.894516]  [<ffffffffa0708f8e>] tgt_request_handle+0x88e/0x19b0 [ptlrpc]
      [ 6876.894784]  [<ffffffffa064ebb1>] ptlrpc_server_handle_request+0x251/0xc00 [ptlrpc]
      [ 6876.894784]  [<ffffffffa06508c6>] ptlrpc_main+0xc66/0x1670 [ptlrpc]
      [ 6876.894784]  [<ffffffff810dbb51>] ? put_prev_entity+0x31/0x400
      [ 6876.894784]  [<ffffffff814119f9>] ? do_raw_spin_unlock+0x49/0x90
      [ 6876.894784]  [<ffffffffa064fc60>] ? ptlrpc_wait_event+0x620/0x620 [ptlrpc]
      [ 6876.894784]  [<ffffffff810ba114>] kthread+0xe4/0xf0
      [ 6876.894784]  [<ffffffff810ba030>] ? kthread_create_on_node+0x140/0x140
      [ 6876.894784]  [<ffffffff817ede5d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 6876.894784]  [<ffffffff810ba030>] ? kthread_create_on_node+0x140/0x140
      [ 6876.894784] Code: 28 ff 01 74 0d f6 05 c5 2a 28 ff 04 0f 85 92 07 00 00 48 63 45 b8 48 8d 04 80 48 01 c0 48 89 85 68 ff ff ff 49 03 87 98 00 00 00 <0f> b7 58 06 0f b7 40 08 83 c0 01 41 f6 87 a0 00 00 00 1e 44 0f 
      [ 6876.901347] RIP  [<ffffffffa0f50aa6>] lod_declare_update_extents.isra.57+0x86/0x1910 [lod]
      [ 6876.901347]  RSP <ffff88028bdeb738>
      [ 6876.901347] CR2: 0000000000000006
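
A rough reading of the oops above, offered as a sketch rather than a diagnosis: the bytes marked in the Code: line, `0f b7 58 06`, decode as a 16-bit zero-extending load from `[rax + 6]`, and RAX is 0, which matches the faulting address 0x6 in CR2. That would be consistent with reading a 16-bit field at offset 6 through a NULL base pointer. The helper name `parse_oops` and the assumed 4-byte instruction length are illustrative only and are not part of any Lustre or kernel tooling.

```python
import re

# Excerpt copied from the oops above (only the lines this sketch needs).
OOPS = """\
[ 6876.866880] RAX: 0000000000000000 RBX: ffff8802a92d9b40 RCX: ffff8802a28d6378
[ 6876.866880] CR2: 0000000000000006 CR3: 00000002ad280000 CR4: 00000000000007e0
[ 6876.894784] Code: 28 ff 01 74 0d f6 05 c5 2a 28 ff 04 0f 85 92 07 00 00 48 63 45 b8 48 8d 04 80 48 01 c0 48 89 85 68 ff ff ff 49 03 87 98 00 00 00 <0f> b7 58 06 0f b7 40 08 83 c0 01 41 f6 87 a0 00 00 00 1e 44 0f
"""


def parse_oops(text):
    """Return (rax, cr2, faulting_bytes) from an x86-64 oops excerpt.

    Hypothetical helper for illustration only.
    """
    rax = int(re.search(r"RAX: ([0-9a-f]{16})", text).group(1), 16)
    cr2 = int(re.search(r"CR2: ([0-9a-f]{16})", text).group(1), 16)
    code = re.search(r"Code: (.*)", text).group(1).split()
    # The kernel wraps the first byte of the faulting instruction in <>.
    start = next(i for i, b in enumerate(code) if b.startswith("<"))
    # Assumption: the instruction is 4 bytes long (opcode 0f b7, ModRM, disp8),
    # based on a manual decode rather than a real disassembler.
    faulting = [b.strip("<>") for b in code[start:start + 4]]
    return rax, cr2, faulting


rax, cr2, faulting = parse_oops(OOPS)
# 0f b7 /r is MOVZX r32, r/m16; ModRM 0x58 selects [rax + disp8] with ebx as target.
disp8 = int(faulting[3], 16)
print(f"RAX={rax:#x} CR2={cr2:#x} bytes={' '.join(faulting)} disp8={disp8:#x}")
print("fault address == RAX + disp8:", cr2 == rax + disp8)
```

The crashdumps linked in this ticket would be the place to confirm which pointer was actually NULL; this sketch only cross-checks the register dump against the faulting address.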

      Here are crashes with crashdumps:

      http://testing.linuxhacker.ru/lustre-reports/external/crashes/boilpot-bigmem-100-2023-08-20-07:06:56/

      http://testing.linuxhacker.ru/lustre-reports/external/crashes/boilpot-bigmem-115-2023-08-21-05:51:53/

      http://testing.linuxhacker.ru/lustre-reports/external/crashes/boilpot-bigmem-105-2023-08-22-03:00:09/

      http://testing.linuxhacker.ru/lustre-reports/external/crashes/boilpot-bigmem-50-2023-08-22-10:12:59/

       

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: green Oleg Drokin