Lustre / LU-12719

crash in lustre_find_lwp_by_index()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Severity: 3

    Description

      This is from DDN-777, where oss2c106 hit the following crash:

      [17555139.368778] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      [17555139.371091] IP: [<ffffffffc08c165d>] class_export_get+0xd/0x90 [obdclass]
      [17555139.373230] PGD 0 
      [17555139.374784] Oops: 0002 [#1] SMP 
      [17555139.376492] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) nfsv3 nfs fscache bonding rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) ptp pps_core mlx4_ib(OE) ib_core(OE) mlx4_core(OE) mlx_compat(OE) devlink sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper sg i6300esb joydev ppdev parport_pc parport sfablkdriver(OE) cryptd i2c_piix4 pcspkr nfsd auth_rpcgss nfs_acl lockd knem(OE) grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod sr_mod
      [17555139.393461]  cdrom crc_t10dif crct10dif_generic ata_generic pata_acpi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata e1000 crct10dif_pclmul crct10dif_common crc32c_intel igbvf i2c_core floppy serio_raw dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mlxfw]
      [17555139.400726] CPU: 8 PID: 24841 Comm: lfsck Tainted: G           OE  ------------   3.10.0-693.21.1.el7_lustre.2.7.21.3.ddn19.g2ba2cd7.x86_64 #1
      [17555139.404837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      [17555139.408957] task: ffff88081580dee0 ti: ffff880dee718000 task.ti: ffff880dee718000
      [17555139.411193] RIP: 0010:[<ffffffffc08c165d>]  [<ffffffffc08c165d>] class_export_get+0xd/0x90 [obdclass]
      [17555139.413660] RSP: 0018:ffff880dee71bba8  EFLAGS: 00010286
      [17555139.415636] RAX: ffff8814538b800c RBX: 0000000000000000 RCX: 000000000000001a
      [17555139.417801] RDX: 000000000000000e RSI: ffff880dee71bbd0 RDI: 0000000000000000
      [17555139.419941] RBP: ffff880dee71bbb0 R08: 0000000000000031 R09: 0000000000000003
      [17555139.422077] R10: 0000000000000000 R11: 000000000000000f R12: ffff881660650fc0
      [17555139.424199] R13: ffff88149a1dadb4 R14: ffff881660650fd0 R15: ffff88145435a000
      [17555139.426297] FS:  0000000000000000(0000) GS:ffff881664000000(0000) knlGS:0000000000000000
      [17555139.428493] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [17555139.430408] CR2: 0000000000000040 CR3: 0000000001a06000 CR4: 00000000003607e0
      [17555139.432470] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [17555139.434498] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [17555139.436498] Call Trace:
      [17555139.437987]  [<ffffffffc091ee11>] lustre_find_lwp_by_index+0x161/0x190 [obdclass]
      [17555139.440034]  [<ffffffffc10dad69>] lfsck_layout_slave_notify_master+0x199/0x9a0 [lfsck]
      [17555139.442104]  [<ffffffffc10e0a97>] lfsck_layout_slave_double_scan+0xf7/0xe40 [lfsck]
      [17555139.444121]  [<ffffffff810c7c70>] ? wake_up_state+0x20/0x20
      [17555139.445889]  [<ffffffffc10a312f>] lfsck_double_scan+0x5f/0x210 [lfsck]
      [17555139.447751]  [<ffffffffc10479ed>] ? osd_otable_it_fini+0x1ad/0x3b0 [osd_ldiskfs]
      [17555139.449695]  [<ffffffff811e3896>] ? kfree+0x106/0x140
      [17555139.451391]  [<ffffffffc10a88b6>] lfsck_master_engine+0x446/0x13f0 [lfsck]
      [17555139.453276]  [<ffffffff810c7c70>] ? wake_up_state+0x20/0x20
      [17555139.455000]  [<ffffffffc10a8470>] ? lfsck_master_oit_engine+0x1d10/0x1d10 [lfsck]
      [17555139.456932]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      [17555139.458556]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      [17555139.460299]  [<ffffffff816c1577>] ret_from_fork+0x77/0xb0
      [17555139.461949]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      [17555139.463658] Code: 85 3d e4 4b f0 ff 55 48 89 e5 74 0c 8b 05 dc 4b f0 ff c1 e8 05 83 e0 01 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb <f0> ff 47 40 f6 05 b4 4b f0 ff 40 74 09 f6 05 af 4b f0 ff 20 75 
      [17555139.469009] RIP  [<ffffffffc08c165d>] class_export_get+0xd/0x90 [obdclass]
      [17555139.470849]  RSP <ffff880dee71bba8>
      [17555139.472275] CR2: 0000000000000040
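
As a reading aid, the register dump and the `Code:` line above can be cross-checked by hand. The sketch below is not part of the original report: it decodes the page-fault error code (`Oops: 0002`) and the four bytes starting at the `<f0>` marker, then checks that the base register plus the displacement matches CR2. The helper names are hypothetical, and the decoder only handles the single instruction form seen here (`lock incl disp8(%reg)`). The result is consistent with class_export_get() being passed a NULL pointer and trying to increment a field at offset 0x40; that this field is the export's reference count is an assumption based on the function name, not something the trace states.

```python
# Values copied from the oops above.
OOPS_ERROR_CODE = 0x0002
CR2 = 0x40
RDI = 0x0
FAULTING_BYTES = bytes.fromhex("f0ff4740")  # bytes starting at the <f0> marker


def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # False: the page was not present
        "write_access": bool(code & 0x2),  # True: the fault was on a write
        "user_mode": bool(code & 0x4),     # False: the fault was in kernel mode
    }


def decode_locked_inc(insn):
    """Decode only the form `lock incl disp8(%reg)`; return (reg, disp)."""
    regs = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    lock, opcode, modrm, disp = insn[0], insn[1], insn[2], insn[3]
    if lock != 0xF0 or opcode != 0xFF:
        raise ValueError("not a lock-prefixed 0xFF-group instruction")
    mod, reg_field, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod == 1: 8-bit displacement; reg_field == 0: INC; rm == 4 would need a SIB byte.
    if mod != 1 or reg_field != 0 or rm == 4:
        raise ValueError("unsupported encoding for this sketch")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return regs[rm], disp


flags = decode_page_fault_error(OOPS_ERROR_CODE)
base_reg, disp = decode_locked_inc(FAULTING_BYTES)
fault_addr = RDI + disp  # base_reg decodes to rdi, whose value in the dump is 0

print(flags)
print(f"lock incl {disp:#x}(%{base_reg}) -> address {fault_addr:#x}, CR2 {CR2:#x}")
```

The decoded write through a NULL `%rdi` at displacement 0x40 agrees with the reported fault address, which points at the caller, lustre_find_lwp_by_index(), handing over a NULL export rather than at a corrupted pointer.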
      

      This looks like an lfsck bug; I'll first check whether it is already fixed in master.

          People

            Assignee: Lai Siyao (laisiyao)
            Reporter: Lai Siyao (laisiyao)