Lustre / LU-11584

kernel BUG at ldiskfs.h:1907!

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0
    • Affects Version/s: Lustre 2.10.5
    • Labels: None
    • Severity: 1

    Description

      The server keeps crashing with the following error:

      [  981.957669] Lustre: nbp13-OST0008: trigger OI scrub by RPC for the [0x100080000:0x217edd:0x0] with flags 0x4a, rc = 0
      [  981.989579] Lustre: Skipped 11 previous similar messages
      [ 1045.404615] ------------[ cut here ]------------
      [ 1045.418484] kernel BUG at /tmp/rpmbuild-lustre-jlan-ItUrr9b3/BUILD/lustre-2.10.5/ldiskfs/ldiskfs.h:1907!
      [ 1045.446989] invalid opcode: 0000 [#1] SMP 
      [ 1045.459302] Modules linked in: ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) dm_service_time ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) lpfc ib_iser(OE) libiscsi scsi_transport_iscsi crct10dif_generic scsi_transport_fc scsi_tgt rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) bonding ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) sunrpc dm_mirror dm_region_hash dm_log mlx5_ib(OE) ib_core(OE) intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel i2c_algo_bit ttm dm_multipath aesni_intel drm_kms_helper lrw syscopyarea gf128mul sysfillrect sysimgblt glue_helper fb_sys_fops ablk_helper mlx5_core(OE) mlxfw(OE) tg3 ses cryptd mlx_compat(OE) drm ptp ipmi_si enclosure mei_me i2c_core pps_core hpwdt hpilo ipmi_devintf lpc_ich dm_mod mfd_core mei shpchp pcspkr wmi ipmi_msghandler acpi_power_meter binfmt_misc tcp_bic ip_tables virtio_scsi virtio_ring virtio xfs libcrc32c ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common sg usb_storage smartpqi(E) crc32c_intel scsi_transport_sas [last unloaded: pps_core]
      [ 1045.776428] CPU: 5 PID: 11348 Comm: lfsck Tainted: G           OE  ------------   3.10.0-693.21.1.el7.20180508.x86_64.lustre2105 #1
      [ 1045.811992] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/15/2018
      [ 1045.837624] task: ffff882ddca23f40 ti: ffff882bd280c000 task.ti: ffff882bd280c000
      [ 1045.860117] RIP: 0010:[<ffffffffa10fbd04>]  [<ffffffffa10fbd04>] ldiskfs_rec_len_to_disk.part.9+0x4/0x10 [ldiskfs]
      [ 1045.891259] RSP: 0018:ffff882bd280f980  EFLAGS: 00010207
      [ 1045.907218] RAX: 0000000000000000 RBX: ffff882bd280fb58 RCX: ffff882bd280f994
      [ 1045.928666] RDX: 00000000ffffffac RSI: ffffffffffffff81 RDI: 00000000ffffff81
      [ 1045.950113] RBP: ffff882bd280f980 R08: 00000000ffffff81 R09: ffffffffa10fded0
      [ 1045.971560] R10: ffff88303f803b00 R11: 0000000000ffffff R12: 000000000000003c
      [ 1045.993006] R13: ffff881e2eae7708 R14: ffff881e2eae7690 R15: 0000000000000000
      [ 1046.014452] FS:  0000000000000000(0000) GS:ffff882f7ef40000(0000) knlGS:0000000000000000
      [ 1046.038775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1046.056039] CR2: 00007ffff20df034 CR3: 0000002ef4268000 CR4: 00000000003607e0
      [ 1046.077485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1046.098932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1046.120378] Call Trace:
      [ 1046.127717]  [<ffffffffa10fe245>] htree_inlinedir_to_tree+0x445/0x450 [ldiskfs]
      [ 1046.149690]  [<ffffffff8123002e>] ? __generic_file_splice_read+0x4ee/0x5e0
      [ 1046.170356]  [<ffffffff81234cdd>] ? __getblk+0x2d/0x2e0
      [ 1046.186052]  [<ffffffff81234c4c>] ? __find_get_block+0xbc/0x120
      [ 1046.203841]  [<ffffffff81234cdd>] ? __getblk+0x2d/0x2e0
      [ 1046.219541]  [<ffffffffa10cdfa0>] ? __ldiskfs_get_inode_loc+0x110/0x3e0 [ldiskfs]
      [ 1046.242039]  [<ffffffffa10c89ef>] ? ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs]
      [ 1046.264536]  [<ffffffffa10c0277>] ldiskfs_htree_fill_tree+0x137/0x2f0 [ldiskfs]
      [ 1046.286507]  [<ffffffff811df826>] ? kmem_cache_alloc_trace+0x1d6/0x200
      [ 1046.306126]  [<ffffffffa10ae5ec>] ldiskfs_readdir+0x61c/0x850 [ldiskfs]
      [ 1046.326012]  [<ffffffffa1147640>] ? osd_declare_ref_del+0x130/0x130 [osd_ldiskfs]
      [ 1046.348507]  [<ffffffff812256b2>] ? generic_getxattr+0x52/0x70
      [ 1046.366036]  [<ffffffffa1145cde>] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs]
      [ 1046.387747]  [<ffffffffa1145eb7>] osd_it_ea_load+0x37/0x100 [osd_ldiskfs]
      [ 1046.408158]  [<ffffffffa122808c>] lfsck_open_dir+0x11c/0x3a0 [lfsck]
      [ 1046.427257]  [<ffffffffa1228cb2>] lfsck_master_oit_engine+0x9a2/0x1190 [lfsck]
      [ 1046.448969]  [<ffffffff816946f7>] ? __schedule+0x477/0xa30
      [ 1046.465453]  [<ffffffffa1229d96>] lfsck_master_engine+0x8f6/0x1360 [lfsck]
      [ 1046.486120]  [<ffffffff810c4d40>] ? wake_up_state+0x20/0x20
      [ 1046.502865]  [<ffffffffa12294a0>] ? lfsck_master_oit_engine+0x1190/0x1190 [lfsck]
      [ 1046.525360]  [<ffffffff810b1131>] kthread+0xd1/0xe0
      [ 1046.540011]  [<ffffffff810b1060>] ? insert_kthread_work+0x40/0x40
      [ 1046.558323]  [<ffffffff816a14dd>] ret_from_fork+0x5d/0xb0
      [ 1046.574540]  [<ffffffff810b1060>] ? insert_kthread_work+0x40/0x40
      [ 1046.592852] Code: 44 04 02 48 8d 44 03 c8 48 01 c7 e8 b7 f6 22 e0 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 0f 0b 0f 1f 40 00 55 48 89 e5 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 85 f6 48 
      [ 1046.650192] RIP  [<ffffffffa10fbd04>] ldiskfs_rec_len_to_disk.part.9+0x4/0x10 [ldiskfs]
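The BUG fires in ldiskfs_rec_len_to_disk(), called from htree_inlinedir_to_tree() while the lfsck thread was reading a directory (ldiskfs_readdir → ldiskfs_htree_fill_tree). ldiskfs is generated from the kernel's ext4 sources, and in upstream ext4 of this era the equivalent ext4_rec_len_to_disk() calls BUG() whenever the requested record length exceeds the block size, the block size exceeds 256 KiB, or the length is not a multiple of 4. The user-space sketch below mirrors that check, assuming ldiskfs.h:1907 is this same check; Lustre's ldiskfs patch series (for example data-in-dirent support) may change the details. DIR_REC_LEN, rec_len_triggers_bug and rec_len_to_disk_4k are hypothetical names, and assert() stands in for BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for upstream EXT4_DIR_REC_LEN(): an 8-byte fixed
 * header (inode, rec_len, name_len, file_type) plus the name, rounded up
 * to a multiple of 4 bytes. ldiskfs may compute this differently. */
#define DIR_ROUND 3u
#define DIR_REC_LEN(name_len) (((unsigned)(name_len) + 8u + DIR_ROUND) & ~DIR_ROUND)

/* Returns true when upstream ext4_rec_len_to_disk() would call BUG():
 * length larger than the block, block larger than 256 KiB (1 << 18),
 * or length not 4-byte aligned. */
static bool rec_len_triggers_bug(unsigned len, unsigned blocksize)
{
    return (len > blocksize) || (blocksize > (1u << 18)) || (len & 3u) != 0;
}

/* Hypothetical user-space analogue for 4 KiB-page systems such as x86_64,
 * where the on-disk rec_len is the length stored as a 16-bit value
 * (little-endian host assumed, so no byte swap is shown).
 * assert() plays the role of the kernel's BUG(). */
static uint16_t rec_len_to_disk_4k(unsigned len, unsigned blocksize)
{
    assert(!rec_len_triggers_bug(len, blocksize));
    return (uint16_t)len;
}
```

For example, a 12-byte "." entry inside a 60-byte inline-data area (the size of the inode's i_block array) passes the check, while a length that has gone negative and been reinterpreted as unsigned, such as 0xffffff81, would trip it.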
      
      

      Attachments

        1. debug-lfsck-nbp15-MDT0000.gz (60 kB)
        2. dumpe2fs.out (36 kB)
        3. nbp13.debug.gz (24.76 MB)
        4. nbp13.lfsck.debug.out1.gz (297 kB)
        5. nbp13.lfsck.debug.out2.gz (4 kB)
        6. oi_scrub.out (6 kB)

            People

              Assignee: Alex Zhuravlev (bzzz)
              Reporter: Mahmoud Hanafi (mhanafi)
              Votes: 0
              Watchers: 9
