  1. Lustre
  2. LU-6361 LFSCK 4: improve LFSCK performance
  3. LU-6351

LFSCK MDS crash: unable to handle kernel NULL pointer dereference

    Details

    • Type: Technical task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.7.0
    • Fix Version/s: Lustre 2.8.0
    • Environment:
      OpenSFS cluster running Lustre 2.7.0-RC-4 build # 29 with two MDSs with two MDTs each, three OSSs with two OSTs each, and three clients.
    • Rank (Obsolete):
      17773

      Description

      While running the stability test from the LFSCK Phase 3 test plan, the primary MDS, which hosts MDT0 and MDT2, crashed. The stability test creates 150 directories with 10,000 objects each (files, striped directories, links, etc.). Then one process runs LFSCK namespace over and over while another process deletes the 150 directories and their objects, and all other processes create directories with files, striped directories, etc.

      The first LFSCK namespace run on all four MDTs completes. The second time LFSCK is called, the primary MDS crashes. When the MDT comes back, I see that the status of LFSCK is “crashed”:

      do_facet mds1 lctl get_param -n mdd.scratch-MDT0000.lfsck_namespace
      status = crashed
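
A minimal sketch of how the status check above could be automated in a test loop. The function name `parse_lfsck_status` is hypothetical, and accepting both `=` and `:` as separators is an assumption about how the parameter may be printed; only the `status = crashed` sample comes from this report.

```python
import re


def parse_lfsck_status(text):
    """Return the value of the 'status' field, or None if it is absent.

    Accepts 'status = value' (as quoted in this report) and, as an
    assumption, 'status: value'.
    """
    for line in text.splitlines():
        match = re.match(r"\s*status\s*[:=]\s*(\S+)", line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    sample = "status = crashed"  # output quoted in the description
    print(parse_lfsck_status(sample))
```

A caller could feed it the captured output of the `lctl get_param` command shown above and stop the stress loop as soon as the result is `crashed`.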
      

      When the MDS comes back, I see the following errors in dmesg:

      Lustre: scratch-MDT0002-osp-MDT0000: Connection restored to scratch-MDT0002 (at 0@lo)
      Lustre: scratch-MDT0002: Recovery over after 0:05, of 9 clients 9 recovered and 0 were evicted.
      LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166586, ql: 1, comp: 8, conn: 9, next: 4328166588, last_committed: 4328166570)
      LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166599, ql: 1, comp: 8, conn: 9, next: 4328166601, last_committed: 4328166570)
      LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166607, ql: 1, comp: 8, conn: 9, next: 4328166609, last_committed: 4328166570)
      LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166613, ql: 1, comp: 8, conn: 9, next: 4328166615, last_committed: 4328166570)
      Lustre: scratch-MDT0000-osp-MDT0002: Connection restored to scratch-MDT0000 (at 0@lo)
      Lustre: scratch-MDT0000: Recovery over after 0:35, of 9 clients 9 recovered and 0 were evicted.
      

      From the vmcore-dmesg:

      <1>BUG: unable to handle kernel NULL pointer dereference at 00000000000000ae
      <1>IP: [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
      <4>PGD 0 
      <4>Oops: 0000 [#1] SMP 
      <4>last sysfs file: /sys/devices/system/cpu/online
      <4>CPU 9 
      <4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) ldiskfs(U) sha512_generic sha256_generic crc32c_intel jbd2 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode iTCO_wdt iTCO_vendor_support serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core ses enclosure sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      <4>
      <4>Pid: 8092, comm: lfsck_namespace Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
      <4>RIP: 0010:[<ffffffffa0dee512>]  [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
      <4>RSP: 0018:ffff880b446cdaa0  EFLAGS: 00010246
      <4>RAX: 0000000000000000 RBX: ffff8807e2a16900 RCX: ffff880a0bedeb14
      <4>RDX: ffff8801f990d070 RSI: ffff8807e2a16900 RDI: ffff880191c11e40
      <4>RBP: ffff880b446cdb30 R08: fffffffffffffffe R09: ffffffffa0dee430
      <4>R10: 0000000000000000 R11: 0000000000000002 R12: ffff880191c11e40
      <4>R13: ffff880191c11e40 R14: ffff880a0bedeb14 R15: ffff880a0bedeb14
      <4>FS:  0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 00000000000000ae CR3: 0000000001a85000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process lfsck_namespace (pid: 8092, threadinfo ffff880b446cc000, task ffff880c28df0040)
      <4>Stack:
      <4> ffff8801ac526000 ffff880a0bedeb14 000000000000001a ffff8801f990d350
      <4><d> ffff880a0bedeae8 0000000000004000 ffff880b446cdb30 ffffffff8128daa4
      <4><d> ffff8801f990d070 ffff880b446cdb40 ffff880b446cdb00 ffffffffa05e22c3
      <4>Call Trace:
      <4> [<ffffffff8128daa4>] ? snprintf+0x34/0x40
      <4> [<ffffffffa05e22c3>] ? fld_server_lookup+0x53/0x330 [fld]
      <4> [<ffffffffa0eee082>] lfsck_namespace_check_exist+0xd2/0x410 [lfsck]
      <4> [<ffffffffa0f24fc6>] lfsck_namespace_handle_striped_master+0x1b6/0xb50 [lfsck]
      <4> [<ffffffffa0868931>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
      <4> [<ffffffffa0ef1532>] lfsck_namespace_assistant_handler_p1+0xb52/0x2310 [lfsck]
      <4> [<ffffffff81170b79>] ? __drain_alien_cache+0x89/0xa0
      <4> [<ffffffffa0ee16e6>] lfsck_assistant_engine+0x496/0x1de0 [lfsck]
      <4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa0ee1250>] ? lfsck_assistant_engine+0x0/0x1de0 [lfsck]
      <4> [<ffffffff8109abf6>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>Code: 05 04 af 04 00 a3 16 00 00 48 c7 05 05 af 04 00 00 00 00 00 c7 05 f3 ae 04 00 01 00 00 00 e8 76 5c 76 ff 4c 8b 45 90 48 8b 43 40 <0f> b7 80 ae 00 00 00 25 00 f0 00 00 3d 00 40 00 00 0f 85 75 0a
      <1>RIP  [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
      <4> RSP <ffff880b446cdaa0>
      <4>CR2: 00000000000000ae
      

      I will upload the vmcore.
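
The `Code:` bytes in the oops can be read by hand, and the sketch below does that for the faulting instruction. It assumes the standard x86-64 encodings `0f b7 80 <disp32>` (16-bit zero-extending load from `disp32(%rax)`), `25 <imm32>` (`and` with `%eax`) and `3d <imm32>` (`cmp` with `%eax`). The helper names are hypothetical. The result is consistent with an `S_ISDIR()`-style test on a 16-bit mode field at offset 0xae of a structure whose pointer, held in RAX, was NULL. That this structure is the object's inode is an assumption to be confirmed from the vmcore.

```python
import stat

# "Code:" bytes from the oops above; <0f> marks the faulting instruction.
CODE_LINE = (
    "05 04 af 04 00 a3 16 00 00 48 c7 05 05 af 04 00 00 00 00 00 c7 05 f3 ae "
    "04 00 01 00 00 00 e8 76 5c 76 ff 4c 8b 45 90 48 8b 43 40 <0f> b7 80 ae "
    "00 00 00 25 00 f0 00 00 3d 00 40 00 00 0f 85 75 0a"
)
RAX = 0x0   # from the register dump
CR2 = 0xAE  # faulting address from the register dump


def faulting_bytes(code_line):
    """Return the bytes starting at the instruction marked <xx>."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mode_check(code):
    """Decode 'movzwl disp32(%rax),%eax; and $imm,%eax; cmp $imm,%eax'."""
    if code[:3] != bytes([0x0F, 0xB7, 0x80]):
        raise ValueError("not a movzwl disp32(%rax),%eax")
    if code[7] != 0x25 or code[12] != 0x3D:
        raise ValueError("not followed by and/cmp with immediates")
    disp = int.from_bytes(code[3:7], "little")
    mask = int.from_bytes(code[8:12], "little")
    want = int.from_bytes(code[13:17], "little")
    return disp, mask, want


if __name__ == "__main__":
    disp, mask, want = decode_mode_check(faulting_bytes(CODE_LINE))
    print(hex(disp), hex(mask), hex(want))
    # The load address is RAX + disp, which matches CR2.
    print(RAX + disp == CR2)
    # The mask and compare value match the file-type mask and S_IFDIR.
    print(mask == stat.S_IFMT(0xFFFF), want == stat.S_IFDIR)
```

The preceding bytes `48 8b 43 40` load RAX from offset 0x40 of the structure addressed by RBX, so the NULL value comes from that field rather than from a NULL RBX.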

            People

            • Assignee:
              yong.fan nasf (Inactive)
              Reporter:
              jamesanunez James Nunez