Project: Lustre
Parent: LU-6361 LFSCK 4: improve LFSCK performance
Issue: LU-6351

LFSCK MDS crash: unable to handle kernel NULL pointer dereference

Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.7.0
    • Environment: OpenSFS cluster running Lustre 2.7.0-RC-4 build #29, with two MDSs (two MDTs each), three OSSs (two OSTs each), and three clients.
    • 17773

    Description

      While running the stability test from the LFSCK Phase 3 test plan, the primary MDS, containing MDT0 and MDT2, crashed. The stability test creates 150 directories with 10,000 objects each: files, striped directories, links, etc. Then one process runs LFSCK namespace repeatedly while another process deletes the 150 directories and their objects, and all other processes create directories with files, striped directories, etc.

      The first LFSCK namespace run on all four MDTs completes. The second time LFSCK is called, the primary MDS crashes. When the MDT comes back, the LFSCK status is “crashed”:

      do_facet mds1 lctl get_param -n mdd.scratch-MDT0000.lfsck_namespace
      status = crashed
      

      When the MDS comes back, I see the following errors in dmesg:

      Lustre: scratch-MDT0002-osp-MDT0000: Connection restored to scratch-MDT0002 (at 0@lo)
      Lustre: scratch-MDT0002: Recovery over after 0:05, of 9 clients 9 recovered and 0 were evicted.
      LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166586, ql: 1, comp: 8, conn: 9, next: 4328166588, last_committed: 4328166570)
      LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166599, ql: 1, comp: 8, conn: 9, next: 4328166601, last_committed: 4328166570)
      LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166607, ql: 1, comp: 8, conn: 9, next: 4328166609, last_committed: 4328166570)
      LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166613, ql: 1, comp: 8, conn: 9, next: 4328166615, last_committed: 4328166570)
      Lustre: scratch-MDT0000-osp-MDT0002: Connection restored to scratch-MDT0000 (at 0@lo)
      Lustre: scratch-MDT0000: Recovery over after 0:35, of 9 clients 9 recovered and 0 were evicted.
      

      From the vmcore-dmesg:

      <1>BUG: unable to handle kernel NULL pointer dereference at 00000000000000ae
      <1>IP: [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
      <4>PGD 0 
      <4>Oops: 0000 [#1] SMP 
      <4>last sysfs file: /sys/devices/system/cpu/online
      <4>CPU 9 
      <4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) ldiskfs(U) sha512_generic sha256_generic crc32c_intel jbd2 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode iTCO_wdt iTCO_vendor_support serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core ses enclosure sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      <4>
      <4>Pid: 8092, comm: lfsck_namespace Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
      <4>RIP: 0010:[<ffffffffa0dee512>]  [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
      <4>RSP: 0018:ffff880b446cdaa0  EFLAGS: 00010246
      <4>RAX: 0000000000000000 RBX: ffff8807e2a16900 RCX: ffff880a0bedeb14
      <4>RDX: ffff8801f990d070 RSI: ffff8807e2a16900 RDI: ffff880191c11e40
      <4>RBP: ffff880b446cdb30 R08: fffffffffffffffe R09: ffffffffa0dee430
      <4>R10: 0000000000000000 R11: 0000000000000002 R12: ffff880191c11e40
      <4>R13: ffff880191c11e40 R14: ffff880a0bedeb14 R15: ffff880a0bedeb14
      <4>FS:  0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 00000000000000ae CR3: 0000000001a85000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process lfsck_namespace (pid: 8092, threadinfo ffff880b446cc000, task ffff880c28df0040)
      <4>Stack:
      <4> ffff8801ac526000 ffff880a0bedeb14 000000000000001a ffff8801f990d350
      <4><d> ffff880a0bedeae8 0000000000004000 ffff880b446cdb30 ffffffff8128daa4
      <4><d> ffff8801f990d070 ffff880b446cdb40 ffff880b446cdb00 ffffffffa05e22c3
      <4>Call Trace:
      <4> [<ffffffff8128daa4>] ? snprintf+0x34/0x40
      <4> [<ffffffffa05e22c3>] ? fld_server_lookup+0x53/0x330 [fld]
      <4> [<ffffffffa0eee082>] lfsck_namespace_check_exist+0xd2/0x410 [lfsck]
      <4> [<ffffffffa0f24fc6>] lfsck_namespace_handle_striped_master+0x1b6/0xb50 [lfsck]
      <4> [<ffffffffa0868931>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
      <4> [<ffffffffa0ef1532>] lfsck_namespace_assistant_handler_p1+0xb52/0x2310 [lfsck]
      <4> [<ffffffff81170b79>] ? __drain_alien_cache+0x89/0xa0
      <4> [<ffffffffa0ee16e6>] lfsck_assistant_engine+0x496/0x1de0 [lfsck]
      <4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa0ee1250>] ? lfsck_assistant_engine+0x0/0x1de0 [lfsck]
      <4> [<ffffffff8109abf6>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>Code: 05 04 af 04 00 a3 16 00 00 48 c7 05 05 af 04 00 00 00 00 00 c7 05 f3 ae 04 00 01 00 00 00 e8 76 5c 76 ff 4c 8b 45 90 48 8b 43 40 <0f> b7 80 ae 00 00 00 25 00 f0 00 00 3d 00 40 00 00 0f 85 75 0a
      <1>RIP  [<ffffffffa0dee512>] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
      <4> RSP <ffff880b446cdaa0>
      <4>CR2: 00000000000000ae
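
      The faulting bytes in the Code line decode to `movzwl 0xae(%rax),%eax` followed by `and $0xf000,%eax; cmp $0x4000,%eax`, i.e. an `S_ISDIR(inode->i_mode)` test; with RAX = 0 the 16-bit load from offset 0xae of a NULL inode pointer yields exactly the CR2 value reported above. A minimal userspace sketch of this arithmetic (the struct layout below is a mock inferred from the fault address, not the real kernel `struct inode`):

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Mock layout: on this kernel, i_mode evidently sits at offset
       * 0xae inside struct inode (inferred from CR2 and the decoded
       * instruction, not from the real kernel headers). */
      struct mock_inode {
              char pad[0xae];          /* fields preceding i_mode */
              unsigned short i_mode;   /* movzwl 0xae(%rax),%eax reads this */
      };

      int main(void)
      {
              /* A NULL inode base plus the i_mode offset gives the exact
               * faulting address reported in CR2: 0x00000000000000ae. */
              assert(offsetof(struct mock_inode, i_mode) == 0xae);
              printf("fault address = %#zx\n",
                     offsetof(struct mock_inode, i_mode));
              return 0;
      }
      ```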
      

      I will upload the vmcore.

      Attachments

        Activity


          Gerrit Updater added a comment:
          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14009/
          Subject: LU-6351 lfsck: check object existence before using it
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: f3ea0cea6bb6766eaa55571774b9ae942a6bf297

          Gerrit Updater added a comment:
          Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/14009
          Subject: LU-6351 lfsck: check object existence before using it
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 776db8e76865f86e3be511375c55479b0817b559

          yong.fan nasf (Inactive) added a comment:
          In some cases, when LFSCK locates an object via its FID, it does not check whether the object actually exists; further use of such an object can then dereference a NULL local object (the inode, for ldiskfs).

          Part of the issue was fixed by the patch http://review.whamcloud.com/#/c/13993/, but that is not enough. I will make another patch for the remaining issues.
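
          The pattern the fix describes, "check object existence before using it", can be sketched in userspace C as follows. This is an illustrative mock, not Lustre's real API: `ns_object` and `ns_lookup` are hypothetical names standing in for an LFSCK-side lookup that must verify the FID-located object is backed by a local inode before dereferencing it.

          ```c
          #include <assert.h>
          #include <errno.h>
          #include <stddef.h>

          /* Hypothetical stand-in for an object located by FID; inode is
           * NULL when the FID maps to no local object on this target. */
          struct ns_object {
                  void *inode;
          };

          static int ns_lookup(const struct ns_object *obj)
          {
                  /* The fix: bail out with -ENOENT instead of blindly
                   * dereferencing a NULL backing inode. */
                  if (obj == NULL || obj->inode == NULL)
                          return -ENOENT;
                  return 0;   /* safe to use obj->inode past this point */
          }

          int main(void)
          {
                  struct ns_object missing = { .inode = NULL };
                  int backing;
                  struct ns_object present = { .inode = &backing };

                  assert(ns_lookup(&missing) == -ENOENT);  /* clean error, no oops */
                  assert(ns_lookup(&present) == 0);
                  return 0;
          }
          ```

          Without such a guard, the first use of the missing object's inode reproduces exactly the NULL dereference seen in the trace above.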

          jamesanunez James Nunez (Inactive) added a comment:
          On one of the OSTs:

          Lustre: MGC192.168.2.125@o2ib: Connection restored to MGS (at 192.168.2.125@o2ib)
          Lustre: Skipped 88 previous similar messages
          LustreError: 167-0: scratch-MDT0002-lwp-OST0005: This client was evicted by scratch-MDT0002; in progress operations using this service will fail.
          LustreError: Skipped 69 previous similar messages
          Lustre: scratch-OST0005: deleting orphan objects from 0x0:686571 to 0x0:686625
          Lustre: scratch-OST0004: deleting orphan objects from 0x0:686568 to 0x0:686625
          LustreError: 3986:0:(ofd_grant.c:183:ofd_grant_sanity_check()) ofd_statfs: tot_granted 262912 != fo_tot_granted 85590784
          LustreError: 3986:0:(ofd_grant.c:189:ofd_grant_sanity_check()) ofd_statfs: tot_dirty 0 != fo_tot_dirty 1048576

          jamesanunez James Nunez (Inactive) added a comment:
          vmcore and vmcore-dmesg.txt are at uploads/LU-6351.

          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: jamesanunez James Nunez (Inactive)
            Votes: 0
            Watchers: 4

          Dates

            Created:
            Updated:
            Resolved: