[LU-6066] lfsck_namespace_repair_nlink() ASSERTION( (((lfsck_object_type(obj)) & 00170000) == 0100000) ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.7.0
    • Environment: Single node test system (MDTx2, OSTx3, client), RHEL 2.6.32-431.29.2.el6 kernel, Lustre master v2_6_91_0-49-ge0ece89
    • Severity: 3
    • Rank: 16885

    Description

      I was testing deliberate filesystem corruption: I mounted the MDT as type ldiskfs, copied an MDT file and all of its xattrs from hosts to hosts.clone, then changed f_oid=0x1 to f_oid=0x2 in the LMA FID and LOV ostid, so that the two files would share the same OST object but have different FIDs.

      When remounting the MDT as type lustre and listing the files, it detected OI corruption due to the missing FID and started OI scrub:

      Lustre: testfs-MDT0000: trigger OI scrub by RPC for [0x2c00059f0:0x2:0x0], rc = 0 [2]
      

      which appeared to be successful since I could list all the files.
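
For readers unfamiliar with the notation, the bracketed value in the scrub message is a Lustre FID printed as [sequence:object ID:version]; the f_oid field edited above is the middle component. A minimal sketch of splitting such a string in C follows. The struct and function names (sketch_fid, sketch_parse_fid) are hypothetical and are not Lustre's own definitions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a FID: sequence, object ID, version. */
struct sketch_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/*
 * Parse a FID printed as "[seq:oid:ver]" with hexadecimal fields.
 * Returns 0 on success, -1 if the string does not contain all three fields.
 */
static int sketch_parse_fid(const char *s, struct sketch_fid *fid)
{
	int n = sscanf(s, "[%" SCNx64 ":%" SCNx32 ":%" SCNx32 "]",
		       &fid->seq, &fid->oid, &fid->ver);

	return n == 3 ? 0 : -1;
}
```

Passing "[0x2c00059f0:0x2:0x0]" from the message above should yield an object ID of 0x2, the value written into the cloned file.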

      I deleted the hosts.clone file and then observed (as expected) that ls returned an error, because the referenced OST objects no longer existed. However, I was unable to unlink the original filename, even with munlink, which should ignore any errors. This was apparently because I had accidentally made the cloned file share the same FID (f_oid=0x2) as a third file, hosts2. I suspected that the duplicated MDT FID was causing problems, since the FID could no longer be found in the OI.

      I tried running lctl lfsck_start -M testfs-MDT0000 -A to rebuild the OI so that it contained the original f_oid=0x2 inode (which still existed in the hosts2 LMA), but immediately hit the assertions below on two different LFSCK threads:

      LustreError: 20102:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) ASSERTION( (((lfsck_object_type(obj)) & 00170000) == 0100000) ) failed:
      LustreError: 20102:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) LBUG
      Pid: 20102, comm: lfsck_namespace
      Call Trace:
       [<ffffffffa0812895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0812e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0f06bf1>] lfsck_namespace_repair_nlink+0x6b1/0xa60 [lfsck]
       [<ffffffffa0f1b9bf>] lfsck_namespace_double_scan_one+0x23f/0x1410 [lfsck]
       [<ffffffffa0f1d899>] lfsck_namespace_assistant_handler_p2+0xd09/0x11b0 [lfsck]
       [<ffffffffa0eff399>] lfsck_assistant_engine+0x14e9/0x1e00 [lfsck]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
      
      LustreError: 20097:0:(lfsck_namespace.c:2921:lfsck_namespace_repair_nlink()) ASSERTION( (((lfsck_object_type(obj)) & 00170000) == 0100000) ) failed:
      Pid: 20097, comm: lfsck_namespace
      Call Trace:
       [<ffffffffa0812895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0812e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0f06bf1>] lfsck_namespace_repair_nlink+0x6b1/0xa60 [lfsck]
       [<ffffffffa0f1b9bf>] lfsck_namespace_double_scan_one+0x23f/0x1410 [lfsck]
       [<ffffffffa0f1d899>] lfsck_namespace_assistant_handler_p2+0xd09/0x11b0 [lfsck]
       [<ffffffffa0eff399>] lfsck_assistant_engine+0x14e9/0x1e00 [lfsck]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
      LustreError: dumping log to /tmp/lustre-log.1419280935.20097
      

      We definitely shouldn't be LASSERTing on data from the filesystem.
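
To illustrate the point: in the assertion text, 00170000 is the file-type mask (S_IFMT) and 0100000 is the regular-file type (S_IFREG), so the check fires when the object's on-disk mode says it is not a regular file. A minimal sketch of treating that case as an error return rather than an assertion is below. The names (sketch_check_type and the SKETCH_* constants) are hypothetical, and the code is not taken from the actual fix for this ticket.

```c
#include <errno.h>
#include <stdint.h>

/* Octal file-type constants as they appear in the assertion text. */
#define SKETCH_IFMT  00170000u	/* mask selecting the file-type bits (S_IFMT) */
#define SKETCH_IFREG 0100000u	/* regular file (S_IFREG) */
#define SKETCH_IFDIR 0040000u	/* directory (S_IFDIR) */

/*
 * Hypothetical helper: compare the type bits of a mode read from disk
 * against the type the caller expects. On-disk data may be corrupt, so a
 * mismatch is reported as -EINVAL instead of crashing the thread.
 */
static int sketch_check_type(uint32_t mode, uint32_t expected)
{
	if ((mode & SKETCH_IFMT) != expected)
		return -EINVAL;

	return 0;
}
```

A caller could log the mismatch and skip the object when this returns -EINVAL, rather than taking down the LFSCK thread with an LBUG.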

        Activity

          yong.fan nasf (Inactive) made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: In Progress [ 3 ] New: Resolved [ 5 ]
          adilger Andreas Dilger made changes -
          Description: markup corrected from {{noformat}} to {noformat}; text otherwise unchanged.
          adilger Andreas Dilger made changes -
          Labels New: HB
          yong.fan nasf (Inactive) made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          adilger Andreas Dilger created issue -

          People

            yong.fan nasf (Inactive)
            adilger Andreas Dilger
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: