Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-14864

osd_fid_lookup() ASSERTION( !updated ) failed

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0
    • None
    • Lustre 2.12.6 with ldiskfs (with MDT0000 formatted in Lustre 2.7)
    • 3
    • 9223372036854775807

    Description

      This crash has occurred in production.

      A admin executed script sending fid2path requests with malformed FID. This caused the following crash on the MDT0000.

      [65226.320307] LustreError: 24497:0:(osd_handler.c:1368:osd_fid_lookup()) ASSERTION( !updated ) failed: 
      [65226.321549] LustreError: 24497:0:(osd_handler.c:1368:osd_fid_lookup()) LBUG
      [65226.322160] Pid: 24497, comm: mdt00_004 3.10.0-1127.8.2.el7_lustre.x86_64 #1 SMP Mon Jun 8 13:48:45 UTC 2020
      [65226.322161] Call Trace:
      [65226.322171]  [<ffffffffc09c57cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [65226.322180]  [<ffffffffc09c587c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [65226.322184]  [<ffffffffc113c4e1>] osd_fid_lookup+0x17e1/0x1d90 [osd_ldiskfs]
      [65226.322192]  [<ffffffffc113caf1>] osd_object_init+0x61/0x110 [osd_ldiskfs]
      [65226.322198]  [<ffffffffc0c0d82b>] lu_object_start.isra.35+0x8b/0x120 [obdclass]
      [65226.322221]  [<ffffffffc0c11744>] lu_object_find_at+0x234/0xab0 [obdclass]
      [65226.322235]  [<ffffffffc0c11fd6>] lu_object_find+0x16/0x20 [obdclass]
      [65226.322248]  [<ffffffffc12e146b>] mdt_object_find+0x4b/0x170 [mdt]
      [65226.322259]  [<ffffffffc12ed0a5>] mdt_rpc_fid2path+0x4a5/0x17c0 [mdt]
      [65226.322267]  [<ffffffffc12ee770>] mdt_get_info+0x3b0/0x490 [mdt]
      [65226.322275]  [<ffffffffc0f5489a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [65226.322319]  [<ffffffffc0ef973b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [65226.322343]  [<ffffffffc0efd0a4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [65226.322366]  [<ffffffffa2ac6691>] kthread+0xd1/0xe0
      [65226.322369]  [<ffffffffa3192d37>] ret_from_fork_nospec_end+0x0/0x39
      [65226.322372]  [<ffffffffffffffff>] 0xffffffffffffffff
      

       

      After some troubleshooting, I was able to reproduce the issue on a 2.12.7.

      Here are the following steps:

      1. Format filesystem (ldiskfs and mount FS
      2. umount MDT0000
      3. Mount MDT0000 as ldiskfs and modify OI_scrub file (binary) by clearing the SIF_NO_HANDLE_OLD_FID in sf_internal_flags
      4. Remount MDT0000 as lustre
      5. Get the inode number of  a Lustre system file:
        debugfs -c -R 'stat lfsck_layout' /dev/mapper/mds1_flakey
        Inode: 132 ....
      6. Execute fid2path with inode as FID (IGIF FID):
        lfs fid2path /media/lustrefs/client 0x84
      7. Crash

      Dk log extracted from the crash:

      00000004:00000040:0.0:1626701081.815693:0:24497:0:(mdt_handler.c:5736:mdt_object_init()) object init, fid = [0x84:0x0:0x0]
      00000004:00000010:0.0:1626701081.815695:0:24497:0:(mdd_object.c:277:mdd_object_alloc()) slab-alloced 'mdd_obj': 96 at ffff9c86d6d6a120.
      00000004:00000001:0.0:1626701081.815697:0:24497:0:(mdt_handler.c:5745:mdt_object_init()) Process leaving (rc=0 : 0 : 0)
      00000004:00000001:0.0:1626701081.815698:0:24497:0:(mdd_object.c:299:mdd_object_init()) Process entered
      00000004:00000001:0.0:1626701081.815699:0:24497:0:(lod_dev.c:212:lod_object_alloc()) Process entered
      00000004:00000010:0.0:1626701081.815700:0:24497:0:(lod_dev.c:214:lod_object_alloc()) slab-alloced 'lod_obj': 152 at ffff9c8699f347b8.
      00000004:00000001:0.0:1626701081.815701:0:24497:0:(lod_dev.c:224:lod_object_alloc()) Process leaving (rc=18446634700655249336 : -109373054302280 : ffff9c8699f347b8)
      00000004:00000001:0.0:1626701081.815701:0:24497:0:(mdd_object.c:309:mdd_object_init()) Process leaving (rc=0 : 0 : 0)
      00000004:00000001:0.0:1626701081.815703:0:24497:0:(lod_object.c:6967:lod_object_init()) Process entered
      00000004:00000001:0.0:1626701081.815703:0:24497:0:(lod_dev.c:128:lod_fld_lookup()) Process entered
      80000000:00000001:0.0:1626701081.815704:0:24497:0:(fld_handler.c:248:fld_server_lookup()) Process entered
      80000000:00000001:0.0:1626701081.815704:0:24497:0:(fld_handler.c:212:fld_local_lookup()) Process entered
      80000000:00000001:0.0:1626701081.815705:0:24497:0:(fld_cache.c:517:fld_cache_lookup()) Process entered
      80000000:00000001:0.0:1626701081.815705:0:24497:0:(fld_cache.c:536:fld_cache_lookup()) Process leaving (rc=0 : 0 : 0)
      80000000:00000001:0.0:1626701081.815706:0:24497:0:(fld_handler.c:229:fld_local_lookup()) Process leaving (rc=0 : 0 : 0)
      80000000:00000001:0.0:1626701081.815706:0:24497:0:(fld_handler.c:252:fld_server_lookup()) Process leaving (rc=0 : 0 : 0)
      00000004:00000040:0.0:1626701081.815707:0:24497:0:(lod_dev.c:169:lod_fld_lookup()) lustrefs-MDT0000-mdtlov: got tgt 0 for sequence: 0x84
      00000004:00000001:0.0:1626701081.815708:0:24497:0:(lod_dev.c:171:lod_fld_lookup()) Process leaving (rc=0 : 0 : 0)
      00080000:00000010:0.0:1626701081.815709:0:24497:0:(osd_handler.c:420:osd_object_alloc()) kmalloced '(mo)': 200 at ffff9c86d56fc400.
      00000004:00000001:0.0:1626701081.815710:0:24497:0:(lod_object.c:7010:lod_object_init()) Process leaving (rc=0 : 0 : 0)
      00080000:00000001:0.0:1626701081.815711:0:24497:0:(osd_handler.c:1142:osd_fid_lookup()) Process entered
      00080000:00000001:0.0:1626701081.815712:0:24497:0:(osd_oi.c:576:fid_is_on_ost()) Process entered
      00080000:00000001:0.0:1626701081.815712:0:24497:0:(osd_oi.c:584:fid_is_on_ost()) Process leaving (rc=0 : 0 : 0)
      00080000:00000001:0.0:1626701081.815713:0:24497:0:(osd_handler.c:628:osd_iget_check()) Process entered
      00080000:00000001:0.0:1626701081.815715:0:24497:0:(osd_handler.c:770:osd_iget_check()) Process leaving via put (rc=0 : 0 : 0x0)
      00080000:00000001:0.0:1626701081.815715:0:24497:0:(osd_handler.c:854:osd_check_lma()) Process entered
      00080000:00000001:0.0:1626701081.815722:0:24497:0:(osd_handler.c:929:osd_check_lma()) Process leaving (rc=18446744073709551538 : -78 : ffffffffffffffb2)
      00080000:00000001:0.0:1626701081.815723:0:24497:0:(osd_oi.c:576:fid_is_on_ost()) Process entered
      00080000:00000001:0.0:1626701081.815723:0:24497:0:(osd_oi.c:584:fid_is_on_ost()) Process leaving (rc=0 : 0 : 0)
      00080000:00000001:0.0:1626701081.815723:0:24497:0:(osd_oi.c:576:fid_is_on_ost()) Process entered
      00080000:00000001:0.0:1626701081.815723:0:24497:0:(osd_oi.c:584:fid_is_on_ost()) Process leaving (rc=0 : 0 : 0)
      00080000:00000001:0.0:1626701081.815724:0:24497:0:(osd_handler.c:628:osd_iget_check()) Process entered
      00080000:00000001:0.0:1626701081.815724:0:24497:0:(osd_handler.c:770:osd_iget_check()) Process leaving via put (rc=0 : 0 : 0x0)
      00080000:00000001:0.0:1626701081.815725:0:24497:0:(osd_handler.c:854:osd_check_lma()) Process entered
      00080000:00000001:0.0:1626701081.815725:0:24497:0:(osd_handler.c:929:osd_check_lma()) Process leaving (rc=18446744073709551538 : -78 : ffffffffffffffb2)
      00080000:00040000:0.0:1626701081.815726:0:24497:0:(osd_handler.c:1368:osd_fid_lookup()) ASSERTION( !updated ) failed: 
      00080000:00040000:0.0:1626701081.816969:0:24497:0:(osd_handler.c:1368:osd_fid_lookup()) LBUG
      

      It appears that the flag SIF_NO_HANDLE_OLD_FID was not set in older versions of Lustre. In our case the FS was formatted in Lustre 2.7. But I checked some other filesystems with a Lustre's clusterstore version and they seem to be affected too.

      This cause od_igif_inoi flag to not be set in osd_device.
      And then the following code get the inode directly from the FID's sequence (as in Lustre 1.8 for IGIF FID):

      int osd_oi_lookup(....
      ....
               if (!osd->od_igif_inoi && fid_is_igif(fid)) {              
                       osd_id_gen(id, lu_igif_ino(fid), lu_igif_gen(fid));
                       return 0;                                          
               }                                                          
      

      Then osd_check_lma returns an error because FID in LMA and the current FID (IGIF: [0x84:0x0:0x0]) does not match: 

      static int osd_check_lma(const struct lu_env *env, struct osd_object *obj)
      ......
               if (fid != NULL && unlikely(!lu_fid_eq(rfid, fid))) {                    
                       if (fid_is_idif(rfid) && fid_is_idif(fid)) {                     
                               ...................                                    
                       }                                                                
                                                                                        
                       rc = -EREMCHG;                                                   
               }                                                                        
       

       

      Attachments

        1. getinfo_fid2path.txt
          0.3 kB
          Etienne Aujames
        2. gf_root_fid.txt
          0.1 kB
          Etienne Aujames
        3. osd_device.txt
          16 kB
          Etienne Aujames
        4. vmcore-dk.log
          81.13 MB
          Etienne Aujames
        5. vmcore-dmesg.txt
          112 kB
          Etienne Aujames

        Issue Links

          Activity

            People

              eaujames Etienne Aujames
              eaujames Etienne Aujames
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: