[LU-16826] MDS nodes panicked running lfsck repair create lost objects: (osd_handler.c:6260:osd_index_declare_ea_insert()) ASSERTION( fid != ((void *)0) ) failed

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0

    Description

      The full stack trace is:

      [81891.829222] LustreError: 2384348:0:(osd_handler.c:6260:osd_index_declare_ea_insert()) ASSERTION( fid != ((void *)0) ) failed:
      [81891.842752] LustreError: 2384348:0:(osd_handler.c:6260:osd_index_declare_ea_insert()) LBUG
      [81891.851987] Pid: 2384348, comm: lfsck_namespace 4.18.0-305.10.2.x6.4.010.32.x86_64 #1 SMP Thu Apr 27 19:48:12 MDT 2023
      [81891.863654] Call Trace TBD:
      [81891.867456] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
      [81891.873549] [<0>] lbug_with_loc+0x43/0x80 [libcfs]
      [81891.879328] [<0>] osd_index_declare_ea_insert+0x3d4/0x480 [osd_ldiskfs]
      [81891.886923] [<0>] lod_sub_declare_insert+0xef/0x240 [lod]
      [81891.893314] [<0>] lfsck_namespace_repair_dangling+0xe75/0x1370 [lfsck]
      [81891.900770] [<0>] lfsck_namespace_assistant_handler_p1+0x13b1/0x2020 [lfsck]
      [81891.908732] [<0>] lfsck_assistant_engine+0x359/0x1c20 [lfsck]
      [81891.915378] [<0>] kthread+0x116/0x130
      [81891.919931] [<0>] ret_from_fork+0x1f/0x40
      [81891.924807] Kernel panic - not syncing: LBUG
      [81891.929939] CPU: 24 PID: 2384348 Comm: lfsck_namespace Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.10.2.x6.4.010.32.x86_64 #1
      [81891.944936] Hardware name: Viking Enterprise Solutions VSSEP1EC/VSSEP1EC, BIOS RWH3LJ-10.07.00 08/29/2022
      [81891.955347] Call Trace:
      [81891.958645]  dump_stack+0x5c/0x80
      [81891.962787]  panic+0xe7/0x2a9
      [81891.966564]  ? ret_from_fork+0x1f/0x40
      [81891.971112]  lbug_with_loc.cold.10+0x18/0x18 [libcfs]
      [81891.976956]  osd_index_declare_ea_insert+0x3d4/0x480 [osd_ldiskfs]
      [81891.983914]  ? osd_index_declare_ea_delete+0x1cd/0x2f0 [osd_ldiskfs]
      [81891.991040]  lod_sub_declare_insert+0xef/0x240 [lod]
      [81891.996762]  lfsck_namespace_repair_dangling+0xe75/0x1370 [lfsck]
      [81892.003700]  ? dt_lookup_dir+0x80/0x190 [obdclass]
      [81892.009229]  lfsck_namespace_assistant_handler_p1+0x13b1/0x2020 [lfsck]
      [81892.016561]  ? __schedule+0x2cc/0x700
      [81892.020938]  lfsck_assistant_engine+0x359/0x1c20 [lfsck]
      [81892.026945]  ? __switch_to+0x10c/0x480
      [81892.031371]  ? __schedule+0x2cc/0x700
      [81892.035689]  ? finish_wait+0x80/0x80
      [81892.039917]  ? lfsck_master_engine+0xcd0/0xcd0 [lfsck]
      [81892.045680]  kthread+0x116/0x130
      [81892.049530]  ? kthread_flush_work_fn+0x10/0x10
      [81892.054580]  ret_from_fork+0x1f/0x40 

          Activity


            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57686
            Subject: LU-16826 tests: lfsck to repair a dangling remote entry
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 29c327c689cec8b994b4ee838e18bda61f66107b


            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57685
            Subject: LU-16826 lfsck: init rec_fid before declare_insert
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 46d1958515a353a0f7c71c11fc3ad01104a10725


            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57624
            Subject: LU-16826 lfsck: stop all should work
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c3e4dbafb95db6128592b9368efda36b47415d6d

            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50998/
            Subject: LU-16826 tests: lfsck to repair a dangling remote entry
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 07e02a600e5707de30e1441ce56b68b0cbc3c260


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50980/
            Subject: LU-16826 lfsck: init rec_fid before declare_insert
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 02ac821653a0b2d897442e276d0afc31755064a4


            "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50998
            Subject: LU-16826 tests: lfsck to repair a dangling remote entry
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: afbf69ada658ca63c7a5953f3de12beb49d3a62b


            Reproducer:

            MDSCOUNT=2 sh llmount.sh 
            
            ../utils/lfs mkdir -i 0 /mnt/lustre/mdt0dir
            ../utils/lfs mkdir -i 1 /mnt/lustre/mdt1dir
            
            touch /mnt/lustre/mdt0dir/foo
            mv /mnt/lustre/mdt0dir/foo /mnt/lustre/mdt1dir/
            FOOFID=$(../utils/lfs path2fid /mnt/lustre/mdt1dir/foo | sed -E 's/^.(.*).$/\1/')
            echo $FOOFID 
            
            sync
            umount /mnt/lustre-mds1
            umount /mnt/lustre-mds2
            
            echo "rm /REMOTE_PARENT_DIR/$FOOFID" | debugfs -w /dev/mapper/mds1_flakey 
            
            mount -t lustre /dev/mapper/mds1_flakey /mnt/lustre-mds1/
            mount -t lustre /dev/mapper/mds2_flakey /mnt/lustre-mds2/
            
            ../utils/lctl lfsck_start -M lustre-MDT0000 -C
            ../utils/lctl lfsck_start -M lustre-MDT0001 -C
            
            
            zam Alexander Zarochentsev added a comment (the reproducer above).

            "Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50980
            Subject: LU-16826 lfsck: init rec_fid before declare_insert
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5dd21e56cbd9a695bf5444218bdec7206c346afe

            lfsck_namespace_repair_dangling(...):
            ...
                    /* 7a. if child is remote, delete and insert to generate local agent */
                    if (dt_object_remote(child)) {
                            rc = dt_declare_delete(env, parent,
                                                   (const struct dt_key *)lnr->lnr_name,
                                                   th);
                            if (rc)
                                    GOTO(stop, rc);
            
            ===>        rc = dt_declare_insert(env, parent, (const struct dt_rec *)rec,
                                                   (const struct dt_key *)lnr->lnr_name,
                                                   th);
                            if (rc)
                                    GOTO(stop, rc);
                    }
            

            It looks like the 7a code path had never been exercised (or the crash had simply not been reported before). The path does not initialise rec->rec_fid before calling dt_declare_insert(), which triggers the assertion failure in osd_index_declare_ea_insert().

            zam Alexander Zarochentsev added a comment (the analysis above).
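
            The failure mode described in the analysis can be illustrated with a small stand-alone model. This is only a sketch and not the landed patch: the struct definitions are simplified stand-ins for the Lustre types, and model_declare_insert() and model_init_rec() are hypothetical names introduced here for illustration. The model returns an error where the real osd_index_declare_ea_insert() would hit the assertion and LBUG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the Lustre types; only the fields used here. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

struct dt_insert_rec {
	const struct lu_fid *rec_fid;	/* FID of the object being inserted */
	uint32_t rec_type;		/* file type bits of that object */
};

/*
 * Hypothetical model of the check in osd_index_declare_ea_insert().
 * The real code asserts fid != NULL and panics; the model returns -1
 * so the condition can be observed without crashing.
 */
int model_declare_insert(const struct dt_insert_rec *rec)
{
	if (rec->rec_fid == NULL)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: fill in the insert record before it is handed
 * to the declare step, which is what the patch subject
 * "init rec_fid before declare_insert" describes.
 */
void model_init_rec(struct dt_insert_rec *rec, const struct lu_fid *cfid,
		    uint32_t type)
{
	assert(cfid != NULL);
	rec->rec_fid = cfid;
	rec->rec_type = type;
}
```

            In this model, declaring the insert with a zeroed record fails, while calling model_init_rec() with the child FID first lets the declare step pass. Whether the landed change also sets rec_type at this point is an assumption of the sketch.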

            People

              zam Alexander Zarochentsev