[LU-4878] fld_server_lookup() ASSERTION( fld->lsf_control_exp ) failed

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.4.3
    • 3
    • 13490

    Description

      The following LBUG appeared at a customer site during the mount process on all OSSes running Lustre 2.4.3.

      LustreError: 12838:0:(fld_handler.c:172:fld_server_lookup()) ASSERTION(fld->lsf_control_exp ) failed:
      LustreError: 12838:0:(fld_handler.c:172:fld_server_lookup()) LBUG
      
      Pid: 12838, comm: mount.lustre
      
      Call Trace:
       libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       lbug_with_loc+0x47/0xb0 [libcfs]
       fld_server_lookup+0x2f7/0x3d0 [fld]
       osd_fld_lookup+0x71/0x1d0 [osd_ldiskfs]
       osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
       osd_index_ea_lookup+0x521/0x850 [osd_ldiskfs]
       dt_lookup_dir+0x6f/0x130 [obdclass]
       llog_osd_open+0x485/0xc00 [obdclass]
       llog_open+0xba/0x2c0 [obdclass]
       mgc_process_log [mgc]
       mgc_process_config [mgc]
       lustre_process_log [obdclass]
       server_start_targets [obdclass]
       server_fill_super [obdclass]
       lustre_fill_super [obdclass]
       get_sb_nodev
       lustre_get_sb
       vfs_kern_mount
       do_kern_mount
       do_mount
       sys_mount
       system_call_fastpath
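
      To check other OSS console logs for the same signature, the two LustreError lines above can be matched mechanically. The following is only a minimal sketch: the regular expression is inferred from the two lines quoted in this ticket, not from any Lustre specification, and the helper names parse_lustre_errors and matches_this_ticket are hypothetical.

```python
import re

# Assumed layout, inferred only from the lines quoted in this ticket:
#   LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>
LUSTRE_ERROR_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*"
    r"(?P<msg>.*)"
)


def parse_lustre_errors(text):
    """Return one dict per LustreError line found in text (hypothetical helper)."""
    events = []
    for raw in text.splitlines():
        m = LUSTRE_ERROR_RE.search(raw)
        if m is None:
            continue  # not a LustreError line in the assumed layout
        event = m.groupdict()
        event["pid"] = int(event["pid"])
        event["line"] = int(event["line"])
        event["msg"] = event["msg"].strip()
        event["is_lbug"] = event["msg"] == "LBUG"
        events.append(event)
    return events


def matches_this_ticket(events):
    """True if an LBUG was raised from fld_server_lookup(), as reported here."""
    return any(e["is_lbug"] and e["func"] == "fld_server_lookup" for e in events)


sample = """\
LustreError: 12838:0:(fld_handler.c:172:fld_server_lookup()) ASSERTION(fld->lsf_control_exp ) failed:
LustreError: 12838:0:(fld_handler.c:172:fld_server_lookup()) LBUG
"""
events = parse_lustre_errors(sample)
print(matches_this_ticket(events))
```

      A match only shows that the LBUG came from the same function; the call trace still needs to be compared by hand to confirm the mount-time llog_open path.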
      

      This issue appears to be the same as LU-3126, for which a patch has landed in Lustre 2.5. Unfortunately, no patch has been provided for the Lustre 2.4 release.

          Activity

            pjones Peter Jones added a comment -

            Yes this would be under consideration for 2.4.4.

            pichong Gregoire Pichon added a comment -

            Hi Bruno,

            Yes this ticket can be closed since our tests have shown the issue is fixed with patches #9929 and #9958.
            I hope both patches are planned for integration in 2.4 if a new version is released.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Gregoire,
            Since patch #9958 is planned for 2.4 integration, do you agree that we can close/resolve this issue as fixed?

            bfaccini Bruno Faccini (Inactive) added a comment -

            Gregoire, don't misunderstand me: I did not mean that you added patches without good reason, only that by doing so you fall outside our regression/interop testing process.
            As for your having added #5049 because of LU-2959, that may help #5049 and #9958 to finally land ...

            pichong Gregoire Pichon added a comment -

            Hello Bruno,

            Actually, patch #5049 "LU-2059 llog: MGC to use OSD API for backup logs" was integrated by Bull on top of release 2.4.x because the customer hit the LBUG ASSERTION(cli->cl_mgc_configs_dir) described in LU-2959. As mentioned in that ticket, the LBUG is fixed by patch #5049.

            These problems occurred in the Lustre 2.4.x release and need to be addressed.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Gregoire,
            I am not sure that my patch #9958 will finally be accepted and landed to b2_4. The main reason is that #5049 is itself still not in b2_4 and may never be, in which case #9958 is not necessary, as Mike rightly commented in the patch.
            This points to a limit in the process, where people decide to add more patches on top of releases that we have tested for regressions and interoperability...

            pichong Gregoire Pichon added a comment -

            Thanks for the backport, Bruno. Our comments interleaved!

            I have tested Lustre 2.4.3 with both additional patches:

            • #9929 LU-3126 osd: remove fld lookup during configuration
            • #9958 LU-4878 osd-ldiskfs: don't assert on possible upgrade (backport of LU-3915)

            The OSS is able to start without any problem. The filesystem is operational.

            I am now waiting for these patches to be fully approved and Maloo tested so they can be delivered to the customer.

            bfaccini Bruno Faccini (Inactive) added a comment -

            You may have missed my previous update, which already confirmed what you finally found!
            So yes, I agree that you can add #7673 or its backport on top of your 2.4.3 version that also includes #5049 ...

            People

              bfaccini Bruno Faccini (Inactive)
              pichong Gregoire Pichon