
LBUG: in mdt_fix_reply() hit during FOFB testing

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.16.0
    • Severity: 3

    Description

      http://steve-10.hpc.amslabs.hpecorp.net/TestScreen/neo/index.php?sessionstarttime=1686053724&groupname=racer-FOFB&configuration=dectet&doc_id=647f2360c0ebbd13d13157af&DataBase=lustre&ets=1686131374115

      0811 (mds) console:

      [  187.296125] LustreError: 7079:0:(mdt_open.c:1373:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid  req@000000005a1b14a2 x1767947079689920/t0(4294975400) o101->7237381a-7e60-4f2c-bd3d-c3a6e2670fcd@192.168.1.18@tcp:40/0 lens 576/3424 e 0 to 0 dl 1686046445 ref 1 fl Complete:/4/0 rc 0/0 job:'ls.0'
      [  187.301936] LustreError: 7079:0:(mdt_handler.c:4777:mdt_intent_open()) @@@ Replay open failed with -14  req@000000005a1b14a2 x1767947079689920/t0(4294975400) o101->7237381a-7e60-4f2c-bd3d-c3a6e2670fcd@192.168.1.18@tcp:40/0 lens 576/600 e 0 to 0 dl 1686046445 ref 1 fl Complete:/4/0 rc 0/0 job:'ls.0'
      [  188.159843] LustreError: 7079:0:(mdt_lib.c:816:mdt_fix_reply()) ASSERTION( req_capsule_get_size(pill, &RMF_MDT_MD, RCL_SERVER) == info->mti_attr.ma_lmm_size ) failed: 
      [  188.176571] LustreError: 7079:0:(mdt_lib.c:816:mdt_fix_reply()) LBUG
      [  188.181052] Pid: 7079, comm: tgt_recover_0 4.18.0-305.10.2.x6.6.000.74.x86_64 #1 SMP Tue May 23 07:54:25 MDT 2023
      [  188.199769] Call Trace TBD:
      [  188.205106] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
      [  188.211989] [<0>] lbug_with_loc+0x43/0x80 [libcfs]
      [  188.218936] [<0>] mdt_fix_reply+0x6c9/0x700 [mdt]
      [  188.232980] [<0>] mdt_reint_internal+0x4e8/0x7d0 [mdt]
      [  188.236403] [<0>] mdt_intent_open+0x137/0x420 [mdt]
      [  188.240056] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
      [  188.243683] [<0>] mdt_intent_policy+0x207/0x3a0 [mdt]
      [  188.246854] [<0>] ldlm_lock_enqueue+0x47a/0xb00 [ptlrpc]
      [  188.249546] [<0>] ldlm_handle_enqueue0+0x739/0x1410 [ptlrpc]
      [  188.252974] [<0>] tgt_enqueue+0xa4/0x210 [ptlrpc]
      [  188.256096] [<0>] tgt_request_handle+0xc93/0x1950 [ptlrpc]
      [  188.259454] [<0>] handle_recovery_req+0x140/0x270 [ptlrpc]
      [  188.263323] [<0>] replay_request_or_update.isra.30+0x2f8/0xa30 [ptlrpc]
      [  188.273023] [<0>] target_recovery_thread+0x748/0x12e0 [ptlrpc]
      [  188.278908] [<0>] kthread+0x116/0x130
      [  188.289521] [<0>] ret_from_fork+0x35/0x40
      [  188.292975] Kernel panic - not syncing: LBUG
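
      The failed assertion at mdt_lib.c:816 compares the size reserved for the RMF_MDT_MD field in the server reply with info->mti_attr.ma_lmm_size. The block below is a minimal standalone sketch of that kind of size invariant, not Lustre code: the struct names, the md_reserved field, and the helper sketch_md_size_consistent() are hypothetical and exist only for illustration. Only the ma_lmm_size name is taken from the log above. It models the comparison alone and makes no claim about why the two sizes diverged during replay.

```c
#include <stddef.h>

/* Hypothetical stand-in for the reply buffer bookkeeping:
 * how many bytes were reserved for the metadata (MD) field. */
struct sketch_reply {
    size_t md_reserved;
};

/* Hypothetical stand-in for the per-request attribute state:
 * how many bytes of layout data were actually recorded. */
struct sketch_attr {
    size_t ma_lmm_size;
};

/* Returns 1 when the reserved reply size matches the recorded
 * attribute size, 0 otherwise. A caller could use a check like this
 * to report an error on mismatch instead of asserting. */
static int sketch_md_size_consistent(const struct sketch_reply *reply,
                                     const struct sketch_attr *attr)
{
    return reply->md_reserved == attr->ma_lmm_size;
}
```

      In this sketch a mismatch is reported to the caller as a return value, whereas the real assertion triggers an LBUG and, as the console output shows, a kernel panic.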
      

        Activity

          [LU-17983] LBUG: in mdt_fix_reply() hit during FOFB testing
          pjones Peter Jones added a comment -

          Merged for 2.16


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55551/
          Subject: LU-17983 mdt: mti_big_lov and mti_big_lmv
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: a1c7d6412e91c98ad96d82123f736342c6165328

          pjones Peter Jones added a comment -

          As per discussion on the LWG call today, moving tickets that do not appear to be essential to fix version 2.17. If the fix lands before code freeze we will update the fix version to reflect that but we want to focus on activities on the critical path. Please speak up if you think that this issue definitely needs to be fixed before we could issue a 2.16 release.


          gerrit Gerrit Updater added a comment -

          "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55551
          Subject: LU-17983 mdt: protect ea attr size
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 3a6a2146d6d02c41cd3128706e9d4f88dc7628d4

          People

            Assignee: stancheff Shaun Tancheff
            Reporter: stancheff Shaun Tancheff