[LU-18599] BUG: unable to handle kernel NULL pointer dereference in lustre_msg_get_opc

Project: Lustre

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Version: Lustre 2.17.0

    Description

      [ 4633.704169] Lustre: DEBUG MARKER: == replay-single test 80e: DNE: create remote dir, drop MDT1 rep, fail MDT0 ========================================================== 13:50:53 (1735048253)
      [ 4633.795919] LustreError: 287593:0:(ldlm_lib.c:3250:target_send_reply_msg()) @@@ dropping reply  req@ffff8b819d0464c0 x1819325261762944/t55834574916(0) o36->f7deb4f7-2bf3-49f1-9d35-d00b552fb152@0@lo:130/0 lens 560/448 e 0 to 0 dl 1735048265 ref 1 fl Interpret:/200/0 rc 0/0 job:'lfs.0' uid:0 gid:0
      [ 4633.799256] LustreError: 287593:0:(ldlm_lib.c:3250:target_send_reply_msg()) Skipped 1 previous similar message
      [ 4637.119428] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      [ 4637.162012] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
      [ 4637.310504] systemd[1]: mnt-lustre\x2dmds1.mount: Succeeded.
      [ 4637.351726] Lustre: Failing over lustre-MDT0000
      [ 4637.588293] Lustre: server umount lustre-MDT0000 complete
      [ 4650.151047] Lustre: lustre-MDT0001: Client f7deb4f7-2bf3-49f1-9d35-d00b552fb152 (at 0@lo) reconnecting
      [ 4650.151217] Lustre: Skipped 2 previous similar messages
      [ 4650.175380] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [ 4650.175715] PGD 0 P4D 0 
      [ 4650.178978] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [ 4650.179022] CPU: 0 PID: 3469 Comm: ptlrpcd_rcv Tainted: G        W  O     --------- -  - 4.18.0 #11
      [ 4650.179104] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
      [ 4650.179168] RIP: 0010:lustre_msg_get_opc+0x1/0xe0 [ptlrpc]
      [ 4650.179305] Code: d3 1b 00 00 00 02 00 48 c7 05 77 d3 1b 00 d0 05 d3 c0 e8 b2 6a b3 ff b8 68 12 00 00 5b c3 66 66 2e 0f 1f 84 00 00 00 00 00 53 <81> 7f 08 d3 0b d0 0b 48 89 fb 74 59 48 b8 00 01 00 00 22 04 00 00
      [ 4650.179411] RSP: 0018:ffff8b819aaabce0 EFLAGS: 00010282
      [ 4650.179448] RAX: acc08380f03c0ad2 RBX: ffff8b81cbd93b00 RCX: 0000000000000001
      [ 4650.179496] RDX: 0000000080000001 RSI: ffffffffc06aa76c RDI: 0000000000000000
      [ 4650.179543] RBP: ffff8b81901cd980 R08: 0000000000000010 R09: 0000000000000000
      [ 4650.179590] R10: ffff8b81a4eb6000 R11: ffff8b81a4eb55a6 R12: ffff8b81a8d0efc0
      [ 4650.179641] R13: ffff8b818c54e258 R14: acc08380f03c0871 R15: ffff8b81cbd93c90
      [ 4650.179689] FS:  0000000000000000(0000) GS:ffff8b82a3800000(0000) knlGS:0000000000000000
      [ 4650.179744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4650.179815] CR2: 0000000000000008 CR3: 00000001041ff000 CR4: 0000000000350eb0
      [ 4650.179871] Call Trace:
      [ 4650.179905]  mdc_replay_open+0xd6/0x460 [mdc]
      [ 4650.179952]  ptlrpc_replay_interpret+0x14e/0x7b0 [ptlrpc]
      [ 4650.180083]  ? lustre_msg_clear_flags+0x1c/0x90 [ptlrpc]
      [ 4650.180206]  ptlrpc_check_set+0x52b/0x3270 [ptlrpc]
      [ 4650.180326]  ptlrpcd+0x832/0xa20 [ptlrpc]
      [ 4650.180443]  ? do_wait_intr_irq+0x70/0x70
      [ 4650.180482]  ? ptlrpc_disconnect_import+0x3e0/0x3e0 [ptlrpc]
      [ 4650.180608]  kthread+0x16e/0x1a0
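
      The oops can be read directly from the registers. The faulting bytes in the Code: line, `81 7f 08 d3 0b d0 0b`, decode on x86-64 as `cmpl $0x0bd00bd3, 0x8(%rdi)`: a compare of the 32-bit field at offset 8 of the first argument against a constant. RDI is 0, so the read hits address 0x8, which is exactly CR2. The sketch below checks that arithmetic. It assumes that Lustre's struct lustre_msg_v2 begins with three 32-bit fields (lm_bufcount, lm_secflvr, lm_magic) and that LUSTRE_MSG_MAGIC_V2 is 0x0BD00BD3. The struct msg_v2_head and the helper names are hypothetical mirrors, not Lustre definitions, so verify them against lustre_idl.h in your tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the leading fields of Lustre's struct lustre_msg_v2
 * (assumed layout; check lustre_idl.h before relying on it). */
struct msg_v2_head {
	uint32_t lm_bufcount;
	uint32_t lm_secflvr;
	uint32_t lm_magic;	/* assumed to be the first field lustre_msg_get_opc() reads */
};

/* Assumed value of LUSTRE_MSG_MAGIC_V2. */
#define MSG_MAGIC_V2 0x0BD00BD3u

/* Address touched when the message pointer is NULL: 0 + offsetof(lm_magic).
 * Compare with CR2 = 0x8 in the oops. */
size_t magic_offset(void)
{
	return offsetof(struct msg_v2_head, lm_magic);
}

/* Decode a little-endian 32-bit immediate, e.g. the bytes "d3 0b d0 0b"
 * that follow the ModRM/displacement bytes "7f 08" in the Code: line. */
uint32_t le32_decode(const uint8_t b[4])
{
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```

      If both values line up (offset 8, immediate equal to the V2 magic), the crash is a magic-number check on a NULL struct lustre_msg pointer passed in from mdc_replay_open(), rather than corruption inside an otherwise valid message.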
      


        Activity

          Peter Jones (pjones) added a comment:

          Merged for 2.17

          Gerrit Updater (gerrit) added a comment:

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57587/
          Subject: LU-18599 mdc: assign mod_close_req when RPC is ready
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 5c120ce3ad31675fe9dbae3ec05182dae2ff20b2

          Gerrit Updater (gerrit) added a comment:

          "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57587
          Subject: LU-18599 mdc: assign mod_close_req when RPC is ready
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: da60b4cec97b246fbedb13e69fb55e6a05b3a5d9

          Alex Zhuravlev (bzzz) added a comment:

          Short analysis:

          crash> p *(struct ptlrpc_request *)0xffff8b81cbd93b00|more
          $5 = {
            rq_type = 4711,
            rq_status = 301,
            rq_replied = 1,
            rq_err = 0,
            rq_timedout = 0,
            rq_transno = 373662154771,
            rq_xid = 1819325261762688,
                cr_cb_data = 0xffff8b81abc64f00,
                cr_commit_cb = 0xffffffffc0f1c560 <mdc_commit_open>,
                cr_replay_cb = 0xffffffffc0f1c100 <mdc_replay_open>
              rc_fmt = 0xffffffffc0ce7cc0 <RQF_LDLM_INTENT_OPEN>,

          crash> p *(struct md_open_data *)0xffff8b81abc64f00
          $7 = {
            mod_och = 0xffff8b81a8d0efc0,
            mod_open_req = 0xffff8b81cbd93b00,
            mod_close_req = 0xffff8b81901cd980,
            mod_refcount = {
              counter = 3
            },
            mod_is_create = false

          crash> p *(struct ptlrpc_request *)0xffff8b81901cd980
          $8 = {
            rq_type = 4711,
            rq_status = 0,
            rq_phase = RQ_PHASE_RPC,
            rq_next_phase = RQ_PHASE_UNDEFINED,
            rq_transno = 0,
            rq_xid = 1819325261788288,
            rq_replied = 0,
            rq_err = 0,
            rq_timedout = 0,
            rq_resend = 0,
            rq_restart = 0,
            rq_replay = 0,
            rq_no_resend = 0,
            rq_waiting = 1,
                cr_commit_cb = 0x0,
                cr_replay_cb = 0x0
            rq_reqbuf = 0xffff8b818e678800,
            rq_repbuf = 0x0,
            rq_import = 0xffff8b81910a3000,
            rq_timeout = 11,
              rc_req = 0xffff8b81901cd980,
              rc_reqmsg = 0xffff8b818e678800,
              rc_repmsg = 0x0,
              rc_req_swab_mask = 0,
              rc_rep_swab_mask = 0,
              rc_fmt = 0xffffffffc0ce7c00 <RQF_MDS_CLOSE>,
              rc_loc = RCL_CLIENT,

          AFAICS, the close RPC was still being packed at the time. It had no rq_reqmsg yet, and mdc_replay_open() passed that NULL pointer to lustre_msg_get_opc() when it checked mod_close_req.
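
The patch subject ("mdc: assign mod_close_req when RPC is ready") suggests an ordering fix: make the close request visible through md_open_data only after it has been packed, so the replay path never sees a close request without a message. The sketch below is a minimal single-threaded model of that idea, not the actual patch. model_msg, model_req, model_mod, model_replay_open() and model_close_prepare() are hypothetical stand-ins for lustre_msg, ptlrpc_request, md_open_data, mdc_replay_open() and the close-preparation path, and the replay callback is invoked between the two steps to simulate ptlrpcd running mdc_replay_open() inside the window.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not Lustre's real structures. */
struct model_msg { unsigned int opc; };
struct model_req { struct model_msg *reqmsg; };		/* like rq_reqmsg */
struct model_mod { struct model_req *close_req; };	/* like mod_close_req */

/* Stand-in for the replay callback running at an arbitrary moment. */
typedef int (*replay_fn)(struct model_mod *mod);

/* Models what the replay path does with the close request:
 *   0  no close request published yet, nothing to inspect
 *  -1  close request published but not packed (the real code oopses here)
 *   1  close request published and packed, safe to read its opcode */
int model_replay_open(struct model_mod *mod)
{
	struct model_req *close_req = mod->close_req;

	if (close_req == NULL)
		return 0;
	if (close_req->reqmsg == NULL)
		return -1;
	return 1;
}

/* Prepare a close request. publish_early selects the racy ordering
 * (publish, then pack); otherwise pack first and publish when ready.
 * The replay callback runs between the two steps to model the window. */
int model_close_prepare(struct model_mod *mod, struct model_req *req,
			struct model_msg *packed, int publish_early,
			replay_fn replay)
{
	int seen;

	if (publish_early) {
		mod->close_req = req;	/* visible before it has a message */
		seen = replay(mod);
		req->reqmsg = packed;	/* "pack" the request */
	} else {
		req->reqmsg = packed;	/* pack first */
		seen = replay(mod);
		mod->close_req = req;	/* publish only when ready */
	}
	return seen;
}
```

With publish_early set, the callback observes a close request whose message is still NULL (-1, the oops above); with the other ordering it observes no close request at all (0) and skips it. In the kernel, publishing the pointer safely across CPUs also requires appropriate locking or memory ordering, which this single-threaded model does not attempt to show.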

          People

            Assignee: Alex Zhuravlev (bzzz)
            Reporter: Alex Zhuravlev (bzzz)
            Votes: 0
            Watchers: 3
