Lustre / LU-9171

GPF in osc_page_gang_lookup doing ELC with ldlm_cancel_no_wait_policy()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.10.0
    • None
    • None
    • 3

    Description

      > [ 9337.400497] general protection fault: 0000 [#1] SMP
      > [ 9337.513717] CPU: 14 PID: 115720 Comm: ll_agl_31230 Tainted: G W OE NX 3.12.60-52.57.1.11767.0.PTF.996988-default #1
      > [ 9337.526131] Hardware name: /0CNCJW, BIOS 1.3.6 06/03/2015
      > [ 9337.532441] task: ffff88027aaa8240 ti: ffff8802d9458000 task.ti: ffff8802d9458000
      > [ 9337.540785] RIP: 0010:[<ffffffffa0b20416>] [<ffffffffa0b20416>] osc_page_gang_lookup+0x156/0x870 [osc]
      > [ 9337.551282] RSP: 0018:ffff8802d9459990 EFLAGS: 00010202
      > [ 9337.557204] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff886358d97470
      > [ 9337.565161] RDX: 0000000000000000 RSI: ffff8802d9459950 RDI: ffff8802d65c7170
      > [ 9337.573117] RBP: ffff886358d97648 R08: 0000000000000000 R09: 0000000000000001
      > [ 9337.581074] R10: 0000000000000000 R11: 0000000000000001 R12: 5a5a5a5a5a5a5a5a
      > [ 9337.589030] R13: ffff8802d73ae2d0 R14: ffff886358d97470 R15: ffff8802d65c7060
      > [ 9337.596986] FS: 0000000000000000(0000) GS:ffff88607f4e0000(0000) knlGS:0000000000000000
      > [ 9337.606021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > [ 9337.612438] CR2: 00007f6942c05de8 CR3: 00000063488c2000 CR4: 00000000001407e0
      > [ 9337.620393] Stack:
      > [ 9337.622632] ffffffffffffffff ffff880106b62000 ffff8862ff8cd680 ffff886358d97550
      > [ 9337.630921] ffffffffa0b07140 00ff8862ff8cd680 ffff886300000001 0000000000000000
      > [ 9337.639203] ffff8802d65c7180 ffff886358d97470 ffff8802d65c7170 ffff886358d97650
      > [ 9337.647486] Call Trace:
      > [ 9337.650248] [<ffffffffa0b09aa0>] osc_ldlm_weigh_ast+0x340/0x480 [osc]
      > [ 9337.657560] [<ffffffffa0af67fb>] osc_cancel_weight+0x3b/0xa0 [osc]
      > [ 9337.664586] [<ffffffffa0886d9b>] ldlm_cancel_no_wait_policy+0x2b/0x90 [ptlrpc]
      > [ 9337.672765] [<ffffffffa08887a1>] ldlm_prepare_lru_list+0x221/0x500 [ptlrpc]
      > [ 9337.680653] [<ffffffffa088cfc5>] ldlm_cancel_lru_local+0x15/0x40 [ptlrpc]
      > [ 9337.688347] [<ffffffffa088d1fc>] ldlm_prep_elc_req+0x20c/0x480 [ptlrpc]
      > [ 9337.695847] [<ffffffffa088d494>] ldlm_prep_enqueue_req+0x24/0x30 [ptlrpc]
      > [ 9337.703529] [<ffffffffa0affbd1>] osc_enqueue_base+0x1c1/0x6e0 [osc]
      > [ 9337.710622] [<ffffffffa0b09097>] osc_lock_enqueue+0x357/0xa00 [osc]
      > [ 9337.717756] [<ffffffffa0bff793>] cl_lock_enqueue+0x63/0x120 [obdclass]
      > [ 9337.725180] [<ffffffffa0a9cecc>] lov_lock_enqueue+0x9c/0x170 [lov]
      > [ 9337.732179] [<ffffffffa0bff793>] cl_lock_enqueue+0x63/0x120 [obdclass]
      > [ 9337.739600] [<ffffffffa0bffce2>] cl_lock_request+0x62/0x1e0 [obdclass]
      > [ 9337.747040] [<ffffffffa0e6e487>] cl_glimpse_lock+0x337/0x3d0 [lustre]
      > [ 9337.754359] [<ffffffffa0e6e7e7>] cl_glimpse_size0+0x1b7/0x1c0 [lustre]
      > [ 9337.761769] [<ffffffffa0e69b65>] ll_agl_trigger+0x115/0x4a0 [lustre]
      > [ 9337.768982] [<ffffffffa0e6a04d>] ll_agl_thread+0x15d/0x4b0 [lustre]
      > [ 9337.776098] [<ffffffff81077874>] kthread+0xb4/0xc0
      > [ 9337.781530] [<ffffffff81523498>] ret_from_fork+0x58/0x90
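The call trace reads as one chain: osc_enqueue_base builds an enqueue request, ldlm_prep_elc_req scans the namespace LRU so unused locks can be cancelled early (ELC) in the same RPC, ldlm_cancel_no_wait_policy asks the OSC (osc_cancel_weight, then osc_ldlm_weigh_ast) whether each lock still covers cached pages, and that check is what ends in osc_page_gang_lookup. The toy C model below sketches that decision. It is a simplified illustration, not Lustre code: every toy_* name is hypothetical, and it assumes the no-wait policy only picks granted locks whose weight (cached page count) is zero. It omits all locking and reference counting on purpose; the crash raises exactly the question of whether the object behind a lock can be freed while the weigh step is walking it, which would be consistent with the 5a5a5a5a5a5a5a5a value in R12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for an LDLM lock and the
 * OSC object it caches pages for. None of these are Lustre types. */
struct toy_object {
	size_t cached_pages;        /* pages a gang lookup would find */
};

struct toy_lock {
	bool granted;               /* granted and unused, sitting in the LRU */
	struct toy_object *obj;     /* inspected by the weigh step; must stay
	                             * valid for the whole weigh call */
};

/* Models the weigh step (osc_ldlm_weigh_ast in the trace): report how
 * many pages are cached under the lock's object. */
static size_t toy_weigh(const struct toy_lock *lock)
{
	if (lock->obj == NULL)      /* no object attached: nothing cached */
		return 0;
	return lock->obj->cached_pages;
}

/* Models the no-wait policy (assumed rule): a lock is cancelled early only
 * if it is granted and covers no cached pages, so the cancel never waits. */
static bool toy_no_wait_can_cancel(const struct toy_lock *lock)
{
	assert(lock != NULL);
	return lock->granted && toy_weigh(lock) == 0;
}

/* Models the LRU scan done while preparing the enqueue RPC: count how many
 * unused locks the no-wait policy would pack into the request as ELC. */
size_t toy_prepare_elc(const struct toy_lock *lru, size_t n)
{
	size_t cancel = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_no_wait_can_cancel(&lru[i]))
			cancel++;
	return cancel;
}
```

With one lock covering three cached pages, one covering none, and one with no object, toy_prepare_elc reports two cancellable locks; the weigh step runs on every LRU entry, which is why a stale object pointer anywhere in the LRU can fault during an unrelated enqueue.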

      Attachments

        Activity

          Peter Jones (pjones) added a comment:

          Landed for 2.10

          Gerrit Updater (gerrit) added a comment:

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25700/
          Subject: LU-9171 osc: GPF while doing ELC with no_wait_policy
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: e0850b31ccd5597b564d4385a424bfa6be6f2f3e

          Andriy Skulysh (askulysh) added a comment:

          The attached test reproduces the crash, but only on a 2.6.32 kernel.
          Here is the resulting failure:
          {noformat}
          crash> bt
          PID: 28473  TASK: ffff880058a9e040  CPU: 0   COMMAND: "stat"
           #0 [ffff880059c0b5f0] machine_kexec at ffffffff81038f3b
           #1 [ffff880059c0b650] crash_kexec at ffffffff810c5b02
           #2 [ffff880059c0b720] oops_end at ffffffff81529030
           #3 [ffff880059c0b750] die at ffffffff81010e0b
           #4 [ffff880059c0b780] do_general_protection at ffffffff81528b32
           #5 [ffff880059c0b7b0] general_protection at ffffffff81528305
              [exception RIP: osc_ldlm_weigh_ast+306]
              RIP: ffffffffa09bcec2  RSP: ffff880059c0b868  RFLAGS: 00010292
              RAX: 5a5a5a5a5a5a5a02  RBX: ffff88004c729c78  RCX: ffff88004c729da0
              RDX: 5a5a5a5a5a5a5a5a  RSI: 0000000000000000  RDI: ffff88004c729db0
              RBP: ffff880059c0b8d8   R8: 0000000000000000   R9: ffffffff816457a0
              R10: 0000000000000001  R11: 0000000000000000  R12: ffff88005a690768
              R13: ffff880059c0b8a6  R14: ffff880057b9eac0  R15: ffff88004c729db0
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          {noformat}
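Both traces contain register values made almost entirely of 0x5a bytes: R12 = 5a5a5a5a5a5a5a5a in the original report, and RDX = 5a5a5a5a5a5a5a5a and RAX = 5a5a5a5a5a5a5a02 in the reproducer. Assuming these bytes are the poison pattern Lustre writes over memory it frees when poisoning is enabled (an assumption, not something stated in this issue), the faulting code was following a pointer loaded from an already-freed structure, which would point to a use-after-free. The C sketch below is a hypothetical triage helper for spotting such values in a register dump; FREE_POISON_BYTE, poison_byte_count and looks_poisoned are illustrative names, not Lustre or crash-utility code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: freed objects are filled with this byte. */
#define FREE_POISON_BYTE 0x5aU

/* Count how many of the 8 bytes of a 64-bit register value equal the
 * poison byte. */
unsigned poison_byte_count(uint64_t reg)
{
	unsigned n = 0;

	for (int i = 0; i < 8; i++) {
		if (((reg >> (8 * i)) & 0xffU) == FREE_POISON_BYTE)
			n++;
	}
	return n;
}

/* Heuristic: flag a value as "likely loaded from freed memory" when at
 * least min_bytes of its 8 bytes are poison. A threshold below 8 also
 * catches values such as 5a5a5a5a5a5a5a02 whose low byte was altered. */
bool looks_poisoned(uint64_t reg, unsigned min_bytes)
{
	assert(min_bytes >= 1 && min_bytes <= 8);
	return poison_byte_count(reg) >= min_bytes;
}
```

For example, looks_poisoned(0x5a5a5a5a5a5a5a02ULL, 6) is true because 7 of its 8 bytes match, while an ordinary kernel address such as RDI = ffff8802d65c7170 from the first trace matches none. The threshold is a judgment call: requiring all 8 bytes misses pointers that had a small offset applied or a field overwritten after the free.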

          Gerrit Updater (gerrit) added a comment:

          Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: https://review.whamcloud.com/25700
          Subject: LU-9171 osc: GPF while doing ELC with no_wait_policy
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: c609c91808fee9f308fbbfcf487397918d03065a

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Andriy Skulysh (askulysh)
            Votes: 0
            Watchers: 6
