Lustre / LU-8175

conflicting PW & PR extent locks on a client

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.9.0

    Description

      > [5034040.035051] Lustre: 16432:0:(client.c:1910:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1453393018/real 1453393018] req@ffff881f9d653c00 x1518811430048732/t0(0) o3->snx11091-OST0028-osc-ffff881fe6574800@172.17.47.209@o2ib1013:6/4 lens 488/432 e 0 to 1 dl 1453393778 ref 2 fl Rpc:XU/2/ffffffff rc -11/-1
      > [5034040.035057] Lustre: 16432:0:(client.c:1910:ptlrpc_expire_one_request()) Skipped 32 previous similar messages
      > [5034482.398979] Lustre: snx11091-OST000b-osc-ffff881fe6574800: Connection to snx11091-OST000b (at 172.17.47.201@o2ib1013) was lost; in progress operations using this service will wait for recovery to complete
      > [5034482.398984] Lustre: Skipped 7 previous similar messages
      > [5034482.399254] Lustre: snx11091-OST000b-osc-ffff881fe6574800: Connection restored to snx11091-OST000b (at 172.17.47.201@o2ib1013)
      > [5034482.399257] Lustre: Skipped 7 previous similar messages
      > [5034798.943798] Lustre: 16422:0:(client.c:1910:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1453393778/real 1453393778] req@ffff881fe4cc9000 x1518811430052084/t0(0) o4->snx11091-OST0028-osc-ffff881fe6574800@172.17.47.209@o2ib1013:6/4 lens 488/448 e 0 to 1 dl 1453394538 ref 2 fl Rpc:XU/2/ffffffff rc -11/-1
      > [5034798.943805] Lustre: 16442:0:(client.c:1910:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1453393778/real 1453393778] req@ffff881fe4cc9400 x1518811430052092/t0(0) o4->snx11091-OST0028-osc-ffff881fe6574800@172.17.47.209@o2ib1013:6/4 lens 488/448 e 0 to 1 dl 1453394538 ref 2 fl Rpc:XU/2/ffffffff rc 0/-1
      > [5034798.943811] Lustre: 16442:0:(client.c:1910:ptlrpc_expire_one_request()) Skipped 30 previous similar messages
      > [5035427.382998] Lustre: snx11091-OST002a-osc-ffff881fe6574800: Connection restored to snx11091-OST002a (at 172.17.47.210@o2ib1012)
      > [5035427.383001] Lustre: Skipped 7 previous similar messages
      > [5035429.345176] LustreError: 16406:0:(osc_cache.c:2421:osc_teardown_async_page()) extent ffff88071aac01e0@{[0 -> 255/255], [3|0|+|cache|wihuY|ffff880877eec198], [1048576|256|+|+|ffff880e6beab738|256| (null)]} trunc at 0.
      > [5035429.345183] LustreError: 16406:0:(osc_page.c:333:osc_page_delete()) page@ffff880973c33000[3 ffff88037416ae18 4 0 1 (null) (null) 0x0]
      > [5035429.345188] LustreError: 16406:0:(osc_page.c:333:osc_page_delete()) vvp-page@ffff880973c330a0(0:0:0) vm@ffffea0006449948 20000000001079 2:0 ffff880973c33000 0 lru
      > [5035429.345191] LustreError: 16406:0:(osc_page.c:333:osc_page_delete()) lov-page@ffff880973c330f8, raid0
      > [5035429.345198] LustreError: 16406:0:(osc_page.c:333:osc_page_delete()) osc-page@ffff880973c33160 0: 1< 0x845fed 2 0 + - > 2< 0 0 4096 0x0 0x420 | (null) ffff881fe6ae0620 ffff880877eec198 > 3< + ffff880768e26380 0 0 0 > 4< 0 9 8 0 - | + - + + > 5< + - + - | 0 - | 948 - +>
      > [5035429.345202] LustreError: 16406:0:(osc_page.c:333:osc_page_delete()) end page@ffff880973c33000
      > [5035429.345204] LustreError: 16406:0:(osc_page.c:333:osc_page_delete()) Trying to teardown failed: -16
      > [5035429.345206] LustreError: 16406:0:(osc_page.c:334:osc_page_delete()) ASSERTION( 0 ) failed:
      > [5035429.353732] LustreError: 16406:0:(osc_page.c:334:osc_page_delete()) LBUG
      > [5035429.360601] Pid: 16406, comm: ptlrpcd_3
      > [5035429.360602]
      > [5035429.360603] Call Trace:
      > [5035429.360612] [<ffffffff81004b95>] dump_trace+0x75/0x300
      > [5035429.360636] [<ffffffffa089c82a>] libcfs_debug_dumpstack+0x4a/0x70 [libcfs]
      > [5035429.360664] [<ffffffffa089cd5e>] lbug_with_loc+0x3e/0xb0 [libcfs]
      > [5035429.360678] [<ffffffffa1d35103>] osc_page_delete+0x393/0x3d0 [osc]
      > [5035429.360722] [<ffffffffa09f43fd>] cl_page_delete0+0x6d/0x200 [obdclass]
      > [5035429.360765] [<ffffffffa09f45c5>] cl_page_delete+0x35/0x120 [obdclass]
      > [5035429.360817] [<ffffffffa1e695c6>] ll_invalidatepage+0x96/0x160 [lustre]
      > [5035429.360850] [<ffffffffa1e7b45c>] vvp_page_discard+0xcc/0x170 [lustre]
      > [5035429.360887] [<ffffffffa09f2ce8>] cl_page_invoid+0x58/0x150 [obdclass]
      > [5035429.360918] [<ffffffffa1d4193e>] check_and_discard_cb+0x13e/0x190 [osc]
      > [5035429.360934] [<ffffffffa1d41b4d>] osc_page_gang_lookup+0x1bd/0x340 [osc]
      > [5035429.360951] [<ffffffffa1d41e0b>] osc_lock_discard_pages+0x13b/0x240 [osc]
      > [5035429.360966] [<ffffffffa1d37993>] osc_lock_flush+0xf3/0x270 [osc]
      > [5035429.360979] [<ffffffffa1d37c09>] osc_lock_cancel+0xf9/0x1e0 [osc]
      > [5035429.361005] [<ffffffffa09f6bc5>] cl_lock_cancel0+0x65/0x150 [obdclass]
      > [5035429.361050] [<ffffffffa09f9f76>] cl_lock_hold_release+0x1e6/0x2c0 [obdclass]
      > [5035429.361081] [<ffffffffa1d3a613>] osc_lock_upcall+0x223/0x460 [osc]
      > [5035429.361093] [<ffffffffa1d1b82d>] osc_enqueue_fini+0x9d/0x270 [osc]
      > [5035429.361102] [<ffffffffa1d1e883>] osc_enqueue_interpret+0xe3/0x1e0 [osc]
      > [5035429.361136] [<ffffffffa1c00152>] ptlrpc_check_set+0x562/0x1b60 [ptlrpc]
      > [5035429.361174] [<ffffffffa1c2bd5b>] ptlrpcd_check+0x52b/0x550 [ptlrpc]
      > [5035429.361219] [<ffffffffa1c2c39b>] ptlrpcd+0x32b/0x410 [ptlrpc]
      > [5035429.361244] [<ffffffff81083f16>] kthread+0x96/0xa0
      > [5035429.361249] [<ffffffff8146d964>] kernel_thread_helper+0x4/0x10
      > [5035429.361252]
      > [5035429.361378] Kernel panic - not syncing: LBUG

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.9


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20345/
          Subject: LU-8175 ldlm: conflicting PW & PR extent locks on a client
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 80a818b80373bebd1438a74aeebda102b4885e53

          parinay parinay v kondekar (Inactive) added a comment -

          LU-8388 looks to be a dup of this issue.

          paf Patrick Farrell (Inactive) added a comment -

          Hi,

          Just a general note: the underlying problem (two overlapping extent locks granted during failover) isn't limited to a single client as in this case (i.e., it can happen with locks from two different clients), and it can lead to data corruption. We'll try to open an LU about that some time soon.
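
To make the conflict described in the comment above concrete, here is a minimal sketch of an overlap-plus-mode check for extent locks. It is a simplified model written for illustration, not the LDLM code from the patch: the `ExtentLock` class, the `owner` field, and the helper names are all hypothetical, and only the two modes named in this ticket (PR and PW) are modeled.

```python
from dataclasses import dataclass

# Simplified compatibility table for the two modes named in this ticket.
# Assumption: PR (protected read) can coexist with PR; PW (protected write)
# cannot coexist with either. Other LDLM modes are deliberately omitted.
COMPATIBLE = {
    ("PR", "PR"): True,
    ("PR", "PW"): False,
    ("PW", "PR"): False,
    ("PW", "PW"): False,
}


@dataclass(frozen=True)
class ExtentLock:
    owner: str  # hypothetical client identifier
    mode: str   # "PR" or "PW"
    start: int  # first byte covered, inclusive
    end: int    # last byte covered, inclusive


def extents_overlap(a: ExtentLock, b: ExtentLock) -> bool:
    """Two inclusive byte ranges overlap if each starts before the other ends."""
    return a.start <= b.end and b.start <= a.end


def locks_conflict(a: ExtentLock, b: ExtentLock) -> bool:
    """Locks conflict only when they overlap AND their modes are incompatible."""
    return extents_overlap(a, b) and not COMPATIBLE[(a.mode, b.mode)]


def find_conflicts(granted):
    """Return every pair of granted locks that should never coexist."""
    conflicts = []
    for i, a in enumerate(granted):
        for b in granted[i + 1:]:
            if locks_conflict(a, b):
                conflicts.append((a, b))
    return conflicts


# Illustrative values only: a PW and a PR lock covering overlapping bytes.
pw = ExtentLock("client-A", "PW", 0, 1048575)
pr_same_client = ExtentLock("client-A", "PR", 0, 4095)
pr_other_client = ExtentLock("client-B", "PR", 4096, 8191)
pr_disjoint = ExtentLock("client-B", "PR", 2097152, 3145727)

granted = [pw, pr_same_client, pr_other_client, pr_disjoint]
conflicts = find_conflicts(granted)
```

In this model the check ignores `owner` entirely, which matches the point of the comment: the overlapping PW and PR grants are a problem whether the two locks belong to one client or to two different clients.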

          jgmitter Joseph Gmitter (Inactive) added a comment -

          Hi Jinshan,

          Can you please review the patch?

          Thanks.
          Joe

          gerrit Gerrit Updater added a comment -

          Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/20345
          Subject: LU-8175 ldlm: conflicting PW & PR extent locks on a client
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 9a105cb6160629cd848e5d6a45a3a11028559f39

          People

            jay Jinshan Xiong (Inactive)
            askulysh Andriy Skulysh
            Votes: 0
            Watchers: 10
