Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13102

llog_cat_id2handle()) error opening log id rc = -2 osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING )

Details

    • 3
    • 9223372036854775807

    Description

      During massive IO at cluster we could reproduce the next issue.

      Nov 24 12:25:15 snx11281n003 kernel: LustreError: 12929:0:(llog_cat.c:404:llog_cat_id2handle()) snx11281-OST0000-osc-MDT0001: error opening log id [0x22903:0x1:0x0]:0: rc = -2
      Nov 24 12:25:15 snx11281n003 kernel: LustreError: 12929:0:(osp_sync.c:1293:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 3 changes, 0 in progress, 0 in flight
      Nov 24 12:25:15 snx11281n003 kernel: LustreError: 12929:0:(osp_sync.c:1293:osp_sync_thread()) LBUG
      Nov 24 12:25:15 snx11281n003 kernel: Pid: 12929, comm: osp-syn-0-1 3.10.0-957.1.3957.1.3.x3.3.36.x86_64 #1 SMP Tue Nov 19 09:47:32 PST 2019
      Nov 24 12:25:15 snx11281n003 kernel: Call Trace:
      Nov 24 12:25:15 snx11281n003 kernel:  [<ffffffffc0a427cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      Nov 24 12:25:15 snx11281n003 kernel:  [<ffffffffc0a4287c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      Nov 24 12:25:15 snx11281n003 kernel:  [<ffffffffc18b5a1b>] osp_sync_thread+0xa1b/0xa70 [osp]
      Nov 24 12:25:15 snx11281n003 kernel:  [<ffffffffa72c1c31>] kthread+0xd1/0xe0
      Nov 24 12:25:15 snx11281n003 kernel:  [<ffffffffa7976c1d>] ret_from_fork_nospec_begin+0x7/0x21
      Nov 24 12:25:15 snx11281n003 kernel:  [<ffffffffffffffff>] 0xffffffffffffffff
      

      And with additional debug.

       llog_cat_set_first_idx())   catlog [0x6:0x1:0x0] first idx 34730, last_idx 34731
       osp_sync_process_queues()) 1 changes, 0 in progress, 0 in flight
       llog_process_thread()) stop processing plain 0x76941:1:0 index 64767 count 1
       llog_process_thread()) index: 34731, lh_last_idx: 34730 synced_idx: 34730 lgh_last_idx: 34731
       llog_cat_process_common()) processing log [0x2780f:0x1:0x0]:0   at index 34731 of catalog [0x6:0x1:0x0]
      llog_cat_id2handle()) snx11281-OST0001-osc-MDT0001:  error opening log id [0x2780f:0x1:0x0]:0:rc = -2
      

      RC
      The logic for rereading a llog buffer had an exception for a full catalog, when lgh_last_idx = llh_cat_idx and a first processing index is a llh_cat_idx+1. This check is based on a value lh_last_idx, which stays unchanged between buffer read. But llh_cat_idx could go forward, and this lead to a wrong check where reread doesn't happen. As a result Lustre got ENOENT for a record and stoped osp processing.

      Attachments

        Activity

          [LU-13102] llog_cat_id2handle()) error opening log id rc = -2 osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING )

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38541/
          Subject: LU-13102 llog: fix processing of a wrapped catalog
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: 21b34dd8a226fbaf2fee52b294f2482aa8670ad2

          gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38541/ Subject: LU-13102 llog: fix processing of a wrapped catalog Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 21b34dd8a226fbaf2fee52b294f2482aa8670ad2

          Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38541
          Subject: LU-13102 llog: fix processing of a wrapped catalog
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: 4be4f2d6589a740788a3fd1ca545422a4e6e6500

          gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38541 Subject: LU-13102 llog: fix processing of a wrapped catalog Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 4be4f2d6589a740788a3fd1ca545422a4e6e6500
          pjones Peter Jones added a comment -

          Landed for 2.14

          pjones Peter Jones added a comment - Landed for 2.14

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37102/
          Subject: LU-13102 llog: fix processing of a wrapped catalog
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: a4f049b96562fd502b1948fb082767351e040a1c

          gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37102/ Subject: LU-13102 llog: fix processing of a wrapped catalog Project: fs/lustre-release Branch: master Current Patch Set: Commit: a4f049b96562fd502b1948fb082767351e040a1c
          yujian Jian Yu added a comment - More failures on master branch: https://testing.whamcloud.com/test_sets/aae2a638-412f-11ea-9630-52540065bddc https://testing.whamcloud.com/test_sets/87deb48c-4130-11ea-9847-52540065bddc https://testing.whamcloud.com/test_sets/7a5d326e-415a-11ea-9847-52540065bddc https://testing.whamcloud.com/test_sets/894f039c-4164-11ea-9b1e-52540065bddc
          yujian Jian Yu added a comment - +1 on master branch: https://testing.whamcloud.com/test_sets/1160632e-3bb7-11ea-adca-52540065bddc

          Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/37102
          Subject: LU-13102 llog: fix processing of a wrapped catalog
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: b55bbbca0c425926d5fcc04a598a6754297fda8f

          gerrit Gerrit Updater added a comment - Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/37102 Subject: LU-13102 llog: fix processing of a wrapped catalog Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: b55bbbca0c425926d5fcc04a598a6754297fda8f

          People

            aboyko Alexander Boyko
            aboyko Alexander Boyko
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: