Lustre / LU-2285

Test failure on replay-ost-single test_3: write page inode failed -2

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0
    • Component/s: None
    • Severity: 3
    • Rank: 5469

    Description

      This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/70b83f68-2797-11e2-9e20-52540035b04c.

      The sub-test test_3 failed with the following error:

      test_3 failed with 1

      On the client, we can see in dmesg that the write actually failed:

      Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-ost-single test 3: Fail OST during write, with verification ================================ 12:42:29 \(1352148149\)
      Lustre: DEBUG MARKER: == replay-ost-single test 3: Fail OST during write, with verification ================================ 12:42:29 (1352148149)
      LustreError: 17578:0:(vvp_io.c:1037:vvp_io_commit_write()) Write page 11 of inode ffff81031f41fb28 failed -2
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-ost-single test_3: @@@@@@ FAIL: test_3 failed with 1 
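
      For reference, the "-2" in the vvp_io_commit_write() line is a kernel-style negative errno, i.e. -ENOENT: the object the client was writing to no longer existed on the OST, which is consistent with it having been destroyed across the failover. A minimal standalone C sketch (not Lustre code) that decodes the value:

          /* decode_rc.c - decode the kernel-style return code from the log */
          #include <errno.h>
          #include <stdio.h>
          #include <string.h>

          int main(void)
          {
                  int rc = -2;    /* "Write page 11 of inode ... failed -2" */

                  /* Kernel code returns negative errno values; 2 is ENOENT. */
                  printf("rc=%d => %s (%s)\n", rc,
                         -rc == ENOENT ? "-ENOENT" : "other", strerror(-rc));
                  return 0;
          }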
      

      Info required for matching: replay-ost-single 3

      Activity

            bogl Bob Glossman (Inactive) added a comment - another one in b2_5: https://maloo.whamcloud.com/test_sets/b3d5a116-9c60-11e3-8f3e-52540035b04c
            bogl Bob Glossman (Inactive) added a comment - another one: https://maloo.whamcloud.com/test_sessions/5121f06e-9b49-11e3-8a4e-52540035b04c
            bogl Bob Glossman (Inactive) added a comment - another one: https://maloo.whamcloud.com/test_sets/7c9d76dc-9b12-11e3-8ad7-52540035b04c

            liwei Li Wei (Inactive) added a comment - I have lowered the priority from "blocker" to "major". The last patch should be worked out before the release, but could wait until after the feature freeze.

            liwei Li Wei (Inactive) added a comment - Only one test patch is left; I'd suggest lowering the priority from "blocker".
            liwei Li Wei (Inactive) added a comment - edited

            An update on the patches as of Dec 22:

            http://review.whamcloud.com/4668 (Debug messages) Landed.
            http://review.whamcloud.com/4511 (Orphan cleanup from correct ID) Landed.
            http://review.whamcloud.com/4625 (opd_pre_last_created) Landed.
            http://review.whamcloud.com/4590 (Block allocation during orphan cleanups) Landed.
            http://review.whamcloud.com/4610 (New regression test) Being refreshed.

            bzzz Alex Zhuravlev added a comment -

            > The new test. Still a bit real-time dependent.

            There are a lot of tests that depend on time already; all of 40-46 in sanityn, for example.

            liwei Li Wei (Inactive) added a comment -

            http://review.whamcloud.com/4610

            The new test. Still a bit real-time dependent.

            bzzz Alex Zhuravlev added a comment -

            look at something like:

                /* This will trigger a watchdog timeout */
                OBD_FAIL_TIMEOUT(OBD_FAIL_MDS_STATFS_LCW_SLEEP,
                                 (MDT_SERVICE_WATCHDOG_FACTOR *
                                  at_get(&svcpt->scp_at_estimate)) + 1);
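
            For context on what that snippet does: OBD_FAIL_TIMEOUT() makes the current thread sleep when the test harness has armed a matching fail_loc value, which is how Lustre tests turn a narrow timing window into a deterministic stall. A toy standalone model of the idea (simplified names, not the real Lustre macros):

                /* failloc_toy.c - toy model of Lustre-style fail-point injection */
                #include <stdio.h>
                #include <unistd.h>

                /* In real Lustre the test arms this via "lctl set_param fail_loc=...". */
                static unsigned long fail_loc;

                #define TOY_FAIL_CHECK(id)   (fail_loc == (id))
                #define TOY_FAIL_TIMEOUT(id, secs)                                \
                        do {                                                      \
                                if (TOY_FAIL_CHECK(id)) {                         \
                                        printf("fail point 0x%lx armed, "         \
                                               "sleeping %d s\n",                 \
                                               (unsigned long)(id), (int)(secs)); \
                                        sleep(secs);                              \
                                }                                                 \
                        } while (0)

                #define TOY_FAIL_FOO 0x123          /* hypothetical fail point id */

                int main(void)
                {
                        fail_loc = TOY_FAIL_FOO;    /* harness arms the fail point */
                        TOY_FAIL_TIMEOUT(TOY_FAIL_FOO, 1);  /* deterministic stall */
                        return 0;
                }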

            liwei Li Wei (Inactive) added a comment -

            It is tricky to make sure the file creation (osp_precreate_reserve() and osp_precreate_get_id()) happens after osp_precreate_cleanup_orphans() has read opd_last_used_id and before the orphan cleanup RPC is replied.
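
            To make the window concrete, here is a toy two-thread model (illustrative names only, not Lustre code) of the interleaving described above: the cleanup side samples the last-used ID, and a create that reserves a new ID after that sample but before the cleanup round trip completes risks having its object swept as an orphan. A fail_loc-style delay, as Alex suggests above, is what would hold this window open reliably in a test:

                /* orphan_race_toy.c - build with: cc orphan_race_toy.c -lpthread */
                #include <pthread.h>
                #include <stdio.h>
                #include <unistd.h>

                static long last_used_id = 100;     /* stand-in for opd_last_used_id */

                static void *cleanup_orphans(void *arg)
                {
                        long sampled = last_used_id;  /* window opens: ID sampled */

                        sleep(1);                     /* stand-in for the cleanup RPC
                                                       * round trip; a fail_loc delay
                                                       * would widen this window */
                        printf("cleanup: destroying objects above id %ld\n", sampled);
                        return NULL;                  /* window closes on RPC reply */
                }

                static void *create_file(void *arg)
                {
                        long id = ++last_used_id;     /* reserve an ID in the window */

                        printf("create: reserved id %ld (swept as an orphan)\n", id);
                        return NULL;
                }

                int main(void)
                {
                        pthread_t cleanup, create;

                        pthread_create(&cleanup, NULL, cleanup_orphans, NULL);
                        usleep(100 * 1000);           /* land the create in the window */
                        pthread_create(&create, NULL, create_file, NULL);
                        pthread_join(cleanup, NULL);
                        pthread_join(create, NULL);
                        return 0;
                }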

            bzzz Alex Zhuravlev added a comment - Hmm, why is it tricky? Basically, we want a non-empty pool in a specific OSP, then stop the corresponding OST, wait until the MDS gets disconnected, and try to create a file using that specific OST?

            People

              Assignee: Li Wei (Inactive)
              Reporter: Maloo
              Votes: 0
              Watchers: 7
