[LU-2285] Test failure on replay-ost-single test_3: write page inode failed -2

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.4.0
    • None
    • 3
    • 5469

    Description

      This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/70b83f68-2797-11e2-9e20-52540035b04c.

      The sub-test test_3 failed with the following error:

      test_3 failed with 1

      On the client we can see in dmesg that the write actually failed with -2 (ENOENT).

      Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-ost-single test 3: Fail OST during write, with verification ================================ 12:42:29 \(1352148149\)
      Lustre: DEBUG MARKER: == replay-ost-single test 3: Fail OST during write, with verification ================================ 12:42:29 (1352148149)
      LustreError: 17578:0:(vvp_io.c:1037:vvp_io_commit_write()) Write page 11 of inode ffff81031f41fb28 failed -2
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-ost-single test_3: @@@@@@ FAIL: test_3 failed with 1 
      

      Info required for matching: replay-ost-single 3

          Activity

            bzzz Alex Zhuravlev added a comment -

            look at something like:

            /* This will trigger a watchdog timeout */
            OBD_FAIL_TIMEOUT(OBD_FAIL_MDS_STATFS_LCW_SLEEP,
                             (MDT_SERVICE_WATCHDOG_FACTOR *
                              at_get(&svcpt->scp_at_estimate)) + 1);
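
            For context, the idea is presumably to drop such a delay into the orphan-cleanup path so that the window Li Wei describes below stays open long enough for a file creation to land inside it. This is a rough sketch only, not a patch against any tree: OBD_FAIL_OSP_CLEANUP_ORPHAN_SLEEP is a made-up fail_loc name, and the surrounding osp_precreate_cleanup_orphans() code is elided.

            /* Sketch: inside osp_precreate_cleanup_orphans(), after opd_last_used_id
             * has been read and the OST_CREATE request carrying OBD_FL_DELORPHAN has
             * been sent, hold the thread so that a concurrent osp_precreate_get_id()
             * can hand out an ID from the stale pre-created pool.
             * OBD_FAIL_OSP_CLEANUP_ORPHAN_SLEEP is hypothetical; cfs_fail_val would be
             * set from userspace, e.g. "lctl set_param fail_val=<seconds>". */
            OBD_FAIL_TIMEOUT(OBD_FAIL_OSP_CLEANUP_ORPHAN_SLEEP, cfs_fail_val);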

            liwei Li Wei (Inactive) added a comment -

            It is tricky to make sure the file creation (osp_precreate_reserve() and osp_precreate_get_id()) happens after osp_precreate_cleanup_orphans() has read opd_last_used_id and before the orphan cleanup RPC has been replied to.

            bzzz Alex Zhuravlev added a comment -

            Hmm, why is it tricky? Basically we want a non-empty pool in a specific OSP, then stop the corresponding OST, wait until the MDS gets disconnected, and try to create a file using that specific OST?

            liwei Li Wei (Inactive) added a comment -

            The new test is a bit tricky to get right, but here is a proof-of-concept fix for the major issue: http://review.whamcloud.com/4590.

            bzzz Alex Zhuravlev added a comment -

            No, I think the problem is there and it's more serious than you described: basically there should be no reservations/allocations done during the orphan cleanup procedure, so I'm suggesting we develop a test reproducing this first.

            liwei Li Wei (Inactive) added a comment - edited

            Alex, are you suggesting there is no problem that needs to be fixed with regard to the race? But I don't get how the existing code blocks allocations during orphan deletions.

            bzzz Alex Zhuravlev added a comment -

            I'd suggest developing a test which blocks the orphan cleanup procedure on the OST and forces the MDS to allocate a pre-created object. The race you described seems to be a subset of the general case, in which any object allocation should be blocked during the orphan cleanup procedure.

            liwei Li Wei (Inactive) added a comment - edited

            The file creation was satisfied from the pre-created pool without waiting for a pre-creation RPC. The race I was trying to point out is between osp_precreate_get_id() and osp_precreate_cleanup_orphans(). The latter sends on-disk last used IDs to OSTs, while osp_precreate_get_id() increments in-memory last used IDs. Hence, the race may cause object IDs assigned based on pre-OST-failure counters to be destroyed by the concurrent orphan deletion request. The log shows that this is what happened.

            If osp_precreate_cleanup_orphans() sent in-memory last used IDs instead, this problem would go away, I think. Of course, we would need to make sure the in-memory last used IDs are properly initialized from their on-disk counterparts before the initial orphan deletions. I haven't tried this yet. Any objections?
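
            To make the interleaving concrete, here is a small stand-alone userspace analogue of the race (pthreads, not Lustre code). The IDs loosely mirror the syslog quoted further down in this thread (in-memory last used ID 34, pre-created objects up to 67); all names and numbers are illustrative only.

            #include <pthread.h>
            #include <stdbool.h>
            #include <stdio.h>

            #define PRE_LAST_CREATED 67            /* objects the OST pre-created before the failure */

            static int  on_disk_last_used   = 34;  /* counter the MDT last committed to disk    */
            static int  in_memory_last_used = 34;  /* counter used to hand out IDs to new files */
            static bool ost_object_exists[PRE_LAST_CREATED + 1];

            static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
            static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
            static int step;                       /* 0: start, 1: boundary read, 2: ID handed out */

            /* Analogue of osp_precreate_cleanup_orphans(): snapshot the on-disk counter,
             * then (after the creator has sneaked in) let the "OST" destroy everything above it. */
            static void *cleanup_thread(void *arg)
            {
                (void)arg;
                pthread_mutex_lock(&lock);
                int boundary = on_disk_last_used;  /* stale, on-disk value goes into the request */
                step = 1;
                pthread_cond_broadcast(&cond);
                while (step < 2)                   /* "RPC in flight"; creator grabs an ID meanwhile */
                    pthread_cond_wait(&cond, &lock);
                pthread_mutex_unlock(&lock);

                for (int id = boundary + 1; id <= PRE_LAST_CREATED; id++)
                    ost_object_exists[id] = false; /* OST deletes all orphans above the boundary */
                printf("cleanup: destroyed objects %d..%d\n", boundary + 1, PRE_LAST_CREATED);
                return NULL;
            }

            /* Analogue of osp_precreate_get_id(): hand out the next in-memory ID. */
            static void *creator_thread(void *arg)
            {
                pthread_mutex_lock(&lock);
                while (step < 1)                   /* run only after the boundary has been read */
                    pthread_cond_wait(&cond, &lock);
                int id = ++in_memory_last_used;    /* ID 35, taken from the stale pre-created pool */
                step = 2;
                pthread_cond_broadcast(&cond);
                pthread_mutex_unlock(&lock);

                *(int *)arg = id;
                return NULL;
            }

            int main(void)
            {
                for (int id = 0; id <= PRE_LAST_CREATED; id++)
                    ost_object_exists[id] = true;

                int new_id = 0;
                pthread_t c, w;
                pthread_create(&c, NULL, cleanup_thread, NULL);
                pthread_create(&w, NULL, creator_thread, &new_id);
                pthread_join(c, NULL);
                pthread_join(w, NULL);

                /* The first write to the new file hits a backing object that is already gone. */
                printf("write to object %d: %s\n", new_id,
                       ost_object_exists[new_id] ? "ok" : "fails with -ENOENT (-2)");
                return 0;
            }

            Built with "cc -pthread", the sketch hands out ID 35 and then destroys it during cleanup, which is the same shape as the -2/ENOENT seen by the writers in the reports above.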

            bzzz Alex Zhuravlev added a comment -

            > The file creation in question happened after the OSP had sent the OBD_FL_DELORPHAN request but before the OSP got the reply and updated its counters. I think the file creation should wait until the last created ID was known from the OST.

            Correct, though this is supposed to be the case in the code; look at osp_precreate_thread(): upon reconnect we restart the main loop and do not get to osp_precreate_send() before the synchronous osp_precreate_cleanup_orphans() has completed.

            Perhaps there is a race between reconnect and ptlrpc_queue_wait()?

            liwei Li Wei (Inactive) added a comment -

            In both reports above, the processes writing to Lustre got ENOENT:

            == replay-ost-single test 3: Fail OST during write, with verification ================================ 12:42:29 (1352148149)
            Failing ost1 on node client-21-ib
            CMD: client-21-ib grep -c /mnt/ost1' ' /proc/mounts
            tee: standard output: No such file or directory
            [...]
            

            I'm able to reproduce this ENOENT locally on a single-node VM setup with ZFS targets:

            == replay-ost-single test 2: |x| 10 open(O_CREAT)s == 12:38:36 (1352695116)
            replay-ost-single.sh: line 72: echo: write error: No such file or directory
             replay-ost-single test_2: @@@@@@ FAIL: create /mnt/lustre/d0.replay-ost-single/f.replay-ost-single.2-2 
            

            From my syslog:

            Nov 12 12:38:36 h221g kernel: LustreError: 22747:0:(osp_precreate.c:382:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: going to cleanup orphans since 3
            Nov 12 12:38:37 h221g kernel: LustreError: 22605:0:(ofd_obd.c:1069:ofd_orphans_destroy()) lustre-OST0000: deleting orphan objects from 5 to 67
            Nov 12 12:38:37 h221g kernel: Lustre: DEBUG MARKER: == replay-ost-single test 2: |x| 10 open(O_CREAT)s == 12:38:36 (1352695116)
            Nov 12 12:38:37 h221g kernel: LustreError: 22501:0:(osp_precreate.c:721:osp_precreate_reserve()) lustre-OST0000-osc-MDT0000: precreated 33: opd_pre_last_created 67opd_pre_used_id 34 opd_pre_reserved 0
            Nov 12 12:38:37 h221g kernel: LustreError: 22501:0:(osp_precreate.c:780:osp_precreate_get_id()) Incremented opd_pre_used_id for OSP 0: 35
            Nov 12 12:38:37 h221g kernel: LustreError: 22501:0:(osp_object.c:291:osp_object_create()) Written last used ID for OSP 0: 35: 0
            Nov 12 12:38:41 h221g kernel: LustreError: 22605:0:(ldlm_resource.c:1107:ldlm_resource_get()) lvbo_init failed for resource 34: rc -2
            Nov 12 12:38:42 h221g kernel: LustreError: 22605:0:(ldlm_resource.c:1107:ldlm_resource_get()) lvbo_init failed for resource 33: rc -2
            Nov 12 12:38:42 h221g kernel: LustreError: 22501:0:(osp_precreate.c:721:osp_precreate_reserve()) lustre-OST0000-osc-MDT0000: precreated 32: opd_pre_last_created 67opd_pre_used_id 35 opd_pre_reserved 0
            Nov 12 12:38:42 h221g kernel: LustreError: 22501:0:(osp_precreate.c:780:osp_precreate_get_id()) Incremented opd_pre_used_id for OSP 0: 36
            Nov 12 12:38:42 h221g kernel: LustreError: 22501:0:(osp_object.c:291:osp_object_create()) Written last used ID for OSP 0: 36: 0
            Nov 12 12:38:42 h221g kernel: Lustre: DEBUG MARKER: replay-ost-single test_2: @@@@@@ FAIL: create /mnt/lustre/d0.replay-ost-single/f.replay-ost-single.2-2
            Nov 12 12:38:42 h221g kernel: LustreError: 22605:0:(ldlm_resource.c:1107:ldlm_resource_get()) Skipped 1 previous similar message
            Nov 12 12:38:42 h221g kernel: LustreError: 22605:0:(ldlm_resource.c:1107:ldlm_resource_get()) lvbo_init failed for resource 32: rc -2
            Nov 12 12:38:43 h221g kernel: LustreError: 22747:0:(osp_precreate.c:447:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: Got last_id 67 from OST, last_used is 36, next 67
            Nov 12 12:38:43 h221g kernel: LustreError: 22747:0:(osp_precreate.c:284:osp_precreate_send()) lustre-OST0000-osc-MDT0000: new last_created 100
            

            The file creation in question happened after the OSP had sent the OBD_FL_DELORPHAN request but before the OSP got the reply and updated its counters. I think the file creation should wait until the last created ID was known from the OST.

            Another problem observed during the investigation is that the OSP sends the last used ID plus one in OBD_FL_DELORPHAN requests, but OFD (as well as OBDFilter) expects just the last used ID.

            I'll submit patches to fix these issues.

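            As an illustration of the "file creation waits until orphan cleanup completes" idea described above, here is a minimal userspace sketch of such a gate. It is not the actual change proposed at http://review.whamcloud.com/4590; the names and numbers are illustrative, and the DELORPHAN request/reply is reduced to a comment.

            #include <pthread.h>
            #include <stdbool.h>
            #include <stdio.h>

            /* A gate that holds back ID reservation while orphan cleanup is in flight,
             * plus the last-used counter -- a loose analogue of the OSP pre-create state. */
            static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
            static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
            static bool cleanup_done;              /* set once the DELORPHAN reply has arrived */
            static int  last_used = 34;            /* in-memory last used object ID            */

            /* Analogue of the recovery path: destroy stale pre-created objects, learn the
             * authoritative counters from the OST's reply, then open the gate. */
            static void *cleanup_thread(void *arg)
            {
                (void)arg;
                printf("cleanup: deleting orphans above %d, waiting for the reply\n", last_used);
                /* ... the DELORPHAN request/reply and counter refresh would happen here ... */
                pthread_mutex_lock(&lock);
                cleanup_done = true;
                pthread_cond_broadcast(&cond);
                pthread_mutex_unlock(&lock);
                return NULL;
            }

            /* Analogue of osp_precreate_reserve()/osp_precreate_get_id(): instead of dipping
             * into the stale pre-created pool, block until the cleanup has finished. */
            static int reserve_id(void)
            {
                pthread_mutex_lock(&lock);
                while (!cleanup_done)
                    pthread_cond_wait(&cond, &lock);
                int id = ++last_used;              /* IDs handed out only after counters are valid */
                pthread_mutex_unlock(&lock);
                return id;
            }

            int main(void)
            {
                pthread_t t;
                pthread_create(&t, NULL, cleanup_thread, NULL);
                printf("create: got object ID %d, after orphan cleanup completed\n", reserve_id());
                pthread_join(t, NULL);
                return 0;
            }

            A real fix would also need to cover the second problem noted above (the off-by-one in the ID sent with OBD_FL_DELORPHAN), which this sketch does not model.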

            liwei Li Wei (Inactive) added a comment -

            https://maloo.whamcloud.com/test_sets/a2a98882-2865-11e2-9c12-52540035b04c

            People

              Assignee: liwei Li Wei (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 7
