Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.14.0
    • Severity: 3

    Description

      A reincarnation of LU-12040, this time with PFL: an attempt to replay the create/open of a PFL file that uses an OST pool silently fails, and the file is lost after MDS recovery.
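      For context, a pooled PFL file of the kind whose replay fails can be set up roughly as follows. This is a minimal sketch: the pool and OST names match the test output further down, but the mount point and file name are illustrative, not the exact reproducer.

        # Define an OST pool (run on the MGS) and create a PFL file whose
        # first component is restricted to that pool; it is the replay of
        # this create/open that is silently dropped on MDS failover.
        lctl pool_new lustre.pool_134
        lctl pool_add lustre.pool_134 lustre-OST0001_UUID
        lfs setstripe -E 1M -c 1 -p pool_134 -E -1 -c -1 /mnt/lustre/pfl_file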


          Activity

            [LU-13809] PFL file lost during recovery
            spitzcor Cory Spitz added a comment -

            zam, were you going to push to b2_12 then?

            zam Alexander Zarochentsev added a comment -

            spitzcor
            > is there any work remaining for this ticket?
            no work except porting to 2.12

            spitzcor Cory Spitz added a comment -

            zam, is there any work remaining for this ticket? If not, I think we can resolve it for 2.14.0 with the landing of https://review.whamcloud.com/#/c/39468/.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39468/
            Subject: LU-13809 mdc: fix lovea for replay
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 72d45e1d344c5559d7620102a86a83bbf095796b

            zam Alexander Zarochentsev added a comment -

            The test from https://review.whamcloud.com/39468 illustrates the file loss:

            Failing mds1 on devvm1
            Stopping /mnt/lustre-mds1 (opts:) on devvm1
            reboot facets: mds1
            Failover mds1 to devvm1
            mount facets: mds1
            Starting mds1:   /dev/mapper/mds1_flakey /mnt/lustre-mds1
            Started lustre-MDT0000
            devvm1: executing wait_import_state_mount (FULL|IDLE) mdc.lustre-MDT0000-mdc-*.mds_server_uuid
            mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
             replay-single test_134: @@@@@@ FAIL: pfl file does not exist 
              Trace dump:
              = ./../tests/test-framework.sh:6216:error()
              = replay-single.sh:4910:test_134()
              = ./../tests/test-framework.sh:6519:run_one()
              = ./../tests/test-framework.sh:6568:run_one_logged()
              = ./../tests/test-framework.sh:6393:run_test()
              = replay-single.sh:4912:main()
            Dumping lctl log to /tmp/test_logs/1595329100/replay-single.test_134.*.1595329165.log
            Dumping logs only on local client.
            Resetting fail_loc on all nodes...done.
            Destroy the created pools: pool_134
            lustre.pool_134
            OST lustre-OST0001_UUID removed from pool lustre.pool_134
            Pool lustre.pool_134 destroyed
            FAIL 134 (40s)
            [root@devvm1 tests]#
            
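            The failure above follows the usual replay-single pattern: suspend MDT commits, create the file, fail over the MDT, and check that client replay restored it. A minimal sketch of that sequence, assuming the standard test-framework helpers (replay_barrier, fail, error) and illustrative names rather than the exact test_134 script:

              replay_barrier mds1                                      # stop MDT commits so the create must be replayed
              lfs setstripe -E 1M -c 1 -p pool_134 -E -1 $DIR/$tfile   # pooled PFL create/open to be replayed
              fail mds1                                                # fail over and restart the MDT, forcing recovery
              [ -f $DIR/$tfile ] || error "pfl file does not exist"    # on the buggy code the replayed file is missing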

            gerrit Gerrit Updater added a comment -

            Alexander Zarochentsev (alexander.zarochentsev@hpe.com) uploaded a new patch: https://review.whamcloud.com/39468
            Subject: LU-13809 tests: improve replay-single test_134
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e3ecde8b6e83d4c9dfa4d78ea0cbd203e197e3c0

            People

              Assignee: zam Alexander Zarochentsev
              Reporter: zam Alexander Zarochentsev
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: