LU-17679

sanity test_851: FAIL: fanotify did not report anything after 30 seconds

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • Lustre 2.16.0

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c495f242-9411-4202-8c2b-3ef0c94c9f0b

      test_851 failed with the following error:

      fanotify did not report anything after 30 seconds
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/103426 - 4.18.0-513.18.1.el8_9.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/103426 - 4.18.0-513.18.1.el8_lustre.x86_64

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_851 - fanotify did not report anything after 30 seconds

          Activity

            flei Feng Lei added a comment -

            In my environment, export PDSH=ssh fixes this issue.

            bzzz Alex Zhuravlev added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/58141/
            Subject: LU-17679 tests: enable sanity/851

            This patch breaks local testing:

            == sanity test 851: fanotify can monitor open/read/write/close events for lustre fs ========================================================== 07:21:27 (1748330487)
            striped dir -i1 -c2 -H fnv_1a_64 /mnt/lustre/d851.sanity
                PID  NI COMMAND
             390341 -10 nice
            Started...
            ./../tests/test-framework.sh: line 3607: [[: cannot run remote command on localhost with no_dsh: syntax error in expression (error token is "run remote command on localhost with no_dsh")
            Waiting 30s for '0'
            ./../tests/test-framework.sh: line 3607: [[: cannot run remote command on localhost with no_dsh: syntax error in expression (error token is "run remote command on localhost with no_dsh")
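
The bash message in the log above is consistent with an error string, rather than a number, reaching a numeric comparison: inside [[ ... ]], operands of -eq are evaluated as arithmetic expressions, so the text "cannot run remote command on localhost with no_dsh" is parsed as an expression and fails at the token "run". The code at line 3607 of test-framework.sh is not shown in this ticket, so the following is only a sketch under that assumption. The names fake_remote_count, is_uint and safe_num_eq are hypothetical and are not test-framework.sh functions. It shows one way to validate a value before comparing it numerically.

```shell
#!/bin/sh
# Hypothetical stand-in for a helper that prints an error message
# instead of a number when no remote shell is configured.
fake_remote_count() {
	echo "cannot run remote command on localhost with no_dsh"
}

# Succeed only when the argument is a non-empty string of digits.
is_uint() {
	case "$1" in
	'' | *[!0-9]*) return 1 ;;
	*) return 0 ;;
	esac
}

# Numeric equality that rejects non-numeric input (status 2) instead of
# handing it to an arithmetic comparison.
safe_num_eq() {
	is_uint "$1" || return 2
	[ "$1" -eq "$2" ]
}

result=$(fake_remote_count)
if safe_num_eq "$result" 0; then
	echo "count is 0"
else
	echo "unusable result: $result"
fi
```

With a guard of this kind, a misconfigured remote shell would show up as a clear "unusable result" message rather than a syntax error followed by a 30 second wait.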

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/58141/
            Subject: LU-17679 tests: enable sanity/851
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 58acb22072a917cbc16cccffae1de5e3f2962d52
            flei Feng Lei added a comment - - edited

            I repeated the test several thousand times, both locally and in Maloo, and could not reproduce the failure.

            Looking at the 4 existing test failures, it seems that the monitor_lustre program did not report anything. One possible reason is that the program was forked from bash but was never scheduled and executed within 30/60 seconds. It seems implausible for a program to go unscheduled for that long, but since the test machine is a virtual machine, we cannot rule this hypothesis out.

            So now I'm going to:

            1) in the monitor_lustre program, print a start message just before it starts to monitor the event fd.

            2) in the test script, wait for the start message to verify that monitor_lustrefs is running and is ready to receive events.

            3) in the test script, raise the priority of the monitor_lustre process.

            I suggest merging the patch and enabling sanity/851 again for daily testing. If the failure happens again, at least we can gather more information for further analysis.
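
The three steps above can be sketched in shell as follows. This is an illustration only, not the code from the patch: fake_monitor is a hypothetical stand-in for the real monitor program, wait_for_line is a hypothetical helper, and the "Started" pattern and 30 second timeout are assumptions taken from the log output quoted elsewhere in this ticket. Raising priority with renice normally needs root, so the sketch ignores a failure there.

```shell
#!/bin/sh
# Hypothetical stand-in for the monitor program: announce readiness,
# then stay alive as if waiting for events.
fake_monitor() {
	echo "Started..."
	exec sleep 5
}

# Poll file $1 until it contains pattern $2 or $3 seconds have passed.
# Returns 0 when the pattern is found, 1 on timeout.
wait_for_line() {
	_waited=0
	while [ "$_waited" -lt "$3" ]; do
		grep -q "$2" "$1" 2>/dev/null && return 0
		sleep 1
		_waited=$((_waited + 1))
	done
	return 1
}

log=$(mktemp)
fake_monitor > "$log" 2>&1 &
monitor_pid=$!

# Step 3: try to raise the priority; this fails harmlessly without root.
renice -n -10 -p "$monitor_pid" > /dev/null 2>&1 || true

# Step 2: only generate events once the start message has been seen.
if wait_for_line "$log" "Started" 30; then
	echo "monitor is ready"
else
	echo "monitor did not start within 30 seconds"
fi

kill "$monitor_pid" 2> /dev/null || true
wait "$monitor_pid" 2> /dev/null || true
rm -f "$log"
```

Waiting for an explicit readiness message separates "the monitor never ran" from "the monitor ran but saw no events", which is the distinction the hypothesis above needs.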

            gerrit Gerrit Updater added a comment -

            "Feng Lei <flei@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58141
            Subject: LU-17679 tests: enable sanity/851
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 049a6bb9bb66ef90edee4eb3ba52141791a06126

            gerrit Gerrit Updater added a comment -

            "Feng Lei <flei@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58136
            Subject: LU-17679 tests: try to reproduce sanity/851 failure
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a8a6cdd369fa6518baf2d62165b1387d43fa4fc8

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57359/
            Subject: LU-17679 tests: Disable sanity/851
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6aee73ea00ea0e2d02b7fc8be388e7d8e95c1e9f
            flei Feng Lei added a comment -

            I'm not sure fanotify is a reliable kernel feature. inotify seems to be more popular and better tested.

            I'll reopen this issue, but please disable the test temporarily.

            bzzz Alex Zhuravlev added a comment -

            +1 on master: https://testing.whamcloud.com/test_sets/106a076a-bf0d-42bf-a596-e15b89e5fafd
            lixi_wc Li Xi added a comment -

            It is not very productive to just disable or remove a test case without understanding the reason for the failure. I am not sure it is accurate to say that fanotify() is purely a VFS interface with no relationship to Lustre. Even so, if fanotify() works well on a local file system but not on Lustre (on a single client), that is a problem that needs to be fixed.

            People

              flei Feng Lei
              maloo Maloo
              Votes: 0
              Watchers: 9