Lustre / LU-18523

tests sanity/test_65j: wait for ll_sa stat-ahead thread to quit during cleanup

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

      # ONLY=63-65  bash sanity.sh
      ...
      == sanity test 65j: set default striping on root directory (bug 6367)=========================================================== 10:11:30 (1733393490)
      [  134.944066] Lustre: DEBUG MARKER: == sanity test 65j: set default striping on root directory (bug 6367)=========================================================== 10:11:30 (1733393490)
      cln..There are ll_sa thread not exit!
      test_65j returned 20
      FAIL 65j (2s)
      

      The list pattern statahead can optimize the stat() workload as follows:

      • opendir() will authorize the statahead
      • readdir() will get the names and inode numbers for the dentries
      • stat() is done on the dentries one by one
      • closedir() will deauthorize the statahead

      After closedir() is called, the statahead thread will quit (see the sketch below).
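
      For illustration only (a sketch, not from the ticket): a plain "ls -l" on a test directory follows this list pattern exactly, assuming the usual sanity.sh $DIR/$tdir naming:

        # List-pattern workload as seen from user space:
        #   opendir()  - authorizes statahead
        #   readdir()  - returns names and inode numbers
        #   lstat()    - issued once per entry by ls for the long listing
        #   closedir() - deauthorizes statahead, so the ll_sa thread quits
        ls -l $DIR/$tdir > /dev/null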

      However, this is not the same for the fname pattern statahead, whose workload calls stat() on the files in alphabetical order of the file name and is not bracketed by an opendir()/closedir() call pair.
      For the fname pattern statahead, the statahead thread waits for a certain period (30s by default) and does not quit until the user has stopped using the directory (traverse and stat) for that long.
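
      As an illustrative sketch (the file names and count below are made up), an fname-pattern workload could look like this:

        # stat() files in alphabetical name order, with no enclosing
        # opendir()/closedir() pair; the ll_sa thread lingers until the
        # 30s idle timeout expires after the last stat().
        for i in $(seq -f "%06g" 1 1000); do
                stat "$DIR/$tdir/f$i" > /dev/null
        done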

      Thus, after the fname statahead pattern is enabled by default, we must wait for the stat-ahead thread to quit during cleanup (as sketched below); otherwise, sanity/test_65j will fail.
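
      A minimal sketch of such a cleanup wait, assuming only the ll_sa_<pid> kernel thread naming visible in ps (the actual fix is in the patch linked in the activity below; this helper is hypothetical):

        # Hypothetical helper: poll until every ll_sa statahead thread
        # has exited, allowing up to 35s (longer than the 30s idle timeout).
        wait_statahead_quit() {
                local i
                for i in $(seq 35); do
                        pgrep "^ll_sa_" > /dev/null || return 0
                        sleep 1
                done
                echo "ll_sa threads still running after 35s"
                return 1
        }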

      Attachments

      Issue Links

      Activity


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57337/
            Subject: LU-18523 tests: wait statahead thread quit during cleanup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1a6967eff6663c70e15f14e853f531316ad9eab5

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57337/ Subject: LU-18523 tests: wait statahead thread quit during cleanup Project: fs/lustre-release Branch: master Current Patch Set: Commit: 1a6967eff6663c70e15f14e853f531316ad9eab5

            bzzz Alex Zhuravlev added a comment -

            "Is there any reproducer for this problem?"

            Nothing specific, I did hit this in local testing of my patch.
            qian_wc Qian Yingjin added a comment (edited) -

            Created ticket LU-18536 for the new bug.
            qian_wc Qian Yingjin added a comment -

            Hi Alex,

            Is there any reproducer for this problem?

            I found a possible bug in the stat-ahead code, will make a patch later.


            bzzz Alex Zhuravlev added a comment -

            I'm not sure what would be the correct ticket for this problem:

            [ 1971.751519] systemd[1]: mnt-lustre.mount: Succeeded.
            [ 1994.470244] Lustre: lustre-MDT0000: haven't heard from client b1480839-909d-4d3c-a88e-ac243d683b7a (at 0@lo) in 33 seconds. I think it's dead, and I am evicting it. exp ffff9987114b0000, cur 1733932014 expire 1733931984 last 1733931981
            [ 1996.471000] Lustre: lustre-MDT0001: haven't heard from client b1480839-909d-4d3c-a88e-ac243d683b7a (at 0@lo) in 35 seconds. I think it's dead, and I am evicting it. exp ffff99866e14b000, cur 1733932016 expire 1733931986 last 1733931981
            [ 1996.474826] Lustre: Skipped 4 previous similar messages
            [ 2109.110074] Lustre: 103888:0:(statahead.c:1649:ll_statahead_thread()) lustre: statahead thread waited 1024ms for inuse entry [0x200000402:0x458:0x0] to be finished
            [ 2263.020079] Lustre: 103888:0:(statahead.c:1649:ll_statahead_thread()) lustre: statahead thread waited 1152ms for inuse entry [0x200000402:0x458:0x0] to be finished
            [ 2417.010074] Lustre: 103888:0:(statahead.c:1649:ll_statahead_thread()) lustre: statahead thread waited 1280ms for inuse entry [0x200000402:0x458:0x0] to be finished
            [ 2724.520064] Lustre: 103888:0:(statahead.c:1649:ll_statahead_thread()) lustre: statahead thread waited 1536ms for inuse entry [0x200000402:0x458:0x0] to be finished
            [ 2724.520348] Lustre: 103888:0:(statahead.c:1649:ll_statahead_thread()) Skipped 1 previous similar message
            [ 3340.070174] Lustre: 103888:0:(statahead.c:1649:ll_statahead_thread()) lustre: statahead thread waited 2048ms for inuse entry [0x200000402:0x458:0x0] to be finished
            [ 3340.112119] Lustre: 103888:0:(statahead.c:1649:ll_statahead_thread()) Skipped 3 previous similar messages
            

            the traces:

            PID: 103888   TASK: ffff998684d94e40  CPU: 0    COMMAND: "ll_sa_18552"
             #0 [ffff99866ff87d20] __schedule at ffffffff8c6f6bd6
                /tmp/kernel/kernel/sched/core.c: 3755
             #1 [ffff99866ff87d78] schedule at ffffffff8c6f7170
                /tmp/kernel/kernel/sched/core.c: 4602
             #2 [ffff99866ff87d90] schedule_timeout at ffffffff8c6fd475
                /tmp/kernel/kernel/time/timer.c: 1860
             #3 [ffff99866ff87e48] msleep at ffffffff8c1624a2
                /tmp/kernel/kernel/time/timer.c: 2011
             #4 [ffff99866ff87e58] ll_statahead_thread at ffffffffc173a7a1 [lustre]
                /home/lustre/master-mine/lustre/llite/statahead.c: 1646
             #5 [ffff99866ff87f10] kthread at ffffffff8c10383e
                /tmp/kernel/kernel/kthread.c: 354
             #6 [ffff99866ff87f50] ret_from_fork at ffffffff8c8001c4
                /tmp/kernel/arch/x86/entry/entry_64.S: 328
            
            PID: 177657   TASK: ffff99867b05c880  CPU: 1    COMMAND: "umount"
             #0 [ffff99868398fd68] __schedule at ffffffff8c6f6bd6
                /tmp/kernel/kernel/sched/core.c: 3755
             #1 [ffff99868398fdc0] schedule at ffffffff8c6f7170
                /tmp/kernel/kernel/sched/core.c: 4602
             #2 [ffff99868398fdd8] schedule_timeout at ffffffff8c6fd475
                /tmp/kernel/kernel/time/timer.c: 1860
             #3 [ffff99868398fe90] ll_kill_super at ffffffffc16f9e96 [lustre]
                /home/lustre/linux-4.18.0-477.15.1.el8_8/include/linux/compiler.h: 278
             #4 [ffff99868398fea8] lustre_kill_super at ffffffffc1733063 [lustre]
                /home/lustre/master-mine/lustre/llite/super25.c: 212
             #5 [ffff99868398feb8] deactivate_locked_super at ffffffff8c260e64
                /tmp/kernel/fs/super.c: 340
             #6 [ffff99868398fed0] cleanup_mnt at ffffffff8c282f46
                /tmp/kernel/fs/namespace.c: 115
             #7 [ffff99868398fee0] task_work_run at ffffffff8c10142a
                /tmp/kernel/kernel/task_work.c: 127
             #8 [ffff99868398ff20] exit_to_usermode_loop at ffffffff8c002355
                /tmp/kernel/./include/linux/tracehook.h: 188
             #9 [ffff99868398ff38] do_syscall_64 at ffffffff8c002b0e
                /tmp/kernel/arch/x86/entry/common.c: 200
            #10 [ffff99868398ff50] entry_SYSCALL_64_after_hwframe at ffffffff8c80007d
                /tmp/kernel/arch/x86/entry/entry_64.S: 147
            

            "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57337
            Subject: LU-18523 tests: wait statahead thread quit during cleanup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c241c4d8406ae29a46cf426dfd3ce8e14d21d827

            gerrit Gerrit Updater added a comment - "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57337 Subject: LU-18523 tests: wait statahead thread quit during cleanup Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: c241c4d8406ae29a46cf426dfd3ce8e14d21d827

            People

              Assignee: qian_wc Qian Yingjin
              Reporter: qian_wc Qian Yingjin
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated: