
client page cache - page still in cache

Details

    • Bug
    • Resolution: Unresolved
    • Blocker
    • Lustre 2.16.0
    • 018c4e8f25 (origin/master, origin/HEAD) LU-18110 doc: lctl multiple NIDs specification not clear

    Description

      While creating a test case for a readahead (RA) bug, I started writing a test script and found an anomaly on the client side.
      The test script is:

      test_117() {
              local stripe_size
              local dir=/sys/kernel/debug/tracing
              local i

              (( $OSTCOUNT >= 2 )) || skip "needs >= 2 OSTs"
              $LCTL set_param llite.*.hybrid_io=0
              rm -rf $DIR/$tdir
              mkdir -p $DIR/$tdir
              $LFS setstripe -c 2 -S 1M $DIR/$tdir/$tfile || error "can't set striping"
              stripe_size=$($LFS getstripe -S $DIR/$tdir/$tfile)

              # write 10 stripes of data and flush them to the OSTs
              $LCTL mark "==== write"
              dd if=/dev/zero of=$DIR/$tdir/$tfile bs=$stripe_size count=10 oflag=sync
              sync

              $LCTL mark "=== read"
              for i in /sys/devices/virtual/bdi/lustre-*; do
                      echo 2048 > $i/read_ahead_kb
              done

              # set up ftrace: graph everything under vfs_fadvise and add
              # kprobes inside __do_page_cache_readahead
              set -x
              pushd $dir
              sysctl kernel.ftrace_enabled=1
              echo 0 > tracing_on
              echo 10000 > buffer_size_kb
              #echo 'nop' > current_tracer
              #echo '' >set_graph_function
              echo function_graph > current_tracer
              echo vfs_fadvise > set_graph_function
              echo > kprobe_events
              # kp1: function entry, kp2: return value
              echo 'p:kp1 __do_page_cache_readahead %di +80(+0(%di)):u64 %dx %cx' > kprobe_events
              echo 'r:kp2 __do_page_cache_readahead $retval' >> kprobe_events
              echo 'p:kp3 __do_page_cache_readahead+100 %ax %cx %dx %di' >> kprobe_events
              # kp4: ax - page, r13 - nr_read
              echo 'p:kp4 __do_page_cache_readahead+309 %ax %r13' >> kprobe_events
              echo 1 > events/kprobes/kp1/enable
              echo 1 > events/kprobes/kp2/enable
              echo 1 > events/kprobes/kp3/enable
              echo 1 > events/kprobes/kp4/enable
              echo > trace
              echo 1 > tracing_on
              popd

              # drop the file's pages from the cache, then ask for 4 stripes back
              fadvise_dontneed_helper $DIR/$tdir/$tfile
              fadvise_willneed_helper $DIR/$tdir/$tfile 0 $((stripe_size * 4 ))

              # stop tracing, save the log to /tmp/trace and remove the probes
              pushd $dir
              echo 0 > events/kprobes/kp1/enable
              echo 0 > events/kprobes/kp2/enable
              echo 0 > events/kprobes/kp3/enable
              echo 0 > events/kprobes/kp4/enable
              echo 0 > tracing_on
              cat trace > /tmp/trace
              echo > trace
              echo > kprobe_events
              popd
              set +x
      }
      run_test 117 "RA should not panic for multistripe"
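For a sense of scale, the sizes used by the script can be converted to page counts. This is only a sketch: the stripe size, the WILLNEED length of four stripes and the read_ahead_kb value come from the script above, while the 4096-byte page size is an assumption and may differ on other architectures.

```shell
#!/bin/bash
# Values from the test script: 1 MiB stripes, WILLNEED over 4 stripes,
# read_ahead_kb set to 2048.
stripe_size=$((1024 * 1024))
read_ahead_kb=2048
# Assumption: 4 KiB pages (typical for x86_64).
page_size=4096

willneed_bytes=$((stripe_size * 4))
willneed_pages=$((willneed_bytes / page_size))
ra_window_pages=$((read_ahead_kb * 1024 / page_size))

echo "WILLNEED range: $willneed_bytes bytes = $willneed_pages pages"
echo "readahead window: $ra_window_pages pages"
```

Under these assumptions the WILLNEED request covers 1024 pages and the configured readahead window is 512 pages, so a trace taken after a successful DONTNEED would be expected to show a substantial amount of read activity.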
      

      fadvise_willneed_helper just issues the WILLNEED advice (readahead(2) could be used to the same effect),
      and it does not work. While looking for the source of this bug I started tracing the client side and found
      that fadvise finds the pages already in the page cache, even though fadvise_dontneed_helper should have invalidated the pages in the cache.

      The ftrace log has no readpage calls, which would be expected if the pages had been removed from the page cache.
      I also tried sysctl -w vm.drop_caches=1 with the same result: the pages stay in the page cache and have the uptodate flag set.
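The check for readpage calls can be scripted against the saved trace. The helper below is a hypothetical sketch, not part of the Lustre test framework; the grep pattern is an assumption, since the name of the read entry point (for example ll_readpage or a read_folio variant) depends on the kernel and Lustre version.

```shell
#!/bin/bash
# Count trace lines that mention a readpage-style entry point.
# Hypothetical helper; the pattern below is a guess at likely symbol names.
count_readpage_calls() {
	local trace_file=$1

	# grep -c prints the number of matching lines. It exits non-zero when
	# nothing matches, so "|| true" keeps callers running under "set -e".
	grep -c -E 'readpages?|read_folio' "$trace_file" || true
}

# Possible use after the test has written /tmp/trace:
#   (( $(count_readpage_calls /tmp/trace) > 0 )) || echo "no readpage calls"
```

The pattern does not match the __do_page_cache_readahead kprobe lines, so only actual read entry points are counted.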

      After switching to the Cray 2.15 code, I do not see this bug, and the ftrace log contains records of readpage calls.
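Page-cache residency can also be checked from user space, independently of ftrace. The sketch below assumes fincore(1) from util-linux is installed on the client and that its view of resident pages is meaningful for a Lustre file; cached_pages is a hypothetical helper name.

```shell
#!/bin/bash
# Print how many pages of a file are resident in the page cache.
# Assumes fincore(1) from util-linux is available.
cached_pages() {
	local file=$1

	# --noheadings drops the header row; --output PAGES selects the
	# resident page count column. tr strips the column padding.
	fincore --noheadings --output PAGES "$file" | tr -d '[:space:]'
}

# Possible use inside the test, right after the DONTNEED helper:
#   fadvise_dontneed_helper $DIR/$tdir/$tfile
#   (( $(cached_pages $DIR/$tdir/$tfile) == 0 )) || error "pages still cached"
```

Comparing the count on the affected build and on the 2.15 code would show whether the pages survive DONTNEED in only one of them.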

      Attachments

        Activity

          [LU-18147] client page cache - page still in cache
          pjones Peter Jones made changes -
          Fix Version/s Original: Lustre 2.16.0 [ 15190 ]
          paf Patrick Farrell (Inactive) made changes -
          Link Original: This issue is related to LU-13802 [ LU-13802 ]
          paf Patrick Farrell (Inactive) made changes -
          Summary Original: Hybrid IO had broke a client page cache - page still in cache. New: client page cache - page still in cache
          adilger Andreas Dilger made changes -
          Link New: This issue is related to LU-13802 [ LU-13802 ]
          adilger Andreas Dilger made changes -
          Fix Version/s New: Lustre 2.16.0 [ 15190 ]
          adilger Andreas Dilger made changes -
          Affects Version/s New: Lustre 2.16.0 [ 15190 ]
          stancheff Shaun Tancheff made changes -
          Description: wrapped the test script in {noformat} markup; the text is otherwise unchanged.

          shadow Alexey Lyashkov made changes -
          Description: removed the sentence "lustre log say strange lines" and the Lustre debug log excerpt that followed it (reproduced below); the rest of the text is essentially unchanged.

          {noformat}
          00000020:00008000:4.0:1723644734.683217:0:50699:0:(cl_page.c:338:cl_page_find()) 253@[0x200000401:0x51:0x0] fffff72885c473c0 0 1
          00000008:00000040:4.0:1723644734.683226:0:50699:0:(osc_cache.c:2230:osc_prep_async_page()) oap ffff9ae3d8bc9588 vmpage fffff72885c473c0 obj off 1036288
          00000008:00000020:4.0:1723644734.692718:0:50699:0:(osc_cache.c:2308:osc_queue_async_io()) obj ffff9ae4290a3178 ready 0|-|- wr 0|-|- rd 0|- oap ffff9ae3d8bc9588 page fffff72885c473c0 added for cmd 2
          00000008:00000020:4.0:1723644734.692722:0:50699:0:(osc_cache.c:1373:osc_consume_write_grant()) using 4096 grant credits for brw ffff9ae3d8bc95c0 page fffff72885c473c0
          00000008:00000040:4.0:1723644734.704038:0:50699:0:(osc_cache.c:2230:osc_prep_async_page()) oap ffff9ae40f7e2b68 vmpage fffff72885ebd5c0 obj off 1036288
          00000008:00000020:4.0:1723644734.713519:0:50699:0:(osc_cache.c:2308:osc_queue_async_io()) obj ffff9ae4290a3428 ready 0|-|- wr 0|-|- rd 0|- oap ffff9ae40f7e2b68 page fffff72885ebd5c0 added for cmd 2
          00000008:00008000:5.0:1723644734.829100:0:2858:0:(osc_request.c:1336:osc_checksum_bulk()) page fffff72885c473c0 map ffff9ae4207cee08 index 253 flags 17ffffc0005028 count 3 priv ffff9ae3d8bc9518: off 0
          00000008:00000040:2.0:1723644735.993418:0:50793:0:(osc_cache.c:2430:osc_teardown_async_page()) teardown oap ffff9ae3d8bc9588 page ffff9ae3d8bc9578 at index 253.
          {noformat}

          shadow Alexey Lyashkov created issue -

          People

            paf Patrick Farrell (Inactive)
            shadow Alexey Lyashkov
            Votes: 0
            Watchers: 6
