Details
-
Bug
-
Resolution: Unresolved
-
Blocker
-
None
-
Lustre 2.16.0
-
None
-
018c4e8f25 (origin/master, origin/HEAD) LU-18110 doc: lctl multiple NIDs specification not clear
-
3
-
9223372036854775807
Description
While creating a test case for an RA (readahead) bug, I started writing a test script and found an anomaly on the client side.
The test script is:
test_117() {
	local stripe_size

	(( $OSTCOUNT >= 2 )) || skip "needs >= 2 OSTs"

	# disable hybrid IO on the client
	$LCTL set_param llite.*.hybrid_io=0
	rm -rf $DIR/$tdir
	mkdir -p $DIR/$tdir
	$LFS setstripe -c 2 -S 1M $DIR/$tdir/$tfile || error "can't set striping"
	stripe_size=$($LFS getstripe -S $DIR/$tdir/$tfile)

	$LCTL mark "==== write"
	dd if=/dev/zero of=$DIR/$tdir/$tfile bs=$stripe_size count=10 oflag=sync
	sync

	$LCTL mark "=== read"
	# set the readahead window for every Lustre BDI
	for i in /sys/devices/virtual/bdi/lustre-*; do
		echo 2048 > $i/read_ahead_kb
	done

	# trace vfs_fadvise and put kprobes on __do_page_cache_readahead
	dir=/sys/kernel/debug/tracing
	set -x
	pushd $dir
	sysctl kernel.ftrace_enabled=1
	echo 0 > tracing_on
	echo 10000 > buffer_size_kb
	#echo 'nop' > current_tracer
	#echo '' >set_graph_function
	echo function_graph > current_tracer
	echo vfs_fadvise > set_graph_function
	echo > kprobe_events
	echo 'p:kp1 __do_page_cache_readahead %di +80(+0(%di)):u64 %dx %cx' > kprobe_events
	echo 'r:kp2 __do_page_cache_readahead $retval' >> kprobe_events
	echo 'p:kp3 __do_page_cache_readahead+100 %ax %cx %dx %di' >> kprobe_events
	# ax - page
	# r13 - nr_read
	echo 'p:kp4 __do_page_cache_readahead+309 %ax %r13' >> kprobe_events
	echo 1 > events/kprobes/kp1/enable
	echo 1 > events/kprobes/kp2/enable
	echo 1 > events/kprobes/kp3/enable
	echo 1 > events/kprobes/kp4/enable
	echo > trace
	echo 1 > tracing_on
	popd

	# drop the cached pages, then request the first 4 stripes again
	fadvise_dontneed_helper $DIR/$tdir/$tfile
	fadvise_willneed_helper $DIR/$tdir/$tfile 0 $((stripe_size * 4 ))

	# stop tracing and save the log
	pushd $dir
	echo 0 > events/kprobes/kp1/enable
	echo 0 > events/kprobes/kp2/enable
	echo 0 > events/kprobes/kp3/enable
	echo 0 > events/kprobes/kp4/enable
	echo 0 > tracing_on
	cat trace > /tmp/trace
	echo > trace
	echo > kprobe_events
	popd
	set +x
}
run_test 117 "RA should don't panic for multistripe"
fadvise_willneed_helper simply issues an fadvise(WILLNEED); readahead(2) could be used in the same way.
It does not work. While investigating the source of this bug, I started tracing the client side and found that
fadvise finds the pages in the page cache, even though fadvise_dontneed_helper should have invalidated the pages in the cache.
The ftrace log has no readpage calls, which would be expected once the pages have been removed from the page cache.
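The check on the saved trace can be sketched as a small helper. This is only an illustration, not part of the test above: count_trace_calls is a hypothetical name, and the default pattern "readpage" is an assumption, since the exact function name in the trace depends on the kernel and Lustre version.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework):
# print how many lines of a saved ftrace log mention a function-name pattern.
# $1 - path to the saved trace, e.g. /tmp/trace
# $2 - pattern to look for (assumed default: readpage)
count_trace_calls() {
	local trace_file=$1
	local pattern=${2:-readpage}

	# grep -c prints 0 and exits non-zero when nothing matches,
	# so mask the exit status to keep callers running under "set -e".
	grep -c -- "$pattern" "$trace_file" || true
}
```

With the trace saved by the test, count_trace_calls /tmp/trace would print 0 in the failing case described here and a non-zero count when the pages are really read again.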
I tried sysctl -w vm.drop_caches=1 and got the same result: the pages stay in the page cache and have the uptodate flag set.
After switching to the Cray 2.15 code, I don't see this bug, and the ftrace log has records of readpage calls.
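Whether the pages really stay cached can also be checked from user space, without ftrace. The following is a minimal sketch under stated assumptions: cached_pages and drop_file_cache are hypothetical helpers, fincore(1) from util-linux is assumed to be installed, and GNU dd with iflag=nocache is used as a stand-in for fadvise_dontneed_helper, whose implementation is not shown in this report.

```shell
#!/bin/bash
# Hypothetical helper: print the number of page-cache pages resident for a file.
# Returns 2 if fincore(1) is not available.
cached_pages() {
	local file=$1

	command -v fincore >/dev/null || return 2
	fincore --raw --noheadings --output PAGES "$file" | tr -d '[:space:]'
}

# Hypothetical helper: ask the kernel to drop the cached pages of a file.
# GNU dd with iflag=nocache and count=0 issues a DONTNEED advice for the
# whole file; dirty pages are flushed first so that they can be dropped.
drop_file_cache() {
	local file=$1

	sync
	dd if="$file" iflag=nocache count=0 2>/dev/null
}
```

Calling cached_pages on $DIR/$tdir/$tfile before and after drop_file_cache (or after sysctl -w vm.drop_caches=1) should show the count falling if invalidation works; a count that stays unchanged would match the behavior reported here.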
Activity
Fix Version/s | Original: Lustre 2.16.0 [ 15190 ] |
Summary | Original: Hybrid IO had broke a client page cache - page still in cache. | New: client page cache - page still in cache |
Fix Version/s | New: Lustre 2.16.0 [ 15190 ] |
Affects Version/s | New: Lustre 2.16.0 [ 15190 ] |
Description |
Original: as in the current Description, except that the sentence about the ftrace log was preceded by this Lustre log excerpt:
The Lustre log shows strange lines:
{noformat}
00000020:00008000:4.0:1723644734.683217:0:50699:0:(cl_page.c:338:cl_page_find()) 253@[0x200000401:0x51:0x0] fffff72885c473c0 0 1
00000008:00000040:4.0:1723644734.683226:0:50699:0:(osc_cache.c:2230:osc_prep_async_page()) oap ffff9ae3d8bc9588 vmpage fffff72885c473c0 obj off 1036288
00000008:00000020:4.0:1723644734.692718:0:50699:0:(osc_cache.c:2308:osc_queue_async_io()) obj ffff9ae4290a3178 ready 0|-|- wr 0|-|- rd 0|- oap ffff9ae3d8bc9588 page fffff72885c473c0 added for cmd 2
00000008:00000020:4.0:1723644734.692722:0:50699:0:(osc_cache.c:1373:osc_consume_write_grant()) using 4096 grant credits for brw ffff9ae3d8bc95c0 page fffff72885c473c0
00000008:00000040:4.0:1723644734.704038:0:50699:0:(osc_cache.c:2230:osc_prep_async_page()) oap ffff9ae40f7e2b68 vmpage fffff72885ebd5c0 obj off 1036288
00000008:00000020:4.0:1723644734.713519:0:50699:0:(osc_cache.c:2308:osc_queue_async_io()) obj ffff9ae4290a3428 ready 0|-|- wr 0|-|- rd 0|- oap ffff9ae40f7e2b68 page fffff72885ebd5c0 added for cmd 2
00000008:00008000:5.0:1723644734.829100:0:2858:0:(osc_request.c:1336:osc_checksum_bulk()) page fffff72885c473c0 map ffff9ae4207cee08 index 253 flags 17ffffc0005028 count 3 priv ffff9ae3d8bc9518: off 0
00000008:00000040:2.0:1723644735.993418:0:50793:0:(osc_cache.c:2430:osc_teardown_async_page()) teardown oap ffff9ae3d8bc9588 page ffff9ae3d8bc9578 at index 253.
{noformat}
New: identical to the current Description, without the log excerpt.