Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
Lustre 2.16.0, Lustre 2.15.5, Lustre 2.15.6
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/8d022068-68bb-4bbe-8686-e7928146c95e
test_2a failed with the following error:
Timeout occurred after 97 minutes, last suite running was lustre-rsync-test
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/100812 - 4.18.0-477.27.1.el8_8.x86_64 (x86_64/RHEL 8.8/2.15.59.54)
servers: https://build.whamcloud.com/job/lustre-reviews/100812 - 4.18.0-477.27.1.el8_lustre.x86_64 (ldiskfs/x86_64/RHEL 8.8/2.15.59.54)
Client logs
== lustre-rsync-test test 2a: Replicate files created by dbench. ===== 18:06:53 (1702404413) CMD: trevis-127vm1 /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n CMD: trevis-127vm1 /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm mdd.lustre-MDT0000.changelog_mask=+hsm CMD: trevis-127vm1 /usr/sbin/lctl --device lustre-MDT0000 changelog_register -n CMD: trevis-127vm2 /usr/sbin/lctl get_param mdd.lustre-MDT0001.changelog_mask -n CMD: trevis-127vm2 /usr/sbin/lctl set_param mdd.lustre-MDT0001.changelog_mask=+hsm mdd.lustre-MDT0001.changelog_mask=+hsm CMD: trevis-127vm2 /usr/sbin/lctl --device lustre-MDT0001 changelog_register -n CMD: trevis-127vm1 /usr/sbin/lctl get_param mdd.lustre-MDT0002.changelog_mask -n CMD: trevis-127vm1 /usr/sbin/lctl set_param mdd.lustre-MDT0002.changelog_mask=+hsm mdd.lustre-MDT0002.changelog_mask=+hsm CMD: trevis-127vm1 /usr/sbin/lctl --device lustre-MDT0002 changelog_register -n CMD: trevis-127vm2 /usr/sbin/lctl get_param mdd.lustre-MDT0003.changelog_mask -n CMD: trevis-127vm2 /usr/sbin/lctl set_param mdd.lustre-MDT0003.changelog_mask=+hsm mdd.lustre-MDT0003.changelog_mask=+hsm CMD: trevis-127vm2 /usr/sbin/lctl --device lustre-MDT0003 changelog_register -n Registered 4 changelog users: 'cl3 cl3 cl3 cl3' lustre-MDT0000: Registered changelog user cl3 looking for dbench program /usr/bin/dbench found dbench client file /usr/share/dbench/client.txt '/usr/share/dbench/client.txt' -> 'client.txt' copying required dbench files to chroot directory /mnt/lustre/d2a.lustre-rsync-test '/usr/bin/dbench' -> 'dbench' lib64/libpopt.so.0 lib64/libc.so.6 lib64/ld-linux-x86-64.so.2 running 'dbench 2 -t 60' in /mnt/lustre/d2a.lustre-rsync-test at Tue Dec 12 18:06:58 UTC 2023 waiting for dbench pid 27630 dbench version 4.00 - Copyright Andrew Tridgell 1999-2004 Running for 60 seconds with load 'client.txt' and minimum warmup 12 secs failed to create barrier semaphore 0 of 2 processes prepared for launch 0 sec 2 of 2 processes prepared for launch 0 sec releasing clients 2 670 42.34 MB/sec warmup 1 sec latency 243.676 ms 2 761 21.39 MB/sec warmup 2 sec latency 221.017 ms 2 946 14.56 MB/sec warmup 3 sec latency 191.389 ms : 2 34715 5.27 MB/sec execute 55 sec latency 241.709 ms 2 35719 5.47 MB/sec execute 56 sec latency 119.864 ms 2 36221 5.41 MB/sec execute 57 sec latency 173.668 ms 2 36352 5.33 MB/sec execute 58 sec latency 132.522 ms 2 36769 5.33 MB/sec execute 59 sec latency 160.981 ms 2 cleanup 60 sec 0 cleanup 60 sec Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 10567 2.825 123.841 Close 7710 0.642 24.838 Rename 440 3.282 41.200 Unlink 2156 0.976 22.880 Qpathinfo 9492 0.636 38.188 Qfileinfo 1629 0.076 3.208 Qfsinfo 1719 0.018 1.539 Sfileinfo 868 2.031 39.754 Find 3625 0.296 41.384 WriteX 5127 0.449 36.828 ReadX 16023 0.009 0.055 LockX 32 0.440 0.998 UnlockX 32 0.504 1.832 Flush 745 93.845 969.365 Throughput 5.32869 MB/sec 2 clients 2 procs max_latency=969.378 ms stopping dbench in /mnt/lustre/d2a.lustre-rsync-test at Tue Dec 12 18:08:10 UTC 2023 with return code 0 clean dbench files in /mnt/lustre/d2a.lustre-rsync-test /mnt/lustre/d2a.lustre-rsync-test /mnt/lustre/d2a.lustre-rsync-test removed directory 'clients/client0/~dmtmp/PARADOX' removed directory 'clients/client0/~dmtmp/PM' : removed directory 'clients/client1/~dmtmp/PWRPNT' removed directory 'clients/client1/~dmtmp' removed directory 'clients/client1' removed directory 'clients' removed 'client.txt' removed 'dbench' removed 'lib64/libpopt.so.0' removed 'lib64/libc.so.6' removed 'lib64/ld-linux-x86-64.so.2' /mnt/lustre/d2a.lustre-rsync-test dbench successfully finished
======= 18:06:53 \(1702404413\) [ 302.317896] Lustre: DEBUG MARKER: == lustre-rsync-test test 2a: Replicate files created by dbench. ==== 18:06:53 (1702404413) [ 305.333993] Lustre: lustre-OST0005-osc-ffff895045192000: disconnect after 24s idle [ 305.335802] Lustre: Skipped 1 previous similar message [ 407.730814] Lustre: lustre-OST0000-osc-ffff895045192000: disconnect after 24s idle [ 412.850667] Lustre: lustre-OST0006-osc-ffff895045192000: disconnect after 23s idle [ 412.852206] Lustre: Skipped 6 previous similar messages [ 5749.820145] sysrq: SysRq : Changing Loglevel [ 5749.821211] sysrq: Loglevel set to 8 [ 5749.823561] sysrq: SysRq : Show State : [ 5752.514580] task:lustre_rsync state:D stack: 0 pid:27687 ppid: 27182 flags:0x00004080 [ 5752.516178] Call Trace: [ 5752.516719] __schedule+0x2d1/0x870 [ 5752.518361] schedule+0x55/0xf0 [ 5752.519019] jbd2_log_wait_commit+0xac/0x120 [jbd2] [ 5752.520751] ext4_sync_file+0x365/0x3b0 [ext4] [ 5752.521695] ext4_file_write_iter+0x1a6/0x3e0 [ext4] [ 5752.522710] new_sync_write+0x112/0x160 [ 5752.523537] vfs_write+0xa5/0x1b0 [ 5752.524238] ksys_write+0x4f/0xb0 [ 5752.524948] do_syscall_64+0x5b/0x1b0
The jbd2 thread is stuck on IO submission to the VM storage (vda):
[ 5750.277661] task:jbd2/vda1-8 state:D stack: 0 pid: 368 ppid: 2 flags:0x80004000 [ 5750.279261] Call Trace: [ 5750.279801] __schedule+0x2d1/0x870 [ 5750.281340] schedule+0x55/0xf0 [ 5750.282004] io_schedule+0x12/0x40 [ 5750.282714] bit_wait_io+0xd/0x50 [ 5750.283418] __wait_on_bit+0x2d/0x90 [ 5750.284155] out_of_line_wait_on_bit+0x91/0xb0 [ 5750.285919] jbd2_journal_commit_transaction+0x14db/0x19f0 [jbd2] [ 5750.288069] kjournald2+0xbd/0x270 [jbd2] [ 5750.290560] kthread+0x134/0x150
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
lustre-rsync-test test_2a - Timeout occurred after 97 minutes, last suite running was lustre-rsync-test
Attachments
Issue Links
- is duplicated by
-
LU-16635 lustre-rsync-test test_2a: timeout
-
- Resolved
-
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...