LU-17361

lustre-rsync-test test_2a: Timeout occurred after 97 minutes, last suite running was lustre-rsync-test

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: Lustre 2.17.0
    • Affects Version/s: Lustre 2.16.0, Lustre 2.15.5, Lustre 2.15.6

    Description

      This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/8d022068-68bb-4bbe-8686-e7928146c95e

      test_2a failed with the following error:

      Timeout occurred after 97 minutes, last suite running was lustre-rsync-test
      

       

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/100812 - 4.18.0-477.27.1.el8_8.x86_64 (x86_64/RHEL 8.8/2.15.59.54)
      servers: https://build.whamcloud.com/job/lustre-reviews/100812 - 4.18.0-477.27.1.el8_lustre.x86_64 (ldiskfs/x86_64/RHEL 8.8/2.15.59.54)

      Client logs

      == lustre-rsync-test test 2a: Replicate files created by dbench. ===== 18:06:53 (1702404413)
      CMD: trevis-127vm1 /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n
      CMD: trevis-127vm1 /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm
      mdd.lustre-MDT0000.changelog_mask=+hsm
      CMD: trevis-127vm1 /usr/sbin/lctl --device lustre-MDT0000 changelog_register -n
      CMD: trevis-127vm2 /usr/sbin/lctl get_param mdd.lustre-MDT0001.changelog_mask -n
      CMD: trevis-127vm2 /usr/sbin/lctl set_param mdd.lustre-MDT0001.changelog_mask=+hsm
      mdd.lustre-MDT0001.changelog_mask=+hsm
      CMD: trevis-127vm2 /usr/sbin/lctl --device lustre-MDT0001 changelog_register -n
      CMD: trevis-127vm1 /usr/sbin/lctl get_param mdd.lustre-MDT0002.changelog_mask -n
      CMD: trevis-127vm1 /usr/sbin/lctl set_param mdd.lustre-MDT0002.changelog_mask=+hsm
      mdd.lustre-MDT0002.changelog_mask=+hsm
      CMD: trevis-127vm1 /usr/sbin/lctl --device lustre-MDT0002 changelog_register -n
      CMD: trevis-127vm2 /usr/sbin/lctl get_param mdd.lustre-MDT0003.changelog_mask -n
      CMD: trevis-127vm2 /usr/sbin/lctl set_param mdd.lustre-MDT0003.changelog_mask=+hsm
      mdd.lustre-MDT0003.changelog_mask=+hsm
      CMD: trevis-127vm2 /usr/sbin/lctl --device lustre-MDT0003 changelog_register -n
      Registered 4 changelog users: 'cl3 cl3 cl3 cl3'
      lustre-MDT0000: Registered changelog user cl3
      looking for dbench program
      /usr/bin/dbench
      found dbench client file /usr/share/dbench/client.txt
      '/usr/share/dbench/client.txt' -> 'client.txt'
      copying required dbench files to chroot directory /mnt/lustre/d2a.lustre-rsync-test
      '/usr/bin/dbench' -> 'dbench'
      lib64/libpopt.so.0
      lib64/libc.so.6
      lib64/ld-linux-x86-64.so.2
      running 'dbench 2 -t 60' in /mnt/lustre/d2a.lustre-rsync-test at Tue Dec 12 18:06:58 UTC 2023
      waiting for dbench pid 27630
      dbench version 4.00 - Copyright Andrew Tridgell 1999-2004
      Running for 60 seconds with load 'client.txt' and minimum warmup 12 secs
      failed to create barrier semaphore 
      0 of 2 processes prepared for launch   0 sec
      2 of 2 processes prepared for launch   0 sec
      releasing clients
         2       670    42.34 MB/sec  warmup   1 sec  latency 243.676 ms
         2       761    21.39 MB/sec  warmup   2 sec  latency 221.017 ms
         2       946    14.56 MB/sec  warmup   3 sec  latency 191.389 ms
      :
         2     34715     5.27 MB/sec  execute  55 sec  latency 241.709 ms
         2     35719     5.47 MB/sec  execute  56 sec  latency 119.864 ms
         2     36221     5.41 MB/sec  execute  57 sec  latency 173.668 ms
         2     36352     5.33 MB/sec  execute  58 sec  latency 132.522 ms
         2     36769     5.33 MB/sec  execute  59 sec  latency 160.981 ms
         2  cleanup  60 sec
         0  cleanup  60 sec
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX      10567     2.825   123.841
       Close           7710     0.642    24.838
       Rename           440     3.282    41.200
       Unlink          2156     0.976    22.880
       Qpathinfo       9492     0.636    38.188
       Qfileinfo       1629     0.076     3.208
       Qfsinfo         1719     0.018     1.539
       Sfileinfo        868     2.031    39.754
       Find            3625     0.296    41.384
       WriteX          5127     0.449    36.828
       ReadX          16023     0.009     0.055
       LockX             32     0.440     0.998
       UnlockX           32     0.504     1.832
       Flush            745    93.845   969.365
      Throughput 5.32869 MB/sec  2 clients  2 procs  max_latency=969.378 ms
      stopping dbench in /mnt/lustre/d2a.lustre-rsync-test at Tue Dec 12 18:08:10 UTC 2023 with return code 0
      clean dbench files in /mnt/lustre/d2a.lustre-rsync-test
      /mnt/lustre/d2a.lustre-rsync-test /mnt/lustre/d2a.lustre-rsync-test
      removed directory 'clients/client0/~dmtmp/PARADOX'
      removed directory 'clients/client0/~dmtmp/PM'
      :
      removed directory 'clients/client1/~dmtmp/PWRPNT'
      removed directory 'clients/client1/~dmtmp'
      removed directory 'clients/client1'
      removed directory 'clients'
      removed 'client.txt'
      removed 'dbench'
      removed 'lib64/libpopt.so.0'
      removed 'lib64/libc.so.6'
      removed 'lib64/ld-linux-x86-64.so.2'
      /mnt/lustre/d2a.lustre-rsync-test
      dbench successfully finished
      

       

      ======= 18:06:53 (1702404413)
      [  302.317896] Lustre: DEBUG MARKER: == lustre-rsync-test test 2a: Replicate files created by dbench. ==== 18:06:53 (1702404413)
      [  305.333993] Lustre: lustre-OST0005-osc-ffff895045192000: disconnect after 24s idle
      [  305.335802] Lustre: Skipped 1 previous similar message
      [  407.730814] Lustre: lustre-OST0000-osc-ffff895045192000: disconnect after 24s idle
      [  412.850667] Lustre: lustre-OST0006-osc-ffff895045192000: disconnect after 23s idle
      [  412.852206] Lustre: Skipped 6 previous similar messages
      [ 5749.820145] sysrq: SysRq : Changing Loglevel
      [ 5749.821211] sysrq: Loglevel set to 8
      [ 5749.823561] sysrq: SysRq : Show State
      :
      [ 5752.514580] task:lustre_rsync    state:D stack:    0 pid:27687 ppid: 27182 flags:0x00004080
      [ 5752.516178] Call Trace:
      [ 5752.516719]  __schedule+0x2d1/0x870
      [ 5752.518361]  schedule+0x55/0xf0
      [ 5752.519019]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
      [ 5752.520751]  ext4_sync_file+0x365/0x3b0 [ext4]
      [ 5752.521695]  ext4_file_write_iter+0x1a6/0x3e0 [ext4]
      [ 5752.522710]  new_sync_write+0x112/0x160
      [ 5752.523537]  vfs_write+0xa5/0x1b0
      [ 5752.524238]  ksys_write+0x4f/0xb0
      [ 5752.524948]  do_syscall_64+0x5b/0x1b0
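      For context, test 2a replicates the dbench-created tree from the Lustre mount to target directories on the client's local filesystem using lustre_rsync, driven by the changelog users registered above; the targets and the status log live on local ext4, which matches the ext4_sync_file frames in the lustre_rsync stack. A rough sketch of the kind of invocation involved (the MDT name and changelog user are taken from the log above; the paths are illustrative, not copied from the test script):

      # roughly what the test does after dbench completes (illustrative paths);
      # the targets and the status log are on the client's local ext4 filesystem
      lustre_rsync --source=/mnt/lustre \
                   --target=/tmp/target --target=/tmp/target2 \
                   --mdt=lustre-MDT0000 --user=cl3 \
                   --statuslog=/tmp/lustre_rsync.log --verbose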
      

      The jbd2 thread is stuck on IO submission to the VM storage (vda):

      [ 5750.277661] task:jbd2/vda1-8     state:D stack:    0 pid:  368 ppid:     2 flags:0x80004000
      [ 5750.279261] Call Trace:
      [ 5750.279801]  __schedule+0x2d1/0x870
      [ 5750.281340]  schedule+0x55/0xf0
      [ 5750.282004]  io_schedule+0x12/0x40
      [ 5750.282714]  bit_wait_io+0xd/0x50
      [ 5750.283418]  __wait_on_bit+0x2d/0x90
      [ 5750.284155]  out_of_line_wait_on_bit+0x91/0xb0
      [ 5750.285919]  jbd2_journal_commit_transaction+0x14db/0x19f0 [jbd2]
      [ 5750.288069]  kjournald2+0xbd/0x270 [jbd2]
      [ 5750.290560]  kthread+0x134/0x150
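      If this state can be caught live on the client VM, the journal and virtual-disk state can be inspected directly; a small diagnostic sketch (the jbd2/vda1-8 journal name and vda device come from the stacks above; these are generic kernel interfaces, not part of the Lustre test framework):

      # commit statistics for the stuck journal thread (average commit time, etc.)
      cat /proc/fs/jbd2/vda1-8/info
      # requests currently in flight to the virtual disk (9th field of the stat file)
      cat /sys/block/vda/stat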
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      lustre-rsync-test test_2a - Timeout occurred after 97 minutes, last suite running was lustre-rsync-test

      Activity

            [LU-17361] lustre-rsync-test test_2a: Timeout occurred after 97 minutes, last suite running was lustre-rsync-test

            From the stack traces it looks like the lustre_rsync and auditd processes are stuck in {{ext4_sync_file->jbd2_log_wait_commit}}, so this is writing to the local VM filesystem.

            The jbd2 thread is stuck in IO wait on the underlying storage (vda1 device), and doesn't appear to relate to Lustre at all:

            [ 5750.277661] task:jbd2/vda1-8     state:D stack:    0 pid:  368 ppid:     2 flags:0x80004000
            [ 5750.279261] Call Trace:
            [ 5750.279801]  __schedule+0x2d1/0x870
            [ 5750.281340]  schedule+0x55/0xf0
            [ 5750.282004]  io_schedule+0x12/0x40
            [ 5750.282714]  bit_wait_io+0xd/0x50
            [ 5750.283418]  __wait_on_bit+0x2d/0x90
            [ 5750.284155]  out_of_line_wait_on_bit+0x91/0xb0
            [ 5750.285919]  jbd2_journal_commit_transaction+0x14db/0x19f0 [jbd2]
            [ 5750.288069]  kjournald2+0xbd/0x270 [jbd2]
            [ 5750.290560]  kthread+0x134/0x150
            

            Of course there might be some hidden connection, such as the Lustre client using too much memory in the previous subtest, but it seems equally or more likely that the VM host is overloaded and syncing a lot of dbench data on the host filesystem is simply too slow; at this point Lustre isn't even being tested yet.
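            Purely as a diagnostic sketch (not something the test framework currently does), a few generic checks on the client VM while the test is stuck would help distinguish the memory-pressure and slow-local-I/O theories:

            # how much dirty/writeback data is queued for the local disk?
            grep -E '^(Dirty|Writeback):' /proc/meminfo
            # how many dentries are cached (first field is the total in memory)?
            cat /proc/sys/fs/dentry-state
            # which tasks are stuck in uninterruptible sleep, and where?
            ps -eo state,pid,wchan:32,comm | awk '$1 == "D"'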

            Purely speculation, but I see that patch https://review.whamcloud.com/52770 ("LU-6142 tests: Add missing /tmp/target2 under cleanup_src_tgt") landed on 2023-11-03 and changes test behavior (cleaning up the test directory), so it seems possible that huge numbers of dentries or inodes are in memory, or that a lot of dirty blocks on the host are slowing down the test. Is it worthwhile to push a revert of that patch for testing, to see whether the issue is avoided? The problem is that the subtest fails only intermittently, so it is hard to know whether it is "fixed" or not.
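            If someone does want to try that, a rough sketch of a test-only revert pushed through the usual Gerrit flow (the commit hash is a placeholder for the landed 52770 change, and the Test-Parameters line follows the usual autotest convention for re-running a suite several times to catch an intermittent failure):

            git revert <hash-of-landed-52770-change>   # placeholder, not a real hash
            # in the commit message, ask autotest to repeat the suite, e.g.:
            #   Test-Parameters: trivial testlist=lustre-rsync-test,lustre-rsync-test,lustre-rsync-test
            git push ssh://review.whamcloud.com:29418/fs/lustre-release HEAD:refs/for/master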

            adilger Andreas Dilger added a comment

            "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57266
            Subject: LU-17361 tests: debug lustre-rsync-test/2a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 67e2596e5b6335c57717a46bc1a3c9d340dae402
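            For anyone wanting to try that debug patch locally, it should be fetchable with the standard Gerrit change-ref layout (patch set 1, per the above):

            git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/66/57266/1
            git checkout FETCH_HEAD   # or cherry-pick onto a local branch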

            gerrit Gerrit Updater added a comment

            I forgot to mention that the test log also stopped at the same point as in the occurrence this ticket was filed for, meaning that the lustre_rsync process/command had only just started and had not even written its startup messages...

            bfaccini-nvda Bruno Faccini added a comment
            bfaccini-nvda Bruno Faccini added a comment (edited)

            +1 on master branch at https://testing.whamcloud.com/test_sets/4554eb6b-d6a7-43d4-8fd0-ab26a675170d .

            I had a quick look at the logs for this occurrence, and in the Lustre client console all-task stack dump taken at the timeout, the lustre_rsync process appears to be stuck waiting in jbd2_log_wait_commit():

            [ 5761.527765] task:lustre_rsync    state:D stack:0     pid:26569 ppid:26042  flags:0x00004002
            [ 5761.527767] Call Trace:
            [ 5761.527767]  <TASK>
            [ 5761.527768]  __schedule+0x20a/0x550
            [ 5761.527770]  ? __wake_up_common_lock+0x91/0xd0
            [ 5761.527773]  schedule+0x2d/0x70
            [ 5761.527775]  jbd2_log_wait_commit+0xa3/0x110 [jbd2]
            [ 5761.527787]  ? __pfx_autoremove_wake_function+0x10/0x10
            [ 5761.527790]  ext4_sync_file+0x15c/0x390 [ext4]
            [ 5761.527826]  ext4_buffered_write_iter+0xe8/0x110 [ext4]
            [ 5761.527867]  new_sync_write+0xff/0x190
            [ 5761.527871]  vfs_write+0x1ef/0x280
            [ 5761.527874]  ksys_write+0x5f/0xe0
            [ 5761.527876]  do_syscall_64+0x5c/0x90
            [ 5761.527878]  ? switch_fpu_return+0x4c/0xd0
            [ 5761.527882]  ? exit_to_user_mode_prepare+0xec/0x100
            [ 5761.527884]  ? syscall_exit_to_user_mode+0x22/0x40
            [ 5761.527886]  ? do_syscall_64+0x69/0x90
            [ 5761.527888]  ? do_syscall_64+0x69/0x90
            [ 5761.527889]  ? do_syscall_64+0x69/0x90
            [ 5761.527890]  ? sysvec_apic_timer_interrupt+0x3c/0x90
            [ 5761.527893]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
            [ 5761.527896] RIP: 0033:0x7f39c1f3e927
            [ 5761.527898] RSP: 002b:00007ffd4175fd88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
            [ 5761.527900] RAX: ffffffffffffffda RBX: 000055882b909260 RCX: 00007f39c1f3e927
            [ 5761.527901] RDX: 0000000000001100 RSI: 000055882b909260 RDI: 0000000000000006
            [ 5761.527902] RBP: 0000000000000006 R08: 0000000000000000 R09: 00007ffd4175fcb7
            [ 5761.527903] R10: 0000000000000180 R11: 0000000000000246 R12: 00007f39c20126c0
            [ 5761.527904] R13: 0000000000000000 R14: 000055882b84b7a4 R15: 000055882b84b790 

            A quick look at the other tasks stuck in the JBD2 layer shows auditd blocked in the same way:

            [ 5760.327293]  </TASK>
            [ 5760.327294] task:auditd          state:D stack:0     pid:512   ppid:1      flags:0x00004002
            [ 5760.327296] Call Trace:
            [ 5760.327297]  <TASK>
            [ 5760.327298]  __schedule+0x20a/0x550
            [ 5760.327300]  ? __wake_up_common_lock+0x91/0xd0
            [ 5760.327303]  schedule+0x2d/0x70
            [ 5760.327306]  jbd2_log_wait_commit+0xa3/0x110 [jbd2]
            [ 5760.327317]  ? __pfx_autoremove_wake_function+0x10/0x10
            [ 5760.327320]  ext4_sync_file+0x15c/0x390 [ext4]
            [ 5760.327355]  __x64_sys_fsync+0x33/0x60
            [ 5760.327373]  do_syscall_64+0x5c/0x90
            [ 5760.327375]  ? exit_to_user_mode_prepare+0xec/0x100
            [ 5760.327377]  ? syscall_exit_to_user_mode+0x22/0x40
            [ 5760.327379]  ? do_syscall_64+0x69/0x90
            [ 5760.327381]  ? syscall_exit_to_user_mode+0x22/0x40
            [ 5760.327383]  ? do_syscall_64+0x69/0x90
            [ 5760.327385]  ? do_syscall_64+0x69/0x90
            [ 5760.327386]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
            [ 5760.327389] RIP: 0033:0x7fbe37f4506b
            [ 5760.327392] RSP: 002b:00007fbe377fec50 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
            [ 5760.327393] RAX: ffffffffffffffda RBX: 0000555b51b17120 RCX: 00007fbe37f4506b
            [ 5760.327394] RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000005
            [ 5760.327395] RBP: 0000555b51b170e0 R08: 0000000000000000 R09: 00000000ffffffff
            [ 5760.327397] R10: 0000000000000000 R11: 0000000000000293 R12: 00007fbe377ff640
            [ 5760.327398] R13: 0000000000000002 R14: 00007fbe37e9f530 R15: 0000000000000000 

            and the jbd2 thread for the vda1 partition on the vda virtual device is also stuck, with this stack:

            [ 5760.326453]  </TASK>
            [ 5760.326454] task:jbd2/vda1-8     state:D stack:0     pid:398   ppid:2      flags:0x00004000
            [ 5760.326456] Call Trace:
            [ 5760.326457]  <TASK>
            [ 5760.326458]  __schedule+0x20a/0x550
            [ 5760.326460]  schedule+0x2d/0x70
            [ 5760.326462]  io_schedule+0x42/0x70
            [ 5760.326464]  bit_wait_io+0xd/0x60
            [ 5760.326467]  __wait_on_bit+0x4b/0x150
            [ 5760.326469]  ? __pfx_bit_wait_io+0x10/0x10
            [ 5760.326472]  out_of_line_wait_on_bit+0x92/0xb0
            [ 5760.326474]  ? __pfx_wake_bit_function+0x10/0x10
            [ 5760.326478]  jbd2_journal_commit_transaction+0x14b7/0x1900 [jbd2]
            [ 5760.326549]  ? update_load_avg+0x7e/0x740
            [ 5760.326554]  kjournald2+0xaf/0x280 [jbd2]
            [ 5760.326565]  ? __pfx_autoremove_wake_function+0x10/0x10
            [ 5760.326568]  ? __pfx_kjournald2+0x10/0x10 [jbd2]
            [ 5760.326578]  kthread+0xe0/0x100
            [ 5760.326584]  ? __pfx_kthread+0x10/0x10
            [ 5760.326586]  ret_from_fork+0x2c/0x50 

            All three tasks are in "D" state. Also, the lustre_rsync process is doing an ext4 write, so it is very likely copying data to a target file under the local /tmp filesystem.

            So it might be worth having a look at the bare-metal node trevis-24 hosting the VMs.
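            If access to the hypervisor is possible, a quick host-side check along these lines would show whether the storage backing the VM images is saturated (a generic sketch; it assumes the sysstat tools are available on trevis-24):

            # on the bare-metal host (e.g. trevis-24) while the test is stuck:
            iostat -x 5 3                              # per-device utilization and await times
            vmstat 5 3                                 # blocked tasks and swap activity
            ps -eo state,pid,comm | awk '$1 == "D"'    # host tasks stuck in uninterruptible I/O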

            yujian Jian Yu added a comment - Lustre 2.15.6 RC1: https://testing.whamcloud.com/test_sets/1eaa43a3-7227-4068-bad1-5c7bf860bfa1
            yujian Jian Yu added a comment - Also failed on b2_15 branch: https://testing.whamcloud.com/test_sets/ab11289f-d306-43b7-9643-a5f025536996
            yujian Jian Yu added a comment - Lustre 2.16.0 RC5: https://testing.whamcloud.com/test_sets/9ea55376-2703-4959-99de-3a52d2a4aba9
            yujian Jian Yu added a comment - The failure occurred frequently in the last 6 months.
            yujian Jian Yu added a comment - Test session details: clients: https://build.whamcloud.com/job/lustre-b2_15/94 - 5.15.0-88-generic servers: https://build.whamcloud.com/job/lustre-master/4581 - 5.14.0-427.31.1_lustre.el9.x86_64 https://testing.whamcloud.com/test_sets/be9f839e-1409-467b-ba1c-20774af86e10
            yujian Jian Yu added a comment - +1 on master branch: https://testing.whamcloud.com/test_sets/8f2c7567-3730-4160-bcc5-13974d72d292

            People

              yujian Jian Yu
              maloo Maloo
              Votes: 0
              Watchers: 8