[LU-11513] sanityn: test_36 timeout Created: 12/Oct/18  Updated: 29/Sep/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10441 sanityn: test 36 timeout Open
Severity: 3

Description

This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0b7da81a-c48d-11e8-b143-52540065bddc

== sanityn test 36: handle ESTALE/open-unlink correctly ============================================== 06:10:31 (1538287831)
striped dir -i0 -c2 /mnt/lustre/d36.sanityn
Waiting for orphan cleanup...
CMD: trevis-10vm9 /usr/sbin/lctl list_param osp.*osc*.old_sync_processed 2> /dev/null
osp.lustre-OST0000-osc-MDT0000.old_sync_processed
osp.lustre-OST0000-osc-MDT0002.old_sync_processed
osp.lustre-OST0001-osc-MDT0000.old_sync_processed
osp.lustre-OST0001-osc-MDT0002.old_sync_processed
osp.lustre-OST0002-osc-MDT0000.old_sync_processed
osp.lustre-OST0002-osc-MDT0002.old_sync_processed
osp.lustre-OST0003-osc-MDT0000.old_sync_processed
osp.lustre-OST0003-osc-MDT0002.old_sync_processed
osp.lustre-OST0004-osc-MDT0000.old_sync_processed
osp.lustre-OST0004-osc-MDT0002.old_sync_processed
osp.lustre-OST0005-osc-MDT0000.old_sync_processed
osp.lustre-OST0005-osc-MDT0002.old_sync_processed
osp.lustre-OST0006-osc-MDT0000.old_sync_processed
osp.lustre-OST0006-osc-MDT0002.old_sync_processed
osp.lustre-OST0007-osc-MDT0000.old_sync_processed
osp.lustre-OST0007-osc-MDT0002.old_sync_processed
wait 40 secs maximumly for trevis-10vm10,trevis-10vm9 mds-ost sync done.
CMD: trevis-10vm10,trevis-10vm9 /usr/sbin/lctl get_param -n osp.*osc*.old_sync_processed
Waiting for local destroys to complete
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.950879 s, 55.1 MB/s
CMD: trevis-10vm10,trevis-10vm9 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-10vm8 lctl set_param -n osd*.*OS*.force_sync=1
multiop /mnt/lustre2/d36.sanityn/f36.sanityn vO_r52428800c
TMPPIPE=/tmp/multiop_open_wait_pipe.25051
CMD: trevis-10vm9 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
CMD: trevis-10vm9 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-10vm9 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-10vm9 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-10vm9 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-10vm9 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
sleep 5 for ZFS zfs
sleep 5 for ZFS zfs
Waiting for local destroys to complete
*** cycle(0) *** before(75990016) after_dd(75901952) after(75953152)
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.250934 s, 209 MB/s
CMD: trevis-10vm10,trevis-10vm9 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-10vm8 lctl set_param -n osd*.*OS*.force_sync=1
multiop /mnt/lustre2/d36.sanityn/f36.sanityn vO_r52428800c
TMPPIPE=/tmp/multiop_open_wait_pipe.25051
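
When triaging logs like the one above, it can help to pull the per-cycle counters out of the "*** cycle(N) ***" lines and see which cycles completed. In this log only cycle(0) is reported; the output stops after the second multiop invocation. The following is a minimal sketch in Python, not part of the Lustre test framework: the function name parse_cycle and the derived fields dd_delta and net_delta are hypothetical, and the units of the counters are an open question because the log does not state them.

```python
import re

# Matches lines such as:
# *** cycle(0) *** before(75990016) after_dd(75901952) after(75953152)
CYCLE_RE = re.compile(
    r"\*\*\* cycle\((\d+)\) \*\*\* "
    r"before\((\d+)\) after_dd\((\d+)\) after\((\d+)\)"
)


def parse_cycle(line):
    """Return the counters from a cycle line, or None if the line is not one.

    Hypothetical helper for log triage; units are whatever the test printed.
    """
    m = CYCLE_RE.search(line)
    if m is None:
        return None
    cycle, before, after_dd, after = map(int, m.groups())
    return {
        "cycle": cycle,
        "before": before,
        "after_dd": after_dd,
        "after": after,
        # Change in the counter right after the dd write.
        "dd_delta": after_dd - before,
        # Change in the counter at the end of the cycle.
        "net_delta": after - before,
    }


if __name__ == "__main__":
    sample = "*** cycle(0) *** before(75990016) after_dd(75901952) after(75953152)"
    print(parse_cycle(sample))
```

Running parse_cycle over every line of a saved log and collecting the non-None results gives the list of completed cycles; a missing cycle(1) entry, as here, points at where the test stopped making progress.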


Comments
Comment by James Nunez (Inactive) [ 29/Sep/20 ]

We are seeing sanityn test_36 hang in ldiskfs testing and in Ubuntu 18.04 client testing.

One example is at https://testing.whamcloud.com/test_sessions/a1ae61fc-5b87-42d5-9394-f036edd66a8f

Generated at Sat Feb 10 02:44:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.