[LU-15375] Interop: replay-dual test_0b: test_0b failed with 2 Created: 15/Dec/21  Updated: 14/Jun/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.8
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15741 replay-dual test_0b: mgc_request.c:25... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/ca1fedd1-9803-4773-97f1-2730ab8c150d

test_0b failed with the following error:

test_0b failed with 2

The MDS dmesg shows:

[12823.044045] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 0b: lost client during waiting for next transno ================================== 22:07:39 \(1638569259\)
[12823.231681] Lustre: DEBUG MARKER: == replay-dual test 0b: lost client during waiting for next transno ================================== 22:07:39 (1638569259)
[12823.400110] Lustre: DEBUG MARKER: sync; sync; sync
[12824.672935] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
[12825.006067] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
[12825.357337] LustreError: 20408:0:(osd_handler.c:2557:osd_ro()) lustre-MDT0000: ffff92f287824e00 CANNOT BE SET READONLY: rc = -95
[12825.533941] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
[12825.724412] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[12826.017335] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[12826.365358] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
[12826.534971] Lustre: Failing over lustre-MDT0000

The OSS dmesg shows:

[12823.152365] Lustre: DEBUG MARKER: == replay-dual test 0b: lost client during waiting for next transno ================================== 22:07:39 (1638569259)
[12829.782132] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.240.26.233@tcp) was lost; in progress operations using this service will wait for recovery to complete
[12829.784740] Lustre: Skipped 14 previous similar messages
[12836.789813] Lustre: 14035:0:(client.c:2169:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1638569266/real 1638569266]  req@ffff9d0b37e96880 x1718151062158016/t0(0) o400->MGC10.240.26.233@tcp@10.240.26.233@tcp:26/25 lens 224/224 e 0 to 1 dl 1638569273 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[12836.794244] Lustre: 14035:0:(client.c:2169:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[12836.795799] LustreError: 166-1: MGC10.240.26.233@tcp: Connection to MGS (at 10.240.26.233@tcp) was lost; in progress operations using this service will fail
[12836.797974] LustreError: Skipped 1 previous similar message
[12845.800498] LustreError: 167-0: lustre-MDT0000-lwp-OST0000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[12845.802734] LustreError: Skipped 6 previous similar messages
[12845.808951] LustreError: 25722:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-OST0004: failed to enqueue global quota lock, glb fid:[0x200000006:0x20000:0x0], rc:-5
[12845.811246] LustreError: 25722:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 11 previous similar messages
[12847.047082] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null ||
				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
[12847.599689] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-82vm18.onyx.whamcloud.com: executing set_default_debug -1 all 4
[12847.801812] Lustre: DEBUG MARKER: onyx-82vm18.onyx.whamcloud.com: executing set_default_debug -1 all 4
[12853.810047] Lustre: Evicted from MGS (at 10.240.26.233@tcp) after server handle changed from 0x7fcc42c82f796875 to 0x7fcc42c82f797962
[12860.174099] Lustre: lustre-OST0005: haven't heard from client 416316de-b11b-fbdd-00bb-1133143845ca (at 10.240.26.223@tcp) in 52 seconds. I think it's dead, and I am evicting it. exp ffff9d0b08d57400, cur 1638569296 expire 1638569266 last 1638569244
[12860.177549] Lustre: Skipped 6 previous similar messages
[12900.238353] Lustre: lustre-OST0006: haven't heard from client 5f002890-e8e5-654d-f766-4b64a2d37032 (at 10.240.26.223@tcp) in 52 seconds. I think it's dead, and I am evicting it. exp ffff9d0b3aa25000, cur 1638569336 expire 1638569306 last 1638569284
[12900.241825] Lustre: Skipped 6 previous similar messages
[12916.247579] Lustre: lustre-OST0001: deleting orphan objects from 0x0:557773 to 0x0:566467
[12916.247580] Lustre: lustre-OST0000: deleting orphan objects from 0x0:123667 to 0x0:123696
[12916.251441] Lustre: lustre-OST0006: deleting orphan objects from 0x0:2820 to 0x0:3553
[12916.257434] Lustre: lustre-OST0002: deleting orphan objects from 0x0:17727 to 0x0:17985
[12916.263794] Lustre: lustre-OST0003: deleting orphan objects from 0x0:3011 to 0x0:3585
[12916.268432] Lustre: lustre-OST0005: deleting orphan objects from 0x0:2820 to 0x0:3553
[12916.278392] Lustre: lustre-OST0004: deleting orphan objects from 0x0:2979 to 0x0:3553
[12919.582057] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-dual test_0b: @@@@@@ FAIL: test_0b failed with 2 
[12919.776761] Lustre: DEBUG MARKER: replay-dual test_0b: @@@@@@ FAIL: test_0b failed with 2
[12920.014451] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-1/2021-12-03/lustre-b2_12_full-part-1_150_1_28_f8f96387-511f-41d4-9638-59cd95dbc82d//replay-dual.test_0b.debug_log.$(hostname -s).1638569356.log;
         dmesg > /autotest/autotest-1/2021-12-03/lustre-b2_12_full-par

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-dual test_0b - test_0b failed with 2



 Comments   
Comment by Andreas Dilger [ 17/Dec/21 ]

The "CANNOT BE SET READONLY" message is because we do not patch the kernel with "dev_rdonly" anymore, but rather use the dm-flakey device with newer Lustre releases. So this is an testing interop issue that will not be fixed.
