[LU-15158] replay-single test_70c: tar failed Created: 25/Oct/21  Updated: 03/Aug/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for eaujames <eaujames@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/9f575d94-3cb6-4561-af5b-b7c7d3770b7f

test_70c failed with the following error:

1: tar failed

We can see the following errors on the client side:

[ 5499.314171] Lustre: DEBUG MARKER: test_70c fail mds1 1 times
[ 5520.149364] LustreError: 7755:0:(import.c:1304:ptlrpc_connect_interpret()) lustre-MDT0000_UUID: went back in time (transno 322122548361 was previously committed, server now claims 317827580285)!
[ 5520.181665] LustreError: 7755:0:(client.c:3178:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@00000000001755b3 x1714357638644864/t322122547216(322122547216) o101->lustre-MDT0000-mdc-ffff8bb6d17d5000@10.9.6.60@tcp:12/10 lens 640/600 e 0 to 0 dl 1634944195 ref 2 fl Interpret:RPQU/4/0 rc 301/301 job:'dbench.0'
[ 5520.186656] LustreError: 7755:0:(client.c:3178:ptlrpc_replay_interpret()) Skipped 4 previous similar messages
[ 5520.194360] Lustre: 7755:0:(client.c:3139:ptlrpc_replay_interpret()) @@@ Version mismatch during replay  req@000000006fb48540 x1714357639358656/t322122548374(322122548374) o36->lustre-MDT0000-mdc-ffff8bb6d17d5000@10.9.6.60@tcp:12/10 lens 504/448 e 0 to 0 dl 1634944195 ref 2 fl Interpret:RQU/4/0 rc -75/-75 job:'tar.0'
[ 5524.617118] Lustre: 7755:0:(client.c:3139:ptlrpc_replay_interpret()) @@@ Version mismatch during replay  req@000000004081dfea x1714357639374912/t322122548396(322122548396) o36->lustre-MDT0000-mdc-ffff8bb6d17d5000@10.9.6.60@tcp:12/10 lens 504/448 e 0 to 0 dl 1634944199 ref 2 fl Interpret:RQU/4/0 rc -75/-75 job:'tar.0'
[ 5525.923431] Lustre: 7755:0:(client.c:3139:ptlrpc_replay_interpret()) @@@ Version mismatch during replay  req@00000000a2efc507 x1714357639379200/t322122548398(322122548398) o36->lustre-MDT0000-mdc-ffff8bb6d17d5000@10.9.6.60@tcp:12/10 lens 488/456 e 0 to 0 dl 1634944200 ref 2 fl Interpret:RQU/4/0 rc -75/-75 job:'tar.0'
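
The two transno values in the "went back in time" message are easier to compare once split into their parts. The following is a minimal sketch, not part of the test suite. It assumes that the upper 32 bits of a transno hold the server boot epoch and the lower 32 bits a per-epoch counter (LR_EPOCH_BITS = 32 in the Lustre source, to the best of my knowledge). The helper names split_transno and explain are hypothetical.

```python
import re

# Assumption: the upper 32 bits of a transno carry the server boot epoch
# (LR_EPOCH_BITS = 32 in the Lustre source); verify against your tree.
EPOCH_BITS = 32

WENT_BACK = re.compile(
    r"went back in time \(transno (\d+) was previously committed, "
    r"server now claims (\d+)\)"
)


def split_transno(transno):
    """Return (epoch, counter) for a 64-bit transno."""
    return transno >> EPOCH_BITS, transno & ((1 << EPOCH_BITS) - 1)


def explain(line):
    """Decode a 'went back in time' message, or return None if absent."""
    m = WENT_BACK.search(line)
    if m is None:
        return None
    client_seen, server_claims = (int(g) for g in m.groups())
    return {
        "client": split_transno(client_seen),
        "server": split_transno(server_claims),
        "gap": client_seen - server_claims,
    }


# Message copied from the client log above.
line = ("LustreError: 7755:0:(import.c:1304:ptlrpc_connect_interpret()) "
        "lustre-MDT0000_UUID: went back in time (transno 322122548361 was "
        "previously committed, server now claims 317827580285)!")
print(explain(line))
```

Under that assumption, the client's last committed transno decodes to epoch 75, counter 1161, while the server's claim decodes to epoch 74, counter 381. The gap is therefore a whole epoch rather than a few lost transactions.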

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-single test_70c - 1: tar failed



 Comments   
Comment by Etienne Aujames [ 27/Oct/21 ]

The same type of errors was observed in "replay-single test_70e":
https://testing.whamcloud.com/test_sets/5f344854-b64e-4dd3-9ee1-32cec29275ce

[ 6549.222847] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_70e fail mds1 1 times
[ 6549.682626] Lustre: DEBUG MARKER: test_70e fail mds1 1 times
[ 6551.403860] LustreError: 11-0: lustre-MDT0000-mdc-ffff99e8483f0000: operation ldlm_enqueue to node 10.9.5.56@tcp failed: rc = -19
[ 6561.983671] Lustre: 7755:0:(client.c:2288:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1635271353/real 1635271353]  req@0000000031681ce1 x1714699623217408/t0(0) o400->MGC10.9.5.56@tcp@10.9.5.56@tcp:26/25 lens 224/224 e 0 to 1 dl 1635271360 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:1.0'
[ 6561.989058] LustreError: 166-1: MGC10.9.5.56@tcp: Connection to MGS (at 10.9.5.56@tcp) was lost; in progress operations using this service will fail
[ 6568.130405] Lustre: Evicted from MGS (at 10.9.5.56@tcp) after server handle changed from 0x60e222e5b10fd71e to 0x60e222e5b138fee9
[ 6568.133322] LustreError: 7753:0:(import.c:1304:ptlrpc_connect_interpret()) lustre-MDT0000_UUID: went back in time (transno 322122595906 was previously committed, server now claims 317827580261)!
[ 6573.211081] LustreError: 11-0: lustre-MDT0000-mdc-ffff99e8483f0000: operation ldlm_enqueue to node 10.9.5.56@tcp failed: rc = -75
[ 6573.213364] Lustre: 7753:0:(client.c:3139:ptlrpc_replay_interpret()) @@@ Version mismatch during replay  req@000000001186a09a x1714699622111936/t322122595907(322122595907) o101->lustre-MDT0000-mdc-ffff99e8483f0000@10.9.5.56@tcp:12/10 lens 576/600 e 0 to 0 dl 1635271378 ref 2 fl Interpret:RQU/4/0 rc -75/-75 job:'touch.0'
[ 6573.218357] LustreError: 7753:0:(mdc_request.c:673:mdc_replay_open()) @@@ Open request replay failed with -75   req@000000001186a09a x1714699622111936/t322122595907(322122595907) o101->lustre-MDT0000-mdc-ffff99e8483f0000@10.9.5.56@tcp:12/10 lens 576/600 e 0 to 0 dl 1635271378 ref 2 fl Interpret:RQU/4/0 rc -75/0 job:'touch.0'
[ 6573.980465] Lustre: 7753:0:(client.c:3139:ptlrpc_replay_interpret()) @@@ Version mismatch during replay  req@0000000030684a9e x1714699622188352/t322122596306(322122596306) o101->lustre-MDT0000-mdc-ffff99e8483f0000@10.9.5.56@tcp:12/10 lens 576/600 e 0 to 0 dl 1635271379 ref 2 fl Interpret:RQU/4/0 rc -75/-75 job:'touch.0'
[ 6573.988775] Lustre: 7753:0:(client.c:3139:ptlrpc_replay_interpret()) Skipped 236 previous similar messages
[ 6573.991428] LustreError: 7753:0:(mdc_request.c:673:mdc_replay_open()) @@@ Open request replay failed with -75   req@0000000030684a9e x1714699622188352/t322122596306(322122596306) o101->lustre-MDT0000-mdc-ffff99e8483f0000@10.9.5.56@tcp:12/10 lens 576/600 e 0 to 0 dl 1635271379 ref 2 fl Interpret:RQU/4/0 rc -75/0 job:'touch.0'
[ 6573.997944] LustreError: 7753:0:(mdc_request.c:673:mdc_replay_open()) Skipped 78 previous similar messages
[ 6586.046739] Lustre: 7753:0:(import.c:1453:completed_replay_interpret()) lustre-MDT0000-mdc-ffff99e8483f0000: version recovery fails, reconnecting
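
To see which operations and jobs are hit by the replay failures, the request dumps in these client logs can be tallied. This is a rough sketch for triage only. The opcode names (36 = MDS_REINT, 101 = LDLM_ENQUEUE, 400 = OBD_PING) and the errno names (Linux values) are assumptions to confirm against the Lustre source and the platform. The tally helper is hypothetical, and the sample lines are copied from the logs in this ticket.

```python
import re
from collections import Counter

# Assumed names, taken from Lustre's opcode enum and Linux errno values;
# confirm them against your source tree and platform.
OPCODES = {36: "MDS_REINT", 101: "LDLM_ENQUEUE", 400: "OBD_PING"}
ERRNOS = {-19: "ENODEV", -75: "EOVERFLOW"}

# Matches the "o<opc>->... rc <rc>/<rc> job:'<job>'" tail of a request dump.
REQ = re.compile(r" o(\d+)->.* rc (-?\d+)/-?\d+ job:'([^']*)'")


def tally(lines):
    """Count request dumps per (opcode, rc, job) triple."""
    counts = Counter()
    for line in lines:
        m = REQ.search(line)
        if m:
            opc, rc, job = int(m.group(1)), int(m.group(2)), m.group(3)
            key = (OPCODES.get(opc, str(opc)), ERRNOS.get(rc, str(rc)), job)
            counts[key] += 1
    return counts


sample = [
    "[ 5520.194360] Lustre: 7755:0:(client.c:3139:ptlrpc_replay_interpret()) "
    "@@@ Version mismatch during replay  req@000000006fb48540 "
    "x1714357639358656/t322122548374(322122548374) "
    "o36->lustre-MDT0000-mdc-ffff8bb6d17d5000@10.9.6.60@tcp:12/10 "
    "lens 504/448 e 0 to 0 dl 1634944195 ref 2 fl Interpret:RQU/4/0 "
    "rc -75/-75 job:'tar.0'",
    "[ 5524.617118] Lustre: 7755:0:(client.c:3139:ptlrpc_replay_interpret()) "
    "@@@ Version mismatch during replay  req@000000004081dfea "
    "x1714357639374912/t322122548396(322122548396) "
    "o36->lustre-MDT0000-mdc-ffff8bb6d17d5000@10.9.6.60@tcp:12/10 "
    "lens 504/448 e 0 to 0 dl 1634944199 ref 2 fl Interpret:RQU/4/0 "
    "rc -75/-75 job:'tar.0'",
    "[ 6573.213364] Lustre: 7753:0:(client.c:3139:ptlrpc_replay_interpret()) "
    "@@@ Version mismatch during replay  req@000000001186a09a "
    "x1714699622111936/t322122595907(322122595907) "
    "o101->lustre-MDT0000-mdc-ffff99e8483f0000@10.9.5.56@tcp:12/10 "
    "lens 576/600 e 0 to 0 dl 1635271378 ref 2 fl Interpret:RQU/4/0 "
    "rc -75/-75 job:'touch.0'",
]

for key, count in tally(sample).items():
    print(key, count)
```

In practice the full dmesg output would be passed to tally instead of the three sample lines. Note that the "Skipped N previous similar messages" lines mean the counts are a lower bound.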