[LU-17147] sanity-lsnapshot test_1b: Fail to create lss_1b_0 Created: 26/Sep/23 Updated: 18/Oct/23 Resolved: 18/Oct/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for Alexander Boyko <c17825@cray.com> This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6df69fab-a0cd-49b7-a6dd-426a9c267d8a test_1b failed with the following error: Create lss_1b_0 CMD: onyx-60vm8 /usr/sbin/lctl snapshot_create -F lustre -n lss_1b_0 onyx-60vm8: Fail to freeze barrier for lustre: Invalid argument onyx-60vm8: Can't create the snapshot lss_1b_0 pdsh@onyx-60vm1: onyx-60vm8: ssh exited with exit code 22 CMD: onyx-60vm8 cat /var/log/lsnapshot.log Mon Sep 25 18:02:19 2023 (784346:jt_snapshot_create:1532:lustre:ssh): Create snapshot lss_0_0 successfully with comment <(null)>, barrier , timeout <30> Mon Sep 25 18:02:21 2023 (784875:jt_snapshot_create:1532:lustre:ssh): Create snapshot lss_0_1 successfully with comment <(null)>, barrier , timeout <-1> Mon Sep 25 18:02:23 2023 (785404:jt_snapshot_create:1532:lustre:ssh): Create snapshot lss_0_2 successfully with comment , barrier , timeout <30> Mon Sep 25 18:02:25 2023 (785933:jt_snapshot_create:1532:lustre:ssh): Create snapshot lss_0_3 successfully with comment , barrier , timeout <30> Mon Sep 25 18:02:26 2023 (786462:jt_snapshot_create:1525:lustre:ssh): Can't create snapshot lss_0_0 with comment <(null)> barrier , timeout <30>: -17 Mon Sep 25 18:02:32 2023 (787140:jt_snapshot_create:1532:lustre:ssh): Create snapshot lss_1a_0 successfully with comment <(null)>, barrier , timeout <30> Mon Sep 25 18:02:35 2023 (787677:jt_snapshot_mount:2520:lustre:ssh): The snapshot lss_1a_0 is mounted Mon Sep 25 18:02:38 2023 (788595:jt_snapshot_umount:2675:lustre:ssh): the snapshot lss_1a_0 have been umounted Mon Sep 25 18:02:44 2023 (789655:snapshot_create:1419:lustre:ssh): Can't set barrier within 30 seconds on lustre: rc = -22 Mon Sep 25 18:02:44 2023 (789655:jt_snapshot_create:1525:lustre:ssh): Can't create snapshot lss_1b_0 with comment <(null)> barrier , timeout <30>: -22 sanity-lsnapshot test_1b: @@@@@@ FAIL: (1) Fail to create lss_1b_0 [14771.317798] Lustre: DEBUG MARKER: == sanity-lsnapshot test 1b: mount snapshot without original filesystem mounted ========================================================== 18:02:43 (1695664963) [14771.487370] Lustre: DEBUG MARKER: /usr/sbin/lctl snapshot_create -F lustre -n lss_1b_0 [14772.106064] Lustre: DEBUG MARKER: cat /var/log/lsnapshot.log [14772.479236] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-lsnapshot test_1b: @@@@@@ FAIL: \(1\) Fail to create lss_1b_0 [14772.681271] Lustre: DEBUG MARKER: sanity-lsnapshot test_1b: @@@@@@ FAIL: (1) Fail to create lss_1b_0 Test session details: <<Please provide additional information about the failure here>> VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Andreas Dilger [ 28/Sep/23 ] |
|
This has only failed recently (and twice) with patch https://review.whamcloud.com/52498 " |
| Comment by Alexander Boyko [ 17/Oct/23 ] |
|
I analyzed this issue. And It is really side effect of patch mgc: reconnection without pinger. client 00010000:00010000:4.0:1693550583.414234:0:2655926:0:(ldlm_lockd.c:2456:ldlm_callback_handler()) callback on lock 0x28f467194556273 - lock disappeared 00010000:00010000:4.0:1693550583.416981:0:2655926:0:(ldlm_lockd.c:2361:ldlm_callback_errmsg()) @@@ Operate with invalid parameter, NID=12345-0@lo lock=0x28f46719455636f: rc = 0 req@0000000079795088 x1775816258196096/t0(0) o106->b6ac8162-74c3-4f9e-abd8-ddbd41354f83@0@lo:279/0 lens 400/224 e 0 to 0 dl 1693550594 ref 1 fl Interpret:/0/0 rc -22/-22 job:'' uid:4294967295 gid:4294967295 server 20000000:00000001:5.0:1693550583.411102:0:2670065:0:(mgs_barrier.c:323:mgs_barrier_freeze()) Process entered 20000000:00000001:5.0:1693550583.411149:0:2670065:0:(mgs_barrier.c:128:mgs_barrier_glimpse_lock()) Process entered 20000000:00010000:5.0:1693550583.411348:0:2670065:0:(mgs_barrier.c:194:mgs_barrier_glimpse_lock()) ### glimpse 20000000:00000001:5.0:1693550583.418575:0:2670065:0:(mgs_barrier.c:62:mgs_barrier_gl_interpret_reply()) Process entered 20000000:00000001:5.0:1693550583.418607:0:2670065:0:(mgs_barrier.c:71:mgs_barrier_gl_interpret_reply()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea) 20000000:00000001:5.0:1693550583.419278:0:2670065:0:(mgs_barrier.c:62:mgs_barrier_gl_interpret_reply()) Process entered 20000000:00000001:5.0:1693550583.419291:0:2670065:0:(mgs_barrier.c:71:mgs_barrier_gl_interpret_reply()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea) 20000000:00000001:4.0:1693550583.423784:0:2670065:0:(mgs_barrier.c:62:mgs_barrier_gl_interpret_reply()) Process entered 20000000:00000001:4.0:1693550583.423808:0:2670065:0:(mgs_barrier.c:88:mgs_barrier_gl_interpret_reply()) Process leaving via out (rc=0 : 0 : 0x0) 20000000:00000001:4.0:1693550583.465200:0:2670065:0:(mgs_barrier.c:62:mgs_barrier_gl_interpret_reply()) Process entered 20000000:00000001:4.0:1693550583.465212:0:2670065:0:(mgs_barrier.c:88:mgs_barrier_gl_interpret_reply()) Process leaving via out (rc=0 : 0 : 0x0) 20000000:00000001:4.0:1693550583.466645:0:2670065:0:(mgs_barrier.c:206:mgs_barrier_glimpse_lock()) Process leaving via out (rc=0 : 0 : 0x0) 20000000:00000001:4.0:1693550583.466677:0:2670065:0:(mgs_barrier.c:456:mgs_barrier_freeze()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea) I'll fix it at main patch |