[LU-1263] 2.1.1<->2.2 OST0000 cannot be mounted after sanity test_27z failed
Created: 27/Mar/12  Updated: 07/May/13  Resolved: 07/May/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Minor
Reporter: Sarah Liu
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: None
Environment:

server: 2.1.1-RHEL6
client: 2.2-RC2-RHEL6


Attachments: File 1263.tar.gz    
Issue Links:
Related
is related to LU-611 Add lfs find --stripe_count --stripe_... Resolved
Severity: 3
Rank (Obsolete): 4262

 Description   

Sanity subtest 51b hangs after subtest 27z fails, and OST0000 can no longer be mounted.

OST dmesg:
--------------------
[root@fat-intel-4 ~]# LustreError: 4319:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (19) req@ffff88032a885000 x1397613683283791/t0(0) o-1><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1332872055 ref 1 fl Interpret:/ffffffff/ffffffff rc -19/-1
LustreError: 4319:0:(ldlm_lib.c:2129:target_send_reply_msg()) Skipped 357 previous similar messages
LustreError: 137-5: UUID 'lustre-OST0000_UUID' is not available for connect (no target)
LustreError: Skipped 357 previous similar messages

[root@fat-intel-4 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/sdb2 on /mnt/ost2 type lustre (rw)
/dev/sdb3 on /mnt/ost3 type lustre (rw)
/dev/sdc1 on /mnt/ost4 type lustre (rw)
/dev/sdc2 on /mnt/ost5 type lustre (rw)
/dev/sdc3 on /mnt/ost6 type lustre (rw)
/dev/sdb1 on /mnt/ost1 type ldiskfs (rw)

client console:
---------------------
Lustre: DEBUG MARKER: == sanity test 27z: check SEQ/OID on the MDT and OST filesystems == 10:36:11 (1332869771)
Lustre: DEBUG MARKER: sanity test_27z: @@@@@@ FAIL: parent SEQ mismatch
LustreError: 11-0: an error occurred while communicating with 10.10.4.131@tcp. The obd_ping operation failed with -107
Lustre: lustre-OST0000-osc-ffff880333317c00: Connection to service lustre-OST0000 via nid 10.10.4.131@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: DEBUG MARKER: == sanity test 51b: mkdir .../t-0 — .../t-70 ====================== 10:40:18 (1332870018)
LustreError: 11-0: an error occurred while communicating with 10.10.4.131@tcp. The ost_connect operation failed with -19
LustreError: Skipped 24 previous similar messages
Lustre: 2597:0:(import.c:524:import_select_connection()) lustre-OST0000-osc-ffff880333317c00: tried all connections, increasing latency to 21s
Lustre: 2597:0:(import.c:524:import_select_connection()) Skipped 24 previous similar messages

Please find the debug log in the attached archive (1263.tar.gz).



 Comments   
Comment by Sarah Liu [ 27/Mar/12 ]

Attached the dmesg output and debug log from the OST.

Comment by Andreas Dilger [ 28/Mar/12 ]

This is already fixed in my cleanup of sanity test_27z in http://review.whamcloud.com/2022 "LU-611 tests: clean up code style in tests/lfs" (which also depends on http://review.whamcloud.com/1264).

Comment by Oleg Drokin [ 28/Mar/12 ]

This is a test script issue.
Test 27z should remount the OST even if the test fails.
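
A minimal sketch of the "always remount" pattern, for illustration only. It is not the actual sanity.sh code or the landed patch: the function names and the OST_STATE variable are hypothetical stand-ins that only simulate mount state, so the snippet runs without a Lustre setup. The idea is to record the check result instead of bailing out early, so the remount step runs on both the pass and the fail path.

```shell
#!/bin/bash
# Hypothetical sketch: none of these names come from sanity.sh.
# OST_STATE simulates how the OST device is currently mounted.
OST_STATE="lustre"

# Stand-ins for stopping the OST and mounting its device as ldiskfs,
# and for bringing it back as a Lustre target afterwards.
stop_ost_and_mount_ldiskfs() { OST_STATE="ldiskfs"; }
remount_ost_as_lustre()      { OST_STATE="lustre"; }

# Stand-in for the SEQ/OID checks; returns non-zero to simulate
# the "parent SEQ mismatch" failure.
check_seq_oid() { return 1; }

run_test_with_cleanup() {
    local rc=0
    stop_ost_and_mount_ldiskfs
    # Capture the failure instead of returning immediately.
    check_seq_oid || rc=$?
    # Cleanup runs whether or not the check failed.
    remount_ost_as_lustre
    return $rc
}

TEST_RC=0
run_test_with_cleanup || TEST_RC=$?
echo "test rc=$TEST_RC, OST state=$OST_STATE"
```

With the simulated failure, the test still reports a non-zero result, but the OST is back in its "lustre" state, so later subtests would not find the target missing.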

Comment by Andreas Dilger [ 07/May/13 ]

Marking as fixed, since the test_27z patch has landed.
