Details
- Bug
- Resolution: Cannot Reproduce
- Minor
- None
- Lustre 1.8.6
- None
- One MGS/MDS, two OSS nodes, two clients, lustre-1.8.6.81. Servers are running RHEL 5.7, clients are running SLES 11 SP1, OFED 1.5.4.1.
- 3
- 6108
Description
SANITY_QUOTA test_32 always failed. The test was started from service331 (actually a Lustre client):
...
Formatting mgs, mds, osts
...
Setup mgs, mdt, osts
start mds /dev/sdb1 -o errors=panic,acl
Starting mds: -o errors=panic,acl /dev/sdb1 /mnt/mds
service360: Reading test skip list from /usr/lib64/lustre/tests/cfg/tests-to-skip.sh
service360: #!/bin/bash
service360: #SANITY_BIGFILE_EXCEPT="64b"
service360: #export SANITY_EXCEPT="$SANITY_BIGFILE_EXCEPT"
service360: MDSSIZE=2000000, OSTSIZE=2000000.
service360: ncli_nas.sh: Before init_clients_lists
service360: ncli_nas.sh: Done init_clients_lists
service360: lnet.debug=0x33f1504
service360: lnet.subsystem_debug=0xffb7e3ff
service360: lnet.debug_mb=16
Started lustre-MDT0000
start ost1 /dev/sdb1 -o errors=panic,mballoc,extents
Starting ost1: -o errors=panic,mballoc,extents /dev/sdb1 /mnt/ost1
service361: mount.lustre: mount /dev/sdb1 at /mnt/ost1 failed: Cannot send after transport endpoint shutdown
mount -t lustre /dev/sdb1 /mnt/ost1
Start of /dev/sdb1 on ost1 failed 108
...
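For reference, the failing step can be checked by hand on the OSS; the sketch below is not from the original report and only assumes the same device and mount point shown in the excerpt above. Error 108 is ESHUTDOWN ("Cannot send after transport endpoint shutdown"), and it is the status the test reported for the mount:

# Hedged manual check on the OSS (service361); device and mount point are taken
# from the log above, and 108 (ESHUTDOWN) is the status reported by the test.
mount -t lustre /dev/sdb1 /mnt/ost1
rc=$?
if [ $rc -eq 108 ]; then
    echo "OST registration with the MGS failed: ESHUTDOWN (108)"
fi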
It seems this test is the only one that sets LOAD_MODULES_REMOTE=true before calling cleanup_and_setup_lustre, and it is the only one that failed. Sometimes only OST1 hit the error 108 problem; sometimes both OST1 and OST2 were hit. I put a "sleep 3" in setupall() after the MDS had started but before trying to start the OSTs, but it did not help.
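The change that was tried looks roughly like the following. setupall() lives in test-framework.sh, and the variable names used here (MDSDEV, MDS_MOUNT_OPTS, OSTCOUNT, OST_MOUNT_OPTS, ostdevname) are paraphrased from the framework, so treat this as a sketch rather than the exact 1.8.6 code:

# Sketch of setupall() with the attempted workaround; only the added sleep
# is the change that was actually tried.
start mds $MDSDEV $MDS_MOUNT_OPTS
sleep 3     # added: give the MGS/MDT time to settle before the OSTs try to register
for num in `seq $OSTCOUNT`; do
    start ost$num `ostdevname $num` $OST_MOUNT_OPTS
done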
The 'dmesg' from the MDS (service360) showed:
Lustre: DEBUG MARKER: == test 32: check lqs hash(bug 21846) ========================================== == 11:05:01
Lustre: MDT lustre-MDT0000 has stopped.
LustreError: 28890:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 28890:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: MGS has stopped.
Lustre: server umount lustre-MDT0000 complete
Lustre: Removed LNI 10.151.26.38@o2ib
Lustre: OBD class driver, http://wiki.whamcloud.com/
Lustre: Lustre Version: 1.8.6.81
Lustre: Build Version: lustre/scripts-1.8.6
Lustre: Listener bound to ib1:10.151.26.38:987:mlx4_0
Lustre: Register global MR array, MR size: 0xffffffffffffffff, array size: 1
Lustre: Added LNI 10.151.26.38@o2ib [8/64/0/180]
Lustre: Filtering OBD driver; http://wiki.whamcloud.com/
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode
Lustre: MGS MGS started
Lustre: MGC10.151.26.38@o2ib: Reactivating import
Lustre: MGS: Logs for fs lustre were removed by user request. All servers must be restarted in order to regenerate the logs.
Lustre: Setting parameter lustre-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Enabling user_xattr
Lustre: Enabling ACL
Lustre: lustre-MDT0000: new disk, initializing
Lustre: lustre-MDT0000: Now serving lustre-MDT0000 on /dev/sdb1 with recovery enabled
Lustre: 30206:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) lustre-MDT0000: group upcall set to /usr/sbin/l_getgroups
Lustre: MGS: Regenerating lustre-OSTffff log by user request.
Lustre: lustre-MDT0000: temporarily refusing client connection from 10.151.25.182@o2ib
LustreError: 30130:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (11) req@ffff8103f64cc000 x1397073513545734/t0 o38><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1332353120 ref 1 fl Interpret:/0/0 rc -11/0
Lustre: 30308:0:(mds_lov.c:1155:mds_notify()) MDS lustre-MDT0000: add target lustre-OST0000_UUID
Lustre: 29699:0:(quota_master.c:1718:mds_quota_recovery()) Only 0/1 OSTs are active, abort quota recovery
Lustre: MDS lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: sanity-quota test_32: @@@@@@ FAIL: Rehearsh didn't happen
Lustre: DEBUG MARKER: == test 99: Quota off =============================== == 11:08:33
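The "Only 0/1 OSTs are active" line indicates the MDS had not yet seen both OSTs register when quota recovery ran. A hedged follow-up check, not part of the original report, would be to list the configured devices on the MDS and look at the lustre-OST*-osc entries:

# On the MDS (service360): list local obd devices; the lustre-OST000x-osc
# entries show which OST connections the MDS has set up and their state.
lctl dl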
The 'dmesg' from OST1 (service361) showed:
Lustre: DEBUG MARKER: == test 32: check lqs hash(bug 21846) ========================================== == 11:05:01
Lustre: OST lustre-OST0000 has stopped.
LustreError: 5972:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 5972:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: 5972:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1395992728675719 sent from MGC10.151.26.38@o2ib to NID 10.151.26.38@o2ib 6s ago has timed out (6s prior to deadline).
req@ffff810407edd800 x1395992728675719/t0 o251->MGS@MGC10.151.26.38@o2ib_0:26/25 lens 192/384 e 0 to 1 dl 1332352932 ref 1 fl Rpc:N/0/0 rc 0/0
Lustre: server umount lustre-OST0000 complete
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode
Lustre: 20460:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1395992728675720 sent from MGC10.151.26.38@o2ib to NID 10.151.26.38@o2ib 0s ago has failed due to network error (5s prior to deadline).
req@ffff8103ca5edc00 x1395992728675720/t0 o250->MGS@MGC10.151.26.38@o2ib_0:26/25 lens 368/584 e 0 to 1 dl 1332353093 ref 1 fl Rpc:N/0/0 rc 0/0
LustreError: 7058:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8103b2ca7800 x1395992728675721/t0 o253->MGS@MGC10.151.26.38@o2ib_0:26/25 lens 4736/4928 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 7058:0:(obd_mount.c:1112:server_start_targets()) Required registration failed for lustre-OSTffff: -108
LustreError: 7058:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -108
LustreError: 7058:0:(obd_mount.c:1453:server_put_super()) no obd lustre-OSTffff
LustreError: 7058:0:(obd_mount.c:147:server_deregister_mount()) lustre-OSTffff not registered
Lustre: server umount lustre-OSTffff complete
LustreError: 7058:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-108)
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: sanity-quota test_32: @@@@@@ FAIL: Rehearsh didn't happen
Lustre: DEBUG MARKER: == test 99: Quota off =============================== == 11:08:33
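Since the OST's MGC requests to 10.151.26.38@o2ib either timed out or failed with a network error before the -108 registration failure, one hedged diagnostic (not run in the original report) is to verify LNet reachability of the MGS NID from the OSS before retrying the mount:

# From the OSS (service361): check that the MGS NID seen in the log is reachable over o2ib.
lctl ping 10.151.26.38@o2ib || echo "MGS NID 10.151.26.38@o2ib not reachable"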
Attachments
Issue Links
- Trackbacks
  - Lustre 1.8.x known issues tracker: While testing against the Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla (https://bugzilla.lustre.org/). In order to move away from relying on Bugzilla, we would create a JIRA