Details
Bug
Resolution: Fixed
Major
None
Lustre 2.2.0, Lustre 1.8.6
None
lustre-1.8.6.81
OFED1.5.3.1
NASA AMES
3
7037
Description
Since upgrading to Lustre 1.8.6 and OFED 1.5.3.1, we have started to see OST<->MDT connection issues.
We have checked the IB fabric for errors and have found none.
Are there any known issues with Lustre 1.8.6 and OFED 1.5.3?
=== ERROR ON MDS ===
Dec 28 07:04:56 service100 kernel: Lustre: 6149:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1389011653751232 sent from nbp6-OST0002-osc to NID 10.151.25.157@o2ib 7s ago has timed out (7s prior to deadline).
Dec 28 07:04:56 service100 kernel: req@ffff81071b30ac00 x1389011653751232/t0 o13->nbp6-OST0002_UUID@10.151.25.157@o2ib:7/4 lens 192/528 e 0 to 1 dl 1325084696 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 28 07:04:56 service100 kernel: Lustre: 6149:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 258 previous similar messages
Dec 28 07:04:56 service100 kernel: Lustre: nbp6-OST0002-osc: Connection to service nbp6-OST0002 via nid 10.151.25.157@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Dec 28 07:04:56 service100 kernel: Lustre: Skipped 2 previous similar messages
Dec 28 07:05:04 service100 kernel: Lustre: 6151:0:(import.c:517:import_select_connection()) nbp6-OST000a-osc: tried all connections, increasing latency to 11s
Dec 28 07:05:04 service100 kernel: Lustre: 6151:0:(import.c:517:import_select_connection()) Skipped 220 previous similar messages
Dec 28 07:05:06 service100 kernel: Lustre: nbp6-OST0042-osc: Connection restored to service nbp6-OST0042 using nid 10.151.25.157@o2ib.
Dec 28 07:05:06 service100 kernel: Lustre: Skipped 14 previous similar messages
Dec 28 07:05:06 service100 kernel: LustreError: 30626:0:(quota_ctl.c:473:lov_quota_ctl()) ost 75 is inactive
Dec 28 07:05:06 service100 kernel: LustreError: 30626:0:(quota_ctl.c:473:lov_quota_ctl()) Skipped 5 previous similar messages
Dec 28 07:05:06 service100 kernel: Lustre: MDS nbp6-MDT0000: nbp6-OST0042_UUID now active, resetting orphans
Dec 28 07:05:06 service100 kernel: Lustre: Skipped 29 previous similar messages
Dec 28 07:05:07 service100 kernel: LustreError: 30630:0:(quota_master.c:1698:qmaster_recovery_main()) nbp6-MDT0000: qmaster recovery failed for uid 11631 rc:-11)
Dec 28 07:05:07 service100 kernel: LustreError: 30630:0:(quota_master.c:1698:qmaster_recovery_main()) Skipped 52 previous similar messages
Issue Links
Trackbacks
Lustre 1.8.x known issues tracker
While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA