Details
Bug - Resolution: Fixed - Minor - None - None - 1 - 10068
Description
Hi WC,
There was a hardware failure at Purdue today that took out a 6620 controller. After fixing the issue, the MDT fails to connect to one OST and has intermittent connections with another. fid2dentry is getting passed an obd_id of 0, which causes it to return ESTALE to the MDT. I looked in bz, but I couldn't find anything similar. Have you seen anything like this, or do you have any ideas on how to get it back online?
We've tried rebooting the MDS and OSS, but after recovery it still has this issue. Would aborting recovery help? How about the CATALOGS trick? Let me know if other logs would help.
Thanks,
Kit
Relevant MDT logs:
Feb 4 17:50:53 mds-a01 kernel: [13485616.075071] LustreError: 12085:0:(osc_create.c:585:osc_create()) lustreA-OST0001-osc: oscc recovery failed: -116
Feb 4 17:50:53 mds-a01 kernel: [13485616.075526] LustreError: 12085:0:(lov_obd.c:1131:lov_clear_orphans()) error in orphan recovery on OST idx 1/36: rc = -116
Feb 4 17:50:53 mds-a01 kernel: [13485616.076025] LustreError: 12085:0:(mds_lov.c:1062:__mds_lov_synchronize()) lustreA-OST0001_UUID failed at mds_lov_clear_orphans: -116
Feb 4 17:50:53 mds-a01 kernel: [13485616.076482] LustreError: 12085:0:(mds_lov.c:1071:__mds_lov_synchronize()) lustreA-OST0001_UUID sync failed -116, deactivating
Feb 4 17:51:39 mds-a01 kernel: [13485661.612612] LustreError: 12408:0:(osc_create.c:585:osc_create()) lustreA-OST0001-osc: oscc recovery failed: -116
lctl dl
...
28 UP osc lustreA-OST000b-osc lustreA-mdtlov_UUID 5
29 UP osc lustreA-OST0000-osc lustreA-mdtlov_UUID 5
30 IN osc lustreA-OST0001-osc lustreA-mdtlov_UUID 5
31 UP osc lustreA-OST0002-osc lustreA-mdtlov_UUID 5
32 UP osc lustreA-OST0003-osc lustreA-mdtlov_UUID 5
...
Relevant OST logs:
Feb 4 17:43:53 oss-a01 kernel: [ 1333.618994] LustreError: 10635:0:(filter.c:1428:filter_fid2dentry()) lustreA-OST0001: object 2283250:0 lookup error: rc -116
Feb 4 17:43:53 oss-a01 kernel: [ 1333.619430] LustreError: 10635:0:(filter.c:1428:filter_fid2dentry()) Skipped 1 previous similar message
Feb 4 17:43:55 oss-a01 kernel: [ 1336.503075] LustreError: 9981:0:(filter_lvb.c:90:filter_lvbo_init()) lustreA-OST0001: bad object 2283250/0: rc -116
Feb 4 17:43:55 oss-a01 kernel: [ 1336.503630] LustreError: 9981:0:(ldlm_resource.c:860:ldlm_resource_add()) lvbo_init failed for resource 2283250: rc -116
Feb 4 17:43:55 oss-a01 kernel: [ 1336.504092] LustreError: 9981:0:(ldlm_resource.c:860:ldlm_resource_add()) Skipped 37 previous similar messages
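For reference, every return code in the logs above is -116, which on Linux is ESTALE (stale file handle). A minimal sketch for pulling the code out of a LustreError line and naming it is below. The helper name `decode_rc` and the regular expression are illustrative assumptions, not part of any Lustre tool, and the errno name depends on the platform where the script runs.

```python
import errno
import re

# Assumption: the return code is the last whitespace-preceded negative
# integer on the line, as in "rc -116", "rc = -116", or "failed: -116".
NEGATIVE_INT = re.compile(r"(?<=\s)-(\d+)\b")

def decode_rc(line):
    """Return (code, errno_name) for a log line, or None if no code is found."""
    matches = NEGATIVE_INT.findall(line)
    if not matches:
        return None
    code = int(matches[-1])
    # errno.errorcode maps numbers to names for the local platform;
    # on Linux, 116 maps to "ESTALE".
    return code, errno.errorcode.get(code, "UNKNOWN")

if __name__ == "__main__":
    sample = ("LustreError: 10635:0:(filter.c:1428:filter_fid2dentry()) "
              "lustreA-OST0001: object 2283250:0 lookup error: rc -116")
    print(decode_rc(sample))
```

Taking the last match keeps the sketch from tripping over hyphens inside names such as lustreA-OST0001-osc, since none of those hyphens is preceded by whitespace and followed by a digit.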