Lustre / LU-61

MDT can't connect to OST after hardware event: oscc recovery failed: -116

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 1.8.6
    • None
    • None
    • 1
    • 10068

    Description

      Hi WC,

      There was a hardware failure at Purdue today that took out a 6620 controller. After we fixed the hardware, the MDT fails to connect to one OST and has intermittent connections with another. fid2dentry is getting passed an obd_id of 0, which causes it to return ESTALE (-116) to the MDT. I looked in Bugzilla (bz) but couldn't find anything similar. Have you seen this before, or do you have any ideas on how to get the OST back online?

      We've tried rebooting the MDS and OSS, but after recovery the problem persists. Would aborting recovery help? What about the CATALOGS trick? Let me know if other logs would help.

      Thanks,
      Kit

      Relevant MDT logs:
      Feb 4 17:50:53 mds-a01 kernel: [13485616.075071] LustreError: 12085:0:(osc_create.c:585:osc_create()) lustreA-OST0001-osc: oscc recovery failed: -116
      Feb 4 17:50:53 mds-a01 kernel: [13485616.075526] LustreError: 12085:0:(lov_obd.c:1131:lov_clear_orphans()) error in orphan recovery on OST idx 1/36: rc = -116
      Feb 4 17:50:53 mds-a01 kernel: [13485616.076025] LustreError: 12085:0:(mds_lov.c:1062:__mds_lov_synchronize()) lustreA-OST0001_UUID failed at mds_lov_clear_orphans: -116
      Feb 4 17:50:53 mds-a01 kernel: [13485616.076482] LustreError: 12085:0:(mds_lov.c:1071:__mds_lov_synchronize()) lustreA-OST0001_UUID sync failed -116, deactivating
      Feb 4 17:51:39 mds-a01 kernel: [13485661.612612] LustreError: 12408:0:(osc_create.c:585:osc_create()) lustreA-OST0001-osc: oscc recovery failed: -116
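      The MDT messages above form a chain for OST index 1. `osc_create()` reports the failed orphan recovery, `lov_clear_orphans()` and `__mds_lov_synchronize()` pass the same rc up, and the MDS then deactivates the OSC. In a longer syslog, a small filter can pull the function name and the rc out of each LustreError line and translate the errno. The helper below is a hypothetical sketch, not a Lustre tool. It assumes the message layout shown above and uses Python's `errno.errorcode`, whose symbolic names are platform-specific (on Linux, 116 is `ESTALE`).

```python
import errno
import re

# Hypothetical pattern for the LustreError layout seen in these logs:
#   LustreError: PID:0:(file.c:LINE:function()) message text
LUSTRE_ERR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)
# A negative number standing on its own, e.g. "-116" in "rc = -116".
NEG_NUM = re.compile(r"(?<!\w)-(\d+)\b")


def summarize(line):
    """Return (function, rc, errno_name) for a LustreError line, else None.

    rc and errno_name are None when the message carries no negative number.
    """
    m = LUSTRE_ERR.search(line)
    if m is None:
        return None
    nums = NEG_NUM.findall(m.group("msg"))
    if not nums:
        return (m.group("func"), None, None)
    code = int(nums[-1])  # treat the last negative number as the rc
    # errno.errorcode maps positive values to symbolic names (platform-specific).
    return (m.group("func"), -code, errno.errorcode.get(code, "unknown"))


if __name__ == "__main__":
    sample = ("LustreError: 12085:0:(lov_obd.c:1131:lov_clear_orphans()) "
              "error in orphan recovery on OST idx 1/36: rc = -116")
    print(summarize(sample))
```

      Applied to the five MDT lines above, every entry carries rc -116. That shared value is what ties `osc_create()`, `lov_clear_orphans()` and `__mds_lov_synchronize()` into one failure.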

      lctl dl
      ...
      28 UP osc lustreA-OST000b-osc lustreA-mdtlov_UUID 5
      29 UP osc lustreA-OST0000-osc lustreA-mdtlov_UUID 5
      30 IN osc lustreA-OST0001-osc lustreA-mdtlov_UUID 5
      31 UP osc lustreA-OST0002-osc lustreA-mdtlov_UUID 5
      32 UP osc lustreA-OST0003-osc lustreA-mdtlov_UUID 5
      ...
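      In this listing, the OSC for OST0001 (device 30) is the only one in state `IN` rather than `UP`. This matches the MDT's "deactivating" message for lustreA-OST0001_UUID. On a server with many targets, spotting non-UP devices by eye is error-prone. The hypothetical helper below splits each line on whitespace and lists OSC devices whose status is anything other than `UP`. It assumes the six-column layout shown (index, status, type, name, UUID, and a trailing count). It only reads text you pass in and does not invoke `lctl` itself.

```python
def parse_lctl_dl(text):
    """Parse `lctl dl` output into dicts.

    Assumes the six-column layout shown above:
    index, status, type, name, UUID, trailing count.
    Lines that do not fit (such as the "..." elisions) are skipped.
    """
    devices = []
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) != 6 or not fields[0].isdigit() or not fields[5].isdigit():
            continue
        index, status, dev_type, name, uuid, count = fields
        devices.append({
            "index": int(index),
            "status": status,
            "type": dev_type,
            "name": name,
            "uuid": uuid,
            "count": int(count),
        })
    return devices


def not_up(devices, dev_type="osc"):
    """Return (index, name, status) for devices of dev_type not in the UP state."""
    return [(d["index"], d["name"], d["status"])
            for d in devices
            if d["type"] == dev_type and d["status"] != "UP"]


if __name__ == "__main__":
    sample = """...
 28 UP osc lustreA-OST000b-osc lustreA-mdtlov_UUID 5
 30 IN osc lustreA-OST0001-osc lustreA-mdtlov_UUID 5
 31 UP osc lustreA-OST0002-osc lustreA-mdtlov_UUID 5
..."""
    print(not_up(parse_lctl_dl(sample)))  # prints [(30, 'lustreA-OST0001-osc', 'IN')]
```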

      Relevant OST logs:
      Feb 4 17:43:53 oss-a01 kernel: [ 1333.618994] LustreError: 10635:0:(filter.c:1428:filter_fid2dentry()) lustreA-OST0001: object 2283250:0 lookup error: rc -116
      Feb 4 17:43:53 oss-a01 kernel: [ 1333.619430] LustreError: 10635:0:(filter.c:1428:filter_fid2dentry()) Skipped 1 previous similar message
      Feb 4 17:43:55 oss-a01 kernel: [ 1336.503075] LustreError: 9981:0:(filter_lvb.c:90:filter_lvbo_init()) lustreA-OST0001: bad object 2283250/0: rc -116
      Feb 4 17:43:55 oss-a01 kernel: [ 1336.503630] LustreError: 9981:0:(ldlm_resource.c:860:ldlm_resource_add()) lvbo_init failed for resource 2283250: rc -116
      Feb 4 17:43:55 oss-a01 kernel: [ 1336.504092] LustreError: 9981:0:(ldlm_resource.c:860:ldlm_resource_add()) Skipped 37 previous similar messages
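      On the OSS side, `filter_fid2dentry()` and `filter_lvbo_init()` both name object 2283250 with a second field of 0, printed once as `2283250:0` and once as `2283250/0`. In a longer log, you can check whether every error points at the same object by tallying those pairs. The helper below is a hypothetical sketch that assumes exactly these two message spellings. It does not try to say which field is the object id and which is the group.

```python
import re
from collections import Counter

# Hypothetical pattern covering both spellings in the OST log above:
#   "object 2283250:0 lookup error"  and  "bad object 2283250/0"
OBJ_REF = re.compile(r"\bobject (\d+)[:/](\d+)")


def object_refs(log_text):
    """Count the (first, second) number pairs named by 'object X:Y' or 'object X/Y'.

    Lines suppressed by Lustre's console rate limiting ("Skipped N previous
    similar messages") are not visible here, so counts are lower bounds.
    """
    counts = Counter()
    for m in OBJ_REF.finditer(log_text):
        counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts
```

      A single dominant pair suggests that one object on OST0001 is the source of the ESTALE errors, not damage spread across many objects.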


          People

            Assignee: Cliff White (cliffw) (Inactive)
            Reporter: Kit Westneat (kitwestneat) (Inactive)
