Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-1749

llog_lvfs_create()) error looking up logfile

Details

    • 3
    • 4411

    Description

      We had an MDS (running 2.1.1-17chaos) lock up due to bug LU-1276. The admins rebooted the node to work around the issue, and after the reboot when the MDS started up we hit the following error:

      2012-08-14 13:48:55 LustreError: 3697:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile 0x1a768b0a:0xa2f17dca: rc -116
      2012-08-14 13:48:55 LustreError: 3697:0:(llog_cat.c:174:llog_cat_id2handle()) error opening log id 0x1a768b0a:a2f17dca: rc -116
      2012-08-14 13:48:55 LustreError: 3697:0:(llog_obd.c:318:cat_cancel_cb()) Cannot find handle for log 0x1a768b0a
      2012-08-14 13:48:55 LustreError: 3472:0:(llog_obd.c:391:llog_obd_origin_setup()) llog_process() with cat_cancel_cb failed: -116
      2012-08-14 13:48:55 LustreError: 3472:0:(llog_obd.c:218:llog_setup_named()) obd lsa-OST00bd-osc ctxt 2 lop_setup=ffffffffa05b0a60 failed -116
      2012-08-14 13:48:55 LustreError: 3472:0:(osc_request.c:4186:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
      2012-08-14 13:48:55 LustreError: 3472:0:(osc_request.c:4203:__osc_llog_init()) osc 'lsa-OST00bd-osc' tgt 'mdd_obd-lsa-MDT0000' catid ffff8808312bd860 rc=-116
      2012-08-14 13:48:55 LustreError: 3472:0:(osc_request.c:4205:__osc_llog_init()) logid 0x1a7680f6:0x615e782e
      2012-08-14 13:48:55 LustreError: 3472:0:(osc_request.c:4233:osc_llog_init()) rc: -116
      2012-08-14 13:48:55 LustreError: 3472:0:(lov_log.c:248:lov_llog_init()) error osc_llog_init idx 189 osc 'lsa-OST00bd-osc' tgt 'mdd_obd-lsa-MDT0000' (rc=-116)
      2012-08-14 13:48:55 LustreError: 3698:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile 0x1a768b0a:0xa2f17dca: rc -116
      2012-08-14 13:48:55 LustreError: 3698:0:(llog_cat.c:174:llog_cat_id2handle()) error opening log id 0x1a768b0a:a2f17dca: rc -116
      2012-08-14 13:48:55 LustreError: 3698:0:(llog_obd.c:318:cat_cancel_cb()) Cannot find handle for log 0x1a768b0a
      2012-08-14 13:48:55 LustreError: 3407:0:(llog_obd.c:391:llog_obd_origin_setup()) llog_process() with cat_cancel_cb failed: -116
      2012-08-14 13:48:55 LustreError: 3407:0:(llog_obd.c:218:llog_setup_named()) obd lsa-OST00bd-osc ctxt 2 lop_setup=ffffffffa05b0a60 failed -116
      2012-08-14 13:48:55 LustreError: 3407:0:(osc_request.c:4186:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
      2012-08-14 13:48:55 LustreError: 3407:0:(osc_request.c:4203:__osc_llog_init()) osc 'lsa-OST00bd-osc' tgt 'mdd_obd-lsa-MDT0000' catid ffff8804333af960 rc=-116
      2012-08-14 13:48:55 LustreError: 3407:0:(osc_request.c:4205:__osc_llog_init()) logid 0x1a7680f6:0x615e782e
      2012-08-14 13:48:56 LustreError: 3407:0:(osc_request.c:4233:osc_llog_init()) rc: -116
      2012-08-14 13:48:56 LustreError: 3407:0:(lov_log.c:248:lov_llog_init()) error osc_llog_init idx 189 osc 'lsa-OST00bd-osc' tgt 'mdd_obd-lsa-MDT0000' (rc=-116)
      

      I don't see a file in the OBJECTS directory that seems to match 0x1a768b0a:0xa2f17dca (if that is where we are looking). Although -116 is -ESTALE, so I'm not sure that we're even getting to the lower-level lookup. It may be that mds_lvfs_fid2dentry() is returning -ESTALE because the id is 0.

      This has left the OST connection "inactive" on the MDS, so any users with data on that OST are currently dead in the water.

      Attachments

        Activity

          [LU-1749] llog_lvfs_create()) error looking up logfile

          Reopening.

          nedbass Ned Bass (Inactive) added a comment - Reopening.

          We're still affected by this bug. OST lsa-OST00bd remains inactive on the MDS. MDS is at tag 2.1.4-3chaos which includes patch http://review.whamcloud.com/4742. The file /OBJECTS/1a7680f6:615e782e does not exist on the MDT.

          2013-03-20 10:05:33 LustreError: 3488:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile 0x1a7680f6:0x615e782e: rc -116
          2013-03-20 10:05:33 LustreError: 3488:0:(llog_obd.c:220:llog_setup_named()) obd lsa-OST00bd-osc ctxt 2 lop_setup=ffffffffa06a0f20 failed -116
          2013-03-20 10:05:33 LustreError: 3488:0:(osc_request.c:4229:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
          2013-03-20 10:05:33 LustreError: 3488:0:(osc_request.c:4246:__osc_llog_init()) osc 'lsa-OST00bd-osc' tgt 'mdd_obd-lsa-MDT0000' catid ffff88022f6ff8c0 rc=-116
          2013-03-20 10:05:33 LustreError: 3488:0:(osc_request.c:4248:__osc_llog_init()) logid 0x1a7680f6:0x615e782e
          2013-03-20 10:05:33 LustreError: 3488:0:(osc_request.c:4276:osc_llog_init()) rc: -116
          2013-03-20 10:05:33 LustreError: 3488:0:(lov_log.c:248:lov_llog_init()) error osc_llog_init idx 189 osc 'lsa-OST00bd-osc' tgt 'mdd_obd-lsa-MDT0000' (rc=-116)
          
          nedbass Ned Bass (Inactive) added a comment - We're still affected by this bug. OST lsa-OST00bd remains inactive on the MDS. MDS is at tag 2.1.4-3chaos which includes patch http://review.whamcloud.com/4742 . The file /OBJECTS/1a7680f6:615e782e does not exist on the MDT. 2013-03-20 10:05:33 LustreError: 3488:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile 0x1a7680f6:0x615e782e: rc -116 2013-03-20 10:05:33 LustreError: 3488:0:(llog_obd.c:220:llog_setup_named()) obd lsa-OST00bd-osc ctxt 2 lop_setup=ffffffffa06a0f20 failed -116 2013-03-20 10:05:33 LustreError: 3488:0:(osc_request.c:4229:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT 2013-03-20 10:05:33 LustreError: 3488:0:(osc_request.c:4246:__osc_llog_init()) osc 'lsa-OST00bd-osc' tgt 'mdd_obd-lsa-MDT0000' catid ffff88022f6ff8c0 rc=-116 2013-03-20 10:05:33 LustreError: 3488:0:(osc_request.c:4248:__osc_llog_init()) logid 0x1a7680f6:0x615e782e 2013-03-20 10:05:33 LustreError: 3488:0:(osc_request.c:4276:osc_llog_init()) rc: -116 2013-03-20 10:05:33 LustreError: 3488:0:(lov_log.c:248:lov_llog_init()) error osc_llog_init idx 189 osc 'lsa-OST00bd-osc' tgt 'mdd_obd-lsa-MDT0000' (rc=-116)
          emoly.liu Emoly Liu added a comment -

          Patch for b2_1 is at http://review.whamcloud.com/4742

          emoly.liu Emoly Liu added a comment - Patch for b2_1 is at http://review.whamcloud.com/4742

          Patch looks pretty simple, applies cleanly to 2.1.

          morrone Christopher Morrone (Inactive) added a comment - Patch looks pretty simple, applies cleanly to 2.1.

          Is there a patch for 2.1?

          marc@llnl.gov D. Marc Stearman (Inactive) added a comment - Is there a patch for 2.1?
          pjones Peter Jones added a comment -

          Landed for 2.3 and 2.4

          pjones Peter Jones added a comment - Landed for 2.3 and 2.4
          laisiyao Lai Siyao added a comment - Patch for b2_3 is at: http://review.whamcloud.com/#change,4163

          Has any progress been made on this?

          marc@llnl.gov D. Marc Stearman (Inactive) added a comment - Has any progress been made on this?

          Also at CEA they also hit this bug, so we're interested in this patch.

          Thanks,

          dmoreno Diego Moreno (Inactive) added a comment - Also at CEA they also hit this bug, so we're interested in this patch. Thanks,

          As such your problem will resolve all by itself once this new llog is deleted,

          What normally causes the llog to be deleted by itself? The inactive OST breaks quota reporting, so we need to decide whether to take down the MDS to manually remove the file.

          nedbass Ned Bass (Inactive) added a comment - As such your problem will resolve all by itself once this new llog is deleted, What normally causes the llog to be deleted by itself? The inactive OST breaks quota reporting, so we need to decide whether to take down the MDS to manually remove the file.
          laisiyao Lai Siyao added a comment - review is on http://review.whamcloud.com/#change,3708

          People

            laisiyao Lai Siyao
            morrone Christopher Morrone (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: