Lustre / LU-6050

Master testing: Unable to set striping after master downgrade to 2.5

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.7.0

    Description

After downgrading from master (the latest commit was I99ea077ae79fcdfedd7bb16c2a664714e0ea5ea3), we are unable to use lfs setstripe to place files on specific OSTs, and setstripe with a stripe count greater than 1 appears to succeed, but the files created have only a single stripe.

      Viz:

      lfs setstripe -i 2 some_file
      error on ioctl 0x4008669a for 'some_file' (3): File too large
      error: setstripe: create stripe file 'some_file' failed
      
      lfs setstripe -c 2 some_file
      lfs getstripe some_file
      some_file
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  0
      	obdidx		 objid		 objid		 group
      	     0	      86890763	    0x52dd90b	             0
      

      Digging on the client, I've traced this to a failed RPC to the MDT:

      00000002:00100000:5.0:1418933152.848140:0:4339:0:(mdc_locks.c:642:mdc_finish_enqueue()) @@@ op: 1 disposition: 3, status: -27  req@ffff8803c4f9d000 x1487836602112680/t0(0) o101->perses1-MDT0000-mdc-ffff8803f3c1e400@4@gni:12/10 lens 600/544 e 0 to 0 dl 1418933314 ref 1 fl Complete:R/0/0 rc 301/301
      

      And looking on the MDT, I see the failure starting here:

      00000004:00000001:2.0:1418933152.838169:0:625:0:(osp_precreate.c:1057:osp_precreate_reserve()) Process entered
      00000004:00000001:2.0:1418933152.838170:0:625:0:(osp_precreate.c:1139:osp_precreate_reserve()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      

(Note that the -22 is turned into -27 before being passed back to the client.)
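
For reference, here is a small stand-alone snippet (nothing Lustre-specific, just Linux errno numbering) that decodes the error codes seen in these logs and shows why the debug logs print the same rc three ways (unsigned decimal, signed decimal, hex):

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Error codes appearing in the client/MDT/OST logs above. */
        int codes[] = { ENOENT, EINVAL, EFBIG, EREMCHG, EINPROGRESS };

        for (size_t i = 0; i < sizeof(codes) / sizeof(codes[0]); i++)
            printf("-%d\t%s\n", codes[i], strerror(codes[i]));

        /* The debug logs print each rc three ways: unsigned 64-bit,
         * signed, and hex, e.g. "rc=18446744073709551594 : -22 :
         * ffffffffffffffea" is just -EINVAL. */
        int64_t rc = -EINVAL;
        printf("rc=%llu : %lld : %llx\n",
               (unsigned long long)(uint64_t)rc, (long long)rc,
               (unsigned long long)(uint64_t)rc);
        return 0;
    }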

      Looking at the code, we see the only way to return -EINVAL here is if it's already set in d->opd_pre_status.

      After logging the system startup, I see that being set here (MDT log):

      00000100:00000040:4.0:1418933124.438340:0:668:0:(client.c:1176:ptlrpc_check_status()) @@@ status is -22  req@ffff88022d32cc00 x1487859182731960/t0(0) o5->perses1-OST0005-osc-MDT0000@26@gni:28/4 lens 432/400 e 0 to 0 dl 1418933286 ref 2 fl Rpc:RN/0/0 rc 0/-22
      00000004:00020000:4.0:1418933124.438394:0:668:0:(osp_precreate.c:736:osp_precreate_cleanup_orphans()) perses1-OST0005-osc-MDT0000: cannot cleanup orphans: rc = -22
      

^-- which is where the EINVAL gets set in d->opd_pre_status.

      And that RPC is coming from this action on the MDT:

      00000004:00080000:4.0:1418933124.429108:0:668:0:(osp_precreate.c:643:osp_precreate_cleanup_orphans()) perses1-OST0005-osc-MDT0000: going to cleanup orphans since [0x100050000:0x490dcec:0x0]
      

      Looking at logs from OST0005, I see the following:

      00000004:00000002:2.0:1418933124.428804:0:24404:0:(osd_handler.c:487:osd_check_lma()) perses1-OST0005: FID [0x100000000:0x0:0x0] != self_fid [0x100050000:0x0:0x0]
      00000004:00000001:2.0:1418933124.428806:0:24404:0:(osd_handler.c:491:osd_check_lma()) Process leaving (rc=18446744073709551538 : -78 : ffffffffffffffb2)
      00000001:00000001:2.0:1418933124.428807:0:24404:0:(osd_compat.c:330:osd_lookup_in_remote_parent()) Process entered
      00000001:00000001:2.0:1418933124.428810:0:24404:0:(osd_compat.c:349:osd_lookup_in_remote_parent()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
      00000004:00000001:2.0:1418933124.428811:0:24404:0:(osd_handler.c:624:osd_fid_lookup()) Process leaving via out (rc=18446744073709551501 : -115 : 0xffffffffffffff8d)
      00000020:00000001:2.0:1418933124.428813:0:24404:0:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=4278189824 : 4278189824 : feffff00)
      00000004:00000010:2.0:1418933124.428815:0:24404:0:(osd_handler.c:721:osd_object_free()) kfreed 'obj': 176 at ffff880328190e00.
      00002000:00000001:2.0:1418933124.428816:0:24404:0:(ofd_dev.c:327:ofd_object_free()) Process entered
      00002000:00000040:2.0:1418933124.428817:0:24404:0:(ofd_dev.c:331:ofd_object_free()) object free, fid = [0x100000000:0x0:0x0]
      00002000:00000010:2.0:1418933124.428818:0:24404:0:(ofd_dev.c:335:ofd_object_free()) slab-freed '(of)': 160 at ffff880250060310.
      00002000:00000001:2.0:1418933124.428819:0:24404:0:(ofd_dev.c:336:ofd_object_free()) Process leaving
      00000020:00000001:2.0:1418933124.428819:0:24404:0:(lu_object.c:242:lu_object_alloc()) Process leaving (rc=18446744073709551501 : -115 : ffffffffffffff8d)
      00000020:00000001:2.0:1418933124.428821:0:24404:0:(dt_object.c:386:dt_find_or_create()) Process leaving (rc=18446744073709551501 : -115 : ffffffffffffff8d)
      00002000:00000010:2.0:1418933124.428822:0:24404:0:(ofd_fs.c:291:ofd_seq_load()) kfreed 'oseq': 96 at ffff8802b52ab140.
      00002000:00000001:2.0:1418933124.428824:0:24404:0:(ofd_fs.c:292:ofd_seq_load()) Process leaving (rc=18446744073709551501 : -115 : ffffffffffffff8d)
      00002000:00020000:2.0:1418933124.428825:0:24404:0:(ofd_obd.c:1209:ofd_create()) perses1-OST0005: Can't find FID Sequence 0x0: rc = -115
      00002000:00000001:2.0:1418933124.428826:0:24404:0:(ofd_obd.c:1210:ofd_create()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000010:00000001:2.0:1418933124.428827:0:24404:0:(obd_class.h:840:obd_create()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      

      And this error eventually gets bubbled back to the MDT.
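
To make that chain easier to follow, here is a minimal stand-alone C model of it. This is only an illustration: the struct and function names are loosely based on the osp/ofd functions quoted above, not actual Lustre source.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Rough model of the failure chain, not Lustre code. */
    struct osp_device {
        int opd_pre_status;   /* sticky precreate status (cf. d->opd_pre_status) */
    };

    /* OST side: ofd_create() cannot find FID sequence 0x0 and fails. */
    static int ost_create_reply(void)
    {
        return -EINVAL;       /* the -115 from ofd_seq_load() becomes -22 here */
    }

    /* MDT side: the orphan-cleanup RPC fails and the status is cached. */
    static void mdt_cleanup_orphans(struct osp_device *d)
    {
        int rc = ost_create_reply();
        if (rc != 0)
            d->opd_pre_status = rc;   /* EINVAL gets stuck here */
    }

    /* MDT side: every later object reservation returns the cached status. */
    static int mdt_precreate_reserve(struct osp_device *d)
    {
        if (d->opd_pre_status != 0)
            return d->opd_pre_status;
        return 0;
    }

    int main(void)
    {
        struct osp_device osp = { 0 };
        int rc;

        mdt_cleanup_orphans(&osp);
        rc = mdt_precreate_reserve(&osp);

        /* Per the note above, the -22 is remapped to -27 (-EFBIG) before
         * the reply goes back, hence "File too large" from the ioctl. */
        if (rc == -EINVAL)
            rc = -EFBIG;

        printf("client sees rc = %d (%s)\n", rc, strerror(-rc));
        return 0;
    }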

      I can't figure out what items should be in that FID sequence, or how those orphans came to be.
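
One possible clue is the osd_check_lma() mismatch at the top of the OST0005 log: FID [0x100000000:0x0:0x0] vs. self_fid [0x100050000:0x0:0x0]. The two sequences differ only in bits 16..31, which (assuming the usual IDIF encoding, where the OST index sits in those bits of the sequence) means one side carries OST index 5 and the other index 0. A quick sanity check of that assumption:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Assumption: IDIF base sequence 0x100000000 with the OST index
         * encoded in bits 16..31.  This reproduces both sequences from
         * the osd_check_lma() message above. */
        const uint64_t idif_base = 0x100000000ULL;
        uint32_t ost_idx;

        for (ost_idx = 0; ost_idx <= 5; ost_idx++)
            printf("OST index %u -> seq 0x%llx\n", ost_idx,
                   (unsigned long long)(idif_base | ((uint64_t)ost_idx << 16)));
        return 0;
    }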

I'm posting this mainly so the underlying issue can be identified, but I'd also welcome any ideas on how to get the live system working again. We would prefer not to have to reformat.

      I'm going to attach logs from the client, MDS, and an OSS (all OSSes report this issue).

      Attachments

        Issue Links

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.7


gerrit Gerrit Updater added a comment -

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13516/
Subject: LU-6050 target: control OST-index in IDIF via ROCOMPAT flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ecd28d9b6cb691bda8184a7e07f1acc1ccded391

adilger Andreas Dilger added a comment -

Making this one a blocker so it is landed for 2.7. Otherwise it isn't possible to downgrade to 2.5 after an upgrade.

Note that it would also be possible to backport support for the OST index in the IDIF FIDs to 2.5 in order to allow downgrades after an upgrade, but that isn't a requirement for this to land to 2.7 and is only mentioned here for future reference.

gerrit Gerrit Updater added a comment -

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13516
Subject: LU-6050 target: control OST-index in IDIF via ROCOMPAT flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 03ba2e91a4cd4363dcb5cf4561155c111581ad21

adilger Andreas Dilger added a comment -

Note: the new ROCOMPAT flag in the OSTs should be in the SUPPORTED mask for 2.7, and can be added to the SUPPORTED mask for 2.5 when it is patched, so that downgrades are safe. The b2_5 patch should NOT set the ROCOMPAT flag in the last_rcvd file, since this would cause problems for 2.5.x interop.
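
For readers unfamiliar with the mechanism being discussed, here is a generic sketch of a read-only-compat feature gate of this kind. The flag and mask names are made up for illustration and are not Lustre's actual macros; the point is only that a release whose SUPPORTED mask does not include a given flag will refuse to use a target that has it set:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical flag names, for illustration only. */
    #define X_ROCOMPAT_IDX_IN_IDIF   0x0001u
    #define X_ROCOMPAT_SUPPORTED_NEW (X_ROCOMPAT_IDX_IN_IDIF)  /* patched build */
    #define X_ROCOMPAT_SUPPORTED_OLD 0x0000u                   /* unpatched 2.5 */

    static int check_rocompat(uint32_t on_disk, uint32_t supported)
    {
        uint32_t unknown = on_disk & ~supported;

        if (unknown) {
            fprintf(stderr, "unsupported ROCOMPAT flags 0x%x, refusing target\n",
                    (unsigned)unknown);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        /* A target whose last_rcvd carries the new flag ... */
        uint32_t on_disk = X_ROCOMPAT_IDX_IN_IDIF;

        check_rocompat(on_disk, X_ROCOMPAT_SUPPORTED_OLD);  /* old release: refuses */
        check_rocompat(on_disk, X_ROCOMPAT_SUPPORTED_NEW);  /* patched release: OK */
        return 0;
    }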

adilger Andreas Dilger added a comment -

The tune2fs tool is not the right mechanism for this. The tune2fs utility is for ext2/3/4 filesystems only and has nothing to do with the problem here. This needs to be done with the feature flags in the last_rcvd file, which can be set for newly formatted filesystems, or manually set via tunefs.lustre, or perhaps when a manual LFSCK is run on the OST?

I also recommend that we make a patch for b2_5 if this problem was present in 2.6.0, since we are not making any updates for 2.6.x. It is bad for the code to be fragile in the face of a minor inconsistency on disk; it should be able to handle this case, and you can bet that someone will want to downgrade even after they "knew" they didn't have to anymore. I doubt that 2.5.3 is the end of the road for any 2.5 installation, so we may as well fix this problem in both places.

paf Patrick Farrell (Inactive) added a comment -

nasf, since we've effectively created an incompatibility preventing downgrades, I think your second option is the way to go. (Unless there is some other way to resolve this.)

I think in general it's not OK to make downgrade impossible without the user taking specific action. Your second suggestion would provide that.

            People

Assignee: yong.fan nasf (Inactive)
Reporter: paf Patrick Farrell (Inactive)
Votes: 0
Watchers: 8
