[LU-11158] PFL component instantiation is not replayed properly

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.7
    • Affects Version/s: Lustre 2.10.0, Lustre 2.11.0, Lustre 2.12.0
    • Labels: None
    • Severity: 3

    Description

      While investigating LU-10961 I have found that component instantiation is not replayed. Test showing the problem:

      test_132a() {
      	[ $(lustre_version_code $SINGLEMDS) -lt $(version_code 2.9.90) ] &&
      		skip "Do not support PFL files before 2.10"
      
      	$LFS setstripe -E 1M -c 1 -E EOF -c 2 $DIR/$tfile
      	replay_barrier $SINGLEMDS
      	# writing past the first component's extent triggers instantiation of the next component
      	dd if=/dev/urandom of=$DIR/$tfile bs=1M count=1 seek=1 ||
      		error "dd to $DIR/$tfile failed"
      
      	cksum=$(md5sum $DIR/$tfile | awk '{print $1}')
      	$LFS getstripe -I2 $DIR/$tfile | grep -q lmm_objects ||
      		error "Component #1 was not instantiated"
      
      	fail $SINGLEMDS
      
      	cksum2=$(md5sum $DIR/$tfile | awk '{print $1}')
      	if [ "$cksum" != "$cksum2" ]; then
      		error_noexit "New checksum $cksum2 does not match original $cksum"
      	fi
      	$LFS getstripe -I2 $DIR/$tfile | grep -q lmm_objects ||
      		error "Component #1 instantiation was not replayed"
      }
      run_test 132a "PFL new component instantiate replay"
      

      The test checks the result twice: by comparing checksums, and by verifying that the next component has lmm_objects assigned. Both checks fail on master.
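
The checksum half of that double check can be tried outside a Lustre test run. The following is only a minimal sketch: `file_md5` and `verify_md5` are hypothetical helper names, not part of the Lustre test framework, and an ordinary temporary file stands in for `$DIR/$tfile`.

```shell
#!/bin/bash
# Hypothetical helper: print only the MD5 digest of a file.
file_md5() {
	md5sum "$1" | awk '{print $1}'
}

# Hypothetical helper: succeed only if the file still has the expected digest.
verify_md5() {
	local file=$1 expected=$2 actual
	actual=$(file_md5 "$file")
	if [ "$actual" != "$expected" ]; then
		echo "checksum $actual does not match original $expected" >&2
		return 1
	fi
}

# Stand-in for $DIR/$tfile: an ordinary temporary file.
tmpfile=$(mktemp)
printf 'sample data\n' > "$tmpfile"
before=$(file_md5 "$tmpfile")
# In the real test, "fail $SINGLEMDS" would run between these two steps.
verify_md5 "$tmpfile" "$before" && echo "data intact"
rm -f "$tmpfile"
```

Quoting both digests keeps the comparison well-formed even if one of them comes back empty, for example when the file cannot be read after failover.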

          Activity

            Gerrit Updater (gerrit) added a comment:

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34049/
            Subject: LU-11158 mdt: grow lvb buffer to hold layout
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: a1d1006a5e2bd7ba3dd9096107c456b353a3eeb0

            Gerrit Updater (gerrit) added a comment:

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/34049
            Subject: LU-11158 mdt: grow lvb buffer to hold layout
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: ec5ce00e9c1d95a178a9ea5bf6cd2b26e0e28837

            Peter Jones (pjones) added a comment:

            Landed for 2.12

            Gerrit Updater (gerrit) added a comment:

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32847/
            Subject: LU-11158 mdt: grow lvb buffer to hold layout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e5abcf83c0575b8a79594c1eb9ea727739d91522

            Gerrit Updater (gerrit) added a comment:

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/32847
            Subject: LU-11158 mdt: grow lvb buffer to hold layout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ef68d2e72f7fad7049594d14a78dda143fc0f736

            Mikhail Pershin (tappro) added a comment:

            Also, regarding the decision on the client side: although the MDS creates the layout, the client can still predict its size quite accurately if the number of stripes is known, because the layout size depends mostly on that. If the stripe count is not specified, the client may allocate a large reply buffer sized for some number of stripes, and the MDS may take that into account when creating the new component.

            I did not take part in the discussion you mentioned, so maybe there is a good solution already.
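
To make the size argument above concrete, here is a rough sketch of how a layout size estimate could scale with the stripe count. Every constant and the function name `estimate_layout_size` are assumptions chosen for illustration; none of them comes from this ticket, and the real on-disk structure sizes should be checked against the Lustre headers.

```shell
#!/bin/bash
# All sizes below are assumptions for illustration only.
COMP_HDR=32      # assumed size of the composite layout header
COMP_ENTRY=48    # assumed size of one component entry
LOV_HDR=32       # assumed size of one per-component plain-layout header
OST_OBJ=24       # assumed size of one per-stripe object record

# estimate_layout_size STRIPES...
# One argument per component: its stripe count, or 0 if the component
# is not instantiated yet (no object records).
estimate_layout_size() {
	local total=$COMP_HDR count
	for count in "$@"; do
		total=$((total + COMP_ENTRY + LOV_HDR + count * OST_OBJ))
	done
	echo "$total"
}

# Layout from the test above: "-E 1M -c 1 -E EOF -c 2".
estimate_layout_size 1 0   # second component not yet instantiated
estimate_layout_size 1 2   # after the write past 1M
```

Under these assumptions the estimate grows linearly with the number of stripes, which is the property the comment relies on: a client that knows the stripe count could size its reply buffer up front.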

            Mikhail Pershin (tappro) added a comment:

            Do you mean, for example, that a new component may have a defined size but no stripe count, etc., and the MDS will complete it and provide the final layout? Yes, that makes sense. I think we have a couple of options here. First, we can try to grow the reply buffer for modification cases so that the layout fits into it. The second option is quite non-trivial, but still: what if the MDS returns not the whole layout in the reply but just the new component data? Considering that we hold an EX lock and that instantiating a new component doesn't change earlier components, that should work and would require a smaller reply. Just thoughts; I am probably missing something here.

            Jinshan Xiong (Jinshan) added a comment:

            Yes, a layout intent usually exists in the file's layout, but there are also cases where the file has only a partially defined layout.

            The philosophy behind the design is that the MDS should decide what layout it will allocate and how many components it should instantiate, so the client technically doesn't know the actual EA size. Does this make sense to you?

            Mikhail Pershin (tappro) added a comment (edited):

            On the other hand, I wonder why the client can't supply the correct EA size when updating the layout. It knows the size, doesn't it? I mean that the reply buffer on the client side can be allocated with the proper size.

            Mikhail Pershin (tappro) added a comment:

            IIRC, mdt_lvbo_fill() may skip getting the EA for just that reason, something like: "we can do nothing here, let's report the new EA size back and there will be a separate getxattr RPC". That does not work for RPCs that have to be replayed, though.
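
The difference between the two behaviours described above can be shown with a toy model. This is a sketch only: `lvb_fill`, its arguments, and the `skip` and `grow` policy names are hypothetical and do not reflect the actual `mdt_lvbo_fill()` interface or the merged patch.

```shell
#!/bin/bash
# Toy model only; names and arguments are hypothetical.
# lvb_fill BUF_SIZE LAYOUT_SIZE POLICY
#   POLICY=skip: if the layout does not fit, report the needed size only
#   POLICY=grow: enlarge the buffer so the layout always fits
# Prints "layout <bytes>" when the layout is carried in the reply, or
# "size-only <bytes>" when the client would need a separate fetch.
lvb_fill() {
	local buf=$1 layout=$2 policy=$3
	if [ "$layout" -gt "$buf" ] && [ "$policy" = grow ]; then
		buf=$layout	# grow the buffer to hold the whole layout
	fi
	if [ "$layout" -le "$buf" ]; then
		echo "layout $layout"
	else
		echo "size-only $layout"
	fi
}

lvb_fill 216 264 skip
lvb_fill 216 264 grow
```

In this model the `skip` policy leaves the reply without the layout, so recovery of the layout depends on a follow-up request, which is the case the comment says does not work for replayed RPCs. The `grow` policy matches the idea in the patch subject, "grow lvb buffer to hold layout", at the level of intent only.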

            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Mikhail Pershin (tappro)
              Votes: 0
              Watchers: 8
