[LU-17327] Write conf-sanity test case for online OST and MDT addition Created: 30/Nov/23 Updated: 03/Feb/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | Jian Yu |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
We need an automated test to exercise adding OSTs and MDTs online to a live filesystem that is under load. Andreas provided this guidance:
|
| Comments |
| Comment by Gerrit Updater [ 30/Nov/23 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53300 |
| Comment by Andreas Dilger [ 04/Dec/23 ] |
|
It looks like the test case has exposed an expected failure case where the MDS created a file with an object on the newly-added OST but the client wasn't aware of the new OST yet:
[ 183.117948] Lustre: DEBUG MARKER: == conf-sanity test 46b: online OST and MDT addition ===== 16:48:36 (1701380916)
[ 206.830361] Lustre: Mounted lustre-client
[ 214.067224] LustreError: 14230:0:(lov_ea.c:279:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
[ 214.070240] Lustre: 14230:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x2ab:1025, magic 0x0bd10bd0, pattern 0x1
[ 214.072822] Lustre: 14230:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
[ 214.075459] Lustre: 14230:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x2c0000401:2
[ 214.078104] LustreError: 14230:0:(lcommon_cl.c:196:cl_file_inode_init()) lustre: failed to initialize cl_object [0x200000401:0x2ab:0x0]: rc = -22
[ 214.081653] LustreError: 14230:0:(llite_lib.c:3613:ll_prep_inode()) new_inode -fatal: rc -22
[ 460.900709] Lustre: DEBUG MARKER: conf-sanity test_46b: @@@@@@ FAIL: rsync failed
Issue LU-17334 is tracking the fix for the client to handle this case gracefully, while LU-17300 is tracking the fix to avoid creating such files in the first place. Both fixes are useful to implement for interop and reliability reasons. |
| Comment by Andreas Dilger [ 04/Dec/23 ] |
|
I looked through the test results on Gerrit Janitor and 100% of the test runs for the new test_46b failed, but 40/44 test runs only failed because they ran out of space while copying the source trees into the test filesystem:
Started lustre-OST0001
waiting for rsync to finish
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/kbd/keymaps/xkb/.hr-alternatequotes.map.gz.QHFVy3" failed: No space left on device (28)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/kbd/keymaps/xkb/.hr-unicode.map.gz.lliNpP" failed: No space left on device (28)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/kbd/keymaps/xkb/.hr-unicodeus.map.gz.AAhphB" failed: No space left on device (28)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/kbd/keymaps/xkb/.hr-us.map.gz.ID9K9m" failed: No space left on device (28)
:
There were 4 cases that failed due to the MDS creating a file on a new OST that the client didn't know existed yet (with errors on the client console as in the previous comment):
waiting for rsync to finish
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/etc/.cron.deny.PDs0S0" failed: Invalid argument (22)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/etc/.crypttab.FFAEg4" failed: Invalid argument (22)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/etc/.csh.login.f7BhT7" failed: Invalid argument (22)
rsync: write failed on "/mnt/lustre/d46b.conf-sanity/lib/locale/locale-archive": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2] |
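The split between out-of-space failures (errno 28) and the new-OST race (errno 22) described above can be tallied mechanically from a saved rsync log. The following is a minimal sketch, not part of the patch under review: the function name `count_rsync_failures` and the `SAMPLE` list are hypothetical, and the regular expression is only an assumption based on the rsync line shapes quoted in this comment.

```python
import errno
import re
from collections import Counter

# Assumed shape of an rsync failure line, based only on the lines quoted above:
#   rsync: <op> ... "<path>" ...: <message> (<errno>)
RSYNC_FAILURE = re.compile(r'^rsync: (\w+) .*"([^"]+)".*: (.+) \((\d+)\)\s*$')


def count_rsync_failures(lines):
    """Return a Counter mapping errno number -> number of failed rsync operations."""
    counts = Counter()
    for line in lines:
        match = RSYNC_FAILURE.match(line.strip())
        if match:
            counts[int(match.group(4))] += 1
    return counts


# A few lines copied from the test output quoted in this ticket.
SAMPLE = [
    'waiting for rsync to finish',
    'rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/etc/.cron.deny.PDs0S0" failed: Invalid argument (22)',
    'rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/kbd/keymaps/xkb/.hr-us.map.gz.ID9K9m" failed: No space left on device (28)',
    'rsync: write failed on "/mnt/lustre/d46b.conf-sanity/lib/locale/locale-archive": No space left on device (28)',
    'rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]',
]

if __name__ == "__main__":
    for code, count in sorted(count_rsync_failures(SAMPLE).items()):
        # errno.errorcode maps numbers to symbolic names on the local platform.
        print(errno.errorcode.get(code, str(code)), count)
```

A run whose tally contains only errno 28 failed for lack of space alone, while any errno 22 entries suggest the race with the newly added OST was also hit. The final `rsync error:` summary line is deliberately not counted.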
| Comment by Andreas Dilger [ 04/Dec/23 ] |
|
The test runs on Autotest showed much more chance of hitting the object-on-new-OST creation race, with a long list of files being created on new OSTs:
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/dracut/modules.d/99shutdown/.shutdown.sh.iiLyky" failed: Invalid argument (22)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/dracut/modules.d/99squash/.shchkdir.V60fMT" failed: Invalid argument (22)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/dracut/modules.d/99squash/.module-setup.sh.TDJhBN" failed: Invalid argument (22)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/firewalld/helpers/.RAS.xml.zYZx1S" failed: Invalid argument (22)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/firewalld/helpers/.amanda.xml.okGbNj" failed: Invalid argument (22)
rsync: mkstemp "/mnt/lustre/d46b.conf-sanity/lib/firewalld/helpers/.ftp.xml.Rsg5G5" failed: Invalid argument (22)
:
and the client console logs showing this error was hit for each new OST addition. This is likely because there are more Autotest OSTs to be added (6) instead of Janitor (only 1):
:
[ 605.944711] LustreError: 41994:0:(lcommon_cl.c:196:cl_file_inode_init()) lustre: failed to initialize cl_object [0x200000401:0x5bc:0x0]: rc = -22
[ 605.946775] LustreError: 41994:0:(lcommon_cl.c:196:cl_file_inode_init()) Skipped 8 previous similar messages
[ 605.948389] LustreError: 41994:0:(llite_lib.c:3613:ll_prep_inode()) new_inode -fatal: rc -22
[ 605.949762] LustreError: 41994:0:(llite_lib.c:3613:ll_prep_inode()) Skipped 8 previous similar messages
[ 606.954945] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) lustre-clilov_UUID: OST index 2 more than OST count 2
[ 606.956738] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) Skipped 25 previous similar messages
[ 606.958203] Lustre: 41994:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x663:1025, magic 0x0bd10bd0, pattern 0x1
[ 606.959944] Lustre: 41994:0:(lov_pack.c:57:lov_dump_lmm_common()) Skipped 25 previous similar messages
[ 606.961495] Lustre: 41994:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
[ 606.963239] Lustre: 41994:0:(lov_pack.c:61:lov_dump_lmm_common()) Skipped 25 previous similar messages
[ 606.964782] Lustre: 41994:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 2 subobj 0x2c0000400:37
[ 606.966319] Lustre: 41994:0:(lov_pack.c:81:lov_dump_lmm_objects()) Skipped 25 previous similar messages
[ 606.967869] LustreError: 41994:0:(lcommon_cl.c:196:cl_file_inode_init()) lustre: failed to initialize cl_object [0x200000401:0x663:0x0]: rc = -22
[ 606.969972] LustreError: 41994:0:(lcommon_cl.c:196:cl_file_inode_init()) Skipped 25 previous similar messages
[ 606.971572] LustreError: 41994:0:(llite_lib.c:3613:ll_prep_inode()) new_inode -fatal: rc -22
[ 606.972935] LustreError: 41994:0:(llite_lib.c:3613:ll_prep_inode()) Skipped 25 previous similar messages
[ 626.075632] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) lustre-clilov_UUID: OST index 5 more than OST count 5
[ 626.082235] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) Skipped 5 previous similar messages
[ 626.083709] Lustre: 41994:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x147a:1025, magic 0x0bd10bd0, pattern 0x1
[ 626.085377] Lustre: 41994:0:(lov_pack.c:57:lov_dump_lmm_common()) Skipped 5 previous similar messages
[ 626.086871] Lustre: 41994:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
[ 626.088539] Lustre: 41994:0:(lov_pack.c:61:lov_dump_lmm_common()) Skipped 5 previous similar messages
[ 626.090037] Lustre: 41994:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 5 subobj 0x380000400:2
[ 626.091542] Lustre: 41994:0:(lov_pack.c:81:lov_dump_lmm_objects()) Skipped 5 previous similar messages
[ 626.093115] LustreError: 41994:0:(lcommon_cl.c:196:cl_file_inode_init()) lustre: failed to initialize cl_object [0x200000401:0x147a:0x0]: rc = -22
[ 626.095205] LustreError: 41994:0:(lcommon_cl.c:196:cl_file_inode_init()) Skipped 5 previous similar messages
[ 626.096842] LustreError: 41994:0:(llite_lib.c:3613:ll_prep_inode()) new_inode -fatal: rc -22
[ 626.098230] LustreError: 41994:0:(llite_lib.c:3613:ll_prep_inode()) Skipped 5 previous similar messages
[ 632.765740] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) lustre-clilov_UUID: OST index 6 more than OST count 6
[ 632.767467] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) Skipped 36 previous similar messages
[ 632.768944] Lustre: 41994:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x16a9:1025, magic 0x0bd10bd0, pattern 0x1
[ 632.770626] Lustre: 41994:0:(lov_pack.c:57:lov_dump_lmm_common()) Skipped 36 previous similar messages
[ 632.772137] Lustre: 41994:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
[ 632.773800] Lustre: 41994:0:(lov_pack.c:61:lov_dump_lmm_common()) Skipped 36 previous similar messages
[ 632.775283] Lustre: 41994:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 6 subobj 0x3c0000400:2
[ 632.776766] Lustre: 41994:0:(lov_pack.c:81:lov_dump_lmm_objects()) Skipped 36 previous similar messages
[ 632.778285] LustreError: 41994:0:(lcommon_cl.c:196:cl_file_inode_init()) lustre: failed to initialize cl_object [0x200000401:0x16a9:0x0]: rc = -22
[ 632.780422] LustreError: 41994:0:(lcommon_cl.c:196:cl_file_inode_init()) Skipped 36 previous similar messages
: |
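To check which OST additions triggered the race in a given client console log, the `lsme_unpack()` messages quoted above can be extracted with a short script. This is only an illustrative sketch: the helper name `find_stale_ost_events` and the `SAMPLE_LOG` string are hypothetical, and the pattern assumes the message wording shown in this ticket, which may differ in other Lustre versions. Because the kernel rate-limits these messages ("Skipped N previous similar messages"), the result lists which OST indices were affected, not how many files hit the error.

```python
import re

# Assumed message format, taken from the console output quoted in this ticket:
#   [ <time>] LustreError: <pid>:0:(lov_ea.c:<line>:lsme_unpack()) <lov>: OST index I more than OST count C
LSME_UNPACK = re.compile(
    r'\[\s*(\d+\.\d+)\] LustreError: \d+:\d+:\(lov_ea\.c:\d+:lsme_unpack\(\)\) '
    r'(\S+): OST index (\d+) more than OST count (\d+)'
)


def find_stale_ost_events(log_text):
    """Return (timestamp, lov_name, ost_index, ost_count) for each matching message.

    finditer is used instead of per-line matching so that logs whose line
    breaks were lost (as in a flattened ticket export) are still handled.
    """
    return [
        (float(m.group(1)), m.group(2).rstrip(':'), int(m.group(3)), int(m.group(4)))
        for m in LSME_UNPACK.finditer(log_text)
    ]


# Three messages copied from the Autotest client console log above, joined on one line.
SAMPLE_LOG = (
    "[ 606.954945] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) lustre-clilov_UUID: OST index 2 more than OST count 2 "
    "[ 606.956738] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) Skipped 25 previous similar messages "
    "[ 626.075632] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) lustre-clilov_UUID: OST index 5 more than OST count 5 "
    "[ 632.765740] LustreError: 41994:0:(lov_ea.c:279:lsme_unpack()) lustre-clilov_UUID: OST index 6 more than OST count 6"
)

if __name__ == "__main__":
    for timestamp, lov, index, count in find_stale_ost_events(SAMPLE_LOG):
        print(f"{timestamp:.6f} {lov}: layout references OST index {index}, client knows {count} OSTs")
```

In each quoted message the OST index equals the client's OST count, which is consistent with the layout referencing the most recently added OST before the client has learned about it.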
| Comment by Gerrit Updater [ 06/Dec/23 ] |
|
|
| Comment by Gerrit Updater [ 08/Dec/23 ] |
|
|