Lustre / LU-5984

server_mgc_set_fs()) can't set_fs -17

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • None
    • Affects Version/s: Lustre 2.5.3
    • 2
    • 16699

    Description

      After upgrading Lustre to 2.5.3 (specifically lustre-2.5.3-2chaos) we're no longer able to start the MDS due to the following failure.

      Lustre: Lustre: Build Version: 2.5.3-2chaos-2chaos--PRISTINE-2.6.32-431.29.2.1chaos.ch5.2.x86_64
      LustreError: 13871:0:(obd_mount_server.c:313:server_mgc_set_fs()) can't set_fs -17
      Lustre: fsv-MDT0000: Unable to start target: -17
      LustreError: 13871:0:(obd_mount_server.c:845:lustre_disconnect_lwp()) fsv-MDT0000-lwp-MDT0000: Can't end config log fsv-client.
      LustreError: 13871:0:(obd_mount_server.c:1419:server_put_super()) fsv-MDT0000: failed to disconnect lwp. (rc=-2)
      LustreError: 13871:0:(obd_mount_server.c:1449:server_put_super()) no obd fsv-MDT0000
      LustreError: 13871:0:(obd_mount_server.c:135:server_deregister_mount()) fsv-MDT0000 not registered
      Lustre: server umount fsv-MDT0000 complete
      LustreError: 13871:0:(obd_mount.c:1326:lustre_fill_super()) Unable to mount  (-17)
      

      I took a look at the Lustre debug log, and the failure is due to a problem creating the local copy of the config logs. This is a ZFS-based MDS that is being upgraded from 2.4.x, so there was never a local CONFIGS directory.

      I'll attach the full log, but basically it seems to correctly detect that there is no CONFIGS directory. It then attempts to create the directory, which fails with -17 (EEXIST). Given the debug log we have, it's not clear why this fails, since the directory clearly doesn't exist. We've mounted the MDT via the ZPL and verified this.
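
      For reference, a minimal sketch of this kind of ZPL check, assuming a hypothetical pool/dataset name mdt0pool/mdt0 and mountpoint /mnt/mdt0-zpl (adjust to the actual layout):

      # Mount the MDT dataset through the ZPL and confirm CONFIGS is absent.
      mkdir -p /mnt/mdt0-zpl
      mount -t zfs mdt0pool/mdt0 /mnt/mdt0-zpl
      ls /mnt/mdt0-zpl | grep CONFIGS || echo "no CONFIGS directory"
      umount /mnt/mdt0-zpl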

      Hoping we could work around the issue, we tried manually creating the CONFIGS directory and adding a copy of the llogs from the MGS. We also tried just creating an empty CONFIGS directory through the ZPL. In both cases this caused the MDS to LBUG on start as follows:

      2014-12-04 11:10:50 LustreError: 16688:0:(osd_index.c:1313:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed:
      2014-12-04 11:10:50 LustreError: 16688:0:(osd_index.c:1313:osd_index_try()) LBUG
      2014-12-04 11:10:50 Pid: 16688, comm: mount.lustre
      2014-12-04 11:10:50
      2014-12-04 11:10:50 Call Trace:
      2014-12-04 11:10:50  [<ffffffffa05d18f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa05d1ef7>] lbug_with_loc+0x47/0xb0 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa0d623e4>] osd_index_try+0x224/0x470 [osd_zfs]
      2014-12-04 11:10:50  [<ffffffffa0740d41>] dt_try_as_dir+0x41/0x60 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa0741351>] dt_lookup_dir+0x31/0x130 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa071f845>] llog_osd_open+0x475/0xbb0 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa06f15ba>] llog_open+0xba/0x2c0 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa06f5131>] llog_backup+0x61/0x500 [obdclass]
      2014-12-04 11:10:50  [<ffffffff8128f540>] ? sprintf+0x40/0x50
      2014-12-04 11:10:50  [<ffffffffa0d99757>] mgc_process_log+0x1177/0x18f0 [mgc]
      2014-12-04 11:10:50  [<ffffffffa0d93360>] ? mgc_blocking_ast+0x0/0x810 [mgc]
      2014-12-04 11:10:50  [<ffffffffa08991e0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      2014-12-04 11:10:50  [<ffffffffa0d9b4b5>] mgc_process_config+0x645/0x11d0 [mgc]
      2014-12-04 11:10:50  [<ffffffffa07351c6>] lustre_process_log+0x256/0xa60 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa05e1971>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa0766cb7>] server_start_targets+0x9e7/0x1db0 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa0738876>] ? lustre_start_mgc+0x4b6/0x1e60 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa05e1971>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa0730760>] ? class_config_llog_handler+0x0/0x1880 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa076ceb8>] server_fill_super+0xb98/0x19e0 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
      2014-12-04 11:10:50  [<ffffffffa073a3f8>] lustre_fill_super+0x1d8/0x550 [obdclass]
      2014-12-04 11:10:50  [<ffffffffa073a220>] ? lustre_fill_super+0x0/0x550 [obdclass]
      2014-12-04 11:10:50  [<ffffffff8118d1ef>] get_sb_nodev+0x5f/0xa0
      2014-12-04 11:10:50  [<ffffffffa07320e5>] lustre_get_sb+0x25/0x30 [obdclass]
      2014-12-04 11:10:50  [<ffffffff8118c82b>] vfs_kern_mount+0x7b/0x1b0
      2014-12-04 11:10:50  [<ffffffff8118c9d2>] do_kern_mount+0x52/0x130
      2014-12-04 11:10:50  [<ffffffff811ae21b>] do_mount+0x2fb/0x930
      2014-12-04 11:10:50  [<ffffffff811ae8e0>] sys_mount+0x90/0xe0
      2014-12-04 11:10:50  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
      

      At this point we're rolling back to the previous Lustre release in order to make the system available again.

      Attachments

        Issue Links

          Activity

            [LU-5984] server_mgc_set_fs()) can't set_fs -17

            The problem is not constrained to just the MDT. All of the OSTs have the same problem. Obviously a manual fix is not going to be reasonable. We will need Intel to provide either a code fix (preferred), or a script to update the file to the correct value.

            I looked at one of the OSTs, and the largest filename in the 0x200000003 sequence is 0x200000003:0x4:0x0. I didn't check any more, so I cannot say if that is the same on all OSTs.

            morrone Christopher Morrone (Inactive) added a comment

            Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/13081
            Subject: LU-5984 obd: fix lastid file name in compat code
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: 57a791a8bc7aac6c46eea23bf0e2c24a4d4e6c6a

            gerrit Gerrit Updater added a comment

            yes, the 0x200000003:0x0:0x0 file must contain the lastID, and on upgrade it should have been copied from seq-200000003-lastid, but it wasn't, due to an error in the code. So now we need to restore the proper counter in oi.3/0x200000003:0x0:0x0 by changing 0001 to 0008. Note that it is not a copy of seq-200000003-lastid, which has the magic word 0xdecafbee at the beginning followed by 0008. After this change the MDT should create any new local files starting from OID 0x8 and mount successfully. Right now it fails to mount because it tries to use OID 0x1, which is already in use. Note also that you must first remove the manually created CONFIGS directory and all of its contents.

            tappro Mikhail Pershin added a comment
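
            A hedged sketch of the manual counter fix Mikhail describes above; the dataset name and ZPL mountpoint are hypothetical, and the value written must match the lastid found in your own seq-200000003-lastid file:

            # mount the MDT dataset through the ZPL (placeholder names)
            mount -t zfs mdt0pool/mdt0 /mnt/mdt0-zpl
            hexdump /mnt/mdt0-zpl/oi.3/0x200000003:0x0:0x0    # currently 0001 0000 0000 0000
            # overwrite the 8-byte little-endian counter with 0x8
            printf '\x08\x00\x00\x00\x00\x00\x00\x00' > '/mnt/mdt0-zpl/oi.3/0x200000003:0x0:0x0'
            hexdump /mnt/mdt0-zpl/oi.3/0x200000003:0x0:0x0    # should now read 0008 0000 0000 0000
            umount /mnt/mdt0-zpl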
            # hexdump seq-200000003-lastid 
            0000000 fbee deca 0008 0000                    
            0000008
            
            # hexdump oi.3/0x200000003:0x0:0x0
            0000000 0001 0000 0000 0000                    
            0000008
            # hexdump oi.3/0x200000003:0x1:0x0
            0000000 fbee deca 0008 0000                    
            0000008
            
            (0x2 through 0x5 map to directories in the ZPL interface)
            
            # hexdump oi.3/0x200000003:0x6:0x0
            0000000 0001                                   
            0000002
            # hexdump oi.3/0x200000003:0x7:0x0
            0000000 0001                                   
            0000002
            

            If you want anything from the ones that look like directories, you will need to provide details about what to retrieve.

            So you believe that we need to take the value from seq-200000003-lastid and put it in 0x200000003:0x0:0x0? No other changes needed?

            morrone Christopher Morrone (Inactive) added a comment
            tappro Mikhail Pershin added a comment - edited

            I think I've found the reason: it is a mistake in the last_compat_check() function, which converts the old format of the lastid file to the new one. As a result 0x200000003:0x0:0x0 does not contain the last value from seq-200000003-lastid, and Lustre tries to create a new file with an already existing FID.

            A patch to prevent such issues will be ready soon, but it will not help with the existing situation. The current solution is to change the content of the 0x200000003:0x0:0x0 file manually by writing 0007 0000 0000 0000 to it. (I assume that 7 is the lastid from the old seq-200000003-lastid; please check this.)


            Christopher, please also include the content of the /seq-200000003-lastid file.

            tappro Mikhail Pershin added a comment

            Christopher, thanks.

            Can you attach a few of the files above (including 0x200000003:0x0:0x0) if they aren't huge?
            The first one should contain the highest used ID (0x7 in this case); some of the remaining files should be llogs - you can check that with the llog_reader utility.

            bzzz Alex Zhuravlev added a comment
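
            A small sketch of the llog check Alex suggests, assuming the MDT dataset is already mounted through the ZPL at a hypothetical /mnt/mdt0-zpl:

            # llog_reader prints the records of an llog file, or complains
            # if the file is not an llog at all
            llog_reader /mnt/mdt0-zpl/oi.3/0x200000003:0x6:0x0
            llog_reader /mnt/mdt0-zpl/oi.3/0x200000003:0x7:0x0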

            This is a ZFS filesystem, so there isn't much in the way of "older versions". The current filesystem in use was probably formatted under lustre 2.4.0.

            Since there are 128 OI directories, I took a wild guess that the OI values are stored in those directories modulo 128, so 0x200000003 would be in oi.3. oi.3 does contain:

            0x200000003:0x0:0x0
            0x200000003:0x1:0x0
            0x200000003:0x2:0x0
            0x200000003:0x3:0x0
            0x200000003:0x4:0x0
            0x200000003:0x5:0x0
            0x200000003:0x6:0x0
            0x200000003:0x7:0x0
            

            For 0x200000003:0x2 I see:

            # \ls oi.3/0x200000003:0x2:0x0
            dt-0x0  md-0x0
            
            morrone Christopher Morrone (Inactive) added a comment
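
            A quick arithmetic check of the modulo-128 mapping guessed above (plain bash arithmetic, no Lustre tools involved):

            # 0x200000003 mod 128 is 3, matching the oi.3 directory
            printf '%d\n' $((0x200000003 % 128))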

            In 2.5 the MOUNT_CONFIGS_DIR is created by name in the special FID sequence FID_SEQ_LOCAL_NAME = 0x200000003ULL. During mount a new object with FID 0x200000003:0x2 was generated for the CONFIGS directory, but for some reason it already exists in the filesystem OI. I am not sure how that happens; maybe this sequence was used for something else in older Lustre versions. This is an area to investigate further.

            Local creation will not help because there are no IGIFs in ZFS, and a locally created directory has no associated FID in either the direntry or the OI.

            As a next step, it would be useful to inspect the OI to find out whether there are FIDs with sequence 0x200000003 and, if so, how many.

            tappro Mikhail Pershin added a comment
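
            A hedged sketch of the OI inspection suggested above, again assuming the MDT dataset is mounted through the ZPL at a hypothetical /mnt/mdt0-zpl:

            # list every OI entry in the 0x200000003 (FID_SEQ_LOCAL_NAME) sequence
            ls -d /mnt/mdt0-zpl/oi.*/0x200000003:*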
            yujian Jian Yu added a comment -

            Here are the conf-sanity test_32[abd] clean upgrade test reports for Lustre 2.5.3 with ZFS:
            https://testing.hpdd.intel.com/sub_tests/a9e5dbde-311c-11e4-b503-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/a9f8a32c-311c-11e4-b503-5254006e85c2
            https://testing.hpdd.intel.com/sub_tests/aa5d42be-311c-11e4-b503-5254006e85c2
            All of the tests passed.

            I also tried to perform a clean upgrade from Lustre 2.4.3 to 2.5.3 with ZFS but could not reproduce the failure. After upgrading, the Lustre filesystem was mounted successfully.

            The configuration was:

            1 MGS/MDS node (different datasets for MGS and MDT)
            2 OSS nodes (1 OST per node)
            2 client nodes
            

            After upgrading, on MGS/MDS node:

            # zfs list
            NAME               USED  AVAIL  REFER  MOUNTPOINT
            lustre-mdt1        231M   457G   136K  /lustre-mdt1
            lustre-mdt1/mdt1   229M   457G   229M  /lustre-mdt1/mdt1
            lustre-mgs        5.89M   457G   136K  /lustre-mgs
            lustre-mgs/mgs    5.34M   457G  5.34M  /lustre-mgs/mgs
            
            # df
            Filesystem       1K-blocks    Used Available Use% Mounted on
            /dev/sda1         20642428 2041472  17552380  11% /
            tmpfs             16393952       0  16393952   0% /dev/shm
            lustre-mgs/mgs   475194880    5504 475187328   1% /mnt/mgs
            lustre-mdt1/mdt1 475192704  234240 474956416   1% /mnt/mds1
            

            I've also added Alex and Mike to the CC list. Looking at the osd-zfs/osd_oi.c code, it appears that it is looking up the "CONFIGS" directory by FID, but in ZFS that FID may only be referenced by the OI file and not have a filename:

            static const struct named_oid oids[] = {
                    { LAST_RECV_OID,                LAST_RCVD },
                    { OFD_LAST_GROUP_OID,           "LAST_GROUP" },
                    { LLOG_CATALOGS_OID,            "CATALOGS" },
                    { MGS_CONFIGS_OID,              NULL /*MOUNT_CONFIGS_DIR*/ },
            

            It isn't clear to me why only the MGS_CONFIGS_OID=4119 "named_oid" in osd-zfs doesn't have a filename in the namespace, unlike the other OIDs and unlike osd-ldiskfs. It doesn't appear that MGS_CONFIGS_OID is used directly in the code anywhere, only via MOUNT_CONFIGS_DIR. While MOUNT_CONFIGS_DIR use is partly ldiskfs-specific for accessing the "mountdata" file (which is stored in dataset properties in ZFS), it is also used in the server_mgc_set_fs->mgc_fs_setup->local_file_find_or_create() call path that is failing.

            It also isn't clear why the conf-sanity test_32[ab] upgrade tests did not show any problems, since there is definitely a test image lustre/tests/disk2_4-zfs.tar.bz2 (created with 2.4.0) that should exercise 2.4.0->current-version upgrades for every review-zfs test. There is LU-5649, which is specific to ZFS failing test_32, but it looks like that was only a short-term regression on 2.5 due to a patch that was later reverted before release ("LU-3489 osd-zfs: varsize index support").

            adilger Andreas Dilger added a comment

            People

              Assignee: tappro Mikhail Pershin
              Reporter: behlendorf Brian Behlendorf
              Votes: 0
              Watchers: 11
