[LU-5984] server_mgc_set_fs()) can't set_fs -17 Created: 04/Dec/14 Updated: 30/May/15 Resolved: 28/Apr/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Brian Behlendorf | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl |
| Attachments: | |
| Issue Links: | |
| Severity: | 2 |
| Rank (Obsolete): | 16699 |
| Description |
|
After upgrading Lustre to 2.5.3 (specifically lustre-2.5.3-2chaos) we're no longer able to start the MDS due to the following failure.

Lustre: Lustre: Build Version: 2.5.3-2chaos-2chaos--PRISTINE-2.6.32-431.29.2.1chaos.ch5.2.x86_64
LustreError: 13871:0:(obd_mount_server.c:313:server_mgc_set_fs()) can't set_fs -17
Lustre: fsv-MDT0000: Unable to start target: -17
LustreError: 13871:0:(obd_mount_server.c:845:lustre_disconnect_lwp()) fsv-MDT0000-lwp-MDT0000: Can't end config log fsv-client.
LustreError: 13871:0:(obd_mount_server.c:1419:server_put_super()) fsv-MDT0000: failed to disconnect lwp. (rc=-2)
LustreError: 13871:0:(obd_mount_server.c:1449:server_put_super()) no obd fsv-MDT0000
LustreError: 13871:0:(obd_mount_server.c:135:server_deregister_mount()) fsv-MDT0000 not registered
Lustre: server umount fsv-MDT0000 complete
LustreError: 13871:0:(obd_mount.c:1326:lustre_fill_super()) Unable to mount (-17)

I took a look at the Lustre debug log, and the failure is due to a problem creating the local copy of the config logs. This is a ZFS-based MDS which is upgrading from 2.4.x, so there was never a local CONFIGS directory. I'll attach the full log, but basically it seems to be correctly detecting that there is no CONFIGS directory. It then attempts to create the directory, which fails with -17 (EEXIST). Given the debug log we have, it's not clear why this fails since the directory clearly doesn't exist; we've mounted the MDT via the ZPL and verified this. Hoping we could work around the issue, we tried manually creating the CONFIGS directory and adding a copy of the llogs from the MGS. We also tried creating just an empty CONFIGS directory through the ZPL.
In both cases this caused the MDS to LBUG on start as follows (the identical "2014-12-04 11:10:50" timestamp is trimmed from the repeated lines):

2014-12-04 11:10:50 LustreError: 16688:0:(osd_index.c:1313:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed:
LustreError: 16688:0:(osd_index.c:1313:osd_index_try()) LBUG
Pid: 16688, comm: mount.lustre
Call Trace:
[<ffffffffa05d18f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa05d1ef7>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0d623e4>] osd_index_try+0x224/0x470 [osd_zfs]
[<ffffffffa0740d41>] dt_try_as_dir+0x41/0x60 [obdclass]
[<ffffffffa0741351>] dt_lookup_dir+0x31/0x130 [obdclass]
[<ffffffffa071f845>] llog_osd_open+0x475/0xbb0 [obdclass]
[<ffffffffa06f15ba>] llog_open+0xba/0x2c0 [obdclass]
[<ffffffffa06f5131>] llog_backup+0x61/0x500 [obdclass]
[<ffffffff8128f540>] ? sprintf+0x40/0x50
[<ffffffffa0d99757>] mgc_process_log+0x1177/0x18f0 [mgc]
[<ffffffffa0d93360>] ? mgc_blocking_ast+0x0/0x810 [mgc]
[<ffffffffa08991e0>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
[<ffffffffa0d9b4b5>] mgc_process_config+0x645/0x11d0 [mgc]
[<ffffffffa07351c6>] lustre_process_log+0x256/0xa60 [obdclass]
[<ffffffffa05e1971>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa0766cb7>] server_start_targets+0x9e7/0x1db0 [obdclass]
[<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa0738876>] ? lustre_start_mgc+0x4b6/0x1e60 [obdclass]
[<ffffffffa05e1971>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa0730760>] ? class_config_llog_handler+0x0/0x1880 [obdclass]
[<ffffffffa076ceb8>] server_fill_super+0xb98/0x19e0 [obdclass]
[<ffffffffa05dc378>] ? libcfs_log_return+0x28/0x40 [libcfs]
[<ffffffffa073a3f8>] lustre_fill_super+0x1d8/0x550 [obdclass]
[<ffffffffa073a220>] ? lustre_fill_super+0x0/0x550 [obdclass]
[<ffffffff8118d1ef>] get_sb_nodev+0x5f/0xa0
[<ffffffffa07320e5>] lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8118c82b>] vfs_kern_mount+0x7b/0x1b0
[<ffffffff8118c9d2>] do_kern_mount+0x52/0x130
[<ffffffff811ae21b>] do_mount+0x2fb/0x930
[<ffffffff811ae8e0>] sys_mount+0x90/0xe0
[<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b

At this point we're rolling back to the previous Lustre release in order to make the system available again. |
| Comments |
| Comment by Brian Behlendorf [ 04/Dec/14 ] |
|
A little additional information about the configuration on the MDS node. One thing to notice in particular is that the MGS and MDT are running on the same node but use different datasets. They are also named slightly differently for historical reasons: this MGS originally only had configuration for a filesystem named lsv, then a second was added called fsv and the original lsv was retired.

$ zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
vesta-mds1           1.04T  1.02T    30K  /vesta-mds1
vesta-mds1/fsv-mdt0  1.03T  1.02T   864G  legacy
vesta-mds1/mgs       27.2M  1.02T  26.9M  /vesta-mds1/mgs

# Top-level MDT directory
# ls
CATALOGS           oi.10  oi.112 oi.125 oi.23 oi.36 oi.49 oi.61 oi.74 oi.87 quota_master
O                  oi.100 oi.113 oi.126 oi.24 oi.37 oi.5  oi.62 oi.75 oi.88 quota_slave
PENDING            oi.101 oi.114 oi.127 oi.25 oi.38 oi.50 oi.63 oi.76 oi.89 seq-200000003-lastid
ROOT               oi.102 oi.115 oi.13  oi.26 oi.39 oi.51 oi.64 oi.77 oi.9  seq_ctl
capa_keys          oi.103 oi.116 oi.14  oi.27 oi.4  oi.52 oi.65 oi.78 oi.90 seq_srv
changelog_catalog  oi.104 oi.117 oi.15  oi.28 oi.40 oi.53 oi.66 oi.79 oi.91
changelog_users    oi.105 oi.118 oi.16  oi.29 oi.41 oi.54 oi.67 oi.8  oi.92
fld                oi.106 oi.119 oi.17  oi.3  oi.42 oi.55 oi.68 oi.80 oi.93
last_rcvd          oi.107 oi.12  oi.18  oi.30 oi.43 oi.56 oi.69 oi.81 oi.94
lfsck_bookmark     oi.108 oi.120 oi.19  oi.31 oi.44 oi.57 oi.7  oi.82 oi.95
lfsck_namespace    oi.109 oi.121 oi.2   oi.32 oi.45 oi.58 oi.70 oi.83 oi.96
lov_objid          oi.11  oi.122 oi.20  oi.33 oi.46 oi.59 oi.71 oi.84 oi.97
oi.0               oi.110 oi.123 oi.21  oi.34 oi.47 oi.6  oi.72 oi.85 oi.98
oi.1               oi.111 oi.124 oi.22  oi.35 oi.48 oi.60 oi.73 oi.86 oi.99

# Top-level and CONFIGS MGS directory
$ ls
CONFIGS          oi.106 oi.117 oi.13 oi.24 oi.35 oi.46 oi.57 oi.68 oi.79 oi.9
NIDTBL_VERSIONS  oi.107 oi.118 oi.14 oi.25 oi.36 oi.47 oi.58 oi.69 oi.8  oi.90
O                oi.108 oi.119 oi.15 oi.26 oi.37 oi.48 oi.59 oi.7  oi.80 oi.91
oi.0             oi.109 oi.12  oi.16 oi.27 oi.38 oi.49 oi.6  oi.70 oi.81 oi.92
oi.1             oi.11  oi.120 oi.17 oi.28 oi.39 oi.5  oi.60 oi.71 oi.82 oi.93
oi.10            oi.110 oi.121 oi.18 oi.29 oi.4  oi.50 oi.61 oi.72 oi.83 oi.94
oi.100           oi.111 oi.122 oi.19 oi.3  oi.40 oi.51 oi.62 oi.73 oi.84 oi.95
oi.101           oi.112 oi.123 oi.2  oi.30 oi.41 oi.52 oi.63 oi.74 oi.85 oi.96
oi.102           oi.113 oi.124 oi.20 oi.31 oi.42 oi.53 oi.64 oi.75 oi.86 oi.97
oi.103           oi.114 oi.125 oi.21 oi.32 oi.43 oi.54 oi.65 oi.76 oi.87 oi.98
oi.104           oi.115 oi.126 oi.22 oi.33 oi.44 oi.55 oi.66 oi.77 oi.88 oi.99
oi.105           oi.116 oi.127 oi.23 oi.34 oi.45 oi.56 oi.67 oi.78 oi.89 seq-200000003-lastid

$ ls CONFIGS/
fsv-MDT0000 fsv-OST0018 fsv-OST0031 fsv-OST004a lsv-OST0001 lsv-OST001a lsv-OST0033 lsv-OST004c
fsv-OST0000 fsv-OST0019 fsv-OST0032 fsv-OST004b lsv-OST0002 lsv-OST001b lsv-OST0034 lsv-OST004d
fsv-OST0001 fsv-OST001a fsv-OST0033 fsv-OST004c lsv-OST0003 lsv-OST001c lsv-OST0035 lsv-OST004e
fsv-OST0002 fsv-OST001b fsv-OST0034 fsv-OST004d lsv-OST0004 lsv-OST001d lsv-OST0036 lsv-OST004f
fsv-OST0003 fsv-OST001c fsv-OST0035 fsv-OST004e lsv-OST0005 lsv-OST001e lsv-OST0037 lsv-OST0050
fsv-OST0004 fsv-OST001d fsv-OST0036 fsv-OST004f lsv-OST0006 lsv-OST001f lsv-OST0038 lsv-OST0051
fsv-OST0005 fsv-OST001e fsv-OST0037 fsv-OST0050 lsv-OST0007 lsv-OST0020 lsv-OST0039 lsv-OST0052
fsv-OST0006 fsv-OST001f fsv-OST0038 fsv-OST0051 lsv-OST0008 lsv-OST0021 lsv-OST003a lsv-OST0053
fsv-OST0007 fsv-OST0020 fsv-OST0039 fsv-OST0052 lsv-OST0009 lsv-OST0022 lsv-OST003b lsv-OST0054
fsv-OST0008 fsv-OST0021 fsv-OST003a fsv-OST0053 lsv-OST000a lsv-OST0023 lsv-OST003c lsv-OST0055
fsv-OST0009 fsv-OST0022 fsv-OST003b fsv-OST0054 lsv-OST000b lsv-OST0024 lsv-OST003d lsv-OST0056
fsv-OST000a fsv-OST0023 fsv-OST003c fsv-OST0055 lsv-OST000c lsv-OST0025 lsv-OST003e lsv-OST0057
fsv-OST000b fsv-OST0024 fsv-OST003d fsv-OST0056 lsv-OST000d lsv-OST0026 lsv-OST003f lsv-OST0058
fsv-OST000c fsv-OST0025 fsv-OST003e fsv-OST0057 lsv-OST000e lsv-OST0027 lsv-OST0040 lsv-OST0059
fsv-OST000d fsv-OST0026 fsv-OST003f fsv-OST0058 lsv-OST000f lsv-OST0028 lsv-OST0041 lsv-OST005a
fsv-OST000e fsv-OST0027 fsv-OST0040 fsv-OST0059 lsv-OST0010 lsv-OST0029 lsv-OST0042 lsv-OST005b
fsv-OST000f fsv-OST0028 fsv-OST0041 fsv-OST005a lsv-OST0011 lsv-OST002a lsv-OST0043 lsv-OST005c
fsv-OST0010 fsv-OST0029 fsv-OST0042 fsv-OST005b lsv-OST0012 lsv-OST002b lsv-OST0044 lsv-OST005d
fsv-OST0011 fsv-OST002a fsv-OST0043 fsv-OST005c lsv-OST0013 lsv-OST002c lsv-OST0045 lsv-OST005e
fsv-OST0012 fsv-OST002b fsv-OST0044 fsv-OST005d lsv-OST0014 lsv-OST002d lsv-OST0046 lsv-OST005f
fsv-OST0013 fsv-OST002c fsv-OST0045 fsv-OST005e lsv-OST0015 lsv-OST002e lsv-OST0047 lsv-OST0060
fsv-OST0014 fsv-OST002d fsv-OST0046 fsv-OST005f lsv-OST0016 lsv-OST002f lsv-OST0048 lsv-client
fsv-OST0015 fsv-OST002e fsv-OST0047 fsv-client lsv-OST0017 lsv-OST0030 lsv-OST0049 params
fsv-OST0016 fsv-OST002f fsv-OST0048 fsv-params lsv-OST0018 lsv-OST0031 lsv-OST004a params-client
fsv-OST0017 fsv-OST0030 fsv-OST0049 lsv-MDT0000 lsv-OST0019 lsv-OST0032 lsv-OST004b |
| Comment by Peter Jones [ 04/Dec/14 ] |
|
Oleg is looking into this |
| Comment by Andreas Dilger [ 04/Dec/14 ] |
|
I've also added Alex and Mike to the CC list. Looking at the osd-zfs/osd_oi.c code, it appears that it is looking up the "CONFIGS" directory by FID, but in ZFS that FID may only be referenced by the OI file and not have a filename:

static const struct named_oid oids[] = {
        { LAST_RECV_OID,        LAST_RCVD },
        { OFD_LAST_GROUP_OID,   "LAST_GROUP" },
        { LLOG_CATALOGS_OID,    "CATALOGS" },
        { MGS_CONFIGS_OID,      NULL /*MOUNT_CONFIGS_DIR*/ },

It isn't clear to me why only the MGS_CONFIGS_OID=4119 "named_oid" in osd-zfs doesn't have a filename in the namespace, unlike the other OIDs and unlike osd-ldiskfs. MGS_CONFIGS_OID doesn't appear to be used directly anywhere in the code, only via MOUNT_CONFIGS_DIR. While the MOUNT_CONFIGS_DIR use is partly ldiskfs-specific for accessing the "mountdata" file (which is stored in dataset properties in ZFS), it is also used in the server_mgc_set_fs-> mgc_fs_setup-> local_file_find_or_create() callpath that is failing. It also isn't clear why the conf-sanity test_32[ab] upgrade tests did not show any problems, since there is definitely a test image lustre/tests/disk2_4-zfs.tar.bz2 (created with 2.4.0) that should run 2.4.0->current version upgrades for every review-zfs test. |
| Comment by Jian Yu [ 05/Dec/14 ] |
|
Here are the conf-sanity test_32[abd] clean upgrade test reports for Lustre 2.5.3 with ZFS. I also tried to perform a clean upgrade from Lustre 2.4.3 to 2.5.3 with ZFS but could not reproduce the failure; after upgrading, the Lustre filesystem was mounted successfully. The configuration was:

1 MGS/MDS node (different datasets for MGS and MDT)
2 OSS nodes (1 OST per node)
2 client nodes

After upgrading, on the MGS/MDS node:

# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
lustre-mdt1        231M   457G   136K  /lustre-mdt1
lustre-mdt1/mdt1   229M   457G   229M  /lustre-mdt1/mdt1
lustre-mgs        5.89M   457G   136K  /lustre-mgs
lustre-mgs/mgs    5.34M   457G  5.34M  /lustre-mgs/mgs

# df
Filesystem       1K-blocks     Used Available Use% Mounted on
/dev/sda1         20642428  2041472  17552380  11% /
tmpfs             16393952        0  16393952   0% /dev/shm
lustre-mgs/mgs   475194880     5504 475187328   1% /mnt/mgs
lustre-mdt1/mdt1 475192704   234240 474956416   1% /mnt/mds1 |
| Comment by Mikhail Pershin [ 08/Dec/14 ] |
|
In 2.5 the MOUNT_CONFIGS_DIR is created by name in the special FID sequence FID_SEQ_LOCAL_NAME = 0x200000003ULL. During mount, a new object with FID 0x200000003:0x2 was generated for the CONFIGS directory, but for some reason that FID already exists in the filesystem OI. I am not sure how that happens; maybe this sequence was used for something else in older Lustre versions. This is an area to investigate further. Local creation will not help because there are no IGIFs in ZFS, and the manually created directory has no associated FID in either the direntry or the OI. As a next step, it would be useful to inspect the OI to find out whether there are FIDs with sequence 0x200000003, and how many. |
| Comment by Christopher Morrone [ 08/Dec/14 ] |
|
This is a ZFS filesystem, so there isn't much in the way of "older versions". The current filesystem in use was probably formatted under Lustre 2.4.0. Since there are 128 OI directories, I took a wild guess that the OI values are stored in those directories modulo 128, so 0x200000003 would be in oi.3. oi.3 does contain:

0x200000003:0x0:0x0
0x200000003:0x1:0x0
0x200000003:0x2:0x0
0x200000003:0x3:0x0
0x200000003:0x4:0x0
0x200000003:0x5:0x0
0x200000003:0x6:0x0
0x200000003:0x7:0x0

For 0x200000003:0x2 I see:

# \ls oi.3/0x200000003:0x2:0x0
dt-0x0 md-0x0 |
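The modulo guess checks out arithmetically. A quick shell sketch (the 128-directory count and the seq-modulo-128 mapping are taken from the observation above, not from Lustre source):

```shell
# Map a FID sequence number to its OI directory, assuming oi.<seq % 128>
seq_num=$(( 0x200000003 ))
oi_count=128
echo "oi.$(( seq_num % oi_count ))"   # prints: oi.3
```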
| Comment by Alex Zhuravlev [ 09/Dec/14 ] |
|
Christopher, thanks. Can you attach a few of the files above (including 0x200000003:0x0:0x0) if they aren't huge? |
| Comment by Mikhail Pershin [ 09/Dec/14 ] |
|
Christopher, please also include the content of the seq-200000003-lastid file. |
| Comment by Mikhail Pershin [ 09/Dec/14 ] |
|
I think I've found the reason: it is a mistake in the last_compat_check() function, which converts the old format of the lastid file to the new one. As a result, 0x200000003:0x0:0x0 doesn't contain the last value from seq-200000003-lastid, and Lustre tries to create a new file with an already-existing FID. A patch will be ready soon to prevent such issues, but it will not help with the existing situation. The current workaround is to change the content of 0x200000003:0x0:0x0 manually by writing 0007 0000 0000 0000 to it. (I believe that 7 is the lastid from the old seq-200000003-lastid; please check this.) |
| Comment by Christopher Morrone [ 12/Dec/14 ] |
# hexdump seq-200000003-lastid
0000000 fbee deca 0008 0000
0000008
# hexdump oi.3/0x200000003:0x0:0x0
0000000 0001 0000 0000 0000
0000008
# hexdump oi.3/0x200000003:0x1:0x0
0000000 fbee deca 0008 0000
0000008

(0x2 through 0x5 map to directories in the ZPL interface)

# hexdump oi.3/0x200000003:0x6:0x0
0000000 0001
0000002
# hexdump oi.3/0x200000003:0x7:0x0
0000000 0001
0000002

If you want anything from the ones that look like directories, you will need to provide details about what to retrieve. So you believe that we need to take the value from seq-200000003-lastid and put it in 0x200000003:0x0:0x0? No other changes needed? |
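As a cross-check on these dumps: the old-format file appears to be the 4-byte magic word 0xdecafbee followed by a little-endian counter (here 8), which hexdump renders as 16-bit words. A sketch reproducing that layout in a scratch file (the field widths are inferred from the dumps above, not taken from Lustre source):

```shell
# Rebuild the old-format lastid bytes in a scratch file (not a real
# Lustre object): magic 0xdecafbee, then counter 8, both little-endian.
printf '\xee\xfb\xca\xde\x08\x00\x00\x00' > old-lastid.bin
hexdump old-lastid.bin   # -> 0000000 fbee deca 0008 0000
```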
| Comment by Mikhail Pershin [ 14/Dec/14 ] |
|
Yes, 0x200000003:0x0:0x0 must contain the lastID, and it should have been copied from seq-200000003-lastid during the upgrade, but it wasn't due to an error in the code. So now we need to restore the proper counter in oi.3/0x200000003:0x0:0x0 by changing 0001 to 0008. Note that it is not a copy of seq-200000003-lastid, which has the magic word 0xdecafbee at the beginning followed by 0008. After this change the MDT should create any new local files starting with OID 0x8 and mount successfully. It currently fails to mount because it tries to use OID 0x1, which is already in use. Note also that you first have to remove the previously manually created CONFIGS dir and its contents. |
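A minimal sketch of the new-format byte layout Mikhail describes: a bare 64-bit little-endian counter with value 8 and no magic word. This writes to a scratch file only; the real target would be oi.3/0x200000003:0x0:0x0 on the ZPL-mounted MDT, and editing that on a live system without a backup is obviously not advisable:

```shell
# New-format lastid: 64-bit little-endian counter, no 0xdecafbee magic.
# lastid.bin stands in for oi.3/0x200000003:0x0:0x0 here.
printf '\x08\x00\x00\x00\x00\x00\x00\x00' > lastid.bin
hexdump lastid.bin   # -> 0000000 0008 0000 0000 0000
```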
| Comment by Gerrit Updater [ 16/Dec/14 ] |
|
Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/13081 |
| Comment by Christopher Morrone [ 09/Jan/15 ] |
|
The problem is not constrained to just the MDT. All of the OSTs have the same problem. Obviously a manual fix is not going to be reasonable. We will need Intel to provide either a code fix (preferred), or a script to update the file to the correct value. I looked at one of the OSTs, and the largest filename in the 0x200000003 sequence is 0x200000003:0x4:0x0. I didn't check any more, so I cannot say if that is the same on all OSTs. |
| Comment by Christopher Morrone [ 09/Jan/15 ] |
|
Also, please give a clearer explanation of when this filesystem damage occurred. You seem to claim that the problem originated in upgrades from 2.3 and earlier, but our filesystems were, to the best of our knowledge, all formatted at 2.4.0 or later. In other words, is your patch going to be of any use to us when we upgrade all of the servers on our other major network (from 2.4.2+ to 2.5.3+)? Or are those servers already damaged? |
| Comment by Mikhail Pershin [ 12/Jan/15 ] |
|
Christopher, the new lastid storage scheme was introduced in 2.4.0-RC1, so I suspect your MDS was formatted earlier than 2.4.0; could that be an Orion filesystem? If the server was formatted after 2.4.0-RC1 then it should upgrade cleanly. To be sure about the OSTs, let's check the content of the 0x200000003:0x0:0x0 file: it should contain the last used ID, which is 0x4 or greater in your case, and there should be no seq-200000003-lastid file at all. Therefore, if there is a seq-200000003-lastid file and the Lustre version is greater than 2.4.0, then the filesystem is damaged and my patch will not help, because it fixes only the case of upgrading from a non-damaged filesystem. If your OSTs are also already damaged then I will enhance the patch to cover this case too. |
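Mikhail's damage test can be sketched as a small shell check; the function name and the mount-point argument are illustrative, not existing Lustre tooling:

```shell
# Report whether a ZPL-mounted target still carries the old-format lastid
# file, which on a filesystem running >= 2.4.0-RC1 code suggests the
# damaged-upgrade case described above.
check_lastid_upgrade() {
    if [ -e "$1/seq-200000003-lastid" ]; then
        echo "old-format lastid present: possibly damaged upgrade"
    else
        echo "clean: no old-format lastid file"
    fi
}
```

Usage: mount the target dataset via the ZPL, then run check_lastid_upgrade against the mount point and inspect 0x200000003:0x0:0x0 in the matching oi directory as described above.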
| Comment by Mikhail Pershin [ 13/Jan/15 ] |
|
I've just updated the patch with more functionality to handle the case of an already damaged filesystem. |
| Comment by Christopher Morrone [ 13/Jan/15 ] |
Quoting Mikhail: "Christopher, the new lastid storage scheme was introduced in 2.4.0-RC1, so I suspect your MDS was formatted earlier that 2.4.0, can that be Orion filesystem?"

I am pretty certain that it was not Orion-based. I checked my email and compared some update announcements to local tag dates, and I think it is likely that we formatted them with a 2.3.6[345]-based version. So that would explain why we saw this. |
| Comment by Christopher Morrone [ 15/Jan/15 ] |
|
I have pulled Patch Set 3 of change 13081 into our tree. Thanks! |
| Comment by Christopher Morrone [ 28/Apr/15 ] |
|
I believe that this is resolved. |
| Comment by Peter Jones [ 28/Apr/15 ] |
|
Great. Thanks Chris |