[LU-2237] Customer entered incorrect parameter when enabling quotas. System is down. Created: 25/Oct/12  Updated: 16/Feb/13  Resolved: 16/Feb/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1
Fix Version/s: Lustre 2.4.0, Lustre 2.1.4

Type: Bug Priority: Critical
Reporter: Roger Spellman (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre servers are running 2.6.32-220.el6, with Lustre 2.1.1.rc4.
Lustre clients are running 2.6.38.2, with special code created for this release, plus http://review.whamcloud.com/#change,2170 (patch 8).


Attachments: File first-try-tunefs.lustre.log.2     File first-try-tunefs.lustre.log.3     File first-try-tunefs.lustre.log.4     File messages.01.after.updating.last_rcvd     File messages.01.gz     File messages.02     File messages.03     File messages.04     File mount.debug.after.updating.last_rcvd     File second-try-tunefs.lustre.log.1     File second-try-tunefs.lustre.log.3     File second-try-tunefs.lustre.log.4    
Severity: 1
Rank (Obsolete): 5295

 Description   

Customer's system is down. This is a very high priority issue.

Summary: The customer's file system was running fine, and he wanted to add quotas. He enabled them with an invalid option and was then unable to mount clients. We have tried to fix the problem and have not been able to.

Here are the details.

==================== STEP 1 ====================

Customer unmounted the file system, and ran the following command on each Lustre target. Note the extra dash before the letters 'ug' (i.e. the command should have said 'ug2', but it said '-ug2').

  1. tunefs.lustre --param mdt.quota_type=-ug2 /dev/mapper/mapXX # Has a dash

Customer was able to mount the targets on the Lustre servers. However, he could not connect with a client.
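
For context, the failing client mount is a command of roughly the following form. This is a sketch: the client mount point /mnt/xxxxxx is an assumption, and the MGS NIDs are taken from the parameters shown below.

  mount -t lustre xx.yy.zz.241@tcp:xx.yy.zz.242@tcp:/xxxxxx /mnt/xxxxxx   # client mount against the MGS and its failover NID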

==================== STEP 2 ====================

Once we noticed that the quota_type had the extra dash, we tried fixing the situation by running the command without the dash. The command we ran was:

  1. tunefs.lustre --param mdt.quota_type=ug2 /dev/mapper/mapXX # No dash

tunefs.lustre showed that BOTH parameters were present. I.e. on an OST, we saw:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: xxxxxx-OST0000
Index: 0
Lustre FS: xxxxxx
Mount type: ldiskfs
Flags: 0x42
(OST update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: failover.node=xx.yy.zz.244@tcp mgsnode=xx.yy.zz.241@tcp mgsnode=xx.yy.zz.242@tcp mdt.quota_type=-ug2 mdt.quota_type=ug2

And with tunefs.lustre on the MDT/MGS, we saw:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: xxxxxx-MDT0000
Index: 0
Lustre FS: xxxxxx
Mount type: ldiskfs
Flags: 0x445
(MDT MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=xx.yy.zz.241@tcp failover.node=xx.yy.zz.242@tcp mdt.quota_type=-ug2 mdt.quota_type=ug2

Again, we could mount all the Lustre servers, but clients would not mount.

==================== STEP 3 ====================

Next, we thought that we could simply remove those parameters. So, for example, from the 2nd MDS (in an active-standby pair), customer ran:

  1. tunefs.lustre --erase-param --mgsnode=xx.yy.zz.242@tcp0 --failnode=xx.yy.zz.241@tcp0 --writeconf --fsname=xxxxxx /dev/mapper/map00

The command and its output can be seen in the file first-try-tunefs.lustre.log.2, but I include it here because it shows that the mdt.quota_type parameters are no longer present in the new permanent disk data.

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: xxxxxx-MDT0000
Index: 0
Lustre FS: xxxxxx
Mount type: ldiskfs
Flags: 0x405
(MDT MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=xx.yy.zz.241@tcp failover.node=xx.yy.zz.242@tcp mdt.quota_type=-ug2 mdt.quota_type=ug2

Permanent disk data:
Target: xxxxxx-MDT0000
Index: 0
Lustre FS: xxxxxx
Mount type: ldiskfs
Flags: 0x545
(MDT MGS update writeconf )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=xx.yy.zz.242@tcp failover.node=xx.yy.zz.241@tcp

Writing CONFIGS/mountdata
RC=0
ALLRC=0

We ran similar commands on each OST, e.g.

  1. tunefs.lustre --erase-param --failnode=xx.yy.zz.244@tcp0 --mgsnode=xx.yy.zz.242@tcp0 --mgsnode=xx.yy.zz.241@tcp0 --writeconf --fsname=xxxxxx /dev/mapper/map00

The output from these commands can be found in first-try-tunefs.lustre.log.3 and first-try-tunefs.lustre.log.4 (3 & 4 are the two OSSes; each OSS has 6 targets).

At this point, customer was NOT ABLE to mount the MDT/MGS.

==================== STEP 4 ====================

So we decided to run the command from the first MDS, since we normally format from there. We ran:

  1. tunefs.lustre --erase-param --mgsnode=xx.yy.zz.241@tcp0 --failnode=xx.yy.zz.242@tcp0 --writeconf --fsname=xxxxxx /dev/mapper/map00

The output of this command was:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: xxxxxx-MDT0000
Index: 0
Lustre FS: xxxxxx
Mount type: ldiskfs
Flags: 0x405
(MDT MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=xx.yy.zz.242@tcp failover.node=xx.yy.zz.241@tcp

Permanent disk data:
Target: xxxxxx-MDT0000
Index: 0
Lustre FS: xxxxxx
Mount type: ldiskfs
Flags: 0x545
(MDT MGS update writeconf )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=xx.yy.zz.241@tcp failover.node=xx.yy.zz.242@tcp

Writing CONFIGS/mountdata
RC=0

We ran similar commands on each OST, e.g.

  1. tunefs.lustre --erase-param --failnode=xx.yy.zz.244@tcp0 --mgsnode=xx.yy.zz.241@tcp0 --mgsnode=xx.yy.zz.242@tcp0 --writeconf --fsname=xxxxxx /dev/mapper/map00

The output from these commands can be found in second-try-tunefs.lustre.log.3 and second-try-tunefs.lustre.log.4.

At this point, customer was STILL NOT ABLE to mount the MDT/MGS.
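
For reference, the usual writeconf regeneration sequence is sketched below, assuming all clients and servers are unmounted first and using the device paths and mount points that appear elsewhere in this ticket:

  tunefs.lustre --writeconf /dev/mapper/map00    # on the MDS, against the MDT/MGS target
  tunefs.lustre --writeconf /dev/mapper/mapXX    # on each OSS, for every OST target
  mount -t lustre /dev/mapper/map00 /mnt/mdt     # mount the MDT/MGS first
  mount -t lustre /dev/mapper/mapXX /mnt/ostX    # then the OSTs, then the clients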

==================== STEP 5 ====================

Customer thought that the problem might be with MMP. So, customer disabled and re-enabled MMP, i.e.

  1. tune2fs -O ^mmp /dev/mapper/map00
  2. tune2fs -O mmp /dev/mapper/map00

At this point, customer is STILL NOT ABLE to mount the MDT/MGS.
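
As an aside, whether MMP is actually enabled on a target can be confirmed from the superblock feature list; a sketch, using the same device path:

  dumpe2fs -h /dev/mapper/map00 | grep -i features   # "mmp" should appear among the filesystem features when enabled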

==================== LIST OF ATTACHED FILES ====================

messages.01.gz: gzipped /var/log/messages file from first MDS
messages.02: /var/log/messages file from second MDS
messages.03: /var/log/messages file from first OSS
messages.04: /var/log/messages file from second OSS

first-try-tunefs.lustre.log.2: tunefs command run from second MDS
first-try-tunefs.lustre.log.3: tunefs command run from first OSS
first-try-tunefs.lustre.log.4: tunefs command run from standby OSS

second-try-tunefs.lustre.log.1: tunefs command run from first MDS
second-try-tunefs.lustre.log.3: tunefs command run from first OSS
second-try-tunefs.lustre.log.4: tunefs command run from standby OSS

==================== LOGs from MDS ====================

The log messages in /var/log/messages on the first MDS are:

Oct 24 20:25:09 ts-xxxxxxxx-01 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts:
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts:
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: Lustre: MGS MGS started
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: Lustre: 6030:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGCxx.yy.zz.241@tcp->MGCxx.yy.zz.241@tcp_1 netid 90000: select flavor null
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: Lustre: 6030:0:(sec.c:1474:sptlrpc_import_sec_adapt()) Skipped 1 previous similar message
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: Lustre: 6061:0:(ldlm_lib.c:877:target_handle_connect()) MGS: connection from 258ee8c7-213d-5dc4-fee0-c9990e7b9461@0@lo t0 exp (null) cur 1351110316 last 0
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: Lustre: MGCxx.yy.zz.241@tcp: Reactivating import
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: Lustre: Enabling ACL
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6065:0:(osd_handler.c:336:osd_iget()) no inode
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6065:0:(md_local_object.c:433:llo_local_objects_setup()) creating obj [last_rcvd] fid = [0x200000001:0xb:0x0] rc = -13
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6065:0:(mdt_handler.c:4577:mdt_init0()) Can't init device stack, rc -13
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6065:0:(obd_config.c:522:class_setup()) setup xxxxxx-MDT0000 failed (-13)
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6065:0:(obd_config.c:1361:class_config_llog_handler()) Err -13 on cfg command:
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: Lustre: cmd=cf003 0:xxxxxx-MDT0000 1:xxxxxx-MDT0000_UUID 2:0 3:xxxxxx-MDT0000-mdtlov 4:f
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 15c-8: MGCxx.yy.zz.241@tcp: The configuration from log 'xxxxxx-MDT0000' failed (-13). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6030:0:(obd_mount.c:1192:server_start_targets()) failed to start server xxxxxx-MDT0000: -13
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6030:0:(obd_mount.c:1723:server_fill_super()) Unable to start targets: -13
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6030:0:(obd_config.c:567:class_cleanup()) Device 3 not setup
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6030:0:(ldlm_request.c:1172:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: Lustre: MGS has stopped.
Oct 24 20:25:16 ts-xxxxxxxx-01 kernel: LustreError: 6030:0:(ldlm_request.c:1799:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Oct 24 20:25:22 ts-xxxxxxxx-01 kernel: Lustre: 6030:0:(client.c:1779:ptlrpc_expire_one_request()) @@@ Request x1416737969930383 sent from MGCxx.yy.zz.241@tcp to NID 0@lo has timed out for slow reply: [sent 1351110316] [real_sent 1351110316] [current 1351110322] [deadline 6s] [delay 0s] req@ffff88060e66ec00 x1416737969930383/t0(0) o-1->MGS@MGCxx.yy.zz.241@tcp_1:26/25 lens 192/192 e 0 to 1 dl 1351110322 ref 2 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
Oct 24 20:25:22 ts-xxxxxxxx-01 kernel: Lustre: server umount xxxxxx-MDT0000 complete
Oct 24 20:25:22 ts-xxxxxxxx-01 kernel: LustreError: 6030:0:(obd_mount.c:2164:lustre_fill_super()) Unable to mount (-13)




 Comments   
Comment by Peter Jones [ 25/Oct/12 ]

Niu is looking into this

Comment by Niu Yawei (Inactive) [ 25/Oct/12 ]

Customer was able to mount the targets on the Lustre servers. However, he could not connect with a client.

"cound not connect with a client" means client can't be mounted? Is there any error message on client?

The proper "quota_type" should be mdd.quota_type but not mdt.quota_type, and we usually should use "lctl conf_param" on MGS to configure quota.

I'll look further into the server logs to see why the MDS can't be mounted in the end.
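
(For illustration, a sketch of the conf_param form described above, run on the MGS node; the fsname "xxxxxx" is a placeholder, and the exact quota type value for this configuration is discussed later in this ticket:)

  lctl conf_param xxxxxx-MDT0000.mdd.quota_type=ug    # note mdd, not mdt
  lctl conf_param xxxxxx-OST0000.ost.quota_type=ug    # repeat for each OST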

Comment by Andreas Dilger [ 25/Oct/12 ]

Roger, it appears the current problem with the MDT is that the "last_rcvd" file is missing or inaccessible:

llo_local_objects_setup()) creating obj [last_rcvd] fid = [0x200000001:0xb:0x0] rc = -13

-13 == -EACCES. This should only ever be 0 for success, or -17 == -EEXIST (neither of which prints an error message).

You need to verify that the "last_rcvd" file in the MDT filesystem exists and does not have a strange permission - I believe it should be 0666, but cannot log into a test system right now to check. In the worst case, it should be possible to mount the MDT filesystem with ldiskfs, rename the existing file out of the way, unmount the MDT, and a new one will be created when it is again mounted with lustre.
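
A sketch of that worst-case procedure, using the MDT device and scratch mount point that appear elsewhere in this ticket:

  mount -t ldiskfs /dev/mapper/map00 /mnt/mdt    # mount the MDT backing filesystem directly
  mv /mnt/mdt/last_rcvd /mnt/mdt/last_rcvd.bad   # rename the existing file out of the way
  umount /mnt/mdt
  mount -t lustre /dev/mapper/map00 /mnt/mdt     # a fresh last_rcvd should be created on this mount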

Comment by Roger Spellman (Inactive) [ 26/Oct/12 ]

Andreas,
I had the customer try this. last_rcvd is not present. Further, mounting the file system with the -t lustre option still fails, and it DOES NOT create last_rcvd.

Here is the terminal session:

  1. mount -t ldiskfs /dev/mapper/map00 /mnt/mdt
  2. cd /mnt/mdt
  3. ls -l last_rcvd
    ls: cannot access last_rcvd: No such file or directory
  4. ls
    CATALOGS OBJECTS REM_OBJ_DIR capa_keys lost+found lquota_v2.group oi.16 seq_srv
    CONFIGS PENDING ROOT fld lov_objid lquota_v2.user seq_ctl
  5. ls lost+found/
  6. cd
  7. umount /mnt/mdt
  8. exit

Please advise on how to proceed.

Also, customer would like to know: Is it possible to back up the metadata information via ldiskfs since we were able to mount it that way? If so, is it possible to restore the metadata on a reformatted MDT? Is this a possible course of action?

Comment by Andreas Dilger [ 26/Oct/12 ]

Roger,
You could also try "touch last_rcvd" to recreate the file; I'm not sure why it is not being created, since this is done at first mount. Alternately (a bit more effort), format a new tiny test filesystem (using the same fsname, on a different test node, using loopback for safety) and copy last_rcvd over. There isn't very much useful data in this file except the client recovery data after a crash.
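
A sketch of that loopback approach on a separate test node (the image path, size, and mount points are assumptions, and exact mkfs.lustre options may differ on 2.1):

  mkfs.lustre --mgs --mdt --fsname=xxxxxx --device-size=262144 /tmp/test-mdt.img   # tiny throw-away MDT with the same fsname
  mkdir -p /mnt/test-mdt
  mount -t lustre -o loop /tmp/test-mdt.img /mnt/test-mdt    # the first Lustre mount creates last_rcvd
  umount /mnt/test-mdt
  mount -t ldiskfs -o loop /tmp/test-mdt.img /mnt/test-mdt   # remount as ldiskfs to copy the file out
  cp /mnt/test-mdt/last_rcvd /tmp/last_rcvd.new
  umount /mnt/test-mdt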

Unfortunately, in Lustre 2.1 it is NOT possible to do a file-level backup/restore of the MDT. This functionality is not restored until the 2.3 feature release. Only block device backups are supported for 2.1.

Comment by Andreas Dilger [ 26/Oct/12 ]

I'm just getting on a plane, but if the previous suggestions do not work, please set full debugging ("lctl set_param debug=-1"), then try mounting the MDS, dump the debug log, and attach it here. Hopefully someone will get a chance to look at this while I'm flying.
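
A sketch of that debug-collection sequence (the output file path is arbitrary):

  lctl set_param debug=-1                        # enable all debug flags
  mount -t lustre /dev/mapper/map00 /mnt/mdt     # retry the failing MDT mount
  lctl dk /tmp/lustre-mdt-mount-debug.log        # dump the kernel debug buffer to a file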

Comment by Roger Spellman (Inactive) [ 26/Oct/12 ]

messages file after updating last_rcvd

Comment by Roger Spellman (Inactive) [ 26/Oct/12 ]

debug file after updating last_rcvd

Comment by Roger Spellman (Inactive) [ 26/Oct/12 ]

I just attached two files from the customer. Here is his response:

Andreas' suggestions did not work. It is now returning ESTALE. I tried creating an empty last_rcvd file and that caused ESTALE when mounting. I then created a new filesystem using loopback on another machine with the same file system name, pulled last_rcvd from it and continued to see ESTALE.

Comment by Andreas Dilger [ 26/Oct/12 ]

Alex or Fan Yong, could you please take a look into this? It seems there is some strangeness happening with loading the last_rcvd file. The priority is to get the system up and working, then later to figure out how to avoid such a problem in the future.

Comment by nasf (Inactive) [ 27/Oct/12 ]

We do not know why the local file "last_rcvd" was lost, but we do know why its loss prevented the MDT from being mounted: the FID mapping for "last_rcvd" had been added to the OI file. Although the "last_rcvd" file was lost, its FID mapping still existed in the OI file, so an OI lookup by that FID found the stale inode, which not only prevented a new "last_rcvd" from being created, but also made it impossible to re-use an existing "last_rcvd".

This issue has been fixed by OI scrub in Lustre 2.3, and the related patches will be back-ported to Lustre 2.1 by Lsy; the back-porting is not finished yet. The following patch is a temporary solution to allow the customer to mount their system.

http://review.whamcloud.com/#change,4395

Comment by nasf (Inactive) [ 27/Oct/12 ]

Roger, would you please ask the customer to try the above patch and remount the MDT? It works locally for me. Thanks!

Comment by Andreas Dilger [ 27/Oct/12 ]

Fan Yong, thank you for the patch. Something similar is already included in Lustre 2.3, correct?

Roger, there are RPMs for download at http://build.whamcloud.com/job/lustre-reviews/10115/ but they are against the tip of b2_1 (2.1.3+), while the customer is running 2.1.1 according to the bug. It will still be about 5h until the regression testing on Fan Yong's patch completes, but the first 3h of testing has passed, and I'd expect any change like this to show up problems fairly quickly.

Comment by nasf (Inactive) [ 27/Oct/12 ]

Andreas, I do not know why the original "last_rcvd" file was lost. But it is known why "last_rcvd" cannot be re-created, and why re-using an existing "last_rcvd" file fails. These known issues have already been fixed by the OI scrub patches included in Lustre 2.3, so back-porting OI scrub to Lustre 2.1 will fix them completely. Lsy is working on the back-porting, but it is not finished yet.

Comment by nasf (Inactive) [ 28/Oct/12 ]

This is the patch for new test cases for that:

http://review.whamcloud.com/#change,4397

Comment by Roger Spellman (Inactive) [ 28/Oct/12 ]

I am getting errors running git.

$ git fetch http://review.whamcloud.com/p/fs/lustre-release refs/changes/95/4395/2 && git checkout FETCH_HEAD
error: The requested URL returned error: 403
fatal: Couldn't find remote ref refs/changes/95/4395/2

Do I need to be on a certain branch? I have tried both the head and 2.1.1-RC4. Can you please post an exact set of commands to get this patch?

Comment by Andreas Dilger [ 28/Oct/12 ]

The following commands check out a clean b2_1 branch, change to the 2.1.1 release branch (which is identical to 2.1.1-RC4), then applies just the patch from change 4395. The first two lines can be skipped if you already have a local clone of the lustre "master" or "b2_1" branch.

git clone -b b2_1 http://git.whamcloud.com/fs/lustre-release.git lustre-2_1
cd lustre-2_1
git checkout 2.1.1
git fetch http://review.whamcloud.com/p/fs/lustre-release refs/changes/95/4395/2 && git cherry-pick FETCH_HEAD

Comment by Roger Spellman (Inactive) [ 29/Oct/12 ]

Thanks. I tested this on my system by removing last_rcvd. I was unable to mount with the existing 2.1.1.rc4 code. I was able to mount with the new code. I will be giving this to the customer.

Comment by nasf (Inactive) [ 01/Nov/12 ]

Roger, any feedback from the customer?

Comment by Roger Spellman (Inactive) [ 02/Nov/12 ]

The customer reported that he is back up and running, but without quotas.

We would like to know if quotas are supported in this release (2.1.1.rc4). Can we use ug2? I.e.

lctl conf_param mdd.xxxxx-MDT0000.quota_type=ug2

Comment by Peter Jones [ 02/Nov/12 ]

Roger

Quotas are certainly supported in the 2.1.1 release itself and used in production at a number of sites. There may well be some added complications due to the newer kernel you are running for clients, but I will defer to an engineer to ascertain whether that is the case and to comment on the specific syntax you mention.

Peter

Comment by Niu Yawei (Inactive) [ 02/Nov/12 ]

Hi, Roger

You should use ug3. lctl conf_param xxxx-MDT0000.mdd.quota_type=ug3, and lctl conf_param xxxx-OSTxxxx.ost.quota_type=ug3.
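
(For completeness, a sketch of the whole sequence under the 2.1 quota framework: the conf_param commands are run on the MGS, then quotacheck is run once from a client; the fsname and client mount point are placeholders:)

  lctl conf_param xxxxxx-MDT0000.mdd.quota_type=ug3    # on the MGS node
  lctl conf_param xxxxxx-OST0000.ost.quota_type=ug3    # repeat for each OST index
  lfs quotacheck -ug /mnt/xxxxxx                       # once, from a mounted client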

Comment by Andreas Dilger [ 02/Nov/12 ]

Niu, I think that part is clear. What is worthwhile to investigate is why this caused the MDS to explode, and how to prevent that in the future. Perhaps having the quota code ignore unknown quota types ("-" in this case)?

Comment by Peter Jones [ 22/Dec/12 ]

Fanyong

Fixes have been landed for 2.1.4 for this issue. Are equivalent fixes needed on master?

Peter

Comment by nasf (Inactive) [ 29/Dec/12 ]

There are two test related patches to be landed:

For master:
http://review.whamcloud.com/#change,4455

For b2_1:
http://review.whamcloud.com/#change,4456

Comment by nasf (Inactive) [ 17/Jan/13 ]

When can the patches for b2_1 be landed?
