LU-2548: After upgrade from 1.8.8 to 2.4 hit qmt_entry.c:281:qmt_glb_write()) $$$ failed to update global index, rc:-5

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Environment:
      before upgrade: client and server are running 1.8.8
      after upgrade: client and server are running lustre-master build #1141
    • Severity: 3
    • 5972

    Description

      After a clean upgrade of the server and client from 1.8.8 to 2.4, I enabled quota with the following steps (see the sketch below):
      1. before setting up Lustre: tunefs.lustre --quota mdsdev/ostdev
      2. after setting up Lustre: lctl conf_param lustre.quota.mdt=ug
      lctl conf_param lustre.quota.ost=ug
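
      For reference, a minimal sketch of that enablement sequence with explicit commands; the device paths are placeholders and the "lustre" fsname is taken from this setup:

      # enable the quota feature on each target before mounting (run once per MDT/OST device)
      tunefs.lustre --quota /dev/mdsdev     # MDT device path is a placeholder
      tunefs.lustre --quota /dev/ostdev     # OST device path is a placeholder

      # after the filesystem is set up, turn on user/group quota enforcement (run on the MGS node)
      lctl conf_param lustre.quota.mdt=ug
      lctl conf_param lustre.quota.ost=ug

      # (assumption) check quota slave state on each target; exact parameter path may vary by version
      lctl get_param osd-*.*.quota_slave.info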

      Then running iozone got this error:

      upgrade-downgrade : @@@@@@ FAIL: iozone did not fail with EDQUOT
      
      Found these errors in the MDS dmesg:
      

      Lustre: DEBUG MARKER: ===== Pass ==================================================================
      Lustre: DEBUG MARKER: ===== Check Lustre quotas usage/limits ======================================
      Lustre: DEBUG MARKER: ===== Verify the data =======================================================
      Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
      LDISKFS-fs warning (device sdb1): ldiskfs_block_to_path: block 1852143205 > max in inode 24537
      LustreError: 7867:0:(qmt_entry.c:281:qmt_glb_write()) $$$ failed to update global index, rc:-5 qmt:lustre-QMT0000 pool:0-md id:60001 enforced:1 hard:5120 soft:0 granted:1024 time:0 qunit:1024 edquot:0 may_rel:0 revoke:4297684387
      LustreError: 10848:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -5, flags:0x1 qsd:lustre-MDT0000 qtype:usr id:60001 enforced:1 granted:3 pending:0 waiting:2 req:1 usage:3 qunit:0 qtune:0 edquot:0
      Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: iozone did not fail with EDQUOT
      LDISKFS-fs warning (device sdb1): ldiskfs_block_to_path:
      LDISKFS-fs warning (device sdb1): ldiskfs_block_to_path: block 1852143205 > max in inode 24537
      LustreError: 10877:0:(qmt_entry.c:281:qmt_glb_write()) $$$ failed to update global index, rc:-5 qmt:lustre-QMT0000 pool:0-md id:60001 enforced:1 hard:5120 soft:0 granted:1026 time:0 qunit:1024 edquot:0 may_rel:0 revoke:4297684387
      LustreError: 7577:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -5, flags:0x2 qsd:lustre-MDT0000 qtype:usr id:60001 enforced:1 granted:3 pending:0 waiting:0 req:1 usage:2 qunit:1024 qtune:512 edquot:0
      LDISKFS-fs warning (device sdb1): ldiskfs_block_to_path: block 1852143205 > max in inode 24537
      LDISKFS-fs warning (device sdb1): ldiskfs_block_to_path: block 1852143205 > max in inode 24537
      block 1768711539 > max in inode 24538
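
      For context, the failing check expects writes beyond the hard block limit to hit EDQUOT ("Disk quota exceeded"). A minimal manual check along those lines might look like the sketch below; uid 60001 and the 5120 KB hard limit are taken from the log above, while the mount point and file names are placeholders:

      # set a 5120 KB hard block limit for the test uid seen in the logs
      lfs setquota -u 60001 -B 5120 /mnt/lustre

      # give that uid somewhere to write, then write well past the limit; dd should fail with EDQUOT
      mkdir -p /mnt/lustre/qtest && chown 60001 /mnt/lustre/qtest
      sudo -u "#60001" dd if=/dev/zero of=/mnt/lustre/qtest/file bs=1M count=10

      # show the current usage and limits for that uid
      lfs quota -u 60001 /mnt/lustre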

      
      

          Activity


            niu Niu Yawei (Inactive) added a comment -

            Well, I realize that the original IAM index truncation is not quite right; the IAM container wasn't reinitialized after truncation. I've updated patch 5292, and the new patch works for me. Sarah, could you verify whether it fixes your problem? Thanks.
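
            A minimal sketch of pulling that review for testing, assuming the standard Gerrit workflow for fs/lustre-release; the patchset suffix "/1" is a placeholder, so use the latest patchset shown on http://review.whamcloud.com/5292:

            git clone git://git.whamcloud.com/fs/lustre-release.git
            cd lustre-release
            # fetch change 5292 and apply it on top of the branch under test
            git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/92/5292/1
            git cherry-pick FETCH_HEAD
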
            sarah Sarah Liu added a comment -

            Sure, will get back to you when I have the result


            niu Niu Yawei (Inactive) added a comment -

            My test shows that truncating the global index before the migration leads to the IAM error. To avoid blocking the other 1.8 upgrade tests, I've posted a temporary fix (skip the index truncation during migration): http://review.whamcloud.com/5292

            Sarah, could you check whether the above patch works for you too? Thanks.

            niu Niu Yawei (Inactive) added a comment -

            I can reproduce the original problem in my local environment now; it seems something is wrong in IAM when upgrading from 1.8 to 2.4 (2.1 -> 2.4 is fine). I will look into it more closely.

            niu Niu Yawei (Inactive) added a comment -

            Don't apply the migration to the global index copy: http://review.whamcloud.com/5259

            Actually, I'm still not quite sure why qmt_glb_write() failed, but at the very least we shouldn't run the migration on the global index copy.

            niu Niu Yawei (Inactive) added a comment -

            I see; those messages should come from the global index copies of the quota slave on the MDT, and the migration should not be applied to those global index copies. The "qmt_glb_write()) $$$ failed to update global index, rc:-5" failure is probably caused by the migration racing with a normal global index copy update. I'll post a patch to fix this.

            niu Niu Yawei (Inactive) added a comment -

            I found something really weird in the dmesg (1.8 upgrade to 2.4):

            Lustre: lustre-MDT0000: Migrate inode quota from old admin quota file(admin_quotafile_v2.usr) to new IAM quota index([0x200000006:0x10000:0x0]).
            Lustre: lustre-MDT0000: Migrate inode quota from old admin quota file(admin_quotafile_v2.grp) to new IAM quota index([0x200000006:0x1010000:0x0]).
            Lustre: 31664:0:(mdt_handler.c:5261:mdt_process_config()) For interoperability, skip this mdt.group_upcall. It is obsolete.
            Lustre: 31664:0:(mdt_handler.c:5261:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
            Lustre: lustre-MDT0000: Temporarily refusing client connection from 0@lo
            LustreError: 11-0: an error occurred while communicating with 0@lo. The mds_connect operation failed with -11
            Lustre: lustre-MDT0000: Migrate inode quota from old admin quota file(admin_quotafile_v2.usr) to new IAM quota index([0x200000003:0x8:0x0]).
            Lustre: Skipped 2 previous similar messages
            

            It says the MDT is trying to migrate inode user quota into FID [0x200000003:0x8:0x0], which isn't a quota global index FID. I can't see from the code how this could happen, and I can't reproduce it locally either.

            Sarah, could you show me how you reproduced it? If it's reproducible, could you capture the log with D_QUOTA & D_TRACE enabled for the MDT startup procedure only (i.e., start the MDT on the old 1.8 device)? The startup log was truncated in your attached logs. Thanks in advance.
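
            A minimal sketch of capturing such a startup log on the MDS, assuming the usual lctl debug workflow; the device path, mount point, and output file are placeholders:

            # enable the quota and trace debug masks before starting the MDT
            lctl set_param debug=+quota        # D_QUOTA
            lctl set_param debug=+trace        # D_TRACE
            lctl clear                         # drop any stale entries from the debug buffer

            # start the MDT on the old 1.8-formatted device (placeholder path/mount point)
            mount -t lustre /dev/mdsdev /mnt/mds

            # once the MDT is up, dump the kernel debug buffer to a file and attach it to the ticket
            lctl dk /tmp/mdt-startup-debug.log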

            sarah Sarah Liu added a comment -

            Upgrade from 2.1.4 to 2.4 hit LU-2587.

            sarah Sarah Liu added a comment -

            MDS dmesg and debug logs of 1.8->2.4

            sarah Sarah Liu added a comment -

            Niu, I tried upgrading 1.8->2.4 again and it can be reproduced.

            sarah Sarah Liu added a comment -

            Niu, this time I upgraded to the latest tag, 2.3.58; that's a different build from the first time.

            I will keep you updated once I finish the 2.1 to 2.4 upgrade and try 1.8 to 2.4 again to see if it happens every time.


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: sarah Sarah Liu
              Votes: 0
              Watchers: 6
