[LU-5245] sanity-quota test_1: user write success, but expect EDQUOT Created: 23/Jun/14  Updated: 20/Nov/18  Resolved: 20/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Incomplete Votes: 0
Labels: None

Attachments: Zip Archive bjm900_lustre_2.zip    
Issue Links:
Related
is related to LU-9326 sanity-quota test_2: group create suc... Open
is related to LU-11678 sanity-quota test 1 fails with 'user ... Resolved
Severity: 3
Rank (Obsolete): 14627

 Description   

This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/540fb674-fad7-11e3-b499-52540035b04c.

The sub-test test_1 failed with the following error:

Total allocated inode limit: 0, total allocated block limit: 0
Files for user (quota_usr):
File: `/mnt/lustre/d1.sanity-quota/f1.sanity-quota-0'
Size: 11534336 Blocks: 22528 IO Block: 4194304 regular file
Device: 2c54f966h/743766374d Inode: 288230930219270255 Links: 1
Access: (0644/-rw-r--r--) Uid: (60000/quota_usr) Gid: (60000/quota_usr)
Access: 2014-06-23 00:18:35.000000000 -0700
Modify: 2014-06-23 00:18:36.000000000 -0700
Change: 2014-06-23 00:18:36.000000000 -0700
sanity-quota test_1: @@@@@@ FAIL: user write success, but expect EDQUOT
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4528:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4559:error()
= /usr/lib64/lustre/tests/sanity-quota.sh:154:quota_error()
= /usr/lib64/lustre/tests/sanity-quota.sh:440:test_1()
= /usr/lib64/lustre/tests/test-framework.sh:4820:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:4855:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4708:run_test()
= /usr/lib64/lustre/tests/sanity-quota.sh:483:main()
user write success, but expect EDQUOT

Info required for matching: sanity-quota 1



 Comments   
Comment by Jodi Levi (Inactive) [ 23/Jun/14 ]

Niu,
Can you please comment on this one?
Thank you!

Comment by Niu Yawei (Inactive) [ 24/Jun/14 ]

01:18:26:== sanity-quota test 1: Block hard limit (normal use and out of quota) == 00:18:26 (1403507906)
01:18:28:Waiting for local destroys to complete
01:18:29:Creating test directory
01:18:29:CMD: shadow-47vm3,shadow-47vm4,shadow-47vm7 lctl set_param fail_val=0 fail_loc=0
01:18:29:fail_val=0
01:18:29:fail_loc=0
01:18:29:fail_val=0
01:18:29:fail_loc=0
01:18:29:fail_val=0
01:18:29:fail_loc=0
01:18:29:CMD: shadow-47vm3 /usr/sbin/lctl conf_param lustre.quota.ost=ug
01:18:29:CMD: shadow-47vm4 /usr/sbin/lctl dl
01:18:29:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
01:18:29:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
01:18:29:CMD: shadow-47vm4 /usr/sbin/lctl dl
01:18:30:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0001.quota_slave.enabled
01:18:30:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0001.quota_slave.enabled
01:18:30:CMD: shadow-47vm4 /usr/sbin/lctl dl
01:18:30:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0002.quota_slave.enabled
01:18:31:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0002.quota_slave.enabled
01:18:32:CMD: shadow-47vm4 /usr/sbin/lctl dl
01:18:32:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0003.quota_slave.enabled
01:18:32:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0003.quota_slave.enabled
01:18:32:CMD: shadow-47vm4 /usr/sbin/lctl dl
01:18:32:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0004.quota_slave.enabled
01:18:33:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0004.quota_slave.enabled
01:18:33:CMD: shadow-47vm4 /usr/sbin/lctl dl
01:18:33:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0005.quota_slave.enabled
01:18:33:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0005.quota_slave.enabled
01:18:34:CMD: shadow-47vm4 /usr/sbin/lctl dl
01:18:34:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0006.quota_slave.enabled
01:18:34:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0006.quota_slave.enabled
01:18:34:CMD: shadow-47vm4 /usr/sbin/lctl dl
01:18:34:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0007.quota_slave.enabled
01:18:35:CMD: shadow-47vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0007.quota_slave.enabled
01:18:35:User quota (block hardlimit:10 MB)
01:18:35:Write...
01:18:36:running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
01:18:36: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d1.sanity-quota/f1.sanity-quota-0] [count=5]
01:18:36:5+0 records in
01:18:36:5+0 records out
01:18:36:5242880 bytes (5.2 MB) copied, 0.117616 s, 44.6 MB/s
01:18:36:Write out of block quota ...
01:18:36:running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
01:18:36: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d1.sanity-quota/f1.sanity-quota-0] [count=5] [seek=5]
01:18:36:5+0 records in
01:18:36:5+0 records out
01:18:36:5242880 bytes (5.2 MB) copied, 0.0293611 s, 179 MB/s
01:18:36:running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
01:18:36: [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d1.sanity-quota/f1.sanity-quota-0] [count=1] [seek=10]
01:18:36:1+0 records in
01:18:36:1+0 records out
01:18:36:1048576 bytes (1.0 MB) copied, 0.0343534 s, 30.5 MB/s
01:18:36:Disk quotas for user quota_usr (uid 60000):
01:18:36: Filesystem kbytes quota limit grace files quota limit grace
01:18:36: /mnt/lustre 11264* 0 10240 - 1 0 0 -
01:18:36:lustre-MDT0000_UUID
01:18:36: 0 - 0 - 1 - 0 -
01:18:36:lustre-MDT0001_UUID
01:18:36: 0 - 0 - 0 - 0 -
01:18:36:lustre-MDT0002_UUID
01:18:36: 0 - 0 - 0 - 0 -
01:18:36:lustre-MDT0003_UUID
01:18:36: 0 - 0 - 0 - 0 -
01:18:36:lustre-OST0000_UUID
01:18:36: 0 - 0 - - - - -
01:18:36:lustre-OST0001_UUID
01:18:36: 11264 - 0 - - - - -
01:18:36:lustre-OST0002_UUID
01:18:36: 0 - 0 - - - - -
01:18:36:lustre-OST0003_UUID
01:18:36: 0 - 0 - - - - -
01:18:36:lustre-OST0004_UUID
01:18:36: 0 - 0 - - - - -
01:18:36:lustre-OST0005_UUID
01:18:36: 0 - 0 - - - - -
01:18:36:lustre-OST0006_UUID
01:18:36: 0 - 0 - - - - -
01:18:36:lustre-OST0007_UUID
01:18:36: 0 - 0 - - - - -
01:18:36:Total allocated inode limit: 0, total allocated block limit: 0

Quota wasn't enforced on the OSTs; it looks like wait_update() was broken somehow. I didn't see any message like "Updated after..." or "Update not seen after..." in the console output.
The OST logs are missing in Maloo as well.
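
For reference, here is a quick manual check of what wait_update() should be polling for, using the same parameters that appear in the console log above; this is only a sketch, with the host and target names taken from this particular run:

# Confirm each OST's quota slave reports user/group enforcement as enabled,
# i.e. the state expected after "lctl conf_param lustre.quota.ost=ug".
for ost in lustre-OST0000 lustre-OST0001 lustre-OST0002; do   # extend to all OSTs
        ssh shadow-47vm4 "lctl get_param -n osd-ldiskfs.${ost}.quota_slave.enabled"
done
# Expected value per OST once enforcement is active: "ug"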

Comment by Di Wang [ 24/Jun/14 ]

I retriggered the test; unfortunately, it cannot be reproduced. But the failure rate of this test seems pretty high:

Error: 'user write success, but expect EDQUOT'
Failure Rate: 43.00% of last 100 executions [all branches]

So could you please search the Maloo test results? Thanks.

Comment by Li Xi (Inactive) [ 06/Aug/14 ]

We saw a similar problem on a production system. A couple of users found that their files were truncated when they copied new files to Lustre. Finally, they reproduced the problem with the following script:

echo a > testfile1 && echo b >> testfile1 && cat testfile1

For these users, the output of the script is always 'a\n', and 'ls -l' shows a file size of 2. However, when we logged in to another node and ran 'ls -l', we found that the size of 'testfile1' is actually 4 and its content is 'a\nb\n', which means the data has been written correctly. Please note that root does not have this problem.
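
A minimal two-client check along those lines (the paths and the second client's hostname are placeholders; the write must run as the affected non-root user):

su - bjm900 -c 'echo a > /home/bjm900/testfile1 && echo b >> /home/bjm900/testfile1 && cat /home/bjm900/testfile1'
stat -c 'client A size: %s' /home/bjm900/testfile1                 # shows 2 on the affected client
ssh clientB "stat -c 'client B size: %s' /home/bjm900/testfile1"   # shows 4, the correct size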

We traced the operation and found the following logs.

00000008:00000001:0.0:1407228580.717510:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.717514:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.717515:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.717516:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00020000:00000001:0.0:1407228580.717516:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.717517:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000080:00000001:0.0:1407228580.742691:0:29229:0:(xattr.c:321:ll_getxattr_common()) Process leaving (rc=18446744073709551555 : -61 : ffffffffffffffc3)
00000008:00000001:0.0:1407228580.742728:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.742731:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.742732:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.742733:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00020000:00000001:0.0:1407228580.742734:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.742734:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.757374:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.757377:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.757378:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.757379:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00020000:00000001:0.0:1407228580.757379:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.757380:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)

It seems osc_quota_chkdq() returns NO_QUOTA. The +quota log is:

00000001:04000000:0.0F:1407230285.413534:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
00000008:04000000:9.0F:1407230285.435331:0:1869:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100
00000001:04000000:0.0:1407230285.435884:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
00000008:04000000:14.0F:1407230285.452187:0:1871:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100
00000001:04000000:0.0:1407230285.455988:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
00000008:04000000:3.0F:1407230285.519352:0:1875:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100

It seems that Lustre thinks this user has no quota left. However, first, the users definitely had not reached their space limits, and second, 'echo >>' should return a failure if the user's quota is exceeded.

After we disabled quota enforcement on the OSTs, the problem went away.

Comment by Li Xi (Inactive) [ 06/Aug/14 ]

Here is the trace log.

Comment by Niu Yawei (Inactive) [ 06/Aug/14 ]

It seems that Lustre thinks this user has no quota left. However, first, the users definitely had not reached their space limits, and second, 'echo >>' should return a failure if the user's quota is exceeded.

osc_quota_chkdq() returning NO_QUOTA doesn't necessarily mean the user has already run out of quota; when the used space is approaching the quota limit, this function returns NO_QUOTA to tell the client to turn async writes into sync writes. Could you provide the output of 'lfs quota -v'? Thanks.

For these users, the output of the script is always 'a\n', and 'ls -l' shows a file size of 2. However, when we logged in to another node and ran 'ls -l', we found that the size of 'testfile1' is actually 4 and its content is 'a\nb\n', which means the data has been written correctly. Please note that root does not have this problem.

I guess that's not a quota problem. We should collect logs of the 'ls' (on both clients and the OST, ideally with D_TRACE enabled and debug markers recording where the 'ls' starts and ends) to see why the file size is inconsistent on different clients.
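
Roughly, the collection could look like the following on each node involved; a sketch only, the marker text and output path are placeholders:

lctl set_param debug=+trace                      # add D_TRACE to the debug mask
lctl clear                                       # drop the old debug buffer
lctl mark "ls-size-check start"
ls -l /home/bjm900/testfile1                     # run on the client that reports the wrong size
lctl mark "ls-size-check end"
lctl dk > /tmp/ls-size-check.$(hostname).log     # dump the kernel debug log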

Comment by Li Xi (Inactive) [ 06/Aug/14 ]

Here it goes. Please note that we are using Lustre 2.4.x on the servers and Lustre 2.5.x on the client.

[12:26:53 root@r7:~] # lfs quota -v -u bjm900 /home
Disk quotas for user bjm900 (uid 5800):
Filesystem kbytes quota limit grace files quota limit grace
/home 1599228 104857600 104857600 - 59224 1000000 1000000 -
homsys-MDT0000_UUID
13428 - 0 - 59224 - 64206 -
homsys-OST0000_UUID
21352 - 22376 - - - - -
homsys-OST0001_UUID
16488 - 17512 - - - - -
homsys-OST0002_UUID
12920 - 13576 - - - - -
homsys-OST0003_UUID
23704 - 24220 - - - - -
homsys-OST0004_UUID
17864 - 18888 - - - - -
homsys-OST0005_UUID
27436 - 28160 - - - - -
homsys-OST0006_UUID
12508 - 13532 - - - - -
homsys-OST0007_UUID
20476 - 21496 - - - - -
homsys-OST0008_UUID
11136 - 12156 - - - - -
homsys-OST0009_UUID
21872 - 22896 - - - - -
homsys-OST000a_UUID
13408 - 14432 - - - - -
homsys-OST000b_UUID
15312 - 16336 - - - - -
homsys-OST000c_UUID
39516 - 40536 - - - - -
homsys-OST000d_UUID
21108 - 22132 - - - - -
homsys-OST000e_UUID
17880 - 18904 - - - - -
homsys-OST000f_UUID
24440 - 25464 - - - - -
homsys-OST0010_UUID
18652 - 19676 - - - - -
homsys-OST0011_UUID
36456 - 37476 - - - - -
homsys-OST0012_UUID
17332 - 17864 - - - - -
homsys-OST0013_UUID
28272 - 29296 - - - - -
homsys-OST0014_UUID
32920 - 33944 - - - - -
homsys-OST0015_UUID
21708 - 22728 - - - - -
homsys-OST0016_UUID
21928 - 22952 - - - - -
homsys-OST0017_UUID
15104 - 15872 - - - - -
homsys-OST0018_UUID
18360 - 19384 - - - - -
homsys-OST0019_UUID
22288 - 23304 - - - - -
homsys-OST001a_UUID
11524 - 12548 - - - - -
homsys-OST001b_UUID
23016 - 24040 - - - - -
homsys-OST001c_UUID
14044 - 15068 - - - - -
homsys-OST001d_UUID
16692 - 17716 - - - - -
homsys-OST001e_UUID
39124 - 40148 - - - - -
homsys-OST001f_UUID
13484 - 14012 - - - - -
homsys-OST0020_UUID
11500 - 12524 - - - - -
homsys-OST0021_UUID
12004 - 13028 - - - - -
homsys-OST0022_UUID
26332 - 27356 - - - - -
homsys-OST0023_UUID
13896 - 14920 - - - - -
homsys-OST0024_UUID
17100 - 18120 - - - - -
homsys-OST0025_UUID
27388 - 28412 - - - - -
homsys-OST0026_UUID
10800 - 11824 - - - - -
homsys-OST0027_UUID
25572 - 26596 - - - - -
homsys-OST0028_UUID
23144 - 24064 - - - - -
homsys-OST0029_UUID
13700 - 14552 - - - - -
homsys-OST002a_UUID
21748 - 22772 - - - - -
homsys-OST002b_UUID
21800 - 22824 - - - - -
homsys-OST002c_UUID
16600 - 17624 - - - - -
homsys-OST002d_UUID
12224 - 13248 - - - - -
homsys-OST002e_UUID
12796 - 13820 - - - - -
homsys-OST002f_UUID
10436 - 11460 - - - - -
homsys-OST0030_UUID
24940 - 25960 - - - - -
homsys-OST0031_UUID
13820 - 14844 - - - - -
homsys-OST0032_UUID
10276 - 11296 - - - - -
homsys-OST0033_UUID
14324 - 14856 - - - - -
homsys-OST0034_UUID
11168 - 11776 - - - - -
homsys-OST0035_UUID
17876 - 18900 - - - - -
homsys-OST0036_UUID
14740 - 15764 - - - - -
homsys-OST0037_UUID
24764 - 25788 - - - - -
homsys-OST0038_UUID
17848 - 18868 - - - - -
homsys-OST0039_UUID
15164 - 15720 - - - - -
homsys-OST003a_UUID
18736 - 19760 - - - - -
homsys-OST003b_UUID
14476 - 15500 - - - - -
homsys-OST003c_UUID
4024 - 5048 - - - - -
homsys-OST003d_UUID
13588 - 14612 - - - - -
homsys-OST003e_UUID
13576 - 14600 - - - - -
homsys-OST003f_UUID
26372 - 27396 - - - - -
homsys-OST0040_UUID
50380 - 51404 - - - - -
homsys-OST0041_UUID
24796 - 25816 - - - - -
homsys-OST0042_UUID
24176 - 25196 - - - - -
homsys-OST0043_UUID
12776 - 13800 - - - - -
homsys-OST0044_UUID
13444 - 14468 - - - - -
homsys-OST0045_UUID
23492 - 24476 - - - - -
homsys-OST0046_UUID
11412 - 12436 - - - - -
homsys-OST0047_UUID
14552 - 15576 - - - - -
homsys-OST0048_UUID
19140 - 19664 - - - - -
homsys-OST0049_UUID
12384 - 13408 - - - - -
homsys-OST004a_UUID
29392 - 30416 - - - - -
homsys-OST004b_UUID
40412 - 41436 - - - - -
homsys-OST004c_UUID
52872 - 53896 - - - - -
homsys-OST004d_UUID
29372 - 30396 - - - - -
homsys-OST004e_UUID
13144 - 14164 - - - - -
homsys-OST004f_UUID
13000 - 14024 - - - - -
Total allocated inode limit: 64206, total allocated block limit: 1663052

Comment by Niu Yawei (Inactive) [ 06/Aug/14 ]

It looks like the used space is far below the quota limit, so we should first figure out whether the write really reached the OST:

  • If the write on the OST also failed with EDQUOT, this could be LU-4505; collecting a log on the OST (for the file write, with D_QUOTA enabled) could be helpful, see the sketch after this list.
  • If the write on the OST succeeded, we should collect logs to see why this client can't read it (as I mentioned in the previous comment), and try the same test on another clean client to see whether it can be reproduced.
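
For the first case, a minimal sketch of the OSS-side collection (assumed commands, same lctl mark/dk approach as in the earlier comment):

lctl set_param debug=+quota                  # add D_QUOTA to the OSS debug mask
lctl clear
# ... reproduce the failing write from the client as the quota-limited user ...
lctl dk > /tmp/oss-edquot.$(hostname).log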

Anyway, I don't think this is related to the current ticket; could you create a new ticket for this problem? Thanks.

Comment by Kit Westneat [ 07/Jul/15 ]

This test result looks somewhat similar:
https://testing.hpdd.intel.com/test_sets/08918008-2437-11e5-b6b4-5254006e85c2

Comment by Emoly Liu [ 10/Jul/15 ]

I hit a similar issue: https://testing.hpdd.intel.com/test_sets/f878df9a-2661-11e5-8b33-5254006e85c2

Comment by Niu Yawei (Inactive) [ 10/Jul/15 ]

All the failures were because the quota slave didn't connect to the master, so quota wasn't enforced on the slave.

00000100:00100000:0.0:1436439856.246154:0:23692:0:(nrs_fifo.c:179:nrs_fifo_req_get()) NRS start fifo request from 12345-10.1.4.48@tcp, seq: 852
00000100:00100000:0.0:1436439856.246163:0:23692:0:(service.c:2076:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt00_004:0+-99:3137:x1505240840316480:12345-10.1.4.48@tcp:38
00010000:02000400:0.0:1436439856.246191:0:23692:0:(ldlm_lib.c:1026:target_handle_connect()) lustre-MDT0000: Received LWP connection from 10.1.4.48@tcp, removing former export from 10.1.4.52@tcp
00000020:00080000:0.0:1436439856.246195:0:23692:0:(genops.c:1382:class_fail_export()) disconnecting export ffff8800798e9400/lustre-MDT0000-lwp-OST0003_UUID
00000020:00000080:0.0:1436439856.246204:0:23692:0:(genops.c:1215:class_disconnect()) disconnect: cookie 0x515596b0134eb7f7
00000100:00080000:0.0:1436439856.246211:0:23692:0:(import.c:1601:ptlrpc_cleanup_imp()) ffff880063495000 PA#}: changing import state from FULL to CLOSED
00000020:00080000:0.0:1436439856.246261:0:23692:0:(genops.c:1399:class_fail_export()) disconnected export ffff8800798e9400/lustre-MDT0000-lwp-OST0003_UUID
00000020:00000080:0.0:1436439856.246263:0:23692:0:(genops.c:814:class_export_put()) final put ffff8800798e9400/lustre-MDT0000-lwp-OST0003_UUID
00010000:00080000:0.0:1436439856.246290:0:23692:0:(ldlm_lib.c:1092:target_handle_connect()) lustre-MDT0000: connection from lustre-MDT0000-lwp-OST0003_UUID@10.1.4.48@tcp t0 exp (null) cur 1436439856 last 0

I see from the log that the OST connected to the master with a different IP address during testing; do we fail over OSTs during testing?
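
If this happens again, one way to confirm the slave's state directly (assuming the quota_slave.info parameter is available on the build under test):

# On the OSS, check whether each OST's quota slave is connected to the quota master.
lctl get_param osd-*.*.quota_slave.info
# A "conn to master: not setup" line would match these failures, since enforcement
# only starts once the slave has connected to the master and retrieved its limits.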

Comment by James Nunez (Inactive) [ 01/Sep/15 ]

I think we hit this issue again on PPC builds. Logs at:
2015-09-01 03:24:28 - https://testing.hpdd.intel.com/test_sets/b1d82b30-508d-11e5-95a9-5254006e85c2

Comment by James Nunez (Inactive) [ 19/Nov/15 ]

Another instance on master:
2015-11-15 14:01:26 - https://testing.hpdd.intel.com/test_sets/9e1632c2-8bbb-11e5-9933-5254006e85c2
2015-11-18 16:20:31 - https://testing.hpdd.intel.com/test_sets/cf90e830-8e25-11e5-8da8-5254006e85c2
2015-11-18 21:25:31 - https://testing.hpdd.intel.com/test_sets/1ce03240-8e52-11e5-8da8-5254006e85c2

Comment by nasf (Inactive) [ 26/May/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/fedc1a0c-4219-11e7-b558-5254006e85c2

Comment by Bob Glossman (Inactive) [ 05/Jul/17 ]

another on master:
https://testing.hpdd.intel.com/test_sets/a4bae786-61c9-11e7-9230-5254006e85c2

Comment by Bob Glossman (Inactive) [ 12/Jun/18 ]

another on master:
https://testing.hpdd.intel.com/test_sets/772bbf04-6dec-11e8-a522-52540065bddc

Comment by Andreas Dilger [ 20/Nov/18 ]

Closing this old issue; use LU-11678 for new issues with the same symptom.
