Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.6.0, Lustre 2.7.0
- Servers are running Lustre 2.4.2, and clients are running b_ieel2_0.
- 3
- 15220
Description
A couple of users found that their files were truncated when they copied new files to Lustre. They reproduced the problem with the following script:
echo a > testfile1 && echo b >> testfile1 && cat testfile1
For these users, the output of the script is always 'a\n', and 'ls -l' shows a file size of 2. However, on another node, 'ls -l' showed that the size of 'testfile1' is actually 4 and that its content is 'a\nb\n', which means the data was written to disk correctly. Note that root and other users do not have this problem.
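The reproduction can also be expressed as a small Python check that compares the content read back on the same node with the size reported by stat. This is a minimal sketch: the function name `write_append_readback` is hypothetical, and the temporary directory stands in for a directory on the Lustre mount.

```python
import os
import tempfile


def write_append_readback(path):
    """Mimic `echo a > f && echo b >> f && cat f`, then stat the file."""
    # First write creates or truncates the file (shell `>`).
    with open(path, "wb") as f:
        f.write(b"a\n")
    # Second write appends to it (shell `>>`).
    with open(path, "ab") as f:
        f.write(b"b\n")
    # Read back on the same node, as `cat` would.
    with open(path, "rb") as f:
        content = f.read()
    return content, os.stat(path).st_size


if __name__ == "__main__":
    # Assumption: replace the temporary directory with a path on the
    # Lustre mount to exercise the affected client.
    with tempfile.TemporaryDirectory() as d:
        content, size = write_append_readback(os.path.join(d, "testfile1"))
        print(repr(content), size)
```

On a healthy filesystem the content is 'a\nb\n' and the size is 4. On an affected client, the report above suggests the read-back would show only 'a\n' with a size of 2.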
We traced the operation and found the following logs.
00000008:00000001:0.0:1407228580.717510:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.717514:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.717515:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.717516:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00020000:00000001:0.0:1407228580.717516:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.717517:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000080:00000001:0.0:1407228580.742691:0:29229:0:(xattr.c:321:ll_getxattr_common()) Process leaving (rc=18446744073709551555 : -61 : ffffffffffffffc3)
00000008:00000001:0.0:1407228580.742728:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.742731:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.742732:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.742733:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00020000:00000001:0.0:1407228580.742734:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.742734:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.757374:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.757377:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000008:00000001:0.0:1407228580.757378:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.757379:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00020000:00000001:0.0:1407228580.757379:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
00000020:00000001:0.0:1407228580.757380:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
It seems osc_quota_chkdq() returns NO_QUOTA. The log with +quota debugging enabled is:
00000001:04000000:0.0F:1407230285.413534:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
00000008:04000000:9.0F:1407230285.435331:0:1869:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100
00000001:04000000:0.0:1407230285.435884:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
00000008:04000000:14.0F:1407230285.452187:0:1871:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100
00000001:04000000:0.0:1407230285.455988:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
00000008:04000000:3.0F:1407230285.519352:0:1875:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100
-122 is -EDQUOT. However, the users had definitely not reached their space limits, and the second 'echo >>' should have returned a failure if the user's quota had been exceeded.
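Each rc field in the trace lines prints the same value three ways: as an unsigned 64-bit integer, as a signed integer, and in hex. As a minimal sketch, the conversion can be checked in Python. `to_signed64` and `describe_rc` are hypothetical helper names, and the errno lookup assumes a Linux host, where 122 is EDQUOT and 61 is ENODATA.

```python
import errno


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def describe_rc(unsigned_rc):
    """Return (signed rc, errno name or None) for a traced return code."""
    rc = to_signed64(unsigned_rc)
    # errno numbering is platform-specific; this lookup assumes Linux.
    name = errno.errorcode.get(-rc) if rc < 0 else None
    return rc, name


if __name__ == "__main__":
    # Values copied from the trace lines above.
    for raw in (18446744073709551494, 18446744073709551555):
        rc, name = describe_rc(raw)
        print(raw, hex(raw), rc, name)
```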
The following is the output of 'lfs quota -v':
[12:26:53 root@r7:~] # lfs quota -v -u bjm900 /home
Disk quotas for user bjm900 (uid 5800):
Filesystem kbytes quota limit grace files quota limit grace
/home 1599228 104857600 104857600 - 59224 1000000 1000000 -
homsys-MDT0000_UUID
13428 - 0 - 59224 - 64206 -
homsys-OST0000_UUID
21352 - 22376 - - - - -
homsys-OST0001_UUID
16488 - 17512 - - - - -
homsys-OST0002_UUID
12920 - 13576 - - - - -
homsys-OST0003_UUID
23704 - 24220 - - - - -
homsys-OST0004_UUID
17864 - 18888 - - - - -
homsys-OST0005_UUID
27436 - 28160 - - - - -
homsys-OST0006_UUID
12508 - 13532 - - - - -
homsys-OST0007_UUID
20476 - 21496 - - - - -
homsys-OST0008_UUID
11136 - 12156 - - - - -
homsys-OST0009_UUID
21872 - 22896 - - - - -
homsys-OST000a_UUID
13408 - 14432 - - - - -
homsys-OST000b_UUID
15312 - 16336 - - - - -
homsys-OST000c_UUID
39516 - 40536 - - - - -
homsys-OST000d_UUID
21108 - 22132 - - - - -
homsys-OST000e_UUID
17880 - 18904 - - - - -
homsys-OST000f_UUID
24440 - 25464 - - - - -
homsys-OST0010_UUID
18652 - 19676 - - - - -
homsys-OST0011_UUID
36456 - 37476 - - - - -
homsys-OST0012_UUID
17332 - 17864 - - - - -
homsys-OST0013_UUID
28272 - 29296 - - - - -
homsys-OST0014_UUID
32920 - 33944 - - - - -
homsys-OST0015_UUID
21708 - 22728 - - - - -
homsys-OST0016_UUID
21928 - 22952 - - - - -
homsys-OST0017_UUID
15104 - 15872 - - - - -
homsys-OST0018_UUID
18360 - 19384 - - - - -
homsys-OST0019_UUID
22288 - 23304 - - - - -
homsys-OST001a_UUID
11524 - 12548 - - - - -
homsys-OST001b_UUID
23016 - 24040 - - - - -
homsys-OST001c_UUID
14044 - 15068 - - - - -
homsys-OST001d_UUID
16692 - 17716 - - - - -
homsys-OST001e_UUID
39124 - 40148 - - - - -
homsys-OST001f_UUID
13484 - 14012 - - - - -
homsys-OST0020_UUID
11500 - 12524 - - - - -
homsys-OST0021_UUID
12004 - 13028 - - - - -
homsys-OST0022_UUID
26332 - 27356 - - - - -
homsys-OST0023_UUID
13896 - 14920 - - - - -
homsys-OST0024_UUID
17100 - 18120 - - - - -
homsys-OST0025_UUID
27388 - 28412 - - - - -
homsys-OST0026_UUID
10800 - 11824 - - - - -
homsys-OST0027_UUID
25572 - 26596 - - - - -
homsys-OST0028_UUID
23144 - 24064 - - - - -
homsys-OST0029_UUID
13700 - 14552 - - - - -
homsys-OST002a_UUID
21748 - 22772 - - - - -
homsys-OST002b_UUID
21800 - 22824 - - - - -
homsys-OST002c_UUID
16600 - 17624 - - - - -
homsys-OST002d_UUID
12224 - 13248 - - - - -
homsys-OST002e_UUID
12796 - 13820 - - - - -
homsys-OST002f_UUID
10436 - 11460 - - - - -
homsys-OST0030_UUID
24940 - 25960 - - - - -
homsys-OST0031_UUID
13820 - 14844 - - - - -
homsys-OST0032_UUID
10276 - 11296 - - - - -
homsys-OST0033_UUID
14324 - 14856 - - - - -
homsys-OST0034_UUID
11168 - 11776 - - - - -
homsys-OST0035_UUID
17876 - 18900 - - - - -
homsys-OST0036_UUID
14740 - 15764 - - - - -
homsys-OST0037_UUID
24764 - 25788 - - - - -
homsys-OST0038_UUID
17848 - 18868 - - - - -
homsys-OST0039_UUID
15164 - 15720 - - - - -
homsys-OST003a_UUID
18736 - 19760 - - - - -
homsys-OST003b_UUID
14476 - 15500 - - - - -
homsys-OST003c_UUID
4024 - 5048 - - - - -
homsys-OST003d_UUID
13588 - 14612 - - - - -
homsys-OST003e_UUID
13576 - 14600 - - - - -
homsys-OST003f_UUID
26372 - 27396 - - - - -
homsys-OST0040_UUID
50380 - 51404 - - - - -
homsys-OST0041_UUID
24796 - 25816 - - - - -
homsys-OST0042_UUID
24176 - 25196 - - - - -
homsys-OST0043_UUID
12776 - 13800 - - - - -
homsys-OST0044_UUID
13444 - 14468 - - - - -
homsys-OST0045_UUID
23492 - 24476 - - - - -
homsys-OST0046_UUID
11412 - 12436 - - - - -
homsys-OST0047_UUID
14552 - 15576 - - - - -
homsys-OST0048_UUID
19140 - 19664 - - - - -
homsys-OST0049_UUID
12384 - 13408 - - - - -
homsys-OST004a_UUID
29392 - 30416 - - - - -
homsys-OST004b_UUID
40412 - 41436 - - - - -
homsys-OST004c_UUID
52872 - 53896 - - - - -
homsys-OST004d_UUID
29372 - 30396 - - - - -
homsys-OST004e_UUID
13144 - 14164 - - - - -
homsys-OST004f_UUID
13000 - 14024 - - - - -
Total allocated inode limit: 64206, total allocated block limit: 1663052
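To put these numbers in context, the sketch below computes the fraction of the overall block limit in use and, for each OST, the gap between its usage and the limit reported on its row. The figures are copied from the summary row and the first four OST rows of the listing. The helper names are hypothetical, and the remaining OSTs are omitted for brevity.

```python
# kbytes used and block limit for uid 5800 on /home, from the summary row.
USED_KB = 1599228
LIMIT_KB = 104857600

# (kbytes used, limit) per OST, copied from the first four OST rows.
OST_ROWS = {
    "homsys-OST0000_UUID": (21352, 22376),
    "homsys-OST0001_UUID": (16488, 17512),
    "homsys-OST0002_UUID": (12920, 13576),
    "homsys-OST0003_UUID": (23704, 24220),
}


def usage_fraction(used_kb, limit_kb):
    """Fraction of the overall block limit currently in use."""
    return used_kb / limit_kb


def headroom_kb(rows):
    """Per-OST difference between the reported limit and current usage."""
    return {name: limit - used for name, (used, limit) in rows.items()}


if __name__ == "__main__":
    print(f"{usage_fraction(USED_KB, LIMIT_KB):.2%} of the block limit used")
    for name, gap in headroom_kb(OST_ROWS).items():
        print(name, gap, "kB")
```

With these inputs the user is at roughly 1.5% of the overall block limit, while each listed OST has 1024 kB or less between its usage and its per-OST limit.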
After we disabled quota enforcement on the OSTs, the problem disappeared immediately. It reappeared after we re-enabled quota enforcement on the OSTs.
Issue Links
- is related to LU-5552 Incorrect file size with -EDQUOT (Resolved)