Lustre / LU-5463

Short file size because of -EDQUOT


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.6.0, Lustre 2.7.0
    • Environment: servers are running Lustre-2.4.2, and clients are running b_ieel2_0
    • Severity: 3
    • Rank (Obsolete): 15220

    Description

      A couple of users found that their files were truncated after they copied new files to Lustre. They reproduced the problem with the following script:

      echo a > testfile1 && echo b >> testfile1 && cat testfile1

      For these users the script always prints 'a\n', and 'ls -l' reports a file size of 2. On another node, however, 'ls -l' showed that 'testfile1' was actually 4 bytes and its content was 'a\nb\n', so the data had been written to disk correctly. Note that root and some other users did not hit this problem.
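
      For reference, the same reproduction can be written as a small self-contained C program (just a sketch of the shell one-liner above; 'testfile1' is assumed to be created in the current directory on the affected Lustre mount):

      /*
       * Sketch of the reproducer above: write "a\n", append "b\n", then report
       * the size seen through fstat() and the number of bytes readable.  On
       * the affected clients this showed a 2-byte file, while other nodes saw
       * the full 4 bytes.
       */
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/stat.h>
      #include <unistd.h>

      int main(void)
      {
              const char *path = "testfile1";
              struct stat st;
              char buf[16];
              ssize_t n;
              int fd;

              fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
              if (fd < 0 || write(fd, "a\n", 2) != 2)
                      return 1;
              close(fd);

              fd = open(path, O_WRONLY | O_APPEND);
              if (fd < 0 || write(fd, "b\n", 2) != 2)
                      return 1;
              close(fd);

              fd = open(path, O_RDONLY);
              if (fd < 0 || fstat(fd, &st) < 0)
                      return 1;
              n = read(fd, buf, sizeof(buf));
              printf("st_size=%lld read=%zd\n", (long long)st.st_size, n);
              close(fd);
              return 0;
      }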

      We traced the operation and found the following debug logs:

      00000008:00000001:0.0:1407228580.717510:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000008:00000001:0.0:1407228580.717514:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000008:00000001:0.0:1407228580.717515:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000020:00000001:0.0:1407228580.717516:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00020000:00000001:0.0:1407228580.717516:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000020:00000001:0.0:1407228580.717517:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000080:00000001:0.0:1407228580.742691:0:29229:0:(xattr.c:321:ll_getxattr_common()) Process leaving (rc=18446744073709551555 : -61 : ffffffffffffffc3)
      00000008:00000001:0.0:1407228580.742728:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000008:00000001:0.0:1407228580.742731:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000008:00000001:0.0:1407228580.742732:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000020:00000001:0.0:1407228580.742733:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00020000:00000001:0.0:1407228580.742734:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000020:00000001:0.0:1407228580.742734:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000008:00000001:0.0:1407228580.757374:0:29229:0:(osc_cache.c:2274:osc_queue_async_io()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000008:00000001:0.0:1407228580.757377:0:29229:0:(osc_page.c:224:osc_page_cache_add()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000008:00000001:0.0:1407228580.757378:0:29229:0:(osc_io.c:313:osc_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000020:00000001:0.0:1407228580.757379:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00020000:00000001:0.0:1407228580.757379:0:29229:0:(lov_io.c:665:lov_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
      00000020:00000001:0.0:1407228580.757380:0:29229:0:(cl_io.c:801:cl_io_commit_async()) Process leaving (rc=18446744073709551494 : -122 : ffffffffffffff86)
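
      The large unsigned rc values in these traces are simply the two's-complement form of a negative errno; a throwaway snippet (not Lustre code) confirms the mapping:

      #include <stdio.h>
      #include <string.h>

      int main(void)
      {
              /* rc exactly as printed in the debug log (unsigned 64-bit) */
              unsigned long long rc = 18446744073709551494ULL;
              long long err = (long long)rc;

              /* prints: rc = -122 (Disk quota exceeded) on Linux */
              printf("rc = %lld (%s)\n", err, strerror((int)-err));
              return 0;
      }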

      It seems that osc_quota_chkdq() returns NO_QUOTA. The +quota debug log is:

      00000001:04000000:0.0F:1407230285.413534:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
      00000008:04000000:9.0F:1407230285.435331:0:1869:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100
      00000001:04000000:0.0:1407230285.435884:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
      00000008:04000000:14.0F:1407230285.452187:0:1871:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100
      00000001:04000000:0.0:1407230285.455988:0:29229:0:(osc_quota.c:64:osc_quota_chkdq()) chkdq found noquota for user 5800
      00000008:04000000:3.0F:1407230285.519352:0:1875:0:(osc_request.c:1528:osc_brw_fini_request()) setdq for [5800 1090] with valid 0x6f184fb9, flags 2100

      -122 is -EDQUOT. However, the users definitely had not reached their space limits, and the second 'echo >>' should have returned an error if the user's quota had really been exceeded.
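
      For context, the OSC keeps a per-client cache of IDs that the OSTs have reported as over quota: osc_quota_chkdq() answers NO_QUOTA for any ID found in that cache, and osc_brw_fini_request() updates the cache ("setdq") from the quota flags carried in each BRW reply, as seen in the log above. The following is a deliberately simplified userspace model of that bookkeeping, not the actual Lustre code; it only illustrates why a stale over-quota entry makes every buffered write for uid 5800 fail with -EDQUOT even though the user is far below the limit:

      /*
       * Simplified model of the client-side "over quota" cache (the real code
       * is in lustre/osc/osc_quota.c and uses a hash per quota type; a flag
       * array stands in for it here).
       */
      #include <stdbool.h>
      #include <stdio.h>

      #define MAXUID    65536
      #define QUOTA_OK  0
      #define NO_QUOTA  1
      #define EDQUOT_RC (-122)

      static bool uid_over_quota[MAXUID];

      /* stands in for osc_quota_chkdq(): consulted before a dirty page is
       * queued for async write-out */
      static int chkdq(unsigned int uid)
      {
              return uid_over_quota[uid] ? NO_QUOTA : QUOTA_OK;
      }

      /* stands in for osc_quota_setdq(): called on BRW completion with the
       * over-quota indication extracted from the reply flags */
      static void setdq(unsigned int uid, bool over_quota)
      {
              uid_over_quota[uid] = over_quota;
      }

      /* stands in for osc_queue_async_io(): a NO_QUOTA answer turns into
       * the -122 (-EDQUOT) seen throughout the traces above */
      static int queue_async_io(unsigned int uid)
      {
              return chkdq(uid) == NO_QUOTA ? EDQUOT_RC : 0;
      }

      int main(void)
      {
              unsigned int uid = 5800;

              /* a BRW reply marked uid 5800 as over quota ... */
              setdq(uid, true);
              /* ... and as long as later replies keep that mark in place,
               * every buffered write attempt fails immediately */
              printf("append rc = %d\n", queue_async_io(uid));
              return 0;
      }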

      The following is the output of 'lfs quota -v':

      [12:26:53 root@r7:~] # lfs quota -v -u bjm900 /home
      Disk quotas for user bjm900 (uid 5800):
      Filesystem kbytes quota limit grace files quota limit grace
      /home 1599228 104857600 104857600 - 59224 1000000 1000000 -
      homsys-MDT0000_UUID
      13428 - 0 - 59224 - 64206 -
      homsys-OST0000_UUID
      21352 - 22376 - - - - -
      homsys-OST0001_UUID
      16488 - 17512 - - - - -
      homsys-OST0002_UUID
      12920 - 13576 - - - - -
      homsys-OST0003_UUID
      23704 - 24220 - - - - -
      homsys-OST0004_UUID
      17864 - 18888 - - - - -
      homsys-OST0005_UUID
      27436 - 28160 - - - - -
      homsys-OST0006_UUID
      12508 - 13532 - - - - -
      homsys-OST0007_UUID
      20476 - 21496 - - - - -
      homsys-OST0008_UUID
      11136 - 12156 - - - - -
      homsys-OST0009_UUID
      21872 - 22896 - - - - -
      homsys-OST000a_UUID
      13408 - 14432 - - - - -
      homsys-OST000b_UUID
      15312 - 16336 - - - - -
      homsys-OST000c_UUID
      39516 - 40536 - - - - -
      homsys-OST000d_UUID
      21108 - 22132 - - - - -
      homsys-OST000e_UUID
      17880 - 18904 - - - - -
      homsys-OST000f_UUID
      24440 - 25464 - - - - -
      homsys-OST0010_UUID
      18652 - 19676 - - - - -
      homsys-OST0011_UUID
      36456 - 37476 - - - - -
      homsys-OST0012_UUID
      17332 - 17864 - - - - -
      homsys-OST0013_UUID
      28272 - 29296 - - - - -
      homsys-OST0014_UUID
      32920 - 33944 - - - - -
      homsys-OST0015_UUID
      21708 - 22728 - - - - -
      homsys-OST0016_UUID
      21928 - 22952 - - - - -
      homsys-OST0017_UUID
      15104 - 15872 - - - - -
      homsys-OST0018_UUID
      18360 - 19384 - - - - -
      homsys-OST0019_UUID
      22288 - 23304 - - - - -
      homsys-OST001a_UUID
      11524 - 12548 - - - - -
      homsys-OST001b_UUID
      23016 - 24040 - - - - -
      homsys-OST001c_UUID
      14044 - 15068 - - - - -
      homsys-OST001d_UUID
      16692 - 17716 - - - - -
      homsys-OST001e_UUID
      39124 - 40148 - - - - -
      homsys-OST001f_UUID
      13484 - 14012 - - - - -
      homsys-OST0020_UUID
      11500 - 12524 - - - - -
      homsys-OST0021_UUID
      12004 - 13028 - - - - -
      homsys-OST0022_UUID
      26332 - 27356 - - - - -
      homsys-OST0023_UUID
      13896 - 14920 - - - - -
      homsys-OST0024_UUID
      17100 - 18120 - - - - -
      homsys-OST0025_UUID
      27388 - 28412 - - - - -
      homsys-OST0026_UUID
      10800 - 11824 - - - - -
      homsys-OST0027_UUID
      25572 - 26596 - - - - -
      homsys-OST0028_UUID
      23144 - 24064 - - - - -
      homsys-OST0029_UUID
      13700 - 14552 - - - - -
      homsys-OST002a_UUID
      21748 - 22772 - - - - -
      homsys-OST002b_UUID
      21800 - 22824 - - - - -
      homsys-OST002c_UUID
      16600 - 17624 - - - - -
      homsys-OST002d_UUID
      12224 - 13248 - - - - -
      homsys-OST002e_UUID
      12796 - 13820 - - - - -
      homsys-OST002f_UUID
      10436 - 11460 - - - - -
      homsys-OST0030_UUID
      24940 - 25960 - - - - -
      homsys-OST0031_UUID
      13820 - 14844 - - - - -
      homsys-OST0032_UUID
      10276 - 11296 - - - - -
      homsys-OST0033_UUID
      14324 - 14856 - - - - -
      homsys-OST0034_UUID
      11168 - 11776 - - - - -
      homsys-OST0035_UUID
      17876 - 18900 - - - - -
      homsys-OST0036_UUID
      14740 - 15764 - - - - -
      homsys-OST0037_UUID
      24764 - 25788 - - - - -
      homsys-OST0038_UUID
      17848 - 18868 - - - - -
      homsys-OST0039_UUID
      15164 - 15720 - - - - -
      homsys-OST003a_UUID
      18736 - 19760 - - - - -
      homsys-OST003b_UUID
      14476 - 15500 - - - - -
      homsys-OST003c_UUID
      4024 - 5048 - - - - -
      homsys-OST003d_UUID
      13588 - 14612 - - - - -
      homsys-OST003e_UUID
      13576 - 14600 - - - - -
      homsys-OST003f_UUID
      26372 - 27396 - - - - -
      homsys-OST0040_UUID
      50380 - 51404 - - - - -
      homsys-OST0041_UUID
      24796 - 25816 - - - - -
      homsys-OST0042_UUID
      24176 - 25196 - - - - -
      homsys-OST0043_UUID
      12776 - 13800 - - - - -
      homsys-OST0044_UUID
      13444 - 14468 - - - - -
      homsys-OST0045_UUID
      23492 - 24476 - - - - -
      homsys-OST0046_UUID
      11412 - 12436 - - - - -
      homsys-OST0047_UUID
      14552 - 15576 - - - - -
      homsys-OST0048_UUID
      19140 - 19664 - - - - -
      homsys-OST0049_UUID
      12384 - 13408 - - - - -
      homsys-OST004a_UUID
      29392 - 30416 - - - - -
      homsys-OST004b_UUID
      40412 - 41436 - - - - -
      homsys-OST004c_UUID
      52872 - 53896 - - - - -
      homsys-OST004d_UUID
      29372 - 30396 - - - - -
      homsys-OST004e_UUID
      13144 - 14164 - - - - -
      homsys-OST004f_UUID
      13000 - 14024 - - - - -
      Total allocated inode limit: 64206, total allocated block limit: 1663052

      After we disabled quota enforcement on the OSTs, the problem went away immediately; it came back as soon as we re-enabled quota enforcement on the OSTs.


    People

      Assignee: Niu Yawei (Inactive)
      Reporter: Li Xi (Inactive)