Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 10028

    Description

      HSM release is not preserving the block count as reported by stat()

      # cd /mnt/lustre
      # dd if=/dev/zero of=Antoshka bs=1M count=10
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB) copied, 0.0740321 s, 142 MB/s
      # stat Antoshka 
        File: `Antoshka'
        Size: 10485760  	Blocks: 20480      IO Block: 4194304 regular file
      Device: 2c54f966h/743766374d	Inode: 144115205255725060  Links: 1
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2013-08-30 10:13:48.000000000 -0500
      Modify: 2013-08-30 10:13:48.000000000 -0500
      Change: 2013-08-30 10:13:48.000000000 -0500
      # lfs hsm_archive Antoshka 
      # lfs hsm_release Antoshka
      # stat Antoshka
        File: `Antoshka'
        Size: 10485760  	Blocks: 0          IO Block: 4194304 regular file
      Device: 2c54f966h/743766374d	Inode: 144115205255725060  Links: 1
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2013-08-30 10:13:48.000000000 -0500
      Modify: 2013-08-30 10:13:48.000000000 -0500
      Change: 2013-08-30 10:13:48.000000000 -0500
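
For reference, st_blocks is counted in 512-byte units on Linux, so the 20480 blocks reported before the release match the 10485760-byte size exactly, while the 0 reported afterwards claims no storage at all. A minimal sketch of that arithmetic follows; the helper name `allocated_bytes` and the `STAT_BLOCK_UNIT` macro are hypothetical and only serve the illustration.

```c
#include <stdint.h>
#include <sys/stat.h>

/* On Linux, st_blocks is expressed in 512-byte units, whatever st_blksize is. */
#define STAT_BLOCK_UNIT 512

/* Hypothetical helper: bytes of allocated storage implied by st_blocks. */
static int64_t allocated_bytes(const struct stat *st)
{
        return (int64_t)st->st_blocks * STAT_BLOCK_UNIT;
}
```

With the values from the transcript, 20480 * 512 gives 10485760 bytes before the release and 0 bytes after it.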
      

      I had intended to fix this with LU-3811 but it will require some work in the MD* attr_set patch.

      If you're thinking (philosophically), "hmm, well, maybe it should report a block count of 0 here," then you're just wrong.


        Activity

          [LU-3864] stat() on HSM released file returns st_blocks = 0
          bfaccini Bruno Faccini (Inactive) added a comment - http://review.whamcloud.com/7776 has landed and http://review.whamcloud.com/7584 is only awaiting reviews now.
          bfaccini Bruno Faccini (Inactive) added a comment - - edited

          Submitted patch to have st_blocks = 1 returned for released files, at http://review.whamcloud.com/7776


          adegremont Aurelien Degremont (Inactive) added a comment

          Most filesystems do not count the space used by the inode (unless an extra block is used by metadata, such as for EAs) when filling st_blocks, but POSIX does not say whether st_blocks covers only data or not.

          Anyway, I think we will have to go for the "st_blocks == 1" solution.

          I checked what XFS/DMF does for that. SGI says "don't use tar".... :/

          adilger Andreas Dilger added a comment

          Note that returning st_blocks = 1 can still be justified as "correct" for HSM released files, since the file is still consuming one sector on the MDT filesystem for the inode.

          adilger Andreas Dilger added a comment

          We already do the same on the client if it only has writeback cache pages: if st_blocks == 0 but there is dirty data in the client cache, we return st_blocks = 1. See cl_glimpse_lock():

                                          if (cl_isize_read(inode) > 0 &&
                                              inode->i_blocks == 0) {
                                                  /*
                                                   * LU-417: Add dirty pages block count
                                                   * lest i_blocks reports 0, some "cp" or
                                                   * "tar" may think it's a completely
                                                   * sparse file and skip it.
                                                   */
                                                  inode->i_blocks = dirty_cnt(inode);

          bfaccini Bruno Faccini (Inactive) added a comment

          Hmm, it seems that "tar --sparse", instead of using FIEMAP as expected, uses an odd optimization (if st_blocks = 0) that causes a released file to be archived as a fully sparse file. I did not check all/latest versions of tar.

          It seems that btrfs encountered the same problem with small files (i.e., small enough to have their data stored with the metadata and thus report st_blocks = 0), as detailed in Red Hat Bugzilla #757557 (at https://bugzilla.redhat.com/show_bug.cgi?id=757557). And if I understand correctly, they fixed it by returning st_blocks = 1. Should we do the same?
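
The shortcut described above can be sketched as follows. This is a hypothetical illustration of the kind of check involved, not code taken from tar; the function name `assumed_fully_sparse` is invented for the example.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical illustration of the shortcut, not code taken from tar:
 * a non-empty file that reports zero allocated blocks is assumed to be
 * one big hole, so its contents are never read.
 */
static bool assumed_fully_sparse(const struct stat *st)
{
        return st->st_size > 0 && st->st_blocks == 0;
}
```

Under this check a released 10 MB file with st_blocks = 0 is treated as entirely sparse, whereas reporting st_blocks = 1 is enough to force the data to be read.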
          bfaccini Bruno Faccini (Inactive) added a comment - - edited

          Hmm, during testing of the 1st patch-set I also found (it seems not to have been reported already) that running "filefrag" (i.e. the FIEMAP ioctl) on a released file just crashes with the following LBUG:

          LustreError: 13811:0:(lov_obd.c:2488:lov_fiemap()) ASSERTION( fm_local ) failed: 
          LustreError: 13811:0:(lov_obd.c:2488:lov_fiemap()) LBUG
          Pid: 13811, comm: filefrag
          
          Call Trace:
           [<ffffffffa0206895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
           [<ffffffffa0206e97>] lbug_with_loc+0x47/0xb0 [libcfs]
           [<ffffffffa084e56c>] lov_get_info+0x10ac/0x1cb0 [lov]
           [<ffffffff8112fca0>] ? __lru_cache_add+0x40/0x90
           [<ffffffffa08697ab>] ? lov_lsm_addref+0x6b/0x130 [lov]
           [<ffffffffa0dbaab1>] ll_do_fiemap+0x411/0x6b0 [lustre]
           [<ffffffffa0dc5d97>] ll_fiemap+0x117/0x590 [lustre]
           [<ffffffff811956e5>] do_vfs_ioctl+0x505/0x580
           [<ffffffff811957e1>] sys_ioctl+0x81/0xa0
           [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
          

          This happens with or without my patch. It is due to unconditionally freeing "fm_local" at the end of the lov_fiemap() routine, even if it was not allocated because of no-object/ENOMEM, and now also when the file is released!!

          I will fix that in patch-set #7, in addition to addressing Andreas' comments on patch-set #6.

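
The failure mode described in this comment (a cleanup path that frees a buffer which an early exit never allocated) can be sketched in user-space C. Everything here is hypothetical: `checked_free` stands in for a free helper that asserts on NULL, as the ASSERTION( fm_local ) message suggests, and `map_extents` is not the real lov_fiemap().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a kernel-style free helper that asserts on a NULL pointer. */
static void checked_free(void *ptr)
{
        assert(ptr != NULL);
        free(ptr);
}

/*
 * Hypothetical shape of a routine with an early exit taken before the
 * scratch buffer is allocated (for example, when the file has no objects).
 * Returns 0 on success, -1 on allocation failure.
 */
static int map_extents(int has_objects, size_t buf_size)
{
        char *scratch = NULL;
        int rc = 0;

        if (!has_objects)
                goto out;       /* nothing allocated yet */

        scratch = malloc(buf_size);
        if (scratch == NULL) {
                rc = -1;
                goto out;
        }

        /* ... extent mapping would use scratch here ... */

out:
        /* Free only what was actually allocated. */
        if (scratch != NULL)
                checked_free(scratch);
        return rc;
}
```

Without the NULL check before `checked_free`, the `has_objects == 0` path would trip the assertion, which is the same pattern as the LBUG above.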

          adilger Andreas Dilger added a comment

          There is still a patch to land for this bug.

          adegremont Aurelien Degremont (Inactive) added a comment

          > BTW, do we know what DMF and/or GHI answer to FIEMAP?

          OK, I got the information from SGI.
          XFS (for DMF) returns 1 full extent, with normal flags (not UNKNOWN or DELALLOC, etc.).

          I do not know for GHI.

          bfaccini Bruno Faccini (Inactive) added a comment

          Hello Aurelien,
          It seems everybody "agrees" that st_blocks should be zero for released files. This is why the ticket resolution has been set to "won't fix".

          Concerning the change in FIEMAP, this can be tracked in a new ticket (or still in this one), but with lower priority.

          BTW, do we know what DMF and/or GHI answer to FIEMAP?

          It seems to me that it is not a frozen interface, because when googling I found that a FIEMAP_EXTENT_SECONDARY flag exists in some implementations (to indicate that the extent data are on HSM), which may fit our needs here. But then, is it used/tested by coreutils and other tools?

          Anyway, I wrote a 1st patch attempt that returns a single extent with (FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_LAST). It is http://review.whamcloud.com/7584.
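
As an illustration of the single-extent answer described in this comment, here is a minimal sketch using the structures and flags from the Linux <linux/fiemap.h> header. The helper name `fill_released_extent` is hypothetical, and this is not the code from the patch under review.

```c
#include <linux/fiemap.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: describe a released file of `size` bytes as a single
 * extent whose physical location is unknown, flagged as the last extent.
 */
static void fill_released_extent(struct fiemap_extent *ext, uint64_t size)
{
        memset(ext, 0, sizeof(*ext));
        ext->fe_logical = 0;
        ext->fe_length = size;
        ext->fe_flags = FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN |
                        FIEMAP_EXTENT_LAST;
}
```

A tool that walks FIEMAP results would then see one extent covering the whole file rather than an empty map, so it has no reason to conclude the file is a hole.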

          adegremont Aurelien Degremont (Inactive) added a comment

          OK, I've checked with GHI (GPFS/HPSS Interface), which is the only product competing with Lustre/HSM.
          GHI also returns st_blocks == 0 when files are RELEASED.

          I propose that FIEMAP be modified for RELEASED files, and that st_blocks still return 0. We will see whether a lot of people complain about that.

          People

            bfaccini Bruno Faccini (Inactive)
            jhammond John Hammond
            Votes: 0
            Watchers: 9
