Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 10028

    Description

      HSM release does not preserve the block count reported by stat():

      # cd /mnt/lustre
      # dd if=/dev/zero of=Antoshka bs=1M count=10
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB) copied, 0.0740321 s, 142 MB/s
      # stat Antoshka 
        File: `Antoshka'
        Size: 10485760  	Blocks: 20480      IO Block: 4194304 regular file
      Device: 2c54f966h/743766374d	Inode: 144115205255725060  Links: 1
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2013-08-30 10:13:48.000000000 -0500
      Modify: 2013-08-30 10:13:48.000000000 -0500
      Change: 2013-08-30 10:13:48.000000000 -0500
      # lfs hsm_archive Antoshka 
      # lfs hsm_release Antoshka
      # stat Antoshka
        File: `Antoshka'
        Size: 10485760  	Blocks: 0          IO Block: 4194304 regular file
      Device: 2c54f966h/743766374d	Inode: 144115205255725060  Links: 1
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2013-08-30 10:13:48.000000000 -0500
      Modify: 2013-08-30 10:13:48.000000000 -0500
      Change: 2013-08-30 10:13:48.000000000 -0500
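      The numbers in the transcript can be checked directly. On Linux, st_blocks counts 512-byte units, independent of the 4 MiB "IO Block" (st_blksize) value, so 20480 blocks × 512 bytes = 10485760 bytes, exactly the 10 MiB written by dd. After release, st_blocks = 0 on a non-empty file is the pattern that tools trusting st_blocks interpret as a file made entirely of holes. A minimal sketch; the helper names are hypothetical and not part of Lustre:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Linux reports st_blocks in 512-byte units, independent of st_blksize
 * (the "IO Block" value shown by stat(1)). */
#define ST_BLOCK_UNIT 512ULL

/* Hypothetical helper: bytes allocated to the file according to stat(). */
unsigned long long allocated_bytes(const struct stat *st)
{
    return (unsigned long long)st->st_blocks * ST_BLOCK_UNIT;
}

/* Hypothetical helper: a non-empty file with no allocated blocks looks
 * like one big hole to tools that trust st_blocks. */
bool looks_fully_sparse(const struct stat *st)
{
    return st->st_size > 0 && st->st_blocks == 0;
}
```

      Before release the file reports 10485760 allocated bytes and does not look sparse; after release the same predicate returns true even though the data still exists in the archive.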
      

      I had intended to fix this with LU-3811 but it will require some work in the MD* attr_set patch.

      If you're thinking (philosophically), "hmm, well, maybe it should report a block count of 0 here", then you're just wrong.


        Activity

          [LU-3864] stat() on HSM released file returns st_blocks = 0
          pjones Peter Jones added a comment -

          Landed for LU-3864

          bfaccini Bruno Faccini (Inactive) added a comment - http://review.whamcloud.com/7776 has landed, and http://review.whamcloud.com/7584 now only needs reviews.
          bfaccini Bruno Faccini (Inactive) added a comment - - edited

          Submitted patch to have st_blocks = 1 returned for released files, at http://review.whamcloud.com/7776
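          The rule proposed in this patch can be sketched as follows. This is not the code from http://review.whamcloud.com/7776; struct file_attrs and its released flag are hypothetical stand-ins for however the client learns that a file has been released.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the attributes the client merges
 * before answering stat(). */
struct file_attrs {
    uint64_t size;   /* st_size, preserved across hsm_release */
    uint64_t blocks; /* st_blocks, in 512-byte units */
    bool released;   /* hypothetical flag: data lives only in the archive */
};

/* Value to report as st_blocks. A released file has no data objects,
 * so its block count would be 0; report 1 instead (one sector, roughly
 * the space the inode still uses on the MDT) so that tools such as tar
 * do not mistake a non-empty released file for a fully sparse one. */
uint64_t reported_st_blocks(const struct file_attrs *a)
{
    if (a->released && a->size > 0 && a->blocks == 0)
        return 1;
    return a->blocks;
}
```

          Files that are not released keep their real block count, so ordinary and genuinely sparse files are unaffected; an empty released file still reports 0.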

          adegremont Aurelien Degremont (Inactive) added a comment -

          Most filesystems do not count the space used by the inode itself when filling st_blocks (unless an extra block is used for metadata, e.g. for EAs), but POSIX does not say whether st_blocks covers data only.

          Anyway, I think we will have to go for the "st_blocks == 1" solution.

          I checked what XFS/DMF does for that. SGI says "don't use tar"... :/

          adilger Andreas Dilger added a comment -

          Note that returning st_blocks = 1 can still be justified as "correct" for HSM released files, since the file still consumes one sector on the MDT filesystem for its inode.

          adilger Andreas Dilger added a comment -

          We already do the same on the client when it only has writeback cache pages: if st_blocks == 0 but there is dirty data in the client cache, we return st_blocks = 1. See cl_glimpse_lock():

              if (cl_isize_read(inode) > 0 &&
                  inode->i_blocks == 0) {
                      /*
                       * LU-417: Add dirty pages block count
                       * lest i_blocks reports 0, some "cp" or
                       * "tar" may think it's a completely
                       * sparse file and skip it.
                       */
                      inode->i_blocks = dirty_cnt(inode);
              }

          bfaccini Bruno Faccini (Inactive) added a comment -

          Hmm, it seems that "tar --sparse", instead of using FIEMAP as expected, uses an odd optimization (when st_blocks = 0) that causes a released file to be archived as a fully sparse file. I did not check all/latest versions of tar.

          It seems btrfs ran into the same problem with small files (i.e., files small enough to have their data stored with the metadata, which then report st_blocks = 0), as detailed in Red Hat Bugzilla #757557 (https://bugzilla.redhat.com/show_bug.cgi?id=757557). If I understand correctly, they fixed it by returning st_blocks = 1. Should we do the same?
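          For comparison, the FIEMAP route that tar was expected to take looks roughly like this on Linux: the FS_IOC_FIEMAP ioctl with fm_extent_count = 0 asks the filesystem only for the number of extents mapping the file, rather than inferring holes from st_blocks. This is a generic userspace sketch, not tar's code; count_extents() and count_extents_path() are hypothetical helper names.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_SYNC, FIEMAP_MAX_OFFSET */

/* Count the extents mapping [0, EOF) of an open file. With
 * fm_extent_count = 0 the kernel only counts extents and stores the
 * result in fm_mapped_extents, so no extent array is needed.
 * Returns 0 on success or -errno (e.g. -EOPNOTSUPP when the
 * filesystem does not implement FIEMAP). */
int count_extents(int fd, unsigned int *nr_extents)
{
    struct fiemap fm;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;
    fm.fm_flags = FIEMAP_FLAG_SYNC; /* flush dirty data before mapping */
    fm.fm_extent_count = 0;         /* count only, return no extents */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -errno;

    *nr_extents = fm.fm_mapped_extents;
    return 0;
}

/* Convenience wrapper: open PATH read-only and count its extents. */
int count_extents_path(const char *path, unsigned int *nr_extents)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -errno;
    rc = count_extents(fd, nr_extents);
    close(fd);
    return rc;
}
```

          A tool can treat a successful extent count as authoritative and fall back to reading the data when FIEMAP is unsupported, instead of skipping a file just because st_blocks is 0.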
          bfaccini Bruno Faccini (Inactive) added a comment - - edited

          Hmm, during testing of the 1st patch-set I also found (it does not seem to have been reported yet) that running "filefrag" (i.e., the FIEMAP ioctl!) on a released file simply crashes with the following LBUG:

          LustreError: 13811:0:(lov_obd.c:2488:lov_fiemap()) ASSERTION( fm_local ) failed: 
          LustreError: 13811:0:(lov_obd.c:2488:lov_fiemap()) LBUG
          Pid: 13811, comm: filefrag
          
          Call Trace:
           [<ffffffffa0206895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
           [<ffffffffa0206e97>] lbug_with_loc+0x47/0xb0 [libcfs]
           [<ffffffffa084e56c>] lov_get_info+0x10ac/0x1cb0 [lov]
           [<ffffffff8112fca0>] ? __lru_cache_add+0x40/0x90
           [<ffffffffa08697ab>] ? lov_lsm_addref+0x6b/0x130 [lov]
           [<ffffffffa0dbaab1>] ll_do_fiemap+0x411/0x6b0 [lustre]
           [<ffffffffa0dc5d97>] ll_fiemap+0x117/0x590 [lustre]
           [<ffffffff811956e5>] do_vfs_ioctl+0x505/0x580
           [<ffffffff811957e1>] sys_ioctl+0x81/0xa0
           [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
          

          This happens with or without my patch. It is due to "fm_local" being unconditionally freed at the end of the lov_fiemap() routine, even when it was never allocated (no object, ENOMEM, and now also when the file is released)!

          Will fix that in patch-set #7, in addition to addressing Andreas' comments on patch-set #6.
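          The bug pattern described here (a shared cleanup path that frees a buffer which may never have been allocated, with a free routine that asserts a non-NULL pointer) and its fix can be illustrated with a small userspace analog. This is not the lov_fiemap() code: checked_free() is a hypothetical stand-in for a kernel free macro that asserts its argument, and fiemap_like() with its have_objects parameter is purely illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a free macro that asserts its argument,
 * the way ASSERTION( fm_local ) fired in lov_fiemap(). */
static void checked_free(void *p)
{
    assert(p != NULL);
    free(p);
}

/* Illustrative analog of a function whose early exits jump to a
 * shared cleanup label before the buffer has been allocated. */
int fiemap_like(bool have_objects, size_t bufsize)
{
    void *fm_local = NULL;
    int rc = 0;

    if (!have_objects) {      /* e.g. a released file: nothing to map */
        rc = 0;
        goto out;
    }

    fm_local = malloc(bufsize);
    if (fm_local == NULL) {
        rc = -ENOMEM;
        goto out;
    }

    /* ... per-object mapping work would go here ... */

out:
    /* Fix: only free what was actually allocated. Calling
     * checked_free(fm_local) unconditionally here would trip the
     * assertion on both early-exit paths. */
    if (fm_local != NULL)
        checked_free(fm_local);
    return rc;
}
```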

          adilger Andreas Dilger added a comment -

          There is still a patch to land for this bug.

          adegremont Aurelien Degremont (Inactive) added a comment -

          > BTW, do we know what answer DMF and/or GHI provide to FIEMAP?

          OK, I got the information from SGI.
          XFS (for DMF) returns 1 full extent with normal flags (not UNKNOWN, DELALLOC, etc.).

          I do not know about GHI.
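          SGI's answer can be written down as a FIEMAP reply. A hedged sketch, assuming a released file is reported as a single extent spanning [0, st_size) with only FIEMAP_EXTENT_LAST set; the function name is hypothetical, and fe_physical = 0 is a placeholder because the comment does not say what physical offset XFS/DMF reports.

```c
#include <stdint.h>
#include <string.h>
#include <linux/fiemap.h>

/* Hypothetical: describe a released file the way SGI reports XFS/DMF
 * does, as one extent covering the whole file with ordinary flags
 * (no FIEMAP_EXTENT_UNKNOWN or FIEMAP_EXTENT_DELALLOC). Only
 * FIEMAP_EXTENT_LAST is set, because this is the final extent. */
void fill_released_extent(struct fiemap_extent *fe, uint64_t file_size)
{
    memset(fe, 0, sizeof(*fe));
    fe->fe_logical = 0;        /* extent starts at file offset 0 */
    fe->fe_physical = 0;       /* placeholder assumption */
    fe->fe_length = file_size; /* covers the whole file */
    fe->fe_flags = FIEMAP_EXTENT_LAST;
}
```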

          People

            bfaccini Bruno Faccini (Inactive)
            jhammond John Hammond