[LU-3448] osc_page_delete()) ASSERTION(0) failed running racer

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.5.0
    • Affects Version/s: Lustre 2.5.0
    • Severity: 3
    • Rank: 8619

    Description

      Running a modified racer (LU-3072, + smaller dds + sleep in file_create.sh) I can reproduce this.

      LustreError: 23518:0:(osc_cache.c:2379:osc_teardown_async_page()) extent ffff8800abb54c60@{[0 -> 1/255], [3|1|-|active|wi|ffff8800aa8b4688], [8192|2|+|-|ffff8800ac9a2e78|256|(null)]} trunc at 0.
      LustreError: 23518:0:(osc_page.c:430:osc_page_delete()) page@ffff8800acab8600[2 ffff8800ae3bb630:0 ^(null)_ffff880112410800 4 0 1 (null) (null) 0x0]
      LustreError: 23518:0:(osc_page.c:430:osc_page_delete()) page@ffff880112410800[2 ffff8800a99e8508:0 ^ffff8800acab8600_(null) 4 0 1 (null) (null) 0x0]
      LustreError: 23518:0:(osc_page.c:430:osc_page_delete()) vvp-page@ffff8800acab86c0(0:0:0) vm@ffffea0002c25250 20000000000035 3:0 0 0 lru
      LustreError: 23518:0:(osc_page.c:430:osc_page_delete()) lov-page@ffff8800acab8710
      LustreError: 23518:0:(osc_page.c:430:osc_page_delete()) osc-page@ffff8801124108e8: 1< 0x845fed 258 0 + - > 2< 0 0 4096 0x0 0x520 | (null) ffff8800b135e600 ffff8800aa8b4688 > 3< + ffff8800ade74080 0 0 0 > 4< 0 0 8 33980416 - | - - + - > 5< - - + - | 0 - | 7 - ->
      LustreError: 23518:0:(osc_page.c:430:osc_page_delete()) end page@ffff8800acab8600
      LustreError: 23518:0:(osc_page.c:430:osc_page_delete()) Trying to teardown failed: -16
      LustreError: 23518:0:(osc_page.c:431:osc_page_delete()) ASSERTION( 0 ) failed: 
      LustreError: 23518:0:(osc_page.c:431:osc_page_delete()) LBUG
      Pid: 23518, comm: cp
      
      Call Trace:
       [<ffffffffa02ae895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa02aee97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0854701>] osc_page_delete+0x311/0x320 [osc]
       [<ffffffffa0468bb5>] cl_page_delete0+0xc5/0x4e0 [obdclass]
       [<ffffffffa0469012>] cl_page_delete+0x42/0x120 [obdclass]
       [<ffffffffa0ce766d>] ll_invalidatepage+0x8d/0x160 [lustre]
       [<ffffffff81131ae5>] do_invalidatepage+0x25/0x30
       [<ffffffff81131e02>] truncate_inode_page+0xa2/0xc0
       [<ffffffff811322d2>] truncate_inode_pages_range+0x292/0x500
       [<ffffffffa02afa4e>] ? cfs_mem_cache_free+0xe/0x10 [libcfs]
       [<ffffffff81143b62>] ? unmap_mapping_range+0x72/0x140
       [<ffffffff811325d5>] truncate_inode_pages+0x15/0x20
       [<ffffffff8113262f>] truncate_pagecache+0x4f/0x70
       [<ffffffff811aa84a>] simple_setsize+0x3a/0x50
       [<ffffffff811aa8a0>] simple_setattr+0x40/0x70
       [<ffffffffa0cc1416>] ll_setattr_raw+0x2a6/0x1090 [lustre]
       [<ffffffffa0cc225b>] ll_setattr+0x5b/0xf0 [lustre]
       [<ffffffff8119fdc8>] notify_change+0x168/0x340
       [<ffffffff811807e4>] do_truncate+0x64/0xa0
       [<ffffffff8121e52f>] ? security_inode_permission+0x1f/0x30
       [<ffffffff811946e4>] do_filp_open+0x844/0xdd0
       [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
       [<ffffffff811a0ca2>] ? alloc_fd+0x92/0x160
       [<ffffffff8117f559>] do_sys_open+0x69/0x140
       [<ffffffff8117f670>] sys_open+0x20/0x30
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

          Activity

            Today we have 2 cases:
            1) a released file (just created with a release layout, as in test 229)
            2) a released file associated with an archive (after hsm_archive + hsm_release, or an import)

            truncate to 0 or to > 0 on 1) returns an error
            truncate to > 0 on 2) restores the archived file and then truncates it
            truncate to 0 on 2) should work without a restore (still needs to be done, see LU-3454)

            If we change truncate to 0 so that it works on 1), we can also make truncate to > 0 work. This is only a question of philosophy: "what is a released file without an HSM archive?"

            jcl jacques-charles lafoucriere added a comment
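The cases listed in the comment above can be written down as a small decision function. This is only a sketch of the policy as described there, not Lustre code: the enum, the function name, and the `has_archive` flag are hypothetical names introduced for illustration, and the truncate-to-0 branch reflects the intended behaviour that the comment says is still to be done (LU-3454).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; none of these exist in Lustre. */
enum released_truncate_action {
	RT_ERROR,              /* case 1: released file with no archive */
	RT_RESTORE_THEN_TRUNC, /* case 2, new size > 0: restore, then truncate */
	RT_TRUNC_NO_RESTORE,   /* case 2, new size == 0: intended, see LU-3454 */
};

/* Map (archive present?, requested size) to the action described above. */
static enum released_truncate_action
released_truncate_policy(bool has_archive, uint64_t new_size)
{
	if (!has_archive)
		return RT_ERROR;
	if (new_size > 0)
		return RT_RESTORE_THEN_TRUNC;
	return RT_TRUNC_NO_RESTORE;
}
```

Changing case 1) to succeed, as the comment suggests is possible, would only touch the first branch of this function.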

            Is there any plan to rehabilitate truncate to 0 (or O_TRUNC) for released files?

            jhammond John Hammond added a comment

            Patch landed to master.

            jhammond John Hammond added a comment

            To make it safe, let's deny truncate on HSM released files.

            John, can you please add this fix in your patch? The extra fix would be:

            [jinxiong@intel mdc]$ git diff ../lov/lov_io.c 
            diff --git a/lustre/lov/lov_io.c b/lustre/lov/lov_io.c
            index 6f6ea84..bec9fea 100644
            --- a/lustre/lov/lov_io.c
            +++ b/lustre/lov/lov_io.c
            @@ -984,12 +984,12 @@ int lov_io_init_released(const struct lu_env *env, struct cl_object *obj,
                            LASSERTF(0, "invalid type %d\n", io->ci_type);
                    case CIT_MISC:
                    case CIT_FSYNC:
            -       case CIT_SETATTR:
                            result = +1;
                            break;
                    case CIT_READ:
                    case CIT_WRITE:
                    case CIT_FAULT:
            +       case CIT_SETATTR:
                            /* TODO: need to restore the file. */
                            result = -EBADF;
                            break;
            

            Without this fix, there will be problems handling the size of a released file under truncate.

            jay Jinshan Xiong (Inactive) added a comment
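The effect of the proposed diff can be seen in a stand-alone model of the switch. This is a sketch only: the `CIT_*` names and the `+1` / `-EBADF` results are taken from the diff above, but the enum values, the `model_io_type` type, and the `model_io_init_released()` function are assumptions made so the example compiles on its own. It is not the real `lov_io_init_released()`.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in enum: the CIT_* names come from the diff, the values are arbitrary
 * and this is not the Lustre definition. */
enum model_io_type {
	CIT_READ,
	CIT_WRITE,
	CIT_SETATTR,
	CIT_FAULT,
	CIT_FSYNC,
	CIT_MISC,
};

/* Model of the switch after the patch: CIT_SETATTR has moved from the
 * "result = +1" group into the "result = -EBADF" group. */
static int model_io_init_released(enum model_io_type type)
{
	switch (type) {
	case CIT_MISC:
	case CIT_FSYNC:
		return +1;
	case CIT_READ:
	case CIT_WRITE:
	case CIT_FAULT:
	case CIT_SETATTR:
		/* The original carries a TODO here: the file needs a restore. */
		return -EBADF;
	}
	return -EINVAL; /* unknown type; the original asserts instead */
}
```

With this arrangement a truncate (a setattr I/O) on a released file is refused in the same way as a read or write, instead of taking the `+1` path.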

            It appears I misunderstood you guys. When you asked me to add the test case for truncate on a released file, I thought this is what you would do.

            jay Jinshan Xiong (Inactive) added a comment

            I confirm Johann's comment: truncate triggers a restore and blocks until the full restore completes. Later, with partial restore, we can optimize this.
            Today the only optimization is truncate to 0 (no restore).

            jcl jacques-charles lafoucriere added a comment - edited

            I don't think we ever intended to support such an optimization. In general, truncate should trigger a restore. The only case we want to "optimize" is truncate to 0 where we can just discard the HSM copy.

            johann Johann Lombardi (Inactive) added a comment
            jhammond John Hammond added a comment -

            Well that would make an even stronger case for disallowing truncates on released files.

            CEA colleagues, do you have opinions to share on this?


            Originally this was worked out as an optimization. For example, when a released file is truncated, we just change the size on the MDT and restore it later. But there is a problem: if a released file is truncated down to size A and then up to size B, the file content in [A, B] should contain zeros.

            I think it's okay to remove the truncate part in the test case.

            jay Jinshan Xiong (Inactive) added a comment
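The zero-fill requirement mentioned above is ordinary POSIX truncate behaviour and can be checked on any local filesystem. The following is a minimal sketch using a temporary file under /tmp (an assumption; it does not touch Lustre or HSM): it fills a file with non-zero bytes, truncates down to A and back up to B, and checks that the gap reads back as zeros. The function name is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if bytes [a, b) read back as zero after truncating a file of
 * 'x' bytes down to a and then up to b, 0 if not, -1 on I/O error. */
static int gap_is_zero_after_retruncate(off_t a, off_t b)
{
	char path[] = "/tmp/trunc-demo-XXXXXX";
	char buf[4096];
	int fd, rc = -1;

	assert(a >= 0 && a <= b && b <= (off_t)sizeof(buf));

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path); /* file is removed once fd is closed */

	memset(buf, 'x', sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;
	if (ftruncate(fd, a) != 0 || ftruncate(fd, b) != 0)
		goto out;
	if (pread(fd, buf, (size_t)(b - a), a) != (ssize_t)(b - a))
		goto out;

	rc = 1;
	for (off_t i = 0; i < b - a; i++)
		if (buf[i] != 0)
			rc = 0;
out:
	close(fd);
	return rc;
}
```

A released file whose size is only updated on the MDT would, after a later restore of the archived copy, have to reproduce this same result for the range between A and B, which is the difficulty the comment points out.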
            jhammond John Hammond added a comment -

            Restoring the call to simple_setattr() fixed the LBUG but causes the added test (sanity 229) to fail. Shall I delete the test as well?

            On master (2.4.50-79-gaed8203) which has the patch from LU-2482 I see that the effect of truncate on a released file is not seen consistently across clients:

            # MOUNT_2=y llmount.sh
            # multiop /mnt/lustre/f0 H2c
            # truncate --size=42 /mnt/lustre/f0
            # stat /mnt/lustre/f0
              File: `/mnt/lustre/f0'
              Size: 42        	Blocks: 0          IO Block: 4194304 regular file
            ...
            # stat /mnt/lustre2/f0
              File: `/mnt/lustre2/f0'
              Size: 0         	Blocks: 0          IO Block: 4194304 regular empty file
            ...
            # stat /mnt/lustre/f0
              File: `/mnt/lustre/f0'
              Size: 42        	Blocks: 0          IO Block: 4194304 regular file
            ...
            

            Can someone explain why we support truncate (to non-zero size) on released files? Truncate to zero seems somewhat defensible, but to non-zero just asks for trouble. But in either case why bother? In practice won't truncate almost always be followed by write (requiring a restore)?

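The cross-client check done by hand with stat above could be scripted. This is a minimal sketch, assuming the same file is reachable through two mount points; the function name `sizes_agree()` is hypothetical and uses only plain `stat(2)`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Returns 1 if both paths report the same st_size, 0 if the sizes differ,
 * and -1 if either stat() call fails. */
static int sizes_agree(const char *path_a, const char *path_b)
{
	struct stat sa, sb;

	assert(path_a != NULL && path_b != NULL);

	if (stat(path_a, &sa) != 0 || stat(path_b, &sb) != 0)
		return -1;
	return sa.st_size == sb.st_size;
}
```

For the transcript above, `sizes_agree("/mnt/lustre/f0", "/mnt/lustre2/f0")` would return 0, since one mount reports 42 bytes and the other 0.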

            People

              Assignee: jhammond John Hammond
              Reporter: jhammond John Hammond