LU-13227

sanityn 16a FAIL: fsx with O_DIRECT failed.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0
    • Severity: 3

    Description

      Size error: expected 0x659000 stat 0x145d12 seek 0x659000
      LOG DUMP (849 total operations):

      Full logs can be found here:

      https://testing-archive.whamcloud.com/gerrit-janitor/6308/testresults/sanityn-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/sanityn.test_16a.test_log.oleg357-client.log

      The problem might be related to the following sequence (see the sketch below the list):

      1) Client A ftruncates the file to file_size.
      2) Client B issues a direct write (punching a hole) that updates the size to new_file_size.
      3) Client A stats the file, expecting to get new_file_size, but gets the old file_size instead.
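
      For illustration, a minimal two-client sketch of this interleaving follows. It assumes a file on a shared Lustre mount; the 0x659000 offset is borrowed from the log excerpt above, while the program structure, timing, and block size are illustrative and not taken from fsx. Start role A on one client, then role B on the other client within A's sleep window.

      /* race_sketch.c - hypothetical reproducer for the truncate vs.
       * O_DIRECT-write vs. stat race described above. On an affected
       * server, client A may print the old size because the LVB was
       * not refreshed before the server lock was dropped. */
      #define _GNU_SOURCE                      /* for O_DIRECT */
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/stat.h>
      #include <unistd.h>

      #define TRUNC_SIZE 0x659000              /* size from the log above */
      #define BLKSZ      4096                  /* O_DIRECT alignment */

      int main(int argc, char **argv)
      {
              if (argc != 3) {
                      fprintf(stderr, "usage: %s <file> A|B\n", argv[0]);
                      return 1;
              }

              if (argv[2][0] == 'A') {
                      /* Client A: create, truncate, give B a window, stat. */
                      int fd = open(argv[1], O_RDWR | O_CREAT, 0644);

                      if (fd < 0 || ftruncate(fd, TRUNC_SIZE) < 0) {
                              perror("truncate");
                              return 1;
                      }
                      sleep(2);                /* crude window for client B */

                      struct stat st;
                      if (fstat(fd, &st) < 0) {
                              perror("fstat");
                              return 1;
                      }
                      /* A stale LVB reports 0x659000 (or less) here even
                       * though B extended the file past it. */
                      printf("stat size 0x%llx, expected 0x%llx\n",
                             (unsigned long long)st.st_size,
                             (unsigned long long)TRUNC_SIZE + BLKSZ);
                      close(fd);
              } else {
                      /* Client B: extend the file with an aligned direct
                       * write one block past the truncated size. */
                      int fd = open(argv[1], O_WRONLY | O_DIRECT);
                      void *buf = NULL;

                      if (fd < 0 || posix_memalign(&buf, BLKSZ, BLKSZ)) {
                              perror("setup");
                              return 1;
                      }
                      memset(buf, 0xaa, BLKSZ);
                      if (pwrite(fd, buf, BLKSZ, TRUNC_SIZE) != BLKSZ)
                              perror("pwrite");
                      free(buf);
                      close(fd);
              }
              return 0;
      }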

    Activity


            bzzz Alex Zhuravlev added a comment - I'm still hitting this on master:

            065500[0] 1670492005.778405 trunc from 0x870187 to 0x19dc98 (0x6d24ee bytes)
            Size error: expected 0x94fbcf stat 0x92b000 seek 0x92b000


            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37790/
            Subject: LU-13227 ofd: update lvb before dropping server lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 92d799217aeac4e1e859264d157c62e0f02b83a8
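
            For readers following along, below is a toy sketch of the ordering the patch subject describes; every type and function name in it is invented for illustration and is not the actual ofd/ldlm API.

            #include <stdio.h>

            /* Toy stand-ins for a server-side lock and the lock value
             * block (LVB) that caches the file size for client glimpses;
             * the real ldlm/ofd structures are far richer. */
            struct lvb  { unsigned long long size; };
            struct obj  { unsigned long long cur_size; };
            struct lock { struct lvb lvb; int held; };

            static void lvb_update(struct lvb *lvb, const struct obj *o)
            {
                    lvb->size = o->cur_size;  /* publish authoritative size */
            }

            static void lock_put(struct lock *lk)
            {
                    lk->held = 0;             /* glimpses may proceed now */
            }

            /* The ordering the patch subject names: refresh the LVB while
             * the server lock is still held, then drop it, so a racing
             * glimpse can never observe a stale size in between. */
            static void server_unlock(struct lock *lk, const struct obj *o)
            {
                    lvb_update(&lk->lvb, o);
                    lock_put(lk);
            }

            int main(void)
            {
                    struct obj  o  = { .cur_size = 0x659000ULL };
                    struct lock lk = { .held = 1 };

                    server_unlock(&lk, &o);
                    printf("lvb size 0x%llx\n", lk.lvb.size);
                    return 0;
            }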


            wshilong Wang Shilong (Inactive) added a comment - I've added a simple reproducer test to the patch and verified that it passes with the patch and fails without it.

            wshilong Wang Shilong (Inactive) added a comment - edited

            Sorry, we need to make sure the LVB is only updated within an IO context. I have updated the patch again; https://review.whamcloud.com/#/c/37790/5 should fix the problem.
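
            As a rough illustration of "only update the LVB with an IO context", the sketch below skips the update instead of asserting when no environment is present, as happens on the eviction path in the trace further down. The name and signature here are invented, not the real mdt_dom_lvbo_update().

            #include <stdio.h>

            struct lvb { unsigned long long size; };
            struct env;                       /* opaque IO environment */

            /* Hypothetical guard: when no IO environment is available,
             * e.g. on the eviction path, leave the cached LVB alone
             * instead of asserting on env != NULL. */
            static int dom_lvb_update(struct lvb *lvb, const struct env *env,
                                      unsigned long long new_size)
            {
                    if (env == NULL)
                            return 0;         /* no IO context: skip update */
                    lvb->size = new_size;
                    return 0;
            }

            int main(void)
            {
                    struct lvb lvb = { .size = 0x659000ULL };

                    dom_lvb_update(&lvb, NULL, 0x92b000ULL); /* eviction-like */
                    printf("lvb size 0x%llx\n", lvb.size);   /* unchanged */
                    return 0;
            }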


            bzzz Alex Zhuravlev added a comment - failed with an assertion:

            Lustre: DEBUG MARKER: == replay-single test 20b: write, unlink, eviction, replay (test mds_cleanup_orphans) ================ 17:33:05 (1583325185)
            Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
            Lustre: 28814:0:(genops.c:1709:obd_export_evict_by_uuid()) lustre-MDT0000: evicting 8ef1b02c-3fd5-4 at adminstrative request
            LustreError: 28814:0:(mdt_lvb.c:176:mdt_dom_lvbo_update()) ASSERTION( env ) failed: 
            LustreError: 28814:0:(mdt_lvb.c:176:mdt_dom_lvbo_update()) LBUG
            Pid: 28814, comm: lctl 4.18.0 #32 SMP Wed Jan 15 22:22:45 MSK 2020
            Call Trace:
             libcfs_call_trace+0x71/0x90 [libcfs]
             lbug_with_loc+0x3e/0x80 [libcfs]
             ? mdt_dom_lvbo_update+0x8ab/0x9c0 [mdt]
             ? ldlm_lock_decref_internal+0x33f/0xb70 [ptlrpc]
             ? ldlm_lock_decref_and_cancel+0x6f/0x130 [ptlrpc]
             ? mdt_dom_discard_data+0x490/0x550 [mdt]
             ? barrier_exit+0x27/0x60 [ptlrpc]
             ? mdd_trans_stop+0x28/0x15d [mdd]
             ? mdt_mfd_close+0x6ab/0x3080 [mdt]
             ? lu_context_fini+0x72/0x180 [obdclass]
             ? mdt_ctxt_add_dirty_flag.isra.0+0x111/0x160 [mdt]
             ? mdt_obd_disconnect+0x3a3/0x510 [mdt]
             ? class_fail_export+0x1ce/0x4e0 [obdclass]
             ? obd_export_evict_by_uuid+0xdb/0x1e0 [obdclass]
             ? lprocfs_evict_client_seq_write+0x1e8/0x2a0 [obdclass]
             ? mdt_mds_evict_client_write+0x417/0x6e0 [mdt]
             ? proc_reg_write+0x35/0x60
             ? __vfs_write+0x1f/0x160
             ? rcu_sync_lockdep_assert+0x9/0x50
             ? __sb_start_write+0x13f/0x1a0
             ? vfs_write+0x183/0x1b0
             ? vfs_write+0xba/0x1b0
             ? ksys_write+0x3d/0xa0
             ? do_syscall_64+0x4b/0x1a0
             ? entry_SYSCALL_64_after_hwframe+0x6a/0xdf
            

            bzzz Alex Zhuravlev added a comment - Sure, it's been running now.


            wshilong Wang Shilong (Inactive) added a comment - bzzz, would you mind applying the new patch above to see if it helps fix your problem?
            At least when I ran the test 200 times repeatedly against a RAM-backed OST, I could not reproduce it.

            Thanks in advance!


            gerrit Gerrit Updater added a comment - Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37790
            Subject: LU-13227 ldlm: update lvb before dropping server lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 75eafe50f942e1c412b673579e57a311cc46cac0


            bzzz Alex Zhuravlev added a comment - Sure, please find the logs attached.


            wshilong Wang Shilong (Inactive) added a comment - Hmm, I could not access the logs at that URL; would you mind uploading them here?


    People

      Assignee: Wang Shilong (Inactive)
      Reporter: Wang Shilong (Inactive)
      Votes: 0
      Watchers: 4
