Lustre / LU-18006

sanity test_119f: crash in ll_dio_user_copy


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.16.0
    • Fix Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/d37f6322-70a2-4899-833b-a09de308b500

      test_119f failed with the following error:

      [ 4762.163878] Lustre: DEBUG MARKER: == sanity test 119f: dio vs dio race ===================== 15:47:47 (1720194467)
      [ 4777.465102] BUG: scheduling while atomic: dd/456442/0x00000002
      [ 4777.465166] CPU: 1 PID: 456442 Comm: dd 5.14.0-362.24.1.el9_3.x86_64 #1
      [ 4777.465173] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 4777.465178] Call Trace:
      [ 4777.465194]  dump_stack_lvl+0x34/0x48
      [ 4777.465250]  __schedule_bug.cold+0x47/0x53
      [ 4777.465266]  schedule_debug.constprop.0+0xc5/0x100
      [ 4777.465289]  __schedule+0x48/0x550
      [ 4777.465319]  schedule+0x2d/0x70
      [ 4777.465321]  schedule_timeout+0x11f/0x160
      [ 4777.465331]  __wait_for_common+0x93/0x1d0
      [ 4777.465334]  ? __pfx_schedule_timeout+0x10/0x10
      [ 4777.465336]  ? __pfx_ll_dio_user_copy_helper+0x10/0x10 [obdclass]
      [ 4777.465657]  wait_for_completion_killable+0x20/0x40
      [ 4777.465660]  __kthread_create_on_node+0xe2/0x170
      [ 4777.465677]  kthread_create_on_node+0x49/0x70
      [ 4777.465680]  ll_dio_user_copy+0x8c/0x100 [obdclass]
      [ 4777.465734]  osc_build_rpc+0x14a/0x1440 [osc]
      [ 4777.465841]  osc_send_write_rpc+0x396/0x470 [osc]
      [ 4777.465861]  osc_check_rpcs+0x11b/0x430 [osc]
      [ 4777.465880]  osc_cache_writeback_range+0xf84/0x1020 [osc]
      [ 4777.465904]  osc_io_fsync_start+0x85/0x360 [osc]
      [ 4777.465922]  cl_io_start+0x61/0x130 [obdclass]
      [ 4777.466088]  lov_io_call.constprop.0+0x73/0x160 [lov]
      [ 4777.466178]  lov_io_start+0xc1/0x180 [lov]
      [ 4777.466190]  cl_io_start+0x61/0x130 [obdclass]
      [ 4777.466244]  cl_io_loop+0x99/0x220 [obdclass]
      [ 4777.466350]  cl_sync_file_range+0x298/0x360 [lustre]
      [ 4777.466552]  ll_writepages+0x195/0x220 [lustre]
      [ 4777.466589]  do_writepages+0xcf/0x1d0
      [ 4777.466688]  filemap_fdatawrite_wbc+0x66/0x90
      [ 4777.466696]  __filemap_fdatawrite_range+0x54/0x80
      [ 4777.466699]  filemap_write_and_wait_range+0x41/0xb0
      [ 4777.466701]  ll_fsync+0x78/0x570 [lustre]
      [ 4777.466761]  do_syscall_64+0x5c/0x90
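
      The failure mode in the trace above is the generic "scheduling while atomic" pattern: ll_dio_user_copy() calls kthread_create_on_node(), which waits on a completion and so may sleep, but the BUG line shows the task already in atomic context (preempt_count 0x00000002) on the writeback path. The following is a minimal kernel-module sketch of that pattern for illustration only; it is not Lustre code and all names in it are made up:

      #include <linux/module.h>
      #include <linux/kthread.h>
      #include <linux/spinlock.h>
      #include <linux/err.h>

      static DEFINE_SPINLOCK(demo_lock);

      /* Trivial thread body; never actually run in this demo. */
      static int demo_thread_fn(void *data)
      {
              return 0;
      }

      static int __init demo_init(void)
      {
              struct task_struct *t;

              spin_lock(&demo_lock);  /* enters atomic context: preempt_count > 0 */

              /*
               * kthread_create() waits on a completion, i.e. it may sleep.
               * Sleeping here, with preemption disabled, is what produces
               * "BUG: scheduling while atomic" (with CONFIG_DEBUG_ATOMIC_SLEEP
               * or preempt accounting enabled), as in the trace above.
               */
              t = kthread_create(demo_thread_fn, NULL, "atomic-demo");

              spin_unlock(&demo_lock);

              if (!IS_ERR(t))
                      kthread_stop(t);  /* clean up the never-started thread */
              return 0;
      }

      static void __exit demo_exit(void)
      {
      }

      module_init(demo_init);
      module_exit(demo_exit);
      MODULE_LICENSE("GPL");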
      

      Strangely, there is also an LASSERT hit by the same thread a fraction of a second later:

      [ 4777.479708] LustreError: 456442:0:(osc_request.c:2804:osc_build_rpc()) ASSERTION( (!((((( gfp_t)(0x400u|0x800u)) | (( gfp_t)0x40u))) != ((( gfp_t)0x20u)|(( gfp_t)0x200u)|(( gfp_t)0x800u))) || (!(((preempt_count() & (((1UL << (4))-1) << (((0 + 8) + 8) + 4))) | (preempt_count() & (((1UL << (4))-1) << ((0 + 8) + 8))) | (preempt_count() & (((1UL << (8))-1) << (0 + 8))))))) ) failed: 
      [ 4777.479713] LustreError: 456442:0:(osc_request.c:2804:osc_build_rpc()) LBUG
      [ 4777.479724] Kernel panic - not syncing: LBUG in interrupt.
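
      For reference, the numeric constants in the expanded assertion can be decoded with a small userspace program. The GFP and preempt_count interpretations below are my reading of 5.14-era kernel headers, not something stated in the ticket:

      #include <stdio.h>

      int main(void)
      {
              /*
               * GFP bit values as defined in 5.14-era include/linux/gfp.h
               * (assumption, not stated in the ticket):
               *   0x40  = ___GFP_IO, 0x400 = ___GFP_DIRECT_RECLAIM,
               *   0x800 = ___GFP_KSWAPD_RECLAIM   -> together GFP_NOFS
               *   0x20  = ___GFP_HIGH, 0x200 = ___GFP_ATOMIC,
               *   0x800 = ___GFP_KSWAPD_RECLAIM   -> together GFP_ATOMIC
               */
              unsigned int alloc_mask  = (0x400u | 0x800u) | 0x40u;
              unsigned int atomic_mask = 0x20u | 0x200u | 0x800u;

              /* preempt_count() masks copied from the assertion text */
              unsigned long nmi_mask     = ((1UL << 4) - 1) << (((0 + 8) + 8) + 4);
              unsigned long hardirq_mask = ((1UL << 4) - 1) << ((0 + 8) + 8);
              unsigned long softirq_mask = ((1UL << 8) - 1) << (0 + 8);

              printf("alloc mask 0x%03x vs atomic mask 0x%03x (differ: %d)\n",
                     alloc_mask, atomic_mask, alloc_mask != atomic_mask);
              printf("NMI/hard-IRQ/soft-IRQ masks: 0x%06lx 0x%06lx 0x%06lx\n",
                     nmi_mask, hardirq_mask, softirq_mask);

              /*
               * The assertion has the shape ergo(A, B) == (!A || B): if the
               * allocation mask is not GFP_ATOMIC, then preempt_count() must
               * have no NMI, hard-IRQ, or soft-IRQ bits set, i.e. a blocking
               * allocation must not be issued from interrupt context.
               */
              return 0;
      }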
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/105920 - 5.14.0-362.24.1.el9_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/105920 - 4.18.0-513.24.1.el8_lustre.x86_64

      I didn't see any other recent similar crashes, but the patch under test didn't change anything related to CLIO, so there is no expectation that it caused this issue. This may just be a low-frequency race condition.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_119f - trevis-58vm4 crashed during sanity test_119f

            People

              Assignee: Patrick Farrell (paf)
              Reporter: Maloo (maloo)
