  1. Lustre
  2. LU-13063

sanity test 411 times out for RHEL8.1

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.14.0, Lustre 2.12.4
    • RHEL 8.1
    • 3

    Description

      The last output seen in the suite_log for sanity test 411 is:

      == sanity test 411: Slab allocation error with cgroup does not LBUG ================================== 04:54:33 (1575953673)
      100+0 records in
      100+0 records out
      104857600 bytes (105 MB, 100 MiB) copied, 3.88888 s, 27.0 MB/s
      

      Normally, on successful runs, we would see a dd error when reading the file just created, but the test hangs at this point. From the console logs, it’s not clear why the test is hanging, but we see lnet-selftest processes that appear to be hung. In the stack trace on the first client (vm10), there is an lnet-selftest process (st_timer) stuck in the D state:

      [14127.185129] lst_t_00_00     S    0 14488      2 0x80000080
      [14127.186075] Call Trace:
      [14127.186561]  ? __schedule+0x253/0x830
      [14127.187236]  ? sfw_test_unit_done.isra.14+0x9d/0x150 [lnet_selftest]
      [14127.188348]  schedule+0x28/0x70
      [14127.188929]  cfs_wi_scheduler+0x40d/0x420 [libcfs]
      [14127.189783]  ? finish_wait+0x80/0x80
      [14127.190466]  ? cfs_wi_sched_create+0x5a0/0x5a0 [libcfs]
      [14127.191397]  kthread+0x112/0x130
      [14127.191984]  ? kthread_flush_work_fn+0x10/0x10
      [14127.192782]  ret_from_fork+0x35/0x40
      [14127.193448] st_timer        D    0 14636      2 0x80000080
      [14127.194413] Call Trace:
      [14127.194882]  ? __schedule+0x253/0x830
      [14127.195555]  schedule+0x28/0x70
      [14127.196142]  schedule_timeout+0x16b/0x390
      [14127.196859]  ? __next_timer_interrupt+0xc0/0xc0
      [14127.197678]  ? prepare_to_wait_event+0xbb/0x140
      [14127.198496]  stt_timer_main+0x215/0x230 [lnet_selftest]
      [14127.199436]  ? finish_wait+0x80/0x80
      [14127.200083]  ? sfw_startup+0x540/0x540 [lnet_selftest]
      [14127.200989]  kthread+0x112/0x130
      [14127.201595]  ? kthread_flush_work_fn+0x10/0x10
      [14127.202393]  ret_from_fork+0x35/0x40
      

      Similarly, in the stack-trace log on the MDS (vm12), we see the lnet-selftest st_timer process in the D state:

      [14034.774700] st_timer        D ffff9cb15b62a080     0 28114      2 0x00000080
      [14034.776068] Call Trace:
      [14034.776493]  [<ffffffffb0f6af19>] schedule+0x29/0x70
      [14034.777425]  [<ffffffffb0f68968>] schedule_timeout+0x168/0x2d0
      [14034.778391]  [<ffffffffb08cfeb4>] ? __wake_up+0x44/0x50
      [14034.779358]  [<ffffffffb08aab30>] ? __internal_add_timer+0x130/0x130
      [14034.780432]  [<ffffffffb08c3a46>] ? prepare_to_wait+0x56/0x90
      [14034.781474]  [<ffffffffc1542a98>] stt_timer_main+0x168/0x220 [lnet_selftest]
      [14034.782654]  [<ffffffffb08c3f50>] ? wake_up_atomic_t+0x30/0x30
      [14034.783688]  [<ffffffffc1542930>] ? sfw_startup+0x580/0x580 [lnet_selftest]
      [14034.784856]  [<ffffffffb08c2e81>] kthread+0xd1/0xe0
      [14034.785787]  [<ffffffffb08c2db0>] ? insert_kthread_work+0x40/0x40
      [14034.786818]  [<ffffffffb0f77c37>] ret_from_fork_nospec_begin+0x21/0x21
      [14034.788077]  [<ffffffffb08c2db0>] ? insert_kthread_work+0x40/0x40
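
To make it easier to spot D-state tasks in long console logs like the two above, here is a minimal Python sketch that pulls them out of a task dump. It is not part of the Lustre test framework. The names `parse_task_dump`, `uninterruptible`, and `Task` are hypothetical, and the regular expressions assume only the two header layouts seen in this ticket (with and without a 16-digit hex address after the state letter) and task names without spaces. The sample text is an excerpt of the traces quoted above.

```python
import re
from typing import List, NamedTuple

# Task header line, e.g.
#   [14127.193448] st_timer        D    0 14636      2 0x80000080
# The older layout prints a 16-digit hex address after the state letter.
TASK_RE = re.compile(
    r"^\s*\[\s*\d+\.\d+\]\s+(?P<name>\S+)\s+(?P<state>[A-Z])\s+"
    r"(?:[0-9a-f]{16}\s+)?\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)

# Stack frame line, with or without a leading [<address>] field.
# Frames marked with "?" do not match and are therefore skipped.
FRAME_RE = re.compile(
    r"^\s*\[\s*\d+\.\d+\]\s+(?:\[<[0-9a-f]+>\]\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


class Task(NamedTuple):
    name: str
    state: str
    pid: int
    frames: List[str]


def parse_task_dump(text: str) -> List[Task]:
    """Split a console task dump into tasks with their unmarked frames."""
    tasks: List[Task] = []
    for line in text.splitlines():
        header = TASK_RE.match(line)
        if header:
            tasks.append(
                Task(header["name"], header["state"], int(header["pid"]), [])
            )
            continue
        frame = FRAME_RE.match(line)
        if frame and tasks:
            tasks[-1].frames.append(frame["func"])
    return tasks


def uninterruptible(tasks: List[Task]) -> List[Task]:
    """Return only the tasks whose state letter is D."""
    return [t for t in tasks if t.state == "D"]


# Excerpt of the vm10 and vm12 traces quoted in this ticket.
SAMPLE = """\
[14127.185129] lst_t_00_00     S    0 14488      2 0x80000080
[14127.186075] Call Trace:
[14127.186561]  ? __schedule+0x253/0x830
[14127.188348]  schedule+0x28/0x70
[14127.189783]  cfs_wi_scheduler+0x40d/0x420 [libcfs]
[14127.193448] st_timer        D    0 14636      2 0x80000080
[14127.194413] Call Trace:
[14127.195555]  schedule+0x28/0x70
[14127.196142]  schedule_timeout+0x16b/0x390
[14127.198496]  stt_timer_main+0x215/0x230 [lnet_selftest]
[14034.774700] st_timer        D ffff9cb15b62a080     0 28114      2 0x00000080
[14034.776068] Call Trace:
[14034.776493]  [<ffffffffb0f6af19>] schedule+0x29/0x70
[14034.778391]  [<ffffffffb08cfeb4>] ? __wake_up+0x44/0x50
[14034.781474]  [<ffffffffc1542a98>] stt_timer_main+0x168/0x220 [lnet_selftest]
"""

if __name__ == "__main__":
    for task in uninterruptible(parse_task_dump(SAMPLE)):
        print(task.pid, task.name, " <- ".join(task.frames))
```

The same functions can be pointed at a full console log by reading the file into a string first. Header layouts other than the two shown here would need the patterns adjusted.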
      

      lnet-selftest ran and failed (LU-10073) before sanity ran. It’s not clear whether lnet-selftest is a cause of this test hang.

      We’ve seen this test hang twice in RHEL 8.1 testing, both times in December:
      https://testing.whamcloud.com/test_sets/293b5216-1b13-11ea-a9d7-52540065bddc
      https://testing.whamcloud.com/test_sets/133daa46-1b8a-11ea-b1e8-52540065bddc

      In addition, we've seen this once in the past 3 months in PPC testing for a patch for LU-11997 at https://testing.whamcloud.com/test_sets/b4851392-f175-11e9-b62b-52540065bddc .

            People

              dongyang Dongyang Li
              jamesanunez James Nunez (Inactive)