Lustre / LU-18829

sanity-sec test_27ab: timeout

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • 3

    Description

      This issue was created by maloo for emoly <emoly@whamcloud.com>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/08f63f4f-5ac0-4fae-9b9a-f63dfe10d72a

      test_27ab failed with the following error:

      [21815.303862] LNetError: Unexpected error -2 connecting to 10.240.23.49@tcp at host 10.240.23.49:7988
      [21815.305786] LNetError: Skipped 1 previous similar message
      [21818.375690] nfs: server 10.240.16.204 not responding, timed out
      [21818.375742] nfs: server 10.240.16.204 not responding, still trying
      [21818.377043] nfs: server 10.240.16.204 not responding, still trying
      ...
      [25124.265111] nfs: server 10.240.16.204 not responding, timed out
      [25133.480875] nfs: server 10.240.16.204 not responding, timed out
      ...
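
      To gauge how long the NFS server stayed unresponsive in the excerpt above, the timestamps can be compared with a short script. This is a minimal sketch that covers only the lines quoted in this ticket; the elided lines are not reconstructed, and the names `LOG`, `PATTERN`, and `nfs_stall_windows` are hypothetical helpers, not part of Lustre or Maloo.

```python
import re

# Lines copied from the console excerpt in this ticket.
# Timestamps are kernel uptime in seconds; elided lines ("...") are omitted.
LOG = """\
[21818.375690] nfs: server 10.240.16.204 not responding, timed out
[21818.375742] nfs: server 10.240.16.204 not responding, still trying
[21818.377043] nfs: server 10.240.16.204 not responding, still trying
[25124.265111] nfs: server 10.240.16.204 not responding, timed out
[25133.480875] nfs: server 10.240.16.204 not responding, timed out
"""

# Hypothetical pattern for the "not responding" messages shown above.
PATTERN = re.compile(r"^\[\s*(\d+\.\d+)\] nfs: server (\S+) not responding")


def nfs_stall_windows(text):
    """Return {server: (first_ts, last_ts)} for 'not responding' messages."""
    windows = {}
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if not m:
            continue
        ts, server = float(m.group(1)), m.group(2)
        # Keep the first timestamp seen for this server, update the last one.
        first, _ = windows.get(server, (ts, ts))
        windows[server] = (first, ts)
    return windows


if __name__ == "__main__":
    for server, (first, last) in nfs_stall_windows(LOG).items():
        print(f"{server}: {last - first:.0f} s between first and last message")
```

      For the quoted lines, the first and last messages are roughly 3315 seconds apart (about 55 minutes). This is only a lower bound on the stall, since the excerpt is truncated at both ends.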
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/111814 - 5.15.0-94-generic
      servers: https://build.whamcloud.com/job/lustre-reviews/111814 - 4.18.0-553.40.1.el8_lustre.x86_64

      Another timeout case showed the following call trace (https://testing.whamcloud.com/test_logs/f5c5ffa4-f1d4-49a6-a3ae-322f7a78884f/show_text):

      [21749.376971] INFO: task tee:900701 blocked for more than 120 seconds.
      [21749.380450]       Tainted: G           OE     5.15.0-94-generic #104-Ubuntu
      [21749.381879] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [21749.383394] task:tee             state:D stack:    0 pid:900701 ppid:  3172 flags:0x00000002
      [21749.385045] Call Trace:
      [21749.385622]  <TASK>
      [21749.386143]  __schedule+0x24e/0x590
      [21749.386928]  schedule+0x69/0x110
      [21749.387625]  io_schedule+0x46/0x80
      [21749.388341]  wait_on_page_bit_common+0x10c/0x3d0
      [21749.389319]  ? filemap_invalidate_unlock_two+0x50/0x50
      [21749.390346]  wait_on_page_bit+0x3f/0x50
      [21749.391139]  wait_on_page_writeback+0x26/0x80
      [21749.392030]  wait_for_stable_page+0x32/0x40
      [21749.392900]  grab_cache_page_write_begin+0x31/0x40
      [21749.393880]  nfs_write_begin+0x61/0x300 [nfs]
      [21749.394871]  generic_perform_write+0xc9/0x200
      [21749.395754]  ? sched_clock+0x9/0x10
      [21749.396505]  ? __cond_resched+0x1a/0x50
      [21749.397316]  nfs_file_write+0x1a7/0x2c0 [nfs]
      [21749.398227]  new_sync_write+0x114/0x1a0
      [21749.399047]  vfs_write+0x1d5/0x270
      [21749.399770]  ksys_write+0x67/0xf0
      [21749.400466]  __x64_sys_write+0x19/0x20
      [21749.401269]  do_syscall_64+0x5c/0xc0
      [21749.402041]  ? syscall_exit_to_user_mode+0x35/0x50
      [21749.402995]  ? __x64_sys_read+0x19/0x20
      [21749.403815]  ? do_syscall_64+0x69/0xc0
      [21749.404594]  ? do_syscall_64+0x69/0xc0
      [21749.405380]  ? do_syscall_64+0x69/0xc0
      [21749.406251]  ? do_syscall_64+0x69/0xc0
      [21749.407104]  entry_SYSCALL_64_after_hwframe+0x62/0xcc
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-sec test_27ab - Timeout occurred after 450 minutes, last suite running was sanity-sec

            People

              wc-triage WC Triage
              maloo Maloo