Lustre / LU-10415

parallel-scale-nfsv3 test_racer_on_nfs: Timeout occurred after 335 mins, last suite running was parallel-scale-nfsv3, restarting cluster to continue tests

Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • None
    • Lustre 2.11.0
    • onyx, full DNE
      servers: el7.4, ldiskfs, branch master, v2.10.56, b3678
      clients: el7.4, branch master, v2.10.56, b3678
    • 3

    Description

      session: https://testing.hpdd.intel.com/test_sessions/45ec3e40-419a-47db-95d1-7dbe1c6a0b66
      test set: https://testing.hpdd.intel.com/test_sets/bd99b32a-e0a6-11e7-9c63-52540065bddc

      After parallel-scale-nfsv3 times out, the console log shows 10 traces. The tops of the dd and ln traces look the same:

      [17022.391464] nfs: server onyx-30vm4 not responding, still trying
      [17040.277998] INFO: task dd:9525 blocked for more than 120 seconds.
      [17040.278758] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [17040.279567] dd              D ffffffff816a76b0     0  9525   3854 0x00000080
      [17040.280435]  ffff88006b797bd0 0000000000000082 ffff88004ef3bf40 ffff88006b797fd8
      [17040.281271]  ffff88006b797fd8 ffff88006b797fd8 ffff88004ef3bf40 ffff88007fd16cc0
      [17040.282108]  0000000000000000 7fffffffffffffff ffff88007ff682e8 ffffffff816a76b0
      [17040.282991] Call Trace:
      [17040.283302]  [<ffffffff816a76b0>] ? bit_wait+0x50/0x50
      [17040.283823]  [<ffffffff816a9589>] schedule+0x29/0x70
      [17040.284335]  [<ffffffff816a7099>] schedule_timeout+0x239/0x2c0
      [17040.285022]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
      [17040.285681]  [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
      [17040.286276]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
      [17040.286954]  [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
      [17040.287534]  [<ffffffff816a76b0>] ? bit_wait+0x50/0x50
      [17040.288113]  [<ffffffff816a8c0d>] io_schedule_timeout+0xad/0x130
      [17040.288743]  [<ffffffff816a8ca8>] io_schedule+0x18/0x20
      [17040.289291]  [<ffffffff816a76c1>] bit_wait_io+0x11/0x50
      [17040.289862]  [<ffffffff816a71e5>] __wait_on_bit+0x65/0x90
      [17040.290411]  [<ffffffff81181cc1>] wait_on_page_bit+0x81/0xa0
      [17040.291041]  [<ffffffff810b19e0>] ? wake_bit_function+0x40/0x40
      [17040.291656]  [<ffffffff81181df1>] __filemap_fdatawait_range+0x111/0x190
      [17040.292335]  [<ffffffff81181e84>] filemap_fdatawait_range+0x14/0x30
      [17040.293040]  [<ffffffff81183dc6>] filemap_write_and_wait_range+0x56/0x90
      [17040.293751]  [<ffffffffc04e3516>] nfs_file_fsync+0x86/0x110 [nfs]
      [17040.294387]  [<ffffffff812333cb>] vfs_fsync+0x2b/0x40
      [17040.294982]  [<ffffffffc04e3956>] nfs_file_flush+0x46/0x60 [nfs]
      [17040.295583]  [<ffffffff811fe294>] filp_close+0x34/0x80
      [17040.296148]  [<ffffffff81220388>] __close_fd+0x78/0xa0
      [17040.296709]  [<ffffffff811ffd03>] SyS_close+0x23/0x50
      [17040.297241]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      

      and

      [17040.339627] ln              D ffffffff816a76b0     0  9837   3840 0x00000080
      [17040.340461]  ffff880046a7bb20 0000000000000086 ffff88007b6eeeb0 ffff880046a7bfd8
      [17040.341300]  ffff880046a7bfd8 ffff880046a7bfd8 ffff88007b6eeeb0 ffff88007fd16cc0
      [17040.342135]  0000000000000000 7fffffffffffffff ffff88007ff682e8 ffffffff816a76b0
      [17040.343018] Call Trace:
      [17040.343304]  [<ffffffff816a76b0>] ? bit_wait+0x50/0x50
      [17040.343836]  [<ffffffff816a9589>] schedule+0x29/0x70
      [17040.344344]  [<ffffffff816a7099>] schedule_timeout+0x239/0x2c0
      [17040.345048]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
      [17040.345696]  [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
      [17040.346285]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
      [17040.346973]  [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
      [17040.347563]  [<ffffffff816a76b0>] ? bit_wait+0x50/0x50
      [17040.348104]  [<ffffffff816a8c0d>] io_schedule_timeout+0xad/0x130
      [17040.348751]  [<ffffffff816a8ca8>] io_schedule+0x18/0x20
      [17040.349299]  [<ffffffff816a76c1>] bit_wait_io+0x11/0x50
      [17040.349880]  [<ffffffff816a71e5>] __wait_on_bit+0x65/0x90
      [17040.350431]  [<ffffffff81181cc1>] wait_on_page_bit+0x81/0xa0
      [17040.351069]  [<ffffffff810b19e0>] ? wake_bit_function+0x40/0x40
      [17040.351685]  [<ffffffff81181df1>] __filemap_fdatawait_range+0x111/0x190
      [17040.352376]  [<ffffffff81181e84>] filemap_fdatawait_range+0x14/0x30
      [17040.353073]  [<ffffffff81181ec7>] filemap_fdatawait+0x27/0x30
      [17040.353681]  [<ffffffff81183cfc>] filemap_write_and_wait+0x4c/0x80
      [17040.354327]  [<ffffffffc04f4910>] nfs_wb_all+0x20/0x100 [nfs]
      [17040.354982]  [<ffffffffc04e7b7b>] nfs_getattr+0x1bb/0x250 [nfs]
      [17040.355571]  [<ffffffff812062c6>] vfs_getattr+0x46/0x80
      [17040.356112]  [<ffffffff812063f5>] vfs_fstatat+0x75/0xc0
      [17040.356705]  [<ffffffff8120694e>] SYSC_newstat+0x2e/0x60
      [17040.357262]  [<ffffffff816b0456>] ? trace_do_page_fault+0x56/0x150
      [17040.357940]  [<ffffffff816afaea>] ? do_async_page_fault+0x1a/0xd0
      [17040.358556]  [<ffffffff816ac5f8>] ? async_page_fault+0x28/0x30
      [17040.359218]  [<ffffffff81206c2e>] SyS_newstat+0xe/0x10
      [17040.359763]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
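
The observation that the tops of the dd and ln traces match can be checked mechanically. The following is a minimal sketch, not part of the test framework: `FRAME_RE`, `frames`, and `common_top` are hypothetical helper names, and the two excerpts are shortened copies of the traces above. It skips frames marked with `?` (which the kernel could not verify) unless asked to keep them.

```python
import re

# Matches frames such as " [<ffffffff816a9589>] schedule+0x29/0x70" and
# " [<ffffffff816a76b0>] ? bit_wait+0x50/0x50". Group 1 is the "?" marker
# on unreliable frames, group 2 is the symbol name.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x")


def frames(trace_text, include_unreliable=False):
    """Return symbol names from a call trace, top of stack first."""
    names = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and (include_unreliable or not m.group(1)):
            names.append(m.group(2))
    return names


def common_top(a, b):
    """Return the longest shared prefix of two frame lists."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


# Shortened excerpts of the dd and ln traces from the console log above.
DD_EXCERPT = """
[17040.290411]  [<ffffffff81181cc1>] wait_on_page_bit+0x81/0xa0
[17040.291041]  [<ffffffff810b19e0>] ? wake_bit_function+0x40/0x40
[17040.291656]  [<ffffffff81181df1>] __filemap_fdatawait_range+0x111/0x190
[17040.292335]  [<ffffffff81181e84>] filemap_fdatawait_range+0x14/0x30
[17040.293040]  [<ffffffff81183dc6>] filemap_write_and_wait_range+0x56/0x90
[17040.293751]  [<ffffffffc04e3516>] nfs_file_fsync+0x86/0x110 [nfs]
"""

LN_EXCERPT = """
[17040.350431]  [<ffffffff81181cc1>] wait_on_page_bit+0x81/0xa0
[17040.351069]  [<ffffffff810b19e0>] ? wake_bit_function+0x40/0x40
[17040.351685]  [<ffffffff81181df1>] __filemap_fdatawait_range+0x111/0x190
[17040.352376]  [<ffffffff81181e84>] filemap_fdatawait_range+0x14/0x30
[17040.353073]  [<ffffffff81181ec7>] filemap_fdatawait+0x27/0x30
[17040.353681]  [<ffffffff81183cfc>] filemap_write_and_wait+0x4c/0x80
[17040.354327]  [<ffffffffc04f4910>] nfs_wb_all+0x20/0x100 [nfs]
"""

if __name__ == "__main__":
    for name in common_top(frames(DD_EXCERPT), frames(LN_EXCERPT)):
        print(name)
```

On these excerpts the shared frames are `wait_on_page_bit`, `__filemap_fdatawait_range`, and `filemap_fdatawait_range`. The traces then diverge: dd continues into `filemap_write_and_wait_range` from `nfs_file_fsync`, and ln continues into `filemap_fdatawait` from `nfs_wb_all`. Both tasks are waiting for NFS page writeback to complete.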
      

            People

              wc-triage WC Triage
              jcasper James Casper