Lustre / LU-10415

parallel-scale-nfsv3 test_racer_on_nfs: Timeout occurred after 335 mins, last suite running was parallel-scale-nfsv3, restarting cluster to continue tests

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0
    • Labels: None
    • Environment: onyx, full DNE
      servers: el7.4, ldiskfs, branch master, v2.10.56, b3678
      clients: el7.4, branch master, v2.10.56, b3678
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      session: https://testing.hpdd.intel.com/test_sessions/45ec3e40-419a-47db-95d1-7dbe1c6a0b66
      test set: https://testing.hpdd.intel.com/test_sets/bd99b32a-e0a6-11e7-9c63-52540065bddc

      There are 10 traces after parallel-scale-nfsv3 times out, and the tops of the dd and ln traces look the same:

      From console log:

      [17022.391464] nfs: server onyx-30vm4 not responding, still trying
      [17040.277998] INFO: task dd:9525 blocked for more than 120 seconds.
      [17040.278758] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [17040.279567] dd              D ffffffff816a76b0     0  9525   3854 0x00000080
      [17040.280435]  ffff88006b797bd0 0000000000000082 ffff88004ef3bf40 ffff88006b797fd8
      [17040.281271]  ffff88006b797fd8 ffff88006b797fd8 ffff88004ef3bf40 ffff88007fd16cc0
      [17040.282108]  0000000000000000 7fffffffffffffff ffff88007ff682e8 ffffffff816a76b0
      [17040.282991] Call Trace:
      [17040.283302]  [<ffffffff816a76b0>] ? bit_wait+0x50/0x50
      [17040.283823]  [<ffffffff816a9589>] schedule+0x29/0x70
      [17040.284335]  [<ffffffff816a7099>] schedule_timeout+0x239/0x2c0
      [17040.285022]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
      [17040.285681]  [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
      [17040.286276]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
      [17040.286954]  [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
      [17040.287534]  [<ffffffff816a76b0>] ? bit_wait+0x50/0x50
      [17040.288113]  [<ffffffff816a8c0d>] io_schedule_timeout+0xad/0x130
      [17040.288743]  [<ffffffff816a8ca8>] io_schedule+0x18/0x20
      [17040.289291]  [<ffffffff816a76c1>] bit_wait_io+0x11/0x50
      [17040.289862]  [<ffffffff816a71e5>] __wait_on_bit+0x65/0x90
      [17040.290411]  [<ffffffff81181cc1>] wait_on_page_bit+0x81/0xa0
      [17040.291041]  [<ffffffff810b19e0>] ? wake_bit_function+0x40/0x40
      [17040.291656]  [<ffffffff81181df1>] __filemap_fdatawait_range+0x111/0x190
      [17040.292335]  [<ffffffff81181e84>] filemap_fdatawait_range+0x14/0x30
      [17040.293040]  [<ffffffff81183dc6>] filemap_write_and_wait_range+0x56/0x90
      [17040.293751]  [<ffffffffc04e3516>] nfs_file_fsync+0x86/0x110 [nfs]
      [17040.294387]  [<ffffffff812333cb>] vfs_fsync+0x2b/0x40
      [17040.294982]  [<ffffffffc04e3956>] nfs_file_flush+0x46/0x60 [nfs]
      [17040.295583]  [<ffffffff811fe294>] filp_close+0x34/0x80
      [17040.296148]  [<ffffffff81220388>] __close_fd+0x78/0xa0
      [17040.296709]  [<ffffffff811ffd03>] SyS_close+0x23/0x50
      [17040.297241]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      

      and

      [17040.339627] ln              D ffffffff816a76b0     0  9837   3840 0x00000080
      [17040.340461]  ffff880046a7bb20 0000000000000086 ffff88007b6eeeb0 ffff880046a7bfd8
      [17040.341300]  ffff880046a7bfd8 ffff880046a7bfd8 ffff88007b6eeeb0 ffff88007fd16cc0
      [17040.342135]  0000000000000000 7fffffffffffffff ffff88007ff682e8 ffffffff816a76b0
      [17040.343018] Call Trace:
      [17040.343304]  [<ffffffff816a76b0>] ? bit_wait+0x50/0x50
      [17040.343836]  [<ffffffff816a9589>] schedule+0x29/0x70
      [17040.344344]  [<ffffffff816a7099>] schedule_timeout+0x239/0x2c0
      [17040.345048]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
      [17040.345696]  [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
      [17040.346285]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
      [17040.346973]  [<ffffffff810e93ac>] ? ktime_get_ts64+0x4c/0xf0
      [17040.347563]  [<ffffffff816a76b0>] ? bit_wait+0x50/0x50
      [17040.348104]  [<ffffffff816a8c0d>] io_schedule_timeout+0xad/0x130
      [17040.348751]  [<ffffffff816a8ca8>] io_schedule+0x18/0x20
      [17040.349299]  [<ffffffff816a76c1>] bit_wait_io+0x11/0x50
      [17040.349880]  [<ffffffff816a71e5>] __wait_on_bit+0x65/0x90
      [17040.350431]  [<ffffffff81181cc1>] wait_on_page_bit+0x81/0xa0
      [17040.351069]  [<ffffffff810b19e0>] ? wake_bit_function+0x40/0x40
      [17040.351685]  [<ffffffff81181df1>] __filemap_fdatawait_range+0x111/0x190
      [17040.352376]  [<ffffffff81181e84>] filemap_fdatawait_range+0x14/0x30
      [17040.353073]  [<ffffffff81181ec7>] filemap_fdatawait+0x27/0x30
      [17040.353681]  [<ffffffff81183cfc>] filemap_write_and_wait+0x4c/0x80
      [17040.354327]  [<ffffffffc04f4910>] nfs_wb_all+0x20/0x100 [nfs]
      [17040.354982]  [<ffffffffc04e7b7b>] nfs_getattr+0x1bb/0x250 [nfs]
      [17040.355571]  [<ffffffff812062c6>] vfs_getattr+0x46/0x80
      [17040.356112]  [<ffffffff812063f5>] vfs_fstatat+0x75/0xc0
      [17040.356705]  [<ffffffff8120694e>] SYSC_newstat+0x2e/0x60
      [17040.357262]  [<ffffffff816b0456>] ? trace_do_page_fault+0x56/0x150
      [17040.357940]  [<ffffffff816afaea>] ? do_async_page_fault+0x1a/0xd0
      [17040.358556]  [<ffffffff816ac5f8>] ? async_page_fault+0x28/0x30
      [17040.359218]  [<ffffffff81206c2e>] SyS_newstat+0xe/0x10
      [17040.359763]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
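
The comparison of the dd and ln traces can be reproduced mechanically. The following is a minimal sketch, not part of the original report: it assumes the console log uses the hung-task dump layout shown above, and the names `parse_traces` and `common_top_frames` are hypothetical helpers introduced only for illustration. Frames marked with `?` are unreliable stack entries and are skipped by default.

```python
import re

# Task header, e.g.
# "[17040.279567] dd   D ffffffff816a76b0   0  9525   3854 0x00000080"
TASK_RE = re.compile(
    r"^\s*\[\s*\d+\.\d+\]\s+(\S+)\s+D\s+[0-9a-f]{16}"
    r"\s+\d+\s+(\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)
# Stack frame, e.g.
# "[17040.283823]  [<ffffffff816a9589>] schedule+0x29/0x70"
FRAME_RE = re.compile(
    r"^\s*\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_traces(log_text, keep_unreliable=False):
    """Map "comm:pid" to the ordered list of function names in its trace."""
    traces = {}
    current = None
    for line in log_text.splitlines():
        task = TASK_RE.match(line)
        if task:
            current = f"{task.group(1)}:{task.group(2)}"
            traces[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            unreliable = frame.group(1) is not None  # frame printed with "?"
            if keep_unreliable or not unreliable:
                traces[current].append(frame.group(2))
    return traces


def common_top_frames(a, b):
    """Return the frames two traces share, counted from the top of the stack."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared
```

Applied to the two traces above with unreliable frames skipped, the shared frames run from `schedule` through `filemap_fdatawait_range`. The traces then diverge: dd continues into `filemap_write_and_wait_range` (reached from `nfs_file_fsync` on close), while ln continues into `filemap_fdatawait` (reached from `nfs_wb_all` during `nfs_getattr`).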
      

          Activity

            Andreas Dilger (adilger) added a comment: This is as much an NFS issue as it might be Lustre, so we do not plan to test or debug NFSv3 racer issues at this point.

            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Casper (jcasper, Inactive)