Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10481

parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client: Hung

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.11.0
    • None
    • Data on MDT enabled
      2.10.56 both clients and servers
      2 OSTs 1 mdt
      2 clients
      Onyx
    • 3
    • 9223372036854775807

    Description

      Result at - https://testing.hpdd.intel.com/test_sets/c7e2bd8c-fd56-11e7-a6ad-52540065bddc

      == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 21:38:30 (1515533910)
      [357743.218692] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 21:38:30 (1515533910)
      Running /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
      ./dir_remote.sh: line 11: RANDOM % MDTCOUNT: division by 0 (error token is "MDTCOUNT")
      ./dir_remote.sh: line 11: RANDOM % MDTCOUNT: division by 0 (error token is "MDTCOUNT")
      ./dir_migrate.sh: line 11: RANDOM % MDTCOUNT: division by 0 (error token is "MDTCOUNT")
      ./dir_migrate.sh: line 11: RANDOM % MDTCOUNT: division by 0 (error token is "MDTCOUNT")
      ./dir_migrate.sh: line 11: RANDOM % MDTCOUNT: division by 0 (error token is "MDTCOUNT")
      ./dir_remote.sh: line 11: RANDOM % MDTCOUNT: division by 0 (error token is "MDTCOUNT")
      kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
      kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
      kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
      kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
      kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
      kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
        Trace dump:
        = ./file_create.sh:10:main()
      parallel-scale-nfsv3: FAIL: test-framework exiting on error
        Trace dump:
        = ./file_create.sh:10:main()
      parallel-scale-nfsv3: FAIL: test-framework exiting on error
        Trace dump:
        = ./file_create.sh:10:main()
      parallel-scale-nfsv3: FAIL: test-framework exiting on error
      [357960.620111] INFO: task chown:10956 blocked for more than 120 seconds.
      [357960.626483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [357960.632950] chown           D ffff88000e112750     0 10956  21869 0x00000080
      [357960.635468]  ffff880018727df0 0000000000000086 ffff88000f5d0fd0 ffff880018727fd8
      [357960.637915]  ffff880018727fd8 ffff880018727fd8 ffff88000f5d0fd0 ffff88000e112748
      [357960.640417]  ffff88000e11274c ffff88000f5d0fd0 00000000ffffffff ffff88000e112750
      [357960.642902] Call Trace:
      [357960.644892]  [<ffffffff816aa4b9>] schedule_preempt_disabled+0x29/0x70
      [357960.647213]  [<ffffffff816a83e7>] __mutex_lock_slowpath+0xc7/0x1d0
      [357960.649494]  [<ffffffff81211aad>] ? putname+0x3d/0x60
      [357960.651733]  [<ffffffff816a77ff>] mutex_lock+0x1f/0x2f
      [357960.653938]  [<ffffffff811fdffb>] chown_common+0xeb/0x1d0
      [357960.656153]  [<ffffffff816b0091>] ? __do_page_fault+0x171/0x450
      [357960.658395]  [<ffffffff811ff5ef>] SyS_fchownat+0xcf/0x120
      [357960.660597]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      [357960.662816] INFO: task setfattr:10965 blocked for more than 120 seconds.
      [357960.665076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [357960.667408] setfattr        D ffff88000e115f20     0 10965  21855 0x00000080
      [357960.669690]  ffff88003637fd50 0000000000000086 ffff88006ff15ee0 ffff88003637ffd8
      [357960.672058]  ffff88003637ffd8 ffff88003637ffd8 ffff88006ff15ee0 ffff88000e115f18
      [357960.674383]  ffff88000e115f1c ffff88006ff15ee0 00000000ffffffff ffff88000e115f20
      [357960.676681] Call Trace:
      [357960.678530]  [<ffffffff816aa4b9>] schedule_preempt_disabled+0x29/0x70
      [357960.680677]  [<ffffffff816a83e7>] __mutex_lock_slowpath+0xc7/0x1d0
      [357960.682778]  [<ffffffff816a77ff>] mutex_lock+0x1f/0x2f
      [357960.684763]  [<ffffffff81227a0f>] vfs_removexattr+0x5f/0x130
      [357960.686749]  [<ffffffff81227b35>] removexattr+0x55/0x80
      [357960.688670]  [<ffffffff81211aad>] ? putname+0x3d/0x60
      [357960.690559]  [<ffffffff81212cc2>] ? user_path_at_empty+0x72/0xc0
      [357960.692497]  [<ffffffff81203a58>] ? __sb_start_write+0x58/0x110
      [357960.694401]  [<ffffffff816b0091>] ? __do_page_fault+0x171/0x450
      [357960.696284]  [<ffffffff8122292c>] ? mnt_want_write+0x2c/0x50
      [357960.698151]  [<ffffffff81228924>] SyS_removexattr+0x94/0xd0
      [357960.699982]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      [357960.701822] INFO: task mv:10981 blocked for more than 120 seconds.
      [357960.703658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [357960.705615] mv              D ffff88000e112750     0 10981  21845 0x00000080
      [357960.707528]  ffff880039c03c20 0000000000000082 ffff88001456af70 ffff880039c03fd8
      [357960.709465]  ffff880039c03fd8 ffff880039c03fd8 ffff88001456af70 ffff88000e112748
      [357960.711361]  ffff88000e11274c ffff88001456af70 00000000ffffffff ffff88000e112750
      [357960.713237] Call Trace:
      [357960.714640]  [<ffffffff816aa4b9>] schedule_preempt_disabled+0x29/0x70
      [357960.716340]  [<ffffffff816a83e7>] __mutex_lock_slowpath+0xc7/0x1d0
      [357960.717978]  [<ffffffff816a77ff>] mutex_lock+0x1f/0x2f
      [357960.719522]  [<ffffffff816a13e3>] lookup_slow+0x33/0xa7
      [357960.721017]  [<ffffffff8120f0db>] path_lookupat+0x77b/0x7b0
      [357960.722522]  [<ffffffff811df60d>] ? kmem_cache_alloc+0x16d/0x1e0
      [357960.724031]  [<ffffffff81211b1f>] ? getname_flags+0x4f/0x1a0
      [357960.725493]  [<ffffffff8120f13b>] filename_lookup+0x2b/0xc0
      [357960.726928]  [<ffffffff81212cb7>] user_path_at_empty+0x67/0xc0
      [357960.728375]  [<ffffffff81114b22>] ? from_kgid_munged+0x12/0x20
      [357960.729801]  [<ffffffff8120676f>] ? cp_new_stat+0x14f/0x180
      [357960.731242]  [<ffffffff81212d21>] user_path_at+0x11/0x20
      [357960.732635]  [<ffffffff81206263>] vfs_fstatat+0x63/0xc0
      [357960.734032]  [<ffffffff81206831>] SYSC_newlstat+0x31/0x60
      [357960.735403]  [<ffffffff8111f5d6>] ? __audit_syscall_exit+0x1e6/0x280
      [357960.736846]  [<ffffffff81206abe>] SyS_newlstat+0xe/0x10
      [357960.738208]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      [357960.739627] INFO: task ln:10982 blocked for more than 120 seconds.
      [357960.741074] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [357960.742661] ln              D ffff88005dd73398     0 10982  21846 0x00000080
      [357960.744356]  ffff880078c7bd70 0000000000000082 ffff880014568fd0 ffff880078c7bfd8
      [357960.746111]  ffff880078c7bfd8 ffff880078c7bfd8 ffff880014568fd0 ffff88005dd73390
      [357960.747734]  ffff88005dd73394 ffff880014568fd0 00000000ffffffff ffff88005dd73398
      [357960.749349] Call Trace:
      [357960.750511]  [<ffffffff816aa4b9>] schedule_preempt_disabled+0x29/0x70
      [357960.751983]  [<ffffffff816a83e7>] __mutex_lock_slowpath+0xc7/0x1d0
      [357960.753668]  [<ffffffff816a77ff>] mutex_lock+0x1f/0x2f
      [357960.755112]  [<ffffffff8120f255>] filename_create+0x85/0x180
      [357960.756515]  [<ffffffff811df60d>] ? kmem_cache_alloc+0x16d/0x1e0
      [357960.757937]  [<ffffffff81211b1f>] ? getname_flags+0x4f/0x1a0
      [357960.759339]  [<ffffffff81211b94>] ? getname_flags+0xc4/0x1a0
      [357960.760717]  [<ffffffff81211e41>] user_path_create+0x41/0x60
      [357960.762100]  [<ffffffff8121350e>] SyS_linkat+0xae/0x320
      [357960.763455]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      [357960.764866] INFO: task mkdir:10986 blocked for more than 120 seconds.
      [357960.766323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [357960.767896] mkdir           D ffff88005dd73398     0 10986  21843 0x00000080
      [357960.769474]  ffff880013bfbd90 0000000000000082 ffff880006408000 ffff880013bfbfd8
      [357960.771124]  ffff880013bfbfd8 ffff880013bfbfd8 ffff880006408000 ffff88005dd73390
      [357960.772745]  ffff88005dd73394 ffff880006408000 00000000ffffffff ffff88005dd73398
      [357960.774391] Call Trace:
      [357960.775583]  [<ffffffff816aa4b9>] schedule_preempt_disabled+0x29/0x70
      [357960.777084]  [<ffffffff816a83e7>] __mutex_lock_slowpath+0xc7/0x1d0
      [357960.778551]  [<ffffffff816a77ff>] mutex_lock+0x1f/0x2f
      [357960.779933]  [<ffffffff8120f255>] filename_create+0x85/0x180
      [357960.781387]  [<ffffffff81211b1f>] ? getname_flags+0x4f/0x1a0
      [357960.782982]  [<ffffffff81211b94>] ? getname_flags+0xc4/0x1a0
      [357960.784401]  [<ffffffff81211e41>] user_path_create+0x41/0x60
      [357960.785803]  [<ffffffff812131d8>] SyS_mkdirat+0x48/0x100
      [357960.787165]  [<ffffffff812132a9>] SyS_mkdir+0x19/0x20
      [357960.788503]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      [357960.789920] INFO: task mkdir:10988 blocked for more than 120 seconds.
      [357960.791370] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [357960.792928] mkdir           D ffff88005dd73398     0 10988  21861 0x00000080
      [357960.794496]  ffff8800094bbd90 0000000000000082 ffff88006ff16eb0 ffff8800094bbfd8
      [357960.796144]  ffff8800094bbfd8 ffff8800094bbfd8 ffff88006ff16eb0 ffff88005dd73390
      [357960.797790]  ffff88005dd73394 ffff88006ff16eb0 00000000ffffffff ffff88005dd73398
      [357960.799440] Call Trace:
      [357960.800624]  [<ffffffff816aa4b9>] schedule_preempt_disabled+0x29/0x70
      [357960.802134]  [<ffffffff816a83e7>] __mutex_lock_slowpath+0xc7/0x1d0
      [357960.803629]  [<ffffffff816a77ff>] mutex_lock+0x1f/0x2f
      [357960.805043]  [<ffffffff8120f255>] filename_create+0x85/0x180
      [357960.806486]  [<ffffffff81211b1f>] ? getname_flags+0x4f/0x1a0
      [357960.807928]  [<ffffffff81211b94>] ? getname_flags+0xc4/0x1a0
      [357960.809375]  [<ffffffff81211e41>] user_path_create+0x41/0x60
      [357960.810860]  [<ffffffff812131d8>] SyS_mkdirat+0x48/0x100
      [357960.812242]  [<ffffffff812132a9>] SyS_mkdir+0x19/0x20
      [357960.813581]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      [357985.696288] nfs: server onyx-73 not responding, still trying
      [357985.700798] nfs: server onyx-73 not responding, still trying
      [357997.728270] nfs: server onyx-73 not responding, still trying
      [357997.732791] nfs: server onyx-73 not responding, still trying
      [358021.792299] nfs: server onyx-73 not responding, still trying
      [358021.796371] nfs: server onyx-73 not responding, still trying
      racer cleanup
      ./file_exec.sh: line 1: kill: (21927) - No such process
      ./file_exec.sh: line 1: kill: (21934) - No such process
      ./file_exec.sh: line 1: kill: (21914) - No such process
      sleeping 5 sec ...
      file_create.sh: no process found
      dir_create.sh: no process found
      file_rm.sh: no process found
      file_rename.sh: no process found
      file_link.sh: no process found
      file_symlink.sh: no process found
      file_list.sh: no process found
      file_concat.sh: no process found
      file_exec.sh: no process found
      file_chown.sh: no process found
      file_chmod.sh: no process found
      file_mknod.sh: no process found
      file_truncate.sh: no process found
      file_delxattr.sh: no process found
      file_getxattr.sh: no process found
      file_setxattr.sh: no process found
      dir_remote.sh: no process found
      dir_migrate.sh: no process found
      there should be NO racer processes:
      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      [358069.984316] nfs: server onyx-73 not responding, still trying
      [358250.464082] nfs: server onyx-73 not responding, still trying
      [358250.469403] nfs: server onyx-73 not responding, still trying

      Attachments

        Issue Links

          Activity

            People

              standan Saurabh Tandan (Inactive)
              standan Saurabh Tandan (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: