Lustre / LU-16163

parallel-scale-nfsv3 test racer_on_nfs hangs with ‘general protection fault’ in nfs3_proc_setacls()

Details

    • Type: Bug
    • Resolution: Low Priority
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.4
    • Affects Version/s: None
    • Labels: None
    • Environment: SLES15 SP2 clients
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      parallel-scale-nfsv3 test_racer_on_nfs hangs with ‘general protection fault’ on the client. We’ve only seen this issue once at https://testing.whamcloud.com/test_sets/1ca79db5-dcb8-457d-8d82-540881b78cb7.

      In the suite_log, the last output written for this test before the hang is:

      == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 23:48:28 (1633304908)
      CMD: trevis-219vm16,trevis-219vm17 MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
      

      There is no output from test racer_on_nfs in the client consoles, and little in the MDS console:

      [28519.704689] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 23:48:28 (1633304908)
      [28599.757782] LustreError: 1448:0:(llite_nfs.c:343:ll_dir_get_parent_fid()) lustre: failure inode [0x200026562:0x398c:0x0] get parent: rc = -2
      [28653.246054] reconnect_path: npd != pd
      [28724.302719] LustreError: 1451:0:(llite_nfs.c:343:ll_dir_get_parent_fid()) lustre: failure inode [0x200026562:0x4a5a:0x0] get parent: rc = -2
      [28765.943370] reconnect_path: npd != pd
      

      In the client2 (vm17) dmesg, we see:

      [Sun Oct  3 23:48:29 2021] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 23:48:28 (1633304908)
      [30115.270757] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 01:48:00 (1657417680)
      [30115.474474] Lustre: DEBUG MARKER: MDSCOUNT=4 OSTCOUNT=8 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
      [30224.905303] BUG: kernel NULL pointer dereference, address: 0000000000000028
      [30224.909544] #PF: supervisor read access in kernel mode
      [30224.910526] #PF: error_code(0x0000) - not-present page
      [30224.912000] Oops: 0000 [#1] SMP PTI
      [30224.912670] CPU: 0 PID: 11734 Comm: dd  5.3.18-59.37-default #1 SLE15-SP3
      [30224.914634] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [30224.915728] RIP: 0010:__nfs3_proc_setacls+0x28/0x370 [nfsv3]
      [30224.931896] Call Trace:
      [30224.935347]  nfs3_proc_setacls+0xa/0x20 [nfsv3]
      [30224.936212]  nfs3_proc_create+0x1bd/0x2b0 [nfsv3]
      [30224.937111]  nfs_create+0x82/0x180 [nfs]
      [30224.938945]  path_openat+0x1212/0x1520
      [30224.940487]  do_filp_open+0x9b/0x110
      [30224.942700]  do_sys_open+0x1bd/0x260
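
When comparing oops reports like this one across clients or runs, it can help to reduce each report to its fault address, faulting function, and call chain. The following is a minimal sketch, not part of the Lustre test framework: the function name `summarize_oops` and the regular expressions are illustrative assumptions, and the input is an excerpt of the dmesg lines quoted above.

```python
import re

# Excerpt of the client2 dmesg output quoted in this ticket.
OOPS = """\
[30224.905303] BUG: kernel NULL pointer dereference, address: 0000000000000028
[30224.912670] CPU: 0 PID: 11734 Comm: dd  5.3.18-59.37-default #1 SLE15-SP3
[30224.915728] RIP: 0010:__nfs3_proc_setacls+0x28/0x370 [nfsv3]
[30224.931896] Call Trace:
[30224.935347]  nfs3_proc_setacls+0xa/0x20 [nfsv3]
[30224.936212]  nfs3_proc_create+0x1bd/0x2b0 [nfsv3]
[30224.937111]  nfs_create+0x82/0x180 [nfs]
[30224.938945]  path_openat+0x1212/0x1520
[30224.940487]  do_filp_open+0x9b/0x110
[30224.942700]  do_sys_open+0x1bd/0x260
"""

# Hypothetical patterns written for the line shapes seen above.
STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")
ADDR = re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
RIP = re.compile(r"RIP: \w+:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")
FRAME = re.compile(
    r"^\s*(?:\?\s*)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def summarize_oops(text):
    """Return the fault address, faulting (symbol, module), and trace frames."""
    summary = {"fault_address": None, "rip": None, "frames": []}
    in_trace = False
    for raw in text.splitlines():
        line = STAMP.sub("", raw)  # drop the "[seconds.micros]" prefix
        m = ADDR.search(line)
        if m:
            summary["fault_address"] = int(m.group(1), 16)
            continue
        m = RIP.search(line)
        if m and summary["rip"] is None:
            summary["rip"] = (m.group(1), m.group(2))
            continue
        if line.strip() == "Call Trace:":
            in_trace = True
            continue
        if in_trace:
            m = FRAME.match(line)
            if m:
                # The module is None for symbols built into the kernel image.
                summary["frames"].append((m.group(1), m.group(2)))
            else:
                in_trace = False  # first non-frame line ends the trace
    return summary


if __name__ == "__main__":
    info = summarize_oops(OOPS)
    print("fault address:", hex(info["fault_address"]))
    print("faulting function:", info["rip"])
    for symbol, module in info["frames"]:
        print("  ", symbol, "[%s]" % module if module else "")
```

Run against the excerpt, the sketch reports the faulting function as `__nfs3_proc_setacls` in the `nfsv3` module, reached through `nfs3_proc_create` and `nfs_create`. A fault address this close to zero (0x28) is typically what a field read through a NULL structure pointer looks like, though the log alone does not say which pointer was NULL.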
      

      In the client2 journal, we see:

      Oct 03 23:48:37 trevis-219vm17 systemd[1]: Removed slice User Slice of UID 532.
      Oct 03 23:52:57 trevis-219vm17 kernel: traps: 9[22660] general protection fault ip:7fa32dbdb3cd sp:7ffe2a089838 error:0 in ld-2.26.so[7fa32dbd0000+25000]
      Oct 03 23:52:57 trevis-219vm17 kernel: traps: 1[22693] general protection fault ip:7f842a0a93cd sp:7ffd3a65a368 error:0 in ld-2.26.so[7f842a09e000+25000]
      Oct 03 23:52:57 trevis-219vm17 systemd[1]: Started Process Core Dump (PID 22942/UID 0).
      Oct 03 23:52:57 trevis-219vm17 systemd[1]: Started Process Core Dump (PID 22939/UID 0).
      Oct 03 23:52:58 trevis-219vm17 systemd-coredump[22964]: Process 22693 (1) of user 0 dumped core.
                                                              
                                                              Stack trace of thread 22693:
                                                              #0  0x00007f842a0a93cd _dl_setup_hash (/lib64/ld-2.26.so)
                                                              #1  0x00007f842a0a0ddb dl_main (/lib64/ld-2.26.so)
                                                              #2  0x00007f842a0b7010 _dl_sysdep_start (/lib64/ld-2.26.so)
                                                              #3  0x00007f842a09fdb8 _dl_start (/lib64/ld-2.26.so)
                                                              #4  0x00007f842a09eea8 _start (/lib64/ld-2.26.so)
      Oct 03 23:52:58 trevis-219vm17 systemd-coredump[22967]: Process 22660 (9) of user 0 dumped core.
                                                              
                                                              Stack trace of thread 22660:
                                                              #0  0x00007fa32dbdb3cd _dl_setup_hash (/lib64/ld-2.26.so)
                                                              #1  0x00007fa32dbd2ddb dl_main (/lib64/ld-2.26.so)
                                                              #2  0x00007fa32dbe9010 _dl_sysdep_start (/lib64/ld-2.26.so)
                                                              #3  0x00007fa32dbd1db8 _dl_start (/lib64/ld-2.26.so)
                                                              #4  0x00007fa32dbd0ea8 _start (/lib64/ld-2.26.so)
      Oct 03 23:53:34 trevis-219vm17 mrshd[11196]: pam_unix(mrsh:session): session closed for user root
      Oct 03 23:53:34 trevis-219vm17 systemd-logind[1522]: Session c19961 logged out. Waiting for processes to exit.
      Oct 03 23:53:34 trevis-219vm17 systemd-logind[1522]: Removed session c19961.
      Oct 03 23:53:45 trevis-219vm17 systemd[1]: user-runtime-dir@0.service: Unit not needed anymore. Stopping.
      

          Activity

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51282/
            Subject: LU-16163 tests: skip racer_on_nfs for NFSv3
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 3626be5686cc395ce622d281a993603dba16e3e2

            Comment added by Gerrit Updater (gerrit).

            "Alex Deiter <alex.deiter@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51282
            Subject: LU-16163 tests: skip racer_on_nfs for NFSv3
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 6ba201ca53c6aba58c397ba57ad147b4bbc3caec

            Comment added by Gerrit Updater (gerrit).

            Problem is not actually fixed, but we've stopped testing racer_on_nfs for NFSv3.

            Comment added by Andreas Dilger (adilger).

            Reopen temporarily to change resolution.

            Comment added by Andreas Dilger (adilger).

            Landed for 2.16

            Comment added by Peter Jones (pjones).

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50579/
            Subject: LU-16163 tests: skip racer_on_nfs for NFSv3
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 892d726f274c7cd4e505689ad69194ac68dc323b

            Comment added by Gerrit Updater (gerrit).

            "Alex Deiter <alex.deiter@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50579
            Subject: LU-16163 tests: skip racer_on_nfs for NFSv3
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2b92c4189827090a80daefe37f752fdbabd1b939

            Comment added by Gerrit Updater (gerrit).

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50385/
            Subject: LU-16163 tests: skip racer_on_nfs for NFSv3
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 513eb670b01f15104cbeb2909a141d2174dcc874

            Comment added by Gerrit Updater (gerrit).

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50385
            Subject: LU-16163 tests: skip racer_on_nfs for NFSv3
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: df15c8baa63d747eeb23da451f7cc50f5db98da7

            Comment added by Gerrit Updater (gerrit).

            It looks like client1 crashed in NFS:

            [28679.492102] BUG: kernel NULL pointer dereference, address: 0000000000000028
            [28679.496074] CPU: 0 PID: 32107 Comm: file_concat.sh  5.3.18-24.78-default #1 SLE15-SP2
            [28679.498997] RIP: 0010:__nfs3_proc_setacls+0x28/0x370 [nfsv3]
            [28679.514160] Call Trace:
            [28679.516928]  nfs3_proc_setacls+0xa/0x20 [nfsv3]
            [28679.517615]  nfs3_proc_create+0x1be/0x2a0 [nfsv3]
            [28679.518338]  nfs_create+0x83/0x180 [nfs]
            [28679.519577]  path_openat+0x1212/0x1520
            [28679.520161]  do_filp_open+0x9b/0x110
            [28679.521899]  do_sys_open+0x1bd/0x260
            
            Comment added by Andreas Dilger (adilger).

            People

              Assignee: Alex Deiter (Deiter)
              Reporter: James Nunez (jamesanunez) (Inactive)
              Votes: 0
              Watchers: 7
