RIP: 0010:ll_prune_negative_children+0xaf/0x260 [lustre]

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.7
    • Affects Version/s: Lustre 2.16.0, Lustre 2.15.6
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      Many subtests in Ubuntu 24.04 client test sessions failed as follows:

      BUG: kernel NULL pointer dereference, address: 0000000000000004
      #PF: supervisor write access in kernel mode 
      #PF: error_code(0x0002) - not-present page
      Oops: 0002 [#1] PREEMPT SMP PTI
      CPU: 1 PID: 297293 Comm: ldlm_bl_04  6.8.0-31-generic #31-Ubuntu
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      RIP: 0010:ll_prune_negative_children+0xaf/0x260 [lustre]
      Call Trace:
       <TASK>
       ? page_fault_oops+0x99/0x1b0
       ? do_user_addr_fault+0x2ee/0x6b0
       ll_md_blocking_ast+0xb23/0xdf0 [lustre]
       ldlm_cancel_callback+0x7d/0x290 [ptlrpc]
       ldlm_cli_cancel_local+0xab/0x4a0 [ptlrpc]
       ldlm_cli_cancel_list_local+0x102/0x2c0 [ptlrpc]
       ldlm_bl_thread_main+0x826/0x9e0 [ptlrpc]
       kthread+0xf2/0x120
       ret_from_fork+0x47/0x70
       </TASK>
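
The `#PF` lines in the trace above can be decoded mechanically. The following is a minimal sketch, assuming the standard x86 page-fault error-code bit layout; the helper name `decode_pf_error_code` is hypothetical and not part of Lustre or the kernel. It shows why `error_code(0x0002)` reads as a supervisor write to a not-present page. The faulting address `0000000000000004` is consistent with a write to a member at offset 4 through a NULL pointer, although the ticket does not say which member.

```python
# Architectural bit layout of the x86 page-fault error code.
PF_PROT = 0x01   # 0: page not present, 1: protection violation
PF_WRITE = 0x02  # 0: read access, 1: write access
PF_USER = 0x04   # 0: supervisor access, 1: user access
PF_RSVD = 0x08   # reserved bit set in a page-table entry
PF_INSTR = 0x10  # fault caused by an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Hypothetical helper: split an x86 #PF error code into readable flags."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write access"
    else:
        access = "read access"
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit": bool(code & PF_RSVD),
    }


# Value taken from the "#PF: error_code(0x0002)" line in this ticket.
info = decode_pf_error_code(0x0002)
print(info)
```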
      

      sanity test 24v: https://testing.whamcloud.com/test_sets/2255de0b-e637-40c6-a7c2-af194af04f9a
      sanity test 72a: https://testing.whamcloud.com/test_sets/e4626eb0-6814-4125-b8c9-ec82d9d45a6d
      sanityn test 90: https://testing.whamcloud.com/test_sets/dbb37b62-af78-469d-b324-8b599665b4e7
      racer test 1: https://testing.whamcloud.com/test_sets/71154acf-735c-4355-adec-f554fa4e5788
       

          Activity

            [LU-18085] RIP: 0010:ll_prune_negative_children+0xaf/0x260 [lustre]

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57007/
            Subject: LU-18085 llite: use RCU to protect the dentry_data
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: a71369eb9cb0aa89ede41cb01b2cd9cdcd8e9680

            nscfreny Fredrik Nyström added a comment -

            It looks like the above patch set for b2_15 unfortunately did not make it into 2.15.6.

            Example crash with a 2.15.6_RC1 client on EL 9.5 (kernel-5.14.0-503.14.1.el9_5.x86_64):
            https://rpa.st/MMAA

            Seems OK with this patch set applied.

            gerrit Gerrit Updater added a comment -

            "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57007
            Subject: LU-18085 llite: use RCU to protect the dentry_data
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 68890e48d859f05a5a7820b1140e875322039f1a

            yujian Jian Yu added a comment - +1 on Lustre b2_15 branch: https://testing.whamcloud.com/test_sets/22a80329-a690-4293-81c0-4750101aa9ac
            pjones Peter Jones added a comment -

            Merged for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55984/
            Subject: LU-18085 llite: use RCU to protect the dentry_data
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 983999bda71115595df48d614ca1aaf9b746c75f

            gerrit Gerrit Updater added a comment -

            "Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55984
            Subject: LU-18085 llite: use RCU to protect the dentry_data
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 92472cd50c96e1775d9377d68ce1eef3dc632b96

            yujian Jian Yu added a comment - - edited

            Hi ys,
            Please find the error in the client console log.
            E.g., https://testing.whamcloud.com/test_sets/2255de0b-e637-40c6-a7c2-af194af04f9a
            Client onyx-105vm7 console log: https://testing.whamcloud.com/test_logs/86e714a4-0b56-448c-bba5-ec6f0609da75/show_text

            [  124.880463] kdump-tools[713]:  * Dumping to NFS mountpoint 10.240.16.204:/export/scratch/dumps/onyx-105vm7.onyx.whamcloud.com/202407312050
            [  124.889393] kdump-tools[713]:  * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/10.240.28.170-202407312050/dmesg.202407312050
            [  124.964759] kdump-tools[764]: The dmesg log is saved to /var/crash/10.240.28.170-202407312050/dmesg.202407312050.
            [  124.968488] kdump-tools[764]: makedumpfile Completed.
            [  124.970438] kdump-tools[713]:  * kdump-tools: saved dmesg content in /var/crash/10.240.28.170-202407312050
            [  124.981048] kdump-tools[713]:  * running makedumpfile -F -c -d 31 /proc/vmcore | compress > /var/crash/10.240.28.170-202407312050/dump-incomplete
            [  131.127624] kdump-tools[769]: 
            Checking for memory holes                         : [  0.0 %] /                  
            Checking for memory holes                         : [100.0 %] |                  
            Excluding unnecessary pages                       : [100.0 %] \                  
            Checking for memory holes                         : [100.0 %] -                  
            Checking for memory holes                         : [100.0 %] /                  
            Excluding unnecessary pages                       : [100.0 %] |                  
            Copying data                                      : [  1.3 %] \           eta: 6s
            Copying data                                      : [ 17.3 %] -           eta: 5s
            Copying data                                      : [ 34.7 %] /           eta: 3s
            Copying data                                      : [ 53.0 %] |           eta: 2s
            Copying data                                      : [ 71.1 %] \           eta: 1s
            Copying data                                      : [ 82.3 %] -           eta: 1s
            Copying data                                      : [ 99.7 %] /           eta: 0s
            Copying data                                      : [100.0 %] |           eta: 0s
            Copying data                                      : [100.0 %] \           eta: 0s
            [  131.144577] kdump-tools[769]: The dumpfile is saved to STDOUT.
            [  131.146330] kdump-tools[769]: makedumpfile Completed.
            [  131.230323] kdump-tools[713]:  * kdump-tools: saved vmcore in /var/crash/10.240.28.170-202407312050
            

            On onyx:

            # ls -htl /scratch/dumps/onyx-105vm7.onyx.whamcloud.com/10.240.28.170-202407312050
            total 84M
            -rw-r--r-- 1 root root  83M Jul 31 20:50 dump.202407312050
            -rw----r-- 1 root root 306K Jul 31 20:50 dmesg.202407312050
            
            ys Yang Sheng added a comment -

            Hi, YuJian,

            I am confused by the information attached to this ticket. According to the summary, this should be a crash issue, so a vmcore file is expected. But I did not find anything related in the test results that you provided. Could you please point out which one corresponds to the summary? Maybe I missed something?

            pjones Peter Jones added a comment -

            Yang Sheng

            Can you please investigate?

            Thanks

            Peter


            People

              Assignee: ys Yang Sheng
              Reporter: yujian Jian Yu