LU-18049

ost-pools test_25, sanity-sec test_31: crash in ext4_htree_store_dirent kmalloc

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.16.0
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/3576290d-064a-4573-a087-75b59fff6df7

      test_25 failed with the following error:

      trevis-106vm10, trevis-106vm11 crashed during ost-pools test_25
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4542 - 5.14.0-362.24.1.el9_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-b_es6_0/666 - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64

      Both clients crashed in __kmalloc(), called from ext4_htree_store_dirent() (ext4, NOT ldiskfs) while reading a directory via getdents64, so this looks like it may be some kind of client-side memory corruption:

      [27299.419062] Lustre: MGC10.240.44.44@tcp: Connection restored to  (at 10.240.44.44@tcp)
      [27299.448580] LustreError: 886364:0:(client.c:3288:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff902e1402e3c0 x1802432488249280/t691489734702(691489734702) o101->lustre-MDT0000-mdc-ffff902e04449800@10.240.44.44@tcp:12/10 lens 520/608 e 0 to 0 dl 1718935667 ref 2 fl Interpret:RPQU/204/0 rc 301/301 job:'lfs.0' uid:0 gid:0
      [27300.013294] Lustre: lustre-MDT0000-mdc-ffff902e04449800: Connection restored to  (at 10.240.44.44@tcp)
      [27305.358931] Lustre: 886365:0:(client.c:2334:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1718935641/real 1718935641]  req@ffff902e547d9380 x1802432536672704/t0(0) o400->lustre-MDT0000-mdc-ffff902e04449800@10.240.44.44@tcp:12/10 lens 224/224 e 0 to 1 dl 1718935657 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0
      [27306.104959] BUG: unable to handle page fault for address: ffff902ee5338778
      [27306.105697] #PF: supervisor read access in kernel mode
      [27306.106204] #PF: error_code(0x0000) - not-present page
      [27306.107607] CPU: 1 PID: 1109213 Comm: bash Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-362.24.1.el9_3.x86_64 #1
      [27306.108653] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [27306.109209] RIP: 0010:__kmalloc+0x11b/0x370
      [27306.118475] Call Trace:
      [27306.118756]  <TASK>
      [27306.120625]  ? __die_body.cold+0x8/0xd
      [27306.121006]  ? page_fault_oops+0x134/0x170
      [27306.121437]  ? kernelmode_fixup_or_oops+0x84/0x110
      [27306.121944]  ? exc_page_fault+0xa8/0x150
      [27306.122371]  ? asm_exc_page_fault+0x22/0x30
      [27306.122806]  ? ext4_htree_store_dirent+0x36/0x100 [ext4]
      [27306.123359]  ? __kmalloc+0x11b/0x370
      [27306.123740]  ext4_htree_store_dirent+0x36/0x100 [ext4]
      [27306.124269]  htree_dirblock_to_tree+0x1ab/0x310 [ext4]
      [27306.124809]  ext4_htree_fill_tree+0x203/0x3b0 [ext4]
      [27306.125333]  ext4_dx_readdir+0x10d/0x360 [ext4]
      [27306.125817]  ext4_readdir+0x392/0x550 [ext4]
      [27306.126275]  iterate_dir+0x17c/0x1c0
      [27306.126711]  __x64_sys_getdents64+0x80/0x120
      [27306.128187]  do_syscall_64+0x5c/0x90
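
To make the "ext4, not ldiskfs" observation easy to check, here is a minimal sketch that parses frames copied from the call trace above and lists which modules the reliable frames belong to. Frames prefixed with `?` are ones the kernel unwinder could not verify, so they are flagged as unreliable. The helper name `parse_frames` and the `vmlinux` label for frames without a module tag are assumptions made for this illustration; they are not part of any Lustre or Maloo tooling.

```python
import re

# Frames copied from the call trace above, with timestamps stripped.
TRACE = """\
? asm_exc_page_fault+0x22/0x30
? ext4_htree_store_dirent+0x36/0x100 [ext4]
? __kmalloc+0x11b/0x370
ext4_htree_store_dirent+0x36/0x100 [ext4]
htree_dirblock_to_tree+0x1ab/0x310 [ext4]
ext4_htree_fill_tree+0x203/0x3b0 [ext4]
ext4_dx_readdir+0x10d/0x360 [ext4]
ext4_readdir+0x392/0x550 [ext4]
iterate_dir+0x17c/0x1c0
__x64_sys_getdents64+0x80/0x120
do_syscall_64+0x5c/0x90
"""

# Optional "? " prefix, symbol+offset/size, optional "[module]" suffix.
FRAME_RE = re.compile(
    r"^(?P<guess>\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?$"
)


def parse_frames(text):
    """Return (function, module, reliable) tuples for each trace line.

    Frames without a [module] tag are labelled "vmlinux" here
    (a labelling convention assumed for this sketch).
    """
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(
                (m["func"], m["module"] or "vmlinux", m["guess"] is None)
            )
    return frames


frames = parse_frames(TRACE)
reliable_modules = {mod for _, mod, reliable in frames if reliable}
print(sorted(reliable_modules))
```

In this excerpt every reliable frame is either in ext4 or in the core kernel; no Lustre or ldiskfs module appears in the faulting call path, which is consistent with the corruption having happened earlier and only being detected in the allocator.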
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      ost-pools test_25 - trevis-106vm10, trevis-106vm11 crashed during ost-pools test_25

            People

              scherementsev Sergey Cheremencev
              qian_wc Qian Yingjin
              Votes: 0
              Watchers: 8
