Lustre / LU-18354

sanity test_136: ZFS crash due to OOM/NULL pointer deref

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.17.0

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/37653b0d-00b2-48d6-b8ce-0c09bb5f3f0a

      test_136 failed with the following error:

      trevis-99vm4 crashed during sanity test_136
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master-patchless/772 - 4.18.0-240.22.1.el8_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-master-patchless/772 - 4.18.0-240.22.1.el8_3.x86_64

      This appears to have been crashing for a long time, but only in full test sessions: the test runs only with "SLOW=y" and is skipped otherwise. The test itself allocates and immediately deletes about 150k files in a loop.
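
      As a rough illustration of that workload shape (not the actual test_136 code), a minimal sketch could look like the following. The function name create_and_unlink and the scratch directory are hypothetical; pointing it at a Lustre client mount and raising the count toward 150k would only approximate the load the test puts on the servers.

```python
import os
import tempfile


def create_and_unlink(directory, count):
    """Create `count` empty files in `directory`, unlinking each one right away."""
    for i in range(count):
        path = os.path.join(directory, f"f{i}")
        # O_EXCL forces a fresh create each time, so every iteration allocates a new file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
        # Delete immediately, before the next file is created.
        os.unlink(path)
    return count


if __name__ == "__main__":
    # Hypothetical target: a local scratch directory (assumption, not from the ticket).
    with tempfile.TemporaryDirectory() as scratch:
        done = create_and_unlink(scratch, 1000)
        print(done, "files created and removed;", len(os.listdir(scratch)), "left")
```

      Because each file is removed before the next is created, the directory never grows, yet the backend still has to process one allocation and one free per iteration.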

      The first failure was on 2023-04-22 with commit v2_15_55-90-g73a7b1c2a3, and the early test failures all appear to be ZFS hitting OOM:
      https://testing.whamcloud.com/test_sets/37653b0d-00b2-48d6-b8ce-0c09bb5f3f0a

      By 2023-07-01, however, the failures are hitting a NULL pointer dereference (likely also caused by an allocation failure, before OOM is reached) in the ZFS inode handling:
      https://testing.whamcloud.com/test_sets/855d00e9-7906-465b-96cf-f7cfbc08c16a

      [19856.663841] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      [19856.673442] CPU: 1 PID: 851991 Comm: dp_sync_taskq 4.18.0-477.10.1.el8_lustre.x86_64 #1
      [19856.675740] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [19856.676816] RIP: 0010:arc_write+0xf5/0x460 [zfs]
      [19856.692605] Call Trace:
      [19856.696922]  dbuf_write+0x2ff/0x550 [zfs]
      [19856.700667]  dbuf_sync_leaf+0x137/0x660 [zfs]
      [19856.703284]  dbuf_sync_list+0xcf/0x120 [zfs]
      [19856.704161]  dbuf_sync_indirect+0xe2/0x170 [zfs]
      [19856.705108]  dbuf_sync_list+0xae/0x120 [zfs]
      [19856.705992]  dbuf_sync_indirect+0xe2/0x170 [zfs]
      [19856.706935]  dbuf_sync_list+0xae/0x120 [zfs]
      [19856.707814]  dnode_sync+0x365/0xa20 [zfs]
      [19856.709421]  sync_dnodes_task+0x71/0xa0 [zfs]
      [19856.710341]  taskq_thread+0x2e1/0x510 [spl]
      [19856.712759]  kthread+0x134/0x150
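
      Each frame above has the form symbol+offset/size, optionally followed by the module name in brackets: the offset is the position inside the function and the size is the function's total length, both in hex. For comparing traces across several crashed sessions, a small parser along these lines may help. This is only a sketch: the FRAME pattern and the parse_frames name are hypothetical helpers, and the sample is a subset of the lines quoted above.

```python
import re

# symbol+0xOFFSET/0xSIZE, optionally followed by " [module]"
FRAME = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<mod>\w+)\])?"
)


def parse_frames(text):
    """Return (function, offset, size, module) tuples; module is None for core kernel symbols."""
    frames = []
    for m in FRAME.finditer(text):
        frames.append((m["func"], int(m["off"], 16), int(m["size"], 16), m["mod"]))
    return frames


sample = """\
[19856.676816] RIP: 0010:arc_write+0xf5/0x460 [zfs]
[19856.710341]  taskq_thread+0x2e1/0x510 [spl]
[19856.712759]  kthread+0x134/0x150
"""

if __name__ == "__main__":
    for func, off, size, mod in parse_frames(sample):
        print(f"{func}: byte {off} of {size} ({mod or 'kernel'})")
```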
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_136 - trevis-99vm4 crashed during sanity test_136

            People

              adilger Andreas Dilger
              maloo Maloo