  1. Lustre
  2. LU-18924

Setting mdt.*.hsm.max_requests to a very large value causes a system crash.

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      If mdt.*.hsm.max_requests is set to a huge number, e.g. 2^64-1 (the maximum unsigned 64-bit integer), the system crashes when it receives the first HSM request. This is related to the following code in mdt_coordinator():

                      CDEBUG(D_HSM, "coordinator starts reading llog\n");
      
                      if (hsd.hsd_request_len != cdt->cdt_max_requests) {
                              /* cdt_max_requests has changed,
                               * we need to allocate a new buffer
                               */
                              struct hsm_scan_request *tmp = NULL;
                              int max_requests = cdt->cdt_max_requests;
                              OBD_ALLOC_LARGE(tmp, max_requests *
                                              sizeof(struct hsm_scan_request));
                              if (!tmp) {
                                      CERROR("Failed to resize request buffer, "
                                             "keeping it at %d\n",
                                             hsd.hsd_request_len);
                              } else {
                                       ....
      

      The system logs showed: 

      kernel: LustreError: 6312:0:(mdt_coordinator.c:714:mdt_coordinator()) vmalloc of 'tmp' (0 bytes) failed
      ...
      kernel: LustreError: 6312:0:(mdt_coordinator.c:718:mdt_coordinator()) Failed to resize request buffer, keeping it at 1048576
      ...
      kernel: hsm_cdtr: vmalloc: allocation failure: 17179869184 bytes, mode:0x608042(GFP_NOFS|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
      ...
      kernel: CPU: 2 PID: 6312 Comm: hsm_cdtr Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.27.1.el8.aarch64 #1
      kernel: Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024
      kernel: Call trace:
      kernel: dump_backtrace+0x0/0x178
      kernel: show_stack+0x28/0x38
      kernel: dump_stack+0x68/0x8c
      kernel: warn_alloc+0x10c/0x190
      kernel: __vmalloc_node_range+0x218/0x2e0
      kernel: __vmalloc+0x84/0xa8
      kernel: mdt_coordinator+0x1010/0x1a68 [mdt]
      kernel: kthread+0x150/0x160
      kernel: ret_from_fork+0x10/0x18
      kernel: Mem-Info:
      ...
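
The excerpt above copies cdt->cdt_max_requests into an int and multiplies it by sizeof(struct hsm_scan_request) with no range check before allocating. As a rough illustration of the kind of guard that avoids this, here is a minimal user-space sketch. It is not the actual Lustre fix: struct scan_request_sketch, SKETCH_MAX_REQUESTS, and request_buf_size() are hypothetical names, the cap is an arbitrary value picked for the example, and treating the tunable as a 64-bit value is an assumption based on the report that 2^64-1 was accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; the real struct hsm_scan_request differs. */
struct scan_request_sketch {
	uint64_t fields[8];
};

/* Hypothetical upper bound, chosen only for illustration. */
#define SKETCH_MAX_REQUESTS ((uint64_t)1 << 20)

/*
 * Compute the buffer size in bytes for `count` entries.
 * Returns false, leaving *out untouched, when count is zero, exceeds the
 * cap, or the multiplication would not fit in size_t.
 */
static bool request_buf_size(uint64_t count, size_t *out)
{
	const size_t elem = sizeof(struct scan_request_sketch);

	assert(out != NULL);
	if (count == 0 || count > SKETCH_MAX_REQUESTS)
		return false;
	/* Division-based check: true only if count * elem would overflow. */
	if (count > SIZE_MAX / elem)
		return false;
	*out = (size_t)count * elem;
	return true;
}
```

A caller would keep the existing buffer whenever request_buf_size() returns false, and only attempt the allocation otherwise. Rejecting out-of-range values when the parameter is written would be an alternative place for the same check. In kernel code, helpers such as check_mul_overflow() from linux/overflow.h serve a similar purpose.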
      

          People

            wc-triage WC Triage
            emoly.liu Emoly Liu