Lustre / LU-14178

conf-sanity test_5d: mount.lustre: mount at /mnt/lustre failed: Cannot allocate memory


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

      There are essentially two problems here:

      • the first issue is that the LDLM namespace is not being cleaned up properly for some reason, which causes sysfs to report an error when trying to re-register a parameter file
      • the second issue is that ldlm_namespace_sysfs_register() returns -EEXIST (-17) to ldlm_namespace_new(), but ldlm_namespace_new() returns NULL on any failure; the caller interprets this NULL as -ENOMEM (-12), which generates a misleading "Cannot allocate memory" error higher up the stack that is then returned to userspace
        sysfs: cannot create duplicate filename '/fs/lustre/ldlm/namespaces/lustre-OST0002-osc-ffff89f33be70000'
        Call Trace:
         dump_stack+0x19/0x1b
         __warn+0xd8/0x100
         sysfs_warn_dup+0x64/0x80
         sysfs_create_dir_ns+0x8e/0xa0
         kobject_add_internal+0xaa/0x330
         kobject_init_and_add+0x70/0xb0
         ldlm_namespace_sysfs_register+0x68/0xc0 [ptlrpc]
         ldlm_namespace_new+0x335/0xac0 [ptlrpc]
         client_obd_setup+0xd77/0x1430 [ptlrpc]
         osc_setup_common+0x63/0x320 [osc]
         osc_setup+0x33/0x240 [osc]
         osc_device_alloc+0xa5/0x240 [osc]
         obd_setup+0x129/0x2f0 [obdclass]
         class_setup+0x2a8/0x840 [obdclass]
         class_process_config+0x1569/0x27c0 [obdclass]
         class_config_llog_handler+0x7f9/0x1370 [obdclass]
         llog_process_thread+0x85f/0x1a20 [obdclass]
         llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
         kthread+0xd1/0xe0
        
        mount.lustre: mount trevis-12vm4@tcp:/lustre at /mnt/lustre failed: Cannot allocate memory
        

      A few such errors were reported on 2020-11-26 and 2020-11-27:
      https://testing.whamcloud.com/test_sets/96c7542c-7d1f-4f4f-824b-cd2b5102f2b4
      https://testing.whamcloud.com/test_sets/e7d9ec4f-2403-404f-b1a3-293a067ba0fa
      https://testing.whamcloud.com/test_sets/7fbb7803-bc75-41a9-a3e9-af6ea524ff38

      but a large number of such messages are reported after conf-sanity.sh test_4 fails, and the error then recurs on every subsequent mount attempt in that session, starting with test_5a. Examples were seen on 2020-09-02 (the first occurrence reported in Kibana, from a "full" test run, so not associated with a specific patch), 2020-09-11, 2020-10-29, and 2020-11-30:

      https://testing.whamcloud.com/test_sets/9614c939-ed8a-42e2-bbc1-a7122778a554
      https://testing.whamcloud.com/test_sets/3c636685-44c2-499a-93ed-4667b74c9257
      https://testing.whamcloud.com/test_sets/4d65b612-3a33-4515-9f8c-e22aaebdee4c
      https://testing.whamcloud.com/test_sets/5773671a-303c-4f7d-b6b0-e37be7d34e7a

            People

              wc-triage WC Triage
              adilger Andreas Dilger