  1. Lustre
  2. LU-11809

conf-sanity test 28A hangs on file system mount


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0
    • Lustre 2.12.0
    • Ubuntu 18.04 clients/RHEL 7.6 servers
    • 3

    Description

      conf-sanity test_28A hangs because of a client problem while mounting the file system. We see this issue only when testing with Ubuntu 18.04 clients.

      In the test suite hang at https://testing.whamcloud.com/test_sets/d8edfc48-fdd4-11e8-a97c-52540065bddc the client test_log is empty. The Client 1 (vm1) console log shows the issue:

      [ 2626.155389] Lustre: DEBUG MARKER: == conf-sanity test 28A: permanent parameter setting ================================================= 02:34:15 (1544495655)
      [ 2626.772885] Lustre: Lustre: Build Version: 2.12.0_RC2
      [ 2626.836890] LNet: Added LNI 10.9.4.224@tcp [8/256/0/180]
      [ 2626.842963] LNet: Accept all, port 7988
      [ 2628.417559] Lustre: 3169:0:(gss_svc_upcall.c:1199:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
      [ 2628.419207] Key type lgssc registered
      [ 2628.519146] Lustre: Echo OBD driver; http://www.lustre.org/
      [ 2639.110774] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
      [ 2639.121362] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre
      [ 2650.553084] Lustre: Mounted lustre-client
      [ 2651.652270] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre-*.max_read_ahead_whole_mb
      [ 2651.665867] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre-*.max_read_ahead_whole_mb
      [ 2652.000321] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre-*.max_read_ahead_whole_mb
      [ 2652.011374] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre-*.max_read_ahead_whole_mb
      [ 2653.024019] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n llite.lustre-*.max_read_ahead_whole_mb
      [ 2653.764392] BUG: unable to handle kernel paging request at 0000000080d48269
      [ 2653.765275] IP: class_process_config+0x1cf8/0x27b0 [obdclass]
      [ 2653.765911] PGD 0 P4D 0 
      [ 2653.766218] Oops: 0000 [#1] SMP PTI
      [ 2653.766619] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd input_leds joydev mac_hid serio_raw sch_fq_codel sunrpc ip_tables x_tables autofs4 psmouse virtio_blk floppy 8139too 8139cp mii pata_acpi i2c_piix4 [last unloaded: libcfs]
      [ 2653.771057] CPU: 1 PID: 3729 Comm: llog_process_th Tainted: G        W  OE    4.15.0-32-generic #35-Ubuntu
      [ 2653.772039] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 2653.772668] RIP: 0010:class_process_config+0x1cf8/0x27b0 [obdclass]
      [ 2653.773327] RSP: 0018:ffffb4fd4247bc58 EFLAGS: 00010246
      [ 2653.773892] RAX: 0000000080d47e61 RBX: ffff9d7f744c0880 RCX: 0000000000000000
      [ 2653.774624] RDX: 0000000000000018 RSI: ffffffffc0768d04 RDI: ffff9d7f744c08c6
      [ 2653.775363] RBP: ffffb4fd4247bd08 R08: 00000000ffffffff R09: 0000000000000024
      [ 2653.776103] R10: ffffffffc07594b0 R11: f000000000000000 R12: ffffffffc0759680
      [ 2653.776838] R13: ffffffffc0761b50 R14: 0000000000000000 R15: ffffb4fd4247bd40
      [ 2653.777571] FS:  0000000000000000(0000) GS:ffff9d7fbfd00000(0000) knlGS:0000000000000000
      [ 2653.778413] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2653.779026] CR2: 0000000080d48269 CR3: 0000000010a0a002 CR4: 00000000000606e0
      [ 2653.779769] Call Trace:
      [ 2653.780100]  ? libcfs_debug_msg+0x50/0x70 [libcfs]
      [ 2653.780624]  ? libcfs_debug_msg+0x50/0x70 [libcfs]
      [ 2653.781174]  class_config_llog_handler+0x7cb/0x14c0 [obdclass]
      [ 2653.781828]  llog_process_thread+0x651/0x1580 [obdclass]
      [ 2653.782413]  llog_process_thread_daemonize+0x9f/0xe0 [obdclass]
      [ 2653.783071]  kthread+0x121/0x140
      [ 2653.783461]  ? llog_backup+0x4d0/0x4d0 [obdclass]
      [ 2653.783981]  ? kthread_create_worker_on_cpu+0x70/0x70
      [ 2653.784534]  ret_from_fork+0x35/0x40
      [ 2653.784951] Code: 8b 40 38 48 85 c0 0f 84 e5 08 00 00 48 89 da be 20 00 00 00 4c 89 ef e8 b7 41 ed c9 41 89 c4 e9 d9 f4 ff ff 48 8b 85 58 ff ff ff <48> 8b 80 08 04 00 00 48 8b 50 10 81 3a 03 bd ac bd 0f 85 87 06 
      [ 2653.786833] RIP: class_process_config+0x1cf8/0x27b0 [obdclass] RSP: ffffb4fd4247bc58
      [ 2653.787628] CR2: 0000000080d48269
      [    0.439151] Kernel panic - not syncing: Out of memory and no killable processes...
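
      One way to read the oops above: the faulting instruction bytes (starting at the <48> marker in the Code: line) are 48 8b 80 08 04 00 00, which decodes on x86-64 as a 64-bit load from offset 0x408 off RAX. The sketch below only checks that RAX plus that offset equals the faulting address in CR2. The register values and bytes are copied from the log; the helper name is hypothetical, and this is not a root-cause analysis. It suggests that RAX held a bad pointer when class_process_config dereferenced it.

```python
# Values copied from the console log above.
RAX = 0x0000000080D47E61
CR2 = 0x0000000080D48269

# Bytes of the faulting instruction, starting at the <48> marker
# in the "Code:" line of the oops.
FAULT_BYTES = bytes.fromhex("48 8b 80 08 04 00 00")


def decode_mov_rax_disp32(code: bytes) -> int:
    """Return disp32 for `mov disp32(%rax),%rax` (encoding 48 8b 80 <disp32>).

    Hypothetical helper for this one encoding only; not a general decoder.
    """
    if code[:3] != b"\x48\x8b\x80":
        raise ValueError("not a REX.W mov with a disp32 off %rax")
    # The 32-bit displacement is stored little-endian after the ModRM byte.
    return int.from_bytes(code[3:7], "little")


disp = decode_mov_rax_disp32(FAULT_BYTES)
fault_addr = RAX + disp

print(f"displacement = {disp:#x}")
print(f"RAX + disp   = {fault_addr:#x}")
print(f"matches CR2  = {fault_addr == CR2}")
```

      The computed address matches CR2 (0x80d48269), which is consistent with the load at class_process_config+0x1cf8 being the access that faulted.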
      

      Logs for other hangs are at:
      https://testing.whamcloud.com/test_sets/c185cb70-f713-11e8-b67f-52540065bddc
      https://testing.whamcloud.com/test_sets/b5f76e48-f778-11e8-b67f-52540065bddc
      https://testing.whamcloud.com/test_sets/05386eea-fa11-11e8-8a18-52540065bddc

            People

              adilger Andreas Dilger
              jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 7
