Lustre / LU-13942

Check if exp_obd is initialised and return an error code to the lctl user if it is not


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.15.0
    • None
    • Severity: 3

    Description

      NULL pointer dereference at the start of the obd_statfs() function.

      This looks like a race between

      PID: 4360   TASK: ffff94719c7fd140  CPU: 15  COMMAND: "lctl"
       #0 [ffff94719c7bb8b0] machine_kexec at ffffffff95a63674
       #1 [ffff94719c7bb910] __crash_kexec at ffffffff95b1cf02
       #2 [ffff94719c7bb9e0] crash_kexec at ffffffff95b1cff0
       #3 [ffff94719c7bb9f8] oops_end at ffffffff9616e758
       #4 [ffff94719c7bba20] no_context at ffffffff9615cafe
       #5 [ffff94719c7bba70] __bad_area_nosemaphore at ffffffff9615cb95
       #6 [ffff94719c7bbac0] bad_area_nosemaphore at ffffffff9615cd06
       #7 [ffff94719c7bbad0] __do_page_fault at ffffffff961716b0
       #8 [ffff94719c7bbb40] do_page_fault at ffffffff96171915
       #9 [ffff94719c7bbb70] page_fault at ffffffff9616d758
          [exception RIP: obd_statfs.constprop.43+36]
          RIP: ffffffffc1a47d64  RSP: ffff94719c7bbc20  RFLAGS: 00010246
          RAX: 0000000000000001  RBX: 000000000000b2c7  RCX: 0000000000000001
          RDX: 000000000000b2c7  RSI: ffff94719c7bbd40  RDI: 0000000000000000
          RBP: ffff94719c7bbc60   R8: ffff94716feace40   R9: 0000000000000000
          R10: 0000000000001000  R11: ffffffff95bd609d  R12: 0000000000000000
          R13: 000000000000b2c7  R14: ffff94719c7bbd40  R15: 0000000000000001
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      #10 [ffff94719c7bbc68] ll_statfs_internal at ffffffffc1a4fd9d [lustre]
      #11 [ffff94719c7bbd38] filesfree_show at ffffffffc1a5df6b [lustre]
      #12 [ffff94719c7bbde8] lustre_attr_show at ffffffffc13ffe79 [obdclass]
      #13 [ffff94719c7bbdf8] sysfs_kf_seq_show at ffffffff95ccbeaf
      #14 [ffff94719c7bbe18] kernfs_seq_show at ffffffff95cca5e6
      #15 [ffff94719c7bbe28] seq_read at ffffffff95c68b50
      #16 [ffff94719c7bbe98] kernfs_fop_read at ffffffff95ccaf35
      #17 [ffff94719c7bbed8] vfs_read at ffffffff95c4118f
      #18 [ffff94719c7bbf08] sys_read at ffffffff95c4204f
      #19 [ffff94719c7bbf50] system_call_fastpath at ffffffff96176ddb
          RIP: 00007f399f7c66e0  RSP: 00007fff98d7e7e0  RFLAGS: 00010206
          RAX: 0000000000000000  RBX: 00000000006480c0  RCX: 0000000000648100
          RDX: 0000000000001000  RSI: 0000000000648100  RDI: 0000000000000003
          RBP: 000000000064810a   R8: 00000000006480e0   R9: 0000000000001000
          R10: 00007fff98d7e360  R11: 0000000000000246  R12: 0000000000648100
          R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000000003
          ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
      

      and

      PID: 4043   TASK: ffff9471ca155140  CPU: 3   COMMAND: "mount.lustre"
       #0 [ffff947178f3f7b8] __schedule at ffffffff96169b97
       #1 [ffff947178f3f848] schedule at ffffffff9616a099
       #2 [ffff947178f3f858] schedule_timeout at ffffffff96167b71
       #3 [ffff947178f3f908] wait_for_completion at ffffffff9616a44d
       #4 [ffff947178f3f968] llog_process_or_fork at ffffffffc13ddc14 [obdclass]
       #5 [ffff947178f3f9d0] llog_process at ffffffffc13ddef4 [obdclass]
       #6 [ffff947178f3f9e0] class_config_parse_llog at ffffffffc1411b65 [obdclass]
       #7 [ffff947178f3fa28] mgc_process_cfg_log at ffffffffc19a08c8 [mgc]
       #8 [ffff947178f3fab0] mgc_process_log at ffffffffc19a1c23 [mgc]
       #9 [ffff947178f3fb70] mgc_process_config at ffffffffc19a37f3 [mgc]
      #10 [ffff947178f3fbf0] lustre_process_log at ffffffffc141d9b8 [obdclass]
      #11 [ffff947178f3fc88] ll_fill_super at ffffffffc1a4dc55 [lustre]
      #12 [ffff947178f3fd78] lustre_fill_super at ffffffffc1423b03 [obdclass]
      #13 [ffff947178f3fdb0] mount_nodev at ffffffff95c452df
      #14 [ffff947178f3fde8] lustre_mount at ffffffffc141b808 [obdclass]
      #15 [ffff947178f3fe10] mount_fs at ffffffff95c45e5e
      #16 [ffff947178f3fe58] vfs_kern_mount at ffffffff95c63a07
      #17 [ffff947178f3fe90] do_mount at ffffffff95c6602f
      #18 [ffff947178f3ff18] sys_mount at ffffffff95c66e63
      #19 [ffff947178f3ff50] system_call_fastpath at ffffffff96176ddb
          RIP: 00007ff8530ed60a  RSP: 00007ffc04d9e948  RFLAGS: 00010206
          RAX: 00000000000000a5  RBX: 0000000000000000  RCX: 0000000001000000
          RDX: 0000000000409e34  RSI: 00007ffc04da4cf8  RDI: 0000000000615010
          RBP: 0000000000000000   R8: 0000000000615420   R9: 0000000000000001
          R10: 0000000001000000  R11: 0000000000000206  R12: 00007ffc04da4cf8
          R13: 00000000fffffff5  R14: 0000000000000301  R15: 0000000000615420
          ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b
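      exp_obd is set in ll_fill_super() -> client_common_fill_super(). However, the mount process is still waiting in lustre_process_log(), blocked on config log processing, and has not yet reached client_common_fill_super(). The llite sysfs entries are therefore already readable while exp_obd is still not set.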
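
      The ordering can be reproduced in user space with a small model. The sketch below is a hypothetical C11/pthreads program, not Lustre code. model_sbi and model_export stand in for the llite superblock info and its export. Two semaphores force the interleaving seen in the crash dump:
      1. the sysfs entry is readable;
      2. the mount thread waits in config log processing;
      3. the reader runs;
      4. only then is the export set.
      The reader records the pointer instead of dereferencing it, so the model shows the window without crashing.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the llite superblock info and its export;
 * these are NOT the real Lustre structures. */
struct model_export { long files_free; };
struct model_sbi { struct model_export *exp; };

static struct model_export the_export = { .files_free = 42 };
static struct model_sbi sbi;       /* reachable via "sysfs" before the mount ends */
static sem_t sysfs_visible;        /* posted once the sysfs entry exists */
static sem_t reader_finished;      /* lets the "mount" resume */

/* Models mount.lustre inside ll_fill_super(): the sysfs entry is already
 * visible, config log processing blocks, and the export is set last. */
static void *mount_thread(void *arg)
{
    (void)arg;
    sem_post(&sysfs_visible);
    sem_wait(&reader_finished);    /* models the wait in lustre_process_log() */
    sbi.exp = &the_export;         /* models client_common_fill_super() */
    return NULL;
}

/* Models "lctl get_param llite/.../filesfree" running inside the window. */
static void *reader_thread(void *arg)
{
    struct model_export **seen = arg;
    sem_wait(&sysfs_visible);
    *seen = sbi.exp;               /* unchecked code would dereference this */
    sem_post(&reader_finished);
    return NULL;
}

/* Forces the interleaving from the crash dump; returns what the reader saw. */
struct model_export *observe_race(void)
{
    struct model_export *seen = &the_export;
    pthread_t m, r;

    sbi.exp = NULL;
    sem_init(&sysfs_visible, 0, 0);
    sem_init(&reader_finished, 0, 0);
    pthread_create(&m, NULL, mount_thread, NULL);
    pthread_create(&r, NULL, reader_thread, &seen);
    pthread_join(r, NULL);
    pthread_join(m, NULL);
    sem_destroy(&sysfs_visible);
    sem_destroy(&reader_finished);
    return seen;
}

/* True once the modelled mount has set the export pointer. */
bool export_published(void)
{
    return sbi.exp != NULL;
}
```

      In this model, observe_race() returns NULL. export_published() only becomes true after the mount thread resumes. An unchecked dereference of that NULL pointer is the kind of access that faulted in obd_statfs().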
      

      The following lctl command was executed before the client mount had completed:

      crash> ps -a 4360
      PID: 4360   TASK: ffff94719c7fd140  CPU: 15  COMMAND: "lctl"
      ARG: lctl get_param llite/snx11214-ffff947163641800/filesfree 
      ENV: SHELL=/bin/bash
           USER=admin
           PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
           PWD=/
           SHLVL=1
           HOME=/home/admin
           LOGNAME=admin
           _=/usr/sbin/lctl
       

      Solution: check whether exp_obd is initialized and, if it is not, return an error code to the lctl user.
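
      A minimal sketch of that check follows. It assumes the export pointer is published only at the end of the mount. The names fake_sbi, fake_export, fake_fill_super_done() and fake_filesfree_show() are hypothetical, and this is not the actual LU-13942 patch. It uses C11 release/acquire atomics; the kernel counterparts are smp_store_release() and smp_load_acquire(). Returning -ENODEV is also an assumption; the real fix may use a different errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real Lustre structures. */
struct fake_export { unsigned long long files_free; };
struct fake_sbi { _Atomic(struct fake_export *) exp; };

/* Mount side: publish the export only after it is fully set up. */
void fake_fill_super_done(struct fake_sbi *sbi, struct fake_export *exp)
{
    atomic_store_explicit(&sbi->exp, exp, memory_order_release);
}

/* sysfs "show" side: return an error instead of dereferencing NULL.
 * -ENODEV is an assumed errno choice. */
int fake_filesfree_show(struct fake_sbi *sbi, unsigned long long *out)
{
    struct fake_export *exp =
        atomic_load_explicit(&sbi->exp, memory_order_acquire);

    if (exp == NULL)
        return -ENODEV;            /* mount not finished yet */
    *out = exp->files_free;
    return 0;
}
```

      In the kernel, a sysfs show handler that returns a negative errno causes the read() issued by lctl to fail with that error. The user should see an error message rather than a kernel oops.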

      Workaround: check that the mount has completed before calling lctl get_param.
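
      One way to script that check is sketched below. It assumes a mount appears in /proc/mounts only after mount(2) has returned, because the kernel attaches the new mount to the namespace after fill_super has completed. The helper lustre_mounted_at() is hypothetical. It uses the glibc setmntent()/getmntent()/endmntent() interface to look for a mount of type lustre at a given mount point.

```c
#define _DEFAULT_SOURCE            /* expose setmntent()/getmntent() in glibc */
#include <assert.h>
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if mounts_file (e.g. /proc/mounts) lists a
 * mount of type "lustre" at mount_point. */
bool lustre_mounted_at(const char *mounts_file, const char *mount_point)
{
    FILE *f = setmntent(mounts_file, "r");
    struct mntent *ent;
    bool found = false;

    if (f == NULL)
        return false;              /* cannot read the table: treat as not mounted */
    while ((ent = getmntent(f)) != NULL) {
        if (strcmp(ent->mnt_type, "lustre") == 0 &&
            strcmp(ent->mnt_dir, mount_point) == 0) {
            found = true;
            break;
        }
    }
    endmntent(f);
    return found;
}
```

      For example, a monitoring tool could call lustre_mounted_at("/proc/mounts", "/mnt/lustre") and skip lctl get_param llite/... until it returns true. /mnt/lustre is a placeholder mount point.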

            People

            Assignee: Artem Blagodarenko (Inactive)
            Reporter: Artem Blagodarenko (Inactive)