Lustre LU-5024

(mdc_lib.c:163:mdc_pack_name()) ASSERTION( cpy_len == name_len && lu_name_is_valid_2(buf, cpy_len) ) failed:

    Description

      Testing http://review.whamcloud.com/#/c/10198/, Oleg got it to crash under racer. This can be easily reproduced using:

      llmount.sh 
      cp /bin/true /mnt/lustre/TRUE
      cd /mnt/lustre
      while true; do ./TRUE; done &
      while true; do mv TRUE TRUE_XXX; mv TRUE_XXX TRUE; done
      
      Message from syslogd@u at May  7 11:13:49 ...
       kernel:[491063.276112] LustreError: 14609:0:(mdc_lib.c:163:mdc_pack_name()) ASSERTION( cpy_len == name_len && lu_name_is_valid_2(buf, cpy_len) ) failed:
      
      Message from syslogd@u at May  7 11:13:49 ...
       kernel:[491063.279161] LustreError: 14609:0:(mdc_lib.c:163:mdc_pack_name()) LBUG
      
      Message from syslogd@u at May  7 11:13:49 ...
       kernel:[491063.317026] Kernel panic - not syncing: LBUG
      
      crash> bt
      PID: 14609  TASK: ffff88011b6006c0  CPU: 4   COMMAND: "bash"
       #0 [ffff880110c6f550] machine_kexec at ffffffff81039950
       #1 [ffff880110c6f5b0] crash_kexec at ffffffff810d4372
       #2 [ffff880110c6f680] panic at ffffffff81550d83
       #3 [ffff880110c6f700] lbug_with_loc at ffffffffa079df1b [libcfs]
       #4 [ffff880110c6f720] mdc_pack_name at ffffffffa0991d25 [mdc]
       #5 [ffff880110c6f760] mdc_open_pack at ffffffffa0992789 [mdc]
       #6 [ffff880110c6f7c0] mdc_enqueue at ffffffffa099699e [mdc]
       #7 [ffff880110c6f900] mdc_intent_lock at ffffffffa0997d4e [mdc]
       #8 [ffff880110c6f9c0] lmv_intent_open at ffffffffa095df35 [lmv]
       #9 [ffff880110c6fa60] lmv_intent_lock at ffffffffa095e88b [lmv]
      #10 [ffff880110c6faf0] ll_intent_file_open at ffffffffa06508ed [lustre]
      #11 [ffff880110c6fb80] ll_file_open at ffffffffa0651a15 [lustre]
      #12 [ffff880110c6fc80] __dentry_open at ffffffff8119fa5a
      #13 [ffff880110c6fce0] nameidata_to_filp at ffffffff8119fdc4
      #14 [ffff880110c6fd00] do_filp_open at ffffffff811b5640
      #15 [ffff880110c6fe70] open_exec at ffffffff811ac200
      #16 [ffff880110c6fec0] do_execve at ffffffff811ac39f
      #17 [ffff880110c6ff20] sys_execve at ffffffff810095ea
      #18 [ffff880110c6ff50] stub_execve at ffffffff8100b54a
          RIP: 000000377fead047  RSP: 00007fff66ccc718  RFLAGS: 00000246
          RAX: 000000000000003b  RBX: 00000000015b9490  RCX: ffffffffffffffff
          RDX: 00000000015623b0  RSI: 00000000015b9530  RDI: 00000000015b9490
          RBP: 00000000015b9490   R8: 000000378018fee8   R9: 0000000000000001
          R10: 0000000000000010  R11: 0000000000000246  R12: 0000000000000001
          R13: 00000000015b9530  R14: 00000000015623b0  R15: 0000000001537280
          ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b
      

      Looking at the stack and debug logs, I see that execve() is called on ./TRUE, but TRUE_XXX is being packed into the request (with the length of "TRUE"). Probably f_dentry is not stable here and should not be accessed the way it is in ll_intent_file_open().
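
      To illustrate why a name that changes between taking its length and copying it would trip the first half of the assertion, here is a minimal userspace sketch. It is not the Lustre code: bounded_copy() and pack_name_ok() are hypothetical helpers, and the sketch assumes the copy in mdc_pack_name() behaves like strlcpy(), returning the full length of the source string. The lu_name_is_valid_2() half of the check is not modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bounded copy that, like strlcpy(), returns
 * the full length of the source string even if the copy was truncated. */
static size_t bounded_copy(char *dst, const char *src, size_t dst_size)
{
    size_t src_len = strlen(src);

    if (dst_size > 0) {
        /* Copy at most dst_size - 1 bytes and always NUL-terminate. */
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}

/* Hypothetical check modelled on the "cpy_len == name_len" condition:
 * returns 1 if the string found at copy time has the length the caller
 * sampled earlier, 0 if the two disagree. */
static int pack_name_ok(char *buf, size_t buf_size,
                        const char *name, size_t name_len)
{
    size_t cpy_len = bounded_copy(buf, name, buf_size);

    return cpy_len == name_len;
}
```

      With a 16-byte buffer, pack_name_ok(buf, sizeof(buf), "TRUE", 4) returns 1, while pack_name_ok(buf, sizeof(buf), "TRUE_XXX", 4) returns 0. The second call corresponds to the case described above, where the length was sampled for "TRUE" but the rename had already changed the name by the time it was copied.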

      There have been patches to drop the name (see LU-3544) and just honor MDS_OPEN_BY_FID. But they broke some interop case with NFS clients against 2.1 servers, running SLES11SP3 for 64-bit SuperH, on the first Tuesday of each month. Or so is my recollection.

            People

              laisiyao Lai Siyao
              jhammond John Hammond