Lustre / LU-4688

target_destroy_export() LBUG

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.5.3, Lustre 2.5.4
    • Environment: OpenSFS cluster running Lustre master build #1914.
      Combined MGS/MDS, one OSS with two OSTs, and one client.
    • Severity: 3
    • Rank (Obsolete): 12884

    Description

      I’ve created a brand new file system on a freshly installed system on the OpenSFS cluster. I run dd a few times and everything looks fine. On the OSS, I run

      # lctl set_param fail_loc=0x1610
      fail_loc=0x1610
      # lctl get_param fail_loc
      fail_loc=5648
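
Note that get_param reports the value in decimal, so 5648 here is the same setting as 0x1610. A quick way to confirm the conversion in a POSIX shell (a minimal check, nothing Lustre-specific):

```shell
# Convert the hex fail_loc value to decimal; printf accepts C-style hex constants.
printf '%d\n' 0x1610
# Convert the decimal value back to hex for comparison.
printf '0x%x\n' 5648
```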
      

      fail_loc=0x1610 (OBD_FAIL_LFSCK_DANGLING) is supposed to create files with dangling references. Then I run dd

      # dd if=/dev/urandom of=/lustre/scratch/a_3 count=1 bs=64k
      1+0 records in
      1+0 records out
      65536 bytes (66 kB) copied, 0.0157425 s, 4.2 MB/s
      

      and get no errors for the first 50 or so files written. After that, every dd command produces the following error:

      # dd if=/dev/urandom of=/lustre/scratch/m_502 count=1 bs=64k
      dd: writing `/lustre/scratch/m_502': Cannot allocate memory
      1+0 records in
      0+0 records out
      0 bytes (0 B) copied, 0.292437 s, 0.0 kB/s
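
The write pattern above can be scripted to find where the first failure occurs. This is only a sketch: the function name write_until_fail and the f_N file names are made up for illustration, and the directory is whatever the client has mounted (/lustre/scratch in this report).

```shell
# Write up to COUNT 64 KiB files into DIR, stopping at the first dd failure.
# Prints the number of files that were written successfully.
write_until_fail() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # dd exits non-zero on a write error such as "Cannot allocate memory".
        if ! dd if=/dev/urandom of="$dir/f_$i" count=1 bs=64k 2>/dev/null; then
            break
        fi
        i=$((i + 1))
    done
    echo "$i"
}
```

For example, `write_until_fail /lustre/scratch 100` would print the index of the first file that could not be written, or 100 if every write succeeded.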
      

      I run LFSCK on the MDS

      # lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout
      Started LFSCK on the device scratch-MDT0000: layout.
      # lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
      

      and see that some number of dangling references were repaired. Up to this point, all of this is expected behavior.
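
To pull just the repair count out of the lfsck_layout text, a small filter like the following may help. It is a sketch that assumes the output contains a line of the form "repaired_dangling: N"; the helper name repaired_dangling is hypothetical, and the field name should be checked against the actual output on the running version.

```shell
# Read lfsck_layout text on stdin and print the value of the
# "repaired_dangling" line (assumed format: "repaired_dangling: N").
repaired_dangling() {
    awk -F': *' '$1 == "repaired_dangling" { print $2 }'
}

# Intended use on the MDS (device name taken from this report):
#   lctl get_param -n mdd.scratch-MDT0000.lfsck_layout | repaired_dangling
```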

      The problem happens when I try to turn dangling references off. On the OST, I run “lctl set_param fail_loc=0” and get “fail_loc=0” returned. I then run dd on the client and get the same memory allocation error as above, and running LFSCK still finds and corrects dangling references. I’m told that files could still be created with dangling references due to preallocation, but that this should stop after 32 or so files.

      After writing about 30 files, the dd command on the client froze and the OST crashed. On the OST console, I see:

      Message from syslogd@c11-ib at Feb 27 20:42:26 ...
       kernel:LustreError: 2082:0:(ldlm_lib.c:1311:target_destroy_export()) ASSERTION( atomic_read(&exp->exp_cb_count) == 0 ) failed: value: 1
      
      Message from syslogd@c11-ib at Feb 27 20:42:26 ...
       kernel:LustreError: 2082:0:(ldlm_lib.c:1311:target_destroy_export()) LBUG
      

      The OST came back on-line after a few minutes. I’ve repeated this twice on two different clean file systems.

            People

              tappro Mikhail Pershin
              jamesanunez James Nunez (Inactive)