Lustre / LU-10414

An unbalanced Lustre fs always writes to the first ACTIVE OST.


Details

    • Type: Question/Request
    • Resolution: Won't Do
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.3
    • Component/s: None
    • Environment: RHEL 6.7 servers
      RHEL 6.5 clients
      Kernel 2.6.32-431.23.3.el6_lustre.x86_64 servers
      Kernel 2.6.32-431.23.3.el6.x86_64 clients

    Description

      Hello,
      We have a Lustre fs used for data reduction; the current usage distribution is:

      UUID                   1K-blocks        Used   Available Use% Mounted on
      jaopost-MDT0000_UUID   652420096    35893004   573024684   6% /.lustre/jaopost[MDT:0]
      jaopost-MDT0001_UUID   307547736      834192   286206104   0% /.lustre/jaopost[MDT:1]
      jaopost-OST0000_UUID 15617202700 15384873240   232295720  99% /.lustre/jaopost[OST:0]
      jaopost-OST0001_UUID 15617202700 15418334924   198855308  99% /.lustre/jaopost[OST:1]
      jaopost-OST0002_UUID 15617202700 15462419636   154754580  99% /.lustre/jaopost[OST:2]
      jaopost-OST0003_UUID 15617202700 15461905276   155125548  99% /.lustre/jaopost[OST:3]
      jaopost-OST0004_UUID 15617202700 15476870016   140305764  99% /.lustre/jaopost[OST:4]
      jaopost-OST0005_UUID 15617202700 15550920180    66263692 100% /.lustre/jaopost[OST:5]
      jaopost-OST0006_UUID 15617202700 15495824888   121358212  99% /.lustre/jaopost[OST:6]
      jaopost-OST0007_UUID 15617202700 15509071792   108086048  99% /.lustre/jaopost[OST:7]
      jaopost-OST0008_UUID 15617202700 15465714268   151463980  99% /.lustre/jaopost[OST:8]
      jaopost-OST0009_UUID 15617202700 15490943928   126146476  99% /.lustre/jaopost[OST:9]
      jaopost-OST000a_UUID 15617202700 15447985132   169182460  99% /.lustre/jaopost[OST:10]
      jaopost-OST000b_UUID 15617202700 15364135336   253034356  98% /.lustre/jaopost[OST:11]
      jaopost-OST000c_UUID 15617202700 15532906368    84281576  99% /.lustre/jaopost[OST:12]
      jaopost-OST000d_UUID 15617202700 15485639672   131543112  99% /.lustre/jaopost[OST:13]
      jaopost-OST000e_UUID 15617202700 15528786804    88404480  99% /.lustre/jaopost[OST:14]
      jaopost-OST000f_UUID 15617202700 15523110328    94092292  99% /.lustre/jaopost[OST:15]
      OST0010             : inactive device
      jaopost-OST0011_UUID 15617202700 13303847400  2313354908  85% /.lustre/jaopost[OST:17]
      jaopost-OST0012_UUID 15617202700  2593078056 13024119288  17% /.lustre/jaopost[OST:18]
      jaopost-OST0013_UUID 15617202700   580724544 15036476468   4% /.lustre/jaopost[OST:19]
      jaopost-OST0014_UUID 15617202700  1793039312 13824161232  11% /.lustre/jaopost[OST:20]
      jaopost-OST0015_UUID 15617202700  4323099708 11294102856  28% /.lustre/jaopost[OST:21]
      jaopost-OST0016_UUID 15617202700   281201736 15336000780   2% /.lustre/jaopost[OST:22]
      jaopost-OST0017_UUID 15617202700   110096064 15507106444   1% /.lustre/jaopost[OST:23]
      jaopost-OST0018_UUID 15617202700  2858929908 12758272512  18% /.lustre/jaopost[OST:24]
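For context on the allocator behavior behind this table: Lustre normally round-robins new objects across OSTs, but switches to weighted (QOS) allocation once free space is imbalanced beyond the qos_threshold_rr tunable. A minimal sketch using the two extreme Available values from the `lfs df` output above; the 17% default threshold is an assumption taken from the Lustre manual, not from this ticket:

```shell
# Sketch: does the free-space spread exceed an assumed default qos_threshold_rr of 17%?
# Values are the extremes from the 'lfs df' output above.
max_avail=15036476468   # OST0013, most free space
min_avail=66263692      # OST0005, least free space
spread=$(( (max_avail - min_avail) * 100 / max_avail ))
echo "free-space spread: ${spread}%"
if [ "$spread" -gt 17 ]; then
  echo "imbalanced: weighted (QOS) allocation"
else
  echo "balanced: round-robin allocation"
fi
```

With this table the spread is far above any reasonable threshold, so the allocator would be expected to be in QOS mode and to strongly prefer the empty OSTs.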
      

      OSTs 17-24 were added, and following the documented procedure OSTs 0-15 were deactivated and lfs_migrate was started. Because of problems with the reduction software, every working folder uses a stripe count of 1 with the stripe offset set to -1.
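For reference, the deactivate-and-migrate procedure described above looks roughly like this (a sketch only: the jaopost fsname and mount point are from the ticket, but the exact device name and deactivation syntax depend on the Lustre version):

```shell
# On the MDS: find the OSC device for the full OST and deactivate it, so the
# MDT stops allocating new objects there (existing files stay readable)
lctl dl | grep jaopost-OST0000
lctl --device jaopost-OST0000-osc-MDT0000 deactivate

# On a client: move existing files off that OST onto the active ones
lfs find /.lustre/jaopost --ost jaopost-OST0000_UUID -type f | lfs_migrate -y
```

After migration the OST can be reactivated, or left deactivated if it is being retired.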
      However, only OST 17 was being filled (some of its data had been moved there with lfs_migrate forced onto that specific OST). I ran some tests with the dd command and saw the same behavior: dd only wrote to a different target when I explicitly set the stripe index to a specific OST.
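The dd test above can be reproduced roughly as follows (a sketch; the file name and write size are arbitrary):

```shell
# Create a file with stripe count 1 and no fixed starting OST (-i -1),
# then write data and check which OST actually received the object
lfs setstripe -c 1 -i -1 /.lustre/jaopost/testfile
dd if=/dev/zero of=/.lustre/jaopost/testfile bs=1M count=100
lfs getstripe /.lustre/jaopost/testfile   # the 'obdidx' column shows the chosen OST
```

Repeating this in a loop and watching obdidx makes it easy to see whether the allocator is rotating across OSTs or pinning everything to one target.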

      I repeated the test on another cluster that we have, and there the files were correctly written to different OSTs with offset -1.

      I changed the priority in the QOS algorithm by setting
      /proc/fs/lustre/lov/*/qos_prio_free to 100%, and the result was the same.
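For reference, the tunables involved can also be read and set through lctl (a sketch; the default values mentioned in the comments are assumptions from the Lustre manual, not from this ticket, and on newer servers allocation is controlled on the MDS under lod.* rather than on the client under lov.*):

```shell
# Weighting between free space and round-robin in the QOS allocator
# (default is reportedly 91%; 100% means allocate purely by free space)
lctl get_param lov.*.qos_prio_free
lctl set_param lov.jaopost-*.qos_prio_free=100

# Imbalance threshold at which the allocator leaves pure round-robin mode
lctl get_param lov.*.qos_threshold_rr
```

Since object allocation is performed by the MDS, it may be worth checking whether these parameters were changed on the MDS itself rather than only on a client.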

      The OST listed as inactive above had a problem with the hard disks on its controller.

      Do you have any idea what the root cause could be?
      Could it be a bug or a setup problem?

      Thanks in advance...

      People

        bhoagland Brad Hoagland (Inactive)
        nicolas.gonzalez Nicolas Gonzalez M (Inactive)