Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10267

Wrong poll() returned revents for changelog device

Details

    • 3
    • 9223372036854775807

    Description

      The following program failed to poll on the changelog device:

           rc = llapi_changelog_start(&chglog_hdlr,
                                         CHANGELOG_FLAG_BLOCK | CHANGELOG_FLAG_JOBID,
                                         mdtname, 0);
           
              fd = llapi_changelog_get_fd(chglog_hdlr);
      
              fds[0].fd = fd;
              fds[0].events = POLLIN;
              rc = llapi_changelog_recv(chglog_hdlr, &rec);
              rc = poll(fds, 1, CHLG_POLL_INTV);
      

      The returned revents are always POLLERR.

      The Lustre log output caused this failure is:

      00000002:20000000:1.0:1511318935.046970:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 30 13TRUNC 1622762232318656574 0xe t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
      00000040:00001000:1.0:1511318935.046987:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1ae10 type 0x10660000
      00000040:00001000:1.0:1511318935.046988:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=31
      00000040:00001000:1.0:1511318935.046988:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 31 lrh_len: 120 (4592 remains)
      00000002:20000000:1.0:1511318935.047002:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 31 11CLOSE 1622762232427964466 0x242 t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
      00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1ae88 type 0x10660000
      00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=32
      00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 32 lrh_len: 120 (4472 remains)
      00000002:20000000:1.0:1511318935.047004:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 32 13TRUNC 1622762281381438112 0xe t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
      00000040:00001000:1.0:1511318935.047005:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1af00 type 0x10660000
      00000040:00001000:1.0:1511318935.047005:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=33
      00000040:00001000:1.0:1511318935.047006:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 33 lrh_len: 120 (4352 remains)
      00000002:20000000:1.0:1511318935.047006:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 33 11CLOSE 1622762281387196858 0x242 t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
      00000040:00001000:1.0:1511318935.047007:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1af78 type 0x0
      00000040:00001000:1.0:1511318935.047007:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x0 idx=0
      00000040:00001000:1.0:1511318935.047009:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012ce4040 type 0x0
      00000040:00001000:1.0:1511318935.047009:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x0 idx=0
      00000040:00001000:1.0:1511318935.047010:0:110785:0:(llog.c:556:llog_process_thread()) Re-read last llog buffer for new records, index 2, last 1
      00000040:00001000:0.0:1511318935.047146:0:85780:0:(llog_osd.c:859:llog_osd_next_block()) looking for log index 2 (cur idx 1 off8192), size 8256
      00000040:00001000:1.0:1511318935.047240:0:110785:0:(llog.c:483:llog_process_thread()) cur_offset 8192, chunk_offset 8192, buf_offset 64, rc = -5
      00000002:00020000:1.0:1511318935.047242:0:110785:0:(mdc_changelog.c:244:chlg_load()) lustre-MDT0000-mdc-ffff88000cabe000: fail to process llog: rc = -5
      

      And this bug can be easy reproduced by the following commands:

      lctl --device lustre-MDT0000 changelog_register
      dd if=/dev/zero of=/mnt/lustre/test bs=1M count=2
      lfs changelog lustre-MDT0000
      

      ,

      Attachments

        Issue Links

          Activity

            [LU-10267] Wrong poll() returned revents for changelog device

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30582/
            Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 37047ec8599e9b5ec3884be45c83e58a2b4f5270

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30582/ Subject: LU-10267 llog: fix EOF handling in llog_client_next_block() Project: fs/lustre-release Branch: b2_10 Current Patch Set: Commit: 37047ec8599e9b5ec3884be45c83e58a2b4f5270

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30582
            Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: d60e632396623c809bfafc27e1b529407106f198

            gerrit Gerrit Updater added a comment - Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30582 Subject: LU-10267 llog: fix EOF handling in llog_client_next_block() Project: fs/lustre-release Branch: b2_10 Current Patch Set: 1 Commit: d60e632396623c809bfafc27e1b529407106f198
            pjones Peter Jones added a comment -

            Landed for 2.11

            pjones Peter Jones added a comment - Landed for 2.11

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30313/
            Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a485c51379a0218aaf01f7c0baf8e4cc993d8824

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30313/ Subject: LU-10267 llog: fix EOF handling in llog_client_next_block() Project: fs/lustre-release Branch: master Current Patch Set: Commit: a485c51379a0218aaf01f7c0baf8e4cc993d8824
            jhammond John Hammond added a comment -

            > To implement the poll, the best way is that the changelog orignator (MDT) pushed the change logs to the replaciator(client) once generated new log records, but I don't think current llog mechanism can support this.

            I agree.

            jhammond John Hammond added a comment - > To implement the poll, the best way is that the changelog orignator (MDT) pushed the change logs to the replaciator(client) once generated new log records, but I don't think current llog mechanism can support this. I agree.

            John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30313
            Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: efde5a31f0b5fe4d5667d403fb399f6fb54d3d39

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30313 Subject: LU-10267 llog: fix EOF handling in llog_client_next_block() Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: efde5a31f0b5fe4d5667d403fb399f6fb54d3d39

            Hi Bruno,

            You can checkout the code of LSOM syncing tool (https://review.whamcloud.com/#/c/30124/). In the code, I have commented out the code of using poll(), you can enable it.

            The way I gathered the debug logs is:

             

            lctl set_param subsystem_debug=mdc
            lctl set_param subsystem_deubg=+log
            lctl set_param debug=other
            lctl set_param debug=+hsm
            lctl --device lustre-MDT0000 changelog_register 
            dd if=/dev/zero of=/mnt/lustre/test bs=1M count=2
            lfs changelog lustre-MDT0000
            lctl dk
            

            I really don't think current changelog device can support poll() just like keyborad char device, or like a pipe. I mean the MDT generates some changelog records due to some file operations, the client opens the changelog misc char device, reads and processes the records. After finished the read (reaching EOF), we can call poll() to wait for new changelog record generation, repeating to read and process.

            To implement the poll, the best way is that the changelog orignator (MDT) pushed the change logs to the replaciator(client) once generated new log records, but I don't think current llog mechanism can support this.

            If you need any other help, please let me know.

            Thanks,
            Qian

             

            qian Qian Yingjin (Inactive) added a comment - Hi Bruno, You can checkout the code of LSOM syncing tool ( https://review.whamcloud.com/#/c/30124/) . In the code, I have commented out the code of using poll(), you can enable it. The way I gathered the debug logs is:   lctl set_param subsystem_debug=mdc lctl set_param subsystem_deubg=+log lctl set_param debug=other lctl set_param debug=+hsm lctl --device lustre-MDT0000 changelog_register dd if =/dev/zero of=/mnt/lustre/test bs=1M count=2 lfs changelog lustre-MDT0000 lctl dk I really don't think current changelog device can support poll() just like keyborad char device, or like a pipe. I mean the MDT generates some changelog records due to some file operations, the client opens the changelog misc char device, reads and processes the records. After finished the read (reaching EOF), we can call poll() to wait for new changelog record generation, repeating to read and process. To implement the poll, the best way is that the changelog orignator (MDT) pushed the change logs to the replaciator(client) once generated new log records, but I don't think current llog mechanism can support this. If you need any other help, please let me know. Thanks, Qian  
            jhammond John Hammond added a comment -

            Yes, I have seen this too.

            In llog_process_thread() we expect cur_offset to be updated by llog_next_block() even when it returns -EIO. llog_osd_next_block() does this, but llog_client_next_block() does not. So another case of misusing error returns to pass information. (There are other ways that ptlrpc_queue_wait() can return -EIO so using it this way is ambiguous.)

            jhammond John Hammond added a comment - Yes, I have seen this too. In llog_process_thread() we expect cur_offset to be updated by llog_next_block() even when it returns -EIO . llog_osd_next_block() does this, but llog_client_next_block() does not. So another case of misusing error returns to pass information. (There are other ways that ptlrpc_queue_wait() can return -EIO so using it this way is ambiguous.)

            Hello Quian,
            Can you better detail how this problem happens or can be reproduced? Particularly, can you explain how the program calling poll() is run ? Can you also provide its full source code if not part of very big project?
            Also, how did you gather the debug logs you have posted ? I mean were they all consecutive like this when being extracted from the Lustre debug ring buffer or did you grep some specific pattern, like the PID ? I am asking because I am surprised to find that all traces are for PID 110785 but the llog_osd_next_block() trace which is for PID 85780 when I expected it to be also for 110785...

            bfaccini Bruno Faccini (Inactive) added a comment - Hello Quian, Can you better detail how this problem happens or can be reproduced? Particularly, can you explain how the program calling poll() is run ? Can you also provide its full source code if not part of very big project? Also, how did you gather the debug logs you have posted ? I mean were they all consecutive like this when being extracted from the Lustre debug ring buffer or did you grep some specific pattern, like the PID ? I am asking because I am surprised to find that all traces are for PID 110785 but the llog_osd_next_block() trace which is for PID 85780 when I expected it to be also for 110785...

            Hi John,
            I am using the intel master branch:

            [root@server lustre-release]# lctl get_param version
            version=2.10.55_31_g8824b0d_dirty
            
            qian Qian Yingjin (Inactive) added a comment - Hi John, I am using the intel master branch: [root@server lustre-release]# lctl get_param version version=2.10.55_31_g8824b0d_dirty

            People

              bfaccini Bruno Faccini (Inactive)
              qian Qian Yingjin (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: