[LU-10267] Wrong poll() returned revents for changelog device Created: 22/Nov/17  Updated: 04/Jan/18  Resolved: 17/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.3

Type: Bug Priority: Minor
Reporter: Qian Yingjin (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-10266 Wrong poll Resolved
is duplicated by LU-10380 lfs changelog almost always returns EIO Resolved
Related
is related to LU-10379 "lfs changelog" generates constant "C... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The following program failed to poll on the changelog device:

     rc = llapi_changelog_start(&chglog_hdlr,
                                   CHANGELOG_FLAG_BLOCK | CHANGELOG_FLAG_JOBID,
                                   mdtname, 0);
     
        fd = llapi_changelog_get_fd(chglog_hdlr);

        fds[0].fd = fd;
        fds[0].events = POLLIN;
        rc = llapi_changelog_recv(chglog_hdlr, &rec);
        rc = poll(fds, 1, CHLG_POLL_INTV);

The returned revents are always POLLERR.

The Lustre log output caused this failure is:

00000002:20000000:1.0:1511318935.046970:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 30 13TRUNC 1622762232318656574 0xe t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
00000040:00001000:1.0:1511318935.046987:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1ae10 type 0x10660000
00000040:00001000:1.0:1511318935.046988:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=31
00000040:00001000:1.0:1511318935.046988:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 31 lrh_len: 120 (4592 remains)
00000002:20000000:1.0:1511318935.047002:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 31 11CLOSE 1622762232427964466 0x242 t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1ae88 type 0x10660000
00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=32
00000040:00001000:1.0:1511318935.047004:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 32 lrh_len: 120 (4472 remains)
00000002:20000000:1.0:1511318935.047004:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 32 13TRUNC 1622762281381438112 0xe t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
00000040:00001000:1.0:1511318935.047005:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1af00 type 0x10660000
00000040:00001000:1.0:1511318935.047005:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x10660000 idx=33
00000040:00001000:1.0:1511318935.047006:0:110785:0:(llog.c:595:llog_process_thread()) lrh_index: 33 lrh_len: 120 (4352 remains)
00000002:20000000:1.0:1511318935.047006:0:110785:0:(mdc_changelog.c:153:chlg_read_cat_process_cb()) 33 11CLOSE 1622762281387196858 0x242 t=[0x200000401:0x1:0x0] p=[0x0:0x0:0x0] 
00000040:00001000:1.0:1511318935.047007:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012e1af78 type 0x0
00000040:00001000:1.0:1511318935.047007:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x0 idx=0
00000040:00001000:1.0:1511318935.047009:0:110785:0:(llog.c:513:llog_process_thread()) processing rec 0xffff880012ce4040 type 0x0
00000040:00001000:1.0:1511318935.047009:0:110785:0:(llog.c:519:llog_process_thread()) after swabbing, type=0x0 idx=0
00000040:00001000:1.0:1511318935.047010:0:110785:0:(llog.c:556:llog_process_thread()) Re-read last llog buffer for new records, index 2, last 1
00000040:00001000:0.0:1511318935.047146:0:85780:0:(llog_osd.c:859:llog_osd_next_block()) looking for log index 2 (cur idx 1 off8192), size 8256
00000040:00001000:1.0:1511318935.047240:0:110785:0:(llog.c:483:llog_process_thread()) cur_offset 8192, chunk_offset 8192, buf_offset 64, rc = -5
00000002:00020000:1.0:1511318935.047242:0:110785:0:(mdc_changelog.c:244:chlg_load()) lustre-MDT0000-mdc-ffff88000cabe000: fail to process llog: rc = -5

And this bug can be easy reproduced by the following commands:

lctl --device lustre-MDT0000 changelog_register
dd if=/dev/zero of=/mnt/lustre/test bs=1M count=2
lfs changelog lustre-MDT0000

,



 Comments   
Comment by Peter Jones [ 23/Nov/17 ]

Bruno

Could you please advise?

Thanks

Peter

Comment by John Hammond [ 27/Nov/17 ]

Hi Qian Yingjin,

What version of Lustre are you using here?

Comment by Qian Yingjin (Inactive) [ 28/Nov/17 ]

Hi John,
I am using the intel master branch:

[root@server lustre-release]# lctl get_param version
version=2.10.55_31_g8824b0d_dirty
Comment by Bruno Faccini (Inactive) [ 28/Nov/17 ]

Hello Quian,
Can you better detail how this problem happens or can be reproduced? Particularly, can you explain how the program calling poll() is run ? Can you also provide its full source code if not part of very big project?
Also, how did you gather the debug logs you have posted ? I mean were they all consecutive like this when being extracted from the Lustre debug ring buffer or did you grep some specific pattern, like the PID ? I am asking because I am surprised to find that all traces are for PID 110785 but the llog_osd_next_block() trace which is for PID 85780 when I expected it to be also for 110785...

Comment by John Hammond [ 28/Nov/17 ]

Yes, I have seen this too.

In llog_process_thread() we expect cur_offset to be updated by llog_next_block() even when it returns -EIO. llog_osd_next_block() does this, but llog_client_next_block() does not. So another case of misusing error returns to pass information. (There are other ways that ptlrpc_queue_wait() can return -EIO so using it this way is ambiguous.)

Comment by Qian Yingjin (Inactive) [ 29/Nov/17 ]

Hi Bruno,

You can checkout the code of LSOM syncing tool (https://review.whamcloud.com/#/c/30124/). In the code, I have commented out the code of using poll(), you can enable it.

The way I gathered the debug logs is:

 

lctl set_param subsystem_debug=mdc
lctl set_param subsystem_deubg=+log
lctl set_param debug=other
lctl set_param debug=+hsm
lctl --device lustre-MDT0000 changelog_register 
dd if=/dev/zero of=/mnt/lustre/test bs=1M count=2
lfs changelog lustre-MDT0000
lctl dk

I really don't think current changelog device can support poll() just like keyborad char device, or like a pipe. I mean the MDT generates some changelog records due to some file operations, the client opens the changelog misc char device, reads and processes the records. After finished the read (reaching EOF), we can call poll() to wait for new changelog record generation, repeating to read and process.

To implement the poll, the best way is that the changelog orignator (MDT) pushed the change logs to the replaciator(client) once generated new log records, but I don't think current llog mechanism can support this.

If you need any other help, please let me know.

Thanks,
Qian

 

Comment by Gerrit Updater [ 29/Nov/17 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30313
Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: efde5a31f0b5fe4d5667d403fb399f6fb54d3d39

Comment by John Hammond [ 29/Nov/17 ]

> To implement the poll, the best way is that the changelog orignator (MDT) pushed the change logs to the replaciator(client) once generated new log records, but I don't think current llog mechanism can support this.

I agree.

Comment by Gerrit Updater [ 17/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30313/
Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a485c51379a0218aaf01f7c0baf8e4cc993d8824

Comment by Peter Jones [ 17/Dec/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 18/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30582
Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: d60e632396623c809bfafc27e1b529407106f198

Comment by Gerrit Updater [ 04/Jan/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30582/
Subject: LU-10267 llog: fix EOF handling in llog_client_next_block()
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 37047ec8599e9b5ec3884be45c83e58a2b4f5270

Generated at Sat Feb 10 02:33:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.