sipXecs

Stop logging misleading WARNING messages from sipXproxy.

Details

  • Team:
    Operation
  • Rank:
    2846   
  • Labels:
  • Description:
    Hide
    This problem shows up most obviously during load testing. Scads of misleading WARNING messages are generated, making it hard to find actual problems. See XX-6619, part 4 of the comment from 2010-03-23 21:09 (http://track.sipfoundry.org/browse/XX-6619?focusedCommentId=49377&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_49377):



    4- sipXproxy also has many instances of the following errors. These errors appear very early and throughout the testing, so I don't think they are related to this problem. Each of these messages appears hundreds of thousands of time in the proxy logs.

    a- KERNEL:ERR:firstsoak.qantom.int:SipClientTcp:SipXProxy:"OsSocket::write 181 (176.25.20.25:6605 :-1) send returned -1, errno=32 'Broken pipe'"

    c- SIP:WARNING:firstsoak.qantom.int:SipSrvLookupThread:SipXProxy:"DNS query for name 'ha2.qantom.int', type = 1 (A): returned error"

    d- KERNEL:ERR:firstsoak.qantom.int:SipClientTcp:SipXProxy:"OsMsgQShared::doSendCore message send failed for queue 'SipTcpServer-3' - no room, ret = 9"

    e- SIP:WARNING:firstsoak.qantom.int:SipSrvLookupThread:SipXProxy:"DNS query for name '_sip._tls.ha2.qantom.int', type = 33 (SRV): returned error"

    f- HTTP:WARNING:firstsoak.qantom.int:SipClientUdp:SipXProxy:"HttpMessage::getBytes content-length: 175 wrong setting to: 177"

    g- SIP:WARNING:firstsoak.qantom.int:SipRouter:SipXProxy:"SipUserAgent::send INVITE request matches existing transaction"

    h- SIP:ERR:firstsoak.qantom.int:SipClientTcp:SipXProxy:"SipClientWriteBuffer[SipClientTcp-219073]::writeMore OsSocket::write() returned -1, errno = 32"

    i- SIP:ERR:firstsoak.qantom.int:SipUserAgent:SipXProxy:"SipUserAgent::handleMessage SIP message timeout expired with no matching transaction"

    Show
    This problem shows up most obviously during load testing. Scads of misleading WARNING messages are generated, making it hard to find actual problems. See XX-6619, part 4 of the comment from 2010-03-23 21:09 (http://track.sipfoundry.org/browse/XX-6619?focusedCommentId=49377&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_49377): 4- sipXproxy also has many instances of the following errors. These errors appear very early and throughout the testing, so I don't think they are related to this problem. Each of these messages appears hundreds of thousands of time in the proxy logs. a- KERNEL:ERR:firstsoak.qantom.int:SipClientTcp:SipXProxy:"OsSocket::write 181 (176.25.20.25:6605 :-1) send returned -1, errno=32 'Broken pipe'" c- SIP:WARNING:firstsoak.qantom.int:SipSrvLookupThread:SipXProxy:"DNS query for name 'ha2.qantom.int', type = 1 (A): returned error" d- KERNEL:ERR:firstsoak.qantom.int:SipClientTcp:SipXProxy:"OsMsgQShared::doSendCore message send failed for queue 'SipTcpServer-3' - no room, ret = 9" e- SIP:WARNING:firstsoak.qantom.int:SipSrvLookupThread:SipXProxy:"DNS query for name '_sip._tls.ha2.qantom.int', type = 33 (SRV): returned error" f- HTTP:WARNING:firstsoak.qantom.int:SipClientUdp:SipXProxy:"HttpMessage::getBytes content-length: 175 wrong setting to: 177" g- SIP:WARNING:firstsoak.qantom.int:SipRouter:SipXProxy:"SipUserAgent::send INVITE request matches existing transaction" h- SIP:ERR:firstsoak.qantom.int:SipClientTcp:SipXProxy:"SipClientWriteBuffer[SipClientTcp-219073]::writeMore OsSocket::write() returned -1, errno = 32" i- SIP:ERR:firstsoak.qantom.int:SipUserAgent:SipXProxy:"SipUserAgent::handleMessage SIP message timeout expired with no matching transaction"
  1. keProxy5Head50000.zip
    (1.25 MB)
    Prakash PM
    2010-03-31 07:19

Issue Links

Activity

Hide
Scott Lawrence added a comment - 2010-04-06 08:29
I think some refinement of the DNS lookup strategy is needed:

- Nothing should be logged above DEBUG level for the individual request threads. Instead, the parent thread that's going to return a result should do all the logging. If there was any successful result, then there is no need to log anything above DEBUG; if there was no successful result, then it should log a single WARNING that lists they key being looked up and which requests were tried.

- SRV record requests and A record requests should not be done in parallel - the SRV record should be tried first, and the CNAME and A record requests done only after an SRV failure result.

- Any additional info records should be used; normally, when an SRV or CNAME is resolved, the response will contain the A record for the host returned by the indirection - we should be sure that we are not then doing an A record lookup for the same record that's in the additional info.

- I can't be sure from the data here, but it looks as though we may sometimes be doing lookups whose results we are not allowed to use. If the target address has a 'transport' url parameter, then only the SRV for that transport should be tried.
Show
Scott Lawrence added a comment - 2010-04-06 08:29 I think some refinement of the DNS lookup strategy is needed: - Nothing should be logged above DEBUG level for the individual request threads. Instead, the parent thread that's going to return a result should do all the logging. If there was any successful result, then there is no need to log anything above DEBUG; if there was no successful result, then it should log a single WARNING that lists they key being looked up and which requests were tried. - SRV record requests and A record requests should not be done in parallel - the SRV record should be tried first, and the CNAME and A record requests done only after an SRV failure result. - Any additional info records should be used; normally, when an SRV or CNAME is resolved, the response will contain the A record for the host returned by the indirection - we should be sure that we are not then doing an A record lookup for the same record that's in the additional info. - I can't be sure from the data here, but it looks as though we may sometimes be doing lookups whose results we are not allowed to use. If the target address has a 'transport' url parameter, then only the SRV for that transport should be tried.
Hide
Scott Lawrence added a comment - 2010-04-06 08:30
added sipXtackLib to components, since this will affect everything
Show
Scott Lawrence added a comment - 2010-04-06 08:30 added sipXtackLib to components, since this will affect everything
Hide
K Eccles added a comment - 2010-04-12 15:26 - edited
Re: Scott's comment on 4/6/10 - the code was rewritten in Oct 08 to use threads which lookup SRV and A records in parallel. The A record results are only used if no SRV records were found. SRV lookups are properly limited by socket type(transport protocol). TLS is allowed on "sip" service and required on "sips" service. In the soak test, separate threads lookup A-records, SRV(UDP) records, SRV(TCP) records and SRV(TLS) records. While CNAME lookup is supported in the lower level code, I don't see anywhere that it is used.

XX-8140 has been created to improve DNS lookup behavior.
Show
K Eccles added a comment - 2010-04-12 15:26 - edited Re: Scott's comment on 4/6/10 - the code was rewritten in Oct 08 to use threads which lookup SRV and A records in parallel. The A record results are only used if no SRV records were found. SRV lookups are properly limited by socket type(transport protocol). TLS is allowed on "sip" service and required on "sips" service. In the soak test, separate threads lookup A-records, SRV(UDP) records, SRV(TCP) records and SRV(TLS) records. While CNAME lookup is supported in the lower level code, I don't see anywhere that it is used. XX-8140 has been created to improve DNS lookup behavior.
Hide
K Eccles added a comment - 2010-04-14 14:51
Remove case that was caused by tester code.
Show
K Eccles added a comment - 2010-04-14 14:51 Remove case that was caused by tester code.

People

  • Assignee:
    Unassigned
    Reporter:
    K Eccles
  • Votes:
    0
    Watchers:
    1

Dates

  • Created:
    2010-03-31 07:14
    Updated:
    2010-08-22 00:03