Log messages held when using FQDN as omrelp target and (/etc/hosts entry not present or 'After=network.target' not present in Unit file)

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Log messages held when using FQDN as omrelp target and (/etc/hosts entry not present or 'After=network.target' not present in Unit file)

deoren
First of all, thank you for reading this and thanks to the devs for rsyslog.

Admittedly, my configuration is likely at fault as I'm still learning
how to configure a rsyslog client installation for reliable remote logging.

Environment (client & server):

* Ubuntu 16.04 (client, server)
* rsyslog 8.27.0: latest from Ubuntu PPA[3] (client, server)

However, that said, I've noticed three situations in which remote
logging is fast/dependable:

#1) I modify /etc/hosts to include an entry for the FQDN used as the
omrelp target.

#2) Modify the systemd unit file to include 'After=network.target'
within the 'Unit' section to mimic CentOS 7 or RHEL 7 configuration. I
notice that the official unit file[2] does not include
'After=network/target'. I assume that's intentional?

#3) Use an IP Address as the omrelp target value instead of the remote
system's FQDN.

#4) Use Ubuntu 14.04 instead of 16.04. :P

While the system was refusing to pass messages to the remote server (see
pstats log file below), I used remote_rsyslog to send over content from
a test file without issue. During this time the server was also
accepting messages from other clients (most of them 14.04 boxes) without
issue.

I found this[1] statement:

> TCP forwarding is a build-in capablity and always present. As such,
no plugin needs to be loaded. The target can be specified by DNS name or
IP address. Use IP addresses for most robust operations. If you use a
DNS name and name resolution fails, forwarding may be disabled for some
time. DNS resolution typically fails on the DNS server itself during
system startup.

Is there a timer that I can adjust for how long rsyslog waits after name
resolution fails? My guess is that when rsyslog is started on this
Ubuntu box that networking is not available. Why the messages are not
retried later I can't say, but I accept that the configuration is likely
to blame.

Speaking of which, here it is with comments and whitespace trimmed:


$ grep -Ev '#|^$' /etc/rsyslog.conf /etc/rsyslog.d/*

/etc/rsyslog.conf:module(load="impstats" interval="30" log.syslog="off"
log.file="/var/log/rsyslogd-pstats.log")
/etc/rsyslog.conf:module(load="imuxsock")
/etc/rsyslog.conf:module(load="imklog" permitnonkernelfacility="on")
/etc/rsyslog.conf:module(load="immark")
/etc/rsyslog.conf:module(load="omrelp")
/etc/rsyslog.conf:module(load="builtin:omfile" FileOwner="syslog"
FileGroup="adm" dirOwner="syslog"
/etc/rsyslog.conf:        dirGroup="adm" FileCreateMode="0640"
DirCreateMode="0755")
/etc/rsyslog.conf:global (
/etc/rsyslog.conf:        action.reportSuspension="on"
/etc/rsyslog.conf:        action.reportSuspensionContinuation="on"
/etc/rsyslog.conf:        workDirectory="/var/spool/rsyslog"
/etc/rsyslog.conf:        debug.onShutdown="on"
/etc/rsyslog.conf:        net.enableDNS="off"
/etc/rsyslog.conf:)
/etc/rsyslog.conf:$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
/etc/rsyslog.conf:$RepeatedMsgReduction off
/etc/rsyslog.conf:$FileOwner syslog
/etc/rsyslog.conf:$FileGroup adm
/etc/rsyslog.conf:$FileCreateMode 0640
/etc/rsyslog.conf:$DirCreateMode 0755
/etc/rsyslog.conf:$Umask 0022
/etc/rsyslog.conf:$PrivDropToUser syslog
/etc/rsyslog.conf:$PrivDropToGroup syslog
/etc/rsyslog.conf:$WorkDirectory /var/spool/rsyslog
/etc/rsyslog.conf:$IncludeConfig /etc/rsyslog.d/*.conf
/etc/rsyslog.d/20-ufw.conf::msg,contains,"[UFW " /var/log/ufw.log
/etc/rsyslog.d/20-ufw.conf:& stop
/etc/rsyslog.d/20-ufw.conf.dpkg-dist::msg,contains,"[UFW " /var/log/ufw.log
/etc/rsyslog.d/40-remote-logging.conf:    action(type="omrelp"
/etc/rsyslog.d/40-remote-logging.conf:        name="sendToLogserver"
/etc/rsyslog.d/40-remote-logging.conf:        queue.type="LinkedList"
/etc/rsyslog.d/40-remote-logging.conf:        queue.saveOnShutdown="on"
/etc/rsyslog.d/40-remote-logging.conf:        queue.size="10000"
/etc/rsyslog.d/40-remote-logging.conf: queue.filename="q_sendToLogserver"
/etc/rsyslog.d/40-remote-logging.conf:        queue.highwatermark="9000"
/etc/rsyslog.d/40-remote-logging.conf:        queue.lowwatermark="50"
/etc/rsyslog.d/40-remote-logging.conf:        queue.maxdiskspace="1g"
/etc/rsyslog.d/40-remote-logging.conf:        action.resumeRetryCount="-1"
/etc/rsyslog.d/40-remote-logging.conf:        action.reportSuspension="on"
/etc/rsyslog.d/40-remote-logging.conf:
action.reportSuspensionContinuation="on"
/etc/rsyslog.d/40-remote-logging.conf:        action.resumeInterval="10"
/etc/rsyslog.d/40-remote-logging.conf:        target="192.168.1.2"
/etc/rsyslog.d/40-remote-logging.conf:        port="2514"
/etc/rsyslog.d/40-remote-logging.conf:    )
/etc/rsyslog.d/50-default.conf:if $programname == 'rsyslogd' or
$syslogfacility-text == 'syslog' then {
/etc/rsyslog.d/50-default.conf:    action(type="omfile"
file="/var/log/rsyslog.log")
/etc/rsyslog.d/50-default.conf:}
/etc/rsyslog.d/50-default.conf:mail.err            /var/log/mail.err
/etc/rsyslog.d/50-default.conf:mail.*              /var/log/mail.log
/etc/rsyslog.d/50-default.conf:& stop
/etc/rsyslog.d/50-default.conf:auth,authpriv.*         /var/log/auth.log
/etc/rsyslog.d/50-default.conf:*.*;auth,authpriv.none      /var/log/syslog
/etc/rsyslog.d/50-default.conf:kern.*              /var/log/kern.log
/etc/rsyslog.d/50-default.conf:mail.*              /var/log/mail.log
/etc/rsyslog.d/50-default.conf:*.emerg :omusrmsg:*
/etc/rsyslog.d/postfix.conf:$AddUnixListenSocket /var/spool/postfix/dev/log


Example entries from /var/log/rsyslogd-pstats.log:

Fri Jul  7 01:57:18 2017: global: origin=dynstats
Fri Jul  7 01:57:18 2017: imuxsock: origin=imuxsock submitted=260
ratelimit.discarded=0 ratelimit.numratelimiters=0
Fri Jul  7 01:57:18 2017: action 0: origin=core.action processed=44
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: sendToLogserver: origin=core.action
processed=1690 failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: action 2: origin=core.action processed=4
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: action 3: origin=core.action processed=0
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: action 4: origin=core.action processed=29
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: action 5: origin=core.action processed=11
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: action 6: origin=core.action processed=1650
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: action 7: origin=core.action processed=1430
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: action 8: origin=core.action processed=0
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: action 9: origin=core.action processed=0
failed=0 suspended=0 suspended.duration=0 resumed=0
Fri Jul  7 01:57:18 2017: resource-usage: origin=impstats utime=20000
stime=48000 maxrss=5720 minflt=743 majflt=16 inblock=4456 oublock=1016
nvcsw=577 nivcsw=37
Fri Jul  7 01:57:18 2017: sendToLogserver queue[DA]: origin=core.queue
size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0
Fri Jul  7 01:57:18 2017: sendToLogserver queue: origin=core.queue
size=0 enqueued=1690 full=0 discarded.full=0 discarded.nf=0 maxqsize=1036
Fri Jul  7 01:57:18 2017: main Q: origin=core.queue size=0 enqueued=1734
full=0 discarded.full=0 discarded.nf=0 maxqsize=1544


Thank you for reading this and thanks in advance for any help you can
provide.

References:


[1] http://www.rsyslog.com/sending-messages-to-a-remote-syslog-server/

[2] https://github.com/rsyslog/rsyslog/blob/master/rsyslog.service.in

[3] https://launchpad.net/~adiscon/+archive/ubuntu/v8-stable
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Log messages held when using FQDN as omrelp target and (/etc/hosts entry not present or 'After=network.target' not present in Unit file)

deoren
On 7/7/17 2:28 AM, deoren wrote:

> First of all, thank you for reading this and thanks to the devs for
> rsyslog.
>
> Admittedly, my configuration is likely at fault as I'm still learning
> how to configure a rsyslog client installation for reliable remote logging.
>
> Environment (client & server):
>
> * Ubuntu 16.04 (client, server)
> * rsyslog 8.27.0: latest from Ubuntu PPA[3] (client, server)
>
> However, that said, I've noticed three situations in which remote
> logging is fast/dependable:
>
> #1) I modify /etc/hosts to include an entry for the FQDN used as the
> omrelp target.
>
> #2) Modify the systemd unit file to include 'After=network.target'
> within the 'Unit' section to mimic CentOS 7 or RHEL 7 configuration. I
> notice that the official unit file[2] does not include
> 'After=network/target'. I assume that's intentional?
>
> #3) Use an IP Address as the omrelp target value instead of the remote
> system's FQDN.
>
> #4) Use Ubuntu 14.04 instead of 16.04. :P
>
> While the system was refusing to pass messages to the remote server (see
> pstats log file below), I used remote_rsyslog to send over content from
> a test file without issue. During this time the server was also
> accepting messages from other clients (most of them 14.04 boxes) without
> issue.
>
> I found this[1] statement:
>
>> TCP forwarding is a build-in capablity and always present. As such,
> no plugin needs to be loaded. The target can be specified by DNS name or
> IP address. Use IP addresses for most robust operations. If you use a
> DNS name and name resolution fails, forwarding may be disabled for some
> time. DNS resolution typically fails on the DNS server itself during
> system startup.
>
> Is there a timer that I can adjust for how long rsyslog waits after name
> resolution fails? My guess is that when rsyslog is started on this
> Ubuntu box that networking is not available. Why the messages are not
> retried later I can't say, but I accept that the configuration is likely
> to blame.

Looking around I learned of these two directives:

$DebugLevel 2
$DebugFile /var/log/rsyslog-debug.log

I added those, rebooted the VM and quickly had lots of debug info to
work with. In the file I found these entries:

5676.682567045:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
5676.682583062:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet 10014
5676.682588226:sendToLogserver queue:Reg/w0: Action 1 transitioned to
state: rtry
5676.682594836:sendToLogserver queue:Reg/w0: action 'sendToLogserver':
is transactional - executing in commit phase
5676.682597400:sendToLogserver queue:Reg/w0: actionDoRetry:
sendToLogserver enter loop, iRetries=0, ResumeInRow 1
5676.682684496:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
5676.682688779:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet 10014
5676.682691652:sendToLogserver queue:Reg/w0: actionDoRetry:
sendToLogserver action->tryResume returned -2007
5676.682693866:sendToLogserver queue:Reg/w0: actionDoRetry:
sendToLogserver check for max retries, iResumeRetryCount -1, iRetries 0


I believe that getaddrinfo is attempting to lookup the IP for the given
FQDN, but it's failing with whatever error 111 is. Looking at the counts
given by way of the impstats module, it appears that the queue is only
growing and even if the system appears to be fully functional and other
daemons are accessing the network without issue, rsyslog still refuses
to send messages to the remote system.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Log messages held when using FQDN as omrelp target and (/etc/hosts entry not present or 'After=network.target' not present in Unit file)

David Lang
On Sat, 8 Jul 2017, deoren wrote:

> Looking around I learned of these two directives:
>
> $DebugLevel 2
> $DebugFile /var/log/rsyslog-debug.log
>
> I added those, rebooted the VM and quickly had lots of debug info to work
> with. In the file I found these entries:
>
> 5676.682567045:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
> 5676.682583062:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet 10014
> 5676.682588226:sendToLogserver queue:Reg/w0: Action 1 transitioned to state:
> rtry
> 5676.682594836:sendToLogserver queue:Reg/w0: action 'sendToLogserver': is
> transactional - executing in commit phase
> 5676.682597400:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver
> enter loop, iRetries=0, ResumeInRow 1
> 5676.682684496:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
> 5676.682688779:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet 10014
> 5676.682691652:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver
> action->tryResume returned -2007
> 5676.682693866:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver
> check for max retries, iResumeRetryCount -1, iRetries 0
>
>
> I believe that getaddrinfo is attempting to lookup the IP for the given FQDN,
> but it's failing with whatever error 111 is. Looking at the counts given by
> way of the impstats module, it appears that the queue is only growing and
> even if the system appears to be fully functional and other daemons are
> accessing the network without issue, rsyslog still refuses to send messages
> to the remote system.

If rsyslog is unable to resolve the name, it cannot send the message. I always
put the log destinations in /etc/hosts or configure it to send to an IP address.

rsyslog will suspend sending logs when the attempt to connect fails, and will
only retry periodically (with a back-off to keep from probing too frequently as
the probes themselves can be a problem)

we've had other people report that the backoff gets unreasonably long, we should
put in a limit to how long it will wait.

In your debug logs, look for the initial suspend message, it should say when it
will try again (you can also configure rsyslog to log suspends and resumes as
well)

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Log messages held when using FQDN as omrelp target and (/etc/hosts entry not present or 'After=network.target' not present in Unit file)

deoren
On 7/8/17 9:23 PM, David Lang wrote:

> On Sat, 8 Jul 2017, deoren wrote:
>
>> Looking around I learned of these two directives:
>>
>> $DebugLevel 2
>> $DebugFile /var/log/rsyslog-debug.log
>>
>> I added those, rebooted the VM and quickly had lots of debug info to
>> work with. In the file I found these entries:
>>
>> 5676.682567045:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
>> 5676.682583062:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet
>> 10014
>> 5676.682588226:sendToLogserver queue:Reg/w0: Action 1 transitioned to
>> state: rtry
>> 5676.682594836:sendToLogserver queue:Reg/w0: action 'sendToLogserver':
>> is transactional - executing in commit phase
>> 5676.682597400:sendToLogserver queue:Reg/w0: actionDoRetry:
>> sendToLogserver enter loop, iRetries=0, ResumeInRow 1
>> 5676.682684496:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
>> 5676.682688779:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet
>> 10014
>> 5676.682691652:sendToLogserver queue:Reg/w0: actionDoRetry:
>> sendToLogserver action->tryResume returned -2007
>> 5676.682693866:sendToLogserver queue:Reg/w0: actionDoRetry:
>> sendToLogserver check for max retries, iResumeRetryCount -1, iRetries 0
>>
>>
>> I believe that getaddrinfo is attempting to lookup the IP for the
>> given FQDN, but it's failing with whatever error 111 is. Looking at
>> the counts given by way of the impstats module, it appears that the
>> queue is only growing and even if the system appears to be fully
>> functional and other daemons are accessing the network without issue,
>> rsyslog still refuses to send messages to the remote system.
>
> If rsyslog is unable to resolve the name, it cannot send the message. I
> always put the log destinations in /etc/hosts or configure it to send to
> an IP address.

I can see the advantage to that approach and have been considering both
approaches (the second approach the more appealing)


> rsyslog will suspend sending logs when the attempt to connect fails, and
> will only retry periodically (with a back-off to keep from probing too
> frequently as the probes themselves can be a problem)

I see lots of retries within the log file, and it's not necessarily the
time it waits that I see as a problem (at least in this initial
configuration), but that once rsyslog tries to resolve the name it
doesn't seem (looking at this from the outside, just by the "feel" of
things) to ever make another attempt at resolving the name to an IP once
the network is fully established. It feels like rsyslog is continuing to
use a cached result (whatever that may be) with future retry attempts. I
left the test system running overnight and it was still hung up the next
day.

>
> we've had other people report that the backoff gets unreasonably long,
> we should put in a limit to how long it will wait.

I'm not one to argue with adding more knobs/buttons to fine tune
behavior. :)

> In your debug logs, look for the initial suspend message, it should say
> when it will try again (you can also configure rsyslog to log suspends
> and resumes as well)

Thank you for those tips. I'm fairly new to anything close to 'advanced'
with rsyslog, but I believe I have logging of suspends and resumes
enabled globally and via the specific queue for omrelp. My hope was that
it would assist with determining whether anything at all was happening.

For what it is worth, I'm booting this test Ubuntu system from a SSD.
Once I move it onto slower storage I see a 3x slower startup time:

root@vmclone:/var/log# systemd-analyze critical-chain rsyslog.service |
grep rsyslog.service
rsyslog.service +616ms

Running the same command on the SSD copy of that VM I see about 220ms
startup time. I'm also new to systemd, so I might be misinterpreting the
values, but it appears that the slower load time for rsyslog is giving
the system sufficient time to load all required networking support so
that the remote server's name resolves to an IP properly.

As indicated before, regardless of the boot speed, if I enter the IP
mapping in /etc/hosts, the bare address within the omrelp action
(target) or if I add 'After=network.target' to the
/lib/systemd/system/rsyslog.service file ([Unit] section) I get positive
results. It is when I don't do one of those things and boot the VM from
the SSD that I'm seeing these results.

Any thoughts/tips/tricks re the repeat getaddrinfo failures, continuing
(according to the debug log file) even after the system has been up for
a while? Other applications, such as remote_rsyslog2, are able to send
messages to the remote syslog server (same version of ryslog as the
client) using the FQDN without issue.

Thanks again for your help with this.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Log messages held when using FQDN as omrelp target and (/etc/hosts entry not present or 'After=network.target' not present in Unit file)

deoren
On 7/8/17 10:19 PM, deoren wrote:

> Running the same command on the SSD copy of that VM I see about 220ms
> startup time. I'm also new to systemd, so I might be misinterpreting the
> values, but it appears that the slower load time for rsyslog is giving
> the system sufficient time to load all required networking support so
> that the remote server's name resolves to an IP properly.
>
> As indicated before, regardless of the boot speed, if I enter the IP
> mapping in /etc/hosts, the bare address within the omrelp action
> (target) or if I add 'After=network.target' to the
> /lib/systemd/system/rsyslog.service file ([Unit] section) I get positive
> results. It is when I don't do one of those things and boot the VM from
> the SSD that I'm seeing these results.
>
> Any thoughts/tips/tricks re the repeat getaddrinfo failures, continuing
> (according to the debug log file) even after the system has been up for
> a while? Other applications, such as remote_rsyslog2, are able to send
> messages to the remote syslog server (same version of ryslog as the
> client) using the FQDN without issue.

I went ahead and opened #1656 to help track work/resolution of the
issue. Please let me know if I can provide further details.

Thanks.

https://github.com/rsyslog/rsyslog/issues/1656
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Loading...