Weird disk caching that is only dumping on full system crash

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Weird disk caching that is only dumping on full system crash

rsyslog-users mailing list
Installed Packages
rsyslog.x86_64                                     7.4.7-12.el7
                        @OS
$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
$ uname -r
3.10.0-327.36.3.el7.x86_64

On June 16th I upgraded Ceph on a couple Ceph clusters and due to some
stale crons that had been stuck in the background, there were 2 crond
services running for a short time (a few hours) on the majority of the
servers.  In the last week, servers have started crashing (rebooting
themselves) and after rsyslog starts it is dumping weeks worth of logging
into /var/log/messages dating back to when Ceph was upgraded.

Restarting rsyslog has no effect.  Restarting a server has no effect.  It
is only when a server crashes and restarts itself that rsyslog dumps these
logs into /var/log/messages.  Sunday morning a server that I had restarted
rsyslog on Thursday crashed and restarted itself.  When rsyslog started up
during the boot, it dumped log entries from Jun 16 into /var/log/messages.
The servers are unresponsive for 2-3 hours before they restart and come
back up.

These are the logs dumped to /var/log/messages on this most recent crash
showing up until the ceph daemons were restarted.  I haven't been able to
find these files on disk anywhere, but they are obviously stored on disk as
they persist reboots and show up after the server crashes and restarts
itself.  There are only 4 types of log entries that were from Jun in the
log::  augenrules, ceph-osd, kernel, and systemd.  Some of them definitely
appear to be duplicates of what was logged to /var/log/messages when they
happened.

  Jul  9 14:03:18 osd11 rsyslogd: [origin software=rsyslogd swVersion=7.4.7
x-pid=1326 x-info=http://www.rsyslog.com] start
  Jul  9 14:03:18 osd11 mcelog: Family 6 Model 4f CPU: only decoding
architectural errors
  Jul  9 14:03:18 osd11 mcelog: Family 6 Model 4f CPU: only decoding
architectural errors
  Jun 16 16:28:30 osd11 systemd: Stopping System Logging Service...
  Jun 16 16:28:30 osd11 systemd: Starting System Logging Service...
  Jun 16 16:28:30 osd11 systemd: Started System Logging Service.
  Jun 16 16:28:31 osd11 systemd: Stopping Zabbix Agent...
  Jun 16 16:28:31 osd11 systemd: Starting Zabbix Agent...
  Jun 16 16:28:31 osd11 systemd: PID file /run/zabbix/zabbix_agentd.pid not
readable (yet?) after start.
  Jun 16 16:28:31 osd11 systemd: zabbix-agent.service: Supervising process
2385876 which is not our child. We'll most likely not notice when it exits.
  Jun 16 16:28:31 osd11 systemd: Started Zabbix Agent.
  Jun 16 16:51:01 osd11 systemd: Stopping filebeat...
  Jun 16 16:51:01 osd11 systemd: Started filebeat.
  Jun 16 16:51:01 osd11 systemd: Starting filebeat...
  Jun 16 17:28:31 osd11 systemd: Configuration file
/usr/lib/systemd/system/hp-ams.service is marked executable. Please remove
executable permission bits. Proceeding anyway.
  Jun 16 17:57:05 osd11 systemd: Stopping ceph target allowing to
start/stop all ceph*@.service instances at once.

And this is the actual logs that were logged on Jun 16 at the same time.

  Jun 16 16:28:30 osd11 rsyslogd: [origin software=rsyslogd swVersion=7.4.7
x-pid=2385760 x-info=http://www.rsyslog.com] start
  Jun 16 16:28:30 osd11 puppet-agent[2381482]:
(/Stage[main]/Rsyslog::Service/Service[rsyslog]) Triggered 'refresh' from 1
events
  Jun 16 16:28:30 osd11 systemd: Stopping System Logging Service...
  Jun 16 16:28:30 osd11 systemd: Starting System Logging Service...
  Jun 16 16:28:30 osd11 systemd: Started System Logging Service.
  Jun 16 16:28:30 osd11 puppet-agent[2381482]:
(/Stage[main]/Osfamily::Redhat::Config/File[/etc/sysconfig/clock]/ensure)
defined content as '{md5}85e989e485c9f11c191fe0025158d587'
  Jun 16 16:28:31 osd11 puppet-agent[2381482]:
(Class[Zabbix::Agent::Config]) Scheduling refresh of
Class[Zabbix::Agent::Service]
  Jun 16 16:28:31 osd11 puppet-agent[2381482]:
(Class[Zabbix::Agent::Service]) Scheduling refresh of Service[zabbix-agent]
  Jun 16 16:28:31 osd11 systemd: Stopping Zabbix Agent...
  Jun 16 16:28:31 osd11 systemd: Starting Zabbix Agent...
  Jun 16 16:28:31 osd11 systemd: PID file /run/zabbix/zabbix_agentd.pid not
readable (yet?) after start.
  Jun 16 16:28:31 osd11 systemd: zabbix-agent.service: Supervising process
2385876 which is not our child. We'll most likely not notice when it exits.
  Jun 16 16:28:31 osd11 systemd: Started Zabbix Agent.
  Jun 16 16:28:31 osd11 puppet-agent[2381482]:
(/Stage[main]/Zabbix::Agent::Service/Service[zabbix-agent]) Triggered
'refresh' from 1 events
  Jun 16 16:28:32 osd11 puppet-agent[2381482]: Finished catalog run in
23.02 seconds
  Jun 16 16:42:23 osd11 puppet-agent[2403607]: Finished catalog run in
17.06 seconds
  Jun 16 16:51:01 osd11 systemd: Stopping filebeat...
  Jun 16 16:51:01 osd11 systemd: Started filebeat.
  Jun 16 16:51:01 osd11 systemd: Starting filebeat...
  Jun 16 17:27:02 osd11 puppet-agent[2462688]: Retrieving pluginfacts
  Jun 16 17:27:02 osd11 puppet-agent[2462688]: Retrieving plugin
  Jun 16 17:27:04 osd11 puppet-agent[2462688]: Loading facts
  Jun 16 17:28:09 osd11 puppet-agent[2462688]: Caching catalog for
osd11.c3.ceph.prod.atl01.rsglab.com
  Jun 16 17:28:15 osd11 puppet-agent[2462688]: Applying configuration
version '1497634031'
  Jun 16 17:28:25 osd11 puppet-agent[2462688]:
(/Stage[main]/Osfamily::Redhat::Install/Package[html2text]/ensure) created
  Jun 16 17:28:31 osd11 systemd: Configuration file
/usr/lib/systemd/system/hp-ams.service is marked executable. Please remove
executable permission bits. Proceeding anyway.
  Jun 16 17:28:36 osd11 puppet-agent[2462688]: Finished catalog run in
23.33 seconds
  Jun 16 17:42:22 osd11 puppet-agent[2490974]: Finished catalog run in
17.11 seconds
  Jun 16 17:57:05 osd11 systemd: Stopping ceph target allowing to
start/stop all ceph*@.service instances at once.
I have screens connected to a serial console session on all of these
servers now so that I can see what the servers dump to console when this
crash happens, but the buffer wasn't large enough to catch the crash on
Sunday.  It is now large enough to catch the next crash.

Below are my rsyslog config files.  Does anyone have any suggestions where
to look or what might be wrong?  I would like to find where these files
exist to clean them up as to prevent future crashes on other servers or
prove that this has nothing to do with the system crashes.  Thank you.

- David Turner

$ cat /etc/rsyslog.conf
  # rsyslog configuration file

  # For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
  # If you experience problems, see
http://www.rsyslog.com/doc/troubleshoot.html

  #### MODULES ####
  # The imjournal module below is now used as a message source instead of
imuxsock.
  module(load="imjournal") # provides access to the systemd journal
  module(load="imuxsock" SysSock.Use="on") # provides support for local
system logging (e.g. via logger command)

  #### GLOBAL DIRECTIVES ####
  $FileCreateMode 0600
  # Where to place auxiliary files
  $WorkDirectory /var/lib/rsyslog
  # Use default timestamp format
  $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
  # Include all config files in /etc/rsyslog.d/
  $IncludeConfig /etc/rsyslog.d/*.conf
  # File to store the position in the journal
  $IMJournalStateFile imjournal.state

  #### RULES ####
  # Log anything (except mail) of level info or higher.
  # Don't log private authentication messages!
  *.info;mail.none;authpriv.none;cron.none
 action(type="omfile" FileCreateMode="0640" FileGroup="zabbix"
File="/var/log/messages")
  # The authpriv file has restricted access.
  authpriv.*
 action(type="omfile" FileCreateMode="0640" FileGroup="zabbix"
File="/var/log/secure")
  # Log all the mail messages in one place.
  mail.*                                                  /var/log/maillog
 # Log cron stuff
  cron.*                                                  /var/log/cron
  # Everybody gets emergency messages
  *.emerg                                                 :omusrmsg:*
  # Save news errors of level crit and higher in a special file.
  uucp,news.crit                                          /var/log/spooler
  # Save boot messages also to boot.log
  local7.*                                                /var/log/boot.log

$ ls /etc/rsyslog.d/
  json.conf
$ cat /etc/rsyslog.d/json.conf
  # Log a json copy of the above for filebeat and elasticsearch.
  # NOTE: If we ever migrate to rsyslog 8.x, it might be worth replacing
this template with the new %jsonmesg% field.
  template (name="Json" type="list") {

      constant(value="{\"facility\":\"")
      property(name="syslogfacility" format="json")

      constant(value="\",\"facility_label\":\"")
      property(name="syslogfacility-text" format="json")

      constant(value="\",\"syslog_message\":\"")
      property(name="msg" format="json")

      constant(value="\",\"pid\":\"")
      property(name="procid" format="json")

      constant(value="\",\"priority\":\"")
      property(name="syslogpriority" format="json")

      constant(value="\",\"program\":\"")
      property(name="programname" format="json")

      constant(value="\",\"severity\":\"")
      property(name="syslogseverity" format="json")

      constant(value="\",\"severity_label\":\"")
      property(name="syslogseverity-text" format="json")

      constant(value="\",\"timestamp\":\"")
      property(name="timestamp" format="json")

      constant(value="\"}\n")
  }
  *.*
action(type="omfile" FileCreateMode="0640" FileGroup="zabbix"
File="/var/log/messages.json" template="Json")
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Loading...