When I was a System Integrator in Zug, I really learned to love log files. They are indispensable for troubleshooting, and over time I learned to visualize the internal state of the trading programs just by looking at the traces.
However, most of those logs were local. The most critical log messages were also written to a database, but the bulk remained in local files. This was problematic on more than one occasion -- log files were lost when re-deploying, correlating events from several machines was tricky, etc.
I've set up centralized rsyslog installations before, but this was not a pleasant experience.
Enter Logstash and Elasticsearch. Logstash is a tool to gather, parse, forward and store log messages. It plays well with Elasticsearch as a searchable storage backend, and is often combined with a broker component -- e.g. ZeroMQ or Redis.
I've just finished a setup that consists of
- Logstash agents residing on production machines. They read the NGiNX access logs and the application logs of our Python apps
- A central log server with a Redis instance as a buffer, a Logstash server and an Elasticsearch backend
Here is the Logstash config of the agents. I've deployed this via Salt -- this is the de-saltified version:
input {
  file {
    type => "nginx_access"
    path => "/var/log/..."
    exclude => ["*.gz"]
  }
  file {
    type => "python"
    path => "/var/log/..."
    exclude => ["*.gz"]
    codec => multiline {
      negate => true
      pattern => "^%{TIMESTAMP_ISO8601}"
      what => "previous"
    }
  }
}
filter {
  if [type] == "nginx_access" {
    grok {
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
  }
  if [type] == "python" {
    grok {
      match => [
        "message",
        "%{TIMESTAMP_ISO8601:timestamp} %{WORD:severity} %{DATA:module} %{GREEDYDATA:logmessage}" ]
    }
  }
}
output {
  stdout { }
  redis {
    host => "loghost.com"
    data_type => "list"
    key => "logstash"
  }
}
A Logstash config always has three sections: input, filter, output. Input is for log sources, filter is for parsing and filtering, and output is for forwarding and storage. The three sections need not reside in one file; corresponding sections of several files will be merged.
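The input sections above will in effect "tail -f" the named logfiles and assign a type to each event. The Python input also sets up multiline support: any line that does not start with a timestamp is appended to the previous event, so a stack trace ends up as a single log event instead of one event per line.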
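To make the multiline rule concrete, here is a rough Python sketch of the same idea. It is not what Logstash runs internally; the regex is only a simplified stand-in for grok's TIMESTAMP_ISO8601, and the function name and sample lines are made up for illustration.

```python
import re

# Simplified stand-in for grok's TIMESTAMP_ISO8601 (an assumption, not the real pattern).
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def merge_multiline(lines):
    """Append every line that does NOT start with a timestamp to the previous event.

    This mirrors negate => true, what => "previous" from the agent config.
    """
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not events:
            # A timestamp (or the very first line) starts a new event.
            events.append(line)
        else:
            # Continuation line, e.g. part of a stack trace.
            events[-1] += "\n" + line
    return events


# Hypothetical sample input: one error with a traceback, then a normal line.
sample = [
    "2013-05-01 12:00:00,123 ERROR app.views Unhandled exception",
    "Traceback (most recent call last):",
    '  File "views.py", line 10, in index',
    "ZeroDivisionError: division by zero",
    "2013-05-01 12:00:01,456 INFO app.views Request done",
]
events = merge_multiline(sample)
```

With this sample, the five input lines collapse into two events: the error including its traceback, and the following INFO line.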
Based on the type, log messages will be parsed according to two formats: the combined log format for NGiNX, and a Python logging format for the apps. All log messages are sent to the specified Redis instance.
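For the Python side, the grok pattern expects lines of the form "timestamp severity module message". A minimal sketch of a matching setup follows, assuming the apps use the standard logging module with a format string like the one below (the format string, the logger name and the regex are my assumptions for illustration; the regex only approximates what grok does):

```python
import io
import logging
import re

# Assumed formatter: timestamp, level, logger name, message, separated by spaces.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s %(message)s"

# Rough regex equivalent of the grok pattern from the agent config.
LINE_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?) "
    r"(?P<severity>\w+) "
    r"(?P<module>.*?) "
    r"(?P<logmessage>.*)"
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


# Log into an in-memory stream so the example needs no files.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger("shop.orders")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.warning("payment gateway timed out")

fields = parse_line(stream.getvalue().strip())
```

The resulting `fields` dictionary has the same keys that the grok filter would add to the event: `timestamp`, `severity`, `module` and `logmessage`.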
The configuration of the central loghost follows. It configures Redis as the log event source, and stores log events in Elasticsearch and also (for archiving purposes) on the filesystem:
input {
  redis {
    host => "localhost"
    type => "redis-input"
    data_type => "list"
    key => "logstash"
  }
}
output {
  elasticsearch {
    cluster => "logstash"
  }
  file {
    path => "/var/local/logfiles/log-%{+YYYY-MM-dd}.log"
  }
}
For this to work, Redis needs to listen on an externally reachable interface (by default, it will listen on the loopback interface only). Also note that this setup is unsuitable for untrusted networks: out of the box, Redis does not require any authentication, and the traffic is not encrypted.
For the example above I've set the Elasticsearch cluster name to "logstash", but other than that the default config works fine.
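Coming back to the Redis list that sits between the agents and the central server: as far as I understand the "list" data type, the agents append events to the end of the list and the central Logstash pops them from the front, so the list behaves like a FIFO buffer. Here is a small sketch of that behaviour. The FakeRedisList class is a hypothetical in-memory stand-in so the example runs without a Redis server; it is not the redis-py API, and the event layout is made up.

```python
import json
from collections import deque


class FakeRedisList:
    """In-memory stand-in for one Redis list (hypothetical helper, no network)."""

    def __init__(self):
        self._items = deque()

    def rpush(self, value):
        # Append to the tail, like Redis RPUSH; returns the new list length.
        self._items.append(value)
        return len(self._items)

    def lpop(self):
        # Remove from the head, like Redis LPOP; None when the list is empty.
        return self._items.popleft() if self._items else None


buffer = FakeRedisList()

# Agent side: serialize each event and push it onto the list.
for msg in ["first", "second", "third"]:
    buffer.rpush(json.dumps({"type": "python", "message": msg}))

# Central server side: drain the list in arrival order.
drained = []
while True:
    raw = buffer.lpop()
    if raw is None:
        break
    drained.append(json.loads(raw)["message"])
```

The practical upshot is that events pile up in Redis while the central Logstash or Elasticsearch is down, and are processed in order once it comes back.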
I'm not done testing my installation, but so far it seems to work fine. Having a fully-searchable database of log events for a whole cluster feels great, in any case :-)