logagg-collector
Log collector for logagg
Tracks and collects logs from the given files, parses them into a common schema, and sends them to NSQ.
Prerequisites
- Python >= 3.5
- We expect users to follow best practices for logging their applications.
- Most importantly, do structured logging, since parsing/formatting logs is much easier that way (a minimal sketch follows this list).
- Install, set up and run logagg-fs beforehand.
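For illustration, a minimal sketch of structured logging in Python using only the standard library; the logger name, event name, and fields here are hypothetical, not part of logagg:

import json
import logging

# Emit one self-describing JSON object per log line, so a formatter
# can pick out fields instead of parsing free-form text.
logging.basicConfig(format='%(message)s', level=logging.INFO)
logger = logging.getLogger('myapp')  # hypothetical application logger

def log_event(event, level='Info', **data):
    logger.info(json.dumps({'event': event, 'level': level, 'data': data}))

log_event('user_signup', user_id=42, plan='free')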
Components/Architecture/Terminology
files
: Log files which are being tracked by logagg.

node
: The server(s) where the log files reside.

formatters
: The parser functions that the collector uses to format the log lines into the common format.

nsq
: The central location where logs are sent as messages by collector(s) after formatting.
Features
- Guaranteed delivery of each log line from files to nsq
- Reduced latency between a log being generated and being present in nsq
- Options to add custom formatters
- File poll if log file not yet generated
- Works on rotational log files
- Custom formatters to support parsing of any log file
- Output format of processed log lines (dictionary; an example follows this list):
  - id (str): A unique id per log with time ordering. Useful to avoid storing duplicates.
  - timestamp (str): ISO format time, e.g. 2018-02-07T06:37:00.297610Z
  - data (dict): Parsed log data
  - raw (str): Raw log line read from the log file
  - host (str): Hostname of the node where this log was generated
  - formatter (str): Name of the formatter that processed the raw log line
  - file (str): Full path of the file in the host where this log line came from
  - type (str): One of "log", "metric" (Is there one more type?)
  - level (str): Log level of the log line
  - event (str): Log event
  - error (bool): True if the collection handler failed during processing
  - error_tb (str): Error traceback
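For illustration, a single processed log line under this schema might look like the following Python dict; all field values here are hypothetical:

# Hypothetical processed log line; values are illustrative only
{
    'id': '20180207T063700297610000',
    'timestamp': '2018-02-07T06:37:00.297610Z',
    'data': {'message': 'Hello_there'},
    'raw': '2018-02-07T06:37:00.297610Z [Some_event] [Info] [Hello_there]',
    'host': 'my-server-1',
    'formatter': 'myformatters.formatters.sample_formatter',
    'file': '/var/log/myapp/app.log',
    'type': 'log',
    'level': 'Info',
    'event': 'Some_event',
    'error': False,
    'error_tb': '',
}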
Installation
Prerequisites: Python >= 3.5
Setup
- Install, set up and run a logagg-master which collectors can register to.
- Install, set up and run logagg-fs on the same machine so that logagg-collector can efficiently track log files.
- Install the Docker package on the collector nodes. Run the following commands to install:
$ sudo apt-get update
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce
- Check Docker version >= 17.12.1
$ sudo docker -v
Docker version 18.03.1-ce, build 9ee9f40
- Run the serverstats docker container to store server metrics in the /var/log/serverstats/serverstats.log file.
$ sudo docker plugin install deepcompute/docker-file-log-driver:1.0 --alias file-log-driver
$ sudo docker run --hostname $HOSTNAME --name serverstats \
    --restart unless-stopped \
    --label formatter=logagg_collector.formatters.basescript \
    --log-driver file-log-driver \
    --log-opt labels=formatter \
    --log-opt fpath=/serverstats/serverstats.log \
    --detach deepcompute/serverstats:latest
Install the logagg-collector package on the nodes where we collect the logs and on the nodes where we forward the logs:
- Run the following commands to pip install logagg-collector:
$ pip3 install https://github.com/deep-compute/pygtail/tarball/master/#egg=pygtail-0.6.1
$ pip3 install git+https://github.com/deep-compute/logagg-utils.git
$ pip3 install git+https://github.com/deep-compute/logagg-collector.git
Basic Usage
- Run collector
logagg-collector runserver --host <IP/DNS> --port <port_number> --master host=<master_service_host>:port=<master_service_port>:topic_name=<topic_name> --data-dir <logagg_collector_data_dir> --logaggfs-dir <logagg-fs_logcache_dir>
This command runs the logagg-collector service, registers the collector with the master, and gets NSQ details from the master so that logs are sent to the topic mentioned in the command. The file path '/var/log/serverstats/serverstats.log' is automatically added as a default file path to be tracked (it can be removed with an API command if needed).
Command breakdown:
logagg-collector:
The path to the logagg-collector program.
--host <IP/DNS>:
Localhost IP or DNS so that other components/services can contact logagg-collector.
--port <port_number>:
Port number on which the logagg-collector service runs.
--master host=<master_service_host>:port=<master_service_port>:topic_name=<topic_name>:
Host and port where the master service is running, along with the topic name to which the logs are to be sent. Use the same details for all collectors whose logs should be aggregated in the same place.
--data-dir <logagg_collector_data_dir>:
Path to a directory where logagg-collector stores its data (files being tracked, NSQ details). Specify different directories for collectors running on the same machine.
--logaggfs-dir <logagg-fs_logcache_dir>:
Path to the logagg-fs 'logcache' directory.
APIs
- To see the files the collector is supposed to collect
  - API:
    http://<host>:<port>/collector/v1/get_files
  - Sample command:
    curl 'http://localhost:6666/collector/v1/get_files'
  - Sample response:
    {"success": true, "result": [{"fpath": "/var/log/serverstats/serverstats.log", "formatter": "logagg_collector.formatters.docker_file_log_driver"}]}
  - Note:
    Returns file paths including the ones that are not being actively collected but have only been added to the tracking list.
- To add a new file to the collector's tracking list
  - API:
    http://<host>:<port>/collector/v1/add_file?fpath="<path_to_file_on_collector_machine>"&formatter="<formatter_for_file_path>"
  - Sample command:
    curl 'http://localhost:6666/collector/v1/add_file?fpath="/var/log/serverstats.log"&formatter="logagg_collector.formatters.docker_file_log_driver"'
  - Sample response:
    {"success": true, "result": [{"fpath": "/var/log/serverstats/serverstats.log", "formatter": "logagg_collector.formatters.docker_file_log_driver"}, {"fpath": "/var/log/serverstats.log", "formatter": "logagg_collector.formatters.docker_file_log_driver"}]}
- To remove a file from the collector's tracking list
  - API:
    http://<host>:<port>/collector/v1/remove_file?fpath="<path_to_file_on_collector_machine>"
  - Sample command:
    curl 'http://localhost:6666/collector/v1/remove_file?fpath="/var/log/serverstats.log"'
  - Sample response:
    Empty response
  - Note:
    Stops and exits the logagg-collector service; it should be restarted afterwards.
- To see the files the collector is actively tracking
  - API:
    http://<host>:<port>/collector/v1/get_active_log_collectors
  - Sample command:
    curl 'http://localhost:6666/collector/v1/get_active_log_collectors'
  - Sample response:
    {"success": true, "result": ["/var/log/serverstats/serverstats.log"]}
- To get the NSQ details logagg-collector is using
  - API:
    http://<host>:<port>/collector/v1/get_nsq
  - Sample command:
    curl 'http://localhost:6666/collector/v1/get_nsq'
  - Sample response:
    {"success": true, "result": {"nsqd_http_address": "localhost:4151", "topic_name": "logagg_logs"}}
- To set the NSQ details for logagg-collector manually
  - API:
    http://<host>:<port>/collector/v1/set_nsq?nsqd_http_address=<host:port>&topic_name=<topic_name>
  - Sample command:
    curl 'http://localhost:6666/collector/v1/set_nsq?nsqd_http_address=localhost:4151&topic_name=logagg_logs'
  - Sample response:
    {"success": true, "result": {"nsqd_http_address": "localhost:4151", "topic_name": "logagg_logs"}}
- To start tracking log files added to the logagg-collector tracklist (used when the logagg-collector service is started with no master and initialized with NSQ details manually via an API command)
  - API:
    http://<host>:<port>/collector/v1/start
  - Sample command:
    curl 'http://localhost:6666/collector/v1/start'
  - Sample response:
    {"success": true, "result": {"fpaths": [{"fpath": "/var/log/serverstats/serverstats.log", "formatter": "logagg_collector.formatters.docker_file_log_driver"}]}}
- To stop and exit the logagg-collector service
  - API:
    http://<host>:<port>/collector/v1/stop
  - Sample command:
    curl 'http://localhost:6666/collector/v1/stop'
  - Sample response:
    Empty response
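For illustration, the same APIs can also be scripted from Python; this is a minimal sketch assuming the requests library is installed and a collector listening on localhost:6666 as in the samples above. The double quotes embedded in the query values mirror the curl examples:

import requests

BASE = 'http://localhost:6666/collector/v1'  # collector host/port from the samples above

# Add a file to the tracking list; the embedded quotes mirror the curl examples
resp = requests.get(BASE + '/add_file', params={
    'fpath': '"/var/log/serverstats.log"',
    'formatter': '"logagg_collector.formatters.docker_file_log_driver"',
})
print(resp.json())

# List the files being actively tracked
print(requests.get(BASE + '/get_active_log_collectors').json())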
Advanced usage
Types of handlers we support
| Formatter name | Comments |
|---|---|
| logagg_collector.formatters.nginx_access | See configuration here |
| logagg_collector.formatters.mongodb | |
| logagg_collector.formatters.basescript | |
| logagg_collector.formatters.docker_file_log_driver | See example here |
Run logagg-collector without a master
Install the NSQ package on the machine where you need to bring up the NSQ server.
Run the following commands to install nsq:
$ sudo apt-get install libsnappy-dev
$ wget https://s3.amazonaws.com/bitly-downloads/nsq/nsq-1.0.0-compat.linux-amd64.go1.8.tar.gz
$ tar zxvf nsq-1.0.0-compat.linux-amd64.go1.8.tar.gz
$ sudo cp nsq-1.0.0-compat.linux-amd64.go1.8/bin/* /usr/local/bin
Bring up the nsq instances on the required server with the following commands:
NOTE: Run each command in a separate terminal window.
nsqlookupd:
$ nsqlookupd
nsqd:
$ nsqd -lookupd-tcp-address localhost:4160
nsqadmin:
$ nsqadmin -lookupd-http-address localhost:4161
Run logagg-collector service and start collecting logs
Start the logagg-collector service
$ logagg-collector runserver --no-master --data-dir <logagg_collector_data_dir> --logaggfs-dir <logagg-fs_logcache_dir>
Add NSQ details to logagg-collector
$ curl 'http://<host>:<port>/collector/v1/set_nsq?nsqd_http_address=<host:port>&topic_name=<topic_name>'
Start tracking files
$ curl http://<host>:<port>/collector/v1/start
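Equivalently, the two API calls above can be made from Python; a minimal sketch assuming the requests library is installed and a collector on localhost:6666 with a local nsqd:

import requests

BASE = 'http://localhost:6666/collector/v1'  # hypothetical host/port

# Point the collector at a local nsqd and a topic
print(requests.get(BASE + '/set_nsq', params={
    'nsqd_http_address': 'localhost:4151',
    'topic_name': 'logagg_logs',
}).json())

# Start tracking the files on the tracklist
print(requests.get(BASE + '/start').json())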
You can check message traffic at nsq by going to http://<nsq-server-ip-or-DNS>:4171/ (for a local setup: http://localhost:4171/).
You can see the collected logs in realtime using the following command:
$ nsq_tail --topic=<topic_name> --channel=test --lookupd-http-address=<nsq-server-ip-or-DNS>:4161
How to create and use custom formatters for log files
Step 1: Make a directory and append its path to the environment variable $PYTHONPATH
$ echo $PYTHONPATH
$ mkdir customformatters
$ #Now append the path to $PYTHONPATH
$ export PYTHONPATH=$PYTHONPATH:/home/path/to/customformatters/
$ echo $PYTHONPATH
:/home/path/to/customformatters
Step 2: Create another directory and put your formatter file(s) inside it.
$ cd customformatters/
$ mkdir myformatters
$ cd myformatters/
$ touch formatters.py
$ touch __init__.py
$ echo 'import formatters' >> __init__.py
$ #Now write your formatter functions inside the formatters.py file
Step 3: Write your formatter functions inside the formatters.py file
Important:
1. Only Python standard modules can be imported in the formatters.py file.
2. A formatter function should return a dict.
3. The dict should only contain keys which are mentioned in the log structure above.
4. Sample formatter functions:
import re

sample_log_line = '2018-02-07T06:37:00.297610Z [Some_event] [Info] [Hello_there]'

def sample_formatter(log_line):
    # Strip the square brackets and split on spaces:
    # ['<timestamp>', '<event>', '<level>', '<message>']
    log = re.sub(r'[\[\]]', '', log_line).split(' ')
    timestamp = log[0]
    event = log[1]
    level = log[2]
    data = {'message': log[3]}
    return dict(timestamp=timestamp,
                event=event,
                level=level,
                data=data,
                )
To see more examples, look here.
5. Check that the custom formatter works in the Python interpreter:
>>> import myformatters
>>> sample_log_line = '2018-02-07T06:37:00.297610Z [Some_event] [Info] [Hello_there]'
>>> output = myformatters.formatters.sample_formatter(sample_log_line)
>>> from pprint import pprint
>>> pprint(output)
{'data': {'message': 'Hello_there'},
'event': 'Some_event',
'level': 'Info',
'timestamp': '2018-02-07T06:37:00.297610Z'}
- Pseudo API command:
curl 'http://<host>:<port>/collector/v1/add_file?fpath="logfile.log"&formatter="myformatters.formatters.sample_formatter"'
Debugging
If there are multiple files being tracked by multiple collectors on multiple nodes, the collector information can be seen in the "Heartbeat" topic of NSQ.
Every running collector sends a heartbeat to this topic (default interval = 30 seconds). The heartbeat format is as follows (an illustrative example follows the list):
timestamp
: Timestamp of the received heartbeat.
heartbeat_number
: The heartbeat number since the collector started running.
host
: Hostname of the node on which the collector is running.
nsq_topic
: The nsq topic which the collector is using.
files_tracked
: List of files that are being tracked by the collector, each with its formatter.
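For illustration, a heartbeat message might look like the following; all values are hypothetical, and the exact shape of files_tracked is an assumption modeled on the get_files response above:

# Hypothetical heartbeat message; values are illustrative only
{
    'timestamp': '2018-02-07T06:37:00.297610Z',
    'heartbeat_number': 120,
    'host': 'my-server-1',
    'nsq_topic': 'logagg_logs',
    # files_tracked shape assumed to match the get_files API response
    'files_tracked': [
        {'fpath': '/var/log/serverstats/serverstats.log',
         'formatter': 'logagg_collector.formatters.docker_file_log_driver'},
    ],
}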
You can run the following command to see the information:
$ nsq_tail --topic=<heartbeat_topic> --channel=test --lookupd-http-address=<nsq-server-ip-or-DNS>:4161
Build on logagg
You're more than welcome to hack on this :-)
$ git clone https://github.com/deep-compute/logagg-collector
$ cd logagg-collector
$ sudo python setup.py install
$ docker build -t logagg-collector .