Logs and monitoring
The logs in Collaboration Server On-Premises are written to stdout and stderr. Most of them are formatted as JSON. They can be used for debugging purposes or for monitoring the request rate, errors, duration and warnings (invalid requests). In production environments, we recommend storing logs in files or using a distributed logging system (like ELK or CloudWatch).
# Monitoring CS with logs
To get more insight into how the Collaboration Server is performing, we built logs that can be used for monitoring. To enable them, add the ENABLE_METRIC_LOGS=true environment variable.
See the Docker configuration for more details about the available variables.
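For example, assuming you start the server with the docker run command shown later on this page (the placeholders stand for your own configuration and image version), the variable can be passed with the -e flag:
docker run --init -p 8000:8000 \
    -e ENABLE_METRIC_LOGS=true \
    [Your config here] \
    docker.cke-cs.com/cs:[version]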
# Log structure
The log structure contains the following information:
- handler – A unified identifier of the action. Use this field to identify calls.
- traceId – A unique RPC call ID.
- tags – A semicolon-separated list of tags. Use this field to filter metrics logs.
- data – An object containing additional information. It might vary between different transports.
- data.duration – The request duration in milliseconds.
- data.transport – The type of the request transport. It can be http or ws (WebSocket).
- data.status – The request status. It can be equal to success, fail or warning.
- data.statusCode – The response status in the HTTP status code standard.
Additionally, for the HTTP transport, the following information is included:
- data.url – The URL path.
- data.method – The request method.
In case of an error, data.status will be equal to failed and data.message will contain the error message.
An example log for HTTP transport:
{
  "level": 30,
  "time": "2021-03-09T11:15:09.154Z",
  "msg": "Request summary",
  "handler": "v5:GET:collaborations:id:exists",
  "traceId": "bd77768c-4f49-44da-b658-f765340ea643",
  "data": {
    "duration": 32,
    "transport": "http",
    "statusCode": 200,
    "status": "success",
    "url": "/api/v5/e2e-58a48a5ba8521b6f/collaborations/e2e-eff1945d39894534/exists",
    "method": "GET"
  },
  "tags": "metrics"
}
An example log for WS transport:
{
  "level": 30,
  "time": "2021-03-09T13:11:52.068Z",
  "msg": "Request summary",
  "handler": "addComment",
  "traceId": "db09ba44-cb96-4db6-84f0-59eb3691b193",
  "data": {
    "duration": 12,
    "transport": "ws",
    "status": "success",
    "statusCode": 200
  },
  "tags": "metrics"
}
# Example charts
# Display request rate per transport type
This information will give you the overall number of requests handled by the Collaboration Server, split by transport (WebSocket and HTTP).
Use the data.transport field to distinguish between different request types.
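If the JSON logs are collected in a file, a quick way to get a similar breakdown offline is a jq one-liner (a sketch only; logs.json is a placeholder for wherever your log stream is stored):
# count metric log entries per transport type
jq -r 'select(.tags == "metrics") | .data.transport' logs.json | sort | uniq -c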
# Display request latency per operation
This chart will show how fast requests for a specific operation are, which is very useful for measuring user experience as well as for debugging.
Use the data.duration and handler fields to measure the request latency of operations.
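As a rough offline approximation of such a chart, you can aggregate the duration per handler with jq (a sketch; logs.json is a placeholder file):
# average request duration in milliseconds per handler
jq -s -r 'map(select(.tags == "metrics")) | group_by(.handler)[]
    | "\(.[0].handler): \(([.[].data.duration] | add / length) | round) ms"' logs.json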
# Error count per operation
Display the number of failures (5xx codes) per operation type.
Use the handler and data.status fields to count failed operations.
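A simple offline equivalent, again assuming the logs are stored in a placeholder logs.json file, counts 5xx responses per handler:
# count server errors (5xx) per handler
jq -r 'select(.tags == "metrics" and .data.statusCode >= 500) | .handler' logs.json | sort | uniq -c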
# Warning count per operation
Display the number of warnings (4xx codes) per operation type. This information is especially useful for debugging system performance issues.
Use the data.status and data.statusCode fields to count incorrect requests and their types.
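For example (still assuming a placeholder logs.json file), 4xx responses can be grouped by handler and status code:
# count invalid requests (4xx) per handler and status code
jq -r 'select(.tags == "metrics" and .data.statusCode >= 400 and .data.statusCode < 500) | "\(.handler) \(.data.statusCode)"' logs.json | sort | uniq -c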
# Docker
Docker has built-in logging mechanisms that capture logs from the output of containers. The default logging driver writes logs to files.
When using this driver, you can use the docker logs command to show logs from the container. You can add the -f flag to view logs in real time. Refer to the official Docker documentation for more information about the logs command.
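For example (collaboration-server is only an illustrative container name):
# follow the logs of a running container in real time
docker logs -f collaboration-server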
When the container is running for a long time, the logs can take up a lot of space. To avoid this problem, make sure that log rotation is enabled. This can be set with the max-size option.
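For example, with the default json-file driver you could keep at most three rotated files of 10 MB each (the values are only an illustration, and the placeholders again stand for your own configuration):
docker run --init -p 8000:8000 \
    --log-opt max-size=10m \
    --log-opt max-file=3 \
    [Your config here] \
    docker.cke-cs.com/cs:[version]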
# Distributed logging
If you are running more than one instance of Collaboration Server On-Premises, it is recommended to use a distributed logging system. It allows you to view and analyze logs from all instances in one place.
# AWS CloudWatch and other cloud solutions
If you are running Collaboration Server On-Premises in the cloud, the simplest and recommended approach is to use the logging service offered by your cloud provider. Some of the available services include:
- AWS – CloudWatch
- Google Cloud – Cloud Logging
- Azure – Azure Monitor
To use CloudWatch with AWS ECS, you have to create a log group first and change the log driver to awslogs. When the log driver is configured properly, the logs will be streamed directly to CloudWatch.
The logConfiguration section may look similar to this:
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-region": "us-west-2",
"awslogs-group": "cksource",
"awslogs-stream-prefix": "ck-cs-logs"
}
}
Refer to the Using the awslogs Log Driver article for more information.
# On-Premises solutions
If you are using your own infrastructure or for some reason cannot use the service offered by your provider, you can always use some on-premises distributed logging system.
There are a lot of solutions available, including:
- ELK + Filebeat – A stack built on top of Elasticsearch, Logstash and Kibana. In this configuration, Elasticsearch stores the logs, Filebeat reads the logs from Docker and sends them to Elasticsearch, and Kibana is used to view them. Logstash is not necessary because the logs are already structured.
- Fluentd – It uses a dedicated Docker log driver to send logs. It has a built-in frontend, but can also be integrated with Elasticsearch and Kibana for better filtering.
- Graylog – It uses a dedicated Docker log driver to send logs. It has a built-in frontend and needs Elasticsearch to store logs as well as a MongoDB database to store the configuration.
# Example configuration
The example configuration uses Fluentd, Elasticsearch and Kibana to capture logs from Docker.
Before running Collaboration Server On-Premises, you have to prepare the logging services. For the purposes of this example, Docker Compose is used. Create the fluentd, elasticsearch and kibana services inside the docker-compose.yml file:
version: '3.7'

services:
  fluentd:
    build: ./fluentd
    volumes:
      - ./fluentd/fluent.conf:/fluentd/etc/fluent.conf
    ports:
      - "24224:24224"
      - "24224:24224/udp"

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.5
    expose:
      - 9200
    ports:
      - "9200:9200"

  kibana:
    image: docker.elastic.co/kibana/kibana:6.8.5
    environment:
      ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"
    ports:
      - "5601:5601"
To integrate Fluentd with Elasticsearch, you first need to install fluent-plugin-elasticsearch in the Fluentd image. To do this, create a fluentd/Dockerfile with the following content:
FROM fluent/fluentd:v1.10-1
USER root
RUN apk add --no-cache --update build-base ruby-dev \
&& gem install fluent-plugin-elasticsearch \
&& gem sources --clear-all
Next, configure the input server and the connection to Elasticsearch in the fluentd/fluent.conf file:
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match *.**>
  @type copy

  <store>
    @type elasticsearch
    host elasticsearch
    port 9200
    logstash_format true
    logstash_prefix fluentd
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>

  <store>
    @type stdout
  </store>
</match>
Now you are ready to run the services:
docker-compose up --build
When the services are ready, you can finally start Collaboration Server On-Premises.
docker run --init -p 8000:8000 \
--log-driver=fluentd \
--log-opt fluentd-address=[Fluentd address]:24224 \
[Your config here] \
docker.cke-cs.com/cs:[version]
Now open Kibana in your browser. It is available at http://localhost:5601/. On the first run, you may be asked to create an index pattern. Use the fluentd-* pattern and press the “Create” button. After this step, your logs should appear in the “Discover” tab.
# Monitoring resources
To ensure that the infrastructure for Collaboration Server On-Premises is correctly scaled and can handle the load, you need to monitor the following servers (a quick way to check container resource usage is shown after the list):
- SQL server – monitor CPU and memory usage.
  - If the consumption of these resources reaches 70-90%, consider increasing the size or the number of nodes of the SQL server.
- Redis server – monitor CPU and memory usage.
  - If the consumption of these resources reaches 70-90%, consider increasing the size or the number of nodes of the Redis server.
- On-Premises server – monitor CPU and memory usage, and the processing time of requests from metric logs.
  - If the consumption of these resources reaches 70-90% or you notice a significant increase of the duration values in metric logs:
    - consider scaling Collaboration Server On-Premises instances if high usage is caused by users and collaboration sessions,
    - consider scaling (or starting to use) Collaboration Worker instances if high usage is caused by editor bundle features.
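For a quick manual check of the CPU and memory consumption of the running containers, the standard Docker tooling is enough (a dedicated monitoring service such as CloudWatch is better suited for production):
# one-off snapshot of CPU and memory usage per container
docker stats --no-stream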