Logs and monitoring

The logs in Collaboration Server On-Premises are written to stdout and stderr, and most of them are formatted as JSON. You can use them for debugging or for monitoring the request rate, errors, request duration, and warnings (invalid requests). In production environments, we recommend storing logs in files or using a distributed logging system (such as ELK or CloudWatch).

# Monitoring CS with logs

To get more insight into how Collaboration Server is performing, we provide metric logs that can be used for monitoring. To enable them, set the ENABLE_METRIC_LOGS environment variable to true (for example, pass -e ENABLE_METRIC_LOGS=true to docker run).

See configuration for more details about this and other variables.

# Log structure

The log structure contains the following information:

  • handler – A unified identifier of the action. Use this field to identify calls.
  • traceId – A unique RPC call ID.
  • tags – A semicolon-separated list of tags. Use this field to filter metric logs.
  • data – An object containing additional information. Its contents may vary between transports.
  • data.duration – The request duration in milliseconds.
  • data.transport – The request transport type: http or ws (WebSocket).
  • data.status – The request status: success, fail, or warning.
  • data.statusCode – The response status, expressed as an HTTP status code.

Additionally, for the HTTP transport, the following information is included:

  • data.url – The URL path.
  • data.method – The request method.

In case of an error, data.status is set to fail and data.message contains the error message.

An example log for the HTTP transport:

{
  "level": 30,
  "time": "2021-03-09T11:15:09.154Z",
  "msg": "Request summary",
  "handler": "v5:GET:collaborations:id:exists",
  "traceId": "bd77768c-4f49-44da-b658-f765340ea643",
  "data": {
    "duration": 32,
    "transport": "http",
    "statusCode": 200,
    "status": "success",
    "url": "/api/v5/e2e-58a48a5ba8521b6f/collaborations/e2e-eff1945d39894534/exists",
    "method": "GET"
  },
  "tags": "metrics"
}

An example log for the WS transport:

{
  "level": 30,
  "time": "2021-03-09T13:11:52.068Z",
  "msg": "Request summary",
  "handler": "addComment",
  "traceId": "db09ba44-cb96-4db6-84f0-59eb3691b193",
  "data": {
    "duration": 12,
    "transport": "ws",
    "status": "success",
    "statusCode": 200
  },
  "tags": "metrics"
}
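Because metric logs share stdout and stderr with other output, and not every line is JSON, a consumer usually has to filter them first. The sketch below is a minimal, hypothetical Python helper (metric_entries and summarize are illustrative names, not part of Collaboration Server). It assumes one JSON object per line, as in the examples above, skips everything else, and keeps only entries whose semicolon-separated tags field contains metrics.

```python
import json
from typing import Iterable, Iterator


def metric_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed metric log entries from raw log lines.

    Assumes one JSON object per line; any other line is skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not every log line is JSON
        if not isinstance(entry, dict):
            continue
        # "tags" is a semicolon-separated list, so split it before matching.
        tags = [tag.strip() for tag in str(entry.get("tags", "")).split(";")]
        if "metrics" in tags:
            yield entry


def summarize(entry: dict) -> str:
    """Render the most useful fields of a metric entry as one line."""
    data = entry.get("data") or {}
    return (
        f'{entry.get("handler")} {data.get("transport")} '
        f'{data.get("status")} {data.get("statusCode")} {data.get("duration")}ms'
    )
```

For example, you could read the output of docker logs <container> 2>&1 line by line and pass it to metric_entries; the 2>&1 redirect is needed because the logs go to both stdout and stderr.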

# Example charts

# Display the request rate per transport type

This chart shows the overall number of requests handled by Collaboration Server, split by transport type (WebSocket and HTTP).

Metrics chart - requests rate.

Use the data.transport field to distinguish between transport types.
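If you want to compute the same numbers outside a dashboard, you can group metric entries by time bucket and data.transport. This is a hypothetical sketch, not a Collaboration Server tool: the one-minute bucket size and the request_rate name are assumptions, and it expects the time field to be an ISO 8601 string as in the example logs.

```python
import json
from collections import Counter
from datetime import datetime
from typing import Iterable


def request_rate(lines: Iterable[str]) -> Counter:
    """Count metric log entries per (minute, data.transport) bucket."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if not isinstance(entry, dict):
            continue
        if "metrics" not in [tag.strip() for tag in str(entry.get("tags", "")).split(";")]:
            continue
        transport = (entry.get("data") or {}).get("transport")
        time_str = entry.get("time")
        if not transport or not time_str:
            continue
        # fromisoformat() does not accept a trailing "Z" before Python 3.11, so normalize it.
        timestamp = datetime.fromisoformat(time_str.replace("Z", "+00:00"))
        minute = timestamp.replace(second=0, microsecond=0).isoformat()
        counts[(minute, transport)] += 1
    return counts
```

Dividing each bucket by 60 gives an approximate requests-per-second value for that minute.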

# Display request latency per operation

This chart shows how fast requests are processed for each operation, which is useful both for measuring user experience and for debugging.

Metrics chart - requests latency.

Use the data.duration and handler fields to measure the latency of each operation.
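A latency chart usually plots an average and a high percentile per handler. The hypothetical sketch below computes both from metric log lines. It uses the nearest-rank method for the 95th percentile; that choice, and the latency_by_handler name, are assumptions rather than anything Collaboration Server prescribes.

```python
import json
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def latency_by_handler(lines: Iterable[str]) -> Dict[str, dict]:
    """Summarize data.duration per handler: count, average and p95 in milliseconds."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if not isinstance(entry, dict):
            continue
        if "metrics" not in [tag.strip() for tag in str(entry.get("tags", "")).split(";")]:
            continue
        duration = (entry.get("data") or {}).get("duration")
        handler = entry.get("handler")
        if handler is None or duration is None:
            continue
        durations[handler].append(float(duration))

    summary: Dict[str, dict] = {}
    for handler, values in durations.items():
        values.sort()
        # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
        rank = math.ceil(0.95 * len(values))
        summary[handler] = {
            "count": len(values),
            "avg_ms": sum(values) / len(values),
            "p95_ms": values[rank - 1],
        }
    return summary
```

A high percentile is often more telling than the average, because a few slow requests can hurt user experience without moving the mean much.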

# Errors count per operation

Display the number of failures (5xx codes) per operation type.

Errors count per operation.

Use the handler and data.status fields to count failed operations.
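The hypothetical sketch below counts failures per handler. It classifies an entry as a failure when data.statusCode is in the 5xx range, following the chart description above. Filtering on data.status instead would work as well. The failures_by_handler name is illustrative only.

```python
import json
from collections import Counter
from typing import Iterable


def failures_by_handler(lines: Iterable[str]) -> Counter:
    """Count metric log entries with a 5xx data.statusCode, per handler."""
    failures: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if not isinstance(entry, dict):
            continue
        if "metrics" not in [tag.strip() for tag in str(entry.get("tags", "")).split(";")]:
            continue
        status_code = (entry.get("data") or {}).get("statusCode")
        if isinstance(status_code, int) and 500 <= status_code <= 599:
            failures[entry.get("handler", "<unknown>")] += 1
    return failures
```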

# Warnings count per operation

Display the number of warnings (4xx codes) per operation type. This information is especially useful for debugging system performance issues.

Warnings count per operation.

Use the data.status and data.statusCode fields to count incorrect requests and their types.
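A minimal sketch of the same aggregation in Python is shown below. It assumes warnings are reported with data.status equal to warning, as listed in the log structure, and it keys each count by handler and data.statusCode so you can see which 4xx codes dominate. The warnings_by_handler name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def warnings_by_handler(lines: Iterable[str]) -> Counter:
    """Count warning entries per (handler, data.statusCode) pair."""
    warnings: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if not isinstance(entry, dict):
            continue
        if "metrics" not in [tag.strip() for tag in str(entry.get("tags", "")).split(";")]:
            continue
        data = entry.get("data") or {}
        if data.get("status") == "warning":
            warnings[(entry.get("handler", "<unknown>"), data.get("statusCode"))] += 1
    return warnings
```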

# Docker

Docker has built-in logging mechanisms that capture logs from the output of containers. The default logging driver writes logs to files.

When using this driver, you can use the docker logs command to show logs from the container. You can add the -f flag to view logs in real time. Refer to the official Docker documentation for more information about the logs command.

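When a container runs for a long time, its logs can take up a lot of space. To avoid this problem, make sure that log rotation is enabled. With the default json-file driver, you can configure it with the max-size log option (for example, --log-opt max-size=10m), optionally combined with max-file to limit the number of rotated files that are kept.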

# Distributed logging

If you are running more than one instance of Collaboration Server On-Premises, it is recommended to use a distributed logging system. It allows you to view and analyze logs from all instances in one place.

# AWS CloudWatch and other cloud solutions

If you are running Collaboration Server On-Premises in the cloud, the simplest and recommended approach is to use the logging service offered by your provider, such as AWS CloudWatch.

To use CloudWatch with AWS ECS, you first have to create a log group and set the log driver to awslogs. When the log driver is configured properly, the logs are streamed directly to CloudWatch.

The logConfiguration section of the ECS container definition may look similar to this:

"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-region": "us-west-2",
    "awslogs-group": "cksource",
    "awslogs-stream-prefix": "ck-cs-logs"
  }
}

Refer to the Using the awslogs Log Driver article for more information.
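After the logs reach CloudWatch, you can read the metric entries back from a script. The sketch below assumes the AWS SDK for Python (boto3) is installed and configured with credentials. It reuses the cksource log group and the ck-cs-logs stream prefix from the example configuration, and uses the CloudWatch Logs JSON filter pattern syntax. The helper names are hypothetical. An exact match on tags only finds entries whose tags value is exactly metrics.

```python
import json
from typing import Iterator, Optional


def metrics_filter_pattern(handler: Optional[str] = None) -> str:
    """Build a CloudWatch Logs JSON filter pattern for metric log entries."""
    clauses = ['$.tags = "metrics"']
    if handler:
        clauses.append(f'$.handler = "{handler}"')
    if len(clauses) == 1:
        return "{ " + clauses[0] + " }"
    return "{ " + " && ".join(f"({clause})" for clause in clauses) + " }"


def fetch_metric_entries(
    log_group: str = "cksource",
    stream_prefix: str = "ck-cs-logs",
    handler: Optional[str] = None,
) -> Iterator[dict]:
    """Yield metric entries stored in CloudWatch Logs (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pattern helper works without the AWS SDK

    client = boto3.client("logs")
    paginator = client.get_paginator("filter_log_events")
    for page in paginator.paginate(
        logGroupName=log_group,
        logStreamNamePrefix=stream_prefix,
        filterPattern=metrics_filter_pattern(handler),
    ):
        for event in page["events"]:
            # With the awslogs driver, each event message is the original log line.
            yield json.loads(event["message"])
```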

# On-Premises solutions

If you are using your own infrastructure, or for some reason cannot use the service offered by your provider, you can use an on-premises distributed logging system.

Many solutions are available, including:

  • ELK + Filebeat
    This is a stack built on top of Elasticsearch, Logstash and Kibana. In this configuration, Elasticsearch stores the logs, Filebeat reads the logs from Docker and sends them to Elasticsearch, and Kibana is used to view them. Logstash is not necessary because the logs are already structured.

  • Fluentd
    It uses a dedicated Docker log driver to send logs. It has a built-in frontend, but it can also be integrated with Elasticsearch and Kibana for better filtering.

  • Graylog
    It uses a dedicated Docker log driver to send logs. It has a built-in frontend and needs Elasticsearch to store logs as well as a MongoDB database to store the configuration.

# Example configuration

The example configuration uses Fluentd, Elasticsearch and Kibana to capture logs from Docker.

Before running Collaboration Server On-Premises, you have to prepare the logging services. For the purposes of this example, Docker Compose is used. Create the fluentd, elasticsearch and kibana services inside the docker-compose.yml file:

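version: '3.7'
services:
  # Receives logs from the Docker fluentd log driver and forwards them to Elasticsearch.
  fluentd:
    build: ./fluentd
    volumes:
      - ./fluentd/fluent.conf:/fluentd/etc/fluent.conf
    ports:
      - "24224:24224"
      - "24224:24224/udp"
    depends_on:
      - elasticsearch

  # Stores the logs.
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.5
    environment:
      # Start as a single-node development cluster, as in Elastic's Docker quick start.
      discovery.type: single-node
    expose:
      - 9200
    ports:
      - "9200:9200"

  # Web UI for browsing the logs.
  kibana:
    image: docker.elastic.co/kibana/kibana:6.8.5
    environment:
      ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch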

To integrate Fluentd with Elasticsearch, you first need to install fluent-plugin-elasticsearch in the Fluentd image. To do this, create a fluentd/Dockerfile with the following content:

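FROM fluent/fluentd:v1.10-1

# Switch to root to install packages with apk.
USER root

# Install the Elasticsearch output plugin, then remove the build dependencies.
RUN apk add --no-cache --update --virtual .build-deps build-base ruby-dev \
    && gem install fluent-plugin-elasticsearch \
    && gem sources --clear-all \
    && apk del .build-deps

# Drop back to the unprivileged user used by the base image.
USER fluent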

Next, configure the input server and connection to Elasticsearch in the fluentd/fluent.conf file:

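# Accept logs sent by the Docker fluentd log driver.
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
# Send every event both to Elasticsearch and to Fluentd's own stdout.
<match *.**>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch
    port 9200
    # Write to daily indices named fluentd-YYYYMMDD.
    logstash_format true
    logstash_prefix fluentd
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>
  <store>
    @type stdout
  </store>
</match>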

Now you are ready to run the services:

docker-compose up --build

When the services are ready, you can finally start Collaboration Server On-Premises.

docker run --init -p 8000:8000 \
--log-driver=fluentd \
--log-opt fluentd-address=[Fluentd address]:24224 \
[Your config here] \
docker.cke-cs.com/cs:[version]

Now open Kibana in your browser at http://localhost:5601/. On the first run, you may be asked to create an index pattern. Use the fluentd-* pattern and press the “Create” button. After this step, your logs should appear in the “Discover” tab.
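To check from a script that log entries are reaching Elasticsearch, you can call the Elasticsearch _count API for the daily indices that the logstash_format and logstash_prefix settings in fluent.conf create. This is a minimal sketch that uses only the Python standard library. It assumes Elasticsearch is reachable at http://localhost:9200, as published by the docker-compose.yml file, and the helper names are hypothetical.

```python
import json
import urllib.request


def count_url(es_url: str = "http://localhost:9200", index_pattern: str = "fluentd-*") -> str:
    """Build the URL of the Elasticsearch _count API for an index pattern."""
    return f"{es_url.rstrip('/')}/{index_pattern}/_count"


def parse_count(body: bytes) -> int:
    """Extract the document count from a _count API response body."""
    return int(json.loads(body)["count"])


def count_log_documents(es_url: str = "http://localhost:9200", index_pattern: str = "fluentd-*") -> int:
    """Return how many log documents Elasticsearch currently holds for the pattern."""
    with urllib.request.urlopen(count_url(es_url, index_pattern), timeout=5) as response:
        return parse_count(response.read())
```

For example, print(count_log_documents()) should print a non-zero number once Collaboration Server has produced some logs.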

# Monitoring resources

To ensure that the infrastructure for Collaboration Server On-Premises is correctly scaled and can handle the load, you need to monitor the following servers:

  • SQL server – monitor CPU and memory usage
    • If the consumption of these resources reaches 70-90%, consider increasing the size or the number of nodes of the SQL server.
  • Redis server – monitor CPU and memory usage
    • If the consumption of these resources reaches 70-90%, consider increasing the size or the number of nodes of the Redis server.
  • On-Premises server – monitor CPU and memory usage, and the processing time of requests from metric logs
    • If the consumption of these resources reaches 70-90% or you notice a significant increase in the duration values in metric logs:
      • consider scaling Collaboration Server On-Premises instances if high usage is caused by users and collaboration sessions,
      • consider scaling (or starting to use) Collaboration Worker instances if high usage is caused by editor bundle features.
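The rules above can be expressed as a simple check, for example in a custom alerting script. This is a hypothetical sketch. The 70% default matches the lower end of the range given above. The factor of 1.5 used to detect a "significant" increase in request duration is an assumption that you should tune to your own baseline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceUsage:
    """Current CPU and memory usage of one server, in percent."""
    name: str
    cpu_percent: float
    memory_percent: float


def scaling_candidates(usages: List[ResourceUsage], threshold_percent: float = 70.0) -> List[str]:
    """Return the servers whose CPU or memory usage reached the threshold."""
    return [
        usage.name
        for usage in usages
        if max(usage.cpu_percent, usage.memory_percent) >= threshold_percent
    ]


def duration_increased(baseline_ms: float, current_ms: float, factor: float = 1.5) -> bool:
    """Flag a significant increase in request duration (the factor is an assumption)."""
    return baseline_ms > 0 and current_ms >= factor * baseline_ms
```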