How to upgrade Fluentd (td-agent) from td-agent2 to td-agent3

In this article, we are going to see how to upgrade td-agent from td-agent2 to td-agent3. I was running td-agent2 on a busy production system that receives 100-200 records per second.


Performance issue on td-agent2

I built a production system with td-agent2, but reading data from a text file and sending it to a PostgreSQL database was very slow. I consulted the official Fluentd website and reviewed my entire configuration. I also checked system activity such as CPU, memory, and disk I/O, but found no issues on the Linux side. Therefore, I suspected that something might be wrong in my td-agent2 configuration file.

Decided to upgrade td-agent2 to td-agent3

Even after checking all the settings in my td-agent.conf file, I couldn't find the cause, so I decided to upgrade to td-agent3. I was not 100% sure the upgrade would solve my performance issue, but since I had no other way to solve it, I read the following web page and decided to go ahead.

td-agent v2 vs. td-agent v3

According to that page, td-agent3 includes the following changes:

- Ruby 2.4
- Fluentd v0.14/v1.0
- Updated core libraries (msgpack, Cool.io, etc.)
- Windows support
- Dropped older distributions and less popular plugins

Start upgrading td-agent from td-agent2 to td-agent3

Here is what I did when I upgraded td-agent on my production system.

Stop the td-agent2 process

Before starting the upgrade, let's take a backup of the td-agent2 configuration file. We just need to copy /etc/td-agent/td-agent.conf to a safe place.
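A minimal sketch of the backup step, assuming a POSIX-style shell such as bash. The `backup_file` helper and the `/root/td-agent2-backup` directory are hypothetical names chosen for illustration. The helper copies the file with `cp -p` (keeping permissions and timestamps) and adds a timestamp suffix so that running it twice never overwrites an earlier backup.

```shell
# Hypothetical helper: copy a file into a backup directory with a timestamp
# suffix so that repeated runs never overwrite an earlier backup.
# Prints the path of the backup copy on success.
backup_file() {
  local src=$1
  local dest_dir=$2
  local dest

  if [ ! -f "$src" ]; then
    echo "backup_file: no such file: $src" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1
  dest="$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S)"
  # -p keeps the original mode and timestamps
  cp -p "$src" "$dest" || return 1
  echo "$dest"
}

# Example (as root, before upgrading; the backup directory is an assumption):
#   backup_file /etc/td-agent/td-agent.conf /root/td-agent2-backup
```

The same helper can be used for /etc/init.d/td-agent if you have customized the start-up script.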

We also need to make sure that the "flush_at_shutdown" setting is present. If it isn't, let's add it to the <match> section of each buffered output in td-agent2's configuration file.

Since I couldn't find any information about buffer file compatibility between td-agent2 and td-agent3, I was not sure whether td-agent2's buffer files would work with td-agent3. So it's better to flush all outstanding items in the buffer, just in case.

flush_at_shutdown true
With "flush_at_shutdown true", td-agent flushes all outstanding items when it shuts down, so no buffer files are left behind. Once we have added it, we need to reload the td-agent.conf file as follows.

# /etc/init.d/td-agent reload
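To confirm that the setting is present in the configuration file, a quick grep is enough. This is a minimal sketch; `has_flush_at_shutdown` is a hypothetical helper, and it only checks that an uncommented `flush_at_shutdown true` line appears somewhere in the file, not that every buffered <match> section contains it.

```shell
# Hypothetical helper: succeed if the file contains an active (uncommented)
# "flush_at_shutdown true" line. It does not check which <match> section
# the line belongs to.
has_flush_at_shutdown() {
  grep -Eq '^[[:space:]]*flush_at_shutdown[[:space:]]+true([[:space:]]|$)' "$1"
}

# Example:
#   has_flush_at_shutdown /etc/td-agent/td-agent.conf || echo "flush_at_shutdown is missing"
```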
Then, we need to stop the td-agent2 process as follows.

# /etc/init.d/td-agent stop
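After stopping td-agent2, it is worth confirming that the buffer directory is really empty before installing td-agent3. A minimal sketch, assuming the directory passed in is the one configured as the buffer path in td-agent2 and that it contains nothing but buffer files; `count_buffer_files` and the example path are hypothetical.

```shell
# Hypothetical helper: print how many regular files remain under a buffer
# directory (0 if the directory does not exist).
count_buffer_files() {
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo 0
    return 0
  fi
  # wc -l may pad its output with spaces on some systems, so strip them
  find "$dir" -type f | wc -l | tr -d ' '
}

# Example (the buffer directory is an assumption; use your own buffer path):
#   n=$(count_buffer_files /var/log/td-agent/buffer)
#   [ "$n" -eq 0 ] || echo "$n buffer file(s) still present; do not upgrade yet"
```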

Install td-agent3 by using the install script

According to the td-agent3 installation steps on the official Fluentd website, we install td-agent3 as follows. We don't need to uninstall td-agent2 first.

$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh
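Since this install script is for Red Hat based systems, one way to confirm that the package was really upgraded is to look at the version reported by `rpm -q td-agent`, which should start with 3 for td-agent3. The `td_agent_major` helper below is hypothetical; it assumes the usual `name-version-release.arch` output of `rpm -q`.

```shell
# Hypothetical helper: extract the major version from `rpm -q td-agent`
# output such as "td-agent-3.1.1-0.el7.x86_64". Fails if the input does
# not look like an installed td-agent package.
td_agent_major() {
  local rest=${1#td-agent-}
  local major=${rest%%.*}
  case $major in
    ''|*[!0-9]*) return 1 ;;
  esac
  echo "$major"
}

# Example:
#   [ "$(td_agent_major "$(rpm -q td-agent)")" = "3" ] && echo "td-agent3 is installed"
```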

Change the service start-up script (if necessary)

When td-agent3 is installed, the service start-up script created by td-agent2 is overwritten, so if you have customized it, take a backup of it first.

If td-agent needs to access files or other resources that only the root user can access, change the following settings in the /etc/init.d/td-agent file.

(Before)
TD_AGENT_USER=td-agent
TD_AGENT_GROUP=td-agent

(After)
TD_AGENT_USER=root
TD_AGENT_GROUP=root
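Instead of editing the file by hand, the two lines can be changed with `sed`. A minimal sketch; `set_td_agent_user` is a hypothetical helper. It writes the result back with `cat >` rather than `mv`, so the start-up script keeps its original permissions, including the executable bit.

```shell
# Hypothetical helper: set TD_AGENT_USER / TD_AGENT_GROUP in a start-up
# script. Writes through a temporary file and copies the content back with
# cat so the target keeps its permissions (e.g. the executable bit).
set_td_agent_user() {
  local file=$1 user=$2 group=$3
  local tmp rc

  tmp=$(mktemp) || return 1
  sed -e "s/^TD_AGENT_USER=.*/TD_AGENT_USER=$user/" \
      -e "s/^TD_AGENT_GROUP=.*/TD_AGENT_GROUP=$group/" \
      "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Example (as root, after taking a backup of the script):
#   set_td_agent_user /etc/init.d/td-agent root root
```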

Install plugins on td-agent3 if we already installed plugins on td-agent2

If we installed plugins on td-agent2, we need to install the same plugins on td-agent3. Since I had installed the PostgreSQL plugin, I needed to run the following command.

# td-agent-gem install fluent-plugin-postgres
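If several plugins were installed, their names can be captured on td-agent2 before running the install script and reinstalled on td-agent3. A minimal sketch, assuming `td-agent-gem list` prints one `name (versions)` line per gem, as `gem list` does; `list_fluent_plugins` and the list file path are hypothetical. The list will also include plugins bundled with td-agent itself, so you may want to remove those from the file before reinstalling.

```shell
# Hypothetical helper: read `td-agent-gem list` output on stdin and print
# the names of installed Fluentd plugins (gems named fluent-plugin-*).
list_fluent_plugins() {
  sed -n 's/^\(fluent-plugin-[^ ]*\) (.*/\1/p'
}

# Example:
#   # on td-agent2, before running the install script
#   td-agent-gem list | list_fluent_plugins > /root/td-agent2-plugins.txt
#   # on td-agent3, after installing it
#   while read -r p; do td-agent-gem install "$p"; done < /root/td-agent2-plugins.txt
```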

Adjust the configuration for td-agent3

Buffer-related parameters have changed in td-agent3, so we need to update the buffer settings. If we keep td-agent2's buffer configuration, we get error messages.
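To find the td-agent2-style parameters that need converting, a grep over the configuration helps. A minimal sketch; `find_v012_buffer_params` is hypothetical, and its list covers common Fluentd v0.12 buffer and retry parameter names rather than every one. For example, in td-agent3 `buffer_type` becomes `@type`, `buffer_path` becomes `path`, and `buffer_chunk_limit` becomes `chunk_limit_size`, all inside a <buffer> section, while `num_threads` is replaced by `flush_thread_count`.

```shell
# Hypothetical helper: print (with line numbers) any lines in a config file
# that still use Fluentd v0.12 style buffer/retry parameter names.
# Exit status is 0 if at least one such line was found, 1 otherwise.
find_v012_buffer_params() {
  grep -nE '^[[:space:]]*(buffer_type|buffer_path|buffer_chunk_limit|buffer_queue_limit|num_threads|retry_limit|disable_retry_limit|max_retry_wait)[[:space:]]' "$1"
}

# Example:
#   find_v012_buffer_params /etc/td-agent/td-agent.conf
```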

Below is my td-agent3 configuration file. Since I cannot afford to lose any data on my system, I configured a very large buffer in case of emergency.

<match foo.bar>
    @type stdout

    <buffer>
      @type file
      path /var/log/td-agent/foo.*.buffer

      # Buffering parameters
      chunk_limit_size 256MB
      total_limit_size 64GB

      # Flushing parameters
      flush_at_shutdown true
      flush_mode immediate
      # flush_interval is only used with flush_mode interval
      #flush_interval 1s
      # Number of flush threads (replaces td-agent2's num_threads)
      flush_thread_count 50
      flush_thread_interval 1.0
      flush_thread_burst_interval 1.0
      delayed_commit_timeout 5
      # Wait for free buffer space instead of raising an error when full
      overflow_action block

      # Retries parameters
      #retry_timeout 240h
      # Retry indefinitely, so no retry_max_times limit is set
      retry_forever true
      retry_secondary_threshold 0.5
    </buffer>
</match>
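Because this configuration lets the file buffer grow to 64GB (`total_limit_size`), it is worth checking that the disk holding the buffer path actually has that much free space. A minimal sketch using `df -Pk` (POSIX output format in 1024-byte blocks); `has_free_space` is hypothetical, and it treats 64GB as 64 × 1024 × 1024 = 67108864 KiB.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# REQUIRED_KB kilobytes (1024-byte blocks) available.
has_free_space() {
  local dir=$1
  local required_kb=$2
  # df -P prints a header plus one data line; column 4 is "Available".
  # awk does the comparison so large values do not overflow shell arithmetic.
  df -Pk "$dir" | awk -v need="$required_kb" 'NR == 2 { ok = ($4 >= need) } END { exit !ok }'
}

# Example: a 64GB buffer is 64 * 1024 * 1024 KiB
#   has_free_space /var/log/td-agent 67108864 || echo "not enough space for a 64GB buffer"
```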

Check the td-agent.conf file

Once we have modified the td-agent.conf file for td-agent3, we need to run the following command to make sure the configuration is correct.

# /etc/init.d/td-agent configtest
To double-check the configuration, we can also run the td-agent command with the "--dry-run" option. It reads the configuration and checks the plugin setup without starting to process any data, so we can confirm that the configuration is correct without making any changes.

# td-agent --dry-run -c /etc/td-agent/td-agent.conf
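The dry-run check can be wrapped in a small function so that a script stops before starting td-agent3 if the configuration is invalid. A minimal sketch; `check_td_agent_config` and the `TD_AGENT_BIN` override are hypothetical additions (the override only makes the binary path configurable), and the function assumes `td-agent --dry-run -c` exits with a non-zero status when the configuration is invalid.

```shell
# Hypothetical wrapper around `td-agent --dry-run -c FILE`. Assumes the
# command exits non-zero when the configuration is invalid.
# TD_AGENT_BIN (hypothetical) lets you point at a different binary.
check_td_agent_config() {
  local conf=$1
  local bin=${TD_AGENT_BIN:-td-agent}
  if "$bin" --dry-run -c "$conf"; then
    echo "config OK: $conf"
  else
    echo "config check FAILED: $conf" >&2
    return 1
  fi
}

# Example:
#   check_td_agent_config /etc/td-agent/td-agent.conf && /etc/init.d/td-agent start
```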

Run the td-agent3 process

We have now completed the upgrade from td-agent2 to td-agent3, so we can start td-agent3 as follows.


# /etc/init.d/td-agent start
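After starting td-agent3, it is worth checking its log for warnings and errors, for example td-agent2 parameters that are no longer used. A minimal sketch, assuming the log file is /var/log/td-agent/td-agent.log and that each line carries Fluentd's usual level marker such as `[warn]:` or `[error]:`; `count_log_level` is hypothetical.

```shell
# Hypothetical helper: count lines at a given Fluentd log level, e.g.
#   2018-01-01 00:00:00 +0900 [warn]: ...
# LEVEL is one of trace, debug, info, warn, error, fatal.
count_log_level() {
  local file=$1
  local level=$2
  [ -f "$file" ] || return 1
  # grep -c prints 0 but exits 1 when nothing matches; treat that as success
  grep -c "\[$level\]:" "$file" || true
}

# Example:
#   count_log_level /var/log/td-agent/td-agent.log error
```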

Performance after upgrading to td-agent3

I have been running td-agent3 for a week. The main difference I have noticed is that CPU usage is somewhat higher: td-agent2 used 10% or less, while td-agent3 now averages about 20%. However, I no longer have any performance issues with td-agent3.
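To compare CPU usage between versions, one simple approach is to sum the %CPU of all td-agent processes, since Fluentd normally runs a supervisor process and a worker process. A minimal sketch; `sum_td_agent_cpu` is hypothetical and reads `ps -eo pcpu=,args=` output. Note that on Linux `ps` reports CPU time averaged over each process's lifetime, so `top` gives a better picture of the current load.

```shell
# Hypothetical helper: read "pcpu args" lines (from `ps -eo pcpu=,args=`)
# on stdin and print the total %CPU of lines mentioning td-agent.
# The [t] bracket keeps the pattern from matching this awk command itself.
sum_td_agent_cpu() {
  awk '/[t]d-agent/ { total += $1 } END { printf "%.1f\n", total }'
}

# Example:
#   ps -eo pcpu=,args= | sum_td_agent_cpu
```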

td-agent2 wasn't able to handle this much incoming data (100-200 records per second), so I couldn't see the data in near real time. With td-agent3, I can see near real-time data without any issues.

Based on this result, I recommend upgrading td-agent2 to td-agent3.