
Thursday, 22 October 2020

Zabbix performance tuning #1.

Hi everybody.

Today I would like to present some ideas about Zabbix performance tuning. The main symptom of a performance problem in Zabbix is overloaded processes, which show up on the graphs as:
- zabbix poller processes more than 75% busy
- zabbix unreachable poller processes more than 75% busy

The unreachable poller load typically rises in two cases:

1. A device monitored through the Zabbix agent is enabled for monitoring, but the machine has crashed or the agent has died for some other reason. The server cannot obtain data, and the unreachable poller load rises.

2. A device monitored through the Zabbix agent is enabled for monitoring, but the server takes too long to obtain data from the agent, often exceeding the server's timeout. In that case the unreachable poller load rises as well.

This article is reposted from the "Gong Xiaoyi" blog; please be sure to keep this source: http://gongxiaoyi.blog.51cto.com/7325139/1825492


Ok, let’s go.

Optimization ideas:

1. Make sure the performance of Zabbix's internal components is itself monitored (this is the basis for tuning!)
 
2. Use a server with sufficient hardware performance
 
3. Separate different roles and use separate servers

4. Use active mode

5. Use a tmpfs file system for the MySQL temporary directory (/zabbixtmp)

6. Use distributed deployment

7. Adjust MySQL performance

8. Adjust Zabbix's own configuration


Optimize deployment:

1. Measure Zabbix performance

Zabbix performance is measured by its NVPS (number of new values processed per second). The Zabbix dashboard shows a rough estimate, and the internal items give more detail:
- percentage of time a component is in the BUSY state:
• zabbix[process,<type>,<mode>,<state>]
• <type> - trapper, discoverer, escalator, alerter, etc.
• <mode> - avg, count, min, max
• <state> - busy, idle
- actual number of values per second:
• zabbix[wcache,values,all]
• zabbix[queue,1m] - number of items delayed for more than 1 minute
- Zabbix server components:
alerter, configuration syncer, DB watchdog, discoverer, escalator, history syncer, http poller, housekeeper, icmp pinger, ipmi poller, poller, trapper.
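
Before looking at the dashboard figure, it can help to estimate the NVPS an installation should need. The snippet below is only a rough sketch: it assumes every item delivers exactly one value per update interval, and the item counts and intervals are made-up numbers, not measurements from the original article.

```python
def estimate_nvps(item_groups):
    """Estimate required new values per second.

    item_groups is a list of (item_count, update_interval_seconds) pairs.
    Assumption: each item produces exactly one value per interval.
    """
    return sum(count / interval for count, interval in item_groups)


# Hypothetical installation: 3000 items every 30 s, 6000 items every 60 s,
# and 1200 items every 600 s.
groups = [(3000, 30), (6000, 60), (1200, 600)]
nvps = estimate_nvps(groups)
print(f"estimated NVPS: {nvps:.1f}")
```

If the value reported by zabbix[wcache,values,all] stays well below such an estimate while zabbix[queue,1m] grows, the server is probably not keeping up.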

2. Obtain the working status of zabbix internal components


3. Use tmpfs file system

cd /

mkdir zabbixtmp

chown mysql:mysql zabbixtmp

vi /etc/fstab # edit /etc/fstab and add the following line

tmpfs /zabbixtmp tmpfs rw,size=400m,nr_inodes=10k,mode=0700,uid=mysql,gid=mysql 0 0

When adding the /etc/fstab entry, pay attention to the size setting. It is generally set to 8%-10% of the physical memory.
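
To make the 8%-10% rule concrete, here is a small sketch that computes the size range for a given amount of RAM and renders the matching fstab line. The 4096 MB figure is an assumed example, and the helper names are hypothetical; the mount options are the ones used in this article.

```python
def tmpfs_size_range_mb(physical_memory_mb, low_percent=8, high_percent=10):
    """Return the (low, high) tmpfs size in MB for the 8%-10% rule of thumb."""
    low = physical_memory_mb * low_percent // 100
    high = physical_memory_mb * high_percent // 100
    return low, high


def fstab_line(size_mb, mount_point="/zabbixtmp", owner="mysql"):
    """Render an fstab entry with the options used in this article."""
    options = f"rw,size={size_mb}m,nr_inodes=10k,mode=0700,uid={owner},gid={owner}"
    return f"tmpfs {mount_point} tmpfs {options} 0 0"


# Assumed example: a server with 4096 MB of RAM.
low_mb, high_mb = tmpfs_size_range_mb(4096)
print(f"suggested size: {low_mb}-{high_mb} MB")
print(fstab_line(400))
```

For a 4 GB machine the size=400m used above falls inside the computed range.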


4. Use active mode and proxy distributed monitoring.

When the Zabbix server monitors too many hosts and collects all the data itself (passive mode), Zabbix runs into serious performance problems, mainly the following:

1) Once the number of monitored hosts reaches a certain order of magnitude, the web interface becomes very sluggish and 502 errors are likely to appear

2) Gaps appear in the graphs

3) Too many poller processes are started; even if the number of items is reduced, adding more machines later will bring the problem back

Directions to consider for optimization:

1) Add proxy nodes (or use node mode) for distributed monitoring

2) Switch the agents to active mode


Configuration of zabbix_agentd.conf on the monitored host

vim zabbix_agentd.conf

LogFile=/tmp/zabbix_agentd.log

StartAgents=0

ServerActive=ip

Hostname=

RefreshActiveChecks=1800

BufferSize=200

Timeout=10
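
The agent settings above can be sanity-checked before rolling them out to many hosts. The sketch below is an illustration only: the parser and the three checks are my own minimal assumptions, not a Zabbix tool, and the sample address and host name are placeholders.

```python
def parse_conf(text):
    """Parse simple Key=Value lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def active_mode_problems(settings):
    """Return a list of reasons why the agent is not set up as active-only."""
    problems = []
    if settings.get("StartAgents") != "0":
        problems.append("StartAgents should be 0 to disable passive checks")
    if not settings.get("ServerActive"):
        problems.append("ServerActive must name the server or proxy")
    if not settings.get("Hostname"):
        problems.append("Hostname should match the host name configured on the server")
    return problems


# Placeholder values; replace them with the real server address and host name.
sample = """
LogFile=/tmp/zabbix_agentd.log
StartAgents=0
ServerActive=192.0.2.10
Hostname=web01
RefreshActiveChecks=1800
"""
problems = active_mode_problems(parse_conf(sample))
print(problems or "active-only configuration looks consistent")
```

An empty Hostname= line, as in the template above, is reported by the third check, which is a reminder to fill it in per host.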



Adjustments to zabbix_server.conf on the server side

StartPollers=100

StartTrappers=200


In the Zabbix templates, batch-change the item type to "Zabbix agent (active)".


5. Zabbix MySQL tuning

[mysqld]

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

user=mysql

 
# Disabling symbolic-links is recommended to prevent assorted security risks

tmpdir=/zabbixtmp

#network

connect_timeout =60

wait_timeout =5000

max_connections=400

max_allowed_packet =16M

max_connect_errors=400

#limits

tmp_table_size =256M

max_heap_table_size =64M

table_cache =256

#logs

slow_query_log_file =/var/log/slowquery.log

 

log_error =/var/log/mysql-error.log

long_query_time =10

slow_query_log =1

#innodb

 
#innodb_data_file_path =ibdata1:128M;ibdata2:128M:autoextend:max:4096M

innodb_file_per_table =1 #One file per table

innodb_status_file =1

 

innodb_additional_mem_pool_size = 128M

innodb_buffer_pool_size = 2800M # Generally set to 70%-80% of the server's physical memory

innodb_flush_method =O_DIRECT

#innodb_io_capacity =1000

innodb_support_xa =0

innodb_log_file_size =64M # The Zabbix database is write-heavy, so a larger log file keeps MySQL from constantly flushing the log to the data files.

# The side effect is that starting and stopping the database becomes slower.

innodb_log_buffer_size = 32M

symbolic-links=0

#log-queries-not-using-indexes

thread_cache_size=4 # This value affects the thread cache hit rate, which can be derived from Threads_created and Connections in the SHOW GLOBAL STATUS output.

# With a value of 4, there were 3228483 Connections and 5840 Threads_created, a hit rate of about 99.8%. Threads_created should be as small as possible.

query_cache_size=128M

#join_buffer_size=512K

join_buffer_size=128M

read_buffer_size=128M

read_rnd_buffer_size=128M

key_buffer=128M

innodb_flush_log_at_trx_commit=2

[mysqld_safe]

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

#DisableHousekeeper=1
#When using partitioned tables, turn off the housekeeper (this is a zabbix_server.conf setting)
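
The thread cache hit rate mentioned in the thread_cache_size comment can be recomputed from the two status counters. This is a minimal sketch using the usual formula (1 - Threads_created / Connections); the function name is my own, and the counter values are the ones quoted in the comment.

```python
def thread_cache_hit_rate(connections, threads_created):
    """Return the thread cache hit rate in percent.

    Every connection that did not require a new thread counts as a hit.
    """
    if connections == 0:
        return 0.0
    return 100.0 * (1 - threads_created / connections)


# Counters quoted in the thread_cache_size comment above.
rate = thread_cache_hit_rate(connections=3228483, threads_created=5840)
print(f"thread cache hit rate: {rate:.2f}%")
```

If the rate drops noticeably on your own server, raising thread_cache_size is the first thing to try.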


6. Adjust the number of Zabbix worker processes

vim zabbix_server.conf

StartPollers=90

StartPingers=10

StartPollersUnreachable=80

StartIPMIPollers=10

StartTrappers=20

StartDBSyncers=8

LogSlowQueries=1000
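
How many processes to start is largely trial and error, but a first guess can be derived from the busy percentage on the internal graphs. The sketch below assumes that load spreads linearly across processes, which is a simplification of mine and not something Zabbix guarantees; the 50% target is also an arbitrary choice.

```python
import math


def suggested_process_count(current_count, busy_percent, target_percent=50.0):
    """Suggest a process count that brings the busy rate down to the target.

    Assumption: the total work stays constant and is shared evenly,
    so busy% scales inversely with the number of processes.
    """
    if busy_percent <= target_percent:
        return current_count
    return math.ceil(current_count * busy_percent / target_percent)


# Hypothetical reading: 30 pollers that are 80% busy.
suggestion = suggested_process_count(30, 80.0)
print(f"try StartPollers={suggestion}")
```

Treat the result as a starting point, then watch the "more than 75% busy" graphs again after the restart.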


7. Zabbix DB partitioning

Step 1. Prepare related tables

ALTER TABLE `acknowledges` DROP PRIMARY KEY, ADD KEY `acknowledgedid` (`acknowledgeid`);

ALTER TABLE `alerts` DROP PRIMARY KEY, ADD KEY `alertid` (`alertid`);

ALTER TABLE `auditlog` DROP PRIMARY KEY, ADD KEY `auditid` (`auditid`);

ALTER TABLE `events` DROP PRIMARY KEY, ADD KEY `eventid` (`eventid`);

ALTER TABLE `service_alarms` DROP PRIMARY KEY, ADD KEY `servicealarmid` (`servicealarmid`);

ALTER TABLE `history_log` DROP PRIMARY KEY, ADD PRIMARY KEY (`itemid`,`id`,`clock`);

ALTER TABLE `history_log` DROP KEY `history_log_2`;

ALTER TABLE `history_text` DROP PRIMARY KEY, ADD PRIMARY KEY (`itemid`,`id`,`clock`);

ALTER TABLE `history_text` DROP KEY `history_text_2`;



Step 2. Set monthly partitions

Repeat the following step for the tables prepared in step 1. The example below creates monthly partitions from 2011-05 to 2011-12 for the events table.

ALTER TABLE `events` PARTITION BY RANGE( clock) (

PARTITION p201105 VALUES LESS THAN (UNIX_TIMESTAMP("2011-06-01 00:00:00")),

PARTITION p201106 VALUES LESS THAN (UNIX_TIMESTAMP("2011-07-01 00:00:00")),

PARTITION p201107 VALUES LESS THAN (UNIX_TIMESTAMP("2011-08-01 00:00:00")),

PARTITION p201108 VALUES LESS THAN (UNIX_TIMESTAMP("2011-09-01 00:00:00")),

PARTITION p201109 VALUES LESS THAN (UNIX_TIMESTAMP("2011-10-01 00:00:00")),

PARTITION p201110 VALUES LESS THAN (UNIX_TIMESTAMP("2011-11-01 00:00:00")),

PARTITION p201111 VALUES LESS THAN (UNIX_TIMESTAMP("2011-12-01 00:00:00")),

PARTITION p201112 VALUES LESS THAN (UNIX_TIMESTAMP("2012-01-01 00:00:00"))

);
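
Typing these partition lists by hand is error-prone, so they can be generated. The following is a sketch of one way to do it; the function names are hypothetical, and the output simply mirrors the statement above for whatever table and start month you pass in.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def monthly_partition_sql(table, first_month, count):
    """Build an ALTER TABLE statement with `count` monthly partitions."""
    parts = []
    month = first_month
    for _ in range(count):
        upper = next_month(month)
        # Each partition holds rows with clock below the start of the next month.
        parts.append(
            f"PARTITION p{month:%Y%m} VALUES LESS THAN "
            f'(UNIX_TIMESTAMP("{upper:%Y-%m-%d} 00:00:00"))'
        )
        month = upper
    body = ",\n".join(parts)
    return f"ALTER TABLE `{table}` PARTITION BY RANGE( clock) (\n{body}\n);"


sql = monthly_partition_sql("events", date(2011, 5, 1), 8)
print(sql)
```

Review the generated statement before running it against a production database.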


Step 3. Set daily partitions

Repeat the following step for each history table (history, history_log, history_str, history_text, history_uint). The example below creates daily partitions from 2011-05-15 to 2011-05-22 for the history_uint table.

ALTER TABLE `history_uint` PARTITION BY RANGE( clock) (

PARTITION p20110515 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-16 00:00:00")),

PARTITION p20110516 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-17 00:00:00")),

PARTITION p20110517 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-18 00:00:00")),

PARTITION p20110518 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-19 00:00:00")),

PARTITION p20110519 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-20 00:00:00")),

PARTITION p20110520 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-21 00:00:00")),

PARTITION p20110521 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-22 00:00:00")),

PARTITION p20110522 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-23 00:00:00"))

);


Maintaining the partitions manually:

Add a new partition

ALTER TABLE `history_uint` ADD PARTITION (

PARTITION p20110523 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-24 00:00:00"))

);


Delete an old partition (this takes the place of housekeeping)

ALTER TABLE `history_uint` DROP PARTITION p20110515;

 
Step 4. Automatic daily partitioning

Confirm that the partitions were created correctly for the history tables in step 3.

The following script automatically drops and creates daily partitions. By default, only the last 3 days are kept. If you need more days, change the @mindays variable.


Don't forget to add this command to your cron!

mysql -B -h localhost -u zabbix -pPASSWORD zabbix -e "CALL create_zabbix_partitions();"


Script to automatically create partition:

https://github.com/xsbr/zabbixzone/blob/master/zabbix-mysql-autopartitioning.sql


DELIMITER //

DROP PROCEDURE IF EXISTS `zabbix`.`create_zabbix_partitions` //

CREATE PROCEDURE `zabbix`.`create_zabbix_partitions` ()

BEGIN

CALL zabbix.create_next_partitions("zabbix","history");

CALL zabbix.create_next_partitions("zabbix","history_log");

CALL zabbix.create_next_partitions("zabbix","history_str");

CALL zabbix.create_next_partitions("zabbix","history_text");

CALL zabbix.create_next_partitions("zabbix","history_uint");

CALL zabbix.drop_old_partitions("zabbix","history");

CALL zabbix.drop_old_partitions("zabbix","history_log");

CALL zabbix.drop_old_partitions("zabbix","history_str");

CALL zabbix.drop_old_partitions("zabbix","history_text");

CALL zabbix.drop_old_partitions("zabbix","history_uint");

END //

DROP PROCEDURE IF EXISTS `zabbix`.`create_next_partitions` //

CREATE PROCEDURE `zabbix`.`create_next_partitions` (SCHEMANAME varchar(64), TABLENAME varchar(64))

BEGIN

DECLARE NEXTCLOCK timestamp;

DECLARE PARTITIONNAME varchar(16);

DECLARE CLOCK int;

SET @totaldays = 7;

SET @i = 1;

createloop: LOOP

SET NEXTCLOCK = DATE_ADD(NOW(),INTERVAL @i DAY);

SET PARTITIONNAME = DATE_FORMAT( NEXTCLOCK, 'p%Y%m%d' );

SET CLOCK = UNIX_TIMESTAMP(DATE_FORMAT(DATE_ADD( NEXTCLOCK ,INTERVAL 1 DAY),'%Y-%m-%d 00:00:00'));

CALL zabbix.create_partition( SCHEMANAME, TABLENAME, PARTITIONNAME, CLOCK );

SET @i=@i+1;

IF @i> @totaldays THEN

LEAVE createloop;

END IF;

END LOOP;

END //

DROP PROCEDURE IF EXISTS `zabbix`.`drop_old_partitions` //

CREATE PROCEDURE `zabbix`.`drop_old_partitions` (SCHEMANAME varchar(64), TABLENAME varchar(64))

BEGIN

DECLARE OLDCLOCK timestamp;

DECLARE PARTITIONNAME varchar(16);

DECLARE CLOCK int;

SET @mindays = 3;

SET @maxdays = @mindays+4;

SET @i = @maxdays;

droploop: LOOP

SET OLDCLOCK = DATE_SUB(NOW(),INTERVAL @i DAY);

SET PARTITIONNAME = DATE_FORMAT( OLDCLOCK, 'p%Y%m%d' );

CALL zabbix.drop_partition( SCHEMANAME, TABLENAME, PARTITIONNAME );

SET @i=@i-1;

IF @i <= @mindays THEN

LEAVE droploop;

END IF;

END LOOP;

END //

DROP PROCEDURE IF EXISTS `zabbix`.`create_partition` //

CREATE PROCEDURE `zabbix`.`create_partition` (SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64), CLOCK int)

BEGIN

DECLARE RETROWS int;

SELECT COUNT(1) INTO RETROWS

FROM `information_schema`.`partitions`

WHERE `table_schema` = SCHEMANAME AND `table_name` = TABLENAME AND `partition_name` = PARTITIONNAME;


IF RETROWS = 0 THEN

SELECT CONCAT( "create_partition(", SCHEMANAME, ",", TABLENAME, ",", PARTITIONNAME, ",", CLOCK, ")") AS msg;

SET @sql = CONCAT( 'ALTER TABLE `', SCHEMANAME, '`.`', TABLENAME, '`',

' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', CLOCK, '));' );

PREPARE STMT FROM @sql;

EXECUTE STMT;

DEALLOCATE PREPARE STMT;

END IF;

END //

DROP PROCEDURE IF EXISTS `zabbix`.`drop_partition` //

CREATE PROCEDURE `zabbix`.`drop_partition` (SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64))

BEGIN

DECLARE RETROWS int;

SELECT COUNT(1) INTO RETROWS

FROM `information_schema`.`partitions`

WHERE `table_schema` = SCHEMANAME AND `table_name` = TABLENAME AND `partition_name` = PARTITIONNAME;



IF RETROWS = 1 THEN

SELECT CONCAT( "drop_partition(", SCHEMANAME, ",", TABLENAME, ",", PARTITIONNAME, ")") AS msg;

SET @sql = CONCAT( 'ALTER TABLE `', SCHEMANAME, '`.`', TABLENAME, '`',

' DROP PARTITION ', PARTITIONNAME, ';' );

PREPARE STMT FROM @sql;

EXECUTE STMT;

DEALLOCATE PREPARE STMT;

END IF;

END //

DELIMITER ;
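
To see what one run of create_zabbix_partitions() would touch, the date arithmetic of the two loops can be replayed outside MySQL. This is only a sketch that mirrors the loop bounds of the procedures above (7 days ahead, @mindays = 3, @maxdays = @mindays + 4); the Python function names are my own and it does not connect to any database.

```python
from datetime import datetime, timedelta


def partitions_to_create(now, total_days=7):
    """Mirror create_next_partitions: one partition per day for the next week.

    Returns (partition_name, upper_bound) pairs, where upper_bound is the
    midnight that ends the partition's day.
    """
    result = []
    for i in range(1, total_days + 1):
        day = now + timedelta(days=i)
        upper = (day + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        result.append((day.strftime("p%Y%m%d"), upper))
    return result


def partitions_to_drop(now, min_days=3):
    """Mirror drop_old_partitions: try to drop days max_days .. min_days+1 ago."""
    max_days = min_days + 4
    return [
        (now - timedelta(days=i)).strftime("p%Y%m%d")
        for i in range(max_days, min_days, -1)
    ]


# Assumed run time, chosen to match the dates used in the examples above.
now = datetime(2011, 5, 23, 12, 0)
created = partitions_to_create(now)
dropped = partitions_to_drop(now)
print([name for name, _ in created])
print(dropped)
```

Because the drop loop only looks back @mindays + 4 days, a cron job that misses more than four consecutive days leaves older partitions behind, and those have to be dropped by hand.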



Summary:

As the number of monitored machines keeps growing, the optimization ideas are:

1. Increase the number of Zabbix worker processes

2. Switch to active mode, so that the agents send data to the server themselves

3. Use proxies for distributed monitoring

4. Tune MySQL





