Today I would like to present some ideas about Zabbix performance tuning. The most common sign of a performance problem in Zabbix is that too many of its processes are busy, which shows up in the graphs and as triggers such as:
- zabbix poller processes more than 75% busy
- zabbix unreachable poller processes more than 75% busy
These alerts usually have one of two causes:
1. A host that collects data through the Zabbix agent is enabled for monitoring, but the machine has crashed or the agent has died for some other reason. The server cannot obtain data, and the unreachable poller load rises.
2. A host that collects data through the Zabbix agent is enabled for monitoring, but the server takes too long to obtain data from the agent, often longer than the server-side timeout. In this case the unreachable poller load rises as well.
This article is reposted from the "Gong Xiaoyi" blog; please be sure to keep this source link: http://gongxiaoyi.blog.51cto.com/7325139/1825492
Ok, let’s go.
Optimization ideas:
1. Make sure the performance of Zabbix's internal components is itself being monitored (the basis for tuning!)
2. Use a server with sufficient hardware performance
3. Separate different roles and use separate servers
4. Use active mode
5. Put the MySQL temporary directory (/zabbixtmp) on a tmpfs file system
6. Use distributed deployment
7. Adjust MySQL performance
8. Adjust Zabbix's own configuration
Optimize deployment:
1. Measure zabbix performance
Zabbix performance is measured in NVPS (new values per second); the Zabbix dashboard shows a rough estimate of it. The internal items below give more precise figures:
- percentage of time a component is in the BUSY state:
• zabbix[process,<type>,<mode>,<state>]
• <type> - trapper, discoverer, escalator, alerter, etc.
• <mode> - avg, count, min, max
• <state> - busy, idle
- real number of VPS:
• zabbix[wcache,values,all]
• zabbix[queue,1m] - number of items delayed for more than 1 minute
- Zabbix Server components:
alerter, configuration syncer, DB watchdog, discoverer, escalator, history syncer, http poller, housekeeper, icmp pinger, ipmi poller, poller, trapper.
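As a rough cross-check of the dashboard figure, the NVPS a server has to sustain can be estimated from item counts and update intervals, since each item produces one new value per interval. The sketch below is only an illustration: the item groups and their numbers are made up, and the helper name required_nvps is hypothetical, not part of Zabbix.

```python
from collections import namedtuple

# Hypothetical item groups: name, number of items, update interval in seconds.
ItemGroup = namedtuple("ItemGroup", "name count interval_s")


def required_nvps(groups):
    """Estimate new values per second: every item yields one value per interval."""
    return sum(g.count / g.interval_s for g in groups)


# Made-up example inventory, not taken from a real installation.
groups = [
    ItemGroup("cpu and memory", 3000, 60),
    ItemGroup("disk space", 1200, 300),
    ItemGroup("inventory", 3600, 3600),
]

print(f"required NVPS ~ {required_nvps(groups):.1f}")  # prints: required NVPS ~ 55.0
```

If the real figure from zabbix[wcache,values,all] stays well below such an estimate while the queue grows, the server is probably falling behind.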
2. Obtain the working status of zabbix internal components
3. Use tmpfs file system
cd /
mkdir zabbixtmp
chown mysql:mysql zabbixtmp
vi /etc/fstab # add the following line to /etc/fstab
tmpfs /zabbixtmp tmpfs rw,size=400m,nr_inodes=10k,mode=0700,uid=mysql,gid=mysql 0 0
When adding the /etc/fstab entry, pay attention to the size setting. Generally, it is set to 8%-10% of physical memory.
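The sizing rule above can be turned into a small helper that builds the fstab entry from the amount of physical memory. This is a minimal sketch, assuming memory is given in megabytes and 8% is used as the default share; the function name tmpfs_fstab_line and its parameters are hypothetical.

```python
def tmpfs_fstab_line(mem_total_mb, percent=8, mount_point="/zabbixtmp", owner="mysql"):
    """Build an /etc/fstab tmpfs entry sized as a percentage of physical memory."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    size_mb = mem_total_mb * percent // 100
    opts = f"rw,size={size_mb}m,nr_inodes=10k,mode=0700,uid={owner},gid={owner}"
    return f"tmpfs {mount_point} tmpfs {opts} 0 0"


# A machine with 5000 MB of RAM gets the 400 MB used in the entry above.
print(tmpfs_fstab_line(5000))
```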
4. Use active mode and proxy distributed monitoring.
When zabbix_server has to collect data from too many hosts by itself, Zabbix runs into serious performance problems, mainly the following:
1) Once the number of monitored hosts reaches a certain scale, the web front end becomes very sluggish and 502 errors appear easily
2) Gaps appear in the graphs
3) Too many poller processes are running; even if the number of items is reduced, adding more machines later will bring the problem back
Optimization consideration direction:
1) Add proxy node or Node mode for distributed monitoring
2) Adjust agentd to active mode
zabbix_agentd.conf configuration on the monitored host:
vim zabbix_agentd.conf
LogFile=/tmp/zabbix_agentd.log
StartAgents=0
ServerActive=ip
Hostname=
RefreshActiveChecks=1800
BufferSize=200
Timeout=10
Server-side zabbix_server.conf adjustments:
StartPollers=100
StartTrappers=200
Finally, batch-change the item type in the Zabbix templates to "Zabbix agent (active)".
5. Zabbix MySQL tuning
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
tmpdir=/zabbixtmp
#network
connect_timeout =60
wait_timeout =5000
max_connections=400
max_allowed_packet =16M
max_connect_errors=400
#limits
tmp_table_size =256M
max_heap_table_size =64M
table_cache =256
#logs
slow_query_log_file =/var/log/slowquery.log
log_error =/var/log/mysql-error.log
long_query_time =10
slow_query_log =1
#innodb
#innodb_data_file_path =ibdata1:128M;ibdata2:128M:autoextend:max:4096M
innodb_file_per_table =1 #One file per table
innodb_status_file =1
innodb_additional_mem_pool_size = 128M
innodb_buffer_pool_size = 2800M # Generally set to 70%-80% of the server's physical memory
innodb_flush_method =O_DIRECT
#innodb_io_capacity =1000
innodb_support_xa =0
innodb_log_file_size =64M
# The Zabbix database is write-heavy, so a larger log file keeps MySQL from continuously flushing the log to the tables.
# The side effect is that starting and stopping the database becomes slower.
innodb_log_buffer_size = 32M
symbolic-links=0
#log-queries-not-using-indexes
thread_cache_size=4
# This value affects the thread cache hit rate, which is derived from Threads_created and Connections in the SHOW GLOBAL STATUS output.
# With a value of 4 there were 3228483 Connections and 5840 Threads_created, a hit rate of about 99.8%. Threads_created should be as small as possible.
query_cache_size=128M
#join_buffer_size=512K
join_buffer_size=128M
read_buffer_size=128M
read_rnd_buffer_size=128M
key_buffer=128M
innodb_flush_log_at_trx_commit=2
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
# DisableHousekeeper=1 (set in zabbix_server.conf, not in my.cnf)
# When using partitioned tables, turn off the Housekeeper
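The thread cache hit rate mentioned in the thread_cache_size comment can be checked with simple arithmetic on the two status counters. A minimal sketch, using the counter values quoted in the configuration comment; the function name is hypothetical and the counters would normally be read from SHOW GLOBAL STATUS.

```python
def thread_cache_hit_rate(connections, threads_created):
    """Share of connections served by a cached thread instead of a new one."""
    if connections <= 0:
        raise ValueError("connections must be positive")
    return 1.0 - threads_created / connections


# Counter values quoted in the configuration comment above.
rate = thread_cache_hit_rate(3228483, 5840)
print(f"thread cache hit rate: {rate:.1%}")  # prints: thread cache hit rate: 99.8%
```

A rate that keeps falling over time suggests raising thread_cache_size.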
6. Adjust the number of zabbix worker processes
vim zabbix_server.conf
StartPollers=90
StartPingers=10
StartPollersUnreachable=80
StartIPMIPollers=10
StartTrappers=20
StartDBSyncers=8
LogSlowQueries=1000
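How many pollers to start depends on the load. One back-of-the-envelope way to reason about it, which is an assumption of this sketch and not something Zabbix prescribes, is that the average number of busy pollers equals passive checks per second multiplied by the average time one check takes. Dividing by the 75% busy threshold from the triggers at the top of the article gives a lower bound. The function name and the example numbers are hypothetical.

```python
import math


def pollers_needed(checks_per_second, avg_check_ms, max_busy=0.75):
    """Lower bound on StartPollers that keeps average utilisation under max_busy.

    Assumption: busy pollers ~ arrival rate x average check duration.
    """
    if not 0 < max_busy <= 1:
        raise ValueError("max_busy must be in (0, 1]")
    busy_pollers = checks_per_second * avg_check_ms / 1000
    return max(1, math.ceil(busy_pollers / max_busy))


# Made-up load: 300 passive checks per second, 200 ms per check on average.
print(pollers_needed(300, 200))  # prints: 80
```

Slow or unreachable agents raise the average check time sharply, which is why timeouts hurt poller utilisation so much.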
7. Zabbix DB partitioning
Step 1. Prepare related tables
ALTER TABLE `acknowledges` DROP PRIMARY KEY, ADD KEY `acknowledgedid` (`acknowledgeid`);
ALTER TABLE `alerts` DROP PRIMARY KEY, ADD KEY `alertid` (`alertid`);
ALTER TABLE `auditlog` DROP PRIMARY KEY, ADD KEY `auditid` (`auditid`);
ALTER TABLE `events` DROP PRIMARY KEY, ADD KEY `eventid` (`eventid`);
ALTER TABLE `service_alarms` DROP PRIMARY KEY, ADD KEY `servicealarmid` (`servicealarmid`);
ALTER TABLE `history_log` DROP PRIMARY KEY, ADD PRIMARY KEY (`itemid`,`id`,`clock`);
ALTER TABLE `history_log` DROP KEY `history_log_2`;
ALTER TABLE `history_text` DROP PRIMARY KEY, ADD PRIMARY KEY (`itemid`,`id`,`clock`);
ALTER TABLE `history_text` DROP KEY `history_text_2`;
Step 2. Set monthly partitions
Repeat the following for the tables from step 1 that are partitioned by month (acknowledges, alerts, auditlog, events, service_alarms). The example creates monthly partitions from 2011-05 to 2011-12 for the events table.
ALTER TABLE `events` PARTITION BY RANGE( clock) (
PARTITION p201105 VALUES LESS THAN (UNIX_TIMESTAMP("2011-06-01 00:00:00")),
PARTITION p201106 VALUES LESS THAN (UNIX_TIMESTAMP("2011-07-01 00:00:00")),
PARTITION p201107 VALUES LESS THAN (UNIX_TIMESTAMP("2011-08-01 00:00:00")),
PARTITION p201108 VALUES LESS THAN (UNIX_TIMESTAMP("2011-09-01 00:00:00")),
PARTITION p201109 VALUES LESS THAN (UNIX_TIMESTAMP("2011-10-01 00:00:00")),
PARTITION p201110 VALUES LESS THAN (UNIX_TIMESTAMP("2011-11-01 00:00:00")),
PARTITION p201111 VALUES LESS THAN (UNIX_TIMESTAMP("2011-12-01 00:00:00")),
PARTITION p201112 VALUES LESS THAN (UNIX_TIMESTAMP("2012-01-01 00:00:00"))
);
Step 3. Set daily partitions
Repeat the following for each history table (history, history_log, history_str, history_text, history_uint). The example creates daily partitions from 2011-05-15 to 2011-05-22 for the history_uint table.
ALTER TABLE `history_uint` PARTITION BY RANGE( clock) (
PARTITION p20110515 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-16 00:00:00")),
PARTITION p20110516 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-17 00:00:00")),
PARTITION p20110517 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-18 00:00:00")),
PARTITION p20110518 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-19 00:00:00")),
PARTITION p20110519 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-20 00:00:00")),
PARTITION p20110520 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-21 00:00:00")),
PARTITION p20110521 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-22 00:00:00")),
PARTITION p20110522 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-23 00:00:00"))
);
Maintaining partitions manually:
Add a new partition:
ALTER TABLE `history_uint` ADD PARTITION (
PARTITION p20110523 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-24 00:00:00"))
);
Drop an old partition (this takes the place of Housekeeping):
ALTER TABLE `history_uint` DROP PARTITION p20110515;
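Writing the partition list by hand is error-prone, so the daily statement from step 3 can also be generated. The sketch below is a hypothetical helper, not part of the original script, that prints the same ALTER TABLE statement for an arbitrary date range.

```python
from datetime import date, timedelta


def daily_partition_ddl(table, first_day, last_day):
    """Return an ALTER TABLE statement with one partition per day (inclusive)."""
    if last_day < first_day:
        raise ValueError("last_day must not be before first_day")
    parts = []
    day = first_day
    while day <= last_day:
        upper = day + timedelta(days=1)  # a partition holds rows with clock < next midnight
        parts.append(
            f"PARTITION p{day:%Y%m%d} VALUES LESS THAN "
            f'(UNIX_TIMESTAMP("{upper:%Y-%m-%d} 00:00:00"))'
        )
        day = upper
    body = ",\n".join(parts)
    return f"ALTER TABLE `{table}` PARTITION BY RANGE (clock) (\n{body}\n);"


# Reproduces the history_uint example from step 3.
print(daily_partition_ddl("history_uint", date(2011, 5, 15), date(2011, 5, 22)))
```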
Step 4. Automatic daily partitions
Confirm that the partitions for the history tables were created correctly in step 3.
The following script automatically drops and creates daily partitions. By default, only the last 3 days are kept. If you need more days, change the @mindays variable.
Don't forget to add this command to your cron!
mysql -B -h localhost -u zabbix -pPASSWORD zabbix -e "CALL create_zabbix_partitions();"
Script to automatically create partition:
https://github.com/xsbr/zabbixzone/blob/master/zabbix-mysql-autopartitioning.sql
DELIMITER //
DROP PROCEDURE IF EXISTS `zabbix`.`create_zabbix_partitions` //
CREATE PROCEDURE `zabbix`.`create_zabbix_partitions` ()
BEGIN
CALL zabbix.create_next_partitions("zabbix","history");
CALL zabbix.create_next_partitions("zabbix","history_log");
CALL zabbix.create_next_partitions("zabbix","history_str");
CALL zabbix.create_next_partitions("zabbix","history_text");
CALL zabbix.create_next_partitions("zabbix","history_uint");
CALL zabbix.drop_old_partitions("zabbix","history");
CALL zabbix.drop_old_partitions("zabbix","history_log");
CALL zabbix.drop_old_partitions("zabbix","history_str");
CALL zabbix.drop_old_partitions("zabbix","history_text");
CALL zabbix.drop_old_partitions("zabbix","history_uint");
END //
DROP PROCEDURE IF EXISTS `zabbix`.`create_next_partitions` //
CREATE PROCEDURE `zabbix`.`create_next_partitions` (SCHEMANAME varchar(64), TABLENAME varchar(64))
BEGIN
DECLARE NEXTCLOCK timestamp;
DECLARE PARTITIONNAME varchar(16);
DECLARE CLOCK int;
SET @totaldays = 7;
SET @i = 1;
createloop: LOOP
SET NEXTCLOCK = DATE_ADD(NOW(),INTERVAL @i DAY);
SET PARTITIONNAME = DATE_FORMAT( NEXTCLOCK, 'p%Y%m%d' );
SET CLOCK = UNIX_TIMESTAMP(DATE_FORMAT(DATE_ADD( NEXTCLOCK ,INTERVAL 1 DAY),'%Y-%m-%d 00:00:00'));
CALL zabbix.create_partition( SCHEMANAME, TABLENAME, PARTITIONNAME, CLOCK );
SET @i=@i+1;
IF @i> @totaldays THEN
LEAVE createloop;
END IF;
END LOOP;
END //
DROP PROCEDURE IF EXISTS `zabbix`.`drop_old_partitions` //
CREATE PROCEDURE `zabbix`.`drop_old_partitions` (SCHEMANAME varchar(64), TABLENAME varchar(64))
BEGIN
DECLARE OLDCLOCK timestamp;
DECLARE PARTITIONNAME varchar(16);
DECLARE CLOCK int;
SET @mindays = 3;
SET @maxdays = @mindays+4;
SET @i = @maxdays;
droploop: LOOP
SET OLDCLOCK = DATE_SUB(NOW(),INTERVAL @i DAY);
SET PARTITIONNAME = DATE_FORMAT( OLDCLOCK, 'p%Y%m%d' );
CALL zabbix.drop_partition( SCHEMANAME, TABLENAME, PARTITIONNAME );
SET @i=@i-1;
IF @i <= @mindays THEN
LEAVE droploop;
END IF;
END LOOP;
END //
DROP PROCEDURE IF EXISTS `zabbix`.`create_partition` //
CREATE PROCEDURE `zabbix`.`create_partition` (SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64), CLOCK int)
BEGIN
DECLARE RETROWS int;
SELECT COUNT(1) INTO RETROWS
FROM `information_schema`.`partitions`
WHERE `table_schema` = SCHEMANAME AND `table_name` = TABLENAME AND `partition_name` = PARTITIONNAME;
IF RETROWS = 0 THEN
SELECT CONCAT( "create_partition(", SCHEMANAME, ",", TABLENAME, ",", PARTITIONNAME, ",", CLOCK, ")") AS msg;
SET @sql = CONCAT( 'ALTER TABLE `', SCHEMANAME, '`.`', TABLENAME, '`',
' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', CLOCK, '));' );
PREPARE STMT FROM @sql;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
END IF;
END //
DROP PROCEDURE IF EXISTS `zabbix`.`drop_partition` //
CREATE PROCEDURE `zabbix`.`drop_partition` (SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64))
BEGIN
DECLARE RETROWS int;
SELECT COUNT(1) INTO RETROWS
FROM `information_schema`.`partitions`
WHERE `table_schema` = SCHEMANAME AND `table_name` = TABLENAME AND `partition_name` = PARTITIONNAME;
IF RETROWS = 1 THEN
SELECT CONCAT( "drop_partition(", SCHEMANAME, ",", TABLENAME, ",", PARTITIONNAME, ")") AS msg;
SET @sql = CONCAT( 'ALTER TABLE `', SCHEMANAME, '`.`', TABLENAME, '`',
' DROP PARTITION ', PARTITIONNAME, ';' );
PREPARE STMT FROM @sql;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
END IF;
END //
DELIMITER ;
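To see what the script does on a given day, the two loops can be mirrored outside the database. The sketch below follows the logic of create_next_partitions and drop_old_partitions as read from the script (7 days ahead, drop window of @mindays+4 down to @mindays+1 days back); the Python function names are hypothetical and nothing here talks to MySQL.

```python
from datetime import date, timedelta


def partitions_to_create(today, total_days=7):
    """Partition names the script tries to create: one per day, 1..total_days ahead."""
    return [f"p{today + timedelta(days=i):%Y%m%d}" for i in range(1, total_days + 1)]


def partitions_to_drop(today, min_days=3):
    """Partition names the script tries to drop: min_days+4 down to min_days+1 days back."""
    max_days = min_days + 4
    return [f"p{today - timedelta(days=i):%Y%m%d}" for i in range(max_days, min_days, -1)]


today = date(2011, 5, 23)
print("create:", partitions_to_create(today))
print("drop:  ", partitions_to_drop(today))
```

Because the drop window only reaches @mindays+4 days back, partitions older than that are left in place if the cron job is missed for several days and then have to be dropped manually.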
Summary:
As the number of monitored machines keeps growing, the optimization approach is:
1. Increase the number of Zabbix worker processes
2. Use active mode, in which the agent sends data to the server on its own
3. Use proxy for distributed monitoring
4. mysql tuning