Pandora FMS PosgreSQL Monitoring

comments and questions about working with SQL Server and PostgreSQL
Post Reply
Posts: 17
Joined: Thu Jun 20, 2019 6:41 pm

Mon Jun 24, 2019 2:12 pm

In order to get the software working at peak performance and make a better use of your resources, it is essential to optimize the parameters of Progress. Optimizing is important, but optimizing without monitoring is completely useless. With a tool that monitors your systems, you will be notified on problems before they become a bigger issue. If an error occurs in your database, you will receive an email with all the information so you can solve it before it's too late.

A monitoring system will let you know about when, where and why a problem occurred. With a good optimization and monitoring of ProgreSQL, this database will be suitable even for large companies.

All these things will provide you with a better user experience and reliability. It will even reduce the time spent on maintenance of PostgreSQL and, therefore, you will cut down on costs and time.


There are several tools from Linux / Unix that collect useful information about the impact of PostgreSQL in the system. We will talk about some of them below.

* ps

With command, which most of Linux / Unix systems includes, you can get information about the CPU usage, RAM PostgreSQL processes, the client connections to the server and its activity, among many other things.

For example, with a query as the one below:

ps aux | grep "postgres" | grep -v "grep"

We would get this information:

postgres 1289 0.0 0.3 51676 7900? S 11:31 0:01 /usr/lib/postgresql/9.1/bin/postgres -D /var/lib/postgresql/9.1/main -c config_file=/etc/postgresql/9.1/main/postgresql.conf
postgres 1303 0.0 0.0 21828 1148? Ss 11:31 0:00 postgres: logger process

postgres 1305 0.0 0.0 51676 1568? Ss 11:31 0:02 postgres: writer process

postgres 1306 0.0 0.0 51676 1324? Ss 11:31 0:02 postgres: wal writer process

postgres 1307 0.0 0.1 52228 2452? Ss 11:31 0:00 postgres: autovacuum launcher process

postgres 1308 0.0 0.0 22016 1432? Ss 11:31 0:00 postgres: stats collector process

The third and fourth columns show the percentage of CPU usage and RAM, respectively. Since a very high consumption of CPU and RAM can cause slowness and server crashes, it is important to pay attention to these two data.

Note that if data appear when performing this query, it means that the PostgreSQL server is active.

* Vmstat:

With this command (included in most of the facilities of Linux / Unix systems) you can obtain data related to the common memory and SWAP, (at the entrance and exit), the system and the CPU. The first line always shows an average since the last reboot.

To get a view of the current data during an interval of time with a defined number of repetitions, you can perform a query like the one below, where the first variable shows the time in seconds that will measure each line, and the second variable shows the number of lines that will appear:

vmstat 1 5
It will return information like this:

procs ----memory---- -swap- --io-- -system- --cpu--
r b swpd free buff cache si so bi bo in cs us sy id wa

0 0 0 188972 179496 669392 0 0 24 27 525 461 1 1 96 1

0 0 0 188972 179496 669392 0 0 0 0 1449 1689 2 2 97 0

0 0 0 188972 179496 669392 0 0 0 0 2403 1605 2 2 97 0

2 0 0 188972 179500 669392 0 0 0 16 2041 2502 2 3 94 2

0 0 0 188972 179500 669392 0 0 0 0 1599 1868 1 1 98 0

The most useful information to the topic that concerns us is shown in the last column: "wa". It shows the timeout for Input / Output operations. Since the system could "collapse " due to an excessive disk access, it is very important to monitor it so it will not exceed a value of 15-20.

Check if the values of the columns "si" and "so" of the SWAP memory usage are 0. It would mean that the Ram memory of the server is saturated.

* netstat

This command displays information about the status of the network connections, so you can check the number of standby connections in the computer hosting the PostgreSQ, with a query like the one below.

netstat -ntu | grep "TIME_WAIT" | wc -l
A high number of "standby connections" may indicate problems with the server response as a result of other problems.

** Views and internal tables of PostgreSQL

PostgreSQL has a subsystem called Statistics Collector responsible for collecting and reporting information on the activity of the server. To deactivate Statistic Collectors and avoid performance losses, modify the parameters ' track_counts ', ' track_functions ' and ' track_activities ' on the file ' postgresql.conf '. However, before deactivating it, you must assess whether it is worthwhile as this will remove the possibility of obtaining very useful data.

To access these data, you need SQL to perform several queries to some predefined views. Some of the most important ones are:

* pg_database

It saves information about the databases available. A database per row.

This table is very interesting because it will allow you to find out the size ( in bytes ) of all the databases. The " pg_database_size ( oid ) " returns the size of the database whose ID is passed as an argument. By performing a query on this table, it will show the size of each database, row by row, and you will only have to add the results to see if there is enough space left in the disk:

SELECT SUM(pg_database_size(oid)) FROM pg_database;
* pg_locks

* pg_locks displays information on the active locks in the server. It has one row for each lock, which display information about the type of lock and where it has occurred, etc.

As the SGBD allows multiple users to access at the same time, some of them may want to make changes on the same item that is being used by another user. To avoid this situation, the elements in use in a transaction must be locked. To find out the number of locked element, perform the following query:

SELECT COUNT(*) FROM pg_locks;
A high number of locks could decrease the server performance, because two or more processes could try to access the same resource and would have to wait for the resource to be unlocked.

* pg_stat_activity: it shows information about the server processes. It shows a row per process.

From this view you can get the number of users connected to the query:

SELECT COUNT(usename) FROM pg_stat_activity;
The number of concurrent users that can manage the system depend on the hardware and the connection of the machine where the server is located. It is convenient to perform stress tests to find out what that limit is and be alerted if the system is reaching it.

* pg_stat_database

It shows information about the use of databases. It has one row for each database, which shows the number of connected processes, commits and rollbacks. Furthermore, it shows information on the blocks read from the disk, the cache and the number and type of operations performed with rows of each one.

With this table you can find out, for example, if the cache memory is working properly. With the query below, you can get a usage rate of the cache with respect to the common:

SELECT SUM(blks_hit) / SUM(blks_read) FROM pg_stat_database;
The higher ratio, the greater speed when collecting data for the tables.

* pg_stat_bgwriter

It only has one row and shows data related to checkpoints and buffers from the background writer.

An advanced user able to interpret data can take advantage of this information by changing certain properties to improve the performance. For example, you can perform this query:

SELECT maxwritten_clean FROM pg_stat_bgwriter;
This query will return the number of times that the bgwriter has overwritten the maximum number of buffers allowed in that round, when a checkpoint has been performed. A low number is OK, but if the values are too high, you should increase the value ' bgwriter_lru_maxpages ' (default 100 ) in ' postgresql.conf ' or the command line to improve the performance.

You can also improve the performance by checking what external processes are forced to ask for more space many times in the buffers.

Use this query to check it:

SELECT buffers_backend FROM pg_stat_bgwriter;
To expand the space before a process needs to extend it, you can increase the values ' bgwriter_lru_maxpages ' and ' bgwriter_lru_multiplier ', and reduce the value ' bgwriter_delay '.
Post Reply

Social Media