Latest revision as of 10:04, 8 November 2017
Custom clustercheck for Galera cluster
Introduction
When using Galera as a clustered MySQL database for OX App Suite, a loadbalancer is required to translate Galera's notion of equivalent cluster nodes into OX's understanding of master and slave nodes (or <code>writeUrl</code> and <code>readUrl</code>). The typical choice is round-robin distribution for the <code>readUrl</code> and persistent active-passive behavior for the <code>writeUrl</code>. See the Galera setup page for more detailed information.
The loadbalancers need to check the health status of the Galera cluster nodes in order to decide which nodes are available for read requests and which node should be picked persistently for write requests. The latter point matters most when many distributed loadbalancers do not synchronize their target lists with each other, as is the case in one of our proposed high-level design options (distributed HAproxy instances on the OX App Suite groupware nodes). It seems natural to leverage the mechanism that MariaDB MaxScale also uses to define a master node: pick the node with <code>wsrep_local_index=0</code>.
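The master-selection rule can be sketched in a few lines of shell. This is only an illustration, not the script from this page: the function name <code>is_master_candidate</code> is hypothetical, and the sketch assumes tab-separated name/value rows on standard input, as produced by something like <code>mysql -B -N -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_index'"</code>.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the status rows read from stdin
# report wsrep_local_index = 0.
is_master_candidate() {
    local name value
    while read -r name value; do
        if [[ "$name" == "wsrep_local_index" ]]; then
            [[ "$value" == "0" ]]
            return   # return the result of the comparison above
        fi
    done
    return 1   # variable not found: do not treat the node as master
}

# Canned input stands in for the mysql client output here.
if printf 'wsrep_local_index\t0\n' | is_master_candidate; then
    echo "master candidate"
else
    echo "not a master candidate"
fi
```

Since every node in a healthy cluster reports a different index, independent loadbalancers applying this rule should agree on the same write node without talking to each other.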
Recent versions of the packages (both MariaDB and Percona) ship with a <code>/usr/bin/clustercheck</code> script that was designed for essentially this task. However, the original version has some shortcomings: most noticeably, it was designed for and works only with HAproxy, not with Keepalived, and it offers no support for the <code>wsrep_local_index=0</code> feature discussed above. So we decided to improve on that script.
Basic Installation and Testing
Copy the script from the end of this page to a location on your Galera nodes where it will not be overwritten. We assume <code>/usr/local/bin/clustercheck</code> for that purpose.
The script needs a MySQL user, created for example with
<pre>
GRANT PROCESS ON *.* TO 'clustercheck'@'localhost' IDENTIFIED BY '3shyShynhut';
</pre>
Make the script executable and test it:
<pre>
# echo "GET / HTTP/1.0" | /usr/local/bin/clustercheck clustercheck 3shyShynhut
HTTP/1.0 200 OK
Content-Length: 40

Percona XtraDB Cluster Node is synced.
</pre>
Note 1: The arguments in that sample call are the MySQL user and password. We will change the way this is wired later.
Note 2: The <code>echo "GET / HTTP/1.0"</code> is in fact optional, but be aware that the script expects an (HTTP 1.0) request on standard input. This behavior differs from the original <code>clustercheck</code> script; it is required for Keepalived and remains compatible with HAproxy. So you could also test with just <code>echo "" | ...</code>. The request URL can change the behavior of the script, however: in particular, passing the URL <code>/master</code> tests for <code>wsrep_local_index==0</code>:
<pre>
# echo "GET /master HTTP/1.0" | /usr/local/bin/clustercheck clustercheck 3shyShynhut
HTTP/1.0 200 OK
Content-Length: 65

Percona XtraDB Cluster Node is synced and wsrep_local_index==0.
</pre>
Or:
<pre>
# echo "GET /master HTTP/1.0" | /usr/local/bin/clustercheck clustercheck 3shyShynhut
HTTP/1.0 503 Service Unavailable
Content-Length: 88

Percona XtraDB Cluster Node is not wsrep_local_index==0 and you requested master mode.
</pre>
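The <code>Content-Length</code> values in the sample responses are the message length plus two bytes for the trailing CRLF, which is how the script at the end of this page builds its reply. A minimal sketch of that arithmetic follows; the function name <code>content_length</code> is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: body length as the script computes it,
# i.e. the message length plus 2 bytes for the trailing "\r\n".
content_length() {
    local message="$1"
    echo $(( ${#message} + 2 ))
}

content_length "Percona XtraDB Cluster Node is synced."   # prints 40
```

This can be handy when checking a captured response by hand: a mismatch between the header and the body usually means the output was altered on the way.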
For security, we recommend putting the credentials into a separate MySQL config file like
<pre>
# /usr/local/etc/clustercheck.my.cnf
[client]
user=clustercheck
password=3shyShynhut
</pre>
The invocation then looks like
<pre>
# echo "GET /master HTTP/1.0" | /usr/local/bin/clustercheck -f /usr/local/etc/clustercheck.my.cnf
</pre>
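Since the credentials file holds a password, it is reasonable to keep it unreadable for other users. The following sketch shows one way to create it with mode 0600; the helper name <code>write_defaults_file</code> is hypothetical, and the demo writes to a temporary file rather than to <code>/usr/local/etc/clustercheck.my.cnf</code>.

```shell
#!/bin/bash
# Hypothetical helper: write a MySQL client defaults file that only its
# owner can read. Arguments: target path, user, password.
write_defaults_file() {
    local path="$1" user="$2" pass="$3"
    (
        umask 077   # a newly created file gets mode 0600
        printf '[client]\nuser=%s\npassword=%s\n' "$user" "$pass" > "$path"
    )
    chmod 600 "$path"   # also tighten a file that already existed
}

# Demo against a temporary file; on a real node the target would be
# /usr/local/etc/clustercheck.my.cnf.
demo_file="$(mktemp)"
write_defaults_file "$demo_file" clustercheck 3shyShynhut
cat "$demo_file"
rm -f "$demo_file"
```

Note that the file must stay readable for whichever user runs the script, because the script refuses to start if the defaults file is not readable. With the xinetd setup on this page that would be <code>nobody</code>, so a matching <code>chown</code> is probably needed there.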
For further options, see the script's usage info or the source code below. In particular, we recommend something like <code>-e /var/log/clustercheck.log</code> to enable logging, e.g. to <code>/var/log/clustercheck.log</code>. Logging is disabled by default, as in the original script.
Furthermore, you should decide whether to use
<pre>
-d  Consider this node as available while being donor for a SST.
    Default: Donor node is considered unavailable.
-r  Consider this node as unavailable while being read-only.
    Default: read-only node is considered available.
</pre>
By default, our script behaves like the original one.
Finally, master mode can be toggled not only by the request path (see above), but also by a command line parameter:
<pre>
-m  Consider this as available only if it has got wsrep_local_index=0.
</pre>
Installation as web service via xinetd
Some of the upstream packages install an <code>xinetd</code> service definition file <code>/etc/xinetd.d/mysqlchk</code>. If you don't have one, install the one shown below. We configure it for our custom service like this:
<pre>
# default: on
# description: mysqlchk
service mysqlchk
{
# this is a config for xinetd, place it in /etc/xinetd.d/
        disable = no
        flags = REUSE
        socket_type = stream
        type = UNLISTED
        port = 9200
        wait = no
        user = nobody
        server = /usr/local/bin/clustercheck
        server_args = -e /var/log/clustercheck.log -f /usr/local/etc/clustercheck.my.cnf
        log_on_failure += USERID
        only_from = 0.0.0.0/0
        # recommended to put the IPs that need
        # to connect exclusively (security purposes)
        per_source = UNLIMITED
}
</pre>
You need to create the log file and hand it to the <code>nobody</code> user:
<pre>
touch /var/log/clustercheck.log
chown nobody /var/log/clustercheck.log
</pre>
So with the usual steps (<code>apt-get install xinetd; service xinetd restart</code>, etc.) we have a web service for our clustercheck script.
Note: Please ensure that you haven't set the <code>max_load</code> parameter in the xinetd configuration. With this parameter set, <code>xinetd</code> stops answering requests once the system load rises above the configured value, so the node would be detected as dead even though it actually isn't.
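A quick way to check for a stray <code>max_load</code> setting is to search the xinetd configuration for it. The sketch below is one possible approach; the helper name <code>find_max_load</code> is hypothetical, and the paths in the comment assume a Debian-style layout.

```shell
#!/bin/bash
# Hypothetical helper: list active (non-comment) max_load settings in the
# given xinetd config files or directories. Succeeds if at least one is found.
find_max_load() {
    grep -rnHE '^[[:space:]]*max_load[[:space:]]*=' "$@"
}

# Typical call (paths are assumptions about the local layout):
#   find_max_load /etc/xinetd.conf /etc/xinetd.d/
```

If the call prints nothing and returns a non-zero status, no active <code>max_load</code> line was found in the given locations.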
The final step is to test this from the loadbalancer node (and to adjust the firewall configuration or the like, if required).
<pre>
# telnet db1 9200
Trying 10.0.0.1...
Connected to db1.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.0 200 OK
Content-Length: 40

Percona XtraDB Cluster Node is synced.
Connection closed by foreign host.
</pre>
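For scripted checks, the interactive telnet session can be replaced by a small function that inspects only the status line of the response. This is a sketch under assumptions, not part of the setup described here: the name <code>response_is_healthy</code> is hypothetical, and the demo feeds it a canned response instead of contacting a node.

```shell
#!/bin/bash
# Hypothetical helper: read an HTTP response on stdin and succeed only if
# the status line reports 200.
response_is_healthy() {
    local proto code rest
    read -r proto code rest || return 1
    code="${code%$'\r'}"   # strip a CR in case no reason phrase follows
    [[ "$proto" == HTTP/* && "$code" == "200" ]]
}

# Canned response standing in for what the service on port 9200 returns.
printf 'HTTP/1.0 200 OK\r\nContent-Length: 40\r\n\r\nPercona XtraDB Cluster Node is synced.\r\n' \
    | response_is_healthy && echo "node is available"

# Against a live node one might feed it from bash's /dev/tcp pseudo-device
# (host name db1 taken from the telnet example; untested sketch):
#   exec 3<>/dev/tcp/db1/9200
#   printf 'GET / HTTP/1.0\r\n\r\n' >&3
#   response_is_healthy <&3
```

Checking only the status code mirrors what a loadbalancer health check typically does; the message body is meant for humans reading logs.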
Loadbalancer configuration
HAproxy
The service can be configured for use with HAproxy. See the configuration page for details.
Keepalived
The service can also be configured for use with Keepalived. See the Keepalived page for more information.
The custom clustercheck script
<pre>
#!/bin/bash
#
# Script to make a proxy (ie HAProxy) capable of monitoring Percona XtraDB Cluster nodes properly
#
# Authors:
# Raghavendra Prabhu <raghavendra.prabhu@percona.com>
# Olaf van Zandwijk <olaf.vanzandwijk@nedap.com>
#
# Based on the original script from Unai Rodriguez and Olaf (https://github.com/olafz/percona-clustercheck)
#
# Heavily rewritten and extended by Dominik Epple <dominik.epple@open-xchange.com> 2017-09
#
# Grant privileges required:
# GRANT PROCESS ON *.* TO 'clustercheck'@'localhost' IDENTIFIED BY '3shyShynhut';
#
# Sample usage:
#
# # echo "GET / HTTP/1.0" | /usr/local/bin/clustercheck clustercheck 3shyShynhut
# HTTP/1.0 200 OK
# Content-Length: 40
#
# Percona XtraDB Cluster Node is synced.
#

AVAILABLE_WHEN_DONOR=0
ERR_FILE=/dev/null
AVAILABLE_WHEN_READONLY=1
DEFAULTS_EXTRA_FILE=""
DEFAULTS_FILE=""
#Timeout exists for instances where mysqld may be hung
TIMEOUT=10
MASTER_MODE=0

usage() {
    cat <<EOF
usage:
  $0 [-h]
    show this usage text
  $0 [-e error_file] [-f defaults_file] [-F defaults_extra_file] [-t timeout_secs] [-d] [-r] [-m] [user [pass]]
    Perform clustercheck. Arguments are
    -e error_file
        File to log errors to. Default: /dev/null
    -f defaults_file
        Defaults file for MySQL client. Default: none
        Preferred way to pass credentials to the MySQL client.
    -F defaults_extra_file
        Extra defaults file for MySQL client. Default: none
        Kept for compatibility with the original clustercheck.
    -t timeout
        Timeout for the MySQL client in seconds. Default: 10
    -d  Consider this node as available while being donor for a SST.
        Default: Donor node is considered unavailable.
    -r  Consider this node as unavailable while being read-only.
        Default: read-only node is considered available.
    -m  Consider this as available only if it has got wsrep_local_index=0.
        Useful to define a "master" node.
        You can also toggle MASTER_MODE by using the request path "/master".
        Default: It is sufficient to be "Synced" for a node to be considered available.

    user, pass
        Credentials to connect to the MySQL server with
EOF
}

log_debug() {
    if [[ "$ERR_FILE" != "/dev/null" ]]; then
        # the following would give nanoseconds timestamps, but create extra processes, which I want to avoid in normal ops
        #echo "$(date --iso-8601=ns) $message" >> ${ERR_FILE}
        printf "%(%FT%T%z)T" -1 >> ${ERR_FILE}
        echo " $1" >> ${ERR_FILE}
    fi
}

output() {
    http_status=$1
    message="$2"
    exit_status=$3
    log_debug "sending \"$http_status\" \"$message\" to the client."
    length=${#message}
    length=$(( length + 2 ))
    echo -en "HTTP/1.0 $http_status\r\n"
    echo -en "Content-Length: $length\r\n"
    echo -en "\r\n"
    echo -en "$message\r\n"
    1<&-
    exit $exit_status
}

# "F:" is included so that the documented -F option is actually accepted
while getopts "e:drf:F:t:mh" o; do
    case "${o}" in
        e)
            ERR_FILE=${OPTARG}
            ;;
        d)
            AVAILABLE_WHEN_DONOR=1
            ;;
        r)
            AVAILABLE_WHEN_READONLY=0
            ;;
        f)
            DEFAULTS_FILE=${OPTARG}
            ;;
        F)
            DEFAULTS_EXTRA_FILE=${OPTARG}
            ;;
        t)
            TIMEOUT=${OPTARG}
            ;;
        m)
            MASTER_MODE=1
            ;;
        h)
            usage
            exit 0
            ;;
        *)
            usage
            exit 1
            ;;
    esac
done
shift $((OPTIND-1))

MYSQL_USERNAME="${1}"
MYSQL_PASSWORD="${2}"

EXTRA_ARGS="--connect-timeout=$TIMEOUT -B -N"
if [[ -n "$MYSQL_USERNAME" ]]; then
    EXTRA_ARGS="$EXTRA_ARGS --user=${MYSQL_USERNAME}"
fi
if [[ -n "$MYSQL_PASSWORD" ]]; then
    EXTRA_ARGS="$EXTRA_ARGS --password=${MYSQL_PASSWORD}"
fi
if [[ -n "$DEFAULTS_FILE" ]]; then
    if [[ -r "$DEFAULTS_FILE" ]]; then
        # seems like it must be the first argument
        EXTRA_ARGS="--defaults-file=$DEFAULTS_FILE $EXTRA_ARGS "
    else
        echo "$0: error: defaults file $DEFAULTS_FILE not readable." >&2
        exit 1
    fi
fi
if [[ -n "$DEFAULTS_EXTRA_FILE" ]]; then
    if [[ -r "$DEFAULTS_EXTRA_FILE" ]]; then
        # seems like it must be the first argument
        EXTRA_ARGS="--defaults-extra-file=$DEFAULTS_EXTRA_FILE $EXTRA_ARGS "
    else
        echo "$0: error: defaults extra file $DEFAULTS_EXTRA_FILE not readable." >&2
        exit 1
    fi
fi
MYSQL_CMDLINE="mysql ${EXTRA_ARGS}"

# irrelevant for haproxy, required for keepalived: try to read input
log_debug "Reading HTTP request ..."
while read line
do
    # https://stackoverflow.com/questions/369758/how-to-trim-whitespace-from-a-bash-variable
    # remove trailing control characters
    # inner expression: truncate left everything until to the right only spaces are left -> is only right spaces
    # outer expression: truncate to the right the "right spaces"
    line="${line%"${line##*[![:cntrl:]]}"}"
    log_debug "Client sent: \"===$line===\""
    if [[ -z "$line" ]]; then
        log_debug "Client sent empty line, breaking"
        break
    fi
    set -- $line
    # haproxy sends by default OPTIONS, keepalived sends GET
    if [[ ${1,,} = "get" || ${1,,} = "options" ]]; then
        if [[ ${2:0:7} = "/master" ]]; then
            log_debug "Upgrading to master mode as requested by /master URL."
            MASTER_MODE=1
        fi
    fi
done
log_debug "Done reading HTTP request."
set --

log_debug "Calling MySQL..."
mysql_output=$($MYSQL_CMDLINE -e 'SHOW GLOBAL STATUS WHERE Variable_name REGEXP "^(wsrep_local_state|wsrep_cluster_status|wsrep_local_index)$"; show global variables like "read_only";' 2>>${ERR_FILE} )
log_debug "MySQL output: ===$mysql_output==="

# the output is a flat list of name/value pairs; turn each known pair into a shell variable
set -- $mysql_output
while [[ $# -gt 1 ]]
do
    case "$1" in
        wsrep_local_state|wsrep_cluster_status|wsrep_local_index|read_only)
            declare $1="$2"
            shift
            shift
            ;;
        *)
            log_debug "unexpected output from MySQL: $1 $2"
            shift
            shift
            ;;
    esac
done
log_debug "After parsing: wsrep_local_state=$wsrep_local_state wsrep_cluster_status=$wsrep_cluster_status wsrep_local_index=$wsrep_local_index read_only=$read_only"

if [[ "$wsrep_cluster_status" == 'Primary' && ( $wsrep_local_state -eq 4 || ( $wsrep_local_state -eq 2 && $AVAILABLE_WHEN_DONOR -eq 1 ) ) ]]
then
    if [[ "${MASTER_MODE}" == 1 ]];then
        if [[ ${wsrep_local_index} -eq 0 ]];then
            output "200 OK" "Percona XtraDB Cluster Node is synced and wsrep_local_index==0." 0
        else
            output "503 Service Unavailable" "Percona XtraDB Cluster Node is not wsrep_local_index==0 and you requested master mode." 1
        fi
    fi
    if [[ "${read_only}" == "ON" && $AVAILABLE_WHEN_READONLY -eq 0 ]];then
        output "503 Service Unavailable" "Percona XtraDB Cluster Node is read_only and you requested AVAILABLE_WHEN_READONLY=0." 1
    fi
    output "200 OK" "Percona XtraDB Cluster Node is synced." 0
else
    output "503 Service Unavailable" "Percona XtraDB Cluster Node is not synced or non-PRIM." 1
fi
</pre>