Part 68: Global availability monitoring via a single home router


My blog has a global audience, so it would be nice to know how well the site is reachable from different parts of the planet. In this blog post, I'll set up a rather unusual solution with Zabbix to do that: no remote proxies or agents, just a single home router and a VPN service.

Also, along the way, I probably found a small bug in Zabbix web scenarios, or in how Zabbix or its agents behave with them. I'll let the Zabbix team decide.

Could I just use Grafana Cloud?

I have a Grafana Cloud Forever Free instance, and it comes with synthetic monitoring.

Grafana Cloud checks

However, Grafana Cloud is way too aggressive with its synthetic HTTP checks for a small, humble blog like mine. For out-of-curiosity global availability checks, I don't want it to pummel my site every two minutes from multiple locations around the world. 

Grafana Cloud HTTP checks settings

See, even if I monitored my site from just six locations, the strange "You cannot make check frequency longer than two minutes" limitation would lead to roughly 130,000 HTTP requests to my site per month. I don't want that. For my needs, once per hour is enough to get an overall idea over a longer period of time. That rate would add only roughly 4-5k HTTP requests per month. Much better.
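To double-check those numbers, here is a small Bash sketch of the arithmetic. It assumes a 30-day month and evenly spaced checks; `requests_per_month` is just a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope monthly request counts for synthetic HTTP checks.
# Assumptions: a 30-day month and evenly spaced checks.

# requests_per_month LOCATIONS INTERVAL_MINUTES
requests_per_month() {
  local locations=$1 interval_min=$2
  local checks_per_day=$(( 24 * 60 / interval_min ))
  echo $(( locations * checks_per_day * 30 ))
}

echo "Six locations, every 2 minutes:  $(requests_per_month 6 2)"   # 6 * 720 * 30 = 129600
echo "Six locations, every 60 minutes: $(requests_per_month 6 60)"  # 6 * 24 * 30 = 4320
```

With five locations, as in the setup below, hourly checks come to 3,600 requests per month.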

Well, you know what I'm about to say -- Zabbix to the rescue!

But... how... what?

I have been a paying Proton customer for many years now: e-mail, VPN, cloud storage, calendar. As a paying customer, I can have a maximum of ten VPN clients, and I only use a few. Why not put the free slots to some good use? Let's build my own version of Grafana Cloud probes, but with Zabbix, all via my own home router and Proton VPN.

This is not the same as running my own dedicated Zabbix agents in the cloud around the world, but it gives a good estimate anyway, and that's all I care about for now. In a corporate environment, of course, you would usually have Zabbix proxies and agents deployed around the globe for this task, but I don't have that luxury. In this first blog entry, I'll concentrate on HTTP testing; expect DNS monitoring, traceroute and ping tests to come next.

Now, let's begin.

Fetch the Proton VPN OpenVPN configuration files

I went to my Proton VPN account settings and downloaded OpenVPN router profiles for different countries. For this demo, I chose Australia, Brazil, Latvia (I'm sure you won't guess why :D), South Africa and the USA.

Proton VPN profiles download

Set up the VPN tunnels

To import the OpenVPN profiles, I logged into my Asus router, went to VPN --> VPN Client and uploaded the profiles to each free slot.

OpenVPN settings

Additionally, I set each profile's Internet access to be controlled by VPN Director.

VPN Director policy rules

My home router runs Asuswrt-Merlin firmware, which supports multiple simultaneous VPN client connections. In other words, I can route different devices on my network through different VPN tunnels. After repeating the above for each country, I got this view:

Asuswrt-Merlin VPN connection

With the vpnmon utility on the router's command line, we can confirm that the tunnels really are up and connected to the countries I expected:

vpnmon

We still need rules for when to redirect traffic into which tunnel. A few VPN Director rules are enough for that: if the client has IP address X, send its traffic through tunnel Y.

VPN Director rules

Set up virtual Ethernet

My home router hands out client addresses over DHCP (and for some devices, I have pinned the address via the Asuswrt-Merlin admin interface).

One physical NIC normally gets only one IP address from a DHCP server, but a creative hack to quickly get more addresses over DHCP is to set up some virtual Ethernet magic on my Raspberry Pi: one macvlan interface, with its own MAC address, for each country instance I need. Be ready to wash and rinse your eyes; the following is not pretty, but it works for my PoC.

#!/bin/bash
# Create virtual NICs with completely wacky MAC addresses
sudo ip link add link eth0 address 00:11:22:33:44:55 virtbraz0 type macvlan
sudo ip link add link eth0 address 00:11:22:33:44:56 virtusa0 type macvlan
sudo ip link add link eth0 address 00:11:22:33:44:57 virtaus0 type macvlan
sudo ip link add link eth0 address 00:11:22:33:44:58 virtsoutha0 type macvlan
sudo ip link add link eth0 address 00:11:22:33:44:59 virtlatv0 type macvlan
# Bring these virtual interfaces up
sudo ip link set virtbraz0 up
sudo ip link set virtusa0 up
sudo ip link set virtaus0 up
sudo ip link set virtsoutha0 up
sudo ip link set virtlatv0 up
# Poke the DHCP server for these interfaces to get an IP
sudo dhclient virtbraz0
sudo dhclient virtusa0
sudo dhclient virtaus0
sudo dhclient virtsoutha0
sudo dhclient virtlatv0

And here is the output from ip a.

4: virtbraz0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
   link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
   inet 192.168.50.242/24 brd 192.168.50.255 scope global dynamic virtual0
      valid_lft 83919sec preferred_lft 83919sec
   inet6 fe80::211:22ff:fe33:4455/64 scope link 
      valid_lft forever preferred_lft forever
5: virtusa0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
   link/ether 00:11:22:33:44:56 brd ff:ff:ff:ff:ff:ff
   inet 192.168.50.243/24 brd 192.168.50.255 scope global dynamic virtusa0
      valid_lft 86247sec preferred_lft 86247sec
   inet6 fe80::211:22ff:fe33:4456/64 scope link 
      valid_lft forever preferred_lft forever
6: virtaus0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
   link/ether 00:11:22:33:44:57 brd ff:ff:ff:ff:ff:ff
   inet 192.168.50.244/24 brd 192.168.50.255 scope global dynamic virtaus0
      valid_lft 83498sec preferred_lft 83498sec
   inet6 fe80::211:22ff:fe33:4457/64 scope link 
      valid_lft forever preferred_lft forever
7: virtsoutha0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
   link/ether 00:11:22:33:44:58 brd ff:ff:ff:ff:ff:ff
   inet 192.168.50.245/24 brd 192.168.50.255 scope global dynamic virtsoutha0
      valid_lft 81184sec preferred_lft 81184sec
   inet6 fe80::211:22ff:fe33:4458/64 scope link 
      valid_lft forever preferred_lft forever
8: virtlatv0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
   link/ether 00:11:22:33:44:59 brd ff:ff:ff:ff:ff:ff
   inet 192.168.50.246/24 brd 192.168.50.255 scope global dynamic virtlatv0
      valid_lft 81162sec preferred_lft 81162sec
   inet6 fe80::211:22ff:fe33:4459/64 scope link 
      valid_lft forever preferred_lft forever
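If you'd rather not repeat every command five times, the same setup can be driven from a single table. This is a sketch, not what I actually ran: it uses the same interface names and MAC addresses as the script above, handles one interface at a time (add, bring up, request a lease) instead of in three batches, and only prints the commands unless `DRY_RUN=0` is set. `PROBES`, `run` and `setup_probe_interfaces` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: macvlan + DHCP setup for all probe interfaces, driven by one table.
# Default is a dry run that only prints commands; DRY_RUN=0 executes them via sudo.
set -euo pipefail

PARENT_IF=eth0
# "interface MAC" pairs, same values as in the script above.
# A locally administered prefix (e.g. 02:...) would avoid clashing with real vendor MACs.
PROBES=(
  "virtbraz0 00:11:22:33:44:55"
  "virtusa0 00:11:22:33:44:56"
  "virtaus0 00:11:22:33:44:57"
  "virtsoutha0 00:11:22:33:44:58"
  "virtlatv0 00:11:22:33:44:59"
)

# run CMD...: print the command in dry-run mode, otherwise execute it with sudo
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "$*"
  else
    sudo "$@"
  fi
}

setup_probe_interfaces() {
  local entry ifname mac
  for entry in "${PROBES[@]}"; do
    read -r ifname mac <<< "$entry"
    run ip link add link "$PARENT_IF" address "$mac" "$ifname" type macvlan
    run ip link set "$ifname" up
    run dhclient "$ifname"   # ask the router's DHCP server for a lease
  done
}

# Usage: setup_probe_interfaces            # preview
#        DRY_RUN=0 setup_probe_interfaces  # apply
```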

Set up Zabbix agents

Zabbix agent is extremely light on system resources -- so light, in fact, that spawning five extra instances of it on my already crowded Raspberry Pi 4 won't hurt at all. To avoid any unnecessary overhead, I'm running the agents directly on the host OS instead of in Docker or similar.

First, I copied my existing Zabbix Agent 2 config file under new names:

cd /etc/zabbix
sudo cp zabbix_agent2.conf zabbix_agent2-protonvpn-australia.conf
sudo cp zabbix_agent2.conf zabbix_agent2-protonvpn-brazil.conf
sudo cp zabbix_agent2.conf zabbix_agent2-protonvpn-latvia.conf
sudo cp zabbix_agent2.conf zabbix_agent2-protonvpn-southafrica.conf
sudo cp zabbix_agent2.conf zabbix_agent2-protonvpn-usa.conf

Then I modified a few lines in each file to match the new IP addresses and to give each agent instance its own PID file, log file, plugin socket and control socket. Here are the relevant lines, comparing my new Proton VPN Brazil config against my usual Zabbix Agent 2 config file:

sudo diff -U0 zabbix_agent2.conf zabbix_agent2-protonvpn-brazil.conf 
--- zabbix_agent2.conf  2024-03-11 22:10:26.840812121 +0200
+++ zabbix_agent2-protonvpn-brazil.conf 2024-03-12 08:19:47.557501129 +0200
@@ -13 +13 @@
-PidFile=/var/run/zabbix/zabbix_agent2.pid
+PidFile=/var/run/zabbix/zabbix_agent2-brazil.pid
@@ -32 +32 @@
-LogFile=/var/log/zabbix/zabbix_agent2.log
+LogFile=/var/log/zabbix/zabbix_agent2-brazil.log
@@ -64 +64 @@
-# SourceIP=
+SourceIP=192.168.50.242
@@ -80 +80 @@
-Server=127.0.0.1
+Server=192.168.50.132,127.0.0.1,192.168.50.242
@@ -96 +96 @@
-ListenIP=127.0.0.1,192.168.50.132
+ListenIP=192.168.50.242
@@ -133 +133 @@
-ServerActive=127.0.0.1
+ServerActive=192.168.50.132
@@ -144 +144 @@
-Hostname=Zabbix server
+Hostname=blogprobe-brazil
@@ -312 +312 @@
-PluginSocket=/run/zabbix/agent.plugin.sock
+PluginSocket=/run/zabbix/agent-br.plugin.sock
@@ -355 +355 @@
-ControlSocket=/run/zabbix/agent.sock
+ControlSocket=/run/zabbix/agent-br.sock
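Editing five files by hand is error-prone, so here is a sketch that applies the same kind of changes with sed. Assumptions: the base file has these directives uncommented (except `# SourceIP=`), the Zabbix server is at 192.168.50.132 as in the diff, and the probe name is used for every per-instance file (the diff above used the shorter `br` for the socket names, which works just as well). `make_probe_conf` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: derive a per-probe Zabbix Agent 2 config from the base config with sed.
# Assumption: server address as in the diff above.
set -euo pipefail

SERVER_IP=192.168.50.132

# make_probe_conf BASE_CONF NAME PROBE_IP -> writes the new config to stdout
make_probe_conf() {
  local base=$1 name=$2 ip=$3
  sed \
    -e "s|^PidFile=.*|PidFile=/var/run/zabbix/zabbix_agent2-${name}.pid|" \
    -e "s|^LogFile=.*|LogFile=/var/log/zabbix/zabbix_agent2-${name}.log|" \
    -e "s|^# SourceIP=.*|SourceIP=${ip}|" \
    -e "s|^Server=.*|Server=${SERVER_IP},127.0.0.1,${ip}|" \
    -e "s|^ListenIP=.*|ListenIP=${ip}|" \
    -e "s|^ServerActive=.*|ServerActive=${SERVER_IP}|" \
    -e "s|^Hostname=.*|Hostname=blogprobe-${name}|" \
    -e "s|^PluginSocket=.*|PluginSocket=/run/zabbix/agent-${name}.plugin.sock|" \
    -e "s|^ControlSocket=.*|ControlSocket=/run/zabbix/agent-${name}.sock|" \
    "$base"
}

# Usage:
# make_probe_conf /etc/zabbix/zabbix_agent2.conf brazil 192.168.50.242 \
#   | sudo tee /etc/zabbix/zabbix_agent2-protonvpn-brazil.conf >/dev/null
```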

Next, let's just quickly start the agents: 

#!/bin/bash
sudo /usr/sbin/zabbix_agent2 -c /etc/zabbix/zabbix_agent2-protonvpn-australia.conf &
sudo /usr/sbin/zabbix_agent2 -c /etc/zabbix/zabbix_agent2-protonvpn-brazil.conf &
sudo /usr/sbin/zabbix_agent2 -c /etc/zabbix/zabbix_agent2-protonvpn-latvia.conf &
sudo /usr/sbin/zabbix_agent2 -c /etc/zabbix/zabbix_agent2-protonvpn-southafrica.conf &
sudo /usr/sbin/zabbix_agent2 -c /etc/zabbix/zabbix_agent2-protonvpn-usa.conf &

and check that they are really there:

sudo netstat -ltnp | grep 10050
tcp        0      0 192.168.50.245:10050    0.0.0.0:*               LISTEN      2097050/zabbix_agen 
tcp        0      0 192.168.50.244:10050    0.0.0.0:*               LISTEN      2097133/zabbix_agen 
tcp        0      0 192.168.50.242:10050    0.0.0.0:*               LISTEN      2097304/zabbix_agen 
tcp        0      0 192.168.50.246:10050    0.0.0.0:*               LISTEN      2096956/zabbix_agen 
tcp        0      0 127.0.0.1:10050         0.0.0.0:*               LISTEN      1968114/zabbix_agen 
tcp        0      0 192.168.50.243:10050    0.0.0.0:*               LISTEN      2097216/zabbix_agen 
tcp        0      0 192.168.50.132:10050    0.0.0.0:*               LISTEN      1968114/zabbix_agen 

Yep, looking good! But we still need to add these hosts to Zabbix.
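Eyeballing netstat works for five agents, but a tiny check can do it for you. This sketch reads `ss -ltn` or `netstat -ltn` output on standard input and reports any probe IP without a listener on port 10050. The IP list is taken from the DHCP leases above and is an assumption about your own addresses; `check_listeners` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: verify that an agent listens on port 10050 on every probe IP.
# Assumption: probe IPs as leased by DHCP earlier in this post.
PROBE_IPS=(192.168.50.242 192.168.50.243 192.168.50.244 192.168.50.245 192.168.50.246)

# check_listeners: reads a listening-socket table on stdin,
# prints "missing: IP" for each probe IP without a listener, returns 1 if any is missing
check_listeners() {
  local listing ip missing=0
  listing=$(cat)
  for ip in "${PROBE_IPS[@]}"; do
    # the trailing space keeps e.g. :100500 from matching :10050
    if [[ "$listing" != *"${ip}:10050 "* ]]; then
      echo "missing: ${ip}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: ss -ltn | check_listeners && echo "all probe agents are listening"
```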

Add a new Zabbix template

For now, my template is as simple as it can possibly be: just a single web scenario.

Template

The web scenario itself has just a single step. A single HTTP request: that's some real synthetic monitoring for you! Maybe I should add my Selenium tests to this setup, too.

Web scenario

Add some hosts

I added a new host for each of the countries.

New hosts

There's nothing magical about these hosts, other than that each host's agent interface uses one of the new IP addresses we created earlier, and each host has our new template (plus one we will create soon) assigned to it.

Host config

Add HTTPS port connection time

(Please note that I left this section in as-is: I wrote it before I noticed a possible bug in Zabbix web scenarios. Read on to understand more; I'm happy to hear your opinions. What I write below about web scenarios might not actually be true.)

My assumption, based on the response times I get, is that the Zabbix web scenario response time measures how long it takes from sending the HTTP request to receiving the response: by the time the request is sent, the connection to the web service has already been established.

To separately measure how long it takes to connect to the HTTPS port, I created another simple template. It has this single item on it:

HTTPS port connection time

Or, in more detail:

Detailed item view

That's really all. For now, I haven't added any triggers, as I'm trying to avoid alert noise, and there's not much I could or should do if my blog ain't working via some random Proton VPN connection. (As a plan, though, I could set up another sub-site called status.whatsuphome.fi and make it look like the usual status pages provided by any Big Corp. That would be fun.)
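As an independent cross-check outside Zabbix, here is a curl sketch. It binds each request to one probe's source IP with `--interface` and prints curl's timing variables. Note that curl reports these as cumulative times from the start of the request: `time_connect` includes the DNS lookup, and `time_appconnect` marks the end of the TLS handshake. The target URL (this blog's address) and the `probe_timing` name are assumptions, and `CURL` can be overridden, for example for testing.

```bash
#!/usr/bin/env bash
# Sketch: per-source-IP connection timings with curl.
# Assumptions: TARGET_URL is this blog's address; CURL may be overridden.
TARGET_URL=${TARGET_URL:-https://whatsuphome.fi/}
CURL=${CURL:-curl}

# probe_timing SOURCE_IP -> "SOURCE_IP connect=... tls=... total=... code=..."
# All times are seconds since the request started (cumulative, not per phase).
probe_timing() {
  local src=$1
  "$CURL" --silent --output /dev/null --max-time 30 \
    --interface "$src" \
    --write-out "${src} connect=%{time_connect} tls=%{time_appconnect} total=%{time_total} code=%{http_code}\n" \
    "$TARGET_URL"
}

# Usage:
# for ip in 192.168.50.242 192.168.50.243 192.168.50.244 192.168.50.245 192.168.50.246; do
#   probe_timing "$ip"
# done
```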

Let's create a dashboard and verify this is working

After adding some standard graph items, we can see that the setup works. I created the web scenarios last night and the port connection time items early this morning, and we already have some good results, showing a slow moment from South Africa and Latvia. The ~3am dip & HTTP 500 errors were real: my regular blog monitoring also showed that the site was not feeling well around that time. The web hotel this site runs on had some issues that lasted about 40 minutes in total.

Response time

Download speed

Response code

For the port connection time tests I don't have much data yet, but they're working too, and you can see that the initial connection is actually much slower than the request itself.

Port connection test time

To see how my tunnels are doing, I added the operational status of each tunnel's interface to the dashboard, now that Zabbix has automatically discovered the newly added virtual Ethernet interfaces on my Raspberry Pi. It's nice to see these and their alerts on dashboards, and only on dashboards: I won't be receiving e-mail alerts about possible Proton VPN issues.

VPN tunnels

UPDATE even before I published this blog post

You know, I often write these blog posts while I build something. This is one of those cases, and this is where I probably found a bug in Zabbix Agent (2?) web scenario behaviour, or at least something related to web scenarios.

I started to wonder why the web page performance numbers were so similar, and when I added per-VPN-tunnel traffic graphs to my dashboard, I realized that Zabbix web scenarios are not built with these kinds of shenanigans in mind: six Zabbix agents and multiple IP addresses on a single machine. See, all the traffic is going through a single tunnel. Or was.

VPN traffic graphs

That's not good. So I went back to my custom template and added another item: a Zabbix agent item with the key web.page.perf.

Web page performance

This item seems to make sure the traffic goes out through the IP address defined for that particular agent instance. Compare these two graphs: the latter looks much more plausible when it comes to request times.

So my earlier theory, that web scenario response time is measured from the moment the request is sent over an already established HTTP connection until the response comes back, may be just guesswork and plain wrong. I don't know.

Response time (web scenario)

Response time (web.page.perf)

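I created a ZBX ticket about this, but for now, if you plan to do similar weird testing, please note that at least on Zabbix 7.0beta1, web scenarios might misbehave if you have too many Zabbix agents around. My gut feeling is that the web scenarios perform their web requests over the machine's primary network interface and not the one defined in the agent's SourceIP config parameter... (For what it's worth, the Zabbix documentation describes web scenarios as being executed by the Zabbix server or proxy rather than by the agent, which would fit that theory: the agent's SourceIP setting would not come into play at all.)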
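One way to test that gut feeling is to ask an external "what's my IP" service which public address each local source IP ends up behind. A sketch, assuming api.ipify.org as the echo service (any similar service would do) and the probe IPs from above; including 192.168.50.132, the Pi's primary address, shows which tunnel, if any, plain traffic from the Pi takes. `egress_ip` and `compare_egress` are hypothetical names, and `CURL` can be overridden for testing.

```bash
#!/usr/bin/env bash
# Sketch: map local source IPs to the public IP they egress through.
# Assumption: IP_ECHO_URL returns the caller's public IP as plain text.
CURL=${CURL:-curl}
IP_ECHO_URL=${IP_ECHO_URL:-https://api.ipify.org}

# egress_ip SOURCE_IP -> prints the public IP the request appeared to come from
egress_ip() {
  "$CURL" --silent --max-time 15 --interface "$1" "$IP_ECHO_URL"
}

# compare_egress SOURCE_IP... -> one "local -> public" line per source IP
compare_egress() {
  local src
  for src in "$@"; do
    printf '%s -> %s\n' "$src" "$(egress_ip "$src" || echo unreachable)"
  done
}

# Usage:
# compare_egress 192.168.50.132 192.168.50.242 192.168.50.243 \
#   192.168.50.244 192.168.50.245 192.168.50.246
```

If the primary address and one of the probe addresses report the same public IP, that would be consistent with the single-tunnel traffic seen in the graphs.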

Anyway, at least web.page.perf gives the desired outcome in weird setups like this. I now roughly know how fast or slow my blog responds from around the world.
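To compare the instances without going through the frontend, each agent can also be queried directly with zabbix_get. A minimal sketch, assuming the name-to-IP mapping from the macvlan leases above, this blog's URL as the target, and that each agent's `Server=` list allows the address zabbix_get connects from (the Brazil diff lists the probe's own IP for this). `query_all_probes` is a hypothetical helper; `ZABBIX_GET` can be overridden for testing.

```bash
#!/usr/bin/env bash
# Sketch: ask every probe agent instance for web.page.perf directly.
# Assumptions: name -> IP mapping from the DHCP leases above; this blog's URL as target.
ZABBIX_GET=${ZABBIX_GET:-zabbix_get}
TARGET_URL=${TARGET_URL:-https://whatsuphome.fi/}

PROBES=(
  "brazil 192.168.50.242"
  "usa 192.168.50.243"
  "australia 192.168.50.244"
  "southafrica 192.168.50.245"
  "latvia 192.168.50.246"
)

# query_all_probes: prints "<name> <value>" per probe (web.page.perf = full page
# load time in seconds), or "<name> ERROR" if zabbix_get itself fails
query_all_probes() {
  local entry name ip value
  for entry in "${PROBES[@]}"; do
    read -r name ip <<< "$entry"
    if value=$("$ZABBIX_GET" -s "$ip" -p 10050 -k "web.page.perf[${TARGET_URL}]"); then
      printf '%-12s %s\n' "$name" "$value"
    else
      printf '%-12s ERROR\n' "$name"
    fi
  done
}

# Usage: query_all_probes
```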

Next time we'll do some tracerouting. Until then!
