Just a quick note to say that this was done at approx 23:15, and all seems OK today, but we will continue monitoring closely for a while.
Further to our previous message, we are going to reboot the server mail001 at approx 11pm tonight as a precaution.
We expect total downtime of this server to be in the region of 10 minutes and will have an engineer on site to make sure everything goes smoothly.
Since 20:44 yesterday, we have experienced some intermittent POP3 issues on one of our back-end mail servers.
We’re not yet sure what is causing this, but suspect a software issue within the ZFS system.
We have (hopefully) put in some changes that will stop this recurring, but we may need to do an emergency reboot of the system at some point to fix it properly. This will unfortunately affect IMAP and SMTP forwarding users on that server, but shouldn’t take more than 10 minutes or so.
More information will come as and when we know.
Meanwhile, apologies to all those affected by this.
The SSL certificate for apm-internet.net was recently revoked with zero notice (it was due for renewal shortly), so we had to replace it quickly. This seems to have caused iPhones and iPads to complain about the certificate, and is affecting users whose domains do not have their own SSL certificate.
These devices have been running in this configuration for a while without complaining, so it may be that an exception now needs to be added on the device to allow the new certificate (the domain on the certificate doesn’t match the domain the user is connecting to).
If that doesn’t work or isn’t an option, we would suggest using the generic name of mail.overssl.net – this works for POP3, IMAP and SMTP connections.
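To illustrate why a device complains when the certificate's domain differs from the one it connects to, here is a minimal sketch of the kind of name check a mail client performs. It is a simplification, not what iOS actually runs: real clients apply further rules. The function name and the example name lists are assumptions made for illustration, not the actual contents of our certificates.

```python
def certificate_covers(hostname, certificate_names):
    """Return True if `hostname` matches any name listed on a certificate.

    Simplified for illustration: a wildcard is honoured only when it is the
    whole left-most label and at least two further labels follow it.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for name in certificate_names:
        name_labels = name.lower().rstrip(".").split(".")
        # A wildcard stands for exactly one label, so label counts must agree.
        if len(name_labels) != len(host_labels):
            continue
        if name_labels[0] == "*" and len(name_labels) > 2:
            if name_labels[1:] == host_labels[1:]:
                return True
        elif name_labels == host_labels:
            return True
    return False


# Hypothetical certificate names, for illustration only.
example_names = ["overssl.net", "*.overssl.net"]
print(certificate_covers("mail.overssl.net", example_names))
print(certificate_covers("mail.customer-domain.example", example_names))
```

Under these assumed names, the generic host name is covered, while a mail host under a customer's own domain is not, which is the situation that triggers the warning.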
Please accept our apologies for any inconvenience caused by this.
We are aware that, when using the standard access URL, users are getting a blank screen, instead of the login page.
We believe this is a PHP issue and are currently investigating.
In the meantime, please use https://vwm.overssl.net/admin/ with your standard credentials to gain access to the Admin Portal.
UPDATE (13:40 22/03/19) This issue has now been resolved. The problem was traced to PHP version state and should not recur.
Currently experiencing some outbound sending issues via one of our Telco providers – customers are getting engaged tones. Raised with network provider – update soon.
Update: This was a Telco issue that has now been resolved.
An explanation is in order about the zimbra002 downtime this morning, so please bear with me while I try to explain.
The Zimbra servers have a backup mechanism that can only back up to a mounted drive, so either a local drive or an NFS mount.
Every night they do an incremental backup, and once a week they do a full backup. The NFS server to which zimbra002 and zimbra003 back up crashed in the night (zimbra003 never seems to be affected in the same way). It crashed in such a way that it was still responding to pings but not to NFS calls, so the failure wasn’t picked up by our monitoring systems.
When an NFS mount goes away, a Linux server treats this as a very bad thing, and basically waits around for it to come back again.
Normally this affects only the process that is attempting to access the mounted drive, and other processes continue unaffected. Because Zimbra is a monolithic piece of Java code, though, the stalled backup process causes the whole application to misbehave.
However, it still responds on all the ports that we check with our monitoring systems; the problems occur only after logging in, so once again this was not picked up by our monitoring systems.
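The failure described above (a backup server that answers pings but not NFS calls) could in principle be caught by probing the mount itself rather than the host. The following is a minimal sketch, not our actual monitoring; the function name and the timeout value are assumptions. The probe runs in a child process because a stat on a hung NFS mount can block indefinitely, and the monitor itself must not get stuck.

```python
import subprocess
import sys


def mount_responds(path, timeout_seconds=5.0):
    """Return True if a stat of `path` completes successfully within the timeout."""
    # The child process performs the stat; if the mount is hung, only the
    # child blocks, not the monitoring process.
    cmd = [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path]
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout_seconds) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck in uninterruptible I/O may linger
        # until the mount recovers, so we deliberately do not wait for it.
        proc.kill()
        return False
```

A monitoring job could call `mount_responds()` with the backup mount point and raise an alert on `False`, which would flag a server that is pingable but no longer serving files.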
While we are still uncertain of the causes of the backup server outage, we’re going to look at two approaches to avoid this problem recurring.
1) Start backing up zimbra002 to a different server.
2) Mount the disk, do the backup, then unmount it straight afterwards. That way, if the server has a problem, hopefully the mount will fail, and we’ll skip the backup rather than breaking the server.
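Option 2 above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the script we will deploy: the function name, the mount point and the backup command are hypothetical placeholders, and the mount point is assumed to have an /etc/fstab entry so that it can be mounted by path alone.

```python
import subprocess


def backup_with_temporary_mount(mount_point, backup_cmd, run=subprocess.run):
    """Mount, back up, unmount. Return True if the backup ran, False if skipped."""
    try:
        # Assumes an /etc/fstab entry for mount_point (hypothetical setup).
        run(["mount", mount_point], check=True)
    except subprocess.CalledProcessError:
        # Backup server unavailable: skip this backup instead of leaving the
        # mail server with a dead mount.
        return False
    try:
        # A failing backup command raises CalledProcessError to the caller.
        run(backup_cmd, check=True)
    finally:
        # Unmount even if the backup failed, so that a later crash of the
        # backup server cannot stall the mail server.
        run(["umount", mount_point], check=True)
    return True
```

A nightly job might call `backup_with_temporary_mount("/mnt/backup", ["/path/to/backup-script"])`, where both arguments are placeholders, and log a warning whenever it returns `False`. The `run` parameter exists only so that the mount and unmount steps can be replaced with a stub when testing.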
Once again, apologies to all affected by this today, and hopefully this goes some way towards explaining things!
The Zimbra server zimbra004 has just crashed and is being rebooted. Please bear with us while we get it back into action and accept our apologies for any inconvenience caused by this.
UPDATE: Up and running again – may be slow for a few minutes as it catches up.
We will be performing minor maintenance on the following servers on Thursday 5th July, starting at 22:00:
zimbra002, zimbra003 and zimbra004
Downtime should be minimal.
We will be upgrading our zimbra002 server to Zimbra 8.7 tomorrow night, with work commencing at 10pm.
We will endeavour to minimise any disruption to service, and any inbound email received whilst each server is offline will be queued until the relevant server is live again.