Service Alert: zimbra002 issues earlier today

From Adrian:

An explanation is in order about the zimbra002 downtime this morning, so please bear with me while I try to explain.

The Zimbra servers have a backup mechanism that can only back up to a mounted drive, so either a local drive or an NFS mount.

Every night they do an incremental backup, and once a week they do a full backup. The NFS server that zimbra002 and zimbra003 back up to crashed in the night (zimbra003 never seems to be affected in the same way). It crashed in such a way that it was still responding to pings but not to NFS calls, so this wasn’t picked up by our monitoring systems.
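As an illustration only, a check that touches the mounted directory itself, rather than pinging the server, would notice this kind of failure. The sketch below is not our monitoring code: the function names and the timeout are hypothetical. It runs the probe in a child process because a stat() on a dead NFS mount can block indefinitely, and it deliberately does not wait for the child after a timeout.

```python
import subprocess
import sys


def default_probe(path):
    """Hypothetical probe: a child Python process that stats `path`."""
    return [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path]


def mount_responds(path, timeout=5.0, probe=default_probe):
    """Return True if the probe for `path` exits 0 within `timeout` seconds."""
    proc = subprocess.Popen(
        probe(path),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait for it: a process stuck on a hung
        # mount may not exit until the NFS server comes back.
        proc.kill()
        return False
```

A monitoring job could call mount_responds() on the backup mount point and raise an alert when it returns False. The mount point path would be whatever the backup is configured to use; it is not given here.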

When an NFS mount goes away, a Linux server treats this as a very bad thing, and basically waits around for it to come back again.

This tends to affect only the process that is attempting to access the mounted drive; other processes normally continue unaffected. Because Zimbra is a monolithic piece of Java code, however, the stalled backup process causes Zimbra as a whole to misbehave.

However, it still responds on all the ports that we check with our monitoring systems – the problems occur after logging in, so once again this was not picked up by our monitoring systems.
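A check that actually logs in and opens a mailbox, rather than just testing that a port answers, would cover this gap. The following is a minimal sketch, assuming the server offers IMAP over SSL and that a dedicated test mailbox exists; the host name, credentials and function name are all hypothetical.

```python
import imaplib


def mailbox_usable(host, user, password, timeout=10.0, connect=imaplib.IMAP4_SSL):
    """Return True only if we can log in and open the inbox, not merely connect."""
    try:
        # The `timeout` argument requires Python 3.9 or later.
        conn = connect(host, timeout=timeout)
    except (OSError, imaplib.IMAP4.error):
        return False
    try:
        conn.login(user, password)
        status, _ = conn.select("INBOX", readonly=True)
        return status == "OK"
    except (OSError, imaplib.IMAP4.error):
        # Covers timeouts, dropped connections and protocol-level failures.
        return False
    finally:
        try:
            conn.logout()
        except (OSError, imaplib.IMAP4.error):
            pass
```

The `connect` parameter exists only so that the check can be exercised against a stand-in object without a live server.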

While we are still uncertain of the causes of the backup server outage, we’re going to look at two approaches to avoid this problem recurring.

1) Start backing up zimbra002 to a different server.

2) Mount the disk, do the backup, then unmount it straight afterwards. That way, if the server has a problem, hopefully the mount will fail, and we’ll skip the backup rather than breaking the server.
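The second approach could be sketched roughly as follows. This is an illustration under stated assumptions, not the script we will deploy: it assumes an /etc/fstab entry for the mount point so that a bare "mount <dir>" works, and the backup command, timeouts and function names are hypothetical placeholders.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Return True if `cmd` exits 0 within `timeout` seconds."""
    try:
        proc = subprocess.Popen(cmd)
    except OSError:
        return False
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait: the command may be stuck on NFS.
        proc.kill()
        return False


def backup_with_temporary_mount(mount_point, backup_cmd, runner=run_with_timeout,
                                mount_timeout=60, backup_timeout=6 * 3600):
    """Mount, back up, unmount. Returns 'skipped', 'failed' or 'ok'."""
    # Assumption: mount_point is listed in /etc/fstab (e.g. with 'noauto').
    if not runner(["mount", mount_point], mount_timeout):
        return "skipped"  # backup server unavailable: leave the mail service alone
    try:
        ok = runner(backup_cmd, backup_timeout)
    finally:
        # Always attempt the unmount so the share is not left attached.
        runner(["umount", mount_point], mount_timeout)
    return "ok" if ok else "failed"
```

With this shape, a backup server that is down at mount time costs one night's backup rather than the mail service. A crash part-way through the backup would still stall the backup command until its timeout expires.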

Once again, apologies to all affected by this today, and hopefully this goes some way towards explaining things!

Service Alert: zimbra004 Reboot

The Zimbra server zimbra004 has just crashed and is being rebooted. Please bear with us while we get it back into action, and accept our apologies for any inconvenience caused by this.

UPDATE: Up and running again – may be slow for a few minutes as it catches up.

Service Alert: Zimbra Server Maintenance

We will be performing minor maintenance of the following servers on Thursday the 5th July, starting at 22:00:

zimbra002, zimbra003 and zimbra004

Downtime should be minimal.

Service Alert: Server Upgrade – zimbra002 – tomorrow (Wed 7th Feb)

Reminder

We will be upgrading our zimbra002 server to Zimbra 8.7 tomorrow night, with work commencing at 10pm.

We will endeavour to minimise any disruption to service, and any inbound email received whilst the server is offline will be queued until it is live again.

Service Alert: Server Upgrade – zimbra003 – tomorrow (Wed 31st Jan)

Reminder

After a week’s delay, we will be upgrading our zimbra003 server to Zimbra 8.7 tomorrow night, with work commencing at 10pm.

We will endeavour to minimise any disruption to service, and any inbound email received whilst the server is offline will be queued until it is live again.

Service Alert: Server Upgrade – zimbra004 – tomorrow (Wed 17th Jan)

Reminder

We will be upgrading our zimbra004 server to Zimbra 8.7 tomorrow night, with work commencing at 10pm.

We will endeavour to minimise any disruption to service, and any inbound email received whilst the server is offline will be queued until it is live again.

Service Alert: Zimbra Upgrades

Important – Zimbra Upgrades

As part of our ongoing programme of service enhancements, we will be performing upgrade work on our Zimbra servers during the month of January.

This work will be undertaken outside of normal working hours (in the evening) and will involve a period of downtime for each server.

We will endeavour to minimise any disruption to service and any inbound email received whilst each server is offline will be queued until the relevant server is live again.

The dates will be as follows – we will be sending reminders beforehand.

Wednesday 17th Jan: Zimbra004 – upgrade to Zimbra 8.7
Wednesday 31st Jan: Zimbra003 – upgrade to Zimbra 8.7
Wednesday 7th Feb: Zimbra002 – upgrade to Zimbra 8.7

Whilst Zimbra 8.8 has just been released, we want to monitor it over the coming months to ensure stability. Once we are satisfied that it is reliable, we will upgrade to that version.

We haven’t forgotten Zimbra001 and will contact customers on this server about moving users in the New Year.

If you have any questions please let us know.

Service Alert: Major Proxy Server Issues

It looks like we’ve suffered from some sort of attack against our proxy servers, which has unfortunately resulted in the need to reboot them.

This process is underway at the moment, and we hope to have services restored shortly. We are currently blocking some POP3S services, as we believe this is the source of the attack, and we will look to restore them as soon as we can.

Please accept our apologies and be assured we’re working hard to restore service and figure out how we can prevent a recurrence.

Update – 14:15

Everything should now be back up and running properly.

We’ve managed to patch some software to control the connections coming in on POP3 over SSL so that another attack on that port shouldn’t overload the servers.

We will, of course, continue to watch closely!

Service Alert: Zimbra004 Server – PSU Issue (Continued)

From approximately 7pm tomorrow evening (Wednesday 18th October) we will be performing a cold reboot of the zimbra004 server as part of further work to fix ongoing issues with the Power Supply Units (PSUs).

This will impact all users of Zimbra mailboxes on this server as there will be up to 10 minutes of downtime.

We will endeavour to keep downtime to an absolute minimum, but the work is required to ensure that zimbra004 remains a robust performer going forward.

Any inbound mail will be queued and everything should return to normal, without intervention, following the reboot.

Service Alert: Zimbra004 Server – PSU Change

From approximately 7pm this evening (or shortly thereafter) we will be swapping out a faulty Power Supply Unit (PSU) on our Zimbra004 server.

We do not expect any impact on, or outage of, the service.