Service Alert: zimbra002 issues earlier today

From Adrian:

I owe you an explanation of this morning’s zimbra002 downtime, so please bear with me while I try to explain.

The Zimbra servers have a backup mechanism that can only back up to a mounted drive: either a local drive or an NFS mount.

Every night they do an incremental backup, and once a week they do a full backup. During the night, the NFS server that zimbra002 and zimbra003 back up to crashed (zimbra003 never seems to be affected in the same way). It crashed in such a way that it was still responding to pings but not to NFS calls, so it wasn’t picked up by our monitoring systems.

When an NFS mount stops responding, a Linux server treats this as a very bad thing and, with the default "hard" mount behaviour, essentially waits for it to come back.
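One way to catch a server that answers pings but not NFS calls is to probe the mount itself, with a timeout, rather than pinging the host. A minimal Python sketch, assuming a hypothetical mount point such as /mnt/backup; the function name is illustrative, and the filesystem calls run in a daemon thread so that a hung mount cannot block the checker itself.

```python
import os
import threading


def mount_responds(path: str, timeout: float = 5.0) -> bool:
    """Return True if `path` can be stat'ed and listed within `timeout` seconds.

    On a hung NFS mount these calls may block indefinitely, so they run in a
    daemon thread and we only wait `timeout` seconds for them to finish.
    """
    outcome = {"ok": False}

    def probe() -> None:
        try:
            os.stat(path)
            # Reading the directory usually needs a round trip to the NFS
            # server, rather than being answered from cached attributes.
            os.listdir(path)
            outcome["ok"] = True
        except OSError:
            outcome["ok"] = False

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # Still blocked after the timeout: treat the mount as hung.
        # The stuck thread is abandoned; being a daemon, it won't block exit.
        return False
    return outcome["ok"]
```

A monitoring job could call `mount_responds("/mnt/backup")` on each backup client and alert on `False`; running each check in a fresh process would avoid stuck threads accumulating in a long-lived checker.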

Normally this only affects the process trying to access the mounted drive, and other processes carry on unaffected. Because Zimbra is a monolithic piece of Java code, however, the stuck backup drags the rest of Zimbra down with it and the whole server misbehaves.

However, it still responds on all the ports that our monitoring systems check. The problems only appear after logging in, so once again they weren’t picked up by our monitoring.
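Because the symptoms only appear after logging in, a port check cannot see them; a synthetic check that actually logs in and opens a mailbox can. A minimal sketch using Python's standard `imaplib` module (the `timeout` argument needs Python 3.9+), assuming a dedicated monitoring mailbox; the function name, host and credentials are hypothetical.

```python
import imaplib


def imap_login_check(host: str, user: str, password: str,
                     port: int = 993, timeout: float = 10.0) -> bool:
    """Log in over IMAPS and open INBOX read-only; return True on success.

    A plain TCP port check passes even when logins hang or fail, which is
    the failure mode a check like this is meant to catch.
    """
    try:
        with imaplib.IMAP4_SSL(host, port, timeout=timeout) as conn:
            conn.login(user, password)
            status, _ = conn.select("INBOX", readonly=True)
            return status == "OK"
    except (imaplib.IMAP4.error, OSError):
        # Refused or timed-out connections, TLS errors and login failures
        # all count as a failed check.
        return False
```

For example, `imap_login_check("zimbra002.example.net", "monitor@example.net", "secret")` run every few minutes would alert as soon as logins stop working, even while every port still answers.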

While we are still uncertain of the causes of the backup server outage, we’re going to look at two approaches to avoid this problem recurring.

1) Start backing up zimbra002 to a different server.

2) Mount the backup share, run the backup, then unmount it straight afterwards. That way, if the backup server has a problem, hopefully the mount will fail, and we’ll skip that backup rather than breaking the mail server.
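Approach 2 might look something like the following Python sketch. The mount, backup and unmount commands are passed in as hypothetical placeholders (for example `["mount", "/mnt/backup"]` with a matching fstab entry), not our actual configuration; the key points are the timeout on the mount, since a mount against a hung server can itself block, and the unmount in a `finally` block.

```python
import subprocess

_CMD_ERRORS = (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError)


def backup_with_transient_mount(mount_cmd, backup_cmd, umount_cmd,
                                mount_timeout=60):
    """Mount, back up, unmount.

    Returns "skipped" if the mount failed or timed out (no backup attempted),
    "ok" or "failed" for the backup result, or "umount-failed" if the share
    could not be unmounted afterwards.
    """
    try:
        subprocess.run(mount_cmd, check=True, timeout=mount_timeout)
    except _CMD_ERRORS:
        # The share is unavailable: skip tonight's backup instead of
        # letting the backup hang on a dead mount.
        return "skipped"

    status = "failed"
    try:
        if subprocess.run(backup_cmd).returncode == 0:
            status = "ok"
    finally:
        # Always try to unmount, even if the backup failed or raised.
        try:
            subprocess.run(umount_cmd, check=True, timeout=mount_timeout)
        except _CMD_ERRORS:
            status = "umount-failed"
    return status
```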

Once again, apologies to everyone affected by this today, and hopefully this goes some way towards explaining things!

Service Alert: zimbra004 Reboot

The Zimbra server zimbra004 has just crashed and is being rebooted. Please bear with us while we get it back into action, and accept our apologies for any inconvenience caused.

UPDATE: Up and running again – may be slow for a few minutes as it catches up.

Service Alert: Zimbra Server Maintenance

We will be performing minor maintenance of the following servers on Thursday the 5th July, starting at 22:00:

zimbra002, zimbra003 and zimbra004

Downtime should be minimal.

Service Alert: Server Upgrade – zimbra002 – tomorrow (Wed 7th Feb)

Reminder

We will be upgrading our zimbra002 server to Zimbra 8.7 tomorrow night, with work commencing at 10pm.

We will endeavour to minimise any disruption to service, and any inbound email received whilst the server is offline will be queued until it is live again.

Service Alert: Server Upgrade – zimbra003 – tomorrow (Wed 31st Jan)

Reminder

After being delayed by a week, we will be upgrading our zimbra003 server to Zimbra 8.7 tomorrow night, with work commencing at 10pm.

We will endeavour to minimise any disruption to service, and any inbound email received whilst the server is offline will be queued until it is live again.

Service Alert: Server Upgrade – zimbra004 – tomorrow (Wed 17th Jan)

Reminder

We will be upgrading our zimbra004 server to Zimbra 8.7 tomorrow night, with work commencing at 10pm.

We will endeavour to minimise any disruption to service, and any inbound email received whilst the server is offline will be queued until it is live again.

Service Alert: Zimbra Upgrades

Important – Zimbra Upgrades

As part of our ongoing programme of service enhancements, we will be performing upgrade work on our Zimbra servers during January and early February.

This work will be undertaken outside of normal working hours (in the evening) and will involve a period of downtime for each server.

We will endeavour to minimise any disruption to service, and any inbound email received whilst each server is offline will be queued until the relevant server is live again.

The dates will be as follows – we will be sending reminders beforehand.

Wednesday 17th Jan: Zimbra004 – upgrade to Zimbra 8.7
Wednesday 31st Jan: Zimbra003 – upgrade to Zimbra 8.7
Wednesday 7th Feb: Zimbra002 – upgrade to Zimbra 8.7

Whilst Zimbra 8.8 has just been released, we want to monitor it over the coming months to make sure it is stable. Once we are satisfied that it will be reliable, we will upgrade to that version.

We haven’t forgotten Zimbra001, and will contact customers on this server about moving users in the New Year.

If you have any questions please let us know.

Service Alert: Zimbra004 Server – PSU Change

From approximately 7pm this evening (or shortly thereafter) we will be swapping out a faulty Power Supply Unit (PSU) on our Zimbra004 server.

We do not expect any impact, or outage, to the service.

Service Alert: Password Security

Password Security

We are seeing a significant rise in customers’ email accounts being used to distribute spam after their account credentials have been compromised.

In many cases this is purely down to simple, fairly obvious passwords. Would you believe that we have 276 mailboxes that use ‘password’ as their password? Or 73 with the password 123456?

It’s important for all our customers that we minimise the risk of our mail servers being blacklisted. So, where patterns of outbound sending indicate that a mailbox has been compromised and is distributing spam, we will block the account from sending any further email until the password has been changed and, if required, the end user’s equipment has been virus-scanned.
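The exact sending patterns aren’t described here, but the simplest version of the idea is a sliding-window rate limit per mailbox. The Python sketch below is purely illustrative: the class name, the thresholds (200 messages in 10 minutes by default) and the method names are hypothetical, and real detection would look at much more than volume. Once triggered, the block stays in place until the password is changed.

```python
from collections import deque


class OutboundMonitor:
    """Block a mailbox that sends more than `limit` messages in `window` seconds.

    Hypothetical thresholds; a block persists until the password is changed.
    """

    def __init__(self, limit: int = 200, window: float = 600.0):
        self.limit = limit
        self.window = window
        self.sent = {}        # mailbox -> deque of recent send timestamps
        self.blocked = set()  # mailboxes currently barred from sending

    def allow_send(self, mailbox: str, now: float) -> bool:
        """Record a send attempt at time `now`; return False if it is refused."""
        if mailbox in self.blocked:
            return False
        times = self.sent.setdefault(mailbox, deque())
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] >= self.window:
            times.popleft()
        times.append(now)
        if len(times) > self.limit:
            self.blocked.add(mailbox)
            return False
        return True

    def password_changed(self, mailbox: str) -> None:
        """Lift the block (and reset the history) once the password changes."""
        self.blocked.discard(mailbox)
        self.sent.pop(mailbox, None)
```

Keying the state by mailbox means a block affects only the offending mailbox, not every user on the same account.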

This helps mitigate blacklisting, but it isn’t perfect by any means, as it’s reactive by nature.

Whilst we can’t improve on the way we identify compromised mailboxes, we can improve the tools we give Partners to re-enable outbound SMTP immediately following a block.

Currently we send out an alert when an account is locked and rely on you contacting us to re-enable it. On top of that, any blocking we’ve done has been at account level rather than on individual mailboxes, so one compromised mailbox can lead to outbound email being blocked for the whole account.

As of Tuesday 27th June we are implementing a new process for dealing with compromised accounts:

  • Blocks can now be applied at individual mailbox level, rather than account level. This means that only the affected mailbox will be restricted from sending, rather than all users on that account.
  • Our systems will now automatically unlock any affected mailboxes once the user, or administrator, has changed the password.
  • Next Tuesday, all mailboxes with passwords we deem to be easily compromised (for example, using part of the email address) will be blocked from sending outbound email until their password has been changed. This will only affect a relatively small proportion of our overall customer base, but it is necessary because compromised mailboxes are on the rise.
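Beyond the example given, the criteria for "easily compromised" aren’t spelled out. A minimal illustrative check in Python, assuming a small hypothetical deny-list of common passwords and a hypothetical minimum length of 10 characters, flags passwords that are too short, too common, or that contain part of the email address:

```python
import re

# Hypothetical sample of very common passwords; a real deny-list would be far larger.
COMMON_PASSWORDS = {"password", "123456", "12345678", "qwerty", "letmein", "welcome"}


def is_easily_compromised(password: str, email: str, min_length: int = 10) -> bool:
    """Return True if `password` looks easy to guess for the mailbox `email`."""
    pw = password.lower()
    if len(password) < min_length or pw in COMMON_PASSWORDS:
        return True
    local_part, _, domain = email.lower().partition("@")
    # Pieces of the address, e.g. "john.smith@example.com" -> john, smith, example
    fragments = re.split(r"[._+-]", local_part) + [domain.split(".")[0]]
    # Ignore very short fragments (such as "jo") to avoid flagging by coincidence.
    return any(len(f) >= 3 and f in pw for f in fragments)
```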

We very much hope you’ll welcome these changes which, as well as giving customers more control if their email credentials are compromised, also encourage everyone to think a little more seriously about the security of their email!

Under the hood: Spam Wars

In the ongoing war against the spammers, we have put a lot of effort over the last year or two into assessing the effectiveness of various methods, and thought it might be helpful to give a behind-the-scenes look at some of the lists and methods we use and how effective they are. What these stats don’t show is the number of false positives or the number of spams we miss: with our diverse user base, it is impossible to measure these accurately.

We try to be quite aggressive at detecting spam because the majority of our users make use of what we call auto-whitelisting: anyone they send an email to is automatically added to their whitelist and isn’t checked for spam in future (well, not at stage 2 anyway; see below), so mail from people they correspond with is protected from false positives.
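As a rough illustration of how auto-whitelisting works, here is a minimal Python sketch (the class and method names are hypothetical). Every outbound message adds its recipients to the sender’s whitelist, and inbound mail from a whitelisted address skips the content-scanning stage; connection-level checks are unaffected.

```python
from collections import defaultdict
from typing import Iterable


class AutoWhitelist:
    """Per-user whitelist that grows automatically from outbound mail."""

    def __init__(self) -> None:
        # local user -> set of addresses that user has sent email to
        self._lists = defaultdict(set)

    def record_outbound(self, sender: str, recipients: Iterable[str]) -> None:
        """Anyone our user emails is added to that user's whitelist."""
        self._lists[sender.lower()].update(r.lower() for r in recipients)

    def skip_content_scan(self, recipient: str, sender: str) -> bool:
        """True if inbound mail from `sender` to `recipient` bypasses stage 2."""
        return sender.lower() in self._lists.get(recipient.lower(), set())
```

Whitelists are per user, so one customer writing to an address does not whitelist it for anyone else.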

The first stage of our spam blocking is the most aggressive and the most sensitive. Here we reject connections based on the IP address that is trying to connect to us, so if we have false positives at this stage, we tend to find out about them: the sending server is refused outright, which usually generates a bounce back to the sender.
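Lists such as Spamhaus Zen are DNS-based blocklists (DNSBLs). The connecting IPv4 address is written with its octets reversed and looked up as a host name under the list’s zone, and an answer in 127.0.0.x means the address is listed. A minimal Python sketch of that lookup using only the standard library resolver; the function names are hypothetical, and each list’s usage terms and query limits still apply.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: 192.0.2.1 -> 1.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if `zone` returns a 127.0.0.x answer for `ip`."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN means "not listed"; other lookup failures also fail open here.
        return False
    # Some lists (Spamhaus, for example) answer 127.255.255.x to signal query
    # errors rather than listings, so only 127.0.0.x counts as a hit.
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/24")
```

A mail server would call something like `is_listed(client_ip, "zen.spamhaus.org")` when a connection arrives and reject the connection if it returns `True`.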

I’ve included links below so you can investigate and find out more about any particular list.

Firstly, let’s look at connections to our servers. On a sample day, Wednesday 14th December 2016, we received a total of 6,680,134 inbound SMTP connections. Here’s what we did with them:

Rejected by Spamhaus Zen: 5,403,727
Rejected by Invaluement ivmSIP: 236,211
Accepted: 1,040,196

Of those 1,040,196 accepted connections, we received 1,008,901 individual emails. These were then broken down as follows:

Whitelisted: 138,055
Blacklisted: 3,513
Too large to scan: 6,960
Not scanned (user hasn’t enabled anti-spam): 131,480
Scanned: 728,893
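Both breakdowns add up exactly, which is easy to check. A short Python snippet using only the figures quoted for the sample day prints each category’s share; for example, Spamhaus Zen alone rejects roughly 81% of connections, and only about 15.6% are accepted.

```python
connections = {
    "Rejected by Spamhaus Zen": 5_403_727,
    "Rejected by Invaluement ivmSIP": 236_211,
    "Accepted": 1_040_196,
}
emails = {
    "Whitelisted": 138_055,
    "Blacklisted": 3_513,
    "Too large to scan": 6_960,
    "Not scanned (anti-spam not enabled)": 131_480,
    "Scanned": 728_893,
}

total_connections = sum(connections.values())  # should equal 6,680,134
total_emails = sum(emails.values())            # should equal 1,008,901

for label, count in connections.items():
    print(f"{label:36} {count:>10,} {count / total_connections:6.1%}")
print(f"{'Total connections':36} {total_connections:>10,}")
for label, count in emails.items():
    print(f"{label:36} {count:>10,} {count / total_emails:6.1%}")
print(f"{'Total emails':36} {total_emails:>10,}")
```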

So we now have a grand total of 728,893 emails to feed into our anti-spam servers. These run a piece of software called SpamAssassin, which looks for patterns suggesting that an email is probably spam and scores it accordingly. Unfortunately, the spammers have access to it too, and the good ones are very clever at making their spam not look like spam to a computer (though it is still obviously spam to a human), so we rely quite heavily on various blacklists to identify spam for us.
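SpamAssassin adds up the scores of every rule that matches a message and compares the total against a threshold; its default `required_score` is 5.0. The toy Python sketch below shows only that additive idea. The rule names, patterns and scores are invented for illustration and are not real SpamAssassin rules.

```python
import re
from typing import Callable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    name: str
    score: float
    test: Callable[[str, str], bool]  # (subject, body) -> did the rule match?


# Hypothetical rules and scores, for illustration only.
RULES = [
    Rule("shouty_subject", 1.5, lambda subj, body: len(subj) > 5 and subj.isupper()),
    Rule("wire_transfer", 2.0, lambda subj, body: bool(re.search(r"wire transfer", body, re.I))),
    Rule("click_here", 1.0, lambda subj, body: "click here" in body.lower()),
    Rule("lottery_win", 3.0, lambda subj, body: bool(re.search(r"you have won", body, re.I))),
]

REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold


def score_message(subject: str, body: str) -> Tuple[float, List[str]]:
    """Sum the scores of all matching rules; return (total, matched rule names)."""
    hits = [rule for rule in RULES if rule.test(subject, body)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(subject: str, body: str) -> bool:
    total, _ = score_message(subject, body)
    return total >= REQUIRED_SCORE
```

Because no single rule here reaches the threshold, a message is only flagged when several weak signals agree. This is also why spammers who tune their messages against the public rule set can slip under it.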

In the last couple of years, the spammers have become even more sophisticated and have found ways to send out millions of spams before the blacklists are able to list them. The blacklists are fighting back, however, with new lists such as InstantRBL and with faster listings (URIBL is particularly good at this).

Taking our sample day, with 728,893 emails to be scanned, here is how many were caught by each method or list. These stats show unique hits only: if a message is caught by two lists, or by one list and other SpamAssassin rules, it doesn’t appear in any of the individual rows, although it is still counted in the total.

Spamhaus Zen: 2,062
URIBL: 8,137
Invaluement ivmSIP24: 2,029
Invaluement ivmURI: 10,194
Barracuda: 9,109
InstantRBL: 8,442
Protected Sky: 5,468
SpamAssassin other rules: 32,007
Total caught: 113,439
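Because each row counts only messages caught by that method alone, the rows don’t add up to the total. Assuming the total counts each caught message exactly once, a quick Python check on the figures above shows how many messages were caught by more than one method:

```python
unique_hits = {
    "Spamhaus Zen": 2_062,
    "URIBL": 8_137,
    "Invaluement ivmSIP24": 2_029,
    "Invaluement ivmURI": 10_194,
    "Barracuda": 9_109,
    "InstantRBL": 8_442,
    "Protected Sky": 5_468,
    "SpamAssassin other rules": 32_007,
}
total_caught = 113_439
scanned = 728_893

sum_unique = sum(unique_hits.values())       # caught by a single method only
caught_by_several = total_caught - sum_unique

print(f"Caught by a single method:     {sum_unique:,}")
print(f"Caught by two or more methods: {caught_by_several:,}")
print(f"Share of scanned mail caught:  {total_caught / scanned:.1%}")
```

About 15.6% of scanned mail was caught, and roughly a third of that was caught by more than one method.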

It’s hard to draw a pretty chart from all of this, but here are the headline figures: 6,680,134 inbound connections resulted in 891,949 emails being delivered to inboxes, equivalent to just 13% of the number of connections. 131,480 of those delivered emails weren’t scanned at all because the end user didn’t have the feature enabled.
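The delivered figure can be reconciled with the earlier numbers if, as the figures suggest, everything received except blacklisted mail and mail caught at the scanning stage ends up in an inbox:

```python
received = 1_008_901           # emails received over accepted connections
blacklisted = 3_513
caught_by_scanning = 113_439
inbound_connections = 6_680_134

delivered = received - blacklisted - caught_by_scanning
print(f"Delivered: {delivered:,}")
print(f"Delivered as a share of inbound connections: {delivered / inbound_connections:.0%}")
```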

So there you have it: it’s an ongoing battle, and the battleground keeps shifting. The spammers have access to all of the same tools that we do (that’s the nature of the internet), so they will keep trying to find new ways to beat the system, and we will keep trying to find new ways to stop them.