Update about service quality issues on 17 and 18 May


#1

On Wednesday and Thursday this week, a minority of Tide members experienced intermittent issues using the “banking” parts of their Tide app (viewing balances and initiating Faster Payments). The best remedy was simply to “wait a few minutes and try again”, but we still consider this situation totally unacceptable, so I wanted to explain in detail what happened.

Around 4am on 17th May we noticed an increased number of connection timeouts to our payments provider, PrePaid Solutions (PPS). Initial assessment of the situation showed that data was not transmitting correctly between two data centres - what engineers call “packet loss”.
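Purely as an illustration of the kind of symptom described above: elevated connection timeouts are usually spotted by sampling repeated connection attempts and alerting when the timeout rate crosses a threshold. This is a hedged sketch, not Tide’s or PPS’s actual monitoring; the `connect` callable and the threshold value are hypothetical stand-ins.

```python
# Illustrative sketch only: estimate the connection-timeout rate to a
# provider endpoint by sampling repeated connection attempts.
# `connect` is an injected callable standing in for the real network call;
# it should raise TimeoutError when an attempt times out.

def timeout_rate(connect, samples=20):
    """Return the fraction of `samples` connection attempts that timed out."""
    timeouts = 0
    for _ in range(samples):
        try:
            connect()
        except TimeoutError:
            timeouts += 1
    return timeouts / samples


def alert_if_degraded(connect, threshold=0.2, samples=20):
    """Flag the link as degraded if the observed timeout rate exceeds threshold."""
    rate = timeout_rate(connect, samples)
    return rate > threshold, rate
```

In practice a monitoring system would sample continuously and page an on-call engineer once the rate stayed above the threshold, which matches the “around 4am we noticed” timeline above.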

Packet loss affects all online organisations occasionally and is generally resolved very quickly. However, this problem persisted, so we worked round the clock to find a solution. Attempts to route traffic to PPS’s secondary systems proved unsuccessful, as did re-routing traffic via a secondary Tide data centre.

Finally, a combined solution was found: PPS’s networking provider routed traffic through an alternative data network, and Tide moved our operations onto PPS’s secondary systems. Together we were able to re-establish stable connectivity.

What can we learn from this? In truth, neither Tide nor PPS’s systems “failed”, and both organisations moved quickly to remedy a problem we did not cause. Nevertheless it is clear that we can and should seek to have backup data transit processes in place for faster switching of networks should it be necessary. We’re now talking with PPS with the intention of setting that up.
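The “faster switching of networks” idea above amounts to client-side failover: try the primary route, retry briefly, then fall back to a secondary. The sketch below is a simplified illustration under assumed names (the `send` callable and route labels are hypothetical), not a description of either company’s actual setup.

```python
# Illustrative failover sketch (endpoint names are hypothetical).
# `send` stands in for the real network call; it takes a route and
# raises TimeoutError when that route is unreachable.

def send_with_failover(send, routes, retries_per_route=2):
    """Try each route in order, retrying a few times before failing over.

    Returns (route, result) for the first route that succeeds, or raises
    TimeoutError if every route is exhausted.
    """
    last_error = None
    for route in routes:
        for _ in range(retries_per_route):
            try:
                return route, send(route)
            except TimeoutError as exc:
                last_error = exc
    raise last_error or TimeoutError("no routes configured")
```

The key design point is that the fallback path is configured in advance, so switching is a matter of seconds rather than hours of manual re-routing.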

Tide has successfully relied on PPS for banking services for nearly a year and we’ve never experienced an issue like this before. Fortunately the impact was modest but I want to be clear to our members that we take this very seriously and will do everything we can to ensure it can’t happen again.

Matt Wilson, CTO


#2

Thanks Matt, hope you and the team get some sleep over the weekend.

As a user this was absolutely terrifying. I’m new to the platform and all of my funds were held for 24 hours. Communication wasn’t ideal, especially with a nerve-racking wait for a response from the team (which, when it came, was good, but gave no definitive timescale for a fix). I know Tide is an internet bank, but I really think a phone number would help in this scenario, to be able to talk to a real person.

As a digital agency owner I am aghast that you had a critical pipeline without a backup to start with, but I dare say you won’t make this mistake again.

I highly recommend at this stage bringing in a third party to take your architecture apart. I’m not in the least suggesting there is another issue, but a third perspective is incredibly useful and will highlight potential flaws before they become a problem as you scale.

Thank you for fixing it.

Ru


#3

I agree with @Ru1
An independent third party audit will improve upon current design and architecture.

Furthermore, I also recommend introducing a secondary web-based login mechanism, probably backed by two-factor authentication or something similar.

Relying on a single app-based login offers no redundancy if the app crashes, hits bugs or suffers downtime.


#4

Thanks for your comments. We aim to be completely open and honest with our members, and we appreciate genuine and honest feedback in return.

To clarify - we do have backup systems, and many different data centres available to us around the world. However, as mentioned in my post, this was due to a fault in a major network connecting our independent systems. When a problem like this arises, the only option is to work around the clock to resolve it as best you can - in this case, attempting to re-route the traffic around the problem.

Thanks @0xjd for your suggestion of a secondary login process, we’ll certainly take it on board. Although in this instance it wouldn’t have alleviated any of the problems we were seeing, I can see the argument for offering one.

M