On Wednesday and Thursday this week, a minority of Tide members experienced intermittent issues using the “banking” parts of their Tide app (viewing balances and initiating Faster Payments). The best remedy was simply to “wait a few minutes and try again”, but we still consider this situation to be totally unacceptable, so I want to explain in detail what happened.
Around 4am on 17th May we noticed an increased number of connection timeouts to our payments provider, PrePaid Solutions (PPS). Our initial assessment showed that data was not being transmitted correctly between two data centres, a problem engineers call “packet loss”.
Packet loss affects all online organisations occasionally and is generally resolved very quickly. However, this problem persisted, so we worked round the clock to find a solution. Attempts to route traffic to PPS’s secondary systems proved unsuccessful, as did re-routing traffic via Tide’s secondary data centre.
Finally, a combined solution was found: PPS’s networking provider routed traffic through an alternative data network, and Tide moved its operations onto PPS’s secondary systems. Together we were able to re-establish stable connectivity.
What can we learn from this? In truth, neither Tide’s nor PPS’s systems “failed”, and both organisations moved quickly to remedy a problem we did not cause. Nevertheless, it is clear that we can and should have backup data transit processes in place so that we can switch networks faster should it be necessary. We’re now talking with PPS with the intention of setting that up.
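For the technically minded, the idea of “faster switching” can be illustrated with a small sketch. This is an illustration only, not a description of Tide’s or PPS’s actual setup: the incident involved routing at the network level rather than application code, and the function name `call_with_failover` and the route names below are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(routes: Sequence[tuple[str, Callable[[], T]]]) -> tuple[str, T]:
    """Try each (name, send) route in order and return the first success.

    Hypothetical helper: a route that times out or drops the connection is
    skipped, and the next route in the list is tried instead.
    """
    errors: list[str] = []
    for name, send in routes:
        try:
            return name, send()
        except (TimeoutError, ConnectionError) as exc:
            # Record the failure and fall through to the next route.
            errors.append(f"{name}: {exc!r}")
    raise ConnectionError("all routes failed: " + "; ".join(errors))


# Stand-ins for real network calls (assumptions for the example).
def primary() -> str:
    raise TimeoutError("packet loss on primary path")


def secondary() -> str:
    return "balance: ok"


route, result = call_with_failover([("primary", primary), ("secondary", secondary)])
```

Here the request fails on the primary path and is answered via the secondary one without anyone needing to intervene, which is the behaviour a pre-arranged backup route is meant to provide.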
Tide has successfully relied on PPS for banking services for nearly a year and we’ve never experienced an issue like this before. Fortunately, the impact was modest, but I want to be clear to our members that we take this very seriously and will do everything we can to ensure it can’t happen again.
Matt Wilson, CTO