When developing an application that connects to Twilio, it is important to provide multiple paths of communication between the Twilio Cloud and the Enterprise infrastructure in case the Enterprise is unreachable. This could be due to scheduled maintenance, a network outage, or some other unforeseen event.

Most Enterprise data centers are built with resiliency in mind. It is common to have at least two data centers with redundant network connectivity using multiple carriers for MPLS, SIP Trunking, and Internet. Moving up the stack there are redundant server clusters and storage, and at the top of the stack, redundancy within the applications themselves. This addresses reachability for connections originated by applications in the Enterprise to the outside world.

With multiple Service Providers and redundant data centers, one might expect this to be enough for maximum service uptime. However, if the application itself breaks, or the data centers appear unreachable due to network events such as BGP route flaps, Twilio will be unable to connect to the application and the Webhook transaction will be canceled.

In the Twilio Console, click on a phone number to access its dashboard. The section titled A Call Comes In defaults to Webhook, which is the desired behavior here. This means that when Twilio receives a Voice call or SMS to the phone number, it will attempt to connect to the Enterprise application using a Webhook to the provided URL.

In the above screenshots, note the field Primary Handler Fails. This allows a secondary URL to be configured for the client application. When Twilio is unable to connect successfully to the primary URL, due to either a timeout or an HTTP error, the Twilio platform will attempt to connect to the secondary URL.
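
The same primary and fallback handlers can also be set through the Twilio REST API rather than the Console. The following is a minimal sketch, assuming the IncomingPhoneNumbers resource and its VoiceUrl, VoiceFallbackUrl, SmsUrl, and SmsFallbackUrl parameters. The function name, the SIDs, and the example.com URLs are hypothetical placeholders, and the function only builds the request without sending it.

```python
import base64
import urllib.parse
import urllib.request

API_BASE = "https://api.twilio.com/2010-04-01"


def build_update_request(account_sid, auth_token, number_sid,
                         voice_primary, voice_fallback,
                         sms_primary, sms_fallback):
    """Build (but do not send) the POST that sets primary and fallback webhooks."""
    endpoint = (f"{API_BASE}/Accounts/{account_sid}"
                f"/IncomingPhoneNumbers/{number_sid}.json")
    params = {
        "VoiceUrl": voice_primary,            # handler for "A Call Comes In"
        "VoiceFallbackUrl": voice_fallback,   # used when the primary handler fails
        "SmsUrl": sms_primary,                # handler for incoming messages
        "SmsFallbackUrl": sms_fallback,       # used when the primary handler fails
    }
    body = urllib.parse.urlencode(params).encode("ascii")

    # The REST API uses HTTP Basic auth with the Account SID and Auth Token.
    credentials = f"{account_sid}:{auth_token}".encode("ascii")
    token = base64.b64encode(credentials).decode("ascii")

    request = urllib.request.Request(endpoint, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


if __name__ == "__main__":
    # All values below are placeholders, not real credentials or hosts.
    req = build_update_request(
        "ACXXXXXXXX", "your_auth_token", "PNXXXXXXXX",
        "https://primary.example.com/voice", "https://backup.example.com/voice",
        "https://primary.example.com/sms", "https://backup.example.com/sms",
    )
    print(req.get_method(), req.full_url)
```

Passing the returned request to urllib.request.urlopen would apply the change. Keeping the build step separate makes it easy to inspect the parameters before touching a production number.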

It is common practice for Enterprises to host backup infrastructure and services in a secondary data center. Before cloud platforms existed this was the most acceptable approach, where Data Center A is active and Data Center B supports either an Active-Standby or an Active-Active model. A better solution, and the recommended best practice for greater resiliency, is to cluster the application platform across the redundant data centers and have the cluster represent the Primary URL. Hosting the secondary URL and server on a third-party cloud platform then provides true separation between what is considered Primary and what is considered Secondary.

As seen above, Twilio’s “Supernetwork” provides multiple Tier 1 connections globally, which reduces the risk of application communication failing when deployed on large cloud provider platforms. The Supernetwork also handles peering to carriers for delivery of SMS and connecting Voice calls.

Below is an example of using AWS as the secondary URL platform for hosting the Enterprise application.

When the Enterprise data centers are unreachable, Twilio will connect to the backup URLs associated with SMS and Voice for the phone number.
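
As an illustration of what a backup URL on AWS might serve, here is a minimal sketch of a Lambda function that answers Twilio's Webhook with static TwiML. It assumes the function sits behind an API Gateway proxy integration (so the form-encoded Webhook body arrives in event["body"]) and that Voice requests carry a CallSid parameter while SMS requests carry a MessageSid parameter. The handler name and the wording of the messages are hypothetical.

```python
import base64
import urllib.parse

# Static TwiML replies; the wording is a placeholder.
VOICE_TWIML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<Response><Say>We are experiencing technical difficulties. "
    "Please try again later.</Say></Response>"
)
SMS_TWIML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<Response><Message>We are experiencing technical difficulties. "
    "Please try again later.</Message></Response>"
)


def lambda_handler(event, context):
    """Return fallback TwiML for a Voice or SMS Webhook request."""
    raw_body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # API Gateway may deliver the body base64-encoded.
        raw_body = base64.b64decode(raw_body).decode("utf-8")

    params = urllib.parse.parse_qs(raw_body)

    # Assumption: SMS Webhooks include MessageSid; anything else is treated as Voice.
    twiml = SMS_TWIML if "MessageSid" in params else VOICE_TWIML

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/xml"},
        "body": twiml,
    }
```

A static response like this has no dependency on the Enterprise data centers, which is the point of the fallback: callers and senders get a clear message even when nothing behind the Primary URL is reachable.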

If Webhook connections to the primary URL are failing, there should be a method to provide alerts. Configuring Alert Triggers under Runtime in the Twilio Console will send email notifications when connections to the primary URL fail. Triggers may be customized.

If connections to the primary URL are failing unexpectedly, the first step should be debugging.
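
One simple starting point is to probe the primary URL from outside the Enterprise network and see whether it times out, returns an HTTP error, or answers normally. The sketch below is one way to do that with the Python standard library. The function name, the probe parameter, and the example.com URL are hypothetical, and the 15-second default is an assumption meant to approximate Twilio's Webhook timeout, so adjust it as needed.

```python
import urllib.error
import urllib.parse
import urllib.request


def probe_webhook(url, timeout_seconds=15.0):
    """POST a small form body to the URL and summarize the outcome as a string."""
    # Hypothetical probe payload; a real Webhook carries many more parameters.
    data = urllib.parse.urlencode({"CallSid": "CA_probe"}).encode("ascii")
    request = urllib.request.Request(url, data=data, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return f"ok: HTTP {response.status}"
    except urllib.error.HTTPError as error:
        # The server answered, but with a 4xx or 5xx status.
        return f"http error: {error.code}"
    except OSError as error:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return f"unreachable: {error}"


if __name__ == "__main__":
    # Placeholder URL; substitute the primary Webhook URL being debugged.
    print(probe_webhook("https://primary.example.com/voice"))
```

An "http error" result points at the application, while "unreachable" points at the network path or DNS, which helps decide where to look next.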
