Monday, January 21, 2008

A fail-safe architecture for Voice on the Internet

Innovation on the internet has extremely low barriers to entry. If you know LAMP well, you have a basic start, and familiarity with a few extra web-site tools should be enough to put content or other applications on your site. However, if you are thinking of integrating voice or any other real-time application into your site, you need to stock up on your mountaineering gear.

Voice can come in very different forms. Let's look at a few examples: Skype, Jajah and Jaxtr cover the basic range. Skype is, I believe, the only one of the three that offers PC-to-PC calling, and for that it requires you to download a special client. The client embeds many specialized codecs and uses the PC's operating system to turn the computer into a phone, which requires handsets or other audio devices. Jajah and Jaxtr, on the other hand, are more "hands-off" applications of voice: they don't use the PC for the voice at all and instead rely on existing landline or mobile phones. The web or internet part comes in when you use their widgets or web interfaces to configure or initiate the call.
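One way a web widget can initiate such a "hands-off" call is a callback: the web server asks a VoIP server to ring the user's own phone and, once it answers, dial the destination. The minimal Python sketch below assumes an Asterisk server (an assumption; Jajah and Jaxtr do not publish their internals) and only builds the Asterisk Manager Interface (AMI) `Originate` request. The trunk name `SIP/carrier-trunk`, the dialplan context `outbound` and the helper names are hypothetical.

```python
# Hypothetical click-to-call: ask Asterisk (via AMI) to ring the user's phone
# first; when that leg answers, Asterisk runs the dialplan at
# Context/Exten/Priority, which is assumed to dial the destination number.


def ami_action(action: str, **headers: str) -> str:
    """Serialize one AMI action: 'Key: Value' lines joined by CRLF, then a blank line."""
    lines = [f"Action: {action}"]
    lines += [f"{key}: {value}" for key, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def click_to_call(user_phone: str, destination: str) -> str:
    # "SIP/carrier-trunk" and the "outbound" context are hypothetical names
    # that would have to exist in the server's own configuration.
    return ami_action(
        "Originate",
        Channel=f"SIP/carrier-trunk/{user_phone}",  # leg 1: ring the caller's phone
        Context="outbound",                          # leg 2: dialplan dials Exten
        Exten=destination,
        Priority="1",
        Timeout="30000",                             # ring timeout in milliseconds
        Async="true",                                # reply now; outcome arrives as an event
    )


print(click_to_call("15551230000", "15559870000"))
```

A real client would open a TCP connection to the AMI port (5038 by default), send a `Login` action with its `Username` and `Secret`, and then send this message. With `Async: true`, Asterisk acknowledges the request immediately and reports the call's outcome later as an event.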

At the other end are applications that use the Adobe Media Server framework for voice processing with Adobe Flash. Flash is quite unique in that it embeds itself in the browser and does the voice processing there. Some other services use ActiveX instead of Flash to do similar things.

Note that almost all of these services also offer other forms of browser integration, such as browser-based calling, monitoring, etc.

All three approaches have their pros and cons, both in the functionality and utility they deliver and in what it costs to operate the network behind them. The ultimate web-based voice application really needs to use the browser and should be standards-based, like the rest of the internet. And there are issues there as well...

Let's dive into some details.

Skype's infrastructure uses P2P. Comparable services from Google and Yahoo! use more network infrastructure, but I believe their costs are in the same ballpark as long as they are not routing voice through their own network (which I am guessing they don't do). Passing real-time voice through your own network can impose significant costs in managing it and delivering call quality, and can require a lot of staffing on your side. It's really easy to get a VoIP infrastructure up and running without routing the real-time voice yourself, by handing it over to more capable and mature partners. We will get into that in this blog (maybe not in this post!).
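To put a rough number on the cost of carrying voice yourself, here is a back-of-the-envelope Python sketch comparing the media bandwidth a relaying operator must carry with the idealized P2P case, where the operator carries none. The codec (G.711), the 20 ms packet size and the IPv4/UDP/RTP header sizes are illustrative assumptions, not figures from any of these services.

```python
# Rough media bandwidth you carry when voice goes through your own servers
# versus an idealized P2P design. Assumptions: G.711 (64 kbit/s) in 20 ms
# packets, RTP over UDP over IPv4, link-layer overhead ignored.

CODEC_BITRATE_BPS = 64_000
PACKET_INTERVAL_MS = 20
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers per packet


def stream_bps() -> int:
    """Bits per second on the wire for one direction of one call."""
    payload_bytes = CODEC_BITRATE_BPS * PACKET_INTERVAL_MS // (8 * 1000)  # 160
    packets_per_s = 1000 // PACKET_INTERVAL_MS                             # 50
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_s             # 80,000


def server_media_bps(calls: int, relayed: bool) -> int:
    """Media bits per second (inbound + outbound) your servers handle."""
    if not relayed:
        return 0  # idealized P2P: media flows directly between the endpoints
    # A relay receives both directions of every call and forwards both:
    # 2 inbound + 2 outbound streams per call.
    return calls * 4 * stream_bps()


print(server_media_bps(1000, relayed=True))   # 320000000
print(server_media_bps(1000, relayed=False))  # 0
```

At 1,000 concurrent relayed calls that is 320 Mbit/s of media (160 in, 160 out) before any redundancy, which is the kind of load behind the bandwidth and staffing costs of routing voice yourself.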

Note that services like Jajah and Jaxtr can easily be built by deploying VoIP servers like Asterisk. Since they actually take the voice call off the internet, the cost of these hand-offs can be extremely high on a per-minute basis, because that is how the third parties, called carriers, charge you for putting your internet call through to a mobile or landline phone. The only problem with any Asterisk-based system is its high cost of ownership.
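Because these hand-offs are billed per minute, the carrier charge grows linearly with traffic no matter how cheap the server itself is. A small Python sketch of the arithmetic, assuming a flat hypothetical per-minute rate and per-call rounding up to a billing increment (both assumptions; actual carrier terms differ):

```python
import math

# Hypothetical carrier billing: each call handed off to the phone network is
# rounded up to a billing increment (60 s assumed here; carriers vary) and
# charged at a flat per-minute rate.


def billed_minutes(call_seconds: list[int], increment_s: int = 60) -> float:
    """Total billable minutes after rounding each call up to the increment."""
    billed = sum(math.ceil(s / increment_s) * increment_s for s in call_seconds)
    return billed / 60


def termination_cost(call_seconds: list[int], rate_per_minute: float,
                     increment_s: int = 60) -> float:
    """Carrier charge for terminating these calls at rate_per_minute."""
    return billed_minutes(call_seconds, increment_s) * rate_per_minute


# Three short calls: 30 s, 61 s and 180 s bill as 1 + 2 + 3 = 6 minutes.
print(billed_minutes([30, 61, 180]))  # 6.0
```

For example, 10,000 calls a month averaging 3 minutes at a hypothetical $0.02 per minute come to $600 a month in termination charges alone.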

Solutions based on Adobe Media Servers offer very high utility to the user, but are extremely expensive to operate and maintain. The voice media gets routed through a centralized Adobe server in all cases (for no reason at all!), which becomes a huge scalability bottleneck. Adobe servers are designed for playing static media, and my guess is Adobe never intended them for ordinary, run-of-the-mill voice calls, for which they are very inefficient. However, Flash is still the king of the land and possibly the only browser-based voice solution that does not require any installation or software download! Its voice codec (Nellymoser) is also proprietary. The good news is that it's not illegal to reverse engineer the Flash server by observing how the Flash client works. Some people have done it, and the result is the open-source Red5 Flash server project.
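The scalability problem is simple arithmetic: if every call's media passes through a central server, that server's link caps the number of simultaneous calls, and the only way to raise the cap is to add more servers and bandwidth. A minimal Python sketch, with the link speed, per-stream bitrate and 80% headroom as hypothetical parameters (Flash's actual Nellymoser bitrates and server CPU overheads are not modeled):

```python
# Concurrent-call ceiling of a single central media server whose network link
# is the bottleneck. Assumptions (illustrative, not measured Adobe figures):
# every call is relayed through the server, each direction of a call costs a
# fixed per_stream_bps, the link is full duplex, and CPU limits are ignored.


def max_relayed_calls(link_bps: int, per_stream_bps: int,
                      headroom: float = 0.8) -> int:
    """Calls that fit when only `headroom` of the link may be used.

    Each relayed call puts 2 streams on the inbound side of the link
    (one from each party) and 2 on the outbound side (one to each party).
    """
    usable_bps = link_bps * headroom
    return int(usable_bps // (2 * per_stream_bps))


# A 1 Gbit/s link at an assumed 80 kbit/s per stream tops out at 5000 calls.
print(max_relayed_calls(1_000_000_000, 80_000))  # 5000
```

In an idealized P2P design the same calls put no media load on the operator's link, so this ceiling does not apply there.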
