New research project: Architecture and Protocols for Massively Scalable IP Media Streaming

In previous work, we have analysed repair and monitoring mechanisms for scalable media distribution using RTP over multicast UDP/IP. This project takes a step back to consider the bigger picture: which protocol is appropriate for large-scale IPTV content distribution.

We observe various flavours of IPTV gaining relevance in today's Internet. First, many Internet Service Providers (ISPs) are starting to deploy managed IPTV services to their customers. These generally use IP multicast, with a single media source and multiple local repair nodes, to support large numbers of simultaneous clients with a service equivalent to traditional cable TV. This is a scalable solution that requires only a limited investment in server infrastructure by the ISP, but a potentially large investment to enable multicast support in their network. Second, we see a proliferation of unmanaged services, operated by third parties over the top of the ISP network. These are generally built using unicast TCP (often HTTP) connections, either employing a single server connection per client (e.g., YouTube; BBC iPlayer), or connections from the client to multiple servers (e.g., Apple's live streaming). These require a significant investment in server infrastructure and bandwidth for the media source to support the large numbers of video streams required, but promise to work over the existing edge ISP networks. Lastly, we observe that peer-to-peer overlays (e.g., PPLive) are beginning to find favour in some markets, trading reduced quality for lower infrastructure costs. Overall, media streaming is shifting towards TCP- and HTTP-based delivery, which aligns it more closely with other web applications.

Considering these applications, we distinguish four axes of variation: 1) over-the-top or managed services; 2) UDP-based or TCP-based transport; 3) native multicast, overlay multicast, or multi-unicast as the media distribution model; and 4) single or multiple transmission sources. In the present proposal, we will investigate the nature of the transport protocol and distribution model for IPTV systems, considering the optimal trade-off for services that are scalable while still supporting trick-play. Specifically, we consider novel approaches to enhancing TCP streaming, as compared with managed IP multicast services.

The proposed research comprises two tasks. In the initial task we will evaluate different transport protocols and distribution models for IPTV systems, to understand their applicability and constraints, and make recommendations for their use. Thereafter, we propose to develop enhancements to TCP-based streaming to bring its scalability closer to that of multicast IPTV systems.

Transport Protocols and Distribution Models for IPTV: The choice of transport protocol, media distribution model, and number of media sources is key to the scalability and performance of IPTV systems. In this first task, we will investigate the performance of the different approaches to IPTV that are in wide use today: unicast via a single TCP connection pushing content from server to client; unicast via multiple TCP connections where each connection is used by the client to pull different chunks of the media stream from one or more servers; and UDP multicast from a single server to multiple clients, with optional in-network local repair nodes. We will evaluate performance of these approaches according to the following criteria:

Evaluation will proceed using a mixture of analysis and simulation, and will be conducted with various combinations of media format (bit rate and codec), encoding parameters (chunk size for rate adaptation), receiver population (size, distribution), and network environment (loss rate, bandwidth, cross traffic).

Expected outcomes are an understanding of the limits of applicability of the various approaches to IPTV. Previous work has mapped out some of the constraints on the applicability of TCP for live streaming on a single connection, but there has been no systematic study of the problem space. This gap is especially critical for rate adaptation using client-pull over TCP from multiple servers, where the client measures its download speed and runs a receiver-driven rate adaptation algorithm to determine the size of the next chunk it will download. There has been no comparison of this approach with the performance achievable by a single server that knows the TCP window and which packets have been ACKed, so has good knowledge of the congestion control dynamics, and can avoid the handshake to request the next chunk.
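To make the client-pull case concrete, the following is a minimal sketch of a receiver-driven rate adaptation step, not the algorithm of any particular deployed system. The function names, the harmonic-mean throughput estimator, the safety margin of 0.8, and the example encoding ladder are all illustrative assumptions.

```python
from typing import Sequence


def estimate_throughput(samples_bps: Sequence[float]) -> float:
    """Harmonic mean of recent per-chunk download rates (bits/s).

    Assumption: the harmonic mean is used because it is dominated by
    slow samples, which makes the estimate conservative.
    """
    if not samples_bps:
        raise ValueError("need at least one throughput sample")
    return len(samples_bps) / sum(1.0 / s for s in samples_bps)


def pick_bitrate(samples_bps: Sequence[float],
                 ladder_bps: Sequence[float],
                 safety: float = 0.8) -> float:
    """Highest encoding rate that fits within a safety fraction of the
    estimated throughput; falls back to the lowest rate if none fits."""
    budget = safety * estimate_throughput(samples_bps)
    candidates = [r for r in sorted(ladder_bps) if r <= budget]
    return candidates[-1] if candidates else min(ladder_bps)


def next_chunk_bytes(bitrate_bps: float, chunk_seconds: float) -> int:
    """Size of the next chunk to request, in bytes."""
    return int(bitrate_bps * chunk_seconds / 8)


# Hypothetical ladder and measurements, for illustration only.
ladder = [1_000_000, 2_500_000, 5_000_000]
rate = pick_bitrate([4_000_000, 4_000_000], ladder)
size = next_chunk_bytes(rate, chunk_seconds=2.0)
```

Note that the client here sees only completed downloads. A server-side sender, by contrast, could react to TCP window and ACK state directly, which is the comparison the paragraph above identifies as missing.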

Scaling TCP streaming: While CDNs paired with distributed server farms may offer scaling of TCP streaming to some extent, a significant load is still imposed on the server nodes and the network paths when scaling to millions of users. This is appropriate for on-demand and near-live streaming, where individual control is needed. However, for live content (i.e., content being viewed by all viewers at the same time), server farms do not offer benefits over individual multicast delivery. On the contrary, servers need to be placed to support worst-case demand, and so need to move close to the viewer's home to keep the fan-out degree per server under control, which raises costs.
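A back-of-envelope sketch of the fan-out argument follows. The function names and all numbers (one million viewers, 4 Mbit/s per stream, 5,000 clients per server) are hypothetical values chosen for illustration, not measurements.

```python
import math


def servers_needed(peak_viewers: int, fan_out_per_server: int) -> int:
    """Servers required if each can feed at most fan_out_per_server
    simultaneous unicast clients at peak (worst-case) demand."""
    return math.ceil(peak_viewers / fan_out_per_server)


def unicast_egress_bps(peak_viewers: int, bitrate_bps: int) -> int:
    """Aggregate server egress for unicast: one copy per viewer."""
    return peak_viewers * bitrate_bps


# Hypothetical live event.
viewers = 1_000_000
n_servers = servers_needed(viewers, fan_out_per_server=5_000)
egress = unicast_egress_bps(viewers, bitrate_bps=4_000_000)
```

Under these assumptions, unicast delivery needs 200 servers and 4 Tbit/s of aggregate egress, whereas native multicast would send a single 4 Mbit/s copy per link regardless of audience size.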

In this task, we will investigate how the network infrastructure can support user demand at massive scale for TCP streaming to achieve performance approaching UDP-based multicast solutions. We will discuss the various design alternatives and devise simple router optimisations that can be incrementally deployed in the network infrastructure. In particular, we will show how to borrow ideas from publish/subscribe networks (such as the recently designed content-centric networks) and peer-to-peer overlays that feature (opportunistic, incremental) caching, and transparently retrofit them to the present Internet architecture. We will especially consider the option of adding short-term caching functionality to selected routers. A key goal, where our initial ideas show promise, will be to simultaneously maintain low latency and compatibility with the TCP protocol and its congestion response, at low enough complexity for middlebox implementation. As a related important research problem, we seek to understand the number of such middleboxes required, and their topological relationship (e.g., placement inside a provider network), to support a given subscriber number and interest distribution for real-time live streaming. We will also explore the relationship between cache size, demand diversity, and maximum time lag between viewers.
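One way to picture the short-term caching idea, and the link between cache size, demand diversity, and viewer time lag, is the sketch below. It is an illustrative toy, not the proposed router design: the class name, the time-based expiry policy, the `(channel, chunk index)` key, and the sizing formula are all assumptions.

```python
from collections import OrderedDict


class ShortTermCache:
    """Toy chunk cache that keeps entries for at most max_lag_s seconds.

    Assumption: a chunk is only useful while the slowest viewer may
    still request it, so expiry is by age rather than by popularity.
    Callers are assumed to pass non-decreasing timestamps.
    """

    def __init__(self, max_lag_s: float):
        self.max_lag_s = max_lag_s
        self._store = OrderedDict()  # key -> (arrival_time, payload)

    def _expire(self, now: float) -> None:
        # Entries are kept in arrival order, so stop at the first fresh one.
        while self._store:
            _, (arrived, _) = next(iter(self._store.items()))
            if now - arrived <= self.max_lag_s:
                break
            self._store.popitem(last=False)

    def put(self, key, payload: bytes, now: float) -> None:
        self._expire(now)
        self._store[key] = (now, payload)
        self._store.move_to_end(key)

    def get(self, key, now: float):
        self._expire(now)
        entry = self._store.get(key)
        return entry[1] if entry is not None else None


def cache_bytes_needed(active_channels: int, bitrate_bps: int,
                       max_lag_s: float) -> int:
    """Rough cache size: every active channel buffered for the full lag."""
    return int(active_channels * bitrate_bps * max_lag_s / 8)


cache = ShortTermCache(max_lag_s=5.0)
cache.put(("ch1", 1), b"chunk-data", now=0.0)
hit = cache.get(("ch1", 1), now=3.0)    # within the lag window
miss = cache.get(("ch1", 1), now=6.0)   # expired
```

With these assumed numbers, 100 active channels at 4 Mbit/s and a 10 second maximum lag would need roughly 500 MB of cache per middlebox. The estimate grows linearly in both demand diversity (number of distinct channels) and tolerated lag.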

This project is a collaboration between Aalto University (Jörg Ott) and the University of Glasgow (Colin Perkins). Funding is provided by Cisco Systems.
