Internet Engineering Task Force                                   AVT WG
INTERNET-DRAFT                                              Ladan Gharai
draft-ietf-avt-uncomp-video-02.txt                         Colin Perkins
                                                                 USC/ISI
                                                        27 February 2003
                                                    Expires: August 2003


               RTP Payload Format for Uncompressed Video


Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This memo specifies a packetization scheme for encapsulating uncompressed video into a payload format for the Real-time Transport Protocol, RTP. It supports a range of standard- and high-definition video formats, including common television formats such as ITU BT.601, SMPTE 274M and SMPTE 296M. The format is designed to be extensible as new video formats are developed.

Gharai/Perkins                                                  [Page 1]

INTERNET-DRAFT          Expires: August 2003               February 2003

1. Introduction

[Note to RFC Editor: All references to RFC XXXX are to be replaced with the RFC number of this memo, when published]

This memo defines a scheme to packetize uncompressed, studio-quality, video streams for transport using RTP [RTP]. It supports a range of standard and high definition video formats, including ITU-R BT.601 [601], SMPTE 274M [274] and SMPTE 296M [296].
Formats for uncompressed standard definition television are defined by ITU Recommendation BT.601 [601] along with bit-serial and parallel interfaces in Recommendation BT.656 [656]. These formats allow both 625 line and 525 line operation, with 720 samples per digital active line, 4:2:2 color sub-sampling, and 8- or 10-bit digital representation.

The representation of uncompressed high definition television is specified in SMPTE standards 274M [274] and 296M [296]. SMPTE 274M defines a family of scanning systems with an image format of 1920x1080 pixels with progressive and interlaced scanning, while the SMPTE 296M standard defines systems with an image size of 1280x720 pixels and only progressive scanning. In progressive scanning, scan lines are displayed in sequence from top to bottom of a full frame. In interlaced scanning, a frame is divided into its odd and even scan lines (called fields) and the two fields are displayed in succession.

SMPTE 274M and 296M define images with aspect ratios of 16:9, and define the digital representation for RGB and YCbCr components. In the case of YCbCr components, the Cb and Cr components are horizontally sub-sampled by a factor of two (4:2:2 color encoding).

Although these formats differ in their details, they are structurally very similar. This memo specifies a payload format to encapsulate these, and other similar, video formats for transport within RTP.

2. Conventions Used in this Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [2119].

3. Payload Design

Each scan line of digital video is packetized into one or more (depending on the network MTU) RTP packets. A single RTP packet MAY also contain data for more than one scan line.
Only the active samples are included in the RTP payload; inactive samples and the contents of horizontal and vertical blanking SHOULD NOT be transported. Scan line numbers are included in the RTP payload header, along with a field identifier for interlaced video.

For SMPTE 296M format video, valid scan line numbers are from 26 through 745, inclusive. For progressive scan SMPTE 274M format video, valid scan line numbers are from 42 through 1121, inclusive. For interlaced scan, valid scan line numbers for field one (F=0) are from 21 to 560 and valid scan line numbers for the second field (F=1) are from 584 to 1123. For ITU-R BT.601 format video, the blanking intervals defined in BT.656 are used: for 625 line video, lines 24 to 310 of field one (F=0) and 337 to 623 of the second field (F=1) are valid; for 525 line video, lines 21 to 263 of the first field, and 284 to 525 of the second field are valid. Other formats (e.g. [372]) may define different ranges of active lines.

The payload header contains a 16 bit extension to the standard 16 bit RTP sequence number, thereby extending the sequence number to 32 bits and enabling RTP to accommodate high data rates. This is necessary because the 16 bit RTP sequence number rolls over very quickly at high data rates. For example, for a 1 Gbps video stream with packet sizes of at least one thousand octets, the standard RTP sequence number will roll over in approximately 0.5 seconds, which can be a problem for detecting loss and out of order packets, particularly in instances where the round trip time is greater than half a second. The extended 32 bit number allows for a longer wrap-around time of approximately nine hours.

It is desirable for the video to be both octet aligned when packetized, and to adhere to the principles of application level framing [ALF] by ensuring that the samples relating to a single pixel are not fragmented across two packets.
Samples may be transferred as 8, 10, 12 or 16 bit values. For 10 bit and 12 bit payloads, care must be taken to pack an appropriate number of samples per packet, such that the payload is also octet aligned. For RGB video, it is desirable that the samples corresponding to a single pixel are not fragmented across packets. Similarly, for YCbCr video, it is desirable that luminance and chrominance values are not fragmented across packets.

For example, in YCbCr video with 4:1:1 color subsampling, each group of 4 pixels is represented by 6 values, Y1 Y2 Y3 Y4 Cr Cb. These should be packetized such that these values are not fragmented across a packet boundary. With 10 bit words this is a 60 bit value, which is not octet aligned. To be both octet aligned and appropriately framed, samples must be packed in 2 groups of 4 pixels (120 bits), which is octet aligned on a 15 octet boundary. This length is referred to as the pixel group ("pgroup"), and it is conveyed in the SDP parameters. Tables 1 to 4 display the pgroup values, in octets, for a range of color samplings and word lengths.

When packetizing digital active line content, video data MUST NOT be fragmented within a pgroup.

Video content is almost always associated with additional information such as audio tracks, time code, etc. In professional digital video applications this data is commonly embedded in non-active portions of the video stream (horizontal and vertical blanking periods) so that precise and robust synchronization is maintained. Applications using this payload format that require such synchronized ancillary data MUST deliver it in separate RTP sessions that operate concurrently with the video session. The normal RTP mechanisms SHOULD be used to synchronize the media.
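As an informal illustration of the pgroup rule described in this section, the following Python sketch computes the smallest octet-aligned pixel group for a given number of samples per group and word length. The function name pgroup_octets and its parameters are assumptions made for this example only; they are not defined by this memo.

```python
from math import gcd
from typing import Tuple


def pgroup_octets(samples_per_group: int, bits_per_sample: int) -> Tuple[int, int]:
    """Return (samples, octets) for the smallest octet-aligned pixel group.

    samples_per_group is the number of samples that must stay together,
    e.g. 6 for 4:1:1 (Y1 Y2 Y3 Y4 Cr Cb) or 3 for 4:4:4.
    """
    group_bits = samples_per_group * bits_per_sample
    # Repeat the group until its bit length is a multiple of 8.
    repeats = 8 // gcd(group_bits, 8)
    total_bits = group_bits * repeats
    return samples_per_group * repeats, total_bits // 8


if __name__ == "__main__":
    # 4:1:1 with 10 bit words: 60 bits is not octet aligned, so two groups
    # are combined into one pgroup.
    print(pgroup_octets(6, 10))
```

For 4:1:1 video with 10 bit words this yields 12 samples in 15 octets, which agrees with the worked example in the text above.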
In Tables 1 to 4, the Scan column indicates whether a row applies to progressive (P) video, interlaced (I) video, or both (P/I).

                        8 bit words
+-----------+------+----+------+-------------+--------+------+
|Color      |Pixels|Scan|#words|Octet        |#samples|pgroup|
|subsampling|      |    |      |alignment    |        |octets|
+-----------+------+----+------+-------------+--------+------+
|monochrome | 1    |P/I |1x8   |8/8 = 1      | 1      | 1    |
|4:1:1      | 4    |P/I |6x8   |6x8/8 = 6    | 6      | 6    |
|4:2:0      | 4    |P   |6x8   |6x8/8 = 6    | 6      | 6    |
|4:2:0      | 4    |I   |4x8   |4x8/8 = 4    | 4      | 4    |
|4:2:2      | 2    |P/I |4x8   |4x8/8 = 4    | 4      | 4    |
|4:4:4      | 1    |P/I |3x8   |3x8/8 = 3    | 3      | 3    |
|4:4:4:4    | 1    |P/I |4x8   |4x8/8 = 4    | 4      | 4    |
+-----------+------+----+------+-------------+--------+------+

Table 1: pgroup values for 8 bit sampling

                        10 bit words
+-----------+------+----+------+-------------+--------+------+
|Color      |Pixels|Scan|#words|Octet        |#samples|pgroup|
|subsampling|      |    |      |alignment    |        |octets|
+-----------+------+----+------+-------------+--------+------+
|monochrome | 4    |P/I |4x10  |40/8 = 5     | 4      | 5    |
|4:1:1      | 4    |P/I |6x10  |2x60/8 = 15  | 12     | 15   |
|4:2:0      | 4    |P   |6x10  |2x60/8 = 15  | 12     | 15   |
|4:2:0      | 4    |I   |4x10  |40/8 = 5     | 4      | 5    |
|4:2:2      | 2    |P/I |4x10  |40/8 = 5     | 4      | 5    |
|4:4:4      | 1    |P/I |3x10  |4x30/8 = 15  | 12     | 15   |
|4:4:4:4    | 1    |P/I |4x10  |40/8 = 5     | 4      | 5    |
+-----------+------+----+------+-------------+--------+------+

Table 2: pgroup values for 10 bit sampling

                        12 bit words
+-----------+------+----+------+-------------+--------+------+
|Color      |Pixels|Scan|#words|Octet        |#samples|pgroup|
|subsampling|      |    |      |alignment    |        |octets|
+-----------+------+----+------+-------------+--------+------+
|monochrome | 2    |P/I |2x12  |2x12/8 = 3   | 2      | 3    |
|4:1:1      | 4    |P/I |6x12  |72/8 = 9     | 6      | 9    |
|4:2:0      | 4    |P   |6x12  |72/8 = 9     | 6      | 9    |
|4:2:0      | 4    |I   |4x12  |48/8 = 6     | 4      | 6    |
|4:2:2      | 2    |P/I |4x12  |48/8 = 6     | 4      | 6    |
|4:4:4      | 2    |P/I |6x12  |2x36/8 = 9   | 6      | 9    |
|4:4:4:4    | 1    |P/I |4x12  |48/8 = 6     | 4      | 6    |
+-----------+------+----+------+-------------+--------+------+

Table 3: pgroup values for 12 bit sampling

                        16 bit words
+-----------+------+----+------+-------------+--------+------+
|Color      |Pixels|Scan|#words|Octet        |#samples|pgroup|
|subsampling|      |    |      |alignment    |        |octets|
+-----------+------+----+------+-------------+--------+------+
|monochrome | 1    |P/I |1x16  |16/8 = 2     | 1      | 2    |
|4:1:1      | 4    |P/I |6x16  |6x16/8 = 12  | 6      | 12   |
|4:2:0      | 4    |P   |6x16  |6x16/8 = 12  | 6      | 12   |
|4:2:0      | 4    |I   |4x16  |4x16/8 = 8   | 4      | 8    |
|4:2:2      | 2    |P/I |4x16  |4x16/8 = 8   | 4      | 8    |
|4:4:4      | 1    |P/I |3x16  |3x16/8 = 6   | 3      | 6    |
|4:4:4:4    | 1    |P/I |4x16  |4x16/8 = 8   | 4      | 8    |
+-----------+------+----+------+-------------+--------+------+

Table 4: pgroup values for 16 bit sampling

4. RTP Packetization

The standard RTP header is followed by a 2 octet payload header that extends the RTP Sequence Number, and by a 6 octet payload header for each line (or partial line) of video included. One or more lines, or partial lines, of video data follow. This format makes the payload header 32 bit aligned in the common case, where one scan line (fragment) of video is included in each RTP packet.

For example, if two lines of video are encapsulated, the payload format will be as shown in Figure 1.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| V |P|X|  CC   |M|     PT      |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Time Stamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Extended Sequence Number    |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|        Scan Line No         |C|         Scan Offset         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |F|        Scan Line No         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C|         Scan Offset         |                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               .
.                                                               .
.               Two (partial) lines of video data               .
.                                                               .
+---------------------------------------------------------------+

Figure 1: RTP Payload Format showing two (partial) lines of video

4.1. The RTP Header

The fields of the fixed RTP header have their usual meaning, with the following additional notes:

Payload Type (PT): 7 bits
   A dynamically allocated payload type field which designates the payload as uncompressed video.

Timestamp: 32 bits
   For progressive scan video, the timestamp denotes the sampling instant of the frame to which the RTP packet belongs. Packets MUST NOT include data from multiple frames, and all packets belonging to the same frame MUST have the same timestamp.

   For interlaced video, the timestamp denotes the sampling instant of the field to which the RTP packet belongs. Packets MUST NOT include data from multiple fields, and all packets belonging to the same field MUST have the same timestamp. Use of field timestamps, rather than a frame timestamp and a field indicator bit, is needed to support reverse 3-2 pulldown.

   A 90 kHz timestamp MUST be used in both cases. If the sampling instant does not correspond to an integer value of the clock (as may be the case when interleaving), the value SHALL be truncated to the next lowest integer.

Marker bit (M): 1 bit
   The Marker bit denotes the end of a video frame, and MUST be set to 1 for the last packet of the video frame. It MUST be set to 0 for other packets.

Sequence Number: 16 bits
   The low order bits of the extended RTP sequence number. The standard 16 bit sequence number is augmented with another 16 bits in the payload header, in order to avoid problems due to wrap-around when operating at high data rates.

4.2. Payload Header

Extended Sequence Number : 16 bits
   The high order bits of the extended 32 bit sequence number, in network byte order.

Scan Line No : 15 bits
   Scan line number of encapsulated data, in network byte order. Successive RTP packets MAY contain parts of the same scan line (with an incremented RTP sequence number, but the same timestamp), if it is necessary to fragment a line.
Scan Offset : 15 bits
   Scan offset of the first sample in the payload data. If YCbCr format data is being transported, this is the offset of the co-sited luminance sample, and if RGB format data is being transported it is the offset of the red sample. The value is in network byte order, and the offset has a value of zero if the first sample in the payload corresponds to the start of the line.

Length: 16 bits
   Number of octets of data included from this scan line, in network byte order. This MUST be a multiple of the pgroup value.

Field Identification (F): 1 bit
   Identifies which field the scan line belongs to, for interlaced data. F=0 identifies the first field and F=1 the second field. For progressive scan data (e.g. SMPTE 296M format video), F MUST always be set to zero.

Continuation (more lines) bit (C): 1 bit
   Determines if an additional scan line header follows the current scan line header in the RTP packet. Set to 1 if an additional header follows, implying that the RTP packet is carrying data for more than one scan line. Set to 0 otherwise. Any number of scan lines MAY be included, subject to the path MTU limit. The only way to determine the number of scan lines included per packet is to parse the payload headers.

4.3. Payload Data

Depending on the video format, each RTP packet can include either a single complete scan line, a single fragment of a scan line, or one (or more) complete scan lines plus a fragment of a scan line. Every scan line or scan line fragment MUST begin at an octet boundary in the payload data. Scan lines SHOULD be fragmented so that the resulting RTP packet is smaller than the path MTU.

It is possible that the scan line length is not evenly divisible by the number of pixels in a pgroup, so the final pixel data of a scan line does not align to either an octet or pgroup boundary.
Nonetheless the payload MUST contain a whole number of pgroups; the sender MUST fill the remaining bits of the final pgroup with zero and the receiver MUST ignore the fill data. (In effect, the trailing edge of the image is black-filled to a pgroup boundary.)

If the video is in YUV format, the packing of samples into the payload depends on the color sub-sampling used. For RGB format video, there is a single packing scheme.

For RGB format video, samples are packed in order Red-Green-Blue. All samples are the same bit size, which may be 8, 10, 12, or 16 bits. If 8 bit samples are used, the pgroup is 3 octets. If 10 bit samples are used, samples from adjacent pixels are packed with no padding, and the pgroup is 15 octets (4 pixels). Refer to Tables 1 to 4.

For RGBA format video, samples are packed in order Red-Green-Blue-Alpha. All samples are the same bit size, which may be 8, 10, 12, or 16 bits. For pgroups refer to Tables 1 to 4.

For YUV 4:4:4 format video, samples are packed in order Cb-Y-Cr for both interlaced and progressive frames. Each sample is an 8, 10, 12 or 16 bit value. For relevant pgroups refer to Tables 1 to 4.

For YUV 4:2:2 format video, the Cb and Cr components are horizontally sub-sampled by a factor of two (each Cb and Cr sample corresponds to two Y components). Samples are packed in order Cb0-Y0-Cr0-Y1 for both interlaced and progressive scan lines. Each sample is an 8, 10, 12 or 16 bit value. For relevant pgroups refer to Tables 1 to 4.

For YUV 4:1:1 format video, the Cb and Cr components are horizontally sub-sampled by a factor of four (each Cb and Cr sample corresponds to four Y components). Samples are packed in order Cb0-Y0-Y1-Cr0-Y2-Y3 for both interlaced and progressive scan lines. Each sample is an 8, 10, 12 or 16 bit value. For relevant pgroups refer to Tables 1 to 4.
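To make the unpadded packing of 10 bit samples more concrete, the following Python sketch packs one 4:2:2 pgroup (Cb0-Y0-Cr0-Y1) into 5 octets and unpacks it again. It assumes that samples are written most significant bit first, which is a natural reading of "network byte order" but is not stated explicitly for sample data in this memo. The helper names pack_samples and unpack_samples are hypothetical.

```python
from typing import List, Sequence


def pack_samples(samples: Sequence[int], depth: int) -> bytes:
    """Pack equal-width samples back to back, most significant bit first."""
    total_bits = len(samples) * depth
    if total_bits % 8 != 0:
        raise ValueError("sample run is not octet aligned; pack whole pgroups")
    acc = 0
    for value in samples:
        if not 0 <= value < (1 << depth):
            raise ValueError("sample does not fit in the given depth")
        acc = (acc << depth) | value  # append the next sample, no padding
    return acc.to_bytes(total_bits // 8, "big")


def unpack_samples(data: bytes, depth: int) -> List[int]:
    """Inverse of pack_samples for a payload holding whole pgroups."""
    total_bits = len(data) * 8
    if total_bits % depth != 0:
        raise ValueError("payload does not hold a whole number of samples")
    count = total_bits // depth
    acc = int.from_bytes(data, "big")
    mask = (1 << depth) - 1
    return [(acc >> (depth * (count - 1 - i))) & mask for i in range(count)]


if __name__ == "__main__":
    # One 4:2:2 pgroup in the order Cb0-Y0-Cr0-Y1, 10 bits per sample.
    pgroup = pack_samples([0x200, 0x040, 0x200, 0x3AC], depth=10)
    print(len(pgroup), pgroup.hex())
```

Refusing input that is not a whole number of octets mirrors the requirement that the payload contain a whole number of pgroups; a sender would zero-fill the final pgroup of a line before calling such a helper.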
For YUV 4:2:0 video, the Cb and Cr components are sub-sampled by a factor of two both horizontally and vertically. Therefore chrominance values are shared between certain adjacent lines. Figure 2 illustrates the composition of luminance and chrominance values for a 6x6 pixel grid in 4:2:0 YUV video.

line 0:   Y00      Y01      Y02      Y03      Y04      Y05
            Cb00 Cr00         Cb01 Cr01         Cb02 Cr02
line 1:   Y10      Y11      Y12      Y13      Y14      Y15

line 2:   Y20      Y21      Y22      Y23      Y24      Y25
            Cb10 Cr10         Cb11 Cr11         Cb12 Cr12
line 3:   Y30      Y31      Y32      Y33      Y34      Y35

line 4:   Y40      Y41      Y42      Y43      Y44      Y45
            Cb20 Cr20         Cb21 Cr21         Cb22 Cr22
line 5:   Y50      Y51      Y52      Y53      Y54      Y55

Figure 2: Chrominance and luminance composition in 4:2:0 YUV video.

Transport of progressive scan 4:2:0 YUV video entails the transport of two scan lines together such that:

line 0,1:  Y00-Y01-Y10-Y11-Cb00-Cr00
           Y02-Y03-Y12-Y13-Cb01-Cr01
           Y04-Y05-Y14-Y15-Cb02-Cr02

line 2,3:  Y20-Y21-Y30-Y31-Cb10-Cr10
           Y22-Y23-Y32-Y33-Cb11-Cr11
           Y24-Y25-Y34-Y35-Cb12-Cr12

line 4,5:  Y40-Y41-Y50-Y51-Cb20-Cr20
           Y42-Y43-Y52-Y53-Cb21-Cr21
           Y44-Y45-Y54-Y55-Cb22-Cr22

For interlaced transport, chrominance values are transported with every other line:

field 0:
  line 0:  Y00-Y01-Cb00-Cr00  Y02-Y03-Cb01-Cr01  Y04-Y05-Cb02-Cr02
  line 2:  Y20-Y21            Y22-Y23            Y24-Y25
  line 4:  Y40-Y41-Cb20-Cr20  Y42-Y43-Cb21-Cr21  Y44-Y45-Cb22-Cr22

field 1:
  line 1:  Y10-Y11            Y12-Y13            Y14-Y15
  line 3:  Y30-Y31-Cb10-Cr10  Y32-Y33-Cb11-Cr11  Y34-Y35-Cb12-Cr12
  line 5:  Y50-Y51            Y52-Y53            Y54-Y55

5. RTCP Considerations

RTCP SHOULD be used as specified in RFC 1889 [RTP], which specifies two limits on the RTCP packet rate: RTCP bandwidth should be limited to 5% of the data rate, and the minimum for the average of the randomized intervals between RTCP packets should be 5 seconds. Considering the high data rate of many uncompressed video formats, the minimum interval is the governing factor in many cases.
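The arithmetic behind that observation can be sketched as follows. This is a deliberately simplified model of the RFC 1889 interval calculation: it ignores the split of RTCP bandwidth between senders and receivers and the randomization of the interval. The function name rtcp_interval and the default average RTCP packet size of 120 octets are assumptions chosen for illustration.

```python
def rtcp_interval(session_bw_bps: float, members: int,
                  avg_rtcp_octets: float = 120.0,
                  rtcp_fraction: float = 0.05,
                  min_interval_s: float = 5.0) -> float:
    """Simplified, non-randomized RTCP reporting interval in seconds."""
    # Share of the session bandwidth available to RTCP, in octets/second.
    rtcp_bw_octets_per_s = session_bw_bps * rtcp_fraction / 8.0
    # Interval needed for all members to stay within the RTCP share.
    computed = members * avg_rtcp_octets / rtcp_bw_octets_per_s
    # The larger of the two limits governs.
    return max(min_interval_s, computed)


if __name__ == "__main__":
    # A 1 Gbps uncompressed stream with two participants.
    print(rtcp_interval(1e9, 2))
    # A 64 kbps session with 100 participants, for contrast.
    print(rtcp_interval(64000, 100))
```

Under these assumptions, the bandwidth-derived interval for a 1 Gbps stream is a small fraction of a millisecond, so the 5 second minimum applies; only low-rate sessions with many participants are governed by the 5% bandwidth limit.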
It should be noted that the sender's octet count in SR packets and the cumulative number of packets lost will wrap around quickly for high data rate streams. This means these two fields may not accurately represent the octet count and the number of packets lost since the beginning of transmission, as defined in RFC 1889. Therefore, for network monitoring purposes, other means of keeping track of these variables SHOULD be used.

6. IANA Considerations

6.1. MIME type registration

MIME media type name: video

MIME subtype name: raw

Required parameters:

   rate: The RTP timestamp clock rate. The value MUST be 90000 for this payload format.

   pgroup: The number of octets per pixel group. See section 3 of RFC XXXX.

   color-mode: Determines the color mode of the video stream. Valid values for this parameter are: RGB, RGBA, and YUV.

   sub-sampling: Determines the type of color sub-sampling of the video stream. Valid values are: mono, 4:1:1, 4:2:0, 4:2:2, 4:4:4 and 4:4:4:4.

   width: Determines the number of pixels per line. This is an integer between 1 and 32767.

   height: Determines the number of lines per frame. This is an integer between 1 and 32767.

   depth: Determines the number of bits per sample. This is a decimal integer; typical values include 8, 10, 12, and 16.

   colorimetry: This parameter defines the set of colorimetric specifications and other transfer characteristics for the video source, by reference to an external specification. Valid values and their specifications are:

      BT601-5     ITU Recommendation BT.601-5 [601]
      BT709-2     ITU Recommendation BT.709-2 [709]
      SMPTE240M   SMPTE standard 240M [240M]
      NTSC        The NTSC specification [NTSC]
      PAL         The PAL specification [PAL]

      New values may be registered as described in section 6.2 of RFC XXXX.
Optional parameters:

   Interlace: If this optional parameter is present it indicates that the video stream is interlaced. If absent, progressive scan is implied.

Encoding considerations: Uncompressed video can be transmitted with RTP as specified in RFC XXXX. No file format is defined at this time.

Security considerations: See section 8 of RFC XXXX.

Interoperability considerations: NONE.

Published specification: RFC XXXX.

Applications which use this media type: Video communication.

Additional information: None

Magic number(s): None

File extension(s): None

Macintosh File Type Code(s): None

Person & email address to contact for further information:
   Ladan Gharai
   IETF Audio/Video Transport working group.

Intended usage: COMMON

Author/Change controller: Ladan Gharai

6.2. Parameter Registration

New values of the "colorimetry" parameter MAY be registered with the IANA provided they reference an RFC or other permanent and readily available specification (the Specification Required policy of RFC 2434 [2434]).

7. Mapping to SDP Parameters

Parameters are mapped to SDP [SDP] as in the following example:

   m=video 30000 RTP/AVP 112
   a=rtpmap:112 raw/90000
   a=fmtp:112 color-mode=YUV; sub-sampling=4:2:2; width=1280;
      height=720; depth=10; colorimetry=BT709-2

In this example, a dynamic payload type 112 is used for uncompressed video. The RTP sampling clock is 90 kHz. Note that the "a=fmtp:" line has been wrapped to fit this page, and will be a single long line in the SDP file.

8. Security Considerations

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification, and any appropriate RTP profile. This implies that confidentiality of the media streams is achieved by encryption.
This payload type does not exhibit any significant non-uniformity in the receiver side computational complexity for packet processing that could cause a potential denial-of-service threat.

It is important to note that uncompressed video can have immense bandwidth requirements (up to 270 Mbps for standard definition video, and approximately 1 Gbps for high definition video). This is sufficient to cause potential for denial-of-service if transmitted onto most currently available Internet paths.

Accordingly, if best-effort service is being used, users of this payload format SHOULD monitor packet loss to ensure that the packet loss rate is within acceptable parameters. Packet loss is considered acceptable if a TCP flow across the same network path, and experiencing the same network conditions, would achieve an average throughput, measured on a reasonable timescale, that is not less than the RTP flow is achieving. This condition can be satisfied by implementing congestion control mechanisms to adapt the transmission rate (or the number of layers subscribed for a layered multicast session), or by arranging for a receiver to leave the session if the loss rate is unacceptably high.

This payload format may also be used in networks which provide quality of service guarantees. If enhanced service is being used, receivers SHOULD monitor packet loss to ensure that the service that was requested is actually being delivered. If it is not, then they SHOULD assume that they are receiving best-effort service and behave accordingly.

9. Relation to RFC 2431

In comparison with RFC 2431 [BT656], this memo specifies support for a wider variety of uncompressed video, in terms of frame size, color subsampling and sample sizes. While RFC 2431 can transport up to 4096 scan lines and 2048 pixels per line, this payload format, with its 15 bit scan line number and scan offset fields, can support up to 32k scan lines and pixels per line.
Also, RFC 2431 only addresses 4:2:2 YUV data, while this memo covers YUV and RGB and most common color subsampling schemes. Given the variety of video types covered, this memo also assumes out-of-band signaling for sample size and data types (RFC 2431 uses in-band signaling).

10. Relation to RFC YYYY

(tbd) Relation [292RTP]

11. Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.

However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

12. Acknowledgements

The authors are grateful to Philippe Gentric and Chuck Harrison for their feedback.

13. Authors' Addresses

Ladan Gharai
Colin Perkins
USC Information Sciences Institute
3811 N. Fairfax Drive, #200
Arlington, VA 22203
USA

Normative References

[RTP] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", Internet Engineering Task Force, RFC 1889, January 1996.

[2119] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119.

[2434] T. Narten and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", RFC 2434, October 1998.

Informative References

[274] Society of Motion Picture and Television Engineers, 1920x1080 Scanning and Analog and Parallel Digital Interfaces for Multiple Picture Rates, SMPTE 274M-1998.

[268] Society of Motion Picture and Television Engineers, File Format for Digital Moving Picture Exchange (DPX), SMPTE 268M-1994. (Currently under revision.)

[296] Society of Motion Picture and Television Engineers, 1280x720 Scanning, Analog and Digital Representation and Analog Interfaces, SMPTE 296M-1998.

[372] Society of Motion Picture and Television Engineers, Dual Link 292M Interface for 1920 x 1080 Picture Raster, SMPTE 372M-2002.

[ALF] Clark, D. D., and Tennenhouse, D. L., "Architectural Considerations for a New Generation of Protocols", In Proceedings of SIGCOMM '90 (Philadelphia, PA, Sept. 1990), ACM.

[SDP] M. Handley and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[BT656] D. Tynan, "RTP Payload Format for BT.656 Video Encoding", Internet Engineering Task Force, RFC 2431, October 1998.

[292RTP] L. Gharai et al., "RTP Payload Format for SMPTE 292M Video", Internet Draft, draft-ietf-avt-smpte292-video-07.txt, Work in progress.

[601] International Telecommunication Union, "Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratios", Recommendation BT.601, October 1995.
[656] International Telecommunication Union, "Interfaces for Digital Component Video Signals in 525-line and 625-line Television Systems Operating at the 4:2:2 Level of Recommendation ITU-R BT.601 (Part A)", Recommendation BT.656, April 1998.

[22028] ISO TC42 (Photography), Photography and graphic technology - Extended colour encodings for digital image storage, manipulation and interchange - Part 1: Architecture and requirements, ISO/CD 22028-1, Work in Progress.

[709] ITU Recommendation BT.709-2

[240M] SMPTE Standard 240M

[NTSC] (tbd)

[PAL] (tbd)