MediaEncoder Extensions for Hardware Encoding CR

Introduction

This article outlines the details and parameters required for the Android OS and the MediaCodec class to expose a higher level of control over the hardware codec to Android applications. This document explains the available API for each category on stock Android and highlights the proposed extensions.

Glossary

  • AOSP: Android Open Source Project
  • CABAC: Context-based Adaptive Binary Arithmetic Coding
  • CAVLC: Context-based Adaptive Variable Length Coding
  • DPB: Decoded Picture Buffer
  • IDR: Instantaneous Decoding Refresh
  • LTR: Long Term Reference
  • MB: Macroblock
  • NAL: Network Abstraction Layer
  • PPS: Picture Parameter Set
  • QP: Quantization Parameter
  • ROI: Region Of Interest
  • SPS: Sequence Parameter Set
  • VUI: Video Usability Information

Requirements

Parameter Extensions to MediaRecorder and Hardware Encoders:

Document Reference | Requirement | Priority | Notes
CODEC-EXT-1 | 4.1.2 Static Capability extension | Must have |
CODEC-EXT-2 | 4.2.2 Extended Configuration parameters | Must have |
CODEC-EXT-3 | 4.2.2 Extended Configuration parameters | Must have |
CODEC-EXT-4 | 4.4 Performance considerations | Nice to have |
CODEC-EXT-5 | The hardware provides face detection information of captured frames, generated from a hardware-based face detector. | Nice to have |

Encoder and Decoder – Acceptance Criteria

  • The hardware encoder must support one-in-one-out operation. Specifically, the encoder shall start encoding an input frame immediately after the frame is passed into the encoder and output the encoded frame immediately when completed (no buffering allowed). The average and maximal encoding times must be smaller than 33 ms and 500 ms, respectively. The caller waits for the encoder output before providing the next input sample (see the calling-pattern sketch after this list).
  • The hardware encoder must support dynamic control of IDR insertion, LTR operation, per-frame QP control, and temporal layer count changes on a precise frame basis. The application sets a dynamic request (configuration) before sending the next input buffer to the encoder. The encoder shall apply that dynamic request to the very next input buffer.
  • The hardware encoder must support dynamic resolution changes and profile/level changes within an average one-frame latency (33 ms) over a one-second window. The maximal latency must not exceed 500 ms.
  • The hardware decoder must support one-in-one-out operation. Specifically, the decoder shall start decoding an input compressed frame immediately after the frame is passed into the decoder and output the decoded frame immediately when completed (no buffering allowed). The maximal decoding time must be smaller than 33 ms.
  • The hardware decoder must support dynamic resolution changes, without the need to recreate decoding instances, within an average one-frame latency (33 ms) over a one-second window. The maximal latency must not exceed 500 ms.
  • The duration to create an encoder or a decoder instance must be smaller than 100 ms.
  • The hardware must support multiple concurrent encoding and decoding instances operating at different resolutions, frame rates, profiles, and levels. The hardware shall support 4 encoder instances and 8 decoder instances concurrently. The caller ensures that the sum of macroblock processing rates across all encoding/decoding instances does not exceed what is exposed in the encoder and decoder capabilities, respectively.
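
For illustration, a minimal sketch of the one-in-one-out calling pattern using the synchronous MediaCodec API. hasInputFrames(), fillFrame(), and consumeBitstream() are hypothetical app-side helpers; error and end-of-stream handling are omitted.

MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
long frameIndex = 0;
while (hasInputFrames()) {
    int inIdx = encoder.dequeueInputBuffer(-1);  // block until an input buffer is free
    ByteBuffer inBuf = encoder.getInputBuffer(inIdx);
    int size = fillFrame(inBuf);                 // hypothetical: writes one raw frame, returns its byte count
    long ptsUs = frameIndex++ * 1_000_000L / 30; // 30 fps timestamps, in microseconds
    encoder.queueInputBuffer(inIdx, 0, size, ptsUs, 0);

    // One-in-one-out: wait for this frame's bitstream before offering the next input.
    int outIdx = encoder.dequeueOutputBuffer(info, 500_000 /* us, per the 500 ms bound above */);
    if (outIdx >= 0) {
        consumeBitstream(encoder.getOutputBuffer(outIdx), info); // hypothetical bitstream sink
        encoder.releaseOutputBuffer(outIdx, false);
    }
}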


Hardware Encoder API Changes

The API to query and configure the video component can be classified into the following categories:

  • Static capability queries: Query the capabilities of the decoder/encoder component. These capabilities are fixed for a given platform and do not need a codec instance for the query.
  • Static configurations: These are the initial configurations that are set on the encoder/decoder before start of the session.
  • Dynamic Configurations: These are the dynamic settings that can be applied during an ongoing session.

The graphic below shows the software stack on Android for accessing the video encoder/decoder and the extensions suggested to facilitate implementing the MediaCodec extension requirements:

Static Capability Queries

Static capabilities can be queried from MediaCodec before instantiating the codec using CodecCapabilities.

Static Capabilities supported by AOSP

Reference: static capabilities supported by the default Android system.

The following table and code snippet show how to query them.

Capability | Query API
MaxFrameWidth | VideoCapabilities.getSupportedWidths()
MaxFrameHeight | VideoCapabilities.getSupportedHeights()
MaxInstances | CodecCapabilities.getMaxSupportedInstances()
MaxMacroblockProcessingRate | VideoCapabilities.getAchievableFrameRatesFor(width, height)
  • MaxFrameWidth is the maximal supported width of the input frame, in units of pixels.
  • MaxFrameHeight is the maximal supported height of the input frame, in units of pixels.
  • MaxInstances is the maximal possible number of concurrent encoding instances supported across the platform.
  • MaxMacroblockProcessingRate is the maximal macroblock processing rate (in unit of macroblocks per second) supported by the encoder/decoder across all encoding instances.

Example:

MediaCodecInfo.CodecCapabilities caps = codecInfo.getCapabilitiesForType("video/avc");
MediaCodecInfo.VideoCapabilities vCaps = caps.getVideoCapabilities();
int nMaxInstances = caps.getMaxSupportedInstances();
if (vCaps != null) {
    Range<Integer> widths = vCaps.getSupportedWidths();
    int nMaxFrameWidth = widths.getUpper();
    Range<Integer> heights = vCaps.getSupportedHeights();
    int nMaxFrameHeight = heights.getUpper();
    int nMaxFrameRate = vCaps.getAchievableFrameRatesFor(nMaxFrameWidth, nMaxFrameHeight).getUpper().intValue();
    // macroblocks per frame times achievable frame rate = macroblocks per second
    int nMaxMacroblockProcessingRate = (nMaxFrameWidth / 16) * (nMaxFrameHeight / 16) * nMaxFrameRate;
}

Static Capability Extension

For querying additional capabilities on API level 26 and above, a new approach is described below; for API levels below 26, please refer to spec v0.2. Capabilities are now treated as extension parameters to the MediaCodec. Capability extension parameters can be set on a MediaCodec during configure() to indicate the intent to query the capability. The actual value of the requested capability is reflected in the output format after configuration.

The following code demonstrates the method to query capabilities from encoder and decoder component:

// capability keys
private static final DefaultHashMap<String, String> mCAPS = new DefaultHashMap<>();
// extension keys
private static final DefaultHashMap<String, String> mKEYS = new DefaultHashMap<>();
static {
    mCAPS.put("version", "vendor.rtc-ext-enc-caps-vt-driver-version.number");
    mCAPS.put("maxLayers", "vendor.rtc-ext-enc-caps-temporal-layers.max-p-count");
    mKEYS.put("profile", "vendor.rtc-ext-enc-custom-profile-level.profile");
    mKEYS.put("level", "vendor.rtc-ext-enc-custom-profile-level.level");
    mKEYS.put("numLtrFrames", "vendor.rtc-ext-enc-ltr-count.num-ltr-frames");
    mKEYS.put("decodeOrder", "vendor.rtc-ext-dec-picture-order.enable");
}

// set capability parameters on the format passed to configure() to request the query
format.setInteger(mCAPS.get("version"), 0);
format.setInteger(mCAPS.get("maxLayers"), 0);

// the queried values are reflected in the output format
public void onOutputFormatChanged(MediaCodec codec, MediaFormat format) {
    if (format.containsKey(mCAPS.get("version"))) {
        version = format.getInteger(mCAPS.get("version"));
    }
    if (format.containsKey(mCAPS.get("maxLayers"))) {
        maxLayers = format.getInteger(mCAPS.get("maxLayers"));
    }
    format.setInteger(mKEYS.get("decodeOrder"), 1); // needed to speed up decoder response
    decoder.configure(format, null, null, 0);
    decoderConfigured = true;
}

The following table maps the extended capability queries to the specific API:

Capability | Parameter Key | Applicable
DriverVersion | "vendor.rtc-ext-enc-caps-vt-driver-version.number", "vendor.rtc-ext-dec-caps-vt-driver-version.number" | Enc/Dec
LowLatency | "vendor.rtc-ext-enc-low-latency.enable", "vendor.rtc-ext-dec-low-latency.enable" | Enc/Dec
MaxLTRFrames | "vendor.rtc-ext-enc-caps-ltr.max-count" | Enc
Resize Support | "vendor.rtc-ext-enc-caps-preprocess.max-downscale-factor" | Enc
Rotation Support | "vendor.rtc-ext-enc-caps-preprocess.rotation" | Enc
  • DriverVersion is the hardware driver version. The driver version value of a later driver must be greater than that of an older version. This is a static API. The driver version should serve as a unique value that allows the app to blacklist a buggy driver or whitelist a good driver against some chip. For example, if a bug is found in driver N but the IHV later fixes it in driver N+k, the app can get the DriverVersion parameter via the MediaCodec extension and blacklist drivers with values from N to N+k-1 (see the sketch after this list). The IHV may define this field to be a meaningful value (e.g., to include an internal build number) as long as it follows the above rule.
  • LowLatency enforces the encoder/decoder to run in low-latency mode. When the value is TRUE, the encoder must (1) enforce 1-in-1-out behavior and (2) generate bitstreams with the corresponding syntax elements.
  • MaxLTRFrames is the maximal number of LTR frames supported by the encoder. The value must be smaller than or equal to nMaxRefFrames and greater than or equal to 3.
  • Resize Support indicates which downscaling factors the encoder supports when combined resizing and encoding is supported.
  • Rotation Support indicates whether the encoder supports rotation.
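
For illustration, a minimal sketch of how an app might use DriverVersion to screen out a known-bad driver range. The key string comes from the table above; BAD_FIRST and BAD_LAST are hypothetical app-maintained values.

// Hypothetical app-side driver screening based on the DriverVersion capability.
static final String KEY_DRIVER_VERSION = "vendor.rtc-ext-enc-caps-vt-driver-version.number";
static final int BAD_FIRST = 1234; // hypothetical: first known-bad driver version N
static final int BAD_LAST = 1236;  // hypothetical: last known-bad version N+k-1

boolean isDriverUsable(MediaFormat outputFormat) {
    if (!outputFormat.containsKey(KEY_DRIVER_VERSION)) {
        return true; // capability not reported; assume usable
    }
    int version = outputFormat.getInteger(KEY_DRIVER_VERSION);
    return version < BAD_FIRST || version > BAD_LAST;
}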

NOTE: The capabilities shall be published in the platform's media_codecs.xml alongside the AOSP-mandated capabilities, according to the hardware capability.

Example – setting static parameters:

MediaFormat format = MediaFormat.createVideoFormat("video/avc", width, height);
format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
// set extended params
format.setInteger("vendor.rtc-ext-enc-low-latency.enable", 1);
format.setInteger("prepend-sps-pps-to-idr-frames", 1);

Static Configuration Parameters

Static configuration parameters are specified to MediaCodec via configure(MediaFormat, …). The parameters and their values are aggregated as key-value pairs in the MediaFormat.

Configuration parameters supported by AOSP

The following configuration parameters are supported by AOSP:

Parameter | AOSP MediaFormat key
FrameWidth | KEY_WIDTH
FrameHeight | KEY_HEIGHT
Bitrate | KEY_BIT_RATE
Framerate | KEY_FRAME_RATE
ColorFormat | KEY_COLOR_FORMAT
MaxTemporalLayerCount | KEY_TEMPORAL_LAYERING
  • Framerate indicates the nominal (highest) input frame rate. The encoder shall be able to handle any run-time input frame rate lower than this number properly (e.g., some cameras extend exposure time under low-light conditions, which results in frame rates lower than the nominal setting). For example, when encoder rate control is in use, the encoder shall not assume the input frame rate is always equal to Framerate but shall refer to the timestamp value of each frame to update HRD buffer occupancy. A configuration sketch using these keys follows.
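
A minimal sketch of an initial encoder configuration using the AOSP keys listed above; the resolution, bitrate, and color format values are illustrative, and encoder is an already-created MediaCodec instance.

MediaFormat fmt = MediaFormat.createVideoFormat("video/avc", 1280, 720); // sets KEY_WIDTH/KEY_HEIGHT
fmt.setInteger(MediaFormat.KEY_BIT_RATE, 1_000_000); // illustrative 1 Mbps target
fmt.setInteger(MediaFormat.KEY_FRAME_RATE, 30);      // nominal (highest) input frame rate
fmt.setInteger(MediaFormat.KEY_COLOR_FORMAT,
        MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Flexible);
encoder.configure(fmt, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);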

Extended Configuration Parameters

IHVs shall add support for accepting the following parameters via MediaFormat and for setting the same on the OMX component.

All of the following parameters shall be added to the MediaFormat as integer values using setInteger(key, value), except MaxTemporalLayerCount ("ts-schema"), which takes a string value.

Parameter | Extended MediaFormat key | Valid integer range | Applicable
Level | "vendor.rtc-ext-enc-custom-profile-level.level" | Values in {OMX_VIDEO_AVCLEVELTYPE} | Enc
Profile | "vendor.rtc-ext-enc-custom-profile-level.profile" | Values in {OMX_VIDEO_EXTENSION_AVCPROFILETYPE} | Enc
SliceHeaderSpacing | "vendor.rtc-ext-enc-slice.spacing" | [0, INT_MAX), MBs in one slice | Enc
SequenceHeaderWithIDR | "prepend-sps-pps-to-idr-frames" | {0, 1} | Enc
RateControl | "bitrate-mode" (SDK < 28); "vendor.rtc-ext-enc-bitrate-mode.value" (SDK >= 28) | Values in {OMX_VIDEO_CONTROLRATETYPE} | Enc
LTRFrames | "vendor.rtc-ext-enc-ltr-count.num-ltr-frames" | [0, max-num-ltr-frames) | Enc
MaxTemporalLayerCount | "ts-schema" | String "android.generic.N", N = 0, 1, .. max-layer-count-1 | Enc
SarWidth | "vendor.rtc-ext-enc-sar.width" | [0, width) | Enc
SarHeight | "vendor.rtc-ext-enc-sar.height" | [0, height) | Enc
Rotation | "vendor.rtc-ext-enc-preprocess-rotate.angle" | {0, 90, 180, 270} | Enc
InputQueueControl | "vendor.rtc-ext-enc-app-input-control.enable" | {0, 1} | Enc
DownScaleWidth | "vendor.rtc-ext-down-scalar.output-width" | Integral values | Enc
DownScaleHeight | "vendor.rtc-ext-down-scalar.output-height" | Integral values | Enc

Example – resize settings:

int input_width = 1280; 
int input_height = 720; 
int output_width = 960; 
int output_height = 540;

MediaFormat format = MediaFormat.createVideoFormat("video/avc", input_width, input_height);
format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);

// set downscaled resolution params
format.setInteger("vendor.rtc-ext-down-scalar.output-width", output_width);
format.setInteger("vendor.rtc-ext-down-scalar.output-height", output_height);
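
Similarly, a hedged sketch of setting a custom profile/level, LTR frame count, and temporal layer schema from the table above. The numeric profile/level values are illustrative; real code should use the matching OMX_VIDEO_AVCPROFILETYPE / OMX_VIDEO_AVCLEVELTYPE enum values.

MediaFormat format = MediaFormat.createVideoFormat("video/avc", 1280, 720);
format.setInteger("vendor.rtc-ext-enc-custom-profile-level.profile", 0x02 /* illustrative: Main */);
format.setInteger("vendor.rtc-ext-enc-custom-profile-level.level", 0x2000 /* illustrative: Level 3.1 */);
format.setInteger("vendor.rtc-ext-enc-ltr-count.num-ltr-frames", 4); // must not exceed MaxLTRFrames
format.setString("ts-schema", "android.generic.2"); // two temporal layers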

Dynamic Configuration Parameters

MediaCodec.setParameters() can be used to set or enable configurations on the codec dynamically. These settings take effect when the codec is in the Executing state (i.e., after MediaCodec.start() has been issued).

Dynamic Configuration Supported by AOSP

The following Dynamic configurations are supported by AOSP:

Config | Parameter KEY
Request key frame | PARAMETER_KEY_REQUEST_SYNC_FRAME
Bitrate | PARAMETER_KEY_VIDEO_BITRATE
Temporal Layer count | KEY_TEMPORAL_LAYERING
  • Request key frame instructs the encoder to encode the next base layer frame as an IDR frame. When this control is applied, the encoder shall generate an IDR frame in the next base layer. When a dynamic IDR frame request is signaled to the encoder, the encoder must continue to encode frames according to the current temporal layer count pattern until it reaches the base layer frame, and then it must start the new temporal layer pattern with the first base layer frame as an IDR frame. In other words, the encoder shall not break the dyadic temporal pattern. The encoder shall not queue up this control. If an IDR frame request is submitted while another request is still pending, the older request should be dropped.
  • Temporal Layer Count specifies the number of temporal layers in the bitstream. When a new temporal layer count is signaled to the encoder, the encoder must continue to encode frames according to the old temporal layer count pattern until it reaches the base layer frame, and then it must start the new temporal layer pattern without introducing a new IDR if SPS syntax values are unchanged. The default value of temporal layer count is 1. The encoder shall not queue up this control. If a temporal layer count request is submitted while another request is still pending, the older request should be dropped. (NOTE: This is a dynamic control that shall not incur an encoder reset when the temporal layer count changes from 1 to 2 or vice versa.) A usage sketch follows this list.
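
A minimal sketch of applying these AOSP dynamic controls while the codec is executing; the bitrate value is illustrative.

Bundle params = new Bundle();
params.putInt(MediaCodec.PARAMETER_KEY_REQUEST_SYNC_FRAME, 0);  // request an IDR on the next base layer frame
params.putInt(MediaCodec.PARAMETER_KEY_VIDEO_BITRATE, 800_000); // illustrative new target bitrate, in bps
encoder.setParameters(params);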

Extended Dynamic Configurations

The following dynamic configurations, corresponding to OMX dynamic configurations, should be added by hardware codec vendors:

Config | Extended Parameter KEY | Applicable
MarkLTR | "vendor.rtc-ext-enc-ltr.mark-frame" | Enc
UseLTR | "vendor.rtc-ext-enc-ltr.use-frame" | Enc
FrameQP | "vendor.rtc-ext-enc-frame-qp.value" | Enc
BaseLayerPID | "vendor.rtc-ext-enc-base-layer-pid.value" | Enc
InputCrop | "vendor.rtc-ext-enc-input-crop.left", "vendor.rtc-ext-enc-input-crop.right", "vendor.rtc-ext-enc-input-crop.width", "vendor.rtc-ext-enc-input-crop.height" | Enc
  • MarkLTR instructs the encoder to mark the next base layer frame as an LTR frame. When this control is applied, the encoder shall generate an LTR frame in the next base layer with the H.264 variable long_term_frame_idx equal to nLongTermFrmIdx. When an LTR frame request is signaled to the encoder, the encoder must continue to encode frames according to the current temporal layer pattern until it reaches the base layer frame, and then it must start the new temporal layer pattern with the first base layer frame as an LTR frame. The encoder shall not queue up this control. If a mark LTR request is submitted while another request is still pending, the older request should be dropped.
  • UseLTR instructs the encoder to use previously marked LTR frame(s) to encode the next frame (see the sketch after the guidelines below). When this control is applied, the encoder must only use LTR frames with long_term_frame_idx specified in the bitmask of nUsedLTRFrameBM as reference frames. The encoder shall encode subsequent frames in encoding order subject to the following constraints:
    • It shall not use short-term reference frames older in encoding order than the current frame for future encoding.
    • It shall not use LTR frames not described by the most recent use LTR control.
    • It may use LTR frames updated during or after the current frame (including the current frame if it is marked as LTR).

When an LTR request is signaled to the encoder, the encoder shall follow the below implementation guidelines in supporting this control:

  • When applying this control, memory management control operation equal to 1 shall not be used, as it is not error resilient and will cause a DPB state mismatch between the client/decoder side and the server/encoder side upon bit loss. Instead, sliding window DPB management and list reordering syntaxes shall be used.
  • When memory management control operation equal to 6 is present, memory management control operation equal to 2 can be redundant when they have the same long-term frame index (LongTermFrameIdx), i.e., the two operations in the same slice header work on the same long-term frame index. In this case, it is recommended not to include the redundant memory management control operation equal to 2.
  • To achieve the best coding efficiency in general, it is recommended to use the nearest temporal neighbor in picture order count (POC) for reference when multiple long-term references are available, unless the encoder has some logic to choose the best reference frame.

The encoder shall not queue up this control. If a use LTR request is submitted while another request is still pending then the older request should be dropped.
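
A hedged sketch of the mark/use flow with the extension keys above; ltrIdx and the bitmask value are illustrative.

// Mark the next base layer frame as an LTR frame with index ltrIdx.
Bundle mark = new Bundle();
mark.putInt("vendor.rtc-ext-enc-ltr.mark-frame", ltrIdx); // illustrative long_term_frame_idx to assign
encoder.setParameters(mark);

// ... later, e.g., after the receiver reports a loss ...

// Restrict prediction of the next frame to the LTR frames in the bitmask.
Bundle use = new Bundle();
use.putInt("vendor.rtc-ext-enc-ltr.use-frame", 1 << ltrIdx); // bitmask of usable long_term_frame_idx values
encoder.setParameters(use);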

  • FrameQP dynamically specifies the quantization parameter (QP) value of the next frame. The default QP is 34. If a frame QP request is submitted while another request is still pending, the older request should be dropped. When frame QP is not specified for a frame, the QP value of the previous frame shall be used. When this control is called, the encoder shall infer that external rate control performed by the app is in use and ignore Bitrate in OMX_VIDEO_PORTDEFINITIONTYPE.

  • BaseLayerPID changes the value of H.264 syntax element priority_id of the base temporal layer (i.e. with temporal_id equal to 0), starting from the next base layer frame. The value of priority_id of enhancement temporal layers is defined as base priority_id plus temporal_id. The default value of base priority_id is 0. This is a dynamic control that shall not introduce an IDR frame. The encoder shall not queue up this control. If a base layer priority ID request is submitted while another request is still pending then the older request should be dropped.
  • InputCrop, in combination with the input frame size and the output bitstream resolution, specifies crop, scaling, and combined crop/scaling operations.
    • Scaling only: when the input picture frame size is different from the bitstream frame size, the encoder shall resize and encode the frame (or fail the operation if pre-processing is not supported).
    • Cropping only: if the crop rectangle is equal to the bitstream frame size, the encoder shall crop and encode the frame (or fail the operation if pre-processing is not supported).
    • Cropping and scaling: if the crop rectangle is not equal to the bitstream frame size, the encoder shall crop, resize, and then encode the frame (or fail the operation if pre-processing is not supported).

In case the IHV cannot guarantee that issued dynamic configuration parameters always take effect at the next encoded frame (especially in Surface mode), it is recommended to use an additional parameter to ensure synchronized behavior, as follows:

Config | Extended Parameter KEY | Applicable
Mark timestamp of dynamic configurations | "vendor.rtc-ext-enc-input-trigger.timestamp" (unit: us, int64) | Enc

The timestamp shall correspond to:

  • In buffer-to-buffer mode, the presentationTimeUs that is set via queueInputBuffer(int index, int offset, int size, long presentationTimeUs, int flags)
  • In Surface mode, the timestamp that is set via EGLExt.eglPresentationTimeANDROID(context.getEGLDisplay(), eglSurface, timestamp)
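
For illustration, a sketch of pinning a dynamic control to a specific input frame via the trigger timestamp (buffer-to-buffer mode); ptsUs is the presentation timestamp of the frame the control should land on, and the QP value is illustrative.

// Pin the pending dynamic request to the input frame with timestamp ptsUs.
Bundle trigger = new Bundle();
trigger.putInt("vendor.rtc-ext-enc-frame-qp.value", 30);              // the dynamic control itself
trigger.putLong("vendor.rtc-ext-enc-input-trigger.timestamp", ptsUs); // int64, microseconds
encoder.setParameters(trigger);

// Queue the frame the control applies to, with a matching presentationTimeUs.
encoder.queueInputBuffer(inIdx, 0, size, ptsUs, 0);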

Example:

// set crop rectangle
final Bundle cropRect = new Bundle();
cropRect.putInt("vendor.rtc-ext-enc-input.crop-left ", 0);
cropRect.putInt("vendor.rtc-ext-enc-input.crop-right", 0);
cropRect.putInt("vendor.rtc-ext-enc-input.crop-width", 320);
cropRect.putInt("vendor.rtc-ext-enc-input.crop-height", 240);
encoder.setParameters(cropRect);

// loop to queue inputs

//condition to change QP

final Bundle frameQP = new Bundle();
frameQP.putInt("vendor.rtc-ext-enc-frame-qp.value ", 51);
encoder.setParameters(frameQP);

Performance Considerations

Since Skype requires some margin for real-time encoding (e.g., 30%), some IHVs suggest requesting the codec to operate at a higher rate than expected; on the Skype app side, the following setting may be applied (see the sketch after the table). Moreover, some IHVs might require a dedicated color format for input and output to let the decoder work with a hardware-accelerated format while still giving the application a way to interpret the YUV data, so the Skype app might set:

Configuration | Parameter KEY | Value | Applicable
Operating rate | "operating-rate" | 60 (for 30 fps) | Enc & Dec
Color format | MediaFormat.KEY_COLOR_FORMAT | COLOR_FormatYUV420Flexible | Dec
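
A minimal sketch applying both settings; encFormat and decFormat are the MediaFormat objects passed to the encoder's and decoder's configure() calls.

// Request 2x headroom for a 30 fps real-time stream, per the table above.
encFormat.setInteger("operating-rate", 60);
// Ask the decoder for a flexible YUV 4:2:0 output the app can interpret.
decFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT,
        MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Flexible);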