Moving Next Generation Audio Production Workflows into the Cloud
With new broadcast standards evolving all over the world and OTT streaming striving for the pole position in media consumption, both next-gen video and audio codecs are gaining momentum. While emerging video formats like VVC/H.266 or AV1 are the major drivers of such initiatives, Next Generation Audio (NGA) production workflows still require more attention and further research and development, especially when targeting live events. In on-premises environments, NGA workflows have been in use for many years. For cloud and hybrid pipelines, debate continues as to the ideal structure.
The production process for real-time events like sports and concerts is complex regardless of whether you are targeting TV broadcast or adaptive bitrate streaming workflows. The authoring step of immersive, object-based audio, which is required to create NGA content such as MPEG-H Audio, needs careful human interaction.
The NGA Codec in Focus: MPEG-H Audio
Fraunhofer IIS is the primary driver behind one of the major NGA codecs: MPEG-H Audio. It delivers personalized immersive sound that offers an unprecedented user experience. MPEG-H Audio is included in the ATSC (North America), DVB (Europe), TTA (South Korea) and SBTVD (Brazil) TV standards. In fact, the world’s first terrestrial UHD TV service in South Korea is already broadcasting MPEG-H Audio. In Brazil, it has been selected as the mandatory audio system for the country’s next-generation DTV+ broadcast service combining the next-gen video codec VVC/H.266 with NGA MPEG-H. Other countries and organizations, including ATSC 3.0 in the US, DVB in Europe, and ARIB in Japan, have been evaluating MPEG-H Audio as their sole or supplemental audio format.
In contrast to traditional audio formats with their fixed stereo or surround mixes, immersive audio codecs such as MPEG-H handle the individual sound elements (e.g. dialogue, commentary, music, sound effects) as separate audio objects. Each object is accompanied by metadata that defines, for example, what the audio element represents, where it should be positioned, when it is active over time, how it should be rendered on different playback devices, what user interactivity is permitted, or its loudness characteristics.
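To make the object-plus-metadata idea concrete, here is a minimal sketch of what per-object metadata conceptually carries. The field names are purely illustrative assumptions for this article, not the actual MPEG-H Audio metadata syntax:

```python
from dataclasses import dataclass

# Hypothetical sketch of per-object metadata; field names are
# illustrative only, not the real MPEG-H Audio metadata format.
@dataclass
class AudioObject:
    label: str            # what the element represents, e.g. "commentary"
    azimuth_deg: float    # horizontal position of the object
    elevation_deg: float  # vertical position of the object
    gain_db: float        # default playback gain
    interactive: bool     # whether the viewer may adjust this object
    gain_range_db: tuple  # (min, max) gain offset the viewer may apply

commentary = AudioObject(
    label="commentary",
    azimuth_deg=0.0,
    elevation_deg=0.0,
    gain_db=0.0,
    interactive=True,
    gain_range_db=(-6.0, 6.0),
)
```

The key point is that position, interactivity limits, and loudness travel with each object rather than being baked into a fixed mix.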
Classic on-premises NGA production workflows are well established and have been running smoothly for many years now. However, broadcasters are increasingly moving parts of their MPEG-H production workflows to the cloud to reduce hardware costs and increase flexibility. As a result, the current approach, in which source video and audio (with or without metadata) must be transferred and processed differently, needs to be rethought and adapted. For NAB 2026, several leading software companies and technology providers put their heads together to introduce a Proof of Concept (POC) showing what an MPEG-H production workflow in the cloud can look like.
A Unique Partner Collaboration of Leading Media Companies
For the POC at NAB, five companies teamed up to present a real-time NGA production chain in the cloud. Each company contributes to the showcase with its proven expertise in the broadcast market:
- Fraunhofer IIS, the research organization behind MPEG-H Audio technology, provides encoding, decoding, and player libraries that enable the entire NGA experience from production through playback.
- Jünger Audio brings deep expertise in audio processing with their flexAI platform, which handles the complex task of real-time metadata authoring and rendering. flexAI enables broadcasters to create and manage NGA audio and metadata streams in live production environments, supporting both S-ADM and MPEG-H Control Track workflows.
- MainConcept provides their professional-grade AVC, HEVC and VVC Encoder as well as Decoder SDK technology combined with both MPEG-H contribution and emission encoding to ensure broadcast-quality video and audio throughout the complete production chain, including multiplexing and packaging for ABR formats like HLS, DASH and CMAF.
- Techex contributes txdarwin, a sophisticated stream processing platform that demultiplexes incoming feeds, routes audio to processing systems, and remultiplexes the processed streams while maintaining frame-accurate time alignment – critical for professional broadcast applications.
- AWS provides the cloud infrastructure that serves as a platform for the complete POC. All partner solutions are deployed on dedicated EC2 instances hosted by AWS.

NAB 2026: An NGA Production Workflow for the Cloud
Historically, the traditional live production workflow was based on on-premises infrastructure, i.e. Outside Broadcasting (OB) vans, studios, and dedicated hardware, that was capable of dealing with uncompressed audio, real-time metadata authoring, and contribution to distribution processing pipelines. Nowadays, cloud technologies have the ability to extend these workflows beyond physical facilities without compromising audio quality, latency or creative control while evolving toward greater immersion and personalization.
The solution that Fraunhofer, Jünger Audio, MainConcept and Techex have set up on AWS for NAB 2026 is a first step to port the complete NGA production workflow to the cloud. Here is the solution architecture and signal workflow at a glance:
- Audio Capture: For live events, this first step is still happening on-site, i.e. at the event location where individual video and audio sources – such as commentary, ambient sounds, and effects – are captured separately as discrete PCM audio signals and processed in the OB van. The separation of the various feeds enables interactivity and immersive rendering later on. MainConcept’s video and audio codec technology used in Contribution Mode receives video and audio (16 channels of PCM audio) over baseband (SDI). The MainConcept MPEG-H Encoder library – powered by Fraunhofer’s immersive audio technology – encodes and outputs the content as an MPEG-H contribution bitstream plus a high-quality HEVC/H.265 video as Transport Stream (TS) over SRT. The SRT feed is then sent to the AWS cloud for processing.
An on-premises installation of Jünger Audio’s AIXpressor can already create metadata and provide monitoring at the event location, including monitoring of the return feed from the cloud.

- Metadata Creation / Authoring: Once the feed is transmitted to the cloud, an EC2 instance of Techex txdarwin receives the SRT-TS feed with the HEVC/H.265 video and MPEG-H contribution audio. The txdarwin solution demultiplexes the incoming feed into its audio and video elementary streams. The resulting audio bitstream is forwarded to Jünger Audio’s flexAIcloud instance via SRT, where the MPEG-H contribution audio is decoded into PCM audio.
In flexAI, the actual authoring of the audio takes place. The user can either edit the incoming metadata or generate it from scratch. During this process, Jünger Audio flexAI reconstructs the metadata and incorporates it into a PCM Control Track defining elements such as:
  - Channel-based beds: 5.1+4H and 7.1+4H immersive mixes, etc.
  - Audio objects: Home and away supporters, players on the pitch, commentators in multiple languages, etc.
  - Presets/presentations: Combinations of beds and objects tailored for different viewer selections, etc.
Once the authoring and rendering are finished, both the audio and the metadata within the Control Track are packaged again into a 16-channel Transport Stream and transferred back to txdarwin as an MPEG-H Audio bitstream over SRT.
In the EC2 instance of txdarwin, the authored audio with the corresponding metadata is once again multiplexed with the video stream. During this process, the Techex solution ensures that the audio and video timestamps are properly aligned to avoid any sync issues in the distribution encoder.
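The alignment step above boils down to comparing presentation timestamps on the MPEG-TS 90 kHz clock. Here is a minimal, wrap-aware sketch of such a check; the clock rate and 33-bit counter follow MPEG-TS conventions, while the 20 ms threshold is an illustrative assumption, not a value from any of the products named here:

```python
# Sketch of an A/V timestamp alignment check on the MPEG-TS clock.
# PTS values run on a 90 kHz clock in a 33-bit counter that wraps.
PTS_CLOCK_HZ = 90_000
PTS_WRAP = 1 << 33

def pts_offset_ms(audio_pts: int, video_pts: int) -> float:
    """Signed audio-vs-video offset in milliseconds, wrap-aware."""
    diff = (audio_pts - video_pts) % PTS_WRAP
    if diff > PTS_WRAP // 2:  # shortest signed distance across the wrap point
        diff -= PTS_WRAP
    return diff * 1000.0 / PTS_CLOCK_HZ

def in_sync(audio_pts: int, video_pts: int, threshold_ms: float = 20.0) -> bool:
    # Threshold is an example value; real systems pick limits per use case.
    return abs(pts_offset_ms(audio_pts, video_pts)) <= threshold_ms
```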
The multiplexed A/V Transport Stream is sent from txdarwin to a MainConcept Video and MPEG-H Emission Encoder to prepare the content for delivery. To make the workflow even more flexible, parallel inputs are possible: a live contribution feed over SRT as well as NDI from a cloud content server, so operators can seamlessly switch between the different streams.
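The beds, objects and presets authored in flexAI can be pictured as a simple data model. The structure and names below are assumptions made for this sketch, not flexAI’s or MPEG-H’s actual data model:

```python
# Illustrative sketch of how presets (presentations) combine a
# channel bed with a selection of audio objects. All names are
# hypothetical examples for a sports production.
BEDS = {
    "stadium_5.1+4H": ["L", "R", "C", "LFE", "Ls", "Rs",
                       "TFL", "TFR", "TBL", "TBR"],
}

OBJECTS = {
    "commentary_en":   {"gain_db": 0.0},
    "commentary_es":   {"gain_db": 0.0},
    "home_supporters": {"gain_db": -3.0},
    "away_supporters": {"gain_db": -3.0},
}

PRESETS = {
    "default_en":   {"bed": "stadium_5.1+4H",
                     "objects": ["commentary_en", "home_supporters", "away_supporters"]},
    "spanish":      {"bed": "stadium_5.1+4H",
                     "objects": ["commentary_es", "home_supporters", "away_supporters"]},
    "stadium_only": {"bed": "stadium_5.1+4H",
                     "objects": ["home_supporters", "away_supporters"]},
}

def resolve_preset(name: str):
    """Return the bed channels and active objects for a viewer-selected preset."""
    preset = PRESETS[name]
    return BEDS[preset["bed"]], [OBJECTS[o] | {"id": o} for o in preset["objects"]]
```

A viewer picking the “spanish” presentation simply swaps one commentary object against another while the immersive bed stays untouched.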

- Distribution: An EC2 instance with MainConcept’s software encoder operating in distribution mode receives the A/V output from txdarwin via SRT-TS. Depending on the target format, the MainConcept video encoder libraries can convert the footage to AVC/H.264, HEVC/H.265 or VVC/H.266. The MainConcept MPEG-H Emission Encoder takes the information from the Control Track to process the audio stream for delivery and outputs the feed as complete HLS, DASH or CMAF content.
The resulting audio/video segments as well as the corresponding manifest or playlist files are uploaded to an Amazon S3 bucket. For finishing the complete MPEG-H production workflow in the cloud, Amazon CloudFront was selected as the CDN for OTT delivery with S3 as its origin.
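Pushing the packaged output to the S3 origin is a straightforward upload, but serving each file with the right Content-Type matters for CDN and player behavior. A minimal sketch using boto3 follows; the bucket name is a placeholder and the extension-to-type mapping reflects common ABR conventions:

```python
# Sketch of uploading HLS/DASH/CMAF output to an S3 origin bucket.
# Bucket name and paths are placeholders for illustration.
import mimetypes

CONTENT_TYPES = {
    ".m3u8": "application/vnd.apple.mpegurl",  # HLS playlist
    ".mpd":  "application/dash+xml",           # DASH manifest
    ".ts":   "video/mp2t",                     # MPEG-TS segment
    ".m4s":  "video/iso.segment",              # CMAF/fMP4 media segment
    ".mp4":  "video/mp4",                      # e.g. init segment
}

def content_type_for(filename: str) -> str:
    for ext, ctype in CONTENT_TYPES.items():
        if filename.endswith(ext):
            return ctype
    return mimetypes.guess_type(filename)[0] or "application/octet-stream"

def upload_output(files, bucket="example-nga-origin"):
    # boto3 is imported lazily so the mapping helper works without AWS installed.
    import boto3
    s3 = boto3.client("s3")
    for path in files:
        s3.upload_file(path, bucket, path,
                       ExtraArgs={"ContentType": content_type_for(path)})
```

With S3 as the origin, CloudFront then handles edge delivery of these manifests and segments to the players.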
- Playback: Any device supporting AVC, HEVC or VVC, MPEG-H Audio and capable of receiving HLS, DASH or CMAF can play back the content in real time. The player on the device decodes the video and the MPEG-H bitstream, which carries the audio plus matching metadata, into PCM audio. The renderer within the decoder applies the metadata to adapt the mix to the current playback setup, such as a soundbar, headphones, TV speakers or a full surround system.
At the same time, viewer interactivity is enabled, allowing the user to select languages, dialogue level, commentary tracks and other personalized features via an on-screen display. All this is standard for MPEG-H Audio technology from Fraunhofer IIS.
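A defining property of this interactivity is that it stays within the limits the content author wrote into the metadata. A tiny sketch of that idea, with purely illustrative limit values:

```python
# Sketch of metadata-constrained interactivity: a viewer-requested
# dialogue-level change is clamped to the range the content author
# allowed. The default limits here are example values only.
def apply_dialog_gain(requested_db: float,
                      min_db: float = -6.0,
                      max_db: float = 9.0) -> float:
    """Clamp a viewer-requested dialogue gain to the authored limits."""
    return max(min_db, min(max_db, requested_db))
```

So the viewer personalizes the mix, but the broadcaster keeps final say over how far that personalization can go.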

The NGA Future in the Cloud is Now!
The impressive expertise and close collaboration of core technology providers in the broadcast and OTT streaming market make it possible to move the NGA production chain using MPEG-H Audio into the cloud. By combining Jünger Audio's competence in sound processing, MainConcept's industry-leading audio and video encoding capabilities, Techex's stream processing platform, and Fraunhofer IIS's MPEG-H technology, all deployed on AWS’ cloud instance portfolio, broadcasters worldwide have access to professional-grade NGA workflows on-premises, in the cloud and in hybrid environments without compromising on quality or functionality.
If you are interested in seeing a demo or talking to the teams who made this a reality, visit Fraunhofer IIS, Jünger Audio (Telos Alliance), MainConcept, Techex and AWS from April 19 – 22, at NAB 2026 in Las Vegas.
Where to find the technology providers at NAB 2026:
- Fraunhofer IIS – MPEG-H Audio: W2343
- MainConcept – A/V Codec Technologies: W1343
- Jünger Audio – flexAI Platform: C1819 (Telos Alliance)
- Techex – Darwin Platform: W2267
- Amazon – AWS Cloud: W1701
Yannik Grewe is Senior Manager, Media Technologies & Business Development at Fraunhofer IIS
Roman Rehausen is Senior Product Manager at Jünger Audio


