Adam Solove
3 Jun 2026

Designing a better podcast editor

A better concept model and more efficient tools for editing spoken word audio.

By Adam Solove · Published 3 Jun 2026 · Reading time 9 min

For the past few years, my partner has recorded and edited a niche podcast while I’ve helped a bit with selecting music and EQ settings. Our workflow was painful: one shared iCloud file, emails of notes coded to timestamps, carefully checking that our changes didn’t conflict.

Editing audio was like traveling back in time twenty years: no track changes, no comments, and no multiplayer editing.

The friction wasn’t just in the file-oriented workflow. The normal interaction model of Digital Audio Workstations (or DAWs) is not particularly well-suited for the task of editing spoken word audio.

So I decided to build the podcast editor we wanted: Ducking. It has a UI purpose-built for laying out spoken word audio, plus multiplayer editing, collaboration tools, and history management. In this post, I’ll talk about the improvements it makes to editing tools. Future posts will discuss the engineering challenges of multiplayer audio editing and the pleasure of building software for just a few users with design sketching techniques and LLM assistance.

Screenshot of Ducking — waveform timeline, transcript pane, history panel, two cursors on different clips
Screenshot of Ducking in use, with the comments and effects panels open.

How does Ducking make editing podcasts easier? It focuses on providing better tools for the two most common recurring tasks: laying out audio and navigating it.

Ducking itself was built specifically for our podcast workflow, serves its purpose doing that, and won’t be public anytime soon. But I hope that some of these ideas will spread into other tools and be more broadly useful.

Throughout this post, I’ll show simplified animations of the features in action to avoid distracting with other parts of the editing UI.

Audio layout

Like laying out a newspaper or a webpage, audio editing starts with roughly trying out how different parts fit together, then moves to refining each placement precisely, without messing up the choices already made.

Ducking provides an audio layout concept model that is much faster to work with, by borrowing ideas from other DAWs, text editors, and even further afield.

From absolute to magnetic time

In a traditional DAW, every clip has an absolute start time. When one clip is moved or edited for length, everything after drifts out of alignment.

Absolute layout — trimming any clip leaves a silent gap or overlaps the next clip.

Absolute layout is the right model for writing songs, where material in one measure should stay there. But it’s the wrong model for editing spoken word material, where the default is to reflow later material as earlier bits change.

The right layout model is a magnetic timeline, where clips are ordered, not positioned. Each clip’s place in time is computed from the lengths of the items before it. So when one clip is added, removed, or edited, everything after just re-flows automatically.

Gap clips allow adding explicitly-timed silence when that is needed.

Magnetic layout — clips and gaps reflow when you trim.
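To make the reflow idea concrete, here is a minimal sketch of a magnetic layout pass. It is not Ducking's actual code: the `TimelineItem` and `PlacedItem` types and the `layoutMagnetic` function are names invented for illustration, and durations are assumed to be whole milliseconds.

```typescript
// Hypothetical item types, for illustration only.
type TimelineItem =
  | { kind: "clip"; id: string; durationMs: number }
  | { kind: "gap"; durationMs: number };

interface PlacedItem {
  item: TimelineItem;
  startMs: number;
}

// An item's start time is never stored. It is the running sum of the
// durations of everything before it on the track.
function layoutMagnetic(items: TimelineItem[]): PlacedItem[] {
  const placed: PlacedItem[] = [];
  let cursorMs = 0;
  for (const item of items) {
    placed.push({ item, startMs: cursorMs });
    cursorMs += item.durationMs;
  }
  return placed;
}

const track: TimelineItem[] = [
  { kind: "clip", id: "intro", durationMs: 4000 },
  { kind: "gap", durationMs: 500 }, // explicitly timed silence
  { kind: "clip", id: "interview", durationMs: 9000 },
];

const startsBeforeTrim = layoutMagnetic(track).map((p) => p.startMs);

// Trim one second off the intro. Nothing else is edited, yet the gap and
// the interview both move one second earlier when the layout is recomputed.
const trimmedTrack: TimelineItem[] = track.map((item) =>
  item.kind === "clip" && item.id === "intro"
    ? { ...item, durationMs: item.durationMs - 1000 }
    : item
);
const startsAfterTrim = layoutMagnetic(trimmedTrack).map((p) => p.startMs);

console.log(startsBeforeTrim); // [0, 4000, 4500]
console.log(startsAfterTrim); // [0, 3000, 3500]
```

Because positions are derived rather than stored, there is no separate "shift everything after this clip" step that could be forgotten or applied inconsistently.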

This is the model used by many video editing tools, as well as by audio tools that focus on spoken word, like Hindenburg, so the idea itself isn’t new. But it is a useful first step, and it suggests that pushing further on the idea of automated layout might pay off.

From splits to skip regions

The vast majority of podcast editing is repeatedly removing tiny bits of unwanted material like filler words, long pauses, or a flubbed sentence. In most audio editors, that means splitting each recording clip into lots of tiny parts and adjusting their alignment. After doing that dozens of times, the timeline view becomes a huge set of disconnected clips that are hard to scan or reorganize.

Without skip regions, every filler removal splits a clip in two. One more cut and you're up to ten detached fragments — none of them carrying any indication that they belong to the same original take.

Ducking uses “skip regions” as a better solution. The editor can mark part of a clip as not to be used while leaving the clip itself whole. The recording stays a single, mostly intact unit, so it’s easier to understand and rearrange, and it still shows where material has been removed.

Skip regions — fold a portion of a clip without splitting it in two.

The skip region acts like code folding in a text editor. It leaves a visible indication and can be unfolded to interact with the skipped audio or change the region’s start and end.
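One way to picture the data behind a skip region is a clip that keeps its full source recording plus a list of folded ranges. The sketch below is an assumption about how such a model could work, not a description of Ducking's internals; `FoldableClip`, `SkipRegion`, `playableSegments`, and `playedDurationMs` are hypothetical names.

```typescript
// Hypothetical shapes. Offsets are ms from the start of the source recording.
interface SkipRegion {
  startMs: number; // inclusive
  endMs: number; // exclusive
}

interface FoldableClip {
  id: string;
  sourceDurationMs: number;
  skips: SkipRegion[];
}

interface Segment {
  startMs: number;
  endMs: number;
}

// Returns the parts of the recording that should actually play, in order.
// Skips may arrive unsorted or overlapping, so they are sorted first and
// the cursor only ever moves forward.
function playableSegments(clip: FoldableClip): Segment[] {
  const clamp = (ms: number) => Math.max(0, Math.min(ms, clip.sourceDurationMs));
  const skips = [...clip.skips].sort((a, b) => a.startMs - b.startMs);
  const segments: Segment[] = [];
  let cursorMs = 0;
  for (const skip of skips) {
    const skipStart = clamp(skip.startMs);
    const skipEnd = clamp(skip.endMs);
    if (skipEnd <= skipStart) continue; // ignore empty or inverted regions
    if (skipStart > cursorMs) {
      segments.push({ startMs: cursorMs, endMs: skipStart });
    }
    cursorMs = Math.max(cursorMs, skipEnd);
  }
  if (cursorMs < clip.sourceDurationMs) {
    segments.push({ startMs: cursorMs, endMs: clip.sourceDurationMs });
  }
  return segments;
}

// The length the clip contributes to the timeline once skips are folded away.
function playedDurationMs(clip: FoldableClip): number {
  return playableSegments(clip).reduce((sum, s) => sum + (s.endMs - s.startMs), 0);
}

const take: FoldableClip = {
  id: "take-1",
  sourceDurationMs: 10000,
  skips: [
    { startMs: 6000, endMs: 7500 }, // a flubbed sentence
    { startMs: 2000, endMs: 2400 }, // an "um"
  ],
};

console.log(playableSegments(take)); // 0-2000, 2400-6000, 7500-10000
console.log(playedDurationMs(take)); // 8100
```

In this model, unfolding a region is just removing or resizing an entry in `skips`. The source audio is never cut, which is what keeps the take recognizable as one unit.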

Pin-based alignment

So far the editor has only been working with a single track of audio. The problem gets harder as we add more parallel tracks.

Perhaps the trickiest case is transition music. The editor will usually want it to play gently underneath the end of one section, swell to be the main focus, and then duck back beneath the beginning of the next section.

Most audio tools allow laying out the second track either in absolute time or by connecting the start of a clip in one track to a specific place in another, which allows the second track to float along with the rest of the magnetic timeline.

Single connector — music clip tied to the end of a specific speech clip.

With the connector model, the editor can construct any particular transition, but they have to struggle a bit to translate between the creative vision they have and the set of tools that enables them to reach it.

Analyzing the creative decisions that they are trying to make, editors really care about:

So I built a set of layout tools that correspond exactly to that set of creative decisions. Using the pin-based layout system, the editor gets to pick which part of the music should play at the same time as which part of the preceding and following spoken word clips. Then they can independently control the volume or other effects on each track so that they layer properly.

Pins and constraints — two-tie constraint layout with fade points and automation.
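The post doesn't spell out how pins are resolved, so the following is only one plausible reading, sketched under stated assumptions: each pin says "this point in the music coincides with this point in a speech clip", the first pin positions the music relative to the preceding clip, and the second pin then positions the following clip relative to the music. The `Pin`, `TransitionInput`, and `resolveTransition` names are hypothetical.

```typescript
// Hypothetical pin model, for illustration only.
interface Pin {
  musicOffsetMs: number; // a point inside the music clip
  speechOffsetMs: number; // the point inside the speech clip it should coincide with
}

interface TransitionInput {
  precedingStartMs: number; // start of the preceding speech clip, from the magnetic layout
  precedingDurationMs: number;
  leadIn: Pin; // ties the music to the preceding speech clip
  leadOut: Pin; // ties the music to the following speech clip
}

interface TransitionLayout {
  musicStartMs: number;
  followingStartMs: number;
  gapMs: number; // stretch between the two speech clips where only music plays
}

function resolveTransition(t: TransitionInput): TransitionLayout {
  // Pin 1: the pinned music point lands on the pinned point of the preceding clip.
  const musicStartMs =
    t.precedingStartMs + t.leadIn.speechOffsetMs - t.leadIn.musicOffsetMs;
  // Pin 2: the following clip is placed so its pinned point lands on the music.
  const followingStartMs =
    musicStartMs + t.leadOut.musicOffsetMs - t.leadOut.speechOffsetMs;
  const precedingEndMs = t.precedingStartMs + t.precedingDurationMs;
  const gapMs = followingStartMs - precedingEndMs;
  if (gapMs < 0) {
    throw new Error("These pins would make the two speech clips overlap");
  }
  return { musicStartMs, followingStartMs, gapMs };
}

const pins = {
  // The music's 2s mark plays 3s before the preceding section ends.
  leadIn: { musicOffsetMs: 2000, speechOffsetMs: 27000 },
  // The music's 12s mark plays 1s into the following section.
  leadOut: { musicOffsetMs: 12000, speechOffsetMs: 1000 },
};

const transition = resolveTransition({
  precedingStartMs: 60000,
  precedingDurationMs: 30000,
  ...pins,
});
console.log(transition); // musicStartMs 85000, followingStartMs 96000, gapMs 6000

// If earlier edits pull the preceding clip 5s earlier, the whole transition follows.
const shifted = resolveTransition({
  precedingStartMs: 55000,
  precedingDurationMs: 30000,
  ...pins,
});
console.log(shifted); // musicStartMs 80000, followingStartMs 91000, gapMs 6000
```

The point of the sketch is that the editor states the overlap they want once, and the length of the music-only stretch falls out of the two pins instead of being adjusted by hand. Volume automation and fades would sit on top of this and are left out here.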

Combining all of these elements — magnetic timeline, skip regions, and constraints between tracks — removes the layout busywork from audio editing and lets the editor focus directly on the emotional experience they’re trying to achieve.

Before any edit action can happen, first the editor has to understand and choose what to edit. Navigating audio happens in a few ways:

Ducking’s UI makes it easy to navigate in any of these ways by establishing correspondences between them. Rotating the timeline editing tools 90º, so that time runs vertically, lets the overview, waveform, and transcript views share one alignment and scroll together.

Below is a low-fidelity interactive mockup that shows the core ideas.

Scrolling through a project. The timeline, transcript, and overview thumb stay in lockstep — the golden playhead marks the current position in all three.
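The lockstep behavior can be thought of as every view being a pure function of one shared playhead time. Here is a small sketch of that idea, assuming a vertical timeline with a fixed zoom and a transcript with per-word start times. None of these names (`ViewConfig`, `positionsFor`, `timeAtScroll`, `wordIndexAt`) come from Ducking itself.

```typescript
// Hypothetical view model, for illustration only.
interface Word {
  text: string;
  startMs: number;
}

interface ViewConfig {
  pxPerSecond: number; // zoom level of the vertical timeline
  projectDurationMs: number;
  overviewHeightPx: number; // the overview always shows the whole project
}

interface SyncedPosition {
  timelineScrollPx: number;
  overviewThumbPx: number;
  wordIndex: number; // -1 if the time is before the first word
}

// Index of the last word that starts at or before timeMs. Assumes words are
// sorted by startMs. A linear scan keeps the sketch short; a binary search
// would suit long transcripts better.
function wordIndexAt(words: Word[], timeMs: number): number {
  let found = -1;
  words.forEach((word, i) => {
    if (word.startMs <= timeMs) found = i;
  });
  return found;
}

// Every view position is derived from the same time value.
function positionsFor(timeMs: number, words: Word[], view: ViewConfig): SyncedPosition {
  return {
    timelineScrollPx: (timeMs * view.pxPerSecond) / 1000,
    overviewThumbPx: (timeMs * view.overviewHeightPx) / view.projectDurationMs,
    wordIndex: wordIndexAt(words, timeMs),
  };
}

// The inverse mapping: scrolling the timeline updates the shared time.
function timeAtScroll(scrollPx: number, view: ViewConfig): number {
  return (scrollPx * 1000) / view.pxPerSecond;
}

const view: ViewConfig = { pxPerSecond: 20, projectDurationMs: 600000, overviewHeightPx: 400 };
const words: Word[] = [
  { text: "Welcome", startMs: 89000 },
  { text: "back", startMs: 89500 },
  { text: "everyone", startMs: 90000 },
];

const playhead = positionsFor(90000, words, view);
console.log(playhead); // timelineScrollPx 1800, overviewThumbPx 60, wordIndex 2
console.log(timeAtScroll(1800, view)); // 90000
```

With a single time value as the source of truth, scrolling any one view only has to convert its own position back to time, and the other views follow.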

With this high-level UI layout:

The UI layout establishes a clear correspondence between the overview, the waveforms, the transcript, and any currently-playing audio:

This same correspondence also ratchets up the power of other tools. When searching for text or looking at the history of edits to the project, those annotations can be overlaid onto all three views.

As an example, because the overview always shows the entire project, it’s a great way to see the overall structure and then locate where search results or tracked changes have happened in the document.

The same scrollbar, three jobs. The overview's whole-project context lets other tools — search, history compare — speak in the same affordance.
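As a rough illustration of how a search tool could speak in that same affordance, the sketch below turns transcript matches into markers that carry a time, a transcript index, and an overview position. It assumes a per-word transcript and a simple substring match. `TranscriptWord`, `Marker`, and `searchMarkers` are hypothetical names.

```typescript
// Hypothetical shapes, for illustration only.
interface TranscriptWord {
  text: string;
  startMs: number;
}

interface Marker {
  wordIndex: number; // where to highlight in the transcript
  timeMs: number; // where to annotate the waveform
  overviewPx: number; // where to draw a tick on the overview
}

function searchMarkers(
  words: TranscriptWord[],
  query: string,
  projectDurationMs: number,
  overviewHeightPx: number
): Marker[] {
  const needle = query.toLowerCase();
  const markers: Marker[] = [];
  words.forEach((word, wordIndex) => {
    // Case-insensitive substring match on each word.
    if (word.text.toLowerCase().indexOf(needle) !== -1) {
      markers.push({
        wordIndex,
        timeMs: word.startMs,
        overviewPx: (word.startMs * overviewHeightPx) / projectDurationMs,
      });
    }
  });
  return markers;
}

const transcript: TranscriptWord[] = [
  { text: "budget", startMs: 30000 },
  { text: "Budgeting", startMs: 150000 },
  { text: "music", startMs: 300000 },
];

const hits = searchMarkers(transcript, "budget", 600000, 400);
console.log(hits.map((m) => m.overviewPx)); // [20, 100]
```

A history comparison could produce the same `Marker` shape from changed time ranges instead of matched words, so all three views would need only one way to draw annotations.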

Conclusion and what’s next

Taken together, the more powerful audio layout model and the new UI navigation make it much faster for us to produce podcast episodes from raw recordings. The software is definitely tailored just for our needs, but these ideas may be more broadly applicable, which is why I am sharing them here.

Where this post focused on the UI, I plan to publish two future posts on other parts of the project:

  1. The experience of building a local-first, multiplayer experience on top of Automerge, especially focusing on ideas around collaboration and change management with non-textual data.
  2. The texture of working with AI coding assistants outside of a business environment, using the leverage not to intensify work but to enjoy more sketching and hammock time to decide what’s next. Plus the pleasure of building narrowcast software that only has to please two people.

About me

I’m Adam Solove, a product engineer who loves to build great products in complicated domains. I’m just wrapping up a six-month sabbatical that focused on my local community and on building some deeply personal tech experiments like the one above.

I’m starting to look for projects or my next role. If you’re building something interesting, please get in touch.