Enhancing Video Content by Removing Silence: A Triple Approach

In the realm of eLearning and language learning, pacing is paramount. Keeping a video’s rhythm engaging, without prolonged silent intervals, can drastically improve information retention. Previously, I introduced two methods for snipping silent parts from videos: a manual approach using Olive 0.1 and an automated one with FFmpeg. Here I add a third method that uses a voice recognition AI. This is part of my book “Learning Engineering using Python”.

1. Manual Silence Removal with Olive 0.1

1.1 Introduction to Olive 0.1

Olive 0.1 is a non-linear video editor suitable for both beginners and professionals. Its intuitive interface simplifies the task of visually identifying and trimming silent segments.

1.2 Step-by-Step Guide

  • Importing Video:
    • Open Olive and navigate to File > Import or press Ctrl+I.
    • Select the video file and click Open.
  • Adding Video to the Timeline:
    • Drag the imported video from the project bin to the timeline.
  • Viewing Audio Waveform:
    • Expand the audio track on the timeline to visualize the waveform. Flat lines represent silence.
  • Identifying and Cutting Silent Parts:
    • Scrub through the timeline and locate silent sections.
    • Use the Razor Tool (shortcut C) to cut at the beginning and end of the silent section.
    • Switch to the Selection Tool (shortcut V), select the silent section, and press Delete.
  • Closing Gaps:
    • Right-click on the gap and select Ripple Delete.
  • Exporting Edited Video:
    • Go to File > Export, choose format, quality, and location, then click Export.

2. Automated Silence Removal with FFmpeg

2.1 Introduction to FFmpeg

FFmpeg is a powerful command-line toolkit designed for multimedia processing. Its versatility shines when automating tasks like detecting and trimming silent intervals in videos.
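To see what the automation below builds on: the silencedetect audio filter logs each detected silence to stderr as a `silence_start:` line followed by a `silence_end: … | silence_duration: …` line. The snippet below is a minimal sketch of pulling those timestamps out with a regular expression; the log text is an illustrative excerpt, not output from a real run.

```python
import re

# Illustrative excerpt of a silencedetect log (the line format FFmpeg prints).
log = """
[silencedetect @ 0x55d] silence_start: 3.5
[silencedetect @ 0x55d] silence_end: 5.2 | silence_duration: 1.7
[silencedetect @ 0x55d] silence_start: 12.0
[silencedetect @ 0x55d] silence_end: 14.75 | silence_duration: 2.75
"""

# Extract start and end timestamps as floats.
starts = [float(m.group(1)) for m in re.finditer(r"silence_start: (\d+(\.\d+)?)", log)]
ends = [float(m.group(1)) for m in re.finditer(r"silence_end: (\d+(\.\d+)?)", log)]

print(list(zip(starts, ends)))  # [(3.5, 5.2), (12.0, 14.75)]
```

Pairing each start with the following end gives the silent intervals; everything between them is content worth keeping.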

2.2 Python Script for Silence Removal

Below is a Python script utilizing FFmpeg to detect and remove silence:

import os
import re
import shutil
import subprocess
import tempfile

def remove_silence(input_file, output_file, threshold=-30, duration=1, padding=1):
    """Remove stretches quieter than `threshold` dB that last at least `duration` seconds."""
    temp_dir = tempfile.mkdtemp(prefix="temp_segments_")

    # Run silencedetect; FFmpeg writes the detection log to stderr.
    command = ["ffmpeg", "-i", input_file, "-af", f"silencedetect=n={threshold}dB:d={duration}", "-f", "null", "-"]
    output = subprocess.check_output(command, stderr=subprocess.STDOUT, text=True)

    silence_starts = [float(match.group(1)) for match in re.finditer(r"silence_start: (\d+(\.\d+)?)", output)]
    silence_ends = [float(match.group(1)) for match in re.finditer(r"silence_end: (\d+(\.\d+)?)", output)]

    # Keep the audible stretches between consecutive silences, with `padding` seconds of margin.
    segments = []
    prev_end = 0
    for start, end in zip(silence_starts, silence_ends):
        if start - prev_end > padding:
            segments.append((prev_end, start))
        prev_end = end + padding

    # Append the tail after the last silence, using the total duration reported in the same log.
    match = re.search(r"Duration: (\d+:\d+:\d+\.\d+)", output)
    if match:
        hours, minutes, seconds = map(float, match.group(1).split(":"))
        total_duration = hours * 3600 + minutes * 60 + seconds
        if total_duration > prev_end:
            segments.append((prev_end, total_duration))

    # Cut each kept segment with stream copy (fast, but cuts snap to keyframes)
    # and list the pieces for FFmpeg's concat demuxer.
    list_file = f"{temp_dir}/list.txt"
    with open(list_file, "w") as f:
        for i, (start, end) in enumerate(segments):
            segment_file = f"{temp_dir}/segment_{i:03d}.mp4"
            command = ["ffmpeg", "-i", input_file, "-ss", str(start), "-to", str(end), "-c", "copy", segment_file]
            subprocess.run(command, check=True)
            f.write(f"file '{segment_file}'\n")

    # Join the segments and clean up.
    command = ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output_file]
    subprocess.run(command, check=True)
    shutil.rmtree(temp_dir)

input_file = "your_video.mp4"
output_file = "output.mp4"
remove_silence(input_file, output_file)

Replace your_video.mp4 with the path to your video.
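The heart of the script is the loop that converts silence intervals into keep intervals: each audible stretch runs from the end of one silence (plus padding) to the start of the next, and anything shorter than the padding window is dropped. Here is that logic isolated as a standalone sketch so you can test it on sample timestamps without running FFmpeg (the function name and sample numbers are mine, for illustration):

```python
def audible_segments(silences, total_duration, padding=1.0):
    """Return the (start, end) spans to keep, given detected silence intervals."""
    segments = []
    prev_end = 0.0
    for start, end in silences:
        # Keep the stretch between the previous silence and this one,
        # unless it is shorter than the padding window.
        if start - prev_end > padding:
            segments.append((prev_end, start))
        prev_end = end + padding
    # Keep whatever audio remains after the final silence.
    if total_duration > prev_end:
        segments.append((prev_end, total_duration))
    return segments

print(audible_segments([(3.5, 5.25), (12.0, 14.75)], total_duration=20.0))
# [(0.0, 3.5), (6.25, 12.0), (15.75, 20.0)]
```

Raising `padding` keeps more breathing room around speech; raising `duration` in the main script makes the detector ignore short pauses entirely.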

3. Silence Removal Using Voice Recognition AI

Harnessing the power of artificial intelligence, we can leverage voice recognition models to detect valid audio segments in a video. This method not only detects silent parts but can also filter out disfluencies, ensuring a seamless video output.
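The key idea is that a word-level transcription gives us a start and end timestamp for every spoken word, and disfluencies are flagged with a special token we can filter on. The code in 3.1 uses the `[*]` marker convention from the whisper-timestamped package; here is a minimal sketch of the filtering step on a hand-written result dictionary shaped like that package's output (the timestamps and words are invented for illustration):

```python
# Illustrative transcription result in the shape produced by whisper-timestamped:
# each segment carries word-level timestamps, and disfluencies appear as "[*]".
result = {
    "segments": [
        {"words": [
            {"text": "Hello", "start": 0.4, "end": 0.9},
            {"text": "[*]", "start": 1.0, "end": 1.6},   # filler / hesitation
            {"text": "world", "start": 1.8, "end": 2.3},
        ]}
    ]
}

# Keep only the time spans of real words; disfluencies are dropped.
keep = [(w["start"], w["end"])
        for seg in result["segments"]
        for w in seg["words"]
        if "[*]" not in w["text"]]

print(keep)  # [(0.4, 0.9), (1.8, 2.3)]
```

Those spans are then used to subclip the video, so both the silences and the filler sounds disappear from the final cut.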

3.1 AI-based Silence Detection Code

import os

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import whisper_timestamped as whisper  # `pip install whisper-timestamped`; provides detect_disfluencies
from moviepy.editor import VideoFileClip, concatenate_videoclips

print("Current Working Directory:", os.getcwd())
print("Files in Current Working Directory:", os.listdir())

# Extract the audio track from the video.
video_path = os.path.join(os.getcwd(), "IND.mp4")
video = VideoFileClip(video_path)
audio_path = os.path.join(os.getcwd(), "myaudio.wav")
video.audio.write_audiofile(audio_path)

# Transcribe with word-level timestamps; disfluencies are marked as "[*]".
audio = whisper.load_audio(audio_path)
model = whisper.load_model("large", device="cpu")
result = whisper.transcribe(model, audio, language="ko", detect_disfluencies=True)

valid_audio_segments = []
valid_video_segments = []

# Plot the original waveform and highlight the words we keep.
plt.figure(figsize=(15, 6))
y, sr = librosa.load(audio_path, sr=None)
librosa.display.waveshow(y, sr=sr, alpha=0.7, label='Original Waveform')
times = np.linspace(0, len(y) / sr, num=len(y))

for segment in result['segments']:
    for word in segment['words']:
        if '[*]' not in word['text']:  # skip disfluencies
            start_time, end_time = word['start'], word['end']
            plt.fill_between(times, y,
                             where=((times >= start_time) & (times <= end_time)),
                             color='green', alpha=0.5)
            start_sample = int(start_time * sr)
            end_sample = int(end_time * sr)
            valid_audio_segments.append(y[start_sample:end_sample])
            valid_video_segments.append(video.subclip(start_time, end_time))

plt.title('Valid Audio Segments vs. Original Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.legend()
plt.tight_layout()
plt.show()

# Concatenate the kept clips; their audio comes along with them.
new_audio = np.concatenate(valid_audio_segments)
final_video = concatenate_videoclips(valid_video_segments, method="compose")
output_path = os.path.join(os.getcwd(), "Processed_Test04.mp4")
final_video.write_videofile(output_path, codec='libx264', audio_codec='aac')

Note: the accompanying demo video is in Korean.

4. Expanding Creative Horizons

These methods can be applied beyond eLearning scenarios. Interspersing rhythmic elements or musical phrases between content sections can further elevate viewer engagement.

Conclusion

With Olive 0.1, FFmpeg, and voice recognition AI at your disposal, you’re well-equipped to craft captivating videos free from lengthy silences. Whether you’re manually editing, scripting the process, or utilizing AI, the final product promises enhanced engagement and efficacy for your audience.
