In the realm of eLearning and language learning, pacing is paramount. Keeping a video’s rhythm engaging, without prolonged silent intervals, can drastically improve information retention. Previously, I introduced two methods for snipping silent parts out of your videos: a manual approach using Olive 0.1 and an automated one with FFmpeg. In this post, I revisit both and add a third method that uses voice recognition AI. Let’s dive in!
1. Manual Silence Removal with Olive 0.1
1.1 Introduction to Olive 0.1
Olive 0.1 is a non-linear video editing software for beginners and professionals. Its intuitive interface simplifies the task of visually identifying and trimming silent segments.
1.2 Step-by-Step Guide
- Importing Video:
  - Open Olive and navigate to File > Import, or press Ctrl+I.
  - Select the video file and click Open.
- Adding Video to the Timeline:
  - Drag the imported video from the project bin to the timeline.
- Viewing Audio Waveform:
  - Expand the audio track on the timeline to visualize the waveform. Flat lines represent silence.
- Identifying and Cutting Silent Parts:
  - Scrub through the timeline and locate silent sections.
  - Use the Razor Tool (shortcut C) to cut at the beginning and end of the silent section.
  - Switch to the Selection Tool (shortcut V), select the silent section, and press Delete.
- Closing Gaps:
  - Right-click on the gap and select Ripple Delete.
- Exporting Edited Video:
  - Go to File > Export, choose format, quality, and location, then click Export.
2. Automated Silence Removal with FFmpeg
2.1 Introduction to FFmpeg
FFmpeg is a powerful command-line toolkit designed for multimedia processing. Its versatility shines when automating tasks like detecting and trimming silent intervals in videos.
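At the core of this approach is FFmpeg’s silencedetect audio filter. As a minimal sketch of just the detection pass (the -30 dB threshold and 1-second minimum duration are the same defaults the full script uses below), you can run it on its own and inspect the log:

import subprocess

# Detection only: silencedetect reports silence_start/silence_end in FFmpeg's log,
# and "-f null -" discards the decoded output instead of writing a file.
log = subprocess.check_output(
    ["ffmpeg", "-i", "your_video.mp4", "-af", "silencedetect=n=-30dB:d=1", "-f", "null", "-"],
    stderr=subprocess.STDOUT, text=True,
)
print(log)

Each silence shows up in the log as a silence_start / silence_end pair, which is exactly what the full script parses with regular expressions.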
2.2 Python Script for Silence Removal
Below is a Python script utilizing FFmpeg to detect and remove silence:
import re
import shutil
import subprocess
import tempfile

def remove_silence(input_file, output_file, threshold=-30, duration=1, padding=1):
    """Cut silent intervals out of input_file and write the result to output_file.

    threshold: noise floor in dB below which audio counts as silence.
    duration:  minimum length (seconds) of a silence for it to be detected.
    padding:   seconds of breathing room kept around each cut.
    """
    temp_dir = tempfile.mkdtemp(prefix="temp_segments_")

    # Detection pass: silencedetect logs silence_start/silence_end timestamps.
    command = ["ffmpeg", "-i", input_file, "-af", f"silencedetect=n={threshold}dB:d={duration}", "-f", "null", "-"]
    output = subprocess.check_output(command, stderr=subprocess.STDOUT, text=True)
    silence_starts = [float(match.group(1)) for match in re.finditer(r"silence_start: (\d+(\.\d+)?)", output)]
    silence_ends = [float(match.group(1)) for match in re.finditer(r"silence_end: (\d+(\.\d+)?)", output)]

    # Build the list of non-silent segments to keep.
    segments = []
    prev_end = 0
    for start, end in zip(silence_starts, silence_ends):
        if start - prev_end > padding:  # ignore gaps too short to be worth cutting
            segments.append((prev_end, start))
        prev_end = max(end - padding, prev_end)  # resume slightly early so cuts never clip speech

    # Keep whatever follows the last detected silence.
    match = re.search(r"Duration: (\d+:\d+:\d+\.\d+)", output)
    if match:
        hours, minutes, seconds = map(float, match.group(1).split(":"))
        total_duration = hours * 3600 + minutes * 60 + seconds
        if total_duration > prev_end:
            segments.append((prev_end, total_duration))

    # Extract each segment with stream copy (fast, but cut points snap to keyframes),
    # then stitch them back together with the concat demuxer.
    list_file = f"{temp_dir}/list.txt"
    with open(list_file, "w") as f:
        for i, (start, end) in enumerate(segments):
            segment_file = f"{temp_dir}/segment_{i:03d}.mp4"
            command = ["ffmpeg", "-i", input_file, "-ss", str(start), "-to", str(end), "-c", "copy", segment_file]
            subprocess.run(command, check=True)
            f.write(f"file '{segment_file}'\n")
    command = ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output_file]  # -safe 0 allows absolute paths
    subprocess.run(command, check=True)
    shutil.rmtree(temp_dir)

input_file = "your_video.mp4"
output_file = "output.mp4"
remove_silence(input_file, output_file)
Replace your_video.mp4 with the path to your video. The threshold (in dB), minimum silence duration, and padding (both in seconds) can all be tuned to your recording.
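For example, a quiet recording with many short pauses might call for a lower threshold and tighter timing; the file names and values here are purely illustrative:

# Treat anything under -35 dB lasting at least 0.5 s as silence,
# keeping 0.25 s of breathing room around each cut.
remove_silence("lecture.mp4", "lecture_trimmed.mp4", threshold=-35, duration=0.5, padding=0.25)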
3. Silence Removal Using Voice Recognition AI
Harnessing the power of artificial intelligence, we can use voice recognition models to detect the valid audio segments in a video. This method not only detects silent parts but can also filter out disfluencies (fillers such as "um" and "uh"), ensuring a seamless video output.
3.1 AI-based Silence Detection Code
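A note before the code: it relies on the whisper-timestamped package (conventionally imported as whisper), which extends OpenAI’s Whisper with per-word timestamps and, when detect_disfluencies=True is passed, inserts a special [*] token wherever it hears a hesitation. The transcription result is shaped roughly like this (timings invented for illustration):

# Approximate shape of a whisper-timestamped result (illustrative values):
result = {
    "segments": [
        {"words": [
            {"text": "hello", "start": 0.32, "end": 0.71},
            {"text": "[*]", "start": 0.71, "end": 1.15},  # a detected disfluency
        ]},
    ],
}

Filtering out the [*] words is what lets the script drop hesitations along with the silences.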
import os
from moviepy.editor import VideoFileClip, concatenate_videoclips
import whisper_timestamped as whisper
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display

# Sanity checks: confirm we are in the right directory and the video is present.
print("Current Working Directory:", os.getcwd())
print("Files in Current Working Directory:", os.listdir())

# Extract the audio track from the video.
video_path = os.path.join(os.getcwd(), "IND.mp4")
video = VideoFileClip(video_path)
audio_path = os.path.join(os.getcwd(), "myaudio.wav")
video.audio.write_audiofile(audio_path)

# Transcribe with per-word timestamps; disfluencies come back as "[*]" words.
audio = whisper.load_audio(audio_path)
model = whisper.load_model("large", device="cpu")
result = whisper.transcribe(model, audio, language="ko", detect_disfluencies=True)

valid_audio_segments = []
valid_video_segments = []

# Plot the original waveform and shade the segments we keep in green.
plt.figure(figsize=(15, 6))
y, sr = librosa.load(audio_path, sr=None)
librosa.display.waveshow(y, sr=sr, alpha=0.7, label='Original Waveform')
times = np.linspace(0, len(y) / sr, num=len(y))

for segment in result['segments']:
    for word in segment['words']:
        if '[*]' not in word['text']:  # skip disfluency markers
            start_time = word['start']
            end_time = word['end']
            plt.fill_between(times, y, where=((times >= start_time) & (times <= end_time)), color='green', alpha=0.5)
            start_sample = int(start_time * sr)
            end_sample = int(end_time * sr)
            valid_audio_segments.append(y[start_sample:end_sample])
            valid_video_segments.append(video.subclip(start_time, end_time))

plt.title('Valid Audio Segments vs. Original Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.legend()
plt.tight_layout()
plt.show()

# Stitched waveform of the kept words (handy if you want to export the audio
# separately; the final video below carries its own audio).
new_audio = np.concatenate(valid_audio_segments)

# Concatenate the kept clips and render the result.
final_video = concatenate_videoclips(valid_video_segments, method="compose")
output_path = os.path.join(os.getcwd(), "Processed_Test04.mp4")
final_video.write_videofile(output_path, codec='libx264', audio_codec='aac')
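A quick note on dependencies: the script assumes whisper-timestamped (which supplies the per-word timestamps and the detect_disfluencies option), along with moviepy, librosa, and matplotlib, typically installed with pip install whisper-timestamped moviepy librosa matplotlib. Transcribing with the large model on CPU works but is slow; if a GPU is available, passing device="cuda" to load_model speeds things up considerably.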
4. Expanding Creative Horizons
These methods can be applied beyond eLearning scenarios. Interspersing rhythmic elements or musical phrases between content sections can further elevate viewer engagement.
Conclusion
With Olive 0.1, FFmpeg, and voice recognition AI at your disposal, you’re well-equipped to craft captivating videos free from lengthy silences. Whether you’re manually editing, scripting the process, or utilizing AI, the final product promises enhanced engagement and efficacy for your audience.