Algorithm to remove vocals from a music track

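I want to remove the vocals from an MP3 track. I searched Google and tried several programs, but none of them was convincing. My plan is to read the MP3 file, get the waveform, and remove the parts of the waveform that exceed a specified limit.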

Do you have any suggestions on how to proceed?

- Update

I just want code that can read the MP3 file format. Is there any software for that?

2010-09-09 Boolean

That would be very cool... What software have you already tried? – sholsapp 2010-09-09 00:58:54

Audacity, Wavosaur and Extra Boy Pro – Boolean 2010-09-09 01:05:46

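It's not so much an "algorithm" as a "trick", but it can be automated in code. It works mainly for stereo tracks with the vocals in the center. If the vocals are centered, they are present equally in both channels. If you invert one of the channels and then merge the two together, the waveforms of the centered vocals cancel out and are virtually removed. You can do this by hand in most good audio editors (such as Audacity). It won't give you perfect results, and the rest of the audio will also be affected somewhat, but it makes for pretty good karaoke tracks :)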
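The trick can be sketched in plain Python, assuming the two channels are already available as equal-length lists of integer samples. The function `cancel_center` and the toy signals are hypothetical, for illustration only. Inverting a channel flips the sign of every sample, so adding the inverted right channel to the left one is the same as computing `left - right`: anything identical in both channels cancels to zero.

```python
def cancel_center(left, right):
    """Mix the left channel with the phase-inverted right channel.

    left + (-right) == left - right, so any content that is identical
    in both channels (e.g. a vocal panned dead center) cancels out.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    inverted_right = [-s for s in right]  # phase inversion: flip the sign
    return [l + r for l, r in zip(left, inverted_right)]


# Toy signals (hypothetical): a centered "vocal" plus an instrument
# that is present only in the left channel.
vocal = [100, -200, 300, -400]
instrument = [10, 20, 30, 40]
left = [v + i for v, i in zip(vocal, instrument)]
right = list(vocal)

print(cancel_center(left, right))  # → [10, 20, 30, 40]
```

Real recordings are never this clean: reverb, stereo effects, and centered instruments all break the "identical in both channels" assumption, which is why the rest of the audio suffers.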


2010-09-09 01:05:23

It's called phase cancellation, and the main drawback is that the resulting track is mono. – arul 2010-09-09 01:11:17

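> "the rest of the audio will also be affected somewhat" - that lucky case is rare. Most likely, what remains is very quiet and sounds wrong. However, if you have more than a stereo source (5.1, etc.), you can usually do better. But that isn't so simple either. – 2017-03-13 15:05:57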

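Above a specified limit? That sounds like a high-pass filter... If you have the a cappella track as well as the original track, you can use phase cancellation. Other than that, unless it's an old '60s song with the vocals dead center and everything else hard-panned, I don't think there is a very clean way to remove vocals.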
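Phase cancellation with an a cappella track amounts to subtracting the isolated vocal from the full mix. A minimal sketch, assuming both tracks are lists of 16-bit samples that are already sample-aligned, share a sample rate, and contain the vocal at the same level (the `gain` parameter and the function name are hypothetical):

```python
def subtract_acapella(mix, acapella, gain=1.0):
    """Estimate the instrumental by subtracting an aligned a cappella track.

    Assumes both tracks are sample-aligned, use the same sample rate, and
    that the vocal appears in the mix at `gain` times its level in the
    a cappella. Results are clamped to the signed 16-bit range.
    """
    n = min(len(mix), len(acapella))
    out = []
    for m, v in zip(mix[:n], acapella[:n]):
        s = int(round(m - gain * v))          # mix minus (scaled) vocal
        out.append(max(-32768, min(32767, s)))  # avoid 16-bit overflow
    return out


mix = [500, -300, 1200, 0]
acapella = [400, -100, 1000, -50]
print(subtract_acapella(mix, acapella))  # → [100, -200, 200, 50]
```

Any misalignment or level mismatch between the two tracks leaves vocal residue in the output, so in practice the alignment step matters as much as the subtraction.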


2010-09-09 00:58:47

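Is there any way to tell apart the different sounds in the input? I mean, for example, an algorithm that gives us, say, 100 different detected sounds and leaves it to us to pick the particular one to remove. – ConductedClever 2014-11-10 07:14:46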

@ConductedClever: https://en.wikipedia.org/wiki/Independent_component_analysis – user 2016-03-06 07:56:18

Or, more generally, https://en.wikipedia.org/wiki/Blind_signal_separation – user 2016-03-06 08:21:07

Source: http://www.cdf.utoronto.ca/~csc209h/summer/a2/a2.html, by Daniel Zingaro.

Sounds are waves of air pressure. When a sound is generated, a sound wave consisting of compressions (increases in pressure) and rarefactions (decreases in pressure) moves through the air. This is similar to what happens if you throw a stone into a pond: the water rises and falls in a repeating wave.

When a microphone records sound, it takes a measure of the air pressure and returns it as a value. These values are called samples and can be positive or negative corresponding to increases or decreases in air pressure. Each time the air pressure is recorded, we are sampling the sound. Each sample records the sound at an instant in time; the faster we sample, the more accurate is our representation of the sound. The sampling rate refers to how many times per second we sample the sound. For example, CD-quality sound uses a sampling rate of 44100 samples per second; sampling someone's voice for use in a VOIP conversation uses far less than this. Sampling rates of 11025 (voice quality), 22050, and 44100 (CD quality) are common...
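To make the sampling-rate numbers concrete, here is a quick calculation of how much raw data uncompressed audio takes. It assumes 16-bit samples (2 bytes each, the "shorts" used by the algorithm further down); the helper name `pcm_bytes` is hypothetical.

```python
def pcm_bytes(seconds, sample_rate=44100, channels=2, bits_per_sample=16):
    """Size in bytes of uncompressed PCM audio (header not included)."""
    return int(seconds * sample_rate) * channels * (bits_per_sample // 8)


# One second of CD-quality stereo: 44100 samples/s x 2 channels x 2 bytes
print(pcm_bytes(1))                                 # → 176400
# Three minutes of CD-quality stereo
print(pcm_bytes(180))                               # → 31752000
# One second of 11025 Hz mono, voice-quality audio
print(pcm_bytes(1, sample_rate=11025, channels=1))  # → 22050
```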

For mono sounds (those with one sound channel), a sample is simply a positive or negative integer that represents the amount of compression in the air at the point the sample was taken. For stereo sounds (which we use in this assignment), a sample is actually made up of two integer values: one for the left speaker and one for the right...
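In a stereo WAV file the two values of each sample are stored interleaved: left, right, left, right, and so on. A short sketch that splits such data into two channels, assuming signed 16-bit little-endian samples (the usual layout for PCM WAV); `split_stereo_frames` is a hypothetical helper built on the standard `struct` module:

```python
import struct


def split_stereo_frames(pcm):
    """Split interleaved 16-bit little-endian stereo PCM into (left, right).

    Each frame is 4 bytes: a signed short for the left channel followed
    by a signed short for the right channel.
    """
    if len(pcm) % 4:
        raise ValueError("16-bit stereo PCM must be a multiple of 4 bytes")
    samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
    return list(samples[0::2]), list(samples[1::2])


# Two frames: (L=1, R=-1) and (L=1000, R=-32768)
pcm = struct.pack("<4h", 1, -1, 1000, -32768)
print(split_stereo_frames(pcm))  # → ([1, 1000], [-1, -32768])
```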

Here's how the algorithm [to remove vocals] works.

Copy the first 44 bytes verbatim from the input file to the output file. Those 44 bytes contain important header information that should not be modified.

Next, treat the rest of the input file as a sequence of shorts. Take each pair of shorts, left and right, and compute combined = (left - right) / 2. Write two copies of combined to the output file.
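The two steps above translate almost directly into Python. This is a sketch under the assignment's own assumptions: a 16-bit stereo PCM WAV whose header is exactly 44 bytes, with little-endian samples. The function names are hypothetical. Python's `//` rounds toward negative infinity, so the sketch uses `int(... / 2)` to truncate toward zero like C integer division; the difference is at most one unit and is inaudible.

```python
import struct

HEADER_SIZE = 44  # canonical PCM WAV header, as the assignment assumes


def remove_vocals_bytes(wav):
    """Apply the assignment's vocal-removal steps to the raw bytes of a WAV file."""
    header, body = wav[:HEADER_SIZE], wav[HEADER_SIZE:]
    n = len(body) // 2  # number of shorts; a stray trailing byte is dropped
    shorts = struct.unpack("<%dh" % n, body[:n * 2])
    out = []
    # Pair up consecutive shorts as (left, right); an unpaired last short is dropped
    for left, right in zip(shorts[0::2], shorts[1::2]):
        # Truncate toward zero like C; the result always fits in a short
        combined = int((left - right) / 2)
        out.extend((combined, combined))  # write two copies: same value in L and R
    return header + struct.pack("<%dh" % len(out), *out)


def remove_vocals_file(in_path, out_path):
    """Hypothetical file-level wrapper: read the input WAV, write the result."""
    with open(in_path, "rb") as f:
        data = f.read()
    with open(out_path, "wb") as f:
        f.write(remove_vocals_bytes(data))
```

Usage would look like `remove_vocals_file("song.wav", "karaoke.wav")`, with both file names as placeholders. Dividing by 2 is what keeps the result in range: the extreme case 32767 - (-32768) = 65535 would overflow a short, but 65535 / 2 truncates to 32767, which fits.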

Why Does This Work?

For the curious, a brief explanation of the vocal-removal algorithm is in order. As you noticed from the algorithm, we are simply subtracting one channel from the other (and then dividing by 2 to keep the volume from getting too loud). So why does subtracting the left channel from the right channel magically remove vocals?

When music is recorded, it is sometimes the case that vocals are recorded by a single microphone, and that single vocal track is used for the vocals in both channels. The other instruments in the song are recorded by multiple microphones, so that they sound different in both channels. Subtracting one channel from the other takes away everything that is "in common" between those two channels which, if we're lucky, means removing the vocals.
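The same argument written as an equation. Here v[n] is the single vocal track fed equally into both channels, and a_L[n], a_R[n] are the instrument mixes in the left and right channels (these symbols are introduced for illustration). The model assumes the vocal has the same level and no time offset in both channels:

```latex
\begin{aligned}
L[n] &= v[n] + a_L[n], \qquad R[n] = v[n] + a_R[n],\\
\text{combined}[n] &= \frac{L[n] - R[n]}{2} = \frac{a_L[n] - a_R[n]}{2}.
\end{aligned}
```

The vocal term drops out, but so does any instrument with a_L[n] = a_R[n], that is, any instrument that was also recorded centered.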

Of course, things rarely work so well. Try your vocal remover on this badly-behaved wav file. Sure, the vocals are gone, but so is the body of the music! Apparently, some of the instruments were also recorded "centred", so that they are removed along with the vocals when channels are subtracted.


2011-06-19 07:25:39 Daniel

Have you tried this? – ConductedClever 2014-09-14 07:05:57

Nah, I just audited the class, so I didn't have to do it. Looks like the link doesn't work anymore... – Daniel 2014-10-27 03:56:50

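WAV files are RIFF files with one or more WAVE chunks. Modifying the file this way can break files that have multiple WAVE chunks, and it will also destroy other chunks such as INFO and ID3 tags. – meklarian 2016-09-16 16:33:38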
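A more robust approach than assuming a 44-byte header is to walk the RIFF chunks, locate the `data` chunk, and modify only the bytes inside it, leaving INFO, ID3 and other chunks untouched. A minimal sketch using only the standard `struct` module; it is not a complete RIFF parser (it does not validate the RIFF size field, handle RF64, or keep repeated chunk ids), and the function names are hypothetical:

```python
import struct


def find_wav_chunks(wav):
    """Walk the chunks of a RIFF/WAVE byte string.

    Returns a dict mapping chunk id (e.g. b"fmt ", b"data", b"LIST") to
    (offset_of_chunk_data, chunk_size). Only the first chunk with a
    given id is kept.
    """
    if len(wav) < 12 or wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = {}
    pos = 12  # skip "RIFF", the 4-byte RIFF size, and "WAVE"
    while pos + 8 <= len(wav):
        chunk_id, size = struct.unpack("<4sI", wav[pos:pos + 8])
        chunks.setdefault(chunk_id, (pos + 8, size))
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    return chunks


def pcm_format(wav, chunks):
    """Read audio format, channel count, sample rate and bit depth from "fmt "."""
    offset, _ = chunks[b"fmt "]
    audio_format, channels, rate, _byte_rate, _align, bits = struct.unpack(
        "<HHIIHH", wav[offset:offset + 16])
    return audio_format, channels, rate, bits
```

With the data chunk's offset and size in hand, the per-sample processing only needs to touch `wav[offset:offset + size]`, and everything before and after can be copied unchanged.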

You can use the pydub toolbox; see here for details, and here for a related question. It relies on FFmpeg, so it can read any file format that FFmpeg supports.

Then you can do the following:

from pydub import AudioSegment

# Placeholder paths: replace with your own input and output files
myAudioFile = "input.mp3"
myAudioFile_CentersOut = "output_centers_out.mp3"

# Read in the audio file (MP3 decoding needs FFmpeg) and split it into its two mono tracks
sound_stereo = AudioSegment.from_file(myAudioFile, format="mp3")
sound_monoL, sound_monoR = sound_stereo.split_to_mono()  # assumes a stereo input

# Invert the phase of the right channel
sound_monoR_inv = sound_monoR.invert_phase()

# Mix L with the inverted R (i.e. L - R); content common to both channels cancels out
sound_CentersOut = sound_monoL.overlay(sound_monoR_inv)

# Export the merged (mono) audio file
sound_CentersOut.export(myAudioFile_CentersOut, format="mp3")


2017-03-10 22:23:04 ingnie

How do you remove the generated centerOut from the original? – 2017-05-31 15:59:35
