-<p>:class:<code>~torch.utils.data.DataLoader</code> by default constructs an index
-sampler that yields integral indices.
-To make it work with a map-style
-dataset with non-integral indices/keys, a custom sampler must be provided.</p>
-<p>A dataset that assumes audio sources are stored
-in track folders, where each track has a fixed number of sources.
-For each track, the user specifies the target file name (<code>target_file</code>)
-and a list of interferer files (<code>interferer_files</code>).
-A linear mix is performed on the fly by summing the target and
-the interferers.</p>
-<p>Because all tracks comprise the exact same set
-of sources, the random track mixing augmentation technique
-can be used, where sources from different tracks are mixed
-together. Setting <code>random_track_mix=True</code> results in an
-unaligned dataset.
-When random track mixing is enabled, an epoch is defined as
-the point at which the target source from every track has been seen exactly once,
-with whatever interfering sources were randomly drawn.</p>
-<p>This dataset is recommended for small or medium-sized datasets,
-for example MUSDB18 or other custom source separation datasets.</p>
-<h1 id="example">Example</h1>
-<p>train/1/vocals.wav ---------------\
-train/1/drums.wav (interferer1) &mdash;+&ndash;&gt; input
-train/1/bass.wav -(interferer2) &ndash;/</p>
-<p>train/1/vocals.wav -------------------&gt; output</p></div>
-<pre><code class="python">class FixedSourcesTrackFolderDataset(UnmixDataset):
-    def __init__(
-        self,
-        root: str,
-        split: str = &#34;train&#34;,
-        target_file: str = &#34;vocals.wav&#34;,
-        interferer_files: List[str] = [&#34;bass.wav&#34;, &#34;drums.wav&#34;],
-        seq_duration: Optional[float] = None,
-        random_chunks: bool = False,
-        random_track_mix: bool = False,
-        source_augmentations: Optional[Callable] = lambda audio: audio,
-        sample_rate: float = 44100.0,
-        seed: int = 42,
-    ) -&gt; None:
-        &#34;&#34;&#34;A dataset that assumes audio sources are stored
-        in track folders, where each track has a fixed number of sources.
-        For each track, the user specifies the target file name (`target_file`)
-        and a list of interferer files (`interferer_files`).
-        A linear mix is performed on the fly by summing the target and
-        the interferers.
-        Because all tracks comprise the exact same set
-        of sources, the random track mixing augmentation technique
-        can be used, where sources from different tracks are mixed
-        together. Setting `random_track_mix=True` results in an
-        unaligned dataset.
-        When random track mixing is enabled, an epoch is defined as
-        the point at which the target source from every track has been seen
-        exactly once, with whatever interfering sources were randomly drawn.
-        This dataset is recommended for small or medium-sized datasets,
-        for example MUSDB18 or other custom source separation
-        datasets.
-        Example
-        =======
-        train/1/vocals.wav ---------------\
-        train/1/drums.wav (interferer1) ---+--&gt; input
-        train/1/bass.wav -(interferer2) --/
-        train/1/vocals.wav -------------------&gt; output
-        &#34;&#34;&#34;
-        self.root = Path(root).expanduser()
-        self.split = split
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.random_track_mix = random_track_mix
-        self.random_chunks = random_chunks
-        self.source_augmentations = source_augmentations
-        # set the input and output files (accept glob)
-        self.target_file = target_file
-        self.interferer_files = interferer_files
-        self.source_files = self.interferer_files + [self.target_file]
-        self.seed = seed
-        random.seed(self.seed)
-        self.tracks = list(self.get_tracks())
-        if not len(self.tracks):
-            raise RuntimeError(&#34;No tracks found&#34;)
-    def __getitem__(self, index):
-        # first, get target track
-        track_path = self.tracks[index][&#34;path&#34;]
-        min_duration = self.tracks[index][&#34;min_duration&#34;]
-        if self.random_chunks:
-            # determine start seek by target duration
-            start = random.uniform(0, min_duration - self.seq_duration)
-        else:
-            start = 0
-        # assemble the mixture of target and interferers
-        audio_sources = []
-        # load target
-        target_audio, _ = load_audio(
-            track_path / self.target_file, start=start, dur=self.seq_duration
-        )
-        target_audio = self.source_augmentations(target_audio)
-        audio_sources.append(target_audio)
-        # load interferers
-        for source in self.interferer_files:
-            # optionally select a random track for each source
-            if self.random_track_mix:
-                random_idx = random.choice(range(len(self.tracks)))
-                track_path = self.tracks[random_idx][&#34;path&#34;]
-                if self.random_chunks:
-                    min_duration = self.tracks[random_idx][&#34;min_duration&#34;]
-                    start = random.uniform(0, min_duration - self.seq_duration)
-            audio, _ = load_audio(track_path / source, start=start, dur=self.seq_duration)
-            audio = self.source_augmentations(audio)
-            audio_sources.append(audio)
-        stems = torch.stack(audio_sources)
-        # apply linear mix over source index=0
-        x = stems.sum(0)
-        # target is always the first element in the list
-        y = stems[0]
-        return x, y
-    def __len__(self):
-        return len(self.tracks)
-    def get_tracks(self):
-        &#34;&#34;&#34;Loads input and output tracks&#34;&#34;&#34;
-        p = Path(self.root, self.split)
-        for track_path in tqdm.tqdm(p.iterdir()):
-            if track_path.is_dir():
-                source_paths = [track_path / s for s in self.source_files]
-                if not all(sp.exists() for sp in source_paths):
-                    print(&#34;Exclude track &#34;, track_path)
-                    continue
-                if self.seq_duration is not None:
-                    infos = list(map(load_info, source_paths))
-                    # get minimum duration of track
-                    min_duration = min(i[&#34;duration&#34;] for i in infos)
-                    if min_duration &gt; self.seq_duration:
-                        yield ({&#34;path&#34;: track_path, &#34;min_duration&#34;: min_duration})
-                else:
-                    yield ({&#34;path&#34;: track_path, &#34;min_duration&#34;: None})</code></pre>
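<p>The mixing step in <code>__getitem__</code> above (stack the stems, sum over the source axis, return the first stem as the target) can be sketched without <code>torch</code>. This is a minimal plain-Python stand-in, not the library's implementation; the function name <code>linear_mix</code> and the sample values are hypothetical and chosen only for illustration.</p>

```python
from typing import List, Tuple


def linear_mix(stems: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: sum equally long stems sample by sample.

    The first stem is treated as the target, mirroring `y = stems[0]`.
    """
    # zip(*stems) groups the samples that share the same time index.
    mixture = [sum(samples) for samples in zip(*stems)]
    return mixture, stems[0]


# Made-up three-sample excerpts standing in for loaded audio.
vocals = [0.5, -0.25, 0.0]
bass = [0.25, 0.25, 0.5]
drums = [0.0, 0.5, -0.5]

x, y = linear_mix([vocals, bass, drums])
print(x, y)
```

<p>Here <code>x</code> plays the role of the network input (the mixture) and <code>y</code> the role of the output (the target source).</p>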
-<dt id="openunmix.data.FixedSourcesTrackFolderDataset.get_tracks"><code class="name flex">
-<span>def <span class="ident">get_tracks</span></span>(<span>self)</span>
-<div class="desc"><p>Loads input and output tracks</p></div>
-<pre><code class="python">def get_tracks(self):
-    &#34;&#34;&#34;Loads input and output tracks&#34;&#34;&#34;
-    p = Path(self.root, self.split)
-    for track_path in tqdm.tqdm(p.iterdir()):
-        if track_path.is_dir():
-            source_paths = [track_path / s for s in self.source_files]
-            if not all(sp.exists() for sp in source_paths):
-                print(&#34;Exclude track &#34;, track_path)
-                continue
-            if self.seq_duration is not None:
-                infos = list(map(load_info, source_paths))
-                # get minimum duration of track
-                min_duration = min(i[&#34;duration&#34;] for i in infos)
-                if min_duration &gt; self.seq_duration:
-                    yield ({&#34;path&#34;: track_path, &#34;min_duration&#34;: min_duration})
-            else:
-                yield ({&#34;path&#34;: track_path, &#34;min_duration&#34;: None})</code></pre>
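<p>The <code>min_duration</code> stored for each track is what <code>__getitem__</code> uses to pick a random excerpt start when <code>random_chunks</code> is enabled. A minimal sketch of that selection, using only the standard library. The helper name <code>pick_chunk_start</code> and the fallback to <code>0.0</code> when no excerpt length is set are assumptions made for illustration and are not part of <code>openunmix</code>.</p>

```python
import random
from typing import Optional


def pick_chunk_start(
    min_duration: Optional[float],
    seq_duration: Optional[float],
    rng: random.Random,
) -> float:
    """Hypothetical helper: choose an excerpt start time in seconds."""
    # Assumption: without a fixed excerpt length, read from the beginning.
    if seq_duration is None or min_duration is None:
        return 0.0
    # Draw a start so that start + seq_duration stays within the shortest source.
    return rng.uniform(0.0, min_duration - seq_duration)


rng = random.Random(42)
start = pick_chunk_start(min_duration=30.0, seq_duration=6.0, rng=rng)
print(start)
```

<p>Because <code>get_tracks</code> only yields tracks whose shortest source is longer than <code>seq_duration</code>, the upper bound of the draw is always positive for the tracks that remain.</p>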
-<dt id="openunmix.data.MUSDBDataset"><code class="flex name class">
-<span>class <span class="ident">MUSDBDataset</span></span>
-<span>(</span><span>target: str = 'vocals', root: str = None, download: bool = False, is_wav: bool = False, subsets: str = 'train', split: str = 'train', seq_duration: Union[float, NoneType] = 6.0, samples_per_track: int = 64, source_augmentations: Union[Callable, NoneType] = &lt;function MUSDBDataset.&lt;lambda&gt;&gt;, random_track_mix: bool = False, seed: int = 42, *args, **kwargs)</span>
-<div class="desc"><p>An abstract class representing a :class:<code>Dataset</code>.</p>
-<p>All datasets that represent a map from keys to data samples should subclass
-it. All subclasses should overwrite :meth:<code>__getitem__</code>, supporting fetching a
-data sample for a given key. Subclasses could also optionally overwrite
-:meth:<code>__len__</code>, which is expected to return the size of the dataset by many
-:class:<code>~torch.utils.data.Sampler</code> implementations and the default options
-of :class:<code>~torch.utils.data.DataLoader</code>.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>:class:<code>~torch.utils.data.DataLoader</code> by default constructs an index
-sampler that yields integral indices.
-To make it work with a map-style
-dataset with non-integral indices/keys, a custom sampler must be provided.</p>
-<p>MUSDB18 <code>torch.utils.data.Dataset</code> that samples tracks
-and excerpts from MUSDB with replacement.</p>
-<h2 id="parameters">Parameters</h2>
-<dt><strong><code>target</code></strong> :&ensp;<code>str</code></dt>
-<dd>target name of the source to be separated, defaults to <code>vocals</code>.</dd>
-<dt><strong><code>root</code></strong> :&ensp;<code>str</code></dt>
-<dd>root path of MUSDB</dd>
-<dt><strong><code>download</code></strong> :&ensp;<code>boolean</code></dt>
-<dd>automatically download the 7s preview version of MUSDB</dd>
-<dt><strong><code>is_wav</code></strong> :&ensp;<code>boolean</code></dt>
-<dd>specify whether the WAV version (instead of the MP4 STEMS) is used</dd>
-<dt><strong><code>subsets</code></strong> :&ensp;<code>list-like [str]</code></dt>
-<dd>subset str or list of subsets. Defaults to <code>train</code>.</dd>
-<dt><strong><code>split</code></strong> :&ensp;<code>str</code></dt>
-<dd>use (stratified) track splits for validation split (<code>valid</code>),
-defaults to <code>train</code>.</dd>
-<dt><strong><code>seq_duration</code></strong> :&ensp;<code>float</code></dt>
-<dd>training is performed in chunks of <code>seq_duration</code> (in seconds).
-Defaults to <code>6.0</code>; <code>None</code> loads the full audio track.</dd>
-<dt><strong><code>samples_per_track</code></strong> :&ensp;<code>int</code></dt>
-<dd>sets the number of samples yielded from each track per epoch.
-Defaults to 64</dd>
-<dt><strong><code>source_augmentations</code></strong> :&ensp;<code>list[callables]</code></dt>
-<dd>provide a list of augmentation functions that take a multi-channel
-audio file of shape (src, samples) as input and output. Defaults to
-no augmentation (input = output)</dd>
-<dt><strong><code>random_track_mix</code></strong> :&ensp;<code>boolean</code></dt>
-<dd>randomly mixes sources from different tracks to assemble a
-custom mix. This augmentation is only applied to the train subset.</dd>
-<dt><strong><code>seed</code></strong> :&ensp;<code>int</code></dt>
-<dd>control randomness of dataset iterations</dd>
-<dt><strong><code>args</code></strong>, <strong><code>kwargs</code></strong> :&ensp;<code>additional keyword arguments</code></dt>
-<dd>used to add further control for the musdb dataset
-initialization function.</dd>
-<pre><code class="python">class MUSDBDataset(UnmixDataset):
-    def __init__(
-        self,
-        target: str = &#34;vocals&#34;,
-        root: str = None,
-        download: bool = False,
-        is_wav: bool = False,
-        subsets: str = &#34;train&#34;,
-        split: str = &#34;train&#34;,
-        seq_duration: Optional[float] = 6.0,
-        samples_per_track: int = 64,
-        source_augmentations: Optional[Callable] = lambda audio: audio,
-        random_track_mix: bool = False,
-        seed: int = 42,
-        *args,
-        **kwargs,
-    ) -&gt; None:
-        &#34;&#34;&#34;MUSDB18 torch.utils.data.Dataset that samples tracks
-        and excerpts from MUSDB with replacement.
-        Parameters
-        ----------
-        target : str
-            target name of the source to be separated, defaults to ``vocals``.
-        root : str
-            root path of MUSDB
-        download : boolean
-            automatically download the 7s preview version of MUSDB
-        is_wav : boolean
-            specify whether the WAV version (instead of the MP4 STEMS) is used
-        subsets : list-like [str]
-            subset str or list of subsets. Defaults to ``train``.
-        split : str
-            use (stratified) track splits for validation split (``valid``),
-            defaults to ``train``.
-        seq_duration : float
-            training is performed in chunks of ``seq_duration`` (in seconds).
-            Defaults to ``6.0``; ``None`` loads the full audio track.
-        samples_per_track : int
-            sets the number of samples yielded from each track per epoch.
-            Defaults to 64
-        source_augmentations : list[callables]
-            provide a list of augmentation functions that take a multi-channel
-            audio file of shape (src, samples) as input and output. Defaults to
-            no augmentation (input = output)
-        random_track_mix : boolean
-            randomly mixes sources from different tracks to assemble a
-            custom mix. This augmentation is only applied to the train subset.
-        seed : int
-            control randomness of dataset iterations
-        args, kwargs : additional keyword arguments
-            used to add further control for the musdb dataset
-            initialization function.
-        &#34;&#34;&#34;
-        import musdb
-        self.seed = seed
-        random.seed(seed)
-        self.is_wav = is_wav
-        self.seq_duration = seq_duration
-        self.target = target
-        self.subsets = subsets
-        self.split = split
-        self.samples_per_track = samples_per_track
-        self.source_augmentations = source_augmentations
-        self.random_track_mix = random_track_mix
-        self.mus = musdb.DB(
-            root=root,
-            is_wav=is_wav,
-            split=split,
-            subsets=subsets,
-            download=download,
-            *args,
-            **kwargs,
-        )
-        self.sample_rate = 44100.0  # musdb is fixed sample rate
-    def __getitem__(self, index):
-        audio_sources = []
-        target_ind = None
-        # select track
-        track = self.mus.tracks[index // self.samples_per_track]
-        # at training time we assemble a custom mix
-        if self.split == &#34;train&#34; and self.seq_duration:
-            for k, source in enumerate(self.mus.setup[&#34;sources&#34;]):
-                # memorize index of target source
-                if source == self.target:
-                    target_ind = k
-                # select a random track
-                if self.random_track_mix:
-                    track = random.choice(self.mus.tracks)
-                # set the excerpt duration
-                track.chunk_duration = self.seq_duration
-                # set random start position
-                track.chunk_start = random.uniform(0, track.duration - self.seq_duration)
-                # load source audio and apply time domain source_augmentations
-                audio = torch.as_tensor(track.sources[source].audio.T, dtype=torch.float32)
-                audio = self.source_augmentations(audio)
-                audio_sources.append(audio)
-            # create stem tensor of shape (source, channel, samples)
-            stems = torch.stack(audio_sources, dim=0)
-            # apply linear mix over source index=0
-            x = stems.sum(0)
-            # get the target stem
-            if target_ind is not None:
-                y = stems[target_ind]
-            # assuming vocal/accompaniment scenario if target!=source
-            else:
-                vocind = list(self.mus.setup[&#34;sources&#34;].keys()).index(&#34;vocals&#34;)
-                # apply time domain subtraction
-                y = x - stems[vocind]
-        # for validation and test, we deterministically yield the full
-        # pre-mixed musdb track
-        else:
-            # get the non-linear source mix straight from musdb
-            x = torch.as_tensor(track.audio.T, dtype=torch.float32)
-            y = torch.as_tensor(track.targets[self.target].audio.T, dtype=torch.float32)
-        return x, y
-    def __len__(self):
-        return len(self.mus.tracks) * self.samples_per_track</code></pre>
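<p>The relationship between <code>__len__</code> and the track lookup in <code>__getitem__</code> above is plain integer arithmetic: each track contributes <code>samples_per_track</code> consecutive indices. A small sketch of that mapping follows; the function name <code>track_index</code> and the track count are hypothetical and used only for illustration.</p>

```python
def track_index(sample_index: int, samples_per_track: int) -> int:
    """Map a flat dataset index to the index of the track it belongs to."""
    return sample_index // samples_per_track


nb_tracks = 3  # hypothetical number of tracks in the selected subset
samples_per_track = 2

# Mirrors __len__: every track is visited samples_per_track times per epoch.
dataset_len = nb_tracks * samples_per_track
mapping = [track_index(i, samples_per_track) for i in range(dataset_len)]
print(mapping)
```

<p>With the default <code>samples_per_track=64</code>, indices 0 to 63 refer to the first track, 64 to 127 to the second, and so on. When <code>random_track_mix</code> is enabled during training, the track selected this way is replaced by a random one for each source.</p>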
-<dt id="openunmix.data.SourceFolderDataset"><code class="flex name class">
-<span>class <span class="ident">SourceFolderDataset</span></span>
-<span>(</span><span>root: str, split: str = 'train', target_dir: str = 'vocals', interferer_dirs: List[str] = ['bass', 'drums'], ext: str = '.wav', nb_samples: int = 1000, seq_duration: Union[float, NoneType] = None, random_chunks: bool = True, sample_rate: float = 44100.0, source_augmentations: Union[Callable, NoneType] = &lt;function SourceFolderDataset.&lt;lambda&gt;&gt;, seed: int = 42)</span>
-<div class="desc"><p>An abstract class representing a :class:<code>Dataset</code>.</p>
-<p>All datasets that represent a map from keys to data samples should subclass
-it. All subclasses should overwrite :meth:<code>__getitem__</code>, supporting fetching a
-data sample for a given key. Subclasses could also optionally overwrite
-:meth:<code>__len__</code>, which is expected to return the size of the dataset by many
-:class:<code>~torch.utils.data.Sampler</code> implementations and the default options
-of :class:<code>~torch.utils.data.DataLoader</code>.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>:class:<code>~torch.utils.data.DataLoader</code> by default constructs an index
-sampler that yields integral indices.
-To make it work with a map-style
-dataset with non-integral indices/keys, a custom sampler must be provided.</p>
-<p>A dataset that assumes folders of sources
-instead of track folders. This is a common
-format for speech and environmental sound datasets
-such as DCASE. For each source a variable number of
-tracks/sounds is available, so the dataset
-is unaligned by design.
-By default, for each sample, sources from random tracks are drawn
-to assemble the mixture.</p>
-<h1 id="example">Example</h1>
-<p>train/vocals/track11.wav -----------------\
-train/drums/track202.wav (interferer1) &mdash;+&ndash;&gt; input
-train/bass/track007a.wav (interferer2) &ndash;/</p>
-<p>train/vocals/track11.wav ---------------------&gt; output</p></div>
-<pre><code class="python">class SourceFolderDataset(UnmixDataset):
-    def __init__(
-        self,
-        root: str,
-        split: str = &#34;train&#34;,
-        target_dir: str = &#34;vocals&#34;,
-        interferer_dirs: List[str] = [&#34;bass&#34;, &#34;drums&#34;],
-        ext: str = &#34;.wav&#34;,
-        nb_samples: int = 1000,
-        seq_duration: Optional[float] = None,
-        random_chunks: bool = True,
-        sample_rate: float = 44100.0,
-        source_augmentations: Optional[Callable] = lambda audio: audio,
-        seed: int = 42,
-    ) -&gt; None:
-        &#34;&#34;&#34;A dataset that assumes folders of sources
-        instead of track folders. This is a common
-        format for speech and environmental sound datasets
-        such as DCASE. For each source a variable number of
-        tracks/sounds is available, so the dataset
-        is unaligned by design.
-        By default, for each sample, sources from random tracks are drawn
-        to assemble the mixture.
-        Example
-        =======
-        train/vocals/track11.wav -----------------\
-        train/drums/track202.wav  (interferer1) ---+--&gt; input
-        train/bass/track007a.wav  (interferer2) --/
-        train/vocals/track11.wav ---------------------&gt; output
-        &#34;&#34;&#34;
-        self.root = Path(root).expanduser()
-        self.split = split
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.ext = ext
-        self.random_chunks = random_chunks
-        self.source_augmentations = source_augmentations
-        self.target_dir = target_dir
-        self.interferer_dirs = interferer_dirs
-        self.source_folders = self.interferer_dirs + [self.target_dir]
-        self.source_tracks = self.get_tracks()
-        self.nb_samples = nb_samples
-        self.seed = seed
-        random.seed(self.seed)
-    def __getitem__(self, index):
-        # For each source draw a random sound and mix them together
-        audio_sources = []
-        for source in self.source_folders:
-            if self.split == &#34;valid&#34;:
-                # provide deterministic behaviour for validation so that
-                # each epoch, the same tracks are yielded
-                random.seed(index)
-            # select a random track for each source
-            source_path = random.choice(self.source_tracks[source])
-            duration = load_info(source_path)[&#34;duration&#34;]
-            if self.random_chunks:
-                # for each source, select a random chunk
-                start = random.uniform(0, duration - self.seq_duration)
-            else:
-                # use center segment
-                start = max(duration // 2 - self.seq_duration // 2, 0)
-            audio, _ = load_audio(source_path, start=start, dur=self.seq_duration)
-            audio = self.source_augmentations(audio)
-            audio_sources.append(audio)
-        stems = torch.stack(audio_sources)
-        # apply linear mix over source index=0
-        x = stems.sum(0)
-        # target is always the last element in the list
-        y = stems[-1]
-        return x, y
-    def __len__(self):
-        return self.nb_samples
-    def get_tracks(self):
-        &#34;&#34;&#34;Loads input and output tracks&#34;&#34;&#34;
-        p = Path(self.root, self.split)
-        source_tracks = {}
-        for source_folder in tqdm.tqdm(self.source_folders):
-            tracks = []
-            source_path = p / source_folder
-            for source_track_path in sorted(source_path.glob(&#34;*&#34; + self.ext)):
-                if self.seq_duration is not None:
-                    info = load_info(source_track_path)
-                    # get minimum duration of track
-                    if info[&#34;duration&#34;] &gt; self.seq_duration:
-                        tracks.append(source_track_path)
-                else:
-                    tracks.append(source_track_path)
-            source_tracks[source_folder] = tracks
-        return source_tracks</code></pre>
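<p>In the validation branch of <code>__getitem__</code> above, the generator is reseeded with the sample index before each source is drawn, so the same index yields the same files in every epoch. The sketch below illustrates that idea under one stated difference: it uses a local <code>random.Random(index)</code> instead of reseeding the global generator as the class does. The function name <code>draw_validation_sources</code> and the file names are hypothetical.</p>

```python
import random
from typing import Dict, List


def draw_validation_sources(
    source_tracks: Dict[str, List[str]], index: int
) -> Dict[str, str]:
    """Hypothetical helper: pick one file per source folder, repeatably."""
    picks = {}
    for source, tracks in source_tracks.items():
        # Same seed for every source, analogous to calling random.seed(index)
        # inside the loop in the validation branch.
        rng = random.Random(index)
        picks[source] = rng.choice(tracks)
    return picks


# Made-up folder contents, keyed by source folder name.
source_tracks = {
    "bass": ["bass/a.wav", "bass/b.wav", "bass/c.wav"],
    "drums": ["drums/a.wav", "drums/b.wav"],
    "vocals": ["vocals/a.wav", "vocals/b.wav", "vocals/c.wav"],
}

first = draw_validation_sources(source_tracks, index=3)
second = draw_validation_sources(source_tracks, index=3)
print(first == second)
```

<p>A local generator keeps the sketch free of side effects on other code that relies on the global <code>random</code> state; whether that trade-off suits a given training setup is a design choice.</p>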
-<dt id="openunmix.data.SourceFolderDataset.get_tracks"><code class="name flex">
-<span>def <span class="ident">get_tracks</span></span>(<span>self)</span>
-<div class="desc"><p>Loads input and output tracks</p></div>
-<pre><code class="python">def get_tracks(self):
-    &#34;&#34;&#34;Loads input and output tracks&#34;&#34;&#34;
-    p = Path(self.root, self.split)
-    source_tracks = {}
-    for source_folder in tqdm.tqdm(self.source_folders):
-        tracks = []
-        source_path = p / source_folder
-        for source_track_path in sorted(source_path.glob(&#34;*&#34; + self.ext)):
-            if self.seq_duration is not None:
-                info = load_info(source_track_path)
-                # get minimum duration of track
-                if info[&#34;duration&#34;] &gt; self.seq_duration:
-                    tracks.append(source_track_path)
-            else:
-                tracks.append(source_track_path)
-        source_tracks[source_folder] = tracks
-    return source_tracks</code></pre>
-<dt id="openunmix.data.UnmixDataset"><code class="flex name class">
-<span>class <span class="ident">UnmixDataset</span></span>
-<span>(</span><span>root: Union[pathlib.Path, str], sample_rate: float, seq_duration: Union[float, NoneType] = None, source_augmentations: Union[Callable, NoneType] = None)</span>
-<div class="desc"><p>An abstract class representing a :class:<code>Dataset</code>.</p>
-<p>All datasets that represent a map from keys to data samples should subclass
-it. All subclasses should overwrite :meth:<code>__getitem__</code>, supporting fetching a
-data sample for a given key. Subclasses could also optionally overwrite
-:meth:<code>__len__</code>, which is expected to return the size of the dataset by many
-:class:<code>~torch.utils.data.Sampler</code> implementations and the default options
-of :class:<code>~torch.utils.data.DataLoader</code>.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>:class:<code>~torch.utils.data.DataLoader</code> by default constructs an index
-sampler that yields integral indices.
-To make it work with a map-style
-dataset with non-integral indices/keys, a custom sampler must be provided.</p>
-<pre><code class="python">class UnmixDataset(torch.utils.data.Dataset):
-    _repr_indent = 4
-    def __init__(
-        self,
-        root: Union[Path, str],
-        sample_rate: float,
-        seq_duration: Optional[float] = None,
-        source_augmentations: Optional[Callable] = None,
-    ) -&gt; None:
-        self.root = Path(root).expanduser()
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.source_augmentations = source_augmentations
-    def __getitem__(self, index: int) -&gt; Any:
-        raise NotImplementedError
-    def __len__(self) -&gt; int:
-        raise NotImplementedError
-    def __repr__(self) -&gt; str:
-        head = &#34;Dataset &#34; + self.__class__.__name__
-        body = [&#34;Number of datapoints: {}&#34;.format(self.__len__())]
-        body += self.extra_repr().splitlines()
-        lines = [head] + [&#34; &#34; * self._repr_indent + line for line in body]
-        return &#34;\n&#34;.join(lines)
-    def extra_repr(self) -&gt; str:
-        return &#34;&#34;</code></pre>
-<dt id="openunmix.data.UnmixDataset.extra_repr"><code class="name flex">
-<span>def <span class="ident">extra_repr</span></span>(<span>self) ‑> str</span>
-<div class="desc"></div>
-<pre><code class="python">def extra_repr(self) -&gt; str:
-    return &#34;&#34;</code></pre>
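<p><code>extra_repr</code> is the hook that <code>__repr__</code> calls to append subclass-specific lines. The sketch below copies only the <code>__repr__</code> logic shown above into a stand-in class so that it runs without <code>torch</code>; the class name <code>ReprSketch</code>, its fixed length, and the two extra lines are hypothetical.</p>

```python
class ReprSketch:
    """Stand-in that reuses the __repr__ logic of UnmixDataset (no torch)."""

    _repr_indent = 4

    def __len__(self) -> int:
        return 3  # hypothetical dataset size

    def extra_repr(self) -> str:
        # One entry per line; __repr__ indents each line separately.
        return "Split: train\nTarget: vocals"

    def __repr__(self) -> str:
        head = "Dataset " + self.__class__.__name__
        body = ["Number of datapoints: {}".format(self.__len__())]
        body += self.extra_repr().splitlines()
        lines = [head] + [" " * self._repr_indent + line for line in body]
        return "\n".join(lines)


text = repr(ReprSketch())
print(text)
```

<p>With the default <code>extra_repr</code>, which returns an empty string, only the header and the datapoint count are printed.</p>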
-<dt id="openunmix.data.VariableSourcesTrackFolderDataset"><code class="flex name class">
-<span>class <span class="ident">VariableSourcesTrackFolderDataset</span></span>
-<span>(</span><span>root: str, split: str = 'train', target_file: str = 'vocals.wav', ext: str = '.wav', seq_duration: Union[float, NoneType] = None, random_chunks: bool = False, random_interferer_mix: bool = False, sample_rate: float = 44100.0, source_augmentations: Union[Callable, NoneType] = &lt;function VariableSourcesTrackFolderDataset.&lt;lambda&gt;&gt;, silence_missing_targets: bool = False)</span>
-<div class="desc"><p>An abstract class representing a :class:<code>Dataset</code>.</p>
-<p>All datasets that represent a map from keys to data samples should subclass
-it. All subclasses should overwrite :meth:<code>__getitem__</code>, supporting fetching a
-data sample for a given key. Subclasses could also optionally overwrite
-:meth:<code>__len__</code>, which is expected to return the size of the dataset by many
-:class:<code>~torch.utils.data.Sampler</code> implementations and the default options
-of :class:<code>~torch.utils.data.DataLoader</code>.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>:class:<code>~torch.utils.data.DataLoader</code> by default constructs an index
-sampler that yields integral indices.
-To make it work with a map-style
-dataset with non-integral indices/keys, a custom sampler must be provided.</p>
-<p>A dataset that assumes audio sources are stored
-in track folders, where each track has a <em>variable</em> number of sources.
-The user specifies the target file name (<code>target_file</code>)
-and the extension of the sources to be used for mixing.
-A linear mix is performed on the fly by summing all sources in a
-track folder.</p>
-<p>Since the number of sources differs per track
-while the target is fixed, a random track mix
-augmentation cannot be used. Instead, a random track
-can be used to load the interfering sources.</p>
-<p>Also make sure that you do not provide the mixture
-file among the sources!</p>
-<h1 id="example">Example</h1>
-<p>train/1/vocals.wav &ndash;&gt; input target \
-train/1/drums.wav &ndash;&gt; input target |
-train/1/bass.wav &ndash;&gt; input target &ndash;+&ndash;&gt; input
-train/1/accordion.wav &ndash;&gt; input target |
-train/1/marimba.wav &ndash;&gt; input target /</p>
-<p>train/1/vocals.wav -----------------------&gt; output</p></div>
-<pre><code class="python">class VariableSourcesTrackFolderDataset(UnmixDataset):
-    def __init__(
-        self,
-        root: str,
-        split: str = &#34;train&#34;,
-        target_file: str = &#34;vocals.wav&#34;,
-        ext: str = &#34;.wav&#34;,
-        seq_duration: Optional[float] = None,
-        random_chunks: bool = False,
-        random_interferer_mix: bool = False,
-        sample_rate: float = 44100.0,
-        source_augmentations: Optional[Callable] = lambda audio: audio,
-        silence_missing_targets: bool = False,
-    ) -&gt; None:
-        &#34;&#34;&#34;A dataset that assumes audio sources are stored
-        in track folders, where each track has a _variable_ number of sources.
-        The user specifies the target file name (`target_file`)
-        and the extension of the sources to be used for mixing.
-        A linear mix is performed on the fly by summing all sources in a
-        track folder.
-        Since the number of sources differs per track
-        while the target is fixed, a random track mix
-        augmentation cannot be used. Instead, a random track
-        can be used to load the interfering sources.
-        Also make sure that you do not provide the mixture
-        file among the sources!
-        Example
-        =======
-        train/1/vocals.wav --&gt; input target   \
-        train/1/drums.wav --&gt; input target     |
-        train/1/bass.wav --&gt; input target    --+--&gt; input
-        train/1/accordion.wav --&gt; input target |
-        train/1/marimba.wav --&gt; input target  /
-        train/1/vocals.wav -----------------------&gt; output
-        &#34;&#34;&#34;
-        self.root = Path(root).expanduser()
-        self.split = split
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.random_chunks = random_chunks
-        self.random_interferer_mix = random_interferer_mix
-        self.source_augmentations = source_augmentations
-        self.target_file = target_file
-        self.ext = ext
-        self.silence_missing_targets = silence_missing_targets
-        self.tracks = list(self.get_tracks())
-    def __getitem__(self, index):
-        # select the target based on the dataset index
-        target_track_path = self.tracks[index][&#34;path&#34;]
-        if self.random_chunks:
-            target_min_duration = self.tracks[index][&#34;min_duration&#34;]
-            target_start = random.uniform(0, target_min_duration - self.seq_duration)
-        else:
-            target_start = 0
-        # optionally select a random interferer track
-        if self.random_interferer_mix:
-            random_idx = random.choice(range(len(self.tracks)))
-            intfr_track_path = self.tracks[random_idx][&#34;path&#34;]
-            if self.random_chunks:
-                intfr_min_duration = self.tracks[random_idx][&#34;min_duration&#34;]
-                intfr_start = random.uniform(0, intfr_min_duration - self.seq_duration)
-            else:
-                intfr_start = 0
-        else:
-            intfr_track_path = target_track_path
-            intfr_start = target_start
-        # get sources from interferer track
-        sources = sorted(list(intfr_track_path.glob(&#34;*&#34; + self.ext)))
-        # load sources
-        x = 0
-        for source_path in sources:
-            # skip target file and load it later
-            if source_path == intfr_track_path / self.target_file:
-                continue
-            try:
-                audio, _ = load_audio(source_path, start=intfr_start, dur=self.seq_duration)
-            except RuntimeError:
-                index = index - 1 if index &gt; 0 else index + 1
-                return self.__getitem__(index)
-            x += self.source_augmentations(audio)
-        # load the selected track target
-        if Path(target_track_path / self.target_file).exists():
-            y, _ = load_audio(
-                target_track_path / self.target_file,
-                start=target_start,
-                dur=self.seq_duration,
-            )
-            y = self.source_augmentations(y)
-            x += y
-        # Use silence if target does not exist
-        else:
-            y = torch.zeros(audio.shape)
-        return x, y
-    def __len__(self):
-        return len(self.tracks)
-    def get_tracks(self):
-        p = Path(self.root, self.split)
-        for track_path in tqdm.tqdm(p.iterdir()):
-            if track_path.is_dir():
-                # check if target exists
-                if Path(track_path, self.target_file).exists() or self.silence_missing_targets:
-                    sources = sorted(list(track_path.glob(&#34;*&#34; + self.ext)))
-                    if not sources:
-                        # in case of empty folder
-                        print(&#34;empty track: &#34;, track_path)
-                        continue
-                    if self.seq_duration is not None:
-                        # check sources
-                        infos = list(map(load_info, sources))
-                        # get minimum duration of source
-                        min_duration = min(i[&#34;duration&#34;] for i in infos)
-                        if min_duration &gt; self.seq_duration:
-                            yield ({&#34;path&#34;: track_path, &#34;min_duration&#34;: min_duration})
-                    else:
-                        yield ({&#34;path&#34;: track_path, &#34;min_duration&#34;: None})</code></pre>
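The chunk selection and the linear mix performed in `__getitem__` can be sketched without any audio files. This is a minimal illustration only: `pick_start` and `linear_mix` are hypothetical helper names that do not exist in the library, and plain lists of floats stand in for the audio tensors returned by `load_audio`.

```python
import random


def pick_start(min_duration, seq_duration, random_chunks, rng=random):
    """Return the chunk start in seconds, mirroring the branch in __getitem__."""
    if random_chunks:
        # any start that still leaves room for a full seq_duration chunk
        return rng.uniform(0, min_duration - seq_duration)
    return 0


def linear_mix(sources):
    """Sum equally long sources sample by sample (hypothetical helper)."""
    return [sum(samples) for samples in zip(*sources)]


rng = random.Random(42)
start = pick_start(min_duration=10.0, seq_duration=4.0, random_chunks=True, rng=rng)

# toy "audio": one target and two interferers, three samples each
target = [0.1, 0.2, 0.3]
interferers = [[0.0, 0.1, 0.0], [0.5, 0.0, -0.3]]
mixture = linear_mix(interferers + [target])
```

Because `get_tracks` only yields tracks whose shortest source is longer than `seq_duration`, the upper bound passed to `uniform` stays positive for every track in the dataset.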
-<dt id="openunmix.data.VariableSourcesTrackFolderDataset.get_tracks"><code class="name flex">
-<span>def <span class="ident">get_tracks</span></span>(<span>self)</span>
-<div class="desc"></div>
-<pre><code class="python">def get_tracks(self):
-    p = Path(self.root, self.split)
-    for track_path in tqdm.tqdm(p.iterdir()):
-        if track_path.is_dir():
-            # check if target exists
-            if Path(track_path, self.target_file).exists() or self.silence_missing_targets:
-                sources = sorted(list(track_path.glob(&#34;*&#34; + self.ext)))
-                if not sources:
-                    # in case of empty folder
-                    print(&#34;empty track: &#34;, track_path)
-                    continue
-                if self.seq_duration is not None:
-                    # check sources
-                    infos = list(map(load_info, sources))
-                    # get minimum duration of source
-                    min_duration = min(i[&#34;duration&#34;] for i in infos)
-                    if min_duration &gt; self.seq_duration:
-                        yield ({&#34;path&#34;: track_path, &#34;min_duration&#34;: min_duration})
-                else:
-                    yield ({&#34;path&#34;: track_path, &#34;min_duration&#34;: None})</code></pre>
\ No newline at end of file
-<article id="content">
-<h1 class="title">Module <code>openunmix.evaluate</code></h1>
-<section id="section-intro">
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/evaluate.py#L0-L196" class="git-link">Browse git</a>
-<pre><code class="python">import argparse
-import functools
-import json
-import multiprocessing
-from typing import Optional, Union
-import musdb
-import museval
-import torch
-import tqdm
-from openunmix import utils
-def separate_and_evaluate(
-    track: musdb.MultiTrack,
-    targets: list,
-    model_str_or_path: str,
-    niter: int,
-    output_dir: str,
-    eval_dir: str,
-    residual: bool,
-    mus,
-    aggregate_dict: dict = None,
-    device: Union[str, torch.device] = &#34;cpu&#34;,
-    wiener_win_len: Optional[int] = None,
-    filterbank=&#34;torch&#34;,
-) -&gt; str:
-    separator = utils.load_separator(
-        model_str_or_path=model_str_or_path,
-        targets=targets,
-        niter=niter,
-        residual=residual,
-        wiener_win_len=wiener_win_len,
-        device=device,
-        pretrained=True,
-        filterbank=filterbank,
-    )
-    separator.freeze()
-    separator.to(device)
-    audio = torch.as_tensor(track.audio, dtype=torch.float32, device=device)
-    audio = utils.preprocess(audio, track.rate, separator.sample_rate)
-    estimates = separator(audio)
-    estimates = separator.to_dict(estimates, aggregate_dict=aggregate_dict)
-    for key in estimates:
-        estimates[key] = estimates[key][0].cpu().detach().numpy().T
-    if output_dir:
-        mus.save_estimates(estimates, track, output_dir)
-    scores = museval.eval_mus_track(track, estimates, output_dir=eval_dir)
-    return scores
-if __name__ == &#34;__main__&#34;:
-    # Evaluation settings
-    parser = argparse.ArgumentParser(description=&#34;MUSDB18 Evaluation&#34;, add_help=False)
-    parser.add_argument(
-        &#34;--targets&#34;,
-        nargs=&#34;+&#34;,
-        default=[&#34;vocals&#34;, &#34;drums&#34;, &#34;bass&#34;, &#34;other&#34;],
-        type=str,
-        help=&#34;provide targets to be processed. \
-              If none, all available targets will be computed&#34;,
-    )
-    parser.add_argument(
-        &#34;--model&#34;,
-        default=&#34;umxhq&#34;,
-        type=str,
-        help=&#34;path to model base directory of pretrained models&#34;,
-    )
-    parser.add_argument(
-        &#34;--outdir&#34;,
-        type=str,
-        help=&#34;Results path where audio evaluation results are stored&#34;,
-    )
-    parser.add_argument(&#34;--evaldir&#34;, type=str, help=&#34;Results path for museval estimates&#34;)
-    parser.add_argument(&#34;--root&#34;, type=str, help=&#34;Path to MUSDB18&#34;)
-    parser.add_argument(&#34;--subset&#34;, type=str, default=&#34;test&#34;, help=&#34;MUSDB subset (`train`/`test`)&#34;)
-    parser.add_argument(&#34;--cores&#34;, type=int, default=1)
-    parser.add_argument(
-        &#34;--no-cuda&#34;, action=&#34;store_true&#34;, default=False, help=&#34;disables CUDA inference&#34;
-    )
-    parser.add_argument(
-        &#34;--is-wav&#34;,
-        action=&#34;store_true&#34;,
-        default=False,
-        help=&#34;flags wav version of the dataset&#34;,
-    )
-    parser.add_argument(
-        &#34;--niter&#34;,
-        type=int,
-        default=1,
-        help=&#34;number of iterations for refining results.&#34;,
-    )
-    parser.add_argument(
-        &#34;--wiener-win-len&#34;,
-        type=int,
-        default=300,
-        help=&#34;Number of frames on which to apply filtering independently&#34;,
-    )
-    parser.add_argument(
-        &#34;--residual&#34;,
-        type=str,
-        default=None,
-        help=&#34;if provided, build a source with given name &#34;
-        &#34;for the mix minus all estimated targets&#34;,
-    )
-    parser.add_argument(
-        &#34;--aggregate&#34;,
-        type=str,
-        default=None,
-        help=&#34;if provided, must be a string containing a valid expression for &#34;
-        &#34;a dictionary, with keys as output target names, and values &#34;
-        &#34;a list of targets that are used to build it. For instance: &#34;
-        &#39;\&#39;{&#34;vocals&#34;:[&#34;vocals&#34;], &#34;accompaniment&#34;:[&#34;drums&#34;,&#39;
-        &#39;&#34;bass&#34;,&#34;other&#34;]}\&#39;&#39;,
-    )
-    args = parser.parse_args()
-    use_cuda = not args.no_cuda and torch.cuda.is_available()
-    device = torch.device(&#34;cuda&#34; if use_cuda else &#34;cpu&#34;)
-    mus = musdb.DB(
-        root=args.root,
-        download=args.root is None,
-        subsets=args.subset,
-        is_wav=args.is_wav,
-    )
-    aggregate_dict = None if args.aggregate is None else json.loads(args.aggregate)
-    if args.cores &gt; 1:
-        pool = multiprocessing.Pool(args.cores)
-        results = museval.EvalStore()
-        scores_list = list(
-            pool.imap_unordered(
-                func=functools.partial(
-                    separate_and_evaluate,
-                    targets=args.targets,
-                    model_str_or_path=args.model,
-                    niter=args.niter,
-                    residual=args.residual,
-                    mus=mus,
-                    aggregate_dict=aggregate_dict,
-                    output_dir=args.outdir,
-                    eval_dir=args.evaldir,
-                    device=device,
-                ),
-                iterable=mus.tracks,
-                chunksize=1,
-            )
-        )
-        pool.close()
-        pool.join()
-        for scores in scores_list:
-            results.add_track(scores)
-    else:
-        results = museval.EvalStore()
-        for track in tqdm.tqdm(mus.tracks):
-            scores = separate_and_evaluate(
-                track,
-                targets=args.targets,
-                model_str_or_path=args.model,
-                niter=args.niter,
-                residual=args.residual,
-                mus=mus,
-                aggregate_dict=aggregate_dict,
-                output_dir=args.outdir,
-                eval_dir=args.evaldir,
-                device=device,
-            )
-            print(track, &#34;\n&#34;, scores)
-            results.add_track(scores)
-    print(results)
-    method = museval.MethodStore()
-    method.add_evalstore(results, args.model)
-    method.save(args.model + &#34;.pandas&#34;)</code></pre>
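The `--aggregate` option is parsed with `json.loads` into a dictionary that maps output names to lists of targets. The sketch below shows one plausible way such a dictionary could be applied, assuming that aggregation simply sums the listed estimates. The `aggregate` function is a hypothetical stand-in, not the library's `separator.to_dict`, and lists of floats replace the estimate tensors.

```python
import json


def aggregate(estimates, aggregate_dict):
    """Sum the estimates listed under each output name (assumed behaviour)."""
    out = {}
    for name, members in aggregate_dict.items():
        length = len(estimates[members[0]])
        total = [0.0] * length
        for member in members:
            total = [t + s for t, s in zip(total, estimates[member])]
        out[name] = total
    return out


# the same JSON shape as in the --aggregate help text
arg = '{"vocals": ["vocals"], "accompaniment": ["drums", "bass", "other"]}'
aggregate_dict = json.loads(arg)

# toy estimates, two samples per target
estimates = {
    "vocals": [1.0, 2.0],
    "drums": [0.5, 0.5],
    "bass": [0.25, 0.0],
    "other": [0.25, 0.5],
}
combined = aggregate(estimates, aggregate_dict)
```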
-<pre><code class="python">def separate_and_evaluate(
-    track: musdb.MultiTrack,
-    targets: list,
-    model_str_or_path: str,
-    niter: int,
-    output_dir: str,
-    eval_dir: str,
-    residual: bool,
-    mus,
-    aggregate_dict: dict = None,
-    device: Union[str, torch.device] = &#34;cpu&#34;,
-    wiener_win_len: Optional[int] = None,
-    filterbank=&#34;torch&#34;,
-) -&gt; str:
-    separator = utils.load_separator(
-        model_str_or_path=model_str_or_path,
-        targets=targets,
-        niter=niter,
-        residual=residual,
-        wiener_win_len=wiener_win_len,
-        device=device,
-        pretrained=True,
-        filterbank=filterbank,
-    )
-    separator.freeze()
-    separator.to(device)
-    audio = torch.as_tensor(track.audio, dtype=torch.float32, device=device)
-    audio = utils.preprocess(audio, track.rate, separator.sample_rate)
-    estimates = separator(audio)
-    estimates = separator.to_dict(estimates, aggregate_dict=aggregate_dict)
-    for key in estimates:
-        estimates[key] = estimates[key][0].cpu().detach().numpy().T
-    if output_dir:
-        mus.save_estimates(estimates, track, output_dir)
-    scores = museval.eval_mus_track(track, estimates, output_dir=eval_dir)
-    return scores</code></pre>
\ No newline at end of file
-import torch.nn as nn
-from torch import Tensor
-from torch.utils.data import DataLoader
-def atan2(y, x):
-    r&#34;&#34;&#34;Element-wise arctangent function of y/x.
-    Returns a new tensor with signed angles in radians.
-    It is an alternative implementation of torch.atan2
-    Args:
-        y (Tensor): First input tensor
-        x (Tensor): Second input tensor [shape=y.shape]
-    Returns:
-        Tensor: [shape=y.shape].
-    &#34;&#34;&#34;
-    pi = 2 * torch.asin(torch.tensor(1.0))
-    x += ((x == 0) &amp; (y == 0)) * 1.0
-    out = torch.atan(y / x)
-    out += ((y &gt;= 0) &amp; (x &lt; 0)) * pi
-    out -= ((y &lt; 0) &amp; (x &lt; 0)) * pi
-    out *= 1 - ((y &gt; 0) &amp; (x == 0)) * 1.0
-    out += ((y &gt; 0) &amp; (x == 0)) * (pi / 2)
-    out *= 1 - ((y &lt; 0) &amp; (x == 0)) * 1.0
-    out += ((y &lt; 0) &amp; (x == 0)) * (-pi / 2)
-    return out
-# Define basic complex operations on torch.Tensor objects whose last dimension
-# consists in the concatenation of the real and imaginary parts.
-def _norm(x: torch.Tensor) -&gt; torch.Tensor:
-    r&#34;&#34;&#34;Computes the norm value of a torch Tensor, assuming that it
-    comes as real and imaginary part in its last dimension.
-    Args:
-        x (Tensor): Input Tensor of shape [shape=(..., 2)]
-    Returns:
-        Tensor: shape as x excluding the last dimension.
-    &#34;&#34;&#34;
-    return torch.abs(x[..., 0]) ** 2 + torch.abs(x[..., 1]) ** 2
-def _mul_add(a: torch.Tensor, b: torch.Tensor, out: Optional[torch.Tensor] = None) -&gt; torch.Tensor:
-    &#34;&#34;&#34;Element-wise multiplication of two complex Tensors described
-    through their real and imaginary parts.
-    The result is added to the `out` tensor&#34;&#34;&#34;
-    # check `out` and allocate it if needed
-    target_shape = torch.Size([max(sa, sb) for (sa, sb) in zip(a.shape, b.shape)])
-    if out is None or out.shape != target_shape:
-        out = torch.zeros(target_shape, dtype=a.dtype, device=a.device)
-    if out is a:
-        real_a = a[..., 0]
-        out[..., 0] = out[..., 0] + (real_a * b[..., 0] - a[..., 1] * b[..., 1])
-        out[..., 1] = out[..., 1] + (real_a * b[..., 1] + a[..., 1] * b[..., 0])
-    else:
-        out[..., 0] = out[..., 0] + (a[..., 0] * b[..., 0] - a[..., 1] * b[..., 1])
-        out[..., 1] = out[..., 1] + (a[..., 0] * b[..., 1] + a[..., 1] * b[..., 0])
-    return out
-def _mul(a: torch.Tensor, b: torch.Tensor, out: Optional[torch.Tensor] = None) -&gt; torch.Tensor:
-    &#34;&#34;&#34;Element-wise multiplication of two complex Tensors described
-    through their real and imaginary parts.
-    Can work in place only when `out` is `a`.&#34;&#34;&#34;
-    target_shape = torch.Size([max(sa, sb) for (sa, sb) in zip(a.shape, b.shape)])
-    if out is None or out.shape != target_shape:
-        out = torch.zeros(target_shape, dtype=a.dtype, device=a.device)
-    if out is a:
-        real_a = a[..., 0]
-        out[..., 0] = real_a * b[..., 0] - a[..., 1] * b[..., 1]
-        out[..., 1] = real_a * b[..., 1] + a[..., 1] * b[..., 0]
-    else:
-        out[..., 0] = a[..., 0] * b[..., 0] - a[..., 1] * b[..., 1]
-        out[..., 1] = a[..., 0] * b[..., 1] + a[..., 1] * b[..., 0]
-    return out
-def _inv(z: torch.Tensor, out: Optional[torch.Tensor] = None) -&gt; torch.Tensor:
-    &#34;&#34;&#34;Element-wise multiplicative inverse of a Tensor with complex
-    entries described through their real and imaginary parts.
-    can work in place in case out is z&#34;&#34;&#34;
-    ez = _norm(z)
-    if out is None or out.shape != z.shape:
-        out = torch.zeros_like(z)
-    out[..., 0] = z[..., 0] / ez
-    out[..., 1] = -z[..., 1] / ez
-    return out
-def _conj(z, out: Optional[torch.Tensor] = None) -&gt; torch.Tensor:
-    &#34;&#34;&#34;Element-wise complex conjugate of a Tensor with complex entries
-    described through their real and imaginary parts.
-    can work in place in case out is z&#34;&#34;&#34;
-    if out is None or out.shape != z.shape:
-        out = torch.zeros_like(z)
-    out[..., 0] = z[..., 0]
-    out[..., 1] = -z[..., 1]
-    return out
-def _invert(M: torch.Tensor, out: Optional[torch.Tensor] = None) -&gt; torch.Tensor:
-    &#34;&#34;&#34;
-    Invert 1x1 or 2x2 matrices
-    Will generate errors if the matrices are singular: users must handle this
-    through their own regularization schemes.
-    Args:
-        M (Tensor): [shape=(..., nb_channels, nb_channels, 2)]
-            matrices to invert: must be square along dimensions -3 and -2
-    Returns:
-        invM (Tensor): [shape=M.shape]
-            inverses of M
-    &#34;&#34;&#34;
-    nb_channels = M.shape[-2]
-    if out is None or out.shape != M.shape:
-        out = torch.empty_like(M)
-    if nb_channels == 1:
-        # scalar case
-        out = _inv(M, out)
-    elif nb_channels == 2:
-        # two channels case: analytical expression
-        # first compute the determinant
-        det = _mul(M[..., 0, 0, :], M[..., 1, 1, :])
-        det = det - _mul(M[..., 0, 1, :], M[..., 1, 0, :])
-        # invert it
-        invDet = _inv(det)
-        # then fill out the matrix with the inverse
-        out[..., 0, 0, :] = _mul(invDet, M[..., 1, 1, :], out[..., 0, 0, :])
-        out[..., 1, 0, :] = _mul(-invDet, M[..., 1, 0, :], out[..., 1, 0, :])
-        out[..., 0, 1, :] = _mul(-invDet, M[..., 0, 1, :], out[..., 0, 1, :])
-        out[..., 1, 1, :] = _mul(invDet, M[..., 0, 0, :], out[..., 1, 1, :])
-    else:
-        raise Exception(&#34;Only 2 channels are supported for the torch version.&#34;)
-    return out
-# Now define the signal-processing low-level functions used by the Separator
-def expectation_maximization(
-    y: torch.Tensor,
-    x: torch.Tensor,
-    iterations: int = 2,
-    eps: float = 1e-10,
-    batch_size: int = 200,
-):
-    r&#34;&#34;&#34;Expectation maximization algorithm, for refining source separation
-    estimates.
-    This algorithm improves source separation results by
-    enforcing multichannel consistency for the estimates. This usually means
-    a better perceptual quality in terms of spatial artifacts.
-    The implementation follows the details presented in [1]_, taking
-    inspiration from the original EM algorithm proposed in [2]_ and its
-    weighted refinement proposed in [3]_, [4]_.
-    It works by iteratively:
-     * Re-estimating source parameters (power spectral densities and spatial
-       covariance matrices) through :func:`get_local_gaussian_model`.
-     * Separating the mixture again with the new parameters by first computing
-       the new modelled mixture covariance matrices with :func:`get_mix_model`,
-       then preparing the Wiener filters through :func:`wiener_gain` and
-       applying them with :func:`apply_filter`.
-    References
-    ----------
-    .. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and
-        N. Takahashi and Y. Mitsufuji, &#34;Improving music source separation based
-        on deep neural networks through data augmentation and network
-        blending.&#34; 2017 IEEE International Conference on Acoustics, Speech
-        and Signal Processing (ICASSP). IEEE, 2017.
-    .. [2] N.Q. Duong and E. Vincent and R.Gribonval. &#34;Under-determined
-        reverberant audio source separation using a full-rank spatial
-        covariance model.&#34; IEEE Transactions on Audio, Speech, and Language
-        Processing 18.7 (2010): 1830-1840.
-    .. [3] A. Nugraha and A. Liutkus and E. Vincent. &#34;Multichannel audio source
-        separation with deep neural networks.&#34; IEEE/ACM Transactions on Audio,
-        Speech, and Language Processing 24.9 (2016): 1652-1664.
-    .. [4] A. Nugraha and A. Liutkus and E. Vincent. &#34;Multichannel music
-        separation with deep neural networks.&#34; 2016 24th European Signal
-        Processing Conference (EUSIPCO). IEEE, 2016.
-    .. [5] A. Liutkus and R. Badeau and G. Richard &#34;Kernel additive models for
-        source separation.&#34; IEEE Transactions on Signal Processing
-        62.16 (2014): 4298-4310.
-    Args:
-        y (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2, nb_sources)]
-            initial estimates for the sources
-        x (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2)]
-            complex STFT of the mixture signal
-        iterations (int): [scalar]
-            number of iterations for the EM algorithm.
-        eps (float or None): [scalar]
-            The epsilon value to use for regularization and filters.
-    Returns:
-        y (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2, nb_sources)]
-            estimated sources after iterations
-        v (Tensor): [shape=(nb_frames, nb_bins, nb_sources)]
-            estimated power spectral densities
-        R (Tensor): [shape=(nb_bins, nb_channels, nb_channels, 2, nb_sources)]
-            estimated spatial covariance matrices
-    Notes:
-        * You need an initial estimate for the sources to apply this
-          algorithm. This is precisely what the :func:`wiener` function does.
-        * This algorithm *is not* an implementation of the &#34;exact&#34; EM
-          proposed in [1]_. In particular, it does not compute the posterior
-          covariance matrices the same (exact) way. Instead, it uses the
-          simplified approximate scheme initially proposed in [5]_ and further
-          refined in [3]_, [4]_, that boils down to just take the empirical
-          covariance of the recent source estimates, followed by a weighted
-          average for the update of the spatial covariance matrix. It has been
-          empirically demonstrated that this simplified algorithm is more
-          robust for music separation.
-    Warning:
-        It is *very* important to make sure `x.dtype` is `torch.float64`
-        if you want double precision, because this function will **not**
-        do such conversion for you from `torch.complex32`, in case you want the
-        smaller RAM usage on purpose.
-        It is usually better in terms of quality to have double
-        precision, by e.g. calling :func:`expectation_maximization`
-        with ``x.to(torch.float64)``.
-    &#34;&#34;&#34;
-    # dimensions
-    (nb_frames, nb_bins, nb_channels) = x.shape[:-1]
-    nb_sources = y.shape[-1]
-    regularization = torch.cat(
-        (
-            torch.eye(nb_channels, dtype=x.dtype, device=x.device)[..., None],
-            torch.zeros((nb_channels, nb_channels, 1), dtype=x.dtype, device=x.device),
-        ),
-        dim=2,
-    )
-    regularization = torch.sqrt(torch.as_tensor(eps)) * (
-        regularization[None, None, ...].expand((-1, nb_bins, -1, -1, -1))
-    )
-    # allocate the spatial covariance matrices
-    R = [
-        torch.zeros((nb_bins, nb_channels, nb_channels, 2), dtype=x.dtype, device=x.device)
-        for j in range(nb_sources)
-    ]
-    weight: torch.Tensor = torch.zeros((nb_bins,), dtype=x.dtype, device=x.device)
-    v: torch.Tensor = torch.zeros((nb_frames, nb_bins, nb_sources), dtype=x.dtype, device=x.device)
-    for it in range(iterations):
-        # constructing the mixture covariance matrix. Doing it with a loop
-        # to avoid ever storing the whole 6D tensor in RAM
-        # update the PSD as the average spectrogram over channels
-        v = torch.mean(torch.abs(y[..., 0, :]) ** 2 + torch.abs(y[..., 1, :]) ** 2, dim=-2)
-        # update spatial covariance matrices (weighted update)
-        for j in range(nb_sources):
-            R[j] = torch.tensor(0.0, device=x.device)
-            weight = torch.tensor(eps, device=x.device)
-            pos: int = 0
-            batch_size = batch_size if batch_size else nb_frames
-            while pos &lt; nb_frames:
-                t = torch.arange(pos, min(nb_frames, pos + batch_size))
-                pos = int(t[-1]) + 1
-                R[j] = R[j] + torch.sum(_covariance(y[t, ..., j]), dim=0)
-                weight = weight + torch.sum(v[t, ..., j], dim=0)
-            R[j] = R[j] / weight[..., None, None, None]
-            weight = torch.zeros_like(weight)
-        # cloning y if we track gradient, because we&#39;re going to update it
-        if y.requires_grad:
-            y = y.clone()
-        pos = 0
-        while pos &lt; nb_frames:
-            t = torch.arange(pos, min(nb_frames, pos + batch_size))
-            pos = int(t[-1]) + 1
-            y[t, ...] = torch.tensor(0.0, device=x.device)
-            # compute mix covariance matrix
-            Cxx = regularization
-            for j in range(nb_sources):
-                Cxx = Cxx + (v[t, ..., j, None, None, None] * R[j][None, ...].clone())
-            # invert it
-            inv_Cxx = _invert(Cxx)
-            # separate the sources
-            for j in range(nb_sources):
-                # create a wiener gain for this source
-                gain = torch.zeros_like(inv_Cxx)
-                # computes multichannel Wiener gain as v_j R_j inv_Cxx
-                indices = torch.cartesian_prod(
-                    torch.arange(nb_channels),
-                    torch.arange(nb_channels),
-                    torch.arange(nb_channels),
-                )
-                for index in indices:
-                    gain[:, :, index[0], index[1], :] = _mul_add(
-                        R[j][None, :, index[0], index[2], :].clone(),
-                        inv_Cxx[:, :, index[2], index[1], :],
-                        gain[:, :, index[0], index[1], :],
-                    )
-                gain = gain * v[t, ..., None, None, None, j]
-                # apply it to the mixture
-                for i in range(nb_channels):
-                    y[t, ..., j] = _mul_add(gain[..., i, :], x[t, ..., i, None, :], y[t, ..., j])
-    return y, v, R
-def wiener(
-    targets_spectrograms: torch.Tensor,
-    mix_stft: torch.Tensor,
-    iterations: int = 1,
-    softmask: bool = False,
-    residual: bool = False,
-    scale_factor: float = 10.0,
-    eps: float = 1e-10,
-):
-    &#34;&#34;&#34;Wiener-based separation for multichannel audio.
-    The method uses the (possibly multichannel) spectrograms of the
-    sources to separate the (complex) Short-Time Fourier Transform of the
-    mix. Separation is done in a sequential way by:
-    * Getting an initial estimate. This can be done in two ways: either by
-      directly using the spectrograms with the mixture phase, or
-      by using a softmasking strategy. This initial phase is controlled
-      by the `softmask` flag.
-    * If required, adding an additional residual target as the mix minus
-      all targets.
-    * Refining these initial estimates through a call to
-      :func:`expectation_maximization` if the number of iterations is nonzero.
-    This implementation also allows specifying the epsilon value used for
-    regularization. It is based on [1]_, [2]_, [3]_, [4]_.
-    References
-    ----------
-    .. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and
-        N. Takahashi and Y. Mitsufuji, &#34;Improving music source separation based
-        on deep neural networks through data augmentation and network
-        blending.&#34; 2017 IEEE International Conference on Acoustics, Speech
-        and Signal Processing (ICASSP). IEEE, 2017.
-    .. [2] A. Nugraha and A. Liutkus and E. Vincent. &#34;Multichannel audio source
-        separation with deep neural networks.&#34; IEEE/ACM Transactions on Audio,
-        Speech, and Language Processing 24.9 (2016): 1652-1664.
-    .. [3] A. Nugraha and A. Liutkus and E. Vincent. &#34;Multichannel music
-        separation with deep neural networks.&#34; 2016 24th European Signal
-        Processing Conference (EUSIPCO). IEEE, 2016.
-    .. [4] A. Liutkus and R. Badeau and G. Richard &#34;Kernel additive models for
-        source separation.&#34; IEEE Transactions on Signal Processing
-        62.16 (2014): 4298-4310.
-    Args:
-        targets_spectrograms (Tensor): spectrograms of the sources
-            [shape=(nb_frames, nb_bins, nb_channels, nb_sources)].
-            This is a nonnegative tensor that is
-            usually the output of the actual separation method of the user. The
-            spectrograms may be mono, but they need to be 4-dimensional in all
-            cases.
-        mix_stft (Tensor): [shape=(nb_frames, nb_bins, nb_channels, complex=2)]
-            STFT of the mixture signal.
-        iterations (int): [scalar]
-            number of iterations for the EM algorithm
-        softmask (bool): Describes how the initial estimates are obtained.
-            * if `False`, then the mixture phase will directly be used with the
-            spectrogram as initial estimates.
-            * if `True`, initial estimates are obtained by multiplying the
-            complex mix element-wise with the ratio of each target spectrogram
-            with the sum of them all. This strategy is better if the models are
-            not really good, and worse otherwise.
-        residual (bool): if `True`, an additional target is created, which is
-            equal to the mixture minus the other targets, before application of
-            expectation maximization
-        eps (float): Epsilon value to use for computing the separations.
-            This is used whenever division with a model energy is
-            performed, i.e. when softmasking and when iterating the EM.
-            It can be understood as the energy of the additional white noise
-            that is taken out when separating.
-    Returns:
-        Tensor: shape=(nb_frames, nb_bins, nb_channels, complex=2, nb_sources)
-            STFT of estimated sources
-    Notes:
-        * Be careful that you need *magnitude spectrogram estimates* for the
-        case `softmask==False`.
-        * `softmask=False` is recommended
-        * The epsilon value will have a huge impact on performance. If it&#39;s
-        large, only the parts of the signal with a significant energy will
-        be kept in the sources. This epsilon then directly controls the
-        energy of the reconstruction error.
-    Warning:
-        As in :func:`expectation_maximization`, we recommend converting the
-        mixture `x` to double precision `torch.float64` *before* calling
-        :func:`wiener`.
-    &#34;&#34;&#34;
-    if softmask:
-        # if we use softmask, we compute the ratio mask for all targets and
-        # multiply by the mix stft
-        y = (
-            mix_stft[..., None]
-            * (
-                targets_spectrograms
-                / (eps + torch.sum(targets_spectrograms, dim=-1, keepdim=True).to(mix_stft.dtype))
-            )[..., None, :]
-        )
-    else:
-        # otherwise, we just multiply the targets spectrograms with mix phase
-        # we tacitly assume that we have magnitude estimates.
-        angle = atan2(mix_stft[..., 1], mix_stft[..., 0])[..., None]
-        nb_sources = targets_spectrograms.shape[-1]
-        y = torch.zeros(
-            mix_stft.shape + (nb_sources,), dtype=mix_stft.dtype, device=mix_stft.device
-        )
-        y[..., 0, :] = targets_spectrograms * torch.cos(angle)
-        y[..., 1, :] = targets_spectrograms * torch.sin(angle)
-    if residual:
-        # if required, adding an additional target as the mix minus
-        # available targets
-        y = torch.cat([y, mix_stft[..., None] - y.sum(dim=-1, keepdim=True)], dim=-1)
-    if iterations == 0:
-        return y
-    # we need to refine the estimates. Scales down the estimates for
-    # numerical stability
-    max_abs = torch.max(
-        torch.as_tensor(1.0, dtype=mix_stft.dtype, device=mix_stft.device),
-        torch.sqrt(_norm(mix_stft)).max() / scale_factor,
-    )
-    mix_stft = mix_stft / max_abs
-    y = y / max_abs
-    # call expectation maximization
-    y = expectation_maximization(y, mix_stft, iterations, eps=eps)[0]
-    # scale estimates up again
-    y = y * max_abs
-    return y
-def _covariance(y_j):
-    &#34;&#34;&#34;
-    Compute the empirical covariance for a source.
-    Args:
-        y_j (Tensor): complex stft of the source.
-            [shape=(nb_frames, nb_bins, nb_channels, 2)].
-    Returns:
-        Cj (Tensor): [shape=(nb_frames, nb_bins, nb_channels, nb_channels, 2)]
-            just y_j * conj(y_j.T): empirical covariance for each TF bin.
-    &#34;&#34;&#34;
-    (nb_frames, nb_bins, nb_channels) = y_j.shape[:-1]
-    Cj = torch.zeros(
-        (nb_frames, nb_bins, nb_channels, nb_channels, 2),
-        dtype=y_j.dtype,
-        device=y_j.device,
-    )
-    indices = torch.cartesian_prod(torch.arange(nb_channels), torch.arange(nb_channels))
-    for index in indices:
-        Cj[:, :, index[0], index[1], :] = _mul_add(
-            y_j[:, :, index[0], :],
-            _conj(y_j[:, :, index[1], :]),
-            Cj[:, :, index[0], index[1], :],
-        )
-    return Cj</code></pre>
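<p>As a rough illustration of what <code>_covariance</code> computes for a single time-frequency bin, the sketch below forms the outer product of the channel vector with its own conjugate. It uses Python's built-in complex type rather than the trailing real/imaginary axis used in the code above, and the name <code>covariance_bin</code> is hypothetical (it is not part of openunmix).</p>

```python
def covariance_bin(y):
    """Empirical covariance y * conj(y)^T for one time-frequency bin.

    y: list of complex values, one per channel. This layout is an
    assumption made for illustration; the library code stores real and
    imaginary parts in a trailing axis of size 2 instead.
    """
    return [[a * b.conjugate() for b in y] for a in y]


# One stereo bin: channel 0 and channel 1.
stereo_bin = [1 + 2j, 3 - 1j]
C = covariance_bin(stereo_bin)
# The diagonal holds the per-channel energies; the matrix is Hermitian.
```

<p>The result has shape (nb_channels, nb_channels), which corresponds to the two channel axes of <code>Cj</code> above.</p>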
-<pre><code class="python">def atan2(y, x):
-    r&#34;&#34;&#34;Element-wise arctangent function of y/x.
-    Returns a new tensor with signed angles in radians.
-    It is an alternative implementation of torch.atan2
-    Args:
-        y (Tensor): First input tensor
-        x (Tensor): Second input tensor [shape=y.shape]
-    Returns:
-        Tensor: [shape=y.shape].
-    &#34;&#34;&#34;
-    pi = 2 * torch.asin(torch.tensor(1.0))
-    x += ((x == 0) &amp; (y == 0)) * 1.0
-    out = torch.atan(y / x)
-    out += ((y &gt;= 0) &amp; (x &lt; 0)) * pi
-    out -= ((y &lt; 0) &amp; (x &lt; 0)) * pi
-    out *= 1 - ((y &gt; 0) &amp; (x == 0)) * 1.0
-    out += ((y &gt; 0) &amp; (x == 0)) * (pi / 2)
-    out *= 1 - ((y &lt; 0) &amp; (x == 0)) * 1.0
-    out += ((y &lt; 0) &amp; (x == 0)) * (-pi / 2)
-    return out</code></pre>
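<p>The tensor code above builds the four-quadrant arctangent from masks. A scalar sketch of the same case analysis, using only the standard library, may make the branches easier to follow. The name <code>atan2_piecewise</code> is hypothetical, and the sketch does not modify its arguments.</p>

```python
import math


def atan2_piecewise(y, x):
    """Scalar four-quadrant arctangent following the same cases as above."""
    if x == 0 and y == 0:
        # The tensor version replaces x by 1 here, which yields atan(0) = 0.
        return 0.0
    if x == 0:
        # On the imaginary axis the angle is +pi/2 or -pi/2.
        return math.copysign(math.pi / 2, y)
    out = math.atan(y / x)
    if x < 0:
        # Left half-plane: shift by pi, with the sign chosen by y.
        out += math.pi if y >= 0 else -math.pi
    return out


angle = atan2_piecewise(1.0, -1.0)  # second quadrant
```

<p>For these inputs the sketch agrees with <code>math.atan2</code>; signed zeros are not treated specially.</p>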
-<dt id="openunmix.filtering.expectation_maximization"><code class="name flex">
-<span>def <span class="ident">expectation_maximization</span></span>(<span>y: torch.Tensor, x: torch.Tensor, iterations: int = 2, eps: float = 1e-10, batch_size: int = 200)</span>
-<div class="desc"><p>Expectation maximization algorithm, for refining source separation
-estimates.</p>
-<p>This algorithm improves source separation results by
-enforcing multichannel consistency for the estimates. This usually means
-a better perceptual quality in terms of spatial artifacts.</p>
-<p>The implementation follows the details presented in [1], taking
-inspiration from the original EM algorithm proposed in [2] and its
-weighted refinement proposed in [3], [4].
-It works by iteratively:</p>
-<p>Re-estimating source parameters (power spectral densities and spatial
-covariance matrices) through :func:<code>get_local_gaussian_model</code>.</p>
-<p>Separating the mixture again with the new parameters by first computing
-the new modelled mixture covariance matrices with :func:<code>get_mix_model</code>,
-preparing the Wiener filters through :func:<code>wiener_gain</code>, and applying them
-with :func:<code>apply_filter</code>.</p>
-<h2 id="references">References</h2>
-<p>.. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and
-N. Takahashi and Y. Mitsufuji, "Improving music source separation based
-on deep neural networks through data augmentation and network
-blending." 2017 IEEE International Conference on Acoustics, Speech
-and Signal Processing (ICASSP). IEEE, 2017.</p>
-<p>.. [2] N.Q. Duong and E. Vincent and R.Gribonval. "Under-determined
-reverberant audio source separation using a full-rank spatial
-covariance model." IEEE Transactions on Audio, Speech, and Language
-Processing 18.7 (2010): 1830-1840.</p>
-<p>.. [3] A. Nugraha and A. Liutkus and E. Vincent. "Multichannel audio source
-separation with deep neural networks." IEEE/ACM Transactions on Audio,
-Speech, and Language Processing 24.9 (2016): 1652-1664.</p>
-<p>.. [4] A. Nugraha and A. Liutkus and E. Vincent. "Multichannel music
-separation with deep neural networks." 2016 24th European Signal
-Processing Conference (EUSIPCO). IEEE, 2016.</p>
-<p>.. [5] A. Liutkus and R. Badeau and G. Richard "Kernel additive models for
-source separation." IEEE Transactions on Signal Processing
-62.16 (2014): 4298-4310.</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>y</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>[shape=(nb_frames, nb_bins, nb_channels, 2, nb_sources)]
-initial estimates for the sources</dd>
-<dt><strong><code>x</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>[shape=(nb_frames, nb_bins, nb_channels, 2)]
-complex STFT of the mixture signal</dd>
-<dt><strong><code>iterations</code></strong> :&ensp;<code>int</code></dt>
-<dd>[scalar]
-number of iterations for the EM algorithm.</dd>
-<dt><strong><code>eps</code></strong> :&ensp;<code>float</code> or <code>None</code></dt>
-<dd>[scalar]
-The epsilon value to use for regularization and filters.</dd>
-<h2 id="returns">Returns</h2>
-<p>y (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2, nb_sources)]
-estimated sources after iterations
-v (Tensor): [shape=(nb_frames, nb_bins, nb_sources)]
-estimated power spectral densities
-R (Tensor): [shape=(nb_bins, nb_channels, nb_channels, 2, nb_sources)]
-estimated spatial covariance matrices</p>
-<h2 id="notes">Notes</h2>
-<li>You need an initial estimate for the sources to apply this
-algorithm. This is precisely what the :func:<code><a title="openunmix.filtering.wiener" href="#openunmix.filtering.wiener">wiener()</a></code> function does.</li>
-<li>This algorithm <em>is not</em> an implementation of the "exact" EM
-proposed in [1]. In particular, it does not compute the posterior
-covariance matrices the same (exact) way. Instead, it uses the
-simplified approximate scheme initially proposed in [5] and further
-refined in [3], [4], which boils down to taking the empirical
-covariance of the recent source estimates, followed by a weighted
-average for the update of the spatial covariance matrix. It has been
-empirically demonstrated that this simplified algorithm is more
-robust for music separation.</li>
-<h2 id="warning">Warning</h2>
-<p>It is <em>very</em> important to make sure <code>x.dtype</code> is <code>torch.float64</code>
-if you want double precision, because this function will <strong>not</strong>
-convert from <code>torch.float32</code> for you, in case you want the
-smaller RAM usage on purpose.</p>
-<p>Double precision usually gives better quality; to use it, call
-:func:<code><a title="openunmix.filtering.expectation_maximization" href="#openunmix.filtering.expectation_maximization">expectation_maximization()</a></code>
-with <code>x.to(torch.float64)</code>.</p></div>
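<p>To make the two alternating steps concrete, here is a minimal sketch restricted to a mono signal and a single frequency bin. Under that assumption each spatial covariance matrix collapses to a scalar and the Wiener gain becomes a plain ratio. The function name <code>em_mono_bin</code> and the list-based data layout are hypothetical; this is not the library's implementation, only a simplified instance of the update rules described above.</p>

```python
import math


def em_mono_bin(y, x, iterations=2, eps=1e-10):
    """Simplified EM refinement for one frequency bin of a mono mixture.

    y: list over sources, each a list of complex estimates over frames.
    x: list of complex mixture values over frames.
    Returns the refined estimates, the PSDs and the (scalar) spatial terms.
    """
    y = [list(source) for source in y]  # work on a copy
    nb_sources = len(y)
    v = [[abs(c) ** 2 for c in source] for source in y]
    R = [1.0] * nb_sources
    for _ in range(iterations):
        # Step 1: re-estimate the parameters from the current estimates.
        v = [[abs(c) ** 2 for c in source] for source in y]
        R = [
            sum(abs(c) ** 2 for c in y[j]) / (eps + sum(v[j]))
            for j in range(nb_sources)
        ]
        # Step 2: separate again with a Wiener gain v_j R_j / C_xx.
        for t, x_t in enumerate(x):
            c_xx = math.sqrt(eps) + sum(v[j][t] * R[j] for j in range(nb_sources))
            for j in range(nb_sources):
                y[j][t] = (v[j][t] * R[j] / c_xx) * x_t
    return y, v, R


mixture = [1 + 1j, 2 - 1j]
initial = [[0.6 + 0.6j, 1.5 - 0.5j], [0.3 + 0.5j, 0.4 - 0.6j]]
refined, psd, spatial = em_mono_bin(initial, mixture)
```

<p>In this mono case the refined estimates add up to the mixture, up to the small regularization term. With several channels the same steps operate on full covariance matrices, which is where the multichannel consistency comes from.</p>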
-<pre><code class="python">def expectation_maximization(
-    y: torch.Tensor,
-    x: torch.Tensor,
-    iterations: int = 2,
-    eps: float = 1e-10,
-    batch_size: int = 200,
-):
-    r&#34;&#34;&#34;Expectation maximization algorithm, for refining source separation
-    estimates.
-    This algorithm allows to make source separation results better by
-    enforcing multichannel consistency for the estimates. This usually means
-    a better perceptual quality in terms of spatial artifacts.
-    The implementation follows the details presented in [1]_, taking
-    inspiration from the original EM algorithm proposed in [2]_ and its
-    weighted refinement proposed in [3]_, [4]_.
-    It works by iteratively:
-     * Re-estimate source parameters (power spectral densities and spatial
-       covariance matrices) through :func:`get_local_gaussian_model`.
-     * Separate again the mixture with the new parameters by first computing
-       the new modelled mixture covariance matrices with :func:`get_mix_model`,
-       prepare the Wiener filters through :func:`wiener_gain` and apply them
-       with :func:`apply_filter`.
-    References
-    ----------
-    .. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and
-        N. Takahashi and Y. Mitsufuji, &#34;Improving music source separation based
-        on deep neural networks through data augmentation and network
-        blending.&#34; 2017 IEEE International Conference on Acoustics, Speech
-        and Signal Processing (ICASSP). IEEE, 2017.
-    .. [2] N.Q. Duong and E. Vincent and R.Gribonval. &#34;Under-determined
-        reverberant audio source separation using a full-rank spatial
-        covariance model.&#34; IEEE Transactions on Audio, Speech, and Language
-        Processing 18.7 (2010): 1830-1840.
-    .. [3] A. Nugraha and A. Liutkus and E. Vincent. &#34;Multichannel audio source
-        separation with deep neural networks.&#34; IEEE/ACM Transactions on Audio,
-        Speech, and Language Processing 24.9 (2016): 1652-1664.
-    .. [4] A. Nugraha and A. Liutkus and E. Vincent. &#34;Multichannel music
-        separation with deep neural networks.&#34; 2016 24th European Signal
-        Processing Conference (EUSIPCO). IEEE, 2016.
-    .. [5] A. Liutkus and R. Badeau and G. Richard &#34;Kernel additive models for
-        source separation.&#34; IEEE Transactions on Signal Processing
-        62.16 (2014): 4298-4310.
-    Args:
-        y (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2, nb_sources)]
-            initial estimates for the sources
-        x (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2)]
-            complex STFT of the mixture signal
-        iterations (int): [scalar]
-            number of iterations for the EM algorithm.
-        eps (float or None): [scalar]
-            The epsilon value to use for regularization and filters.
-    Returns:
-        y (Tensor): [shape=(nb_frames, nb_bins, nb_channels, 2, nb_sources)]
-            estimated sources after iterations
-        v (Tensor): [shape=(nb_frames, nb_bins, nb_sources)]
-            estimated power spectral densities
-        R (Tensor): [shape=(nb_bins, nb_channels, nb_channels, 2, nb_sources)]
-            estimated spatial covariance matrices
-    Notes:
-        * You need an initial estimate for the sources to apply this
-          algorithm. This is precisely what the :func:`wiener` function does.
-        * This algorithm *is not* an implementation of the &#34;exact&#34; EM
-          proposed in [1]_. In particular, it does not compute the posterior
-          covariance matrices the same (exact) way. Instead, it uses the
-          simplified approximate scheme initially proposed in [5]_ and further
-          refined in [3]_, [4]_, that boils down to just take the empirical
-          covariance of the recent source estimates, followed by a weighted
-          average for the update of the spatial covariance matrix. It has been
-          empirically demonstrated that this simplified algorithm is more
-          robust for music separation.
-    Warning:
-        It is *very* important to make sure `x.dtype` is `torch.float64`
-        if you want double precision, because this function will **not**
-        convert from `torch.float32` for you, in case you want the
-        smaller RAM usage on purpose.
-        Double precision usually gives better quality; to use it, call
-        :func:`expectation_maximization`
-        with ``x.to(torch.float64)``.
-    &#34;&#34;&#34;
-    # dimensions
-    (nb_frames, nb_bins, nb_channels) = x.shape[:-1]
-    nb_sources = y.shape[-1]
-    regularization = torch.cat(
-        (
-            torch.eye(nb_channels, dtype=x.dtype, device=x.device)[..., None],
-            torch.zeros((nb_channels, nb_channels, 1), dtype=x.dtype, device=x.device),
-        ),
-        dim=2,
-    )
-    regularization = torch.sqrt(torch.as_tensor(eps)) * (
-        regularization[None, None, ...].expand((-1, nb_bins, -1, -1, -1))
-    )
-    # allocate the spatial covariance matrices
-    R = [
-        torch.zeros((nb_bins, nb_channels, nb_channels, 2), dtype=x.dtype, device=x.device)
-        for j in range(nb_sources)
-    ]
-    weight: torch.Tensor = torch.zeros((nb_bins,), dtype=x.dtype, device=x.device)
-    v: torch.Tensor = torch.zeros((nb_frames, nb_bins, nb_sources), dtype=x.dtype, device=x.device)
-    for it in range(iterations):
-        # constructing the mixture covariance matrix. Doing it with a loop
-        # to avoid storing anytime in RAM the whole 6D tensor
-        # update the PSD as the average spectrogram over channels
-        v = torch.mean(torch.abs(y[..., 0, :]) ** 2 + torch.abs(y[..., 1, :]) ** 2, dim=-2)
-        # update spatial covariance matrices (weighted update)
-        for j in range(nb_sources):
-            R[j] = torch.tensor(0.0, device=x.device)
-            weight = torch.tensor(eps, device=x.device)
-            pos: int = 0
-            batch_size = batch_size if batch_size else nb_frames
-            while pos &lt; nb_frames:
-                t = torch.arange(pos, min(nb_frames, pos + batch_size))
-                pos = int(t[-1]) + 1
-                R[j] = R[j] + torch.sum(_covariance(y[t, ..., j]), dim=0)
-                weight = weight + torch.sum(v[t, ..., j], dim=0)
-            R[j] = R[j] / weight[..., None, None, None]
-            weight = torch.zeros_like(weight)
-        # cloning y if we track gradient, because we&#39;re going to update it
-        if y.requires_grad:
-            y = y.clone()
-        pos = 0
-        while pos &lt; nb_frames:
-            t = torch.arange(pos, min(nb_frames, pos + batch_size))
-            pos = int(t[-1]) + 1
-            y[t, ...] = torch.tensor(0.0, device=x.device)
-            # compute mix covariance matrix
-            Cxx = regularization
-            for j in range(nb_sources):
-                Cxx = Cxx + (v[t, ..., j, None, None, None] * R[j][None, ...].clone())
-            # invert it
-            inv_Cxx = _invert(Cxx)
-            # separate the sources
-            for j in range(nb_sources):
-                # create a wiener gain for this source
-                gain = torch.zeros_like(inv_Cxx)
-                # computes multichannel Wiener gain as v_j R_j inv_Cxx
-                indices = torch.cartesian_prod(
-                    torch.arange(nb_channels),
-                    torch.arange(nb_channels),
-                    torch.arange(nb_channels),
-                )
-                for index in indices:
-                    gain[:, :, index[0], index[1], :] = _mul_add(
-                        R[j][None, :, index[0], index[2], :].clone(),
-                        inv_Cxx[:, :, index[2], index[1], :],
-                        gain[:, :, index[0], index[1], :],
-                    )
-                gain = gain * v[t, ..., None, None, None, j]
-                # apply it to the mixture
-                for i in range(nb_channels):
-                    y[t, ..., j] = _mul_add(gain[..., i, :], x[t, ..., i, None, :], y[t, ..., j])
-    return y, v, R</code></pre>
-<dt id="openunmix.filtering.wiener"><code class="name flex">
-<span>def <span class="ident">wiener</span></span>(<span>targets_spectrograms: torch.Tensor, mix_stft: torch.Tensor, iterations: int = 1, softmask: bool = False, residual: bool = False, scale_factor: float = 10.0, eps: float = 1e-10)</span>
-<div class="desc"><p>Wiener-based separation for multichannel audio.</p>
-<p>The method uses the (possibly multichannel) spectrograms
-of the
-sources to separate the (complex) Short Term Fourier Transform
-of the
-mix. Separation is done in a sequential way by:</p>
-<p>Getting an initial estimate. This can be done in two ways: either by
-directly using the spectrograms with the mixture phase, or
-by using a softmasking strategy. This initial phase is controlled
-by the <code>softmask</code> flag.</p>
-<p>If required, adding an additional residual target as the mix minus
-all targets.</p>
-<p>Refining these initial estimates through a call to
-:func:<code><a title="openunmix.filtering.expectation_maximization" href="#openunmix.filtering.expectation_maximization">expectation_maximization()</a></code> if the number of iterations is nonzero.</p>
-<p>This implementation also allows specifying the epsilon value used for
-regularization. It is based on [1], [2], [3], [4].</p>
-<h2 id="references">References</h2>
-<p>.. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and
-N. Takahashi and Y. Mitsufuji, "Improving music source separation based
-on deep neural networks through data augmentation and network
-blending." 2017 IEEE International Conference on Acoustics, Speech
-and Signal Processing (ICASSP). IEEE, 2017.</p>
-<p>.. [2] A. Nugraha and A. Liutkus and E. Vincent. "Multichannel audio source
-separation with deep neural networks." IEEE/ACM Transactions on Audio,
-Speech, and Language Processing 24.9 (2016): 1652-1664.</p>
-<p>.. [3] A. Nugraha and A. Liutkus and E. Vincent. "Multichannel music
-separation with deep neural networks." 2016 24th European Signal
-Processing Conference (EUSIPCO). IEEE, 2016.</p>
-<p>.. [4] A. Liutkus and R. Badeau and G. Richard "Kernel additive models for
-source separation." IEEE Transactions on Signal Processing
-62.16 (2014): 4298-4310.</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>targets_spectrograms</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>spectrograms of the sources
-[shape=(nb_frames, nb_bins, nb_channels, nb_sources)].
-This is a nonnegative tensor that is
-usually the output of the actual separation method of the user. The
-spectrograms may be mono, but they need to be 4-dimensional in all
-cases.</dd>
-<dt><strong><code>mix_stft</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>[shape=(nb_frames, nb_bins, nb_channels, complex=2)]
-STFT of the mixture signal.</dd>
-<dt><strong><code>iterations</code></strong> :&ensp;<code>int</code></dt>
-<dd>[scalar]
-number of iterations for the EM algorithm</dd>
-<dt><strong><code>softmask</code></strong> :&ensp;<code>bool</code></dt>
-<dd>Describes how the initial estimates are obtained.
-* if <code>False</code>, then the mixture phase will directly be used with the
-spectrogram as initial estimates.
-* if <code>True</code>, initial estimates are obtained by multiplying the
-complex mix element-wise with the ratio of each target spectrogram
-with the sum of them all. This strategy is better if the model is
-not very good, and worse otherwise.</dd>
-<dt><strong><code>residual</code></strong> :&ensp;<code>bool</code></dt>
-<dd>if <code>True</code>, an additional target is created, which is
-equal to the mixture minus the other targets, before application of
-expectation maximization</dd>
-<dt><strong><code>eps</code></strong> :&ensp;<code>float</code></dt>
-<dd>Epsilon value to use for computing the separations.
-This is used whenever division with a model energy is
-performed, i.e. when softmasking and when iterating the EM.
-It can be understood as the energy of the additional white noise
-that is taken out when separating.</dd>
-<h2 id="returns">Returns</h2>
-<dt><code>Tensor</code></dt>
-<dd>shape=(nb_frames, nb_bins, nb_channels, complex=2, nb_sources)
-STFT of estimated sources</dd>
-<h2 id="notes">Notes</h2>
-<li>Be careful that you need <em>magnitude spectrogram estimates</em> for the
-case <code>softmask==False</code>.</li>
-<li><code>softmask=False</code> is recommended</li>
-<li>The epsilon value will have a huge impact on performance. If it's
-large, only the parts of the signal with a significant energy will
-be kept in the sources. This epsilon then directly controls the
-energy of the reconstruction error.</li>
-<h2 id="warning">Warning</h2>
-<p>As in :func:<code><a title="openunmix.filtering.expectation_maximization" href="#openunmix.filtering.expectation_maximization">expectation_maximization()</a></code>, we recommend converting the
-mixture <code>mix_stft</code> to double precision <code>torch.float64</code> <em>before</em> calling
-:func:<code><a title="openunmix.filtering.wiener" href="#openunmix.filtering.wiener">wiener()</a></code>.</p></div>
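<p>The two ways of obtaining initial estimates, and the optional residual target, can be sketched for a single time-frequency bin of one channel. The sketch uses Python's built-in complex numbers and hypothetical helper names (<code>initial_estimates</code>, <code>add_residual</code>); it only illustrates the formulas described above and is not the library code.</p>

```python
import cmath


def initial_estimates(magnitudes, mix, softmask=False, eps=1e-10):
    """Initial complex estimates for one bin.

    magnitudes: nonnegative magnitude estimates, one per source.
    mix: complex STFT value of the mixture in this bin.
    """
    if softmask:
        # Ratio mask: each source receives its share of the complex mixture.
        total = eps + sum(magnitudes)
        return [mix * (m / total) for m in magnitudes]
    # Otherwise combine each magnitude with the phase of the mixture.
    phase = cmath.phase(mix)
    return [cmath.rect(m, phase) for m in magnitudes]


def add_residual(estimates, mix):
    """Append one extra target equal to the mixture minus all estimates."""
    return estimates + [mix - sum(estimates)]


mix_bin = 3 + 4j
phase_based = initial_estimates([2.0, 1.0], mix_bin)
mask_based = initial_estimates([2.0, 3.0], mix_bin, softmask=True)
with_residual = add_residual(phase_based, mix_bin)
```

<p>With <code>softmask=True</code> the estimates sum to the mixture up to the effect of <code>eps</code>. With the mixture-phase variant they generally do not, which is the gap the residual target fills.</p>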
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/filtering.py#L338-L476" class="git-link">Browse git</a>
-<pre><code class="python">def wiener(
-    targets_spectrograms: torch.Tensor,
-    mix_stft: torch.Tensor,
-    iterations: int = 1,
-    softmask: bool = False,
-    residual: bool = False,
-    scale_factor: float = 10.0,
-    eps: float = 1e-10,
-):
-    &#34;&#34;&#34;Wiener-based separation for multichannel audio.
-    The method uses the (possibly multichannel) spectrograms  of the
-    sources to separate the (complex) Short Term Fourier Transform  of the
-    mix. Separation is done in a sequential way by:
-    * Getting an initial estimate. This can be done in two ways: either by
-      directly using the spectrograms with the mixture phase, or
-      by using a softmasking strategy. This initial phase is controlled
-      by the `softmask` flag.
-    * If required, adding an additional residual target as the mix minus
-      all targets.
-    * Refining these initial estimates through a call to
-      :func:`expectation_maximization` if the number of iterations is nonzero.
-    This implementation also allows to specify the epsilon value used for
-    regularization. It is based on [1]_, [2]_, [3]_, [4]_.
-    References
-    ----------
-    .. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and
-        N. Takahashi and Y. Mitsufuji, &#34;Improving music source separation based
-        on deep neural networks through data augmentation and network
-        blending.&#34; 2017 IEEE International Conference on Acoustics, Speech
-        and Signal Processing (ICASSP). IEEE, 2017.
-    .. [2] A. Nugraha and A. Liutkus and E. Vincent. &#34;Multichannel audio source
-        separation with deep neural networks.&#34; IEEE/ACM Transactions on Audio,
-        Speech, and Language Processing 24.9 (2016): 1652-1664.
-    .. [3] A. Nugraha and A. Liutkus and E. Vincent. &#34;Multichannel music
-        separation with deep neural networks.&#34; 2016 24th European Signal
-        Processing Conference (EUSIPCO). IEEE, 2016.
-    .. [4] A. Liutkus and R. Badeau and G. Richard &#34;Kernel additive models for
-        source separation.&#34; IEEE Transactions on Signal Processing
-        62.16 (2014): 4298-4310.
-    Args:
-        targets_spectrograms (Tensor): spectrograms of the sources
-            [shape=(nb_frames, nb_bins, nb_channels, nb_sources)].
-            This is a nonnegative tensor that is
-            usually the output of the actual separation method of the user. The
-            spectrograms may be mono, but they need to be 4-dimensional in all
-            cases.
-        mix_stft (Tensor): [shape=(nb_frames, nb_bins, nb_channels, complex=2)]
-            STFT of the mixture signal.
-        iterations (int): [scalar]
-            number of iterations for the EM algorithm
-        softmask (bool): Describes how the initial estimates are obtained.
-            * if `False`, then the mixture phase will directly be used with the
-            spectrogram as initial estimates.
-            * if `True`, initial estimates are obtained by multiplying the
-            complex mix element-wise with the ratio of each target spectrogram
-            with the sum of them all. This strategy is better if the model is
-            not very good, and worse otherwise.
-        residual (bool): if `True`, an additional target is created, which is
-            equal to the mixture minus the other targets, before application of
-            expectation maximization
-        eps (float): Epsilon value to use for computing the separations.
-            This is used whenever division with a model energy is
-            performed, i.e. when softmasking and when iterating the EM.
-            It can be understood as the energy of the additional white noise
-            that is taken out when separating.
-    Returns:
-        Tensor: shape=(nb_frames, nb_bins, nb_channels, complex=2, nb_sources)
-            STFT of estimated sources
-    Notes:
-        * Be careful that you need *magnitude spectrogram estimates* for the
-        case `softmask==False`.
-        * `softmask=False` is recommended
-        * The epsilon value will have a huge impact on performance. If it&#39;s
-        large, only the parts of the signal with a significant energy will
-        be kept in the sources. This epsilon then directly controls the
-        energy of the reconstruction error.
-    Warning:
-        As in :func:`expectation_maximization`, we recommend converting the
-        mixture `mix_stft` to double precision `torch.float64` *before* calling
-        :func:`wiener`.
-    &#34;&#34;&#34;
-    if softmask:
-        # if we use softmask, we compute the ratio mask for all targets and
-        # multiply by the mix stft
-        y = (
-            mix_stft[..., None]
-            * (
-                targets_spectrograms
-                / (eps + torch.sum(targets_spectrograms, dim=-1, keepdim=True).to(mix_stft.dtype))
-            )[..., None, :]
-        )
-    else:
-        # otherwise, we just multiply the targets spectrograms with mix phase
-        # we tacitly assume that we have magnitude estimates.
-        angle = atan2(mix_stft[..., 1], mix_stft[..., 0])[..., None]
-        nb_sources = targets_spectrograms.shape[-1]
-        y = torch.zeros(
-            mix_stft.shape + (nb_sources,), dtype=mix_stft.dtype, device=mix_stft.device
-        )
-        y[..., 0, :] = targets_spectrograms * torch.cos(angle)
-        y[..., 1, :] = targets_spectrograms * torch.sin(angle)
-    if residual:
-        # if required, adding an additional target as the mix minus
-        # available targets
-        y = torch.cat([y, mix_stft[..., None] - y.sum(dim=-1, keepdim=True)], dim=-1)
-    if iterations == 0:
-        return y
-    # we need to refine the estimates. Scales down the estimates for
-    # numerical stability
-    max_abs = torch.max(
-        torch.as_tensor(1.0, dtype=mix_stft.dtype, device=mix_stft.device),
-        torch.sqrt(_norm(mix_stft)).max() / scale_factor,
-    )
-    mix_stft = mix_stft / max_abs
-    y = y / max_abs
-    # call expectation maximization
-    y = expectation_maximization(y, mix_stft, iterations, eps=eps)[0]
-    # scale estimates up again
-    y = y * max_abs
-    return y</code></pre>
\ No newline at end of file
-<article id="content">
-<h1 class="title">Package <code>openunmix</code></h1>
-<section id="section-intro">
-<p><img alt="sigsep logo" src="https://sigsep.github.io/hero.png">
-Open-Unmix is a deep neural network reference implementation for music source separation, applicable for researchers, audio engineers and artists. Open-Unmix provides ready-to-use models that allow users to separate pop music into four stems: vocals, drums, bass and the remaining other instruments. The models were pre-trained on the MUSDB18 dataset. See details at apply pre-trained model.</p>
-<p>This is the python package API documentation.
-Please check out <a href="https://sigsep.github.io/open-unmix">the open-unmix website</a> for more information.</p>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/__init__.py#L0-L260" class="git-link">Browse git</a>
-<pre><code class="python">&#34;&#34;&#34;
-![sigsep logo](https://sigsep.github.io/hero.png)
-Open-Unmix is a deep neural network reference implementation for music source separation, applicable for researchers, audio engineers and artists. Open-Unmix provides ready-to-use models that allow users to separate pop music into four stems: vocals, drums, bass and the remaining other instruments. The models were pre-trained on the MUSDB18 dataset. See details at apply pre-trained model.
-This is the python package API documentation. 
-Please check out [the open-unmix website](https://sigsep.github.io/open-unmix) for more information.
-&#34;&#34;&#34;
-from openunmix import utils
-import torch.hub
-def umxse_spec(targets=None, device=&#34;cpu&#34;, pretrained=True):
-    target_urls = {
-        &#34;speech&#34;: &#34;https://zenodo.org/api/files/765b45a3-c70d-48a6-936b-09a7989c349a/speech_f5e0d9f9.pth&#34;,
-        &#34;noise&#34;: &#34;https://zenodo.org/api/files/765b45a3-c70d-48a6-936b-09a7989c349a/noise_04a6fc2d.pth&#34;,
-    }
-    from .model import OpenUnmix
-    if targets is None:
-        targets = [&#34;speech&#34;, &#34;noise&#34;]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=16000.0, n_fft=1024, bandwidth=16000)
-    # load open unmix speech enhancement models
-    target_models = {}
-    for target in targets:
-        target_unmix = OpenUnmix(
-            nb_bins=1024 // 2 + 1, nb_channels=1, hidden_size=256, max_bin=max_bin
-        )
-        # enable centering of stft to minimize reconstruction error
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models
-def umxse(
-    targets=None,
-    residual=False,
-    niter=1,
-    device=&#34;cpu&#34;,
-    pretrained=True,
-    filterbank=&#34;torch&#34;,
-):
-    &#34;&#34;&#34;
-    Open Unmix Speech Enhancement 1-channel BiLSTM Model
-    trained on the 28-speaker version of Voicebank+Demand
-    (Sampling rate: 16kHz)
-    Args:
-        targets (str): select the targets for the source to be separated.
-                a list including: [&#39;speech&#39;, &#39;noise&#39;].
-                If you don&#39;t pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on Voicebank+Demand
-        residual (bool): if True, a &#34;garbage&#34; target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s stft can be exported to onnx, which makes it practical
-            for deployment.
-    Reference:
-        Uhlich, Stefan, &amp; Mitsufuji, Yuki. (2020).
-        Open-Unmix for Speech Enhancement (UMX SE).
-        Zenodo. http://doi.org/10.5281/zenodo.3786908
-    &#34;&#34;&#34;
-    from .model import Separator
-    target_models = umxse_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=1024,
-        n_hop=512,
-        nb_channels=1,
-        sample_rate=16000.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator
-def umxhq_spec(targets=None, device=&#34;cpu&#34;, pretrained=True):
-    from .model import OpenUnmix
-    # set urls for weights
-    target_urls = {
-        &#34;bass&#34;: &#34;https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/bass-8d85a5bd.pth&#34;,
-        &#34;drums&#34;: &#34;https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/drums-9619578f.pth&#34;,
-        &#34;other&#34;: &#34;https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/other-b52fbbf7.pth&#34;,
-        &#34;vocals&#34;: &#34;https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/vocals-b62c91ce.pth&#34;,
-    }
-    if targets is None:
-        targets = [&#34;vocals&#34;, &#34;drums&#34;, &#34;bass&#34;, &#34;other&#34;]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)
-    target_models = {}
-    for target in targets:
-        # load open unmix model
-        target_unmix = OpenUnmix(
-            nb_bins=4096 // 2 + 1, nb_channels=2, hidden_size=512, max_bin=max_bin
-        )
-        # enable centering of stft to minimize reconstruction error
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models
-def umxhq(
-    targets=None,
-    residual=False,
-    niter=1,
-    device=&#34;cpu&#34;,
-    pretrained=True,
-    filterbank=&#34;torch&#34;,
-):
-    &#34;&#34;&#34;
-    Open Unmix 2-channel/stereo BiLSTM Model trained on MUSDB18-HQ
-    Args:
-        targets (list of str): select the targets for the source to be separated.
-                a list including: [&#39;vocals&#39;, &#39;drums&#39;, &#39;bass&#39;, &#39;other&#39;].
-                If you don&#39;t pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on MUSDB18-HQ
-        residual (bool): if True, a &#34;garbage&#34; target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    from .model import Separator
-    target_models = umxhq_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=4096,
-        n_hop=1024,
-        nb_channels=2,
-        sample_rate=44100.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator
-def umx_spec(targets=None, device=&#34;cpu&#34;, pretrained=True):
-    from .model import OpenUnmix
-    # set urls for weights
-    target_urls = {
-        &#34;bass&#34;: &#34;https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/bass-646024d3.pth&#34;,
-        &#34;drums&#34;: &#34;https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/drums-5a48008b.pth&#34;,
-        &#34;other&#34;: &#34;https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/other-f8e132cc.pth&#34;,
-        &#34;vocals&#34;: &#34;https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/vocals-c8df74a5.pth&#34;,
-    }
-    if targets is None:
-        targets = [&#34;vocals&#34;, &#34;drums&#34;, &#34;bass&#34;, &#34;other&#34;]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)
-    target_models = {}
-    for target in targets:
-        # load open unmix model
-        target_unmix = OpenUnmix(
-            nb_bins=4096 // 2 + 1, nb_channels=2, hidden_size=512, max_bin=max_bin
-        )
-        # enable centering of stft to minimize reconstruction error
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models
-def umx(
-    targets=None,
-    residual=False,
-    niter=1,
-    device=&#34;cpu&#34;,
-    pretrained=True,
-    filterbank=&#34;torch&#34;,
-):
-    &#34;&#34;&#34;
-    Open Unmix 2-channel/stereo BiLSTM Model trained on MUSDB18
-    Args:
-        targets (list of str): select the targets for the source to be separated.
-                a list including: [&#39;vocals&#39;, &#39;drums&#39;, &#39;bass&#39;, &#39;other&#39;].
-                If you don&#39;t pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on MUSDB18
-        residual (bool): if True, a &#34;garbage&#34; target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    from .model import Separator
-    target_models = umx_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=4096,
-        n_hop=1024,
-        nb_channels=2,
-        sample_rate=44100.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator</code></pre>
-<h2 class="section-title" id="header-submodules">Sub-modules</h2>
-<dt><code class="name"><a title="openunmix.cli" href="cli.html">openunmix.cli</a></code></dt>
-<div class="desc"></div>
-<dt><code class="name"><a title="openunmix.data" href="data.html">openunmix.data</a></code></dt>
-<div class="desc"></div>
-<dt><code class="name"><a title="openunmix.evaluate" href="evaluate.html">openunmix.evaluate</a></code></dt>
-<div class="desc"></div>
-<dt><code class="name"><a title="openunmix.filtering" href="filtering.html">openunmix.filtering</a></code></dt>
-<div class="desc"></div>
-<dt><code class="name"><a title="openunmix.model" href="model.html">openunmix.model</a></code></dt>
-<div class="desc"></div>
-<dt><code class="name"><a title="openunmix.predict" href="predict.html">openunmix.predict</a></code></dt>
-<div class="desc"></div>
-<dt><code class="name"><a title="openunmix.transforms" href="transforms.html">openunmix.transforms</a></code></dt>
-<div class="desc"></div>
-<dt><code class="name"><a title="openunmix.utils" href="utils.html">openunmix.utils</a></code></dt>
-<div class="desc"></div>
-<h2 class="section-title" id="header-functions">Functions</h2>
-<dt id="openunmix.umx"><code class="name flex">
-<span>def <span class="ident">umx</span></span>(<span>targets=None, residual=False, niter=1, device='cpu', pretrained=True, filterbank='torch')</span>
-<div class="desc"><p>Open Unmix 2-channel/stereo BiLSTM Model trained on MUSDB18</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>targets</code></strong> :&ensp;<code>list of str</code></dt>
-<dd>select the targets for the source to be separated.
-a list including: ['vocals', 'drums', 'bass', 'other'].
-If you don't pick them all, you probably want to
-activate the <code>residual=True</code> option.
-Defaults to all available targets per model.</dd>
-<dt><strong><code>pretrained</code></strong> :&ensp;<code>bool</code></dt>
-<dd>If True, returns a model pre-trained on MUSDB18</dd>
-<dt><strong><code>residual</code></strong> :&ensp;<code>bool</code></dt>
-<dd>if True, a "garbage" target is created</dd>
-<dt><strong><code>niter</code></strong> :&ensp;<code>int</code></dt>
-<dd>the number of post-processing iterations, defaults to 1</dd>
-<dt><strong><code>device</code></strong> :&ensp;<code>str</code></dt>
-<dd>selects device to be used for inference</dd>
-<dt><strong><code>filterbank</code></strong> :&ensp;<code>str</code></dt>
-<dd>filterbank implementation method.
-Supported are <code>['torch', 'asteroid']</code>. <code>torch</code> is about 30% faster
-compared to <code>asteroid</code> on large FFT sizes such as 4096. However,
-asteroid's STFT can be exported to ONNX, which makes it practical
-for deployment.</dd>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/__init__.py#L218-L261" class="git-link">Browse git</a>
-<pre><code class="python">def umx(
-    targets=None,
-    residual=False,
-    niter=1,
-    device=&#34;cpu&#34;,
-    pretrained=True,
-    filterbank=&#34;torch&#34;,
-):
-    &#34;&#34;&#34;
-    Open Unmix 2-channel/stereo BiLSTM Model trained on MUSDB18
-    Args:
-        targets (list of str): select the targets for the source to be separated.
-                a list including: [&#39;vocals&#39;, &#39;drums&#39;, &#39;bass&#39;, &#39;other&#39;].
-                If you don&#39;t pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on MUSDB18
-        residual (bool): if True, a &#34;garbage&#34; target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    from .model import Separator
-    target_models = umx_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=4096,
-        n_hop=1024,
-        nb_channels=2,
-        sample_rate=44100.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator</code></pre>
-<dt id="openunmix.umx_spec"><code class="name flex">
-<span>def <span class="ident">umx_spec</span></span>(<span>targets=None, device='cpu', pretrained=True)</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/__init__.py#L181-L215" class="git-link">Browse git</a>
-<pre><code class="python">def umx_spec(targets=None, device=&#34;cpu&#34;, pretrained=True):
-    from .model import OpenUnmix
-    # set urls for weights
-    target_urls = {
-        &#34;bass&#34;: &#34;https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/bass-646024d3.pth&#34;,
-        &#34;drums&#34;: &#34;https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/drums-5a48008b.pth&#34;,
-        &#34;other&#34;: &#34;https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/other-f8e132cc.pth&#34;,
-        &#34;vocals&#34;: &#34;https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/vocals-c8df74a5.pth&#34;,
-    }
-    if targets is None:
-        targets = [&#34;vocals&#34;, &#34;drums&#34;, &#34;bass&#34;, &#34;other&#34;]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)
-    target_models = {}
-    for target in targets:
-        # load open unmix model
-        target_unmix = OpenUnmix(
-            nb_bins=4096 // 2 + 1, nb_channels=2, hidden_size=512, max_bin=max_bin
-        )
-        # enable centering of stft to minimize reconstruction error
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models</code></pre>
-<dt id="openunmix.umxhq"><code class="name flex">
-<span>def <span class="ident">umxhq</span></span>(<span>targets=None, residual=False, niter=1, device='cpu', pretrained=True, filterbank='torch')</span>
-<div class="desc"><p>Open Unmix 2-channel/stereo BiLSTM Model trained on MUSDB18-HQ</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>targets</code></strong> :&ensp;<code>list of str</code></dt>
-<dd>select the targets for the source to be separated.
-a list including: ['vocals', 'drums', 'bass', 'other'].
-If you don't pick them all, you probably want to
-activate the <code>residual=True</code> option.
-Defaults to all available targets per model.</dd>
-<dt><strong><code>pretrained</code></strong> :&ensp;<code>bool</code></dt>
-<dd>If True, returns a model pre-trained on MUSDB18-HQ</dd>
-<dt><strong><code>residual</code></strong> :&ensp;<code>bool</code></dt>
-<dd>if True, a "garbage" target is created</dd>
-<dt><strong><code>niter</code></strong> :&ensp;<code>int</code></dt>
-<dd>the number of post-processing iterations, defaults to 1</dd>
-<dt><strong><code>device</code></strong> :&ensp;<code>str</code></dt>
-<dd>selects device to be used for inference</dd>
-<dt><strong><code>filterbank</code></strong> :&ensp;<code>str</code></dt>
-<dd>filterbank implementation method.
-Supported are <code>['torch', 'asteroid']</code>. <code>torch</code> is about 30% faster
-compared to <code>asteroid</code> on large FFT sizes such as 4096. However,
-asteroid's STFT can be exported to ONNX, which makes it practical
-for deployment.</dd>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/__init__.py#L135-L178" class="git-link">Browse git</a>
-<pre><code class="python">def umxhq(
-    targets=None,
-    residual=False,
-    niter=1,
-    device=&#34;cpu&#34;,
-    pretrained=True,
-    filterbank=&#34;torch&#34;,
-):
-    &#34;&#34;&#34;
-    Open Unmix 2-channel/stereo BiLSTM Model trained on MUSDB18-HQ
-    Args:
-        targets (list of str): select the targets for the source to be separated.
-                a list including: [&#39;vocals&#39;, &#39;drums&#39;, &#39;bass&#39;, &#39;other&#39;].
-                If you don&#39;t pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on MUSDB18-HQ
-        residual (bool): if True, a &#34;garbage&#34; target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    from .model import Separator
-    target_models = umxhq_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=4096,
-        n_hop=1024,
-        nb_channels=2,
-        sample_rate=44100.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator</code></pre>
-<dt id="openunmix.umxhq_spec"><code class="name flex">
-<span>def <span class="ident">umxhq_spec</span></span>(<span>targets=None, device='cpu', pretrained=True)</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/__init__.py#L98-L132" class="git-link">Browse git</a>
-<pre><code class="python">def umxhq_spec(targets=None, device=&#34;cpu&#34;, pretrained=True):
-    from .model import OpenUnmix
-    # set urls for weights
-    target_urls = {
-        &#34;bass&#34;: &#34;https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/bass-8d85a5bd.pth&#34;,
-        &#34;drums&#34;: &#34;https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/drums-9619578f.pth&#34;,
-        &#34;other&#34;: &#34;https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/other-b52fbbf7.pth&#34;,
-        &#34;vocals&#34;: &#34;https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/vocals-b62c91ce.pth&#34;,
-    }
-    if targets is None:
-        targets = [&#34;vocals&#34;, &#34;drums&#34;, &#34;bass&#34;, &#34;other&#34;]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)
-    target_models = {}
-    for target in targets:
-        # load open unmix model
-        target_unmix = OpenUnmix(
-            nb_bins=4096 // 2 + 1, nb_channels=2, hidden_size=512, max_bin=max_bin
-        )
-        # enable centering of stft to minimize reconstruction error
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models</code></pre>
-<dt id="openunmix.umxse"><code class="name flex">
-<span>def <span class="ident">umxse</span></span>(<span>targets=None, residual=False, niter=1, device='cpu', pretrained=True, filterbank='torch')</span>
-<div class="desc"><p>Open Unmix Speech Enhancement 1-channel BiLSTM Model
-trained on the 28-speaker version of Voicebank+Demand
-(Sampling rate: 16kHz)</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>targets</code></strong> :&ensp;<code>list of str</code></dt>
-<dd>select the targets for the source to be separated.
-a list including: ['speech', 'noise'].
-If you don't pick them all, you probably want to
-activate the <code>residual=True</code> option.
-Defaults to all available targets per model.</dd>
-<dt><strong><code>pretrained</code></strong> :&ensp;<code>bool</code></dt>
-<dd>If True, returns a model pre-trained on Voicebank+Demand</dd>
-<dt><strong><code>residual</code></strong> :&ensp;<code>bool</code></dt>
-<dd>if True, a "garbage" target is created</dd>
-<dt><strong><code>niter</code></strong> :&ensp;<code>int</code></dt>
-<dd>the number of post-processing iterations, defaults to 1</dd>
-<dt><strong><code>device</code></strong> :&ensp;<code>str</code></dt>
-<dd>selects device to be used for inference</dd>
-<dt><strong><code>filterbank</code></strong> :&ensp;<code>str</code></dt>
-<dd>filterbank implementation method.
-Supported are <code>['torch', 'asteroid']</code>. <code>torch</code> is about 30% faster
-compared to <code>asteroid</code> on large FFT sizes such as 4096. However,
-asteroid's STFT can be exported to ONNX, which makes it practical
-for deployment.</dd>
-<h2 id="reference">Reference</h2>
-<p>Uhlich, Stefan, &amp; Mitsufuji, Yuki. (2020).
-Open-Unmix for Speech Enhancement (UMX SE).
-Zenodo. <a href="http://doi.org/10.5281/zenodo.3786908">http://doi.org/10.5281/zenodo.3786908</a></p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/__init__.py#L46-L95" class="git-link">Browse git</a>
-<pre><code class="python">def umxse(
-    targets=None,
-    residual=False,
-    niter=1,
-    device=&#34;cpu&#34;,
-    pretrained=True,
-    filterbank=&#34;torch&#34;,
-):
-    &#34;&#34;&#34;
-    Open Unmix Speech Enhancement 1-channel BiLSTM Model
-    trained on the 28-speaker version of Voicebank+Demand
-    (Sampling rate: 16kHz)
-    Args:
-        targets (list of str): select the targets for the source to be separated.
-                a list including: [&#39;speech&#39;, &#39;noise&#39;].
-                If you don&#39;t pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on Voicebank+Demand
-        residual (bool): if True, a &#34;garbage&#34; target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    Reference:
-        Uhlich, Stefan, &amp; Mitsufuji, Yuki. (2020).
-        Open-Unmix for Speech Enhancement (UMX SE).
-        Zenodo. http://doi.org/10.5281/zenodo.3786908
-    &#34;&#34;&#34;
-    from .model import Separator
-    target_models = umxse_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=1024,
-        n_hop=512,
-        nb_channels=1,
-        sample_rate=16000.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator</code></pre>
-<dt id="openunmix.umxse_spec"><code class="name flex">
-<span>def <span class="ident">umxse_spec</span></span>(<span>targets=None, device='cpu', pretrained=True)</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/__init__.py#L12-L43" class="git-link">Browse git</a>
-<pre><code class="python">def umxse_spec(targets=None, device=&#34;cpu&#34;, pretrained=True):
-    target_urls = {
-        &#34;speech&#34;: &#34;https://zenodo.org/api/files/765b45a3-c70d-48a6-936b-09a7989c349a/speech_f5e0d9f9.pth&#34;,
-        &#34;noise&#34;: &#34;https://zenodo.org/api/files/765b45a3-c70d-48a6-936b-09a7989c349a/noise_04a6fc2d.pth&#34;,
-    }
-    from .model import OpenUnmix
-    if targets is None:
-        targets = [&#34;speech&#34;, &#34;noise&#34;]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=16000.0, n_fft=1024, bandwidth=16000)
-    # load the Open-Unmix speech enhancement models
-    target_models = {}
-    for target in targets:
-        target_unmix = OpenUnmix(
-            nb_bins=1024 // 2 + 1, nb_channels=1, hidden_size=256, max_bin=max_bin
-        )
-        # enable centering of stft to minimize reconstruction error
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models</code></pre>
-<nav id="sidebar">
-<div class="toc">
-<ul id="index">
-<li><h3><a href="#header-submodules">Sub-modules</a></h3>
-<li><code><a title="openunmix.cli" href="cli.html">openunmix.cli</a></code></li>
-<li><code><a title="openunmix.data" href="data.html">openunmix.data</a></code></li>
-<li><code><a title="openunmix.evaluate" href="evaluate.html">openunmix.evaluate</a></code></li>
-<li><code><a title="openunmix.filtering" href="filtering.html">openunmix.filtering</a></code></li>
-<li><code><a title="openunmix.model" href="model.html">openunmix.model</a></code></li>
-<li><code><a title="openunmix.predict" href="predict.html">openunmix.predict</a></code></li>
-<li><code><a title="openunmix.transforms" href="transforms.html">openunmix.transforms</a></code></li>
-<li><code><a title="openunmix.utils" href="utils.html">openunmix.utils</a></code></li>
-<li><h3><a href="#header-functions">Functions</a></h3>
-<ul class="two-column">
-<li><code><a title="openunmix.umx" href="#openunmix.umx">umx</a></code></li>
-<li><code><a title="openunmix.umx_spec" href="#openunmix.umx_spec">umx_spec</a></code></li>
-<li><code><a title="openunmix.umxhq" href="#openunmix.umxhq">umxhq</a></code></li>
-<li><code><a title="openunmix.umxhq_spec" href="#openunmix.umxhq_spec">umxhq_spec</a></code></li>
-<li><code><a title="openunmix.umxse" href="#openunmix.umxse">umxse</a></code></li>
-<li><code><a title="openunmix.umxse_spec" href="#openunmix.umxse_spec">umxse_spec</a></code></li>
-<footer id="footer">
-<p>Generated by <a href="https://pdoc3.github.io/pdoc"><cite>pdoc</cite> 0.9.2</a>.</p>
\ No newline at end of file
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/inference.md b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/inference.md
deleted file mode 100644
index 0a82b6fa2c369aaa6579bfd11a25c80a1c9f45c4..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/inference.md
+++ /dev/null
@@ -1,57 +0,0 @@
-# Performing separation
-## Interfacing using the command line
-The primary interface for separating files is the command line. To separate a mixture file into the four stems, run:
-umx input_file.wav
-Note that we support all files that can be read by torchaudio, depending on the selected backend (either `soundfile` (libsndfile) or `sox`).
-For training, we set the default to `soundfile` as it is faster than `sox`. However, for inference, users might prefer a backend with `mp3` decoding capabilities.
-The separation can be controlled with additional parameters that influence the performance of the separation.
-| Command line argument | Description | Default |
-|---|---|---|
-| `--start <float>` | Start position in seconds; reduces the duration of the audio being loaded. | `0.0` |
-| `--duration <float>` | Duration in seconds; reduces the length of the audio being loaded. Negative values load the full audio. | `-1.0` |
-| `--model <str>` | Path or name of the model: either a model you trained yourself or a pre-trained model loaded from `torchhub`. | |
-| `--targets list(str)` | Targets to be used for separation. For each target, a model file with the same name is required. | `['vocals', 'drums', 'bass', 'other']` |
-| `--niter <int>` | Number of EM steps for refining initial estimates in a post-processing stage. `--niter 0` skips this step altogether (and thus makes separation significantly faster). More iterations can give better interference reduction at the price of more artifacts. | `1` |
-| `--residual` | Computes a residual target, for custom separation scenarios when not all targets are available (at the expense of slightly lower performance). E.g., vocal/accompaniment separation can be performed with `--targets vocals --residual`. | not set |
-| `--softmask` | If activated, the initial estimates for the sources are obtained through a ratio mask of the mixture STFT, and not by using the default behavior of reconstructing waveforms by using the mixture phase. | not set |
-| `--wiener-win-len <int>` | Number of frames on which to apply filtering independently. | `300` |
-| `--audio-backend <str>` | Audio loading backend: `sox_io`, `soundfile`, or `stempeg` (which needs additional installation requirements). | [torchaudio default](https://pytorch.org/audio/stable/backend.html) |
-| `--aggregate <str>` | If provided, must be a string containing a valid expression for a dictionary, with output target names as keys and, as values, lists of the targets used to build each output. For instance: `{ "vocals": ["vocals"], "accompaniment": ["drums", "bass", "other"]}` | `None` |
-| `--filterbank <str>` | Filterbank implementation method. Supported: `['torch', 'asteroid']`. While `torch` is ~30% faster than `asteroid` on large FFT sizes such as 4096, asteroid's STFT may be easier to export for deployment. | `torch` |
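
To make the `--aggregate` expression in the table above more concrete, here is a minimal sketch of what such a dictionary describes. It is not the library's implementation: it assumes the expression is valid JSON, represents each estimate as a plain list of samples, and `aggregate_estimates` is a hypothetical helper name.

```python
import json


def aggregate_estimates(estimates, aggregate_expr):
    """Sum per-target estimates into the outputs named in the expression.

    `estimates` maps target names to equal-length lists of samples.
    `aggregate_expr` is a JSON string mapping output names to lists of targets.
    (Hypothetical helper for illustration only.)
    """
    mapping = json.loads(aggregate_expr)
    aggregated = {}
    for output_name, members in mapping.items():
        # Walk the member estimates sample by sample and add them up.
        columns = zip(*(estimates[name] for name in members))
        aggregated[output_name] = [sum(samples) for samples in columns]
    return aggregated


# Toy estimates with two samples per target (made-up values).
estimates = {
    "vocals": [1.0, 2.0],
    "drums": [0.5, 0.5],
    "bass": [0.25, 0.25],
    "other": [0.25, 0.25],
}
expr = '{"vocals": ["vocals"], "accompaniment": ["drums", "bass", "other"]}'
result = aggregate_estimates(estimates, expr)
```

With this expression, `result` holds two outputs: `vocals` is passed through unchanged, and `accompaniment` is the sample-wise sum of the remaining three targets.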
-## Interfacing from python
-At the core of the process of separating audio is the `Separator` module, which
-takes a numpy audio array or a `torch.Tensor` as input (the mixture) and separates it into `targets` stems.
-Note that for each target a separate model is loaded. E.g. for `umx` and `umxhq` the supported targets are
-`['vocals', 'drums', 'bass', 'other']`. The models have to be passed to the separator's `target_models` parameter.
-The models `umx`, `umxhq`, `umxl` and `umxse` are downloaded automatically.
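
The per-model target handling described above can be sketched in a few lines. This is an illustration under stated assumptions, not part of the `openunmix` API: `resolve_targets` and `residual_recommended` are hypothetical helpers, and the default target lists are the ones that appear in the `umx`, `umxhq` and `umxse` loader functions.

```python
# Default targets per pretrained model, as listed in the loader functions.
DEFAULT_TARGETS = {
    "umx": ["vocals", "drums", "bass", "other"],
    "umxhq": ["vocals", "drums", "bass", "other"],
    "umxse": ["speech", "noise"],
}


def resolve_targets(model_name, targets=None):
    """Return the targets to load for `model_name` (hypothetical helper)."""
    supported = DEFAULT_TARGETS[model_name]
    if targets is None:
        # Mirrors the loaders: no selection means all available targets.
        return list(supported)
    unknown = [t for t in targets if t not in supported]
    if unknown:
        raise ValueError(f"unsupported targets for {model_name}: {unknown}")
    return list(targets)


def residual_recommended(model_name, targets):
    """True when only a subset of the available targets was selected."""
    return set(targets) != set(DEFAULT_TARGETS[model_name])


selected = resolve_targets("umxhq", ["vocals"])
```

Selecting only `vocals` for `umxhq` makes `residual_recommended` return `True`, which matches the advice in the docstrings to enable `residual=True` when not all targets are picked.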
-The constructor of the `Separator` takes the following arguments, with suggested default values:
-separator = openunmix.Separator(
-    target_models: dict,
-    niter: int = 0,
-    softmask: bool = False,
-    residual: bool = False,
-    sample_rate: float = 44100.0,
-    n_fft: int = 4096,
-    n_hop: int = 1024,
-    nb_channels: int = 2,
-    wiener_win_len: Optional[int] = 300,
-    filterbank: str = 'torch'
-)
-When passing 
-> __Caution__ Training with the EM algorithm (`niter>0`) is not supported. Right now, gradient computation is only supported for plain post-processing, because the performance overhead of avoiding all in-place operations is too large.
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/model.html b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/model.html
deleted file mode 100644
index 67d43fb7a9e9bf2823f7ba2baab39c3f3fb12c63..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/model.html
+++ /dev/null
@@ -1,1150 +0,0 @@
-<!doctype html>
-<html lang="en">
-<meta charset="utf-8">
-<meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1" />
-<meta name="generator" content="pdoc 0.9.2" />
-<title>openunmix.model API documentation</title>
-<meta name="description" content="" />
-<link rel="preload stylesheet" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/10up-sanitize.css/11.0.1/sanitize.min.css" integrity="sha256-PK9q560IAAa6WVRRh76LtCaI8pjTJ2z11v0miyNNjrs=" crossorigin>
-<link rel="preload stylesheet" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/10up-sanitize.css/11.0.1/typography.min.css" integrity="sha256-7l/o7C8jubJiy74VsKTidCy1yBkRtiUGbVkYBylBqUg=" crossorigin>
-<link rel="stylesheet preload" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.1.1/styles/github.min.css" crossorigin>
-<style>:root{--highlight-color:#fe9}.flex{display:flex !important}body{line-height:1.5em}#content{padding:20px}#sidebar{padding:30px;overflow:hidden}#sidebar > *:last-child{margin-bottom:2cm}.http-server-breadcrumbs{font-size:130%;margin:0 0 15px 0}#footer{font-size:.75em;padding:5px 30px;border-top:1px solid #ddd;text-align:right}#footer p{margin:0 0 0 1em;display:inline-block}#footer p:last-child{margin-right:30px}h1,h2,h3,h4,h5{font-weight:300}h1{font-size:2.5em;line-height:1.1em}h2{font-size:1.75em;margin:1em 0 .50em 0}h3{font-size:1.4em;margin:25px 0 10px 0}h4{margin:0;font-size:105%}h1:target,h2:target,h3:target,h4:target,h5:target,h6:target{background:var(--highlight-color);padding:.2em 0}a{color:#058;text-decoration:none;transition:color .3s ease-in-out}a:hover{color:#e82}.title code{font-weight:bold}h2[id^="header-"]{margin-top:2em}.ident{color:#900}pre code{background:#f8f8f8;font-size:.8em;line-height:1.4em}code{background:#f2f2f1;padding:1px 4px;overflow-wrap:break-word}h1 code{background:transparent}pre{background:#f8f8f8;border:0;border-top:1px solid #ccc;border-bottom:1px solid #ccc;margin:1em 0;padding:1ex}#http-server-module-list{display:flex;flex-flow:column}#http-server-module-list div{display:flex}#http-server-module-list dt{min-width:10%}#http-server-module-list p{margin-top:0}.toc ul,#index{list-style-type:none;margin:0;padding:0}#index code{background:transparent}#index h3{border-bottom:1px solid #ddd}#index ul{padding:0}#index h4{margin-top:.6em;font-weight:bold}@media (min-width:200ex){#index .two-column{column-count:2}}@media (min-width:300ex){#index .two-column{column-count:3}}dl{margin-bottom:2em}dl dl:last-child{margin-bottom:4em}dd{margin:0 0 1em 3em}#header-classes + dl > dd{margin-bottom:3em}dd dd{margin-left:2em}dd p{margin:10px 0}.name{background:#eee;font-weight:bold;font-size:.85em;padding:5px 10px;display:inline-block;min-width:40%}.name:hover{background:#e0e0e0}dt:target .name{background:var(--highlight-color)}.name > 
span:first-child{white-space:nowrap}.name.class > span:nth-child(2){margin-left:.4em}.inherited{color:#999;border-left:5px solid #eee;padding-left:1em}.inheritance em{font-style:normal;font-weight:bold}.desc h2{font-weight:400;font-size:1.25em}.desc h3{font-size:1em}.desc dt code{background:inherit}.source summary,.git-link-div{color:#666;text-align:right;font-weight:400;font-size:.8em;text-transform:uppercase}.source summary > *{white-space:nowrap;cursor:pointer}.git-link{color:inherit;margin-left:1em}.source pre{max-height:500px;overflow:auto;margin:0}.source pre code{font-size:12px;overflow:visible}.hlist{list-style:none}.hlist li{display:inline}.hlist li:after{content:',\2002'}.hlist li:last-child:after{content:none}.hlist .hlist{display:inline;padding-left:1em}img{max-width:100%}td{padding:0 .5em}.admonition{padding:.1em .5em;margin-bottom:1em}.admonition-title{font-weight:bold}.admonition.note,.admonition.info,.admonition.important{background:#aef}.admonition.todo,.admonition.versionadded,.admonition.tip,.admonition.hint{background:#dfd}.admonition.warning,.admonition.versionchanged,.admonition.deprecated{background:#fd4}.admonition.error,.admonition.danger,.admonition.caution{background:lightpink}</style>
-<style media="screen and (min-width: 700px)">@media screen and (min-width:700px){#sidebar{width:30%;height:100vh;overflow:auto;position:sticky;top:0}#content{width:70%;max-width:100ch;padding:3em 4em;border-left:1px solid #ddd}pre code{font-size:1em}.item .name{font-size:1em}main{display:flex;flex-direction:row-reverse;justify-content:flex-end}.toc ul ul,#index ul{padding-left:1.5em}.toc > ul > li{margin-top:.5em}}</style>
-<style media="print">@media print{#sidebar h1{page-break-before:always}.source{display:none}}@media print{*{background:transparent !important;color:#000 !important;box-shadow:none !important;text-shadow:none !important}a[href]:after{content:" (" attr(href) ")";font-size:90%}a[href][title]:after{content:none}abbr[title]:after{content:" (" attr(title) ")"}.ir a:after,a[href^="javascript:"]:after,a[href^="#"]:after{content:""}pre,blockquote{border:1px solid #999;page-break-inside:avoid}thead{display:table-header-group}tr,img{page-break-inside:avoid}img{max-width:100% !important}@page{margin:0.5cm}p,h2,h3{orphans:3;widows:3}h1,h2,h3,h4,h5,h6{page-break-after:avoid}}</style>
-<script async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/latest.js?config=TeX-AMS_CHTML" integrity="sha256-kZafAc6mZvK3W3v1pHOcUix30OHQN6pU/NO2oFkqZVw=" crossorigin></script>
-<script defer src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.1.1/highlight.min.js" integrity="sha256-Uv3H6lx7dJmRfRvH8TH6kJD1TSK1aFcwgx+mdg3epi8=" crossorigin></script>
-<script>window.addEventListener('DOMContentLoaded', () => hljs.initHighlighting())</script>
-<article id="content">
-<h1 class="title">Module <code>openunmix.model</code></h1>
-<section id="section-intro">
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/model.py#L0-L345" class="git-link">Browse git</a>
-<pre><code class="python">from typing import Optional
-import torch
-import torch.nn as nn
-import torch.nn.functional as F
-from torch import Tensor
-from torch.nn import LSTM, BatchNorm1d, Linear, Parameter
-from .filtering import wiener
-from .transforms import make_filterbanks, ComplexNorm
-class OpenUnmix(nn.Module):
-    &#34;&#34;&#34;OpenUnmix Core spectrogram based separation module.
-    Args:
-        nb_bins (int): Number of input time-frequency bins (Default: `4096`).
-        nb_channels (int): Number of input audio channels (Default: `2`).
-        hidden_size (int): Size for bottleneck layers (Default: `512`).
-        nb_layers (int): Number of Bi-LSTM layers (Default: `3`).
-        unidirectional (bool): Use a causal (unidirectional) model, useful
-            for real-time purposes. (Default: `False`)
-        input_mean (ndarray or None): global data mean of shape `(nb_bins, )`.
-            Defaults to zeros(nb_bins)
-        input_scale (ndarray or None): global data scale of shape `(nb_bins, )`.
-            Defaults to ones(nb_bins)
-        max_bin (int or None): Internal frequency bin threshold to
-            reduce high frequency content. Defaults to `None` which results
-            in `nb_bins`
-    &#34;&#34;&#34;
-    def __init__(
-        self,
-        nb_bins=4096,
-        nb_channels=2,
-        hidden_size=512,
-        nb_layers=3,
-        unidirectional=False,
-        input_mean=None,
-        input_scale=None,
-        max_bin=None,
-    ):
-        super(OpenUnmix, self).__init__()
-        self.nb_output_bins = nb_bins
-        if max_bin:
-            self.nb_bins = max_bin
-        else:
-            self.nb_bins = self.nb_output_bins
-        self.hidden_size = hidden_size
-        self.fc1 = Linear(self.nb_bins * nb_channels, hidden_size, bias=False)
-        self.bn1 = BatchNorm1d(hidden_size)
-        if unidirectional:
-            lstm_hidden_size = hidden_size
-        else:
-            lstm_hidden_size = hidden_size // 2
-        self.lstm = LSTM(
-            input_size=hidden_size,
-            hidden_size=lstm_hidden_size,
-            num_layers=nb_layers,
-            bidirectional=not unidirectional,
-            batch_first=False,
-            dropout=0.4 if nb_layers &gt; 1 else 0,
-        )
-        fc2_hiddensize = hidden_size * 2
-        self.fc2 = Linear(in_features=fc2_hiddensize, out_features=hidden_size, bias=False)
-        self.bn2 = BatchNorm1d(hidden_size)
-        self.fc3 = Linear(
-            in_features=hidden_size,
-            out_features=self.nb_output_bins * nb_channels,
-            bias=False,
-        )
-        self.bn3 = BatchNorm1d(self.nb_output_bins * nb_channels)
-        if input_mean is not None:
-            input_mean = torch.from_numpy(-input_mean[: self.nb_bins]).float()
-        else:
-            input_mean = torch.zeros(self.nb_bins)
-        if input_scale is not None:
-            input_scale = torch.from_numpy(1.0 / input_scale[: self.nb_bins]).float()
-        else:
-            input_scale = torch.ones(self.nb_bins)
-        self.input_mean = Parameter(input_mean)
-        self.input_scale = Parameter(input_scale)
-        self.output_scale = Parameter(torch.ones(self.nb_output_bins).float())
-        self.output_mean = Parameter(torch.ones(self.nb_output_bins).float())
-    def freeze(self):
-        # set all parameters as not requiring gradient, more RAM-efficient
-        # at test time
-        for p in self.parameters():
-            p.requires_grad = False
-        self.eval()
-    def forward(self, x: Tensor) -&gt; Tensor:
-        &#34;&#34;&#34;
-        Args:
-            x: input spectrogram of shape
-                `(nb_samples, nb_channels, nb_bins, nb_frames)`
-        Returns:
-            Tensor: filtered spectrogram of shape
-                `(nb_samples, nb_channels, nb_bins, nb_frames)`
-        &#34;&#34;&#34;
-        # permute so that frames come first, as the LSTM expects (batch_first=False)
-        x = x.permute(3, 0, 1, 2)
-        # get current spectrogram shape
-        nb_frames, nb_samples, nb_channels, nb_bins = x.data.shape
-        mix = x.detach().clone()
-        # crop
-        x = x[..., : self.nb_bins]
-        # shift and scale input to mean=0 std=1 (across all bins)
-        x += self.input_mean
-        x *= self.input_scale
-        # to (nb_frames*nb_samples, nb_channels*nb_bins)
-        # and encode to (nb_frames*nb_samples, hidden_size)
-        x = self.fc1(x.reshape(-1, nb_channels * self.nb_bins))
-        # normalize every instance in a batch
-        x = self.bn1(x)
-        x = x.reshape(nb_frames, nb_samples, self.hidden_size)
-        # squash range to [-1, 1]
-        x = torch.tanh(x)
-        # apply the stacked LSTM layers (3 by default)
-        lstm_out = self.lstm(x)
-        # lstm skip connection
-        x = torch.cat([x, lstm_out[0]], -1)
-        # first dense stage + batch norm
-        x = self.fc2(x.reshape(-1, x.shape[-1]))
-        x = self.bn2(x)
-        x = F.relu(x)
-        # second dense stage + batch norm
-        x = self.fc3(x)
-        x = self.bn3(x)
-        # reshape back to original dim
-        x = x.reshape(nb_frames, nb_samples, nb_channels, self.nb_output_bins)
-        # apply output scaling
-        x *= self.output_scale
-        x += self.output_mean
-        # since our output is non-negative, we can apply RELU
-        x = F.relu(x) * mix
-        # permute back to (nb_samples, nb_channels, nb_bins, nb_frames)
-        return x.permute(1, 2, 3, 0)
-class Separator(nn.Module):
-    &#34;&#34;&#34;
-    Separator class to encapsulate all the stereo filtering
-    as a torch Module, to enable end-to-end learning.
-    Args:
-        target_models (dict of str: nn.Module): dictionary of target models,
-            the spectrogram models to be used by the Separator.
-        niter (int): Number of EM steps for refining initial estimates in a
-            post-processing stage. Must be `0` if only one target is estimated
-            and `residual` is disabled. Defaults to `0`.
-        residual (bool): adds an additional residual target, obtained by
-            subtracting the other estimated targets from the mixture,
-            before any potential EM post-processing.
-            Defaults to `False`.
-        wiener_win_len (int or None): The size of the excerpts
-            (number of frames) on which to apply filtering
-            independently. This means assuming time varying stereo models and
-            localization of sources.
-            None means not batching but using the whole signal. It comes at the
-            price of a much larger memory usage.
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    def __init__(
-        self,
-        target_models: dict,
-        niter: int = 0,
-        softmask: bool = False,
-        residual: bool = False,
-        sample_rate: float = 44100.0,
-        n_fft: int = 4096,
-        n_hop: int = 1024,
-        nb_channels: int = 2,
-        wiener_win_len: Optional[int] = 300,
-        filterbank: str = &#34;torch&#34;,
-    ):
-        super(Separator, self).__init__()
-        # saving parameters
-        self.niter = niter
-        self.residual = residual
-        self.softmask = softmask
-        self.wiener_win_len = wiener_win_len
-        self.stft, self.istft = make_filterbanks(
-            n_fft=n_fft,
-            n_hop=n_hop,
-            center=True,
-            method=filterbank,
-            sample_rate=sample_rate,
-        )
-        self.complexnorm = ComplexNorm(mono=nb_channels == 1)
-        # registering the targets models
-        self.target_models = nn.ModuleDict(target_models)
-        # adding till https://github.com/pytorch/pytorch/issues/38963
-        self.nb_targets = len(self.target_models)
-        # get the sample_rate as the sample_rate of the first model
-        # (tacitly assume it&#39;s the same for all targets)
-        self.register_buffer(&#34;sample_rate&#34;, torch.as_tensor(sample_rate))
-    def freeze(self):
-        # set all parameters as not requiring gradient, more RAM-efficient
-        # at test time
-        for p in self.parameters():
-            p.requires_grad = False
-        self.eval()
-    def forward(self, audio: Tensor) -&gt; Tensor:
-        &#34;&#34;&#34;Performing the separation on audio input
-        Args:
-            audio (Tensor): [shape=(nb_samples, nb_channels, nb_timesteps)]
-                mixture audio waveform
-        Returns:
-            Tensor: stacked tensor of separated waveforms
-                shape `(nb_samples, nb_targets, nb_channels, nb_timesteps)`
-        &#34;&#34;&#34;
-        nb_sources = self.nb_targets
-        nb_samples = audio.shape[0]
-        # getting the STFT of mix:
-        # (nb_samples, nb_channels, nb_bins, nb_frames, 2)
-        mix_stft = self.stft(audio)
-        X = self.complexnorm(mix_stft)
-        # initializing spectrograms variable
-        spectrograms = torch.zeros(X.shape + (nb_sources,), dtype=audio.dtype, device=X.device)
-        for j, (target_name, target_module) in enumerate(self.target_models.items()):
-            # apply current model to get the source spectrogram
-            target_spectrogram = target_module(X.detach().clone())
-            spectrograms[..., j] = target_spectrogram
-        # transposing it as
-        # (nb_samples, nb_frames, nb_bins,{1,nb_channels}, nb_sources)
-        spectrograms = spectrograms.permute(0, 3, 2, 1, 4)
-        # rearranging it into:
-        # (nb_samples, nb_frames, nb_bins, nb_channels, 2) to feed
-        # into filtering methods
-        mix_stft = mix_stft.permute(0, 3, 2, 1, 4)
-        # create an additional target if we need to build a residual
-        if self.residual:
-            # we add an additional target
-            nb_sources += 1
-        if nb_sources == 1 and self.niter &gt; 0:
-            raise Exception(
-                &#34;Cannot use EM if only one target is estimated. &#34;
-                &#34;Provide two targets or create an additional &#34;
-                &#34;one with `--residual`&#34;
-            )
-        nb_frames = spectrograms.shape[1]
-        targets_stft = torch.zeros(
-            mix_stft.shape + (nb_sources,), dtype=audio.dtype, device=mix_stft.device
-        )
-        for sample in range(nb_samples):
-            pos = 0
-            if self.wiener_win_len:
-                wiener_win_len = self.wiener_win_len
-            else:
-                wiener_win_len = nb_frames
-            while pos &lt; nb_frames:
-                cur_frame = torch.arange(pos, min(nb_frames, pos + wiener_win_len))
-                pos = int(cur_frame[-1]) + 1
-                targets_stft[sample, cur_frame] = wiener(
-                    spectrograms[sample, cur_frame],
-                    mix_stft[sample, cur_frame],
-                    self.niter,
-                    softmask=self.softmask,
-                    residual=self.residual,
-                )
-        # getting to (nb_samples, nb_targets, channel, fft_size, n_frames, 2)
-        targets_stft = targets_stft.permute(0, 5, 3, 2, 1, 4).contiguous()
-        # inverse STFT
-        estimates = self.istft(targets_stft, length=audio.shape[2])
-        return estimates
-    def to_dict(self, estimates: Tensor, aggregate_dict: Optional[dict] = None) -&gt; dict:
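-        &#34;&#34;&#34;Convert estimates from a stacked tensor to a dictionary
-        Args:
-            estimates (Tensor): separated targets of shape
-                (nb_samples, nb_targets, nb_channels, nb_timesteps)
-            aggregate_dict (dict or None): maps each output key to a list
-                of target names whose estimates are summed. `None` keeps
-                one entry per target.
-        Returns:
-            (dict of str: Tensor): estimates keyed by target name, or by
-                aggregate key if `aggregate_dict` is given
-        &#34;&#34;&#34;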
-        estimates_dict = {}
-        for k, target in enumerate(self.target_models):
-            estimates_dict[target] = estimates[:, k, ...]
-        # in the case of residual, we added another source
-        if self.residual:
-            estimates_dict[&#34;residual&#34;] = estimates[:, -1, ...]
-        if aggregate_dict is not None:
-            new_estimates = {}
-            for key in aggregate_dict:
-                new_estimates[key] = torch.tensor(0.0)
-                for target in aggregate_dict[key]:
-                    new_estimates[key] = new_estimates[key] + estimates_dict[target]
-            estimates_dict = new_estimates
-        return estimates_dict</code></pre>
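<p>The <code>aggregate_dict</code> argument of <code>Separator.to_dict</code> can be illustrated without tensors. The following is a minimal sketch, not the library's implementation: plain floats stand in for the per-target waveforms, and the function name <code>aggregate_estimates</code> and the target names are hypothetical.</p>

```python
def aggregate_estimates(estimates_dict, aggregate_dict=None):
    """Sum per-target estimates into the groups named by aggregate_dict."""
    if aggregate_dict is None:
        # no aggregation requested: keep one entry per target
        return estimates_dict
    return {
        key: sum(estimates_dict[target] for target in targets)
        for key, targets in aggregate_dict.items()
    }


# floats stand in for tensors of shape (nb_samples, nb_channels, nb_timesteps)
estimates = {"vocals": 1.0, "drums": 2.0, "bass": 3.0, "other": 4.0}
groups = {"vocals": ["vocals"], "accompaniment": ["drums", "bass", "other"]}
aggregated = aggregate_estimates(estimates, groups)
print(aggregated)  # {'vocals': 1.0, 'accompaniment': 9.0}
```

<p>With tensors in place of floats, the same grouping sums the waveforms of all targets listed under each key.</p>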
-<h2 class="section-title" id="header-classes">Classes</h2>
-<dt id="openunmix.model.OpenUnmix"><code class="flex name class">
-<span>class <span class="ident">OpenUnmix</span></span>
-<span>(</span><span>nb_bins=4096, nb_channels=2, hidden_size=512, nb_layers=3, unidirectional=False, input_mean=None, input_scale=None, max_bin=None)</span>
-<div class="desc"><p>OpenUnmix Core spectrogram based separation module.</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>nb_bins</code></strong> :&ensp;<code>int</code></dt>
-<dd>Number of input time-frequency bins (Default: <code>4096</code>).</dd>
-<dt><strong><code>nb_channels</code></strong> :&ensp;<code>int</code></dt>
-<dd>Number of input audio channels (Default: <code>2</code>).</dd>
-<dt><strong><code>hidden_size</code></strong> :&ensp;<code>int</code></dt>
-<dd>Size for bottleneck layers (Default: <code>512</code>).</dd>
-<dt><strong><code>nb_layers</code></strong> :&ensp;<code>int</code></dt>
-<dd>Number of Bi-LSTM layers (Default: <code>3</code>).</dd>
-<dt><strong><code>unidirectional</code></strong> :&ensp;<code>bool</code></dt>
-<dd>Use a causal (unidirectional) model, useful for real-time purposes.
-(Default: <code>False</code>)</dd>
-<dt><strong><code>input_mean</code></strong> :&ensp;<code>ndarray</code> or <code>None</code></dt>
-<dd>global data mean of shape <code>(nb_bins, )</code>.
-Defaults to zeros(nb_bins)</dd>
-<dt><strong><code>input_scale</code></strong> :&ensp;<code>ndarray</code> or <code>None</code></dt>
-<dd>global data scale of shape <code>(nb_bins, )</code>.
-Defaults to ones(nb_bins)</dd>
-<dt><strong><code>max_bin</code></strong> :&ensp;<code>int</code> or <code>None</code></dt>
-<dd>Internal frequency bin threshold to
-reduce high frequency content. Defaults to <code>None</code> which results
-in <code>nb_bins</code></dd>
-<p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p></div>
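<p>How the constructor arguments determine the layer dimensions can be sketched in plain Python. This is an illustrative helper that mirrors the arithmetic in <code>__init__</code>, not part of the library; the function name <code>openunmix_layer_sizes</code> and the value <code>max_bin=1487</code> are assumptions chosen for the example.</p>

```python
def openunmix_layer_sizes(nb_bins=4096, nb_channels=2, hidden_size=512,
                          unidirectional=False, max_bin=None):
    """Return (in_features, out_features) per layer, as implied by the arguments."""
    nb_output_bins = nb_bins
    # max_bin only crops the input bins; the output always covers all nb_bins
    nb_input_bins = max_bin if max_bin else nb_output_bins
    # in the bidirectional case each direction gets half of the hidden size
    lstm_hidden_size = hidden_size if unidirectional else hidden_size // 2
    return {
        "fc1": (nb_input_bins * nb_channels, hidden_size),
        "lstm": (hidden_size, lstm_hidden_size),
        # fc2 receives the LSTM input concatenated with the LSTM output
        "fc2": (hidden_size * 2, hidden_size),
        "fc3": (hidden_size, nb_output_bins * nb_channels),
    }


sizes = openunmix_layer_sizes(max_bin=1487)
print(sizes["fc1"])  # (2974, 512)
print(sizes["fc3"])  # (512, 8192)
```

<p>Because of the skip connection, <code>fc2</code> takes <code>hidden_size * 2</code> input features in both the unidirectional and the bidirectional case.</p>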
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/model.py#L12-L165" class="git-link">Browse git</a>
-<pre><code class="python">class OpenUnmix(nn.Module):
-    &#34;&#34;&#34;OpenUnmix Core spectrogram based separation module.
-    Args:
-        nb_bins (int): Number of input time-frequency bins (Default: `4096`).
-        nb_channels (int): Number of input audio channels (Default: `2`).
-        hidden_size (int): Size for bottleneck layers (Default: `512`).
-        nb_layers (int): Number of Bi-LSTM layers (Default: `3`).
-        unidirectional (bool): Use a causal (unidirectional) model, useful
-            for real-time purposes. (Default: `False`)
-        input_mean (ndarray or None): global data mean of shape `(nb_bins, )`.
-            Defaults to zeros(nb_bins)
-        input_scale (ndarray or None): global data scale of shape `(nb_bins, )`.
-            Defaults to ones(nb_bins)
-        max_bin (int or None): Internal frequency bin threshold to
-            reduce high frequency content. Defaults to `None` which results
-            in `nb_bins`
-    &#34;&#34;&#34;
-    def __init__(
-        self,
-        nb_bins=4096,
-        nb_channels=2,
-        hidden_size=512,
-        nb_layers=3,
-        unidirectional=False,
-        input_mean=None,
-        input_scale=None,
-        max_bin=None,
-    ):
-        super(OpenUnmix, self).__init__()
-        self.nb_output_bins = nb_bins
-        if max_bin:
-            self.nb_bins = max_bin
-        else:
-            self.nb_bins = self.nb_output_bins
-        self.hidden_size = hidden_size
-        self.fc1 = Linear(self.nb_bins * nb_channels, hidden_size, bias=False)
-        self.bn1 = BatchNorm1d(hidden_size)
-        if unidirectional:
-            lstm_hidden_size = hidden_size
-        else:
-            lstm_hidden_size = hidden_size // 2
-        self.lstm = LSTM(
-            input_size=hidden_size,
-            hidden_size=lstm_hidden_size,
-            num_layers=nb_layers,
-            bidirectional=not unidirectional,
-            batch_first=False,
-            dropout=0.4 if nb_layers &gt; 1 else 0,
-        )
-        fc2_hiddensize = hidden_size * 2
-        self.fc2 = Linear(in_features=fc2_hiddensize, out_features=hidden_size, bias=False)
-        self.bn2 = BatchNorm1d(hidden_size)
-        self.fc3 = Linear(
-            in_features=hidden_size,
-            out_features=self.nb_output_bins * nb_channels,
-            bias=False,
-        )
-        self.bn3 = BatchNorm1d(self.nb_output_bins * nb_channels)
-        if input_mean is not None:
-            input_mean = torch.from_numpy(-input_mean[: self.nb_bins]).float()
-        else:
-            input_mean = torch.zeros(self.nb_bins)
-        if input_scale is not None:
-            input_scale = torch.from_numpy(1.0 / input_scale[: self.nb_bins]).float()
-        else:
-            input_scale = torch.ones(self.nb_bins)
-        self.input_mean = Parameter(input_mean)
-        self.input_scale = Parameter(input_scale)
-        self.output_scale = Parameter(torch.ones(self.nb_output_bins).float())
-        self.output_mean = Parameter(torch.ones(self.nb_output_bins).float())
-    def freeze(self):
-        # set all parameters as not requiring gradient, more RAM-efficient
-        # at test time
-        for p in self.parameters():
-            p.requires_grad = False
-        self.eval()
-    def forward(self, x: Tensor) -&gt; Tensor:
-        &#34;&#34;&#34;
-        Args:
-            x: input spectrogram of shape
-                `(nb_samples, nb_channels, nb_bins, nb_frames)`
-        Returns:
-            Tensor: filtered spectrogram of shape
-                `(nb_samples, nb_channels, nb_bins, nb_frames)`
-        &#34;&#34;&#34;
-        # permute so that frames come first, as the LSTM expects (batch_first=False)
-        x = x.permute(3, 0, 1, 2)
-        # get current spectrogram shape
-        nb_frames, nb_samples, nb_channels, nb_bins = x.data.shape
-        mix = x.detach().clone()
-        # crop
-        x = x[..., : self.nb_bins]
-        # shift and scale input to mean=0 std=1 (across all bins)
-        x += self.input_mean
-        x *= self.input_scale
-        # to (nb_frames*nb_samples, nb_channels*nb_bins)
-        # and encode to (nb_frames*nb_samples, hidden_size)
-        x = self.fc1(x.reshape(-1, nb_channels * self.nb_bins))
-        # normalize every instance in a batch
-        x = self.bn1(x)
-        x = x.reshape(nb_frames, nb_samples, self.hidden_size)
-        # squash range to [-1, 1]
-        x = torch.tanh(x)
-        # apply the stacked LSTM layers (3 by default)
-        lstm_out = self.lstm(x)
-        # lstm skip connection
-        x = torch.cat([x, lstm_out[0]], -1)
-        # first dense stage + batch norm
-        x = self.fc2(x.reshape(-1, x.shape[-1]))
-        x = self.bn2(x)
-        x = F.relu(x)
-        # second dense stage + batch norm
-        x = self.fc3(x)
-        x = self.bn3(x)
-        # reshape back to original dim
-        x = x.reshape(nb_frames, nb_samples, nb_channels, self.nb_output_bins)
-        # apply output scaling
-        x *= self.output_scale
-        x += self.output_mean
-        # since our output is non-negative, we can apply RELU
-        x = F.relu(x) * mix
-        # permute back to (nb_samples, nb_channels, nb_bins, nb_frames)
-        return x.permute(1, 2, 3, 0)</code></pre>
-<ul class="hlist">
-<h3>Class variables</h3>
-<dt id="openunmix.model.OpenUnmix.dump_patches"><code class="name">var <span class="ident">dump_patches</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.model.OpenUnmix.training"><code class="name">var <span class="ident">training</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.model.OpenUnmix.forward"><code class="name flex">
-<span>def <span class="ident">forward</span></span>(<span>self, x: torch.Tensor) ‑> torch.Tensor</span>
-<div class="desc"><h2 id="args">Args</h2>
-<dt><strong><code>x</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>input spectrogram of shape
-<code>(nb_samples, nb_channels, nb_bins, nb_frames)</code></dd>
-<h2 id="returns">Returns</h2>
-<dt><code>Tensor</code></dt>
-<dd>filtered spectrogram of shape
-<code>(nb_samples, nb_channels, nb_bins, nb_frames)</code></dd>
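<p>The reshapes performed inside <code>forward</code> can be traced with a small pure-Python sketch. It only tracks shapes and runs no model; the helper name <code>forward_shapes</code> and the example sizes are assumptions, and the input <code>nb_bins</code> is assumed to equal the <code>nb_bins</code> the model was constructed with.</p>

```python
def forward_shapes(nb_samples, nb_channels, nb_bins, nb_frames,
                   hidden_size=512, max_bin=None):
    """List (step, shape) pairs mirroring the reshapes in forward()."""
    nb_input_bins = max_bin if max_bin else nb_bins
    flat = nb_frames * nb_samples
    return [
        ("input", (nb_samples, nb_channels, nb_bins, nb_frames)),
        ("permute(3, 0, 1, 2)", (nb_frames, nb_samples, nb_channels, nb_bins)),
        ("crop to input bins", (nb_frames, nb_samples, nb_channels, nb_input_bins)),
        ("fc1 + bn1", (flat, hidden_size)),
        ("reshape for LSTM", (nb_frames, nb_samples, hidden_size)),
        ("concat skip + LSTM output", (nb_frames, nb_samples, 2 * hidden_size)),
        ("fc2 + bn2", (flat, hidden_size)),
        ("fc3 + bn3", (flat, nb_bins * nb_channels)),
        ("reshape", (nb_frames, nb_samples, nb_channels, nb_bins)),
        ("permute(1, 2, 3, 0)", (nb_samples, nb_channels, nb_bins, nb_frames)),
    ]


shapes = forward_shapes(nb_samples=4, nb_channels=2, nb_bins=2049,
                        nb_frames=100, max_bin=1487)
for step, shape in shapes:
    print(f"{step:28s} {shape}")
```

<p>The first and last entries have the same shape: the output is a mask-weighted version of the input spectrogram, so it keeps the input layout.</p>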
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/model.py#L106-L165" class="git-link">Browse git</a>
-<pre><code class="python">def forward(self, x: Tensor) -&gt; Tensor:
-    &#34;&#34;&#34;
-    Args:
-        x: input spectrogram of shape
-            `(nb_samples, nb_channels, nb_bins, nb_frames)`
-    Returns:
-        Tensor: filtered spectrogram of shape
-            `(nb_samples, nb_channels, nb_bins, nb_frames)`
-    &#34;&#34;&#34;
-    # permute so that frames come first, as the LSTM expects (batch_first=False)
-    x = x.permute(3, 0, 1, 2)
-    # get current spectrogram shape
-    nb_frames, nb_samples, nb_channels, nb_bins = x.data.shape
-    mix = x.detach().clone()
-    # crop
-    x = x[..., : self.nb_bins]
-    # shift and scale input to mean=0 std=1 (across all bins)
-    x += self.input_mean
-    x *= self.input_scale
-    # to (nb_frames*nb_samples, nb_channels*nb_bins)
-    # and encode to (nb_frames*nb_samples, hidden_size)
-    x = self.fc1(x.reshape(-1, nb_channels * self.nb_bins))
-    # normalize every instance in a batch
-    x = self.bn1(x)
-    x = x.reshape(nb_frames, nb_samples, self.hidden_size)
-    # squash range to [-1, 1]
-    x = torch.tanh(x)
-    # apply the stacked LSTM layers (3 by default)
-    lstm_out = self.lstm(x)
-    # lstm skip connection
-    x = torch.cat([x, lstm_out[0]], -1)
-    # first dense stage + batch norm
-    x = self.fc2(x.reshape(-1, x.shape[-1]))
-    x = self.bn2(x)
-    x = F.relu(x)
-    # second dense stage + batch norm
-    x = self.fc3(x)
-    x = self.bn3(x)
-    # reshape back to original dim
-    x = x.reshape(nb_frames, nb_samples, nb_channels, self.nb_output_bins)
-    # apply output scaling
-    x *= self.output_scale
-    x += self.output_mean
-    # since our output is non-negative, we can apply RELU
-    x = F.relu(x) * mix
-    # permute back to (nb_samples, nb_channels, nb_bins, nb_frames)
-    return x.permute(1, 2, 3, 0)</code></pre>
-<dt id="openunmix.model.OpenUnmix.freeze"><code class="name flex">
-<span>def <span class="ident">freeze</span></span>(<span>self)</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/model.py#L99-L104" class="git-link">Browse git</a>
-<pre><code class="python">def freeze(self):
-    # set all parameters as not requiring gradient, more RAM-efficient
-    # at test time
-    for p in self.parameters():
-        p.requires_grad = False
-    self.eval()</code></pre>
-<dt id="openunmix.model.Separator"><code class="flex name class">
-<span>class <span class="ident">Separator</span></span>
-<span>(</span><span>target_models: dict, niter: int = 0, softmask: bool = False, residual: bool = False, sample_rate: float = 44100.0, n_fft: int = 4096, n_hop: int = 1024, nb_channels: int = 2, wiener_win_len: Union[int, NoneType] = 300, filterbank: str = 'torch')</span>
-<div class="desc"><p>Separator class to encapsulate all the stereo filtering
-as a torch Module, to enable end-to-end learning.</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>target_models</code></strong> :&ensp;<code>dict</code> of <code>str: nn.Module</code></dt>
-<dd>dictionary of target models,
-the spectrogram models to be used by the Separator.</dd>
-<dt><strong><code>niter</code></strong> :&ensp;<code>int</code></dt>
-<dd>Number of EM steps for refining initial estimates in a
-post-processing stage. Must be <code>0</code> if only one target is estimated
-and <code>residual</code> is disabled. Defaults to <code>0</code>.</dd>
-<dt><strong><code>residual</code></strong> :&ensp;<code>bool</code></dt>
-<dd>adds an additional residual target, obtained by
-subtracting the other estimated targets from the mixture,
-before any potential EM post-processing.
-Defaults to <code>False</code>.</dd>
-<dt><strong><code>wiener_win_len</code></strong> :&ensp;<code>int</code> or <code>None</code></dt>
-<dd>The size of the excerpts
-(number of frames) on which to apply filtering
-independently. This means assuming time varying stereo models and
-localization of sources.
-None means not batching but using the whole signal. It comes at the
-price of a much larger memory usage.</dd>
-<dt><strong><code>filterbank</code></strong> :&ensp;<code>str</code></dt>
-<dd>filterbank implementation method.
-Supported are <code>['torch', 'asteroid']</code>. <code>torch</code> is about 30% faster
-compared to <code>asteroid</code> on large FFT sizes such as 4096. However,
-asteroid's STFT can be exported to ONNX, which makes it practical
-for deployment.</dd>
-<p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p></div>
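<p>The effect of <code>wiener_win_len</code> can be sketched as follows: the frames are split into consecutive excerpts, and each excerpt is filtered independently. This pure-Python helper, with the hypothetical name <code>wiener_windows</code>, reproduces only the windowing logic and performs no filtering.</p>

```python
def wiener_windows(nb_frames, wiener_win_len=300):
    """Return the (start, stop) frame ranges that are filtered independently."""
    # None (or 0) means a single window spanning the whole signal
    win = wiener_win_len if wiener_win_len else nb_frames
    windows = []
    pos = 0
    while pos < nb_frames:
        stop = min(nb_frames, pos + win)
        windows.append((pos, stop))
        pos = stop
    return windows


default_windows = wiener_windows(700)
whole_signal = wiener_windows(700, None)
print(default_windows)  # [(0, 300), (300, 600), (600, 700)]
print(whole_signal)     # [(0, 700)]
```

<p>Smaller windows bound the memory needed per filtering call, while <code>None</code> processes all frames at once.</p>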
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/model.py#L168-L346" class="git-link">Browse git</a>
-<pre><code class="python">class Separator(nn.Module):
-    &#34;&#34;&#34;
-    Separator class to encapsulate all the stereo filtering
-    as a torch Module, to enable end-to-end learning.
-    Args:
-        target_models (dict of str: nn.Module): dictionary of target models,
-            the spectrogram models to be used by the Separator.
-        niter (int): Number of EM steps for refining initial estimates in a
-            post-processing stage. Must be `0` if only one target is estimated
-            and `residual` is disabled. Defaults to `0`.
-        residual (bool): adds an additional residual target, obtained by
-            subtracting the other estimated targets from the mixture,
-            before any potential EM post-processing.
-            Defaults to `False`.
-        wiener_win_len (int or None): The size of the excerpts
-            (number of frames) on which to apply filtering
-            independently. This means assuming time varying stereo models and
-            localization of sources.
-            None means not batching but using the whole signal. It comes at the
-            price of a much larger memory usage.
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    def __init__(
-        self,
-        target_models: dict,
-        niter: int = 0,
-        softmask: bool = False,
-        residual: bool = False,
-        sample_rate: float = 44100.0,
-        n_fft: int = 4096,
-        n_hop: int = 1024,
-        nb_channels: int = 2,
-        wiener_win_len: Optional[int] = 300,
-        filterbank: str = &#34;torch&#34;,
-    ):
-        super(Separator, self).__init__()
-        # saving parameters
-        self.niter = niter
-        self.residual = residual
-        self.softmask = softmask
-        self.wiener_win_len = wiener_win_len
-        self.stft, self.istft = make_filterbanks(
-            n_fft=n_fft,
-            n_hop=n_hop,
-            center=True,
-            method=filterbank,
-            sample_rate=sample_rate,
-        )
-        self.complexnorm = ComplexNorm(mono=nb_channels == 1)
-        # registering the targets models
-        self.target_models = nn.ModuleDict(target_models)
-        # adding till https://github.com/pytorch/pytorch/issues/38963
-        self.nb_targets = len(self.target_models)
-        # get the sample_rate as the sample_rate of the first model
-        # (tacitly assume it&#39;s the same for all targets)
-        self.register_buffer(&#34;sample_rate&#34;, torch.as_tensor(sample_rate))
-    def freeze(self):
-        # set all parameters as not requiring gradient, more RAM-efficient
-        # at test time
-        for p in self.parameters():
-            p.requires_grad = False
-        self.eval()
-    def forward(self, audio: Tensor) -&gt; Tensor:
-        &#34;&#34;&#34;Performing the separation on audio input
-        Args:
-            audio (Tensor): [shape=(nb_samples, nb_channels, nb_timesteps)]
-                mixture audio waveform
-        Returns:
-            Tensor: stacked tensor of separated waveforms
-                shape `(nb_samples, nb_targets, nb_channels, nb_timesteps)`
-        &#34;&#34;&#34;
-        nb_sources = self.nb_targets
-        nb_samples = audio.shape[0]
-        # getting the STFT of mix:
-        # (nb_samples, nb_channels, nb_bins, nb_frames, 2)
-        mix_stft = self.stft(audio)
-        X = self.complexnorm(mix_stft)
-        # initializing spectrograms variable
-        spectrograms = torch.zeros(X.shape + (nb_sources,), dtype=audio.dtype, device=X.device)
-        for j, (target_name, target_module) in enumerate(self.target_models.items()):
-            # apply current model to get the source spectrogram
-            target_spectrogram = target_module(X.detach().clone())
-            spectrograms[..., j] = target_spectrogram
-        # transposing it as
-        # (nb_samples, nb_frames, nb_bins,{1,nb_channels}, nb_sources)
-        spectrograms = spectrograms.permute(0, 3, 2, 1, 4)
-        # rearranging it into:
-        # (nb_samples, nb_frames, nb_bins, nb_channels, 2) to feed
-        # into filtering methods
-        mix_stft = mix_stft.permute(0, 3, 2, 1, 4)
-        # create an additional target if we need to build a residual
-        if self.residual:
-            # we add an additional target
-            nb_sources += 1
-        if nb_sources == 1 and self.niter &gt; 0:
-            raise Exception(
-                &#34;Cannot use EM if only one target is estimated. &#34;
-                &#34;Provide two targets or create an additional &#34;
-                &#34;one with `--residual`&#34;
-            )
-        nb_frames = spectrograms.shape[1]
-        targets_stft = torch.zeros(
-            mix_stft.shape + (nb_sources,), dtype=audio.dtype, device=mix_stft.device
-        )
-        for sample in range(nb_samples):
-            pos = 0
-            if self.wiener_win_len:
-                wiener_win_len = self.wiener_win_len
-            else:
-                wiener_win_len = nb_frames
-            while pos &lt; nb_frames:
-                cur_frame = torch.arange(pos, min(nb_frames, pos + wiener_win_len))
-                pos = int(cur_frame[-1]) + 1
-                targets_stft[sample, cur_frame] = wiener(
-                    spectrograms[sample, cur_frame],
-                    mix_stft[sample, cur_frame],
-                    self.niter,
-                    softmask=self.softmask,
-                    residual=self.residual,
-                )
-        # getting to (nb_samples, nb_targets, channel, fft_size, n_frames, 2)
-        targets_stft = targets_stft.permute(0, 5, 3, 2, 1, 4).contiguous()
-        # inverse STFT
-        estimates = self.istft(targets_stft, length=audio.shape[2])
-        return estimates
-    def to_dict(self, estimates: Tensor, aggregate_dict: Optional[dict] = None) -&gt; dict:
-        &#34;&#34;&#34;Convert estimates as stacked tensor to dictionary
-        Args:
-            estimates (Tensor): separated targets of shape
-                (nb_samples, nb_targets, nb_channels, nb_timesteps)
-            aggregate_dict (dict or None)
-        Returns:
-            (dict of str: Tensor):
-        &#34;&#34;&#34;
-        estimates_dict = {}
-        for k, target in enumerate(self.target_models):
-            estimates_dict[target] = estimates[:, k, ...]
-        # in the case of residual, we added another source
-        if self.residual:
-            estimates_dict[&#34;residual&#34;] = estimates[:, -1, ...]
-        if aggregate_dict is not None:
-            new_estimates = {}
-            for key in aggregate_dict:
-                new_estimates[key] = torch.tensor(0.0)
-                for target in aggregate_dict[key]:
-                    new_estimates[key] = new_estimates[key] + estimates_dict[target]
-            estimates_dict = new_estimates
-        return estimates_dict</code></pre>
-<ul class="hlist">
-<h3>Class variables</h3>
-<dt id="openunmix.model.Separator.dump_patches"><code class="name">var <span class="ident">dump_patches</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.model.Separator.training"><code class="name">var <span class="ident">training</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.model.Separator.forward"><code class="name flex">
-<span>def <span class="ident">forward</span></span>(<span>self, audio: torch.Tensor) ‑> torch.Tensor</span>
-<div class="desc"><p>Performing the separation on audio input</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>audio</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>[shape=(nb_samples, nb_channels, nb_timesteps)]
-mixture audio waveform</dd>
-<h2 id="returns">Returns</h2>
-<dd>stacked tensor of separated waveforms
-shape <code>(nb_samples, nb_targets, nb_channels, nb_timesteps)</code></dd>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/model.py#L241-L318" class="git-link">Browse git</a>
-<pre><code class="python">def forward(self, audio: Tensor) -&gt; Tensor:
-    &#34;&#34;&#34;Performing the separation on audio input
-    Args:
-        audio (Tensor): [shape=(nb_samples, nb_channels, nb_timesteps)]
-            mixture audio waveform
-    Returns:
-        Tensor: stacked tensor of separated waveforms
-            shape `(nb_samples, nb_targets, nb_channels, nb_timesteps)`
-    &#34;&#34;&#34;
-    nb_sources = self.nb_targets
-    nb_samples = audio.shape[0]
-    # getting the STFT of mix:
-    # (nb_samples, nb_channels, nb_bins, nb_frames, 2)
-    mix_stft = self.stft(audio)
-    X = self.complexnorm(mix_stft)
-    # initializing spectrograms variable
-    spectrograms = torch.zeros(X.shape + (nb_sources,), dtype=audio.dtype, device=X.device)
-    for j, (target_name, target_module) in enumerate(self.target_models.items()):
-        # apply current model to get the source spectrogram
-        target_spectrogram = target_module(X.detach().clone())
-        spectrograms[..., j] = target_spectrogram
-    # transposing it as
-    # (nb_samples, nb_frames, nb_bins,{1,nb_channels}, nb_sources)
-    spectrograms = spectrograms.permute(0, 3, 2, 1, 4)
-    # rearranging it into:
-    # (nb_samples, nb_frames, nb_bins, nb_channels, 2) to feed
-    # into filtering methods
-    mix_stft = mix_stft.permute(0, 3, 2, 1, 4)
-    # create an additional target if we need to build a residual
-    if self.residual:
-        # we add an additional target
-        nb_sources += 1
-    if nb_sources == 1 and self.niter &gt; 0:
-        raise Exception(
-            &#34;Cannot use EM if only one target is estimated. &#34;
-            &#34;Provide two targets or create an additional &#34;
-            &#34;one with `--residual`&#34;
-        )
-    nb_frames = spectrograms.shape[1]
-    targets_stft = torch.zeros(
-        mix_stft.shape + (nb_sources,), dtype=audio.dtype, device=mix_stft.device
-    )
-    for sample in range(nb_samples):
-        pos = 0
-        if self.wiener_win_len:
-            wiener_win_len = self.wiener_win_len
-        else:
-            wiener_win_len = nb_frames
-        while pos &lt; nb_frames:
-            cur_frame = torch.arange(pos, min(nb_frames, pos + wiener_win_len))
-            pos = int(cur_frame[-1]) + 1
-            targets_stft[sample, cur_frame] = wiener(
-                spectrograms[sample, cur_frame],
-                mix_stft[sample, cur_frame],
-                self.niter,
-                softmask=self.softmask,
-                residual=self.residual,
-            )
-    # getting to (nb_samples, nb_targets, channel, fft_size, n_frames, 2)
-    targets_stft = targets_stft.permute(0, 5, 3, 2, 1, 4).contiguous()
-    # inverse STFT
-    estimates = self.istft(targets_stft, length=audio.shape[2])
-    return estimates</code></pre>
-<dt id="openunmix.model.Separator.freeze"><code class="name flex">
-<span>def <span class="ident">freeze</span></span>(<span>self)</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/model.py#L234-L239" class="git-link">Browse git</a>
-<pre><code class="python">def freeze(self):
-    # set all parameters as not requiring gradient, more RAM-efficient
-    # at test time
-    for p in self.parameters():
-        p.requires_grad = False
-    self.eval()</code></pre>
-<dt id="openunmix.model.Separator.to_dict"><code class="name flex">
-<span>def <span class="ident">to_dict</span></span>(<span>self, estimates: torch.Tensor, aggregate_dict: Union[dict, NoneType] = None) ‑> dict</span>
-<div class="desc"><p>Convert estimates as stacked tensor to dictionary</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>estimates</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>separated targets of shape
-(nb_samples, nb_targets, nb_channels, nb_timesteps)</dd>
-<p>aggregate_dict (dict or None)</p>
-<h2 id="returns">Returns</h2>
-<p>(dict of str: Tensor):</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/model.py#L320-L346" class="git-link">Browse git</a>
-<pre><code class="python">def to_dict(self, estimates: Tensor, aggregate_dict: Optional[dict] = None) -&gt; dict:
-    &#34;&#34;&#34;Convert estimates as stacked tensor to dictionary
-    Args:
-        estimates (Tensor): separated targets of shape
-            (nb_samples, nb_targets, nb_channels, nb_timesteps)
-        aggregate_dict (dict or None)
-    Returns:
-        (dict of str: Tensor):
-    &#34;&#34;&#34;
-    estimates_dict = {}
-    for k, target in enumerate(self.target_models):
-        estimates_dict[target] = estimates[:, k, ...]
-    # in the case of residual, we added another source
-    if self.residual:
-        estimates_dict[&#34;residual&#34;] = estimates[:, -1, ...]
-    if aggregate_dict is not None:
-        new_estimates = {}
-        for key in aggregate_dict:
-            new_estimates[key] = torch.tensor(0.0)
-            for target in aggregate_dict[key]:
-                new_estimates[key] = new_estimates[key] + estimates_dict[target]
-        estimates_dict = new_estimates
-    return estimates_dict</code></pre>
-<nav id="sidebar">
-<div class="toc">
-<ul id="index">
-<li><code><a title="openunmix" href="index.html">openunmix</a></code></li>
-<li><h3><a href="#header-classes">Classes</a></h3>
-<h4><code><a title="openunmix.model.OpenUnmix" href="#openunmix.model.OpenUnmix">OpenUnmix</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.model.OpenUnmix.dump_patches" href="#openunmix.model.OpenUnmix.dump_patches">dump_patches</a></code></li>
-<li><code><a title="openunmix.model.OpenUnmix.forward" href="#openunmix.model.OpenUnmix.forward">forward</a></code></li>
-<li><code><a title="openunmix.model.OpenUnmix.freeze" href="#openunmix.model.OpenUnmix.freeze">freeze</a></code></li>
-<li><code><a title="openunmix.model.OpenUnmix.training" href="#openunmix.model.OpenUnmix.training">training</a></code></li>
-<h4><code><a title="openunmix.model.Separator" href="#openunmix.model.Separator">Separator</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.model.Separator.dump_patches" href="#openunmix.model.Separator.dump_patches">dump_patches</a></code></li>
-<li><code><a title="openunmix.model.Separator.forward" href="#openunmix.model.Separator.forward">forward</a></code></li>
-<li><code><a title="openunmix.model.Separator.freeze" href="#openunmix.model.Separator.freeze">freeze</a></code></li>
-<li><code><a title="openunmix.model.Separator.to_dict" href="#openunmix.model.Separator.to_dict">to_dict</a></code></li>
-<li><code><a title="openunmix.model.Separator.training" href="#openunmix.model.Separator.training">training</a></code></li>
-<footer id="footer">
-<p>Generated by <a href="https://pdoc3.github.io/pdoc"><cite>pdoc</cite> 0.9.2</a>.</p>
\ No newline at end of file
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/predict.html b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/predict.html
deleted file mode 100644
index 2f8e34eb8a9613dcb2d2aa699706dd327e7d23da..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/predict.html
+++ /dev/null
@@ -1,282 +0,0 @@
-<!doctype html>
-<html lang="en">
-<meta charset="utf-8">
-<meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1" />
-<meta name="generator" content="pdoc 0.9.2" />
-<title>openunmix.predict API documentation</title>
-<meta name="description" content="" />
-<link rel="preload stylesheet" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/10up-sanitize.css/11.0.1/sanitize.min.css" integrity="sha256-PK9q560IAAa6WVRRh76LtCaI8pjTJ2z11v0miyNNjrs=" crossorigin>
-<link rel="preload stylesheet" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/10up-sanitize.css/11.0.1/typography.min.css" integrity="sha256-7l/o7C8jubJiy74VsKTidCy1yBkRtiUGbVkYBylBqUg=" crossorigin>
-<link rel="stylesheet preload" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.1.1/styles/github.min.css" crossorigin>
-<style>:root{--highlight-color:#fe9}.flex{display:flex !important}body{line-height:1.5em}#content{padding:20px}#sidebar{padding:30px;overflow:hidden}#sidebar > *:last-child{margin-bottom:2cm}.http-server-breadcrumbs{font-size:130%;margin:0 0 15px 0}#footer{font-size:.75em;padding:5px 30px;border-top:1px solid #ddd;text-align:right}#footer p{margin:0 0 0 1em;display:inline-block}#footer p:last-child{margin-right:30px}h1,h2,h3,h4,h5{font-weight:300}h1{font-size:2.5em;line-height:1.1em}h2{font-size:1.75em;margin:1em 0 .50em 0}h3{font-size:1.4em;margin:25px 0 10px 0}h4{margin:0;font-size:105%}h1:target,h2:target,h3:target,h4:target,h5:target,h6:target{background:var(--highlight-color);padding:.2em 0}a{color:#058;text-decoration:none;transition:color .3s ease-in-out}a:hover{color:#e82}.title code{font-weight:bold}h2[id^="header-"]{margin-top:2em}.ident{color:#900}pre code{background:#f8f8f8;font-size:.8em;line-height:1.4em}code{background:#f2f2f1;padding:1px 4px;overflow-wrap:break-word}h1 code{background:transparent}pre{background:#f8f8f8;border:0;border-top:1px solid #ccc;border-bottom:1px solid #ccc;margin:1em 0;padding:1ex}#http-server-module-list{display:flex;flex-flow:column}#http-server-module-list div{display:flex}#http-server-module-list dt{min-width:10%}#http-server-module-list p{margin-top:0}.toc ul,#index{list-style-type:none;margin:0;padding:0}#index code{background:transparent}#index h3{border-bottom:1px solid #ddd}#index ul{padding:0}#index h4{margin-top:.6em;font-weight:bold}@media (min-width:200ex){#index .two-column{column-count:2}}@media (min-width:300ex){#index .two-column{column-count:3}}dl{margin-bottom:2em}dl dl:last-child{margin-bottom:4em}dd{margin:0 0 1em 3em}#header-classes + dl > dd{margin-bottom:3em}dd dd{margin-left:2em}dd p{margin:10px 0}.name{background:#eee;font-weight:bold;font-size:.85em;padding:5px 10px;display:inline-block;min-width:40%}.name:hover{background:#e0e0e0}dt:target .name{background:var(--highlight-color)}.name > 
span:first-child{white-space:nowrap}.name.class > span:nth-child(2){margin-left:.4em}.inherited{color:#999;border-left:5px solid #eee;padding-left:1em}.inheritance em{font-style:normal;font-weight:bold}.desc h2{font-weight:400;font-size:1.25em}.desc h3{font-size:1em}.desc dt code{background:inherit}.source summary,.git-link-div{color:#666;text-align:right;font-weight:400;font-size:.8em;text-transform:uppercase}.source summary > *{white-space:nowrap;cursor:pointer}.git-link{color:inherit;margin-left:1em}.source pre{max-height:500px;overflow:auto;margin:0}.source pre code{font-size:12px;overflow:visible}.hlist{list-style:none}.hlist li{display:inline}.hlist li:after{content:',\2002'}.hlist li:last-child:after{content:none}.hlist .hlist{display:inline;padding-left:1em}img{max-width:100%}td{padding:0 .5em}.admonition{padding:.1em .5em;margin-bottom:1em}.admonition-title{font-weight:bold}.admonition.note,.admonition.info,.admonition.important{background:#aef}.admonition.todo,.admonition.versionadded,.admonition.tip,.admonition.hint{background:#dfd}.admonition.warning,.admonition.versionchanged,.admonition.deprecated{background:#fd4}.admonition.error,.admonition.danger,.admonition.caution{background:lightpink}</style>
-<style media="screen and (min-width: 700px)">@media screen and (min-width:700px){#sidebar{width:30%;height:100vh;overflow:auto;position:sticky;top:0}#content{width:70%;max-width:100ch;padding:3em 4em;border-left:1px solid #ddd}pre code{font-size:1em}.item .name{font-size:1em}main{display:flex;flex-direction:row-reverse;justify-content:flex-end}.toc ul ul,#index ul{padding-left:1.5em}.toc > ul > li{margin-top:.5em}}</style>
-<style media="print">@media print{#sidebar h1{page-break-before:always}.source{display:none}}@media print{*{background:transparent !important;color:#000 !important;box-shadow:none !important;text-shadow:none !important}a[href]:after{content:" (" attr(href) ")";font-size:90%}a[href][title]:after{content:none}abbr[title]:after{content:" (" attr(title) ")"}.ir a:after,a[href^="javascript:"]:after,a[href^="#"]:after{content:""}pre,blockquote{border:1px solid #999;page-break-inside:avoid}thead{display:table-header-group}tr,img{page-break-inside:avoid}img{max-width:100% !important}@page{margin:0.5cm}p,h2,h3{orphans:3;widows:3}h1,h2,h3,h4,h5,h6{page-break-after:avoid}}</style>
-<script async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/latest.js?config=TeX-AMS_CHTML" integrity="sha256-kZafAc6mZvK3W3v1pHOcUix30OHQN6pU/NO2oFkqZVw=" crossorigin></script>
-<script defer src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.1.1/highlight.min.js" integrity="sha256-Uv3H6lx7dJmRfRvH8TH6kJD1TSK1aFcwgx+mdg3epi8=" crossorigin></script>
-<script>window.addEventListener('DOMContentLoaded', () => hljs.initHighlighting())</script>
-<article id="content">
-<h1 class="title">Module <code>openunmix.predict</code></h1>
-<section id="section-intro">
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/predict.py#L0-L79" class="git-link">Browse git</a>
-<pre><code class="python">from openunmix import utils
-def separate(
-    audio,
-    rate=None,
-    model_str_or_path=&#34;umxhq&#34;,
-    targets=None,
-    niter=1,
-    residual=False,
-    wiener_win_len=300,
-    aggregate_dict=None,
-    separator=None,
-    device=None,
-    filterbank=&#34;torch&#34;,
-):
-    &#34;&#34;&#34;
-    Open Unmix functional interface
-    Separates a torch.Tensor or the content of an audio file.
-    If a separator is provided, use it for inference. If not, create one
-    and use it afterwards.
-    Args:
-        audio: audio to process
-            torch Tensor: shape (channels, length), and
-            `rate` must also be provided.
-        rate: int or None: only used if audio is a Tensor. Otherwise,
-            inferred from the file.
-        model_str_or_path: the pretrained model to use
-        targets (str): select the targets for the source to be separated.
-            a list including: [&#39;vocals&#39;, &#39;drums&#39;, &#39;bass&#39;, &#39;other&#39;].
-            If you don&#39;t pick them all, you probably want to
-            activate the `residual=True` option.
-            Defaults to all available targets per model.
-        niter (int): the number of post-processing iterations, defaults to 1
-        residual (bool): if True, a &#34;garbage&#34; target is created
-        wiener_win_len (int): the number of frames to use when batching
-            the post-processing step
-        aggregate_dict (str): if provided, must be a string containing a
-            valid expression for a dictionary, with keys as output
-            target names, and values a list of targets that are used to
-            build it. For instance: &#39;{&#34;vocals&#34;:[&#34;vocals&#34;],
-            &#34;accompaniment&#34;:[&#34;drums&#34;,&#34;bass&#34;,&#34;other&#34;]}&#39;
-        separator: if provided, the model.Separator object that will be used
-             to perform separation
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    if separator is None:
-        separator = utils.load_separator(
-            model_str_or_path=model_str_or_path,
-            targets=targets,
-            niter=niter,
-            residual=residual,
-            wiener_win_len=wiener_win_len,
-            device=device,
-            pretrained=True,
-            filterbank=filterbank,
-        )
-        separator.freeze()
-        if device:
-            separator.to(device)
-    if rate is None:
-        raise Exception(&#34;`rate` must be provided.&#34;)
-    if device:
-        audio = audio.to(device)
-    audio = utils.preprocess(audio, rate, separator.sample_rate)
-    # getting the separated signals
-    estimates = separator(audio)
-    estimates = separator.to_dict(estimates, aggregate_dict=aggregate_dict)
-    return estimates</code></pre>
-<h2 class="section-title" id="header-functions">Functions</h2>
-<dt id="openunmix.predict.separate"><code class="name flex">
-<span>def <span class="ident">separate</span></span>(<span>audio, rate=None, model_str_or_path='umxhq', targets=None, niter=1, residual=False, wiener_win_len=300, aggregate_dict=None, separator=None, device=None, filterbank='torch')</span>
-<div class="desc"><p>Open Unmix functional interface</p>
-<p>Separates a torch.Tensor or the content of an audio file.</p>
-<p>If a separator is provided, use it for inference. If not, create one
-and use it afterwards.</p>
-<h2 id="args">Args</h2>
-<dd>audio to process
-torch Tensor: shape (channels, length), and
-<code>rate</code> must also be provided.</dd>
-<dd>int or None: only used if audio is a Tensor. Otherwise,
-inferred from the file.</dd>
-<dd>the pretrained model to use</dd>
-<dt><strong><code>targets</code></strong> :&ensp;<code>str</code></dt>
-<dd>select the targets for the source to be separated.
-a list including: ['vocals', 'drums', 'bass', 'other'].
-If you don't pick them all, you probably want to
-activate the <code>residual=True</code> option.
-Defaults to all available targets per model.</dd>
-<dt><strong><code>niter</code></strong> :&ensp;<code>int</code></dt>
-<dd>the number of post-processing iterations, defaults to 1</dd>
-<dt><strong><code>residual</code></strong> :&ensp;<code>bool</code></dt>
-<dd>if True, a "garbage" target is created</dd>
-<dt><strong><code>wiener_win_len</code></strong> :&ensp;<code>int</code></dt>
-<dd>the number of frames to use when batching
-the post-processing step</dd>
-<dt><strong><code>aggregate_dict</code></strong> :&ensp;<code>str</code></dt>
-<dd>if provided, must be a string containing a
-valid expression for a dictionary, with keys as output
-target names, and values a list of targets that are used to
-build it. For instance: '{"vocals":["vocals"], "accompaniment":["drums","bass","other"]}'</dd>
-<dd>if provided, the model.Separator object that will be used
-to perform separation</dd>
-<dt><strong><code>device</code></strong> :&ensp;<code>str</code></dt>
-<dd>selects device to be used for inference</dd>
-<dt><strong><code>filterbank</code></strong> :&ensp;<code>str</code></dt>
-<dd>filterbank implementation method.
-Supported are <code>['torch', 'asteroid']</code>. <code>torch</code> is about 30% faster
-compared to <code>asteroid</code> on large FFT sizes such as 4096. However,
-asteroid's STFT can be exported to ONNX, which makes it practical
-for deployment.</dd>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/predict.py#L4-L80" class="git-link">Browse git</a>
-<pre><code class="python">def separate(
-    audio,
-    rate=None,
-    model_str_or_path=&#34;umxhq&#34;,
-    targets=None,
-    niter=1,
-    residual=False,
-    wiener_win_len=300,
-    aggregate_dict=None,
-    separator=None,
-    device=None,
-    filterbank=&#34;torch&#34;,
-):
-    &#34;&#34;&#34;
-    Open Unmix functional interface
-    Separates a torch.Tensor or the content of an audio file.
-    If a separator is provided, use it for inference. If not, create one
-    and use it afterwards.
-    Args:
-        audio: audio to process
-            torch Tensor: shape (channels, length), and
-            `rate` must also be provided.
-        rate: int or None: only used if audio is a Tensor. Otherwise,
-            inferred from the file.
-        model_str_or_path: the pretrained model to use
-        targets (str): select the targets for the source to be separated.
-            a list including: [&#39;vocals&#39;, &#39;drums&#39;, &#39;bass&#39;, &#39;other&#39;].
-            If you don&#39;t pick them all, you probably want to
-            activate the `residual=True` option.
-            Defaults to all available targets per model.
-        niter (int): the number of post-processing iterations, defaults to 1
-        residual (bool): if True, a &#34;garbage&#34; target is created
-        wiener_win_len (int): the number of frames to use when batching
-            the post-processing step
-        aggregate_dict (str): if provided, must be a string containing a
-            valid expression for a dictionary, with keys as output
-            target names, and values a list of targets that are used to
-            build it. For instance: &#39;{&#34;vocals&#34;:[&#34;vocals&#34;],
-            &#34;accompaniment&#34;:[&#34;drums&#34;,&#34;bass&#34;,&#34;other&#34;]}&#39;
-        separator: if provided, the model.Separator object that will be used
-             to perform separation
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    if separator is None:
-        separator = utils.load_separator(
-            model_str_or_path=model_str_or_path,
-            targets=targets,
-            niter=niter,
-            residual=residual,
-            wiener_win_len=wiener_win_len,
-            device=device,
-            pretrained=True,
-            filterbank=filterbank,
-        )
-        separator.freeze()
-        if device:
-            separator.to(device)
-    if rate is None:
-        raise Exception(&#34;`rate` must be provided.&#34;)
-    if device:
-        audio = audio.to(device)
-    audio = utils.preprocess(audio, rate, separator.sample_rate)
-    # getting the separated signals
-    estimates = separator(audio)
-    estimates = separator.to_dict(estimates, aggregate_dict=aggregate_dict)
-    return estimates</code></pre>
-<nav id="sidebar">
-<div class="toc">
-<ul id="index">
-<li><code><a title="openunmix" href="index.html">openunmix</a></code></li>
-<li><h3><a href="#header-functions">Functions</a></h3>
-<ul class="">
-<li><code><a title="openunmix.predict.separate" href="#openunmix.predict.separate">separate</a></code></li>
-<footer id="footer">
-<p>Generated by <a href="https://pdoc3.github.io/pdoc"><cite>pdoc</cite> 0.9.2</a>.</p>
\ No newline at end of file
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/training.md b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/training.md
deleted file mode 100644
index 82fa80b959ea66908e1c01f846ac8fb2b236631d..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/training.md
+++ /dev/null
@@ -1,240 +0,0 @@
-# Training Open-Unmix
-> This documentation refers to the standard training procedure for _Open-unmix_, where each target is trained independently. It has not been updated for the end-to-end training capabilities that the `Separator` module allows. Please contribute if you try this.
-Both models that are provided with pre-trained weights, `umxhq` and `umx`, can be trained using the default parameters of the `scripts/train.py` script.
-## Installation
-The training script is not part of the Python package, so we suggest using [Anaconda](https://anaconda.org/) to install the training requirements; a pinned environment also helps make results reproducible.
-To create a conda environment for _open-unmix_, simply run:
-`conda env create -f scripts/environment-X.yml` where `X` is one of [`cpu-linux`, `gpu-linux-cuda10`, `cpu-osx`], depending on your system. For now, we haven't tested Windows support.
-## Training API
-The [MUSDB18](https://sigsep.github.io/datasets/musdb.html) and [MUSDB18-HQ](https://sigsep.github.io/datasets/musdb.html) are the largest freely available datasets for professionally produced music tracks (~10h duration) of different styles. They come with isolated `drums`, `bass`, `vocals` and `others` stems. _MUSDB18_ contains two subsets: "train", composed of 100 songs, and "test", composed of 50 songs.
-To directly train a vocal model with _open-unmix_, first download one of the datasets and place it _unzipped_ in a directory of your choice (called `root`).
-| Argument | Description | Default |
-|----------|-------------|---------|
-| `--root <str>` | path to root of dataset on disk. | `None` |
-Also note that, if `--root` is not specified, we automatically download a 7 second preview version of the MUSDB18 dataset. While this is convenient for testing purposes, we don't recommend training your model on it.
-Training can be started using:
-`python train.py --root path/to/musdb18 --target vocals`
-Training on `MUSDB18` with _open-unmix_ comes with several design decisions that we made as part of our defaults to improve efficiency and performance:
-* __chunking__: we do not feed full audio tracks into _open-unmix_ but instead chunk the audio into 6s excerpts (`--seq-dur 6.0`).
-* __balanced track sampling__: to not create a bias for longer audio tracks we randomly yield one track from MUSDB18 and select a random chunk subsequently. In one epoch we select (on average) 64 samples from each track.
-* __source augmentation__: we apply random gains between `0.25` and `1.25` to all sources before mixing. Furthermore, we randomly swap the channels of the input mixture.
-* __random track mixing__: for a given target we select a _random track_ with replacement. To yield a mixture we draw the interfering sources from different tracks (again with replacement) to increase generalization of the model.
-* __fixed validation split__: we provide a fixed validation split of [14 tracks](https://github.com/sigsep/sigsep-mus-db/blob/b283da5b8f24e84172a60a06bb8f3dacd57aa6cd/musdb/configs/mus.yaml#L41). We evaluate on these tracks in full length instead of using chunking to have evaluation as close as possible to the actual test data.
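The sampling defaults listed above can be illustrated with a minimal, standard-library-only sketch. It is not the code used by `scripts/train.py`: the helper names (`random_gain`, `channel_swap`, `random_chunk`, `draw_training_example`), the representation of audio as nested lists of floats, and the choice to apply the channel swap per source are all assumptions made for illustration.

```python
import random


def random_gain(source, low=0.25, high=1.25, rng=random):
    """Scale every channel of a source by one random gain in [low, high]."""
    gain = rng.uniform(low, high)
    return [[gain * x for x in channel] for channel in source]


def channel_swap(source, p=0.5, rng=random):
    """Swap left and right channels of a stereo source with probability p."""
    if len(source) == 2 and rng.random() < p:
        return [source[1], source[0]]
    return source


def random_chunk(source, chunk_len, rng=random):
    """Cut a random excerpt of chunk_len samples (assumes the source is long enough)."""
    start = rng.randint(0, len(source[0]) - chunk_len)
    return [channel[start:start + chunk_len] for channel in source]


def draw_training_example(tracks, target, chunk_len, rng=random):
    """Random track mixing: each source comes from a randomly drawn track.

    `tracks` is a list of dicts mapping source name -> list of channels.
    Returns (mixture, target_audio), both as lists of channels.
    """
    sources = {}
    for name in tracks[0]:
        track = rng.choice(tracks)  # drawn with replacement
        audio = random_chunk(track[name], chunk_len, rng)
        sources[name] = channel_swap(random_gain(audio, rng=rng), rng=rng)
    nb_channels = len(sources[target])
    # linear mix: sample-wise sum over all (augmented) sources
    mixture = [
        [sum(src[c][t] for src in sources.values()) for t in range(chunk_len)]
        for c in range(nb_channels)
    ]
    return mixture, sources[target]
```

Because every source is drawn independently, the mixture generally combines material from different tracks, which is what makes the resulting dataset unaligned.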
-Some of the parameters for the MUSDB sampling can be controlled using the following arguments:
-| Argument | Description | Default |
-|----------|-------------|---------|
-| `--is-wav` | loads the decoded WAVs instead of STEMS for faster data loading. See [more details here](https://github.com/sigsep/sigsep-mus-db#using-wav-files-optional). | `True` |
-| `--samples-per-track <int>` | sets the number of samples that are randomly drawn from each track | `64` |
-| `--source-augmentations <list[str]>` | applies augmentations to each audio source before mixing, available augmentations: `[gain, channelswap]` | `[gain, channelswap]` |
-## Training and Model Parameters
-An extensive list of additional training parameters allows researchers to quickly try out different parameterizations such as a different FFT size. In the table below, we list the additional training parameters and their default values (used for `umxhq` and `umx`):
-| Argument      | Description                                                                     | Default         |
-|---------------|---------------------------------------------------------------------------------|-----------------|
-| `--target <str>`           | name of target source (will be passed to the dataset)                         | `vocals`      |
-| `--output <str>`           | path where to save the trained output model as well as checkpoints.                         | `./open-unmix`      |
-| `--checkpoint <str>`           | path to checkpoint of target model to resume training. | not set      |
-| `--model <str>`           | path or str to pretrained target to fine-tune model | not set      |
-| `--no_cuda`           | disable cuda even if available                                              | not set      |
-| `--epochs <int>`           | Number of epochs to train                                                       | `1000`          |
-| `--batch-size <int>`       | Batch size has influence on memory usage and performance of the LSTM layer      | `16`            |
-| `--patience <int>`         | early stopping patience                                                         | `140`            |
-| `--seq-dur <float>`        | Sequence duration in seconds of chunks taken from the dataset. A value of `<=0.0` results in full/variable length               | `6.0`           |
-| `--unidirectional`           | changes the bidirectional LSTM to unidirectional (for real-time applications)  | not set      |
-| `--hidden-size <int>`             | Hidden size parameter of dense bottleneck layers  | `512`            |
-| `--nfft <int>`             | STFT FFT window length in samples                                               | `4096`          |
-| `--nhop <int>`             | STFT hop length in samples                                                      | `1024`          |
-| `--lr <float>`             | learning rate                                                                   | `0.001`        |
-| `--lr-decay-patience <int>`             | learning rate decay patience for plateau scheduler                                                                   | `80`        |
-| `--lr-decay-gamma <float>`             | gamma of learning rate plateau scheduler.  | `0.3`        |
-| `--weight-decay <float>`             | weight decay for regularization                                                                   | `0.00001`        |
-| `--bandwidth <int>`        | maximum bandwidth in Hertz processed by the LSTM. Input and Output is always full bandwidth! | `16000`         |
-| `--nb-channels <int>`      | set number of channels for model (1 for mono, in which case a spectral downmix is applied; 2 for stereo)                     | `2`             |
-| `--nb-workers <int>`      | Number of (parallel) workers for data-loader, can be safely increased for wav files   | `0` |
-| `--quiet`                  | disable print and progress bar during training                                   | not set         |
-| `--seed <int>`             | Initial seed to set the random initialization                                   | `42`            |
-| `--audio-backend <str>`         | choose audio loading backend, either `sox` or `soundfile` | `soundfile` for training, `sox` for inference |
-### Training details of `umxhq`
-The training of `umxhq` took place on Nvidia RTX2080 cards. Equipped with fast SSDs and `--nb-workers 4`, we could utilize around 90% of the GPU, thus training time was around 80 seconds per epoch. We ran four different seeds for each target and selected the model with the lowest validation loss.
-The training and validation loss curves are plotted below:
-## Other Datasets
-_open-unmix_ uses standard PyTorch [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) classes. The repository comes with __five__ different datasets which cover a wide range of tasks and applications around source separation. Furthermore we also provide a template Dataset if you want to start using your own dataset. The dataset can be selected through a command line argument:
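-| Argument      | Description                                                            | Default      |
-|---------------|------------------------------------------------------------------------|--------------|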
-| `--dataset <str>`          | Name of the dataset (select from `musdb`, `aligned`, `sourcefolder`, `trackfolder_var`, `trackfolder_fix`) | `musdb`      |
-### `AlignedDataset` (aligned)
-This dataset assumes multiple track folders, where each track includes an input and one output file, directly corresponding to the input and the output of the model.
-This dataset is the most basic of all datasets provided here. Because it requires the least amount of preprocessing, it is also the fastest option; however, it lacks any kind of source augmentation or custom mixing. Instead, it directly uses the target files that are within the folder. The filenames have to be identical for each track. E.g., for the first sample of the training set, the input could be `1/mixture.wav` and the output could be `1/vocals.wav`.
-Typical use cases:
-* Source Separation (Mixture -> Target)
-* Denoising (Noisy -> Clean)
-* Bandwidth Extension (Low Bandwidth -> High Bandwidth)
-#### File Structure
-data/train/1/mixture.wav --> input
-data/train/1/vocals.wav ---> output
-data/valid/1/mixture.wav --> input
-data/valid/1/vocals.wav ---> output
-#### Parameters
-| Argument | Description | Default |
-|----------|-------------|---------|
-|`--input-file <str>` | input file name | `None` |
-|`--output-file <str>` | output file name | `None` |
-#### Example
-python train.py --dataset aligned --root /dataset --input-file mixture.wav --output-file vocals.wav
-### `SourceFolderDataset` (sourcefolder)
-A dataset that assumes folders of sources instead of track folders. This is a common format for speech and environmental sound datasets such as DCASE. For each source a variable number of tracks/sounds is available, therefore the dataset is unaligned by design.
-In this scenario one could easily train a network to separate a target sound from interfering sounds. For each sample, the data loader loads a random combination of target and interferers as the input and performs a linear mixture of these. The output of the model is the target.
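The random target/interferer combination and the linear mixture described above can be sketched as follows. This is an illustrative sketch only, not the dataset implementation: `mix_sources` and `draw_example` are hypothetical helpers, and signals are assumed to be equally long mono lists of floats that are already loaded in memory.

```python
import random


def mix_sources(target, interferers):
    """Linearly mix equally long mono signals by summing them sample by sample."""
    mixture = list(target)
    for interferer in interferers:
        mixture = [m + s for m, s in zip(mixture, interferer)]
    return mixture


def draw_example(target_pool, interferer_pools, rng):
    """Pick one random target and one random sound per interferer folder.

    Returns (input, output): the mixture fed to the model and the clean target.
    """
    target = rng.choice(target_pool)
    interferers = [rng.choice(pool) for pool in interferer_pools]
    return mix_sources(target, interferers), target
```

Because every call draws a fresh combination, the number of distinct mixtures grows with the product of the folder sizes, which is why the number of samples per epoch has to be set explicitly for this kind of dataset.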
-#### File structure
-train/vocals/track11.wav -----------------\
-train/drums/track202.wav  (interferer1) ---+--> input
-train/bass/track007a.wav  (interferer2) --/
-train/vocals/track11.wav ---------------------> output
-#### Parameters
-| Argument | Description | Default |
-|----------|-------------|---------|
-|`--interferer-dirs list[<str>]` | list of directories used as interferers | `None` |
-|`--target-dir <str>` | directory that contains the target source | `None` |
-|`--ext <str>` | File extension | `.wav` |
-|`--nb-train-samples <int>` | Number of samples drawn for training | `1000` |
-|`--nb-valid-samples <int>` | Number of samples drawn for validation | `100` |
-|`--source-augmentations list[<str>]` | List of augmentation functions that are processed in the order of the list | |
-#### Example
-python train.py --dataset sourcefolder --root /data --target-dir vocals --interferer-dirs car_noise wind_noise --ext .ogg --nb-train-samples 1000
-### `FixedSourcesTrackFolderDataset` (trackfolder_fix)
-A dataset that assumes audio sources to be stored in track folders, where each track has a fixed number of sources. For each track, the user specifies the target file name (`target_file`) and a list of interferer files (`interferer_files`).
-A linear mix is performed on the fly by summing the target and the interferers.
-Because all tracks comprise the exact same set of sources, the random track mixing augmentation technique can be used, where sources from different tracks are mixed together. Setting `random_track_mix=True` results in an unaligned dataset.
-When random track mixing is enabled, we define an epoch as the point at which the target source from every track has been seen exactly once, with whatever interfering sources have been randomly drawn.
-This dataset is recommended for small or medium-sized datasets such as MUSDB18 or other custom source separation datasets.
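The epoch definition under random track mixing can be made concrete with a small sketch. It only produces a sampling plan and loads no audio; the function name `random_track_mix_epoch` and the integer track indices are assumptions for illustration, not part of the dataset class.

```python
import random


def random_track_mix_epoch(nb_tracks, interferer_files, seed=42):
    """Yield one sampling plan per track for a single epoch.

    Each plan is (target_track, {interferer_file: source_track}): the target is
    taken from every track exactly once, while each interferer is drawn from a
    randomly chosen track (with replacement).
    """
    rng = random.Random(seed)
    for target_track in range(nb_tracks):
        plan = {name: rng.randrange(nb_tracks) for name in interferer_files}
        yield target_track, plan
```

For example, `list(random_track_mix_epoch(3, ["bass.wav", "drums.wav"]))` contains exactly three plans, one per target track, while the interferer track indices vary with the seed.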
-#### File structure
-train/1/vocals.wav ---------------\
-train/1/drums.wav (interferer1) ---+--> input
-train/1/bass.wav -(interferer2) --/
-train/1/vocals.wav -------------------> output
-#### Parameters
-| Argument | Description | Default |
-|----------|-------------|---------|
-|`--target-file <str>` | Target file (includes extension) | `None` |
-|`--interferer-files list[<str>]` | list of interfering sources | `None` |
-|`--random-track-mix` | Applies random track mixing | `False` |
-|`--source-augmentations list[<str>]` | List of augmentation functions that are processed in the order of the list | |
-#### Example
-python train.py  --root /data --dataset trackfolder_fix --target-file vocals.flac --interferer-files bass.flac drums.flac other.flac
-### `VariableSourcesTrackFolderDataset` (trackfolder_var)
-A dataset that assumes audio sources to be stored in track folders, where each track has a _variable_ number of sources. The user specifies the target file name (`target_file`) and the extension of the sources to be used for mixing. A linear mix is performed on the fly by summing all sources in a track folder.
-Since the number of sources differs per track while the target is fixed, random track mixing cannot be used.
-Also make sure that you do not provide the mixture file among the sources! This dataset maximizes the number of tracks that can be used, since it doesn't require a fixed number of sources per track. However, the target file is required to be present. To increase the dataset utilization even further, users can enable the `--silence-missing-targets` option, which outputs silence for missing targets.
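The handling of a variable number of sources and of missing targets can be sketched as below. This is a simplified illustration under stated assumptions: `build_example` is a hypothetical helper, signals are equally long mono lists of floats, and skipping a track without its target is assumed to be the behaviour when `--silence-missing-targets` is not set.

```python
def build_example(sources, target_name, silence_missing_targets=False):
    """Build (mixture, target) for one track folder.

    sources maps a file name to a mono signal (list of floats). Returns None
    when the track cannot be used.
    """
    if not sources:
        return None
    length = len(next(iter(sources.values())))
    if target_name in sources:
        target = sources[target_name]
    elif silence_missing_targets:
        target = [0.0] * length  # a missing target is replaced by silence
    else:
        return None  # without its target the track is skipped (assumed)
    # Linear mix: sum all sources of the track sample by sample.
    mixture = [sum(samples) for samples in zip(*sources.values())]
    return mixture, target
```

Note that the mixture is the sum of everything in the folder, which is why a pre-rendered mixture file must not be stored alongside the sources.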
-#### File structure
-train/1/vocals.wav --> input target   \
-train/1/drums.wav --> input target     |
-train/1/bass.wav --> input target    --+--> input
-train/1/accordion.wav --> input target |
-train/1/marimba.wav --> input target  /
-train/1/vocals.wav -----------------------> output
-#### Parameters
-| Argument | Description | Default |
-|----------|-------------|---------|
-|`--target-file <str>` | file name of target file | `None` |
-|`--silence-missing-targets` | if a target is not among the list of sources it will be filled with zero | not set |
-|`random interferer mixing` | use a _random track_ for the interfering sources to increase generalization of the model. | not set |
-|`--ext <str>` | File extension that is used to find the interfering files | `.wav` |
-|`--source-augmentations list[<str>]` | List of augmentation functions that are processed in the order of the list | |
-#### Example
-python train.py --root /data --dataset trackfolder_var --target-file vocals.flac --ext .wav
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/transforms.html b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/transforms.html
deleted file mode 100644
index 251edafc481921734ecaa357bf043b8fd875e60a..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/transforms.html
+++ /dev/null
@@ -1,950 +0,0 @@
-<!doctype html>
-<html lang="en">
-<meta charset="utf-8">
-<meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1" />
-<meta name="generator" content="pdoc 0.9.2" />
-<title>openunmix.transforms API documentation</title>
-<meta name="description" content="" />
-<link rel="preload stylesheet" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/10up-sanitize.css/11.0.1/sanitize.min.css" integrity="sha256-PK9q560IAAa6WVRRh76LtCaI8pjTJ2z11v0miyNNjrs=" crossorigin>
-<link rel="preload stylesheet" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/10up-sanitize.css/11.0.1/typography.min.css" integrity="sha256-7l/o7C8jubJiy74VsKTidCy1yBkRtiUGbVkYBylBqUg=" crossorigin>
-<link rel="stylesheet preload" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.1.1/styles/github.min.css" crossorigin>
-<style>:root{--highlight-color:#fe9}.flex{display:flex !important}body{line-height:1.5em}#content{padding:20px}#sidebar{padding:30px;overflow:hidden}#sidebar > *:last-child{margin-bottom:2cm}.http-server-breadcrumbs{font-size:130%;margin:0 0 15px 0}#footer{font-size:.75em;padding:5px 30px;border-top:1px solid #ddd;text-align:right}#footer p{margin:0 0 0 1em;display:inline-block}#footer p:last-child{margin-right:30px}h1,h2,h3,h4,h5{font-weight:300}h1{font-size:2.5em;line-height:1.1em}h2{font-size:1.75em;margin:1em 0 .50em 0}h3{font-size:1.4em;margin:25px 0 10px 0}h4{margin:0;font-size:105%}h1:target,h2:target,h3:target,h4:target,h5:target,h6:target{background:var(--highlight-color);padding:.2em 0}a{color:#058;text-decoration:none;transition:color .3s ease-in-out}a:hover{color:#e82}.title code{font-weight:bold}h2[id^="header-"]{margin-top:2em}.ident{color:#900}pre code{background:#f8f8f8;font-size:.8em;line-height:1.4em}code{background:#f2f2f1;padding:1px 4px;overflow-wrap:break-word}h1 code{background:transparent}pre{background:#f8f8f8;border:0;border-top:1px solid #ccc;border-bottom:1px solid #ccc;margin:1em 0;padding:1ex}#http-server-module-list{display:flex;flex-flow:column}#http-server-module-list div{display:flex}#http-server-module-list dt{min-width:10%}#http-server-module-list p{margin-top:0}.toc ul,#index{list-style-type:none;margin:0;padding:0}#index code{background:transparent}#index h3{border-bottom:1px solid #ddd}#index ul{padding:0}#index h4{margin-top:.6em;font-weight:bold}@media (min-width:200ex){#index .two-column{column-count:2}}@media (min-width:300ex){#index .two-column{column-count:3}}dl{margin-bottom:2em}dl dl:last-child{margin-bottom:4em}dd{margin:0 0 1em 3em}#header-classes + dl > dd{margin-bottom:3em}dd dd{margin-left:2em}dd p{margin:10px 0}.name{background:#eee;font-weight:bold;font-size:.85em;padding:5px 10px;display:inline-block;min-width:40%}.name:hover{background:#e0e0e0}dt:target .name{background:var(--highlight-color)}.name > 
span:first-child{white-space:nowrap}.name.class > span:nth-child(2){margin-left:.4em}.inherited{color:#999;border-left:5px solid #eee;padding-left:1em}.inheritance em{font-style:normal;font-weight:bold}.desc h2{font-weight:400;font-size:1.25em}.desc h3{font-size:1em}.desc dt code{background:inherit}.source summary,.git-link-div{color:#666;text-align:right;font-weight:400;font-size:.8em;text-transform:uppercase}.source summary > *{white-space:nowrap;cursor:pointer}.git-link{color:inherit;margin-left:1em}.source pre{max-height:500px;overflow:auto;margin:0}.source pre code{font-size:12px;overflow:visible}.hlist{list-style:none}.hlist li{display:inline}.hlist li:after{content:',\2002'}.hlist li:last-child:after{content:none}.hlist .hlist{display:inline;padding-left:1em}img{max-width:100%}td{padding:0 .5em}.admonition{padding:.1em .5em;margin-bottom:1em}.admonition-title{font-weight:bold}.admonition.note,.admonition.info,.admonition.important{background:#aef}.admonition.todo,.admonition.versionadded,.admonition.tip,.admonition.hint{background:#dfd}.admonition.warning,.admonition.versionchanged,.admonition.deprecated{background:#fd4}.admonition.error,.admonition.danger,.admonition.caution{background:lightpink}</style>
-<style media="screen and (min-width: 700px)">@media screen and (min-width:700px){#sidebar{width:30%;height:100vh;overflow:auto;position:sticky;top:0}#content{width:70%;max-width:100ch;padding:3em 4em;border-left:1px solid #ddd}pre code{font-size:1em}.item .name{font-size:1em}main{display:flex;flex-direction:row-reverse;justify-content:flex-end}.toc ul ul,#index ul{padding-left:1.5em}.toc > ul > li{margin-top:.5em}}</style>
-<style media="print">@media print{#sidebar h1{page-break-before:always}.source{display:none}}@media print{*{background:transparent !important;color:#000 !important;box-shadow:none !important;text-shadow:none !important}a[href]:after{content:" (" attr(href) ")";font-size:90%}a[href][title]:after{content:none}abbr[title]:after{content:" (" attr(title) ")"}.ir a:after,a[href^="javascript:"]:after,a[href^="#"]:after{content:""}pre,blockquote{border:1px solid #999;page-break-inside:avoid}thead{display:table-header-group}tr,img{page-break-inside:avoid}img{max-width:100% !important}@page{margin:0.5cm}p,h2,h3{orphans:3;widows:3}h1,h2,h3,h4,h5,h6{page-break-after:avoid}}</style>
-<script async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/latest.js?config=TeX-AMS_CHTML" integrity="sha256-kZafAc6mZvK3W3v1pHOcUix30OHQN6pU/NO2oFkqZVw=" crossorigin></script>
-<script defer src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.1.1/highlight.min.js" integrity="sha256-Uv3H6lx7dJmRfRvH8TH6kJD1TSK1aFcwgx+mdg3epi8=" crossorigin></script>
-<script>window.addEventListener('DOMContentLoaded', () => hljs.initHighlighting())</script>
-<article id="content">
-<h1 class="title">Module <code>openunmix.transforms</code></h1>
-<section id="section-intro">
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L0-L208" class="git-link">Browse git</a>
-<pre><code class="python">from typing import Optional
-import torch
-import torchaudio
-from torch import Tensor
-import torch.nn as nn
-try:
-    from asteroid_filterbanks.enc_dec import Encoder, Decoder
-    from asteroid_filterbanks.transforms import to_torchaudio, from_torchaudio
-    from asteroid_filterbanks import torch_stft_fb
-except ImportError:
-    pass
-def make_filterbanks(n_fft=4096, n_hop=1024, center=False, sample_rate=44100.0, method=&#34;torch&#34;):
-    window = nn.Parameter(torch.hann_window(n_fft), requires_grad=False)
-    if method == &#34;torch&#34;:
-        encoder = TorchSTFT(n_fft=n_fft, n_hop=n_hop, window=window, center=center)
-        decoder = TorchISTFT(n_fft=n_fft, n_hop=n_hop, window=window, center=center)
-    elif method == &#34;asteroid&#34;:
-        fb = torch_stft_fb.TorchSTFTFB.from_torch_args(
-            n_fft=n_fft,
-            hop_length=n_hop,
-            win_length=n_fft,
-            window=window,
-            center=center,
-            sample_rate=sample_rate,
-        )
-        encoder = AsteroidSTFT(fb)
-        decoder = AsteroidISTFT(fb)
-    else:
-        raise NotImplementedError
-    return encoder, decoder
-class AsteroidSTFT(nn.Module):
-    def __init__(self, fb):
-        super(AsteroidSTFT, self).__init__()
-        self.enc = Encoder(fb)
-    def forward(self, x):
-        aux = self.enc(x)
-        return to_torchaudio(aux)
-class AsteroidISTFT(nn.Module):
-    def __init__(self, fb):
-        super(AsteroidISTFT, self).__init__()
-        self.dec = Decoder(fb)
-    def forward(self, X: Tensor, length: Optional[int] = None) -&gt; Tensor:
-        aux = from_torchaudio(X)
-        return self.dec(aux, length=length)
-class TorchSTFT(nn.Module):
-    &#34;&#34;&#34;Multichannel Short-Time-Fourier Forward transform
-    uses hard coded hann_window.
-    Args:
-        n_fft (int, optional): transform FFT size. Defaults to 4096.
-        n_hop (int, optional): transform hop size. Defaults to 1024.
-        center (bool, optional): If True, the signal's first window is
-            zero padded. Centering is required for a perfect
-            reconstruction of the signal. However, during training
-            of spectrogram models, it can safely be turned off.
-            Defaults to `False`
-        window (nn.Parameter, optional): window function
-    &#34;&#34;&#34;
-    def __init__(self, n_fft=4096, n_hop=1024, center=False, window=None):
-        super(TorchSTFT, self).__init__()
-        if window is None:
-            self.window = nn.Parameter(torch.hann_window(n_fft), requires_grad=False)
-        else:
-            self.window = window
-        self.n_fft = n_fft
-        self.n_hop = n_hop
-        self.center = center
-    def forward(self, x: Tensor) -&gt; Tensor:
-        &#34;&#34;&#34;STFT forward path
-        Args:
-            x (Tensor): audio waveform of
-                shape (nb_samples, nb_channels, nb_timesteps)
-        Returns:
-            STFT (Tensor): complex stft of
-                shape (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)
-                last axis is stacked real and imaginary
-        &#34;&#34;&#34;
-        shape = x.size()
-        nb_samples, nb_channels, nb_timesteps = shape
-        # pack batch
-        x = x.view(-1, shape[-1])
-        stft_f = torch.stft(
-            x,
-            n_fft=self.n_fft,
-            hop_length=self.n_hop,
-            window=self.window,
-            center=self.center,
-            normalized=False,
-            onesided=True,
-            pad_mode=&#34;reflect&#34;,
-        )
-        # unpack batch
-        stft_f = stft_f.view(shape[:-1] + stft_f.shape[-3:])
-        return stft_f
-class TorchISTFT(nn.Module):
-    &#34;&#34;&#34;Multichannel Inverse-Short-Time-Fourier functional
-    wrapper for torch.istft to support batches
-    Args:
-        STFT (Tensor): complex stft of
-            shape (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)
-            last axis is stacked real and imaginary
-        n_fft (int, optional): transform FFT size. Defaults to 4096.
-        n_hop (int, optional): transform hop size. Defaults to 1024.
-        window (callable, optional): window function
-        center (bool, optional): If True, the signal's first window is
-            zero padded. Centering is required for a perfect
-            reconstruction of the signal. However, during training
-            of spectrogram models, it can safely be turned off.
-            Defaults to `False`
-        length (int, optional): audio signal length to crop the signal
-    Returns:
-        x (Tensor): audio waveform of
-            shape (nb_samples, nb_channels, nb_timesteps)
-    &#34;&#34;&#34;
-    def __init__(
-        self,
-        n_fft: int = 4096,
-        n_hop: int = 1024,
-        center: bool = False,
-        sample_rate: float = 44100.0,
-        window: Optional[nn.Parameter] = None,
-    ) -&gt; None:
-        super(TorchISTFT, self).__init__()
-        self.n_fft = n_fft
-        self.n_hop = n_hop
-        self.center = center
-        self.sample_rate = sample_rate
-        if window is None:
-            self.window = nn.Parameter(torch.hann_window(n_fft), requires_grad=False)
-        else:
-            self.window = window
-    def forward(self, X: Tensor, length: Optional[int] = None) -&gt; Tensor:
-        shape = X.size()
-        X = X.reshape(-1, shape[-3], shape[-2], shape[-1])
-        y = torch.istft(
-            X,
-            n_fft=self.n_fft,
-            hop_length=self.n_hop,
-            window=self.window,
-            center=self.center,
-            normalized=False,
-            onesided=True,
-            length=length,
-        )
-        y = y.reshape(shape[:-3] + y.shape[-1:])
-        return y
-class ComplexNorm(nn.Module):
-    r&#34;&#34;&#34;Compute the norm of complex tensor input.
-    Extension of `torchaudio.functional.complex_norm` with mono
-    Args:
-        power (float): Power of the norm. (Default: `1.0`).
-        mono (bool): Downmix to single channel after applying power norm
-            to maximize
-    &#34;&#34;&#34;
-    def __init__(self, power: float = 1.0, mono: bool = False):
-        super(ComplexNorm, self).__init__()
-        self.power = power
-        self.mono = mono
-    def forward(self, spec: Tensor) -&gt; Tensor:
-        &#34;&#34;&#34;
-        Args:
-            spec: complex_tensor (Tensor): Tensor shape of
-                `(..., complex=2)`
-        Returns:
-            Tensor: Power/Mag of input
-                `(...,)`
-        &#34;&#34;&#34;
-        # take the magnitude
-        spec = torchaudio.functional.complex_norm(spec, power=self.power)
-        # downmix in the mag domain to preserve energy
-        if self.mono:
-            spec = torch.mean(spec, 1, keepdim=True)
-        return spec</code></pre>
-<h2 class="section-title" id="header-functions">Functions</h2>
-<dt id="openunmix.transforms.make_filterbanks"><code class="name flex">
-<span>def <span class="ident">make_filterbanks</span></span>(<span>n_fft=4096, n_hop=1024, center=False, sample_rate=44100.0, method='torch')</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L16-L35" class="git-link">Browse git</a>
-<pre><code class="python">def make_filterbanks(n_fft=4096, n_hop=1024, center=False, sample_rate=44100.0, method=&#34;torch&#34;):
-    window = nn.Parameter(torch.hann_window(n_fft), requires_grad=False)
-    if method == &#34;torch&#34;:
-        encoder = TorchSTFT(n_fft=n_fft, n_hop=n_hop, window=window, center=center)
-        decoder = TorchISTFT(n_fft=n_fft, n_hop=n_hop, window=window, center=center)
-    elif method == &#34;asteroid&#34;:
-        fb = torch_stft_fb.TorchSTFTFB.from_torch_args(
-            n_fft=n_fft,
-            hop_length=n_hop,
-            win_length=n_fft,
-            window=window,
-            center=center,
-            sample_rate=sample_rate,
-        )
-        encoder = AsteroidSTFT(fb)
-        decoder = AsteroidISTFT(fb)
-    else:
-        raise NotImplementedError
-    return encoder, decoder</code></pre>
-<h2 class="section-title" id="header-classes">Classes</h2>
-<dt id="openunmix.transforms.AsteroidISTFT"><code class="flex name class">
-<span>class <span class="ident">AsteroidISTFT</span></span>
-<div class="desc"><p>Base class for all neural network modules.</p>
-<p>Your models should also subclass this class.</p>
-<p>Modules can also contain other Modules, allowing to nest them in
-a tree structure. You can assign the submodules as regular attributes::</p>
-<pre><code>import torch.nn as nn
-import torch.nn.functional as F
-class Model(nn.Module):
-    def __init__(self):
-        super(Model, self).__init__()
-        self.conv1 = nn.Conv2d(1, 20, 5)
-        self.conv2 = nn.Conv2d(20, 20, 5)
-    def forward(self, x):
-        x = F.relu(self.conv1(x))
-        return F.relu(self.conv2(x))
-<p>Submodules assigned in this way will be registered, and will have their
-parameters converted too when you call :meth:<code>to</code>, etc.</p>
-<p>:ivar training: Boolean represents whether this module is in training or
-evaluation mode.
-:vartype training: bool</p>
-<p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L48-L55" class="git-link">Browse git</a>
-<pre><code class="python">class AsteroidISTFT(nn.Module):
-    def __init__(self, fb):
-        super(AsteroidISTFT, self).__init__()
-        self.dec = Decoder(fb)
-    def forward(self, X: Tensor, length: Optional[int] = None) -&gt; Tensor:
-        aux = from_torchaudio(X)
-        return self.dec(aux, length=length)</code></pre>
-<ul class="hlist">
-<h3>Class variables</h3>
-<dt id="openunmix.transforms.AsteroidISTFT.dump_patches"><code class="name">var <span class="ident">dump_patches</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.AsteroidISTFT.training"><code class="name">var <span class="ident">training</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.AsteroidISTFT.forward"><code class="name flex">
-<span>def <span class="ident">forward</span></span>(<span>self, X: torch.Tensor, length: Union[int, NoneType] = None) ‑> torch.Tensor</span>
-<div class="desc"><p>Defines the computation performed at every call.</p>
-<p>Should be overridden by all subclasses.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>Although the recipe for forward pass needs to be defined within
-this function, one should call the :class:<code>Module</code> instance afterwards
-instead of this since the former takes care of running the
-registered hooks while the latter silently ignores them.</p>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L53-L55" class="git-link">Browse git</a>
-<pre><code class="python">def forward(self, X: Tensor, length: Optional[int] = None) -&gt; Tensor:
-    aux = from_torchaudio(X)
-    return self.dec(aux, length=length)</code></pre>
-<dt id="openunmix.transforms.AsteroidSTFT"><code class="flex name class">
-<span>class <span class="ident">AsteroidSTFT</span></span>
-<div class="desc"><p>Base class for all neural network modules.</p>
-<p>Your models should also subclass this class.</p>
-<p>Modules can also contain other Modules, allowing to nest them in
-a tree structure. You can assign the submodules as regular attributes::</p>
-<pre><code>import torch.nn as nn
-import torch.nn.functional as F
-class Model(nn.Module):
-    def __init__(self):
-        super(Model, self).__init__()
-        self.conv1 = nn.Conv2d(1, 20, 5)
-        self.conv2 = nn.Conv2d(20, 20, 5)
-    def forward(self, x):
-        x = F.relu(self.conv1(x))
-        return F.relu(self.conv2(x))
-<p>Submodules assigned in this way will be registered, and will have their
-parameters converted too when you call :meth:<code>to</code>, etc.</p>
-<p>:ivar training: Boolean represents whether this module is in training or
-evaluation mode.
-:vartype training: bool</p>
-<p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L38-L45" class="git-link">Browse git</a>
-<pre><code class="python">class AsteroidSTFT(nn.Module):
-    def __init__(self, fb):
-        super(AsteroidSTFT, self).__init__()
-        self.enc = Encoder(fb)
-    def forward(self, x):
-        aux = self.enc(x)
-        return to_torchaudio(aux)</code></pre>
-<ul class="hlist">
-<h3>Class variables</h3>
-<dt id="openunmix.transforms.AsteroidSTFT.dump_patches"><code class="name">var <span class="ident">dump_patches</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.AsteroidSTFT.training"><code class="name">var <span class="ident">training</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.AsteroidSTFT.forward"><code class="name flex">
-<span>def <span class="ident">forward</span></span>(<span>self, x) ‑> Callable[..., Any]</span>
-<div class="desc"><p>Defines the computation performed at every call.</p>
-<p>Should be overridden by all subclasses.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>Although the recipe for forward pass needs to be defined within
-this function, one should call the :class:<code>Module</code> instance afterwards
-instead of this since the former takes care of running the
-registered hooks while the latter silently ignores them.</p>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L43-L45" class="git-link">Browse git</a>
-<pre><code class="python">def forward(self, x):
-    aux = self.enc(x)
-    return to_torchaudio(aux)</code></pre>
-<dt id="openunmix.transforms.ComplexNorm"><code class="flex name class">
-<span>class <span class="ident">ComplexNorm</span></span>
-<span>(</span><span>power: float = 1.0, mono: bool = False)</span>
-<div class="desc"><p>Compute the norm of complex tensor input.</p>
-<p>Extension of <code>torchaudio.functional.complex_norm</code> with mono</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>power</code></strong> :&ensp;<code>float</code></dt>
-<dd>Power of the norm. (Default: <code>1.0</code>).</dd>
-<dt><strong><code>mono</code></strong> :&ensp;<code>bool</code></dt>
-<dd>Downmix to single channel after applying power norm
-to maximize</dd>
-<p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L176-L209" class="git-link">Browse git</a>
-<pre><code class="python">class ComplexNorm(nn.Module):
-    r&#34;&#34;&#34;Compute the norm of complex tensor input.
-    Extension of `torchaudio.functional.complex_norm` with mono
-    Args:
-        power (float): Power of the norm. (Default: `1.0`).
-        mono (bool): Downmix to single channel after applying power norm
-            to maximize
-    &#34;&#34;&#34;
-    def __init__(self, power: float = 1.0, mono: bool = False):
-        super(ComplexNorm, self).__init__()
-        self.power = power
-        self.mono = mono
-    def forward(self, spec: Tensor) -&gt; Tensor:
-        &#34;&#34;&#34;
-        Args:
-            spec: complex_tensor (Tensor): Tensor shape of
-                `(..., complex=2)`
-        Returns:
-            Tensor: Power/Mag of input
-                `(...,)`
-        &#34;&#34;&#34;
-        # take the magnitude
-        spec = torchaudio.functional.complex_norm(spec, power=self.power)
-        # downmix in the mag domain to preserve energy
-        if self.mono:
-            spec = torch.mean(spec, 1, keepdim=True)
-        return spec</code></pre>
-<ul class="hlist">
-<h3>Class variables</h3>
-<dt id="openunmix.transforms.ComplexNorm.dump_patches"><code class="name">var <span class="ident">dump_patches</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.ComplexNorm.training"><code class="name">var <span class="ident">training</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.ComplexNorm.forward"><code class="name flex">
-<span>def <span class="ident">forward</span></span>(<span>self, spec: torch.Tensor) ‑> torch.Tensor</span>
-<div class="desc"><h2 id="args">Args</h2>
-<dd>complex_tensor (Tensor): Tensor shape of
-<code>(..., complex=2)</code></dd>
-<h2 id="returns">Returns</h2>
-<dd>Power/Mag of input
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L192-L209" class="git-link">Browse git</a>
-<pre><code class="python">def forward(self, spec: Tensor) -&gt; Tensor:
-    &#34;&#34;&#34;
-    Args:
-        spec: complex_tensor (Tensor): Tensor shape of
-            `(..., complex=2)`
-    Returns:
-        Tensor: Power/Mag of input
-            `(...,)`
-    &#34;&#34;&#34;
-    # take the magnitude
-    spec = torchaudio.functional.complex_norm(spec, power=self.power)
-    # downmix in the mag domain to preserve energy
-    if self.mono:
-        spec = torch.mean(spec, 1, keepdim=True)
-    return spec</code></pre>
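<p>As a rough illustration of what <code>ComplexNorm</code> computes, the following pure-Python sketch applies the power norm <code>(re**2 + im**2) ** (power / 2)</code> to stacked real/imaginary pairs and then averages across channels when <code>mono</code> is set. The helper names (<code>complex_norm</code>, <code>norm_and_downmix</code>) and the nested-list layout are assumptions made for this example; the class itself operates on tensors via <code>torchaudio.functional.complex_norm</code>.</p>

```python
from typing import List, Sequence, Tuple

Bin = Tuple[float, float]  # (real, imaginary), i.e. the "complex=2" axis


def complex_norm(pair: Bin, power: float = 1.0) -> float:
    """Power norm of one stacked (re, im) pair: (re^2 + im^2)^(power / 2)."""
    re, im = pair
    return (re * re + im * im) ** (0.5 * power)


def norm_and_downmix(
    channels: Sequence[Sequence[Bin]], power: float = 1.0, mono: bool = False
) -> List[List[float]]:
    """Hypothetical helper: `channels` holds one list of bins per channel."""
    mags = [[complex_norm(b, power) for b in ch] for ch in channels]
    if not mono:
        return mags
    # Average over the channel axis but keep it (size 1), like keepdim=True.
    nb_channels = len(mags)
    nb_bins = len(mags[0])
    return [[sum(ch[i] for ch in mags) / nb_channels for i in range(nb_bins)]]


left = [(3.0, 4.0), (0.0, 1.0)]
right = [(0.0, 0.0), (0.0, 3.0)]
magnitude = norm_and_downmix([left, right], power=1.0, mono=True)
print(magnitude)
```

<p>Because the average is taken after the norm, the phases of the two channels cannot cancel each other, which is the motivation given in the source comment for downmixing in the magnitude domain.</p>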
-<dt id="openunmix.transforms.TorchISTFT"><code class="flex name class">
-<span>class <span class="ident">TorchISTFT</span></span>
-<span>(</span><span>n_fft: int = 4096, n_hop: int = 1024, center: bool = False, sample_rate: float = 44100.0, window: Union[torch.nn.parameter.Parameter, NoneType] = None)</span>
-<div class="desc"><p>Multichannel Inverse-Short-Time-Fourier functional
-wrapper for torch.istft to support batches</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>STFT</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>complex stft of
-shape (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)
-last axis is stacked real and imaginary</dd>
-<dt><strong><code>n_fft</code></strong> :&ensp;<code>int</code>, optional</dt>
-<dd>transform FFT size. Defaults to 4096.</dd>
-<dt><strong><code>n_hop</code></strong> :&ensp;<code>int</code>, optional</dt>
-<dd>transform hop size. Defaults to 1024.</dd>
-<dt><strong><code>window</code></strong> :&ensp;<code>nn.Parameter</code>, optional</dt>
-<dd>window function</dd>
-<dt><strong><code>center</code></strong> :&ensp;<code>bool</code>, optional</dt>
-<dd>If True, the signal's first window is
-zero padded. Centering is required for a perfect
-reconstruction of the signal. However, during training
-of spectrogram models, it can safely be turned off.
-Defaults to <code>False</code></dd>
-<dt><strong><code>length</code></strong> :&ensp;<code>int</code>, optional</dt>
-<dd>audio signal length to crop the signal</dd>
-<h2 id="returns">Returns</h2>
-<p>x (Tensor): audio waveform of
-shape (nb_samples, nb_channels, nb_timesteps)
-Initializes internal Module state, shared by both nn.Module and ScriptModule.</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L115-L173" class="git-link">Browse git</a>
-<pre><code class="python">class TorchISTFT(nn.Module):
-    &#34;&#34;&#34;Multichannel Inverse-Short-Time-Fourier functional
-    wrapper for torch.istft to support batches
-    Args:
-        STFT (Tensor): complex stft of
-            shape (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)
-            last axis is stacked real and imaginary
-        n_fft (int, optional): transform FFT size. Defaults to 4096.
-        n_hop (int, optional): transform hop size. Defaults to 1024.
-        window (nn.Parameter, optional): window function
-        center (bool, optional): If True, the signal's first window is
-            zero padded. Centering is required for a perfect
-            reconstruction of the signal. However, during training
-            of spectrogram models, it can safely be turned off.
-            Defaults to `False`
-        length (int, optional): audio signal length to crop the signal
-    Returns:
-        x (Tensor): audio waveform of
-            shape (nb_samples, nb_channels, nb_timesteps)
-    &#34;&#34;&#34;
-    def __init__(
-        self,
-        n_fft: int = 4096,
-        n_hop: int = 1024,
-        center: bool = False,
-        sample_rate: float = 44100.0,
-        window: Optional[nn.Parameter] = None,
-    ) -&gt; None:
-        super(TorchISTFT, self).__init__()
-        self.n_fft = n_fft
-        self.n_hop = n_hop
-        self.center = center
-        self.sample_rate = sample_rate
-        if window is None:
-            self.window = nn.Parameter(torch.hann_window(n_fft), requires_grad=False)
-        else:
-            self.window = window
-    def forward(self, X: Tensor, length: Optional[int] = None) -&gt; Tensor:
-        shape = X.size()
-        X = X.reshape(-1, shape[-3], shape[-2], shape[-1])
-        y = torch.istft(
-            X,
-            n_fft=self.n_fft,
-            hop_length=self.n_hop,
-            window=self.window,
-            center=self.center,
-            normalized=False,
-            onesided=True,
-            length=length,
-        )
-        y = y.reshape(shape[:-3] + y.shape[-1:])
-        return y</code></pre>
-<ul class="hlist">
-<h3>Class variables</h3>
-<dt id="openunmix.transforms.TorchISTFT.dump_patches"><code class="name">var <span class="ident">dump_patches</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.TorchISTFT.training"><code class="name">var <span class="ident">training</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.TorchISTFT.forward"><code class="name flex">
-<span>def <span class="ident">forward</span></span>(<span>self, X: torch.Tensor, length: Union[int, NoneType] = None) ‑> torch.Tensor</span>
-<div class="desc"><p>Defines the computation performed at every call.</p>
-<p>Should be overridden by all subclasses.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>Although the recipe for forward pass needs to be defined within
-this function, one should call the :class:<code>Module</code> instance afterwards
-instead of this since the former takes care of running the
-registered hooks while the latter silently ignores them.</p>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L156-L173" class="git-link">Browse git</a>
-<pre><code class="python">def forward(self, X: Tensor, length: Optional[int] = None) -&gt; Tensor:
-    shape = X.size()
-    X = X.reshape(-1, shape[-3], shape[-2], shape[-1])
-    y = torch.istft(
-        X,
-        n_fft=self.n_fft,
-        hop_length=self.n_hop,
-        window=self.window,
-        center=self.center,
-        normalized=False,
-        onesided=True,
-        length=length,
-    )
-    y = y.reshape(shape[:-3] + y.shape[-1:])
-    return y</code></pre>
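<p>The two <code>reshape</code> calls in <code>forward</code> pack all leading dimensions into a single batch axis before calling <code>torch.istft</code> and restore them afterwards. The sketch below only does the shape bookkeeping with plain tuples; the function names and the example value for <code>nb_timesteps</code> are hypothetical and not part of the library.</p>

```python
from math import prod
from typing import Tuple


def packed_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Shape after X.reshape(-1, shape[-3], shape[-2], shape[-1])."""
    return (prod(shape[:-3]),) + tuple(shape[-3:])


def unpacked_shape(shape: Tuple[int, ...], nb_timesteps: int) -> Tuple[int, ...]:
    """Shape after y.reshape(shape[:-3] + y.shape[-1:])."""
    return tuple(shape[:-3]) + (nb_timesteps,)


# (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)
spec_shape = (8, 2, 2049, 44, 2)
packed = packed_shape(spec_shape)
# nb_timesteps is an assumed placeholder for the length istft returns.
audio = unpacked_shape(spec_shape, nb_timesteps=48128)
print(packed, audio)
```

<p>Packing this way lets a single <code>istft</code> call process every sample and channel at once instead of looping over them.</p>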
-<dt id="openunmix.transforms.TorchSTFT"><code class="flex name class">
-<span>class <span class="ident">TorchSTFT</span></span>
-<span>(</span><span>n_fft=4096, n_hop=1024, center=False, window=None)</span>
-<div class="desc"><p>Multichannel Short-Time-Fourier Forward transform
-uses hard coded hann_window.</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>n_fft</code></strong> :&ensp;<code>int</code>, optional</dt>
-<dd>transform FFT size. Defaults to 4096.</dd>
-<dt><strong><code>n_hop</code></strong> :&ensp;<code>int</code>, optional</dt>
-<dd>transform hop size. Defaults to 1024.</dd>
-<dt><strong><code>center</code></strong> :&ensp;<code>bool</code>, optional</dt>
-<dd>If True, the signal's first window is
-zero padded. Centering is required for a perfect
-reconstruction of the signal. However, during training
-of spectrogram models, it can safely be turned off.
-Defaults to <code>False</code></dd>
-<dt><strong><code>window</code></strong> :&ensp;<code>nn.Parameter</code>, optional</dt>
-<dd>window function</dd>
-<p>Initializes internal Module state, shared by both nn.Module and ScriptModule.</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L58-L112" class="git-link">Browse git</a>
-<pre><code class="python">class TorchSTFT(nn.Module):
-    &#34;&#34;&#34;Multichannel Short-Time-Fourier Forward transform
-    uses hard coded hann_window.
-    Args:
-        n_fft (int, optional): transform FFT size. Defaults to 4096.
-        n_hop (int, optional): transform hop size. Defaults to 1024.
-        center (bool, optional): If True, the signal's first window is
-            zero padded. Centering is required for a perfect
-            reconstruction of the signal. However, during training
-            of spectrogram models, it can safely be turned off.
-            Defaults to `False`
-        window (nn.Parameter, optional): window function
-    &#34;&#34;&#34;
-    def __init__(self, n_fft=4096, n_hop=1024, center=False, window=None):
-        super(TorchSTFT, self).__init__()
-        if window is None:
-            self.window = nn.Parameter(torch.hann_window(n_fft), requires_grad=False)
-        else:
-            self.window = window
-        self.n_fft = n_fft
-        self.n_hop = n_hop
-        self.center = center
-    def forward(self, x: Tensor) -&gt; Tensor:
-        &#34;&#34;&#34;STFT forward path
-        Args:
-            x (Tensor): audio waveform of
-                shape (nb_samples, nb_channels, nb_timesteps)
-        Returns:
-            STFT (Tensor): complex stft of
-                shape (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)
-                last axis is stacked real and imaginary
-        &#34;&#34;&#34;
-        shape = x.size()
-        nb_samples, nb_channels, nb_timesteps = shape
-        # pack batch
-        x = x.view(-1, shape[-1])
-        stft_f = torch.stft(
-            x,
-            n_fft=self.n_fft,
-            hop_length=self.n_hop,
-            window=self.window,
-            center=self.center,
-            normalized=False,
-            onesided=True,
-            pad_mode=&#34;reflect&#34;,
-        )
-        # unpack batch
-        stft_f = stft_f.view(shape[:-1] + stft_f.shape[-3:])
-        return stft_f</code></pre>
-<ul class="hlist">
-<h3>Class variables</h3>
-<dt id="openunmix.transforms.TorchSTFT.dump_patches"><code class="name">var <span class="ident">dump_patches</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.TorchSTFT.training"><code class="name">var <span class="ident">training</span> : bool</code></dt>
-<div class="desc"></div>
-<dt id="openunmix.transforms.TorchSTFT.forward"><code class="name flex">
-<span>def <span class="ident">forward</span></span>(<span>self, x: torch.Tensor) ‑> torch.Tensor</span>
-<div class="desc"><p>STFT forward path</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>x</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>audio waveform of
-shape (nb_samples, nb_channels, nb_timesteps)</dd>
-<h2 id="returns">Returns</h2>
-<p>STFT (Tensor): complex stft of
-shape (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)
-last axis is stacked real and imaginary</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/transforms.py#L82-L112" class="git-link">Browse git</a>
-<pre><code class="python">def forward(self, x: Tensor) -&gt; Tensor:
-    &#34;&#34;&#34;STFT forward path
-    Args:
-        x (Tensor): audio waveform of
-            shape (nb_samples, nb_channels, nb_timesteps)
-    Returns:
-        STFT (Tensor): complex stft of
-            shape (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)
-            last axis is stacked real and imaginary
-    &#34;&#34;&#34;
-    shape = x.size()
-    nb_samples, nb_channels, nb_timesteps = shape
-    # pack batch
-    x = x.view(-1, shape[-1])
-    stft_f = torch.stft(
-        x,
-        n_fft=self.n_fft,
-        hop_length=self.n_hop,
-        window=self.window,
-        center=self.center,
-        normalized=False,
-        onesided=True,
-        pad_mode=&#34;reflect&#34;,
-    )
-    # unpack batch
-    stft_f = stft_f.view(shape[:-1] + stft_f.shape[-3:])
-    return stft_f</code></pre>
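<p>To make the documented output shape concrete, here is a minimal sketch that predicts it from the input shape. It assumes a one-sided transform (so <code>nb_bins = n_fft // 2 + 1</code>), the frame-count formulas given in the <code>torch.stft</code> documentation, and the stacked real/imaginary layout described above; <code>stft_output_shape</code> is a hypothetical helper, not part of <code>openunmix</code>.</p>

```python
from typing import Tuple


def stft_output_shape(
    nb_samples: int,
    nb_channels: int,
    nb_timesteps: int,
    n_fft: int = 4096,
    n_hop: int = 1024,
    center: bool = False,
) -> Tuple[int, int, int, int, int]:
    """Predict (nb_samples, nb_channels, nb_bins, nb_frames, complex=2)."""
    nb_bins = n_fft // 2 + 1  # one-sided spectrum
    if center:
        # The signal is padded by n_fft // 2 on both sides before framing.
        nb_frames = 1 + nb_timesteps // n_hop
    else:
        nb_frames = 1 + (nb_timesteps - n_fft) // n_hop
    return (nb_samples, nb_channels, nb_bins, nb_frames, 2)


# Six seconds of stereo audio at 44.1 kHz, batch of 8.
shape = stft_output_shape(8, 2, 44100 * 6)
print(shape)
```

<p>With <code>center=False</code> the trailing samples that do not fill a complete window are dropped, which is one reason the docstring notes that centering is needed for perfect reconstruction.</p>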
-<nav id="sidebar">
-<div class="toc">
-<ul id="index">
-<li><code><a title="openunmix" href="index.html">openunmix</a></code></li>
-<li><h3><a href="#header-functions">Functions</a></h3>
-<ul class="">
-<li><code><a title="openunmix.transforms.make_filterbanks" href="#openunmix.transforms.make_filterbanks">make_filterbanks</a></code></li>
-<li><h3><a href="#header-classes">Classes</a></h3>
-<h4><code><a title="openunmix.transforms.AsteroidISTFT" href="#openunmix.transforms.AsteroidISTFT">AsteroidISTFT</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.transforms.AsteroidISTFT.dump_patches" href="#openunmix.transforms.AsteroidISTFT.dump_patches">dump_patches</a></code></li>
-<li><code><a title="openunmix.transforms.AsteroidISTFT.forward" href="#openunmix.transforms.AsteroidISTFT.forward">forward</a></code></li>
-<li><code><a title="openunmix.transforms.AsteroidISTFT.training" href="#openunmix.transforms.AsteroidISTFT.training">training</a></code></li>
-<h4><code><a title="openunmix.transforms.AsteroidSTFT" href="#openunmix.transforms.AsteroidSTFT">AsteroidSTFT</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.transforms.AsteroidSTFT.dump_patches" href="#openunmix.transforms.AsteroidSTFT.dump_patches">dump_patches</a></code></li>
-<li><code><a title="openunmix.transforms.AsteroidSTFT.forward" href="#openunmix.transforms.AsteroidSTFT.forward">forward</a></code></li>
-<li><code><a title="openunmix.transforms.AsteroidSTFT.training" href="#openunmix.transforms.AsteroidSTFT.training">training</a></code></li>
-<h4><code><a title="openunmix.transforms.ComplexNorm" href="#openunmix.transforms.ComplexNorm">ComplexNorm</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.transforms.ComplexNorm.dump_patches" href="#openunmix.transforms.ComplexNorm.dump_patches">dump_patches</a></code></li>
-<li><code><a title="openunmix.transforms.ComplexNorm.forward" href="#openunmix.transforms.ComplexNorm.forward">forward</a></code></li>
-<li><code><a title="openunmix.transforms.ComplexNorm.training" href="#openunmix.transforms.ComplexNorm.training">training</a></code></li>
-<h4><code><a title="openunmix.transforms.TorchISTFT" href="#openunmix.transforms.TorchISTFT">TorchISTFT</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.transforms.TorchISTFT.dump_patches" href="#openunmix.transforms.TorchISTFT.dump_patches">dump_patches</a></code></li>
-<li><code><a title="openunmix.transforms.TorchISTFT.forward" href="#openunmix.transforms.TorchISTFT.forward">forward</a></code></li>
-<li><code><a title="openunmix.transforms.TorchISTFT.training" href="#openunmix.transforms.TorchISTFT.training">training</a></code></li>
-<h4><code><a title="openunmix.transforms.TorchSTFT" href="#openunmix.transforms.TorchSTFT">TorchSTFT</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.transforms.TorchSTFT.dump_patches" href="#openunmix.transforms.TorchSTFT.dump_patches">dump_patches</a></code></li>
-<li><code><a title="openunmix.transforms.TorchSTFT.forward" href="#openunmix.transforms.TorchSTFT.forward">forward</a></code></li>
-<li><code><a title="openunmix.transforms.TorchSTFT.training" href="#openunmix.transforms.TorchSTFT.training">training</a></code></li>
-<footer id="footer">
-<p>Generated by <a href="https://pdoc3.github.io/pdoc"><cite>pdoc</cite> 0.9.2</a>.</p>
\ No newline at end of file
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/utils.html b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/utils.html
deleted file mode 100644
index b9d33da73d5433280c020b66cf9e41a7dcc50a6e..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/docs/utils.html
+++ /dev/null
@@ -1,926 +0,0 @@
-<!doctype html>
-<html lang="en">
-<meta charset="utf-8">
-<meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1" />
-<meta name="generator" content="pdoc 0.9.2" />
-<title>openunmix.utils API documentation</title>
-<meta name="description" content="" />
-<link rel="preload stylesheet" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/10up-sanitize.css/11.0.1/sanitize.min.css" integrity="sha256-PK9q560IAAa6WVRRh76LtCaI8pjTJ2z11v0miyNNjrs=" crossorigin>
-<link rel="preload stylesheet" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/10up-sanitize.css/11.0.1/typography.min.css" integrity="sha256-7l/o7C8jubJiy74VsKTidCy1yBkRtiUGbVkYBylBqUg=" crossorigin>
-<link rel="stylesheet preload" as="style" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.1.1/styles/github.min.css" crossorigin>
-<style>:root{--highlight-color:#fe9}.flex{display:flex !important}body{line-height:1.5em}#content{padding:20px}#sidebar{padding:30px;overflow:hidden}#sidebar > *:last-child{margin-bottom:2cm}.http-server-breadcrumbs{font-size:130%;margin:0 0 15px 0}#footer{font-size:.75em;padding:5px 30px;border-top:1px solid #ddd;text-align:right}#footer p{margin:0 0 0 1em;display:inline-block}#footer p:last-child{margin-right:30px}h1,h2,h3,h4,h5{font-weight:300}h1{font-size:2.5em;line-height:1.1em}h2{font-size:1.75em;margin:1em 0 .50em 0}h3{font-size:1.4em;margin:25px 0 10px 0}h4{margin:0;font-size:105%}h1:target,h2:target,h3:target,h4:target,h5:target,h6:target{background:var(--highlight-color);padding:.2em 0}a{color:#058;text-decoration:none;transition:color .3s ease-in-out}a:hover{color:#e82}.title code{font-weight:bold}h2[id^="header-"]{margin-top:2em}.ident{color:#900}pre code{background:#f8f8f8;font-size:.8em;line-height:1.4em}code{background:#f2f2f1;padding:1px 4px;overflow-wrap:break-word}h1 code{background:transparent}pre{background:#f8f8f8;border:0;border-top:1px solid #ccc;border-bottom:1px solid #ccc;margin:1em 0;padding:1ex}#http-server-module-list{display:flex;flex-flow:column}#http-server-module-list div{display:flex}#http-server-module-list dt{min-width:10%}#http-server-module-list p{margin-top:0}.toc ul,#index{list-style-type:none;margin:0;padding:0}#index code{background:transparent}#index h3{border-bottom:1px solid #ddd}#index ul{padding:0}#index h4{margin-top:.6em;font-weight:bold}@media (min-width:200ex){#index .two-column{column-count:2}}@media (min-width:300ex){#index .two-column{column-count:3}}dl{margin-bottom:2em}dl dl:last-child{margin-bottom:4em}dd{margin:0 0 1em 3em}#header-classes + dl > dd{margin-bottom:3em}dd dd{margin-left:2em}dd p{margin:10px 0}.name{background:#eee;font-weight:bold;font-size:.85em;padding:5px 10px;display:inline-block;min-width:40%}.name:hover{background:#e0e0e0}dt:target .name{background:var(--highlight-color)}.name > 
span:first-child{white-space:nowrap}.name.class > span:nth-child(2){margin-left:.4em}.inherited{color:#999;border-left:5px solid #eee;padding-left:1em}.inheritance em{font-style:normal;font-weight:bold}.desc h2{font-weight:400;font-size:1.25em}.desc h3{font-size:1em}.desc dt code{background:inherit}.source summary,.git-link-div{color:#666;text-align:right;font-weight:400;font-size:.8em;text-transform:uppercase}.source summary > *{white-space:nowrap;cursor:pointer}.git-link{color:inherit;margin-left:1em}.source pre{max-height:500px;overflow:auto;margin:0}.source pre code{font-size:12px;overflow:visible}.hlist{list-style:none}.hlist li{display:inline}.hlist li:after{content:',\2002'}.hlist li:last-child:after{content:none}.hlist .hlist{display:inline;padding-left:1em}img{max-width:100%}td{padding:0 .5em}.admonition{padding:.1em .5em;margin-bottom:1em}.admonition-title{font-weight:bold}.admonition.note,.admonition.info,.admonition.important{background:#aef}.admonition.todo,.admonition.versionadded,.admonition.tip,.admonition.hint{background:#dfd}.admonition.warning,.admonition.versionchanged,.admonition.deprecated{background:#fd4}.admonition.error,.admonition.danger,.admonition.caution{background:lightpink}</style>
-<style media="screen and (min-width: 700px)">@media screen and (min-width:700px){#sidebar{width:30%;height:100vh;overflow:auto;position:sticky;top:0}#content{width:70%;max-width:100ch;padding:3em 4em;border-left:1px solid #ddd}pre code{font-size:1em}.item .name{font-size:1em}main{display:flex;flex-direction:row-reverse;justify-content:flex-end}.toc ul ul,#index ul{padding-left:1.5em}.toc > ul > li{margin-top:.5em}}</style>
-<style media="print">@media print{#sidebar h1{page-break-before:always}.source{display:none}}@media print{*{background:transparent !important;color:#000 !important;box-shadow:none !important;text-shadow:none !important}a[href]:after{content:" (" attr(href) ")";font-size:90%}a[href][title]:after{content:none}abbr[title]:after{content:" (" attr(title) ")"}.ir a:after,a[href^="javascript:"]:after,a[href^="#"]:after{content:""}pre,blockquote{border:1px solid #999;page-break-inside:avoid}thead{display:table-header-group}tr,img{page-break-inside:avoid}img{max-width:100% !important}@page{margin:0.5cm}p,h2,h3{orphans:3;widows:3}h1,h2,h3,h4,h5,h6{page-break-after:avoid}}</style>
-<script async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/latest.js?config=TeX-AMS_CHTML" integrity="sha256-kZafAc6mZvK3W3v1pHOcUix30OHQN6pU/NO2oFkqZVw=" crossorigin></script>
-<script defer src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.1.1/highlight.min.js" integrity="sha256-Uv3H6lx7dJmRfRvH8TH6kJD1TSK1aFcwgx+mdg3epi8=" crossorigin></script>
-<script>window.addEventListener('DOMContentLoaded', () => hljs.initHighlighting())</script>
-<article id="content">
-<h1 class="title">Module <code>openunmix.utils</code></h1>
-<section id="section-intro">
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L0-L304" class="git-link">Browse git</a>
-<pre><code class="python">from typing import Optional, Union
-import torch
-import os
-import numpy as np
-import torchaudio
-import warnings
-from pathlib import Path
-from contextlib import redirect_stderr
-import io
-import json
-import openunmix
-from openunmix import model
-def bandwidth_to_max_bin(rate: float, n_fft: int, bandwidth: float) -&gt; np.ndarray:
-    &#34;&#34;&#34;Convert bandwidth to maximum bin count
-    Assuming lapped transforms such as STFT
-    Args:
-        rate (float): Sample rate in Hz
-        n_fft (int): FFT length
-        bandwidth (float): Target bandwidth in Hz
-    Returns:
-        np.ndarray: maximum frequency bin
-    &#34;&#34;&#34;
-    freqs = np.linspace(0, rate / 2, n_fft // 2 + 1, endpoint=True)
-    return np.max(np.where(freqs &lt;= bandwidth)[0]) + 1
-def save_checkpoint(state: dict, is_best: bool, path: str, target: str):
-    &#34;&#34;&#34;Save the full training checkpoint to disk
-    and, if it is the best model so far, also save the weights alone
-    Args:
-        state (dict): torch model state dict
-        is_best (bool): if current model is about to be saved as best model
-        path (str): model path
-        target (str): target name
-    &#34;&#34;&#34;
-    # save full checkpoint including optimizer
-    torch.save(state, os.path.join(path, target + &#34;.chkpnt&#34;))
-    if is_best:
-        # save just the weights
-        torch.save(state[&#34;state_dict&#34;], os.path.join(path, target + &#34;.pth&#34;))
-class AverageMeter(object):
-    &#34;&#34;&#34;Computes and stores the average and current value&#34;&#34;&#34;
-    def __init__(self):
-        self.reset()
-    def reset(self):
-        self.val = 0
-        self.avg = 0
-        self.sum = 0
-        self.count = 0
-    def update(self, val, n=1):
-        self.val = val
-        self.sum += val * n
-        self.count += n
-        self.avg = self.sum / self.count
-class EarlyStopping(object):
-    &#34;&#34;&#34;Early Stopping Monitor&#34;&#34;&#34;
-    def __init__(self, mode=&#34;min&#34;, min_delta=0, patience=10):
-        self.mode = mode
-        self.min_delta = min_delta
-        self.patience = patience
-        self.best = None
-        self.num_bad_epochs = 0
-        self.is_better = None
-        self._init_is_better(mode, min_delta)
-        if patience == 0:
-            self.is_better = lambda a, b: True
-    def step(self, metrics):
-        if self.best is None:
-            self.best = metrics
-            return False
-        if np.isnan(metrics):
-            return True
-        if self.is_better(metrics, self.best):
-            self.num_bad_epochs = 0
-            self.best = metrics
-        else:
-            self.num_bad_epochs += 1
-        if self.num_bad_epochs &gt;= self.patience:
-            return True
-        return False
-    def _init_is_better(self, mode, min_delta):
-        if mode not in {&#34;min&#34;, &#34;max&#34;}:
-            raise ValueError(&#34;mode &#34; + mode + &#34; is unknown!&#34;)
-        if mode == &#34;min&#34;:
-            self.is_better = lambda a, best: a &lt; best - min_delta
-        if mode == &#34;max&#34;:
-            self.is_better = lambda a, best: a &gt; best + min_delta
-def load_target_models(targets, model_str_or_path=&#34;umxhq&#34;, device=&#34;cpu&#34;, pretrained=True):
-    &#34;&#34;&#34;Core model loader
-    target model path can be either &lt;target&gt;.pth, or &lt;target&gt;-sha256.pth
-    (as used on torchhub)
-    The loader either loads the models from a known model string
-    as registered in the __init__.py or loads from custom configs.
-    &#34;&#34;&#34;
-    if isinstance(targets, str):
-        targets = [targets]
-    model_path = Path(model_str_or_path).expanduser()
-    if not model_path.exists():
-        # model path does not exist, use pretrained models
-        try:
-            # disable progress bar
-            hub_loader = getattr(openunmix, model_str_or_path + &#34;_spec&#34;)
-            err = io.StringIO()
-            with redirect_stderr(err):
-                return hub_loader(targets=targets, device=device, pretrained=pretrained)
-            print(err.getvalue())
-        except AttributeError:
-            raise NameError(&#34;Model does not exist on torchhub&#34;)
-            # assume model is a path to a local model_str_or_path directory
-    else:
-        models = {}
-        for target in targets:
-            # load model from disk
-            with open(Path(model_path, target + &#34;.json&#34;), &#34;r&#34;) as stream:
-                results = json.load(stream)
-            target_model_path = next(Path(model_path).glob(&#34;%s*.pth&#34; % target))
-            state = torch.load(target_model_path, map_location=device)
-            models[target] = model.OpenUnmix(
-                nb_bins=results[&#34;args&#34;][&#34;nfft&#34;] // 2 + 1,
-                nb_channels=results[&#34;args&#34;][&#34;nb_channels&#34;],
-                hidden_size=results[&#34;args&#34;][&#34;hidden_size&#34;],
-                max_bin=state[&#34;input_mean&#34;].shape[0],
-            )
-            if pretrained:
-                models[target].load_state_dict(state, strict=False)
-            models[target].to(device)
-        return models
-def load_separator(
-    model_str_or_path: str = &#34;umxhq&#34;,
-    targets: Optional[list] = None,
-    niter: int = 1,
-    residual: bool = False,
-    wiener_win_len: Optional[int] = 300,
-    device: Union[str, torch.device] = &#34;cpu&#34;,
-    pretrained: bool = True,
-    filterbank: str = &#34;torch&#34;,
-):
-    &#34;&#34;&#34;Separator loader
-    Args:
-        model_str_or_path (str): Model name or path to model _parent_ directory
-            E.g. the following files are assumed to be present when
-            loading `model_str_or_path=&#39;mymodel&#39;, targets=[&#39;vocals&#39;]`:
-            &#39;mymodel/separator.json&#39;, &#39;mymodel/vocals.pth&#39;, &#39;mymodel/vocals.json&#39;.
-            Defaults to `umxhq`.
-        targets (list of str or None): list of target names. When loading a
-            pre-trained model, `targets` can be None, in which case all
-            targets are loaded
-        niter (int): Number of EM steps for refining initial estimates
-            in a post-processing stage. `--niter 0` skips this step altogether
-            (and thus makes separation significantly faster). More iterations
-            can get better interference reduction at the price of artifacts.
-            Defaults to `1`.
-        residual (bool): Computes a residual target, for custom separation
-            scenarios when not all targets are available (at the expense
-            of slightly less performance), e.g. vocal/accompaniment.
-            Defaults to `False`.
-        wiener_win_len (int): The size of the excerpts (number of frames) on
-            which to apply filtering independently. This means assuming
-            time varying stereo models and localization of sources.
-            None means not batching but using the whole signal. It comes at the
-            price of a much larger memory usage.
-            Defaults to `300`
-        device (str): torch device, defaults to `cpu`
-        pretrained (bool): determines if loading pre-trained weights
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    model_path = Path(model_str_or_path).expanduser()
-    # when path exists, we assume it&#39;s a custom model saved locally
-    if model_path.exists():
-        if targets is None:
-            raise UserWarning(&#34;For custom models, please specify the targets&#34;)
-        target_models = load_target_models(
-            targets=targets, model_str_or_path=model_path, pretrained=pretrained
-        )
-        with open(Path(model_path, &#34;separator.json&#34;), &#34;r&#34;) as stream:
-            enc_conf = json.load(stream)
-        separator = model.Separator(
-            target_models=target_models,
-            niter=niter,
-            residual=residual,
-            wiener_win_len=wiener_win_len,
-            sample_rate=enc_conf[&#34;sample_rate&#34;],
-            n_fft=enc_conf[&#34;nfft&#34;],
-            n_hop=enc_conf[&#34;nhop&#34;],
-            nb_channels=enc_conf[&#34;nb_channels&#34;],
-            filterbank=filterbank,
-        ).to(device)
-    # otherwise we load the separator from torchhub
-    else:
-        hub_loader = getattr(openunmix, model_str_or_path)
-        separator = hub_loader(
-            targets=targets,
-            device=device,
-            pretrained=pretrained,
-            niter=niter,
-            residual=residual,
-            filterbank=filterbank,
-        )
-    return separator
-def preprocess(
-    audio: torch.Tensor,
-    rate: Optional[float] = None,
-    model_rate: Optional[float] = None,
-) -&gt; torch.Tensor:
-    &#34;&#34;&#34;
-    From an input tensor, convert it to a tensor of shape
-    shape=(nb_samples, nb_channels, nb_timesteps). This includes:
-    -  if input is 1D, adding the samples and channels dimensions.
-    -  if input is 2D
-        o and the smallest dimension is 1 or 2, adding the samples one.
-        o and all dimensions are &gt; 2, assuming the smallest is the samples
-          one, and adding the channel one
-    - at the end, if the number of channels is greater than the number
-      of time steps, swap those two.
-    - resampling to target rate if necessary
-    Args:
-        audio (Tensor): input waveform
-        rate (float): sample rate for the audio
-        model_rate (float): sample rate for the model
-    Returns:
-        Tensor: [shape=(nb_samples, nb_channels=2, nb_timesteps)]
-    &#34;&#34;&#34;
-    shape = torch.as_tensor(audio.shape, device=audio.device)
-    if len(shape) == 1:
-        # assuming only time dimension is provided.
-        audio = audio[None, None, ...]
-    elif len(shape) == 2:
-        if shape.min() &lt;= 2:
-            # assuming sample dimension is missing
-            audio = audio[None, ...]
-        else:
-            # assuming channel dimension is missing
-            audio = audio[:, None, ...]
-    if audio.shape[1] &gt; audio.shape[2]:
-        # swapping channel and time
-        audio = audio.transpose(1, 2)
-    if audio.shape[1] &gt; 2:
-        warnings.warn(&#34;Channel count &gt; 2! Only the first two channels will be processed!&#34;)
-        # keep the first two channels (dim 1), not the first two time steps
-        audio = audio[:, :2, ...]
-    if audio.shape[1] == 1:
-        # if we have mono, we duplicate it to get stereo
-        audio = torch.repeat_interleave(audio, 2, dim=1)
-    if rate != model_rate:
-        print(&#34;resampling&#34;)
-        # we have to resample to model samplerate if needed
-        # this makes sure we resample input only once
-        resampler = torchaudio.transforms.Resample(
-            orig_freq=rate, new_freq=model_rate, resampling_method=&#34;sinc_interpolation&#34;
-        ).to(audio.device)
-        audio = resampler(audio)
-    return audio</code></pre>
-<h2 class="section-title" id="header-functions">Functions</h2>
-<dt id="openunmix.utils.bandwidth_to_max_bin"><code class="name flex">
-<span>def <span class="ident">bandwidth_to_max_bin</span></span>(<span>rate: float, n_fft: int, bandwidth: float) ‑> numpy.ndarray</span>
-<div class="desc"><p>Convert bandwidth to maximum bin count</p>
-<p>Assuming lapped transforms such as STFT</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>rate</code></strong> :&ensp;<code>int</code></dt>
-<dd>Sample rate</dd>
-<dt><strong><code>n_fft</code></strong> :&ensp;<code>int</code></dt>
-<dd>FFT length</dd>
-<dt><strong><code>bandwidth</code></strong> :&ensp;<code>float</code></dt>
-<dd>Target bandwidth in Hz</dd>
-<h2 id="returns">Returns</h2>
-<dd>maximum frequency bin</dd>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L17-L32" class="git-link">Browse git</a>
-<pre><code class="python">def bandwidth_to_max_bin(rate: float, n_fft: int, bandwidth: float) -&gt; np.ndarray:
-    &#34;&#34;&#34;Convert bandwidth to maximum bin count
-    Assuming lapped transforms such as STFT
-    Args:
-        rate (int): Sample rate
-        n_fft (int): FFT length
-        bandwidth (float): Target bandwidth in Hz
-    Returns:
-        np.ndarray: maximum frequency bin
-    &#34;&#34;&#34;
-    freqs = np.linspace(0, rate / 2, n_fft // 2 + 1, endpoint=True)
-    return np.max(np.where(freqs &lt;= bandwidth)[0]) + 1</code></pre>
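To make the bin arithmetic above concrete, here is a minimal pure-Python sketch of the same computation. It assumes an even `n_fft`, so that bin `k` of the one-sided spectrum sits at `k * rate / n_fft` Hz (the same grid the `np.linspace` call produces). The variable names `hq_bins` and `se_bins` are illustrative only, and the inputs are the values passed by `umxhq_spec` and `umxse_spec`.

```python
def bandwidth_to_max_bin(rate: float, n_fft: int, bandwidth: float) -> int:
    """Pure-Python sketch of the NumPy version (assumes an even n_fft)."""
    n_bins = n_fft // 2 + 1
    # frequency of bin k on a one-sided spectrum: k * rate / n_fft
    freqs = [k * rate / n_fft for k in range(n_bins)]
    # highest bin index at or below the bandwidth, plus one to get a bin count
    return max(k for k, f in enumerate(freqs) if f <= bandwidth) + 1


# 44.1 kHz audio, 4096-point FFT, 16 kHz bandwidth
hq_bins = bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)
# 16 kHz audio, 1024-point FFT: the bandwidth exceeds Nyquist, so all bins are kept
se_bins = bandwidth_to_max_bin(rate=16000.0, n_fft=1024, bandwidth=16000)
print(hq_bins, se_bins)
```

With a bin spacing of about 10.77 Hz, bin 1486 is the last one below 16 kHz in the first case, which is why the count comes out one higher than that index.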
-<dt id="openunmix.utils.load_separator"><code class="name flex">
-<span>def <span class="ident">load_separator</span></span>(<span>model_str_or_path: str = 'umxhq', targets: Union[list, NoneType] = None, niter: int = 1, residual: bool = False, wiener_win_len: Union[int, NoneType] = 300, device: Union[str, torch.device] = 'cpu', pretrained: bool = True, filterbank: str = 'torch')</span>
-<div class="desc"><p>Separator loader</p>
-<h2 id="args">Args</h2>
-<dt><strong><code>model_str_or_path</code></strong> :&ensp;<code>str</code></dt>
-<dd>Model name or path to model <em>parent</em> directory
-E.g. The following files are assumed to be present when
-loading <code>model_str_or_path='mymodel', targets=['vocals']</code>:
-'mymodel/separator.json', 'mymodel/vocals.pth', 'mymodel/vocals.json'.
-Defaults to <code>umxhq</code>.</dd>
-<dt><strong><code>targets</code></strong> :&ensp;<code>list</code> of <code>str</code> or <code>None</code></dt>
-<dd>list of target names. When loading a
-pre-trained model, all <code>targets</code> can be None as all targets
-will be loaded</dd>
-<dt><strong><code>niter</code></strong> :&ensp;<code>int</code></dt>
-<dd>Number of EM steps for refining initial estimates
-in a post-processing stage. <code>niter=0</code> skips this step altogether
-(and thus makes separation significantly faster). More iterations
-can get better interference reduction at the price of artifacts.
-Defaults to <code>1</code>.</dd>
-<dt><strong><code>residual</code></strong> :&ensp;<code>bool</code></dt>
-<dd>Computes a residual target, for custom separation
-scenarios when not all targets are available (at the expense
-of slightly less performance). E.g vocal/accompaniment
-Defaults to <code>False</code>.</dd>
-<dt><strong><code>wiener_win_len</code></strong> :&ensp;<code>int</code></dt>
-<dd>The size of the excerpts (number of frames) on
-which to apply filtering independently. This means assuming
-time varying stereo models and localization of sources.
-None means not batching but using the whole signal. It comes at the
-price of a much larger memory usage.
-Defaults to <code>300</code></dd>
-<dt><strong><code>device</code></strong> :&ensp;<code>str</code></dt>
-<dd>torch device, defaults to <code>cpu</code></dd>
-<dt><strong><code>pretrained</code></strong> :&ensp;<code>bool</code></dt>
-<dd>determines if loading pre-trained weights</dd>
-<dt><strong><code>filterbank</code></strong> :&ensp;<code>str</code></dt>
-<dd>filterbank implementation method.
-Supported are <code>['torch', 'asteroid']</code>. <code>torch</code> is about 30% faster
-compared to <code>asteroid</code> on large FFT sizes such as 4096. However,
-asteroid's STFT can be exported to ONNX, which makes it practical
-for deployment.</dd>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L164-L246" class="git-link">Browse git</a>
-<pre><code class="python">def load_separator(
-    model_str_or_path: str = &#34;umxhq&#34;,
-    targets: Optional[list] = None,
-    niter: int = 1,
-    residual: bool = False,
-    wiener_win_len: Optional[int] = 300,
-    device: Union[str, torch.device] = &#34;cpu&#34;,
-    pretrained: bool = True,
-    filterbank: str = &#34;torch&#34;,
-):
-    &#34;&#34;&#34;Separator loader
-    Args:
-        model_str_or_path (str): Model name or path to model _parent_ directory
-            E.g. The following files are assumed to be present when
-            loading `model_str_or_path=&#39;mymodel&#39;, targets=[&#39;vocals&#39;]`:
-            &#39;mymodel/separator.json&#39;, &#39;mymodel/vocals.pth&#39;, &#39;mymodel/vocals.json&#39;.
-            Defaults to `umxhq`.
-        targets (list of str or None): list of target names. When loading a
-            pre-trained model, all `targets` can be None as all targets
-            will be loaded
-        niter (int): Number of EM steps for refining initial estimates
-            in a post-processing stage. `niter=0` skips this step altogether
-            (and thus makes separation significantly faster). More iterations
-            can get better interference reduction at the price of artifacts.
-            Defaults to `1`.
-        residual (bool): Computes a residual target, for custom separation
-            scenarios when not all targets are available (at the expense
-            of slightly less performance). E.g vocal/accompaniment
-            Defaults to `False`.
-        wiener_win_len (int): The size of the excerpts (number of frames) on
-            which to apply filtering independently. This means assuming
-            time varying stereo models and localization of sources.
-            None means not batching but using the whole signal. It comes at the
-            price of a much larger memory usage.
-            Defaults to `300`
-        device (str): torch device, defaults to `cpu`
-        pretrained (bool): determines if loading pre-trained weights
-        filterbank (str): filterbank implementation method.
-            Supported are `[&#39;torch&#39;, &#39;asteroid&#39;]`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid&#39;s STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    &#34;&#34;&#34;
-    model_path = Path(model_str_or_path).expanduser()
-    # when path exists, we assume it&#39;s a custom model saved locally
-    if model_path.exists():
-        if targets is None:
-            raise UserWarning(&#34;For custom models, please specify the targets&#34;)
-        target_models = load_target_models(
-            targets=targets, model_str_or_path=model_path, pretrained=pretrained
-        )
-        with open(Path(model_path, &#34;separator.json&#34;), &#34;r&#34;) as stream:
-            enc_conf = json.load(stream)
-        separator = model.Separator(
-            target_models=target_models,
-            niter=niter,
-            residual=residual,
-            wiener_win_len=wiener_win_len,
-            sample_rate=enc_conf[&#34;sample_rate&#34;],
-            n_fft=enc_conf[&#34;nfft&#34;],
-            n_hop=enc_conf[&#34;nhop&#34;],
-            nb_channels=enc_conf[&#34;nb_channels&#34;],
-            filterbank=filterbank,
-        ).to(device)
-    # otherwise we load the separator from torchhub
-    else:
-        hub_loader = getattr(openunmix, model_str_or_path)
-        separator = hub_loader(
-            targets=targets,
-            device=device,
-            pretrained=pretrained,
-            niter=niter,
-            residual=residual,
-            filterbank=filterbank,
-        )
-    return separator</code></pre>
-<dt id="openunmix.utils.load_target_models"><code class="name flex">
-<span>def <span class="ident">load_target_models</span></span>(<span>targets, model_str_or_path='umxhq', device='cpu', pretrained=True)</span>
-<div class="desc"><p>Core model loader</p>
-<p>target model path can be either <code>&lt;target&gt;.pth</code>, or <code>&lt;target&gt;-sha256.pth</code>
-(as used on torchhub)</p>
-<p>The loader either loads the models from a known model string
-as registered in the <code>__init__.py</code> or loads from custom configs.</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L115-L161" class="git-link">Browse git</a>
-<pre><code class="python">def load_target_models(targets, model_str_or_path=&#34;umxhq&#34;, device=&#34;cpu&#34;, pretrained=True):
-    &#34;&#34;&#34;Core model loader
-    target model path can be either &lt;target&gt;.pth, or &lt;target&gt;-sha256.pth
-    (as used on torchhub)
-    The loader either loads the models from a known model string
-    as registered in the __init__.py or loads from custom configs.
-    &#34;&#34;&#34;
-    if isinstance(targets, str):
-        targets = [targets]
-    model_path = Path(model_str_or_path).expanduser()
-    if not model_path.exists():
-        # model path does not exist, use pretrained models
-        try:
-            # disable progress bar
-            hub_loader = getattr(openunmix, model_str_or_path + &#34;_spec&#34;)
-            err = io.StringIO()
-            with redirect_stderr(err):
-                return hub_loader(targets=targets, device=device, pretrained=pretrained)
-            print(err.getvalue())
-        except AttributeError:
-            raise NameError(&#34;Model does not exist on torchhub&#34;)
-            # assume model is a path to a local model_str_or_path directory
-    else:
-        models = {}
-        for target in targets:
-            # load model from disk
-            with open(Path(model_path, target + &#34;.json&#34;), &#34;r&#34;) as stream:
-                results = json.load(stream)
-            target_model_path = next(Path(model_path).glob(&#34;%s*.pth&#34; % target))
-            state = torch.load(target_model_path, map_location=device)
-            models[target] = model.OpenUnmix(
-                nb_bins=results[&#34;args&#34;][&#34;nfft&#34;] // 2 + 1,
-                nb_channels=results[&#34;args&#34;][&#34;nb_channels&#34;],
-                hidden_size=results[&#34;args&#34;][&#34;hidden_size&#34;],
-                max_bin=state[&#34;input_mean&#34;].shape[0],
-            )
-            if pretrained:
-                models[target].load_state_dict(state, strict=False)
-            models[target].to(device)
-        return models</code></pre>
-<dt id="openunmix.utils.preprocess"><code class="name flex">
-<span>def <span class="ident">preprocess</span></span>(<span>audio: torch.Tensor, rate: Union[float, NoneType] = None, model_rate: Union[float, NoneType] = None) ‑> torch.Tensor</span>
-<div class="desc"><p>From an input tensor, convert it to a tensor of shape
-shape=(nb_samples, nb_channels, nb_timesteps). This includes:</p>
-<ul>
-<li>if input is 1D, adding the samples and channels dimensions.</li>
-<li>if input is 2D
-<ul>
-<li>and the smallest dimension is 1 or 2, adding the samples one.</li>
-<li>and all dimensions are &gt; 2, assuming the smallest is the samples
-one, and adding the channel one</li>
-</ul></li>
-<li>at the end, if the number of channels is greater than the number
-of time steps, swap those two.</li>
-<li>resampling to target rate if necessary</li>
-</ul>
-<h2 id="args">Args</h2>
-<dt><strong><code>audio</code></strong> :&ensp;<code>Tensor</code></dt>
-<dd>input waveform</dd>
-<dt><strong><code>rate</code></strong> :&ensp;<code>float</code></dt>
-<dd>sample rate for the audio</dd>
-<dt><strong><code>model_rate</code></strong> :&ensp;<code>float</code></dt>
-<dd>sample rate for the model</dd>
-<h2 id="returns">Returns</h2>
-<dd>[shape=(nb_samples, nb_channels=2, nb_timesteps)]</dd>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L249-L305" class="git-link">Browse git</a>
-<pre><code class="python">def preprocess(
-    audio: torch.Tensor,
-    rate: Optional[float] = None,
-    model_rate: Optional[float] = None,
-) -&gt; torch.Tensor:
-    &#34;&#34;&#34;
-    From an input tensor, convert it to a tensor of shape
-    shape=(nb_samples, nb_channels, nb_timesteps). This includes:
-    -  if input is 1D, adding the samples and channels dimensions.
-    -  if input is 2D
-        o and the smallest dimension is 1 or 2, adding the samples one.
-        o and all dimensions are &gt; 2, assuming the smallest is the samples
-          one, and adding the channel one
-    - at the end, if the number of channels is greater than the number
-      of time steps, swap those two.
-    - resampling to target rate if necessary
-    Args:
-        audio (Tensor): input waveform
-        rate (float): sample rate for the audio
-        model_rate (float): sample rate for the model
-    Returns:
-        Tensor: [shape=(nb_samples, nb_channels=2, nb_timesteps)]
-    &#34;&#34;&#34;
-    shape = torch.as_tensor(audio.shape, device=audio.device)
-    if len(shape) == 1:
-        # assuming only time dimension is provided.
-        audio = audio[None, None, ...]
-    elif len(shape) == 2:
-        if shape.min() &lt;= 2:
-            # assuming sample dimension is missing
-            audio = audio[None, ...]
-        else:
-            # assuming channel dimension is missing
-            audio = audio[:, None, ...]
-    if audio.shape[1] &gt; audio.shape[2]:
-        # swapping channel and time
-        audio = audio.transpose(1, 2)
-    if audio.shape[1] &gt; 2:
-        warnings.warn(&#34;Channel count &gt; 2! Only the first two channels will be processed!&#34;)
-        # keep the first two channels (dim 1), not the first two time steps
-        audio = audio[:, :2, ...]
-    if audio.shape[1] == 1:
-        # if we have mono, we duplicate it to get stereo
-        audio = torch.repeat_interleave(audio, 2, dim=1)
-    if rate != model_rate:
-        print(&#34;resampling&#34;)
-        # we have to resample to model samplerate if needed
-        # this makes sure we resample input only once
-        resampler = torchaudio.transforms.Resample(
-            orig_freq=rate, new_freq=model_rate, resampling_method=&#34;sinc_interpolation&#34;
-        ).to(audio.device)
-        audio = resampler(audio)
-    return audio</code></pre>
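The shape rules in the `preprocess` docstring can be followed without any audio at all. The sketch below tracks only the shape tuple through the same branches, ending with the documented two-channel result; `preprocessed_shape` is a hypothetical helper written for illustration, not part of `openunmix`, and it ignores resampling (which changes the number of time steps). It assumes a 1D, 2D or 3D input.

```python
from typing import Tuple


def preprocessed_shape(shape: Tuple[int, ...]) -> Tuple[int, int, int]:
    """Hypothetical helper: the output shape the docstring describes."""
    if len(shape) == 1:
        # only time given: add samples and channels dimensions
        shape = (1, 1) + tuple(shape)
    elif len(shape) == 2:
        if min(shape) <= 2:
            # (channels, time) or (time, channels): add the samples dimension
            shape = (1,) + tuple(shape)
        else:
            # (samples, time): add the channel dimension
            shape = (shape[0], 1, shape[1])
    samples, channels, steps = shape
    if channels > steps:
        # channels and time were given in the opposite order
        channels, steps = steps, channels
    if channels > 2:
        # only the first two channels are kept
        channels = 2
    if channels == 1:
        # mono is duplicated to stereo
        channels = 2
    return (samples, channels, steps)


for given in [(44100,), (2, 44100), (44100, 2), (4, 44100), (1, 6, 44100)]:
    print(given, "->", preprocessed_shape(given))
```

Note the ambiguity the heuristic accepts: a 2D input is read as a batch of mono signals only when both dimensions exceed 2, so a batch of one or two mono clips would be interpreted as channels instead.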
-<dt id="openunmix.utils.save_checkpoint"><code class="name flex">
-<span>def <span class="ident">save_checkpoint</span></span>(<span>state: dict, is_best: bool, path: str, target: str)</span>
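-<div class="desc"><p>Save a training checkpoint</p>
-<p>The full state (including the optimizer) is always written to <code>&lt;target&gt;.chkpnt</code>; when <code>is_best</code> is set, the model weights alone are also written to <code>&lt;target&gt;.pth</code>.</p>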
-<h2 id="args">Args</h2>
-<dt><strong><code>state</code></strong> :&ensp;<code>dict</code></dt>
-<dd>torch model state dict</dd>
-<dt><strong><code>is_best</code></strong> :&ensp;<code>bool</code></dt>
-<dd>if current model is about to be saved as best model</dd>
-<dt><strong><code>path</code></strong> :&ensp;<code>str</code></dt>
-<dd>model path</dd>
-<dt><strong><code>target</code></strong> :&ensp;<code>str</code></dt>
-<dd>target name</dd>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L35-L50" class="git-link">Browse git</a>
-<pre><code class="python">def save_checkpoint(state: dict, is_best: bool, path: str, target: str):
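-    &#34;&#34;&#34;Save a training checkpoint
-    The full state (including the optimizer) is always written to
-    `&lt;target&gt;.chkpnt`; when `is_best` is set, the model weights alone
-    are also written to `&lt;target&gt;.pth`.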
-    Args:
-        state (dict): torch model state dict
-        is_best (bool): if current model is about to be saved as best model
-        path (str): model path
-        target (str): target name
-    &#34;&#34;&#34;
-    # save full checkpoint including optimizer
-    torch.save(state, os.path.join(path, target + &#34;.chkpnt&#34;))
-    if is_best:
-        # save just the weights
-        torch.save(state[&#34;state_dict&#34;], os.path.join(path, target + &#34;.pth&#34;))</code></pre>
-<h2 class="section-title" id="header-classes">Classes</h2>
-<dt id="openunmix.utils.AverageMeter"><code class="flex name class">
-<span>class <span class="ident">AverageMeter</span></span>
-<div class="desc"><p>Computes and stores the average and current value</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L53-L69" class="git-link">Browse git</a>
-<pre><code class="python">class AverageMeter(object):
-    &#34;&#34;&#34;Computes and stores the average and current value&#34;&#34;&#34;
-    def __init__(self):
-        self.reset()
-    def reset(self):
-        self.val = 0
-        self.avg = 0
-        self.sum = 0
-        self.count = 0
-    def update(self, val, n=1):
-        self.val = val
-        self.sum += val * n
-        self.count += n
-        self.avg = self.sum / self.count</code></pre>
-<dt id="openunmix.utils.AverageMeter.reset"><code class="name flex">
-<span>def <span class="ident">reset</span></span>(<span>self)</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L59-L63" class="git-link">Browse git</a>
-<pre><code class="python">def reset(self):
-    self.val = 0
-    self.avg = 0
-    self.sum = 0
-    self.count = 0</code></pre>
-<dt id="openunmix.utils.AverageMeter.update"><code class="name flex">
-<span>def <span class="ident">update</span></span>(<span>self, val, n=1)</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L65-L69" class="git-link">Browse git</a>
-<pre><code class="python">def update(self, val, n=1):
-    self.val = val
-    self.sum += val * n
-    self.count += n
-    self.avg = self.sum / self.count</code></pre>
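As a usage illustration, the sketch below repeats the `AverageMeter` bookkeeping shown above and feeds it per-batch losses weighted by batch size, which is what the `n` argument is for. The loss values and batch sizes are made up for the example.

```python
class AverageMeter:
    """Same bookkeeping as the class documented above."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val           # most recent value
        self.sum += val * n      # weighted running sum
        self.count += n          # total weight seen so far
        self.avg = self.sum / self.count


losses = AverageMeter()
# hypothetical (mean batch loss, batch size) pairs; the last batch is smaller
for batch_loss, batch_size in [(0.5, 16), (0.25, 16), (1.0, 8)]:
    losses.update(batch_loss, n=batch_size)
print(losses.val, losses.avg)
```

Passing the batch size as `n` keeps the epoch average correct when the final batch is smaller than the others; with the default `n=1` every batch would count equally.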
-<dt id="openunmix.utils.EarlyStopping"><code class="flex name class">
-<span>class <span class="ident">EarlyStopping</span></span>
-<span>(</span><span>mode='min', min_delta=0, patience=10)</span>
-<div class="desc"><p>Early Stopping Monitor</p></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L72-L112" class="git-link">Browse git</a>
-<pre><code class="python">class EarlyStopping(object):
-    &#34;&#34;&#34;Early Stopping Monitor&#34;&#34;&#34;
-    def __init__(self, mode=&#34;min&#34;, min_delta=0, patience=10):
-        self.mode = mode
-        self.min_delta = min_delta
-        self.patience = patience
-        self.best = None
-        self.num_bad_epochs = 0
-        self.is_better = None
-        self._init_is_better(mode, min_delta)
-        if patience == 0:
-            self.is_better = lambda a, b: True
-    def step(self, metrics):
-        if self.best is None:
-            self.best = metrics
-            return False
-        if np.isnan(metrics):
-            return True
-        if self.is_better(metrics, self.best):
-            self.num_bad_epochs = 0
-            self.best = metrics
-        else:
-            self.num_bad_epochs += 1
-        if self.num_bad_epochs &gt;= self.patience:
-            return True
-        return False
-    def _init_is_better(self, mode, min_delta):
-        if mode not in {&#34;min&#34;, &#34;max&#34;}:
-            raise ValueError(&#34;mode &#34; + mode + &#34; is unknown!&#34;)
-        if mode == &#34;min&#34;:
-            self.is_better = lambda a, best: a &lt; best - min_delta
-        if mode == &#34;max&#34;:
-            self.is_better = lambda a, best: a &gt; best + min_delta</code></pre>
-<dt id="openunmix.utils.EarlyStopping.step"><code class="name flex">
-<span>def <span class="ident">step</span></span>(<span>self, metrics)</span>
-<div class="desc"></div>
-<details class="source">
-<span>Expand source code</span>
-<a href="https://github.com/sigsep/open-unmix-pytorch/blob/b436d5f7d40c2b8ff0b2500e9d953fa47929b261/openunmix/utils.py#L87-L104" class="git-link">Browse git</a>
-<pre><code class="python">def step(self, metrics):
-    if self.best is None:
-        self.best = metrics
-        return False
-    if np.isnan(metrics):
-        return True
-    if self.is_better(metrics, self.best):
-        self.num_bad_epochs = 0
-        self.best = metrics
-    else:
-        self.num_bad_epochs += 1
-    if self.num_bad_epochs &gt;= self.patience:
-        return True
-    return False</code></pre>
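The following sketch shows how the monitor behaves over a short run. It mirrors the logic of `EarlyStopping` above, with `math.isnan` standing in for `np.isnan` so that it runs without NumPy; the validation losses are invented for the example.

```python
import math


class EarlyStopping:
    """Mirror of the monitor above; math.isnan replaces np.isnan (assumption)."""

    def __init__(self, mode="min", min_delta=0, patience=10):
        if mode not in {"min", "max"}:
            raise ValueError("mode " + mode + " is unknown!")
        self.patience = patience
        self.best = None
        self.num_bad_epochs = 0
        if patience == 0:
            self.is_better = lambda a, best: True
        elif mode == "min":
            self.is_better = lambda a, best: a < best - min_delta
        else:
            self.is_better = lambda a, best: a > best + min_delta

    def step(self, metrics):
        if self.best is None:
            # first call only records the baseline
            self.best = metrics
            return False
        if math.isnan(metrics):
            return True
        if self.is_better(metrics, self.best):
            self.num_bad_epochs = 0
            self.best = metrics
        else:
            self.num_bad_epochs += 1
        return self.num_bad_epochs >= self.patience


# hypothetical validation losses: the best value (0.8) is reached at epoch 1
val_losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
es = EarlyStopping(mode="min", patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if es.step(loss):
        stopped_at = epoch
        break
print(stopped_at, es.best)
```

Epochs 2, 3 and 4 all fail to improve on 0.8, so with `patience=3` the monitor signals a stop at epoch 4 and the later, better value 0.7 is never evaluated.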
-<nav id="sidebar">
-<div class="toc">
-<ul id="index">
-<li><code><a title="openunmix" href="index.html">openunmix</a></code></li>
-<li><h3><a href="#header-functions">Functions</a></h3>
-<ul class="">
-<li><code><a title="openunmix.utils.bandwidth_to_max_bin" href="#openunmix.utils.bandwidth_to_max_bin">bandwidth_to_max_bin</a></code></li>
-<li><code><a title="openunmix.utils.load_separator" href="#openunmix.utils.load_separator">load_separator</a></code></li>
-<li><code><a title="openunmix.utils.load_target_models" href="#openunmix.utils.load_target_models">load_target_models</a></code></li>
-<li><code><a title="openunmix.utils.preprocess" href="#openunmix.utils.preprocess">preprocess</a></code></li>
-<li><code><a title="openunmix.utils.save_checkpoint" href="#openunmix.utils.save_checkpoint">save_checkpoint</a></code></li>
-<li><h3><a href="#header-classes">Classes</a></h3>
-<h4><code><a title="openunmix.utils.AverageMeter" href="#openunmix.utils.AverageMeter">AverageMeter</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.utils.AverageMeter.reset" href="#openunmix.utils.AverageMeter.reset">reset</a></code></li>
-<li><code><a title="openunmix.utils.AverageMeter.update" href="#openunmix.utils.AverageMeter.update">update</a></code></li>
-<h4><code><a title="openunmix.utils.EarlyStopping" href="#openunmix.utils.EarlyStopping">EarlyStopping</a></code></h4>
-<ul class="">
-<li><code><a title="openunmix.utils.EarlyStopping.step" href="#openunmix.utils.EarlyStopping.step">step</a></code></li>
-<footer id="footer">
-<p>Generated by <a href="https://pdoc3.github.io/pdoc"><cite>pdoc</cite> 0.9.2</a>.</p>
\ No newline at end of file
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/hubconf.py b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/hubconf.py
deleted file mode 100644
index 669017fd2bf02d7041ad23b431d0ccc60e43076e..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/hubconf.py
+++ /dev/null
@@ -1,19 +0,0 @@
-# This file is to be parsed by torch.hub mechanics
-# `xxx_spec` take spectrogram inputs and output separated spectrograms
-# `xxx`      take waveform inputs and output separated waveforms
-# Optional list of dependencies required by the package
-dependencies = ['torch', 'numpy']
-from openunmix import umxse_spec
-from openunmix import umxse
-from openunmix import umxhq_spec
-from openunmix import umxhq
-from openunmix import umx_spec
-from openunmix import umx
-from openunmix import umxl_spec
-from openunmix import umxl
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/__init__.py b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/__init__.py
deleted file mode 100644
index dc3fbb8a281bed5d68819549a4216811cd93cb00..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/__init__.py
+++ /dev/null
@@ -1,346 +0,0 @@
-"""
-![sigsep logo](https://sigsep.github.io/hero.png)
-Open-Unmix is a deep neural network reference implementation for music source separation, applicable for researchers, audio engineers and artists. Open-Unmix provides ready-to-use models that allow users to separate pop music into four stems: vocals, drums, bass and the remaining other instruments. The models were pre-trained on the MUSDB18 dataset. See details at apply pre-trained model.
-This is the python package API documentation.
-Please check out [the open-unmix website](https://sigsep.github.io/open-unmix) for more information.
-"""
-from openunmix import utils
-import torch.hub
-def umxse_spec(targets=None, device="cpu", pretrained=True):
-    target_urls = {
-        "speech": "https://zenodo.org/api/files/765b45a3-c70d-48a6-936b-09a7989c349a/speech_f5e0d9f9.pth",
-        "noise": "https://zenodo.org/api/files/765b45a3-c70d-48a6-936b-09a7989c349a/noise_04a6fc2d.pth",
-    }
-    from .model import OpenUnmix
-    if targets is None:
-        targets = ["speech", "noise"]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=16000.0, n_fft=1024, bandwidth=16000)
-    # load open unmix speech enhancement models
-    target_models = {}
-    for target in targets:
-        target_unmix = OpenUnmix(
-            nb_bins=1024 // 2 + 1, nb_channels=1, hidden_size=256, max_bin=max_bin
-        )
-        # enable centering of stft to minimize reconstruction error
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models
-def umxse(
-    targets=None,
-    residual=False,
-    niter=1,
-    device="cpu",
-    pretrained=True,
-    filterbank="torch",
-):
-    """
-    Open Unmix Speech Enhancement 1-channel BiLSTM Model
-    trained on the 28-speaker version of Voicebank+Demand
-    (Sampling rate: 16kHz)
-    Args:
-        targets (str): select the targets for the source to be separated.
-                a list including: ['speech', 'noise'].
-                If you don't pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on Voicebank+Demand
-        residual (bool): if True, a "garbage" target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `['torch', 'asteroid']`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid's STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    Reference:
-        Uhlich, Stefan, & Mitsufuji, Yuki. (2020).
-        Open-Unmix for Speech Enhancement (UMX SE).
-        Zenodo. http://doi.org/10.5281/zenodo.3786908
-    """
-    from .model import Separator
-    target_models = umxse_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=1024,
-        n_hop=512,
-        nb_channels=1,
-        sample_rate=16000.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator
-def umxhq_spec(targets=None, device="cpu", pretrained=True):
-    from .model import OpenUnmix
-    # set urls for weights
-    target_urls = {
-        "bass": "https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/bass-8d85a5bd.pth",
-        "drums": "https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/drums-9619578f.pth",
-        "other": "https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/other-b52fbbf7.pth",
-        "vocals": "https://zenodo.org/api/files/1c8f83c5-33a5-4f59-b109-721fdd234875/vocals-b62c91ce.pth",
-    }
-    if targets is None:
-        targets = ["vocals", "drums", "bass", "other"]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)
-    target_models = {}
-    for target in targets:
-        # load open unmix model
-        target_unmix = OpenUnmix(
-            nb_bins=4096 // 2 + 1, nb_channels=2, hidden_size=512, max_bin=max_bin
-        )
-        # load the pretrained weights when requested
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models
-def umxhq(
-    targets=None,
-    residual=False,
-    niter=1,
-    device="cpu",
-    pretrained=True,
-    filterbank="torch",
-):
-    """
-    Open Unmix 2-channel/stereo BiLSTM Model trained on MUSDB18-HQ
-    Args:
-        targets (list of str): the targets to be separated, given as
-                a list drawn from: ['vocals', 'drums', 'bass', 'other'].
-                If you don't pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on MUSDB18-HQ
-        residual (bool): if True, a "garbage" target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `['torch', 'asteroid']`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid's STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    """
-    from .model import Separator
-    target_models = umxhq_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=4096,
-        n_hop=1024,
-        nb_channels=2,
-        sample_rate=44100.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator
-def umx_spec(targets=None, device="cpu", pretrained=True):
-    from .model import OpenUnmix
-    # set urls for weights
-    target_urls = {
-        "bass": "https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/bass-646024d3.pth",
-        "drums": "https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/drums-5a48008b.pth",
-        "other": "https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/other-f8e132cc.pth",
-        "vocals": "https://zenodo.org/api/files/d6105b95-8c52-430c-84ce-bd14b803faaf/vocals-c8df74a5.pth",
-    }
-    if targets is None:
-        targets = ["vocals", "drums", "bass", "other"]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)
-    target_models = {}
-    for target in targets:
-        # load open unmix model
-        target_unmix = OpenUnmix(
-            nb_bins=4096 // 2 + 1, nb_channels=2, hidden_size=512, max_bin=max_bin
-        )
-        # load the pretrained weights when requested
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models
-def umx(
-    targets=None,
-    residual=False,
-    niter=1,
-    device="cpu",
-    pretrained=True,
-    filterbank="torch",
-):
-    """
-    Open Unmix 2-channel/stereo BiLSTM Model trained on MUSDB18
-    Args:
-        targets (list of str): the targets to be separated, given as
-                a list drawn from: ['vocals', 'drums', 'bass', 'other'].
-                If you don't pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on MUSDB18
-        residual (bool): if True, a "garbage" target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `['torch', 'asteroid']`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid's STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    """
-    from .model import Separator
-    target_models = umx_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=4096,
-        n_hop=1024,
-        nb_channels=2,
-        sample_rate=44100.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator
-def umxl_spec(targets=None, device="cpu", pretrained=True):
-    from .model import OpenUnmix
-    # set urls for weights
-    target_urls = {
-        "bass": "https://zenodo.org/api/files/f8209c3e-ba60-48cf-8e79-71ae65beca61/bass-2ca1ce51.pth",
-        "drums": "https://zenodo.org/api/files/f8209c3e-ba60-48cf-8e79-71ae65beca61/drums-69e0ebd4.pth",
-        "other": "https://zenodo.org/api/files/f8209c3e-ba60-48cf-8e79-71ae65beca61/other-c8c5b3e6.pth",
-        "vocals": "https://zenodo.org/api/files/f8209c3e-ba60-48cf-8e79-71ae65beca61/vocals-bccbd9aa.pth",
-    }
-    if targets is None:
-        targets = ["vocals", "drums", "bass", "other"]
-    # determine the maximum bin count for a 16khz bandwidth model
-    max_bin = utils.bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)
-    target_models = {}
-    for target in targets:
-        # load open unmix model
-        target_unmix = OpenUnmix(
-            nb_bins=4096 // 2 + 1, nb_channels=2, hidden_size=1024, max_bin=max_bin
-        )
-        # load the pretrained weights when requested
-        if pretrained:
-            state_dict = torch.hub.load_state_dict_from_url(
-                target_urls[target], map_location=device
-            )
-            target_unmix.load_state_dict(state_dict, strict=False)
-            target_unmix.eval()
-        target_unmix.to(device)
-        target_models[target] = target_unmix
-    return target_models
-def umxl(
-    targets=None,
-    residual=False,
-    niter=1,
-    device="cpu",
-    pretrained=True,
-    filterbank="torch",
-):
-    """
-    Open Unmix Extra (UMX-L), 2-channel/stereo BLSTM Model trained on a private dataset
-    of ~400h of multi-track audio.
-    Args:
-        targets (list of str): the targets to be separated, given as
-                a list drawn from: ['vocals', 'drums', 'bass', 'other'].
-                If you don't pick them all, you probably want to
-                activate the `residual=True` option.
-                Defaults to all available targets per model.
-        pretrained (bool): If True, returns a model pre-trained on the private dataset
-        residual (bool): if True, a "garbage" target is created
-        niter (int): the number of post-processing iterations, defaults to 1
-        device (str): selects device to be used for inference
-        filterbank (str): filterbank implementation method.
-            Supported are `['torch', 'asteroid']`. `torch` is about 30% faster
-            compared to `asteroid` on large FFT sizes such as 4096. However,
-            asteroid's STFT can be exported to ONNX, which makes it practical
-            for deployment.
-    """
-    from .model import Separator
-    target_models = umxl_spec(targets=targets, device=device, pretrained=pretrained)
-    separator = Separator(
-        target_models=target_models,
-        niter=niter,
-        residual=residual,
-        n_fft=4096,
-        n_hop=1024,
-        nb_channels=2,
-        sample_rate=44100.0,
-        filterbank=filterbank,
-    ).to(device)
-    return separator
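Reviewer note on the `max_bin` values above: each `*_spec` function calls `utils.bandwidth_to_max_bin(rate=44100.0, n_fft=4096, bandwidth=16000)`, whose body is not part of this hunk. The sketch below is only a guess at what such a helper computes, namely the number of one-sided STFT bins whose center frequency does not exceed the bandwidth. The name `bandwidth_to_max_bin_sketch` is hypothetical and is not the library's function.

```python
def bandwidth_to_max_bin_sketch(rate: float, n_fft: int, bandwidth: float) -> int:
    """Count one-sided STFT bins whose center frequency is <= bandwidth (Hz).

    Hypothetical stand-in for `utils.bandwidth_to_max_bin`; the real helper
    is not shown in this diff and may differ.
    """
    nb_bins = n_fft // 2 + 1   # one-sided spectrum, as in `nb_bins=4096 // 2 + 1`
    bin_width = rate / n_fft   # spacing between bin center frequencies in Hz
    # bin k is centered at k * bin_width Hz
    return sum(1 for k in range(nb_bins) if k * bin_width <= bandwidth)


# settings used by umx / umxhq / umxl in this file
max_bin = bandwidth_to_max_bin_sketch(rate=44100.0, n_fft=4096, bandwidth=16000)
print(max_bin, "of", 4096 // 2 + 1, "bins are kept")
```

Under this assumption, the 44.1 kHz models would keep only the bins up to 16 kHz and drop the rest of the 2049-bin spectrum.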
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/cli.py b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/cli.py
deleted file mode 100644
index 23d2a37ee3ea7c3ceca469ba6e0da6ce68813250..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/cli.py
+++ /dev/null
@@ -1,205 +0,0 @@
-from pathlib import Path
-import torch
-import torchaudio
-import json
-import numpy as np
-import tqdm
-from openunmix import utils
-from openunmix import predict
-from openunmix import data
-import argparse
-def separate():
-    parser = argparse.ArgumentParser(
-        description="UMX Inference",
-        add_help=True,
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-    )
-    parser.add_argument("input", type=str, nargs="+", help="List of paths to wav/flac files.")
-    parser.add_argument(
-        "--model",
-        default="umxl",
-        type=str,
-        help="path to model base directory of pretrained models, defaults to UMX-L",
-    )
-    parser.add_argument(
-        "--targets",
-        nargs="+",
-        type=str,
-        help="provide targets to be processed. \
-              If none, all available targets will be computed",
-    )
-    parser.add_argument(
-        "--outdir",
-        type=str,
-        help="Output path where the separated audio files are stored",
-    )
-    parser.add_argument(
-        "--ext",
-        type=str,
-        default=".wav",
-        help="Output extension which sets the audio format",
-    )
-    parser.add_argument("--start", type=float, default=0.0, help="Audio chunk start in seconds")
-    parser.add_argument(
-        "--duration",
-        type=float,
-        help="Audio chunk duration in seconds, negative values load full track",
-    )
-    parser.add_argument(
-        "--no-cuda", action="store_true", default=False, help="disables CUDA inference"
-    )
-    parser.add_argument(
-        "--audio-backend",
-        type=str,
-        help="Sets audio backend. Defaults to torchaudio's default backend, see https://pytorch.org/audio/stable/backend.html "
-        "(`sox_io`, `sox`, `soundfile` or `stempeg`)",
-    )
-    parser.add_argument(
-        "--niter",
-        type=int,
-        default=1,
-        help="number of iterations for refining results.",
-    )
-    parser.add_argument(
-        "--wiener-win-len",
-        type=int,
-        default=300,
-        help="Number of frames on which to apply filtering independently",
-    )
-    parser.add_argument(
-        "--residual",
-        type=str,
-        default=None,
-        help="if provided, build a source with given name "
-        "for the mix minus all estimated targets",
-    )
-    parser.add_argument(
-        "--aggregate",
-        type=str,
-        default=None,
-        help="if provided, must be a string containing a valid expression for "
-        "a dictionary, with keys as output target names, and values "
-        "a list of targets that are used to build it. For instance: "
-        '\'{"vocals":["vocals"], "accompaniment":["drums",'
-        '"bass","other"]}\'',
-    )
-    parser.add_argument(
-        "--filterbank",
-        type=str,
-        default="torch",
-        help="filterbank implementation method. "
-        "Supported: `['torch', 'asteroid']`. `torch` is ~30%% faster "
-        "compared to `asteroid` on large FFT sizes such as 4096. However "
-        "asteroid's STFT can be exported to ONNX, which makes it practical "
-        "for deployment.",
-    )
-    parser.add_argument(
-        "--verbose",
-        action="store_true",
-        default=False,
-        help="Enable log messages",
-    )
-    args = parser.parse_args()
-    if args.audio_backend != "stempeg" and args.audio_backend is not None:
-        torchaudio.set_audio_backend(args.audio_backend)
-    use_cuda = not args.no_cuda and torch.cuda.is_available()
-    device = torch.device("cuda" if use_cuda else "cpu")
-    if args.verbose:
-        print("Using ", device)
-    # parsing the output dict
-    aggregate_dict = None if args.aggregate is None else json.loads(args.aggregate)
-    # create separator only once to reduce model loading
-    # when using multiple files
-    separator = utils.load_separator(
-        model_str_or_path=args.model,
-        targets=args.targets,
-        niter=args.niter,
-        residual=args.residual,
-        wiener_win_len=args.wiener_win_len,
-        device=device,
-        pretrained=True,
-        filterbank=args.filterbank,
-    )
-    separator.freeze()
-    separator.to(device)
-    if args.audio_backend == "stempeg":
-        try:
-            import stempeg
-        except ImportError:
-            raise RuntimeError("Please install pip package `stempeg`")
-    # loop over the files
-    for input_file in tqdm.tqdm(args.input):
-        if args.audio_backend == "stempeg":
-            audio, rate = stempeg.read_stems(
-                input_file,
-                start=args.start,
-                duration=args.duration,
-                sample_rate=separator.sample_rate,
-                dtype=np.float32,
-            )
-            audio = torch.tensor(audio)
-        else:
-            audio, rate = data.load_audio(input_file, start=args.start, dur=args.duration)
-        estimates = predict.separate(
-            audio=audio,
-            rate=rate,
-            aggregate_dict=aggregate_dict,
-            separator=separator,
-            device=device,
-        )
-        if not args.outdir:
-            model_path = Path(args.model)
-            if not model_path.exists():
-                outdir = Path(Path(input_file).stem + "_" + args.model)
-            else:
-                outdir = Path(Path(input_file).stem + "_" + model_path.stem)
-        else:
-            outdir = Path(args.outdir) / Path(input_file).stem
-        outdir.mkdir(exist_ok=True, parents=True)
-        # write out estimates
-        if args.audio_backend == "stempeg":
-            target_path = str(outdir / Path("target").with_suffix(args.ext))
-            # convert torch dict to numpy dict
-            estimates_numpy = {}
-            for target, estimate in estimates.items():
-                estimates_numpy[target] = torch.squeeze(estimate).detach().cpu().numpy().T
-            stempeg.write_stems(
-                target_path,
-                estimates_numpy,
-                sample_rate=separator.sample_rate,
-                writer=stempeg.FilesWriter(multiprocess=True, output_sample_rate=rate),
-            )
-        else:
-            for target, estimate in estimates.items():
-                target_path = str(outdir / Path(target).with_suffix(args.ext))
-                torchaudio.save(
-                    target_path,
-                    torch.squeeze(estimate).to("cpu"),
-                    sample_rate=separator.sample_rate,
-                )
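Reviewer note on `--aggregate`: the help text describes a JSON dictionary whose keys are output names and whose values are lists of targets used to build them. The sketch below shows one plausible reading, assuming the listed targets are simply summed sample by sample. That summation is an assumption, since `predict.separate` is not part of this hunk, and `aggregate_estimates` is a hypothetical helper that uses plain lists in place of tensors.

```python
import json
from typing import Dict, List


def aggregate_estimates(
    estimates: Dict[str, List[float]], aggregate_json: str
) -> Dict[str, List[float]]:
    """Build new outputs by summing the estimates named in a JSON mapping.

    Hypothetical illustration only; the real aggregation lives in
    `predict.separate` and may differ.
    """
    aggregate_dict = json.loads(aggregate_json)  # same parsing as in cli.py
    outputs = {}
    for name, members in aggregate_dict.items():
        mixed = [0.0] * len(estimates[members[0]])
        for member in members:
            for i, sample in enumerate(estimates[member]):
                mixed[i] += sample  # linear mix of the selected targets
        outputs[name] = mixed
    return outputs


estimates = {
    "vocals": [1.0, 2.0],
    "drums": [0.5, 0.5],
    "bass": [0.25, 0.25],
    "other": [0.25, 0.25],
}
spec = '{"vocals": ["vocals"], "accompaniment": ["drums", "bass", "other"]}'
result = aggregate_estimates(estimates, spec)
print(result)
```

The JSON string matches the example given in the `--aggregate` help text, so the same value could be passed on the command line.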
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/data.py b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/data.py
deleted file mode 100644
index c07cac8200515812e782e5918c9ce8dfed150fd0..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/data.py
+++ /dev/null
@@ -1,974 +0,0 @@
-import argparse
-import random
-from pathlib import Path
-from typing import Optional, Union, Tuple, List, Any, Callable
-import torch
-import torch.utils.data
-import torchaudio
-import tqdm
-def load_info(path: str) -> dict:
-    """Load audio metadata
-    This is a backend-independent wrapper around torchaudio.info
-    Args:
-        path: Path of the audio file
-    Returns:
-        Dict: Metadata with
-        `samplerate`, `samples`, `channels` and `duration` in seconds
-    """
-    # get length of file in samples
-    if torchaudio.get_audio_backend() == "sox":
-        raise RuntimeError("Deprecated backend is not supported")
-    info = {}
-    si = torchaudio.info(str(path))
-    info["samplerate"] = si.sample_rate
-    info["samples"] = si.num_frames
-    info["channels"] = si.num_channels
-    info["duration"] = info["samples"] / info["samplerate"]
-    return info
-def load_audio(
-    path: str,
-    start: float = 0.0,
-    dur: Optional[float] = None,
-    info: Optional[dict] = None,
-):
-    """Load audio file
-    Args:
-        path: Path of audio file
-        start: start position in seconds, defaults to the beginning.
-        dur: duration in seconds, defaults to `None` (full file).
-        info: metadata object as returned by `load_info`.
-    Returns:
-        Tensor: torch tensor waveform of shape `(num_channels, num_samples)`,
-        together with the sample rate of the file
-    """
-    # loads the full track duration
-    if dur is None:
-        # we ignore the case where start!=0 and dur=None
-        # since we have to deal with fixed length audio
-        sig, rate = torchaudio.load(path)
-        return sig, rate
-    else:
-        if info is None:
-            info = load_info(path)
-        num_frames = int(dur * info["samplerate"])
-        frame_offset = int(start * info["samplerate"])
-        sig, rate = torchaudio.load(path, num_frames=num_frames, frame_offset=frame_offset)
-        return sig, rate
-def aug_from_str(list_of_function_names: list):
-    if list_of_function_names:
-        return Compose([globals()["_augment_" + aug] for aug in list_of_function_names])
-    else:
-        return lambda audio: audio
-class Compose(object):
-    """Composes several augmentation transforms.
-    Args:
-        transforms: list of augmentations to compose.
-    """
-    def __init__(self, transforms):
-        self.transforms = transforms
-    def __call__(self, audio: torch.Tensor) -> torch.Tensor:
-        for t in self.transforms:
-            audio = t(audio)
-        return audio
-def _augment_gain(audio: torch.Tensor, low: float = 0.25, high: float = 1.25) -> torch.Tensor:
-    """Applies a random gain between `low` and `high`"""
-    g = low + torch.rand(1) * (high - low)
-    return audio * g
-def _augment_channelswap(audio: torch.Tensor) -> torch.Tensor:
-    """Swap channels of stereo signals with a probability of p=0.5"""
-    if audio.shape[0] == 2 and torch.tensor(1.0).uniform_() < 0.5:
-        return torch.flip(audio, [0])
-    else:
-        return audio
-def _augment_force_stereo(audio: torch.Tensor) -> torch.Tensor:
-    # for multichannel > 2, we drop the other channels
-    if audio.shape[0] > 2:
-        audio = audio[:2, ...]
-    if audio.shape[0] == 1:
-        # if we have mono, we duplicate it to get stereo
-        audio = torch.repeat_interleave(audio, 2, dim=0)
-    return audio
-class UnmixDataset(torch.utils.data.Dataset):
-    _repr_indent = 4
-    def __init__(
-        self,
-        root: Union[Path, str],
-        sample_rate: float,
-        seq_duration: Optional[float] = None,
-        source_augmentations: Optional[Callable] = None,
-    ) -> None:
-        self.root = Path(root).expanduser()
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.source_augmentations = source_augmentations
-    def __getitem__(self, index: int) -> Any:
-        raise NotImplementedError
-    def __len__(self) -> int:
-        raise NotImplementedError
-    def __repr__(self) -> str:
-        head = "Dataset " + self.__class__.__name__
-        body = ["Number of datapoints: {}".format(self.__len__())]
-        body += self.extra_repr().splitlines()
-        lines = [head] + [" " * self._repr_indent + line for line in body]
-        return "\n".join(lines)
-    def extra_repr(self) -> str:
-        return ""
-def load_datasets(
-    parser: argparse.ArgumentParser, args: argparse.Namespace
-) -> Tuple[UnmixDataset, UnmixDataset, argparse.Namespace]:
-    """Loads the specified dataset from commandline arguments
-    Returns:
-        train_dataset, validation_dataset, args (with dataset-specific options parsed)
-    """
-    if args.dataset == "aligned":
-        parser.add_argument("--input-file", type=str)
-        parser.add_argument("--output-file", type=str)
-        args = parser.parse_args()
-        # set output target to basename of output file
-        args.target = Path(args.output_file).stem
-        dataset_kwargs = {
-            "root": Path(args.root),
-            "seq_duration": args.seq_dur,
-            "input_file": args.input_file,
-            "output_file": args.output_file,
-        }
-        args.target = Path(args.output_file).stem
-        train_dataset = AlignedDataset(
-            split="train", random_chunks=True, **dataset_kwargs
-        )  # type: UnmixDataset
-        valid_dataset = AlignedDataset(split="valid", **dataset_kwargs)  # type: UnmixDataset
-    elif args.dataset == "sourcefolder":
-        parser.add_argument("--interferer-dirs", type=str, nargs="+")
-        parser.add_argument("--target-dir", type=str)
-        parser.add_argument("--ext", type=str, default=".wav")
-        parser.add_argument("--nb-train-samples", type=int, default=1000)
-        parser.add_argument("--nb-valid-samples", type=int, default=100)
-        parser.add_argument("--source-augmentations", type=str, nargs="+")
-        args = parser.parse_args()
-        args.target = args.target_dir
-        dataset_kwargs = {
-            "root": Path(args.root),
-            "interferer_dirs": args.interferer_dirs,
-            "target_dir": args.target_dir,
-            "ext": args.ext,
-        }
-        source_augmentations = aug_from_str(args.source_augmentations)
-        train_dataset = SourceFolderDataset(
-            split="train",
-            source_augmentations=source_augmentations,
-            random_chunks=True,
-            nb_samples=args.nb_train_samples,
-            seq_duration=args.seq_dur,
-            **dataset_kwargs,
-        )
-        valid_dataset = SourceFolderDataset(
-            split="valid",
-            random_chunks=True,
-            seq_duration=args.seq_dur,
-            nb_samples=args.nb_valid_samples,
-            **dataset_kwargs,
-        )
-    elif args.dataset == "trackfolder_fix":
-        parser.add_argument("--target-file", type=str)
-        parser.add_argument("--interferer-files", type=str, nargs="+")
-        parser.add_argument(
-            "--random-track-mix",
-            action="store_true",
-            default=False,
-            help="Apply random track mixing augmentation",
-        )
-        parser.add_argument("--source-augmentations", type=str, nargs="+")
-        args = parser.parse_args()
-        args.target = Path(args.target_file).stem
-        dataset_kwargs = {
-            "root": Path(args.root),
-            "interferer_files": args.interferer_files,
-            "target_file": args.target_file,
-        }
-        source_augmentations = aug_from_str(args.source_augmentations)
-        train_dataset = FixedSourcesTrackFolderDataset(
-            split="train",
-            source_augmentations=source_augmentations,
-            random_track_mix=args.random_track_mix,
-            random_chunks=True,
-            seq_duration=args.seq_dur,
-            **dataset_kwargs,
-        )
-        valid_dataset = FixedSourcesTrackFolderDataset(
-            split="valid", seq_duration=None, **dataset_kwargs
-        )
-    elif args.dataset == "trackfolder_var":
-        parser.add_argument("--ext", type=str, default=".wav")
-        parser.add_argument("--target-file", type=str)
-        parser.add_argument("--source-augmentations", type=str, nargs="+")
-        parser.add_argument(
-            "--random-interferer-mix",
-            action="store_true",
-            default=False,
-            help="Apply random interferer mixing augmentation",
-        )
-        parser.add_argument(
-            "--silence-missing",
-            action="store_true",
-            default=False,
-            help="silence missing targets",
-        )
-        args = parser.parse_args()
-        args.target = Path(args.target_file).stem
-        dataset_kwargs = {
-            "root": Path(args.root),
-            "target_file": args.target_file,
-            "ext": args.ext,
-            "silence_missing_targets": args.silence_missing,
-        }
-        # aug_from_str falls back to the identity when no augmentations are given
-        source_augmentations = aug_from_str(args.source_augmentations)
-        train_dataset = VariableSourcesTrackFolderDataset(
-            split="train",
-            source_augmentations=source_augmentations,
-            random_interferer_mix=args.random_interferer_mix,
-            random_chunks=True,
-            seq_duration=args.seq_dur,
-            **dataset_kwargs,
-        )
-        valid_dataset = VariableSourcesTrackFolderDataset(
-            split="valid", seq_duration=None, **dataset_kwargs
-        )
-    else:
-        parser.add_argument(
-            "--is-wav",
-            action="store_true",
-            default=False,
-            help="loads wav instead of STEMS",
-        )
-        parser.add_argument("--samples-per-track", type=int, default=64)
-        parser.add_argument(
-            "--source-augmentations", type=str, default=["gain", "channelswap"], nargs="+"
-        )
-        args = parser.parse_args()
-        dataset_kwargs = {
-            "root": args.root,
-            "is_wav": args.is_wav,
-            "subsets": "train",
-            "target": args.target,
-            "download": args.root is None,
-            "seed": args.seed,
-        }
-        source_augmentations = aug_from_str(args.source_augmentations)
-        train_dataset = MUSDBDataset(
-            split="train",
-            samples_per_track=args.samples_per_track,
-            seq_duration=args.seq_dur,
-            source_augmentations=source_augmentations,
-            random_track_mix=True,
-            **dataset_kwargs,
-        )
-        valid_dataset = MUSDBDataset(
-            split="valid", samples_per_track=1, seq_duration=None, **dataset_kwargs
-        )
-    return train_dataset, valid_dataset, args
-class AlignedDataset(UnmixDataset):
-    def __init__(
-        self,
-        root: str,
-        split: str = "train",
-        input_file: str = "mixture.wav",
-        output_file: str = "vocals.wav",
-        seq_duration: Optional[float] = None,
-        random_chunks: bool = False,
-        sample_rate: float = 44100.0,
-        source_augmentations: Optional[Callable] = None,
-        seed: int = 42,
-    ) -> None:
-        """A dataset that assumes multiple track folders
-        where each track includes an input and an output file
-        which directly correspond to the input and the
-        output of the model. This dataset is the most basic of
-        all datasets provided here. Because it needs the least
-        preprocessing, it is also the fastest option; however,
-        it lacks any kind of source augmentations or custom mixing.
-        Typical use cases:
-        * Source Separation (Mixture -> Target)
-        * Denoising (Noisy -> Clean)
-        * Bandwidth Extension (Low Bandwidth -> High Bandwidth)
-        Example
-        =======
-        data/train/01/mixture.wav --> input
-        data/train/01/vocals.wav ---> output
-        """
-        self.root = Path(root).expanduser()
-        self.split = split
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.random_chunks = random_chunks
-        # set the input and output files (accept glob)
-        self.input_file = input_file
-        self.output_file = output_file
-        self.tuple_paths = list(self._get_paths())
-        if not self.tuple_paths:
-            raise RuntimeError("Dataset is empty, please check parameters")
-        self.seed = seed
-        random.seed(self.seed)
-    def __getitem__(self, index):
-        input_path, output_path = self.tuple_paths[index]
-        if self.random_chunks:
-            input_info = load_info(input_path)
-            output_info = load_info(output_path)
-            duration = min(input_info["duration"], output_info["duration"])
-            start = random.uniform(0, duration - self.seq_duration)
-        else:
-            start = 0
-        X_audio, _ = load_audio(input_path, start=start, dur=self.seq_duration)
-        Y_audio, _ = load_audio(output_path, start=start, dur=self.seq_duration)
-        # return torch tensors
-        return X_audio, Y_audio
-    def __len__(self):
-        return len(self.tuple_paths)
-    def _get_paths(self):
-        """Loads input and output tracks"""
-        p = Path(self.root, self.split)
-        for track_path in tqdm.tqdm(p.iterdir()):
-            if track_path.is_dir():
-                input_path = list(track_path.glob(self.input_file))
-                output_path = list(track_path.glob(self.output_file))
-                if input_path and output_path:
-                    if self.seq_duration is not None:
-                        input_info = load_info(input_path[0])
-                        output_info = load_info(output_path[0])
-                        min_duration = min(input_info["duration"], output_info["duration"])
-                        # keep only tracks that are longer than the sequence duration
-                        if min_duration > self.seq_duration:
-                            yield input_path[0], output_path[0]
-                    else:
-                        yield input_path[0], output_path[0]
-class SourceFolderDataset(UnmixDataset):
-    def __init__(
-        self,
-        root: str,
-        split: str = "train",
-        target_dir: str = "vocals",
-        interferer_dirs: List[str] = ["bass", "drums"],
-        ext: str = ".wav",
-        nb_samples: int = 1000,
-        seq_duration: Optional[float] = None,
-        random_chunks: bool = True,
-        sample_rate: float = 44100.0,
-        source_augmentations: Optional[Callable] = lambda audio: audio,
-        seed: int = 42,
-    ) -> None:
-        """A dataset that assumes folders of sources,
-        instead of track folders. This is a common
-        format for speech and environmental sound datasets
-        such as DCASE. For each source a variable number of
-        tracks/sounds is available, therefore the dataset
-        is unaligned by design.
-        By default, for each sample, sources from random tracks are drawn
-        to assemble the mixture.
-        Example
-        =======
-        train/vocals/track11.wav -----------------\
-        train/drums/track202.wav  (interferer1) ---+--> input
-        train/bass/track007a.wav  (interferer2) --/
-        train/vocals/track11.wav ---------------------> output
-        """
-        self.root = Path(root).expanduser()
-        self.split = split
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.ext = ext
-        self.random_chunks = random_chunks
-        self.source_augmentations = source_augmentations
-        self.target_dir = target_dir
-        self.interferer_dirs = interferer_dirs
-        self.source_folders = self.interferer_dirs + [self.target_dir]
-        self.source_tracks = self.get_tracks()
-        self.nb_samples = nb_samples
-        self.seed = seed
-        random.seed(self.seed)
-    def __getitem__(self, index):
-        # For each source draw a random sound and mix them together
-        audio_sources = []
-        for source in self.source_folders:
-            if self.split == "valid":
-                # provide deterministic behaviour for validation so that
-                # each epoch, the same tracks are yielded
-                random.seed(index)
-            # select a random track for each source
-            source_path = random.choice(self.source_tracks[source])
-            duration = load_info(source_path)["duration"]
-            if self.random_chunks:
-                # for each source, select a random chunk
-                start = random.uniform(0, duration - self.seq_duration)
-            else:
-                # use center segment
-                start = max(duration // 2 - self.seq_duration // 2, 0)
-            audio, _ = load_audio(source_path, start=start, dur=self.seq_duration)
-            audio = self.source_augmentations(audio)
-            audio_sources.append(audio)
-        stems = torch.stack(audio_sources)
-        # apply linear mix over source index=0
-        x = stems.sum(0)
-        # target is always the last element in the list
-        y = stems[-1]
-        return x, y
-    def __len__(self):
-        return self.nb_samples
-    def get_tracks(self):
-        """Loads input and output tracks"""
-        p = Path(self.root, self.split)
-        source_tracks = {}
-        for source_folder in tqdm.tqdm(self.source_folders):
-            tracks = []
-            source_path = p / source_folder
-            for source_track_path in sorted(source_path.glob("*" + self.ext)):
-                if self.seq_duration is not None:
-                    info = load_info(source_track_path)
-                    # get minimum duration of track
-                    if info["duration"] > self.seq_duration:
-                        tracks.append(source_track_path)
-                else:
-                    tracks.append(source_track_path)
-            source_tracks[source_folder] = tracks
-        return source_tracks
-class FixedSourcesTrackFolderDataset(UnmixDataset):
-    def __init__(
-        self,
-        root: str,
-        split: str = "train",
-        target_file: str = "vocals.wav",
-        interferer_files: List[str] = ["bass.wav", "drums.wav"],
-        seq_duration: Optional[float] = None,
-        random_chunks: bool = False,
-        random_track_mix: bool = False,
-        source_augmentations: Optional[Callable] = lambda audio: audio,
-        sample_rate: float = 44100.0,
-        seed: int = 42,
-    ) -> None:
-        """A dataset that assumes audio sources to be stored
-        in track folders, where each track has a fixed number of sources.
-        For each track the user specifies the target file name (`target_file`)
-        and a list of interferer files (`interferer_files`).
-        A linear mix is performed on the fly by summing the target and
-        the interferers.
-        Because all tracks comprise the exact same set
-        of sources, the random track mixing augmentation technique
-        can be used, where sources from different tracks are mixed
-        together. Setting `random_track_mix=True` results in an
-        unaligned dataset.
-        When random track mixing is enabled, we define an epoch as
-        the point at which the target source of every track has been seen
-        exactly once, with whatever interfering sources have been randomly drawn.
-        This dataset is recommended for small/medium-sized datasets,
-        for example MUSDB18 or other custom source separation
-        datasets.
-        Example
-        =======
-        train/1/vocals.wav ---------------\
-        train/1/drums.wav (interferer1) ---+--> input
-        train/1/bass.wav -(interferer2) --/
-        train/1/vocals.wav -------------------> output
-        """
-        self.root = Path(root).expanduser()
-        self.split = split
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.random_track_mix = random_track_mix
-        self.random_chunks = random_chunks
-        self.source_augmentations = source_augmentations
-        # set the input and output files (accept glob)
-        self.target_file = target_file
-        self.interferer_files = interferer_files
-        self.source_files = self.interferer_files + [self.target_file]
-        self.seed = seed
-        random.seed(self.seed)
-        self.tracks = list(self.get_tracks())
-        if not len(self.tracks):
-            raise RuntimeError("No tracks found")
-    def __getitem__(self, index):
-        # first, get target track
-        track_path = self.tracks[index]["path"]
-        min_duration = self.tracks[index]["min_duration"]
-        if self.random_chunks:
-            # determine start seek by target duration
-            start = random.uniform(0, min_duration - self.seq_duration)
-        else:
-            start = 0
-        # assemble the mixture of target and interferers
-        audio_sources = []
-        # load target
-        target_audio, _ = load_audio(
-            track_path / self.target_file, start=start, dur=self.seq_duration
-        )
-        target_audio = self.source_augmentations(target_audio)
-        audio_sources.append(target_audio)
-        # load interferers
-        for source in self.interferer_files:
-            # optionally select a random track for each source
-            if self.random_track_mix:
-                random_idx = random.choice(range(len(self.tracks)))
-                track_path = self.tracks[random_idx]["path"]
-                if self.random_chunks:
-                    min_duration = self.tracks[random_idx]["min_duration"]
-                    start = random.uniform(0, min_duration - self.seq_duration)
-            audio, _ = load_audio(track_path / source, start=start, dur=self.seq_duration)
-            audio = self.source_augmentations(audio)
-            audio_sources.append(audio)
-        stems = torch.stack(audio_sources)
-        # apply linear mix over source index=0
-        x = stems.sum(0)
-        # target is always the first element in the list
-        y = stems[0]
-        return x, y
-    def __len__(self):
-        return len(self.tracks)
-    def get_tracks(self):
-        """Loads input and output tracks"""
-        p = Path(self.root, self.split)
-        for track_path in tqdm.tqdm(p.iterdir()):
-            if track_path.is_dir():
-                source_paths = [track_path / s for s in self.source_files]
-                if not all(sp.exists() for sp in source_paths):
-                    print("Exclude track ", track_path)
-                    continue
-                if self.seq_duration is not None:
-                    infos = list(map(load_info, source_paths))
-                    # get minimum duration of track
-                    min_duration = min(i["duration"] for i in infos)
-                    if min_duration > self.seq_duration:
-                        yield ({"path": track_path, "min_duration": min_duration})
-                else:
-                    yield ({"path": track_path, "min_duration": None})
-class VariableSourcesTrackFolderDataset(UnmixDataset):
-    def __init__(
-        self,
-        root: str,
-        split: str = "train",
-        target_file: str = "vocals.wav",
-        ext: str = ".wav",
-        seq_duration: Optional[float] = None,
-        random_chunks: bool = False,
-        random_interferer_mix: bool = False,
-        sample_rate: float = 44100.0,
-        source_augmentations: Optional[Callable] = lambda audio: audio,
-        silence_missing_targets: bool = False,
-    ) -> None:
-        """A dataset that assumes audio sources to be stored
-        in track folders, where each track has a _variable_ number of sources.
-        The user specifies the target file name (`target_file`)
-        and the extension of the sources to be used for mixing.
-        A linear mix is performed on the fly by summing all sources in a
-        track folder.
-        Since the number of sources differs per track,
-        while the target is fixed, a random track mix
-        augmentation cannot be used. Instead, a random track
-        can be used to load the interfering sources.
-        Also make sure that you do not provide the mixture
-        file among the sources!
-        Example
-        =======
-        train/1/vocals.wav --> input target   \
-        train/1/drums.wav --> input target     |
-        train/1/bass.wav --> input target    --+--> input
-        train/1/accordion.wav --> input target |
-        train/1/marimba.wav --> input target  /
-        train/1/vocals.wav -----------------------> output
-        """
-        self.root = Path(root).expanduser()
-        self.split = split
-        self.sample_rate = sample_rate
-        self.seq_duration = seq_duration
-        self.random_chunks = random_chunks
-        self.random_interferer_mix = random_interferer_mix
-        self.source_augmentations = source_augmentations
-        self.target_file = target_file
-        self.ext = ext
-        self.silence_missing_targets = silence_missing_targets
-        self.tracks = list(self.get_tracks())
-    def __getitem__(self, index):
-        # select the target based on the dataset index
-        target_track_path = self.tracks[index]["path"]
-        if self.random_chunks:
-            target_min_duration = self.tracks[index]["min_duration"]
-            target_start = random.uniform(0, target_min_duration - self.seq_duration)
-        else:
-            target_start = 0
-        # optionally select a random interferer track
-        if self.random_interferer_mix:
-            random_idx = random.choice(range(len(self.tracks)))
-            intfr_track_path = self.tracks[random_idx]["path"]
-            if self.random_chunks:
-                intfr_min_duration = self.tracks[random_idx]["min_duration"]
-                intfr_start = random.uniform(0, intfr_min_duration - self.seq_duration)
-            else:
-                intfr_start = 0
-        else:
-            intfr_track_path = target_track_path
-            intfr_start = target_start
-        # get sources from interferer track
-        sources = sorted(list(intfr_track_path.glob("*" + self.ext)))
-        # load sources
-        x = 0
-        for source_path in sources:
-            # skip target file and load it later
-            if source_path == intfr_track_path / self.target_file:
-                continue
-            try:
-                audio, _ = load_audio(source_path, start=intfr_start, dur=self.seq_duration)
-            except RuntimeError:
-                index = index - 1 if index > 0 else index + 1
-                return self.__getitem__(index)
-            x += self.source_augmentations(audio)
-        # load the selected track target
-        if Path(target_track_path / self.target_file).exists():
-            y, _ = load_audio(
-                target_track_path / self.target_file,
-                start=target_start,
-                dur=self.seq_duration,
-            )
-            y = self.source_augmentations(y)
-            x += y
-        # Use silence if target does not exist
-        else:
-            y = torch.zeros(audio.shape)
-        return x, y
-    def __len__(self):
-        return len(self.tracks)
-    def get_tracks(self):
-        p = Path(self.root, self.split)
-        for track_path in tqdm.tqdm(p.iterdir()):
-            if track_path.is_dir():
-                # check if target exists
-                if Path(track_path, self.target_file).exists() or self.silence_missing_targets:
-                    sources = sorted(list(track_path.glob("*" + self.ext)))
-                    if not sources:
-                        # in case of empty folder
-                        print("empty track: ", track_path)
-                        continue
-                    if self.seq_duration is not None:
-                        # check sources
-                        infos = list(map(load_info, sources))
-                        # get minimum duration of source
-                        min_duration = min(i["duration"] for i in infos)
-                        if min_duration > self.seq_duration:
-                            yield ({"path": track_path, "min_duration": min_duration})
-                    else:
-                        yield ({"path": track_path, "min_duration": None})
-class MUSDBDataset(UnmixDataset):
-    def __init__(
-        self,
-        target: str = "vocals",
-        root: str = None,
-        download: bool = False,
-        is_wav: bool = False,
-        subsets: str = "train",
-        split: str = "train",
-        seq_duration: Optional[float] = 6.0,
-        samples_per_track: int = 64,
-        source_augmentations: Optional[Callable] = lambda audio: audio,
-        random_track_mix: bool = False,
-        seed: int = 42,
-        *args,
-        **kwargs,
-    ) -> None:
-        """MUSDB18 torch.data.Dataset that samples from the MUSDB tracks
-        using track and excerpts with replacement.
-        Parameters
-        ----------
-        target : str
-            target name of the source to be separated, defaults to ``vocals``.
-        root : str
-            root path of MUSDB
-        download : boolean
-            automatically download 7s preview version of MUSDB
-        is_wav : boolean
-            specify if the WAV version (instead of the MP4 STEMS) is used
-        subsets : list-like [str]
-            subset str or list of subset. Defaults to ``train``.
-        split : str
-            use (stratified) track splits for validation split (``valid``),
-            defaults to ``train``.
-        seq_duration : float
-            training is performed in chunks of ``seq_duration`` (in seconds),
-            defaults to ``6.0``. ``None`` loads the full audio track
-        samples_per_track : int
-            sets the number of samples, yielded from each track per epoch.
-            Defaults to 64
-        source_augmentations : list[callables]
-            provide a list of augmentation functions that take a multi-channel
-            audio file of shape (src, samples) as input and output. Defaults to
-            no-augmentations (input = output)
-        random_track_mix : boolean
-            randomly mixes sources from different tracks to assemble a
-            custom mix. This augmentation is only applied for the train subset.
-        seed : int
-            control randomness of dataset iterations
-        args, kwargs : additional keyword arguments
-            used to add further control for the musdb dataset
-            initialization function.
-        """
-        import musdb
-        self.seed = seed
-        random.seed(seed)
-        self.is_wav = is_wav
-        self.seq_duration = seq_duration
-        self.target = target
-        self.subsets = subsets
-        self.split = split
-        self.samples_per_track = samples_per_track
-        self.source_augmentations = source_augmentations
-        self.random_track_mix = random_track_mix
-        self.mus = musdb.DB(
-            root=root,
-            is_wav=is_wav,
-            split=split,
-            subsets=subsets,
-            download=download,
-            *args,
-            **kwargs,
-        )
-        self.sample_rate = 44100.0  # musdb is fixed sample rate
-    def __getitem__(self, index):
-        audio_sources = []
-        target_ind = None
-        # select track
-        track = self.mus.tracks[index // self.samples_per_track]
-        # at training time we assemble a custom mix
-        if self.split == "train" and self.seq_duration:
-            for k, source in enumerate(self.mus.setup["sources"]):
-                # memorize index of target source
-                if source == self.target:
-                    target_ind = k
-                # select a random track
-                if self.random_track_mix:
-                    track = random.choice(self.mus.tracks)
-                # set the excerpt duration
-                track.chunk_duration = self.seq_duration
-                # set random start position
-                track.chunk_start = random.uniform(0, track.duration - self.seq_duration)
-                # load source audio and apply time domain source_augmentations
-                audio = torch.as_tensor(track.sources[source].audio.T, dtype=torch.float32)
-                audio = self.source_augmentations(audio)
-                audio_sources.append(audio)
-            # create stem tensor of shape (source, channel, samples)
-            stems = torch.stack(audio_sources, dim=0)
-            # apply linear mix over source index=0
-            x = stems.sum(0)
-            # get the target stem
-            if target_ind is not None:
-                y = stems[target_ind]
-            # assuming vocal/accompaniment scenario if target!=source
-            else:
-                vocind = list(self.mus.setup["sources"].keys()).index("vocals")
-                # apply time domain subtraction
-                y = x - stems[vocind]
-        # for validation and test, we deterministically yield the full
-        # pre-mixed musdb track
-        else:
-            # get the non-linear source mix straight from musdb
-            x = torch.as_tensor(track.audio.T, dtype=torch.float32)
-            y = torch.as_tensor(track.targets[self.target].audio.T, dtype=torch.float32)
-        return x, y
-    def __len__(self):
-        return len(self.mus.tracks) * self.samples_per_track
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Open Unmix Trainer")
-    parser.add_argument(
-        "--dataset",
-        type=str,
-        default="musdb",
-        choices=[
-            "musdb",
-            "aligned",
-            "sourcefolder",
-            "trackfolder_var",
-            "trackfolder_fix",
-        ],
-        help="Name of the dataset.",
-    )
-    parser.add_argument("--root", type=str, help="root path of dataset")
-    parser.add_argument(
-        "--save", action="store_true", help=("write out a fixed dataset of samples")
-    )
-    parser.add_argument("--target", type=str, default="vocals")
-    parser.add_argument("--seed", type=int, default=42)
-    parser.add_argument(
-        "--audio-backend",
-        type=str,
-        default="soundfile",
-        help="Set torchaudio backend (`sox_io` or `soundfile`)",
-    )
-    # I/O Parameters
-    parser.add_argument(
-        "--seq-dur",
-        type=float,
-        default=5.0,
-        help="Duration of <=0.0 will result in the full audio",
-    )
-    parser.add_argument("--batch-size", type=int, default=16)
-    args, _ = parser.parse_known_args()
-    torchaudio.set_audio_backend(args.audio_backend)
-    train_dataset, valid_dataset, args = load_datasets(parser, args)
-    print("Audio Backend: ", torchaudio.get_audio_backend())
-    # Iterate over training dataset and compute statistics
-    total_training_duration = 0
-    for k in tqdm.tqdm(range(len(train_dataset))):
-        x, y = train_dataset[k]
-        total_training_duration += x.shape[1] / train_dataset.sample_rate
-        if args.save:
-            torchaudio.save("test/" + str(k) + "x.wav", x.T, train_dataset.sample_rate)
-            torchaudio.save("test/" + str(k) + "y.wav", y.T, train_dataset.sample_rate)
-    print("Total training duration (h): ", total_training_duration / 3600)
-    print("Number of train samples: ", len(train_dataset))
-    print("Number of validation samples: ", len(valid_dataset))
-    # iterate over dataloader
-    train_dataset.seq_duration = args.seq_dur
-    train_sampler = torch.utils.data.DataLoader(
-        train_dataset,
-        batch_size=args.batch_size,
-        shuffle=True,
-        num_workers=4,
-    )
-    for x, y in tqdm.tqdm(train_sampler):
-        print(x.shape)
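Note on the removed dataset classes: each `__getitem__` above follows the same two steps, namely picking a chunk start and then summing the stems into a mixture while returning one stem as the target. The sketch below restates that logic in plain Python so it can be run without torch or audio files. `chunk_start` and `linear_mix` are hypothetical helper names that do not appear in the deleted file, and the `seq_duration is None` guard is an assumption added here; the original code calls `random.uniform(0, min_duration - self.seq_duration)` without such a guard.

```python
import random
from typing import List, Optional, Sequence, Tuple


def chunk_start(
    duration: float,
    seq_duration: Optional[float],
    random_chunks: bool,
    rng: random.Random,
) -> float:
    """Return the start offset (in seconds) of the excerpt to load."""
    # Assumption: without a fixed excerpt length, or without random
    # chunking, the excerpt starts at the beginning of the file.
    if seq_duration is None or not random_chunks:
        return 0.0
    if duration < seq_duration:
        raise ValueError("track is shorter than seq_duration")
    # Draw a start so that the whole excerpt fits inside the track.
    return rng.uniform(0.0, duration - seq_duration)


def linear_mix(
    stems: Sequence[Sequence[float]], target_index: int
) -> Tuple[List[float], List[float]]:
    """Sum all stems sample by sample and return (mixture, target)."""
    if not stems:
        raise ValueError("at least one stem is required")
    length = len(stems[0])
    if any(len(stem) != length for stem in stems):
        raise ValueError("all stems must have the same number of samples")
    # Linear mix: the input is the sample-wise sum over the source axis.
    mixture = [sum(samples) for samples in zip(*stems)]
    return mixture, list(stems[target_index])


if __name__ == "__main__":
    rng = random.Random(42)
    start = chunk_start(duration=10.0, seq_duration=4.0, random_chunks=True, rng=rng)
    # Target first, interferers after, as in FixedSourcesTrackFolderDataset.
    x, y = linear_mix([[1.0, 2.0], [0.5, 0.5], [0.25, -1.0]], target_index=0)
    print(start, x, y)
```

With `random_chunks=False` the sketch always starts at 0, which matches `FixedSourcesTrackFolderDataset`; `SourceFolderDataset` uses the center segment instead.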
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/evaluate.py b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/evaluate.py
deleted file mode 100644
index e59535cbbd2b7177707663843f11f6ab948f057b..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/evaluate.py
+++ /dev/null
@@ -1,197 +0,0 @@
-import argparse
-import functools
-import json
-import multiprocessing
-from typing import Optional, Union
-import musdb
-import museval
-import torch
-import tqdm
-from openunmix import utils
-def separate_and_evaluate(
-    track: musdb.MultiTrack,
-    targets: list,
-    model_str_or_path: str,
-    niter: int,
-    output_dir: str,
-    eval_dir: str,
-    residual: bool,
-    mus,
-    aggregate_dict: dict = None,
-    device: Union[str, torch.device] = "cpu",
-    wiener_win_len: Optional[int] = None,
-    filterbank="torch",
-) -> str:
-    separator = utils.load_separator(
-        model_str_or_path=model_str_or_path,
-        targets=targets,
-        niter=niter,
-        residual=residual,
-        wiener_win_len=wiener_win_len,
-        device=device,
-        pretrained=True,
-        filterbank=filterbank,
-    )
-    separator.freeze()
-    separator.to(device)
-    audio = torch.as_tensor(track.audio, dtype=torch.float32, device=device)
-    audio = utils.preprocess(audio, track.rate, separator.sample_rate)
-    estimates = separator(audio)
-    estimates = separator.to_dict(estimates, aggregate_dict=aggregate_dict)
-    for key in estimates:
-        estimates[key] = estimates[key][0].cpu().detach().numpy().T
-    if output_dir:
-        mus.save_estimates(estimates, track, output_dir)
-    scores = museval.eval_mus_track(track, estimates, output_dir=eval_dir)
-    return scores
-if __name__ == "__main__":
-    # Evaluation settings
-    parser = argparse.ArgumentParser(description="MUSDB18 Evaluation", add_help=False)
-    parser.add_argument(
-        "--targets",
-        nargs="+",
-        default=["vocals", "drums", "bass", "other"],
-        type=str,
-        help="provide targets to be processed. \
-              If none, all available targets will be computed",
-    )
-    parser.add_argument(
-        "--model",
-        default="umxl",
-        type=str,
-        help="path to model base directory of pretrained models",
-    )
-    parser.add_argument(
-        "--outdir",
-        type=str,
-        help="Results path where audio estimates are stored",
-    )
-    parser.add_argument("--evaldir", type=str, help="Results path for museval evaluation results")
-    parser.add_argument("--root", type=str, help="Path to MUSDB18")
-    parser.add_argument("--subset", type=str, default="test", help="MUSDB subset (`train`/`test`)")
-    parser.add_argument("--cores", type=int, default=1)
-    parser.add_argument(
-        "--no-cuda", action="store_true", default=False, help="disables CUDA inference"
-    )
-    parser.add_argument(
-        "--is-wav",
-        action="store_true",
-        default=False,
-        help="flags wav version of the dataset",
-    )
-    parser.add_argument(
-        "--niter",
-        type=int,
-        default=1,
-        help="number of iterations for refining results.",
-    )
-    parser.add_argument(
-        "--wiener-win-len",
-        type=int,
-        default=300,
-        help="Number of frames on which to apply filtering independently",
-    )
-    parser.add_argument(
-        "--residual",
-        type=str,
-        default=None,
-        help="if provided, build a source with given name "
-        "for the mix minus all estimated targets",
-    )
-    parser.add_argument(
-        "--aggregate",
-        type=str,
-        default=None,
-        help="if provided, must be a string containing a valid expression for "
-        "a dictionary, with keys as output target names, and values "
-        "a list of targets that are used to build it. For instance: "
-        '\'{"vocals":["vocals"], "accompaniment":["drums",'
-        '"bass","other"]}\'',
-    )
-    args = parser.parse_args()
-    use_cuda = not args.no_cuda and torch.cuda.is_available()
-    device = torch.device("cuda" if use_cuda else "cpu")
-    mus = musdb.DB(
-        root=args.root,
-        download=args.root is None,
-        subsets=args.subset,
-        is_wav=args.is_wav,
-    )
-    aggregate_dict = None if args.aggregate is None else json.loads(args.aggregate)
-    if args.cores > 1:
-        pool = multiprocessing.Pool(args.cores)
-        results = museval.EvalStore()
-        scores_list = list(
-            pool.imap_unordered(
-                func=functools.partial(
-                    separate_and_evaluate,
-                    targets=args.targets,
-                    model_str_or_path=args.model,
-                    niter=args.niter,
-                    residual=args.residual,
-                    mus=mus,
-                    aggregate_dict=aggregate_dict,
-                    output_dir=args.outdir,
-                    eval_dir=args.evaldir,
-                    device=device,
-                ),
-                iterable=mus.tracks,
-                chunksize=1,
-            )
-        )
-        pool.close()
-        pool.join()
-        for scores in scores_list:
-            results.add_track(scores)
-    else:
-        results = museval.EvalStore()
-        for track in tqdm.tqdm(mus.tracks):
-            scores = separate_and_evaluate(
-                track,
-                targets=args.targets,
-                model_str_or_path=args.model,
-                niter=args.niter,
-                residual=args.residual,
-                mus=mus,
-                aggregate_dict=aggregate_dict,
-                output_dir=args.outdir,
-                eval_dir=args.evaldir,
-                device=device,
-            )
-            print(track, "\n", scores)
-            results.add_track(scores)
-    print(results)
-    method = museval.MethodStore()
-    method.add_evalstore(results, args.model)
-    method.save(args.model + ".pandas")
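Note on `--aggregate`: the script parses the option with `json.loads` and hands the resulting dictionary to `separator.to_dict`, whose implementation is not part of this diff. Below is a minimal sketch of what such an aggregation could look like, assuming each output target is the sample-wise sum of its listed member estimates. `aggregate_estimates` is a hypothetical name, and plain lists stand in for tensors.

```python
import json
from typing import Dict, List


def aggregate_estimates(
    estimates: Dict[str, List[float]],
    aggregate_dict: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """Build each output target as the sum of its member estimates."""
    aggregated: Dict[str, List[float]] = {}
    for name, members in aggregate_dict.items():
        missing = [m for m in members if m not in estimates]
        if missing:
            raise KeyError(f"no estimate available for: {missing}")
        # Sample-wise sum over all members of this output target.
        columns = zip(*(estimates[m] for m in members))
        aggregated[name] = [sum(samples) for samples in columns]
    return aggregated


if __name__ == "__main__":
    # Same JSON shape as the example in the --aggregate help text.
    spec = json.loads(
        '{"vocals":["vocals"], "accompaniment":["drums","bass","other"]}'
    )
    estimates = {
        "vocals": [1.0, 2.0],
        "drums": [0.5, 0.5],
        "bass": [0.25, 0.25],
        "other": [0.25, 0.25],
    }
    print(aggregate_estimates(estimates, spec))
```

Raising on a missing member is a design choice of this sketch; it surfaces a mismatch between `--targets` and `--aggregate` early instead of producing a silently incomplete sum.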
diff --git a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/filtering.py b/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/filtering.py
deleted file mode 100644
index b0f4921e95dc95afa64c7e83d702cd011d8577b6..0000000000000000000000000000000000000000
--- a/my_submssion/openunmix-baseline/sigsep_open-unmix-pytorch_master/openunmix/filtering.py
+++ /dev/null
@@ -1,504 +0,0 @@
-from typing import Optional
-import torch
-import torch.nn as nn
-from torch import Tensor
-from torch.utils.data import DataLoader
-def atan2(y, x):
-    r"""Element-wise arctangent function of y/x.
-    Returns a new tensor with signed angles in radians.
-    It is an alternative implementation of torch.atan2
-    Args:
-        y (Tensor): First input tensor
-        x (Tensor): Second input tensor [shape=y.shape]
-    Returns:
-        Tensor: [shape=y.shape].
-    """
-    pi = 2 * torch.asin(torch.tensor(1.0))
-    x += ((x == 0) & (y == 0)) * 1.0
-    out = torch.atan(y / x)
-    out += ((y >= 0) & (x < 0)) * pi
-    out -= ((y < 0) & (x < 0)) * pi
-    out *= 1 - ((y > 0) & (x == 0)) * 1.0
-    out += ((y > 0) & (x == 0)) * (pi / 2)
-    out *= 1 - ((y < 0) & (x == 0)) * 1.0
-    out += ((y < 0) & (x == 0)) * (-pi / 2)
-    return out
-# Define basic complex operations on torch.Tensor objects whose last dimension
-# consists in the concatenation of the real and imaginary parts.
-def _norm(x: torch.Tensor) -> torch.Tensor:
-    r"""Computes the squared magnitude of a torch Tensor, assuming that it
-    comes as real and imaginary part in its last dimension.
-    Args:
-        x (Tensor): Input Tensor of shape [shape=(..., 2)]
-    Returns:
-        Tensor: shape as x excluding the last dimension.
-    """
-    return torch.abs(x[..., 0]) ** 2 + torch.abs(x[..., 1]) ** 2
-def _mul_add(a: torch.Tensor, b: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tensor:
-    """Element-wise multiplication of two complex Tensors described
-    through their real and imaginary parts.
-    The result is added to the `out` tensor"""
-    # check `out` and allocate it if needed
-    target_shape = torch.Size([max(sa, sb) for (sa, sb) in zip(a.shape, b.shape)])
-    if out is None or out.shape != target_shape:
-        out = torch.zeros(target_shape, dtype=a.dtype, device=a.device)
-    if out is a:
-        real_a = a[..., 0]
-        out[..., 0] = out[..., 0] + (real_a * b[..., 0] - a[..., 1] * b[..., 1])
-        out[..., 1] = out[..., 1] + (real_a * b[..., 1] + a[..., 1] * b[..., 0])
-    else:
-        out[..., 0] = out[..., 0] + (a[..., 0] * b[..., 0] - a[..., 1] * b[..., 1])
-        out[..., 1] = out[..., 1] + (a[..., 0] * b[..., 1] + a[..., 1] * b[..., 0])
-    return out
-def _mul(a: torch.Tensor, b: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tensor:
-    """Element-wise multiplication of two complex Tensors described
-    through their real and imaginary parts
-    can work in place in case out is a only"""
-    target_shape = torch.Size([max(sa, sb) for (sa, sb) in zip(a.shape, b.shape)])
-    if out is None or out.shape != target_shape:
-        out = torch.zeros(target_shape, dtype=a.dtype, device=a.device)
-    if out is a:
-        real_a = a[..., 0]
-        out[..., 0] = real_a * b[..., 0] - a[..., 1] * b[..., 1]
-        out[..., 1] = real_a * b[..., 1] + a[..., 1] * b[..., 0]
-    else:
-        out[..., 0] = a[..., 0] * b[..., 0] - a[..., 1] * b[..., 1]
-        out[..., 1] = a[..., 0] * b[..., 1] + a[..., 1] * b[..., 0]
-    return out
-def _inv(z: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tensor:
-    """Element-wise multiplicative inverse of a Tensor with complex
-    entries described through their real and imaginary parts.
-    can work in place in case out is z"""
-    ez = _norm(z)
-    if out is None or out.shape != z.shape:
-        out = torch.zeros_like(z)
-    out[..., 0] = z[..., 0] / ez
-    out[..., 1] = -z[..., 1] / ez
-    return out
-def _conj(z, out: Optional[torch.Tensor] = None) -> torch.Tensor:
-    """Element-wise complex conjugate of a Tensor with complex entries
-    described through their real and imaginary parts.
-    can work in place in case out is z"""
-    if out is None or out.shape != z.shape:
-        out = torch.zeros_like(z)
-    out[..., 0] = z[..., 0]
-    out[..., 1] = -z[..., 1]
-    return out
-def _invert(M: torch.Tensor, out: Optional[torch.Tensor] = None) -> torch.Tensor:
-    """
-    Invert 1x1 or 2x2 matrices
-    Will generate errors if the matrices are singular: the user must handle
-    this through their own regularization schemes.
-    Args:
-        M (Tensor): [shape=(..., nb_channels, nb_channels, 2)]
-            matrices to invert: must be square along dimensions -3 and -2
-    Returns:
-        invM (Tensor): [shape=M.shape]
-            inverses of M
-    """
-    nb_channels = M.shape[-2]
-    if out is None or out.shape != M.shape:
-        out = torch.empty_like(M)
-    if nb_channels == 1:
-        # scalar case
-        out = _inv(M, out)
-    elif nb_channels == 2:
-        # two channels case: analytical expression
-        # first compute the determinant
-        det = _mul(M[..., 0, 0, :], M[..., 1, 1, :])
-        det = det - _mul(M[..., 0, 1, :], M[..., 1, 0, :])
-        # invert it
-        invDet = _inv(det)
-        # then fill out the matrix with the inverse
-        out[..., 0, 0, :] = _mul(invDet, M[..., 1, 1, :], out[..., 0, 0, :])
-        out[..., 1, 0, :] = _mul(-invDet, M[..., 1, 0, :], out[..., 1, 0, :])
-        out[..., 0, 1, :] = _mul(-invDet, M[..., 0, 1, :], out[..., 0, 1, :])
-        out[..., 1, 1, :] = _mul(invDet, M[..., 0, 0, :], out[..., 1, 1, :])
-    else:
-        raise Exception("Only 1 or 2 channels are supported for the torch version.")
-    return out
-# Now define the signal-processing low-level functions used by the Separator
-def expectation_maximization(
-    y: torch.Tensor,
-    x: torch.Tensor,
-    iterations: int = 2,
-    eps: float = 1e-10,
-    batch_size: int = 200,
-):
-    r"""Expectation maximization algorithm, for refining source separation
-    estimates.
-    This algorithm improves source separation results by
-    enforcing multichannel consistency for the estimates. This usually means
-    better perceptual quality in terms of spatial artifacts.
-    The implementation follows the details presented in [1]_, taking
-    inspiration from the original EM algorithm proposed in [2]_ and its
-    weighted refinement proposed in [3]_, [4]_.
-    It works by iteratively:
-     * Re-estimating source parameters (power spectral densities and spatial
-       covariance matrices) through :func:`get_local_gaussian_model`.
-     * Separating the mixture again with the new parameters by first computing
-       the new modelled mixture covariance matrices with :func:`get_mix_model`,
-       then preparing the Wiener filters through :func:`wiener_gain` and
-       applying them with :func:`apply_filter`.
-    References
-    ----------
-    .. [1] S. Uhlich and M. Porcu and F. Giron and M. Enenkl and T. Kemp and
-        N. Takahashi and Y. Mitsufuji, "Improving music source separation based
-        on deep neural networks through data augmentation and network
-        blending." 2017 IEEE International Conference on Acoustics, Speech
-        and Signal Processing (ICASSP). IEEE, 2017.
-    .. [2] N.Q. Duong and E. Vincent and R. Gribonval. "Under-determined
-        reverberant audio source separation using a full-rank spatial
-        covariance model." IEEE Transactions on Audio, Speech, and Language
-        Processing 18.7 (2010): 1830-1840.
-    .. [3] A. Nugraha and A. Liutkus and E. Vincent. "Multichannel audio source
-        separation with deep neural networks." IEEE/ACM Transactions on Audio,
-        Speech, and Language Processing 24.9 (2016): 1652-1664.
-    .. [4] A. Nugraha and A. Liutkus and E. Vincent. "Multichannel music
-        separation with deep neural networks." 2016 24th European Signal
-        Processing Conference (EUSIPCO). IEEE, 2016.
-    .. [5] A. Liutkus and R. Badeau and G. Richard. "Kernel additive models for
-        source separation." IEEE Transactions on Signal Processing
