Time-Scale Modification

Time-Scale Modification procedures

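The audiotsm module provides several time-scale modification procedures:

  • ola() – OLA (Overlap-Add)
  • wsola() – WSOLA (Waveform Similarity-based Overlap-Add)
  • phasevocoder() – phase vocoder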

The OLA procedure should only be used on percussive audio signals. The WSOLA and phase vocoder procedures both improve on OLA and should give good results in most cases.

Note

If you are unsure which procedure and parameters to choose, using phasevocoder() with the default parameters should give good results in most cases. You can listen to the output of the different procedures on various audio files and at various speeds on the examples page.

Each function in this module returns a TSM object that implements a time-scale modification procedure.
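As a minimal sketch of typical usage, the snippet below stretches a WAV file with phasevocoder() and run(). The WavReader and WavWriter classes from audiotsm.io.wav, together with their channels and samplerate attributes, are not described in this section and are assumed here; stretch_wav is a hypothetical helper name.

```python
def stretch_wav(input_path, output_path, speed=1.0):
    """Time-stretch a WAV file with the phase vocoder (illustrative sketch)."""
    # Imported lazily so this helper can be defined even when audiotsm
    # is not installed; the import only happens when the helper is called.
    from audiotsm import phasevocoder
    # Assumption: audiotsm.io.wav provides WavReader and WavWriter.
    from audiotsm.io.wav import WavReader, WavWriter

    with WavReader(input_path) as reader:
        with WavWriter(output_path, reader.channels, reader.samplerate) as writer:
            # One TSM object per signal; channels must match the input.
            tsm = phasevocoder(reader.channels, speed=speed)
            # run() reads everything from reader and writes the result to writer.
            tsm.run(reader, writer)
```

Calling stretch_wav("in.wav", "out.wav", speed=0.5) would, under these assumptions, write a version of the input that plays half as fast.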

audiotsm.ola(channels, speed=1.0, frame_length=256, analysis_hop=None, synthesis_hop=None)

Returns a TSM object implementing the OLA (Overlap-Add) time-scale modification procedure.

In most cases, you should not need to set the frame_length, the analysis_hop or the synthesis_hop. If you want to fine-tune these parameters, see the documentation of the AnalysisSynthesisTSM class for what they represent.

Parameters:
  • channels (int) – the number of channels of the input signal.
  • speed (float, optional) – the speed ratio by which the speed of the signal will be multiplied (for example, if speed is set to 0.5, the output signal will be half as fast as the input signal).
  • frame_length (int, optional) – the length of the frames.
  • analysis_hop (int, optional) – the number of samples between two consecutive analysis frames (speed * synthesis_hop by default). If analysis_hop is set, the speed parameter will be ignored.
  • synthesis_hop (int, optional) – the number of samples between two consecutive synthesis frames (frame_length // 2 by default).
Returns: an audiotsm.base.tsm.TSM object
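The relationship between speed and the two hops can be sketched as follows. default_hops and its hop_divisor argument are hypothetical names used for illustration, and truncating analysis_hop to an integer is an assumption rather than documented behaviour.

```python
def default_hops(frame_length, speed, hop_divisor=2):
    """Return (analysis_hop, synthesis_hop) using the documented defaults.

    hop_divisor is a hypothetical knob: 2 matches the frame_length // 2
    default documented for ola().
    """
    # Distance between consecutive frames in the output signal.
    synthesis_hop = frame_length // hop_divisor
    # Distance between consecutive frames in the input signal.
    # Assumption: the product is truncated to a whole number of samples.
    analysis_hop = int(speed * synthesis_hop)
    return analysis_hop, synthesis_hop


def effective_speed(analysis_hop, synthesis_hop):
    """Speed ratio implied by a pair of hops (why analysis_hop overrides speed)."""
    return analysis_hop / synthesis_hop
```

With ola()'s default frame_length of 256 and speed=0.5, this gives an analysis hop of 64 and a synthesis hop of 128: the procedure advances 64 samples in the input for every 128 samples it produces, so the output is half as fast.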

audiotsm.wsola(channels, speed=1.0, frame_length=1024, analysis_hop=None, synthesis_hop=None, tolerance=None)

Returns a TSM object implementing the WSOLA (Waveform Similarity-based Overlap-Add) time-scale modification procedure.

In most cases, you should not need to set the frame_length, the analysis_hop, the synthesis_hop, or the tolerance. If you want to fine-tune these parameters, see the documentation of the AnalysisSynthesisTSM class for what the first three represent.

WSOLA works in the same way as OLA, except that it allows the position of each analysis frame to be shifted slightly (by at most tolerance samples).

Parameters:
  • channels (int) – the number of channels of the input signal.
  • speed (float, optional) – the speed ratio by which the speed of the signal will be multiplied (for example, if speed is set to 0.5, the output signal will be half as fast as the input signal).
  • frame_length (int, optional) – the length of the frames.
  • analysis_hop (int, optional) – the number of samples between two consecutive analysis frames (speed * synthesis_hop by default). If analysis_hop is set, the speed parameter will be ignored.
  • synthesis_hop (int, optional) – the number of samples between two consecutive synthesis frames (frame_length // 2 by default).
  • tolerance (int, optional) – the maximum number of samples that the analysis frame can be shifted.
Returns: an audiotsm.base.tsm.TSM object
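To illustrate the role of tolerance, here is a minimal sketch of the shift search, assuming candidate frames are scored by a plain (unnormalised) cross-correlation against a target frame. best_shift and its arguments are hypothetical, and the library's actual similarity measure may differ.

```python
def best_shift(signal, nominal_start, target, tolerance):
    """Return the shift in [-tolerance, tolerance] whose frame best matches target.

    signal        -- sequence of samples (one channel)
    nominal_start -- index where the analysis frame would start without shifting
    target        -- frame the chosen analysis frame should resemble
    tolerance     -- maximum shift, in samples, in either direction
    """
    frame_length = len(target)
    best_delta, best_score = 0, float("-inf")
    for delta in range(-tolerance, tolerance + 1):
        start = nominal_start + delta
        if start < 0 or start + frame_length > len(signal):
            continue  # candidate frame would fall outside the signal
        frame = signal[start:start + frame_length]
        # Cross-correlation at this shift: larger means more similar waveforms.
        score = sum(a * b for a, b in zip(frame, target))
        if score > best_score:
            best_delta, best_score = delta, score
    return best_delta
```

On a periodic signal, the search tends to pick the shift that realigns the analysis frame with the phase of the target, which is what lets WSOLA avoid the discontinuities that plain OLA introduces.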

audiotsm.phasevocoder(channels, speed=1.0, frame_length=2048, analysis_hop=None, synthesis_hop=None)

Returns a TSM object implementing the phase vocoder time-scale modification procedure.

In most cases, you should not need to set the frame_length, the analysis_hop or the synthesis_hop. If you want to fine-tune these parameters, see the documentation of the AnalysisSynthesisTSM class for what they represent.

Parameters:
  • channels (int) – the number of channels of the input signal.
  • speed (float, optional) – the speed ratio by which the speed of the signal will be multiplied (for example, if speed is set to 0.5, the output signal will be half as fast as the input signal).
  • frame_length (int, optional) – the length of the frames.
  • analysis_hop (int, optional) – the number of samples between two consecutive analysis frames (speed * synthesis_hop by default). If analysis_hop is set, the speed parameter will be ignored.
  • synthesis_hop (int, optional) – the number of samples between two consecutive synthesis frames (frame_length // 4 by default).
Returns: an audiotsm.base.tsm.TSM object

TSM Object

The audiotsm.base.tsm module provides an abstract class for real-time audio time-scale modification procedures.

class audiotsm.base.tsm.TSM

An abstract class for real-time audio time-scale modification procedures.

If you want to use a TSM object to run a TSM procedure on a signal, you should use the run() method in most cases.

clear()

Clears the state of the TSM object, making it ready to be used on another signal (or another part of a signal).

This method should be called before processing a new file, or seeking to another part of a signal.

flush_to(writer)

Writes as many output samples as possible to writer, assuming that no more samples will be added to the input (i.e. that the read_from() method will not be called again), and returns the number of samples that were written.

Parameters: writer – an audiotsm.io.base.Writer.
Returns: a tuple (n, finished), with:
  • n – the number of samples that were written to writer
  • finished – a boolean that is True when there are no samples remaining to flush.
Return type: (int, bool)

get_max_output_length(input_length)

Returns the maximum number of samples that will be written to the output given the number of samples of the input.

Parameters: input_length (int) – the number of samples of the input.
Returns: the maximum number of samples that will be written to the output.

read_from(reader)

Reads as many samples as possible from reader, processes them, and returns the number of samples that were read.

Parameters: reader – an audiotsm.io.base.Reader.
Returns: the number of samples that were read from reader.

run(reader, writer, flush=True)

Runs the TSM procedure on the content of reader and writes the output to writer.

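Parameters:
  • reader – an audiotsm.io.base.Reader.
  • writer – an audiotsm.io.base.Writer.
  • flush (bool, optional) – whether to flush the remaining output samples to writer once the content of reader has been processed (True by default).

set_speed(speed)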

Sets the speed ratio.

Parameters: speed (float) – the speed ratio by which the speed of the signal will be multiplied (for example, if speed is set to 0.5, the output signal will be half as fast as the input signal).

write_to(writer)

Writes as many result samples as possible to writer.

Parameters: writer – an audiotsm.io.base.Writer.
Returns: a tuple (n, finished), with:
  • n – the number of samples that were written to writer
  • finished – a boolean that is True when there are no samples remaining to write. In this case, the read_from() method should be called to add new input samples, or, if there are no remaining input samples, the flush_to() method should be called to get the last output samples.
Return type: (int, bool)
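
When run() is not suitable, for instance when input arrives in real time, the call protocol described for read_from(), write_to() and flush_to() can be sketched as the loop below. process_stream is a hypothetical helper, and the reader.empty attribute is assumed to exist on audiotsm.io.base.Reader, which is not documented in this section.

```python
def process_stream(tsm, reader, writer):
    """Drive a TSM object by hand, following the documented call protocol."""
    finished = False
    # Alternate between feeding input and draining output until the reader
    # is exhausted (assumed `empty` attribute) and write_to() reports that
    # nothing more can be written.
    while not (finished and reader.empty):
        tsm.read_from(reader)
        _, finished = tsm.write_to(writer)

    # No more input will be added: flush the last output samples.
    finished = False
    while not finished:
        _, finished = tsm.flush_to(writer)

    # Reset the internal state so the object can be reused on another signal.
    tsm.clear()
```

In a real-time setting, set_speed() could be called between iterations of the first loop to change the speed ratio while the signal is being processed.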