Intuitive ways of understanding Dynamic Time Warping

Via edit distances and equivalence

Published in

TDS Archive

5 min readJul 3, 2021

Dynamic Time Warping (DTW), despite its sci-fi name, is nothing more than a (particularly useful) technique to measure the distance between two time series, x=(x1,...,xn)and y=(y1,...,ym).

‘Dynamic’ because the algorithm involves dynamic programming
‘Time’ because it is for time series
‘Warping’ because it can be interpreted as calculating the ‘usual’ distance between warped versions of x and y

The mechanics of how to calculate it and various visualisations are already available (e.g. by Roman Tavenard on their blog, by Jeremy Zhang on Medium, and Thales Sehn Korting on YouTube), but what is missing from these explanations is a deeper intuition and a sense that you could have come up with the distance yourself.

My intention here is to present a couple of explanations, via edit distances and equivalence, that will provide you with a more intuitive understanding of what is going on in DTW. You may even be able to take these ideas to construct your own notions of distance!

To help explain, I will use the examplex=(0,1,1,-2,0) and y=(0,0,1,-1) and illustrate why the DTW distance between these two series is 1.

DTW via Edit Distances

Edit distances are commonly used in Natural Language Processing and in DNA analysis. The wikipedia article on Edit Distances gives a good description:

Edit distance is a way of quantifying how dissimilar two strings are to one another by counting the minimum number of operations required to transform one string into the other

The standard operations are: deleting a character, inserting a character, and replacing one character with another, and each one contributes equally to distance.

DTW can be considered as a form of edit distance, in which the operations are as follows:

Can insert or remove a duplicate of an entry in an adjacent space. This has zero cost. E.g. going from (0,1,1,-2,0) to (0,1,1,-2,-2,0) by duplicating the -2 entry would be a zero cost operation. Or going from (0,1,1,-2,0) to (0,1,-2,0), by removing a 1 would be a zero cost operation.
Can edit an entry. If you edit a to b, the cost is the distance between a and b. E.g. going from (0,1,1,-2,0) to (0,1,5,-2,0) by editting the 1 to a 5 would be an operation of cost 4.
Can insert or delete a zero to the end of a series. This has zero cost. E.g. going from (0,1,1,-2,0) to (0,1,1,-2) by removing the 0 at the end would be a zero cost operation.

The DTW distance from x to y would be the lowest cost way of editting x to y using these three operations. Note the reason for the third operation is to handle series of different lengths neatly.

As a mini-exercise/challenge, trying showing that the distance between x=(0,1,1,-2,0) and y=(0,0,1,-1) equals 1 using edit distances.

…

No really, I recommend giving it a go. You learn more by actively engaging then passively reading somebody else’s answer!

…

Here is how I would get from x to y. (Note there may be more than one sequence of edits that achieve the minimum distance):

Start at x=(0,1,1,-2,0)
Then (0,0,1,1,-2,0), by copying the 0. Cost is 0
Then (0,0,1,-2,0), by removing a duplicate 1. Cost is 0
Then (0,0,1,-1,0), by editting the -2 to a -1. Cost is 1
Then(0,0,1,-1), which is y, by removing 0 at the end. Cost is 0

So total cost is 1, and hence the DTW distance is 1.

Note that this gives some idea for what warping consists of. If you were to plot these time series on a chart:

Adding or removing duplicates corresponds to locally squashing or stretching things horizontally
Editting an entry corresponds to locally squashing or stretching things vertically

DTW via Equivalence

Another approach is to ask: what differences between x and y do I want the distance to capture, and which differences do I want the distance to ignore?

There are several ways of phrasing things more formally:

how can I vary x and y without affecting the distance between them
what are the invariances of the distance function
which time series to I consider to have a distance 0 from x
which sequences do I consider to be equivalent to x. This is the perspective I will use.

Once you establish a notion of equivalence, we can then define the distance between x and y as follows:

Let X be the set of all series that are equivalent to x
Let Y be the set of all series that are equivalent to y
Let d be some ‘usual’ notion of distance, e.g. Euclidean distance d(x,y) = sqrt(sum (xi-yi)^2) or distance from L1-norm d(x,y) = sum |xi-yi|.
Then the distance between x and y is defined to be the minimum of d(x', y') where x' is from X and y' is from Y.

This is the general framework for how you can use equivalence to create notions of distance. (Note this is related to how one measures the distance between sets of points, e.g. in clustering, if you know how to measure distance between individual points).

To get the DTW distance using this equivalence framework:

we say two series are equivalent if you can get from one to another by adding/removing duplicate entries or by adding/removing 0’s at the end
d is the distance from the L1-norm, so d(x,y) = sum |xi — yi|

Again, as a mini-exercise, try using this perspective to show that the DTW distance between x=(0,1,1,-2,0) and y=(0,0,1,-1) is 1.

…

And again, I really do urge you to try! If you’re not sure what to do, then it means my explanation is not great; let me know if this is the case and I can improve for the future!

…

This is how I would get the distance of 1:

(0,1,1,-2,0) is equivalent to x'=(0,0,1,1,-2,0)
(0,0,1,-1) is equivalent to y'=(0,0,1,1,-1,0)
Distance between x' and y' is 0+0+0+0+1+0 = 1

Concluding notes

I have provided two different explanations of what the DTW algorithm is doing:

Determining the minimum edit distance, given a certain bunch of possible operations
Determining the minimum L1-norm distance between equivalent versions of the time series

Lastly, I should admit that my answers to the mini-exercises missed an important detail: I showed how it is possible to get 1 but I did not give any explicit argument that this is the smallest possible distance! For the specific example, this is not too hard, but in general this would be the toughest part of determining the distance.

However, that is where the ‘dynamic’ part comes in! What is nice about the DTW algorithm is that it gives you a compact and simple algorithm to calculate the minimums for you.

TDS Archive

Intuitive ways of understanding Dynamic Time Warping

Via edit distances and equivalence

DTW via Edit Distances

DTW via Equivalence

Concluding notes

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Published in TDS Archive

Written by Lovkush Agarwal

Responses (1)