Intuitive ways of understanding Dynamic Time Warping
Via edit distances and equivalence
Dynamic Time Warping (DTW), despite its sci-fi name, is nothing more than a (particularly useful) technique to measure the distance between two time series, x=(x1,...,xn)
and y=(y1,...,ym)
.
- ‘Dynamic’ because the algorithm involves dynamic programming
- ‘Time’ because it is for time series
- ‘Warping’ because it can be interpreted as calculating the ‘usual’ distance between warped versions of
x
andy
The mechanics of how to calculate it and various visualisations are already available (e.g. by Roman Tavenard on their blog, by Jeremy Zhang on Medium, and Thales Sehn Korting on YouTube), but what is missing from these explanations is a deeper intuition and a sense that you could have come up with the distance yourself.
My intention here is to present a couple of explanations, via edit distances and equivalence, that will provide you with a more intuitive understanding of what is going on in DTW. You may even be able to take these ideas to construct your own notions of distance!
To help explain, I will use the examplex=(0,1,1,-2,0)
and y=(0,0,1,-1)
and illustrate why the DTW distance between these two series is 1.
DTW via Edit Distances
Edit distances are commonly used in Natural Language Processing and in DNA analysis. The wikipedia article on Edit Distances gives a good description:
Edit distance is a way of quantifying how dissimilar two strings are to one another by counting the minimum number of operations required to transform one string into the other
The standard operations are: deleting a character, inserting a character, and replacing one character with another, and each one contributes equally to distance.
DTW can be considered as a form of edit distance, in which the operations are as follows:
- Can insert or remove a duplicate of an entry in an adjacent space. This has zero cost. E.g. going from
(0,1,1,-2,0)
to(0,1,1,-2,-2,0)
by duplicating the-2
entry would be a zero cost operation. Or going from(0,1,1,-2,0)
to(0,1,-2,0)
, by removing a1
would be a zero cost operation. - Can edit an entry. If you edit
a
tob
, the cost is the distance betweena
andb
. E.g. going from(0,1,1,-2,0)
to(0,1,5,-2,0)
by editting the1
to a5
would be an operation of cost 4. - Can insert or delete a zero to the end of a series. This has zero cost. E.g. going from
(0,1,1,-2,0)
to(0,1,1,-2)
by removing the0
at the end would be a zero cost operation.
The DTW distance from x
to y
would be the lowest cost way of editting x
to y
using these three operations. Note the reason for the third operation is to handle series of different lengths neatly.
As a mini-exercise/challenge, trying showing that the distance between x=(0,1,1,-2,0)
and y=(0,0,1,-1)
equals 1
using edit distances.
…
…
…
No really, I recommend giving it a go. You learn more by actively engaging then passively reading somebody else’s answer!
…
…
…
Here is how I would get from x
to y
. (Note there may be more than one sequence of edits that achieve the minimum distance):
- Start at
x=(0,1,1,-2,0)
- Then
(0,0,1,1,-2,0)
, by copying the0
. Cost is 0 - Then
(0,0,1,-2,0)
, by removing a duplicate1
. Cost is 0 - Then
(0,0,1,-1,0)
, by editting the-2
to a-1
. Cost is 1 - Then
(0,0,1,-1)
, which is y, by removing0
at the end. Cost is 0
So total cost is 1, and hence the DTW distance is 1.
Note that this gives some idea for what warping consists of. If you were to plot these time series on a chart:
- Adding or removing duplicates corresponds to locally squashing or stretching things horizontally
- Editting an entry corresponds to locally squashing or stretching things vertically
DTW via Equivalence
Another approach is to ask: what differences between x
and y
do I want the distance to capture, and which differences do I want the distance to ignore?
There are several ways of phrasing things more formally:
- how can I vary
x
andy
without affecting the distance between them - what are the invariances of the distance function
- which time series to I consider to have a distance 0 from
x
- which sequences do I consider to be equivalent to
x
. This is the perspective I will use.
Once you establish a notion of equivalence, we can then define the distance between x
and y
as follows:
- Let
X
be the set of all series that are equivalent tox
- Let
Y
be the set of all series that are equivalent toy
- Let
d
be some ‘usual’ notion of distance, e.g. Euclidean distanced(x,y) = sqrt(sum (xi-yi)^2)
or distance from L1-normd(x,y) = sum |xi-yi|
. - Then the distance between
x
andy
is defined to be the minimum ofd(x', y')
wherex'
is fromX
andy'
is fromY
.
This is the general framework for how you can use equivalence to create notions of distance. (Note this is related to how one measures the distance between sets of points, e.g. in clustering, if you know how to measure distance between individual points).
To get the DTW distance using this equivalence framework:
- we say two series are equivalent if you can get from one to another by adding/removing duplicate entries or by adding/removing 0’s at the end
d
is the distance from the L1-norm, sod(x,y) = sum |xi — yi|
Again, as a mini-exercise, try using this perspective to show that the DTW distance between x=(0,1,1,-2,0)
and y=(0,0,1,-1)
is 1.
…
…
…
And again, I really do urge you to try! If you’re not sure what to do, then it means my explanation is not great; let me know if this is the case and I can improve for the future!
…
…
…
This is how I would get the distance of 1:
(0,1,1,-2,0)
is equivalent tox'=(0,0,1,1,-2,0)
(0,0,1,-1)
is equivalent toy'=(0,0,1,1,-1,0)
- Distance between
x'
andy'
is0+0+0+0+1+0 = 1
Concluding notes
I have provided two different explanations of what the DTW algorithm is doing:
- Determining the minimum edit distance, given a certain bunch of possible operations
- Determining the minimum L1-norm distance between equivalent versions of the time series
Lastly, I should admit that my answers to the mini-exercises missed an important detail: I showed how it is possible to get 1 but I did not give any explicit argument that this is the smallest possible distance! For the specific example, this is not too hard, but in general this would be the toughest part of determining the distance.
However, that is where the ‘dynamic’ part comes in! What is nice about the DTW algorithm is that it gives you a compact and simple algorithm to calculate the minimums for you.