Martingales

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

We start with some definitions:

1. Stochastic processes

A stochastic process is a sequence of random variables X 0 , X 1 , ..., typically indexed either by ℕ (a discrete-time stochastic process) or ℝ (a continuous-time stochastic process; sometimes ℝ+ if we don't consider times less than 0). Interpretation: A random process that evolves over time. Common examples are martingales (described below), and Markov processes, where the distribution of X i+1 depends only on X i and not on any previous states.

2. Sigma-algebras

A sigma-algebra (or σ-algebra) is a family of sets that includes a maximal universe Ω and the empty set ∅, and is closed under complement with respect to Ω and under countable unions. The elements of this family are called measurable sets. Any probability space includes a σ-algebra that describes which events are assigned probabilities in the space. For discrete spaces this is often all subsets of Ω. For continuous spaces a restricted σ-algebra is needed to avoid paradoxes; the usual choice for e.g. [0,1] is the Lebesgue-measurable sets.

For our purposes the useful property of σ-algebras is that they can be used to represent restricted information. For example, suppose I roll two six-sided dice but only tell you what the sum of the two dice is. What I see is a probability space with 36 elements (all outcomes of the two dice). What you see is a random variable with only 11 values (the possible sums 2..12). You can detect whether some events occurred ("is the sum even?") but not others ("is the first die 3?"). The collection of events that you can decide includes both ∅ and Ω, and is closed under complement and union, so it's a σ-algebra. The structure of this σ-algebra is that it includes a set A if and only if whenever A contains some point ω = (x,y), it also contains all other points ω' = (x',y') with x+y = x'+y'.

We say that a random variable X is measurable with respect to a σ-algebra ℱ (or just measurable ℱ for short) if the event [X≤x] is measurable (i.e., included in the σ-algebra) for all x. For discrete random variables this just means that any event of the form X∈S is measurable. One way to get a σ-algebra is to take a random variable X and consider the smallest σ-algebra with respect to which X is measurable; this is called the σ-algebra generated by X and is written σ(X). In general, we can also consider σ-algebras generated by any collection of events or random variables; in each case we are looking at the smallest σ-algebra with respect to which all of these events or random variables are measurable. In the two-dice example, with X and Y the values of the two dice, the σ-algebra we defined is just σ(X+Y). If I tell you the value of the first die as well, we get σ(X, X+Y), which happens to include every event in the underlying probability space. We could also have σ-algebras like σ(X), σ(Y), or σ(XY) that correspond to different information.

For discrete probability spaces, we can think of a σ-algebra as consisting of all unions of a set of elementary events or atoms, which are events with the property that no proper subset except ∅ is also contained in the σ-algebra. In this case, a random variable X is measurable ℱ if and only if it is constant on each atom of ℱ. The atoms of ℱ form a partition of Ω; each outcome ω∈Ω lies in exactly one atom. As a special case, the σ-algebra generated by a discrete X has as its atoms all nonempty sets of the form X⁻¹(x).
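To make the atom structure concrete, here is a small Python sketch (my illustration, not part of the original notes) that builds the atoms of the σ-algebra generated by the sum of two dice and tests events for measurability by checking whether they are unions of atoms:

    from itertools import product

    omega = set(product(range(1, 7), repeat=2))   # all 36 outcomes (x, y)

    # Atoms of the sigma-algebra generated by the sum: the preimages of each sum.
    atoms = {}
    for (x, y) in omega:
        atoms.setdefault(x + y, set()).add((x, y))

    def measurable(event):
        # An event is in the sigma-algebra iff it is a union of atoms, i.e.
        # every atom lies entirely inside or entirely outside the event.
        return all(a <= event or a.isdisjoint(event) for a in atoms.values())

    sum_is_even = {(x, y) for (x, y) in omega if (x + y) % 2 == 0}
    first_die_is_3 = {(x, y) for (x, y) in omega if x == 3}

    print(measurable(sum_is_even))     # True: decidable from the sum alone
    print(measurable(first_die_is_3))  # False: needs information the sum lacks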

2.1. Filtrations

With a stochastic process, it is natural to talk not only about the value of the process at time t (which is just the random variable X t ), but also about what we know at time t. The usual assumption is that we never forget anything, so at minimum we can decide any event in the σ-algebra σ(X 0 ...X t ) generated by the values of the process so far. If we have additional information (perhaps there is some underlying process used to generate the X i , and we can observe details of this process that are not necessarily included in the value of the X i ), then we have a sequence of nested σ-algebras ℱ 0 ⊆ ℱ 1 ⊆ ℱ 2 ⊆ ... . Any such sequence is called a filtration; a stochastic process {X t } is adapted to a filtration {ℱ t } if X t is measurable ℱ t for all t.

2.2. Conditional expectations

We've previously seen the definition of E[X|A] = ∑ x x Pr[X=x|A], the expectation of X conditioned on an event A. We can also define the expectation E[X|Y] of X conditioned on a random variable Y, or in even more generality the expectation E[X|ℱ] of X conditioned on a σ-algebra ℱ. In the first case, we define E[X|Y] = E[X|Y=y] when Y=y; unlike ordinary conditional expectation, which is always a constant, E[X|Y] is a random variable that may take on a different value for each value of y (but is constant if we fix the value of Y). In a discrete probability space, E[X|ℱ] is defined by considering the atoms A of ℱ and defining E[X|ℱ] = E[X|A] whenever we land in A. It is not hard to see that this is equivalent to E[X|Y] if Y generates ℱ.

For continuous spaces, E[X|ℱ] is a little trickier to define, and may not even exist if X is unbounded. One way to define it is to observe that the set of all real-valued random variables defined on a given probability space forms a vector space (given any random variables X and Y, aX+bY is also a random variable), and that the subset of random variables that are measurable ℱ is a subspace of this vector space. So we can define E[X|ℱ] as the closest point to X in the subspace of random variables that are measurable ℱ. In practice, we are never going to actually do this.

Conditional expectations satisfy all the usual properties of expectations, so E[aX+bY|ℱ] = aE[X|ℱ] + bE[Y|ℱ], X≥Y ⇒ E[X|ℱ] ≥ E[Y|ℱ], etc. One very useful fact that only comes up because of the conditioning is that if ℱ 1 ⊆ ℱ 2 , then E[X|ℱ 1 ] = E[E[X|ℱ 2 ]|ℱ 1 ]; in other words, when computing an expectation it doesn't matter if we average all at once or in stages, as long as we end up averaging across the same elementary events. A complement to this fact is that if X is measurable ℱ, it acts like a constant: E[X|ℱ] = X and E[XY|ℱ] = X E[Y|ℱ]. (We won't prove these facts here; see e.g. GrimmetStirzaker.)
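Since the discrete definition is just averaging over atoms, facts like the staged-averaging identity E[E[X|ℱ 2 ]|ℱ 1 ] = E[X|ℱ 1 ] are easy to check numerically. A small Python sketch (my own; the random variable and σ-algebras chosen are arbitrary):

    from itertools import product
    from fractions import Fraction

    omega = list(product(range(1, 7), repeat=2))  # uniform on 36 outcomes

    def cond_exp(X, generator):
        # E[X | sigma(generator)]: average X over each atom; the result is a
        # random variable that is constant on every atom.
        atoms = {}
        for w in omega:
            atoms.setdefault(generator(w), []).append(w)
        avg = {g: Fraction(sum(X(w) for w in ws), len(ws)) for g, ws in atoms.items()}
        return lambda w: avg[generator(w)]

    X = lambda w: w[0] * w[1]    # an arbitrary random variable: product of the dice
    fine = lambda w: w[0]        # generates F2 = sigma(first die)
    coarse = lambda w: w[0] % 2  # generates the coarser F1 = sigma(parity of first die)

    lhs = cond_exp(cond_exp(X, fine), coarse)    # E[E[X|F2]|F1]
    rhs = cond_exp(X, coarse)                    # E[X|F1]
    print(all(lhs(w) == rhs(w) for w in omega))  # True: averaging in stages agrees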

3. Martingales

So now let's imagine we have a stochastic process {X t } (what we have at time t) adapted to a filtration {ℱ t } (what we know at time t). We can think of this process as describing the state of our finances after we've been playing in a casino for a while. If the casino is perfectly fair (unlike what happens in real life), then each bet we place should have an expected return of 0. In other words, if we want to estimate, based on what we know at time t, how much money we will have at time t+1, our best guess would be X t . Formally, a stochastic process as above is a martingale if E[X t+1 |ℱ t ] = X t . Often we replace ℱ t with the σ-algebra generated by X 0 ...X t and write this as E[X t+1 |X 0 ...X t ] = X t . The useful property of martingales is that we can verify the martingale property locally, by proving either that E[X t+1 |ℱ t ] = X t or equivalently that E[X t+1 - X t |ℱ t ] = E[X t+1 |ℱ t ] - X t = 0. But this local property has strong consequences that apply across long intervals of time, as we will see below.

3.1. Examples of martingales

Let X t+1 = X t ± b t , where +b t and -b t occur with equal probability, b t is measurable ℱ t , and the outcome ±b t is measurable ℱ t+1 (in other words, my "bet" b t can only depend on what has happened so far and not on the future, but my knowledge ℱ t includes the outcome of all past bets). Then (X t ,ℱ t ) is a martingale. In general, if Y t+1 - Y t = b t (X t+1 - X t ) where (X t ,ℱ t ) is a martingale and b t is measurable ℱ t , then Y t is also a martingale with respect to ℱ t . Proof: E[Y t+1 - Y t |ℱ t ] = E[b t (X t+1 - X t )|ℱ t ] = b t E[X t+1 - X t |ℱ t ] = b t ⋅0 = 0.

Special case: A random ±1 walk is a martingale.
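As a sanity check, here is a rough simulation (mine, not the notes'; the particular strategy is an arbitrary choice) of the betting transform applied to this walk: whatever ℱ t -measurable bets we place, the average final fortune stays at 0.

    import random

    def play(strategy, rounds=20):
        # One session: Y accumulates b_t * (X_{t+1} - X_t) for a fair +-1 game.
        y, history = 0, []
        for _ in range(rounds):
            b = strategy(history)           # b_t sees only the past (measurable F_t)
            delta = random.choice([-1, 1])  # the fair increment X_{t+1} - X_t
            y += b * delta
            history.append(delta)
        return y

    strategy = lambda h: 2 if (h and h[-1] < 0) else 1   # double after a loss
    trials = 100_000
    print(sum(play(strategy) for _ in range(trials)) / trials)  # close to 0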

Another random walk: Let X t+1 = X t ± 1 with equal probability, and let Y t = X t ² - t. Then E[Y t+1 |ℱ t ] = ½(X t +1)² + ½(X t -1)² - (t+1) = X t ² + 1 - (t+1) = X t ² - t = Y t . So {Y t } is a martingale.

Doob's martingale: Let Z be any random variable and {ℱ t } any filtration. Let X t = E[Z|ℱ t ] (suppose that all such expectations exist). Then E[X t+1 |ℱ t ] = E[E[Z|ℱ t+1 ]|ℱ t ] = E[Z|ℱ t ] = X t ; so (X t ,ℱ t ) is a martingale.

3.2. Properties of martingales

The defining property of a martingale is that E[X t+1 |ℱ t ] = X t . What if we want to compute E[X t ] without conditioning, or E[X t+k |ℱ t ], where k > 1?

We can easily show by induction on k that, for k≥0, E[X t+k |ℱ t ] = X t . The base is that when k = 0, E[X t+k |ℱ t ] = E[X t |ℱ t ] = X t since {X t } is adapted to {ℱ t } and thus X t is measurable ℱ t . For larger k, compute E[X t+k |ℱ t ] = E[E[X t+k |ℱ t+k-1 ]|ℱ t ] = E[X t+k-1 |ℱ t ] = X t . What about E[X t ], with no conditioning? Here we observe E[X t ] = E[E[X t |ℱ 0 ]] = E[X 0 ]. In other words, martingales never go anywhere, at least in expectation.

We can apply this to the martingales we've seen so far:

For an arbitrary betting strategy on a fair game, I neither make nor lose money on average. So if I have a strategy that makes me $1 by time t with probability p, then I must lose $x with probability (1-p), where E[X t ] = p - (1-p)x = 0 implies x = p/(1-p). This means that I can (by doubling my bet each time I lose, for example) make $1 with probability 1 - 10^-6 if I am willing to accept a probability of 10^-6 of losing 10^6 - 1 dollars. This is probably a bad strategy for ordinary people, but it works out well for insurance companies, who charge a small additional premium to make a profit above the zero expected return.

A random walk that runs for t steps starting at position 0 satisfies E[X t ² - t] = 0, or E[X t ²] = t.
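Here is a quick simulation (my own) of the doubling strategy truncated at 20 rounds, illustrating the trade-off: the player ends up +$1 with probability 1 - 2^-20 and down 2^20 - 1 dollars otherwise, for an expectation of exactly 0.

    import random

    def doubling(max_rounds=20):
        bet, total = 1, 0
        for _ in range(max_rounds):
            if random.random() < 0.5:
                return total + bet   # first win: up exactly $1 overall
            total -= bet
            bet *= 2                 # double the stake after each loss
        return total                 # lost every round: -(2**max_rounds - 1)

    trials = 10**6
    wins = sum(doubling() == 1 for _ in range(trials))
    print(wins / trials)   # about 1 - 2**-20: almost always +1, rarely ruinous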


4. Martingales as sums of uncorrelated random variables

Let (X t ,ℱ t ) be a martingale with X 0 = 0. Define, for t>0, Δ t = X t - X t-1 . Then we can write X t as ∑ i≤t Δ i , where E[Δ i |Δ 1 ...Δ i-1 ] = 0, since E[X i - X i-1 |ℱ i-1 ] = 0 and σ(Δ 1 ...Δ i-1 ) ⊆ ℱ i-1 . In other words, Δ i is uncorrelated with Δ 1 ...Δ i-1 . This is not quite as good a condition as independence, but is good enough that martingales often act very much like sums of independent random variables.

4.1. Variance of Xt

For example, suppose we want to compute Var[X t ] = E[(X t - E X t )²] = E[X t ²] (recall that we are assuming X 0 = 0, so E X t = 0 by the martingale property). We have Var[X t ] = Var[∑ i≤t Δ i ] = ∑ i,j≤t Cov(Δ i ,Δ j ) = ∑ i≤t Var[Δ i ] + 2 ∑ i<j≤t Cov(Δ i ,Δ j ) = ∑ i≤t Var[Δ i ], since for i < j we have Cov(Δ i ,Δ j ) = E[Δ i Δ j ] = E[E[Δ i Δ j |ℱ j-1 ]] = E[Δ i E[Δ j |ℱ j-1 ]] = 0. So the variance of X t is just the sum of the variances of its increments, exactly as if they were independent.

4.2. The Azuma-Hoeffding inequality

Same setup as above: X 0 = 0, Δ t = X t - X t-1 , with |Δ t | ≤ c t for all t. Then

Pr[X t ≥ x] ≤ exp(-x²/(2 ∑ i≤t c i ²)).

Variants:

If we don't assume X 0 = 0, this becomes a bound on X t - X 0 . Since {-X t ,ℱ t } is also a martingale, the same bound applies to the other tail, which gives the symmetric bound Pr[X t - X 0 ≤ -x] ≤ exp(-x²/(2∑c i ²)). Applying the union bound gives Pr[|X t - X 0 | ≥ x] ≤ 2 exp(-x²/(2∑c i ²)). The proof is by the usual trick of applying Markov's inequality to E[exp(αX t )] for an appropriate choice of α, with some tricks in bounding E[exp(αX t )]. See Azuma-Hoeffding inequality.
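As a quick numerical illustration (mine, not from the notes): for a ±1 random walk each c i = 1, so the bound reads Pr[X t ≥ x] ≤ exp(-x²/2t). The empirical tail is in fact quite a bit smaller, since Azuma-Hoeffding is not tight here:

    import math
    import random

    t, x, trials = 100, 30, 200_000
    hits = sum(
        sum(random.choice([-1, 1]) for _ in range(t)) >= x
        for _ in range(trials)
    )
    print(hits / trials)              # empirical tail, around 0.002
    print(math.exp(-x**2 / (2 * t)))  # Azuma bound exp(-4.5), about 0.011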

4.2.1. Application: the method of bounded differences

Basic idea: Azuma-Hoeffding applies to any process that we can model as revealing the inputs x 1 ...x n to some function f(x 1 ...x n ) one at a time, provided changing any one input changes f by at most c (the Lipschitz condition). The reason is that when the Lipschitz condition holds, then E[f(x 1 ...x n ) | x 1 ...x t ] is a martingale that satisfies the requirements of the inequality. This allows us to show, for example, that the number of colors needed to color a random graph is tightly concentrated, by considering a process where x i reveals all the edges between vertex i and smaller vertices (a vertex exposure martingale).
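As a toy simulated instance (my own example, not from the notes): take f(x 1 ...x n ) to be the number of distinct values among n independent uniform draws. Changing any single draw changes f by at most 1, so the method gives Pr[|f - E f| ≥ x] ≤ 2 exp(-x²/2n).

    import math
    import random

    n, m, trials, x = 100, 365, 100_000, 20
    samples = [len({random.randrange(m) for _ in range(n)}) for _ in range(trials)]
    mean = sum(samples) / trials                    # stands in for E[f]
    tail = sum(abs(s - mean) >= x for s in samples) / trials
    print(tail)                          # empirically, essentially 0
    print(2 * math.exp(-x**2 / (2*n)))   # bounded-differences bound, about 0.27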

5. Stopping times

Let {ℱ t } be a filtration. A random variable T is a stopping time for {ℱ t } if T≥0 and the event [T≤t] is measurable ℱ t for all t. In simple terms, T is a stopping time if you know at time t whether you've stopped or not.
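For example (a sketch of mine, not from the notes), the first time a ±1 walk hits +3 is a stopping time: the decision to stop at time t looks only at the first t steps. Something like "the last time the walk visits 0" is not a stopping time, since deciding it requires peeking at the future.

    import random

    def hitting_time(target=3, limit=10_000):
        x = 0
        for t in range(1, limit + 1):
            x += random.choice([-1, 1])
            if x == target:
                return t    # decided using only the first t steps
        return None         # horizon reached without stopping

    print(hitting_time())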

Stopping times are very handy when used with martingales, because of the

Optional Stopping Theorem. If (X t ,ℱ t ) is a martingale and T is a stopping time for {ℱ t }, then E[X T ] = E[X 0 ], provided (1) Pr[T<∞] = 1, (2) E[|X T |] < ∞, and (3) lim t→∞ E[X t ⋅[T > t]] = 0.

The first condition says that T is finite with probability 1 (i.e., eventually we do stop). The second condition puts a bound on how big |X T | can get, which excludes some bad outcomes where we accept a small probability of a huge loss in order to get a large probability of a small gain. The third says that the contribution of large values of t to E[X T ] goes to zero as we consider larger and larger t; the term [T > t] is the indicator variable for the event that T is larger than t.

It would be nice if we could show E[X T ] = E[X 0 ] without the side conditions, but in general this isn't true. For example, the double-or-win martingale strategy eventually returns 1 with probability 1, so if T is the time we stop playing, we have Pr[T<∞] = 1, E[|X T |] < ∞, but E[X T ] = 1 ≠ E[X 0 ] = 0. Here we have E[X t ⋅[T>t]] = -(2^t - 1)(1/2)^t = -1 + 2^-t for t>0, which converges to -1 rather than 0, so it's the third condition that's violated. (The same thing happens for a simple ±1 random walk that stops at +1, but it's a bit harder to calculate E[X t ⋅[T>t]] in this case.)

To prove the optional stopping theorem, it helps to start with a simpler version:

Optional Stopping Theorem (baby version). If (X t ,ℱ t ) is a martingale and T is a stopping time for {ℱ t }, then for any n∈ℕ, E[X min(T,n) ] = E[X 0 ]. Proof: Define Y t = X min(T,t) . Then (Y t ,ℱ t ) is a martingale, since Y t+1 - Y t = [T > t](X t+1 - X t ) and we can treat the indicator [T > t] = 1 - [T ≤ t], which is measurable ℱ t , as a bet. But then E[X min(T,n) ] = E[Y n ] = E[Y 0 ] = E[X 0 ].

So now we'll prove the full version by considering E[X min(T,n) ] and showing that, under the conditions of the theorem, it approaches E[X T ] as n goes to infinity. First observe that, for any n, X T = X min(T,n) + [T>n](X T - X n ), because either T ≤ n, and we just get X T , or T > n, and we get X n + (X T - X n ) = X T . Now take expectations: E[X T ] = E[X min(T,n) ] + E[[T>n]X T ] - E[[T>n]X n ]. Condition (3) of the theorem gives lim n→∞ E[[T>n]X n ] = 0. If we can show that the middle term also vanishes in the limit, we are done. Here we use condition (2). Observe that E[[T>n]X T ] = ∑ i>n E[X i ⋅[T=i]]. Compare this with E[X T ] = ∑ i≥0 E[X i ⋅[T=i]]; by condition (2) this series converges absolutely, so the sum of the terms for i = 0..n converges to E[X T ] in the limit. But this means that the sum of the remaining terms for i = n+1..∞ converges to zero. So the middle term goes to zero as n goes to infinity. We thus have E[X T ] = lim n→∞ (E[X min(T,n) ] + E[[T>n]X T ] - E[[T>n]X n ]) = lim n→∞ E[X min(T,n) ] = E[X 0 ]. This completes the proof.

Using the full-blown optional stopping theorem is a pain in the neck, because conditions (2) and (3) are often hard to test directly. So in practice one generally chooses one of several weaker but easier variants:

Optional Stopping Theorem (bounded time). If (X t ,ℱ t ) is a martingale and T is a stopping time for {ℱ t } with T ≤ n always, for some fixed n, then E[X T ] = E[X 0 ]. Proof: X T = X min(T,n) .

Optional Stopping Theorem (bounded range). If (X t ,ℱ t ) is a martingale and T is a stopping time for {ℱ t } with Pr[T<∞] = 1 and |X t | ≤ M always, for some fixed M and all t, then E[X T ] = E[X 0 ]. Proof: Use the full OST, with (1) given, (2) implied by |X T | ≤ M, and (3) following from |E[X t ⋅[T>t]]| ≤ E[|X t |⋅[T>t]] ≤ M Pr[T>t] → 0 since Pr[T<∞] = 1.

Optional Stopping Theorem (bounded increments). If (X t ,ℱ t ) is a martingale and T is a stopping time for {ℱ t } where Pr[T<∞] = 1, E[T] < ∞, and |X t - X t-1 | ≤ c always, for all t, then E[X T ] = E[X 0 ]. Proof: Here we go back into the original proof of the OST, but there is a simpler argument that E[(X T - X n )[T > n]] converges to 0. The idea is that |E[(X T - X n )[T>n]]| = |E[∑ t≥n (X t+1 - X t )[T>t]]| ≤ E[∑ t≥n |X t+1 - X t |⋅[T>t]] ≤ E[∑ t≥n c⋅[T>t]] = c ∑ t≥n Pr[T>t]. Now use the fact that E[T] = ∑ t Pr[T>t] < ∞ to see that this is again the tail of a convergent series, and thus converges to zero.

5.1. Applications

Let X t be a ±1 random walk that starts at 0 and stops if it reaches +a or -b. Let T be the time at which the process stops. We can easily show that Pr[T<∞] = 1 and E[T] < ∞ by observing that from any state of the random walk, there is a probability of at least 2^-(a+b) that it stops within a+b steps (by flipping heads a+b times in a row), so that if we consider a sequence of intervals of length a+b, the expected number of such intervals we can have before we stop is at most 2^(a+b). We have bounded increments by the definition of the process (bounded range also works). So E[X T ] = E[X 0 ] = 0, and the probability p of landing on a instead of -b must satisfy pa - (1-p)b = 0, giving p = b/(a+b).

Let X t and T be as above, and let Y t = X t ² - t, which we previously showed to be a martingale. Again we have bounded increments (but not bounded range!), since X t ² ≤ max(a,b)² for all t ≤ T and t only changes by 1 at each step. So E[Y T ] = 0, which gives E[T] = E[X T ²]. But we can calculate E[X T ²]: it is a² Pr[X T = a] + b² Pr[X T = -b] = a²(b/(a+b)) + b²(a/(a+b)) = (a²b + ab²)/(a+b) = ab.
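Both answers are easy to check by simulation. A quick sketch (mine, not from the notes; the values a = 3, b = 5 are arbitrary):

    import random

    def run(a=3, b=5):
        # Walk until we hit +a or -b; report which end and how long it took.
        x, t = 0, 0
        while -b < x < a:
            x += random.choice([-1, 1])
            t += 1
        return x == a, t

    a, b, trials = 3, 5, 100_000
    results = [run(a, b) for _ in range(trials)]
    print(sum(hit for hit, _ in results) / trials)  # about b/(a+b) = 0.625
    print(sum(t for _, t in results) / trials)      # about ab = 15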

6. Submartingales and supermartingales

Sometimes we can't guarantee E[X t+1 |ℱ t ] = X t , but we can nonetheless use X t as a lower or upper bound on E[X t+1 |ℱ t ]. An adapted process (X t ,ℱ t ) is a submartingale if X t ≤ E[X t+1 |ℱ t ] for all t and a supermartingale if X t ≥ E[X t+1 |ℱ t ]. Note that the quantity that is "sub" (below) or "super" (above) is always where we are now: so submartingales tend to go up over time while supermartingales tend to go down. If a process is both a submartingale and a supermartingale, it's a martingale. Sub/super-martingales are handy when we can't set up an exact martingale, but we don't mind because we only care about one-sided bounds anyway. Most of the properties we have seen for martingales hold for sub/super-martingales if we replace equality with ≤ or ≥: for example, in a submartingale, E[X 0 ] ≤ E[X t ] for all t.


6.1. The Doob-Meyer decomposition

The easiest way to get these results in general is to write a submartingale (X t ,ℱ t ) as X t = Y t + Z t , where Y t is a martingale and Z t is a non-decreasing predictable process with respect to ℱ t ; this means that Z t+1 is always measurable ℱ t . The existence of this decomposition (known as the Doob decomposition, or sometimes the Doob-Meyer decomposition among people who get tired of naming everything after Doob) is easily shown: define Z 0 = 0, Z t+1 = Z t + E[X t+1 - X t |ℱ t ], and Y t = X t - Z t . Then X t is trivially equal to Y t + Z t , and it's not hard to see that Z t is both non-decreasing and predictable (the non-decreasing part is just the submartingale property). To show that Y t is a martingale, compute E[Y t+1 |ℱ t ] = E[X t+1 - Z t+1 |ℱ t ] = E[X t+1 |ℱ t ] - Z t - E[X t+1 - X t |ℱ t ] = X t - Z t = Y t .

For supermartingales, the same decomposition works, but now Z t is non-increasing.
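As a quick worked instance (using the random-walk martingale from section 3.1): if S t is a ±1 random walk started at 0, then X t = S t ² is a submartingale, and the decomposition gives

    Z t+1 = Z t + E[S t+1 ² - S t ²|ℱ t ] = Z t + E[±2 S t + 1|ℱ t ] = Z t + 1,

so Z t = t and Y t = X t - Z t = S t ² - t, which is exactly the martingale we exhibited earlier.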

6.2. Azuma-Hoeffding inequality for sub- and super-martingales

We can use the existence of the Doob decomposition to prove results about a submartingale or supermartingale X t even if we don't know how to compute it. For example, suppose (X t ,ℱ t ) is a submartingale with bounded increments, that is, that |X t - X t-1 | ≤ c t for fixed constants c t and all t. Decompose X t as Y t + Z t as above. From |X t - X t-1 | ≤ c t we have |(Y t + Z t ) - (Y t-1 + Z t-1 )| ≤ c t , or |(Y t - Y t-1 ) + (Z t - Z t-1 )| ≤ c t . We want to get a bound on |Y t - Y t-1 |; first we get a bound on Z t - Z t-1 by observing that E[Y t - Y t-1 |ℱ t-1 ] = 0 implies that Y t - Y t-1 ≥ 0 with nonzero probability (while Z t - Z t-1 is measurable ℱ t-1 and thus fixed). Suppose that this event occurs; then everything inside the absolute value is non-negative, and we have (Y t - Y t-1 ) + (Z t - Z t-1 ) ≤ c t , which gives 0 ≤ Z t - Z t-1 ≤ c t (if we also take into account the fact that Z t is non-decreasing). But then Y t - Y t-1 must lie between -2c t and c t , or we violate the constraints on X t . So we can apply the Azuma-Hoeffding inequality to Y t with the constants 2c t , which gives Pr[Y t - Y 0 ≤ -x] ≤ exp(-x²/(8∑c i ²)). But since X 0 = Y 0 and X t ≥ Y t , this gives the same bound for Pr[X t - X 0 ≤ -x]. A similar bound on Pr[X t - X 0 ≥ x] holds for supermartingales. Note that in each case only one side of the usual Azuma-Hoeffding bound holds. We can't say much about how fast a submartingale with bounded increments rises (or a supermartingale falls), because it could be that Z t accounts for nearly all of X t .

6.3. Sub- and super-martingales from functions of martingales

Suppose (X t ,ℱ t ) is a martingale, and let f be a convex function. Then Jensen's inequality says that E[f(X t+1 )|ℱ t ] ≥ f(E[X t+1 |ℱ t ]) = f(X t ). In other words, when f is convex, (f(X t ),ℱ t ) is a submartingale. Where convex functions turn martingales into submartingales, concave functions turn martingales into supermartingales. This can be very handy if you want to establish quickly that some sequence is a sub/super-martingale.

Example: Let f(x) = |x|. Then f is convex, so if (X t ,ℱ t ) is a martingale, (|X t |,ℱ t ) is a submartingale. This means, for example, that if we start a ±1 random walk at some random location X 0 , then without knowing anything about the distribution of X 0 we can nonetheless conclude that for any fixed t, E[|X t |] ≥ E[|X 0 |]. This fact is not as useful as one might think for getting bounds: given a martingale, it is almost always better to work with the martingale directly and then apply Jensen's inequality or f afterwards to get the desired bound.

6.4. Supermartingales and recurrences

When we solve a probabilistic recurrence, we get an upper bound on the work that remains in each state. If we have such a bound, we can get a supermartingale that may allow us to prove concentration bounds on the cost of our algorithm. Let C t be the cost we've paid so far, and Y t be an upper bound on the expected cost remaining, with ℱ t the usual σ-algebra describing everything that has happened up to time t. Then we have Y t ≥ E[C ∞ - C t |ℱ t ]. We can reverse this by finding a non-negative Y t such that E[Y t - Y t+1 |ℱ t ] ≥ E[C t+1 - C t |ℱ t ], so that the change in Y t pays for the cost of each step. If this holds, then for X t = Y t + C t we have E[X t+1 - X t |ℱ t ] = E[Y t+1 - Y t + C t+1 - C t |ℱ t ] ≤ 0, and (X t ,ℱ t ) is a supermartingale. Now let T be the time at which the algorithm terminates, and suppose that one of the conditions for the Optional Stopping Theorem holds for T and (X t ,ℱ t ) (typical cases are that there is a fixed upper bound on the total time for the algorithm or that there is a fixed upper bound on the cost; if neither holds, we may have to do a little bit of work). Then E[C T ] ≤ E[X T ] ≤ E[X 0 ] = E[Y 0 ], using the non-negativity of Y T , the supermartingale property, and C 0 = 0. But since we now have a supermartingale instead of just a recurrence, we may be able to get stronger bounds by using Azuma-Hoeffding.

6.5. Example: QuickSort

For example, let's take QuickSort. We previously calculated (see RandomizedAlgorithms) that 2 n ln n is a bound on the expected number of comparisons needed to sort an array of n elements. Let's try to turn this into a supermartingale. Recall that QuickSort takes an unsorted array and recursively partitions it into smaller unsorted arrays, terminating when each array has only one element. Let's imagine that we do these partitions in parallel; i.e., at each time step we'll partition every block we've got left with more than one element. Since each such step reduces the size of the largest unsorted block, we finish after at most n such steps. We expect each block of size n i to take no more than 2 n i ln n i comparisons to complete. So we will use the sum ∑ 2 n i ln n i over all blocks remaining at time t as our estimate Y t of the remaining time. If we can show that the change in Y t pays for the number of comparisons at step t on average, then the supermartingale property will hold for C t + Y t and we'll get E[C ∞ ] ≤ Y 0 = 2 n ln n.

Let's consider the partition of a single block of size n i . The cost of the partition is n i - 1 comparisons. The expected change in Y t contributed by this particular block is E[2 n i ln n i - (2 k ln k + 2 (n i -k-1) ln (n i -k-1))], where k is uniform in the range 0..n i -1 (we assume that all elements are distinct, so that the pivot ends up in a block by itself). Since both k and n i -k-1 are uniform over 0..n i -1, the subtracted terms have expected value (4/n i ) ∑ k<n i k ln k ≤ (4/n i ) times the integral of x ln x dx from 1 to n i , which is at most 2 n i ln n i - n i + 1. So the potential drops on average by at least n i - 1, which pays for the n i - 1 comparisons. We thus get that {C t + ∑ 2 n i ln n i } is a supermartingale, and that the expected number of comparisons to sort n elements is at most 2 n ln n.
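As an empirical check (my own sketch, not part of the notes), we can count comparisons directly and compare against 2 n ln n:

    import math
    import random

    def comparisons(arr):
        # Count comparisons made by randomized QuickSort on distinct elements.
        if len(arr) <= 1:
            return 0
        pivot = random.choice(arr)
        left = [v for v in arr if v < pivot]
        right = [v for v in arr if v > pivot]
        return len(arr) - 1 + comparisons(left) + comparisons(right)

    n, trials = 1000, 200
    avg = sum(comparisons(list(range(n))) for _ in range(trials)) / trials
    print(avg, 2 * n * math.log(n))   # average is comfortably below 2 n ln n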

We already knew this. Can we learn anything else?

6.5.1. A mediocre high-probability bound using Azuma-Hoeffding

Let's try to get a bound on how much X t = C t + Y t changes at each step. We can compute C t+1 - C t = ∑ (n i - 1) ≤ n, and Y t+1 - Y t ≤ 0, so X t+1 - X t ≤ n. In the other direction, when we split a block of size n i into two blocks of size k and n i - k - 1, the smallest value we can get for 2 k ln k + 2 (n i -k-1) ln (n i -k-1) is when the split is exactly even (to prove this, use convexity of the potential function). In this case the potential drops from 2 n i ln n i to 4 ((n i -1)/2) ln ((n i -1)/2) = 2 (n i -1) ln ((n i -1)/2). Since (n i -1)/2 ≥ n i /4 for n i ≥ 2, this is at least 2 (n i -1)(ln n i - 2 ln 2), giving a drop of at most 2 n i ln n i - 2 (n i -1)(ln n i - 2 ln 2) = 2 ln n i + 4 (n i -1) ln 2 ≤ 3 n i (a crude bound, which is a sign that our final bound is not going to be tight). Summing over all blocks, with ∑ n i ≤ n, gives a total drop of at most 3n per step, so |X t+1 - X t | ≤ 3n. Now plug this into Azuma-Hoeffding (the one-sided version for supermartingales) to get Pr[C n - 2 n ln n ≥ x] ≤ exp(-x²/(8n(3n)²)) = exp(-x²/72n³), which starts getting small for x ≥ √(72n³), or approximately x ≥ 9 n^(3/2). This is a pretty weak bound, although it's much better than we get with just Markov's inequality.

(There are much better ways to do this.)

CategoryRandomizedAlgorithmsNotes
