Longest Increasing Subsequence, from Bruno Ombreux

December 8, 2006

An interesting undergraduate paper looks at the New York phone directory. And no, it is not about Benford's law. It is about subsequences.

The same approach can be applied to markets. For this, we will consider random mappings between two ordered finite sets: time [1, 2, 3, …, t] and ranked asset returns [r1, r2, r3, …, rt].
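As a small illustration of that mapping, here is a sketch in R. The five returns are made-up numbers, and the names `returns`, `time_index` and `mapping` are chosen for this example only. Ranking the returns turns them into a permutation of the time index, which is the object the subsequence analysis works on.

```r
# Hypothetical daily returns, for illustration only
returns <- c(0.012, -0.004, 0.007, -0.021, 0.003)

# Ordered set 1: time
time_index <- seq_along(returns)

# Ordered set 2: ranked returns (1 = smallest return)
ranks <- rank(returns)

# The mapping time -> rank is a permutation of 1..t when there are no ties
mapping <- data.frame(time = time_index, ret = returns, rank = ranks)
print(mapping)
```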

If the ordering of returns is random, the length of the longest increasing subsequence, suitably centered and scaled, should converge to a Tracy-Widom distribution. It is a beautiful distribution that seems rather ubiquitous. Unfortunately, it is also relatively new, with few implementations. There is some code in S-Plus, but it relies on an S-Plus function, ivp.ab, with no obvious equivalent in R. If someone knows of an R module that solves an ODE at point B, given the solution at point A, please let me know.
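For reference, the limit is usually stated as below for a uniform random permutation of n items (the Baik-Deift-Johansson result). The symbols L_n, F_2 and q are notation introduced here, not taken from the S-Plus code. The second line is presumably why an initial-value ODE solver is needed: the distribution function is expressed through a solution of the Painlevé II equation.

```latex
% L_n: length of the longest increasing subsequence
%      of a uniform random permutation of n items
\[
  \lim_{n\to\infty}
  \Pr\!\left( \frac{L_n - 2\sqrt{n}}{n^{1/6}} \le s \right) = F_2(s),
  \qquad
  F_2(s) = \exp\!\left( -\int_s^{\infty} (x - s)\, q(x)^2 \, dx \right),
\]
% q: the solution of Painleve II that matches the Airy function at +infinity
\[
  q''(x) = x\, q(x) + 2\, q(x)^3,
  \qquad
  q(x) \sim \mathrm{Ai}(x) \quad \text{as } x \to \infty .
\]
```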

So we won’t be checking the asymptotic behavior of any asset. Instead, we will content ourselves with studying the longest increasing subsequence in subsets of length 15, for which exact frequencies are provided in the New York directory article. It is easy to extend the study to other lengths by generating random sequences in R.
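Extending to other lengths by simulation could look like the sketch below. It is not the code used for this post: `lis_length`, `n_sim` and `sim_freq` are names made up for the example, and the seed and number of simulations are arbitrary. The simulated frequencies only approximate the exact ones.

```r
# Length of the longest strictly increasing subsequence (patience sorting)
lis_length <- function(x) {
  tops <- numeric(0)              # top card of each pile, kept increasing
  for (v in x) {
    pos <- sum(tops < v) + 1      # leftmost pile whose top is >= v
    tops[pos] <- v                # replace that top, or start a new pile
  }
  length(tops)
}

set.seed(1)                       # arbitrary seed, for reproducibility
N <- 15                           # subset length; change to study other lengths
n_sim <- 5000                     # number of random permutations (assumption)

# LIS length of each random permutation of 1..N
sim_len <- replicate(n_sim, lis_length(sample(N)))

# Simulated frequency of each possible length 1..N
sim_freq <- tabulate(sim_len, nbins = N) / n_sim
print(round(sim_freq, 4))
```

With N set to 15, `sim_freq` can stand in for the exact frequencies when those are not at hand.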

We’re using the daily DJIA since 1896 for illustrative purposes, but as explained often on DailySpec, this doesn’t make sense. It would be better to look at more recent returns, preferably intraday, because many observations are required.

Intuitively, the behavior of the longest increasing subsequence could be useful to know. For instance, if it is longer than randomness dictates, it means that big drops are followed by smaller drops, and big rises by bigger rises, more often than would occur in a random walk. This evokes a “U” shape where it would make sense to buy sharp drops.

Please note that the successive drops or rises in a subsequence don’t necessarily occur on consecutive days. Subsequences are not defined by consecutive returns. They are an entirely different concept from runs. Instead, they are defined by returns distributed all over the interval. We are looking at some form of local market curvature. But it is a distributed curvature, not a continuous one. This kind of stuff is not captured by the usual tests and certainly not by the eyeball.
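To make the difference with runs concrete, here is a small sketch on a made-up zigzag series. The helper names `lis_length` and `longest_run_up` are introduced for this example only. The series never rises more than one step in a row, yet it contains an increasing subsequence of length 5 spread over the whole interval.

```r
# Length of the longest strictly increasing subsequence (patience sorting)
lis_length <- function(x) {
  tops <- numeric(0)
  for (v in x) {
    pos <- sum(tops < v) + 1      # leftmost pile whose top is >= v
    tops[pos] <- v
  }
  length(tops)
}

# Length (in observations) of the longest run of consecutive increases
longest_run_up <- function(x) {
  r <- rle(diff(x) > 0)
  ups <- r$lengths[r$values]
  if (length(ups) == 0) 1 else max(ups) + 1
}

# Made-up zigzag series: up, down, up, down, ...
x <- c(1, 5, 2, 6, 3, 7, 4, 8)

longest_run_up(x)   # 2: no rise lasts more than one step
lis_length(x)       # 5: e.g. 1, 2, 3, 4, 8, taken from non-adjacent positions
```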

A chart of realized frequencies against exact frequencies seems to indicate that actual longest subsequences are longer than theoretical ones. At the 5% level, a Kolmogorov-Smirnov goodness-of-fit test rejects the null hypothesis in favor of non-randomness.

Two-sample Kolmogorov-Smirnov test:

data: LN0 and Sexact
D^+ = 0.4667, p-value = 0.03813
alternative hypothesis: greater

Buying sharp drops is a good thing … But that’s for 15-day subsets only and needs more work.

The R code is appended. Peer review is needed, given my propensity for counting mistakes.

#
#   Longest increasing subsequences
#

# Patience sorting: the number of piles returned is the length of the
# longest strictly increasing subsequence

patience <- function(x, s=NULL)
    {
    s[1] <- x[1]
    for (i in seq_along(x)[-1])
        {
        # put x[i] on the leftmost pile whose top is >= x[i]
        for (j in 1:length(s))
            {
            if (x[i] <= s[j]) {s[j] <- x[i]; break}
            }
        # no such pile: start a new one on the right
        if (x[i] > s[length(s)]) {s[j+1] <- x[i]}
        }
    return(s)
    }

# data loading

S15exact <- scan("S15exact.txt")
testdata <- read.table("testdata.txt", sep=",", header=TRUE)
testdata <- diff(log(testdata$close))

# to change if need be, eg replacing S15exact by random numbers frequencies

data <- testdata
Sexact <- S15exact

# data ranking, data split, Q samples of length N

N <- 15
Q <- trunc(length(data)/N)

data <- data[1:(Q*N)]
data <- rank(data)
datasplit <- 1:Q
datasplit <- rep(datasplit, each=N)
samples <- split(data, datasplit)

# increasing subsequences

LN <- numeric()
for (i in 1:Q)
    {
    samples[[i]] <- patience(samples[[i]])
    LN[[i]] <- length(samples[[i]])
    }
# count the samples at each length (length, not sum: summing would
# weight each count by the subsequence length itself)
LN <- tapply(LN, factor(LN), length)

# adding increasing subsequences with zero observed frequencies
# there must be a more elegant way

LN0 <- tapply(rep(0,N), factor(1:N), sum)
for (i in 1:dim(LN0))
    {
    for (j in 1:dim(LN))
        {
        if (names(LN0)[i] == names(LN)[j]) {LN0[[i]] = LN[[j]]}
        }
    }
LN0 <- array(LN0)/sum(LN0)

# graphs

plot(LN0, type="l", col="brown")
lines(Sexact, col="red")

# Kolmogorov-Smirnov

kstest <- ks.test(LN0, Sexact, alternative = c("greater"))
kstest
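On the "more elegant way" of padding with zero frequencies: one possibility is base R's `tabulate`, which counts integer values into a fixed number of bins and returns zero for bins that never occur. The sketch below uses a made-up vector of subsequence lengths in place of the real `LN`, so the numbers mean nothing in themselves.

```r
# Hypothetical longest-subsequence lengths for six samples (illustration only)
LN <- c(5, 6, 6, 7, 5, 6)
N <- 15

# Count occurrences of each length 1..N; lengths never observed get 0
counts <- tabulate(LN, nbins = N)

# Relative frequencies, same shape as the exact frequency vector
LN0 <- counts / sum(counts)
print(LN0)

# Equivalent, keeping the lengths as names
counts_named <- table(factor(LN, levels = 1:N))
```

Either form replaces both the `tapply` count and the double loop.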

Bruno later adds:

I did a bit more research on Tracy-Widom this weekend. Among a great many other things, it also measures the probability of explosion for a random walk with drift. This makes a lot of sense intuitively. Explosion, the right-hand side bar in a “U” shape, and longest increasing subsequences are more or less the same thing.

The fact that it measures the probability of explosion for a random walk ‘with drift’ is reassuring. I was a bit worried about the possibility of capturing only positive drift, and not adding any new information, since subsequence computation is drift-independent.
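The drift-independence can be checked directly. The subsequence length depends only on the ranks of the returns, and adding a constant to every return leaves the ranks unchanged. A minimal sketch on simulated returns follows; `lis_length`, the seed and the drift value are arbitrary choices for this example.

```r
# Length of the longest strictly increasing subsequence (patience sorting)
lis_length <- function(x) {
  tops <- numeric(0)
  for (v in x) {
    pos <- sum(tops < v) + 1      # leftmost pile whose top is >= v
    tops[pos] <- v
  }
  length(tops)
}

set.seed(42)                      # arbitrary seed
x <- rnorm(15)                    # simulated returns, not market data
drift <- 0.5                      # arbitrary constant drift per period

# Ranks are unchanged by a constant shift ...
same_ranks <- identical(rank(x), rank(x + drift))

# ... and so is the longest increasing subsequence length
same_lis <- lis_length(rank(x)) == lis_length(rank(x + drift))

print(c(same_ranks = same_ranks, same_lis = same_lis))
```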
