As hard as 10 lines of code…

bool ceq(const char *s, const char *t, std::size_t n) {
  int i=0, j=0, size=static_cast<int>(n), k;
  while(i<n && j<size) {
    for(k=0; k<size && s[(i+k)%size]==t[(j+k)%size]; ++k);
    if(k==size) return true;
    if(s[(i+k)%size]>t[(j+k)%size])
      i+=k+1;
    else
      j+=k+1;
  }
  return false;
}
Given two strings s and t, this function returns true if and only if s and t are equivalent under circular shift. s and t are equivalent under circular shift if there is an index k such that for any integer x\ge 0, s[x \mod n]=t[(k+x) \mod n].

It is easy to see that the worst case running time of this function is linear: The asymptotic running time is bounded by the total number of iterations executed by the inner “for” loop. In each iteration, the value of k will be increased and ultimately, the increment will be accumulated into either variable i or j. When either i or j reaches n, the function will return.

Why do I post this small piece of program? It is a perfect example to show a program can be short but still not easy to understand. It is also one of my favorite examples to convince people “loop invariant proof” is a good way to understand a program — if it is not the only way. You may load the program into your favorite debugger, and check closely how each single line is executed; you may feed it with all kind testing cases, and verify every output. But you simply cannot get enough insight to convince yourself why the program is correct. This is what I believe in: Program should be proven to be correct; Testing should only be used to catch typos.

How to prove this program is correct? Let us introduce several notations first.
1. Given string s of length of n>0, define s^{(i)} = s[i, i+1,\ldots n-1, 0,\ldots i-1].
2. Define F(s)= \{s^{(i)}, 0\le i <n\} . That is, the set of strings generated by shifting s.
3. For a string s and a set of strings T, we call s beats T if there is a string t\in T such that s>t.
4. For two set of strings S and T, we call S beats T if for any string s\in S, there exists a string t\in T such that s>t.

Claim: s and t are equivalent if and only if  neither F(s) beats F(t) nor F(t) beats F(s). (Hints: If s and t are equivalent, then F(t)= F(s). So the minimum string in F(s) cannot beat F(t) vice versa.)

Our program holds such invariant: At the beginning of each iteration of the “while” loop, for any x<i, s^{(x)} beats F(t); for any y<j, t^{(y)} beats F(s). We will skip the proof of the invariant since it is trivial. With this invariant, when the while loop stops we have either:
1. k==size: in this case, the two string are equivalent. (return true)
2. i>=size or j>=size: in this case, either F(s) beats F(t) or F(t) beats F(s) , so s and t are not equivalent by the claim.  (return false)

P.S. The function returns false for empty strings (n=0) and one may argue that true should be returned instead.

Advertisements
This entry was posted in Programming. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s