Proving a more general theorem is often easier

Sean Seefried

Sat 26 Feb 2022

Introduction

When programming, I often “generalise from the specific”. I find it easier to work with more concrete concepts first and only later to generalise my solution. At least that’s the way I program in programming-only languages, that is, languages in which I can’t also prove the correctness of my programs.

A simple example of this style of programming might be that I’ll write a function to sort a list of integers before going on to generalise it to sort a list of anything that can be ordered.

But I’ve discovered something counter-intuitive. When you’re also interested in proving your programs correct, generalising later can actually be harder. In my experience proofs on specific structures can be a lot harder than proofs on more general structures. If this is true, in general, then it makes sense to do the hard work of generalising one’s programs up-front because it will make proving them correct easier. Further, I suspect the combined work of a) writing the program and proving it correct will be less than b) writing a specific program, trying to prove it, failing, generalising the program and finally proving it correct.

This disparity in difficulty seems counter-intuitive. I suspect it’s just a consequence of the fact that general structures, lacking concreteness, are defined by their mathematical properties and these mathematical properties are useful in proving correctness.

In this post I’d like to show you an example of a specific program that was hard to prove correct but became much easier once I had generalised it.

Bijections that permute a finite set

I defined a permutation as a bijection from Fin n to Fin n for some n. For convenience I renamed Fin to 𝔽 because I really value conciseness and it has echoes of the ℕ value defined in module Data.Nat.

  Perm : ℕ → Set
  Perm n = 𝔽 n ↔ 𝔽 n

In Agda a bijection is constructed from:

a function going in the forward direction: f
a function going in the backwards direction: f⁻¹
a proof that f⁻¹ ∘ f ≡ id i.e. that f⁻¹ is a left-inverse of f
a proof that f ∘ f⁻¹ ≡ id i.e. that f⁻¹ is a right-inverse of f

Introducing `+1-mod-n` and `-1-mod-n`

To start I thought I’d play around with a very simple permutation that, in the forward direction, evaluates to the successor of its input modulo n.

  +1-mod-n : {n : ℕ} → 𝔽 n → 𝔽 n
  +1-mod-n {ℕ.suc n} m with n ℕ.≟ toℕ m
  ... | no m≢n  = suc (lower₁ m m≢n)
  ... | yes _ = zero

And here is its inverse:

  -1-mod-n : {n : ℕ} → 𝔽 n → 𝔽 n
  -1-mod-n {ℕ.suc n} zero = fromℕ n
  -1-mod-n {ℕ.suc n} (suc m) = inject₁ m

I liked the definition for -1-mod-n much better than the definition of +1-mod-n, but couldn’t think of an alternative way to define the latter. I then got stuck into trying to prove that -1-mod-n is the left-inverse of +1-mod-n.

The proof took me many hours but eventually I came up with the following. It’s a very ugly proof and the only reason I have posted it here is so that you can have the appropriate “ugh”-reaction.

  i≡j⇒j<i+1 : ∀ {i j } → i ≡ j → j ℕ.< ℕ.suc i
  i≡j⇒j<i+1 {i} {j} i≡j =
    begin
      ℕ.suc j
    ≡⟨ cong ℕ.suc (≡-sym i≡j) ⟩
      ℕ.suc i
    ∎
    where
      open Relation.Binary.PropositionalEquality renaming (sym to ≡-sym)
      open ℕ.≤-Reasoning

  open ≡-Reasoning

  left-inverse′ : (n : ℕ) → (x : 𝔽 n) → -1-mod-n (+1-mod-n x) ≡ x
  left-inverse′ ℕ.zero ()
  left-inverse′ (ℕ.suc ℕ.zero) zero = refl
  left-inverse′ (ℕ.suc (ℕ.suc n′)) x
              with ℕ.suc n′ ℕ.≟ toℕ x
  ...  | no n+1≢x with x
  ...               | zero = refl
  ...               | suc x =
      begin
        -1-mod-n (suc (lower₁ (suc x) n+1≢x))
      ≡⟨⟩
        inject₁ (lower₁ (suc x) n+1≢x)
      ≡⟨  inject₁-lower₁ (suc x) n+1≢x ⟩
        suc x
      ∎
  left-inverse′ (ℕ.suc (ℕ.suc n)) x
       | yes n+1≡x =
     begin
       -1-mod-n zero
     ≡⟨⟩
       fromℕ (ℕ.suc n)
     ≡⟨ fromℕ-def (ℕ.suc n) ⟩
       fromℕ< n+1<n+2
     ≡⟨ fromℕ<-cong (ℕ.suc n) (toℕ x) n+1≡x n+1<n+2 (i≡j⇒j<i+1 n+1≡x) ⟩
        fromℕ< (i≡j⇒j<i+1 n+1≡x )
     ≡⟨ fromℕ<-toℕ x (i≡j⇒j<i+1 n+1≡x)  ⟩
       x
     ∎
     where
       n+1<n+2 : ℕ.suc n ℕ.< ℕ.suc (ℕ.suc n)
       n+1<n+2 = ℕ.s≤s (ℕ.s≤s (ℕ.≤-reflexive refl))

I also tried to prove that -1-mod-n was the right inverse of +1-mod-n but got stuck almost straight away and gave up.

A generalisation of the pattern of permutation

If you look carefully there is an obvious generalisation of the pattern of permutation staring us in the face. Imagine all the possible values of Fin n written down, in order, as a list:

[ 0, 1, ..., n - 1 ]

Now imagine splitting this list into two sub-lists:

[ 0, 1, ... n - 2 ] [ n - 1]

Now swap the two sub-lists:

[n - 1] [ 0, 1, ... n - 2 ]

Now join them again:

[ n - 1, 0, 1, ... n - 2 ]

Now, call this list P and imagine that it is indexed by the values of 𝔽 n. We find that P[i] ≡ +1-mod-n i for all i ∈ 𝔽 n.

It turns out that there are a number of functions in modules Data.Fin and Data.Sum that do something very similar to the split-swap-join exercise we just went through above.

The signatures of these functions are:

splitAt : ∀ m {n} → Fin (m ℕ.+ n) → Fin m ⊎ Fin n

swap : A ⊎ B → B ⊎ A

join : ∀ m n → Fin m ⊎ Fin n → Fin (m ℕ.+ n)

The behaviour of join is worth looking at in more detail.

join : ∀ m n → Fin m ⊎ Fin n → Fin (m ℕ.+ n)
join m n = [ inject+ n , raise {n} m ]′

where [_,_]′ is defined as follows:

[_,_]′ : (A → C) → (B → C) → (A ⊎ B → C)
[_,_]′ = [_,_]

and

[_,_] : ∀ {C : A ⊎ B → Set c} →
        ((x : A) → C (inj₁ x)) → ((x : B) → C (inj₂ x)) →
        ((x : A ⊎ B) → C x)
[ f , g ] (inj₁ x) = f x
[ f , g ] (inj₂ y) = g y

Simplifying these means that join could be rewritten as:

  join m n (inj₁ x) = inject+ n x
  join m n (inj₂ y) = raise {n} m y

The type signatures of the two functions on the RHS of the definition above are:

inject+ : ∀ {m} n → Fin m → Fin (m ℕ.+ n)

raise : ∀ {m} n → Fin m → Fin (n ℕ.+ m)

Function inject+ takes a Fin m and “injects” it into the Fin (m ℕ.+ n) type without changing the structure of the value. e.g. For value suc (suc zero) : Fin 3, inject+ (suc (suc zero)) evaluates to suc (suc zero) : Fin 5 for m = 3 and n = 2.

Function raise is a little different to inject+. It adds n to its argument while increasing the size of the type to Fin (n ℕ.+ m). e.g. raise {3} 2 (suc (suc zero)) evaluates to suc (suc (suc (suc zero))) : Fin 5.

We can now now define a function called splitPermute that performs the split-swap-join process that I described earlier.

  splitPermute : (m : ℕ) {n : ℕ} → (𝔽 (m + n) → 𝔽 (n + m))
  splitPermute m {n} = join n m ∘ swap ∘ splitAt m

A really nice thing about splitPermute is that for a given m + n the inverse of splitPermute m is splitPermute n.

Remember function +1-mod-n from earlier? It is the special case of splitPermute where n = 1.

Yet the proof that splitPermute n is the left inverse of splitPermute m is much simpler than the proof that -1-mod-n is the left inverse of +1-mod-n.

The proof makes use of the following three theorems from Data.Fin.Properties and Data.Sum.Properties

splitAt-join : ∀ m n i → splitAt m (join m n i) ≡ i

swap-involutive : swap {A = A} {B = B} ∘ swap ≗ id

join-splitAt : ∀ m n i → join m n (splitAt m i) ≡ i

And here is the proof!

  inverse : {m n : ℕ} → splitPermute n ∘ splitPermute m ≗ id
  inverse {m} {n} =
    begin
      (splitPermute n ∘ splitPermute m)
    ≡⟨⟩
      (join m n ∘ swap ∘ splitAt n) ∘ (join n m ∘ swap ∘ splitAt m)
    ≡⟨⟩
      (join m n ∘ swap ∘ splitAt n ∘ join n m ∘ swap ∘ splitAt m)
    ≈⟨ cong-[ join m n ∘ swap ]∘⟨ splitAt-join n m ⟩∘[ swap ∘ splitAt m ] ⟩
      (join m n ∘ swap ∘ swap ∘ splitAt m)
    ≈⟨ cong-[ join m n ]∘⟨ swap-involutive ⟩∘[ splitAt m ] ⟩
      (join m n ∘ splitAt m)
    ≈⟨ join-splitAt m n ⟩
      id
    ∎
    where
      open import Relation.Binary.PropositionalEquality
      open import Relation.Binary.Reasoning.Setoid (𝔽 (m + n) →-setoid 𝔽 (m + n))

If you’re confused about the use of cong-[_]∘⟨_⟩∘[_] above. It’s just a nice helper function I wrote to help with setoid reasoning on extensional equality (_≗_). It is defined as follows:

  cong-[_]∘⟨_⟩∘[_] :
    {a : Level} {A′ A B B′ : Set a}
    → (h : B → B′)
    → {f g : A → B}
    → f ≗ g
    → (h′ : A′ → A)
    → h ∘ f ∘ h′ ≗ h ∘ g ∘ h′
  cong-[_]∘⟨_⟩∘[_] h {f} {g} f≗g h′ = λ x → cong h (f≗g (h′ x))
    where
      open Relation.Binary.PropositionalEquality using (cong)

Agda’s ability to define mixfix operators using the _ character really shon here. See how nice the fragment below looks? The parts between the square brackets remain untouched while the law between the angle brackets (⟨⟩) applies a law, in this case: swap-involutive.

      (join m n ∘ swap ∘ swap ∘ splitAt m)
    ≈⟨ cong-[ join m n ]∘⟨ swap-involutive ⟩∘[ splitAt m ] ⟩
      (join m n ∘ splitAt m)

Conclusion

In this post I showed you how proving a theorem on a more general formulation of a problem often turns out to be easier than doing it for a specific case. In this particular case you may argue that the specific case could have been defined in terms of splitAt, swap and join and you’d be correct. However, I simply didn’t spot that at the time. It was only by thinking about the general pattern of permutation that I was able to see that these functions should even be used.

One is likely to generalise a program once a specific solution is found, but proving things on general programs is often easier. Assuming this is true it makes sense to do the work of generalising up-front.

Introduction

Bijections that permute a finite set

Introducing +1-mod-n and -1-mod-n

A generalisation of the pattern of permutation

Conclusion

Introducing `+1-mod-n` and `-1-mod-n`