Legend

The diagram consists of States and Transitions.

States:
GoldStart of the recognition of a passage.
Light BlueFound a Book, Chapter or Verse.
MagentaAn intermediate state
Transitions:
n (red)a sequence of digits
w (blue)a sequence of letters
s (black)a single separator, such as . or :
r (green)a single range character, such as -
t (green)a single terminator character, such as , or ;
? - means optional

Introduction

At the heart of Bible software is a passage recognition system. Biblical references are defined in terms of books, chapters and verses. A reference may be just a book, a book and a chapter, or a book, chapter and verse. A range starts with one reference and ends with another. A passage is a collection of these references.

A passage recognition system is trivial if every reference is completely formed. A trivial system would be inflexible for the names of the books of the Bible. If an exact match is found, then the book is known. So Genesis would not be found with Gen, GENESIS, genesis, or anything else. To specify a chapter, a book name would be followed by a single space and then the chapter number. To sepecify a verse, the chapter reference would be followed by a colon, ':', and then the verse number.

The goal of a passage recognition system is to understand all passages that people would. A trivial passage recognition system is not reasonable. Some books of the Bible have more than one name. Most published references use abbreviations. Most people will abbreviate references too. And there are no universal accepted set of abbreviations.

A passage of multiple references increases complexity. For example, Gen 1-2 refers to all of Genesis chapter 1 and 2. While, Gen 1:1-2 refers to the first 2 verses of Genesis. Most people would assume that Gen 1 3 would refer to Gen 1:3.

Details, Details

Token stream

The first thing that is done with a passage is to break it on whitespace into a token stream. These are then fed into the passage recognizer, beginning at the Gold circle. So for a passage to be recognized it needs to start with a book name.

Book name recognition

The names of most of the books of the Bible are a single word, such as John. But some are several words, such as Song of Songs, which is also known as Canticle of Canticles. Some are part of a numbered series: 1 Corinthians or II Corinthians. In some languages the number is followed by a '.' as in 1. Moses. Names may be abbreviated by simple truncation, such as 1 Cor or 1co. Or abbreviated in other ways Jn for John, Jas for James, SOS for Songs. Names may also have internal abbreviations, such as Revelation of St. John. Abbreviations may or may not be followed by a '.'

Chapter recognition

When a number follows a book name, it seems fairly easy to know that it is a chapter. However, if is for a single chapter book, it probably is the verse, as in Jude 3. So Jude 1 could either all of Jude as in Jude, chapter 1, or to the first verse of Jude. As a practical matter, Jude 1 should be disambiguated to the first verse of Jude. But it is equally permissible for it to be written as Jude 1:3.

A chapter may also be prefixed with "ch" or "chap" to make it explicit that the number following is a chapter. Note that these abbreviations are fixed, but not case sensitive and are not internationalized. It may be followed by a period.

Verse recognition

When a number follows a chapter number, it almost always is a verse number. But there odd, very rare situatons where it is not, explained below.

A verse may also be prefixed with "v" or "ver" to make it explicit that the number following is a verse. Note that these abbreviations are fixed, but not case sensitive and are not internationalized. It may be followed by a period.

Ranged References

A biblical reference may specify a range using a '-' to separate two simple references. The range starts with the first verse in the first reference and ends with the last verse of the second reference. The references may be of any kind; book alone; book and chapter; or book, chapter and verse.

A special range marker is 'ff', which means to the end. It can only follow a chapter, where it means the last chapter in a book, or a verse where it means the last verse in a chapter. It may or may not be preceded by a '-'. Examples include: Gen 22-ff, Gen 22ff, Gen 2:3-FF.

Non-Contiguous References.

Also a biblical reference may contain books, chapters or verses that are not next to each other. For example Gen 1, 3, would refer to chapters 1 and 3 of Genesis. Gen 1:3,5 would refer to verses 3 and 5 of Genesis chapter 1. Gen 1:3,2:5 refers to 2 verses in 2 chapters. Gen, Deut would refer to the entire books of Genesis and Deuteronomy.

It is perfectly valid to write a range consisting of adjacent numbers as a Non-Contiguous Reference, e.g. Gen 1:4,5.

Passage Punctuation

In a passage, punctuation has the following meaning:

'-'Specifies a range
',' or ';'Terminates a reference and introduces another.
':'Separates a chapter from a verse.
'.'Separates a chapter from a verse when it follows a chapter.
Otherwise, it denotes an abbreviation.
OtherTreated as a space

Multiple spaces are treated as one. Other than '-', the others are optional.

An example of a fully formed passage: 1 Ptr 3:3-5,7;Rev 9-10:4,5,14:2

Disabmiguation

With punctuation being mostly optional, this will lead to ambiguity. Rules are needed to disambiguate. The following principles should be practiced.

A number belongs to the book name that follows, if it requires a number to be successful.
e.g. John 2 Peter 1:4 becomes John; 2 Peter 1:4 since John 2; Peter 1:4 would fail.
However, John 2 John 3:4 must be John 2, John 3:4, since 2 John only has 1 chapter.
But, John 2 John 12 could be either John; 2 John 12 or John 2; John 12. The latter will be chosen because the 2 is not necessary to go with John 12.

A number that follows a chapter could be understood as either a chapter or a verse, as in Gen 2 3. However, we will take it as the verse, becoming Gen 2:3, not Gen 2,3.

A number that follows a verse is another verse number, as Gen 2:3 4 becomes Gen 2:3,4. Unless of course it is part of the name that follows, as in Gen 2:3 3 Peter 2, which becomes Gen 2:3; 3 Pet 2.