[osis-users] Topic Maps

Patrick Durusau patrick at durusau.net
Fri Jan 15 17:05:54 MST 2010


Troy,

Troy A. Griffitts wrote:
> Patrick? Did I get your attention? :)
>
>   
Yes!
> We've all wanted to, and been talking about, marking up and disambiguating Proper Names in texts for quite a while now. I've been requested by a scholar to provide some statistical data which gives rise to this topic again.
>
> If we wanted to mark, say, place names with a disambiguity id, how would you suggest we do so?
>
> First, if I have a morphologically tagged text, if the morph system allows, I might have a designation like:
>
> <w morph="someSystem:properName">Place</w>
>
> But this will not tell me if it is a place, person, organization, etc. Even if it told me it was a place, is that place a city, river, mountain, etc?
>
> So, Topic Map, gurus... Give me some concrete examples of markup to solve this, or I'm planning to simply create a new lemma/morph system pair and mark this up like:
>
> <w
>   lemma="propnames:jerusalem1"
>   morph="propnamestypes:geo-city">Jerusalem</w>
>
> So save me from my ignorance :)
>
>   
Not a question of ignorance! Not by any means!

The question is how much information you want to store in the 
identifier that appears when you mark a reference to a subject.

Take your example:

<w 
subjectIdentifier="http://www.crosswire.org/names/jerusalem">Jerusalem</w>

Elsewhere, there is a topic in a topic map that has that same 
subjectIdentifier property, and it records that the subject it 
represents is an instance of type place, along with names for it in 
other languages and any other information you want to record about that 
subject.
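
A minimal sketch of that lookup, in Python rather than any topic map 
engine's API. The TOPICS store, its field names, the resolve helper, and 
the French and Hebrew names are all assumptions made for illustration; 
only the subjectIdentifier attribute and the crosswire.org identifier 
come from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic store, keyed by subject identifier. In a real topic
# map these records would be topics; the field names are illustrative only.
TOPICS = {
    "http://www.crosswire.org/names/jerusalem": {
        "type": "place",
        "names": {"en": "Jerusalem", "fr": "Jérusalem", "he": "ירושלים"},
    },
}


def resolve(markup: str) -> dict:
    """Return the topic record for the subject a <w> element refers to."""
    word = ET.fromstring(markup)
    return TOPICS[word.get("subjectIdentifier")]


topic = resolve(
    '<w subjectIdentifier="http://www.crosswire.org/names/jerusalem">Jerusalem</w>'
)
print(topic["type"])         # place
print(topic["names"]["fr"])  # Jérusalem
```

The markup itself carries nothing but the identifier; the type and the 
names live with the topic and can grow without touching the text.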

The key is the use of a subjectIdentifier to identify the subject. Why?

Because someone else, in another Bible project, may have:

<w 
subjectIdentifier="http://www.otherproject.org/geonames/israel/jerusalem">Jerusalem</w>

Now what?

Well, any topic can have a *set* of subjectIdentifier properties which 
signals that both subjectIdentifiers identify the same subject.
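
One way to picture that merging, as a rough Python sketch rather than a 
real topic map implementation. The dictionary layout and the 
merge_topics helper are hypothetical; the rule it illustrates is that 
topics sharing a subject identifier are treated as one topic.

```python
def merge_topics(topics):
    """Merge topic records that share at least one subject identifier."""
    merged = []  # invariant: entries never share an identifier
    for topic in topics:
        ids = set(topic["subject_identifiers"])
        names = set(topic["names"])
        remaining = []
        for other in merged:
            if ids & other["subject_identifiers"]:
                # Same subject: absorb the earlier record into this one.
                ids |= other["subject_identifiers"]
                names |= other["names"]
            else:
                remaining.append(other)
        remaining.append({"subject_identifiers": ids, "names": names})
        merged = remaining
    return merged


crosswire = {
    "subject_identifiers": ["http://www.crosswire.org/names/jerusalem"],
    "names": ["Jerusalem"],
}
other_project = {
    "subject_identifiers": [
        "http://www.otherproject.org/geonames/israel/jerusalem"
    ],
    "names": ["Jérusalem"],
}
# A topic declaring both identifiers says they identify the same subject.
mapping = {
    "subject_identifiers": [
        "http://www.crosswire.org/names/jerusalem",
        "http://www.otherproject.org/geonames/israel/jerusalem",
    ],
    "names": [],
}

result = merge_topics([crosswire, other_project, mapping])
print(len(result))  # 1
```

The third record plays the part of the topic that carries both 
identifiers; once it is present, the two projects' topics collapse into 
one, and the names recorded by each project end up on the same topic.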

(Note I have used the XTM syntax for the attributes but it would be 
possible to declare equivalent subject identifiers even if they were in 
different formats or structures. I am working on an example using XQuery 
to make that point. Probably won't be ready for a week or so. My main 
system died last night but due to disk mirroring and paying a lot of 
money, I got it back late this afternoon.)

That will allow you to disambiguate all the names as well as to add far 
more information than you could possibly put in an attribute, such as 
marking the morphology of a lemma and displaying for a user the 
distribution of that lemma over a book or range of books. (Assuming you 
represented all of those as occurrences, or even associations with 
explicit roles if you liked.)

Yes, I have been thinking about topic maps and biblical texts a lot. ;-)

Hope you are having a great day!

Patrick

-- 
Patrick Durusau
patrick at durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps) 



