[osis-users] Topic Maps
Patrick Durusau
patrick at durusau.net
Fri Jan 15 17:05:54 MST 2010
Troy,
Troy A. Griffitts wrote:
> Patrick? Did I get your attention? :)
>
>
Yes!
> We've all wanted to, and been talking about, marking up and disambiguating Proper Names in texts for quite a while now. I've been requested by a scholar to provide some statistical data which gives rise to this topic again.
>
> If we wanted to mark, say, place names with a disambiguity id, how would you suggest we do so?
>
> First, if I have a morphologically tagged text, if the morph system allows, I might have a designation like:
>
> <w morph="someSystem:properName">Place</w>
>
> But this will not tell me if it is a place, person, organization, etc. Even if it told me it was a place, is that place a city, river, mountain, etc?
>
> So, Topic Map, gurus... Give me some concrete examples of markup to solve this, or I'm planning to simply create a new lemma/morph system pair and mark this up like:
>
> <w
> lemma="propnames:jerusalem1"
> morph="propnamestypes:geo-city">Jerusalem</w>
>
> So save me from my ignorance :)
>
>
Not a question of ignorance! Not by any means!
The question is how much information you want to store in the
identifier that appears when you mark a reference to a subject.
Take your example:
<w
subjectIdentifier="http://www.crosswire.org/names/jerusalem">Jerusalem</w>
Elsewhere, there is a topic in a topic map that has that same
subjectIdentifier property, and it records that the subject it
represents is an instance of type place, along with names for it in
other languages and any other information you want to record about that
subject.
The key is the use of a subjectIdentifier to identify the subject. Why?
Because someone else, in another Bible project may have:
<w
subjectIdentifier="http://www.otherproject.org/geonames/israel/jerusalem">Jerusalem</w>
Now what?
Well, any topic can have a *set* of subjectIdentifier properties which
signals that both subjectIdentifiers identify the same subject.
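To make that concrete, here is a rough Python sketch of the idea, not a real topic map engine. The Topic class, its field names and the build_index helper are all made up for illustration; the only things taken from the examples above are the two identifier URLs and the type "place".

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical stand-in for a topic: every identifier in this set
    # is taken to identify the same subject.
    subject_identifiers: set
    topic_type: str
    names: dict = field(default_factory=dict)  # language code -> name


def build_index(topics):
    """Map each subject identifier to the topic that carries it."""
    index = {}
    for topic in topics:
        for identifier in topic.subject_identifiers:
            index[identifier] = topic
    return index


jerusalem = Topic(
    subject_identifiers={
        "http://www.crosswire.org/names/jerusalem",
        "http://www.otherproject.org/geonames/israel/jerusalem",
    },
    topic_type="place",
    names={"en": "Jerusalem"},
)

index = build_index([jerusalem])

# Markup from either project resolves to the same topic object.
a = index["http://www.crosswire.org/names/jerusalem"]
b = index["http://www.otherproject.org/geonames/israel/jerusalem"]
print(a is b, a.topic_type)
```

The markup in the text stays small (one identifier per reference), and everything else about the subject lives on the topic.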
(Note I have used the XTM syntax for the attributes but it would be
possible to declare equivalent subject identifiers even if they were in
different formats or structures. I am working on an example using XQuery
to make that point. Probably won't be ready for a week or so. My main
system died last night but due to disk mirroring and paying a lot of
money, I got it back late this afternoon.)
That will allow you to disambiguate all the names as well as to add far
more information than you could possibly put in an attribute, such as
marking the morphology of a lemma and displaying for a user the
distribution of that lemma over a book or range of books. (Assuming you
represented all of those as occurrences, or even associations with
explicit roles if you liked.)
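As a small illustration of the distribution idea, the Python sketch below treats occurrences as plain (subject identifier, book) pairs and counts them per book. The record layout, the distribution function and the book labels "BookA" and "BookB" are placeholders I made up; they are not real counts from any text.

```python
from collections import Counter

JERUSALEM = "http://www.crosswire.org/names/jerusalem"

# Placeholder occurrence records: (subject identifier, book).
# In practice these would be derived from the marked-up text.
occurrences = [
    (JERUSALEM, "BookA"),
    (JERUSALEM, "BookA"),
    (JERUSALEM, "BookB"),
    ("http://www.crosswire.org/names/other", "BookA"),
]


def distribution(records, identifier):
    """Count occurrences of one subject per book."""
    return Counter(book for sid, book in records if sid == identifier)


counts = distribution(occurrences, JERUSALEM)
for book, n in sorted(counts.items()):
    print(book, n)
```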
Yes, I have been thinking about topic maps and biblical texts a lot. ;-)
Hope you are having a great day!
Patrick
--
Patrick Durusau
patrick at durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)