[osis-users] Topic Maps
Troy A. Griffitts
scribe at crosswire.org
Fri Jan 15 21:26:05 MST 2010
Thanks for all the useful info Patrick. I hope you've gone to sleep and are reading this in the morning. Writing from my phone so must be rude and top-post...
Well, I've been asked to produce some statistical data about authors' use of place names. I'd like to report which letters, authors, genres, etc. have higher concentrations of place names. This is a simple computation if the data is marked up correctly.
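
A minimal sketch of that computation, assuming each book sits in a <div type="book" osisID="..."> container and that place names carry a subjectIdentifier starting with "geo/". The sample text, the container layout, and the "geo/" prefix are illustrative assumptions, not anything OSIS defines:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the div/osisID layout and the "geo/" prefix
# convention are assumptions made for illustration only.
SAMPLE = """
<osisText>
  <div type="book" osisID="Gal">
    <w subjectIdentifier="geo/city/jerusalem1">Jerusalem</w>
    <w>and</w>
    <w subjectIdentifier="geo/city/damascus1">Damascus</w>
    <w subjectIdentifier="person/paul1">Paul</w>
  </div>
  <div type="book" osisID="Phlm">
    <w subjectIdentifier="person/onesimus1">Onesimus</w>
    <w>beloved</w>
  </div>
</osisText>
"""

def place_name_density(xml_text, place_prefix="geo/"):
    """Return {book: (place_count, word_count, ratio)} for each book div."""
    root = ET.fromstring(xml_text)
    result = {}
    for book in root.iter("div"):
        if book.get("type") != "book":
            continue
        words = list(book.iter("w"))
        places = [w for w in words
                  if w.get("subjectIdentifier", "").startswith(place_prefix)]
        ratio = len(places) / len(words) if words else 0.0
        result[book.get("osisID")] = (len(places), len(words), ratio)
    return result

density = place_name_density(SAMPLE)
for book, (places, words, ratio) in density.items():
    print(f"{book}: {places} place names in {words} words ({ratio:.2f})")
```

Grouping by author or genre would only need a lookup table from osisID to author or genre, summed over the same per-book counts.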
Would a topic map retain all referrer information from a base text?
Thanks again Patrick!
Troy
Patrick Durusau <patrick at durusau.net> wrote:
>Troy,
>
>Just quickly because it is way past my bedtime!
>
>Troy A. Griffitts wrote:
>> Thanks Patrick. So had we planned a subjectIdentifier attribute on
>> either <w> or <name> (as Peter pointed out we added likely for proper
>> name indication)?
>>
>> Steve, do you remember our discussion when we added the marker
>> attribute to <q>, when we talked about a generalized defaulting
>> mechanism which would allow the header to contain things like:
>>
>> <default>//q[@level="1"]/@marker='"'</default>
>> <default>//q[@level="2"]/@marker="'"</default>
>> <default>//w[@lemma="([^:]*)"]/@lemma="strong:\1"</default>
>>
>> Anyway, I was just wondering what happened to this idea? I'm not sure
>> I'd want to implement a full-blown XQuery parser like what would be
>> required in my example above, but some basic defaulting mechanism
>> would still be nice.
>>
>> Patrick, in your example, I'd like to be able to say something like:
>>
>> <default>//w[@subjectIdentifier="(.*)"]/@subjectIdentifier="http://crosswire.org/names/\1"</default>
>>
>>
>> so I could simply use in my doc:
>>
>> <w subjectIdentifier="jerusalem1">Jerusalem</w>
>>
>>
>> But this is merely to clean up my markup in the event our docs are
>> ever opened in an editor by a human, and to potentially prevent errors
>> when hand editing. Sorry, I just like to factor stuff out when possible.
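
A rough sketch of a basic defaulting mechanism that avoids a query parser. It assumes each rule is reduced to a tuple of (element name, attribute, value pattern, replacement), which is a simplification of the <default> syntax above and not a parser for it. The subjectIdentifier pattern is narrowed from "(.*)" to "[^:/]*" so that values which are already full URLs are left alone; that narrowing is an assumption of this sketch:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule format: (element name, attribute, value pattern, replacement).
# A simplification of the <default> idea, not an implementation of its syntax.
DEFAULTS = [
    ("w", "lemma", r"([^:]*)", r"strong:\1"),
    ("w", "subjectIdentifier", r"([^:/]*)", r"http://crosswire.org/names/\1"),
]

def apply_defaults(root, rules):
    """Expand short attribute values in place when they match a rule's pattern."""
    for tag, attr, pattern, replacement in rules:
        regex = re.compile(pattern)
        for elem in root.iter(tag):
            value = elem.get(attr)
            if value is None:
                continue
            match = regex.fullmatch(value)
            if match:
                elem.set(attr, match.expand(replacement))
    return root

doc = ET.fromstring(
    '<verse><w lemma="G2419" subjectIdentifier="jerusalem1">Jerusalem</w>'
    '<w lemma="strong:G1519">to</w></verse>'
)
apply_defaults(doc, DEFAULTS)
first, second = doc.findall("w")
print(first.get("lemma"), first.get("subjectIdentifier"))
print(second.get("lemma"))  # already prefixed, so left unchanged
```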
>>
>>
>> Patrick Durusau wrote:
>>> The question is one of how much information do you want to store in
>>> the identifier that appears when you mark a reference to a subject?
>>
>> Yes, having this level of indirection that a subjectIdentifier
>> provides serves a great purpose and is perfect if I'm 'at' an element
>> I want to dig deeper into. But my current objective is to find all
>> place names in a document, which would require me to dereference each
>> identifier, querying the referent for the 'type' of each subject,
>> e.g., "geo-city".
>>
>> Hence my poorly applied lemma/morph scheme:
>>
>> <w lemma="placenames:jerusalem1"
>> morph="placenamestype:geo-city">Jerusalem</w>
>>
>> makes processing for my immediate objective easier. You mentioned
>> above that the question is 'how much information' to store in the
>> identifier itself... So is this suggesting a solution like?:
>>
>> <w subjectIdentifier="geo/city/jerusalem1">Jerusalem</w>
>>
>> This would give me what I need to easily process the data (even if we
>> had to specify the full:
>> subjectIdentifier="http://crosswire.org/names/geo/city/jerusalem1")
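
For illustration, a small sketch of how such a path-style identifier could be split into its type and local id. The function name subject_type is hypothetical, and the base URL is the one from the example above:

```python
BASE = "http://crosswire.org/names/"  # base URL from the example above

def subject_type(identifier, base=BASE):
    """Split 'geo/city/jerusalem1' (or its full URL form) into (type, local id)."""
    if identifier.startswith(base):
        identifier = identifier[len(base):]
    # Everything before the last path segment is treated as the type.
    *type_parts, local_id = identifier.split("/")
    return "/".join(type_parts), local_id

short_form = subject_type("geo/city/jerusalem1")
full_form = subject_type("http://crosswire.org/names/geo/city/jerusalem1")
print(short_form, full_form)
```

This only works as long as every identifier follows the same path convention, which is exactly the "how much information in the identifier" trade-off under discussion.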
>>
>Sorry, why would you be parsing the text to find an entry of a
>particular type? Why not query the topic map, which was built by parsing
>the text? That is what information overlays bring to the table. I was
>using that syntax just to illustrate how a user could mark up a text
>for later use in building a topic map to run over it.
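
A minimal sketch of that idea, with plain dicts standing in for a topic map. The topic records, the container-style <verse> elements, and the sample text are all assumptions for illustration; a real topic map engine would hold topics, types, and occurrences as first-class objects:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical topic records: subject identifier -> type.
TOPIC_TYPES = {
    "http://crosswire.org/names/jerusalem1": "geo-city",
    "http://crosswire.org/names/paul1": "person",
}

# Illustrative sample text; container-style verses are an assumption.
TEXT = """
<chapter osisID="Gal.1">
  <verse osisID="Gal.1.1"><w subjectIdentifier="http://crosswire.org/names/paul1">Paul</w></verse>
  <verse osisID="Gal.1.17"><w subjectIdentifier="http://crosswire.org/names/jerusalem1">Jerusalem</w></verse>
  <verse osisID="Gal.1.18"><w subjectIdentifier="http://crosswire.org/names/jerusalem1">Jerusalem</w></verse>
</chapter>
"""

def build_occurrences(xml_text):
    """Parse the text once, recording where each subject is referenced."""
    occurrences = defaultdict(list)
    for verse in ET.fromstring(xml_text).iter("verse"):
        for w in verse.iter("w"):
            sid = w.get("subjectIdentifier")
            if sid:
                occurrences[sid].append(verse.get("osisID"))
    return occurrences

def occurrences_of_type(topic_type, topic_types, occurrences):
    """Find every subject of a type and where it occurs, without re-reading the text."""
    return {sid: occurrences.get(sid, [])
            for sid, t in topic_types.items() if t == topic_type}

occ = build_occurrences(TEXT)
cities = occurrences_of_type("geo-city", TOPIC_TYPES, occ)
print(cities)
```

The type lives on the topic, not in the markup, so the text only needs the bare identifier.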
>>
>> Thanks for the discussion on this!
>>
>>
>> I feel your pain. My primary laptop died in December and I purchased
>> a netbooky hp dm3 thingy to hold me over until I could order a
>> replacement. I just finished MOVING all of my data over to this new
>> little thing's large (by comparison to my old system) 320Gig drive and
>> days later the new drive crashed. Now I'm booting Ubuntu on the new
>> computer with my old 100Gig drive plugged into the USB port (old drive
>> is PATA, new computer is SATA) until my real laptop replacement gets
>> here. And all my data on the 320Gig new drive is lost! I was picking
>> and choosing folders from my old drive and did moves instead of copies
>> so I could remember what I had already grabbed. Stupid me. Did you
>> find an affordable data recovery service?
>>
>No, I have talked to them but never actually used one of them.
>
>I was running mirrored drives so that helped avoid data loss but not the
>down time.
>
>I have an external backup system that should arrive tomorrow that claims
>you can have a constant backup and should your primaries fail, you can
>plug the backup solution into another computer and boot from the
>external usb drive. I won't trust that until I see it done but that
>would be neat.
>
>Still running mirrored drives with that on top of it. Data loss is
>always possible but with that plus copies to another drive I have of the
>critical stuff, the chances should be remote.
>
>Sorry to hear about the new drive! There are a lot of things they can do
>at the data recovery services. Not cheap but doable.
>
>Hope you get good news on your drive real soon now!
>
>Patrick
>>
>> Troy
>>
>>
>>
>>
>>
>>>
>>> Take your example:
>>>
>>> <w
>>> subjectIdentifier="http://www.crosswire.org/names/jerusalem">Jerusalem</w>
>>>
>>>
>>> Elsewhere, there is a topic in a topic map that has that same
>>> subjectIdentifier property, and it records that the subject it
>>> represents is an instance of type place, along with names for it in
>>> other languages and any other information you want to record about
>>> that subject.
>>>
>>> The key is the use of a subjectIdentifier to identify the subject. Why?
>>>
>>> Because someone else, in another Bible project may have:
>>>
>>> <w
>>> subjectIdentifier="http://www.otherproject.org/geonames/israel/jerusalem">Jerusalem</w>
>>>
>>>
>>> Now what?
>>>
>>> Well, any topic can have a *set* of subjectIdentifier properties
>>> which signals that both subjectIdentifiers identify the same subject.
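
A small sketch of that merging rule, assuming topics are represented as dicts with a set of identifiers and a set of names. The representation, the function name merge_topics, and the sample names are hypothetical; this is not the XTM data model itself:

```python
def merge_topics(topics):
    """Merge topic dicts whose 'identifiers' sets overlap (transitively)."""
    merged = []  # invariant: identifier sets in this list are pairwise disjoint
    for topic in topics:
        current = {"identifiers": set(topic["identifiers"]),
                   "names": set(topic["names"])}
        remaining = []
        for other in merged:
            if other["identifiers"] & current["identifiers"]:
                # Shared identifier: same subject, so fold it into current.
                current["identifiers"] |= other["identifiers"]
                current["names"] |= other["names"]
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged

CROSSWIRE = "http://www.crosswire.org/names/jerusalem"
OTHER = "http://www.otherproject.org/geonames/israel/jerusalem"

topics = [
    {"identifiers": {CROSSWIRE}, "names": {"Jerusalem"}},
    {"identifiers": {OTHER}, "names": {"Yerushalayim"}},
    # A topic declaring both identifiers ties the other two together.
    {"identifiers": {CROSSWIRE, OTHER}, "names": set()},
]
result = merge_topics(topics)
print(len(result), sorted(result[0]["names"]))
```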
>>>
>>> (Note I have used the XTM syntax for the attributes but it would be
>>> possible to declare equivalent subject identifiers even if they were
>>> in different formats or structures. I am working on an example using
>>> XQuery to make that point. Probably won't be ready for a week or so.
>>> My main system died last night but due to disk mirroring and paying a
>>> lot of money, I got it back late this afternoon.)
>>>
>>> That will allow you to disambiguate all the names as well as to add
>>> far more information than you could possibly put in an attribute.
>>> Such as marking the morphology of a lemma and displaying for a user
>>> the distribution of that lemma over a book or range of books.
>>> (Assuming you represented all of those as occurrences or even
>>> associations with explicit roles if you liked.)
>>>
>>> Yes, I have been thinking about topic maps and biblical texts a lot. ;-)
>>>
>>> Hope you are having a great day!
>>>
>>> Patrick
>>>
>>
>>
>> _______________________________________________
>> osis-users mailing list
>> osis-users at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/osis-users
>>
>
>--
>Patrick Durusau
>patrick at durusau.net
>Chair, V1 - US TAG to JTC 1/SC 34
>Convener, JTC 1/SC 34/WG 3 (Topic Maps)
>Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
>Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
>
>