[sword-devel] Hesychius

Troy A. Griffitts scribe at crosswire.org
Sun Nov 12 01:01:37 MST 2006


Greg, I've spent a while editing this data and have my progress posted here:

http://crosswire.org/~scribe/hesychius.tar.gz

	-Troy.



Greg Hellings wrote:
> On further inspection, it appears that the only HTML formatting in the
> above document is a <div ...> ... </div> pair corresponding to each
> <text> ... </text> element in the exported XML.  Thus all of the angle
> brackets that appear around anything other than text/title/page are
> brackets that have been deliberately placed around Greek words.
> 
> Perhaps this is as far as pure XSLT can take us?  It seems that it
> would be better at this point to process the remaining text with
> something like Python or Perl and have that generate the desired OSIS,
> since the OSIS structure depends not on the XML structure of the
> current document but on its textual content.
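A minimal Python sketch of the post-processing step proposed here, assuming (as observed above) that the only real HTML in each page is the single <div ...> ... </div> wrapper and that every other <...> span encloses Greek text. The output element <seg type="x-bracketed"> and the function name are hypothetical placeholders, since what the brackets actually mean has not been settled.

```python
import re
from xml.sax.saxutils import escape

# The single HTML wrapper described above: one <div ...> ... </div> per <text> element.
DIV_WRAPPER = re.compile(r"^\s*<div[^>]*>(.*)</div>\s*$", re.DOTALL)
# Any remaining angle-bracketed span, assumed to enclose a Greek word or phrase.
BRACKETED = re.compile(r"<([^<>]*)>")


def page_to_osis_fragment(text: str) -> str:
    """Strip the div wrapper and turn each bracketed span into a placeholder OSIS <seg>."""
    match = DIV_WRAPPER.match(text)
    body = match.group(1) if match else text
    parts = []
    pos = 0
    for span in BRACKETED.finditer(body):
        # Plain text between spans must be XML-escaped (&, <, >).
        parts.append(escape(body[pos:span.start()]))
        # Hypothetical markup: "x-bracketed" is a placeholder type, not an agreed convention.
        parts.append('<seg type="x-bracketed">' + escape(span.group(1)) + "</seg>")
        pos = span.end()
    parts.append(escape(body[pos:]))
    return "".join(parts)


print(page_to_osis_fragment('<div class="entry">ἀάατος <λόγος> & ἔπος</div>'))
# ἀάατος <seg type="x-bracketed">λόγος</seg> &amp; ἔπος
```

Escaping the text between spans matters because the converted output must itself be well-formed XML before it can go into an OSIS document.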
> 
> This really is my last e-mail tonight...
> 
> --Greg
> 
> On 11/10/06, Greg Hellings <greg.hellings at gmail.com> wrote:
>> And I forgot to mention that I had posted it to the wxSword download
>> site on SourceForge:
>> https://sourceforge.net/project/showfiles.php?group_id=142229
>>
>> Sorry!
>>
>> --Greg
>>
>> On 11/10/06, Greg Hellings <greg.hellings at gmail.com> wrote:
>>> Getting the output from their included wiki export page was the
>>> trivial portion of the task (read: I had to guess completely judging
>>> from the directions that were on Wikipedia's site and extrapolate
>>> those to figure out what name WikiSource actually wanted for each
>>> page).  Writing the XSLT is proving to be far more cumbersome.  I just
>>> spent over an hour trying to figure out why my XSLT was not producing
>>> any output, only to realize that the exported file had a default
>>> namespace.
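The problem described here is the classic default-namespace trap: the export declares xmlns on its root element, so an unprefixed name such as page never matches. In XSLT 1.0 the usual fix is to declare a prefix for that namespace URI in the stylesheet and use it (e.g. mw:page) in every XPath expression. The same behaviour can be shown with Python's standard xml.etree.ElementTree. The namespace URI below (export-0.3) and the trimmed sample document are assumptions; copy the real URI from the xmlns attribute of the export's root element.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; take the real one from the xmlns attribute of the export's root.
MW_NS = "http://www.mediawiki.org/xml/export-0.3/"

# A heavily trimmed, hypothetical export document.
SAMPLE = f"""<mediawiki xmlns="{MW_NS}">
  <page>
    <title>Α</title>
    <revision><text xml:space="preserve">...</text></revision>
  </page>
</mediawiki>"""


def count_pages(xml_text: str) -> tuple[int, int]:
    root = ET.fromstring(xml_text)
    # Unprefixed name: matches nothing, because <page> lives in the default namespace.
    naive = len(root.findall("page"))
    # Prefixed name, with the prefix bound to the default namespace URI.
    qualified = len(root.findall("mw:page", {"mw": MW_NS}))
    return naive, qualified


print(count_pages(SAMPLE))  # (0, 1)
```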
>>>
>>> It will be incredibly difficult to extract any structural information
>>> from the files in an automated system.  For one, I am not familiar
>>> with what Hesychius is, and while I took extensive Greek in my
>>> undergrad course of study, reading through that massive document would
>>> be unwieldy for me at this point, since I could not dedicate a huge
>>> amount of time to the work.
>>>
>>> For now I have posted the filtered XML from the export, with
>>> everything except the page, title and text fields removed (the rest
>>> of the information simply records who made the latest modification to
>>> the page, when it happened, and their change-log entry).  I have also
>>> converted all of the &gt; and &lt; entities back to > and < in an
>>> effort to return the data to its display format.
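The filtering step can be sketched with Python's standard library, assuming the usual MediaWiki export layout in which <text> sits inside <revision>. The {*} wildcard (Python 3.8+) matches a tag in any namespace, which sidesteps the default-namespace problem. An XML parser already decodes &lt; and &gt; when it reads the text, so the extracted strings come back with < and > restored. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_pages(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, wikitext) for each page, dropping contributor, timestamp and comment."""
    root = ET.fromstring(xml_text)
    pages = []
    # "{*}" matches the tag in any namespace, so the export's xmlns need not be spelled out.
    for page in root.iterfind(".//{*}page"):
        title = page.findtext("{*}title", default="")
        # <text> is nested inside <revision>, hence the descendant search.
        text = page.findtext(".//{*}text", default="")
        # The parser has already decoded &lt; and &gt; to < and >.
        pages.append((title, text))
    return pages


# Hypothetical sample; only the element layout matters here.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Α</title>
    <revision>
      <contributor><username>someone</username></contributor>
      <text xml:space="preserve">&lt;div&gt;ἀάατος&lt;/div&gt;</text>
    </revision>
  </page>
</mediawiki>"""

print(extract_pages(SAMPLE))  # [('Α', '<div>ἀάατος</div>')]
```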
>>>
>>> Someone will need to figure out how to tell when a < or > belongs to
>>> the HTML/XML markup and when it belongs to the more specific data
>>> within.  The WikiSource document seems to make very poor use of the
>>> < and > characters, using them both to denote a keyword and to
>>> emphasize certain words or phrases, which makes the data even more
>>> difficult to parse.  I don't know that a fully automated solution will
>>> be possible with this data or with the original data... but it's all
>>> just a starting point.
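Since a fully automated split may not be possible, one practical first step is a report that separates known HTML tags from every other <...> span, so a human can review the remainder and decide which are keywords and which are emphasis. A sketch, assuming a hand-maintained whitelist of HTML tag names (the HTML_TAGS set and the function name are hypothetical). A Latin-letter word in brackets that happens to match a whitelisted name would be misclassified.

```python
import re
from collections import Counter

# Hypothetical whitelist of tag names treated as genuine HTML markup.
HTML_TAGS = {"div", "span", "p", "br", "i", "b", "sup", "sub"}

# Any run of non-bracket characters enclosed in angle brackets.
BRACKETED = re.compile(r"<([^<>]*)>")
# An HTML-looking tag: optional "/", an ASCII tag name, optional attributes, optional "/".
TAG = re.compile(r"^/?([A-Za-z][A-Za-z0-9]*)(?:\s[^<>]*)?/?$")


def classify_brackets(text: str) -> tuple[Counter, list[str]]:
    """Count whitelisted HTML tags and list every other <...> span for manual review."""
    html_tags: Counter = Counter()
    for_review: list[str] = []
    for span in BRACKETED.finditer(text):
        inner = span.group(1).strip()
        tag = TAG.match(inner)
        if tag and tag.group(1).lower() in HTML_TAGS:
            html_tags[tag.group(1).lower()] += 1
        else:
            # Greek letters never match [A-Za-z], so bracketed Greek always lands here.
            for_review.append(inner)
    return html_tags, for_review


html_tags, for_review = classify_brackets('<div class="x">ἀάατος <λόγος> ἔπος<br/></div>')
print(html_tags.most_common(), for_review)  # [('div', 2), ('br', 1)] ['λόγος']
```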
>>>
>>> If you want other files, let me know.
>>>
>>> --Greg
>>>
>>> On 11/9/06, Troy A. Griffitts <scribe at crosswire.org> wrote:
>>>> Greg,
>>>>         You're amazing!!! I must have played with stuff for hours today trying
>>>> to make sense of the MediaWiki export docs.  I even downloaded some
>>>> PyWikipediaBot Python thingy but couldn't get it to run either (I am
>>>> inept at Python, so I wasn't surprised, though quite frustrated,
>>>> nonetheless).   Thank you!!!  If this might make any difference, my
>>>> personal interest in the lexicon, after it is usable by SWORD, is to
>>>> build a synonyms database from the data.  If there is any indication in
>>>> the data that a synonym for an entry is being listed, I would most
>>>> appreciate a unique <seg type="x-synonym">, or some such.  Thank you
>>>> again, so much, for your work.  I am very excited!
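If a converter does find synonyms, generating the markup through an XML library rather than by string concatenation guarantees that the attribute is quoted and the content escaped. A minimal sketch of emitting the element requested here; the function name is hypothetical, and deciding which words actually are synonyms is the hard part and is not shown.

```python
import xml.etree.ElementTree as ET


def synonym_seg(word: str) -> str:
    """Serialize one synonym as <seg type="x-synonym">word</seg>."""
    seg = ET.Element("seg", {"type": "x-synonym"})
    seg.text = word  # ElementTree escapes &, < and > on output
    return ET.tostring(seg, encoding="unicode")


print(synonym_seg("ἀβλαβής"))  # <seg type="x-synonym">ἀβλαβής</seg>
```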
>>>>
>>>>         -Troy.
>>>>
>>>>
>>>>
>>>> Greg Hellings wrote:
>>>>> So yeah... I managed to grab the XML file from the Export (it's fun
>>>>> trying to do that on a webpage written in modern Greek when you're
>>>>> used to ancient Greek and you can't remember what the Koine word for
>>>>> "hyperlink" or "webpage" is :P).
>>>>>
>>>>> It comes to a mere 4.2 MB file, so now the trick will be parsing the
>>>>> text that is wanted out of that and creating an OSIS from it.  The
>>>>> main problem is that the text in the file is placed inside a tag
>>>>> with an xml:space="preserve" attribute, and all of the HTML beneath
>>>>> it is encoded as entities.  Therefore all of the structure of the
>>>>> actual data (other than the large groupings under alpha, beta,
>>>>> gamma, etc.) is lost to an XML/XSL parsing combination.
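One way around this is a two-pass parse: read the export normally, take the decoded string out of <text>, and parse that string again as its own fragment. A sketch in Python with a hypothetical function name and sample pages; it also illustrates the stray-bracket problem discussed in this thread, because a bracketed Greek word with no closing tag makes the second pass fail.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def reparse_wikitext(page_xml: str) -> Optional[ET.Element]:
    """Parse the entity-encoded HTML inside <text> a second time, if it is well-formed."""
    text_elem = ET.fromstring(page_xml).find(".//{*}text")
    if text_elem is None:
        return None
    # First pass: &lt;div&gt; arrived as character data, so <text> has no child elements.
    inner = text_elem.text or ""
    try:
        # Second pass: wrap the decoded markup in a dummy root so it parses as one document.
        return ET.fromstring("<root>" + inner + "</root>")
    except ET.ParseError:
        # An unclosed bracketed word such as <λόγος> leaves the fragment ill-formed.
        return None


GOOD = '<page><text xml:space="preserve">&lt;div class="entry"&gt;ἀάατος&lt;/div&gt;</text></page>'
BAD = '<page><text xml:space="preserve">&lt;div&gt;&lt;λόγος&gt; ἔπος&lt;/div&gt;</text></page>'

print(reparse_wikitext(GOOD)[0].tag)  # div
print(reparse_wikitext(BAD))          # None
```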
>>>>>
>>>>> Wish me luck... ::dives into a pile of libxml2::
>>>>>
>>>>> --Greg Hellings
>>>>>
>>>>> On 11/9/06, Troy A. Griffitts <scribe at crosswire.org> wrote:
>>>>>> We had a contributor on IRC today post this link:
>>>>>>
>>>>>> http://el.wikisource.org/wiki/%CE%93%CE%BB%E1%BF%B6%CF%83%CF%83%CE%B1%CE%B9
>>>>>>
>>>>>>
>>>>>> It looks promising.
>>>>>>
>>>>>> I know there is a way to download the content of a MediaWiki site
>>>>>> as XML, but I have no experience doing so.
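For reference, MediaWiki's Special:Export page also accepts a page title appended to its path (Special:Export/Title) and returns that page as XML. A small helper that builds such a URL, assuming that URL form and the /wiki/ article path; the helper name is hypothetical, it only builds the address without fetching anything, and the example reuses the title from the link above.

```python
from urllib.parse import quote, unquote


def export_url(site: str, title: str) -> str:
    """Build a Special:Export URL for one page (assumes the /wiki/ article path)."""
    # MediaWiki titles use underscores for spaces; quote() percent-encodes the UTF-8 bytes.
    return f"{site}/wiki/Special:Export/{quote(title.replace(' ', '_'))}"


# The title in the link above, decoded from its percent-encoding.
title = unquote("%CE%93%CE%BB%E1%BF%B6%CF%83%CF%83%CE%B1%CE%B9")
print(title)  # Γλῶσσαι
print(export_url("http://el.wikisource.org", title))
```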
>>>>>>
>>>>>> Anyone want to take a shot at producing a SWORD Hesychius Lexicon (or
>>>>>> even just a text file from this link)?
>>>>>>
>>>>>>
>>>>>> Thanks for everyone's input and help.
>>>>>>
>>>>>>         -Troy.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Peter von Kaehne wrote:
>>>>>>> I spoke yesterday both to Prof Hansen and to Prof Ian Cunningham (a collaborator of Hansen's).
>>>>>>>
>>>>>>> http://www.csad.ox.ac.uk/CSAD/Hesychius/Hansen.html
>>>>>>>
>>>>>>> Prof Hansen mentioned the TLG, and Prof Cunningham confirmed this and said further that there is no electronic version of Hansen's work available. I understand that Hansen's work is published in de Gruyter's Sammlung griechischer und lateinischer Grammatiker
>>>>>>>
>>>>>>> http://www.degruyter.com/rs/174_AT_E_ED_ENU_h.cfm?rc=19992&id=SER-M1-WDG-HESYCH-B-19992&fg=AT
>>>>>>>
>>>>>>> - a copy of which I found here to buy:
>>>>>>>
>>>>>>> http://www.basis-buch.de/main-173503.html
>>>>>>>
>>>>>>> WRT the TLG: I read the licence in detail and, bluntly put, they have no leg to stand on in denying us use of the texts:
>>>>>>>
>>>>>>> They already allowed us to do what we want to do on the basis of the licence, even if they are now getting cold feet when asked directly. That said, at least Schmidt's edition is now public domain anyway, and unless there are DMCA restrictions, everyone can copy it out of there anyway. And outside of DMCA-like legislation, only its public-domain status would apply anyway. But IANAL etc.
>>>>>>>
>>>>>>> Wrt Latte/Hansen: I am not sure how far Latte's work would constitute an original work in its own right (I presume it does), but again, the TLG licence does allow text extraction for non-commercial scholarly work.
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Date: Fri, 03 Nov 2006 17:23:03 -0700
>>>>>>> From: "Troy A. Griffitts" <scribe at crosswire.org>
>>>>>>> To: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
>>>>>>> Subject: Re: [sword-devel] Hesychius
>>>>>>>
>>>>>>>> Peter,
>>>>>>>>      Thank you for your time and info.  We have an ongoing dialog with UCI
>>>>>>>> regarding the use of the data from the TLG.  They have denied our request
>>>>>>>> twice, but I am hoping a detailed third plea might elicit sympathy.
>>>>>>>>
>>>>>>>>      -Troy.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Peter von Kaehne wrote:
>>>>>>>>> The TLG does, though, also have the older edition by Schmidt, which
>>>>>>>>> should by now be in the public domain, as it dates from 1861.
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> -------- Original Message --------
>>>>>>>>> Date: Fri, 03 Nov 2006 15:59:02 +0100
>>>>>>>>> From: "Peter von Kaehne" <refdoc at gmx.net>
>>>>>>>>> To: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
>>>>>>>>> Subject: Re: [sword-devel] Hesychius
>>>>>>>>>
>>>>>>>>>> The TLG indeed contains parts of Hesychius, but only Latte's work.
>>>>>>>>>>
>>>>>>>>>> Hansen's work is published on paper only, in Germany. Electronic
>>>>>>>>>> copies are not available.
>>>>>>>>>>
>>>>>>>>>> The TLG licence for the text is such that it might be possible to
>>>>>>>>>> integrate the work, i.e. commercial scholarly tools making use of
>>>>>>>>>> the whole text are forbidden, but use by CrossWire might be allowed.
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------- Original Message --------
>>>>>>>>>> Date: Thu, 02 Nov 2006 16:38:36 -0700
>>>>>>>>>> From: "Troy A. Griffitts" <scribe at crosswire.org>
>>>>>>>>>> To: sword-devel at crosswire.org
>>>>>>>>>> Subject: [sword-devel] Hesychius
>>>>>>>>>>
>>>>>>>>>>> If anyone has the time to research where we can find an electronic
>>>>>>>>>>> copy of Hesychius' Greek Lexicon, your efforts would be extremely
>>>>>>>>>>> valuable to me right now.  I believe the TLG has a copy of it, but I
>>>>>>>>>>> currently don't have easy access to the TLG.  Thanks in advance.
>>>>>>>>>>>
>>>>>>>>>>>   -Troy.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>>>>>>
>>>>>>
>>>>
>>>>
> 



