[sword-devel] Hesychius

Greg Hellings greg.hellings at gmail.com
Thu Nov 9 22:15:48 MST 2006


Getting the output from their included wiki export page was the
trivial portion of the task (read: I had to guess completely judging
from the directions that were on Wikipedia's site and extrapolate
those to figure out what name WikiSource actually wanted for each
page).  Writing the XSLT is proving to be far more cumbersome.  I just
spent over an hour trying to figure out why my XSLT was not producing
any output, only to realize that the exported file had a default
namespace.
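
A minimal sketch of the default-namespace problem, in Python rather than XSLT. The sample document and its namespace URI are assumptions for illustration, not copied from the actual export. The XSLT equivalent is to bind a prefix to the export's namespace in the stylesheet and use that prefix in every match pattern.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the export file; the namespace URI is illustrative.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Alpha</title>
    <revision><text xml:space="preserve">entry text</text></revision>
  </page>
</mediawiki>"""

root = ET.fromstring(SAMPLE)

# An unqualified name matches nothing, because every element inherits
# the default namespace declared on the root element.
unqualified = root.findall("page")

# Binding a prefix (here "mw", an arbitrary choice) to the same URI
# makes the lookup succeed.
ns = {"mw": "http://www.mediawiki.org/xml/export-0.3/"}
qualified = root.findall("mw:page", ns)
titles = [page.findtext("mw:title", namespaces=ns) for page in qualified]
```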

It will be incredibly difficult to extract any structural information
from the files in an automated system.  For one, I am not familiar
with what Hesychius is, and while I took extensive Greek in my
undergrad course of study, reading through that massive document would
be unwieldy for me at this point, since I cannot dedicate a huge
amount of time to the work.

For now I have posted an XML file that is the filtered XML that comes
from the export, with everything except for the page, title and text
fields removed (since the rest of the information simply pertains to
who performed the latest modification to the page and when it happened
and their change log entry).  I have also changed all of the &gt; and
&lt; to > and < in an effort to return the data to its display
format.
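
The filtering step could be sketched roughly as follows, in Python. This is not the procedure used to produce the posted file: the namespace URI, the sample data and the function name filter_export are assumptions. Note that an XML parser decodes the entity references when the text is read, so this version needs no separate replacement pass.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the xmlns value on the real export's root.
NS = {"mw": "http://www.mediawiki.org/xml/export-0.3/"}

# Assumed stand-in for one exported page, including revision metadata.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Alpha</title>
    <revision>
      <timestamp>2006-11-09T00:00:00Z</timestamp>
      <contributor><username>someone</username></contributor>
      <comment>edit summary</comment>
      <text xml:space="preserve">&lt;b&gt;word&lt;/b&gt; gloss</text>
    </revision>
  </page>
</mediawiki>"""

def filter_export(xml_text):
    """Return (title, text) pairs, dropping who/when/comment metadata."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.findall("mw:page", NS):
        title = page.findtext("mw:title", default="", namespaces=NS)
        # The parser has already turned &lt; and &gt; back into < and >.
        text = page.findtext("mw:revision/mw:text", default="", namespaces=NS)
        pages.append((title, text))
    return pages

pages = filter_export(SAMPLE)
```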

Someone will need to figure out how to differentiate when the < or >
is pertinent to the HTML/XML or when it is pertinent to the more
specific data within.  The WikiSource document uses the < and >
characters both to denote a keyword and to emphasize certain words or
phrases, which makes the data even more difficult to parse.  I don't
know that a fully automated solution will
be possible with this data or with the original data... but it's all
just a starting point.
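
One possible heuristic for a first pass, sketched in Python: treat a bracketed span as markup only when it has the shape of a known HTML tag, and as literal text otherwise. The tag list, the function name classify_brackets and the sample string are all assumptions made for illustration. This cannot resolve a literal that happens to look like a tag, so such cases would still need manual review.

```python
import re

# Assumed, incomplete set of HTML tag names to treat as real markup.
HTML_TAGS = {"b", "i", "br", "p", "div", "span", "sup", "sub", "small"}

# Any run of non-bracket characters enclosed in < and >.
BRACKETED = re.compile(r"<([^<>]*)>")
# Optional slash, then an ASCII tag name ending at a word boundary.
TAG_SHAPE = re.compile(r"/?\s*([A-Za-z][A-Za-z0-9]*)\b")

def classify_brackets(text):
    """Return (content, kind) per <...> span; kind is 'markup' or 'literal'."""
    results = []
    for match in BRACKETED.finditer(text):
        inner = match.group(1)
        tag = TAG_SHAPE.match(inner)
        is_markup = tag is not None and tag.group(1).lower() in HTML_TAGS
        results.append((inner, "markup" if is_markup else "literal"))
    return results

# Illustrative string, not taken from the export.
spans = classify_brackets("<b>x</b> <λέξις> y<br/>")
```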

If you want other files, let me know.

--Greg

On 11/9/06, Troy A. Griffitts <scribe at crosswire.org> wrote:
> Greg,
>         You're amazing!!! I must have played with stuff for hours today trying
> to make sense from the wikimedia export docs.  I even downloaded some
> PyWikipediaBot python thingy but couldn't get it to run either (I am
> inept at python, so I wasn't surprised, though quite frustrated,
> nonetheless).   Thank you!!!  If this might make any difference, my
> personal interest in the lexicon, after it is usable by SWORD, is to
> build a synonyms database from the data.  If there is any indication in
> the data that a synonym for an entry is being listed, I would most
> appreciate a unique <seg type="x-synonym">, or some such.  Thank you
> again, so much, for your work.  I am very excited!
>
>         -Troy.
>
>
>
> Greg Hellings wrote:
> > So yeah... I managed to grab the XML file from the Export (it's fun
> > trying to do that on a webpage written in modern Greek when you're
> > used to ancient Greek and you can't remember what the Koine word for
> > "hyperlink" or "webpage" is :P).
> >
> > It comes to a mere 4.2 MB file, so now the trick will be parsing the
> > text that is wanted out of that and creating an OSIS from it.  The
> > main problem with that is that the text from the file is placed inside
> > a tag with an xml:space="preserve" attribute, and all of the HTML is
> > encoded as entities underneath that.  Therefore all of the
> > structure of the actual data (other than the large groupings under
> > alpha, beta, gamma, etc) is lost to an XML/XSL parsing combination.
> >
> > Wish me luck... ::dives into a pile of libxml2::
> >
> > --Greg Hellings
> >
> > On 11/9/06, Troy A. Griffitts <scribe at crosswire.org> wrote:
> >> We had a contributor on IRC, today, post this link:
> >>
> >> http://el.wikisource.org/wiki/%CE%93%CE%BB%E1%BF%B6%CF%83%CF%83%CE%B1%CE%B9
> >>
> >>
> >> It looks promising.
> >>
> >> I know there is a way to download content in XML of a mediawiki site,
> >> but have no experience doing so.
> >>
> >> Anyone want to take a shot at producing a SWORD Hesychius Lexicon (or
> >> even just a text file) from this link?
> >>
> >>
> >> Thanks for everyone's input and help.
> >>
> >>         -Troy.
> >>
> >>
> >>
> >> Peter von Kaehne wrote:
> >>> I spoke yesterday both to Prof Hansen and to Prof Ian Cunningham (who is a collaborator of Hansen)
> >>>
> >>> http://www.csad.ox.ac.uk/CSAD/Hesychius/Hansen.html
> >>>
> >>> Prof Hansen mentioned the TLG and Prof Cunningham confirmed this + said further there is no electronic version of Hansen's work available. I understand that Hansen's work is published in de Gruyters' Sammlung Griechischer and Lateinischer Altertuemer
> >>>
> >>> http://www.degruyter.com/rs/174_AT_E_ED_ENU_h.cfm?rc=19992&id=SER-M1-WDG-HESYCH-B-19992&fg=AT
> >>>
> >>> - a copy of which I found here to buy:
> >>>
> >>> http://www.basis-buch.de/main-173503.html
> >>>
> >>> WRT the TLG: I read the licence in detail and, bluntly put, they have no leg to stand on to deny us using the texts:
> >>>
> >>> They already allowed us to do what we want to do on the basis of the licence, even if they now get cold feet on direct questioning. That said, at least Schmidt's edition is now public domain anyway, and unless there are DMCA restrictions everyone can copy it out of there anyway.  And outside of DMCA-like legislation only the public-domain status would apply anyway. But IANAL etc.
> >>>
> >>> WRT Latte/Hansen: I am not sure how far Latte's work would constitute an original work in its own right (I presume it does), but again the TLG licence does allow text extraction for scholarly work which is non-commercial.
> >>>
> >>> Peter
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> -------- Original Message --------
> >>> Date: Fri, 03 Nov 2006 17:23:03 -0700
> >>> From: "Troy A. Griffitts" <scribe at crosswire.org>
> >>> To: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
> >>> Subject: Re: [sword-devel] Hesychius
> >>>
> >>>> Peter,
> >>>>      Thank you for your time and info.  We have an ongoing dialog with UCI
> >>>> regarding the use of the data from TLG.  They have denied our request
> >>>> twice, but I am hoping a detailed third plea might solicit sympathy.
> >>>>
> >>>>      -Troy.
> >>>>
> >>>>
> >>>>
> >>>> Peter von Kaehne wrote:
> >>>>> The TLG also has the older edition by Schmidt, though, which should by
> >>>>> now be public domain, as it is from 1861.
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>> -------- Original Message --------
> >>>>> Date: Fri, 03 Nov 2006 15:59:02 +0100
> >>>>> From: "Peter von Kaehne" <refdoc at gmx.net>
> >>>>> To: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
> >>>>> Subject: Re: [sword-devel] Hesychius
> >>>>>
> >>>>>> The TLG indeed contains parts of the Hesychius - Latte's work only.
> >>>>>>
> >>>>>> Hansen's work is published on paper only, in Germany. Electronic copies
> >>>>>> are not available.
> >>>>>>
> >>>>>> The TLG licence for the text is such that it might be possible to
> >>>>>> integrate the work, i.e. commercial scholarly tools making use of the
> >>>>>> whole text are forbidden, but crosswire might be possible.
> >>>>>>
> >>>>>> HTH
> >>>>>>
> >>>>>> Peter
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -------- Original Message --------
> >>>>>> Date: Thu, 02 Nov 2006 16:38:36 -0700
> >>>>>> From: "Troy A. Griffitts" <scribe at crosswire.org>
> >>>>>> To: sword-devel at crosswire.org
> >>>>>> Subject: [sword-devel] Hesychius
> >>>>>>
> >>>>>>> If anyone has the time to research where we can find an electronic copy
> >>>>>>> of Hesychius' Greek Lexicon, your efforts would be extremely valuable to
> >>>>>>> me right now.  I believe the TLG has a copy of it, but I currently don't
> >>>>>>> have easy access to the TLG.  Thanks in advance.
> >>>>>>>
> >>>>>>>   -Troy.
> >>>>>>>
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> sword-devel mailing list: sword-devel at crosswire.org
> >>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
> >>>>>>> Instructions to unsubscribe/change your settings at above page


