[sword-devel] Hesychius

Greg Hellings greg.hellings at gmail.com
Thu Nov 9 22:20:02 MST 2006


And I forgot to mention that I had posted it to the wxSword download
site on SourceForge:
https://sourceforge.net/project/showfiles.php?group_id=142229

Sorry!

--Greg

On 11/10/06, Greg Hellings <greg.hellings at gmail.com> wrote:
> Getting the output from their included wiki export page was the
> trivial portion of the task (read: I had to guess completely judging
> from the directions that were on Wikipedia's site and extrapolate
> those to figure out what name WikiSource actually wanted for each
> page).  Writing the XSLT is proving to be far more cumbersome.  I just
> spent over an hour trying to figure out why my XSLT was not producing
> any output, only to realize that the exported file had a default
> namespace.
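
A minimal sketch of the default-namespace pitfall described above, using Python's standard xml.etree.ElementTree rather than XSLT. The sample document and the namespace URI are assumptions for illustration only; the real URI should be copied from the root element of the actual export. An unqualified path matches nothing, while a namespace-qualified path finds the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed export; the namespace URI is an assumption.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision><text xml:space="preserve">some text</text></revision>
  </page>
</mediawiki>"""

root = ET.fromstring(SAMPLE)

# Without a namespace, the path silently matches nothing.
unqualified = root.findall("page")

# Binding a prefix to the default namespace makes the same path match.
ns = {"mw": "http://www.mediawiki.org/xml/export-0.3/"}
qualified = root.findall("mw:page", ns)
titles = [page.findtext("mw:title", namespaces=ns) for page in qualified]

print(len(unqualified), len(qualified), titles)
```

In XSLT the analogous fix is to declare a prefix for the same URI on the stylesheet element and use that prefix in every match pattern and select expression.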
>
> It will be incredibly difficult to extract any structural information
> from the files in an automated system.  For one, I am not familiar
> with what Hesychius is, and while I took extensive Greek in my
> undergrad course of study, reading through that massive document would
> be unwieldy for me at this point, since I could not dedicate a huge
> amount of time to the work.
>
> For now I have posted an XML file that is the filtered XML that comes
> from the export, with everything except for the page, title and text
> fields removed (since the rest of the information simply pertains to
> who performed the latest modification to the page and when it happened
> and their change log entry).  I have also modified all of the &gt; and
> &lt; to be > and < in an effort to return the data to its display
> format.
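
The filtering step described above might look roughly like the following in Python. This is a sketch, not the script that produced the posted file: the namespace URI, the sample element layout, and the function name extract_pages are all assumptions. Note that an XML parser already decodes &lt; and &gt; when the text is read, so no separate search-and-replace pass is needed here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the root element of the real export.
NS = {"mw": "http://www.mediawiki.org/xml/export-0.3/"}

# Hypothetical sample with the revision metadata that gets dropped.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2006-11-09T00:00:00Z</timestamp>
      <contributor><username>someone</username></contributor>
      <comment>latest change</comment>
      <text xml:space="preserve">&lt;b&gt;alpha&lt;/b&gt; entry</text>
    </revision>
  </page>
</mediawiki>"""

def extract_pages(export_xml):
    """Return (title, text) pairs, ignoring all revision metadata."""
    root = ET.fromstring(export_xml)
    pages = []
    for page in root.findall("mw:page", NS):
        title = page.findtext("mw:title", default="", namespaces=NS)
        # Entities such as &lt; are decoded by the parser at this point.
        text = page.findtext("mw:revision/mw:text", default="", namespaces=NS)
        pages.append((title, text))
    return pages

pages = extract_pages(SAMPLE)
print(pages)
```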
>
> Someone will need to figure out how to differentiate when the < or >
> is pertinent to the HTML/XML or when it is pertinent to the more
> specific data within.  The WikiSource document seems to make very poor
> use of the < and > characters to both denote a keyword and to
> emphasize certain words or phrases, thus making the data even more
> difficult to parse.  I don't know that a fully automated solution will
> be possible with this data or with the original data... but it's all
> just a starting point.
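
One possible first pass at the problem described above is a whitelist heuristic: treat an angle-bracketed span as markup only if it starts with a known HTML tag name, and treat everything else as lexicon data. The tag list, the function name classify, and the sample string are invented for illustration and are not taken from the Wikisource text; a real run would need the whitelist tuned by hand and would still leave ambiguous cases for manual review.

```python
import re

# Assumed whitelist of HTML tags; extend after inspecting the real data.
KNOWN_TAGS = {"b", "i", "br", "p", "small", "sup", "sub"}

# Any run of non-bracket characters enclosed in < and >.
ANGLE = re.compile(r"<(/?)([^<>]+)>")

def classify(text):
    """Split angle-bracketed spans into (markup, data) lists."""
    markup, data = [], []
    for match in ANGLE.finditer(text):
        inner = match.group(2).strip()
        # First token, minus a self-closing slash, is the candidate tag name.
        name = inner.split()[0].rstrip("/").lower() if inner else ""
        if name in KNOWN_TAGS:
            markup.append(match.group(0))
        else:
            data.append(inner)
    return markup, data

# Placeholder string, not a real lexicon entry.
markup, data = classify("<b>λέξις</b> <λόγος> text<br/>")
print(markup, data)
```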
>
> If you want other files, let me know.
>
> --Greg
>
> On 11/9/06, Troy A. Griffitts <scribe at crosswire.org> wrote:
> > Greg,
> >         You're amazing!!! I must have played with stuff for hours today trying
> > to make sense of the MediaWiki export docs.  I even downloaded some
> > PyWikipediaBot python thingy but couldn't get it to run either (I am
> > inept at python, so I wasn't surprised, though quite frustrated,
> > nonetheless).   Thank you!!!  If this might make any difference, my
> > personal interest in the lexicon, after it is usable by SWORD, is to
> > build a synonyms database from the data.  If there is any indication in
> > the data that a synonym for an entry is being listed, I would most
> > appreciate a unique <seg type="x-synonym">, or some such.  Thank you
> > again, so much, for your work.  I am very excited!
> >
> >         -Troy.
> >
> >
> >
> > Greg Hellings wrote:
> > > So yeah... I managed to grab the XML file from the Export (it's fun
> > > trying to do that on a webpage written in modern Greek when you're
> > > used to ancient Greek and you can't remember what the Koine word for
> > > "hyperlink" or "webpage" is :P).
> > >
> > > It comes to a mere 4.2 MB file, so now the trick will be parsing the
> > > text that is wanted out of that and creating an OSIS from it.  The
> > > main problem with that is that the text from the file is placed inside
> > > of a tag with xml:space="preserve" attribute, and all of the HTML is
> > > encoded as entities underneath of that.  Therefore all of the
> > > structure of the actual data (other than the large groupings under
> > > alpha, beta, gamma, etc) is lost to an XML/XSL parsing combination.
> > >
> > > Wish me luck... ::dives into a pile of libxml2::
> > >
> > > --Greg Hellings
> > >
> > > On 11/9/06, Troy A. Griffitts <scribe at crosswire.org> wrote:
> > >> We had a contributor on IRC, today, post this link:
> > >>
> > >> http://el.wikisource.org/wiki/%CE%93%CE%BB%E1%BF%B6%CF%83%CF%83%CE%B1%CE%B9
> > >>
> > >>
> > >> It looks promising.
> > >>
> > >> I know there is a way to download content in XML of a mediawiki site,
> > >> but have no experience doing so.
> > >>
> > >> Anyone want to take a shot at producing a SWORD Hesychius Lexicon (or
> > >> even just a text file) from this link?
> > >>
> > >>
> > >> Thanks for everyone's input and help.
> > >>
> > >>         -Troy.
> > >>
> > >>
> > >>
> > >> Peter von Kaehne wrote:
> > >>> I spoke yesterday both to Prof Hansen and to Prof Ian Cunningham (who is a collaborator of Hansen)
> > >>>
> > >>> http://www.csad.ox.ac.uk/CSAD/Hesychius/Hansen.html
> > >>>
> > >>> Prof Hansen mentioned the TLG and Prof Cunningham confirmed this + said further there is no electronic version of Hansen's work available. I understand that Hansen's work is published in de Gruyters' Sammlung Griechischer and Lateinischer Altertuemer
> > >>>
> > >>> http://www.degruyter.com/rs/174_AT_E_ED_ENU_h.cfm?rc=19992&id=SER-M1-WDG-HESYCH-B-19992&fg=AT
> > >>>
> > >>> - a copy of which I found here to buy:
> > >>>
> > >>> http://www.basis-buch.de/main-173503.html
> > >>>
> > >>> WRT the TLG: I read the licence in detail and, bluntly put, they have no leg to stand on to deny us the use of the texts:
> > >>>
> > >>> They already allowed us to do what we want to do on the basis of the licence, even if they now get cold feet on direct questioning. That said, at least Schmidt's edition is now public domain anyway, and unless there are DMCA restrictions everyone can copy it out of there anyway.  And outside of DMCA-like legislation only the public-domain status would apply anyway. But IANAL etc.
> > >>>
> > >>> WRT Latte/Hansen: I am not sure how far Latte's work would constitute an original work in its own right (I presume it does), but again the TLG licence does allow text extraction for scholarly work which is non-commercial.
> > >>>
> > >>> Peter
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> -------- Original Message --------
> > >>> Date: Fri, 03 Nov 2006 17:23:03 -0700
> > >>> From: "Troy A. Griffitts" <scribe at crosswire.org>
> > >>> To: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
> > >>> Subject: Re: [sword-devel] Hesychius
> > >>>
> > >>>> Peter,
> > >>>>      Thank you for your time and info.  We have an ongoing dialog with UCI
> > >>>> regarding the use of the data from TLG.  They have denied our request
> > >>>> twice, but I am hoping a detailed third plea might solicit sympathy.
> > >>>>
> > >>>>      -Troy.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Peter von Kaehne wrote:
> > >>>>> The TLG also has the older edition by Schmidt, though, which should by now be public domain as it dates from 1861
> > >>>>> Peter
> > >>>>>
> > >>>>> -------- Original Message --------
> > >>>>> Date: Fri, 03 Nov 2006 15:59:02 +0100
> > >>>>> From: "Peter von Kaehne" <refdoc at gmx.net>
> > >>>>> To: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
> > >>>>> Subject: Re: [sword-devel] Hesychius
> > >>>>>
> > >>>>>> The TLG indeed contains parts of the Hesychius - Latte's work only.
> > >>>>>>
> > >>>>>> Hansen's work is published on paper only in Germany. Electronic copies
> > >>>>>> are not available.
> > >>>>>>
> > >>>>>> The TLG licence for the text is such that it might be possible to
> > >>>>>> integrate the work, i.e. commercial scholarly tools making use of the
> > >>>>>> whole text are forbidden, but CrossWire's use might be possible.
> > >>>>>>
> > >>>>>> HTH
> > >>>>>>
> > >>>>>> Peter
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> -------- Original Message --------
> > >>>>>> Date: Thu, 02 Nov 2006 16:38:36 -0700
> > >>>>>> From: "Troy A. Griffitts" <scribe at crosswire.org>
> > >>>>>> To: sword-devel at crosswire.org
> > >>>>>> Subject: [sword-devel] Hesychius
> > >>>>>>
> > >>>>>>> If anyone has the time to research where we can find an electronic copy
> > >>>>>>> of Hesychius' Greek Lexicon, your efforts would be extremely valuable to
> > >>>>>>> me right now.  I believe the TLG has a copy of it, but I currently don't
> > >>>>>>> have easy access to the TLG.  Thanks in advance.
> > >>>>>>>
> > >>>>>>>   -Troy.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> _______________________________________________
> > >>>>>>> sword-devel mailing list: sword-devel at crosswire.org
> > >>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
> > >>>>>>> Instructions to unsubscribe/change your settings at above page
> > >>
> > >
> >
> >
>


