[sword-devel] Open Hebrew Lexicon. (David Troidl), (Daniel Owens)
Daniel Owens
dhowens at pmbx.net
Sun Sep 4 19:48:44 MST 2011
Snip...
On 09/04/2011 01:12 PM, David Troidl wrote:
> Hi Aaron,
>
> On 9/4/2011 10:50 AM, Aaron Christianson wrote:
>> Daniel,
>> This does sound very much like what I am interested in doing, but
>> unfortunately, you seem to be using WeSay, which appears to have some
>> deficiencies in its Linux versions that will make it unusable for
>> editing a work of this kind (no support for non-latin scripts, and
>> issues copying and pasting non-ascii characters). I'm afraid that I
>> use Linux exclusively, and my ability to contribute to this project
>> would be severely limited.
Yes, for a Linux-only or a Mac person, this is a significant problem. I
am a mainly-Linux person, and I am waiting eagerly for WeSay to be fully
functional in Linux (without holding my breath). The reason I chose
WeSay was to encourage non-techies with an easy-to-use application that
supports structured collaboration using a version control system. It
works great with Unicode on Windows, handles multiple contributors
easily, and is developed by people trained and experienced in creating
lexica. One additional useful feature is that it offers the ability to
add semantic domain information. However, for our purposes WeSay is
basically limited to Windows at this point.
> I was going to write to Daniel privately, but maybe this is a topic
> that needs to be brought up here. My concern is the proliferation of
> formats, trying to accomplish the same thing. With Daniel's LIFT
> dictionary, the SWORD TEI-based lexicon format, whatever you would use
> and my ad hoc schema, all with similar goals, there could be a lot of
> duplication of effort.
>
Yes, I also don't like the idea of duplicating efforts.
> I made my schema just to get into the work, and with the intention of
> making it easy to transform to another format, when there was
> something better. I know that the TEI could handle all the
> requirements, but it's huge and forbidding. The SWORD format examples
> I've seen appear dense and hard to understand. I'm not certain if it
> has all the capabilities my lexicon needs. I was going to ask Daniel
> if his LIFT dictionary could handle it all, and what would be required
> to transform between the two. Also if his setup could import
> transformed entries. Now if WeSay is a problem with Linux, is that
> insurmountable? Could the LIFT dictionary be used in another
> context? Or what other format would be better?
On formats: SWORD's implementation of TEI for a lexicon is probably not
the best format. At least I have not considered it to be a good format
for creating a lexicon. I chose LIFT XML because it is a format that
several SIL programs use (WeSay and FieldWorks). It is designed for
lexica, so I imagine it can handle anything we need. WeSay allows you to
create custom fields, which makes it easy to work with. LIFT is just an
XML standard, so there is nothing to prevent one from creating an
application to write to a LIFT XML file.
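
To illustrate that last point, here is a minimal sketch in Python of a
script that writes a one-entry LIFT file using only the standard library.
The element names (lift, entry, lexical-unit, form, text, sense, gloss)
follow the general shape of LIFT files as I understand them; the version
string "0.13", the language codes "he" and "en", the producer name, and
the helper names make_entry, build_lift and lift_to_bytes are all
assumptions for illustration and should be checked against the LIFT
schema and an actual WeSay export before relying on them.

```python
import io
import xml.etree.ElementTree as ET


def make_entry(entry_id, headword, gloss, headword_lang="he", gloss_lang="en"):
    """Build one <entry> with a headword and a single glossed sense.

    The language codes are illustrative defaults, not values taken
    from any particular WeSay project.
    """
    entry = ET.Element("entry", {"id": entry_id})

    # Headword: <lexical-unit><form lang="..."><text>...</text></form></lexical-unit>
    lexical_unit = ET.SubElement(entry, "lexical-unit")
    form = ET.SubElement(lexical_unit, "form", {"lang": headword_lang})
    ET.SubElement(form, "text").text = headword

    # One sense with a gloss: <sense><gloss lang="..."><text>...</text></gloss></sense>
    sense = ET.SubElement(entry, "sense")
    gloss_el = ET.SubElement(sense, "gloss", {"lang": gloss_lang})
    ET.SubElement(gloss_el, "text").text = gloss
    return entry


def build_lift(entries, version="0.13"):
    """Wrap entries in a <lift> root element (version string is an assumption)."""
    root = ET.Element("lift", {"version": version, "producer": "example-script"})
    root.extend(entries)
    return ET.ElementTree(root)


def lift_to_bytes(tree):
    """Serialize the tree as UTF-8 XML with an XML declaration."""
    buffer = io.BytesIO()
    tree.write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


if __name__ == "__main__":
    tree = build_lift([make_entry("melek", "מֶלֶךְ", "king")])
    with open("example.lift", "wb") as handle:
        handle.write(lift_to_bytes(tree))
```

Whether WeSay would accept a file produced this way without further
attributes (GUIDs, modification dates and so on) is something that would
have to be tested.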
On applications: I have been ruminating on the problem of WeSay being
Windows-only and wondering if a browser-based solution written in PHP or
something like that would be a "quick" solution for Mac and Linux users.
The PHP code and LIFT file could reside on the contributor's machine
with Mercurial negotiating the differences with the server. That would
mean the PHP program would have to be written to work well with WeSay,
which could be a job in itself. I just don't have the time or expertise
to pull it off. But if someone could do that, it would open up
possibilities for contributors.
Our project is moving so slowly that I am open to changing the way we do
it. Data format questions aside, the following features are needed for
an interface for developing a Hebrew lexicon:
* Support right-to-left (RTL) Unicode text
* Easy to use for non-techies (virtually brainless, if possible)
* Changes stored using a version control system allowing for
collaboration
* Support features that are commonly accepted as good linguistic
practice, such as semantic domains
* Customizable for our needs
So far WeSay works the best for that, but it is limited to Windows. I am
open to new ideas.
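
On the right-to-left requirement, a rough sketch of the kind of check any
editing interface would need: deciding whether a field should be
displayed right-to-left. It uses Python's standard unicodedata module and
a simplified "first strong character" rule; the function name
base_direction is hypothetical, and a real interface would rely on the
full Unicode Bidirectional Algorithm as implemented by the browser or
toolkit rather than on this shortcut.

```python
import unicodedata


def base_direction(text):
    """Guess the display direction from the first strongly directional character.

    Returns "rtl", "ltr", or "neutral" (no strong character found).
    This is a simplification and ignores embedding and isolate controls.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew letters are "R", Arabic letters "AL"
            return "rtl"
        if bidi_class == "L":  # Latin and other left-to-right letters
            return "ltr"
    return "neutral"
```

For example, base_direction("מֶלֶךְ") gives "rtl" and base_direction("king")
gives "ltr", so a headword field and a gloss field could each be laid out
in the appropriate direction.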
Daniel