[sword-devel] Turkish to English glossary problem
DM Smith
dmsmith at crosswire.org
Tue Jan 5 10:13:34 MST 2016
Thanks David and Peter.
This module ERtr_en has been around since 2002. No one has complained. Every one of the first 26 examples is wrong (according to Google Translate), even the one for Christ! I’m of the opinion that the module should be taken down as being bad and not of value. I don’t have the time or inclination to update it.
I found the Ergane website from which the data was obtained. The data is still there, but I can’t run the Ergane 8.0 Windows software under Windows 10 to see the data, so I can’t tell if it is a useful source. The latest date (2005) is more recent than our module. So, according to our current practices, this should be obtained from upstream, not downstream.
I went to wordgumbo.com <http://wordgumbo.com/> via the Internet Wayback Machine (as the site is currently unavailable) and found the listing. There it is showing aguustos: 1. August. When I view the source I see a<!u 011f>g<sup>u</sup>ustos. The <!u 011f> is my browser’s debugger’s representation for the non-printing character.
I looked up 011F: it is the Unicode code point for ğ (U+011F), and 011E is the one for Ğ. In the MacTurkish encoding these are byte values 219 and 218, respectively. I didn’t look at other encodings.
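The code values above can be checked with a few lines of Python. This is only a sketch: it assumes Python's standard `mac_turkish` codec is an accurate stand-in for the MacTurkish encoding, and the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(code_point):
    """Report a character's Unicode name and its bytes in two encodings."""
    ch = chr(code_point)
    return {
        "char": ch,
        "name": unicodedata.name(ch),
        # MacTurkish is a single-byte encoding, so take the one byte as an int.
        "mac_turkish": ch.encode("mac_turkish")[0],
        # UTF-8 needs two bytes for these letters; show them as hex.
        "utf8": ch.encode("utf-8").hex(),
    }

for cp in (0x011F, 0x011E):
    print(describe(cp))
```

A byte of 219 or 218 on its own is not a valid UTF-8 sequence, which would be consistent with the invalid characters seen in the module.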
Here is a summary of how often this occurs, alongside the number of entries in the module.
[ERtr_en] Ergane Turkish to English Glossary
Entries=1029 Errors=47
The other LexDict modules that are exhibiting the same problems (invalid UTF-8 character):
[EReo_en] Ergane Esperanto to English Glossary
Entries=16852 Errors=1118
[ERja_en] Ergane Japanese to English Glossary
Entries=552 Errors=21
[ERpo_en] Ergane Polish to English Glossary
Entries=2026 Errors=650
[ERro_en] Ergane Romanian to English Glossary
Entries=1218 Errors=264
[ERru_en] Ergane Russian to English Glossary
Entries=1225 Errors=100
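For anyone wanting to reproduce counts like these, here is a rough sketch of one way to tally entries that are not valid UTF-8. It is not the tool used for the numbers above. It assumes the raw entry bytes have already been read out of the module by some other means, and it counts affected entries rather than individual bad characters. The function name `count_invalid_utf8` and the sample data are hypothetical.

```python
def count_invalid_utf8(entries):
    """Return (total, errors); errors counts entries that fail to decode as UTF-8."""
    total = 0
    errors = 0
    for raw in entries:
        total += 1
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            errors += 1
    return total, errors

# Made-up sample: the same word stored correctly and in a legacy encoding.
sample = [
    "ağustos".encode("utf-8"),        # valid UTF-8
    "ağustos".encode("mac_turkish"),  # contains a lone 0xDB byte, invalid as UTF-8
    b"august",                        # plain ASCII is valid UTF-8
]

total, errors = count_invalid_utf8(sample)
print(f"Entries={total} Errors={errors}")
```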
I’m also debugging a different set of problems I’m having with [ERde_en] Ergane German to English Glossary.
I’m suspicious about all the Ergane modules built from WordGumbo.
In Him,
DM
> On Jan 5, 2016, at 10:33 AM, David Haslam <dfhmch at googlemail.com> wrote:
>
> For the word you cited as an example, the missing letter would be Ğ.
>
> The word is AĞUSTOS = Ağustos = the name of the month August.
>
> Given enough samples, it may be feasible to reconstruct the mapping table
> without external help.
>
> This site may help.
>
> http://en.bab.la/dictionary/turkish-english/a%C4%9Fustos
>
> Regards,
>
> David