<div dir="ltr">Aggregating everything into one large USFM file takes a single command line (cat .\*\*.dat > shona.sfm).<br><br>Splitting that into standard book files takes one more command (csplit /\\id / shona.sfm).<div><br></div><div>You need to check that each \c tag has its own line, in case the chapter files end abnormally without a final newline/return.<br><br>However, you end up with files numbered 001.dat, 002.dat that then need to be renamed. Still trivial, but measured in minutes, not seconds.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 29, 2017 at 11:27 AM, David Haslam <span dir="ltr"><<a href="mailto:dfhmch@googlemail.com" target="_blank">dfhmch@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Teus has since added all the missing *\toc#* markers to the Shona<br>
<<a href="https://github.com/teusbenschop/shona" rel="noreferrer" target="_blank">https://github.com/<wbr>teusbenschop/shona</a>> repo.<br>
<br>
After the last commit, the USFM tag statistics were as follows:<br>
<br>
Count SFM tag Description (updated for USFM 3.0)<br>
----- -------- ------------------------------<wbr>-----<br>
04948 \add Translator's added words begin<br>
04948 \add* Translator's added words end<br>
01189 \c Chapter<br>
00066 \h Running header (h=h1)<br>
00066 \id Identification<br>
00065 \mt Major title (mt=mt1)<br>
00001 \mt1 Major title (portion 1)<br>
00031 \mt2 Major title (portion 2)<br>
00009 \nb No break with previous paragraph<br>
06445 \p Paragraph<br>
00066 \rem Remark<br>
01774 \s Section heading (s=s1)<br>
00066 \toc1 Table of contents 1 (Long table of contents text)<br>
00066 \toc2 Table of contents 2 (Short table of contents text)<br>
00066 \toc3 Table of contents 3 (Book abbreviation)<br>
31102 \v Verse[s]<br>
15739 \x Cross reference element begin<br>
15739 \x* Cross reference element end<br>
<br>
Observation:<br>
The data structure in the GitHub repository is not one USFM file per book,<br>
but one [USFM] data file per chapter, each in a suitable numbered directory,<br>
plus a separate data file (in directory 0) for the USFM header lines.<br>
<br>
In order to convert the text to OSIS, some preprocessing would be required<br>
to get the source text to one USFM file per book (as used by ParaTExt).<br>
<br>
Best regards,<br>
<br>
David<br>
<br>
--<br>
View this message in context: <a href="http://sword-dev.350566.n4.nabble.com/Module-upload-Shona-tp4657457p4657513.html" rel="noreferrer" target="_blank">http://sword-dev.350566.n4.<wbr>nabble.com/Module-upload-<wbr>Shona-tp4657457p4657513.html</a><br>
<div class="HOEnZb"><div class="h5">Sent from the SWORD Dev mailing list archive at Nabble.com.<br>
</div></div></blockquote></div><br></div>
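<div dir="ltr">For anyone who would rather script the whole thing, the aggregate-then-split steps described at the top of this message could be sketched roughly as below. This is only a sketch under assumptions: the function names join_chunks and split_books are made up for illustration, the chapter chunks are assumed to be already read into strings in book/chapter order, and each book is assumed to start with an \id line whose first token is the book code. It is not what Teus or anyone else actually ran.</div>

```python
import re


def join_chunks(chunks):
    """Concatenate chapter chunks, adding a final newline where one is missing.

    This guards against a \\c marker being glued onto the last line of the
    previous chunk when a chapter file ends without a newline.
    """
    return "".join(
        chunk if chunk.endswith("\n") else chunk + "\n"
        for chunk in chunks
        if chunk  # skip empty chunks entirely
    )


def split_books(usfm):
    """Split concatenated USFM text into a dict of {book_code: book_text}.

    A new book starts at every line beginning with "\\id ". Any text before
    the first \\id line is ignored (assumption: nothing useful lives there).
    """
    books = {}
    # Zero-width split before each line that starts with "\id "
    # (splitting on empty matches needs Python 3.7 or later).
    for part in re.split(r"(?m)^(?=\\id )", usfm):
        match = re.match(r"\\id (\S+)", part)
        if match is None:
            continue
        books[match.group(1)] = part
    return books


if __name__ == "__main__":
    sample = [
        "\\id GEN Genesis\n\\h Genesis",  # no trailing newline
        "\\c 1\n\\p\n\\v 1 text",
        "\\id EXO Exodus\n",
        "\\c 1\n\\v 1 more\n",
    ]
    for code, text in split_books(join_chunks(sample)).items():
        print(code, len(text.splitlines()), "lines")
```

<div dir="ltr">Keying the result by the book code taken from the \id line means each book could be written straight to a file named after that code, which would avoid the separate renaming pass.</div>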