<div dir="ltr">Aggregating everything into one large USFM file takes a single command (cat .\*\*.dat &gt; shona.sfm). <br><br>Splitting that into standard book files takes one more command (csplit /\\id / shona.sfm). <div><br></div><div>You need to check that each \c tag is on its own line, in case the chapter files end abnormally without a final newline/return. <br><br>However, you end up with files numbered 001.dat, 002.dat, and so on, which then need to be renamed. Still trivial, but measured in minutes, not seconds. </div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 29, 2017 at 11:27 AM, David Haslam <span dir="ltr">&lt;<a href="mailto:dfhmch@googlemail.com" target="_blank">dfhmch@googlemail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Teus has since added all the missing *\toc#* markers to the Shona<br>

&lt;<a href="https://github.com/teusbenschop/shona" rel="noreferrer" target="_blank">https://github.com/<wbr>teusbenschop/shona</a>&gt;   repo.<br>

<br>

After the last commit, the USFM tag statistics were as follows:<br>

<br>

Count   SFM tag Description (updated for USFM 3.0)<br>

-----   --------        ------------------------------<wbr>-----<br>

04948   \add    Translator&#39;s added words begin<br>

04948   \add*   Translator&#39;s added words end<br>

01189   \c      Chapter<br>

00066   \h      Running header (h=h1)<br>

00066   \id     Identification<br>

00065   \mt     Major title (mt=mt1)<br>

00001   \mt1    Major title (portion 1)<br>

00031   \mt2    Major title (portion 2)<br>

00009   \nb     No break with previous paragraph<br>

06445   \p      Paragraph<br>

00066   \rem    Remark<br>

01774   \s      Section heading (s=s1)<br>

00066   \toc1   Table of contents 1 (Long  table of contents text)<br>

00066   \toc2   Table of contents 2 (Short table of contents text)<br>

00066   \toc3   Table of contents 3 (Book abbreviation)<br>

31102   \v      Verse[s]<br>

15739   \x      Cross reference element begin<br>

15739   \x*     Cross reference element end<br>

<br>

Observation:<br>

The data structure in the GitHub repository is not one USFM file per book,<br>

but one [USFM] data file per chapter, each in a suitable numbered directory,<br>

plus a separate data file (in directory 0) for the USFM header lines.<br>

<br>

In order to convert the text to OSIS, some preprocessing would be required<br>

to get the source text to one USFM file per book (as used by ParaTExt).<br>

<br>

Best regards,<br>

<br>

David<br>

<br>

<br>

<br>

<br>

<br>

--<br>

View this message in context: <a href="http://sword-dev.350566.n4.nabble.com/Module-upload-Shona-tp4657457p4657513.html" rel="noreferrer" target="_blank">http://sword-dev.350566.n4.<wbr>nabble.com/Module-upload-<wbr>Shona-tp4657457p4657513.html</a><br>

<div class="HOEnZb"><div class="h5">Sent from the SWORD Dev mailing list archive at Nabble.com.<br>


</div></div></blockquote></div><br></div>
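<div dir="ltr">The split, the \c check and the renaming described in the reply at the top could also be done in one pass. The following is only a rough Python sketch, not the csplit approach itself. It assumes shona.sfm has already been concatenated and that every \id line carries a book code. The function names and the GEN.usfm style of output name are hypothetical.</div>
<pre>
```python
import re
from pathlib import Path


def ensure_marker_lines(text):
    # If a chapter file lacked a final newline, the next \c or \id marker
    # ends up glued to the previous line; start such markers on a new line.
    return re.sub(r"([^\n])(\\(?:c|id)\s)", r"\1\n\2", text)


def split_books(text):
    # Map each book code (the first token after \id, e.g. GEN) to its USFM.
    # Assumes every \id line carries a book code; text before the first
    # \id line is ignored.
    books = {}
    current = None
    for line in ensure_marker_lines(text).splitlines(keepends=True):
        if line.startswith("\\id "):
            current = line.split()[1]
            books[current] = []
        if current is not None:
            books[current].append(line)
    return {code: "".join(lines) for code, lines in books.items()}


def write_books(books, out_dir):
    # Hypothetical naming scheme: one file per book, e.g. GEN.usfm.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for code, usfm in books.items():
        (out / (code + ".usfm")).write_text(usfm, encoding="utf-8")
```
</pre>
<div dir="ltr">Something like write_books(split_books(Path("shona.sfm").read_text(encoding="utf-8")), "books") would then write one file per book, named after its book code, so the separate renaming step should not be needed.</div>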