Wednesday, June 25, 2008

Update on Using MS Word add-in and DAISY Pipeline to Convert Files

I posted last week about my adventures trying to convert a document using the new MS Word add-in that saves in DAISY format. The add-in was developed in conjunction with the DAISY Consortium, and we should be thankful to have it available to us.
If you go to the Conversion Tools and Software Page, there are links to players and tools. I read this on the DAISY site under the MS Word add-in:

"Microsoft's Word Add-in translates Word documents into DAISY XML.
Further transformation from DAISY XML to a full DAISY file set will be possible using one of the conversion tools such as EasyConverter (Dolphin), eClipseWriter (IRTI), gh Player (gh LLC) and the DAISY Pipeline (DAISY Consortium).
The "Save as DAISY XML - Microsoft" project area provides current information and related communications about the Microsoft Word add-in, as well as links for downloading the latest version."

I followed their advice, and since the DAISY Pipeline is free, I downloaded it. Make sure you have a current version of the JRE (Java Runtime Environment) from Sun Microsystems. You can download JRE version 6 HERE. With the JRE installed, the Pipeline works great, and I got a DAISY formatted file from the Pipeline on my computer.
One problem I had with the Pipeline conversion was that I use MS Office '07. Within the Pipeline, my only choice for a Word input file was a Word '03 XML document. I just re-saved my Word document in the Word '03 XML format, and then the Pipeline could convert it from the Word XML format to its digital talking book format.

Now the next challenge:
I tried to play the file in the free AMIS reader, but AMIS only opens a book through an ncc or .opf file. Mine wasn't that type, so the free reader wasn't working with my newly converted file. When I tried a free trial of the gh Player, it did play the file for me successfully. If you start with a Word '07 document and save it as a DAISY file, it will open directly in the gh Player: open a new book and ask the player to show all dtbook file types. Your document should show up if you browse to its location and open it in the player. This bypasses the need for the other file types and for the Pipeline, but it means purchasing the player. You can download a free trial version of the gh Player to test files if you are playing with these applications and want to see what you get. I am thinking that the free Victor Reader that downloads with a Bookshare account would work also. If anyone has it and wants to try, let us know.
I am going to keep working on getting a successful working copy from a Word document to a FREE reader using the MS Word add-in. The MS add-in is installed and working properly to convert to an XML file. The Pipeline is installed and working fine to convert the Word file to a Digital Talking Book file, but the output doesn't include the ncc file (the DAISY 2.02 navigation control center, ncc.html) or the .opf file (the DAISY 3 package file) that AMIS needs at this time. These formats are new to me and I am learning as I go. Somehow I am still missing a piece. I will continue to work on this, and when I get a solution you will be the first to know. If anyone has ideas, post them for us.
All the best to you!
Lon

6 comments:

Susabelle said...

I have resisted working with the plugin because it isn't complete. I need a beginning-to-reader solution and this just isn't it...yet. Keep posting, I'm interested in reading about your experiences.

Lon said...

Thanks for your comment. You are right. I went into this thinking I would have a complete end result and was disappointed to see I needed to use converters to take it further. I still don't quite understand the benefit of the MS add-in when it is all said and done. It is a start even though it isn't, as you say, a beginning-to-reader solution. If you want to have it read in gh Player, the file is recognized in that application directly after the conversion with the MS add-in.
I can have a text to speech reader do that for free, but there aren't the visual adaptations that gh Player has.
I guess it just depends on what the end result is you are wanting. I am definitely still learning about all this and sharing as I go. I would be curious to know what you are wanting it to do that you aren't getting. Maybe we can explore that and someone can use the feedback to get to working on that for you...
Thanks!

Romain Deltour said...
This comment has been removed by the author.
Romain Deltour said...

Lon,

Glad to see you've installed and run the Pipeline successfully!

Let me clarify a few points regarding DAISY standards, the Pipeline and MS Add-in:

1. DAISY Standards

There are two main versions of the DAISY standard: DAISY 2.02 and DAISY 3 (officially ANSI/NISO Z39.86).

Each of these specifications defines what a Digital Talking Book (DTB) is. In both cases, a DTB is composed of a set of different files.

To make it simple, a DAISY 2.02 full-audio, full-text book is composed of:
- the navigation control center document (ncc.html)
- the text content document (.html)
- a set of audio files (wav, mp3)
- a set of SMIL files (to synchronize the different media, text and audio)

A DAISY 3 full-audio, full-text book is composed of:
- a DTB package file (.opf), which contains metadata and the list of the other files
- a navigation control file (.ncx), basically the table of contents
- a DTBook file (.xml) which contains the textual content
- a set of audio files (wav, mp3)
- a set of SMIL files (to synchronize the different media, text and audio)

As you've seen, some players accept a mere DTBook XML file: they use advanced conversion techniques and speech synthesis to make a full DTB out of this XML textual content.
Others require a fully built DTB, though. DAISY 2.02 players usually expect the ncc.html, while DAISY 3 players will usually expect the .opf file.
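
As a rough illustration of the file sets described above, here is a minimal Python sketch that guesses what a folder contains by looking only at file names. The function name and the returned labels are made up for this example, and the check is an assumption-laden shortcut: it does not validate the files or confirm that any particular player will open them.

```python
from pathlib import Path


def classify_daisy_folder(folder):
    """Guess what kind of DAISY content a folder holds (file names only)."""
    names = [p.name.lower() for p in Path(folder).iterdir() if p.is_file()]

    # DAISY 2.02 books are opened through ncc.html.
    if "ncc.html" in names:
        return "DAISY 2.02 book (open ncc.html)"

    # DAISY 3 books are opened through the .opf package file.
    if any(name.endswith(".opf") for name in names):
        return "DAISY 3 book (open the .opf file)"

    # Only XML present: possibly a bare DTBook file, which some
    # players read directly and others reject.
    if any(name.endswith(".xml") for name in names):
        return "XML only, possibly a bare DTBook file"

    return "no DAISY entry point found"
```

Running it on the folder a conversion tool wrote to shows quickly whether there is an ncc.html or .opf file to point a player at, or only a lone XML file.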

2. MS Add-In

The MS DAISY Translator, aka MS "Save as DAISY XML", converts a Word document into a DTBook file. It doesn't produce a full Digital Talking Book yet. That's why this output is only readable in some of the players out there.

The solution, as you found out, is either to use a player that will play a DTBook XML directly using speech synthesis, or to use a conversion tool to make a full DTB out of the DTBook XML file.

For your information, the ability to create a full DTB is planned for a future version of the Word Add-In.
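
If you are unsure whether a given XML file is a DTBook file or a Word 2003 XML file, one way to check is to look at its root element. The sketch below is only an illustration: the function name and labels are made up, and it assumes that DTBook files have a root element named dtbook and that Word 2003 XML files have a wordDocument root in the WordprocessingML 2003 namespace.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of Word 2003 XML (WordprocessingML) documents.
WORDML_2003_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def identify_xml(source):
    """Return a rough label for an XML file based on its root element.

    `source` may be a file path or a binary file object.
    """
    root_tag = None
    # The first "start" event is the root element, so stop right there.
    for _event, elem in ET.iterparse(source, events=("start",)):
        root_tag = elem.tag
        break
    if root_tag is None:
        return "unknown"

    # ElementTree writes namespaced tags as "{namespace}localname".
    namespace, _, local = root_tag.rpartition("}")
    namespace = namespace.lstrip("{")

    if local == "dtbook":
        return "DTBook"
    if local == "wordDocument" and namespace == WORDML_2003_NS:
        return "Word 2003 XML"
    return "unknown"
```

A file labeled "DTBook" is the kind of output the Add-In produces; a file labeled "Word 2003 XML" still needs to go through a Word-to-DTBook conversion first.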

3. Pipeline

To create a DTB from Word with the Pipeline, you have two options:
- Use the built-in Word 2003 XML to DTBook converter to produce the DTBook, then use the Pipeline Narrator to produce DTBs from the DTBook.
- Use the MS "Save as DAISY XML" Add-In to produce the DTBook xml, then use the Pipeline Narrator to produce DTBs from the DTBook.

Note that the Narrator will create 3 books: a DAISY 2.02 book, a DAISY 3 book with wav-encoded audio, and a DAISY 3 book with mp3-encoded audio.
You should be able to read at least one of these DTBs in any DAISY player you have, and as far as I know AMIS 3 (currently in beta version) supports both DAISY 2.02 and DAISY 3 full-text full-audio books.

Wow. That was a longer comment than I initially thought. I'll cross-post to the DAISY forums to keep track of it! (By the way, I invite you to post questions and requests for help in these forums.)

HTH,
Romain.

Lon said...

Thank you Romain for your insightful explanation. It answers a lot of my questions as I sort this out. My next step will be to use the Narrator application to convert to a full DAISY format to play on the AMIS Player.
I will keep everyone posted...
Lon

Michal said...

Hi, I found some other interesting information about file formats at file-extensions.org. Nice Day:)