Econoweb RTF Parser & Tools

Last updated: Wednesday, 26 January, 2005.

Description | Basic Idea | Sample sites made with Econoweb/rtf | This is not really a distribution | Obtaining the codes | Front-end | Back-ends | Sample documents | Caveat and bugs | Author

Description

“Econoweb RTF parser & tools” is a set of Perl programs aimed at a nearly no-cost solution for producing Web content.

The approach taken is to use a word processor to create that content. Creating content is different from editing Web pages, since you specify neither the style (how pages look) nor the structure (organization of pages, frames, etc.) of the site.

Basic Idea

The whole production system is made of:

  1. A word processor (the contents workbench).

  2. An XML generator for a pivot structured document type (the front-end).

  3. An XML production chain ending at the site (the back-ends).

This approach should allow people who have valuable information to share, but who lack Web production resources, to take part in the Web.

For now, “Econoweb RTF parser & tools” uses Word as the word processor (RTF format), and Perl and PerlSAX as implementation tools.

Sample sites made with Econoweb/rtf

Of course, this very page is made with econoweb from a Word document: Econoweb.rtf.

The Technologia XML Courses Web site is also made with it: the site is built from Word documents that are automatically translated into pages.

This is not really a distribution

This is very short documentation for various pieces of Perl code I have developed for my own needs.

I have been asked to put this code on-line, but I haven’t made much effort to make it either easy to use or easy to share with other developers.

As a matter of fact, I haven’t found a lot of interest in this kind of tool so far. Word2000 is said to support XML, so there may be no point for other people in using this software.

Don’t hesitate to encourage me to devote some energy to this project if you think that this kind of tool could be of any help to you.

Laurent CAPRANI (English or French)

Obtaining the codes

Here is a copy of the code: econoweb050126, archived as econoweb050126.zip.

The code depends on XML::Handler::XMLWriter [1] and XML::Grove.

It is divided into front-end and back-end processors:

Front-end

Also known as the “Reader”, “RTF Parser” or “SAX driver”.

RTF/control.pl

A table of recognized controls.

Most of this table has been borrowed from Paul DuBois’s RTF tools. It associates control names with control classes.
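As a rough illustration of what such a table might look like, here is a minimal sketch, written in Python rather than the project's Perl. The class names and the `classify` helper are hypothetical and are not taken from RTF/control.pl; only the control words themselves are standard RTF.

```python
# Hypothetical control table: RTF control word -> control class.
# The class names are illustrative assumptions, not those of RTF/control.pl.
CONTROL_CLASSES = {
    "fonttbl": "destination",
    "stylesheet": "destination",
    "par": "special_character",
    "tab": "special_character",
    "b": "character_format",
    "i": "character_format",
    "pard": "paragraph_format",
    "s": "paragraph_format",
}


def classify(control_word):
    """Return the class of a control word, or 'unknown' if it is not in the table."""
    return CONTROL_CLASSES.get(control_word, "unknown")


print(classify("par"))
print(classify("nosuchword"))
```

A lookup table of this kind lets the parser stay generic: it only needs to recognize the syntax of a control, and the handlers can decide what to do from its class.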

RTF/parser.pl

An RTF lexical parser.

Events are generated (callbacks are in RTF/handlers.pl) for each control, group or text string.
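The event-per-token idea can be sketched as follows, in Python rather than the project's Perl. This is a deliberately incomplete illustration, not the logic of RTF/parser.pl: the event names are assumptions, and hexadecimal escapes, binary data and Unicode controls are not handled.

```python
import re

# One alternative per kind of RTF token; a deliberately incomplete grammar.
TOKEN = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)? ?"
    r"|\\(?P<symbol>[^a-zA-Z])"
    r"|(?P<text>[^\\{}]+)"
)


def tokenize(rtf):
    """Yield one event tuple per group boundary, control or text run."""
    for m in TOKEN.finditer(rtf):
        if m.group("open"):
            yield ("group_start",)
        elif m.group("close"):
            yield ("group_end",)
        elif m.group("word"):
            param = m.group("param")
            yield ("control", m.group("word"),
                   int(param) if param is not None else None)
        elif m.group("symbol"):
            yield ("symbol", m.group("symbol"))
        else:
            # Raw line breaks carry no meaning in RTF text, so drop them.
            text = m.group("text").replace("\r", "").replace("\n", "")
            if text:
                yield ("text", text)


for event in tokenize(r"{\rtf1\ansi{\b Bold} text\par}"):
    print(event)
```

In a callback-based design, each yielded tuple would instead be a call to the matching handler routine.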

RTF/handlers.pl

RTF parsing event handlers.

Callback routines that interpret RTF constructs (as of Word97) into structures representing the constituents of a Word document.

RTF/driver.pl

An enhanced SAX driver, not specifically related to RTF.

It ensures well-formedness and proper element nesting.
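One way a driver can guarantee proper nesting is to keep a stack of open elements and close them itself when needed. The sketch below is in Python rather than the project's Perl; the `NestingDriver` class and its method names are hypothetical, and RTF/driver.pl may well work differently. The downstream handler is the standard library's `XMLGenerator`.

```python
import io
from xml.sax.saxutils import XMLGenerator


class NestingDriver:
    """Forward element events to a SAX handler while keeping them properly nested."""

    def __init__(self, handler):
        self.handler = handler
        self.stack = []  # names of the elements currently open

    def start(self, name, attrs=None):
        self.stack.append(name)
        self.handler.startElement(name, attrs or {})

    def end(self, name):
        if name not in self.stack:
            return  # stray end event: ignore it rather than emit broken XML
        # Close any inner elements left open, then the requested one.
        while self.stack:
            top = self.stack.pop()
            self.handler.endElement(top)
            if top == name:
                break

    def text(self, data):
        self.handler.characters(data)

    def finish(self):
        # Close whatever is still open so the output is well formed.
        while self.stack:
            self.handler.endElement(self.stack.pop())


out = io.StringIO()
driver = NestingDriver(XMLGenerator(out))
driver.start("doc")
driver.start("para")
driver.start("b")
driver.text("bold & more")
driver.end("para")   # <b> is closed automatically first
driver.end("table")  # never opened: ignored
driver.finish()
print(out.getvalue())
```

This matters for RTF because formatting in a word processor does not follow a strict tree, so the source of events cannot be trusted to close elements in the right order.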

Back-ends

Also known as “writers” or “SAX handlers”…

The following are sample SAX processors, each producing a particular kind of output from a particular kind of input.

RtfMl.pl

This simply translates RTF to an XML vocabulary. It is a simple wrapper connecting the parser to XML::Handler::XMLWriter.
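The idea of wiring parser events straight into an XML writer can be sketched like this, in Python rather than the project's Perl. The event tuples and the `group` and `control` element names are assumptions made for the illustration; they are not the actual vocabulary produced by RtfMl.pl. The standard library's `XMLGenerator` stands in for XML::Handler::XMLWriter.

```python
import io
from xml.sax.saxutils import XMLGenerator


def events_to_xml(events):
    """Serialize parser events as XML; the element names are assumptions."""
    out = io.StringIO()
    writer = XMLGenerator(out, short_empty_elements=True)
    for event in events:
        kind = event[0]
        if kind == "group_start":
            writer.startElement("group", {})
        elif kind == "group_end":
            writer.endElement("group")
        elif kind == "control":
            attrs = {"name": event[1]}
            if event[2] is not None:
                attrs["param"] = str(event[2])
            # A control has no content, so it becomes an empty element.
            writer.startElement("control", attrs)
            writer.endElement("control")
        elif kind == "text":
            writer.characters(event[1])
    return out.getvalue()


sample = [
    ("group_start",),
    ("control", "fs", 24),
    ("text", "Hi"),
    ("group_end",),
]
print(events_to_xml(sample))
```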

DocMl.pl

This one outputs a structured XML format that can be translated to HTML pages using XSLT transforms (e.g. docml2html.xsl).

See “Sample documents” below.

Uses XML::Grove and Data::Grove::Visitor.

html.pl

A Web page generator, driven mostly by the styles in the source document.

Uses XML::Grove and Data::Grove::Visitor.

See the source for usage.
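To give a feel for what "driven by the styles" can mean, here is a minimal sketch in Python rather than the project's Perl. The style-to-tag mapping and the `render` function are hypothetical; the real html.pl works on an XML::Grove tree and its rules are in the source.

```python
from html import escape

# Hypothetical mapping from Word paragraph style names to HTML elements.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Normal": "p",
}


def render(paragraphs):
    """Render (style, text) pairs as HTML, falling back to <p> for unknown styles."""
    lines = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


print(render([("Heading 1", "Caveat & bugs"), ("Normal", "See the code.")]))
```

The author of the Word document only picks styles; how each style looks on the Web is decided once, in the generator.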

GroveUtils.pm

A set of utilities for HTML generators based on XML::Grove and Data::Grove::Visitor.

Sample documents

The very document you are reading is available as an RTF document and as an XML document (DocMl vocabulary). There are other samples in the zip archive.

Caveat and bugs

The RTF parser cannot deal with binary contents (e.g. images and objects).

The RTF parser only knows about the ANSI (i.e. windows-1252) character set. For instance, non-English text written on a Mac is not supported.
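The reason the character set matters is that RTF commonly writes non-ASCII characters as \'xx hexadecimal escapes, and the byte xx means different things in different code pages. The following sketch, in Python rather than the project's Perl, shows the effect; the `decode_hex_escapes` helper is hypothetical and is not part of the Econoweb code.

```python
import re

HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(rtf_text, codepage="cp1252"):
    """Replace each \\'xx escape with the character that byte denotes in `codepage`."""
    return HEX_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]).decode(codepage),
        rtf_text,
    )


print(decode_hex_escapes(r"caf\'e9"))               # byte 0xE9 read as windows-1252
print(decode_hex_escapes(r"caf\'8e", "mac_roman"))  # byte 0x8E read as Mac Roman
print(decode_hex_escapes(r"caf\'8e"))               # same Mac byte misread as windows-1252
```

The first two calls give the same word, while the third shows what a windows-1252-only parser makes of text saved on a Mac.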

See also the “#FIXME” comments within the code.

Author

Laurent CAPRANI (English or French)


[1] Both are bundled in the libxml-perl CPAN module (version 0.08 required).

This page was made from an MS-Word® document, translated to XML & HTML thanks to the Econoweb free software.