ePub, Format for Electronic Books
ePub is now the standard file format for eBooks, thanks to its ability to adapt to the displays of all kinds of reading devices.
This format is used by most e-readers, but not by Amazon's Kindles.
The IDPF (International Digital Publishing Forum), an association created to provide a standard format for electronic books, chose ePub as that standard.
In 1999 it first defined a format called Open eBook, built around HTML. Then in 2007 it adopted a set of three specifications, for the container, the packaging and the content, together called ePub.
The ePub file
An electronic book is a file with the .epub extension. It is a ZIP archive that contains documents describing the content and its presentation, alongside the text, audio and graphics files. All the document types are defined by the standard.
The ePub Standard comprises three specifications:
- OCF for the container.
- OPF for the packaging.
- OPS for the content.
Each describes the files and their content. To summarize, a .epub file is a compressed ZIP archive that contains files with these extensions: .opf, .ncx, .xml, .xhtml, .css, plus graphics files.
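Since an ePub is an ordinary ZIP archive, its contents can be inspected with any ZIP library. The following is a minimal sketch in Python using only the standard library; the function name entries_by_extension is an illustrative choice, not part of any ePub tool.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def entries_by_extension(source):
    """Group the entry names of an ePub (ZIP) archive by file extension.

    `source` may be a path, a binary file-like object, or raw bytes.
    The name of this helper is hypothetical, chosen for illustration.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    groups = defaultdict(list)
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # Entries without a suffix (such as "mimetype") go under "(none)".
            extension = PurePosixPath(name).suffix.lower() or "(none)"
            groups[extension].append(name)
    return dict(groups)
```

Calling entries_by_extension("book.epub") on a real file would return a dictionary such as {".opf": [...], ".xhtml": [...]}, which gives a quick overview of what the archive holds.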
1) The container
This is a file compressed in ZIP format.
The way files are organized within it is governed by the OCF specification (OEBPS Container Format). Two files identify the archive and point to the list of its contents.
The manifest root
It is an XML file that specifies the name of the manifest file and its MIME type. It must be called container.xml, is placed in the META-INF directory, and contains this tag:
<rootfiles>
  <rootfile full-path="OPS/book.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
The manifest file book.opf in turn lists all the files in the archive.
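A reading tool therefore starts from container.xml to locate the manifest file. The sketch below shows one way to do this with Python's standard XML parser; the function name find_opf_path is hypothetical. It compares local tag names, an assumption made so that the lookup works whether or not the file declares the container XML namespace.

```python
import xml.etree.ElementTree as ET


def find_opf_path(container_xml):
    """Return the full-path attribute of the first <rootfile> element.

    `container_xml` is the text of META-INF/container.xml.
    """
    root = ET.fromstring(container_xml)
    for element in root.iter():
        # Strip a possible "{namespace}" prefix before comparing names.
        if element.tag.rsplit("}", 1)[-1] == "rootfile":
            return element.get("full-path")
    raise ValueError("no <rootfile> element found")
```

With the tag shown above, find_opf_path returns "OPS/book.opf", the path to open next inside the archive.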
The mimetype file
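This is a plain ASCII file that contains a single string, the MIME type of the archive: application/epub+zip. It must be the first entry in the ZIP archive and must be stored uncompressed.
A typical archive is laid out as follows:
-- Zip file --
mimetype
META-INF/
  container.xml
OPS/
  book.opf
  chapter1.xhtml
  image1.png
  css/
    style.css
Because it sits at a fixed position in the archive, the mimetype file lets tools identify an ePub without having to parse the XML manifest file book.opf.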
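Under the OCF specification, the mimetype entry holds the string application/epub+zip, comes first in the archive and is stored without compression. As a rough sketch of how a writer could respect that ordering, the Python code below builds such an archive in memory. The function name build_epub_skeleton and the files argument are illustrative assumptions, and the result is only a container layout, not a validated ePub.

```python
import io
import zipfile


def build_epub_skeleton(files):
    """Return the bytes of a ZIP archive laid out as an ePub container.

    `files` maps archive paths (e.g. "META-INF/container.xml") to text.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # Every other file may be compressed.
        for path, text in files.items():
            archive.writestr(path, text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

The returned bytes can be written to a file with the .epub extension and then checked with a validator such as epubcheck.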
2) The packaging
The packaging is defined by the OPF (Open Packaging Format) specification.
It must contain two XML files, the manifest and the table of contents.
The manifest
It is an XML file with the .opf extension (e.g. book.opf) that lists all the files in the archive:
<manifest>
  <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml" />
  ...
</manifest>
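To illustrate how a program might read this list, here is a small Python sketch that collects every item of the manifest. The function name manifest_items is hypothetical, and matching on local tag names is an assumption made so that the code accepts both a bare fragment like the one above and a full .opf file with its XML namespace.

```python
import xml.etree.ElementTree as ET


def manifest_items(opf_xml):
    """Map each manifest item id to its (href, media-type) pair."""
    root = ET.fromstring(opf_xml)
    items = {}
    for element in root.iter():
        # Only <item> elements are collected; <itemref> and others are skipped.
        if element.tag.rsplit("}", 1)[-1] == "item":
            items[element.get("id")] = (element.get("href"),
                                        element.get("media-type"))
    return items
```

The href values are relative to the location of the .opf file, so a reader would join them with its directory (OPS/ in the layout used here) before opening them in the archive.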
Table of contents
It is contained in an XML file with the .ncx extension. It consists of a list of titles and links to files.
It also contains metadata such as the author's name and other references.
NCX stands for Navigation Control file for XML.
<docTitle>
  <text>Title of the book</text>
</docTitle>
<navMap>
  <navPoint id="chapter1">
    <content src="chapter1.xhtml" />
  </navPoint>
</navMap>
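The table of contents can be walked in the same way. The following sketch, again in Python with the standard library, returns the id and target file of each navPoint in document order. The function names are illustrative, and the code assumes the fragment above is wrapped in a single root element, as it is in a complete .ncx file.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Strip a possible "{namespace}" prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def nav_points(ncx_xml):
    """Return a list of (id, src) pairs, one per <navPoint>, in order."""
    root = ET.fromstring(ncx_xml)
    points = []
    for element in root.iter():
        if _local_name(element.tag) != "navPoint":
            continue
        src = None
        # Look only at direct children so nested navPoints keep their own src.
        for child in element:
            if _local_name(child.tag) == "content":
                src = child.get("src")
                break
        points.append((element.get("id"), src))
    return points
```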
3) The files of the content
The OPS (Open Publication Structure) specification indicates what types of files can be used in an eBook.
The content is a set of XHTML 1.1 files for the text, style sheets in a limited subset of CSS, and multimedia files.
Each XHTML file works like a Web page: it specifies its style sheet and contains tags for images and hyperlinks.
Encoding may be UTF-8 or UTF-16.
The supported graphics formats are: GIF, PNG, JPG, SVG.
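Because each content file is well-formed XHTML, the resources it refers to can be found with an XML parser. The sketch below is one possible approach in Python; the function name content_references and the shape of the returned dictionary are assumptions for illustration. Note that the standard parser does not load external DTDs, so a page using named entities such as &nbsp; would need a different parser.

```python
import xml.etree.ElementTree as ET


def content_references(xhtml):
    """Collect the style sheets, images and hyperlinks of an XHTML page."""
    root = ET.fromstring(xhtml)
    refs = {"stylesheets": [], "images": [], "links": []}
    for element in root.iter():
        # Compare local names so the XHTML namespace does not matter.
        name = element.tag.rsplit("}", 1)[-1]
        if name == "link" and element.get("rel") == "stylesheet":
            refs["stylesheets"].append(element.get("href"))
        elif name == "img":
            refs["images"].append(element.get("src"))
        elif name == "a" and element.get("href"):
            refs["links"].append(element.get("href"))
    return refs
```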
Amazon uses a semi-proprietary format for the Kindle. It is derived, with small changes, from MobiPocket, an alternative to ePub.
The latest version for new Kindles, named Format 8, incorporates HTML 5 for extended multimedia capabilities. With SVG and style sheets, graphics and graphic effects are now available, which makes dynamic comics possible.
Most of these features are already supported by ePub 3.
Information and Tools
- IDPF website.
- epubcheck. Tool to validate an epub file.
- OPL's ePub Library. PHP code for reading, writing and editing ePub documents.
- ePub logos.