ePub, Format for Electronic Books

ePub is now the standard file format for eBooks, thanks to its ability to adapt to the display of any reader device.

This format is used by most e-readers, but not by Amazon's Kindle devices. Amazon uses a semi-proprietary format that is derived, with small changes, from MobiPocket, an alternative to ePub.
The latest version for new Kindles, named Format 8, incorporates HTML 5 for extended multimedia capabilities. With SVG and style sheets, graphics and graphic effects are now available, which makes dynamic comics possible.
Most of these features are already supported by ePub 3.

The IDPF (International Digital Publishing Forum), an association created to provide a standard format for electronic books, has chosen ePub.

In 1999 it first defined a format called Open eBook, built around HTML. Then, in 2007, it adopted a set of three specifications for the container, the packaging and the content, together called ePub.

The ePub file

An electronic book is a file with the .epub extension. It is a ZIP archive containing documents that describe the content and its presentation, together with text, audio and graphics files. All document types are defined by the standard.

The ePub Standard comprises three specifications:

  1. OCF for the container.
  2. OPF for the packaging.
  3. OPS for the content.

Each describes the files and their content. To summarize, a .epub file is a ZIP archive that contains files with these extensions: .opf, .ncx, .xml, .xhtml, .css, and graphics files.
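Since the file is an ordinary ZIP archive, its contents can be inspected with any ZIP library. The following is a minimal sketch in Python, using only the standard library. The function names and the sample file names (such as toc.ncx) are illustrative assumptions, and the sample archive holds empty files only.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(epub_bytes: bytes) -> Counter:
    """Count the file extensions found inside an ePub (ZIP) archive."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return Counter(PurePosixPath(n).suffix or "(none)" for n in names)


def make_sample() -> bytes:
    """Build a tiny in-memory archive with hypothetical, empty files."""
    sample_names = (
        "mimetype",
        "META-INF/container.xml",
        "OPS/book.opf",
        "OPS/toc.ncx",
        "OPS/chapter1.xhtml",
        "OPS/css/style.css",
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sample_names:
            archive.writestr(name, "")
    return buffer.getvalue()


if __name__ == "__main__":
    for extension, count in sorted(count_extensions(make_sample()).items()):
        print(extension, count)
```

To inspect a real book, the bytes of the file on disk can be passed to count_extensions in place of the sample.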

1) The container

The container is a file compressed in ZIP format.
The way files are organized within it is governed by the OCF specification (OEBPS Container Format). Two files identify the archive and locate its contents.

The manifest root

It is an XML file that specifies the path of the manifest file and its MIME type. It must be named container.xml, is stored in the META-INF directory, and contains this tag:

<rootfiles>
    <rootfile full-path="OPS/book.opf" media-type="application/oebps-package+xml"/>
</rootfiles>

The manifest file book.opf in turn lists all the files in the archive.
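A reading tool therefore starts from container.xml to locate the manifest file. Here is a minimal sketch of that lookup in Python. The enclosing container element and its namespace come from the OCF specification and are not shown in the fragment above; the function name is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml according to the OCF specification.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

SAMPLE_CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OPS/book.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_package_path(container_xml: str) -> str:
    """Return the archive path of the manifest (.opf) file."""
    root = ET.fromstring(container_xml)
    rootfile = root.find(
        f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile"
    )
    if rootfile is None:
        raise ValueError("no <rootfile> element found in container.xml")
    return rootfile.attrib["full-path"]


if __name__ == "__main__":
    print(find_package_path(SAMPLE_CONTAINER))
```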

The mimetype file

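This is a plain ASCII file that contains only the string application/epub+zip. It must be the first file in the archive and must be stored uncompressed.
For example, an archive can be laid out as follows:

-- ZIP file --
  mimetype
  META-INF/
     container.xml
  OPS/
     book.opf
     chapter1.xhtml
     image1.png
     css/
        style.css

Because the mimetype file sits uncompressed at the start of the archive, tools can identify an ePub file without unpacking it or reading book.opf.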
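According to the OCF specification, the mimetype file holds the string application/epub+zip and is written first in the archive, without compression. The sketch below shows one way to respect that rule with Python's zipfile module. The function names write_epub and looks_like_epub are hypothetical, and the check is far from a full validation.

```python
import io
import zipfile
from typing import Mapping

EPUB_MIMETYPE = b"application/epub+zip"


def write_epub(files: Mapping[str, bytes]) -> bytes:
    """Build an archive whose first entry is an uncompressed mimetype file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry is written first and stored, not deflated.
        archive.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def looks_like_epub(epub_bytes: bytes) -> bool:
    """Check only the mimetype rule, not the rest of the standard."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        entries = archive.infolist()
        if not entries:
            return False
        first = entries[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and archive.read(first) == EPUB_MIMETYPE
        )


if __name__ == "__main__":
    book = write_epub({"META-INF/container.xml": b"<container/>"})
    print(looks_like_epub(book))
```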

2) Packaging

Packaging is defined by the OPF (Open Packaging Format) specification.
It requires two XML files, the manifest and the table of contents.

The manifest

It lists all the files in the archive and is stored in a file with the .opf extension (e.g. book.opf):

<manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml" />
    ...
</manifest>
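The manifest can be read with a standard XML parser. The following Python sketch collects the items of a simplified package document. The enclosing package element and its namespace come from the OPF specification; the second item and the function name are assumptions added for illustration, and a real .opf file also carries metadata and a reading order that are omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace of the package document according to the OPF specification.
OPF_NS = "http://www.idpf.org/2007/opf"

# Simplified sample: a real package document contains more sections.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="style" href="css/style.css" media-type="text/css"/>
  </manifest>
</package>"""


def list_manifest(opf_xml: str) -> dict:
    """Map each item id to its (href, media-type) pair."""
    root = ET.fromstring(opf_xml)
    items = root.findall(f"{{{OPF_NS}}}manifest/{{{OPF_NS}}}item")
    return {
        item.attrib["id"]: (item.attrib["href"], item.attrib["media-type"])
        for item in items
    }


if __name__ == "__main__":
    for item_id, (href, media_type) in list_manifest(SAMPLE_OPF).items():
        print(item_id, href, media_type)
```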

Table of contents

It is contained in an XML file with the .ncx extension. It consists of a list of titles and links to files.
It also contains metadata such as the author's name and other references.
NCX stands for Navigation Control file for XML.

<docTitle>
    <text>Title of the book</text>
</docTitle>
<navMap>
    <navPoint id="chapter1" playOrder="1">
        <navLabel>
            <text>Chapter 1</text>
        </navLabel>
        <content src="chapter1.xhtml" />
    </navPoint>
</navMap>
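To show how a reader could turn this file into a list of titles and links, here is a minimal Python sketch. The ncx root element and its namespace come from the NCX format. The sample gives the navPoint a navLabel, the element NCX uses for the visible title of an entry; the label text "Chapter 1" and the function name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of NCX documents (DAISY standard).
NCX = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <docTitle><text>Title of the book</text></docTitle>
  <navMap>
    <navPoint id="chapter1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""


def read_toc(ncx_xml: str):
    """Return the book title and a list of (label, file) entries."""
    root = ET.fromstring(ncx_xml)
    title = root.findtext("ncx:docTitle/ncx:text", default="", namespaces=NCX)
    entries = []
    # navPoint elements may be nested, so search the whole tree.
    for point in root.iterfind(".//ncx:navPoint", NCX):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX)
        content = point.find("ncx:content", NCX)
        src = content.get("src") if content is not None else None
        entries.append((label, src))
    return title, entries


if __name__ == "__main__":
    book_title, toc = read_toc(SAMPLE_NCX)
    print(book_title)
    for label, src in toc:
        print(label, src)
```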

3) The files of the content

The OPS (Open Publication Structure) specification indicates what types of files can be used in an eBook.

The content is a set of XHTML 1.1 files for the text, CSS style sheets (a limited subset) and multimedia files.
Each XHTML file works like a Web page: it references the style sheet and contains tags for images and hyperlinks.
Encoding may be UTF-8 or UTF-16.

The supported graphics formats are: GIF, PNG, JPG, SVG.
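Each image listed in the manifest needs a media type that matches its format. As a small illustration, the Python sketch below maps file extensions to the registered media types of the four formats. The table and the function name image_media_type are assumptions made for this example, not part of the standard.

```python
from pathlib import PurePosixPath
from typing import Optional

# Registered media types for the four supported graphics formats.
IMAGE_MEDIA_TYPES = {
    ".gif": "image/gif",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".svg": "image/svg+xml",
}


def image_media_type(href: str) -> Optional[str]:
    """Return the media type for an image path, or None if unsupported."""
    return IMAGE_MEDIA_TYPES.get(PurePosixPath(href).suffix.lower())


if __name__ == "__main__":
    for name in ("image1.png", "cover.JPG", "photo.bmp"):
        print(name, image_media_type(name))
```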

Information and Tools