Digital Documents

Anastasios Dimou, Apostolos Syropoulos
Copyright: © 2021 | Pages: 12
DOI: 10.4018/978-1-7998-3479-3.ch077


Digital documents are replacing printed documents in many ways. For publishers they are cheaper to produce, and readers can obtain them instantly. However, the spectrum of “printed” material is so broad that no single kind of digital document can satisfy everyone involved in the publishing business (e.g., authors, readers, and publishers). Digital documents appeared in the 1960s and have been evolving ever since. Sometimes this evolution is slow and sometimes it is rapid, but today there is a plethora of digital document forms that can be used for many different tasks, and most of them are briefly documented here.
Chapter Preview


Nowadays documents (e.g., books, articles, and papers) are created electronically and printed electronically. Modern printers are programmable, so documents are uploaded to them as programs in some page description language (such as PostScript, PDF, or PCL). Currently, PDF is the de facto industry standard for printable documents, so all documents are ultimately converted to PDF.

Definition. A digital document is one that is created exclusively with digital means (e.g., computer programs), can be read with digital means (e.g., electronic book readers), and can be transferred to paper by purely digital means (e.g., fonts are digital and the printing machine transfers the digital document to paper).

Documents are created using either sophisticated GUIs or an “advanced” text editor. Programs like MS Word and LibreOffice Writer are sophisticated GUIs used to create digital documents. On the other hand, one can create the same documents “manually” by using a simple text editor and/or some CLI tools. For example, it is entirely possible to create a document in the Open Document Format for Office Applications (ODF) with such tools, and of course most people create their LaTeX documents using simple or advanced text editors like Notepad++.
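
As a rough illustration of the “manual” route, the following Python sketch assembles a minimal ODF text document without any office suite. It relies on the fact that an ODF file is a ZIP package whose first entry is an uncompressed file named mimetype, accompanied by a manifest and a content.xml file. The function name write_minimal_odt and the sample paragraph are assumptions made for this illustration; the result is meant as a bare-bones package and has not been checked against every ODF-aware application.

```python
import zipfile

# Media type that identifies an ODF text document.
MIMETYPE = "application/vnd.oasis.opendocument.text"

# The manifest lists the files stored in the package.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
    manifest:version="1.2">
  <manifest:file-entry manifest:full-path="/"
      manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml"
      manifest:media-type="text/xml"/>
</manifest:manifest>
"""

# The document body: a single paragraph (sample text, invented here).
CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
      <text:p>Hello, digital document!</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def write_minimal_odt(target):
    """Write a minimal ODF text package to a path or a binary file object."""
    with zipfile.ZipFile(target, "w") as odt:
        # The mimetype entry goes first and is stored without compression.
        odt.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        odt.writestr("META-INF/manifest.xml", MANIFEST,
                     compress_type=zipfile.ZIP_DEFLATED)
        odt.writestr("content.xml", CONTENT,
                     compress_type=zipfile.ZIP_DEFLATED)
```

Calling write_minimal_odt("hello.odt") would write the package to disk. Everything inside it is plain text that could equally be typed in a text editor and packed with a command line ZIP tool.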

The various file formats that are used to encode documents electronically are based on the idea of markup. To understand what markup is, consider a book that is submitted for publication. Typically, a copy editor will annotate the text with various types of instructions. This is markup. When the markup is entered electronically, it is called electronic markup. Specific markup is a form of markup that tells the formatter what to do at each point. Generic markup adds information about the logical components of a document. In order to add markup to a text we need a precisely defined notation. Such a notation is called a markup language. The basic elements of such a language are called tags.
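
The difference between generic and specific markup can be sketched in a few lines of Python. The text below is tagged only with logical components (title, para), and the decision about how each component looks is kept in a separate table, so the same source can be rendered in more than one way. The tag names, the two style tables, and the render function are all hypothetical and serve only this illustration.

```python
import xml.etree.ElementTree as ET

# Generic markup: the tags name logical components, not appearance.
SOURCE = (
    "<article>"
    "<title>Digital Documents</title>"
    "<para>Markup describes structure.</para>"
    "</article>"
)

# Two hypothetical style tables that decide how each component is rendered.
PLAIN_STYLE = {
    "title": lambda text: text.upper(),
    "para": lambda text: "  " + text,
}
HTML_STYLE = {
    "title": lambda text: f"<h1>{text}</h1>",
    "para": lambda text: f"<p>{text}</p>",
}


def render(source, style):
    """Render each top-level component of the document with the given style."""
    root = ET.fromstring(source)
    return "\n".join(style[child.tag](child.text or "") for child in root)


print(render(SOURCE, PLAIN_STYLE))
print(render(SOURCE, HTML_STYLE))
```

With specific markup, the formatting instructions would be written into the text itself, and changing the appearance would mean editing the document rather than swapping the style table.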

A markup language defines tags that group text and affect the way specific parts of a document will be rendered. In certain cases it is possible to define macros, that is, new tags that have the combined functionality of primitive tags. There are several markup languages that are widely used today. These include TeX, DocBook, and HTML, as well as macro packages like LaTeX. However, in a way all these markup languages are descendants of SGML.
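
The idea of a macro as a new tag built from primitive tags can be mimicked with a toy Python model. The primitives bold and italic, the helper define_macro, and the derived tag emphasis are invented for this sketch and do not correspond to the macro facility of any particular markup language.

```python
# Primitive "tags": each one wraps a piece of text in a single effect.
PRIMITIVES = {
    "bold": lambda text: f"<b>{text}</b>",
    "italic": lambda text: f"<i>{text}</i>",
}


def define_macro(*names):
    """Build a new tag that applies the named primitives, outermost first."""
    def macro(text):
        # Apply the innermost primitive first so the first name ends up outside.
        for name in reversed(names):
            text = PRIMITIVES[name](text)
        return text
    return macro


# A macro that combines the functionality of two primitive tags.
emphasis = define_macro("bold", "italic")

print(emphasis("note"))
```

The author of a document would then use the single derived tag instead of repeating the combination of primitives each time.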

In what follows we will give a brief overview of SGML and XML. Then, we will discuss page description languages in some detail, and we will continue with an overview of some very popular markup languages.



The Standard Generalized Markup Language (SGML) is the International Organization for Standardization (ISO) standard for document description [see (ISO-Standard, 1986) and (van Herwijnen, 1990)]. SGML is actually a meta-language that allows people to create markup languages. Typically, an SGML document consists of a file that contains marked-up data, the SGML declaration, and the document type definition (DTD). The SGML declaration specifies which characters and delimiters are used. For example, it is common to use the characters <, >, and </ as delimiters, which is why it is so common to see tags like <intro> and </intro>. The SGML declaration also specifies which character set (Unicode or other) is used. Figure 2 shows an SGML declaration for HTML 3.2. The DTD specifies how one can mark up a class of documents. In particular, it defines the structure of a document and it is written in SGML. In general, no two different document classes have the same structure. Figure 1 shows a very simple DTD.

Figure 1. A very simple DTD from (Syropoulos, Tsolomitis, & Sofroniou, 2003).
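
To make the role of a DTD more concrete, here is a small Python sketch. It does not reproduce the DTD of Figure 1. Instead, it embeds a hypothetical three-element DTD in a document and lists the element types that the DTD declares. The DTD is written in XML syntax rather than full SGML, because Python's standard library ships an XML parser (expat) but no SGML parser. The element names note, intro, and body and the function declared_elements are invented for the example. Note that expat only reads the declarations here; it does not validate the document against them.

```python
import xml.parsers.expat

# A document with an internal DTD subset (hypothetical element names).
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (intro, body)>
  <!ELEMENT intro (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><intro>Hello</intro><body>World</body></note>
"""


def declared_elements(document):
    """Return the element type names declared in the document's DTD."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once for every <!ELEMENT ...> declaration the parser reads.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(document, True)
    return names


print(declared_elements(DOCUMENT))
```

The first declaration states that a note consists of an intro followed by a body, which is the kind of structural rule a DTD imposes on a whole class of documents.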


Key Terms in this Chapter

Markup: A technology developed for the creation of digital documents. The text of a digital document is interspersed with markup elements that are either semantic or typographic annotations.

CLI: A command line interface is a way to run programs from a Unix shell or the Windows command prompt.

GUI: A graphical user interface is a graphical interface used to communicate with and operate a computer program. Most computer programs come with a GUI since almost all users are accustomed to using the mouse.
