On PDF/A Conformance and Font Usage in PDF Documents Provided by Public Sector Organizations

Thomas Fischer, Björn Lundell, Jonas Gamalielsson
Copyright: © 2023 | Pages: 19
DOI: 10.4018/IJSR.329605

Abstract

The use of appropriate fonts and file formats for long-term maintenance of digital assets is a challenge for organizations in the public sector. This article reports on a study that investigated PDF/A conformance and font usage in PDF files provided by Swedish public sector organizations (PSOs). It presents an analysis of the PDF files' properties and font usage, including a categorization of the fonts' licenses. The study is motivated by the PDF/A-1 standard's requirement that ‘only fonts that are legally embeddable in a file for unlimited, universal rendering shall be used.’ An analysis of PDF sets from three PSOs shows that the proportion of files that claim or succeed at conforming to PDF/A varies greatly among the sets despite similar backgrounds. Although the most popular way to make use of fonts is to embed a subset of the font data, a considerable proportion of PDF files do not include any font data for some fonts expected to be ‘always available.’ This puts the onus of locating the font data on the PDF reader, which is problematic for long-term archival.

PDF/A Conformance and Font Usage in PDF Documents Provided by Public Sector Organizations

Long-term maintenance and archiving of digital assets such as electronic office documents require consideration of how to prepare those assets for future use. Multiple challenges exist, such as the choice of storage technology and file format. The development and use of file formats pose a number of technical and legal challenges (Lundell et al., 2019), in particular when formats are to be implemented in software (Egyedi, 2007). To allow for future use of digital assets, file formats should be used that are clearly specified and provided under terms that allow for implementation and use by software projects (Lundell et al., 2023). This includes many formats which are made available as standards by standard-setting organizations such as the International Organization for Standardization (ISO) and the Organization for the Advancement of Structured Information Standards (OASIS).

An example of an existing file format that was standardized is the Portable Document Format (PDF), which, somewhat simplified, describes the content of pages that can be printed. The PDF file format has properties relevant for archiving, such as the read-only option. But the format has drawbacks: documents may refer to external data, which may not be available when reading a PDF file, and the format’s specifications may rely on normative references which may be unavailable when implementing tools for the format. To address those limitations, a standard, commonly known as PDF/A, was specified by ISO (2005, 2011, 2012, 2020a) with the intent to define a self-sufficient subset of the PDF file format.

Adherence to the PDF/A standard is required by several national archives and national libraries (Bundesarchiv, 2010; Library and Archives Canada, 2015; Rog, 2007). For example, the Swedish National Archives mandates the use of PDF/A-1 if PDF files are to be archived (Riksarkivet, 2009). Determining conformance to the PDF/A standard is challenging, both because of deficits in the technical specifications of the PDF/A standard and its normative references, and because of implementation deficits in the tools that may be employed to assess PDF/A conformance (Fischer et al., 2021).

To display a PDF file on screen or to print it, the text contained in such a file must be rendered (i.e., put into a graphical representation by using a so-called font program, commonly referred to as “font”). The font program must be available both to the system where the PDF file was originally created and to every PDF reader where the file is to be displayed. The technical specifications of each specific version of PDF outline various alternatives for making the font available to the PDF reader: embedding parts of the font program into the PDF file, relying on standard fonts that are expected to be generally available, and putting the onus on the PDF reader to locate or synthesize a suitable font when displaying the file.
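
The three alternatives named above can be illustrated with a minimal Python sketch. It is not the tooling used in the study; the `FontEntry` record, the `classify` function, and the category labels are hypothetical names introduced here for illustration. The sketch assumes that each font used in a PDF file has already been reduced to its base font name and a flag stating whether a font program is embedded. It relies on two conventions from the PDF specification: the names of the 14 standard fonts, and the tag of six uppercase letters followed by a plus sign that prefixes the name of a subset font.

```python
import re
from dataclasses import dataclass

# Names of the 14 standard fonts defined by the PDF specification.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})

# Subset fonts carry a tag: six uppercase letters followed by "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


@dataclass(frozen=True)
class FontEntry:
    """Hypothetical, simplified view of one font used in a PDF file."""
    base_font: str       # base font name without the leading "/"
    has_font_file: bool  # True if a font program is embedded in the file


def classify(font: FontEntry) -> str:
    """Map a font entry to one of four illustrative category labels."""
    is_subset = SUBSET_TAG.match(font.base_font) is not None
    if font.has_font_file:
        return "embedded-subset" if is_subset else "embedded-full"
    # Without embedded font data, drop any subset tag before checking
    # whether the name belongs to the 14 standard fonts.
    plain_name = SUBSET_TAG.sub("", font.base_font)
    if plain_name in STANDARD_14:
        return "standard-not-embedded"
    return "reader-must-locate"


if __name__ == "__main__":
    examples = [
        FontEntry("ABCDEF+Arial", True),
        FontEntry("Arial", True),
        FontEntry("Helvetica", False),
        FontEntry("Arial", False),
    ]
    for entry in examples:
        print(entry.base_font, "->", classify(entry))
```

Under these assumptions, the last two categories correspond to the cases where the PDF reader must supply the font data itself, which is the situation described as problematic for long-term archival.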
