Purpose

This section of the guidelines explains how organisations can determine technical specifications for digitisation projects.

Introduction

Most software and hardware provide a range of variable parameters for digital images, e.g. image resolution, output file formats, colour resolution or bit depth, compression and colour management, which will affect the quality and size of a digital image.[1]

It is important to determine the technical specifications for your back-capture digitisation project before commencing any digitisation. Doing so ensures consistency, legibility and quality, and that the images are fit for their intended purpose. Knowing your technical specifications can also help with other digitisation decisions, such as what equipment you need to purchase.

If you get your technical specifications wrong, you may end up with digital images that are illegible or that do not capture all the essential elements present in the original paper records or simply do not meet your needs. This may nullify any value you can derive from your digitisation project. It can also impact negatively on business processes that rely on the information in the image.

Technical specifications

The primary goal with technical specifications is to create a legible digital image of sufficient quality for its purpose that can remain legible and useable for as long as required. For images of long term or archival value, this may mean that they need to withstand time and a number of migrations.

As a general rule, the highest technical specifications that can be realistically supported should be used. This is particularly true when records fall into the following categories:

Records that are required as State archives, regardless of whether the original paper records are kept, because:
  • The digital images may be accepted as State archives in place of the originals (currently allowed for records created and received after 1 January 1980)
  • The digital images may be required as evidence of business activity in place of the original paper records if the originals are destroyed, and therefore need to be trustworthy
  • Even if original paper records are kept, high quality master images that will survive over time will maximise return on the digitisation effort and promote the long term accessibility of the images.
Records that are not required as State archives where the original paper records will be destroyed, particularly high risk records or records of long term value, because:
  • The digital images may be required as evidence of business activity in place of the original paper records and therefore need to be trustworthy
  • Some of the digital images may still need to be kept for long periods of time and the image may have a better chance of remaining readable and useable if it is of good quality.

Note: For some back-capture digitisation projects, you may only need to create one image at a sufficient standard to meet your project and ongoing business needs.
For other projects, such as the digitisation of archival records for long term use, it may be necessary to create a ‘master’ image at optimal quality and derivatives of lower quality for access and use. You will need to determine the right technical specifications for all needs.
If you are creating masters and derivatives, you may wish to refer to the National Library of Australia’s guidelines Digital capture and image creation which contain advice on technical specifications for masters and for Web delivery based on the Library’s experience in digitising their collection.

Appendix 1 sets out the recommended technical specifications determined by Archives New Zealand and supported by some Australian archival institutions.[2] These provide a good guide for digitised records, particularly those that fall into the above categories.

If your organisation decides that these technical specifications are not suitable for its particular digitisation project you can choose to:

  • use a more rigorous set of technical specifications or
  • in some circumstances, use a less rigorous set of technical specifications.

To determine the right technical specifications for your organisation's digitisation project, you should analyse all relevant factors. If you are going to use a less rigorous set of technical specifications, this step is essential. These factors can include (but are not limited to):

  • the nature of the records involved in the project (e.g. how critical are they? Are they often required in court?)
  • the essential characteristics of the records that need to be reproduced and whether the technical specifications can address these
  • whether the original paper records will be destroyed
  • which version of the records will be required as the official records of business
  • the length of time for which the digital images need to be retained or are expected to be accessible (the longer they are required, the more rigorous the technical specifications should be)
  • all intended and potential uses of the images, now and over time.

For example:
An organisation was intending to digitise some very long term, high risk records and destroy the originals. They initially thought the technical specifications in Appendix 1 too high, but when they performed the assessment they decided to retain the rigour of these specifications to ensure that all of the risks they had identified for the legibility and useability of records required long term were addressed.
Another organisation was intending to digitise some shorter term records for delivery to clients over the web. The original paper records were still to be the official records used for privacy and GIPA requests, court cases, auditing and legal purposes. Stakeholders needed quick delivery through low-bandwidth connections and this was more important than having optimal quality images. Therefore the organisation adopted less rigorous technical specifications for the images (e.g. lossy compression and non-archival file formats).[3]

Note: If you are considering reducing the level of the recommended technical specifications for groups of records that are required as State archives, it is essential that you discuss this with Museums of History NSW first.

When determining your technical specifications you should make sure that:

  • you can back up your decisions with accessible and product-independent technical expertise
  • adequate technical support exists so that you can ensure the ongoing maintenance and migration of digital images when it is necessary[4]
  • expedient decisions to adopt lower technical specifications do not place the organisation at risk.

Note: File size may certainly be a consideration for your organisation. However, you should never see it as the primary consideration when selecting technical specifications. The primary consideration should be the ability to access and use legible images that are fit for purpose and retain the original records' essential characteristics for as long as required. If this cannot be guaranteed with the resources available, the digitisation program may not be viable.

The following sections explain the major elements of the technical specifications you will need to define. For more information see: http://getty.edu/research/publications/electronic_publications/introimages/image.html

File formats

What are file formats?

File formats encode information into a form which is able to be processed and used by specific combinations of hardware and software.

The following table describes some categories of file formats:

| File format category | Description |
| --- | --- |
| Raster | Also known as bit-mapped formats. Images take the form of a grid or matrix with each picture element (pixel) having a unique location and independent colour value. Examples are TIFF, JPG/JPEG, GIF and PNG. |
| Vector | Also known as object-oriented formats. Based on a set of mathematical instructions typically used by drawing programs to construct an image. Not of relevance to digitisation, which will use raster formats. |
| Encoding | Also known as metafiles, which may contain either vector or raster images. Such formats enable the contents to be consistently displayed and used across different computer programs and operating systems. Typically, they support internal metadata and multi-page images and enable security management. Examples include Adobe PDF and TIFF.[5] |
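
Because a file extension can be wrong, it can be useful to confirm that delivered images really are in the specified format. The following is a minimal sketch in Python that identifies the formats named above from the leading bytes each format places at the start of a file. The function names `sniff_format` and `sniff_file` are hypothetical, and the check only identifies the container; it does not validate the whole file or detect variants such as PDF/A.

```python
# Leading byte signatures for the formats named in this section.
SIGNATURES = [
    (b"II*\x00", "TIFF"),            # TIFF, little-endian byte order
    (b"MM\x00*", "TIFF"),            # TIFF, big-endian byte order
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
]


def sniff_format(header: bytes) -> str:
    """Return the format whose signature starts `header`, or 'unknown'."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

A check like this could be run over a sample of files during quality assurance, alongside visual inspection.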

How do you determine the right file format for your digitisation project?

The following table describes some factors to consider when determining the file format required:

| Factors to consider | For example... |
| --- | --- |
| Creation of the best quality possible | Organisations should create the best quality image possible given their available resources and the purpose of the images. |
| The way digital images will be delivered | If good quality masters are too large for some modes of delivery, your organisation can consider creating derivatives in non-archival file formats, e.g. JPEG (while keeping the masters at better quality). |
| The format’s support by hardware and software platforms | Some hardware and software only support certain file formats. This is changing: the trend towards interoperability and compatibility means that many file formats are now supported by a range of hardware and software platforms, which is preferable. |
| Whether the file format is proprietary | If digital images are held in proprietary formats, they are at risk of becoming obsolete if the vendors go out of business, or of becoming unreadable if the relationship with the vendor changes. This could particularly be a problem if the records are required long term or as State archives. Where possible, choose open formats that are widely used and have published technical specifications available in the public domain. |
| The format’s ability to be read by a plug-in | If the specific production software is not available to all users, plug-ins may be used for viewing digital images. |
| The format’s use of embedded objects or links | Formats should not contain embedded objects or link out to external objects beyond the specific version of the format.[6] |

The format's ability to capture automated metadata or to support colour requirements may be other factors for consideration.

For example:
TIFF allows a range of automated metadata capture, whereas PDF can be more limited.
36-48 bit RGB colour will require a format like TIFF or PNG to support it.

See the Library of Congress digital formats website for a description of file formats for text and still images. [7]

Resolution

What is resolution and how is it quantified?

Resolution is a measure of the ability to capture detail in the original work. It is frequently quantified in pixels per inch (ppi) which is a measurement of resolution for computer display. The higher the ppi the better the resolution and the clearer the image.

Note: Dots per inch (dpi) is often used interchangeably with ppi, but actually refers specifically to measurement of the resolution for computer printers.

How do you determine the right resolution for your digitisation project?

It is important to determine the correct resolution prior to undertaking any digitisation, as detail that was not captured cannot be added afterwards. Software can enlarge an image, but this only interpolates between existing pixels. If a higher resolution is required, the record would need to be re-digitised.
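
The point can be illustrated with a small sketch in Python. It assumes a tiny greyscale image stored as a list of rows, and uses two hypothetical helper functions: one that halves the resolution by averaging 2x2 blocks and one that doubles it again by repeating pixels. Real scanning software uses more sophisticated resampling, but the loss of detail is the same in principle.

```python
def downsample_2x(image):
    """Halve resolution by averaging each 2x2 block of grey values."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) // 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def upsample_2x(image):
    """Double resolution by repeating each pixel (nearest neighbour)."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Fine detail: alternating black (0) and white (255) lines, one pixel wide.
original = [[0, 255, 0, 255] for _ in range(4)]
low_res = downsample_2x(original)   # every pixel becomes mid-grey
restored = upsample_2x(low_res)     # same size as the original, detail gone

print(restored == original)  # False: the fine lines cannot be recovered
```

The enlarged image has the same pixel dimensions as the original, but the one-pixel lines have been replaced by uniform grey.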

The following table describes some factors to consider when determining the resolution required.

| Factors to consider | For example... |
| --- | --- |
| The nature of the records to be digitised | Photographs and detailed images require much greater resolution than text based documents. |
| How the digital images will be used | Original paper records that are enlarged or require fine detail for viewing and printing should be digitised at a higher resolution. Original paper records that are reduced for viewing and printing are digitised at lower resolutions. |
| File size | While higher ppi settings will result in images which are able to contain more detail per inch, they will also increase the file size of an image. Your organisation will need to test and analyse these issues to ensure the selected resolution is fit for purpose. Note: The file size of a digital image should not be the sole determinant of the resolution selected.[8] |

For examples of the difference resolution can make to image quality see: http://getty.edu/research/publications/electronic_publications/introimages/image.html

Note: When selecting a capture resolution it is important to consider the optical resolution of your capture device, as exceeding this resolution may degrade image quality.

Colour resolution or bit depth

What is bit depth?

Bit depth is the number of bits (zeros or ones) used to describe the colour of each pixel. Bit depth can range from 1 bit up to 48 bits. Greater bit depth allows a greater range of colours or shades of grey to be represented by a pixel.

| Bit depth | Result |
| --- | --- |
| 1 bit, black and white or line art | Only black and white pixels |
| Greyscale | Black and white in addition to a range of intermediate greys, requiring 8 bits to describe each pixel |
| 8 bit colour | Uses a palette of 256 colours |
| 24 bit colour | Enables storage of 8 bits of information describing the red, green and blue components of every pixel, thus enabling a much greater palette of colours |
| 36-48 bit RGB colour | Uses an extended colour space, creating a much larger file, and requiring storage in formats that explicitly support this colour depth (e.g. TIFF and PNG).[9] |
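
The relationship between bit depth and the number of values a pixel can take is a power of two. The short Python sketch below works this out for the bit depths listed above; the function names are hypothetical, and the per-channel figure assumes the bits are split evenly between red, green and blue.

```python
def representable_values(bits_per_pixel: int) -> int:
    """Number of distinct values one pixel can take at this bit depth."""
    return 2 ** bits_per_pixel


def levels_per_channel(bits_per_pixel: int, channels: int = 3) -> int:
    """Levels per colour channel, assuming bits are shared equally."""
    return 2 ** (bits_per_pixel // channels)


for label, bits in [
    ("1 bit bi-tonal", 1),
    ("8 bit greyscale or palettised colour", 8),
    ("24 bit colour", 24),
    ("48 bit colour", 48),
]:
    print(f"{label}: {representable_values(bits):,} possible values per pixel")

print(levels_per_channel(24))  # 256 levels each for red, green and blue
```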

How do you determine the right bit depth?

The following table describes some factors to consider when determining the bit depth required.

| Factors to consider | For example... |
| --- | --- |
| The nature of the records to be digitised | 1 bit depth will usually capture the information in black and white text documents efficiently. A greater bit depth would be required for documents containing greyscale or colours. In these cases 1 bit depth may make the image illegible. |
| File size | The total number of pixels used to make up an image affects file size. Additionally, the colour depth of each of those pixels has a multiplying effect on the file size. Note: The file size of a digital image should not be the sole determinant of the bit depth selected.[10] |

The following table shows the impact of resolution and bit depth on file size. It shows the uncompressed file sizes for an A4 page digitised at different bit depths and resolutions [11]:

| Colour depth | Resolution (ppi) | Total bits | Uncompressed file size (MB) |
| --- | --- | --- | --- |
| 1 bit bi-tonal | 300 | 8 700 867 | 1.04 |
| 1 bit bi-tonal | 600 | 34 803 468 | 4.15 |
| 8 bit grey or colour | 300 | 69 606 936 | 8.30 |
| 8 bit grey or colour | 600 | 278 427 744 | 33.19 |
| 24 bit colour | 300 | 208 820 808 | 24.89 |
| 24 bit colour | 600 | 835 283 232 | 99.57 |
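
The arithmetic behind figures of this kind can be sketched in Python as follows. The sketch assumes an A4 page of 8.27 by 11.69 inches and a megabyte of 1 048 576 bytes; the function names are hypothetical. Published figures can differ slightly depending on the rounding and the megabyte convention used.

```python
A4_INCHES = (8.27, 11.69)  # assumption: A4 dimensions rounded to two decimals


def pixel_dimensions(ppi, size_inches=A4_INCHES):
    """Width and height in pixels for a page scanned at `ppi`."""
    width, height = size_inches
    return round(width * ppi), round(height * ppi)


def uncompressed_size(ppi, bit_depth, size_inches=A4_INCHES):
    """Return (total bits, size in megabytes) before any compression."""
    width_px, height_px = pixel_dimensions(ppi, size_inches)
    total_bits = width_px * height_px * bit_depth
    megabytes = total_bits / 8 / (1024 * 1024)
    return total_bits, megabytes


for bit_depth in (1, 8, 24):
    for ppi in (300, 600):
        bits, megabytes = uncompressed_size(ppi, bit_depth)
        print(f"{bit_depth:>2} bit at {ppi}ppi: {bits:,} bits, {megabytes:.2f} MB")
```

Doubling the resolution quadruples the pixel count, and the bit depth multiplies the result again, which is why both settings need to be tested together.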

Note: Capturing a record at a lower than recommended bit depth will possibly result in a digital image that is visibly different from the original record. Choosing a higher than recommended colour depth, such as 24 bit colour for a black and white text document, will not provide any benefits but will result in a larger file size of the image produced and may even introduce small areas of extra colours not present in the original. Therefore your organisation should test and analyse bit depth to ensure that it is fit for purpose.

Compression

What is compression?

Compression is a means of reducing the size of a digital image for storage or transmission. Compression techniques can be categorised as either:

| Types of compression | Description |
| --- | --- |
| Lossy | Where information is removed from the stored information during the compression process, i.e. file information is lost. Lossy compressions are therefore irreversible. |
| Lossless | Where no information is irretrievably lost and where the decompressed object will always appear exactly the same as the original. Examples include LZW or ZIP lossless compression with TIFF files. |
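
The difference can be demonstrated with a minimal Python sketch. The lossless case uses the standard library's `zlib` module (DEFLATE compression). The lossy case is a deliberately crude stand-in that snaps grey values to coarse steps; it is not how JPEG works, but it shows the same property that discarded information cannot be restored. The function names and sample data are hypothetical.

```python
import zlib


def lossless_round_trip(pixels: bytes) -> bytes:
    """Compress and decompress; the result is byte-for-byte identical."""
    return zlib.decompress(zlib.compress(pixels))


def lossy_round_trip(pixels: bytes, step: int = 32) -> bytes:
    """Crude lossy stand-in: snap each grey value down to a multiple of `step`."""
    return bytes((value // step) * step for value in pixels)


row = bytes(range(0, 256, 5))  # a smooth grey ramp standing in for image data

print(lossless_round_trip(row) == row)  # True: nothing was lost
print(lossy_round_trip(row) == row)     # False: fine tonal steps are gone
```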

How do you determine the right compression for your project?

The following table describes how to choose an appropriate compression method:

Where:
  • Records are required as State archives or in the long term
  • Original paper records are destroyed after digitisation
  • Master digital images are to be the official records
Then: Lossless compression or no compression should be used, otherwise the accuracy of the digital image may be called into question.

Where:
  • Original paper records will be retained as the official records
  • Derivatives are taken from the master for delivery
Then: Your organisation may choose a suitable type of compression that is fit for purpose (i.e. appropriate to the nature of the record and its intended use). For example, some loss may be acceptable if files can be made smaller and delivered and stored more easily.

Note: Loss should not be so extreme that the image appears noticeably different from the original paper record.

If your organisation decides to use compression, you should test and analyse it for different types of records and their intended use to ensure that the selected compression is fit for purpose and gives you the file size reduction you expect.

For example:
An organisation used JPEG lossy compression for some derivative colour photographic images for delivery over the web, which allowed significant reduction of the images' size. However, when they tried to apply the same compression to images containing drawings, letters or simple graphics, the file size reduction was not as significant.[12]

Colour management

What is colour management?

Colour management involves attempting to ensure that an image looks the same across a range of different output devices. Monitors and printers typically reproduce different ranges of colour. The standard for colour representation is the International Color Consortium (ICC) colour management system, which uses a standardised and known colour space based on human colour perception and describes every device relative to that known standard.

Halftones

In printing, halftones are evenly spaced spots of varying diameter that produce apparent shades of grey with a single colour ink. The darker the shade at a particular point in an image, the larger the corresponding spot in the printed halftone. In order to simulate variable-sized halftone dots in digitisation, dithering is used, which creates clusters of pixels in a ‘halftone cell’. The more black pixels in the ‘cell’, the darker the grey.
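
One common way to dither is ordered dithering, in which each pixel is compared against a repeating grid of thresholds. The Python sketch below uses a 2x2 threshold grid so that each 2x2 block of output pixels acts as one halftone cell. It is an illustration only: the function names are hypothetical, the convention that 0 is black and 1 is white is a choice made for this sketch, and scanning software may use larger cells or different methods.

```python
# 2x2 ordered-dither (Bayer) pattern; each entry ranks a position in the cell.
BAYER_2X2 = [[0, 2], [3, 1]]


def dither(image):
    """Convert 8 bit greys (0 = black, 255 = white) to 1 bit (0 = black, 1 = white)."""
    out = []
    for r, row in enumerate(image):
        out_row = []
        for c, grey in enumerate(row):
            # Scale the rank to a threshold in the 0-255 grey range.
            threshold = (BAYER_2X2[r % 2][c % 2] + 0.5) * 255 / 4
            out_row.append(1 if grey > threshold else 0)
        out.append(out_row)
    return out


def black_pixels(cell):
    """Count the black pixels in a dithered cell."""
    return sum(row.count(0) for row in cell)


for grey in (200, 128, 64):
    cell = dither([[grey, grey], [grey, grey]])
    print(grey, cell, black_pixels(cell))
```

Lighter greys produce cells with fewer black pixels and darker greys produce more, which is the effect described above.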

Bi-tonal images utilising halftones may be considered as an alternative to using 4 or 8 bit grey to represent greyscales on digitised records. This technique may provide some advantages over using palettised images including wider format compatibility and reduced file size.

However, use of halftones may also introduce a speckled effect to areas of the image that should be white. At too low a resolution halftones will not be beneficial, and halftones at high resolutions may produce a large number of halftone pixels where there should be white space. Some other image processing, notably Optical Character Recognition (OCR), may also be negatively affected if using halftones in text documents.

If you are considering using halftones you should carry out thorough testing to ensure the end results are suitable.

Note: When paper documents that contain halftone images are digitised, a distracting pattern of lines called ‘moiré’ is often produced. To avoid this unwanted effect, most scanning systems have a ‘de-screen’ function to remove the moiré during the scanning process. Post-capture image processing software can also be used to correct these images. Procedures for doing this should be documented.[13]

Watermarks

Care should be taken when using a bit depth of 1 to capture a document that contains a watermark, highlighting or handwritten annotations. A 1 bit capture may cause text to be obscured, leading to a loss of information. A palettised grey or palettised colour output image would capture the text of the document and the extra information in the watermark. Placing black paper behind the original when scanning a record containing a watermark can also help to produce a more legible digitised image.
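
The effect can be sketched in Python with a single hypothetical scan line. The grey values below are invented for illustration: white paper, a faint watermark, and black text sitting inside a highlighted area. The threshold of 128 and the 16 level palette are also assumptions of the sketch rather than recommended settings.

```python
def to_bitonal(row, threshold=128):
    """1 bit capture: values at or above `threshold` become white, the rest black."""
    return [255 if value >= threshold else 0 for value in row]


def to_palette(row, levels=16):
    """Palettised grey: snap each value to one of `levels` evenly spaced greys."""
    step = 255 / (levels - 1)
    return [round(round(value / step) * step) for value in row]


# Hypothetical values: paper 255, watermark 225, highlight 110, text 20.
scan_line = [255, 225, 225, 255, 110, 20, 110, 255]

print(to_bitonal(scan_line))  # watermark vanishes; text merges with highlight
print(to_palette(scan_line))  # paper, watermark, highlight and text stay distinct
```

At 1 bit the watermark becomes indistinguishable from the paper and the highlighted text becomes a solid black block, while the palettised capture keeps all four tones apart.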

Common questions

See Frequently asked questions for answers to the following questions.

  • What technical specifications (resolution, file format, compression etc) should I use? (also addressed above)
  • What is the difference between PPI and DPI with resolution? (also addressed above)
  • Do I need to digitise in colour? (also addressed above)
  • Can I use enhancement techniques for the digital images?
  • Should I use watermarking or fingerprinting for records?
  • What is the difference between PDF and PDF/A?

Checklist

Technical specifications Yes No
Are there documented technical specifications for the digitisation project?     
Have the recommended technical specifications (in Appendix 1) been adopted?    

If not, has the organisation conducted an analysis to determine if the technical specifications:

  • are fit for purpose
  • enable the capture of the essential characteristics of the original  paper records
  • enable the retention of the digital images for as long as required?
   
If records are required as State archives has the organisation contacted Museums of History NSW about the proposed digitisation?    

Appendix 1

Recommended technical specifications for digitisation in this Appendix were designed by Archives New Zealand.[14] The highest technical specifications possible and supportable should be selected. If your organisation chooses to vary these technical specifications, it should conduct an assessment of all factors and document this along with the reasons for choosing alternative specifications. The primary considerations should always be to ensure:

  • the legibility of the digital image
  • the reproduction of the original records' essential characteristics and
  • that the image is fit for purpose.
| Document type | Resolution* | Bit depth | File format | Compression |
| --- | --- | --- | --- | --- |
| Text only, black and white | Minimum 300ppi | 1 bit (bi-tonal) | TIFF; PDF/A† containing TIFF or JPEG 2000‡ | Lossless compression |
| Documents with watermarks, grey shading, grey graphics | Minimum 600ppi | 8 bit greyscale | TIFF; JPEG 2000; PDF/A containing TIFF or JPEG 2000 | Lossless compression |
| Documents with discrete colour used in text or diagrams | Minimum 600ppi | Minimum 8 bit colour | TIFF; JPEG 2000; PDF/A containing TIFF or JPEG 2000 | Lossless compression |
| Black and white photographs | Sufficient to provide >3000 pixels across the long dimension | 8 bit greyscale | TIFF; JPEG 2000; PDF/A containing TIFF or JPEG 2000 | Lossless compression |
| Colour photographs | Sufficient to provide >3000 pixels across the long dimension | 24 bit colour | TIFF; JPEG 2000; PDF/A containing TIFF or JPEG 2000 | Lossless compression |
| Black and white negatives | Sufficient to provide >3000 pixels across the long dimension | 8 bit greyscale or 24 bit colour | TIFF; JPEG 2000; PDF/A containing TIFF or JPEG 2000 | Lossless compression |
| Colour negatives and transparencies | Sufficient to provide >3000 pixels across the long dimension | 24 bit colour | TIFF; JPEG 2000; PDF/A containing TIFF or JPEG 2000 | Lossless compression |

*The scale/ratio for resolution here is 1:1.

† PDF/A is a constrained version of PDF version 1.4 with various proprietary fonts and formats removed, issued as ISO 19005-1:2005.

‡ JPEG 2000 is defined in ISO/IEC 15444-1:2000.
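
Recommended minimums like these can be recorded as a small lookup so that a planned capture is checked before scanning begins. The Python sketch below covers only the three document rows of the table; the dictionary keys and the function name are hypothetical, treating the listed bit depths as minimums is an assumption of the sketch, and the file format column is not checked.

```python
# Minimums transcribed from the document rows of the Appendix 1 table.
# The keys are hypothetical labels chosen for this sketch.
RECOMMENDED = {
    "text_black_and_white": {"min_ppi": 300, "min_bit_depth": 1},
    "watermarks_or_grey": {"min_ppi": 600, "min_bit_depth": 8},
    "discrete_colour": {"min_ppi": 600, "min_bit_depth": 8},
}


def check_capture(document_type, ppi, bit_depth, lossless):
    """Return a list of ways a planned capture falls short of the minimums."""
    spec = RECOMMENDED[document_type]
    problems = []
    if ppi < spec["min_ppi"]:
        problems.append(
            f"resolution {ppi}ppi is below the minimum {spec['min_ppi']}ppi"
        )
    if bit_depth < spec["min_bit_depth"]:
        problems.append(
            f"bit depth {bit_depth} is below the minimum {spec['min_bit_depth']}"
        )
    if not lossless:
        problems.append("compression is lossy; lossless compression is recommended")
    return problems


print(check_capture("text_black_and_white", 300, 1, True))  # [] means no problems
print(check_capture("watermarks_or_grey", 300, 1, False))   # three problems listed
```

An empty list means the planned settings meet the minimums in this sketch. Any variation would still need the documented assessment described above.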

Note: The digitisation of microforms is outside the scope of these guidelines. However, any approach for digitising microforms should be able to emulate the methods detailed above, consistent with the original record on the (typically greyscale) microform (i.e. to produce a resolution of 600ppi in relation to the original document). For textual records, this may vary to focus more on creating digital images with reasonable or good legibility. JPEG 2000 and PDF/A are recommended formats for microforms.

Note regarding resolution for photographs, negatives and transparencies

With photographs, negatives and transparencies the required resolution will vary according to the size of the photograph or negative. In these cases measure the longest side of the photograph in inches then calculate the required resolution by dividing 3000 by the length of that long side.

For example:
If you have a photograph that is 5 inches by 8 inches, then 8 inches is the longest side. 3000 divided by 8 = minimum 375ppi.
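
The same calculation can be written as a short Python function. The function name is hypothetical, and rounding up to a whole number of ppi is a choice made for this sketch so that the long side never falls below 3000 pixels.

```python
import math


def minimum_ppi(width_inches, height_inches, target_pixels=3000):
    """Smallest whole ppi giving at least `target_pixels` along the longest side."""
    longest_side = max(width_inches, height_inches)
    return math.ceil(target_pixels / longest_side)


print(minimum_ppi(5, 8))    # 375, matching the worked example
print(minimum_ppi(10, 15))  # 200
print(minimum_ppi(4, 6))    # 500
```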

As rough rules:

| For photographs with a longest side measuring... | Resolution |
| --- | --- |
| 15 inches or greater | Use at least 200ppi |
| Between 10 and 15 inches | Use at least 300ppi |
| Between 5 and 10 inches | Use at least 600ppi |

The National Library of Australia’s Image capture guidelines may also help you to determine a suitable ppi. See: http://www.nla.gov.au/standards/image-capture


Footnotes

[1] Queensland State Archives, Digitisation disposal policy toolkit: Technical specifications, May 2010.

[2] The Archives New Zealand technical specifications have been adopted by Archives New Zealand and Queensland State Archives as recommendations for all records.

[3] Queensland State Archives, op. cit.

[4] Archives New Zealand, Digitisation standard, 2007, p. 15.

[5] Ibid., p. 37.

[6] Ibid., p. 15.

[7] Library of Congress, Sustainability of digital formats, available at: http://www.digitalpreservation.gov/formats/

[8] Queensland State Archives, op. cit.

[9] Loc. cit.

[10] Loc. cit.

[11] Loc. cit.

[12] Loc. cit.

[13] Loc. cit.

[14] Archives New Zealand, op. cit., p. 36.
