These FAQs address some of the major questions about digitisation received by State Records’ staff.
1. Disposal of original paper records
If the originals are eligible for destruction in accordance with General retention and disposal authority: original or source records that have been copied, and they meet all its conditions, they may be destroyed.
Check documentation about the digitisation process and quality control measures taken, so that if required, you will be able to demonstrate that these images are authentic, complete and accessible, and that the other conditions for destruction were met. If you do not have adequate documentation, it may be prudent to retain the originals.
Remember, originals that are ‘required as State archives’, or that are to be retained in the agency, and that were created or received prior to January 1, 1980 are not eligible for destruction after digitising under the General retention and disposal authority: original or source records that have been copied.
The original paper records should be kept for a period of time for quality control purposes. This is to allow for quality checks of the images and provides an extra safeguard in case of loss of images in the copying or registration process.
Your organisation should determine an appropriate period for retaining originals for quality control purposes. This period should be based on an assessment of the:
- level of assurance that a full and accurate record has been created
- level of assurance that the digitised image is being well managed in a recordkeeping system
- robustness of digitisation processes, including quality assurance processes
- level of assurance that the authenticity is being maintained (determined through results of quality assurance processes)
- need for access to the original paper records for other purposes such as legal proceedings.[1]
This assessment should be:
- based on an understanding of your organisation's own digitisation and recordkeeping processes
- suitable for the types of business to which the records relate
- determined in consultation with relevant business units.
Where original paper records are destroyed, the digitised copies of the records must be retained for the records' full retention periods, as required in the relevant retention and disposal authority.
Day boxing is a common practice for business process digitisation. It involves scanning records as they are received (i.e. as part of business process digitisation) and placing the originals in a ‘day box’ or batch. Records should be covered by retention and disposal authorities if they are to be day-boxed.
In back-capture digitisation projects, if you are destroying the original paper records after digitisation you may box records after scanning, but in all likelihood you will box them according to scanning batches. You should not use day boxing where originals are to be retained (they will usually need to be reconstructed).
It is a condition for destruction in the General retention and disposal authority: original or source records that have been copied that original records awaiting destruction are kept for a certain period of time for ‘quality control’ purposes after digital imaging has occurred. Your organisation can choose this period (e.g. six months) and should apply it consistently.
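As a rough illustration of applying a chosen quality control period consistently, the following sketch calculates the earliest date on which a batch of originals could be considered for destruction. The six-month value and the function names are assumptions for the example only; your organisation's own policy determines the actual period.

```python
import calendar
from datetime import date

# Hypothetical policy value: the quality control period chosen by the organisation.
QC_PERIOD_MONTHS = 6


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month,
    so 31 August plus six months gives 28 (or 29) February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_destruction_date(digitised_on: date) -> date:
    """Date on which the quality control period for a batch ends."""
    return add_months(digitised_on, QC_PERIOD_MONTHS)


def qc_period_elapsed(digitised_on: date, today: date) -> bool:
    """True once the quality control period has fully elapsed."""
    return today >= earliest_destruction_date(digitised_on)
```

Recording the digitisation date on each day box or batch makes this check straightforward to apply. The other conditions of the Authority still need to be met before any destruction takes place.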
Those operators doing the digitisation also need to be aware of the types of records excluded from the General retention and disposal authority: original or source records that have been copied.
You will also need to ensure that digital images are registered into a recordkeeping system and given retention periods from the appropriate retention and disposal authority.
Two of the conditions of the General retention and disposal authority: original or source records that have been copied require copies to be authentic, complete and accessible and kept for the authorised retention period. It is important that you meet these requirements if the digital image of the record will replace the original.
Even if you do not have a sophisticated recordkeeping system, if you can meet these conditions of the Authority you do effectively have a recordkeeping system as you can demonstrate that the integrity of the records can be safeguarded over time.
Providing you can meet the other requirements of the Authority that:
- all requirements for retaining originals have been assessed and fulfilled
- originals are kept for quality control purposes for an appropriate length of time after digitisation
you have authorisation to destroy certain original records after digitisation.
If you do not meet these requirements, you are not authorised to destroy any original records after digitisation.
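The conditions discussed in this section can be recorded as a simple checklist for each batch of originals. The sketch below is illustrative only: the class and field names are hypothetical, each answer must come from your own assessment and documentation, and the wording of the Authority itself always takes precedence.

```python
from dataclasses import dataclass


@dataclass
class DestructionAssessment:
    """Answers recorded by records staff for one batch of originals (hypothetical fields)."""
    excluded_from_authority: bool               # e.g. an excluded record type
    copies_authentic_complete_accessible: bool
    copies_kept_for_retention_period: bool
    retention_requirements_assessed: bool       # requirements for retaining originals
    qc_period_elapsed: bool                     # originals kept for quality control


def unmet_conditions(a: DestructionAssessment) -> list[str]:
    """List every condition that currently prevents destruction."""
    unmet = []
    if a.excluded_from_authority:
        unmet.append("records are excluded from the Authority")
    if not a.copies_authentic_complete_accessible:
        unmet.append("copies are not shown to be authentic, complete and accessible")
    if not a.copies_kept_for_retention_period:
        unmet.append("copies are not kept for the authorised retention period")
    if not a.retention_requirements_assessed:
        unmet.append("requirements for retaining originals are not assessed and fulfilled")
    if not a.qc_period_elapsed:
        unmet.append("quality control period has not elapsed")
    return unmet


def may_destroy_originals(a: DestructionAssessment) -> bool:
    """True only when no condition is outstanding."""
    return not unmet_conditions(a)
```

Returning the list of unmet conditions, rather than a bare yes or no, gives you a record of why a batch was held back.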
2. Admissibility of digital images in court
In most cases there is no barrier to organisations tendering digital images of records as evidence. They can be considered suitable to submit in legal proceedings, in response to Government Information (Public Access) Act (GIPA) applications, and for other evidentiary purposes.
However, the value or credibility of the digital image as evidence can still be questioned.
The authenticity of a presented record may be challenged or a judge may be given some other reason to doubt the reliability of a digital image.
In these cases, your documentation regarding how the digitisation was conducted and the digital images created and kept may help to demonstrate that the digital image is an authentic and credible representation of the original. See Legal admissibility of digital records in the relevant guideline for more information.
3. Technical standards
The primary goal with technical specifications is to create a legible digital image of sufficient quality for its purpose that can remain legible and useable for as long as required. For long term digital images, this may mean that they need to withstand time and a number of migrations.
In the Technical specifications section of the guidelines, you will find an explanation of file formats, resolution, compression, etc., and a table listing the recommended technical specifications for records, along with considerations if departing from these.
Resolution is a measure of the ability to capture detail in the original work. It is frequently quantified in pixels per inch (PPI), which is a measurement of resolution for computer display. The higher the PPI, the better the resolution and the clearer the image.
Dots per inch (DPI) is often used interchangeably with PPI, but actually refers specifically to measurement of the resolution for computer printers.
Generally the PPI will equate roughly to the DPI.
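To show what a PPI value means in practice, the following sketch converts a page size and a PPI setting into pixel dimensions and an uncompressed file size. The A4 page and the 300 PPI figure are illustrative assumptions, not recommended values; refer to the Technical specifications table for the settings that apply to your records.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, ppi: int) -> tuple[int, int]:
    """Pixel width and height of a page scanned at the given PPI."""
    width_px = round(width_mm / MM_PER_INCH * ppi)
    height_px = round(height_mm / MM_PER_INCH * ppi)
    return width_px, height_px


def uncompressed_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Approximate size of the raw image data before any compression."""
    return width_px * height_px * bits_per_pixel // 8


# Example: an A4 page (210 mm x 297 mm) at an assumed 300 PPI.
a4_pixels = pixel_dimensions(210, 297, 300)               # (2480, 3508)
a4_colour = uncompressed_size_bytes(*a4_pixels, 24)       # 24-bit colour
a4_bitonal = uncompressed_size_bytes(*a4_pixels, 1)       # 1-bit black and white
```

Doubling the PPI quadruples the number of pixels, which is one reason resolution, colour depth and compression need to be considered together.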
Digitising in colour is not always necessary. You need to consider the particular records in question and determine which ‘essential characteristics’ of the records need to be maintained and present in the digital image.
If it is essential to reproduce the colour in order to understand the records or preserve their evidential value, you will need to digitise in colour. If colour is not essential, you can choose not to digitise in colour.
For example:
An organisation had a particular group of records often required in court where colours were vital to understanding annotations on the records, e.g. green pen meant something different to red pen. In this case colour was an essential characteristic of the records that needed to be reproduced. This affected decisions regarding hardware, software and technical (colour management) requirements.
If the colour in a document is just within the letterhead, colour is not an essential characteristic – it is not essential to understanding the record.
Image enhancement techniques may be employed to make an image more exactly resemble the original. However, benchmarks should document acceptable changes and these must be routinely employed.
If enhancements change the evidential value of a record or if acceptable changes are not documented and routinely employed, the organisation may be subject to challenges that the digital images are not authentic representations of the original paper records.[3]
Some digitisation processing and management software may have the ability to modify the appearance of a digital image by adding information such as the date or organisation name. Two such techniques are watermarking and fingerprinting:
- Watermarking is the inclusion of static information on an image at time of storage, perhaps the name of the organisation and date of capture.
- Fingerprinting typically includes information generated when the image is accessed, such as login name of the end user and date / time information.
While this information may be useful, and the inclusion of it as part of the image convenient, these modified images are no longer a true and accurate copy of the original paper records. This is especially relevant where added information, such as a large watermark through the text, makes the content of a record difficult to read.
Organisations should instead retain a digital record as an unmodified representation of the original paper record and capture any additional information as metadata rather than as part of the image.[4]
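One way to keep the image unmodified while still recording capture and access details is to hold that information in metadata alongside a checksum of the image. The sketch below is a minimal illustration under that assumption; the field names are hypothetical, and in practice this metadata would normally be managed by your recordkeeping system rather than by a script.

```python
import hashlib
import json
from datetime import datetime, timezone


def capture_metadata(image_bytes: bytes, organisation: str, captured_at: datetime) -> dict:
    """Record 'watermark'-style details as metadata, leaving the image untouched."""
    return {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),  # fixity value of the unmodified image
        "organisation": organisation,
        "date_captured": captured_at.isoformat(),
        "access_log": [],
    }


def record_access(metadata: dict, user: str, accessed_at: datetime) -> None:
    """Record 'fingerprint'-style details (who accessed the image and when)."""
    metadata["access_log"].append({"user": user, "accessed_at": accessed_at.isoformat()})


def image_unmodified(image_bytes: bytes, metadata: dict) -> bool:
    """True if the image still matches the checksum taken at capture."""
    return hashlib.sha256(image_bytes).hexdigest() == metadata["sha256"]


# Example with placeholder bytes standing in for a scanned image file.
example = capture_metadata(b"placeholder image data", "Example Agency",
                           datetime(2018, 4, 16, tzinfo=timezone.utc))
record_access(example, "jsmith", datetime(2018, 5, 1, 9, 30, tzinfo=timezone.utc))
example_json = json.dumps(example, indent=2)  # could be stored alongside the image
```

Because the checksum is taken from the unmodified image, any later change to the image, including an added watermark, would cause the check to fail.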
PDF/A is an ISO-standardised version of the Portable Document Format (PDF) specialised for the preservation of digital documents. In PDF/A, all the information needed to display a document is embedded rather than linked, so the document can be displayed in the same way in years to come. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts and colour information. A PDF/A document is not permitted to be reliant on information from external sources (e.g. font programs and hyperlinks). Use of standards-based metadata is mandated.
PDF/A does have some drawbacks. A PDF/A file is often bigger than a normal PDF file because all the information for its use is embedded.[5] Encryption is not allowed in a PDF/A file, nor is JavaScript, audio or video content.
Note: PDF/A can be edited. If your requirements include security or authenticity you should capture the PDF/A records to a recordkeeping system.
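As a rough first check for the features that PDF/A does not allow, a file can be scanned for the names PDF uses for them. The sketch below is a crude heuristic only, not a PDF/A validator: it looks for a small, assumed list of name markers in the raw bytes and will miss anything stored in compressed parts of the file. A dedicated PDF/A validation tool is needed to confirm conformance.

```python
# Name markers that suggest features not allowed in PDF/A.
# This list is an assumption for illustration and is not exhaustive.
PROHIBITED_MARKERS = {
    b"/Encrypt": "encryption",
    b"/JavaScript": "JavaScript",
    b"/Sound": "audio",
    b"/Movie": "video",
}


def find_prohibited_markers(pdf_bytes: bytes) -> list[str]:
    """Return labels for any prohibited-feature markers found in the raw bytes."""
    return [label for marker, label in PROHIBITED_MARKERS.items() if marker in pdf_bytes]


def check_file(path: str) -> list[str]:
    """Read a file from disk and run the same heuristic scan."""
    with open(path, "rb") as handle:
        return find_prohibited_markers(handle.read())
```

An empty result means only that none of the listed markers were seen; it does not demonstrate that the file is valid PDF/A.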
4. Equipment
Optical Character Recognition (OCR) allows the text depicted in a scanned document to be extracted as a text file or word processor document. OCR software is required to recognise the text contained in the image and usually provides search and export capabilities. Another advantage of OCR is that it allows you to automate more metadata capture through document definitions.
You will need to consider the aims of your digitisation project or program and the records in question to determine whether OCR will help you meet your organisation's needs. Your budget and the cost of the OCR in comparison to other software will be a factor. In addition, some documents are not suitable for OCR (see 5.3 for examples).
5. Benchmarks and quality assurance
You will need to determine the extent of quality control required based on the risks of your particular digitisation program or project. See Benchmarks and Quality Assurance for more information.
Yes. Techniques such as ‘sharpening’ and/or ‘clipping’ of highlights or shadows, ‘blurring’ to eliminate scratches, ‘spotting’ or ‘de-speckling’ may be used to touch up specific areas of a digital image. Some software may automatically correct imperfections. The extent of these processes can be set through tolerance levels.
In quality assurance checks you should ensure that these processes are checked to make sure information is not lost (for example, if the tolerances are set too high the dots above the letter ‘i’ may be removed). Processes employed should be documented to help ensure that the authenticity and completeness of the images are not at risk of being challenged.[6]
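The effect of a tolerance that is set too high can be seen in a small example. The sketch below applies a simple de-speckle rule to a tiny black and white grid: any connected group of black pixels at or below a size threshold is removed. The grid, the threshold values and the function name are all assumptions for illustration; real imaging software uses its own algorithms and settings.

```python
def despeckle(rows: list[str], max_speck_size: int) -> list[str]:
    """Remove connected groups of '#' pixels no larger than max_speck_size."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    grid = [list(row) for row in rows]
    seen = set()
    for y in range(height):
        for x in range(width):
            if grid[y][x] != "#" or (y, x) in seen:
                continue
            # Collect every pixel joined to this one (up, down, left, right).
            component, stack = [], [(y, x)]
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] == "#" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if len(component) <= max_speck_size:
                for cy, cx in component:
                    grid[cy][cx] = "."
    return ["".join(row) for row in grid]


# A letter 'i' (two-pixel dot above a four-pixel stem) plus one stray speck.
PAGE = [
    "......",
    "..#...",
    "..#...",
    "......",
    "..#...",
    "..#...",
    "..#...",
    "..#..#",
]

cleaned = despeckle(PAGE, max_speck_size=1)        # removes only the stray speck
over_cleaned = despeckle(PAGE, max_speck_size=2)   # also removes the dot of the 'i'
```

With the lower threshold only the stray speck disappears; with the higher one the dot of the ‘i’ is lost as well, which is the kind of information loss that quality assurance checks need to catch.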
OCR is rarely a fully automated process and may require operator intervention to assist in obtaining an accurate transcription of a scanned record’s text. Documents containing handwriting, serif fonts, halftones and background text or images or those that are damaged or dirty may not be suited to the OCR process.[7]
Quality faults can be categorised as implementation faults, process faults or operator faults.
- Implementation faults can be avoided, providing appropriate procedural controls are in place to guide the digitisation.
- Process faults are normally out of the control of the operator and need to be addressed by a supervisor of the process.
- Operator faults are the day-to-day faults that are made by the operator as they work.
Implementation faults
- dirty originals
- incorrect file-size and format, where files are made to the wrong size or with the wrong choice of file format
- compression, where files are made with an inappropriate type or level of compression.
Process faults
There is a wide variety of process faults, which can be caused by many problems within the workflow. These problems can include:
- incomplete or inaccurate specifications or process documentation
- faulty capture hardware (incorrectly calibrated and characterised devices)
- faulty software (inaccurate image processing or faulty image links within database)
- incorrectly established colour management systems
- low quality original data (either non-digital surrogates like a photocopy or legacy digital image files)
- inaccurate source metadata.
Operator faults
These faults are caused by some form of operator error within the workflow and can include:
- basic capture faults
  - cropping that cuts into the image, is too loose, or is uneven
  - incorrect orientation of the image, i.e. it is the wrong way around, or upside down
  - incorrect exposure of the image, i.e. it is too light or too dark
  - incorrect focus, i.e. the image is out of focus
  - daily calibration, where the capture device has not been calibrated
- basic image processing faults
  - file optimisation faults, where incorrect adjustments are made to the colour, contrast and brightness of the image during processing
  - incorrect file-naming, where image files are incorrectly named or use non-unique names
- basic metadata attribution faults
  - placing digital images into incorrect folders, files or classification structures
  - incorrect data entry, where data is incorrectly entered into the management control system
  - incorrect use of controlled vocabulary, e.g. using words not established within scope notes.[8]
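Some operator faults, such as incorrect or non-unique file names, can be picked up by a simple automated check as part of quality assurance. The sketch below is illustrative only; the naming convention it tests for (a short prefix, a year and a four-digit sequence number) is a hypothetical example and should be replaced with your organisation's own convention.

```python
import re
from collections import Counter

# Hypothetical naming convention, e.g. "HR-2018-0001.tif".
NAME_PATTERN = re.compile(r"^[A-Z]{2,5}-\d{4}-\d{4}\.(tif|pdf)$")


def naming_faults(filenames: list[str]) -> dict[str, list[str]]:
    """Map each faulty file name to the list of problems found with it."""
    # Compare names case-insensitively so "A.tif" and "a.tif" count as a clash.
    counts = Counter(name.lower() for name in filenames)
    faults: dict[str, list[str]] = {}
    for name in filenames:
        problems = []
        if not NAME_PATTERN.match(name):
            problems.append("does not follow naming convention")
        if counts[name.lower()] > 1:
            problems.append("name is not unique")
        if problems:
            faults[name] = problems
    return faults
```

A check like this only covers faults that can be detected from the file name. Faults such as poor cropping, exposure or focus still require the images themselves to be inspected.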
Your organisation should consider whether it is necessary to add page numbers to a file or volume prior to digitisation (if not numbered already). It does require additional effort (and potentially cost if digitisation is outsourced) but this should be balanced with your organisation’s need to have evidence of the exact order of the original papers on the file. A risk assessment should be conducted when planning the project. If records are high risk or long term/archival and/or tend to be requested in court, your organisation may well decide that the additional effort is justified. Pagination can assist you with quality assurance checking. Where original paper records are to be retained after digitisation, it will also assist you to reconstruct the original paper records.
If page numbers are to be added, your organisation’s procedures for digitisation should indicate acceptable ways of doing this. Pagination of archives is generally done in soft pencil so that it can be removed if necessary. Stamping is not generally recommended for archives as the stamps alter the original, and may obscure text or negatively impact on pictures or images. In some cases you may be able to add page numbers to the metadata of digital images.
If your organisation decides not to paginate, you should (at least) determine the number of pages digitised and compare this to the number of papers in the original records (this can be sampled where relevant). If pages are removed to be digitised in separate batches, e.g. non-standard materials that require a different scanner, flags should be added to ensure pages are returned to the correct order.
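The page count comparison described above can be sketched as follows. This is a minimal illustration that assumes page counts for the paper files and the digitised images have already been recorded against a shared file identifier; the function names and the sampling fraction are hypothetical.

```python
import random


def page_count_discrepancies(paper_counts: dict[str, int],
                             image_counts: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {file_id: (paper_pages, image_pages)} wherever the counts differ."""
    discrepancies = {}
    for file_id in sorted(set(paper_counts) | set(image_counts)):
        paper = paper_counts.get(file_id, 0)
        images = image_counts.get(file_id, 0)
        if paper != images:
            discrepancies[file_id] = (paper, images)
    return discrepancies


def sample_files(file_ids: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Pick a repeatable random sample of files to count by hand."""
    if not file_ids:
        return []
    size = min(len(file_ids), max(1, round(len(file_ids) * fraction)))
    return random.Random(seed).sample(sorted(file_ids), size)
```

A file that appears in only one of the two sets of counts is reported with a count of zero on the other side, so missing files are flagged as well as missing pages.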
6. Outsourcing
Public offices are responsible for meeting the requirements of the State Records Act and standards released under the Act. If your organisation outsources the digitisation of records, it is still responsible for the management of both the source records and the digital images. Therefore, you need to ensure that all relevant requirements are specified in contracts with providers.
In order to ensure that a digitisation project is managed suitably, it is recommended that contracts contain:
- clear guidance on the range and type of records to be digitised
- clear timeframes, costs and expectations including that records should not be altered in any way
- roles and responsibilities of the organisation and service providers
- special requirements for sensitive or personal records or urgent requests
- benchmarks, e.g. technical and metadata requirements
- quality assurance measures (including early checks of samples and remediation required if benchmarks are not met)
- an agreed monitoring framework
- a statement that all State records and State archives must remain in NSW unless express permission is given by State Records to take them out of the State. See General authority: transferring records out of NSW for storage with or maintenance by service providers based outside of the State.
Service providers must be made aware of relevant standards including:
- Standard on the physical storage of State records, Principle 6: Careful handling Minimum compliance requirement 4 ‘Records are handled carefully during conversion and converted according to recognised standards’
- Standard on records management, which contains metadata requirements to ensure records are reliable and trustworthy
- General retention and disposal authority: original or source records that have been copied, which specifies the conditions for destroying original records after digitisation.
The monitoring framework that forms part of the contract should ensure that all recordkeeping requirements are met throughout the term of the agreement.
If you are intending to digitise records required as State archives, you should consult State Records. This is to ensure that the specific issues concerning the reproduction of archival records can be discussed and suitable parameters for the digitisation process agreed upon.
7. Planning
With large scale back-capture projects it can help to break them down into smaller components. The parameters of these components should be clearly defined and measurable. This way you can learn as you go along and lessons learnt can be applied to later parts of the project. Segmenting your project may also allow you to use your resources more efficiently.
For example:
If you have records in different formats or of different ages requiring different equipment or methods of capture they can form different parts of the project.
If records are still being used frequently, remember to consider how long they can be unavailable and whether certain records can be digitised on request, even if this is out of sequence.[9]
If you have multiple parts of the project being performed simultaneously you will need to have very efficient monitoring mechanisms in place.
See the Case study: Housing NSW - Outsourcing digitisation of client files as an example of the considerations required for large scale digitisation.
Footnotes
[1] Queensland State Archives, Digitisation Disposal Policy Toolkit, Quality Assurance Guideline, May 2010, section 3.
[2] Legal advice provided by the NSW Crown Solicitor’s Office to Housing NSW. Reproduced with the kind permission of Housing NSW.
[3] Archives New Zealand, Digitisation standard, 2007, Appendix 7, available at: https://archives.govt.nz/advice/continuum-resource-kit/continuum-publications-html/s6-digitisation-standard, p.16.
[4] Queensland State Archives, Digitisation Disposal Policy Toolkit, Quality Assurance Guideline, May 2010, op.cit., section 2.2.1
[5] Wikipedia, PDF/A, available at: https://en.wikipedia.org/wiki/PDF/A See also http://digitalpreservation.gov/formats/fdd/fdd000125.shtml and http://www.appligent.com/talkingpdf-how-to-implement-pdfa for more information about PDF/A.
[6] Queensland State Archives, op.cit., section 2.2.1
[7] Loc.cit
[8] Archives New Zealand, Digitisation standard, 2007, op.cit, Appendix 7
[9] National Archives of Australia, Digitising accumulated physical records, Commonwealth of Australia, 2011, available at: http://www.naa.gov.au/Images/Digitising-accumulated-physical-records-April-2011_tcm16-47278_tcm16-70173.pdf, p13.
Revised April 2018