IRMS005 – Barclay T. Blair on big data, information governance and records management

In this podcast, Barclay T. Blair compares and contrasts:

  • the big data view (that all data is valuable, and the more the better)
  • the information governance view (that some data is good, but other types of data have risks, costs and constraints attached to them that outweigh their potential value)

Barclay discusses with James Lappin:

  • the divide between structured data (in relational databases/Hadoop databases) and unstructured data (documents and e-mail)
  • the continuing need for classification of content – to apply retention, to identify content that needs extra protection (for example to meet privacy concerns), and to meet specific regulatory requirements
  • the impact of auto-classification on the nature of classifications themselves
  • the challenges of training auto-classification engines
  • the possibility of having auto-classification tools that are trained to classify the content specific to a particular industry sector
  • the similarities and differences between using auto-classification in an e-Discovery context and in an information governance context
  • the US National Archives and Records Administration (NARA) initiatives on archiving e-mail (Capstone) and on encouraging automation
  • the challenges to existing records management and archives theory posed by automated approaches to records management
  • the extent to which courts and the legal process are increasing the transparency of auto-classification
  • the implications of the fact that auto-classification engines make decisions based on complex mathematical techniques that, whilst statistically sound and academically verified, are not understood by non-mathematicians

The podcast is introduced by Heather Jack.

The conversation between Barclay T. Blair and James Lappin was recorded on October 16, 2013.

This podcast is 46 minutes long.
