Limitations of records management in Microsoft 365

Office 365 is a ubiquitous and incredibly useful information collaboration platform, and Microsoft continues to evolve the service to meet governance, compliance, and regulatory requirements across the globe.

However, there are some key shortcomings in the Office 365 functionality for continuum records management, the model required by International Standards ISO 15489 and ISO 16175, and by governments in the Commonwealth and beyond.

M365 still uses a fundamentally traditional model for records management, which relies either on users to determine and set retention rules, or on records managers to generate and maintain file plans. The automation capability is limited, and labour intensive to implement and sustain. Items aren’t managed for their whole lifecycle, or as part of meaningful aggregations (either within M365 or across the enterprise).

The limitations specifically relate to the following issues:

  1. Managing at the aggregation level
  2. Sentencing on creation and continuous review
  3. User and records manager burden
  4. The impracticality of AI and automation
  5. Management of records in other business systems.

Before exploring each of these challenges in more detail, we need a quick review of how M365 records management works.

The M365 records management approach

In Office 365, Records Management is implemented in two main ways:

Retention Labels – for individual items

Retention labels are basically metadata and workflow added to a document or item, which set the retention rule for the item. They are designed to be applied to a type of information asset. For example, ‘Pricing Documents’ might have a 10 year retention.

Importantly, only one retention label can be applied to any one item. So a document can be ‘Pricing Documents’ (10 years), but not also ‘Asbestos Removal’ (75 years). So the labels apply based on what an item is, not really on what it is about. In this case, we can mark it as a pricing document, or we can mark it as an asbestos removal document, but not as a pricing document for asbestos removal.

With only one label allowed, only one retention rule can be applied to the document.

Retention Policy – for aggregations

In order for items to be managed as part of an aggregation, Office 365 uses a retention policy, rather than a retention label. This applies the same retention settings to all the content in a site or mailbox, for example. You can use labels to override the policy for individual items in the site or mailbox.

This approach of requiring either each item to be linked to its own single retention rule, or for all items in an aggregation to be assigned the same retention rule, has some ramifications for the practical application of compliant records management.

Now let’s turn to how these capabilities stack up against the continuum requirements.

Issue 1: Items are managed individually, not as part of an aggregation 

A fundamental principle of the International Standards approach to records management is that individual items are not sentenced and disposed of in isolation. The ‘meaningful record’ is the whole story of something, which often includes many parts. It’s all for one and one for all – we keep all the parts of the record until the longest-retention part is ready to be disposed of.

ISO 16175-2:2011: Guidelines and Functional Requirements for Records in Electronic Office Environments: The electronic records management system must: Ensure that all records captured within the electronic records management system are associated with at least one aggregation.

To see why, let’s consider the record of a bridge construction project. We need to keep the final specifications for the bridge for, let’s say, 75 years. That’s because if something goes wrong with the bridge at any time during or after construction, we need to know what the specs were. But we also need to know who came up with the specs, and who approved them, and whether they were changed along the way (and whether those changes were approved). Another example might be the funding approvals for sports facilities. We don’t just need the final report, we need the history of the decisions.

AS ISO 15489 – 1:2001 (E): Records retention should be managed to meet current and future business needs by retaining the context of the record which will enable future users to judge the authenticity and reliability of records, even in cases where the records systems in which they are retained have been closed or have undergone significant changes

If we gradually chip away at the record by disposing of its individual parts one by one, as their retention in their own right comes due, then we don’t have anything meaningful left at the end of the day. We need to be able to re-tell the whole story, for as long as the story is relevant.

This basic principle breaks down with Office 365 retention labels. In the M365 model, retention labels are applied to each item, not to the aggregation as a whole. And when the retention comes due for that single item, it is disposed of in isolation. It is removed from its business context, and it leaves the aggregation weaker for its removal. By the time the last, longest-retention item in that record is ready to be disposed of, or transferred to Archives perhaps, the rest of the story and evidence behind that business decision, event, or activity has already disappeared.

And this doesn’t only happen when items are formally disposed of by the records team. In M365, if items are deleted or moved by a user before their retention comes due, they are duplicated to a separate preservation or recovery library, completely divorced from their context. This means when they do come due for disposal, they are disposed of without consideration as to whether they still have continuing value, which is also contrary to the Standards.

ISO 16175-2:2011: Guidelines and Functional Requirements for Records in Electronic Office Environments: If more than one disposal authority is associated with an aggregation, the electronic records management system must automatically track all retention periods specified in these disposal authorities, and initiate the disposal process once the last of all these retention dates is reached.
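To make the ‘last of all these retention dates’ rule concrete, here is a minimal Python sketch. It is an illustration only, not how M365 or any particular product works. The item names, trigger dates, the 7-year periods and the function names are all hypothetical; only the 75-year figure comes from the bridge example above.

```python
from datetime import date

def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)

def aggregation_disposal_date(items):
    """items: iterable of (name, trigger_date, retention_years).

    The whole aggregation becomes due only when its longest-retention
    item is due, i.e. on the latest of the individual disposal dates.
    """
    return max(add_years(trigger, years) for _, trigger, years in items)

# Hypothetical parts of the bridge construction record
bridge_project = [
    ("Final specifications", date(2020, 6, 30), 75),
    ("Specification change approval", date(2019, 11, 15), 7),
    ("Design meeting minutes", date(2019, 3, 1), 7),
]

# Item-by-item disposal (the retention label behaviour described above)
item_level = {name: add_years(t, y) for name, t, y in bridge_project}

# Aggregation-level disposal (the behaviour the Standard asks for)
whole_record = aggregation_disposal_date(bridge_project)

for name, due in item_level.items():
    print(f"{name}: due {due} in isolation, {whole_record} as part of the record")
```

Under item-level disposal, the approval and the minutes would be gone decades before the specifications they explain. Under the aggregation rule, every part is kept until the last date is reached.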

And a note on disposition – this can be managed in M365, but the metadata for disposed items is not retained, and the records of disposal are only kept for seven years. Lists of records destroyed are usually sentenced as Retain Permanent, and ISO 15489 and ISO 16175-3 also require that all disposition actions are recorded in a metadata profile.

Issue 2: The sentence is not set or maintained correctly

Using retention labels, we can only apply one type of ‘rule’ to an item. We can mark it based on what it is (a policy, project plan, marketing document etc.), but we can’t also mark it based on what it is about. In modern sentencing models, we classify records based on what they are about (i.e. how they relate to the functions of the business). A ‘report’ about an administrative matter is sentenced one way, and a ‘report’ about a core business matter is sentenced another. An audit of stationery might be kept for one year, while an audit of safety equipment might be kept for 12 years. They’re both audits, but they are about more important or less important things.

The fact that it is an audit goes to the class or activity of the record. What it is about goes to the function. Records authorities usually have rules for each distinct Function/Activity pair.

There are usually dozens, or hundreds, of function/activity rules in each records authority. Using labels, we would have to present the user with hundreds or even thousands of retention options for each item they save, which is not feasible.
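A minimal Python sketch of Function/Activity sentencing shows why the document type alone is not enough. The table below is a hypothetical two-rule excerpt built from the audit example above; the function names ‘Stationery’ and ‘Safety Equipment’ and the helper names are assumptions made for illustration, not entries from a real records authority.

```python
# Hypothetical excerpt of a records authority: the retention period is keyed
# on the (function, activity) pair, not on the document type alone.
RETENTION_YEARS = {
    ("Stationery", "Audit"): 1,
    ("Safety Equipment", "Audit"): 12,
}

def sentence(function: str, activity: str) -> int:
    """Look up the retention period for one Function/Activity pair."""
    return RETENTION_YEARS[(function, activity)]

def candidate_retentions(activity: str) -> list:
    """All retention periods that could apply if only the activity is known."""
    return sorted({years for (_, a), years in RETENTION_YEARS.items() if a == activity})

print(sentence("Safety Equipment", "Audit"))
print(candidate_retentions("Audit"))
```

Knowing only that an item is an audit leaves more than one possible sentence. A one-label-per-item model therefore needs a separate label for every pair, which is where the long list of options presented to the user comes from.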

So if retention labels aren’t an appropriate way to accurately sentence records, what about retention policies, which can apply at the aggregation (e.g. SharePoint Site) level?

The challenge with this approach is that it requires the administrators (who set the policies) to know what the longest-retention schedule for all the content in the library is expected to be, at the point when the policy is set. This is very similar to a traditional EDRMS model, where the records team (or user) need to predict what the longest-retention content in a whole file is likely to be at its outset.  

Sentencing in this way is challenging. Content and context change over time, especially at a site, library, or mailbox level, where they can change daily. And retention rules also change. The gain and loss of business functions changes the applicable core business records authorities; GRAs are updated and refreshed; and freezes and holds are introduced by Archives multiple times per year. The set-and-forget approach has always resulted in records managers being presented with records ostensibly ‘due’ for disposal that have long since diverged from their original applied sentence, and that need to be reclassified and resentenced, often for a longer retention period.

AS ISO 15489 – 1:2001 (E): Disposition authorities that govern the removal of records from operational systems should be applied to records on a systematic and routine basis, in the course of routine business activity. No disposition action should take place without the assurance that the record is no longer required, that no work is outstanding and that no litigation or investigation is current or pending which would involve relying on the record as evidence.

If we sentence only once, and don’t resentence until disposal theoretically comes due, records managers have a lot of work to do to make sure that original sentence is still valid. A better approach is to detect whenever the content or the context of a record changes, and update the sentence then, so that when we are presented with a record for disposal, it is actually due for disposal. This is not possible with either M365 retention labels or retention policies.
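The ‘update the sentence whenever the record changes’ idea can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not an M365 feature or API. The `Record` class, the rule table, and the function names ‘Procurement’ and ‘Property’ are hypothetical; the 10-year and 75-year periods reuse the pricing and asbestos removal example from earlier.

```python
# Hypothetical rule table: (function, activity) -> retention in years
RULES = {
    ("Procurement", "Pricing"): 10,
    ("Property", "Asbestos Removal"): 75,
}

class Record:
    """An aggregation whose sentence is recalculated every time content is added."""

    def __init__(self, name: str):
        self.name = name
        self.classifications = set()
        self.retention_years = None  # not yet sentenced

    def add_item(self, function: str, activity: str) -> bool:
        """Add an item's classification; return True if the sentence changed."""
        self.classifications.add((function, activity))
        previous = self.retention_years
        # The record takes the longest retention among everything it contains.
        self.retention_years = max(RULES[c] for c in self.classifications)
        return self.retention_years != previous

record = Record("Depot refurbishment")
record.add_item("Procurement", "Pricing")
print(record.retention_years)   # sentence after the first item
record.add_item("Property", "Asbestos Removal")
print(record.retention_years)   # sentence after the context changes
```

Because the sentence is refreshed at the moment the record changes, anything presented for disposal later is already sentenced against its current content.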

And a caveat – the M365 model for marking or declaring something as a Record will make that item immutable, and once the “Record” label is applied, it can’t be removed. In the continuum model, we treat our items as records for their whole life, while they are being continually modified. We don’t declare something as a Record only when we are finished with it. So we don’t really have a use for the Record label, and can be tripped up by it if we do use it.

Issue 3: Efficiency challenges and limits of automation in O365

Just like traditional EDRMS, M365 creates some overhead for compliance for users and records managers. The work required to classify records is still done by ‘people power’.

For retention labels, the “correct” sentence has to be selected and applied by each user. But how many labels do we show to our end user? How many is too many? How do we ensure a label gets applied at all? Who checks that the label the user chooses is correct? Can we check every label that is applied to make sure it is appropriate?

Documents change, and so do rules. Who tracks this, and updates the labels to match? It can only realistically be the records team because users don’t have enough expertise – so we have now generated two types of manual work. First, work for the users to apply the labels, and second, work for records managers to check and update them. (One very important thing to note: changing the label in M365 will automatically change the Last Modified date of the item, which affects its records integrity and also goes against the Standards.)

AS ISO 15489 – 1:2001 (E): As well as the content … the business context in which the record was created, received and used should be apparent in the record (including the business process of which the transaction is part, the date and time of the transaction and the participants in the transaction).

There is some capability to ‘pattern match’ labels to apply them automatically, based on content type (if selected by the user); matches to sensitive information (again, sensitivity does not usually correlate to records value and can’t be used to determine retention in isolation); key phrase matching (each string needs to be defined by the records team for this to work); or fingerprinting, where the document is a variation on a known template (again, telling us what the document is, but not what it is about).

Retention policies take the item-level work away, and free the users from that burden. However, the records managers now have to pre-determine (and eventually re-evaluate) classifications for each high-level aggregation in the environment. This is essentially just a traditional ‘file plan’ system, which has high overhead for governance teams. It can also cause productivity and workflow impacts if, for example, the records team needs to assign the retention policy for every new site that is provisioned before it can actually be used. Note, too, that a single retention policy can only be applied to a maximum of 100 sites, so if you have thousands of sites needing the same policy, you will need to create dozens of duplicate policies.
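The arithmetic behind the duplicate policies is simple to sketch in Python. The 100-site limit is the figure quoted above; the site counts and the function name `policies_needed` are hypothetical.

```python
import math

SITES_PER_POLICY = 100  # per-policy site limit, as quoted in the text above

def policies_needed(site_count: int, limit: int = SITES_PER_POLICY) -> int:
    """Number of identical policies required to cover `site_count` sites."""
    return math.ceil(site_count / limit)

# Hypothetical tenancy sizes
for sites in (80, 2500, 2501):
    print(sites, "sites ->", policies_needed(sites), "policies")
```

Each duplicate is another policy that the records team has to keep in step whenever the underlying retention rule changes.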

To be fair, there is always a requirement for records managers to evaluate and potentially update a sentence whenever a record comes due for disposition; this is always a partly manual process. It needs to be – machines aren’t allowed to make discretionary decisions (if they do, we end up with Robodebt). And there is a good disposition review process built into Office 365, so that the records team can easily review a sentence before actioning it, to ensure it is still correct.

In M365, retention policies can be set from date created (~30% of the rules in an average records authority), date of last action (~50%), or a certain event trigger (such as a user leaving the organisation). Around 20% of the rules in the average records authority have expiry date triggers, such as a contract being signed, which are more complex than the types of event that can be configured in M365. Also, as users rarely know what specific retention policy would apply to a contract, for example, or when exactly to apply it, it is unlikely that they will be able to use event-driven triggers effectively, even if the triggers were specific enough. This means that, in practice, the retention periods for records won’t be correct in many cases, and records managers will need to defer those disposal actions to a future date.
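The difference between the three trigger types can be sketched as follows. This is a conceptual Python illustration, not the M365 configuration model; the `Item` class, its field names and the trigger strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Item:
    created: date
    last_action: date
    event_date: Optional[date] = None  # e.g. contract expiry; often never recorded

def retention_start(item: Item, trigger: str) -> Optional[date]:
    """Return the date the retention period starts counting from, if known."""
    if trigger == "created":
        return item.created
    if trigger == "last_action":
        return item.last_action
    if trigger == "event":
        # Stays None until someone identifies and records the event.
        return item.event_date
    raise ValueError(f"unknown trigger: {trigger}")

contract = Item(created=date(2020, 1, 1), last_action=date(2021, 5, 5))
print(retention_start(contract, "created"))
print(retention_start(contract, "event"))
```

The first two triggers can always be derived from system metadata. The event trigger depends on a person supplying a date, which is why event-based rules are the ones most likely to end up deferred at disposal time.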

Issue 4: The inbuilt Artificial Intelligence and automation is impractical

Office 365 does allow an Administrator to configure automatic or ‘default’ labels to be applied to items and containers, but again, this can cause problems when assigning the correct label.

This is because records authorities and disposal schedules often have hundreds of different Function/Activity pairs to apply, each with separate types of sub-rule. AFDA Express v2 for Federal government administrative records, for example, has been significantly streamlined, but there are still 86 function/activity pairs, and 257 subclasses within those classes. And that’s just one of many Authorities that would be applicable to an agency.

Allowing users to assign a ‘default’ rule to all content in an aggregation increases the likelihood that the sentence won’t be correct. One item could be added to an aggregation that changes the retention requirement of the whole record, and when we allow defaults to be inherited without any appraisal we don’t account for that likelihood.

Alternatively, building a set of queries and models to automate the application of these labels is extremely time-consuming, and results in a large and complex “rules engine” to maintain the labels and the circumstances in which they apply. The complexity of the rules and algorithms can obscure the justification for a sentence applied by the machine, which undermines the ethical principle that machine decisions should be explainable, transparent, and auditable.

One other risk with automation in M365 retention is that auto-applied labels can’t be overridden automatically, making it very hard to keep the retention schedule aligned with the actual content of the record as it changes over time.

As another alternative, Microsoft provides some Machine Learning tools. However, these still require the organisation to supply up to 10,000 curated documents per rule to train the automation. As we have seen, just one records authority can have around 250 rules, which can make any project to establish automation (and the ongoing work to add new rules) extremely expensive and labour intensive.
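A quick back-of-the-envelope calculation in Python shows the scale involved. Both inputs are the approximate upper figures quoted above; the function name is hypothetical, and real requirements will vary by classifier and by rule.

```python
DOCS_PER_RULE = 10_000  # upper figure per rule, as quoted above
RULES_PER_AUTHORITY = 250  # approximate rule count for one records authority

def curated_documents_needed(rules: int, docs_per_rule: int = DOCS_PER_RULE) -> int:
    """Upper-bound size of the curated training set for a given number of rules."""
    return rules * docs_per_rule

total = curated_documents_needed(RULES_PER_AUTHORITY)
print(f"{total:,} curated documents for one authority")
```

That is up to 2.5 million hand-curated documents for a single authority, before any further authorities or new rules are added.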

Issue 5: A limited solution – managing records in other systems

Even if we do choose to use M365 in the same way as a traditional EDRMS, with users selecting classifications for each item, records teams managing file plans for each library, and records being sentenced and disposed of outside their business context, potentially using out-of-date rules, we still won’t have achieved compliant continuum management against the Standards.

That is because O365 records management is a point solution – it only works on documents and emails in the O365 cloud.

But under the Standards, and for good governance overall, we need to manage all parts of a record in context. We cannot just manage the documents about the project that happen to be in SharePoint Online – we also need to relate them to the project invoices in the on-premises finance system, the historical information about the project in the file shares, and the project specialist outputs in other line-of-business systems, for example.  

Because of the aggregation rule, the whole ‘meaningful record’ has to be sentenced and disposed of as a unit. Knowing the retention policy for a single document does not tell you how long it needs to be kept – you can only know how long to keep each item if you know how long to keep every item.

When we sentence content, it has to be across systems and formats, so that the whole record is included. 

To Recap

Office 365, particularly the E5 license version, has many powerful and practical capabilities. The recent focus on records management as a fundamental part of the solution is very welcome. However, Microsoft is based in the US, a jurisdiction that tends towards a ‘life cycle’ model of records management rather than a continuum one, and the solution is designed to support that approach. For those of us who are obligated to apply a continuum model, it’s important to understand the limitations against the requirements of the International Standards.

In summary:  

  • With O365 you can only apply one label, but records will map to multiple retention rules
  • Manual application of retention labels places a significant burden onto (unskilled) users
  • Application of bulk policies requires a complex file plan which records teams must maintain
  • Use of Machine Learning classifiers for rules requires 10,000 curated examples per rule
  • Retention rules need to be updated as the record changes, but are set-and-forget in O365
  • O365 will only manage its own cloud content, but the ‘record’ may span multiple systems.

This means that records continuum organisations don’t have a compliant solution yet in M365. The model is essentially a traditional EDRMS one, with the user and governance impacts that come with it. It also does not manage records in their business context, for their whole lifecycle, as required by the Standards. M365 is a fantastic platform for creating and using business information, but not a suitable system to records-manage it.