In 2025, Let’s Make Active Learning a Part of Your Team
Happy New Year! It is 2025, and it’s time to use the best AI available, usually for little cost, to assist all document reviews. It is time to apply Active Learning to every review project, every time.
Active Learning is not “generative AI” like ChatGPT (so no need to debate whether it’s biased, copyright-infringing, or planet-destroying). Active Learning is old(er)-school machine learning: the algorithm looks at the words in each document we tag as responsive, identifies the patterns of words that tie those documents together, and learns what differentiates them from documents coded non-responsive.
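To make the idea concrete, here is a minimal toy sketch of that word-pattern learning, using a simple Naive Bayes-style log-odds score. This is an illustration of the general technique only, not any vendor’s actual algorithm, and the sample documents are invented:

```python
from collections import Counter
from math import log

def train_word_model(responsive_docs, nonresponsive_docs):
    """Learn per-word log-odds from coded documents (toy Naive Bayes sketch)."""
    r_counts = Counter(w for doc in responsive_docs for w in doc.lower().split())
    n_counts = Counter(w for doc in nonresponsive_docs for w in doc.lower().split())
    vocab = set(r_counts) | set(n_counts)
    r_total, n_total = sum(r_counts.values()), sum(n_counts.values())
    # Add-one smoothing so words unseen in one pile don't zero out a score.
    return {w: log((r_counts[w] + 1) / (r_total + len(vocab)))
              - log((n_counts[w] + 1) / (n_total + len(vocab)))
            for w in vocab}

def score(doc, model):
    """Positive score = the document's words look more like the Responsive set."""
    return sum(model.get(w, 0.0) for w in doc.lower().split())

model = train_word_model(
    responsive_docs=["merger price negotiation term sheet", "deal price counteroffer"],
    nonresponsive_docs=["lunch menu friday", "office party friday rsvp"],
)
print(score("final price for the deal", model) > score("party on friday", model))  # True
```

Real platforms use far more sophisticated models and features, but the core loop is the same: every coded document nudges the word weights, and every uncoded document gets a score from them.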
Use of this technology for “TAR 2.0” reviews (explained at sidebar) is already old hat for U.S. courts, and you should be using it. Better yet, you should be using it even when you are engaged in a less formal review, and even consider using it to aid your review of other parties’ productions.
Active learning is the key ingredient of a “Technology Assisted Review” (TAR) “2.0” review strategy. U.S. courts have gotten used to TAR and largely agreed on a framework for conducting TAR 2.0 reviews:
A “Subject Matter Expert” reviewer (i.e., a senior or midlevel attorney on the case team) starts reviewing and coding. The associate uses searches and known “hot docs” to identify key Responsive documents, and should code a random sample as well.
The algorithm will continuously analyze the results to develop an evolving picture of what patterns and characteristics are likely to make a document Responsive vs. Non-Responsive.
Using the tool, start batching out review sets to your review team that (a) prioritize documents more likely to be responsive, and (b) throw in a few docs where the model does not have enough information to make a decision. Coding that latter category of document increases the model’s “coverage.”
In this way, reviewers scrape all the good stuff floating to the top of the data set, and you learn more about the important aspects of the facts faster.
When such batches are yielding only a handful of marginally responsive documents, pause review and have your experienced associate code a random “validation sample” of the documents remaining unreviewed. If no important documents arise in this sample, you may decide to stop reviewing, because the cost of continuing to review is not proportionate to the value of the responsive documents that might be left behind.
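The batching step above can be sketched in a few lines. This is a hypothetical illustration, assuming each unreviewed document carries a model probability between 0 and 1 that it is responsive; the function names and batch parameters are invented, not any platform’s API:

```python
def build_review_batch(unreviewed, batch_size=50, uncertainty_slots=5):
    """Prioritize likely-responsive docs, plus a few the model is unsure about.

    `unreviewed` is a list of (doc_id, probability) pairs, where probability
    is the model's current estimate that the document is responsive.
    """
    # (a) Highest-probability documents first: the "good stuff" floats up.
    by_score = sorted(unreviewed, key=lambda d: d[1], reverse=True)
    priority = by_score[:batch_size - uncertainty_slots]
    # (b) A few documents nearest 0.5, where the model lacks information;
    #     coding these increases the model's "coverage."
    chosen = {doc_id for doc_id, _ in priority}
    uncertain = sorted((d for d in unreviewed if d[0] not in chosen),
                       key=lambda d: abs(d[1] - 0.5))[:uncertainty_slots]
    return priority + uncertain

docs = [("A", 0.95), ("B", 0.10), ("C", 0.55), ("D", 0.88), ("E", 0.48)]
batch = build_review_batch(docs, batch_size=4, uncertainty_slots=2)
print([doc_id for doc_id, _ in batch])  # ['A', 'D', 'E', 'C']
```

Note that document B, the one the model is confident is junk, never makes the batch; that is exactly the review time Active Learning saves.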
The International Legal Technology Association recently issued guidance for litigation eDiscovery (or “eDisclosure”) in the United Kingdom, presenting a standard process for Active Learning/TAR 2.0 reviews. “By providing a clear, standardised (sic) approach to Active Learning, we hope to eliminate the need for repeated negotiations over methodology and allow legal teams to focus on what matters most – the substantive aspects of their cases,” working group member James McGregor wrote for EDRM.
If you are using a reasonably sophisticated platform*, Active Learning is available to you. It may cost a few PM hours to set up at the outset, and not all vendors are going to offer to do so up-front, especially when you have not yet jumped into large-scale review. But once the machine is humming in the background, even informal coding by case team members will help to “train” it, and a score or rank will be assigned to every eligible document showing how likely it is to be responsive.
Imagine you’re hunting for key deal communications for a Preliminary Injunction motion. You search for the counterparty, and get back a flood of junk, making it harder to narrow down to real deal communications. You can use data visualizers to limit by date, domains, etc., but if you sort the results by Active Learning score, the model, learning from what you have already tagged as relevant, will float the most relevant materials to the top!
Here’s why Active Learning should be incorporated into your projects:
Using Active Learning Makes Humans Better. Human reviewers are more engaged when they are looking at relevant, potentially important documents. They become zombies when they’re parsing through a soup of mass emails and other garbage, more likely to miss the Hot document hiding within. Even if you don’t want to “cut off” review and use the cost-saving features of TAR 2.0, you will still get better work product from review teams by leaving the garbage documents for the end.
Active Learning can be Adapted to the Needs of Your Case. Oh, you might say, but my case is special and complicated! I have six discrete request categories that make a document Responsive, with no overlap. Great! As long as your reviewers are using issue tags along with responsive/nonresponsive coding, you can set up separate models by issue tag. This will let the algorithm home in on the clusters of words and phrases that best predict whether a document is responsive to each discrete issue.
Active Learning is a Force Multiplier. If a reviewer codes a single document as non-responsive, then the work they have done applies only to that document. They may see fourteen similar documents in their next batch, and have to code them each non-responsive. But when Active Learning is applied, the act of coding a document is also training the model and down-ranking those similar, non-responsive documents, so that your reviewer won’t have to review them again. By coding one document, you may be affecting the rankings of hundreds, and potentially removing them from the review set entirely.
Active Learning is a Great QC Tool: Before you send out a production, why not look at the production set and ask, which documents in this set have low scores even though they are coded Responsive? Which materials in the compost pile of non-responsive families have very high scores? Taking a look at these is a great tool for improving your production quality and avoiding surprises.
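That QC pass boils down to two simple filters. Here is a hypothetical sketch, assuming each document record carries the reviewer’s coding and the model’s score (the field names and thresholds are invented for illustration):

```python
def qc_flags(docs, low=0.3, high=0.7):
    """Flag coding/score mismatches before production.

    Each doc is a dict with the reviewer's "coding" and the model's "score"
    (probability of responsiveness). Mismatches deserve a second look.
    """
    coded_r_low_score = [d for d in docs
                         if d["coding"] == "Responsive" and d["score"] < low]
    coded_nr_high_score = [d for d in docs
                           if d["coding"] == "Non-Responsive" and d["score"] > high]
    return coded_r_low_score, coded_nr_high_score

docs = [
    {"id": 1, "coding": "Responsive", "score": 0.92},      # consistent, no flag
    {"id": 2, "coding": "Responsive", "score": 0.12},      # flag: recheck before producing
    {"id": 3, "coding": "Non-Responsive", "score": 0.88},  # flag: possible missed document
]
low_flags, high_flags = qc_flags(docs)
print([d["id"] for d in low_flags], [d["id"] for d in high_flags])  # [2] [3]
```

Neither flag proves a coding error; it just tells you where a second look is cheapest and most likely to pay off.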
* Relativity and RelativityOne, Reveal, Everlaw, DISCO, and others have their own iterations of Active Learning. Logikcull does not.
Spark is an Everlaw Partner, which means we love working in Everlaw, and serve as a reseller so that we can help set up and manage your workspace for you, and give you access to all Everlaw’s features at a reasonable price. But we’re not exclusive! See more here.
Everlaw calls its Active Learning model Predictive Coding, and like all Everlaw features except Generative AI, it is free to use as part of your hosting charge. (I don’t love the term Predictive Coding, since it sounds like the machine is coding for you. It’s not! It’s just running the probabilities and giving you more information.)
Predictive Coding is in Everlaw’s standard Document Analytics toolset, and there are no special hoops to jump through to get it set up. There’s a Wizard for that.
All Everlaw’s wizard needs to know is (a) “What do you consider a document that has been Reviewed,” so the model can learn from it; and (b) “What documents are Relevant,” such that the model should look for “more like this.”
You can set up a model to look for all Responsive documents, or, where there is little overlap between issues, models specific to the issues you are coding for:
Bam – done. Enjoy having constantly-updated scores assigned to each document indicating how likely that document is to be Responsive. You can even move these models between projects, to apply your learning from one data set to looking for similar documents in a different data set.
Let’s all decide in 2025 to use this tool with every data set, every time. We can make search results more useful, make reviews more effective, make QC easier – just by letting Active Learning be a member of our team.