Keyword Searches: Great Expectations (and Reality)

The idea behind using a keyword search to augment linear review is far from complex: you use a word or short phrase to identify potentially relevant documents, and then have a reviewer go through them page by page. In theory, this method combines the best of both worlds: the efficiency and thoroughness of a computer and the judgment of a human reviewer work together so that documents can be tagged as relevant with maximum quality in minimum time. However, this dream scenario ignores two facts: most keyword searches are poorly constructed, and most reviewers are neither as accurate nor as precise as they would like to claim.

So how do you bring the reality closer to the dream? Lawyers need to learn the nitty-gritty of keyword search (nothing too technical) so that they understand what they are actually asking their reviewers and technical staff to do. An attorney who does not understand the basics of inclusion (the query), exclusion (the filters), and Boolean operators is unlikely to get the results they need from a keyword search. While this may seem intimidating or outside the scope of an attorney's responsibilities, a little education goes a long way toward improving the review process.
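To make inclusion, exclusion, and Boolean operators concrete, here is a minimal sketch of how a query such as (merger OR acquisition) AND agreement NOT newsletter might be evaluated. It assumes simple whole-word, case-insensitive matching; the documents, IDs, and the helper names `contains` and `matches` are hypothetical, and real review platforms use their own query syntax and matching rules.

```python
import re

# Hypothetical documents standing in for a review collection.
DOCUMENTS = {
    "DOC-001": "Please review the merger agreement before Friday.",
    "DOC-002": "Lunch menu for the merger celebration party.",
    "DOC-003": "Draft agreement attached; the acquisition terms changed.",
    "DOC-004": "Weekly newsletter: merger agreement news roundup.",
}


def contains(text: str, term: str) -> bool:
    """Case-insensitive whole-word match for a single search term."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def matches(text: str, any_of: list[str], all_of: list[str], none_of: list[str]) -> bool:
    """Inclusion: at least one OR term plus every AND term. Exclusion: no NOT term."""
    included = any(contains(text, t) for t in any_of) and all(contains(text, t) for t in all_of)
    excluded = any(contains(text, t) for t in none_of)
    return included and not excluded


# (merger OR acquisition) AND agreement NOT newsletter
hits = [
    doc_id
    for doc_id, text in DOCUMENTS.items()
    if matches(text, any_of=["merger", "acquisition"], all_of=["agreement"], none_of=["newsletter"])
]
print(hits)  # → ['DOC-001', 'DOC-003']
```

DOC-004 satisfies the inclusion part of the query, so it is the exclusion filter alone that removes it; DOC-002 never qualifies because it lacks "agreement".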

Thinking of keyword search as a sieve can be a helpful analogy. Search queries are often dismissed as poorly crafted because they return a large number of hits. However, depending on the size of the searchable database and the nature of the query, a large number of hits is not necessarily a bad thing. If you sifted a cup of almonds into a gallon of flour and put the mixture through a fine sieve, a lot of flour would come through. Yes, the ratio of flour to almonds is high, but you have still separated the mixture cleanly into its two constituents. Taking the analogy further, it may simply take additional passes through a variety of sieves to determine whether cornmeal is mixed in with the flour. Expecting a single keyword search to definitively identify every relevant and non-relevant document is overly optimistic.
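The multi-pass idea can be sketched as a chain of searches, where each pass runs only over the previous pass's hits and splits them into two piles rather than discarding anything. This is a minimal illustration assuming plain regular-expression matching as a stand-in for a review platform's search engine; the documents, patterns, and the `sieve` helper are hypothetical.

```python
import re

# Hypothetical collection; each string stands in for one document's text.
DOCUMENTS = [
    "Contract renewal terms for the Acme supply deal.",
    "Contract template boilerplate, no client names.",
    "Acme invoice dispute regarding the supply contract.",
    "Holiday schedule for the office.",
    "Supply chain update: Acme shipment delayed.",
]


def sieve(docs, pattern):
    """One pass through a 'sieve': split docs into (hits, non_hits) for a regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [d for d in docs if rx.search(d)]
    non_hits = [d for d in docs if not rx.search(d)]
    return hits, non_hits


# Pass 1: a broad query that is expected to return a lot of "flour".
broad_hits, _ = sieve(DOCUMENTS, r"\b(contract|supply)\b")
# Pass 2: a narrower query run only over the first pass's hits.
focused_hits, set_aside = sieve(broad_hits, r"\bacme\b")

print(len(broad_hits), len(focused_hits), len(set_aside))  # → 4 3 1
```

The broad first pass is not a failure just because it returns most of the collection; the second pass does the finer separation, and the set-aside pile remains available for review.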

A test set can also help if you are unsure what your query should include and exclude. If the test set is small enough, reviewing the hits (and non-hits) for different queries can give you a good idea of how each query will perform on the larger searchable collection, and of where linear review should be focused.
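One common way to score queries against a reviewer-coded test set is with precision (what share of the hits are relevant) and recall (what share of the relevant documents are hits). The text above does not name these measures, so treat this as one possible approach rather than a prescribed method. The test set, its relevance calls, the `QUERIES` labels, and the `evaluate` helper are all hypothetical, and matching is simplified to regular expressions.

```python
import re

# Hypothetical test set: (document text, reviewer's relevance call).
TEST_SET = [
    ("Acme merger agreement, final draft", True),
    ("Merger party catering invoice", False),
    ("Acquisition terms revised per Acme counsel", True),
    ("Board minutes: Acme deal discussed", True),
    ("Newsletter mentioning a competitor's merger", False),
]

# Hypothetical candidate queries, labeled for readable output.
QUERIES = {
    "merger only": r"\bmerger\b",
    "merger OR acquisition OR acme": r"\b(merger|acquisition|acme)\b",
}


def evaluate(pattern, test_set):
    """Compare a query's hits and non-hits against the reviewer's calls."""
    rx = re.compile(pattern, re.IGNORECASE)
    tp = fp = fn = 0
    for text, relevant in test_set:
        hit = rx.search(text) is not None
        if hit and relevant:
            tp += 1  # relevant document the query found
        elif hit:
            fp += 1  # non-relevant document the query pulled in
        elif relevant:
            fn += 1  # relevant document the query missed (a relevant non-hit)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for label, pattern in QUERIES.items():
    p, r = evaluate(pattern, TEST_SET)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

On this toy sample the broader query returns more non-relevant hits in absolute terms yet finds every relevant document (recall 1.00 versus 0.33), which is the kind of trade-off a small test set can reveal before the query is run on the full collection.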

In conclusion, keyword searches need to be approached with realistic expectations and at least a cursory knowledge of how and why they work the way they do. It is necessary to understand the limitations of a particular methodology before deeming it inadequate for the task at hand.

LLM unifies the legal process by combining legal holds, case strategy, matter and budget management, review and analytics in a single, web-based platform. We connect legal strategy to tactics in a way no one else can, so every part of the process is actionable. Our product scales to help corporate and law firm teams gain cost-savings and eliminate inefficiencies.