At this moment, communication is more instantaneous and readily available than at any other point in recorded history, and "recorded history" grows exponentially with each passing year. Across the globe, we collectively produce over 2.5 quintillion bytes of new data every day. Over 90 percent of the world's data was created in this decade alone.
It is an undeniable irony that the explosion of communication allowed by modern technology in turn imposes an increasing number of obligations regarding data management and information governance. While the amount of recorded data and the number of communications continue to increase, the standard for discoverable information in a legal dispute under Rule 26 remains largely the same: "any non-privileged matter that is relevant to any party's claim or defense and proportional to the needs of the case." The forms of communication enabled by advancing technology are myriad (email, text, instant message, internet posting, chat), and all these communications are subject to discovery.
As we create more data, we must also preserve, review, and produce more data in the context of litigation. Fortunately, the same technology that drives these enormous data sets also provides tools for parsing and analyzing them. Advances in analytics and AI continue to push forward the boundary of what is possible, but an attorney's best tool for efficient analysis is a well-established one that was an early entrant into the analytics arena. Despite the continued emergence of new communications technologies, the most cumbersome part of an eDiscovery review is generally sifting through email messages. Email represents the lion's share of the collection in most matters, and a vital tool for cutting through the volume is email threading. Email threading is both a process for organizing data sets and a tool for reducing data volume through what is essentially content deduplication.
Email threading is an established practice that should now be part of every eDiscovery attorney's standard workflow, whether it is applied to narrow the scope of review and production, or simply used as a touchstone around which to organize the review workflow.
In threading, a computer program analyzes both the metadata and the text of documents in a data set, assigning each communication a thread group value. The thread group consists of related communications, replies, and forwards to an initial message, plus any attachments. Email programs can perform the same function at the user level. For instance, Microsoft Outlook allows users to filter to "find related messages in this conversation," and many email programs group conversation threads as the default view.
Email conversations can span weeks, months, or even years, and recipients may drop off or be added as the conversation continues. Threading analytics tools can identify branching conversations by using both metadata (comparing subject lines, dates) and text recognition (comparing the relative weight of message content) and treat these branches accordingly as separate thread groups. Once the thread groups are identified, the threading tool can then identify unique content, elevating it to the top of the review queue.
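The metadata half of this grouping can be illustrated with a minimal sketch. It assumes a hypothetical `Email` record and groups messages only by a normalized subject line (reply and forward prefixes stripped); the function names and the `T0001`-style thread values are illustrative, not any vendor's actual method. Commercial threading tools also weigh message text and dates, which is how they separate branching conversations.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record type for illustration; review platforms carry far richer metadata.
@dataclass
class Email:
    doc_id: str
    subject: str
    sent: datetime
    body: str


# One or more leading "RE:", "FW:", or "FWD:" prefixes, in any letter case.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes, collapse whitespace, and lowercase."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()


def assign_thread_groups(emails):
    """Map each doc_id to a thread group value based on its normalized subject."""
    groups = defaultdict(list)
    for email in emails:
        groups[normalize_subject(email.subject)].append(email)

    # Number the groups by their earliest message so the output is stable.
    ordered = sorted(groups.values(), key=lambda g: min(e.sent for e in g))
    thread_of = {}
    for number, group in enumerate(ordered, start=1):
        for email in group:
            thread_of[email.doc_id] = f"T{number:04d}"
    return thread_of
```

Grouping on subject alone would merge unrelated conversations that happen to share a subject line, which is one reason production tools compare message content as well.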
At this point, threading becomes akin to deduplication. If a data set contains email correspondence in which two people are emailing back and forth, the latest-in-time email should contain all the previous correspondence inline below the latest communication. Therefore, reviewing the earlier-in-time documents won't necessarily add any new information to the review.
In litigation, if both parties agree, counsel can review only the “inclusive” email communications, which consist of the latest-in-time emails plus any lesser included emails that contain attachments. It is important to review attachments with their transmittal emails, for full context. The reduction in review volume that results from threading and review of only inclusive material can represent significant cost savings to the client and can also save valuable time and effort on the part of the attorneys involved. This is especially important in expedited litigation.
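The inclusive test described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Message` record and `inclusive_ids` function are hypothetical, quoted text is assumed to be marked with `>` characters, and containment is checked by plain substring matching on normalized text. Real threading engines use more tolerant text comparison.

```python
from dataclasses import dataclass, field
from datetime import datetime


# Hypothetical record type for illustration only.
@dataclass
class Message:
    doc_id: str
    sent: datetime
    body: str
    attachments: list = field(default_factory=list)


def _normalize(text: str) -> str:
    """Drop leading '>' quote markers, collapse whitespace, and lowercase."""
    lines = [line.lstrip("> ").strip() for line in text.splitlines()]
    return " ".join(" ".join(lines).split()).lower()


def inclusive_ids(thread):
    """Return doc_ids of messages that must be reviewed within one thread group.

    A message is kept if no later message contains its full text, or if it
    carries attachments (the "lesser included emails that contain attachments").
    """
    normalized = {m.doc_id: _normalize(m.body) for m in thread}
    keep = []
    for m in thread:
        contained_in_later = any(
            other.doc_id != m.doc_id
            and other.sent > m.sent
            and normalized[m.doc_id] in normalized[other.doc_id]
            for other in thread
        )
        if not contained_in_later or m.attachments:
            keep.append(m.doc_id)
    return keep
```

In a three-message exchange where each reply quotes everything before it, only the last message is returned; an earlier message with an attachment, or a reply that branches off, is returned as well.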
While email threading is a powerful tool, it is important to account for any material excised from review. If only the inclusive email is produced, the content of the non-inclusive emails will be visible on the face of the production image. However, the metadata for those communications will not necessarily be included in the production. Because metadata is a discoverable component of a document, parties may agree to provide metadata exports for unproduced documents, especially if a document chain is withheld due to privilege.
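One way to account for excised material is a metadata export covering the emails left out of the production. The sketch below uses Python's standard `csv` module; the field names and the `export_suppressed_metadata` function are hypothetical, and the fields actually exchanged would be whatever the parties agree to.

```python
import csv
import io

# Hypothetical field list; the agreed fields would come from the parties' protocol.
FIELDS = ["doc_id", "thread_group", "sent", "sender", "recipients", "subject"]


def export_suppressed_metadata(documents, inclusive_ids):
    """Return CSV text with one metadata row per document omitted from production.

    `documents` is a list of dicts; `inclusive_ids` is the set of doc_ids that
    were produced and therefore need no separate metadata row.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        extrasaction="ignore",  # skip keys that are not in the agreed field list
        lineterminator="\n",
    )
    writer.writeheader()
    for doc in documents:
        if doc["doc_id"] not in inclusive_ids:
            writer.writerow(doc)
    return buffer.getvalue()
```

Body text is deliberately excluded from the export, since that content already appears on the face of the inclusive email.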
Even if parties cannot agree on an email threading scheme at the outset of discovery, the threading process is an invaluable tool for organizing a document review. When a single attorney reviews an entire email thread, consistency of coding across the set is better ensured. In addition, reviewers are likely to move through material faster when they are familiar with the context of the entire thread. Then, at second-level review, the attorney can review only the inclusive email, and any coding corrections can be pushed down through the thread. This practice can help streamline reviews, reduce time and cost, and produce more consistent results.
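Pushing a second-level coding decision down through a thread can be sketched as follows. The function and argument names are hypothetical, and the sketch assumes each thread group has a single inclusive email; where a group has several inclusive emails, a real workflow would need a finer mapping, and sensitive calls such as privilege would still warrant a reviewer's confirmation.

```python
def push_down_coding(thread_of, coding, inclusive_ids):
    """Copy each inclusive email's coding to the other documents in its thread group.

    thread_of:     dict mapping doc_id -> thread group value
    coding:        dict mapping doc_id -> current coding decision
    inclusive_ids: set of doc_ids reviewed at the second level
    Returns a new dict; the original coding is left unchanged.
    """
    # Assumes one inclusive email per thread group; a later entry would overwrite an earlier one.
    decision_for_thread = {thread_of[doc_id]: coding[doc_id] for doc_id in inclusive_ids}

    updated = dict(coding)
    for doc_id, thread in thread_of.items():
        if thread in decision_for_thread and doc_id not in inclusive_ids:
            updated[doc_id] = decision_for_thread[thread]
    return updated
```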
It is important to remember that threading is generally relative to the data set as a whole, and not inherent in the document data itself. It is entirely context-specific. For instance, opposing productions may contain the same documents, but the thread values assigned to each set upon threading by their respective counsel will be different. Thread ID numbers derived from separate applications of threading will provide no insight into the relative relationship between documents across those sets, even if the sets contain exact duplicates. If Plaintiff sent an email to Defendant, both parties' productions would contain that email, but any thread value associated with each instance of the document would have no bearing on the opposing production. Additionally, threading must be updated as collection sources are added, whether they be new custodians entirely or refreshed collections. The best practices for email threading are well established, and your discovery vendor will be able to assist you through the process.
While email threading adds some complexity to a review workflow at the outset, the savings in time and cost of review make this analytical tool a powerful and important option in an eDiscovery attorney's toolbox.
Reprinted with permission, originally appeared in DSBA Bar Journal, a publication of the Delaware State Bar Association.