eDiscovery Simplified: Commonly Used Terms

© Rido / Fotolia

We can guarantee you’re not alone if you have ever found yourself lost or confused when dealing with your case’s electronic discovery. Veterans of the e-discovery world and eDiscovery vendors, like Avalon, know this stuff like the back of their hand. They know how to talk the talk and walk the walk when it comes to e-discovery, but to others it can sound like a foreign language. After all, it has taken the veterans years to learn this stuff!

Avalon handles eDiscovery projects day in and day out. Our dedicated team of eDiscovery specialists and project managers has developed a list of commonly used terms to lend a helping hand to those who are new to e-discovery or who just need a refresher.

While we would love for this list to be all-inclusive, our eDiscovery team would have spent weeks compiling it. If a term or process you have questions about is not listed, by all means contact us; we are always glad to help! And don’t forget that Avalon offers a wide range of eDiscovery services, either individually or as a subscription-based package. Happy learning!

Assisted Review – This method of review uses advanced machine learning, including predictive coding, to apply reviewers’ coding decisions to large amounts of data. A true force multiplier!

Batching – The process of grouping large amounts of electronically stored information into batches. Typically, this is done so that documents can be allocated to reviewers for tagging.
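At its simplest, batching means splitting an ordered list of documents into fixed-size groups that can be handed out to reviewers. The sketch below assumes hypothetical document control numbers and a batch size chosen purely for illustration; real review platforms often add options such as keeping document families together in the same batch.

```python
from typing import List, Sequence


def make_batches(doc_ids: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split document IDs into consecutive batches of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(doc_ids[i:i + batch_size]) for i in range(0, len(doc_ids), batch_size)]


# Hypothetical control numbers for seven documents, batched in groups of three.
docs = [f"DOC{n:04d}" for n in range(1, 8)]
for number, batch in enumerate(make_batches(docs, 3), start=1):
    print(f"Batch {number}: {batch}")
# The last batch holds the single leftover document, DOC0007.
```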

Big Data – A term used to describe data sets so large and complex that they become difficult to process using traditional data processing applications. Learn more in our blog article What is Big Data?

Boolean Search – This technique connects individual keywords or phrases within a single query using logical operators, which helps avoid false positives and accurately pinpoint documents of interest. Typical connectors are AND, OR, and NOT.
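To make the connectors concrete, here is a minimal sketch of how a Boolean query can be evaluated against a handful of documents. The documents, the query, and the helper names (`has`, `and_`, `or_`, `not_`) are all hypothetical; real review platforms use their own query syntax and indexes rather than scanning text like this.

```python
import re
from typing import Callable, Dict, Set

# A query is any function that takes a document's set of words and answers True/False.
Query = Callable[[Set[str]], bool]


def terms(text: str) -> Set[str]:
    """Lower-case the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def has(word: str) -> Query:
    return lambda words: word in words


def and_(*queries: Query) -> Query:
    return lambda words: all(q(words) for q in queries)


def or_(*queries: Query) -> Query:
    return lambda words: any(q(words) for q in queries)


def not_(query: Query) -> Query:
    return lambda words: not query(words)


documents: Dict[str, str] = {
    "DOC1": "Merger agreement draft attached for review",
    "DOC2": "Lunch order for the merger team",
    "DOC3": "Final agreement signed by both parties",
}

# agreement AND (merger OR signed) AND NOT draft
query = and_(has("agreement"), or_(has("merger"), has("signed")), not_(has("draft")))
hits = [doc_id for doc_id, text in documents.items() if query(terms(text))]
print(hits)  # ['DOC3']
```

DOC1 matches "agreement" and "merger" but is excluded by NOT draft, and DOC2 never mentions "agreement", which is how the connectors narrow the results down to DOC3.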

Child Document – Refers to a file that is attached to another document. An example would be an email attachment or a graph embedded in a word processing document.

Coding – The method of entering fields of information from a document and saving them in a format that is linked to that particular document within a database. There are two types of coding: objective and subjective. Objective coding captures facts that anyone who can read the document’s language could record, such as the date on a scanned document. Subjective coding requires knowledge of the underlying investigation, such as “Is this good for our case?”

Concept Search – A method of searching for files based not on keywords but on the subject matter of the document, paragraph, or sentence. This differs from keyword searching, which requires an exact keyword hit. Learn more and see examples in our blog eDiscovery Technology Updates.

Container File – A single file that contains multiple other files or documents. A common container file is a zip file. Container files are typically used because of their considerably smaller file size. Extracted contents are usually anywhere from 50% to 250% larger than the original container file.
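The size difference between a container and its extracted contents is easy to see with Python’s standard `zipfile` module. This sketch builds a small zip archive in memory from made-up text and compares the archive’s size with the total size of the files inside it. Because the sample text is deliberately repetitive, it compresses far better than a typical document collection, so the ratio it prints should not be read as representative of the 50% to 250% rule of thumb above.

```python
import io
import zipfile

# Build a small zip archive in memory from repetitive (highly compressible) sample text.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("memo.txt", "Quarterly results are attached. " * 200)
    archive.writestr("notes.txt", "Please preserve all relevant email. " * 100)

container_size = len(buffer.getvalue())

# Sum the uncompressed size of every file stored in the container.
with zipfile.ZipFile(buffer) as archive:
    extracted_size = sum(info.file_size for info in archive.infolist())

print(f"Container: {container_size} bytes, extracted: {extracted_size} bytes")
print(f"Extracted contents are {extracted_size / container_size:.1f}x the container size")
```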

Culling – The process of eliminating files from a collection of electronic files to reduce the number of documents to be reviewed. Culling techniques include de-duplication, near-de-duplication, email thread analysis, deNISTing and filtering. Avalon offers all of these culling techniques; learn more by reading our blog Early Case Assessment, Where Have You Been All of My Life?

Custodian – Refers to the individual who holds electronically stored information relevant to the pending litigation. This information typically includes emails sent to or from the custodian regarding the matter. Contrary to popular belief, custodian is not a metadata value, so it is important that the data owner provide their e-discovery vendor with custodian names whenever possible. The custodians’ information is then collected and analyzed for use in the litigation.

Data Extraction – Refers to the process of breaking down data from electronic documents to identify their metadata and body contents.

Data Mapping – The process of creating a “map” to identify and record both the location and the type of information that is available within an organization’s network.

De-Duplication – The process of comparing electronic records based on their characteristics to identify and remove duplicate records from the data set, reducing review time and increasing coding consistency. Learn about different types of de-duplication in our blog Early Case Assessment, Where Have You Been All of My Life?
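A common way to compare records is by hash value: two files with identical content produce the same digest. The sketch below shows a simple global de-duplication (across all custodians) on raw file bytes using MD5, one of the hash algorithms commonly seen in e-discovery tools. The file names and contents are hypothetical, and commercial tools typically apply more elaborate rules, for example hashing selected metadata fields for emails or de-duplicating only within a single custodian.

```python
import hashlib
from typing import Dict, List, Tuple


def dedupe(files: Dict[str, bytes]) -> Tuple[List[str], Dict[str, str]]:
    """Keep the first file seen for each unique MD5 hash; map later duplicates to it."""
    seen: Dict[str, str] = {}        # hash -> first file name with that hash
    unique: List[str] = []
    duplicates: Dict[str, str] = {}  # duplicate file name -> original file name
    for name, content in files.items():
        digest = hashlib.md5(content).hexdigest()
        if digest in seen:
            duplicates[name] = seen[digest]
        else:
            seen[digest] = name
            unique.append(name)
    return unique, duplicates


files = {
    "smith/report.docx": b"Q3 sales report",
    "jones/report_copy.docx": b"Q3 sales report",  # same bytes under a different name
    "jones/budget.xlsx": b"FY24 budget",
}
unique, duplicates = dedupe(files)
print(unique)      # ['smith/report.docx', 'jones/budget.xlsx']
print(duplicates)  # {'jones/report_copy.docx': 'smith/report.docx'}
```

Note that the file name plays no part in the comparison; only the content is hashed, which is why the renamed copy is still caught.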

De-NISTing – Refers to the process of filtering out files that appear on the NIST list to reduce overall processing and review costs. The US National Institute of Standards and Technology (NIST) routinely publishes a list of digital fingerprint (hash) values for known system files. The De-NISTing process identifies these files so that a decision can be made as to whether they should be set aside or removed from a discovery database.
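In code, De-NISTing boils down to hashing each collected file and checking the result against the published list of known hashes, which NIST distributes through its National Software Reference Library. The sketch below uses SHA-1 and a tiny stand-in set of hashes; the file paths and contents are hypothetical, and loading and parsing the actual NIST data files is not shown.

```python
import hashlib
from typing import Dict, List, Set


def denist(files: Dict[str, bytes], known_hashes: Set[str]) -> List[str]:
    """Return the names of files whose SHA-1 hash is NOT on the known-file list."""
    kept = []
    for name, content in files.items():
        # Normalize to upper-case hex so the comparison does not depend on letter case.
        if hashlib.sha1(content).hexdigest().upper() not in known_hashes:
            kept.append(name)
    return kept


# Stand-in for the known-file list; in practice it would be loaded from NIST's published data.
system_file = b"pretend operating-system binary"
known = {hashlib.sha1(system_file).hexdigest().upper()}

collection = {
    "C:/Windows/System32/example.dll": system_file,
    "C:/Users/jdoe/Documents/contract.docx": b"Services agreement between the parties",
}
print(denist(collection, known))  # ['C:/Users/jdoe/Documents/contract.docx']
```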

Discovery – The process of identifying, acquiring, and reviewing information that is potentially significant to the matter and producing information that can be utilized as evidence in litigation.

Document Family – A group of documents that are connected to each other for purposes of communication. An example is an email and its attachments.
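The parent/child relationship (see Parent Document and Child Document) is what ties a family together. This minimal sketch groups hypothetical document records into families using each record’s parent ID. It assumes attachments are only one level deep; nested attachments, such as a zip file attached to an email, would need the parent chain followed up to the top-level document.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical records: document ID -> parent document ID (None for a top-level parent).
documents: Dict[str, Optional[str]] = {
    "EMAIL001": None,             # parent email
    "EMAIL001.0001": "EMAIL001",  # attached spreadsheet (child document)
    "EMAIL001.0002": "EMAIL001",  # attached PDF (child document)
    "EMAIL002": None,             # email with no attachments: a family of one
}


def group_families(docs: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group each document with its parent; assumes attachments are one level deep."""
    families: Dict[str, List[str]] = defaultdict(list)
    for doc_id, parent_id in docs.items():
        # A parent anchors its own family; a child joins its parent's family.
        families[parent_id or doc_id].append(doc_id)
    return dict(families)


print(group_families(documents))
# {'EMAIL001': ['EMAIL001', 'EMAIL001.0001', 'EMAIL001.0002'], 'EMAIL002': ['EMAIL002']}
```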

E-mail Threading – The process of compiling all the emails in your dataset and organizing them into conversations. For example: Joe emails Steve, they reply back and forth twenty times, and partway through they forward the messages to a few others, who then join the conversation. Threading can dramatically increase review speed for email data, because one reviewer can review the entire conversation and can read the final inclusive email instead of each piece of the conversation separately.
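One way to reconstruct conversations is to follow each message’s reply link back to the email that started it. The sketch below models Joe and Steve’s exchange with hypothetical message IDs, where each message records the ID it replies to (similar to an email’s In-Reply-To header). It treats every message nobody replied to as the end of a branch, which is a simplification: commercial threading tools also compare quoted text to decide which emails are truly inclusive.

```python
from typing import Dict, List, Optional

# Hypothetical messages: Message-ID -> the Message-ID it replies to (None starts a conversation).
messages: Dict[str, Optional[str]] = {
    "<m1>": None,    # Joe emails Steve
    "<m2>": "<m1>",  # Steve replies
    "<m3>": "<m2>",  # Joe replies again
    "<m4>": "<m2>",  # Steve's reply is forwarded to a few others (a new branch)
    "<m5>": None,    # an unrelated email
}


def thread_root(msgs: Dict[str, Optional[str]], msg_id: str) -> str:
    """Follow reply links back to the message that started the conversation."""
    parent = msgs.get(msg_id)
    while parent is not None:
        msg_id, parent = parent, msgs.get(parent)
    return msg_id


def threads(msgs: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group every message under the ID of its conversation's first message."""
    grouped: Dict[str, List[str]] = {}
    for msg_id in msgs:
        grouped.setdefault(thread_root(msgs, msg_id), []).append(msg_id)
    return grouped


def branch_ends(msgs: Dict[str, Optional[str]]) -> List[str]:
    """Messages nobody replied to; each one closes a branch of its conversation."""
    replied_to = {parent for parent in msgs.values() if parent is not None}
    return [msg_id for msg_id in msgs if msg_id not in replied_to]


print(threads(messages))      # {'<m1>': ['<m1>', '<m2>', '<m3>', '<m4>'], '<m5>': ['<m5>']}
print(branch_ends(messages))  # ['<m3>', '<m4>', '<m5>']
```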

Early Case Assessment – A variety of tools or methods for investigating and quickly learning about a document collection in order to estimate the risks, costs, and time involved in pursuing a particular legal course of action. Learn more in our blog article Early Case Assessment, Where Have You Been All of My Life?

Electronic Discovery – The process of discovery in civil litigation, in which electronically stored information is identified, collected, prepared, reviewed and produced. It is also referred to as e-discovery.

Electronically Stored Information (ESI) – ESI is information created, altered, communicated, stored, and best utilized in digital form, requiring the use of computer hardware and software. Learn more about ESI in our blog article Can Your Firm Handle its Own ESI?

Filtering – The process of using certain parameters to remove documents that do not fit within those parameters in order to reduce the volume of the data set.
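Date ranges and file types are two of the most common filtering parameters. This sketch applies both to a few hypothetical documents; the `Doc` record, its fields, and the chosen parameters are illustrative assumptions rather than any particular platform’s schema.

```python
from datetime import date
from typing import List, NamedTuple, Set


class Doc(NamedTuple):
    doc_id: str
    sent: date
    extension: str


def apply_filters(docs: List[Doc], start: date, end: date, allowed_types: Set[str]) -> List[Doc]:
    """Keep only documents dated within [start, end] whose file type is in allowed_types."""
    return [d for d in docs if start <= d.sent <= end and d.extension.lower() in allowed_types]


collection = [
    Doc("DOC1", date(2021, 3, 15), ".msg"),
    Doc("DOC2", date(2019, 7, 1), ".msg"),    # outside the date range
    Doc("DOC3", date(2021, 6, 30), ".exe"),   # file type not of interest
    Doc("DOC4", date(2021, 11, 2), ".DOCX"),  # extension case is ignored
]
kept = apply_filters(collection, date(2021, 1, 1), date(2021, 12, 31), {".msg", ".docx"})
print([d.doc_id for d in kept])  # ['DOC1', 'DOC4']
```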

Forensics – The handling of electronically stored information in a way that preserves and confirms its authenticity, so that the information can be used as evidence in a court of law.

FRCP – An acronym for the Federal Rules of Civil Procedure, which govern e-discovery and other elements of federal civil litigation. Most state courts have their own rules, which are generally based on elements of the FRCP.

Harvesting – Also referred to as the collection of ESI. Harvesting is the method of gathering electronic data for future use in your investigation or lawsuit, preferably while maintaining file and system metadata.

Hosting – A service provided by a third-party litigation support firm that gives access to documents relating to a particular matter within a review software platform. The platform can be accessed via the internet by logging in with a username and password.

Legacy Data – Data whose format has become obsolete, making it difficult to access or process.

Legal Hold – Also known as a “preservation order” or “hold order.” A legal hold is the temporary suspension of a company’s document retention and destruction policies for data that might be relevant to a lawsuit or that is reasonably anticipated to be significant. Rules on when a legal hold should be applied can vary greatly, so it’s best to consult your attorney before applying one.

Load File – A file that is used to import data into an electronic discovery platform after processing.
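Many load files are simply delimited text with a header row of field names. The sketch below reads one with Python’s `csv` module. The delimiters are an assumption: it uses ASCII 20 as the field separator and “þ” (ASCII 254) as the text qualifier, a convention often seen in Concordance-style .dat files, and the field names (BEGDOC, ENDDOC, CUSTODIAN) are hypothetical. Always check the load file specification your review platform expects.

```python
import csv
import io
from typing import Dict, List

FIELD_SEP = "\x14"  # ASCII 20, assumed field delimiter (Concordance-style convention)
QUOTE = "\u00fe"    # "þ" (ASCII 254), assumed text qualifier around each value


def read_load_file(text: str) -> List[Dict[str, str]]:
    """Parse a delimited load file whose first row holds the field names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=FIELD_SEP, quotechar=QUOTE)
    return list(reader)


# A hypothetical two-line load file: a header row, then one document record.
sample = (
    "þBEGDOCþ\x14þENDDOCþ\x14þCUSTODIANþ\n"
    "þABC000001þ\x14þABC000003þ\x14þSmith, Janeþ\n"
)
for record in read_load_file(sample):
    print(record["BEGDOC"], record["ENDDOC"], record["CUSTODIAN"])
```

Using uncommon characters as delimiters means ordinary commas and quotes inside values, like the comma in “Smith, Jane”, need no escaping.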

Metadata – Simply put, it is data about electronic data. Electronic documents contain additional information about themselves, such as the author or creation date, that is not necessarily visible on the surface of the document.

Native Format – The format in which an electronic file was originally created and is maintained. A native file format preserves metadata and other details that can be lost when documents are converted to other formats.

Near-duplicate – Documents that contain a high percentage of the same content are referred to as near-duplicates. Identifying near-duplicates during the data reduction process reduces the time and costs associated with review. Near-de-duplication, unlike exact de-duplication, involves some subjective decisions by the client, such as how similar two documents must be to count as near-duplicates, so it’s best to discuss implementation with opposing counsel before applying it.
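One textbook way to measure “a high percentage of the same content” is Jaccard similarity over word shingles (overlapping runs of consecutive words). The sketch below compares a hypothetical draft and final version of a sentence. The shingle size and the 0.5 threshold are illustrative assumptions, and commercial near-duplicate engines use their own, often more scalable, techniques; picking the threshold is exactly the kind of subjective decision mentioned above.

```python
import re
from typing import Set, Tuple


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word sequences ("shingles") of the given size."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


draft = "The parties agree to close the transaction on June 30 subject to board approval"
final = "The parties agree to close the transaction on July 15 subject to board approval"

THRESHOLD = 0.5  # hypothetical cut-off chosen for illustration
score = similarity(draft, final)
print(f"{score:.2f}")  # 0.50
print("near-duplicate" if score >= THRESHOLD else "different")
```

Changing just two words alters four of the twelve three-word shingles in each sentence, which is why even small edits lower the score noticeably on short texts.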

Normalization – The process of reformatting data so that it can be stored in a standardized format.
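Dates are a classic example: the same day may appear in a collection as “03/15/2021”, “15-Mar-2021”, or “March 15, 2021”. This sketch converts a few such layouts into the standard ISO 8601 form (YYYY-MM-DD). The list of expected layouts is an assumption; ambiguous dates such as 03/04/2021 (US versus European order) need a decision about which convention the source data follows, and month names are parsed using the default English (C) locale.

```python
from datetime import datetime
from typing import Optional

# Date layouts we expect in the source data (an assumption for this example).
KNOWN_FORMATS = ["%m/%d/%Y", "%d-%b-%Y", "%Y%m%d", "%B %d, %Y"]


def normalize_date(raw: str) -> Optional[str]:
    """Convert a date in any known layout to ISO 8601 (YYYY-MM-DD); None if unrecognized."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this layout; try the next one
    return None


for raw in ["03/15/2021", "15-Mar-2021", "20210315", "March 15, 2021", "sometime in March"]:
    print(f"{raw!r} -> {normalize_date(raw)}")
# Every recognized layout becomes '2021-03-15'; the free-text value returns None.
```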

OCR – An acronym for Optical Character Recognition. OCR is the process of converting images of printed or typed text into machine-encoded electronic text. Digitizing printed text this way is commonly done so that documents can be electronically edited and searched.

Parent Document – A document to which other documents and files are attached.

Personal Storage Table (.pst) – A file format used to store copies of messages, calendar events, and other items within Microsoft software (like Microsoft Outlook, Microsoft Exchange Client, and Windows Messaging) or, in the most basic of terms, it’s how Outlook stores your email. Read more about .pst files in our blog article .PST File Review – What You Don’t Know Can Hurt You (& Your Case)

Precision – In search results analysis, precision is the percentage of documents in the results set that are actually relevant to the query.
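As a worked example, suppose a search returns four documents and a reviewer confirms that two of them are relevant. Precision is then 2 out of 4, or 50%. The document IDs and relevance judgments below are hypothetical.

```python
from typing import Set


def precision(returned: Set[str], relevant: Set[str]) -> float:
    """Share of the returned documents that are actually relevant."""
    if not returned:
        return 0.0
    return len(returned & relevant) / len(returned)


returned = {"DOC1", "DOC2", "DOC3", "DOC4"}  # documents the search returned
relevant = {"DOC1", "DOC3", "DOC7"}          # documents judged relevant
print(f"Precision: {precision(returned, relevant):.0%}")  # Precision: 50%
```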

Predictive Coding – This coding process is the combination of machine-learning technology and work flow methods that use keyword search, filtering, and sampling to automate portions of an e-discovery document review aiming to reduce the number of non-responsive and irrelevant documents. Learn more by reading our blog What You Need to Know About Predictive Coding

Processing – The e-discovery workflow that formats collected ESI so that it can be culled and searched in a review tool. Processing can differ depending on the application being used. Typically, processing includes extracting files from container files (such as .pst and .zip files), separating attachments, converting files to formats the review tool can read, and extracting text and metadata.

Production – The delivery of documents and electronically stored information that meet the criteria of the discovery request to opposing counsel or the requesting party. Production typically involves delivering the documents as hard copies, on CDs/DVDs, or on hard drives to the other party or parties.

Recall – In search results analysis, recall is the percentage of all relevant documents in the collection that are returned in the results set.
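Continuing with hypothetical numbers: if the collection contains three relevant documents and a search returns two of them, recall is 2 out of 3, or about 67%, no matter how many irrelevant documents the search also returned. In real matters the total number of relevant documents is usually not known and is typically estimated by reviewing statistical samples.

```python
from typing import Set


def recall(returned: Set[str], relevant: Set[str]) -> float:
    """Share of all relevant documents that the search actually returned."""
    if not relevant:
        return 0.0
    return len(returned & relevant) / len(relevant)


returned = {"DOC1", "DOC2", "DOC3", "DOC4"}  # documents the search returned
relevant = {"DOC1", "DOC3", "DOC7"}          # every relevant document in the collection
print(f"Recall: {recall(returned, relevant):.0%}")  # Recall: 67%
```

Broadening a search usually raises recall at the cost of precision, and narrowing it does the reverse, which is why the two measures are typically reported together.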

Redact – To redact a document is to deliberately cover portions of the document that are considered privileged, proprietary, or confidential. This is usually done by “blacking-out” or “whiting-out” the copy that is to be concealed.

Social Discovery – Defined as the discovery of electronically stored information on the various social media sites used today, including but not limited to: Facebook, Twitter, YouTube, LinkedIn, and Instagram. Learn more about social discovery in our blog: The Digital Fingerprint. Strengthening Your Case With Social Media Discovery

Spoliation – Defined as the destruction or alteration of data that may be pertinent to a legal matter. Spoliation generally will not apply to destruction of data during the normal course of a pre-set retention policy. If there is any doubt about when to destroy electronic data, consult your attorney.

Structured Data – Data stored in a structured format such as a database.

System Files – A file that is part of the operating system or another control program. These files are created by the computer, not by its user. Well-known system files on older Windows computers include msdos.sys, io.sys, ntdetect.com, and ntldr. Learn about system files in our blog Early Case Assessment, Where Have You Been All of My Life?

Tagging – The process of assigning classifications, such as by relevance or privilege, to one or more documents.

TIFF – An acronym for Tagged Image File Format, a common graphic file format for storing bitmap images. TIFF is also the most common file format for scanned hard copy documents. Learn more in our blog Don’t TIFF and Tell.

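Unicode – The coding standard that provides a uniform representation of the characters used in all of the world’s languages. It is sometimes associated with “double-byte” languages such as Chinese, Japanese, and Korean, although common Unicode encodings use anywhere from one to four bytes per character.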
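The number of bytes a character needs depends on which Unicode encoding is used. This short sketch measures a few sample characters in UTF-8 and UTF-16; the “-le” (little-endian) variant is used so Python does not prepend a 2-byte byte-order mark to the count.

```python
from typing import Tuple


def encoded_lengths(ch: str) -> Tuple[int, int]:
    """Return how many bytes a character takes in UTF-8 and in UTF-16."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in ["A", "é", "中", "\U0001F600"]:  # Latin letter, accented letter, Chinese character, emoji
    utf8, utf16 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: {utf8} byte(s) in UTF-8, {utf16} in UTF-16")
# U+0041: 1 byte(s) in UTF-8, 2 in UTF-16
# U+00E9: 2 byte(s) in UTF-8, 2 in UTF-16
# U+4E2D: 3 byte(s) in UTF-8, 2 in UTF-16
# U+1F600: 4 byte(s) in UTF-8, 4 in UTF-16
```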

Unitization – The process of splitting multi-page image files into individual ‘documents’.
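Once someone has marked where each logical document begins in a multi-page image file, turning those markers into page ranges is straightforward. This sketch assumes a hypothetical 10-page scanned TIFF with documents starting on pages 1, 4, and 5; the function name and the “first page” input format are illustrative assumptions.

```python
from typing import List, Tuple


def unitize(total_pages: int, first_pages: List[int]) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into documents, given the first page of each document."""
    starts = sorted(set(first_pages))
    if not starts or starts[0] != 1 or starts[-1] > total_pages:
        raise ValueError("first_pages must start at page 1 and stay within the file")
    # Each document ends on the page before the next one starts; the last runs to the end.
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


print(unitize(10, [1, 4, 5]))  # [(1, 3), (4, 4), (5, 10)]
```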

Unstructured Data – Information that does not fit into the usual row-column structure of a database. These text and multimedia files, such as webpages, videos, and audio files, cannot be organized effectively within a database, hence the name “unstructured.”

If you liked this blog, you might also be interested in reading: 4 Reasons Your Firm Should Outsource eDiscovery

