A Break from Reality: Modernizing Authentication Standards for Digital Video Evidence in the Era of Deepfakes

69 Am. U. L. Rev. 1945 (2020).

* Senior Staff Member, American University Law Review, Volume 70; J.D. Candidate, May 2021, American University Washington College of Law; B.A. History, 2009, Princeton University. I would like to thank the Law Review staff for their tireless assistance with this piece and my family of law school friends who have made this law school experience irreplaceable. Finally, I am forever grateful to my parents, who have never failed to lead the way.

The legal standard for authenticating photographic and video evidence in court has remained largely static throughout the evolution of media technology in the twentieth century. The advent of “deepfakes,” or fake videos created using artificial intelligence programming, renders outdated many of the assumptions that the Federal Rules of Evidence are built upon.

Rule 901(b)(1) provides a means to authenticate evidence through the testimony of a “witness with knowledge.” Courts commonly admit photographic and video evidence by using the “fair and accurate portrayal” standard to meet this Rule’s intent. This standard sets an extremely low bar—the witness need only testify that the depiction is a fair and accurate portrayal of her knowledge of the scene. In many cases, proponents’ ability to easily clear this hurdle does not raise concerns because courts rely on expert witnesses to root out fraudulent evidence; thus, although the fraudulent evidence might pass the fair and accurate portrayal standard, it would later be debunked in court.

The proliferation of deepfakes severely complicates the assumption that technological experts will be able to reliably determine real from fake. Although various organizations are actively devising means to detect deepfakes, the continued proliferation and sophistication of deepfakes will make debunking fake video more challenging than ever. Witnesses who attest to the fair and accurate portrayal standard will likely not be able to identify subtle but important alterations in deepfakes. As a result, fraudulent evidence, authenticated through the Rule 901(b)(1) standard, will increasingly enter courtrooms with a decreasing ability for witnesses and courts to identify fakes. Because the technology to detect deepfakes lags behind the creation methods, deepfakes present a critical threat to courtroom integrity under the current standard.

The rising probability that juries see fake videos warrants a higher burden on the proponent of video evidence. Requiring additional circumstantial evidence to corroborate video evidence is a small but crucial step that will mitigate, but not solve, the coming deepfakes crisis. Further engagement around this topic is necessary to address the deepfakes crisis before it creates irreparable harm.

“[R]eality is not external. Reality exists in the human mind, and nowhere else.”

—George Orwell¹

Introduction

Artificial intelligence and machine learning have enabled unprecedented leaps in mankind’s capability to solve the most pressing issues of the twenty-first century.² Programmers and doctors have worked together to create artificially intelligent programs that synthesize data from millions of patients to diagnose illness with greater precision and speed than ever before.³ Soon, self-driving cars will relieve humans of the deadliest threat on our highways (ourselves).⁴ However, notwithstanding the tremendous promise of improvement that artificial intelligence brings to our world, future generations may someday remember December 2017 as a seminal moment of the digital age that exposed the danger of advanced technological capabilities. As an internet technology website, Motherboard, first reported with great despair, in December 2017, a Reddit user with the online handle “deepfakes” created a series of videos utilizing new techniques that grafted the faces of several well-known actresses into pornographic videos.⁵ Reddit, along with several pornographic websites, quickly featured explicit videos in which Daisy Ridley, Gal Gadot, and other actresses had never actually appeared.⁶

The level of sophistication of this technology was still blossoming; Motherboard reported that “[i]t’s not going to fool anyone who looks closely. Sometimes the face doesn’t track correctly and there’s an uncanny valley effect at play, but at a glance it seems believable.”⁷ However, over the past several years, “deepfakes”—colloquially named after the otherwise unidentified Reddit user who circulated fake pornographic videos—have evolved from videos whose alterations are reasonably discernible by the naked eye to fakes that are challenging for both the human eye and machine detection software to distinguish from real videos.⁸ This progression is predominantly due to the advancement of processes for creating deepfakes that use machine learning programs to continuously improve the fidelity of the videos and render increasingly lifelike representations.⁹

The coming proliferation of deepfakes has created no shortage of alarms in the legal, political, and social spheres, in which scholars predict countless challenges to organized society, ranging from celebrity harassment to political and governmental manipulation.¹⁰ Some scholars have already rushed to address regulatory challenges that deepfakes pose and identify civil remedies for victims of deepfake videos.¹¹ For example, many state privacy torts do not account for artificial rather than actual depictions of the victim,¹² and First Amendment precedent is ill-equipped to deal with the expression of non-obscene but nonetheless manipulative fake videos.¹³ However, despite some recognition that fake video is an imminent threat to courtroom integrity, lawmakers have done little to address the manner in which our evidentiary standards for authenticating photographic and video evidence must adapt to counter this threat.¹⁴

This Comment addresses the need for heightened evidentiary standards to counter the dangerous consequences of deepfakes, a need that is likely to become a central focus to our judicial process as prosecutors, plaintiffs, and defendants all turn to the courts to redress the threat and harms that deepfakes cause. Courts currently rely on an evidentiary standard that assumes authenticating witnesses have sufficient personal knowledge to attest to a photograph’s or video’s authenticity;¹⁵ this standard is now inadequate to meet the intent of the Federal Rules of Evidence. Recent amendments to the Federal Rules of Evidence in 2017 aimed to address the growing influx of electronic media, such as social media posts or websites, into courtrooms.¹⁶ However, the 2017 amendments did not replace or circumvent existing authentication requirements; instead, they allow the proponent of the evidence to offer authentication by certification rather than demanding witness testimony, which can be both costly and time-consuming.¹⁷ Since the 2017 amendments, deepfakes have burst into the national consciousness, and their potentially devastating consequences demand further examination into the authentication standard for photographic and video evidence. Ultimately, current authentication standards for photographs and video fail to account for the inability of witnesses, even those present at the scene depicted, to determine reality from forgery.

Part I of this Comment explores how deepfake videos are created and the unique difficulty of authenticating or debunking them. The novel creation process that utilizes machine learning networks not only enables extraordinarily high-fidelity forgeries but also severely complicates detection capabilities. Part I also introduces the psychological effect known as suggestibility, which makes deepfakes especially dangerous because human memory is susceptible to recalling events that never happened. Part II outlines the current legal standard that courts use to lay a foundation for the authenticity of video evidence to satisfy the requirement of Rule 901(a) of the Federal Rules of Evidence, primarily through Rule 901(b)(1) or Rule 901(b)(9).

Part III argues that, because of the high fidelity of deepfakes, witnesses no longer meet the recollection element of the personal knowledge standard established by Rule 602 to act as a witness with knowledge to testify that a video is a fair and accurate portrayal of a scene. Witnesses can only attest to the fair and accurate portrayal standard by augmenting their recollection with speculation, and, because of the psychological effects of suggestibility, they are likely to believe the details with which their memories have filled the gaps. Combined with the conflation of illustrative and substantive evidence that photography and video create, courts are likely to admit substantive evidence for a jury to consider under far lower standards than Rule 901(a) intended. This Comment recommends a new addition to Rule 901 to establish a foundation of authenticity outside of the presence of the jury to mitigate the risk of unfair prejudice. This recommendation aims to alleviate the problem of deepfakes in the courtroom but admittedly does not solve the problem entirely.

Lastly, this Comment concludes that the current legal standard for establishing a foundation to authenticate videos fails to meet the original intent behind the evidence rules of authentication in light of new and continuously developing photographic and video technology. Transitioning to a heightened evidentiary standard is necessary to anticipate the upcoming deepfakes crisis in our courtrooms, rather than reacting to it as the technology permeates our society.

I. Deepfakes Background

Anyone remotely familiar with graphic design can attest to the relative ease with which various programs, such as Adobe Photoshop, can modify digital images. In fact, a post on Adobe’s blog “Adobe Life” invites Photoshop users to “reimagine[] reality.”¹⁸ The technology behind deepfakes, however, elevates this ability to a level previously unreachable for mainstream graphic design programs. Understanding how deepfakes technology ushers in a new era of manipulation requires grasping two concepts: first, how the creators use machine learning algorithms to generate videos with human likenesses at unprecedented levels of fidelity, and second, how this creation process frustrates current methods of determining real from fake.

A. Deepfakes Creation Through Generative Adversarial Net Machine Learning Cycles

The use of advanced machine learning techniques to create fake videos burst onto the scene in December 2017.¹⁹ The near-apocalyptic journalism that followed Motherboard’s exposure of the exploits of the “deepfakes” user on Reddit quickly caught the attention of technology commentators,²⁰ mainstream news outlets,²¹ and the government.²² Although the concept of doctoring digital photography (or other evidence, for that matter) is not new,²³ the budding creation process behind deepfakes enables creators to mimic reality in a devastatingly realistic fashion.

At the core of this new technology is a process called a generative adversarial net (GAN). University of Montreal Ph.D. student Ian Goodfellow led a 2014 scientific paper that first introduced GAN models.²⁴ In the paper, the authors articulated a process in which two machine learning algorithms are simultaneously pitted against one another.²⁵ One of these programs is a generative model that creates new data samples; the other, known as a discriminator model, evaluates this data against a training dataset for authenticity.²⁶ The discriminator model estimates the probability that a given sample came from the sample data (a real-world reference) rather than from the generative model (a machine creation).²⁷ These models are known as neural networks because they mimic organic brain function, with interconnected nodes layered to process information far more vast and complex than traditional computer algorithms can handle.²⁸ These two neural networks operate in a cyclical fashion and learn from each other: the generative model program is learning to create false data, and the discriminator model is learning to identify whether the data is artificial.²⁹ The result is a process by which each element of the GAN model learns the other’s methods in a “constant escalation”;³⁰ the generative model constantly improves its ability to create data sets that have a lower probability of failing the detection algorithm as the discriminator model learns to keep up, a process that continuously improves the fidelity of the creation.³¹ This continuous process enables the generative model to build a dataset that avoids the pitfalls that would normally give away a fraud.³²
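The contest between the two models is commonly written as a single objective that one model tries to minimize while the other tries to maximize. The notation below follows the standard formulation from the 2014 GAN paper; the symbols themselves are introduced here for illustration and do not appear elsewhere in this Comment: G is the generative model, D is the discriminator model, p_data is the distribution of real samples, and p_z is the random noise from which G builds its fakes.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

Here D(x) is the discriminator's estimated probability that a sample x is real. The discriminator raises V by scoring real samples near 1 and generated samples near 0; the generator lowers V by producing samples G(z) that the discriminator scores as real. Each improvement on one side changes what the other side must learn next, which is the escalation described above.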

There are countless commercial and consumer applications of GAN technology. Chris Nicholson³³ has aptly described the breadth of GAN’s incredible scientific potential, stating that “[GAN models] can learn to mimic any distribution of data. That is, GANs can be taught to create worlds eerily similar to our own in any domain: images, music, speech, prose. They are robot artists in a sense, and their output is impressive—poignant even.”³⁴ The artistic applications are endless. Some fields, such as the film industry, have already employed ultra-lifelike human likenesses using a variety of methods.³⁵ Researchers are also developing GAN technology for commercial purposes such as enabling shoppers to picture what an article of clothing looks like on a particular person (without the burden of actually trying it on)³⁶ or devising stronger encryption techniques to protect confidential information and communications online.³⁷

Naturally, as benign use of the technology spreads, the dark side of video manipulation is accelerating with equal speed as GAN modeling becomes more widely accessible to those with less noble intentions.³⁸ Actor and director Jordan Peele created deepfake videos of Barack Obama making speeches that never happened to highlight their danger to civil society.³⁹ Politicians are a natural target for deepfake creators because of the volume of publicly available photographs and videos of politicians for the creators to utilize. Malign creators, whether domestic or foreign, can use deepfakes to further drive America’s political polarization and create the sort of “dystopia” that Jordan Peele warned of in his message.⁴⁰

Further, despite Reddit’s and several pornographic websites’ efforts to ban deepfake pornography,⁴¹ malicious actors can still create and distribute deepfake celebrity or otherwise nonconsensual pornographic material in other less regulated corners of the internet. As the software to create lifelike deepfakes proliferates, the degree of difficulty and the skill required to create such videos are dropping, leaving convincing and powerful weapons in the hands of a larger number and greater variety of malevolent actors.⁴²

B. The Challenge of Finding Reliable and Lasting Detection Methods

As GAN programming continues to develop and expand, the ability to detect deepfakes becomes increasingly important in a variety of disciplines. The challenge of reliably and consistently detecting deepfakes further evinces the new era of digital forgery that they have ushered in. The challenge stems from the constantly evolving and cyclical method of deepfake creation.⁴³ The very process that programmers use to create deepfakes relies on incorporating algorithms designed to detect the subsets of data that do not match sample data sets provided to the discriminator model; this cycle’s purpose is to root out inconsistencies.⁴⁴ This process therefore features a unique defense against programs that detect the frauds—any time a new method of determining whether a video is fake emerges, deepfake creators can use that to their advantage in the GAN cycle.⁴⁵

For example, Associate Professor of Computer Science Siwei Lyu of the University at Albany conducted a study in 2018⁴⁶ on the then-current state of deepfake technology with the intent of attempting to pinpoint the reason that the fake videos “felt eerie to him, and not just because he knew they[] [had] been ginned up.”⁴⁷ Professor Lyu identified one of the signs that a human likeness had been artificially created: there was something wrong with the way that the human depictions blinked.⁴⁸ The faces depicted in the deepfakes did not “open and close their eyes at the rates typical of actual humans” because the GAN model simply did not “get blinking” (at least not yet).⁴⁹ Professor Lyu’s paper was a breakthrough in fake video detection by using forensic programs to catch “spontaneous and involuntary physiological activities such as breathing . . . and eye movement, [which] are oftentimes overlooked in the synthesis process of fake videos.”⁵⁰ For the time being, Professor Lyu had struck a major victory against deepfake creators.

However, while Professor Lyu’s success certainly challenged the forgers by rooting out the flaws in their product,⁵¹ the victory was nonetheless muted by the very nature of the deepfake process. Not long after publishing the paper, Lyu’s team began to receive anonymous emails that contained deepfake videos whose stars blinked more normally and therefore passed the detection tests his team had created.⁵² The creators had incorporated a means of detection that the discriminator algorithm had previously not accounted for strongly enough and provided additional reference points for the algorithm to learn from (for example, pictures and videos of humans with their eyes closed, which were underrepresented in the sample data).⁵³ The discriminator then did a better job policing the generative model’s fakes, essentially teaching the generative model how to overcome its prior weaknesses.⁵⁴ The short-lived success of the detection program actually made the forgery mechanism stronger.⁵⁵ The result is an “arms race between the creators and the detectors.”⁵⁶
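The dynamic described above can be made concrete with a deliberately simplified sketch. It is not Professor Lyu's method, which relied on forensic analysis of the video itself; it assumes that a blink count has already been extracted from a clip and applies a single threshold test. The function names and the band of plausible blink rates are hypothetical placeholders chosen for illustration.

```python
# Assumed band of plausible human blink rates, in blinks per minute.
# Placeholder values for illustration; not taken from Professor Lyu's paper.
ASSUMED_HUMAN_BLINK_RANGE = (8.0, 30.0)


def blinks_per_minute(blink_count, duration_seconds):
    """Convert a raw blink count into a per-minute rate."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return blink_count * 60.0 / duration_seconds


def flags_as_fake(blink_count, duration_seconds,
                  normal_range=ASSUMED_HUMAN_BLINK_RANGE):
    """Return True when the blink rate falls outside the assumed human band."""
    low, high = normal_range
    rate = blinks_per_minute(blink_count, duration_seconds)
    return not (low <= rate <= high)


# A first-generation fake whose subject almost never blinks is caught.
early_fake_flagged = flags_as_fake(blink_count=1, duration_seconds=60)

# A later fake, retrained on footage that includes closed eyes, blinks at a
# human rate and passes the same test even though it is still a forgery.
adapted_fake_flagged = flags_as_fake(blink_count=16, duration_seconds=60)

print(early_fake_flagged, adapted_fake_flagged)
```

Once the detection criterion is known, a forger only has to bring one measurable feature back inside the accepted band. A passing result therefore shows that the video survived this particular test, not that it is genuine.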

Through a program called Media Forensics (MediFor), the Defense Advanced Research Projects Agency has been following the challenge of deepfake emergence since even before the videos’ namesake Reddit user popularized the concept in December 2017.⁵⁷ Among MediFor’s lines of effort is an automated system designed to create an “integrity score” for an image or video, in which the content of the video is compared against a variety of external empirical facts to root out inconsistencies.⁵⁸ Efforts such as these will always be chasing the forgers, and their breakthroughs will always provide ammunition to the GAN models.⁵⁹ Every data point that gives up a video as fake (such as weather reports to cross-reference against the scene or incorrectly angled shadows that are incongruent with the position of the sun) is a source that deepfake creators can account for by tapping into those data streams for future videos.
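As a rough illustration of the idea of scoring a video against external facts, the following sketch computes the share of checkable claims that agree with independent records. It is an assumption-laden toy and does not describe how MediFor actually computes its integrity score; the function name, the field names, and the sample values are all hypothetical.

```python
def integrity_score(video_claims, external_records):
    """Fraction of checkable claims in the video that match external records.

    Returns None when no claim can be checked against any record.
    """
    checkable = [key for key in video_claims if key in external_records]
    if not checkable:
        return None
    matches = sum(
        1 for key in checkable if video_claims[key] == external_records[key]
    )
    return matches / len(checkable)


# Hypothetical facts extracted from a video and from independent sources.
video_claims = {
    "weather": "rain",
    "shadow_direction": "east",
    "speaker": "unknown",  # no external record exists, so it is not scored
}
external_records = {"weather": "clear", "shadow_direction": "east"}

score = integrity_score(video_claims, external_records)
print(score)
```

In this example one of the two checkable claims conflicts with the record, so the score is one half. The score can only reflect the records that happen to be available, and each record that exposes a mismatch is also a data stream that a forger can consult in advance.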

C. Fake Video’s Significant Psychological Effects on Viewers

Fraudulent evidence has always been a concern for courtroom integrity. Yet deepfakes raise an even greater level of concern due not only to their ability to seem real, but also to their impact on viewers. The threat that deepfakes pose to courtroom factfinding is not solely due to the high-fidelity human likenesses that are difficult to detect. The nature of viewing video elicits psychological effects in which people actually believe that they remember things that they did not actually perceive.⁶⁰ This combination is extremely dangerous to witness reliability.

Although many conceive of human memories as an internal video playback system, various studies have shown critical vulnerabilities in our ability to recall memories accurately.⁶¹ Memory is more comparable to “putting puzzle pieces together than retrieving a video recording,”⁶² and is therefore subject to a range of “potential mischief” from both internal and external sources.⁶³ There are a variety of psychological limitations on the accuracy of human memory; the most relevant to deepfakes is “suggestibility.”⁶⁴ Suggestibility is a phenomenon that causes a person to implant memories as a result of leading questions, narratives, or visuals when attempting to recall a past experience.⁶⁵ Due to suggestibility, reconstruction of an experience in the context of prepared materials or leading questions intended to help tell a desired narrative “can cause the witness’[s] memory to change by unconsciously blending the actual fragments of memory of the event with information provided during the memory retrieval process.”⁶⁶

Video exacerbates suggestibility’s effect on memory. In 2010, researchers at the University of Warwick conducted a study illustrating the psychological effect that video has on reconstructing personal observations.⁶⁷ The researchers placed sixty college students in a room to engage in a computerized gambling task.⁶⁸ Following completion of the task, researchers individually showed each subject digitally altered video depicting a co-subject cheating, when in fact none of the subjects had actually cheated.⁶⁹ Nearly half of the subjects were willing to testify that they had personally witnessed a co-subject cheating after seeing the fake video; only one in ten was willing to testify to the same effect after the researcher merely told the subject about the cheating, rather than showing the fake video evidence.⁷⁰

Consequently, deepfakes can have a devastating effect on courtroom integrity. If a party submits a deepfake video to the court, its deceptive harm is not limited solely to the video itself. The lies embedded within a fake video cascade into other portions of the proceedings; viewing fake videos is likely to affect the testimony of witnesses concerning their recollection of events.⁷¹ The legal standard to admit video evidence into a courtroom for a jury to see is unfortunately ill-equipped to address this level of risk.

II. Authenticating Photographic and Video Evidence

While technology generally outpaces the law, it is imperative to discern whether the contemporary legal framework is sufficient to address the potential harm that technological advances present. Some scholars and commentators have grappled with the interplay of deepfakes with privacy law, First Amendment rights, and regulatory challenges.⁷² Additionally, deepfakes bring the possibility of unprecedented levels of distrust in the government and other public institutions if videos emerge featuring public figures saying or doing things that never happened.⁷³ Among the challenges specific to trust in public institutions is that which courtrooms will face in light of the current standards used to admit digital photography and video as evidence.

Common law standards initially governed the admissibility of photographic and video evidence; the McKeever test, originally a standard for admitting audio recordings, stood as a model for admissibility for decades.⁷⁴ The McKeever test began as a strict standard, but it eventually became more flexible as photographic and video evidence became more common in courtrooms.⁷⁵ The McKeever test later gave way to the Federal Rules of Evidence, which codified the test’s main components.⁷⁶ As states codified their own evidence standards based on the Federal Rules of Evidence, courts began to use two theories—the pictorial communication theory and the silent witness theory—to authenticate photographic and video evidence under Rule 901(b).⁷⁷ This Part discusses the history of the standard of admissibility of photographic and video evidence, two common theories under which courts admit such evidence, and the guide that Federal Rule of Evidence 602 provides for authenticating such evidence.

A. The Evolution of Photographic and Video Evidence Authentication

Suspicion of the susceptibility of photographic and video evidence⁷⁸ to modification or tampering is nothing new to courtrooms; courts have articulated their concerns over photographs and motion pictures since the invention of photography, and such concern continued as photography became more prevalent in society.⁷⁹ The modern standard for video authentication prior to admission initially mirrored the strict standards that courts used for sound recordings.⁸⁰ For decades, courts used the seven-part McKeever test⁸¹ as the standard to admit sound recordings as evidence.⁸² The McKeever test required the proponent to establish authenticity based on seven elements at a hearing prior to admission and was eventually expanded to include video evidence.⁸³

As photographs, motion pictures, and recordings became more familiar and common in daily life, their use in court expanded.⁸⁴ Accordingly, courts loosened the McKeever test over time and eventually set it aside in favor of more lenient standards.⁸⁵ Interpreting the McKeever test as “a guide rather than a rule,” and adopting more relaxed tests, courts determined that trial judges should have “wide latitude” to determine whether a proponent of recordings had laid a sufficient foundation for a reasonable jury to conclude that it was authentic.⁸⁶

The authentication standard eventually transitioned from the common law to codification when Congress enacted the Federal Rules of Evidence in 1975, following decades of study, delay, and deliberation.⁸⁷ The rules reflected the standards for admissibility of videos that courts had adopted since relaxing the McKeever test: relevance (codified in Rule 401), probative value balanced against undue prejudice (codified in Rule 403), and accuracy (codified in the “sufficient to support a finding” standard in Rule 901).⁸⁸ Forty-two states have adopted the Uniform Rules of Evidence (based on the Federal Rules of Evidence).⁸⁹

The authenticity of evidence is ultimately a factual determination for the trier of fact (typically, but not necessarily, a jury) to evaluate.⁹⁰ However, before a court admits evidence for the jury to consider, the court “must determine whether its proponent has offered a satisfactory foundation from which the jury could reasonably find that the evidence is authentic.”⁹¹ The process by which a judge addresses proper foundation for authentication does not itself establish evidence as authentic; the jury is still responsible for the ultimate determination of authenticity and therefore credibility.⁹²

Rule 901(a) states that to establish a proper foundation for authenticating evidence, “the proponent must produce evidence sufficient to support a finding that the item is what the proponent claims it is.”⁹³ While Rule 901(a) is not particularly specific in its mandate, Rule 901(b) provides a variety of means through which a party can satisfy Rule 901(a), such as nonexpert opinions about handwriting or evidence derived from public records.⁹⁴ Rule 901(b), however, is not exhaustive; there are other means of satisfying Rule 901(a)’s sufficient evidence standard, such as through circumstantial evidence that provides indicia of authenticity.⁹⁵

B. Theories of Authenticating Photographic and Video Evidence

In alignment with Rule 901(b)’s various means of authenticating evidence, courts typically admit photographic evidence under one of two theories: the “pictorial communication” theory and the “silent witness” theory.⁹⁶ Each theory utilizes a different sub-section of Rule 901(b) to meet Rule 901(a)’s sufficient evidence standard for authentication.⁹⁷

The logic behind distinct foundational standards for the pictorial communication theory and silent witness theory hinges on the intended purpose of substantive as opposed to illustrative evidence. Substantive evidence provides an “independent probative value for proving a fact,” such as a physical object recovered from a scene relevant to the case.⁹⁸ Illustrative evidence, on the other hand, accompanies witness testimony and is intended to “aid the trier [of fact] in understanding the witness’s testimony.”⁹⁹ The distinction is important but problematic in the context of photographs and videos because illustrative evidence often becomes substantive by showing the jury more than the witness can recollect or convey, thereby introducing independent, substantive evidence for which there is no foundation.¹⁰⁰ Nonetheless, the pictorial and silent witness theories derive their separate standards from the supposition that illustrative evidence is limited to the perceptions and recollections of the witness’s testimony.¹⁰¹

1. The pictorial communication theory

Courts most commonly admit photographic evidence as illustrative evidence, intended to accompany a witness’s testimony.¹⁰² This application of photographic evidence is known as the pictorial communication theory, in which photographic evidence is intended to be viewed “merely as a graphic portrayal” to supplement a witness’s oral testimony.¹⁰³ Under the pictorial communication theory, the typical means of establishing a foundation for authentication is Rule 901(b)(1), which provides that a “[w]itness with [k]nowledge” testify that an item is what it is claimed to be.¹⁰⁴

Rule 901(b)(1)’s method for establishing an evidentiary foundation is nearly as vague as Rule 901(a)’s standard that it seeks to meet. Applying Rule 901(b)(1), a proponent establishes a foundation for photographic evidence if a witness testifies that the photograph is a “correct and accurate representation of relevant facts personally observed by the witness.”¹⁰⁵ Courts commonly refer to this rule as the “fair and accurate portrayal” standard.¹⁰⁶

The fair and accurate portrayal standard assumes that video is difficult to alter; the standard is rooted in an age of traditional film photography, prior to the advent of digital photography and other media.¹⁰⁷ Traditional photography differs from digital media (whether still photography or video) in several ways.¹⁰⁸ The most relevant difference is that digital media stores individual pixels as data in an electronic file; there is no traditional original image of the kind that exists with, for example, older thirty-five millimeter film cameras.¹⁰⁹ Traditional film cameras capture light data as imprinted onto physical film, which can then be protected through a secure chain of custody.¹¹⁰ Digital photography, however, as a “finite set of ones and zeroes,” makes determining whether a digital photograph is an original or a copy nearly impossible.¹¹¹

Additionally, because early digital photography featured lower initial image quality compared to film photography, its proponents commonly needed to enhance digital photographs to aid the trier of fact.¹¹² Thus, an abundance of cases have addressed the issue of non-insidious modifications of video, such as editing, enhancing, taping over, or curating certain portions of a longer video or recording.¹¹³ In these commonplace instances, courts have required no more than satisfaction of the fair and accurate portrayal standard (or the “evidence as a process or system” standard if admitted under the silent witness theory¹¹⁴) to admit the recording.¹¹⁵ For example, the Supreme Court of Arkansas drew a careful distinction between video that had been “enhanced” by adjusting the brightness and contrast of the video and video that had been “altered,” such as by changing the “face, features, or physique of someone not present in the original videotape.”¹¹⁶ The court dismissed the defendant’s contention that the video had been manipulated by stating that the jury had ample opportunity to determine whether any alterations were present.¹¹⁷ In these types of cases, courts address both whether the alteration process distorted the image such that the resulting product is no longer authentic and whether the curation conveys a message so different from the original that it is no longer “relevant” under Rule 403.¹¹⁸ For both issues, courts envision having the “original” recording to reference against;¹¹⁹ courts rarely consider the possibility of outright forgery when considering authentication standards for admission.¹²⁰ The rare cases in which courts reject photographic evidence arise when there is no authenticating witness or the witness expressly rejects the photograph as an accurate depiction.¹²¹ This was the case in United States v. Lawson,¹²² where the defendant offered photographs that were excluded from evidence because the only witness at trial testified that the photographs “did not accurately reflect what he saw.”¹²³

Because of this traditional framework, the fair and accurate portrayal standard is not a difficult hurdle to clear. A witness who testifies as to a photograph’s or video’s accuracy does not need to be the actual photographer or understand the process by which the originator created it.¹²⁴ The standard to establish a foundation is so minimal that issues concerning the possibility that the witness’s fair and accurate testimony is “limited” or “defective” or that the witness is “otherwise unsure of his perceptions” are matters saved for the jury, to which the jury must assign weight to evaluate the evidence’s credibility—not matters of admissibility with which the proponent of the evidence must grapple.¹²⁵ Instead, the standard imposes only a “sufficient to support a finding” requirement on the proponent.¹²⁶

2. The silent witness theory

In addition to the pictorial communication theory, a party may also submit a photograph or video as substantive evidence—that is, the photograph or video is capable of standing on its own to convey what it depicts and, in turn, obviates the need for a witness.¹²⁷ Courts admit photographic evidence in this manner under the silent witness theory.¹²⁸ By treating evidence as a “potential independent source[] of substantive information for the trier of fact,” the silent witness theory has stricter requirements for the admission of photographic evidence than the pictorial communication theory’s requirements for admission.¹²⁹ Evidence admitted under the silent witness theory is generally subject to Rule 901(b)(9), which allows a proponent of evidence to establish a foundation for authentication by “describing a process or system and showing that it produces an accurate result.”¹³⁰

One of the most common examples of evidence admitted under the silent witness theory is security camera footage. Typically, when a party submits video from a closed-circuit television (CCTV) device at, for example, a bank or convenience store, a worker or expert will testify as to the reliability of the video and the process for maintaining an accurate system.¹³¹ For example, in United States v. Rembert,¹³² the government offered no witnesses to testify that a CCTV video fairly and accurately depicted the scene; instead, a bank employee testified as to how the cameras were loaded, how the results were secured, and the internal metadata concerning the date and location of the filming.¹³³ Courts commonly accept details of this nature when the cameras are part of a regulated system that is maintained and operated according to accepted standards, such as those of a police department or bank security system.¹³⁴

Because evidence admitted under the silent witness theory may stand alone as substantive evidence without accompanying witness testimony, courts generally only admit it when the device and process are set up and executed in a controlled environment. Courts have accepted testimony concerning the process and system as it applies to CCTV surveillance videos as described above, as well as x-ray photography and police footage.¹³⁵ However, digital photography that the general public personally creates falls largely beyond this threshold because it lacks a systematic and reliable scientific process and because the proponent cannot demonstrate a secure chain of custody.¹³⁶ For example, even though surveillance or police footage is digitally created, the chain of custody (generally secured through police channels) insulates the product from tampering, and therefore the footage may potentially stand on its own as substantive in ways that evidence admitted under the pictorial communication model theoretically could not.¹³⁷

Over the past several decades, courts have begun to test the digital boundaries of the silent witness theory. For example, in an instance where a police officer took a cell phone video recording of a CCTV surveillance video system of a convenience store, the state failed to establish a foundation when the police officer testified that his video was a fair and accurate portrayal of what the CCTV depicted.¹³⁸ The officer’s fair and accurate portrayal testimony was insufficient where he could only speak to his knowledge of the depiction of his cell phone tape; in other words, the officer had no more personal knowledge that the video of the scene of the crime was a fair and accurate portrayal than anyone else.¹³⁹ Without Rule 901(b)(9) evidence concerning the reliability of the CCTV itself, the recording was inadmissible.¹⁴⁰ Cell phone videos present particular challenges in the silent witness theory context because of the lack of reliability concerning the process and preparation of such videos. Courts have distinguished video recordings originating from cameras worn by an undercover police officer and prepared by state officials from videos taken by an undercover officer with a cell phone in otherwise the same context.¹⁴¹ In McFall v. State,¹⁴² the Court of Appeals of Indiana addressed this very issue when the prosecution introduced evidence of a controlled drug buy using video from a confidential informant’s cell phone.¹⁴³ Whereas normally police officers equip an informant with government-owned and managed recording equipment and secure it from the informant following an operation, here the detective did not exercise control over the informant’s cell phone and filming process throughout the operation.¹⁴⁴ The prosecution therefore could not attest to the accuracy of a process or system under Rule 901(b)(9) because the informant’s personal phone was not subject to the same standard operating procedures and chain of custody that the police use for typical surveillance equipment.¹⁴⁵

These cases demonstrate courts’ acknowledgement of the risk that digital photography poses and their hesitance to incorporate it into the silent witness theory without an authenticating witness. Despite these risks, courts have refused to incorporate any changes to the pictorial communication standard when it comes to digital photography.¹⁴⁶

C. Rule 602 Caselaw Establishes a Baseline for Distinguishing Personal Knowledge from Speculation and Logically Applies to Rule 901(b)(1) Witnesses

Normally, a judge will not exclude an eyewitness if her memory or perception is limited; as long as the testimony could assist a reasonable trier of fact in establishing the facts, the court will allow the witness to testify.¹⁴⁷ However, a judge has discretion to exclude evidence (prior to its admission) when a witness’s personal knowledge is particularly uncertain or unreliable or when there is not enough evidence that a reasonable juror could give some weight to the testimony.¹⁴⁸ For example, in Nolin v. Douglas County,¹⁴⁹ the judge did not admit a document when the witness stated that he was only “somewhat familiar with the document.”¹⁵⁰ Thus, judges must walk a fine line between the minimum amount of personal knowledge required to testify and imperfect knowledge that crosses the threshold into speculation.

This fine line determines whether a witness has the requisite personal knowledge to testify to the fair and accurate portrayal standard to establish a foundation of authenticity under Rule 901(b)(1). Since Rule 901(b)(1) does not specifically define knowledge, other sections of the Federal Rules of Evidence are instructive.¹⁵¹ The most relevant section in this context is Rule 602, which requires witnesses to have personal knowledge of the matters about which they testify. Rule 702 allows expert testimony based on “scientific, technical, or other specialized knowledge;”¹⁵² because most witnesses with potential fair and accurate portrayal testimony will not have such expertise, Rule 602’s personal knowledge requirement is a more appropriate standard for knowledge than Rule 702 in this context.

Rule 602 requires that a witness have personal knowledge of the matter about which she is testifying for the testimony to be relevant.¹⁵³ Because other subdivisions of Rule 901(b) describe means of authentication based on either personal or specialized knowledge,¹⁵⁴ Rule 602 and its associated caselaw apply to Rule 901(b)(1) by logical extension despite the lack of a definition of knowledge in Rule 901(b)(1) itself. Thus, examining the Rule 602 standard for personal knowledge helps articulate the requirement for whether a witness testifying to the fair and accurate portrayal standard has the requisite personal knowledge for Rule 901(b)(1). The Rule 602 standard helps define the line between personal knowledge shortcomings that pass the foundational requirements for a jury to consider and those that the court rejects at the foundational stage as speculative, as was the case in Nolin.¹⁵⁵

In applying Rule 602 for determining personal knowledge, courts have long resisted refusing to allow a witness to testify merely because the court believes the witness to be obviously mistaken or dishonest.¹⁵⁶ The only appropriate circumstance for a court to reject a witness’s testimony is when no reasonable trier of fact could believe that a witness perceived what she claims.¹⁵⁷ Courts’ inclination is for the jury, as the trier of fact, to assign weight to testimony in accordance with its perception of the witness’s reliability and other factors to aid in its judgment.¹⁵⁸ Personal knowledge of objects or events under Rule 602 is comprised of four elements: “(1) sensory perception; (2) comprehension about what was perceived; (3) present recollection; and (4) the ability to testify about what was perceived.”¹⁵⁹ Each of these four elements is required for a judge to allow a jury to hear a witness’s testimony.¹⁶⁰

The first requirement for personal knowledge under Rule 602 is sensory perception, which courts commonly label “observation.”¹⁶¹ Although this shorthand most immediately invokes sight, sensory perception may be based on any of the five senses.¹⁶² To satisfy the sensory perception element, the witness must have the ability to perceive and must in fact have perceived what she is testifying to; the witness’s ability, however, may be limited, or even minimal.¹⁶³ Courts have long recognized that the personal knowledge standard to admit a witness’s testimony does not require positive or absolute certainty.¹⁶⁴

For a court to exclude a witness for lack of sensory perception, the witness must have not been able to perceive relevant facts directly. For example, in State v. Tutt,¹⁶⁵ when “it was dark, [and the witness] could[] n[o]t make out exactly what was happening,” the court precluded the witness from testifying because of an inability to visually perceive what she purported to testify to.¹⁶⁶ Similarly, in McCrary-El v. Shaw,¹⁶⁷ the Eighth Circuit affirmed the trial court’s exclusion of the deposition of a witness who claimed to have seen a confrontation between the defendant and several correctional officers from an adjoining jail cell.¹⁶⁸ The court reviewed a diagram of the jail layout and found that no reasonable person could conclude that the witness could see anything of relevance.¹⁶⁹ As these cases demonstrate, the personal knowledge standard allows a witness’s limitations and gaps in perception but not a complete inability to perceive.¹⁷⁰

The second element of personal knowledge is recollection, which, like sensory perception, does not need to be perfect to satisfy the test. Of course, no human memory is flawless. Incomplete or limited memory is usually sufficient to satisfy this requirement and is generally a matter to which a trier of fact must assign weight.¹⁷¹ For example, in United States v. Sinclair,¹⁷² the court admitted the testimony of a drug user despite allegations of a “clouded memory,” relying on its confidence in the jury’s traditional role of determining witness credibility.¹⁷³

There is, however, an important line that a witness crosses with too many memory or perception gaps; eventually, the witness can only convey the testimony coherently by filling the gaps with hearsay or speculation.¹⁷⁴ Witnesses commonly attach caveats to the accuracy of their memory, such as “I believe,” “to the best of my recollection,” or “I cannot be positive, but I think.”¹⁷⁵ The critical threshold, which the trial judge wields tremendous latitude in determining, is where the witness can only convey the narrative of her testimony by filling relevant gaps with speculation.¹⁷⁶ At this point, it is proper for a judge to exclude the testimony as speculative.¹⁷⁷ The speculation threshold is similar for the recollection and perception components of personal knowledge. The witness in McCrary-El could not convey a complete narrative without speculation because he could not perceive key elements of the story due to his lack of vantage point from which to observe the relevant events;¹⁷⁸ the Sinclair witness, on the other hand, could convey a complete story, even if the opposing party called his ability to recall into question, because he was able to perceive the subject of his testimony.¹⁷⁹ The key element that distinguishes these cases is whether the ability to perceive or remember is essentially nonexistent or merely limited, distorted, or otherwise imperfect.

Rule 602’s third element is comprehension. Even when a witness perceives an event through direct sensory perception, she must still comprehend what she sees to have personal knowledge to testify on the matter.¹⁸⁰ Again, a witness’s comprehension does not need to be perfect. For example, a court may admit a child’s testimony, even if she did not fully understand what was happening, so long as the other elements are met.¹⁸¹ A witness’s comprehension of her perceptions will never be without inference, as a natural degree of inference is always present in human comprehension.¹⁸² To understand sensory perceptions, a person has no choice but to connect those perceptions to past experiences and draw inferences about what she perceives.¹⁸³ Ultimately, the judge controls the amount of latitude to grant to a witness by either requiring more literal perceptions or allowing more inferences to describe the events that the witness perceived.¹⁸⁴

The final element is the ability to testify based on the first three components. This is closely related to the third element of comprehension, but refers to the witness’s comprehension at the time of testimony rather than at the time of perception.¹⁸⁵ For example, when a witness has been hypnotized to refresh her memory or has suffered a brain injury since the event at issue, she may no longer be able to comprehend the line of questioning or her perceptions of the event, even though she understood the event at the time she perceived it.¹⁸⁶ If she is not able to comprehend at the time of questioning, she cannot satisfy the personal knowledge requirement.¹⁸⁷

The personal knowledge standard from Rule 602 direct testimony helps illustrate the knowledge required to meet the knowledge standard of Rule 901(b)(1). Thus, a witness must meet Rule 602’s personal knowledge elements to testify as to whether photographic evidence is a fair and accurate portrayal.¹⁸⁸ To have the requisite knowledge, the witness must base her fair and accurate portrayal judgment on the direct use of her own senses, must have comprehended what she perceived at the time as well as at the time of her testimony, and must have a recollection of that prior perception. The witness is, of course, entitled to an imperfect memory as well as limitations in perception.¹⁸⁹

III. Authenticating Witnesses Can No Longer Reliably Testify to the Fair and Accurate Portrayal Standard to Authenticate Photographic Evidence

Over the past twenty-five years, several scholars have noted the risk that evidentiary standards are too low to address advances in digital photography,¹⁹⁰ but they have made little progress in motivating any changes to the standards.¹⁹¹ Two factors have historically mitigated the impact of such a low bar: first, the court could rely on expert witnesses to assist with authenticity determinations, and second, it was still extremely difficult to create high quality fake video. The dawn of the deepfakes era brings this deficiency to the forefront with a new sense of urgency.¹⁹² The proliferation of deepfakes technology renders obsolete the assumptions upon which the fair and accurate portrayal test relies; witnesses can no longer meet the fair and accurate portrayal standard within the legal standard of personal knowledge required to authenticate video evidence.

The unworkability of the fair and accurate portrayal standard is born out of a convergence of several factors. Deepfakes vastly increase the likelihood that authenticating witnesses will be unable to identify material changes from the actual scene that the video depicts.¹⁹³ Moreover, fake video is more likely to corrupt an authenticating witness’s memories to lead her to actually recall the falsehoods that the video depicts.¹⁹⁴ The authenticating witness’s inability to detect alterations from what she observed and the possibility of false memories lead to a complete inability for the witness to attest to a video as a fair and accurate depiction. The only way to attest that a video is a fair and accurate portrayal is by speculating on vast amounts of detail which, critically, witnesses are likely to believe as their own memory when the court shows them a fake video.¹⁹⁵ When combined with the disconnect inherent in the pictorial communication theory,¹⁹⁶ the result is a high probability of the court presenting to a jury fraudulent substantive evidence that has been authenticated by a witness without proper personal knowledge.

A. Muddled Theories: Video Causes Pictorial Communication Evidence to Leach into Substantive Evidence

The standard for admitting photographic evidence without an accompanying witness is far more comprehensive than when a witness is available to testify that the visual is a fair and accurate depiction.¹⁹⁷ However, the natural result of society’s familiarization with and trust in photography and video recordings is that illustrative evidence’s impact perpetually bleeds over into substantive effect; scholars have articulated this concern for some time, yet the problem remains.¹⁹⁸ Under the pictorial communication theory, photographic evidence should, strictly speaking, “illustrate[] the witness’[s] testimony, . . . add[ing] nothing further.”¹⁹⁹ But this belies the natural human experience of consuming photographic evidence—such evidence conveys more information to the trier of fact than the witness could possibly have seen or heard but also may not have picked up every detail that the witness actually perceived. This dilemma is both technical, in the sense that a photograph is “not a replication but a representation, a constructed—and hence fallible—image,”²⁰⁰ and experiential, in that the witness could not possibly recollect every single detail a recording conveys and simultaneously may very well recall information that the recording device did not capture. Courts have acknowledged the risk inherent in “[t]he masking of the substantive effect of photographs under the rubric of ‘illustrative evidence’” as lacking “conceptual honesty.”²⁰¹ The resulting effect is that photography admitted under the low standard of the pictorial communication theory can easily have the practical effect of substantive evidence as if admitted under the silent witness theory but without meeting the more stringent requirements of Rule 901(b)(9).²⁰² This occurrence is rooted in the judicial system’s confidence in the reliability of the photographic process, despite the fact that film theory teaches camera operators how to deliberately invoke reactions through a host of techniques.²⁰³

Although courts acknowledge an underlying risk to digital photography and video, the fair and accurate portrayal standard has nonetheless been seemingly immune to reconsideration. The Supreme Court of Arkansas, for example, has recognized the risk that it is easier to manipulate digital, rather than traditional, images yet it refused to impose a higher burden of proof for their admissibility when a defendant challenged the admission of surveillance video under the fair and accurate portrayal standard.²⁰⁴ The lack of evolution of the admissibility standard is in part because challenges to the veracity of digital images are rare.²⁰⁵ This is likely attributable to the legal community’s lack of awareness of the risk inherent in digital images compared to older technology.²⁰⁶ Additionally, when a party does challenge a digital image, the challenge typically addresses an overt enhancement of the image rather than the image’s authenticity.²⁰⁷ Moreover, courts may fear that elevating admission standards for photographic evidence will stifle the efforts of law enforcement, whose use of digital equipment during crime scene investigation has become commonplace,²⁰⁸ or will slow the trend towards the convenience that comes with increased computer use in litigation.²⁰⁹

The consequence of the natural bleed over from illustrative to substantive evidence is that digital photography and video, admitted under the easily satisfied standard of Rule 901(b)(1), tend to convey substantive fact far beyond what the legal standard assumes or intends.

B. Witnesses Can No Longer Meet the Personal Knowledge Standard of Rule 602 to Attest to Photographic Evidence as a Fair and Accurate Depiction of a Scene

Establishing a foundation for admitting photographic evidence under the pictorial communication theory requires witness-with-knowledge testimony that the photograph or video is a fair and accurate depiction of the scene that it illustrates; to attest to this standard, a witness must be able to satisfy Rule 602’s personal knowledge requirement.²¹⁰ Because witnesses are unable to perceive alterations or fabrications in deepfake videos, they can no longer determine whether the video’s depiction is a fair and accurate portrayal of their memory. Using the personal knowledge standard articulated in Rule 602 caselaw, witnesses will commonly fail the recollection element of personal knowledge that a video is a fair and accurate portrayal.²¹¹ The personal knowledge standard allows for significant gaps in the ability to recollect, but it does not permit gaps so central to the testimony that the testimony crosses the threshold into speculation.²¹² Because witnesses cannot possibly recall all of the detail conveyed in a photograph or video, their limitations are likely to go beyond fuzziness or uncertainty and become speculative.

The underlying problems with the fair and accurate standard did not emerge with the invention of deepfakes; these problems have existed ever since photoshop became a commonly used verb.²¹³ Rather, deepfakes critically reduce the already limited effectiveness of authentication witnesses. Deepfakes exacerbate the inability of witnesses to determine their own recollection limitations and communicate the extent to which their limitations affect their ability to attest to the fair and accurate portrayal standard. Deepfakes’ lifelike fidelity reduces the likelihood that authentication witnesses will reliably rise to the task of stating either that something looks different from the way they remember it or that they do not recall it at all; the visuals are too convincing and too likely to take advantage of the suggestibility flaw inherent in our memories.²¹⁴ Thus, the speculation that occurs in blanketing the entire depiction as fair and accurate crosses the threshold of acceptable gap filling.²¹⁵

Even a well-intentioned witness with no intention of deceiving the court will be unable to meet the threshold. The following example is illustrative. A criminal defendant offers a video made using a commercial iPhone. It depicts the defendant at an event with a date and location known to the public, such as a concert or other public event, thus providing an alibi. A witness who was at the event may recognize a variety of features that are true: the concert stage, the events transpiring in the background, or other individuals present. But the witness will not be able to discern small changes that are undetectable to her, such as the insertion of the defendant’s likeness onto another individual who was actually present at the event. The proponent of the evidence cannot ask whether the witness recalls every detail in the video—the amount of detail makes the task inconceivable for both the proponent and witness. Instead, the proponent asks the witness to testify whether the picture is a fair and accurate portrayal of the scene that she remembers. Following the witness’s fair and accurate portrayal testimony, the jury will see evidence with small but significant alterations.²¹⁶

The blending of pictorial communication and silent witness theories sheds light on why a witness in this context can no longer meet the recollection element of personal knowledge.²¹⁷ The witness here likely has a variety of memories from the event depicted. She may remember which speaker or entertainer the event featured, some details on how the event was laid out, or what the stage looked like. By recalling any of these factors, she likely feels comfortable attesting to the video as a fair and accurate portrayal of the scene. If asked to testify whether she remembers these specifics, she certainly passes the personal knowledge standard for any of them, even if she expresses some uncertainty.²¹⁸ However, if asked specifically whether she saw the defendant at the event, the witness may have no recollection. Nonetheless, the witness testifies that the entirety of the scene is a fair and accurate depiction of her memory. Despite the proponent offering the video under the pictorial communication theory, the jury sees all of the surrounding details encompassed by the video, whether the witness recalled them or not. The witness cannot possibly recollect that volume of detail if the court (unrealistically) examined her recollection of each and every individual detail of the video. The only way for the witness to testify that the video is a fair and accurate portrayal is speculation because of the likelihood that the witness cannot detect whether changes have been made. Of course, she may specifically state that she remembers a manipulated part of the video and identify it, but in doing so, she has authenticated substantive facts that she did not actually remember and likely has no reason to suspect that there were any limitations on her fair and accurate assessment.²¹⁹

Witnesses’ inability to perceive changes in fake video is twofold: not only are witnesses unlikely to be able to perceive changes, but they are also willing to affirmatively remember portrayals in video that were altered and did not actually take place.²²⁰ Critically, a witness’s inability to perceive changes in the depiction is not reflected in her understanding of her own perceptions. The psychological suggestibility that fake video has on memory²²¹ warps the reliability of an authenticating witness. Professor Kimberly Wade’s psychological study is a convincing demonstration that photographs and video are powerful tools to refresh a witness’s memory, even when the memory that the imagery invokes never happened.²²² The suggestibility problem inherent in fake video and fake narratives vastly increases the likelihood that a witness believes that she has the personal knowledge to authenticate a video, even absent any intended deception by the witness.²²³

Suggestibility and the precision of deepfakes together pose both technological and psychological constraints on a witness’s determination of fair and accurate portrayal. Such testimony does not merely represent a limitation on the witness’s recollection capability when attesting to the authenticity of a video—it represents a complete inability to make the determination of her personal knowledge, which pushes a witness’s personal knowledge past the level of uncertainty normally allowed to establish a foundation. The previous example of the alibi video is distinct from examples in which witnesses were unsure of their perceptions, had limited sensory perceptions available, or had incomplete information.²²⁴ In each of these scenarios, there was some ability for the witness to recognize and articulate the limitations of her personal knowledge, whether in perception or recollection.²²⁵ Here, however, a witness can only label the entire video or scene as a fair and accurate depiction by using those facts that she does recall from the scene and augmenting them with speculation. This is especially dangerous when combined with the psychological effect of suggestibility that is especially strong with video—the witness is likely to convey inherently speculative fair and accurate depiction testimony confidently and without doubts as to her recollection capability.²²⁶

This analysis does not characterize the evidence’s probative value itself to be speculative—the video, if authenticated, may be highly probative or speculative in its own right depending on what it depicts and what its proponent intends to demonstrate to the jury.²²⁷ Here, the witness’s fair and accurate portrayal testimony, not the evidence, becomes speculative—that is, it has little probative value on establishing the foundation for authentication.

At first glance, this characterization of the witness’s fair and accurate portrayal testimony seems to fly in the face of the strong tradition of a minimalist standard in which the proponent does not need to eliminate “all possibilities inconsistent with authenticity, or to prove beyond any doubt that the evidence is what it purports to be.”²²⁸ However, there is an important distinction from the long list of examples of courts admitting testimony based on shaky memories and imperfect observations²²⁹: in each of these examples, the witness can qualify her imperfect memories by articulating the degree of limitation or imperfection. She can communicate how “positive” or not she is, or how well she was able to perceive the facts by explaining, for example, how dark it was, how far she could see, or whether she could make out facial features. She could also describe her vantage point and identify physical or environmental limitations. Ultimately, these examples all provide a minimal articulable basis for recollection and perception as a foundation. The deepfakes problem, compounding pre-existing issues with digital photography, creates an authenticating witness who cannot articulate her level of confidence or capability when it comes to labeling an entire video sequence as fair and accurate; the potential forgeries are too high quality,²³⁰ and psychological factors create a sense of certainty that does not reflect the true degree of speculation.²³¹ The result is that the only way for a witness to testify that a video sequence is a fair and accurate depiction is by augmenting her memory with speculation, even if she does not realize she is doing it.²³²

C. Digital Photographic Evidence Warrants a More Stringent Means of Authentication

Because witnesses will no longer be able to meet the legacy standard of Rule 901(b)(1)’s knowledgeable witness by attesting that a video is a fair and accurate portrayal, courts need to look elsewhere for a sufficient finding that photographic evidence is what its proponent claims it is. This new standard does not necessarily replace Rule 901(b)(1), which is still applicable for a variety of other forms of evidence.²³³ Instead, a proposed new section would specifically govern the unique challenges that digital photography in the modern age presents:

(Proposed New) Rule 901(b)(11): Before a court admits photographic evidence under this rule, a party may request a hearing requiring the proponent to corroborate the source of information by additional sources.

As mentioned earlier, the processes offered in Rule 901(b) to establish a foundation for authentication are not exhaustive; Rule 901(b)(1) or 901(b)(9) are not the exclusive options.²³⁴ A proponent may also use circumstantial evidence to establish a foundation for authentication without adhering to one of the processes enumerated in Rule 901(b).²³⁵ This new rule essentially codifies an existing means of authentication and requires it for photographic evidence. Thus, even if the proponent cannot produce a witness with personal knowledge, methods of proving authenticity “can be infinite in variety, limited only by the circumstances pertaining in the particular case.”²³⁶

A Rule 901(b)(11) hearing would consider authentication factors beyond the bare bones requirement of 901(b)(1). A starting point for elements for the court to consider at this stage is the presence of additional corroborating evidence, as the court would consider in instances where the proponent establishes its foundation outside of the traditional 901(b)(1) or 901(b)(9) paths.²³⁷

Returning to the example above, if the government called for a Rule 901(b)(11) hearing to challenge the alibi video that the defendant submitted, the court would require more than a knowledgeable witness to establish a foundation for authenticity. For example, if the proponent offered a ticket stub or other circumstantial evidence of the defendant’s attendance, it would corroborate the authenticity of the video. The proposed rule would not rule out the utility of the witness through direct testimony. If a witness testified that she personally saw the defendant at the alibi event (a specific observation that a witness is far more likely to recollect concretely) as opposed to whether the entire scene is a fair and accurate portrayal, then the witness would easily meet the personal knowledge standard.

D. Increased Scrutiny Prior to Admission is Worth Risking Excluding Relevant Evidence Because of the Heightened Risk of Jury Prejudice Associated with Photographic Evidence

The alibi example may seem redundant; if there was a witness to corroborate a defendant’s alibi, then why does the defendant need the video in the first place? The more troublesome instance is where the video is the only source of evidence concerning the alibi, whether because the videographer is unavailable for some reason or is a criminal defendant herself and unwilling to testify at trial.²³⁸ The proposed rule likely poses a threat to the volume of digital media submitted in court. The immediate counterargument to elevated foundational standards for authenticating photographic evidence is that a jury can consider these factors at trial just the same as the court can in a preliminary hearing. After all, nearly all forms of evidence, from written documents to oral assertions, are vulnerable to the potential for fraud; the system depends on a jury (with help, if necessary, from expert witnesses) to assign weight to evidence based on credibility and relevance.

But the heightened risk of forgery inherent in deepfakes warrants heightened admission standards. Photography, and to a greater extent, video, have a stronger effect than other forms of evidence; they cannot be so easily dismissed once seen.²³⁹ While the emotional power of photographic, and especially video, evidence is generally thought of as an issue of probative value for courts to consider under Rule 403—such as when evidence is relevant but contains extremely graphic content that renders it unduly prejudicial—suggestibility is not the type of emotion that typically factors into the Rule 403 calculus.²⁴⁰

In the context of questionable or competing forms of evidence, juries have a tendency to cast aside other, less interesting forms of evidence when presented with the ease and convincing nature of viewing photographic evidence.²⁴¹ Juries are also remarkably poor at adhering to limiting instructions²⁴² or even admonishments to disregard inadmissible evidence.²⁴³ In fact, they also “paradoxically pay greater attention to information ruled inadmissible than if the judge had not drawn attention to the admissibility of the information and simply allowed it into evidence.”²⁴⁴ Thus, even if a party casts doubt on the authenticity of photographic or video evidence, once the court admits it, the vivid images remain in a jury’s mind. By virtue of video’s emotional effect and the tendency to prioritize it above other forms of evidence, the risk of waiting for a jury to consider initial corroborating evidence concerning a video’s authenticity justifies the court’s consideration of these factors prior to admission.²⁴⁵

A preliminary hearing to consider circumstantial authentication factors does not solve the deepfakes evidentiary crisis—but it does mitigate it. The proposed standard for establishing a foundation would still be limited and does not render photographic evidence forgery-proof; a jury still ultimately determines credibility and weight of the evidence that is admitted. Because of the challenges in creating effective detection measures (and the especially worrisome challenge that such measures will improve the forgery process),²⁴⁶ regulation and potential criminal solutions are in order to address deepfakes on a larger scale and stem their potential entry into the courtroom.²⁴⁷ Until then, a preliminary hearing process would bolster the confidence in video evidence for a jury to consider, rather than allowing all photographic evidence to pass the foundational stage with a testimonial witness who lacks the requisite personal knowledge to attest to the evidence’s validity.

Conclusion

The age of machine learning has contributed to human achievements and triumphs equaled only by the risk that it creates when placed in the wrong hands.²⁴⁸ Unfortunately for our trusting eyes, this atom cannot be unsplit, and artificial intelligence-enabled video creation is likely here to stay. As regulators scramble to address the risks posed by fake video created through GAN techniques, the legal standard for authentication of video evidence has fallen behind; evidentiary standards need to evolve to accommodate our changing world. The result will likely be a reduction in the reliance on photographic evidence in court after nearly a century of the steady rise in confidence and reliance upon photographic evidence to capture moments lost to human memory.
