The Machinist

The Machinist - Hany Farid on the erosion of shared reality in the age of deepfakes

April 25, 2026 · 51 mins

Show notes

Podcast: Berkeley Talks
Episode: Hany Farid on the erosion of shared reality in the age of deepfakes
Pub date: 2026-04-06


Two decades ago, when Hany Farid first began studying digital misinformation and manipulated media, fake content was easier to detect. Today, that landscape has shifted with a speed that he describes as “breathtaking.” In just the last year or two, he says, we’ve moved from an era where a computer takes seconds or minutes to produce a static file to "full-blown interactive deepfakes" that can hold a live conversation in real time.

In this Berkeley Talks episode, Farid, a digital forensics expert and professor at UC Berkeley’s School of Information, discusses the rapidly accelerating landscape of generative AI and the unique threat it poses to our collective understanding of the world.

Farid notes that tools once reserved for governments or well-funded organizations are now freely available, radically expanding the threat landscape. “We have taken a mechanism that was in the hands of state-sponsored actors and bad actors and given it to 8 billion people in the world," he says. This democratization of powerful technology makes it much easier to create convincing false images, audio and video — and much harder to trust what we see online.

And he explains that human perception is no longer a reliable defense, as his research shows people are only slightly better than chance at identifying AI-generated content.

To reduce the damage to our shared sense of reality, Farid suggests solutions should focus on the systems that profit from harmful content, including platforms and ad networks that help it spread. He also gives a warning about news consumption: “Stop getting your news from social media. That’s not what it was designed for.”

Despite the rise of deepfakes and online deception, Farid says he rejects the idea that there is no truth or fact. He believes that, although it takes effort, people can still work together to understand what is happening in the world.

This lecture, which took place on March 13, was part of LNS 110: Brilliance of Berkeley, a course featuring distinguished researchers working on the world’s most pressing issues.

Watch Farid's lecture (with slides) on YouTube.

Listen to the episode and read the transcript on UC Berkeley News (news.berkeley.edu/podcasts/berkeley-talks).

Music by HoliznaCC0.

Screenshot from lecture.

Hosted on Acast. See acast.com/privacy for more information.

The podcast and artwork embedded on this page are from UC Berkeley, which is the property of its owner and not affiliated with or endorsed by Listen Notes, Inc.


Transcript

Speaker 0

This is Berkeley Talks, a UC Berkeley News podcast from Strategic Communications at Berkeley. You can follow Berkeley Talks wherever you listen to your podcasts. New episodes come out every other Friday. You can find all of our podcast episodes with transcripts and photos on UC Berkeley News at news.berkeley.edu/podcasts.

Speaker 1

Alright. Let's get started. Who is excited to talk about deepfakes today? Whoo. Alright.

Professor Hany Farid is a professor at UC Berkeley in the School of Information. He's also the cofounder and chief science officer at GetReal Security. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception. Over the past 2 decades, Farid has been a leading figure in the development of techniques used to detect manipulated images, videos, and audio. His work has helped advance the scientific foundations of media authentication and has been applied in journalism, law enforcement, and national security contexts.

Farid frequently collaborates with technology companies, governments, and media organizations to combat online misinformation and develop tools for identifying AI-generated or altered media. Please welcome Professor Hany Farid.

Speaker 2

This falls under the category of "now for something completely different." It has nothing to do with proteins, except for my brother being our common bond here. Okay. So I think everybody has heard about deepfakes. You've all heard about generative AI. You all know what's going on.

I just want to level set a little bit and then talk about what I think is the landscape with respect to this content, with respect to us as individuals, societies, democracies, economies, and then what the future looks like, to the extent that we can see a little bit into the future. So first of all, we have been talking about deepfakes and generative AI for about 10 years, which seems hard to imagine. The term first started bubbling up around 2015, but it was really quite nascent for the first 8 years, and it's really only in the last 2 years that you've seen really, really dramatic changes. So broadly speaking, deepfake refers to fully AI-generated, machine-generated content: language, images, audio and video. I don't deal with language. I'm going to put that aside for a little bit, but you've all been using your favorite large language model.

We're talking about multimedia. This, for example, is an image generated using a prompt. You literally type "a young woman making a heart with her hands," and this is the image that came out: 100 percent fully AI generated, no Photoshop, no graphics, nothing. It takes nothing to do this other than a laptop and an Internet connection. And most of these are free to generate.

So the day of the 6 fingers and the weird faces and the hair, all of that is over. These images are hyper, hyper realistic. And I'll tell you in a little bit how realistic we think it is. Video, just I would say in the last 6 to 9 months, has started to move through what we call the uncanny valley. That is, it's so realistic you can't tell the difference.

I'm going to play this video for you. This video was also 100% fully AI generated with a single text prompt: a YouTuber filming from their bedroom. So what you're about to see, the audio, the video, everything is fully generated in seconds on a computer. So let me go play this, and then we'll talk about it.

Speaker 3

What's going on, everybody? Welcome back to another video. Today, we're gonna be talking about something a little bit different.

Speaker 2

That's insane that you can do that with a text prompt. If you had asked me a year ago, 2 years ago, I'd be like, no. Not for another 5, 10 years. And I would have been wrong, by the way, about when these things are coming. We know they're coming, but they are accelerating at a pace that is unbelievable.

Now, the 2 things I showed you, the image and the video, were what we call an offline process. You type into the computer, it generates something. It takes a few seconds, and then you get it back, and then you can do whatever you want with that file. What we've started to see in the last few months is full-blown interactive deepfakes. Okay?

So I'm going to play you a little video, which is a clip of me talking to what, I guess, we would probably call these days an AI agent. So let me just set up the stage for what's happening. So this is a commercial service. You connect through something that looks like Zoom. You can see that here.

I'm talking. This is me talking at the bottom. And in real time, the agent is listening to me, forming a response, generating a voice, and then synthesizing the video to speak back to me. I'm going to play a little clip of it for you. Just for the purposes of this demo, we are playing 20 questions.

Okay? And you're just going to hear a little snippet of it. And just understand that everything you see in the top there is fully AI generated, the text, the audio, the video in real time. Right? I'm literally just recording my screen.

Okay. Let me play this. Are they a famous actress?

Speaker 4

No. The person I'm thinking of is not a famous actress, but they are a pretty well known figure of popular culture. Keep trying.

Speaker 2

Are they a musician?

Speaker 4

No. The person I'm thinking of is not a musician. You're not even close. Try again.

Speaker 2

Are they a scientist? Getting snarky.

Speaker 4

No. The person I'm thinking of is not a scientist. Right. Think more like entertainment. Try again.

Speaker 2

Are they a musician?

Speaker 4

I already told you. They're not a musician. Percatcher. You're repeating yourself. Try something new.

Speaker 2

First of all, if you're going to get yelled at by an AI agent, they'd better have a British accent. That's amazing. By the way, I did that on purpose. I wanted to see if she had memory, and she did.

I got scolded. And by the way, I never figured it out. I was so distracted by the agent. It is amazing. This is one of half a dozen services now. You're going to get on video calls with customer support, your doctor, your lawyer, and it's going to be a full-blown AI agent.

And you saw the delay was maybe half a second or so. It's incredible. It was a little muted because of the way I recorded the screen, but it blows your mind that this is here today. I mean, it's not going anywhere. Okay.

So that's sort of the state of the art. Things are really, really good and only accelerating. Every few weeks, we see advances in image, audio, video and now agentic AI. So this train has left the station. There is no slowing down.

This is sort of our new reality, and we'll talk about what the implications are in a minute. Now, the first thing you want to ask yourself, because I get this question a lot, is: well, I look at stuff all the time. I think I can tell the difference, right? I know when I'm looking at an image that's AI generated, or a video or a voice or a person I'm talking to. You don't. You're actually quite bad at it.

The only thing you're good at is having a lot of confidence in it, but it turns out you're quite bad at it. And let me tell you how bad you are. So, in addition to developing mathematical and computational techniques that I'll be talking about, we do perceptual studies in my lab. We show people content, and we ask them to judge: real, fake, AI or not. So, this is a study that we did about 2 and a half years ago with Sophie Nightingale, who was then a postdoc in my lab, now a professor at Liverpool in the U.K.

We showed people images. Half the images were like the ones on the top. Those are all fake faces. Half of the images were real. We told them half were fake, half were real.

We sat them in a lab setting. We set them up for success. We told them this is the task you are doing, and their average accuracy was between 55 and 65 percent. Chance, if you're flipping a coin, is 50%. They're basically slightly better than chance.

Okay? So, it's over. And these images were generated 2 and a half years ago. So, this has only gotten harder at this point. With voice, this was a joint study with Emily Cooper, who's on the faculty here, and Sarah Barrington, who's a PhD student in my lab.

Same thing. You listen to voices, half real, half fake. We tell you what you're going to do, we set you up for success, we give you training, we have you listen to some examples. Chance is, again, 50 percent. Your accuracy is somewhere between 60 and 65%, slightly better than chance.

It's over. And this was about a year ago now. So this has only improved. Full-blown video. This study is in review right now.

Half the videos were real like this, half of them were AI generated with the same underlying content. So, barbershop, barbershop. Water moving in a brook, water moving in a brook. The content was exactly the same. And here again, chance is 50%. And here, you're a little bit better. You're like 62, 68 percent.

But, you know, give it 8 weeks, 12 weeks and it will be over. You're just really bad at this. I mean, sorry, that sounds really judgmental. We're bad at it. Our visual system just has not evolved to do this task well.

And we have to face that reality. And this is nothing. This is nothing compared to if you're on social media. Our visual system didn't adapt to this. The millions of years of evolution, it's not like we had pressure to tell you whether something was real or not.

This is not something we're equipped to do, and it is nothing compared to you being on social media and doom scrolling and seeing things that are emotionally charged and politically charged and partisan, where you're way worse than this. So this is our new reality: with the things that you read, see and hear online, you're having a really hard time knowing if they're real or not. So think the current conflict in Iran, think Venezuela, think Minneapolis, think every major conflict in the last few years. A lot of what you are seeing online is just not real. And that's the problem. So let's talk about the weaponization of this technology.

There are positive implications of this. No, there's no question about it. I wouldn't deny that. But there is real weaponization of this. So if you've been reading the news over the last few months, you have seen that Grok AI was allowing people to create what's called nonconsensual AI images, where they would take images primarily of women and children and nudify them, that is render them without clothes.

And it was doing that on their services and then, of course, hosting those images on X. And it is brutally bad. It is the ugly, ugly part. In fact, it's where the term deepfakes came from. It was the moniker of a Reddit user who used some of the early technology to do this.

And this is everywhere. School kids are doing this. Adults are doing this. Cyber criminals are doing this. They're using it to extort children.

They're using it to weaponize against women, extort them, embarrass and humiliate them and drive them off platforms. And it is rampant. It is rampant around the world and it is awful. It is an awful thing happening to individuals. And the companies are just profiting from this and doing nothing about it.

About 2 and a half years ago, about 20 minutes after the U.S. stock market opened on the East Coast, somebody posted a fake image of the Pentagon being bombed. And it wasn't a very good fake, by the way. It was 2 and a half years ago, so the images were just so-so.

In about 90 seconds, the stock market dropped half a trillion dollars before people figured out we weren't under attack, and it recovered. By the way, there's pretty good evidence that there were several things going on here. There was the AI image that was posted that made it look like we were being attacked. And then humans started responding to that, and then the AI trading bots responded to the humans, and there was this massive half-trillion-dollar sell-off before people figured out what was going on. And now I don't know if somebody was trying to manipulate the market, but you damn well know somebody noticed that.

And you are going to see market manipulations around earnings calls, around IPOs, around things like this. So there is a potential huge massive impact to our economy. You heard I'm a Chief Science Officer over at Get Real. I talk to executives at Fortune 500 companies almost every day. And every day I hear the same thing is that they are losing millions, tens of millions of dollars to scams that are starting to hit the enterprise, not just your parents and your grandparents, those stupid phishing scams that everybody gets.

This is one where somebody was on a call with who they thought was their chief financial officer. It was a completely AI generated person, impersonating the face and the voice. They were closing a $25,000,000 deal. They sent them the wire instructions and $25,000,000 went bye bye in about 10 seconds. This is not the first time, it is not the last time.

And by the way, for every one of these that you see in the newspaper, I can tell you there's 10 of them that you don't, because people don't want to talk about it. It's embarrassing. So the enterprise is getting attacked at a scale that they have not seen before. This is an amazing thing. The FBI has now, over the last 2 years, released 3 reports about this.

A lot of you are probably applying for jobs. And I guarantee you, all of you are doing them over Zoom, and in many cases, you're doing them right away with agentic AI. You're not even talking to a human in the beginning. And what's starting to happen is that people are scamming this process. So, for example, the North Koreans drive their entire economy with fake IT workers.

They have North Koreans applying for jobs here in the U.S., competing with you, by the way. They are masking their voices, masking their faces, masking their location. They're penetrating these companies. In many cases, they're just good workers collecting paychecks and giving it to the North Korean government.

In many cases, they're installing malware. In many cases, they're installing viruses. And in many cases, they're stealing IP.

We talked to a U.S. defense contractor who had 5 North Koreans. A U.S. defense contractor. Every single Fortune 500 company has done this. There's also these bait and switches where people will hire somebody to do an interview for them, and then that person gets the job and somebody else shows up to work.

It's a bait and switch. Nobody is connecting the dots. So the enterprise is now getting attacked on human resources. By the way, don't get any ideas. I saw somebody smiling back there. It's a good idea.

This isn't an instruction manual. This is geopolitics. In the early days of the Russian invasion of Ukraine, the mayors of Madrid, Berlin, and Vienna were on a call with who they thought was the mayor of Kyiv. It wasn't. It was the Russians, and they were extracting information about NATO movements.

You're now getting on real-time video calls. This has happened in the U.S. Members of the Senate Foreign Relations Committee, members of this White House have been on calls with Russians, and they thought they were talking to somebody else. So there are now geopolitical implications as well. So this is our reality.

Think about what I just enumerated: individuals, enterprises, societies, democracies, not to mention the insane disinformation that is spreading online around elections, around conflicts, around protests, around candidates. This is our new reality. Not having a shared reality is our new reality. We are living in alternate realities, and that is exceedingly dangerous for us as individuals, institutions, societies and democracies. So what do we do about it?

I mean, there are days where I just think, the Internet was an interesting experiment. Let's just turn it off and let's get on with our lives. But I don't think that's going to happen. And so, what we spend our time most of our time doing is developing techniques that can help news agencies, law enforcement, national security and enterprise figure out what the hell is going on in their world. And I want to just give you a sampling of how we do this.

It's not going to be, obviously, an exhaustive list of things. And one of the things that you have to understand about generative AI is that it is very, very good. You saw those images and videos. It is very good. But it actually doesn't know about the physical world.

It's purely a statistical inference engine. It's just extracting statistics from billions and billions of uploads, many of them from people in this room. If you've uploaded anything online, it's learning from you. But it doesn't know about physics, it doesn't know about geometry, it doesn't know about cameras, it doesn't know about optics, it doesn't know anything except statistics. And it turns out statistics are pretty powerful.

So, a big part of how we think about analyzing content is to look for deviations from physical reality, things that simply don't make sense physically. So let me give you a couple of examples of that. Okay. Anybody who's taken an art class knows what I'm about to tell you, if you've learned about perspective or 3-point perspective. So what I'm showing you here is an image of a flat surface, the floor, and on that floor are tiles.

And those tiles, of course, if I was looking at them the way I'm looking down right now, they would be square. But of course, when I image them, they're not square. They have these sort of trapezoidal shapes. Why? Because of linear perspective. Right?

And the reason is that when things are further away from us, whether that's a camera or this thing on my face, my eyes, things are smaller relative to their distance. Oliver in the back of the room has a head that's very small relative to this. It's not because his head is actually small. Sorry, Oliver. He's sort of my boss, too.

Can't believe I'm doing this. It's because he's far away from me, but my brain is not confused by that. My brain understands perspective projection. So if I take these parallel lines and I annotate them and extend them outwards, they intersect at a single point. It's called the vanishing point.

And that's just a physical property of the physical world and linear perspective. And beautifully, mathematically beautifully, if I even take lines on parallel surfaces that are in the physical world parallel, and I draw lines, they intersect at a common vanishing point. And that has to be true for any man made object that is both flat and has parallel lines in it. Just a really nice simple geometric property that we've known about since Renaissance painters. Now, again, generative AI is statistical in nature, and it doesn't always get these things right, but your brain doesn't notice.

So this is a case we worked on a few months ago. This was before we invaded Iran, by the way. So, this was purportedly showing an Iranian nuclear facility. I'm showing you 4 frames of it being bombed. And it was I'm not playing the video because it's a little gross.

So, I'm just going to show you the 4 frames. And we love man made objects, because man made objects are extremely regular. So, here is 1 frame of the video, and you can see that I've annotated 1, 2, 3, 4 lines, and you can see that when you extend them outwards, they don't intersect. Right? There's a physical implausibility.

And we can do a perturbation analysis to ask, well, if we're off by 1 pixel, does it matter? And so we can ask how likely this is to happen within the constraints. And then when you find these anomalies, it's game over, right? Something's wrong. Something's wrong with these things.
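The vanishing-point check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool used in the lab: the function names, the segment coordinates, the 1-pixel jitter and the number of trials are all hypothetical. Each annotated line is stored as a pair of image points; lines that are parallel on a flat surface in the scene should meet at one point, so the sketch measures how far apart their pairwise intersections are, and then repeats the measurement with the endpoints jittered by up to a pixel as a crude perturbation analysis.

```python
import itertools
import math
import random


def line_through(p, q):
    """Homogeneous line through two image points: cross product of (x, y, 1) vectors."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


def spread(segments):
    """Largest distance between any two pairwise intersections of the annotated lines."""
    lines = [line_through(p, q) for p, q in segments]
    points = [pt for l1, l2 in itertools.combinations(lines, 2)
              if (pt := intersect(l1, l2)) is not None]
    return max((math.dist(a, b) for a, b in itertools.combinations(points, 2)),
               default=0.0)


def min_spread_under_jitter(segments, jitter=1.0, trials=500, seed=0):
    """Smallest spread found when every endpoint may be off by up to `jitter` pixels."""
    rng = random.Random(seed)
    best = spread(segments)
    for _ in range(trials):
        moved = [tuple((x + rng.uniform(-jitter, jitter),
                        y + rng.uniform(-jitter, jitter)) for x, y in seg)
                 for seg in segments]
        best = min(best, spread(moved))
    return best


# Hypothetical annotations: three tile edges that all pass through (400, 100).
consistent = [((0, 300), (200, 200)), ((0, 500), (200, 300)), ((0, 700), (200, 400))]
# Same edges, but the third one has been bent so it no longer points at (400, 100).
inconsistent = [((0, 300), (200, 200)), ((0, 500), (200, 300)), ((0, 700), (200, 500))]
```

If the spread stays large even after the endpoints are allowed to move by a pixel, annotation error alone is unlikely to explain the disagreement, which is the kind of anomaly the talk describes.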

And now we can go back to the news outlet and tell them that it's not real. Okay. Second technique. This is 1 of my favorite ones, because I love shadows. I'm obsessed with them.

Even when I walk outside, I always notice where shadows are. So, if you go outside now, it's a sunny day, and stand outside, look at where your shadow is, and if it's being cast over here, you know where the sun is. It's in the other direction. Why? Because if I take a point on a shadow and connect it to the corresponding point on the object and keep going, I must intersect the light source.

That's the very definition of a shadow, right? And there's a very simple constraint in the physical 3D world. Literally, you can draw a line, and it doesn't matter if the light is infinitely far away like the sun. It doesn't matter where the light is. And it doesn't matter what the shadow is being cast onto.

Now, in the physical 3D world, I have a line that constrains 3 things: point on a shadow, point on an object, light source. Now, I image that virtual line, if you will, and it remains a line. Why? Because of linear perspective, unless I've got some weird lens distortion in my camera.

So in the image, I also have a constraint that tells me the relationship between a shadow and object and a light source. Now, if the object doesn't have that kind of distinctive feature, what do I do? I have this like potato like thing here. What do I do with that? So there's a point on the shadow here.

I know that there's a point on an object that it corresponds to, but I don't know what it is. But I know it's somewhere on that object. So here, I can draw this wedge shaped constraint that says the light must be somewhere in that region. I know that. I don't know exactly where it is, but it's somewhere in there.

Now, if I'm outdoors and I take a photo and there's the single dominant light source like the sun, what do I know? Everybody's shadow better be consistent with that dominant light source. That's the physical constraint. And if it's not, something is wrong. Something is physically implausible.
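The wedge-shaped shadow constraints described above can be sketched as a small feasibility test. This is a simplified illustration, not the analysis used in the lab: the names and coordinates are hypothetical, each wedge is assumed to be narrower than 180 degrees, and the sketch ignores the case where the light projects behind the camera. A wedge is stored as a shadow point plus two points bounding the object; the light's image position must lie inside every wedge. If the wedges share any region at all, that region has a corner at a wedge apex or at a crossing of two wedge boundaries, so testing those candidate points is enough.

```python
from itertools import combinations


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def in_wedge(p, wedge, tol=1e-9):
    """True if p lies in the wedge with apex s bounded by the rays s->a and s->b."""
    s, a, b = wedge
    u, v, w = sub(a, s), sub(b, s), sub(p, s)
    if cross(u, v) < 0:
        u, v = v, u  # order the rays counterclockwise
    return cross(u, w) >= -tol and cross(w, v) >= -tol


def line_intersection(s1, d1, s2, d2):
    """Intersection of the lines s1 + t*d1 and s2 + r*d2, or None if parallel."""
    denom = cross(d1, d2)
    if abs(denom) < 1e-12:
        return None
    t = cross(sub(s2, s1), d2) / denom
    return (s1[0] + t * d1[0], s1[1] + t * d1[1])


def feasible_light(wedges):
    """Return one image point consistent with every wedge, or None if there is none."""
    boundaries = [(s, sub(edge, s)) for s, a, b in wedges for edge in (a, b)]
    candidates = [s for s, _, _ in wedges]
    for (s1, d1), (s2, d2) in combinations(boundaries, 2):
        pt = line_intersection(s1, d1, s2, d2)
        if pt is not None:
            candidates.append(pt)
    for pt in candidates:
        if all(in_wedge(pt, w) for w in wedges):
            return pt
    return None


# Hypothetical annotations: (shadow point, object bound 1, object bound 2).
wedge_1 = ((-4, 0), (-1, 10), (1, 10))
wedge_2 = ((4, 0), (-1, 10), (1, 10))
wedge_3 = ((0, 0), (-1, -10), (1, -10))  # points the opposite way
```

With a single dominant light such as the sun, `feasible_light([wedge_1, wedge_2])` finding a point means the shadows are mutually consistent, while a `None` result for a set of wedges means no single light position can explain them.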

Okay. Alright. So let me give you an example of that. I can't show you the image I actually did this for because it was classified, but I'll give you an example of that. So this is an image of a bunch of soldiers walking down a path that I generated myself, and it looks great.

Right? You don't really see anything obvious. They've all got a shadow. In fact, this shadow is particularly beautiful here under the feet. You see all the soldiers have shadows, everything looks fine.

But if you start going in and analyzing those shadows very carefully, here's one constraint, 2 constraints, 3 constraints, and suddenly there are multiple light sources in here. And something is wrong, right? Unless they're on a different planet with more than 1 sun, which I suppose is possible, but I don't care about that, something is physically implausible. Yep. Okay.

Alright. So those are 2 image things. Let's talk about video for a little bit. And I promised Oliver I'd leave time for questions, 01:30. Okay.

I think we're good. Alright. So this was a video made by Jordan Peele, who I just adore. And it was made, god, maybe 5 years ago. So really early on.

It's not the best deepfake, but I'm gonna show it for a second to make a point. So go ahead and watch this, and then we'll talk about what it is.

Speaker 5

Now, you see, I would never say these things, at least not in a public address, but someone else would. Someone like Jordan Peele. This is a dangerous time. Moving forward, we need to be more vigilant with what we trust from the Internet.

Speaker 2

Okay. So first, he does a really good Obama. You gotta be impressed by it. So this is what's called a lip-sync deepfake. So what Jordan Peele did is he's speaking in Obama's voice, of course, but that could be AI generated today.

You couldn't have done that when he made this. And that's an actual video of President Obama talking, and that's a lip sync deepfake where they replace just his mouth to be consistent with the new audio track. Think about, by the way, the power in that. You can take anybody's video and make them say anything you want them to say, which is pretty striking. Okay?

So these things are quite common, and what's so powerful about them is the only thing that's manipulated is the mouth. Everything else is real. Right? This is very localized manipulation. So this is work of Sarah Barrington, who you can actually see right there, and Maty Bohacek, who's a student down at Stanford who works in our lab.

And this is a great story, by the way. I like telling this story to students, because this is how science is actually done. I was giving this talk, and I showed that Obama video, and there was a woman in the audience who was a lip reader for the hearing impaired. And she came up to me afterwards and said, you know, if you don't listen to the sound, the mouth is moving wrong. It's just wrong.

It's not like... if I'm lip reading, I'd see he's not saying the words I'm hearing. I'm like, oh, that's really cool, because I wouldn't have noticed that because I'm not reading lips. I'm just listening to it. And then we had this idea: well, what if the mouth is not moving properly? And that seemed like a really cool signal.

So here's what we do. We take the video, we throw away the audio and we do lip reading, automatic lip reading. This is a computational technique. We didn't invent it. It came out of the computer vision literature.

And then we take the audio and we transcribe it, audio to text. So, here, Sarah up there is saying, "I just think it's a really feel-good and excellent piece of cinema," responding to what her favorite movie is. But when we read her lips, what she's saying is, "I just had its bread roll, it's your preserve about the media." It's just complete nonsense gibberish, right? And you don't notice.

Your brain doesn't notice because you're doing sensory integration. You see the mouth... and by the way, you know this is true because if you were like me and you grew up on Sesame Street, when the puppets talk, they just sort of move their mouth like this and it looks fine. Like, it's not like we're that discriminating in how the mouth is moving. We don't care. So what we do is we simply take these 2 transcriptions. We measure the distance between them using standard mathematical techniques.

Fake ones are really different. Real ones are... oops, I got it backwards. The distance between real ones is small. The distance between fake ones is high, and we just split the baby down the middle, right? We just see these differences.

The mouth is not saying what the ear is hearing. Really, really simple technique. And you don't notice it. You just don't notice it, but we can measure it. Now, as we were playing around with these things with Obama, we started noticing things.
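The comparison of the two transcripts can be sketched as follows. The talk only says the distance is measured "using standard mathematical techniques," so the word-level edit distance used here is an assumption, as are the function names and the 0.5 threshold. The two example strings are the audio transcript and the lip-read transcript quoted above; the lip reading and speech-to-text steps themselves are assumed to have already been run by other tools.

```python
import re

# Hypothetical cut-off between "mouth matches audio" and "mouth does not match".
MISMATCH_THRESHOLD = 0.5


def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def edit_distance(a, b):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def mismatch(audio_text, lip_text):
    """Edit distance normalized to the range 0 (identical) to 1 (nothing in common)."""
    a, l = words(audio_text), words(lip_text)
    if not a and not l:
        return 0.0
    return edit_distance(a, l) / max(len(a), len(l))


def looks_lip_synced(audio_text, lip_text, threshold=MISMATCH_THRESHOLD):
    """Flag a clip whose lip-read transcript is far from its audio transcript."""
    return mismatch(audio_text, lip_text) > threshold


audio = "I just think it's a really feel good and excellent piece of cinema"
lips = "I just had its bread roll, it's your preserve about the media"
```

In practice automatic lip reading is noisy even on authentic video, so a real threshold would have to be chosen from measured distances on known real and known fake clips rather than fixed in advance.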

So, we were looking at a lot of videos of him, and we started thinking about whether we could build a more specific deepfake detector that was not for anybody talking, right, or any image or any video. What if we could start learning characteristics of individuals, people like Obama, world leaders, CEOs? Let's agree that when they say something, the implications are different than when somebody like me says something. So I'm gonna play you a series of very short clips of President Obama in the opening moments of his weekly address, his video address. See if you notice anything.

Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody.

Hi, everybody. Hi, everybody. Those are all different. And in every single video, he does the same thing. "Hi, everybody."

He just does, like, a little head bob backwards. Right? It's just like this little tic. Right? We all do it, by the way.

We all do it when we talk. We have different ones. By the way, I noticed when I talk to my brother, we have the same ones. That's a little weird, but not surprising either. But we all have these specific mannerisms.

And we thought it'd be really cool if we could learn those. And for people like Obama and CEOs and prime ministers and kings and queens, there's a lot of footage of them. It's not hard to find. And so here's an example of that. On the top is an example of an authentic Obama video.

And what I'm measuring here: the horizontal axis is time and the vertical axis is 2 things, head rotation up and down, like this, and then whether he's smiling or frowning, in orange. So those are the 2 plots. What do you notice? They're correlated. So Obama has this other tic.

When he smiles, he tilts his head up a little bit. And when he frowns, he tilts his head down a little bit, which I love. And you see it. You can see it in his videos. Now, think about the lip sync deepfake that I showed you from Jordan Peele a few minutes ago.

What's the difference there? The mouth is doing what the fake is telling it to do. The head is doing what's in the original. I've created a chimera. Right?

These things don't know about each other and they're decorrelated. Something's wrong. This is not Obama. This is not a pattern consistent with him. And so we can learn these distinct patterns for individuals by making measurements of how the eyes move, how the mouth moves, all these different, what are called action units.
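The correlation between head rotation and smiling can be illustrated with a plain Pearson correlation over two time series. This is a sketch with synthetic signals, not measured data: the function name, the signal shapes and the cut-offs are assumptions, and a real system would first extract head pose and facial action units from each video frame with a face-tracking tool.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


frames = range(100)

# Synthetic "smile" signal, standing in for a per-frame smile/frown measurement.
smile = [math.sin(t / 5) for t in frames]

# Authentic clip: head pitch follows the smile, plus a small unrelated wobble.
head_pitch_real = [0.5 * s + 0.1 * math.cos(1.7 * t) for s, t in zip(smile, frames)]

# Lip-sync fake: the mouth follows the new audio while the head comes from a
# different moment of the original footage, so the two signals are unrelated.
head_pitch_fake = [math.cos(t / 5) for t in frames]

real_corr = pearson(smile, head_pitch_real)
fake_corr = pearson(smile, head_pitch_fake)
```

A strong correlation in the authentic clip and a weak one in the manipulated clip is the pattern described in the talk; many such pairwise correlations, one per pair of facial and head measurements, could serve as a feature vector for a given speaker.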

This young man right here worked on this last summer, I think. Right? Charlie worked on this last summer for us. And then we can cluster them. So we've gotten these mannerisms from Obama, O'Rourke, Booker, Biden, you know, go down the list here.

And what's important here is that all the Obama videos are here. Everybody else is somewhere else. And then interestingly, the deepfakes are here. This is not actually a deepfake Obama detector. This is an Obama detector.

Right? I don't care if you're a deepfake Obama or Booker or Harris or Buttigieg, you're not Obama. And so we learn what's called a one-class model that just says, this person here is Obama. And it's not face biometrics and it's not voice biometrics, it's mannerisms, how you talk and how you express yourself. And we've actually been able to also introduce hand motions into this as well, because people talk with their hands in very distinct ways.
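To illustrate the one-class idea, here is a rough sketch that learns only from examples of one person and rejects everything else, whether it is another speaker or a deepfake. The talk does not say which one-class method is used; the normalized-distance rule, the `OneClassModel` name, the threshold margin and the two-number feature vectors below are all assumptions made for illustration.

```python
import math


class OneClassModel:
    """Toy one-class model: accept points close to the training examples."""

    def fit(self, samples, margin=1.5):
        n = len(samples)
        dims = len(samples[0])
        self.mean = [sum(s[i] for s in samples) / n for i in range(dims)]
        # Per-feature standard deviation, floored to avoid division by zero.
        self.std = [
            max(math.sqrt(sum((s[i] - self.mean[i]) ** 2 for s in samples) / n), 1e-6)
            for i in range(dims)
        ]
        # Accept anything within `margin` times the farthest training example.
        self.threshold = margin * max(self.distance(s) for s in samples)
        return self

    def distance(self, x):
        """Distance from the training mean, scaled by each feature's spread."""
        return math.sqrt(
            sum(((xi - m) / sd) ** 2 for xi, m, sd in zip(x, self.mean, self.std))
        )

    def is_target(self, x):
        return self.distance(x) <= self.threshold


# Hypothetical mannerism features per clip, e.g. two correlation values.
target_clips = [[0.80, 0.60], [0.85, 0.55], [0.78, 0.62], [0.82, 0.58]]
model = OneClassModel().fit(target_clips)

print(model.is_target([0.81, 0.59]))  # a new clip of the same person
print(model.is_target([0.30, 0.10]))  # a different speaker
print(model.is_target([0.05, 0.60]))  # a lip-sync fake with broken correlation
```

The model never sees a fake during training. It only answers "is this the target person or not," which matches the point that this is an Obama detector rather than a deepfake detector.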

Okay. So, since I have a couple minutes... no, I'm going to skip this. I'm going to skip this. This is weird and don't do it. And that's the short answer.

Don't ask Grok. Here, I'm going to condense this into 5 seconds. Don't ask Grok if an image or a video is real or not. It doesn't know. It's unbelievably stupid.

In fact, don't use Grok at all. Okay. So, I want to just finish with responsibility. I get this question a lot, which is: we're entering a weird time where this technology is being weaponized against us. You can't deny that.

Who's responsible? And the answer is everybody is. So, let's start with the people in the room. We are responsible. Stop, for the love of God, getting your news and information from social media.

This is not what it was designed for. It's not good at it, and you're being lied to on a regular basis. This is not a place to spend your time. In fact, just get off of social media. It's so bad for you.

Overwhelming evidence that it's bad for your mental health and physical health, and I swear to God, it drops your IQ 20 points. So we have a responsibility. Fake news works because we click on it and we share it. We are part of the problem whether we want to admit it or not. So that's on us.

Right? Now, working upstream, the social media platforms, the Facebooks, the TikToks, the YouTubes, the Instagrams, all of those platforms have taken an incredibly cavalier attitude to online safety for the last 25 years and profited handsomely for it. They have a responsibility to make their platforms safer. The AI companies that are commercializing and monetizing these tools are not doing enough to keep them safe. They are allowing their tools to create nonconsensual intimate imagery.

They're allowing it to create child sexual abuse material. They're allowing it to create fake explosions in Tehran. They're allowing it to create fake videos of presidents talking and they know that they're doing it and they're profiting from it and they're not doing enough. That's their job. And then, of course, upstream is regulators.

Our regulators have to do better. Our congressmen, our congresswomen, our presidents, our leaders have to do better. This is not a partisan statement. I'm not particularly hopeful about where we are in this country right now with leadership. This is a White House that has made clear they have no interest in regulating AI, but there is some leadership coming out of the UK, coming out of the EU, coming out of Australia.

And I'm hoping that we will eventually absorb some of that, but we need regulation at the highest levels. And here's how I know that, because when it comes to things like cars and airplanes and medicines and foods and pharmaceuticals and everything that we buy physically, we have incredibly high standards for safety. But somehow we've let Silicon Valley off the hook for 25 years. The game's over, right? There is no more online and offline world, right?

And it has real consequences when we don't regulate Silicon Valley, and we have to do better. Okay. I'm going to stop there, and I'm happy to take some questions. Thanks, everybody.

Speaker 6

Questions. That was fantastic. So my question is, how easy would it be to fix the images to overcome all of the... I mean, so it's like warfare

Speaker 2

I got it. Yeah. Great question. It's absolutely, 100% the right question, which is: I've just told you a bunch of stuff we do. Isn't the adversary going to make it better?

Okay. 2 things. First of all, I haven't told you everything we do. I'm not stupid. Okay.

Good. So we have holdbacks. We do things that we don't talk about. But now let's get to your question. It's a 2 part answer.

1 is that it's not actually that easy because this is 3D physics and 3D geometry, and these things are inherently 2D. When it's rendering these things, it doesn't know about the 3D world. So it would have to reason about the 3D world and put all of those things in, and that's very, very hard to do, number 1. Number 2, there's actually not a lot of incentive. And here's why.

Because say what you will about OpenAI and Sam Altman and Midjourney and Gemini, they're not trying to defeat me. They're not actually my adversary. This isn't like malware and ransomware and spam, where they really are adversarial. They're just trying to make pretty pictures. And if the physics are wrong and the geometry is wrong, they don't actually care.

Right? As long as your brain doesn't care, they don't care. So, there's not a lot of incentive. And even if there was, it's incredibly difficult because they have to do full-blown 3D rendering. Is it possible that we'll get there?

Sure. And that's why we have holdbacks, and that's why we do a lot of things that we don't talk about. But then, you got to get everything right. You got to get every single pixel and the statistics and the... I mean, now it's getting hard. But is it possible?

Sure. But I did a panel the other day where somebody was telling me, I don't understand why you work on these problems. You're going to get defeated. And I said, well, let me ask you this. Did you lock your front door when you left the house this morning?

He said, yes. And I'm like, well, then shut the hell up about it. Because people can pick locks, you can batter doors. We do things that give reasonable precautions. Yeah?

Have 1 follow-up. Please.

Speaker 6

So, do you imagine a world in which I could put on a pair of glasses that will immediately tell me whether what I'm looking at is real or fake?

Speaker 2

No. And here's why. Well, not in my lifetime, because we can't operate at that scale. If you think about the billions and billions and billions of uploads, to do that, I mean, forget about the glasses, even a plug-in on your browser, the scale at which that has to operate, the computational demands and the accuracy that you would demand are so unbelievably high. So I don't see this coming down to the consumer level in the near future, which is why we work with AP and Reuters, all the major news outlets, because the way we should get information is not to become investigative reporters.

We are not capable of doing that. I know a lot of really, really smart investigative reporters. We can't do it. So we rely on people who do that job really well, who talk to me. I think that's the mechanism for getting the truth.

I'll let you pick who. There's a couple, there's 1 here and 1 here. Thank you very much. That was deeply interesting. I thought you were going to say disturbing, but okay.

Speaker 7

Kind of bouncing off that question and off of your answer, this ultimately sounds kind of like a problem. I'm drawing an internal parallel to what's going on in Iran with attritable drones versus high-value systems. It seems like a very similar problem where the creation of deepfakes is easy, cheap, fast, reliable, and the detection of deepfakes is costly, expensive, and difficult. So, yeah, we can make sure that videos of Obama and Trump and Buttigieg are real. Yeah.

But if someone uploads a video of me saying something

Speaker 2

Yeah.

Speaker 7

On Twitter, no one's gonna care.

Speaker 2

Yeah. You're screwed. People who know. Yeah. Yeah.

Right? Yeah.

Speaker 7

Yeah. So is there a solution, do you think, in the near future to this attritable warfare? Yeah. Are we just screwed?

Speaker 2

We're pretty screwed. But let me try to give you a little bit more hope. So first of all, you're 100% right. Mis- and disinformation are cheap, and reliable information is expensive. And that's the reality of our world.

Right? And that's why social media is littered with false information, plus you can make a lot of money by lying online and that's what people are doing. So now let's get to your question. So everything I've talked to you about, we call passive forensics, or reactive. We wait.

So I wake up every morning, I've got a flood of email from every major news outlet, and we're just like, alright, what the hell is going on? Is this a real drone? Is this an attack? But we're all responding after the fact. Right?

And the question earlier was, can that work at scale? And the answer is not really. Right? We've got to pick our battles. So, there's a whole other effort that's called active forensics.

And the way this works is there's an effort called the C2PA, the Coalition for Content Provenance and Authenticity. And the way it works is, and this isn't true now, there's only a couple of cameras that do this, but the idea would be that when you pick up this phone to record police misconduct, a human rights violation, a drone attack, whatever, this device will authenticate for you. So, on chip, cryptographically signed, it will say, okay, it is March 13 at 1:45 p.m. in Berkeley, California, maybe even my identity, if I want to give that up.

Here's what has been recorded. I'm going to cryptographically sign all that. Maybe put that on a blockchain, on a centralized ledger, so nobody can manipulate it.

And then when I share this with the world or with a news outlet or with law enforcement, it can authenticate it. That will work at scale. And if it really does get deployed to the tune of billions of devices, then when you don't have that signature, you're like, all right, I don't trust this. So I think that could work, but it requires a phenomenal infrastructure. Now the good news is there's some effort.

So Leica has a camera, Sony has a camera, all high-end cameras, not these things that are in our pocket. So if Apple and Samsung overnight decided to do it, anybody recording anything, we'd be able to authenticate very, very quickly. Not a lot of financial incentives for them to do it, by the way. So that's sort of where the rub is, yes. But I think that there's at least a technology that I could envision that could work at, to your question, at scale.

But this standard has been around for about 5, 6 years now, and the penetration is very, very low. But I think as we get more desperate and more in need of it, I think we'll start to see an uptick.
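The capture-time signing idea in this answer can be sketched roughly as follows. This is not the C2PA format: provenance schemes of this kind rely on public-key signatures, whereas this sketch swaps in an HMAC with a shared secret purely because it is available in the Python standard library. The key, the function names and the metadata fields are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical secret held inside the capture device (stand-in for a private key).
DEVICE_KEY = b"hypothetical-device-secret"


def sign_capture(media_bytes, metadata, key=DEVICE_KEY):
    """Bind the recorded bytes and their metadata together under one signature."""
    record = {
        "media_sha256": hashlib.sha256(media_bytes).hexdigest(),
        "metadata": metadata,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_capture(media_bytes, manifest, key=DEVICE_KEY):
    """Return True only if neither the media nor the metadata was altered."""
    record = manifest["record"]
    if hashlib.sha256(media_bytes).hexdigest() != record["media_sha256"]:
        return False
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])


video = b"raw video bytes"
manifest = sign_capture(video, {"time": "March 13, 1:45 p.m.",
                                "place": "Berkeley, California"})

print(verify_capture(video, manifest))                  # untouched recording
print(verify_capture(b"edited video bytes", manifest))  # altered recording
```

The design point is that verification fails if either the pixels or the claimed time and place change after capture, so a missing or broken signature becomes a reason for distrust once enough devices sign by default.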

Speaker 8

Yeah. So in light of the development of indistinguishable AI videos, what is 1 thing, 1 big change you think is going to happen that you think people don't expect?

Speaker 2

Yeah. I think the real time... I mean, I think most people, you know, know when they go online to be a little cautious, even around video. I think people are not expecting that you're going to get a FaceTime call from what looks like your parents and it's going to be a scam. I think people aren't ready for that. So I think that, you know, this is like a whole new level of phishing scams.

It's not going to be a text message saying this is your boss, your mom, your dad, whatever. It's going to be a FaceTime call. And it's going to sound like and look like your loved one and you're not going to know. And I think people aren't ready for that. By the way, I have a really simple analog solution to this. My wife and I have code words.

Somebody calls and something is out of the ordinary, what's the code word? And that's not hypothetical. This happened to me. I was working a really sensitive case, a legal case with a lawyer, last year.

He got a phone call from my phone number in my voice talking about the case, and about 3 minutes into the call, he got suspicious, hung up, called me back, and it wasn't me. So this is happening. So now I think people are not ready for that, that real time, the call at 2 in the morning, mom, dad had been in an accident, got to send money. This is happening now. People aren't ready for that.

Speaker 8

In terms of, if I may ask another, in terms of, like, structural things, and this is quite radical, for example, non-mainstream media goes bankrupt because people can only trust the biggest ones. Do you expect 1 change structurally?

Speaker 2

Yeah. First of all, I think journalism is in trouble again. You saw what happened at the Washington Post 2 weeks ago, 300 unbelievably talented reporters got eviscerated. I have mixed feelings about this. On the one hand, I grew up at a time when there were 3 news channels, ABC, CBS and NBC.

And then the Internet came and CNN and 24/7, and the idea was that more information is better. It turns out that's actually not quite true. I think we were probably a better informed public when we had 3 newscasts. I'm not saying we should necessarily go back to that, but there's a lot of noise out there. And I'm not sure that more is better in this case.

I would rather have less, but higher quality. But I am worried about consolidation. You're seeing tech oligarchs, multi-multi-billionaires, getting on to trillionaires, swooping in and just, you know, buying everything up. That's not going to be good for us. You're too young to know this, but back in the day, news was not meant to be profitable.

It was the price that the companies paid for the airwaves. I actually think the death of journalism, or that's a strong word, but what has hurt journalism was CNN, when they said we can monetize news 24/7. I don't think news should be monetized. It's here for the public good. And I think this monetization of news and information is dangerous.

But, you know, I'm also not naive and I realize this is the way the world works.

Speaker 9

Yeah. You were talking about social media and being very concerned about anything you see there or digest from there. What about regular media? How accurate is stuff that we're seeing on networks and

Speaker 2

Good. I think that's the right question to ask. So here's what I can tell you. Everybody makes mistakes. Mainstream media makes mistakes.

The New York Times, The Post, everybody makes mistakes. But here's the difference. First of all, they're trying to get it right and you can't say that about Elon Musk and social media. People aren't necessarily trying and there's no consequence for getting it wrong every single day. So I have much, much more confidence in what I read in the large networks than I do what I see on social media because first of all, they have standards, they have ethics, they have consequences.

They have unbelievably smart people who work incredibly hard every day to bring you reliable information. Do they get it right 100% of the time? Of course not. Alright. So, what do you do?

You don't just pick 1 newspaper. You read 3 of them and you wait. You don't need to get your news in the first 30 seconds of something happening for God's sake. This isn't a race. We're not sprinting.

Take your time. And by the way, you got to read past the first paragraph. This is for the young people in the audience. You know, TikTok is not going to give you news about what's happening in Gaza or in Iran. You got to, like, you got to dig in.

This is really complicated and it's hard. And you got to reserve judgment. You got to keep your biases aside. But do I think they do better than social media? A hundred times better.

A hundred times better. I would much rather get my information from BBC, NPR, New York Times, Washington Post, Wall Street Journal than anywhere else. I feel like I'm definitely better informed about what is going on in the world. She's a journalist, by the way, just so you know.

Speaker 10

I am. I am. Hi, Hany.

Speaker 2

Thank you

Speaker 10

And thank you for this talk. I have a question about slopaganda being posted by our current administration.

Speaker 2

Yeah.

Speaker 10

So the Trump administration posts these AI-generated videos, sometimes AI-altered photos, of protesters. What impact on the consciousness of the American people do these AI-generated videos and slopaganda, institutional shitposting...

Speaker 2

Like Yeah.

Speaker 10

What impact does that have on our consciousness?

Speaker 2

I hadn't heard slopaganda. I like that. The shitposting's pretty good too. Okay. Couple of things.

1 is, with misinformation, you don't have to get people to believe your lies. That's actually not what Russian state-sponsored propaganda does necessarily. They just create noise. They create chaos. They create uncertainty.

And that's actually relatively easy to do. You don't have to believe them. Just create noise. Right? And then, the signal gets buried.

It's a really effective strategy, in fact, number 1. Number 2, what you are referring to is that this White House, and not just the Oval Office, but many of its agencies, is routinely posting fake images, fake videos. Some of it is clearly propaganda. Some of it is less obviously fake, which is disturbing, particularly coming out of Minneapolis. I think there's a couple of problems with that.

First of all, I think it just demeans the office of the White House, and it's beneath us. And I think that's a dangerous precedent to set, number 1. Number 2, I think the White House is not thinking through this very carefully, because there's going to come a day where they're going to post a video and they're going to want us to believe it. And why should I trust them if half the things they post are fake and half of them are real? They're eroding our trust in this government, which they may not care about today, but there's going to come a day where they care about this.

And I think that's incredibly dangerous. So, I am bothered by the White House and the various agencies posting this, and they are unapologetic about it, by the way. They see no problem with this. They think it's funny. They think these are memes.

But I think it's an unbelievably dangerous precedent. And so I think the thing that you have been seeing is not just an erosion in the visual record, but it's also an erosion in the trust in institutions. What this administration has done is it has demonized people like you, really smart reporters, academics, scientists, and institutions. And that's an incredibly dangerous combination, because we are the people, the journalists and the scientists and the academics, who are trying to figure this out. We are the ones with the skills.

When you erode trust in what they say, you simply dismiss us. And now, what are we left with? We're street gangs at this point, right? We're clans, right? There's no more shared reality.

There's no more trust. There's no more truth. And I think that's incredibly dangerous for a stable democracy. He's been really patient over here raising his hand.

Speaker 11

Alright, Kinky. It's very interesting.

Speaker 12

That's 1 word for it. All the prevention that you mentioned is more on the consumer side. And what I learned is that prevention on the consumer side is 2 times or 3 times harder than on the supply side. Right? What type of... if you can regulate the supply side, what

Speaker 2

Yeah. Good. First of all, it's not 2 or 3 times. It's 1000 times harder. And the reason why it's 1000 times harder: there's 8 billion people in the world.

And there's what? A couple of dozen AI companies. Right? So, I don't disagree with you. I'm all for going up the chain and finding where the bottleneck is.

So, let me give you an example of that, when it comes to combating child sexual abuse material and nonconsensual intimate imagery and that type of individual abuse. I don't want to go after the individual. You can't arrest your way out of this. I mean, I would like to if I can, but you can't. And you can't even go after the platforms.

Right? You can't go after X and Facebook, and even Apple said these are too big to fail. You know where you go? To 4 companies, Visa, MasterCard, American Express and PayPal. These things are getting monetized.

There are 4 financial institutions that, if I cut off your ability to monetize, it's over, right? And then the other 1 is ads. There's 2 ways you monetize on the Internet, credit cards and ads. So ads, easy. It's Google.

Google controls like 80% of the market. Get Google to stop putting ads on these companies and get 4 financial companies to cut them off. And I'll tell you why I know this works, because it has. Pornhub, a couple of years ago, 1 of the biggest adult content generators or hosters, was hosting child sexual abuse material. They were hosting nonconsensual intimate imagery.

Nicholas Kristof discovered this, magically, somehow (I don't know where he'd been, but okay) and wrote an op-ed about it. And he called it out. He called out Visa and Mastercard, and this is the right call. Visa, Mastercard, American Express: overnight, they terminated their relationship with Pornhub.

Pornhub had to sell the company, and that company had to completely change the way they do business. So, light switch. Right? You go upstream, you find the bottleneck. You want to go even up higher?

Cloudflare, Google, Amazon, infrastructure, Internet service providers. There's choke points. You can't exist on the Internet without the help of a huge number of infrastructures. I can cut you off. Now, there's something dangerous here.

Right? If we start deciding, you know, who gets to use these things... But this is illegal material. I don't think this is particularly controversial. So, I'm with you. Go upstream and just choke off the money.

Everything falls from there. That's the strategy. Oh, sorry, he's going up. Sorry.

Speaker 13

Hi, thank you. I wanted to ask about your strategy of analyzing personal mannerisms as a way of determining what is real and what is fake. A concern that I would think of for that would be it doesn't seem so hard to train a model just to pick up on those mannerisms themselves. And so what separates mannerisms from any other statistical pattern

Speaker 2

Yeah.

Speaker 10

That you're going to be able to pick up?

Speaker 2

It's a great question. Here's the difference. Time. Time is our advantage. When we measure mannerisms, we measure them over 10-second windows.

That's 300 frames at a normal 30 hertz frame rate. These synthesis engines work 1 to 2 frames at a time. Time is my superpower. You can't synthesize by waiting 10 seconds. Why?

Because you have to deliver it in real time. I have a huge advantage when I'm looking at a Zoom call. My adversary has to synthesize every thirtieth of a second. I can wait 10 seconds and then look to see what you did. I have a huge advantage.

I have a huge advantage. And even if it's offline, doing synthesis where you are holding in memory 300 frames and so on, it's impossible. You just can't do it today computationally. So that's my advantage with the mannerisms: time. We can close.
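The arithmetic behind this answer, 10 seconds at 30 frames per second giving 300 frames per measurement, can be sketched as a simple sliding window. The helper names and the 5-second step between windows are assumptions for illustration; the talk only specifies the 10-second window and the 30 hertz frame rate.

```python
FRAME_RATE_HZ = 30     # frame rate quoted in the talk
WINDOW_SECONDS = 10    # measurement window quoted in the talk


def frames_per_window(rate_hz=FRAME_RATE_HZ, seconds=WINDOW_SECONDS):
    """Number of frames the detector looks at in one measurement."""
    return rate_hz * seconds


def sliding_windows(frames, size, step):
    """Split a per-frame signal into overlapping windows of `size` frames."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, step)]


size = frames_per_window()           # 300 frames
clip = list(range(900))              # a hypothetical 30-second clip, one value per frame
windows = sliding_windows(clip, size, step=150)  # assumed 5-second step

print(size)
print(len(windows))
```

A detector working this way can wait for a full 300-frame window before judging, while a real-time synthesizer has to commit to each frame within a thirtieth of a second.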

Speaker 11

Thank you, Dr. Hany. I was particularly interested when you said that we seem to have lost truth and reality in the age of AI, and it relates to the philosopher Jean Baudrillard and his book, Simulacra and Simulation. He argued that in the twentieth century, signs have already replaced reality. For example, even when we read the BBC News, we don't see the reality.

We see the signs, words, representation of reality. So I think truth has long been lost. Like when we read BBC News, we don't see facts. We see facts rearranged by words and signs. And if 1 can manipulate storytelling and narratives, then 1 can manipulate what fact is.

And so the same event, for example, a strike on Gaza, can be reported with very different narratives and have very different effects. So I think AI kind of just makes the situation even worse. But it's not the first time we're experiencing the loss of truth and reality.

Speaker 2

Good. Okay. 2 things. First of all, everything I've described is part of a continuum.

It didn't come out of nowhere. When the printing press was invented, we were pushing propaganda and lies, and we did that when radio came along, and when TV came along. This is part of the continuum. What is different, what is radically different, is that if you wanted to push propaganda and lies before the Internet, you needed to have a radio station, a printing press, a publishing company, or a TV station. It was not in the hands of the average person.

And we have taken a mechanism that was in the hands of state-sponsored actors and bad actors and given it to 8 billion people in the world. The threat vector has changed radically. A 15-year-old in Macedonia can interfere with our election in the United States, with 350 million people, and that wasn't true 10 years ago or 20 years ago. And so that is a radically different threat vector, which is who can do this kind of damage, number 1. Number 2 is I fundamentally disagree that there's no truth and no facts.

I agree that there are times where things are confusing and we don't fully understand it. But there are also days when we know exactly what happened, and people don't want to believe it. A hundred and seventy-five people died in a school in Tehran. That is a fact. Right?

A Tomahawk missile blew that school up. That is a fact. Right? We can dispute why it happened and how it happened and who's responsible; that's different. But there are facts, and then there are interpretations of that and consequences to that and how we deal with that and what the root causes of that are.

But I simply reject this notion that there is no truth or fact. I think it is hard, and I think it takes hard work, and it takes effort, but we can come to an understanding of what is happening in the world in most situations, I would argue.

Speaker 1

Amazing. I think that's the last question we have time for.

Speaker 2

Okay.

Speaker 6

Thank you so much.

Speaker 2

Thank you. Take care.

Speaker 0

You've been listening to Berkeley Talks, a UC Berkeley news podcast from strategic communications at Berkeley. Follow us wherever you listen to your podcasts. You can find all of our podcast episodes with transcripts and photos on UC Berkeley News at news.berkeley.edu/podcasts.