Truth Goggles – Sorry for the Spam

The Value of a Super Villain

Dan — Wed, 25 Jul 2012 16:26:56 +0000

I may have graduated, but I still get very good advice from my mentors. The most recent came from Ethan Zuckerman: “Dan, please try not to get fired in your first month. That would be really embarrassing for everyone.” His delivery reflected a hint of genuine concern.

There are many reasons why he might have said this, but two stand out. For one thing, I had just given a presentation about NewsJack, a media manipulation platform that I created from Mozilla’s Hackasaurus with Sasha Costanza-Chock. When NewsJack was released it was immediately met with a Cease and Desist from the New York Times (note that The Times is the parent company of The Boston Globe).

It is also possible that he was inspired because I had just confessed on stage that one of my first thoughts when walking into The Globe’s headquarters was “I wonder what it would take to bring down this organization.” I’m betting it was the juxtaposition.

The Backstory

An evil newspaper editor?

During my first few days at the Globe I wanted to understand opportunities for innovation as quickly as possible, but to do that I needed to understand their resources and values. It occurred to me that if you want to identify an organization’s most valuable assets but you don’t know where to start, you should just pretend to be a super villain and plot their destruction.

Assuming you’re a competent villain, whatever you end up targeting should be important. Not only that, but the target will reflect your personal passions and expertise. Try the mental exercise yourself and share the results. I dare you.

For example, to take down a newspaper you could…

Open up their paywall (if it exists), steal their content, and make it freely visible to the world without giving them any form of recognition or compensation.
Eliminate their productivity, either by instigating a massive strike or by hiring away all of their employees.
Scare away their advertisers so they lose a significant revenue stream and can no longer pay their bills.
Destroy their infrastructure (printing presses, websites, etc), thus disabling their ability to ship product.
Corrupt their editors and slowly replace key actors with your henchmen so that the paper becomes your mouthpiece.
Buy sharks with laser beams attached to their heads.

A super villain’s master plan needs to be intricate enough to be interesting and difficult enough to be impressive. Blunt ideas like “take down their website” or “steal all their money” are a bit too obvious. It must also be simple enough for a diverse audience to understand. If nobody can figure out what you did, why it was sinister, or how it actually worked then it is hardly going to make headlines. Finally, it can’t be a series of bee stings; the evil needs to be condensed enough that it could fit in a tweet.

The Plan

My evil plan didn’t take long to imagine (given my recent work). If I were evil and wanted to destroy a newspaper I would ruin their brand’s credibility. This could be accomplished in many interesting and convoluted ways, but the “how” isn’t the point; the important question is “why?”

A media product will die miserable and alone unless it differentiates itself from the rest of the Internet. Luckily, newspapers have something that the chaff doesn’t: they have the capacity to create trustworthy information experiences. They are the ones with paid reporters asking the hard-hitting questions, they have the editors and the internal fact-checkers, and they don’t have an agenda and aren’t trying to manipulate me! Right?

You could tie yourself to a bungee cord, close your eyes, and jump off a cliff… or you could read the New York Times.*

Well, maybe. As a reader I don’t know where content comes from or how much journalism went into it. All I have is faith in their brand. I trust that the sources I read are doing their jobs. That faith didn’t come from nowhere. I might have liked what they had to say in the past, or I saw my parents reading their paper, or their brand has a strong reputation. Regardless, I am now far more likely to trust what they have to say than I am to trust, for example, what my crazy friends like to read.

Just to drive this home: given the way content is presented today I could read the exact same article on the front page of the New York Times, Fox News, or the Huffington Post and my decision to trust it would be more strongly influenced by my opinions of the publisher than by the content itself.

To drive it home a different way: hijacking a newspaper’s credibility is as simple as imitating their brand.

Save the Day

The wheels are turning and it is already out of my control! IP lawyers are powerless compared to the forces of the anonymous web! But seriously, brand is a really fragile way to differentiate on the Internet. So what’s a newspaper to do?

Take a page from Apple and redefine the way people consume content. Train your readers to expect a certain experience not just from your website, but from every source of news. Make sure that experience is either expensive or impossible for alternative sources to replicate. Newspapers need to make their readers expect proof of everything. People should feel uncomfortable trusting information without explicit, functional credibility.

Newspapers have journalists doing research, checking facts, and taking names. They have multiple people and multiple systems touching every piece of content before it gets published, so why does the product usually end up being a bunch of words with prose-based evidence?

News organizations need to make the world hold information to their standards.

Like I said earlier, it makes sense that this particular plot and solution are coming from me. I dedicated my thesis to credibility layers — interfaces that lead to credible information experiences based on more than faith and trust. There are many paths to differentiation. Some are evil, some are entertaining, and some could even change the world.

* Drawing courtesy of Lyla Duey

Truth Goggles Study Results

Dan — Thu, 07 Jun 2012 14:21:35 +0000

Last month, I ran a user study to test the effectiveness of Truth Goggles (a credibility layer/B.S. detector for the Internet). The tool attempts to remind users when it’s important to think more carefully. If you’re curious, you can check out the demo page.

Now that the study has officially concluded, the numbers have been crunched, and the thesis has been submitted, I want to share what I learned from the resulting data and feedback.

I’ll warn you upfront: All conclusions drawn here should be taken with a grain of salt. The participants were not a random sample of the Internet, and as such, the results don’t reflect the general population. I think they are quite exciting nevertheless!

The Questions

There are many ways that a tool like Truth Goggles could be considered successful. A bare minimum is that users should prefer it to the non-augmented consumption experience (you know, the kind you have normally). Another measure of success might reflect the number of claims that users explored when the tool was enabled, or maybe the quality of that exploration.

These questions are interesting, but they all require different study designs. Here is what I considered when putting the study together.

Did people use Truth Goggles? It is difficult to accurately measure the use of a tool when working with a “captive” audience (i.e., study participants). Truth Goggles does not yet contain enough facts to be regularly useful in the real world, so the study had to simulate a reading experience and present articles with known fact-checked claims. As a result, this question wasn’t explored too deeply.
Did people enjoy using Truth Goggles? This study was run online, so the only way to get direct feedback was by asking users directly. I also gave participants a chance to choose to enable or disable Truth Goggles for the final few articles after the tool had been completely exposed to them. Presumably, if they hated the interface they would have disabled it.
Were users exposed to more fact checks? In order to compare a change we must have a baseline and the ability to measure exposure. This study wasn’t quite comprehensive enough to address this directly, although I did keep track of how often users chose to view “More” information about a fact check (which took them directly to PolitiFact’s site).
Did users engage with the fact checks? To understand levels of engagement, the tool would need to keep track of what content was actually read and comprehended, as opposed to what content was simply rendered on a screen. Once again, tracking the use of the “More” button was a good indication of engagement.
How well did Truth Goggles enable critical thinking? Although critical thinking doesn’t require a change of opinion, it seems reasonable to believe that a change of opinion does indicate thought. By measuring the drift in beliefs about fact-checked claims after using Truth Goggles, it was possible to better understand the interface’s ability to facilitate updated beliefs.
Did Truth Goggles affect levels of trust in consumption experiences? This question is deeply relevant, but given the format of this study I did not attempt to measure trust in a robust way. I did give users an opportunity to comment on how they felt Truth Goggles affected their trust.

The final study design reflected aspects of each of these questions; however, “Did people enjoy using Truth Goggles,” “Did users engage with fact checks,” and “How well did Truth Goggles enable critical thinking” ended up getting the most focus.

The Preparation

Before the study began I selected, tagged, and pre-processed 10 political articles to create a pool of content that I knew would have fact-checked claims in them. For the most part, this involved going through PolitiFact, Googling the phrases, and hoping that some good articles would show up. Most of what I used was published in 2012 and came from a variety of sources with varying degrees of credibility.

I also added some tracking features to Truth Goggles in order to better understand what was clicked and explored. This meant I would know when users viewed a fact check or when they would interact with other parts of the interface. Finally, I had to create the actual study website, which added some randomization and guided participants through the process.

The Participants

The study was conducted online over the course of five days. Participants were recruited through email and Twitter. When I did the initial number crunching there were a total of 219 participants, 88 of whom completed the entire process. These numbers increased to 478 and 227, respectively, before the study officially concluded. This analysis reflects my thesis work, and only considers results from the original 88 participants who completed the entire study.

Unfortunately, the participant pool contained a disproportionate number of friends, individuals familiar with the concept of Truth Goggles, and professionals already aware of the challenges surrounding media literacy. The vast majority (about 90%) of those who actually completed the process were strong and moderate liberals. All of these biases were anticipated, but nevertheless they significantly limit the potential impact of the study.

The Process

From start to finish, the study took each participant around 20 or 30 minutes. After being shown the initial instructions, people were asked to rate 12 claims on a truth scale from 1 to 5. They only had 10 seconds per claim to answer, so this was really trying to get at a person’s gut reaction based on the information sitting in his or her head.

After the survey was completed, the treatments began. Everyone was shown a series of 10 articles which contained the previously rated claims. The first two articles were always shown with Truth Goggles disabled. The next six were presented with different Truth Goggles interfaces to help call out fact-checked phrases. For the final two articles, participants were able to choose one of the four interfaces (including “None”).
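The treatment schedule described above can be sketched in a few lines. This is only an illustration, not the study website's actual code: the function name `build_schedule` is hypothetical, and the even split of the six middle articles across the three interfaces (two each) is an assumption, since the post only says "different Truth Goggles interfaces."

```python
import random

# Interface names as they appear in the results section of this post.
INTERFACES = ["Highlight Mode", "Safe Mode", "Goggles Mode"]


def build_schedule(article_ids, rng):
    """Return a list of (article_id, treatment) pairs for one participant."""
    order = list(article_ids)
    if len(order) != 10:
        raise ValueError("the study used a pool of 10 articles")

    # Assumption: article order was randomized per participant.
    rng.shuffle(order)

    # Assumption: each interface was applied to two of the six middle articles.
    middle = INTERFACES * 2
    rng.shuffle(middle)

    # First two articles: no credibility layer. Last two: participant picks.
    treatments = ["None", "None"] + middle + ["participant's choice"] * 2
    return list(zip(order, treatments))


if __name__ == "__main__":
    for article_id, treatment in build_schedule(range(1, 11), random.Random(42)):
        print(article_id, treatment)
```

Passing in a seeded `random.Random` keeps each participant's schedule reproducible, which helps when matching tracking logs back to treatments later.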

Once the article reading ended, participants were asked to re-rate the claims from the beginning of the study. At this point, they had been exposed to explanations and context for most of them, so this time they were supposedly providing “informed” answers as opposed to gut feelings. After the second round of ratings, the study wrapped up with a short exit survey, where participants had a chance to yell at me in the comments and tell me what they thought about the experience.

The Irony

Before going any further, I want to be clear that Truth Goggles does not assume that fact-checking services are correct. To the contrary, the hope is that users will question fact checks just as much as they would question any source, and consider all evidence with scrutiny. This philosophy is problematic for evaluation, because it is difficult to measure belief accuracy without considering something to be “true.”

Lacking a better metric, the source verdicts (i.e., PolitiFact’s ratings) were used as grounding for accuracy for this analysis. This means that from an evaluation perspective, I considered interfaces to be more effective if users ended up with beliefs in line with PolitiFact’s verdicts. Since belief dissemination is not the goal of Truth Goggles, the system must eventually use more sources (e.g., Factcheck.org and Snopes) to keep users on their toes.
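To make "beliefs in line with PolitiFact's verdicts" concrete, here is a minimal sketch of one way to score it. The mapping from verdicts to the 1 to 5 scale, the direction of the scale (5 = true), and the function names are all assumptions made for illustration; the thesis may use a different metric, and the sample responses are invented.

```python
# Assumed mapping from PolitiFact verdicts onto the study's 1-5 truth scale.
VERDICT_TO_SCALE = {
    "Pants on Fire": 1,
    "False": 1,
    "Mostly False": 2,
    "Half True": 3,
    "Mostly True": 4,
    "True": 5,
}


def signed_error(rating, verdict):
    """Positive means overly trusting, negative means overly skeptical."""
    return rating - VERDICT_TO_SCALE[verdict]


def summarize(responses):
    """responses: list of (rating_before, rating_after, verdict) tuples."""
    n = len(responses)
    before = sum(abs(signed_error(b, v)) for b, _, v in responses) / n
    after = sum(abs(signed_error(a, v)) for _, a, v in responses) / n
    return {
        "mean_error_before": before,
        "mean_error_after": after,
        "overly_trusting_after": sum(1 for _, a, v in responses if signed_error(a, v) > 0),
        "overly_skeptical_after": sum(1 for _, a, v in responses if signed_error(a, v) < 0),
    }


if __name__ == "__main__":
    # Invented sample data, not results from the study.
    sample = [(2, 4, "True"), (4, 5, "Mostly False"), (3, 3, "Half True")]
    print(summarize(sample))
```

Keeping the error signed is what lets "incorrectly trusting" and "incorrectly skeptical" be counted separately, which is the distinction the results below rely on.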

The Results

In my thesis, I slice and dice the study data in more ways than I care to think about. But this isn’t my thesis, so I’m going to spare everyone a lot of pain and stick to the high-level observations.

Truth Goggles increased accuracy and decreased polarization. Participants changed their beliefs about the fact-checked claims after reading the articles, regardless of whether or not a credibility layer was rendered. But without Truth Goggles those updates resulted in more polarization and less accuracy. In particular, when Truth Goggles was disabled people tended to become overly trusting of claims that appeared in articles. With Truth Goggles active, however, beliefs became nuanced and more accurate.
When using credibility layers, people became less incorrectly skeptical but they remained just as incorrectly trusting. Truth Goggles was able to help skeptics become more trusting when trust was appropriate, but was not as effective at convincing false believers that they should become more doubtful. This means that participants who were not already overly trusting of a claim would tend to update their beliefs in a way that resulted in more accuracy when using a credibility layer. If you incorrectly believed a claim, however, you weren’t likely to correct yourself.
Normal reading caused people to become more incorrectly trusting but they remained just as incorrectly skeptical. Without a credibility layer, participants who were not already overly distrusting of a claim would tend to overly trust that claim after reading its related article. This means that if someone was highly skeptical of a claim before reading the article, they wouldn’t change their minds. But if they were more neutral or already trusting, then seeing the claim in an article would cause them to believe it more strongly.
Almost everyone enabled Truth Goggles when given a choice. Only two out of the 88 participants who completed the study chose to view their final articles without using some variation of Truth Goggles. The vast majority of participants (70%) selected “highlight mode,” the least obtrusive of the three possible interfaces. These numbers unfortunately don’t mean much because it is entirely possible that participants simply wanted to play with the tool. They could be far worse, though.
There were virtually no significant differences between the three interface types. It was no surprise that “Highlight Mode” was the most popular, since it did nothing but highlight text and didn’t bully people into clicking things. Less anticipated was the fact that “Safe Mode” and “Goggles Mode,” which force exploration, did not outperform Highlight Mode. I suspect that this was a study artifact — forced interaction was unnecessary during the study because the novelty of Truth Goggles meant people might be curious enough to click regardless of the interface — but it was interesting nonetheless.

The short version of these results is that Truth Goggles helped combat misinformation, but there is still plenty of room for improvement. There also clearly needs to be a more comprehensive, longer-term user study.

For me, the big surprise was that people were so prone to trusting content just because it appeared in an article or opinion piece. I was absolutely thrilled to see that effect get completely squelched through credibility layers. The results from the exit survey are also incredibly exciting, but that is a post for another day.

Achievement Unlocked: Thesis

Dan — Mon, 21 May 2012 06:00:21 +0000

Remind me to never do that again.

On Friday I officially handed in my thesis, titled “Truth Goggles: Automatic Incorporation of Context and Primary Source for a Critical Media Experience.” For those who don’t know already, it was about an automated bullshit detector for the Internet / an interface to help people think carefully called Truth Goggles. The final version weighed in at a nice round 145 pages.

I’ll let the dust settle before putting this monstrosity online. I also want to write some more condensed posts about the interesting parts because I know nobody is ever going to read the damn thing. Those will come later. For now I give you a few bullet points.

The Gist

Here’s the basic story of the document:

I learned about the millions and millions of reasons why my idea could never work.
Not having a strong sense of self-preservation, I kept on going anyway and tried to create “Truth Goggles!”
I worked really hard to design and implement an interface that people could value even if they didn’t trust the sources behind the tool.
I ran a user study and learned that the interfaces worked pretty well when it came to protecting people from misinformation, and that almost everyone who took the study really wants to be able to trust information again.

The Gems

I’ll give a quick preview of some lessons learned. Each of these points deserves a post of its own but since this isn’t my thesis I’m going to just put out my own observations and thoughts. The posts later will probably be more “scientific” and “explanatory” (i.e. “boring” and “less quotable”).

When people consume information they are struggling hard to maintain their identity. That’s all there is to it. There is plenty of evidence that people consume information with ideological motivations. Those motivations often cause them to accept or reject information based on how well it aligns with what they already believe. I have a theory that if you could just remind someone that there’s nothing to fear — that you aren’t trying to change who they are — you will suddenly be able to actually communicate with them.
Trying to tell people what to think is a losing battle. When the first round of press for Truth Goggles came out back in 2011 I paid attention to every single comment on every single report about the idea I could find. Lots of people liked it, but a lot of people were instantly dismissive due to concerns about bias. I heard their point, agreed with it, and realized what journalists saw ages ago: there is no way to create a universally respected system that also tells people what to think. I changed course and settled for a system that would remind people when to think instead. I think that is a better mission anyway.
Credibility breeds respect, and respect breeds open minds. Several participants in the Truth Goggles user study commented that having a credibility layer made them more willing to consider perspectives and messages that they might have normally ignored completely. Think about that for a second. It makes sense, right? It is much easier to respect what a person is saying if you can trust them. Usually “respect” and “trust” are like “chicken” and “egg”, but if you’re using something like Truth Goggles it is possible to develop trust and let the respect follow if it ends up being deserved.

This entire experience has given me a lot of hope about information online and the people who consume it. I’ve said before that credibility was the future of journalism and I’m half tempted to expand that statement to say that credibility could save the world. I’ll probably need to run a few more tests though.

As for the next steps for Truth Goggles, that is to be determined! I’m going to at least keep exploring some of the processes and technologies behind phrase detection, but once I graduate and start my fellowship at the Boston Globe in June I’ll need an explicit way to keep it alive. Stay tuned.

Look Ma, NPR!

Dan — Mon, 12 Dec 2011 03:37:54 +0000

Three weeks ago I went to a happy hour organized by the Nieman Lab. I mentioned my thesis project, Andrew Phelps said “that sounds cool, can I write about it?” and I said “sure why not!” I assumed that the post would get about as much traction as professional blog posts usually get: a few hundred eyeballs and some useful feedback.

After the article was pushed it started getting Twitter attention. Soon afterwards NPR, CBC, and The Register contacted me. I ended up with a two-minute piece on Weekend Edition, a longer interview on Day 6, a surprisingly balanced and long piece on TechCrunch, and the official title of Boffin by the crazy Brits. This was unexpected.

Trust Me: Credibility is the Future of Journalism

Dan — Sun, 11 Dec 2011 22:32:54 +0000

My colleague Matt Stempeck said it best: “Dan, I know that your life has been a tornado wrapped in a hurricane wrapped up in a whole box of tsunamis this week, but you really need to start wearing pants to work.”

It turns out only part of that quote is accurate, but you’ll never know which one for sure! This is why, before I can graduate from MIT, I have to create an automated bullshit detector. The basic premise is that we, as readers, are inherently lazy. It isn’t just that we’ll believe almost anything — remember that time in 1938 when we believed aliens were invading the planet just because someone on the radio said so? Yeah. That happened. The real problem is that we’ll often believe what we want to believe (or disbelieve what we don’t want to believe).

It’s hard to blame us. Just look at the amount of information flying around every which way. Who has time to think carefully about everything? Not me, that’s who’nt. This is why I’m working on a tool called Truth Goggles that will help hone our critical abilities; one that will help us identify pieces of information that are worth inspecting a little bit more closely before deciding how they fit into our world views.

Thesis Goggles

When I wrote “before I can graduate from MIT” earlier in this post I wasn’t lying; I have decided to pursue Truth Goggles for my thesis. I’m definitely not the first person to explore this problem space but there is a lot of room to contribute. New technology has opened up new possibilities, needs have become clearer, and there is a wide variety of possible solutions and unanswered questions just sitting around waiting to be explored.

In November I presented the idea to the Media Lab community using the following slides:

Crit Day Presentation (Truth Goggles)


The feedback I got was mixed, but what can you expect from a day called “Crit Day” which is short for “Critically Injure Pride, Hopes, and Dreams of Graduating Day.” Here were the main questions asked:

This doesn’t seem like it will scale considering PolitiFact only has a few thousand fact checked claims. Why aren’t you using the crowd to fact check?

My time at MIT will be spent focusing on the interface and user interaction rather than the generation and aggregation of source information. There are enough difficult questions surrounding the interaction layer. I don’t think it is worth complicating things further by trying to create a crowd-based journalism platform (which is essentially what crowd sourced fact checking amounts to).

Isn’t this just a mashup of technologies and data sets? How is what you are doing novel?

It’s true that I’m not inventing new algorithms. I’m applying existing algorithms in novel ways. Credibility layers aren’t robust right now, and they come with their own sets of interesting questions in terms of user experience and system design. My contribution will be to frame those questions, answer some of them, create a prototype, and test that prototype. This won’t be as trivial as just throwing more information on a screen and calling it a day; the interface has to be designed with care.

Do you expect to incorporate primary source data?

My initial prototype probably won’t pull from sources other than PolitiFact and other fact checking services, but I will definitely be thinking about ways to use other sources of data. Primary source content will eventually help with information scalability since raw footage and raw data could help computers find potentially dubious claims (and help readers make determinations about those claims).

Bullshit, This is Clearly Science Fiction

There are a lot of hard questions lurking behind corners here. In fact, most of them aren’t even trying to hide; they’re just sitting obnoxiously in the middle of the room. Some are technical, some are philosophical, but all of them need to be addressed intelligently for something like Truth Goggles to actually have a chance of working. I’ll rattle off a few of them.

Who determines the truth? Journalists? Experts? Crowds? Individuals? Algorithms?
Sometimes there is a right answer and sometimes there is room for debate. Can you tell which is which? How do you reflect the difference?
How does the tool account for bias in sources?
How does the tool account for bias in users?
Will the system actually know enough to be regularly useful?
This could easily just make consumers more lazy, how do you prevent that?
What happens when the tool is wrong?
How will this change the way people produce content?
Where do Journalists fit into the picture?

As I’ve pondered these questions I’ve come to the following absolute conclusion: Credibility layers need to empower critical ability. I’ve also decided that it’s OK for the system to make mistakes but it is never allowed to lie. This means the interface should be less focused on telling the reader what to think and much more focused on reminding (and helping) the reader to think at times when thinking is most important.

I’ve also come up with a list of weaker claims to throw out there for discussion:

Credibility layers don’t have to speak to everyone, but they need to empower the open minded.
Journalists are our best bet for deep analysis and identifying truth that requires lots of time and effort (e.g. investigation and concept synthesis).
Algorithms are our best bet for identifying contextual evidence (e.g. data, trends, and sources of sound bites).
Mobs can’t be trusted to decide what is true and false, but they are the key to figuring out what is worth thinking about.

Over the coming months I’ll be cranking out interfaces, prototypes, and eventually some good old-fashioned boring academic papers about this idea. In the meantime, if you’re interested in Truth Goggles I’ll be trying to post updates as regularly as possible on my blog, on Twitter (@slifty), and eventually on the newly registered truthgoggl.es.

Learning Lab Final Project: ATTN-SPAN

Dan — Tue, 09 Aug 2011 11:36:50 +0000

Part 1: Introduction

ATTN-SPAN Intro.

Part 2: Prototype and Development Plan

The Good News: I created a proof of concept prototype of the ATTN-SPAN platform powered by the Metavid project.

The Bad News: Metavid is having a lot of stability issues right now, so you probably won’t be able to use my prototype. I made a screencast just in case.

Relying on a 3rd party for the most important aspect of an application is a major risk; one that I must mitigate. This brings me to my first batch of design work: the content scraper.

Scraping, Slicing, and Scrubbing C-SPAN

How do you get from a TV channel to a rich video archive and how do you get there automatically? The goal is to convert C-SPAN into a series of overlapping video segments that are identified in terms of state, politician, topic, party, action, and legislative item. Some of this is straightforward and some of it might be impossible, but here’s an overview of the planned nuts and bolts:

DirecTV offers TV content in a format that is easy to record digitally and VLC is a free tool that can do that recording. Combine the two and we can download C-SPAN streams into individual files that are primed and ready for analysis.
Once a video file is in our clutches we can use VLC once again to separate out the video from the Closed Captioning transcript.
Now we have a transcript and a raw video file. Next we register all of this information (in a database) so that we can look it all up later, and then convert the video file into streaming-friendly formats and store it alongside the original recording.
C-SPAN consistently shows a graphic on the bottom of the screen that says who is talking, their state, their party, and what is being debated. By using a technique called Optical Character Recognition (OCR) we can pull this text out of the video image. Once pulled, we can add that to our database so that we can access all of this information for any moment in the video.
At this point we have most of the information we need, but there is still room for fine tuning. We can use audio levels and the closed captioning transcripts to try to identify moments of inactivity, normal dialogue, and heated dialogue.
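As a rough illustration of the bookkeeping step that follows OCR, the sketch below collapses per-frame readings of the on-screen graphic into contiguous segments. It is not the ATTN-SPAN implementation: the `Segment` fields, the `build_segments` name, and the sample readings are hypothetical, and it assumes the OCR output has already been parsed into clean fields. It produces back-to-back segments only; overlapping views (by topic, by speaker, and so on) would have to be derived from these.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    speaker: str
    state: str
    party: str
    topic: str


def build_segments(readings):
    """Collapse time-sorted OCR readings into contiguous segments.

    readings: list of (timestamp, speaker, state, party, topic) tuples.
    """
    segments = []
    for ts, speaker, state, party, topic in readings:
        last = segments[-1] if segments else None
        same = last is not None and (
            last.speaker, last.state, last.party, last.topic
        ) == (speaker, state, party, topic)
        if same:
            last.end = ts  # the graphic has not changed; extend the segment
        else:
            if last is not None:
                last.end = ts  # the previous segment ends where this one begins
            segments.append(Segment(ts, ts, speaker, state, party, topic))
    return segments


if __name__ == "__main__":
    # Invented readings, sampled every five seconds.
    sample = [
        (0.0, "Speaker A", "OH", "R", "Debt ceiling"),
        (5.0, "Speaker A", "OH", "R", "Debt ceiling"),
        (10.0, "Speaker B", "CA", "D", "Debt ceiling"),
        (15.0, "Speaker B", "CA", "D", "Debt ceiling"),
    ]
    for segment in build_segments(sample):
        print(segment)
```

Real OCR output is noisy, so a production version would likely need fuzzy matching before two readings count as "the same" graphic.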

These steps are enough to split up and categorize C-SPAN footage into an organized video database, but there are still more ways to flag special moments in the footage. For example, we may want to identify changes in speaker emotion in order to give our algorithms the ability to craft more engaging episodes. This is possible through the work of the Affective Computing group at the MIT Media Lab, a group which has developed several tools that perform emotional analysis using facial recognition.

We may also want to identify specific legislative action (e.g. “calling a vote”). This could be accomplished by looking for key words in the transcript (e.g. “call a vote”) and possibly through common patterns in the audio signal (maybe there are identifiable sounds, such as a gavel hitting the table). Both of these concepts require additional research.
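The transcript half of that idea could start as simply as the sketch below, which scans timestamped caption lines for a key phrase. The pattern, the `ACTION_PATTERNS` table, and the sample captions are assumptions made for illustration; which phrases actually signal a vote in C-SPAN captions is exactly the research question mentioned above.

```python
import re

# Hypothetical phrase table; only "call a vote" comes from the text above.
ACTION_PATTERNS = {
    "calling a vote": re.compile(
        r"\bcall(?:s|ed|ing)?\s+(?:for\s+)?a\s+vote\b", re.IGNORECASE
    ),
}


def find_actions(captions):
    """Return (timestamp, action) pairs for caption lines matching a pattern.

    captions: list of (timestamp_seconds, text) tuples.
    """
    hits = []
    for ts, text in captions:
        for action, pattern in ACTION_PATTERNS.items():
            if pattern.search(text):
                hits.append((ts, action))
    return hits


if __name__ == "__main__":
    # Invented caption lines in the all-caps style closed captions often use.
    sample = [
        (12.0, "I THANK THE GENTLEMAN FROM OHIO."),
        (47.5, "MR. SPEAKER, I CALL A VOTE ON THE AMENDMENT."),
    ]
    print(find_actions(sample))
```

One known gap: a phrase split across two caption lines would be missed, so lines may need to be joined into a rolling window before matching.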

Creating a Profile and Constructing an Episode

If video events are the building blocks then viewer interests are the glue. The creation of a personalized episode requires two things: A user account, and a context. The user account provides general information like where you live, what issues you have identified as important, and (if you are willing to connect with Twitter or Facebook) what issues your circles have been discussing lately.

The context comes from time and cyberspace. Every night, after Congress closes its gates, your profile is used to create a short, rich video experience designed to contain as much relevant content from that day as possible. At this point you might get an email begging you to watch, or maybe you log in on your own because you are addicted to badges and points and you want as much ATTN-SPAN karma as you can get.
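One simple way to think about "as much relevant content as possible" is a greedy fill: score each clip against the profile, then pack the best ones into a fixed running time. The sketch below assumes that approach; the scoring weights, the `Clip` fields, the profile shape, and the sample data are all hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    duration: int  # seconds
    state: str
    topics: set


def score(clip, profile):
    """Assumed relevance score: home-state speakers count double a topic match."""
    points = 2.0 if clip.state == profile["state"] else 0.0
    points += len(clip.topics & profile["topics"])
    return points


def build_episode(clips, profile, max_seconds):
    """Greedily pick the highest-scoring clips that fit in the time budget."""
    ranked = sorted(clips, key=lambda c: score(c, profile), reverse=True)
    episode, used = [], 0
    for clip in ranked:
        if score(clip, profile) <= 0:
            continue  # nothing ties this clip to the viewer
        if used + clip.duration <= max_seconds:
            episode.append(clip)
            used += clip.duration
    return episode


if __name__ == "__main__":
    # Invented clips and profile.
    clips = [
        Clip("a", 90, "MA", {"debt"}),
        Clip("b", 120, "TX", {"energy"}),
        Clip("c", 60, "OH", {"debt", "health"}),
        Clip("d", 200, "MA", {"health"}),
    ]
    profile = {"state": "MA", "topics": {"debt", "health"}}
    print([clip.clip_id for clip in build_episode(clips, profile, 300)])
```

Greedy packing is not optimal (a knapsack solver could squeeze in more), but it is easy to reason about and cheap enough to run nightly for every profile.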

There is another way to access this content though, and that is through the web sites you visit anyway. Imagine if you could read an article about the National Debt on the New York Times (or in a chain email) and actually see quotes from your own senators in the report. What if you could supplement the national report with a video widget that lets you browse what your house members had to say when they controlled the floor during the debt debates?

From a technical perspective this isn’t that far-fetched. Truth Goggles, one of my other projects, is a bookmarklet that will analyze the web page you are viewing, fact check it, and rewrite the content to highlight truths and lies. This impossible feat is fairly similar to what I’m proposing here.

Adding Rich Information

Once an episode is pieced together we can look up the information surrounding the video to know who is talking and what they are talking about. What else can be added and how do we get it? Existing APIs offer some good options:

Contact Information – Thanks to the Sunlight Labs Congress API it is possible to get the contact information for any member of congress on the fly. Thanks to VOIP services it is possible to create web-based hooks to call those people with the click of a button.
Campaign Contributions – The New York Times offers a Campaign Finance API which can help you understand where the person on screen gets his or her money.
Voting Records – The New York Times also offers a Congress API that will make it possible to know vote outcomes from related bills as well as information about the active speaker’s voting records.
Truth and Lie Identification – My Truth Goggles project can be easily adapted to work with snippets from video transcripts. This will allow ATTN-SPAN to take advantage of fact checking services like PolitiFact and NewsTrust.

This is a good start, but I would also like to show links to related news coverage and create socially driven events based on community sentiment (for instance to track moments that caused people to get upset or happy). This won’t come for free, but it should be accessible given the right interface design.

Part 3: A Note to the Newsies

So that’s the idea and the plan. What’s the value?

It seems plausible that ATTN-SPAN, a system that analyzes primary source footage and pulls out any content that is related to a particular beat, could be useful as a reporter’s tool, but what about your subscribers? ATTN-SPAN can augment an individual article so that it hits everybody close to home. Suddenly one article becomes as effective as two dozen. Moving past text, for larger organizations with a significant amount of video footage ATTN-SPAN can be tweaked to use your programming instead of (or in addition to) C-SPAN.

At this point I have to warn you that this is not the first nor will it be the last project to work with C-SPAN. A 2003 demo out of the Media Lab used C-SPAN as one of several sources of information in a platform aimed to provide citizens with Total Government Awareness. Metavid, the platform I used in my initial prototype, already makes C-SPAN more accessible by enabling searches and filters. The list surely goes on.

So why is this a more powerful project? Well, the real goal of ATTN-SPAN isn’t to get more people watching C-SPAN. In fact I tricked you: this project isn’t about government awareness at all. It’s actually part of an effort to make indisputable fact (“blunt reality” and “primary source footage”) a more prominent part of the media experience without requiring additional effort from the audience. Newsrooms do an amazing job of reporting events and providing insight, but for deeper stories there simply isn’t enough time or money to cover everybody’s niche without going beyond the average person’s attention span.

Thus ends my pitch.

The code for both prototypes mentioned in this post can be found on GitHub: ATTN-SPAN and Truth Goggles. Please forgive any dirty hacks. I would be thrilled if anybody wants to offer suggestions or even collaborate. On that note, please get in touch on Twitter @slifty.

Introducing Truth Goggles

Dan — Mon, 01 Aug 2011 20:21:56 +0000

I’m working on a magical button. This button, when pressed, will tell you (an average person who just wants to know what is happening in this crazy world) what is true and what is false on the web site you are viewing. I have a fair amount of the platform finished already and you can check it out here. Be warned: Right now it only knows one fact. I’m workin’ on it! Reading a news article? Click the button and see how much you can trust it. Reading an email from Uncle Jim saying that the sky is falling? Not so fast Uncle Jim! Oh wait… no nevermind it turns out he’s right this time.

Anyway, I wanted to explain a bit about how this all works, which will in turn help me organize my thoughts on what the next steps are going to be. First, some important terminology:

A Claim is a general statement that is intended to be factual but, in reality, could use a bit of fact-checking by a third party (i.e. it is not trivially true).

A Snippet is an instance of a claim — it is the place where a claim is referenced, for example, a newspaper article or a tweet.

A Verdict is the truth of a claim — this is determined by fact checking organizations who have spent a lot of time looking at the big picture and coming to a logical conclusion.

For instance, if the statement “The U.S. government calculates inflation without adding in the price of food and energy” is a claim, then this would be an example of a snippet (With the context being the entire snippet, and the content being “the government removed food and energy prices from its measure of inflation to hide rising prices”):

While advising his Fox News viewers to talk about inflation at their Thanksgiving dinners, Glenn Beck falsely claimed that the government removed food and energy prices from its measure of inflation to hide rising prices, that a survey showed economists are “worried” about inflation, and that Social Security recipients are not receiving a cost-of-living adjustment because the government “changed the calculation.”

Want to see it in action? ~~Try clicking this link: Apply Truth Goggles~~ (Be warned, it doesn’t work ~~in Internet Explorer~~ at all right now)

Each claim will have many snippets, but each snippet will have one claim. A snippet has context (for instance the entire tweet) and content (the portion of the tweet that is a paraphrase of the claim).

So where does a Claim come from, and how does it get associated with snippets?

The Birth of a Claim

Claims are pulled from fact checking services such as NewsTrust and PolitiFact. This gets me:

The claim’s text
The claim’s verdict (true, mostly true, mostly false, false, under evaluation)
The URL for more information about the claim’s verdict (If you want to know WHY something is true or false)
Additional information links (sites that provide information about the claim)
Additional context (descriptions, words, tags, etc. which will allow us to understand what the claim concerns)
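To make the terminology concrete, the claim fields listed above and the claim-to-snippet relationship could be modeled roughly as follows. This is a sketch, not the actual Truth Goggles schema: the class and field names are assumptions, and only the five verdict values come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    TRUE = "true"
    MOSTLY_TRUE = "mostly true"
    MOSTLY_FALSE = "mostly false"
    FALSE = "false"
    UNDER_EVALUATION = "under evaluation"


@dataclass
class Claim:
    text: str
    verdict: Verdict
    verdict_url: str                                # explains WHY the verdict was reached
    info_links: list = field(default_factory=list)  # sites with more about the claim
    tags: list = field(default_factory=list)        # descriptions, words, tags, etc.


@dataclass
class Snippet:
    claim: Claim   # each snippet belongs to exactly one claim
    context: str   # the full passage, e.g. the entire tweet
    content: str   # the part of the passage that paraphrases the claim


# Usage with the inflation example. The verdict and URL are placeholders,
# not the real fact-checking result.
inflation = Claim(
    text="The U.S. government calculates inflation without adding in "
         "the price of food and energy",
    verdict=Verdict.UNDER_EVALUATION,
    verdict_url="https://example.org/placeholder-verdict",
)
beck = Snippet(
    claim=inflation,
    context="Glenn Beck falsely claimed that the government removed food "
            "and energy prices from its measure of inflation to hide rising prices",
    content="the government removed food and energy prices from its "
            "measure of inflation to hide rising prices",
)
```

Keeping the reference on the snippet side means a claim can pick up any number of snippets without its own record changing.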

At this point we have the claim and a bunch of information surrounding the claim. So how does a claim become linked to a snippet? And how do snippets get identified on a web page? These are the difficult questions for this project and they have a few possible solutions.

Creating snippets

First off, a snippet is automatically created for each claim — the automatically generated snippet is simply one that has the claim as both the content and the context. (i.e. the claim “Cows turn purple once every three years at midnight” would have a snippet of the exact same text, so that if anyone directly wrote that snippet it would be properly identified).

That’s great, but there might be snippets that reference the claim without using any of the same words. For instance there might be a paragraph on color changing cows with the sentence “bovines turn violet about three times a decade” or the even more linguistically convoluted “Unlike dogs, cows are known to change their color.” Both of those represent new snippets that are related to the cow claim. How do we link them up?

There are three basic flavors of answer: automatically, by hand, or a mix of the two (semiautomatically). Going by hand is not ideal because it relies on the end user being willing to spend a lot of time digging through claims and snippets and connecting the dots. Going automatically would be wonderful, but it kind of requires a computer to be able to understand language, and as you might expect, this is not an easy problem. This is where the hybrid becomes attractive. For a given proposed snippet the computer can do its best to identify any claims that it thinks really might be related. Then it can ask the user to help out if they are willing. Then, if the user says so, the snippet can be associated with the claim and down the line it will know for sure.
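The hybrid loop might look something like the Python sketch below. Everything here is an assumption for illustration: word overlap (Jaccard similarity) stands in for real language understanding, the 0.2 threshold is arbitrary, and `confirm` is a hypothetical callback representing whatever interface asks the user.

```python
import re


def tokens(text):
    """Lowercase word set; a crude stand-in for understanding language."""
    return set(re.findall(r"[a-z']+", text.lower()))


def suggest_claims(snippet_text, claims, threshold=0.2):
    """Rank claim texts by Jaccard word overlap with the proposed snippet."""
    snippet_words = tokens(snippet_text)
    suggestions = []
    for claim in claims:
        claim_words = tokens(claim)
        union = snippet_words | claim_words
        score = len(snippet_words & claim_words) / len(union) if union else 0.0
        if score >= threshold:
            suggestions.append((score, claim))
    return [claim for score, claim in sorted(suggestions, reverse=True)]


def link_with_confirmation(snippet_text, claims, confirm, links):
    """Ask the user about each suggestion and remember the first confirmed link."""
    if snippet_text in links:
        return links[snippet_text]  # already confirmed earlier, no need to ask
    for claim in suggest_claims(snippet_text, claims):
        if confirm(snippet_text, claim):
            links[snippet_text] = claim
            return claim
    return None
```

Plain word overlap is far too weak for the harder cases. A sentence like “bovines turn violet about three times a decade” shares almost no words with the cow claim, so it falls below the threshold and is never shown to the user. A real version would need synonyms, stemming, or something smarter in place of `tokens`.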

Associating snippets to content

Once we have a database of claims and snippets we want to be able to associate them with the web content that a user sends in for analysis (i.e. when you click your truth goggles button and the server runs through the content looking for snippets so that it can highlight claims). I have two choices here: I can either do the simple and reliable method of looking for perfect matches (i.e. a snippet has to perfectly match the text) or I can try to be a little more clever. In this case I am going the less clever route, because the whole point of the snippet *creation* process is to cover the cases where there is text that is really close to a snippet.
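The perfect-match route is simple enough to sketch in a few lines of Python. The function name and the shape of the `snippets` mapping are assumptions for this example, not the server's actual code.

```python
def find_snippets(page_text, snippets):
    """Return (start, end, claim) for every exact occurrence of a known snippet.

    `snippets` maps snippet content to the claim it represents.
    """
    matches = []
    for content, claim in snippets.items():
        if not content:
            continue  # an empty snippet would match everywhere
        start = page_text.find(content)
        while start != -1:
            matches.append((start, start + len(content), claim))
            start = page_text.find(content, start + 1)
    return sorted(matches)
```

The returned character offsets are what a highlighter would need to wrap the matching text. The match is deliberately strict: a change in capitalization or spacing is enough to miss, and those near misses are left to the snippet creation process.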

The next steps are:

Designing and implementing the snippet creation process
Designing an interface to present the verdicts and claim information in more detail
Writing scripts to update the claims with the latest results from fact checking databases

The long term ways that this project could be expanded are:

Incorporating social media to aid in claim and verdict mining
Adding in the ability to view news through the lens of RELATIVE truths rather than just the attempted absolute truth. For instance, what would a superliberal democrat believe? What would a tea party member believe? What do people from Ohio think?

And with that, it’s time to continue hacking away!