Learning Lab Final Project: ATTN-SPAN

Part 1: Introduction


Part 2: Prototype and Development Plan

The Good News: I created a proof of concept prototype of the ATTN-SPAN platform powered by the Metavid project.

The Bad News: Metavid is having serious stability issues right now, so you probably won’t be able to use my prototype. I made a screencast just in case.

Relying on a third party for the most important aspect of an application is a major risk, and one that I must mitigate. This brings me to my first batch of design work: the content scraper.

Scraping, Slicing, and Scrubbing C-SPAN

How do you get from a TV channel to a rich video archive, and how do you get there automatically? The goal is to convert C-SPAN into a series of overlapping video segments that are identified in terms of state, politician, topic, party, action, and legislative item. Some of this is straightforward and some of it might be impossible, but here’s an overview of the planned nuts and bolts:

  1. DirecTV offers TV content in a format that is easy to record digitally and VLC is a free tool that can do that recording. Combine the two and we can download C-SPAN streams into individual files that are primed and ready for analysis.
  2. Once a video file is in our clutches we can use VLC once again to separate out the video from the Closed Captioning transcript.
  3. Now we have a transcript and a raw video file. Next we register all of this information (in a database) so that we can look it all up later, and then convert the video file into streaming-friendly formats and store it alongside the original recording.
  4. C-SPAN consistently shows a graphic on the bottom of the screen that says who is talking, their state, their party, and what is being debated. By using a technique called Optical Character Recognition (OCR) we can pull this text out of the video image. Once pulled, we can add that to our database so that we can access all of this information for any moment in the video.
  5. At this point we have most of the information we need, but there is still room for fine tuning. We can use audio levels and the closed captioning transcripts to try to identify moments of inactivity, normal dialogue, and heated dialogue.
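To make step 4 concrete, here is a minimal sketch of what the OCR post-processing might look like: parsing the on-screen graphic text into structured speaker fields. The chyron format shown (e.g. “REP. JANE SMITH (D-CA)”) is an assumption; the real C-SPAN graphic varies, so the pattern would need tuning against actual frames.

```python
import re
from typing import Optional

# Assumed chyron format, e.g. "REP. JANE SMITH (D-CA)" -- illustrative
# only; real C-SPAN lower-thirds vary and OCR output is noisy.
CHYRON = re.compile(
    r"(?P<title>REP\.|SEN\.)\s+(?P<name>[A-Z][A-Z .'-]+?)\s+"
    r"\((?P<party>[DRI])-(?P<state>[A-Z]{2})\)"
)

def parse_chyron(ocr_text: str) -> Optional[dict]:
    """Pull speaker metadata out of one OCR'd lower-third string."""
    match = CHYRON.search(ocr_text.upper())
    if not match:
        return None  # graphic absent, or OCR too noisy to parse
    fields = match.groupdict()
    fields["chamber"] = "house" if fields["title"] == "REP." else "senate"
    return fields

print(parse_chyron("Rep. Jane Smith (D-CA)"))
```

Every frame that parses successfully becomes a timestamped database row, which is what makes “who is talking at minute 34” a cheap lookup later.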

These steps are enough to split up and categorize C-SPAN footage into an organized video database, but there are still more ways to flag special moments in the footage. For example, we may want to identify changes in speaker emotion in order to give our algorithms the ability to craft more engaging episodes. This is possible through the work of the Affective Computing group at the MIT Media Lab, which has developed several tools that perform emotional analysis using facial recognition.

We may also want to identify specific legislative action (e.g. “calling a vote”). This could be accomplished by looking for key words in the transcript (e.g. “call a vote”) and possibly through common patterns in the audio signal (maybe there are identifiable sounds, such as a gavel hitting the table). Both of these concepts require additional research.
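As a first pass, the transcript side of this could be as simple as scanning timestamped caption lines for action phrases. The phrase list and the transcript representation below are assumptions for the sake of illustration; a real list would come from studying actual floor-procedure language.

```python
# Illustrative action phrases -- the mapping from phrase to event tag
# is an assumption, not a studied list of floor-procedure language.
ACTION_PHRASES = {
    "call a vote": "vote_called",
    "the yeas and nays": "roll_call",
    "i yield the floor": "yield",
}

def find_actions(transcript):
    """Scan (seconds, caption) pairs for legislative action phrases.

    Returns a list of (seconds, action_tag) events.
    """
    events = []
    for seconds, caption in transcript:
        text = caption.lower()
        for phrase, tag in ACTION_PHRASES.items():
            if phrase in text:
                events.append((seconds, tag))
    return events

captions = [
    (12.0, "The gentleman from Ohio is recognized."),
    (95.5, "Mr. Speaker, I call a vote on the amendment."),
]
print(find_actions(captions))  # -> [(95.5, 'vote_called')]
```

The audio-signal side (gavel detection and the like) would need actual signal-processing research before it could be sketched honestly.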

Creating a Profile and Constructing an Episode

If video events are the building blocks then viewer interests are the glue. The creation of a personalized episode requires two things: a user account and a context. The user account provides general information like where you live, what issues you have identified as important, and (if you are willing to connect with Twitter or Facebook) what issues your circles have been discussing lately.

The context comes from time and cyberspace. Every night, after Congress closes its gates, your profile is used to create a short, rich video experience designed to contain as much relevant content from that day as possible. At this point you might get an email begging you to watch, or maybe you log in on your own because you are addicted to badges and points and you want as much ATTN-SPAN karma as you can get.
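One plausible way to assemble that nightly episode: score each video segment against the profile, then greedily fill a time budget and play the winners back in broadcast order. Every field name here is an assumption about what the database rows and profiles would contain; this is a sketch, not the algorithm.

```python
def score_segment(segment, profile):
    """Rough relevance score: home-state speakers and tracked issues win."""
    score = 0.0
    if segment["state"] == profile["state"]:
        score += 2.0  # your own delegation matters most
    score += len(set(segment["topics"]) & set(profile["issues"]))
    return score

def build_episode(segments, profile, budget_seconds=300):
    """Greedily pick the highest-scoring segments that fit the budget."""
    ranked = sorted(segments, key=lambda s: score_segment(s, profile),
                    reverse=True)
    episode, used = [], 0
    for seg in ranked:
        if used + seg["duration"] <= budget_seconds:
            episode.append(seg)
            used += seg["duration"]
    # Play back in broadcast order, not score order.
    return sorted(episode, key=lambda s: s["start"])

profile = {"state": "MA", "issues": ["national debt"]}
segments = [
    {"start": 0, "duration": 120, "state": "MA", "topics": ["national debt"]},
    {"start": 200, "duration": 400, "state": "TX", "topics": ["farming"]},
    {"start": 700, "duration": 90, "state": "OH", "topics": ["national debt"]},
]
print([s["start"] for s in build_episode(segments, profile)])  # -> [0, 700]
```

Greedy selection is crude (it ignores narrative flow entirely), but it gives the “as much relevant content as possible” behavior a starting point to iterate from.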

There is another way to access this content though, and that is through the websites you visit anyway. Imagine if you could read an article about the National Debt on the New York Times (or in a chain email) and actually see quotes from your own senators in the report. What if you could supplement the national report with a video widget that lets you browse what your House members had to say when they controlled the floor during the debt debates?

From a technical perspective this isn’t that far fetched. Truth Goggles, one of my other projects, is a bookmarklet that will analyze the web page you are viewing, fact check it, and rewrite the content to highlight truths and lies. That seemingly impossible feat is fairly similar to what I’m proposing here.

Adding Rich Information

Once an episode is pieced together we can look up the information surrounding the video to know who is talking and what they are talking about. What else can be added and how do we get it? Existing APIs offer some good options:

  • Contact Information – Thanks to the Sunlight Labs Congress API it is possible to get the contact information for any member of congress on the fly. Thanks to VOIP services it is possible to create web-based hooks to call those people with the click of a button.
  • Campaign Contributions – The New York Times offers a Campaign Finance API which can help you understand where the person on screen gets his or her money.
  • Voting Records – The New York Times also offers a Congress API that will make it possible to know vote outcomes from related bills as well as information about the active speaker’s voting records.
  • Truth and Lie Identification – My Truth Goggles project can be easily adapted to work with snippets from video transcripts. This will allow ATTN-SPAN to take advantage of fact checking services like PolitiFact and NewsTrust.
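To give a flavor of the enrichment step, here is a sketch of a contact-information lookup. The base URL and parameter names mimic the Sunlight Labs Congress API but are assumptions that would need checking against its documentation, and the sample response is hand-written to show the expected shape rather than fetched live.

```python
from urllib.parse import urlencode

# Illustrative only: endpoint and parameter names are assumptions
# modeled on the Sunlight Labs Congress API, not verified against it.
BASE = "https://congress.api.sunlightfoundation.com/legislators"

def legislator_lookup_url(api_key, state, chamber):
    """Build a lookup URL for members of congress from one state."""
    query = urlencode({"apikey": api_key, "state": state, "chamber": chamber})
    return f"{BASE}?{query}"

def extract_contacts(response_json):
    """Reduce an API response to the fields an episode overlay needs."""
    return [
        {"name": r["last_name"], "phone": r["phone"]}
        for r in response_json.get("results", [])
    ]

# A real fetch (e.g. urllib.request) would go here; this sample shows
# the response shape we'd expect, hand-written for illustration.
sample = {"results": [{"last_name": "Smith", "phone": "202-555-0100"}]}
print(extract_contacts(sample))  # -> [{'name': 'Smith', 'phone': '202-555-0100'}]
```

The phone number extracted here is exactly what the VOIP click-to-call hook would consume.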

This is a good start, but I would also like to show links to related news coverage and create socially driven events based on community sentiment (for instance to track moments that caused people to get upset or happy). This won’t come for free, but it should be accessible given the right interface design.

Part 3: A Note to the Newsies

So that’s the idea and the plan. What’s the value?

It seems plausible that ATTN-SPAN, a system that analyzes primary source footage and pulls out any content related to a particular beat, could be useful as a reporter’s tool, but what about your subscribers? ATTN-SPAN can augment an individual article so that it hits everybody close to home. Suddenly one article becomes as effective as two dozen. Moving past text, for larger organizations with a significant amount of video footage, ATTN-SPAN can be tweaked to use your programming instead of (or in addition to) C-SPAN.

At this point I have to warn you that this is not the first nor will it be the last project to work with C-SPAN. A 2003 demo out of the Media Lab used C-SPAN as one of several sources of information in a platform aimed at providing citizens with Total Government Awareness. Metavid, the platform I used in my initial prototype, already makes C-SPAN more accessible by enabling searches and filters. The list surely goes on.

So why is this a more powerful project? Well, the real goal of ATTN-SPAN isn’t to get more people watching C-SPAN. In fact I tricked you: this project isn’t about government awareness at all. It’s actually part of an effort to make indisputable fact (“blunt reality” and “primary source footage”) a more prominent part of the media experience without requiring additional effort from the audience. Newsrooms do an amazing job of reporting events and providing insight, but for deeper stories there simply isn’t enough time or money to cover everybody’s niche without going beyond the average person’s attention span.

Thus ends my pitch.

The code for both prototypes mentioned in this post can be found on github: ATTN-SPAN and Truth Goggles. Please forgive any dirty hacks. I would be thrilled if anybody wants to offer suggestions or even collaborate. On that note, please get in touch on Twitter @slifty.


  • Mark Reginald James

    Dan, other than legislative chambers, what do you think are some examples of newsie raw streams that ATTN-SPAN could process? I can think of TV news bulletins and news radio from around the world. Skype calls by members of your social network?

    The political aspect is something a big search engine or news website should implement, allowing one to search for or get feeds of personalized editions of raw footage or their transcripts, perhaps automatically determined based on location and habits. It’s interesting because much of the time it will be creating an interesting story where no normal news story exists, not even from local news services.

    It also got me thinking that a news source that offers its readers a regularly-updated activity summary article on a general topic like politics needs to make these custom assemblies rather than generic ones, in the manner of a custom feed of stand-alone articles.

    • Hey Mark,
        Thanks for the comment and sorry about the delay!  Glad this idea got you thinking!

        To answer your question, there are two kinds of answer: the “imagination” answer, which assumes that there is no such thing as a technological or copyright roadblock, and the realistic answer, which assumes that we will only be able to process certain kinds of video at this point.  The imagination answer is that any video in any digital format could be included: YouTube videos that may contain footage of rallies or first-hand speeches, or television stations across the world that might have local coverage or useful commentary on an issue.  Radio is a great point as well, since this could be adapted to be a purely audio application.

        The realistic answer is that for a video feed to be processed by this system in a way that works well, I really do need a transcript and ideally I have a way to pull out that extra metadata.  For substantial feeds, such as network news with 24/7 coverage and regular formats, this could be done through a custom script for each network. For small newsrooms willing to play along, maybe I could get topical tags associated with whatever video they create.  For transcripts there are some great transcription services (hey I worked for one briefly!) which could make it possible for newsrooms to transcribe a few minutes of video for a few bucks.

        The REAL answer is somewhere in between.  If you think about the system in terms of functional components, it is totally viable to provide an API that lets people bypass the scraping process and add their video to the system.  In other words, if you have a 4 minute video that is transcribed (or maybe I could work with SpeakerText to let you add your video so long as you cover the minimal costs of transcription), and you already know it features Barack Obama and that it is about the health care debate, there is no reason that you, as a trusted source, couldn’t just add that clip to the system with all of that information.  This means the system could get access to video that would be impractical to parse automatically (such as footage from small newsrooms that may not have standard formatting like C-SPAN), so long as the newsroom was willing to play along and got enough value to warrant that play.
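        A rough sketch of what that trusted-source ingest might accept (every field name here is a guess about what the system would need from a newsroom):

```python
from dataclasses import dataclass, field

# Hypothetical ingest record for trusted sources -- all field names
# are assumptions about the metadata the scraper would normally supply.
@dataclass
class ClipSubmission:
    video_url: str
    transcript: str
    speakers: list = field(default_factory=list)
    topics: list = field(default_factory=list)

def validate_submission(clip):
    """Reject clips missing the metadata the scraper would have produced."""
    return bool(clip.transcript and clip.speakers and clip.topics)

clip = ClipSubmission(
    video_url="https://example.com/health-care-floor-speech.mp4",
    transcript="...",
    speakers=["Barack Obama"],
    topics=["health care"],
)
print(validate_submission(clip))  # -> True
```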

  • Dan, I’ve been doing this with Virginia legislative video for a few years (here’s a writeup of my process), and if there’s anything from my work that would help you with yours, let me know. (I’m a 2011 News Challenge winner.)

    • Oh that is beautiful — I will surely be getting in touch soon.  Thanks so much for letting me know!