Interactive databate #buildTheNews
29 Apr 2015
For the Times Build the News hackathon, we decided to build a system that, given a video, would generate a transcription, identify the different speakers, and provide a summary of the main topics and keywords as well as the emotional charge of each speaker.
We decided to use the upcoming election debate as a use case, but this could be generalised to other interview-based video content.
We also wanted to provide an analytics dashboard where the journalist could view a heat map of viewers' engagement with the piece, showing which parts were:
- most viewed
- most commented
- most shared
We felt this was also very important. For instance, if you are a journalist writing about that debate, looking at this data could give you clues about what your readers are already thinking, and make it easier to engage them in a conversation in your next article. There are other possible use cases where analytics of this kind would be useful.
In terms of the analysis of the video-based interview, the ambition was to use existing technology around natural language processing and sentiment analysis to give an insight into the subtext of an interview.
On the morning of the hackathon I was reading through the New York Times Innovation Report for some inspiration. It mentioned that interactive projects for the news are very often done as one-offs, and that a more sustainable approach would be to create structures where form and content are separated so that they can be reused.
We drew on a number of open source technologies, APIs and libraries.
These are some of the technologies we used:
- Speech to text (YouTube captioning)
- Speaker diarization (LIUM, the same library used by the BBC)
- Sentiment analysis, topics and keywords (MonkeyLearn)
- Summarization (Text Summarization API)
- SRT → HTML (library)
- Social media sharing (library)
- Interactive transcriptions, using the Popcorn library, from the open source Al Jazeera US Election Debate Hyperaudio
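To give a feel for the SRT → HTML step in the list above, here is a minimal sketch in Python. It is not the library we used at the hackathon; it only illustrates the idea of turning caption cues into HTML that an interactive transcript can hook into. The function names and the `data-start` / `data-end` attribute names are assumptions made for this example.

```python
import html
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert the parts of an SRT timestamp to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples from SRT-formatted text."""
    cues = []
    normalised = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                start = to_seconds(*parts[:4])
                end = to_seconds(*parts[4:])
                # Everything after the timing line is the caption text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((start, end, text))
                break
    return cues


def srt_to_html(srt_text):
    """Render each cue as a <span> carrying its timing in data attributes.

    The attribute names are hypothetical; a player script would read them
    to highlight the current cue or to seek the video when a cue is clicked.
    """
    spans = [
        '<span data-start="{:.3f}" data-end="{:.3f}">{}</span>'.format(
            start, end, html.escape(text)
        )
        for start, end, text in parse_srt(srt_text)
    ]
    return "<p>\n" + "\n".join(spans) + "\n</p>"
```

Calling `srt_to_html` on the contents of a captions file returns a single paragraph of timed spans, which is roughly the shape of markup an interactive transcript needs in order to sync text with video playback.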
photo credit @MattieTK