Why Facebook Graph Search is a Good Thing for Speech Recognition

Album Cover: Wincing the Night Away

"It's like I'm perched on the handlebars of a blind man's bike."
The Shins / Split Needles

Posted on January 29, 2013 8:49 AM in Computers

While working on speech recognition for the first decade of my career, I participated in and observed many usability studies centered on speech recognition and voice user interfaces (VUIs). Despite the wide range of applications and tasks being tested, there was always a common theme: your average Joe or Jane speech recognition user talks to computers and devices the same way he or she types a Google search. When searching for ringtones, users simply said "Rihanna." When attempting to tune to a radio station, users almost always said nothing more than the station's name or frequency, e.g. "710" or "KIRO."

Anyone who has worked on speech recognition knows that the more sounds you have to work with, the easier it is to recognize a spoken utterance. In fact, if you pay close attention to the hypothesis -- the text generated from your spoken utterance -- returned by the dictation engines built into Windows or under the hood of Google's iOS app, you'll notice that some of the words actually change while you're speaking. For example, as you're saying something like "where is the stadium from here?" you may see a progression of hypotheses such as "were is" followed by "where is the" and so forth. As the underlying speech recognition engine gathers more sounds, it can use the surrounding context to form a stronger, and more likely, hypothesis of what you actually said.
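To make that concrete, here's a tiny, purely illustrative Python sketch of the idea. This is not how the Windows or Google engines actually work, and the bigram probabilities below are invented for demonstration, but it shows how a simple language model can flip a partial hypothesis from "were" to "where is the" once more words arrive:

    import math

    # Invented bigram log-probabilities, for illustration only; a real engine
    # learns these from huge text corpora. Higher (closer to 0) = more likely.
    BIGRAM_LOGPROB = {
        ("<s>", "were"):  math.log(0.020),
        ("<s>", "where"): math.log(0.015),
        ("were", "is"):   math.log(0.001),  # "were is" is rare English
        ("where", "is"):  math.log(0.120),  # "where is" is very common
        ("is", "the"):    math.log(0.100),
    }

    def score(words):
        """Sum bigram log-probabilities over a word sequence."""
        pairs = zip(["<s>"] + words, words)
        return sum(BIGRAM_LOGPROB.get(p, math.log(1e-6)) for p in pairs)

    # With only one word heard, the acoustically similar "were" can win out...
    print(score(["were"]) > score(["where"]))                            # True

    # ...but once "is the" arrives, the combined score flips the hypothesis
    # to "where is the", just like the progression described above.
    print(score(["where", "is", "the"]) > score(["were", "is", "the"]))  # True

A real recognizer weighs scores like these against acoustic scores across many competing hypotheses at once, which is exactly why longer utterances give it more to work with.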

Since more data is better, it's easy to see how users' familiarity and comfort with "Google search speak" actually poses a problem for speech recognition: a terse, one- or two-word utterance like "Rihanna" gives the engine almost no context with which to refine its hypothesis. In fact, I feel it is one of the biggest hurdles speech recognition will have to overcome in order to become a more natural (see NUI) and widely used form of input.

That leads me to the point of this blog entry, which is that Facebook Graph Search, despite any of its other strengths and weaknesses, is a step in the right direction for speech recognition. In an article titled "For Search, Facebook Had to Go Beyond 'Robospeak'," it is revealed that the team that built Graph Search included two linguists and focused on teaching "Facebook's computers how to communicate better with people." In fact, the lead of the natural language processing part of the project is quoted as saying, "It used to be you had to go to the computer on the computer's terms. Now it's the user."

Teaching computers to communicate more like humans is a big part of the sea change that needs to happen. The bigger part, though, is teaching users that it's okay to talk to computers the way they talk to other humans. That is a hard problem to solve, and it is going to take a while, particularly since the gigantic troves of data that powerhouses like Google and Microsoft base their speech-enabled search products on consist mostly of typed search queries. With another powerhouse like Facebook changing the paradigm of how users search, though, users are getting their first chance to build some trust with a more "human-like" interface. Once that trust becomes more widespread, it won't be long before the more popular VUIs begin to rely on, or at least more openly encourage, human-speak over robospeak.

With Facebook taking the lead with Graph Search and already cozying up to linguists and the like, it isn't that big of a stretch of the imagination to picture them getting into the speech recognition space themselves in the not-too-distant future.

Comments

Bill Marshall on January 31, 2013 at 9:24 AM:

Hey, Bernie! I'm not familiar with Graph Search, but I think you're on the money that one of our key challenges is teaching users to speak to the machines like they would to a person. Having seen many of the same usability sessions you have, as well as many call logs at my current job, it's clear that people do follow prompting examples when given them. The problem is when the interaction is open-ended and people aren't sure what they can say. Too much prompting is a pain and slows interactions; not enough, and folks are lost.

The "Google search model" of voice input is a killer for us, and I agree that when big players like Facebook, Microsoft, Apple, and Google can deliver high-profile user experiences that model "good" input, we'll see the tide turn in our favor. Siri was the biggest leap so far. I'm interested to see what comes next.


Bernie Zimmermann on February 10, 2013 at 7:32 PM:

Hey Bill! It's great to hear from you. Thanks for dropping by and sharing your thoughts!
