The Siri application released with the iPhone 4S has the potential to radically impact how we interact with technology. Ed Wrenbeck, former lead developer of Siri, talks through his history with the app…
On March 24th, 2008 I first talked with my friend Darren about a new venture he was involved with, which was simply called “Stealth Company” at the time. (try http://stealth-company.com) This was four months before the iPhone app store would officially launch.
We left off at that point with the general thought that we’d like to work together, but we didn’t come back to the project until mid-July of 2008. I officially started working on the Siri application in early August 2008, as the only iPhone/Objective-C developer on the team. The work I was doing on the native iPhone application was really treated as an R&D exercise; the go-to-market strategy was based mostly on a web experience.
When the iPhone application started to come together, aggressively utilizing the full capabilities of the device, it quickly became apparent that Siri was meant to be a first class mobile experience that could take advantage of the power and features available to a native application. Siri never really left its web roots behind though. From the earliest days right up until the last version offered on the app store, the user interface has been almost entirely an HTML web view with native elements tightly integrated around it. Siri stretched to the very limit what was possible to do with an integrated web interface on a mobile device.
The Siri application was first shown publicly at D7, the “All Things D” conference in May 2009. (http://allthingsd.com/20111004/in-depth-with-siri-the-full-demo-from-the-d7-conference-plus-an-old-school-bonus/) It was the first time I really got the sense that I was involved with a product that would be used by millions of people. Impressing a veteran technology writer such as Walt Mossberg is not an easy task.
Voice Recognition – the first piece
Today Siri is closely associated with the voice recognition capabilities from Nuance, but Vlingo was the first voice recognition system used in the app. Nuance would not come into play until just a few months before the application went onto the app store in early 2010. The team was almost fanatical about getting the voice recognition just right, and we went through endless tests tuning these platforms to maximize the results.
Commercials for digital assistants always seem more natural than the reality, and there’s a reason. State-of-the-art voice recognition systems will rarely, if ever, return results with 100% confidence in the translated string. You can expect 60–70% confidence on average. Siri cannot change that limitation; however, the team was able to find ways to work around it.
Natural Language Processing – what Apple saw in Siri
Oftentimes, reviews on iTunes would comment on how the Siri app seemed to do a better job than the Nuance app, Dragon Dictation. The iPhone application is just the portal to the brains of Siri, which run on a bunch of servers. There, it can take a sentence and dissect it naturally: given “Book a table at Il Fornaio in Novi for 7PM”, it determines that “Il Fornaio” is likely the name of a place and “Novi” is likely a location. This is referred to as natural language processing, and it is incredibly difficult to get right.
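To make the idea concrete, here is a toy sketch of that kind of request parsing. This is purely illustrative and is not Siri’s actual pipeline; the `parse_request` function and its single regex pattern are invented for this example, and a real NLP system has to cope with endless phrasings, ambiguity, and recognition errors rather than one fixed sentence shape.

```python
import re

def parse_request(utterance):
    """Naive pattern-based parse of a restaurant-booking request.

    A toy stand-in for real natural language processing: it only
    recognizes one sentence shape and labels the captured spans as
    a probable venue name, location, and time.
    """
    pattern = re.compile(
        r"Book a table at (?P<venue>.+?) in (?P<city>\w+)"
        r" for (?P<time>\d{1,2}(?::\d{2})?\s?[AP]M)",
        re.IGNORECASE,
    )
    match = pattern.search(utterance)
    if not match:
        return None
    return {
        "intent": "book_table",
        "venue": match.group("venue"),  # likely the name of a place
        "city": match.group("city"),    # likely a location
        "time": match.group("time"),
    }

print(parse_request("Book a table at Il Fornaio in Novi for 7PM"))
```

The gap between this regex and a system that handles arbitrary phrasing is exactly why the hard work lives on the servers rather than in the phone app.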
Many users and technology pundits wonder what makes Siri different from voice recognition apps on other platforms. Android and Windows Phone 7 offer their own voice interaction capabilities, and don’t forget, iOS offered its own voice control interface prior to the introduction of Siri on the 4S. But they are missing the point of Siri itself, which is about what happens after the voice command has been turned into text.
Tell another system to “Book a table at Il Fornaio at 7:00 with my mom;” the system can no doubt create a calendar entry at 7:00 and may even know who your mom is. It might even be possible for that device to figure out the closest Il Fornaio restaurant. What differentiates the NLP logic in Siri is that it maintains context, so you could say: “Also send her an email reminder.” Siri will understand ‘her’ and compose the email accordingly.
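That carry-over of context can be sketched in a few lines. This is a minimal illustration, not Siri’s implementation; the `Conversation` class, its single remembered entity, and the string-based pronoun check are all invented for this example:

```python
class Conversation:
    """Toy dialogue context: remembers the last person mentioned so a
    follow-up pronoun like 'her' can be resolved against it."""

    def __init__(self):
        self.last_person = None

    def handle(self, command, entities):
        # Record any person this command mentions, e.g. "my mom".
        if "person" in entities:
            self.last_person = entities["person"]
        # Resolve a pronoun against the remembered context.
        if "her" in command.lower().split() and self.last_person:
            return f"email reminder -> {self.last_person}"
        return "calendar entry created"

convo = Conversation()
convo.handle("Book a table at Il Fornaio at 7:00 with my mom",
             {"person": "mom"})
print(convo.handle("Also send her an email reminder", {}))
```

The point is only the shape of the problem: without some state held between utterances, “her” is unresolvable, no matter how good the voice recognition is.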
This comprehension of context requires a great deal of logic and processing behind the scenes. The media commonly refers to this as “Artificial Intelligence” but in reality…
It’s not really AI
With natural language processing in the mix, it feels more human, like it understands you. This in itself adds to the mystique surrounding artificial intelligence. But real AI can’t fit on a phone in our world… yet.
Siri is basically a contextual, semantic, personalized search engine. We affectionately called it a “Do” engine. A search engine can evaluate text strings and look for matching results. A “Do” engine maintains awareness of the user, everything it knows about that user and processes strings in the context of the user.
One interesting part of Siri is its ability to use five or six candidate translations of a voice command to determine what the user is trying to say. When trying to recognize the name of a restaurant such as “Il Fornaio” or a city named “Novi”, the voice recognition engine will typically return several possible matches, as these are not simple dictionary words. The engine might assign them low confidence scores as direct voice translations, but in the context of the user, his surroundings, and his behavior, their rank may change.
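The re-ranking idea can be illustrated with a short sketch. The `rerank` function, the confidence numbers, and the flat per-match boost are all made up for this example, not taken from Siri; the real system’s scoring would be far more sophisticated:

```python
def rerank(hypotheses, context_terms):
    """Boost raw recognizer confidence for candidate transcriptions
    that mention terms the user's context makes likely, such as
    nearby places or past behavior."""
    def score(hyp):
        text, confidence = hyp
        boost = sum(0.15 for term in context_terms
                    if term.lower() in text.lower())
        return confidence + boost
    return sorted(hypotheses, key=score, reverse=True)

# An n-best list from the recognizer: the acoustically "safer"
# reading wins on raw confidence, because the proper nouns are
# not simple dictionary words.
nbest = [
    ("book a table at ill for neo in navy", 0.62),
    ("book a table at Il Fornaio in Novi", 0.55),
]

# Context: the user is near Novi and has visited Il Fornaio before.
best = rerank(nbest, ["Il Fornaio", "Novi"])[0]
print(best[0])
```

After the context boost, the hypothesis containing the real restaurant and city outranks the raw transcription, which is the effect described above in miniature.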
Sometimes, Siri’s uncanny ability to understand the environment and the user even surprised me. One of my common test requests was something like “Find a Tigers’ game”. One time I happened to be in San Jose and, without thinking, tested Siri with this request; I was delighted when it offered to help me get tickets to an upcoming game in San Francisco when the Tigers came to town a couple of weeks later.
That last item, behavior, is a biggie. It requires trust – which Siri got with Apple.
Big Brother factor – why Siri needed Apple
For Siri to be really effective, it has to learn a great deal about the user. If it knows where you work, where you live and what kind of places you like to go, it can really start to tailor itself as it becomes an expert on you individually. This requires a great deal of trust in the institution collecting the data. Siri didn’t have that trust on its own, but Apple has earned its street cred.
Developers are realizing that they can deliver amazing experiences when they understand more about the user. However, users today are careful to not give away too much of that information. With Apple’s strong reputation behind it, there’s a massive potential for success here.
The formula is there, the market is perfect.
At Vectorform, I’m dealing with all sorts of exploratory technology, things that are pushing boundaries even further than my work with Siri. We are constantly looking at the digital landscape, mapping out current innovations and how they impact future trends.
The release of Siri to the masses has massive shockwave potential in our book.
What we’re looking for on the horizon:
An open SDK – Intelligence needs to come from many sources. Siri will need to mine collective input from as many data sources as possible in order to grow and stay relevant. Siri can’t be a marketing engine for select partners. True digital assistants have to put the user first, just as the user would do when making a decision on their own. There is enormous potential for Siri to integrate with data sources of many different types. Opening Siri up while ensuring the quality of the data will be a challenge for Apple. However, the recent success of the iOS and Mac App Stores has given Apple an effective way to influence its third-party developers.
A single voice – The original vision for Siri was a personal assistant available to you from any device, anywhere, at any time. Apple has pulled back on that vision a bit and made Siri an accessory to your iPhone. Siri needs to start growing again and become your advocate across all your digital platforms. Long term, Vectorform is interested in determining what kind of role Siri can play in a vehicle or…
Ed Wrenbeck is an iOS Architect at Vectorform. In a previous life, he was the lead iPhone developer of Siri. “While I do know the intimate details of the Siri Application, I have no inside knowledge of the new Siri released by Apple.”