Amazon Alexa is Not Artificial Intelligence

I’ve been reading a lot of articles lately about digital assistants like the Amazon Echo and Google Home. More and more writers are categorizing these simple devices as Artificial Intelligence (AI). While I’m not an expert in the field, merely an interested party, that categorization makes no sense to me.

Amazon Alexa isn’t AI; it’s a system that parses a voice request and decides what to do with it based on code written specifically to handle that query, or queries like it. Alexa can only do what it’s coded to do, and nothing else. You can prove this pretty quickly by asking it a question that requires intelligence (the I in AI) to answer; you’ll quickly get an “I don’t know what to do with that” response.

Now, Amazon has done a lot of innovation around delivering a system that gets better over time at understanding what you’re asking for, but that’s not intelligence; that’s better Natural Language Processing (NLP). The better it gets at understanding your query, the better the Alexa development team can get at writing code that deals with that particular type of request.

As I’ve written here before, Alexa is code-bound – it can only do what it’s coded to do. As a developer, I can define some phrases I’m interested in handling, then write some code for Alexa to execute whenever it hears one of those phrases uttered by an Echo user. If someone (a developer) hasn’t defined the phrases AND written the code to respond to them, then Alexa can’t do anything with the request.
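To make that concrete, here’s a minimal sketch of what a skill’s backend might look like as an AWS Lambda function written in Python. The intent name and the replies are hypothetical; the point is the shape: every response exists only because a developer anticipated the phrase and wired up code for it.

```python
# A minimal sketch of an Alexa skill backend, written as an AWS Lambda
# handler in Python. The intent name and reply text are hypothetical;
# the request/response shapes follow Alexa's JSON format.

def build_response(speech_text):
    """Wrap plain text in the response envelope Alexa expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": True,
        },
    }


def lambda_handler(event, context):
    request = event.get("request", {})

    if request.get("type") == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")

        # This branch exists only because a developer registered the intent
        # and its sample utterances ahead of time.
        if intent_name == "HelloWorldIntent":
            return build_response("Hello from the code a developer wrote for this phrase.")

    # Anything the developer didn't anticipate falls through to a canned reply.
    return build_response("Sorry, I don't know what to do with that.")
```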

That’s not Intelligence.

Now, the folks at Amazon have coded fallbacks, right? But those are still hand-crafted code paths that deal with specific situations. If you ask Alexa to do something it doesn’t know how to handle, it does a quick Internet search on the phrase it heard you say and posts the search results to the Alexa app.

Again, that’s not Artificial Intelligence.

Try to have a conversation with it, and you’ll fail miserably. Some developer has coded cheeky responses to specific types of general-purpose questions (like “How are you?”), but those are still hard-coded responses to specific queries. Alexa knows how to tell me a Star Trek joke only because someone coded it to do so – not because it is intelligent. Correct me if I’m wrong, but it’s not hearing “joke” and “Star Trek” and then figuring out which of the jokes it knows fit the bill; I imagine some developer built a database of Star Trek jokes, or at least gave Alexa a list of sites where it can go to find them. That’s not intelligence.

What finally drove me to write this post was two articles in this month’s MIT Technology Review magazine (Volume 120 | no. 5): one called “Alexa, Understand Me” and the other “Growing Up with Alexa.” The first article repeatedly glorifies Alexa as an AI, explaining all sorts of ways the Alexa team uses data to fine-tune its NLP and how that makes it a better AI. It’s a very interesting article about how big a challenge it is to make Alexa seem smart, but nothing it describes is AI. The second article deals with Alexa as what it is: a personal digital assistant. The fact that the two articles portray Alexa as two completely different things makes me believe that no editor at the magazine read both. If they had, they probably wouldn’t have published both, since the articles contradict each other about what they’re describing.

A quick Google search on “Intelligence” came up with this definition:

“the ability to acquire and apply knowledge and skills.”

Alexa does this – it acquires skills and applies them – but not through its own actions. Developers define skills and publish them to the Alexa service, and Alexa’s NLP capabilities connect what’s asked with the available skills. That’s not intelligence; that’s coding. That’s a piece of software responding to the logic in its code.

Artificial Intelligence is about intelligence. For Alexa to be AI, it would need to be able to do something it wasn’t coded to do. That’s intelligence: figuring something out. When Alexa can figure out how to do something it doesn’t know how to do, using the things it does know how to do as building blocks, then you can make the argument that it’s intelligent. When it can make leaps of understanding, then it’s intelligent. As long as it can only do what it has been coded to do, it is an appliance and nothing more.

When the WWA Overlaps the Physical World

I started a new job at Microsoft a few weeks ago and, surprisingly, one of the first things they handed me was an Apple MacBook Pro. This isn’t standard procedure, but in my role I’ll be working with development tools for Android and iOS applications, and a MacBook is required to do (almost) anything with iOS. Anyway, as I set up my Mac, I was having trouble dragging files and apps around. There was something weird going on, but I realized pretty quickly that I needed a mouse to be productive. I have a mouse here with me in Redmond, but it’s a USB mouse and, well, the MacBook no longer exposes regular, everyday USB ports. Sigh.

After work, I quickly headed to the nearest Apple Store and purchased a mouse. As I checked out, the sales associate asked me to confirm the email address where they could send the receipt. Right there on his iPhone screen was my new Microsoft address – an address I’d had for only a few hours. What happened? How did they have my Microsoft email address? I’m certain Microsoft didn’t share it with them, so who did?

Me. I did it. I’d completely forgotten that I had set up a new iTunes account using my work email address and the credit card I use for work. I did that because Apple won’t allow you to have an iTunes account without providing a credit card number.

The connection to the World Without Apps (WWA) is a little shaky here, I know, but as this happened, I was immediately struck by how my online life was bleeding into the physical world. It shouldn’t have surprised me that Apple could look up my email address based on the credit card I was using – that’s easy. But in this WWA, how far will this go?

How long before your personal search history starts showing up on store POS (point of sale) terminals? Can you imagine checking out at Macy’s and the clerk saying something like “hey, I noticed you searched for underwear the other day, did you know they were on sale? Here’s a coupon” or how about “Hey, I noticed you saw Jonny Lang last week, did you see his new line of skinny jeans we have for sale?”

This is actually one of the possible side effects of our government’s removal of restrictions on how ISPs can use your browsing history. How far will it go? What happens when an agent (a computer program) interacts with the store you’ve entered and the two share data? That clearly has strong benefits from a streamlining-the-shopping-experience standpoint, but where could it go wrong? Who decides how much information the store should have access to? You do – but how do you do that in the WWA? Nobody knows.

Hacking AI for Fun and Profit

I knew all along that AI was a component of the World Without Apps. What I didn’t expect was how quickly AI-driven actions would become a home-grown option. I subscribe to The MagPi magazine (a publication from the Raspberry Pi Foundation) and even got my first project published in the current issue (https://www.raspberrypi.org/magpi/issues/58/).

In the previous issue, the magazine included the complete Voice Kit from the Google AIY (AI Yourself) project – basically a complete Google Assistant, including an enclosure, speaker, button, Audio HAT, and more. My son and I quickly assembled the project, and now he has an almost-Google-Home device in his bedroom. That’s interesting on its own, because Google funded sending thousands of these devices all around the world, but what’s more interesting is that the core project is extensible. It runs on a Raspberry Pi, and you can add your own commands to the Google Assistant project code – you write the code that responds to your specific voice commands, and you can make this device do anything, absolutely anything.

Your regular Google Assistant queries go to the cloud for execution (searches, weather reports, and so on), but if you connect some specialized hardware to it (nothing specific – any hardware you can control from the Raspberry Pi) and add your own code to drive it, suddenly your project becomes much more interesting.

From a Maker standpoint, this dramatically expands the types of projects I can build with this thing. I no longer have to deal with ANY of the complexities of voice interaction; the platform (Google AIY) takes care of that for me. All I have to do is connect my hardware, add my command to the list of accepted commands, write some code, and I’m all set.
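Here’s a rough sketch of what that pattern looks like in Python: recognize a phrase, match it against your own command list, drive whatever hardware you’ve wired to the Pi. This is not the actual AIY API (which has changed between kit versions); the recognize_phrase() stand-in, the command phrases, and the GPIO pin number are all placeholders for illustration.

```python
# A rough sketch of the "extend the Voice Kit" pattern, NOT the exact AIY
# API (which has changed between kit versions). recognize_phrase() below is
# a stand-in for the kit's speech recognizer, and the GPIO pin number and
# command phrases are arbitrary choices for illustration.

import RPi.GPIO as GPIO

RELAY_PIN = 23  # whatever hardware you've wired to the Pi

GPIO.setmode(GPIO.BCM)
GPIO.setup(RELAY_PIN, GPIO.OUT)


def recognize_phrase():
    # Stand-in so the sketch runs anywhere; on the kit itself you'd call the
    # AIY library's recognizer here instead of reading from the keyboard.
    return input("Say something: ").lower()


def handle_command(text):
    """Map a recognized phrase to a local action instead of a cloud query."""
    if "turn on the workbench light" in text:
        GPIO.output(RELAY_PIN, GPIO.HIGH)
        return "Okay, the light is on."
    if "turn off the workbench light" in text:
        GPIO.output(RELAY_PIN, GPIO.LOW)
        return "Okay, the light is off."
    return None  # not one of ours; let the Assistant handle it in the cloud


if __name__ == "__main__":
    try:
        while True:
            reply = handle_command(recognize_phrase())
            if reply:
                print(reply)  # or hand it to the kit's text-to-speech
    finally:
        GPIO.cleanup()
```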

This will ultimately take us to something I’m worried about: AI everywhere. When companies (and hackers) start embedding AI into everything around us, suddenly we have multiple systems all listening for commands and stepping over each other. I got a good taste of this while watching this spring’s Google I/O keynote. Every time the presenter said “OK Google” to demonstrate a new capability on stage, my local Google Home device would wake up and answer. I had to put my phone in my pocket during the presentation so it couldn’t hear and answer as well.

What do you do when your car and your phone both have AI capabilities? How does each device know it’s the target of a query? Will I need to preface every query or command with the name of the device I’m targeting? Probably at first, but ultimately we’ll get to a single, overarching AI that understands what it can interact with locally. You’ll speak to the ceiling, or your thumb, or whatever, and an available, compatible device will perform the action for you, whatever it is.

That’s where this is ultimately going, I’m certain of it. When that happens, we’re in the World Without Apps.

Facebook and The World Without Apps

There’s been a lot of buzz around Facebook lately – primarily because of the murder that took place on Facebook Live, coupled with Facebook’s failure to recognize it and pull down the video in a reasonable time. Apparently Facebook has spent a lot of effort building systems that recognize copyright violations (which cause problems for Facebook) and little on technologies that recognize harmful acts (which cause problems for victims, but not for Facebook directly).

Facebook’s ability to recognize a murder in progress wouldn’t have helped the victim here – he would have been dead before the police could have gotten there. However, if Facebook focused its attention on determining what’s actually happening in a live session, then in abuse or rape scenarios it could get help there before the ‘event’ is over. A valuable use of the technology, I imagine.

Anyway, back to the WWA.

At its conference this month, Facebook highlighted its plans for augmented reality. Augmented reality is a different variant of the World Without Apps. Instead of interacting with your environment (your home, your car, your office, etc.) or your smartphone via voice, as I’ve highlighted on this site, with Facebook’s approach you’ll socialize in a virtual world. This ‘world’ is essentially served by an app, and anything you do in it will be delivered by an app running in the cloud somewhere and presented through your virtual reality goggles. Extra capabilities provided by third parties? Basically extensions to the app. So it feels like where Facebook is going is the opposite of the WWA – they’re planning (if I can oversimplify here) one big app you ‘live’ in…virtually.

When it comes to interacting with the ‘real’ world, Facebook’s plan is that you’ll do it through special glasses that put a computer screen in front of your eyes, or on a wall or table. Initially this feels like an extension of the WWA, but it’s not. Zuckerberg said, “We don’t need a physical TV. We can buy a $1 app ‘TV’ and put it on the wall and watch it.” So Facebook is still focused on apps; in this case, selling you additional functionality to use in an augmented reality world.

Code-locked Agents Don’t Get It

I live in North Carolina, and there’s a big storm heading our way tonight. I knew we’d be getting snow, but we’re all on edge because we’re trying to figure out whether we’ll get enough to do some sledding, but not so much that it isn’t all gone in time for school on Monday.

This morning, I asked Alexa when we would get snow tonight, and she responded with:

There’s no snow in the forecast today for CITY_NAME, but there’s a Winter Storm Watch in effect between 7 PM tonight and 1:00 PM Saturday.

Wait a minute. We don’t have snow in the forecast, but there’s a winter storm watch in effect? Yeah, Alexa, I’m pretty sure that a Winter Storm INCLUDES snow.

That’s the problem with code-locked agents: they only know what they know, and Alexa apparently doesn’t know that snow is a component of almost every winter storm.

When I asked Google Now the same question, she didn’t answer me, but she did bring up a browser window with search results for the winter storm. At least she was able to relate “snow” to “weather” and get me the answer I wanted. Since I asked the question verbally, I sure would have liked it if she had answered me out loud instead of giving me search results, but at least her answer was more direct than Alexa’s.

Are Native Mobile Apps Doomed?

Yes, they are. But not for the reason most people think.

I’ve been monitoring the press, looking for articles that spark topics for this blog, and I’m only now starting to see coverage that relates to this one – but for the wrong reasons, hence this post.

Pundits and bloggers are starting to pick up on the idea that mobile apps are dead (or near death). Surprisingly, they’re making this announcement in response to advances in the mobile web browser, specifically progressive web apps. Don’t get me wrong: progressive web apps are interesting, and their use will have a big impact on user experience (while at the same time simplifying development).

My argument is that agents, not apps, are the future of mobile. Apps are old news – so quaint, each letting me do one thing (and hopefully doing that one thing well) when I could be tying multiple systems together through agents and the WWA.

We lost the chance to have one universal agent we could all tune to our needs when Samsung purchased Viv. All we have right now are islands of interaction (Alexa, Google Now, Siri, Viv, and so on), each with code-locked solutions. What will it take to kick this up to the next level?

Samsung Buys Viv

I checked my inbox this morning and found some amazing news: apparently Samsung has purchased Viv; you can read the Samsung press announcement. The reason I see this as amazing news is that it’s something I never expected to happen. I met with Viv’s founders a little less than a year ago, before their public demo, and they assured me they felt Viv would be the last job they ever had.

More importantly, this acquisition is going to have a huge, negative impact on the effectiveness and overall reach of Viv.

As I listened to Viv’s founders discuss what they were building, it was clear to me that in order for them to be successful, they couldn’t let themselves be bought by anyone. I’m all but certain that both Apple and Google made attempts to acquire Viv, but joining either of those companies would have limited Viv’s ability to be the ‘Global AI’ its founders said they wanted it to be.

As I’ve written before, the issue I have with Alexa, Cortana, Google Now (now Google Assistant), and Siri is that these solutions are code-locked: they can only do what they’re directly coded to do. Each solution gets a little flexibility from leveraging search to create a response when it doesn’t have a pre-coded solution to a problem, but at the end of the day, any interesting transaction ANY of those solutions delivers happens only because someone coded the transaction.

Google Assistant, demonstrated by Google this week, is a little better than the other solutions in that it maintains some context – a short-term memory of recent interactions, or direct access to the smartphone screen – to help it deliver more interesting results.

Viv, on the other hand, delivers transactions – any kind of transaction – because it knows what you want, it knows what it knows, and it can stitch it all together, no matter how complex, to deliver the goods. That is, well, was, a game changer for me.

Not anymore.

I understand why Samsung wanted Viv: they want to create the best interactive consumer devices, and owning Viv is a great way to do that. Unfortunately, had Viv remained an independent entity, its capabilities could have been available ANYWHERE. Your car, office, smartphone, garage door opener, newspaper box, and more could all have had intelligence added to them by leveraging Viv. At the same time, you’d have seen Google, Apple, and Microsoft all license Viv to add an extra layer of interaction to their services. Companies and third-party developers would have augmented Viv’s AI; it would have been available through devices and services all over the world, and you’d have had only one system to learn, one set of preferences to manage, and universal access to everything.

Not anymore.

Viv’s sale to Samsung will take it out of that market, and you’ll only find Viv’s capabilities in Samsung devices and in products from the limited set of manufacturers that license the technology.

The quality of your dynamically assembled experiences will vary depending on whether you’re working with Google Assistant, Alexa, Cortana, or Siri. You’ll have to learn the command interfaces of each, deal with the voice recognition idiosyncrasies of each, and you won’t be able to stitch together actions based on previous activities if they were done through different services. You definitely won’t have a single set of preferences to manage; you’ll have to train each service to understand your specific needs. Sigh.

It’s interesting for Dag and the other folks at Viv – once again, they’ve built a company and sold it before ever really going to market. It doesn’t suck to be them.

Semantic Web Services

When I first started thinking about the World Without Apps, I wasn’t thinking it would be a smartphone phenomenon; instead, I imagined everything except the smartphone being the key to this revolution (my house, car, office, and so on). I never expected smartphone processors to be able to handle the load, or cloud technologies to become so prevalent and so powerful that they could support things like Google Now and Apple’s Siri. How wrong I was – smartphones are core to this WWA revolution, and I’ll write more about that later.

Regardless of how the WWA is implemented, there’s a core piece of technology that enables it: the ability for web services to describe themselves to any consumer. As I mentioned in an earlier post, Amazon Alexa, Apple Siri, Google Now, and Microsoft’s Cortana are code-locked; they can only do things they’re coded to do and nothing more. Now, these technologies have an out that enables them to handle almost any request (well, all of them but Siri, at least): when they can’t figure out what you’re asking, they revert to delivering search results, hoping you’ll find the answer you want there.

However, sometimes search doesn’t work that well for consumers, as shown in the following figure. In this example, I asked Siri how long it would take me to drive to the moon. A reasonable question, I thought; I was watching a special on the Apollo program and I was curious. Siri heard my question correctly, as you can see in the figure (although she skipped the question mark for some reason), but even though she transcribed my words correctly, she still didn’t understand them and instead told me how long it would take me to drive to Howl at the Moon (I didn’t even know we had one of those locally).

[Figure: Siri’s silly answer to my drive-to-the-moon question]

For voice interaction to work (whether it’s through an app or some universal agent), the data aggregator (the agent) has to know a lot of different things. It has to know things, it has to know how it knows those things, or at least it has to know where to find them. I’m not talking about Watson playing Jeopardy, but some sort of universal agent that just, well, knows things.

How does it know what it knows? Semantic Web Services.

We already have semantic HTML, technology that lets a web page describe itself so computers can divine information from the page more easily. For digital assistants, agents, or whatever you want to call these things to be able to do your bidding, they have to have a catalog of the capabilities available to them. That means developers expose their capabilities through web services, then describe those services in such a way that multiple agent platforms can consume them when they need to.
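What might such a description look like? Here’s a sketch, loosely borrowing schema.org’s Action vocabulary and written out as a Python dictionary. There’s no agreed-upon standard for this today, so the endpoint URL, the field names, and the overall shape are illustrative assumptions on my part.

```python
# A sketch of what a self-describing, agent-consumable capability might look
# like, loosely borrowing schema.org's Action vocabulary. There is no agreed
# standard for this today; the endpoint URL, the "requiredProperties" field,
# and the overall shape are illustrative assumptions.

import json

pizza_ordering_capability = {
    "@context": "https://schema.org",
    "@type": "OrderAction",
    "name": "Order a pizza for delivery",
    "description": "Places a delivery order for a pizza of a given size and toppings.",
    "target": {
        "@type": "EntryPoint",
        "urlTemplate": "https://api.example.com/orders",
        "httpMethod": "POST",
        "encodingType": "application/json",
    },
    # What an agent would need to collect from the user before calling the service.
    "requiredProperties": ["size", "toppings", "deliveryAddress"],
}

# Any agent platform that understands the shared vocabulary could discover this
# description and decide on its own when the capability matches a user request,
# with no platform-specific skill code required.
print(json.dumps(pizza_ordering_capability, indent=2))
```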

The flaw in the Google Now, Siri, Cortana, and Alexa approaches is that there’s no way to augment the assistant’s capabilities without writing platform-specific code. Google Now, Siri, and Cortana expect local apps to provide services, so to extend what they can do, you have to build an app and convince users to load it on their phones. You can extend Google search by publishing data in a way Google can easily consume, and thereby make your stuff available to Google Now, but that’s a hack, and there’s no guarantee Google will use your data in the way you intended.

Alexa and Viv take a different approach: they expose a cloud API developers can use to publish their capabilities to the service. Alexa lets developers define the words they want used to invoke their service, and that service is only invoked when those specific words (or variants of them) are uttered. Viv, on the other hand, is supposed to be exposing an SDK developers can use to describe their services and publish them to Viv; once there, Viv can use them in any way she wants to deliver results to her users.
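To give you a sense of how literal that phrase matching is on the Alexa side, here’s roughly what a skill’s interaction model looks like. The invocation name, intent, and sample utterances are mine, invented for illustration, and the exact schema Amazon accepts may differ in detail.

```python
# A sketch of an Alexa-style interaction model, written out as a Python dict.
# The invocation name, intent, and sample utterances are invented for
# illustration; the exact schema Amazon accepts may differ in detail.

interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "pizza butler",
            "intents": [
                {
                    "name": "OrderPizzaIntent",
                    # The only phrases (plus close variants the NLP layer can
                    # match) that will ever reach the skill's code.
                    "samples": [
                        "order me a pizza",
                        "order a pizza for delivery",
                        "get me a large pepperoni pizza",
                    ],
                    "slots": [],
                }
            ],
        }
    }
}

# Say something that doesn't map to a registered sample, and the developer's
# code never runs at all.
```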

The way I look at this is that development organizations have to start thinking about how they’re going to expose their money-making capabilities in a very generic way, so those capabilities can be consumed by ANYTHING. The web and mobile apps remain core channels, ones that always need accommodation, but new channels – or at least new agent technologies – will pop up and demand attention. What we need to accommodate this is a standard definition of semantic web services, so developers can focus on a single way to expose their capabilities to the world. Instead of wrapping services in each vendor’s SDK, we need one common SDK that accommodates them all. With this in place, there’s no limit to what the World Without Apps can accomplish.

Voice-enabled Agents Everywhere

One of the reasons I became so certain that we’re entering the age of no apps is the rise of voice agents. My BlackBerry had voice control capabilities years ago, and Android and iOS both added similar capabilities as well. My 2007 car offered voice control as an option, but I never installed it. The tipping point came when I purchased an Amazon Echo for the kitchen. I’d started using Google Now and Siri for things like random searches, phone calls, and opening apps, but when the Echo came into the house, it changed how I play music.

I’m a huge fan of Sonos (www.sonos.com); I’ve got four of them in the house and was expecting to add more before the Echo came to town. For a long time I’d wanted the ability to control my Sonos devices by voice – I just wanted to be able to walk into a room and ask the device to start playing music. Sonos hasn’t adopted voice control, but as soon as I got an Echo, I found that I could play whatever music I wanted, plus do more (such as check my calendar or put items on my to-do list). What I quickly found was that I abandoned the Sonos device in the room where I had an Echo, since opening an app to pick what to play was so much more work than just asking for it. It didn’t matter that the Echo’s sound quality was lower than the Sonos’s; the convenience so outweighed sound quality that the Sonos is now collecting dust. Apparently a lot of Sonos customers have made the same switch.

The biggest issue is that so many personal and household devices are adding voice control that there are now multiple devices listening to you at all times. I have several Android devices, so when I say “OK Google,” more than one of them answers. In my house, I have six Android devices, one iOS device, and two Echo devices – all of which are constantly listening to us, waiting for the next command. We also have an Xbox and a couple of smart TVs, all of which are listening as well.

So in this World Without Apps (WWA), apps are going away at the same time that the number of devices listening to us – in order to help us, of course – is increasing. We’ll have voice-enabled agents everywhere, all vying for our attention. How exactly do we deal with that?

Well, I imagine our cell phones will become less important. With ubiquitous network connectivity and smart devices surrounding us, why would we need to carry a physical device around? I imagine we’ll still need access to screens in order to interact with the data-driven apps I mentioned in an earlier post, but beyond that, you’ll be able to interact with your surroundings without having a physical device in hand. Initially, those agents will communicate with you through a Bluetooth headset, for example, but eventually, once embedded technology becomes prevalent, you’ll be able to have private interactions with these agents through electronics embedded in your brain. How cool would that be?

A World With Less Apps (WWLA)

In my first post, I defined what I mean when I talk about the World Without Apps (WWA), the premise for this site. I may not have been entirely honest with you.

In reality, I recognize that we’re never going to enter a world entirely without apps; what we’re really entering is a world with less apps (WWLA). The reason I picked WWA over WWLA is that WWA is, well, more aggressive. Which is more startling: to imagine a world with no apps and then work backwards to the realization that we’ll really just have fewer apps going forward, or to start with WWLA and leave it at that?

I think we need to focus on the WWA – work to understand what it means, then focus our development efforts on delivering that world to the betterment of, well, everyone. Will we ever get there? Perhaps, but likely not soon (not in the next 10 years, anyway). But if we can simplify app-like capabilities and interaction interfaces to the point where more can get done without fingers poking at glass (FPAG), then we’ve accomplished something.

Will we continue to need apps? Yep, absolutely. There are plenty of data-input, data-browsing, and analysis tasks that can’t be delivered without apps and screens, but those will primarily be enterprise apps plus the subset of data-driven apps that consumers need (such as banking, travel, and so on). Even in those environments, I expect that voice will become the primary interaction, with smartphone or tablet screens simply serving as data-display or data-interaction targets.

As agents become more capable, they’ll take over more and more tasks from apps, leaving fewer and fewer apps around for us to use.