Amazon Alexa is Not Artificial Intelligence

I’ve been reading a lot of articles lately about digital assistants like the Amazon Echo and Google Home. In my reading, I’m seeing more and more writers categorize these simple devices as Artificial Intelligence (AI). While I’m not an expert in the field, merely an interested party, this categorization makes no sense to me at all.

Amazon Alexa isn’t AI; it’s a system that parses a voice request and decides what to do with it based on code written specifically to handle that query, or queries like it. Alexa can only do what it’s coded to do, and nothing else. You can prove this pretty quickly by asking it a question that requires intelligence (the I in AI) to answer; you’ll quickly get an “I don’t know what to do with that” response.

Now, Amazon has done a lot of innovative work to deliver a system that gets better over time at understanding what you’re asking for, but that’s not intelligence; that’s better Natural Language Processing (NLP). The better Alexa gets at understanding your query, the better the Alexa development team can get at writing code that deals with that particular type of request.

As I’ve written here before, Alexa is code-bound: it can only do what it’s coded to do. As a developer, I can define some phrases I’m interested in dealing with, then write some code for Alexa to execute whenever it hears one of those phrases uttered by an Echo user. If someone (a developer) hasn’t defined the phrases AND written the code to respond to them, then Alexa can’t do anything with the request.
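
To make “code-bound” concrete, here’s a minimal sketch of the model I’m describing. To be clear, this is plain Python, not the actual Alexa Skills Kit, and every phrase, intent name, and handler in it is hypothetical; it just shows that a developer has to enumerate the phrases and write the handlers before the assistant can respond to anything.

```python
# Hypothetical sketch of a "code-bound" voice assistant (not the real Alexa SDK).
# A developer enumerates sample phrases and writes a handler for each intent;
# anything outside that list dead-ends.

# Sample phrases the developer chose to support, mapped to intent names.
SAMPLE_UTTERANCES = {
    "tell me a star trek joke": "StarTrekJokeIntent",
    "what's the weather like": "WeatherIntent",
}

def handle_star_trek_joke() -> str:
    # Hard-coded response: the "joke" exists only because a developer put it here.
    return "How many ears does Captain Picard have? Three: a left ear, a right ear, and a final frontier."

def handle_weather() -> str:
    return "It's 72 degrees and sunny."  # stub; a real skill would call a weather service

INTENT_HANDLERS = {
    "StarTrekJokeIntent": handle_star_trek_joke,
    "WeatherIntent": handle_weather,
}

def respond(transcribed_phrase: str) -> str:
    intent = SAMPLE_UTTERANCES.get(transcribed_phrase.lower().strip())
    handler = INTENT_HANDLERS.get(intent)
    if handler is None:
        # Nobody defined the phrase or wrote the code, so the assistant is stuck.
        return "I don't know what to do with that."
    return handler()

print(respond("Tell me a Star Trek joke"))
print(respond("Write me a new Star Trek joke"))  # falls through: no code, no answer
```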

That’s not Intelligence.

Now, the folks at Amazon have coded fallbacks, right? But those are still hand-crafted code dealing with a specific situation. If you ask Alexa to do something it doesn’t know how to handle, it does a quick Internet search on the phrase it heard you say and posts the search results to the Alexa app.
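
Continuing the same hypothetical sketch (again, not Amazon’s actual implementation), the fallback is just one more branch a developer wrote ahead of time:

```python
# Still the same hypothetical sketch: the "fallback" is a hand-written branch
# a developer chose in advance, not the agent reasoning about a new request.

def search_the_web(phrase: str) -> list[str]:
    # Stub standing in for a real web-search call.
    return [f"Search result for '{phrase}'"]

def post_to_companion_app(results: list[str]) -> None:
    # Stub standing in for pushing result cards to the companion (phone) app.
    print("Posted to companion app:", results)

def fallback(transcribed_phrase: str) -> str:
    post_to_companion_app(search_the_web(transcribed_phrase))
    return "I've sent some search results to your Alexa app."

print(fallback("book me a table for two somewhere quiet"))
```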

Again, that’s not Artificial Intelligence.

Try to have a conversation with it, and you’ll fail miserably. Some developer has coded cheeky responses to specific types of general-purpose questions (like “How are you?”), but those are still hard-coded responses to specific queries. Alexa knows how to tell me a Star Trek joke only because someone coded it to do so, not because it is intelligent. Correct me if I’m wrong, but it’s not hearing “joke” and “Star Trek” and then figuring out which of the jokes it knows fit the bill; I imagine some developer built a database of Star Trek jokes, or at least gave Alexa a list of sites where it can go to find them. That’s not intelligence.

What finally drove me to write this post was two articles in this month’s MIT Technology Review magazine (Volume 120, No. 5): one called “Alexa, Understand Me” and the other “Growing Up with Alexa.” The first article repeatedly glorifies Alexa as an AI, explaining all sorts of ways the Alexa team uses data to fine-tune its NLP and how that makes it a better AI. It’s a very interesting article about how big a challenge it is to make Alexa seem smart, but nothing it describes is AI. The second article deals with Alexa as what it is: a personal digital assistant. The two articles portray Alexa as two completely different things, which makes me believe that no editor at the magazine read both of them. If they had, they probably wouldn’t have published both, since they contradict each other about what they’re describing.

A quick Google search on “Intelligence” came up with this definition:

“the ability to acquire and apply knowledge and skills.”

On the surface, Alexa does this: it acquires skills and applies them. But not through its own actions. Developers define skills and publish them to the Alexa service, and Alexa’s NLP capabilities connect what’s asked with the available skills. That’s not intelligence, that’s coding: a piece of software responding to the logic in its code.

Artificial Intelligence is about intelligence. For Alexa to be AI, it would need to be able to do something it wasn’t coded to do. That’s intelligence: figuring something out. When Alexa can figure out how to do something it doesn’t know how to do, using the things it does know how to do as building blocks, then you can make the argument that it’s intelligent. When it can make leaps of understanding, then it’s intelligent. As long as it can only do what it has been coded to do, it is an appliance and nothing more.

When the WWA Overlaps the Physical World

I started a new job at Microsoft a few weeks ago, and, surprisingly, one of the first things they handed me was an Apple MacBook Pro. This isn’t standard procedure, but in my role I’ll be working with development tools for Android and iOS applications, and a MacBook is required to do (almost) anything with iOS. Anyway, as I set up my Mac, I was having trouble dragging files and apps around. Something weird was going on, and I realized pretty quickly that I needed a mouse to be productive. I have a mouse here with me in Redmond, but it’s a USB mouse and, well, the MacBook no longer exposes regular, everyday USB ports. Sigh.

After work, I headed straight to the nearest Apple store and bought a mouse. As I checked out, the sales associate asked me to confirm the email address where they could send the receipt. Right there on his iPhone screen was my new Microsoft address, which I’d had for only a few hours. What happened? How did they have my Microsoft email address? I’m certain Microsoft didn’t share it with them, so who did?

Me. I did it. I’d completely forgotten that I set up a new iTunes account using my work email address and my work credit card. I did that because Apple won’t let you have an iTunes account without providing a credit card number.

The connection to the World Without Apps (WWA) is a little shaky here, I know, but as this happened I was immediately struck by how my online life was bleeding into the physical world. It shouldn’t have surprised me that Apple could look up my email address based on the credit card I was using; that’s easy. But in this WWA, how far will this go?

How long before your personal search history starts showing up on store POS (point of sale) terminals? Can you imagine checking out at Macy’s and the clerk saying something like “hey, I noticed you searched for underwear the other day, did you know they were on sale? Here’s a coupon” or how about “Hey, I noticed you saw Jonny Lang last week, did you see his new line of skinny jeans we have for sale?”

This is actually one possible side effect of our government’s removal of restrictions on how ISPs can use your browsing history. How far will it go? What happens when an agent (a computer program) interacts with the store you’ve entered and the two share data? That clearly has benefits when it comes to streamlining the shopping experience, but where could it go wrong? Who decides how much information the store should have access to? You do, but how do you do that in the WWA? Nobody knows.

Facebook and The World Without Apps

There’s been a lot of buzz around Facebook lately, primarily because of the murder that was broadcast on Facebook Live, coupled with Facebook’s failure to recognize it and pull down the video in a reasonable time. Apparently Facebook has spent a lot of effort building systems that recognize copyright violations (which cause Facebook problems) and little on technologies that recognize harmful acts (which cause problems for victims, but not for Facebook directly).

Facebook’s ability to recognize a murder in progress wouldn’t have helped the victim here; he would have been dead before the police could have arrived. However, if Facebook focused its attention on recognizing what’s happening during a live session, then in abuse or rape scenarios it could get help there before the ‘event’ is over. A valuable use of the technology, I imagine.

Anyway, back to the WWA.

At its conference this month, Facebook highlighted its plans for augmented and virtual reality. This is a different variant of the World Without Apps. Instead of interacting with your environment (your home, your car, your office, etc.) or your smartphone via voice, as I’ve highlighted on this site, with Facebook’s approach you’ll socialize in a virtual world. That ‘world’ is essentially served by an app: anything you do in it will be delivered by an app running in the cloud somewhere and surfaced through your virtual reality goggles. Extra capabilities provided by third parties? Basically extensions to the app. So it feels like where Facebook is going is the opposite of the WWA; they’re planning (if I can oversimplify) one big app you ‘live’ in…virtually.

When it comes to interacting with the ‘real’ world, Facebook’s plan is that you’ll do it through special glasses that put a computer screen in front of your eyes, or on a wall or table. Initially this feels like an extension of the WWA, but it’s not. Zuckerberg said “We don’t need a physical TV. We can buy a $1 app ‘TV’ and put it on the wall and watch it.” So Facebook is still focused on apps; in this case, selling you additional functionality to use in an augmented reality world.

Code-locked Agents Don’t Get It

I live in North Carolina, and there’s a big storm heading our way tonight. I knew we would be getting snow, but we’re all on edge because we’re trying to figure out whether we’ll get enough to do some sledding, but not so much that it isn’t all gone in time for school on Monday.

This morning, I asked Alexa when we would get snow tonight, and she responded with:

There’s no snow in the forecast today for CITY_NAME, but there’s a Winter Storm Watch in effect between 7 PM tonight and 1:00 PM Saturday.

Wait a minute. We don’t have snow in the forecast, but there’s a winter storm watch in effect? Yeah, Alexa, I’m pretty sure that a Winter Storm INCLUDES snow.

That’s the problem with code-locked agents: they only know what they know, and Alexa apparently doesn’t know that snow is a component of almost every winter storm.
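
Here’s a purely speculative sketch of what I suspect is happening under the hood. The field names and data are made up, but it shows how a hard-coded “will it snow” handler and a bolted-on alert reader can sit side by side with no code relating one to the other:

```python
# Speculative sketch of a code-locked weather handler (not Amazon's actual code):
# the "will it snow" branch checks one data source, the alert branch reads
# another verbatim, and nothing connects "winter storm" to "snow".

FORECAST = {"precipitation_type": None}                       # no snow listed yet
ALERTS = ["Winter Storm Watch in effect from 7 PM tonight to 1 PM Saturday"]

def handle_snow_question(city: str) -> str:
    parts = []
    if FORECAST["precipitation_type"] == "snow":
        parts.append(f"Snow is in the forecast for {city}.")
    else:
        parts.append(f"There's no snow in the forecast today for {city}.")
    # Separate, bolted-on branch: read the alert feed word for word. No code
    # here "knows" that a winter storm usually includes snow.
    for alert in ALERTS:
        parts.append(f"But there's a {alert}.")
    return " ".join(parts)

print(handle_snow_question("CITY_NAME"))
```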

When I asked Google Now the same question, she didn’t answer me, but she did bring up a browser window with search results for the winter storm. At least she was able to relate “snow” with “weather” and get me the answer I wanted. Since I asked the question verbally, I sure would have liked a spoken answer instead of search results, but at least her response was more useful than Alexa’s.

Are Native Mobile Apps Doomed?

Yes, they are. But not for the reason most people think.

I’ve been monitoring the press, looking for articles that spark topics for this blog, and I’m only now starting to see work that relates to this topic, but for the wrong reasons – hence this post.

Pundits and bloggers are starting to pick up on the fact that mobile apps are dead (or near death). Surprisingly, they’re making this announcement in response to advances in the mobile web browser, specifically progressive web apps. Don’t get me wrong, progressive web apps are interesting, and their use will have a big impact on user experience (while at the same time simplifying development).

My argument here is that agents, not apps, are the future of mobile. Apps are old news: so quaint, each letting me do one thing (and hopefully do that one thing well), when I could instead tie multiple systems together through agents and the WWA.

We lost the chance to have one universal agent we could all tune to our needs when Samsung purchased Viv. All we have right now are islands of interaction (Alexa, Google Now, Siri, Viv, and so on) with code-locked solutions. What will it take to kick this up to the next level?

A World With Less Apps (WWLA)

In my first post, I defined what I mean when I talk about the World Without Apps (WWA), the premise for this site. I may not have been entirely honest with you.

In reality, I recognize that we’re never going to enter a world without any apps; what we’re really entering is a world with less apps (WWLA). The reason I picked WWA instead of WWLA is that WWA is, well, more aggressive. Is it more startling to think of a world with no apps and then work backward to the realization that we’ll really just have fewer apps going forward, or to start with WWLA and leave it at that?

I think we need to focus on the WWA: work to understand what it means, then focus our development efforts on delivering that world to the betterment of, well, everyone. Will we ever get there? Perhaps, but likely not soon (not within the next 10 years). But if we can simplify app-like capabilities and interaction interfaces to the point where more can get done without fingers poking at glass (FPAG), then we’ve accomplished something.

Will we continue to need apps? Yep, absolutely. There are many data-input, data-browsing and analysis tasks that can’t be delivered without apps and screens, but those will primarily be enterprise apps plus the subset of data-driven apps that consumers need (such as banking, travel and so on). Even in those environments, I expect that voice will become the primary interaction, with smartphone or tablet screens serving simply as data-display or data-interaction targets.

As agents become more capable, they’ll take over more and more tasks from apps, leaving fewer and fewer apps around for us to use.

What is the World Without Apps (WWA)?

As I launch this site, it seems appropriate to describe the overall premise it’s built on.

Mobile as a product category is changing lives everywhere. Everyone reading this article probably has a smartphone and likely at least one tablet as well. It’s even likely that you’re reading this article on a mobile device, since desktop PCs and even laptops are used less and less for web browsing. What makes mobile devices so useful and popular are the apps they run, which let their users (you) accomplish things. From the beginning of computer time, apps have been hand-crafted by one or more software developers to accomplish something, and users run one or more of them to get some result. If one app is enough for a job, users fiddle with fields, buttons and menus to, well, do something. If one app isn’t enough, users have learned that they can take the outcome (results) from one app and plunk it into another to get the next result, stitching apps together in series to reach a specific goal.

What then is The World Without Apps? Well, it’s being able to accomplish a specific goal without using any apps at all. We’re in a world driven by apps today, but we’re quickly moving to a place where apps are no longer necessary for many tasks. Let me see if I can explain…

Early Inter-app Interaction

When Microsoft created Dynamic Data Exchange (DDE) and later Object Linking and Embedding (OLE), amazing new things became possible. An app could now call out to another app, execute some task, then use the results. This enabled developers to dynamically link apps, building apps that focused on a limited set of tasks and then stitching those apps together to deliver a more…sophisticated result. It was a great capability, although a bit flaky: this inter-app communication created dependencies and the potential for memory leaks that could wreak havoc in complicated processes and on computers with limited resources.

Inter-app Interaction on Mobile Devices

On mobile devices, this interconnection is easier; it’s a simple process to expose capabilities in one app that can be invoked from another. Apps can publish capabilities that only sibling apps can invoke (a more secure approach), or an app can publish capabilities that any app can invoke. With this last approach, an app tells the OS it wants to do something (usually described through a file or URI), and any handler that has registered for that kind of task can respond, indicating that it can handle it. This works, but unless every program involved was coded by the same developer, the app initiating the request can’t necessarily control what happens in the invoked app.
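
Here’s a toy model of that last approach. It isn’t the real Android or iOS API, just a sketch of the idea that the caller names a URI and whichever registered handler claims it gets control:

```python
# Toy model of OS-mediated app invocation (not the real Android or iOS API).
# The calling app names a URI; the OS asks registered handlers whether they
# can take it; the caller has no say in what the chosen handler does next.

class MapsApp:
    def can_handle(self, uri: str) -> bool:
        return uri.startswith("geo:")

    def handle(self, uri: str) -> str:
        return f"Showing map for {uri}"

class BrowserApp:
    def can_handle(self, uri: str) -> bool:
        return uri.startswith(("http:", "https:"))

    def handle(self, uri: str) -> str:
        return f"Opening web page {uri}"

REGISTERED_HANDLERS = [MapsApp(), BrowserApp()]   # grows as apps are installed

def dispatch(uri: str) -> str:
    for app in REGISTERED_HANDLERS:
        if app.can_handle(uri):
            return app.handle(uri)      # first (or user-chosen) match wins
    return f"No installed app can handle {uri}"

print(dispatch("geo:35.78,-78.64"))
print(dispatch("mailto:someone@example.com"))
```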

Agents to the Rescue

Voice added an interesting interaction model to mobile devices. Instead of poking around in different apps, you could simply tell the device what you wanted and it would try to figure it out for you. Basically this is nothing but voice-enabled search: the ability to speak a query instead of typing it. The voice interaction system could also perform certain tasks for you, such as opening an app, calling a specific contact or phone number, or sending and responding to text messages. But beyond delivering search results, those extra capabilities exist only because the browser or mobile OS developer added them; they’re code-locked capabilities that can only deliver what a developer has coded the engine to deliver.
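
As a rough illustration (hypothetical code, not any vendor’s actual engine), those “built-in” capabilities amount to a fixed dispatch table, with everything else falling through to search:

```python
# Hypothetical sketch of code-locked voice capabilities: a fixed set of
# built-in actions the OS vendor shipped, with everything else falling
# through to plain web search.

def open_app(name: str) -> str:
    return f"Opening {name}"            # stub for launching an app

def call_contact(name: str) -> str:
    return f"Calling {name}"            # stub for placing a call

def web_search(query: str) -> str:
    return f"Here's what I found for '{query}'"

def handle_voice_command(utterance: str) -> str:
    text = utterance.lower()
    if text.startswith("open "):
        return open_app(text.removeprefix("open "))
    if text.startswith("call "):
        return call_contact(text.removeprefix("call "))
    # No developer wrote code for anything else, so: search results.
    return web_search(utterance)

print(handle_voice_command("Open Calendar"))
print(handle_voice_command("Plan my weekend around the weather"))
```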

With Google Now, Google added capabilities to the Android OS that enable the device to guess what information you might find helpful and deliver it to you through a special panel and, later, through notifications. Apple added a vaguely similar capability to iOS: an extra panel on the home screen where iOS tries to guess what apps or information you’ll want next. Microsoft’s Cortana can do a little of that as well, but you have to invoke her to get at the feature. This means your mobile device is figuring out how it can make your next task easier for you. This approach is the first step toward the World Without Apps.

It’s pretty interesting how quickly this became a very capable service. I remember going to a concert with my son and noticing that my Android device started letting me know that it knew how to get me back to my car, and how long it would take to drive home. When I was out for lunch, Google Now would notify me how much time I had before I needed to head home to make my next call, or how long before I’d have to leave to make my next appointment.

The most startling example was when I was driving from Raleigh to Charlotte and Google Now notified me about an accident on the highway ahead of me. What was most surprising is that I wasn’t using the device’s navigation capabilities (Google Maps); I was using the navigation system built into my car, which my phone knew nothing about. The device knew I wasn’t home, but it knows where home is. It also knew that I was on the highway and that there was an accident in front of me, and it decided to let me know about it (how it knew about the accident escapes me). That’s the best existing example of the WWA that I have: the device using information it has about me and my environment and deciding what to do with it.

Now, in all of the examples I just provided, it was an app doing this (Google Now), but that particular app is tightly integrated with the device OS and with other apps running on the device. It won’t be long before Google Now stops being a stand-alone app and instead becomes a core service running in the OS. Once that happens, individual apps will start to disappear; they’ll be subsumed by Google Now (or Google Assistant, the new flavor of it), and any app-specific capability that’s needed will be converted into a system service running in the background and available to the OS.

Keep in mind, though, that everything Cortana, Google Now, and Siri can do, they can do because some developer has written code to enable that feature. All of the intelligence we’re feeling from our devices isn’t intelligence at all; it’s simply code-locked features a developer has written and included in the OS or in a virtual agent. All of this gets much more interesting when mobile devices, or any sort of device in our surrounding environment, are able to deliver capabilities beyond the actions a developer has pre-coded. That’s the end-game WWA, and ultimately why I started this site: to write about the ever-changing capabilities in this space, leading eventually to the day when we’re not writing mobile apps, but instead enabling capabilities that can be consumed by anything in order to make our lives easier.