Hacking AI for Fun and Profit

I knew all along that AI was a component of the World Without Apps. What I didn’t expect was how quickly AI-driven actions would become a home-grown option. I subscribe to The MagPi magazine (a publication from the Raspberry Pi Foundation) and even got my first project published in the current issue (https://www.raspberrypi.org/magpi/issues/58/).

In the previous issue, the magazine included the complete Voice Kit AI project from the Google AIY (AI Yourself) project – basically a complete Google Assistant including an enclosure, speaker, button, Audio HAT, and more. My son and I quickly assembled the project, and now he has an almost Google Home device in his bedroom. This is absolutely interesting because Google funded sending thousands of these devices all around the world, but more interesting is that the core project here is extensible. The project runs on a Raspberry Pi, and you can add commands to the Google Assistant project code. Once you do that, you basically write the code to respond to your specific voice commands and you can make this device do anything, absolutely anything.

Your regular Google Assistant goes to the cloud for execution (searches, weather reports and so on), but if you connect some specialized hardware to it (not something specific, but any hardware you can control from the Raspberry Pi) and add your own code to match, suddenly your project becomes much more interesting.

From a Maker standpoint, this dramatically enhances the types of projects that I can make with this thing. I no longer have to deal with ANY of the complexities of voice interaction; the platform (Google AIY) takes care of that for me. All I have to do is connect my hardware, add the command to the acceptable command list, write some code, and I’m all set.
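
Those last three steps can be pictured as a small dispatch table. The sketch below is not the Google AIY code; the phrases, the COMMANDS table, and the handle_transcript function are hypothetical names I made up for illustration, and it assumes the platform has already turned speech into a plain text transcript.

```python
# Hypothetical sketch: map recognized phrases to local hardware actions.
# Nothing here is part of the Google AIY code; it assumes the platform
# hands us the recognized speech as a plain text transcript.

def lamp_on():
    # Placeholder for real hardware control (for example, a GPIO pin).
    return "lamp is on"

def lamp_off():
    return "lamp is off"

# The "acceptable command list": phrase -> handler.
COMMANDS = {
    "turn on the lamp": lamp_on,
    "turn off the lamp": lamp_off,
}

def handle_transcript(transcript):
    """Run the local handler for a phrase, or return None so the
    caller can pass the request on to the cloud assistant."""
    handler = COMMANDS.get(transcript.strip().lower())
    if handler is None:
        return None
    return handler()

print(handle_transcript("Turn on the lamp"))
print(handle_transcript("what's the weather"))
```

Anything that isn’t in the table comes back as None, which is the point where the regular cloud assistant would take over.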

This will ultimately take us to something I’m worried about: AI everywhere. When companies (and hackers) start embedding AI into everything around us, suddenly we have multiple systems all listening for commands and stepping over each other. I got a good taste of this while watching this Spring’s Google I/O keynote. Every time the presenter said “OK Google” to demonstrate a new capability on stage, my local Google Home device would wake up and answer. I had to put my phone in my pocket during the presentation so it couldn’t hear and answer as well.

What do you do when your car and phone both have AI capabilities? How does each device know it’s the target of the query? Will I then need to preface every query/command with the name of the device I’m targeting? Probably at first, but ultimately, we’ll get to a single, overarching AI that understands what it can interact with locally. You’ll speak to the ceiling or your thumb, or whatever, and an available, compatible device will perform the action for you, whatever it is.

That’s where this is ultimately going, I’m certain of it. When that happens, we’re in the World Without Apps.

Facebook and The World Without Apps

There’s been a lot of buzz around Facebook – primarily because of the murder that took place on Facebook Live coupled with Facebook’s failure to recognize it and pull down the video in a reasonable time. Apparently Facebook spent a lot of effort building systems to recognize copyright violations (which cause them issues) and little on technologies that recognize harmful acts (which cause victims issues, but not Facebook directly).

Facebook’s ability to recognize a murder in progress wouldn’t have helped the victim here – he would have been dead before the police could have gotten there. However, if Facebook focused their attention on determining the scope of a live session, then in abuse or rape scenarios, they’ll be able to get help there before the ‘event’ is over. A valuable use of the technology, I imagine.

Anyway, back to the WWA.

At Facebook’s conference this month, Facebook highlighted their plans for augmented reality. Augmented reality is a different variant on the World Without Apps. Instead of interacting with your environment (your home, your car, your office, etc) or your smartphone via voice as I’ve highlighted in this site, with Facebook’s approach, you’ll socialize in a virtual reality world. This ‘world’ is essentially served by an app, and anything you do in that world will be exposed by an app running in the cloud somewhere (exposed through your virtual reality goggles). Extra capabilities provided by third parties? Basically extensions to the app. So, it feels like where Facebook is going is the opposite of the WWA – they’re planning (if I can oversimplify here) one big app you ‘live’ in…virtually.

When interacting with the ‘real’ world, Facebook’s plan is that you’ll do it through special glasses that add a computer screen in front of your eye or on a wall or table. Initially this feels like an extension of the WWA, but it’s not. Zuckerberg said “We don’t need a physical TV. We can buy a $1 app ‘TV’ and put it on the wall and watch it.” So Facebook is still focused on apps; in this case, selling you additional functionality to use in an augmented reality world.

Code-locked Agents Don’t Get It

I live in North Carolina, and there’s a big storm heading our way tonight. I knew we would be getting snow, but we’re all on edge because we’re trying to figure out if we’ll have enough to do some sledding, but not so much that it won’t all be gone in time for school on Monday.

This morning, I asked Alexa when we would get snow tonight, and she responded with:

There’s no snow in the forecast today for CITY_NAME, but there’s a Winter Storm Watch in effect between 7 PM tonight and 1:00 PM Saturday.

Wait a minute. We don’t have snow in the forecast, but there’s a winter storm watch in effect? Yeah, Alexa, I’m pretty sure that a Winter Storm INCLUDES snow.

That’s the problem with code-locked agents: they only know what they know, and Alexa apparently doesn’t know that snow’s a component of most every Winter Storm.

When I asked Google Now the same question, she didn’t answer me, but did bring up a browser window with search results for the winter storm. At least she was able to relate “snow” with “weather” and get me the answer I wanted. I asked the question verbally, so I sure would have liked it if she had answered me instead of giving me search results, but at least her answer was more direct than Alexa’s.

Are Native Mobile Apps Doomed?

Yes, they are. But not for the reason most people think.

I’ve been monitoring the press, looking for articles that spark topics for this blog, and I’m only now starting to see work that relates to this topic, but for the wrong reasons – hence this post.

Pundits and bloggers are starting to pick up on the fact that mobile apps are dead (or near death). Surprisingly, they’re making this announcement in response to advances in the mobile web browser, specifically progressive web apps. Don’t get me wrong, progressive web apps are interesting, and the use of them will have a big impact on user experience (while at the same time, simplifying development). Here are some examples:

My argument here is that Agents, not apps, are the future of mobile. Apps are old news. They’re so quaint, allowing me to do one thing (and hopefully do that one thing well) when I could instead tie multiple systems together through agents and the WWA.

We lost the chance to have one, universal agent we could all tune to our needs when Samsung purchased Viv. All we have right now is islands of interaction (Alexa, Google Now, Siri, Viv, and so on) with code-locked solutions. What will it take to kick this up to the next level?

Samsung Buys Viv

I checked my inbox this morning and found some amazing news. Apparently Samsung has purchased Viv; you can read the Samsung press announcement. The reason I see this as amazing news is that it’s something that I never expected would happen. I met with Viv’s founders a little less than a year ago, before their public demo, and they assured me that they felt that Viv would be the last job they ever had.

More importantly, this acquisition is going to have a huge, negative impact on the effectiveness and overall reach of Viv.

As I listened to Viv’s founders discuss what they were making, it was clear to me that in order for them to be successful, they couldn’t let themselves be bought by anyone. I imagine with certainty that both Apple and Google made attempts to acquire Viv, but to join either of those companies would limit their ability to be the ‘Global AI’ they said they wanted to be.

As I’ve written before, the issue I have with Alexa, Cortana, Google Now (now Google Assistant), and Siri is that these solutions are code-locked; they can only do what they’re directly coded to do. Each solution gets a little bit of flexibility from leveraging search to create a response when it doesn’t have a pre-coded solution to a problem, but at the end of the day, any interesting transactions ANY of those solutions deliver only happen because someone has coded the transaction.

Google Assistant, demonstrated this week by Google, is a little better than the other solutions in that it maintains some context, a short term memory of recent interactions or direct access to the smartphone screen, to help it deliver more interesting results.

Viv, on the other hand, delivers transactions, any kind of transaction, because it knows what you want and it knows what it knows and can stitch it all together, no matter how complex, to deliver the goods. That is, well, was, a game changer for me.

Not anymore.

I understand why Samsung wanted Viv; they want to create the best interactive consumer devices, and owning Viv is a great way to do that. Unfortunately, this comes at a cost: with Viv as an independent entity, Viv’s capabilities would have been available ANYWHERE. Your car, office, smartphone, garage door opener, newspaper box, and more could all have had intelligence added to them by leveraging Viv. At the same time, you’d also have seen Google, Apple, and Microsoft all license Viv to add an extra layer of interaction to their services. Companies and third party developers would have augmented Viv’s AI; it would have been made available through devices and services all over the world and you’d only have had one system to learn, one set of preferences to manage, and universal access to everything.

Not anymore.

Viv’s sale to Samsung will take them out of the market and you’ll only find Viv’s capabilities in Samsung devices and the limited subset of device manufacturers who license the technology.

The quality of your dynamically assembled experiences will vary depending on whether you’re working with Google Assistant, Alexa, Cortana, or Siri. You’ll have to learn the command interfaces of each, deal with voice recognition idiosyncrasies of each, and won’t be able to stitch together actions based on previous activities if they were done through different services. You definitely won’t have a single set of preferences you can manage, you’ll have to train each service to understand your specific needs. Sigh.

It’s interesting for Dag and the other folks at Viv – once again, they build a company and sell it before ever really going to market. It doesn’t suck to be them.

Semantic Web Services

When I first started thinking about the World Without Apps, I wasn’t thinking it would be a smartphone phenomenon; instead I imagined everything except the smartphone being the key to this revolution (my house, car, office and so on). I never expected smartphone processors to be able to handle the load or that cloud technologies would be so prevalent and so powerful that they could support things like Google Now and Apple’s Siri. How wrong I was – smartphones are core to this WWA revolution, and I’ll write more about that later.

Regardless of how WWA is implemented, there’s a core piece of technology that enables it: the ability for web services to describe themselves to any consumer. As I mentioned in an earlier post, Amazon Alexa, Apple Siri, Google Now, and Microsoft’s Cortana are code-locked; they can only do things that they’re coded to do and nothing more. Now, these technologies have an out that enables them to handle most any request (at least all of them but Siri do): when they can’t figure out what you’re asking them, they revert to delivering search results, hoping you’ll find the answer you want there.

However, sometimes search doesn’t work that well for consumers as shown in the following figure. In this example, I asked Siri how long it would take me to drive to the moon. A reasonable question, I thought; I was watching a special on the Apollo program and I was curious. Siri heard my question correctly as you can see from the figure (although she skipped adding a question mark for some reason), but even when correctly ascertaining the meaning of my words, she still didn’t get it and instead told me how long it would take me to drive to Howl at the Moon (I didn’t even know we had one of those locally).

Figure: Siri’s Silly Answer

For voice interaction to work (whether it’s through an app or some universal agent), the data aggregator (the agent) has to know how to know a lot of different things. It has to know things, and it has to know how it knows things or at least where to find things. I’m not talking about Watson playing Jeopardy, but some sort of universal agent that just, well, knows things.

How does it know what it knows? Semantic Web Services.

We have semantic HTML, technology that lets web pages describe themselves so computers can divine information from the page more easily. In order for digital assistants, agents, or whatever you want to call these things to be able to do your bidding, they have to have a catalog of capabilities available to them. That means that developers expose their capabilities through web services, then describe them in such a way that multiple agent platforms can consume them when they need to.
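
To make the catalog idea a little more concrete, here’s a minimal sketch of what I have in mind. The descriptor fields (provides, inputs, endpoint), the service names, and the find_services function are all invented for illustration; they don’t come from any existing standard or from any of the agent platforms mentioned here.

```python
# Hypothetical sketch: services describe themselves with plain data, and an
# agent discovers them by capability instead of by hard-coded name.
# The descriptor fields ("provides", "inputs", "endpoint") are invented for
# illustration; they are not taken from any existing standard.
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceDescriptor:
    name: str
    provides: str   # the capability this service offers
    inputs: tuple   # names of the parameters it needs
    endpoint: str   # where an agent would call it (placeholder URLs below)

CATALOG = [
    ServiceDescriptor("AcmeWeather", "weather.forecast", ("city",),
                      "https://weather.example/forecast"),
    ServiceDescriptor("AcmeCabs", "ride.book", ("pickup", "dropoff"),
                      "https://cabs.example/book"),
]

def find_services(capability, known_inputs):
    """Return services that offer the capability and whose required
    inputs the agent can already supply."""
    return [
        s for s in CATALOG
        if s.provides == capability and set(s.inputs) <= set(known_inputs)
    ]

matches = find_services("weather.forecast", {"city": "Raleigh"})
print([s.name for s in matches])
```

The agent never mentions AcmeWeather by name; it asks for a capability and checks whether it can supply the inputs. Adding a new service means adding a descriptor, not writing new agent code.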

The flaw of the Google Now, Siri, Cortana or Alexa approach is that there’s no way to augment a service’s capabilities without writing platform-specific code. Google Now, Siri and Cortana expect local apps to provide services, so to extend those services you’ll have to build an app and convince users to load it on their phones. You can extend Google search by publishing data in a way Google can easily consume, and therefore make your stuff available to Google Now, but that’s a hack and there’s no guarantee Google will use your data in the way you intended.

Alexa and Viv on the other hand expose a cloud API developers can use to publish their capabilities to the service. Alexa enables developers to define what words they want used to invoke their service, and you can only consume your services when those specific words (or their variants) are uttered. Viv on the other hand is supposed to be exposing an SDK developers can use to describe their services and publish them to Viv. Once there, Viv can use them in any way she wants to deliver results to her users.

The way I look at this is that development organizations have to start thinking about how they’re going to expose their money making capabilities in a very generic way so that they can be consumed by ANYTHING. The web and mobile apps become core channels, ones that always need accommodation, but new channels, or at least new agent technologies, will pop up and demand attention. What we need to accommodate this is a standard definition of semantic web services, so developers can focus on only one way to easily expose their capabilities to the world. Instead of wrapping services in each vendor’s SDK, we need one common SDK that accommodates all. With this in place, there’s no limit to what the World Without Apps can accomplish.

Voice-enabled Agents Everywhere

One of the reasons I became so certain that we were entering the age of no apps is the rise of voice agents. My BlackBerry had voice control capabilities years ago and Android and iOS both added capabilities as well. My 2007 car supported an option for voice control, but I never installed it. The tipping point was when I purchased an Amazon Echo for the kitchen. I’d started using Google Now and Siri for things like random searches, phone calls and opening apps, but when the Echo came into the house, it changed music for me.

I’m a huge fan of Sonos (www.sonos.com); I’ve got 4 of them in the house and I was expecting to add more before the Echo came to town. For a long time, I desired the ability to control my Sonos devices via voice control. I just wanted to be able to walk into a room and ask the device to start playing music. Sonos hasn’t adopted voice control, but as soon as I got an Echo, I found that I could play whatever music I wanted plus do more (such as check my calendar or put items on my todo list). What I quickly found was that I abandoned the Sonos device in the same room where I had an Echo since it was so much more work to open up an app to pick what to play when I could just ask for it. It didn’t matter that the Echo sound quality was lower than Sonos, the convenience factor so outweighed sound quality that the Sonos is now collecting dust. Apparently a lot of Sonos customers have made the same switch.

The biggest issue is that so many personal or household devices are adding voice control that there are now many devices listening to you at all times. I even have multiple Android devices, so when I say “OK Google” multiple devices answer. In my house, I have 6 Android devices, one iOS device, and two Echo devices – all of which are constantly listening to us, waiting for the next command. We also have an Xbox as well as a couple of smart TVs, all of which are listening as well.

So, in this World Without Apps (WWA), apps are going away at the same time that devices that are listening to us, in order to help us, of course, are increasing. We’ll have voice-enabled agents everywhere, all vying for our attention. How exactly do we deal with that?

Well, I imagine our cell phones become less important. With ubiquitous network connectivity and smart devices surrounding us, why do we need to carry a physical device around with us? I imagine that we’ll need access to screens in order to be able to interact with those data-driven apps I mentioned in an earlier post. Beyond that, you can interact with your surroundings without having a physical device in-hand. Initially, those agents will be able to communicate with you through a Bluetooth headset for example, but eventually, once embedded technology becomes prevalent, you’ll be able to have private interactions with these agents through the electronics embedded into your brain. How cool would that be?

A World With Less Apps (WWLA)

In my first post, I defined what I mean when I talk about the World Without Apps (WWA), the premise for this site. I may not have been entirely honest with you.

In reality, I recognize that we’re never really going to enter a world without any apps; what we’re really entering is a world with less apps (WWLA). The reason I picked WWA instead of WWLA is that WWA is, well, more aggressive. Is it more startling to think of a world with no apps, then work backwards to a realization that we’ll really have less apps going forward, or to start with WWLA and leave it at that?

I think we need to focus on WWA – work to understand what that means then focus our development efforts on delivering that world to the betterment of, well, everyone. Will we ever get there? Perhaps, but likely not soon (in the next 10 years). But, if we can simplify app-like capabilities and interaction interfaces to the point where more can get done without needing fingers poking at glass (FPAG), then we’ve accomplished something.

Will we continue to need apps? Yep, absolutely. There are many data input, data browsing and analysis apps that can’t be delivered without apps and screens, but those will primarily be enterprise apps and the subset of data-driven apps that consumers need (such as banking, travel and so on). In those environments, I expect that voice will continue to be the primary interaction, with smartphone or tablet screens simply being data display or data-interaction targets.

As agents become more capable, they’ll take over more and more tasks from apps leaving few apps around for us to use.

What is the World Without Apps (WWA)?

As I launch this site, it seems appropriate to describe the overall premise it’s built on.

Mobile as a product category is changing lives everywhere. Everyone reading this article probably has a smartphone and likely has at least one tablet as well. It’s even likely that this article’s being read on a mobile device, since desktop PCs and even laptops are less frequently used for web browsing. What makes mobile devices so useful and popular are the apps the devices execute to enable their users (you) to accomplish things. From the beginning of computer time, apps have been hand crafted by one or more software developers to accomplish something. App users use one or more apps to deliver some result. If one app is enough for a job, users fiddle with fields, buttons and menus to, well, do something. If one app isn’t enough, users have learned that they can take the outcome (results) from one app and plunk it into another app to get the next result, stitching apps together in series to reach a specific goal.

What then is The World Without Apps? Well, it’s being able to accomplish a specific goal without using any apps at all. We’re in a world driven by apps today, but we’re quickly moving to a place where apps are no longer necessary for many tasks. Let me see if I can explain…

Early Inter-app Interaction

When Microsoft created Dynamic Data Exchange (DDE) and later Object Linking and Embedding (OLE), amazing new things were possible. An app could now call out to another app and execute some task, then use the results. This enabled developers to dynamically link apps, building apps that focused on a limited set of tasks, then stitching those apps together to deliver a more…sophisticated result. This was a great capability, although a bit flaky. This inter-app communication capability created dependencies and the potential for memory leaks that could wreak havoc in complicated processes and on computers with limited resources.

Inter-app Interaction on Mobile Devices

In mobile apps, this interconnection is easier – it’s a simple process to enable capabilities in an app that can be invoked from another app. Apps can publish capabilities that only sibling apps can invoke (a secure approach) or an app can publish capabilities that any app can invoke. With this last approach, an app tells the OS it wants to do something (usually through a file or URI) and any arbitrary handler can respond indicating that it can handle the task. This works, but unless each program involved in the activity was coded by the same developer, the app initiating the request can’t necessarily control what happens in the invoked app.
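
Here’s a rough sketch of that last approach, where any app can register as a handler and the OS picks one. The HandlerRegistry class, the app names, and the first-registered-wins rule are my own simplifications for illustration; real mobile platforms have their own mechanisms and usually let the user choose.

```python
# Hypothetical sketch of OS-style URI dispatch between apps. The class and
# app names are invented; real mobile platforms have their own mechanisms.
from urllib.parse import urlsplit

class HandlerRegistry:
    def __init__(self):
        self._handlers = {}  # scheme -> list of (app_name, callback)

    def register(self, scheme, app_name, callback):
        # Any app may announce that it can handle a URI scheme.
        self._handlers.setdefault(scheme, []).append((app_name, callback))

    def open(self, uri):
        scheme = urlsplit(uri).scheme
        candidates = self._handlers.get(scheme, [])
        if not candidates:
            raise LookupError(f"no app handles {scheme!r} URIs")
        # The caller does not choose the app; here the first registered wins.
        app_name, callback = candidates[0]
        return app_name, callback(uri)

registry = HandlerRegistry()
registry.register("geo", "MapsApp", lambda uri: f"showing map for {uri}")
registry.register("geo", "HikingApp", lambda uri: f"plotting trail for {uri}")

print(registry.open("geo:35.78,-78.64"))
```

Notice that the app calling open() only supplies the URI. Which app answers, and what it does with the request, is out of the caller’s hands.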

Agents to the Rescue

Voice added an interesting interaction to mobile devices. Now, instead of poking around at different apps, you could simply tell the device what you wanted and it would try to figure it out for you. Basically this is nothing but voice enabled search – the ability to speak a query instead of type it in. The voice interaction system had the ability to perform certain tasks for you such as opening an app, calling a specific contact or specified phone number or even sending or responding to text messages. Beyond delivering search results for you, the extra capabilities provided by voice enabled search exist only because the browser or the mobile device OS developer added additional capabilities – code-locked capabilities that can only deliver what a developer has coded the engine to deliver.

With Google Now, Google added capabilities to the Android OS that enable the device to guess some information that you might find helpful and deliver that to you through a special panel and, later, through notifications. Apple even added a vaguely similar capability to iOS where an extra panel was added to the home screen where iOS tries to guess what apps or information you’ll want to use next. Microsoft’s Cortana can do a little of that as well, but you have to invoke her to gain access to that feature. This means that your mobile device is figuring out how it can help make your next task easier for you. This approach is the first step in the World Without Apps.

It’s pretty interesting how quickly this became a very capable service. I remember going to a concert with my son and noticing that my Android device started letting me know that it knew how to get me back to my car and also how long it would take me to drive home. When out for lunch, Google Now would start notifying me how much time I had before I’d need to start heading home in time to make my next call or how long before I’d have to leave to make my next appointment.

The most startling example was when I was driving from Raleigh to Charlotte and Google Now notified me about an accident on the highway in front of me. What was most surprising about this is that I wasn’t using the device’s navigation capabilities (Google Maps); I was instead using the navigation system built into my car (which my phone knew nothing about). The device knew I wasn’t home, and it knew where home is. It also knew that I was on the highway and that there was an accident in front of me and decided to let me know about it (how it knew about the accident escapes me). That’s the best existing example of the WWA that I have – the device using information it has about me and my environment and deciding what to do about it.

Now, in all of the examples I just provided, it was an app doing this – Google Now, but that particular app is tightly integrated with the device OS (and other apps running on the device). It won’t be long before Google Now stops being a stand-alone app and instead becomes a core service running in the OS. Once that happens, individual apps will start to disappear; they’ll be subsumed by Google Now (or Google Assistant, the new flavor of this) and any app-specific capability that is needed will be converted to a system service running in the background and available to the OS.

Keep in mind though that everything Cortana, Google Now, and Siri can do, they can do because some developer has written code to enable that feature. All of the intelligence we’re feeling from our devices isn’t intelligence at all; it’s simply code-locked features a developer has written and included in the OS or in a virtual agent. All of this gets much more interesting when mobile devices, or any sort of device in our surrounding environment, are able to deliver capabilities beyond actions that have been pre-coded by a developer. That’s the end game of the WWA and ultimately why I started this site – to write about the ever changing capabilities in this space, ultimately leading to the day where we’re not writing mobile apps, but instead enabling capabilities that can be consumed by anything in order to make our lives easier.