Tuesday, April 10, 2012

A critical review of the "Google Glass Video"


So around here there's been a lot of discussion about the Google Glass video:

I've been telling people "this is coming" ever since I read the review of this patent application.

But now Google has thrown down the gauntlet with their ambitious video of what future user interfaces might look like. No keyboards, no mice, no touching, no arm motion of any kind -- just a series of head nods and possibly some pupil tracking.

The video itself is a fun watch and I'm going to attempt to break it down in more detail than I probably should, and do an analysis of the technology behind it.  


First I should mention that everything in the video is technically possible, but not necessarily plausible. What is potentially more interesting is what is real and what isn't, and what Google chose to show and what it didn't. 

There is, as near as I can tell, only one part of the video that is highly unlikely to ever happen.


The Beginning:
The video starts with our protagonist's laptop sitting on the table - interesting?
I think they do an intentional fade to hide the fact that there is a device on the coffee table that alarmingly resembles an iPhone (maybe it's a Nexus 1).
But where is the tablet?

A few seconds in, we see the first "fade effect" where reality gets blurred out and the user interface from the glasses is in the foreground. This time the user interface is triggered by looking down, and when he does it brings up a big menu with lots of alerts (running apps). It's not clear how a person might interact with all those.

0:05 - Pour the Coffee
As the protagonist pours the coffee while staring at a white wall, the device recognizes the coffee and the white wall, figures out he's 'taking a break', and starts displaying its popup alerts. These are modeled on the Android alerts, and they're probably coming from his cell phone.
Google Goggles (not to be confused with Google Glass) can reliably recognize coffee pots and cups today.

0:14 - Look Outside
As the protagonist is moving, notice the alerts automatically go away to avoid causing an accident or disorientation.

As he looks up, the device detects the skyline and displays the weather.
This is the first time we get a clear indication that the device has a small camera monitoring the direction of his pupils. Then at 0:18 when he holds still, the device confirms he definitely is looking at the skyline and is focused on the weather and it displays more detailed information. Computationally this is massive since it requires recognizing what it's looking at, but by using GPS, geolocation, etc. it's got a lot of hinting using existing skyline and location data from Google that could be pre-computed for fast interaction. 
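To make the "holds still, then gets more detail" idea concrete, here is a minimal sketch of dwell-based gaze confirmation. It is only my guess at how such a thing could work, not anything Google has described: the `DwellDetector` class, the one-second threshold, and the string target labels like "skyline" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DwellDetector:
    """Fires once when the gaze rests on one target long enough (hypothetical design)."""
    dwell_seconds: float = 1.0      # assumed threshold, not taken from the video
    _target: Optional[str] = None   # what the eye is currently resting on
    _since: float = 0.0             # timestamp when the current target was first seen
    _fired: bool = False            # True once this dwell has already been reported

    def update(self, timestamp: float, target: Optional[str]) -> Optional[str]:
        """Feed one gaze sample; return the target when the dwell threshold is reached."""
        if target != self._target:
            # The gaze moved to something else: restart the timer.
            self._target, self._since, self._fired = target, timestamp, False
            return None
        held_long_enough = timestamp - self._since >= self.dwell_seconds
        if target is not None and not self._fired and held_long_enough:
            self._fired = True
            return target
        return None


detector = DwellDetector(dwell_seconds=1.0)
samples = [(0.0, "skyline"), (0.5, "skyline"), (1.0, "skyline"),
           (1.5, "skyline"), (1.6, None), (2.0, "coffee")]
events = [detector.update(t, target) for t, target in samples]
```

In this sketch a glance does nothing, and only a sustained look produces an event, which is roughly the difference between the summary weather icon and the detailed forecast in the video.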

Google already has patents for detecting things like skylines and common objects like coffee pots and has for years.

If you're monitoring pupils you can tell where the eye is going and what has been read - this can lead to much more interactive user interfaces since the eye moves faster than the brain.  In fact the eye and eyelid muscles are the fastest muscles in the body (much faster than fingers), so in theory a combination of eye movements and blinks could be used to convey any number of signals to a device.

This probably seems silly, but I'm one of the people who believes texting today wouldn't have been possible if Nintendo and eventually Sony hadn't taught my generation how to be so darn agile with our thumbs.

0:19 - Mmmm.. Breakfast.
He's eating a breakfast sandwich wrapped in paper - so clearly he went out.

There was supposed to be more to the story here. Makes me wonder what got cut from the video.

At 0:21 his buddy Paul's icon pops up in the right corner. This person is clearly in one of his Google Circles with permission to interrupt while eating.
At 0:22 the voice detection is automatically on and it realizes the context of the conversation will be a response to what "app" is currently in focus. Understanding context in a conversation is very important (and very computationally difficult).

At 0:26 the protagonist chooses Strand Books. The system clearly has high confidence in the message and confirms right away without any auditory "confirm".  Perhaps there was an ocular confirmation.
My best guess is that because it has the Strand Books app loaded already it can guess with high certainty that it's the proper location.
The interface did not check his calendar, but it did use the context of the conversation to infer that Paul would be at Strand Books.
One can assume this confirmation was transmitted to Paul, who gave a simple nod and then it was accepted by both parties.

0:36 -- Time for a subway ride? Not so fast!
What they're trying to show here is the device knew where the protagonist was going (Strand Books) and which subway he'd need to take to get there: Green line train #6.
It recommends a walking route - or perhaps he selects it by turning his head to the left slightly.
The key is - it knows where he's trying to go.

This is in my opinion the least plausible situation for two reasons:
First - the NYC subway still runs OS/2 Warp for its cash/debit system -- so it might require some upgrades before it's fully augmented-reality enabled.
Second - I've only been to New York once in my life and I stayed near Union Square and Gramercy Park, and within a day or two I knew how to get around. The idea that a hipster who resides in New York would want walking directions automatically is pretty silly.  (Are you listening Google?)

There are some other mistakes in the video that make me suspect it was written by a Californian.  (But why not shoot this in the Bay Area then?)

0:49 -- What cute poochies!!
I guess the dogs are just there to be cute? Or what interaction was missed?

Being from San Diego and having visited New York, I discovered that one of the fastest ways to make a New Yorker uncomfortable is to look a complete stranger in the eyes and smile, or say something nice to them.
In NYC you would NEVER pet a stranger's dog -- so either this is a Californian's view of NYC, or the dude in the video knew the dog walker and we missed a whole interaction.
The woman has three dogs, she's probably a professional dog walker, so it's plausible she's a friend of the hipster protagonist.  But since we never see them interact, chances are he doesn't know her and she's not going to let that dude get anywhere near those dogs.
Petting a stranger's dog in California is normal. In New York around Union Square everybody walks around with their headphones on and avoids any type of eye contact - she wouldn't even stop. And for that reason I think this part of the video is totally implausible and would likely NEVER actually happen.

0:55 -- the Monsieur Gayno conspiracy!
Pretty cool, it uses optical character recognition to recognize the text Monsieur Gayno live, the date, and venue. In reality the poster would probably have a QR code embedded on it someplace to provide more reliable information. In fact it's hard to find a poster these days that doesn't have a QR code.


At 0:59 the entire screen fades while he interacts with the reminder application. It's the same "reality blur" we saw earlier.

Now at 1:00 -- my favorite Easter egg.
The date on the poster -- 05 May 12 -- is in the future. Maybe it's a future release date.  Yes that's right, we're looking at the future.  He mentions buying tickets for tonight, or maybe he wants to do it tonight. The context here is key because even I can't understand when/how he wants tickets.
I originally assumed that the concert is tonight (as I think most  people watching did) - but if it is then this video was shot in the future.  I think he wants a reminder tonight, but why not just buy tickets right now, or see who else from his hipster circle would want to go to the concert?

Now for the conspiracy part: It's interesting because Monsieur Gayno has a Facebook page and a Google+ page (now). But he doesn't exist, he's not real.
Some things were omitted that shouldn't have been: our protagonist doesn't invite his friend or his crush Jess, and he doesn't buy the tickets there. The icon goes to a notepad, save the date on the 31st - so perhaps that is the day it's supposed to be.
The venue Teacrab Hall, NYC doesn't appear to be a real place either. 

1:03 - Welcome to Strand Books
At 1:03 - he walks into the book store and asks "where is the music section?".
When he entered the Strand bookstore, the app was loaded (or was already installed and given permission to run), either by location, or by a QR code on the door or poster.
It also appears to do an auto-check in, or at least acknowledge the location.
There's been discussion about letting future mobile apps automatically provide free wifi keys when the person is constrained to a geographic location (ex: free wifi for customers).
This is the first time I've seen allowing an app to run/interact based on a geographic location - which I think is a neat idea.
"This app wants permission to auto-start when you're in range of this wifi" is something I'm likely to allow, so I can see specials, etc. as I walk through a store.
Google Play already has features that make it intelligent about downloading information from the cloud when I'm connected to WIFI so I don't burn up my bandwidth allocation from my provider.  So a store gives me free WIFI while I shop as long as I download their app. Free WIFI also lets them track my location with a very high degree of accuracy using triangulation -- much better (and more power friendly) than GPS.  Best of all they can do it without permission or changing any device settings.
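For the curious, here is a rough sketch of how a store could estimate where I'm standing from its Wi-Fi access points. This is my own illustration, not how any particular vendor does it: I'm assuming three access points at known floor positions and that signal strength has already been converted into a distance estimate for each one (strictly speaking that makes it trilateration). The function name, coordinates, and distances are all made up.

```python
import math


def trilaterate(ap1, ap2, ap3, r1, r2, r3):
    """Estimate (x, y) from three access point positions and distance estimates.

    Subtracting the circle equations pairwise leaves two linear equations
    in x and y, which are solved directly. Assumes the three access points
    are not in a straight line.
    """
    (x1, y1), (x2, y2), (x3, y3) = ap1, ap2, ap3
    a = 2 * (x2 - x1)
    b = 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d = 2 * (x3 - x2)
    e = 2 * (y3 - y2)
    f = r2**2 - r3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = a * e - b * d
    if det == 0:
        raise ValueError("access points are collinear")
    x = (c * e - f * b) / det
    y = (a * f - c * d) / det
    return x, y


# Hypothetical store layout in meters: a shopper really standing at (3, 4).
shopper = trilaterate((0, 0), (10, 0), (0, 10),
                      5.0, math.sqrt(65), math.sqrt(45))
```

Real signal-strength readings are noisy, so an actual system would need many more samples and some smoothing; the point is only that three fixed radios are enough, in principle, to place a phone on the shop floor.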

At 1:10 - our protagonist picks the book ("oh yes, this is it"), indicating he knew which book he was going to buy.  Both the selection of the book and the payment for it are omitted from the video. Interesting - Strand Books is a real place:
www.strandbooks.com (more on this later)

1:16 - So where is Paul anyway?
When he asks 'Is Paul here yet', the system is smart enough to recognize which 'Paul' he's asking about, again demonstrating how important context is. The video shows "Paul is sharing his location", so that tells us that Paul is also equipped and has given our protagonist access to his location. I wonder if his glasses are smart prescription glasses or just smart glasses.

1:32 - The mud truck
The mud truck is real, and its claim to fame is parking at the corner of 4th Avenue and 8th Street in between two Starbucks. While I haven't been there, its reputation is that it sells a better quality product at a better price, and apparently does quite well at it, showing that the little guy can beat Starbucks.
Perhaps this is a metaphor for Apple, or just the little guy beating the big corporate giant.

1:40 - Neat Picture (or is it?)
A lot of people think the picture taking is obvious. I agree, it's actually too obvious, which makes it suspicious to me. It doesn't fit: so many other interesting things that could have been included got cut to keep the video to 2:30, so why include a fairly obvious application like capturing and sharing a photo?
The picture of the glass is neat, but if you look closely, written on the ground is:

"The past is preSENT the future", "The T"

A Google search for the phrase returns two matches. One is a YouTube link to a techno/rave soundtrack. The second is an academic paper published in 2008 about an approach to collecting large-scale social/economic field data for a scientific study using the Internet.

If it's a coincidence, it's a pretty cool one.
I'm not sure what the point of that picture was.
I do know the paper was written by Steven D. Levitt and John A. List.  Steven Levitt is one of my favorite authors (Freakonomics). In addition the picture is of a person holding up a martini glass, in a sort of salute.  The question is -- do you realize the amount of field data that can be acquired using these wireless enabled camera glasses?

1:54 - Running Late

At 1:54 he says "I'm running late", but this was a cut from an earlier scene; a lot of time has elapsed.  Notice how as he's moving fast the graphics become a bit more transparent.
It's an interesting user interface approach.  The use of blur and transparency is major - it requires massive computational power to discover the intent of what it should do, and which call it should hold.  (Notice Jess is on hold while he runs up the stairs.)
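The transparency part, at least, doesn't have to be expensive. Here's a minimal sketch of fading the overlay as the wearer speeds up. The function, the walking and running speeds, and the minimum opacity are all numbers I picked for illustration; nothing in the video tells us how the real device decides.

```python
def overlay_opacity(speed_m_s, walk=1.4, run=3.5, min_opacity=0.3):
    """Map movement speed to overlay opacity (all thresholds are assumptions).

    At or below walking pace the overlay is fully opaque; at or above
    running pace it drops to min_opacity; in between it fades linearly.
    """
    if speed_m_s <= walk:
        return 1.0
    if speed_m_s >= run:
        return min_opacity
    fraction = (speed_m_s - walk) / (run - walk)
    return 1.0 - fraction * (1.0 - min_opacity)


standing = overlay_opacity(0.0)   # fully opaque
sprinting = overlay_opacity(5.0)  # faded to the minimum
jogging = overlay_opacity(2.45)   # halfway between the two
```

The hard part is the other half of that scene: deciding what the wearer intends and which call to put on hold, which a speed reading alone won't tell you.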
 
The dude in the video is clearly in shape - because he's on top of a fairly tall building after running up the stairs with a ukulele and he's not even out of breath.

At 2:05 the camera is off, and I'm not sure why he's looking down - I can only assume that originally the camera was supposed to be on. Either way, a bone-headed indicator like "camera off" means the default position is "camera on".

Jess isn't at all suspicious that the camera is off. If this was my girlfriend she'd instantly start screaming "Where the hell are you? Why isn't your camera on? What are you hiding??"

Fortunately at 2:07 he turns the camera on. One can assume that is done with a button on the outside of the device or maybe from his cell phone (we can't see his hands).

Either way at 2:12 - when Jess (Jessica) says "it's beautiful" I certainly agree.

Final Thoughts

I found it interesting he went to a book store to buy a book. Obviously that could have been downloaded, or watched as a video. A book is clearly something *MOST* people think about buying online - so it might be a nod to the upcoming importance of local search.

I'm surprised he already owned a ukulele and carried it with him all day, while he went to the book store to buy a book to learn to play it. Perhaps they weren't going for plausibility here, perhaps this person just doesn't like Amazon. :-)
 
In conclusion - I think we may have a good indicator why Google may have purchased Motorola Mobility. I think Google has every intention of letting any company that wants to make an Android smart phone do so, because the phone will become nothing more than a personal smart box that provides processing power and an interface to the network.

Did I miss something, do you have your own thoughts? Please share!

1 comment:

Zoovy Mitch said...

Actually, New Yorkers tend to be a lot nicer than Californians, who just think about themselves and most often are not too honest in their interactions with one another... Californians just feel uncomfortable in NYC, due to the nature of an actual city as compared to the many "bedroom" cities of CA, esp. Southern CA, which by the way is not liked very much by Northern Californians.