Max – #noAlexa

Voice assistants are a thing, as you can see from the success of Alexa and Mark Zuckerberg’s Jarvis demo. When I saw the latter, Max – our voice assistant – was already up and running in its initial setup 😉

Ivonne created a Node.js project which enables an always-listening service to power our digital assistant. It is built to run on a Raspberry Pi 3 with Raspbian Jessie. We call him Max, inspired by the comic ‘The Thirteenth Floor’, where Max serves as the Maxwell building’s main computer.

Below you will learn which components we used to create our servant.

General Architecture and Approach

The system base is the Raspberry Pi. It is set up with an active loudspeaker connected to the line-out, a USB microphone (SAMSON Meteorite USB Condenser Microphone) and a blink(1) mk2 USB notification light. The Raspberry Pi runs off an 8 GB SD card and connects to the network via its internal Wi-Fi.

To create Max, the Node process listens to the microphone to capture hotwords. Once a hotword is identified, the process actively records the speech and forwards it to an online speech-to-text and natural-language-processing service to extract the speaker’s intent. The intent and its entities are then used to apply business logic and trigger actions.
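
A minimal, stubbed sketch of how such an event flow could be wired in Node (all helper names here are hypothetical placeholders for the components described below, not the project’s actual code):

// Simplified top-level event flow (illustrative stubs only).
const EventEmitter = require('events');

// Hypothetical stand-ins for the components described below.
const setState = (state) => console.log('state:', state);             // blink(1) colors
const recordUtterance = async () => Buffer.alloc(0);                  // mic capture
const understand = async (audio) => ({ intent: 'tv', entities: {} }); // wit.ai
const dispatchIntent = async (intent, entities) => console.log(intent);

const assistant = new EventEmitter();

assistant.on('hotword', async () => {
  setState('listening');
  const audio = await recordUtterance();                // until silence or timeout
  setState('processing');
  const { intent, entities } = await understand(audio); // speech-to-text + NLU
  await dispatchIntent(intent, entities);               // business logic
  setState('standby');
});

assistant.emit('hotword'); // in the real system this comes from the hotword detector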

To signal the system state to the user, the blink(1) is lit up in different colors to indicate standby, listening, processing and similar statuses.

Max also uses text-to-speech capabilities to give spoken feedback to the user.

The first use case was a remote control for a Samsung TV, which can process key commands via a TCP socket and SOAP web services.

Components

Audio recording

Max uses arecord wrapped in the Node “mic” package. The input stream of mic is piped to the hotword detector. Once a hotword is detected, Max switches to active listening and processes the recorded data until “mic” emits a silence event or a maximum recording time is reached.
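
A minimal sketch of this wiring with the mic package (device name, sample rate and silence threshold are assumptions, not the project’s actual settings):

const mic = require('mic');

// mic wraps arecord; device, rate and silence threshold here are assumptions.
const micInstance = mic({
  rate: '16000',
  channels: '1',
  device: 'plughw:1,0', // the USB microphone
  exitOnSilence: 6      // emit 'silence' after ~6 frames without speech
});

const micStream = micInstance.getAudioStream();

micStream.on('data', (chunk) => {
  // feed the hotword detector and, in active listening, the recording buffer
});

micStream.on('silence', () => {
  // end of utterance: hand the buffered speech over to wit.ai
});

micInstance.start();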

Hotword detection

Max uses snowboy for hotword detection, using personal models (pmdl) created on the snowboy website.
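
A sketch following snowboy’s Node bindings (model file, hotword name and sensitivity are assumptions):

const mic = require('mic');
const { Detector, Models } = require('snowboy');

const models = new Models();
models.add({
  file: 'resources/max.pmdl', // personal model created on the snowboy website
  sensitivity: '0.5',         // assumption – tune per room and microphone
  hotwords: 'max'
});

const detector = new Detector({
  resource: 'resources/common.res',
  models: models,
  audioGain: 2.0
});

detector.on('hotword', (index, hotword, buffer) => {
  // switch Max into active listening
});

// the detector is a writable stream, so the microphone pipes straight into it
const micInstance = mic({ rate: '16000', channels: '1' });
micInstance.getAudioStream().pipe(detector);
micInstance.start();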

Speech processing

Mic delivers data events, and the data chunks are collected in a buffer array. Once the hotword detection triggers, the chunks collected after the hotword, together with any further speech, are streamed to wit.ai. This supports continuous speech and does not require a pause to indicate active listening, making the interaction very fluid.

Speech-to-text is powered by wit.ai, which is trained for the required intents and entities.
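
A sketch of how the buffered chunks could be streamed to wit.ai’s speech endpoint over HTTPS (WIT_TOKEN is a placeholder, and the shape of the JSON response depends on the wit.ai API version you trained against):

const https = require('https');

// Stream the audio chunks collected since the hotword to wit.ai.
function sendToWit(audioChunks, callback) {
  const req = https.request({
    hostname: 'api.wit.ai',
    path: '/speech',
    method: 'POST',
    headers: {
      Authorization: 'Bearer ' + process.env.WIT_TOKEN,
      'Content-Type': 'audio/raw;encoding=signed-integer;bits=16;rate=16000;endian=little'
    }
  }, (res) => {
    let body = '';
    res.on('data', (part) => { body += part; });
    res.on('end', () => callback(null, JSON.parse(body)));
  });
  req.on('error', callback);
  for (const chunk of audioChunks) req.write(chunk);
  req.end();
}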

Business Logic (Bot-logic)

The logic checks the data and the internal context to derive actions and control the system state. A local MongoDB is used to persist the context and other data.

The bot-logic components are constructed as Node.js modules that handle the trained intents. For each intent a new sub-module is created, and the context store allows all modules to share data and insights into the user’s needs.
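
A minimal sketch of such a per-intent dispatch (the handler names, German answers and context fields are made up for illustration; in the real project each intent lives in its own sub-module):

// Hypothetical intent registry.
const handlers = {
  weather: {
    handle: async (entities, context) =>
      'Wetter für ' + (entities.location || context.defaultCity)
  },
  reminder: {
    handle: async () => 'Erinnerung gespeichert'
  }
};

// context is the shared store (persisted in the local MongoDB).
async function dispatchIntent(intent, entities, context) {
  const handler = handlers[intent];
  if (!handler) return 'Das habe ich leider nicht verstanden.'; // fallback
  return handler.handle(entities, context);
}

dispatchIntent('weather', { location: 'Berlin' }, { defaultCity: 'Hamburg' })
  .then(console.log); // "Wetter für Berlin"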

Sound Output

To give Max a voice, espeak was used initially. Max now utilizes the Pico TTS libraries to provide a soft (female) voice. Pico TTS is accessed through the Node speaky module, which wraps the Pico libraries for use from Node.
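
For illustration, a sketch that drives Pico TTS directly via the pico2wave command-line tool (shipped in Raspbian’s libttspico-utils) and aplay – the real project goes through speaky instead:

const { execFile } = require('child_process');
const os = require('os');
const path = require('path');

// Render text with pico2wave and play it back with aplay.
function say(text, done) {
  const wav = path.join(os.tmpdir(), 'max-tts.wav');
  execFile('pico2wave', ['-l', 'de-DE', '-w', wav, text], (err) => {
    if (err) return done(err);
    execFile('aplay', [wav], done);
  });
}

say('Hallo, ich bin Max.', (err) => { if (err) console.error(err); });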

Besides voice output, Max plays short sound files, e.g. to signal error states.

blink(1)

blink(1) provides a Node package for controlling the blink(1) light.
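
A sketch of the state signaling using the node-blink1 package (the package choice and the color mapping are assumptions):

const Blink1 = require('node-blink1');

const light = new Blink1(); // opens the first blink(1) found on USB

// State-to-color mapping – the actual colors are an assumption.
const colors = {
  standby: [0, 0, 0],
  listening: [0, 0, 255],
  processing: [255, 165, 0]
};

function setState(state) {
  const [r, g, b] = colors[state] || colors.standby;
  light.fadeToRGB(200, r, g, b); // fade over 200 ms
}

setState('listening');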

Samsung TV

For pre-2014 TV sets, channels can be controlled via the send-key API using the samsung-remote npm module. In addition, the (non-public) SOAP API is used.
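
A sketch using the samsung-remote module (the TV’s IP address is a placeholder):

const SamsungRemote = require('samsung-remote');

const remote = new SamsungRemote({
  ip: '192.168.0.10' // placeholder – the TV's address on the local network
});

// e.g. channel up via the TCP send-key API
remote.send('KEY_CHUP', (err) => {
  if (err) console.error('TV did not respond:', err);
});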

Ready, Set, Action

To see how Max looks and sounds now, check the tweet below. As we are German, Max speaks German, too.

We are expanding Max’s use cases over time. Currently Max can control the TV, handle reminders, send Slack messages, provide weather and time information, query Wikipedia and read the latest news pushed via Twitter feeds. Because Max can distinguish between Ivonne and myself, the services are customized to our preferences.


»It’s on my list« for Apple Watch

Ivonne and I are happy to present the next iteration of our shopping list app »It’s on my list«.

Our app is now available for Apple Watch and we tried to keep things simple:

The shopping list is pushed to the Apple Watch app, so you can tick off items on the watch while walking through the grocery store. The order of the items is taken from the main app, so you can optimize your path through the store.

You can defer items to the end of the list and … that’s it.

The app features a glance which acts as a quick start and shows the number of items and the top items to buy.

[Screenshot: »It’s on my list« on the Apple Watch]

[Badge: Download on the App Store]


Developer lessons learned

While we created the Watch app, we learned some lessons about WatchKit for watchOS 2.0, iOS and Swift. Apple provides great guides and documentation, so building a Watch app is fairly easy and straightforward. As always, one should read the documentation thoroughly. Beyond that, we found the following items worth mentioning. They might help you avoid some pitfalls when coding your own Apple Watch app.

No Translation in Watch companion app

We tried to localize the app name displayed within the Watch companion app, but it seems to be stuck to the base localization. You might follow this Stack Overflow post to learn more.

And as a bonus: currently you cannot programmatically open the Apple Watch companion app from your iPhone app. Again, see Stack Overflow.

Simulator and background tasks

Testing background tasks has some issues in the simulator: the remaining time in background was not reported correctly, but there were no issues on actual devices. Our app even uses NSTimer in the background, which is possible as long as you stick to the rules of background processing.

Bitcode is mandatory

Apple says this:

For watchOS and tvOS apps, bitcode is required

In general this is not an issue, but we are using the Dropbox SDK, which had not been updated to support bitcode. So we ended up including the Dropbox SDK source and compiling it on our own – which was fairly simple and straightforward. But please make sure to add the Dropbox URL scheme.

For iOS 9+, your Info.plist should now contain the following:

<key>LSApplicationQueriesSchemes</key>
<array>
 <string>dbapi-2</string>
</array>

Callbacks to WCSession are not dispatched to main thread

Callbacks from WCSession do not run on the main thread, so you might need to dispatch your work accordingly. This can also cause timing issues, because as soon as a WCSession exists, callbacks might be triggered – even if your app was just launched in the background and your app-start lifecycle is still under way.

transferUserInfo: callback has an error exit

Read the WCSessionDelegate docs. We did not realize that there is an error callback for transferUserInfo: calls. We thought it was a fire-and-forget call, similar to updateApplicationContext:, but we were wrong.

Context Menus trigger WKInterfaceController Activation Events

We did not expect this, but showing a context menu will actually result in a call to your WKInterfaceController’s didDeactivate (and willActivate afterwards).

Fresh install might kill initial Watch app start

A fresh install of our Watch app would not start while the Watch companion app was open and active. No crash report was created, but it looked like the initial start just took too long and the system killed the process.

The culprit was that we initialized some properties when the first interface controller was loaded. Just using Swift’s lazy keyword saved us:

lazy var communicationManager = CommsManager() // initialized on first access, not at app start

Glances do not get lifecycle methods

This is well documented, but you might still stumble over it.

Have fun and be awesome!

2015 desk stuff

This is a short inventory of the items Ivonne and I are using for our iOS dev hobby. I try out new tools regularly or need to switch services as the world changes. I prefer services with one-time fees or free plans.

Development

  • Xcode (obviously)
  • Brackets (for HTML, CSS and JavaScript)
  • Textastic as quick text editor

Collaboration

  • SourceTree, managing git
  • Dropbox, mainly for hosting the git remote repository
  • Slack for exchanging URLs/articles for our projects and for git push notifications
  • Asana for project and task management
  • AirDrop, used so often that it should be on this list

Design

Besides the collaboration items, Ivonne has her own tooling, and we really do not keep it in sync.

A J2ME tale – touch before the iPhone

tl;dr We wrote a Java Mobile Sudoku game with touchscreen capabilities in 2006 which shipped preloaded on a touch-only phone distributed by MTV in France (yes, that MTV) – just before the rise of the iPhone. It showed a glimpse into the future. And we are proud of it 🙂

Once upon a time – to be more precise, in December 2005 – Ivonne and I decided to write a J2ME game. I had written some midlets before, but this time we wanted to use our Nokias to play sudoku (you guessed it…). So we did, and the code was optimized to run on the CLDC 1.0 and MIDP 1.0 profiles, with a jar below 64k.

It could make use of CLDC 1.1 and MIDP 2.0 profile features if available, and it already had some (imho) great features:

It generates and solves puzzles. You could also input and store your own puzzles. It even allowed you to enter pencil marks on such a tiny screen (see screenshots below), indicated by colored dots.

There are some hint features within the game, as well as options to adjust colors and behavior. On devices equipped with a pointer, you could use it to enter numbers (read: touchscreen). Finally, fellow devs added internationalization, providing Norwegian, Russian, Slovak, Hungarian, Polish, French and Greek language sets next to English and German.

This is how it looks on a Nokia 7210 and with pointer input capability:

[Screenshots: 5ud0ku on the Nokia 7210 and with pointer input]

It is still downloaded hundreds of times per month from my website, and Google returns a bunch of download sites.

Ivonne and I are very proud of that tiny game.

After 5ud0ku went out into the world, we were asked to sell it (we didn’t), and we were asked to allow it to be preloaded on a Modelabs MTV 3.0 phone. Yep, MTV – Music Television – created their own phone branding. So while this device never changed the world, it already was a screen-only device with only four hardware buttons, and our game was one of the few games out there that could easily be operated by the touch of your fingers. The MTV 3.0 came with a stylus for its resistive touchscreen, but you could work it with your fingers. Looking back from today, it was a glimpse into the future.

Find some pics of that MTV 3.0 mobile below, from a time just before the rise of the iPhone, plus a screenshot of the feature website.

[Photo: 5ud0ku on the MTV 3.0]
[Photo: dialpad of the MTV 3.0]
[Screenshot: mtv.fr/mobile in 2006, courtesy of The Wayback Machine]

[Video: 5ud0ku running on the MTV 3.0]