Firefox for Android Accessibility Refresh

When we first made Firefox accessible for Android, the majority of our users were using Gingerbread. Accessibility in Android was in its infancy, and certain things we take for granted in mobile screen readers were not possible.

For example, swiping and explore by touch were not available in TalkBack, the Android screen reader. The primary mode of interaction was with the directional keys of the phone, and the only accessibility event was “focus”.

The bundled Android apps were hardly accessible, so it was a challenge to make a mainstream, full-featured web browser accessible to blind users. Firefox for Android at the time was undergoing a major overhaul, so it was a good time to put some work into our own screen reader support.

We were governed by two principles when we designed our Android accessibility solution:

  1. Integrate with the platform, even if it is imperfect. Don’t require the user to learn anything new. The screen reader was less than intuitive; if the user had already jumped through the hoops of learning the directional pad, they had endured enough. Don’t force them through additional steps. Don’t make them install add-ons or change additional settings.
  2. Introduce new interaction modes through progressive enhancements. As long as the user could use the d-pad, we could introduce other features that may make users even happier. There are a number of examples of where we did this:
    • As early as our Gingerbread support, we had keyboard quick navigation. Did your phone have a keyboard? If so, you could press “h” to jump to the next heading instead of arrowing all the way down.
    • When Ice Cream Sandwich introduced explore by touch, we added swipe left/right to get to previous or next items.
    • We also added 3 finger swipe gestures to do quick navigation between element types. This feature got mixed feedback: It is hard to swipe with 3 fingers horizontally on a 3.5″ phone.

It was a real source of pride to have the most accessible and blind-friendly browser on Android. Since our initial Android support, our focus has gone elsewhere while we continued to maintain our offering. In the meantime, Google has upped its game: Android has gotten a lot more sophisticated on the accessibility front, and Chrome now integrates much better with TalkBack (I would like to believe we inspired them).

Now that Android has good web accessibility support, it is time for us to integrate with those features and offer our users the seamless experience they expect. In tomorrow’s Nightly, you will see a few improvements:

  1. TalkBack users can pinch-zoom web content with three fingers, just like they can in Chrome (bug 1019432).
  2. The TalkBack local context menu has more options that users expect with web content, like section, list, and control navigation modes (bug 1182222). I am proud of our section quick nav mode; I think it will prove to be pretty useful.
  3. We integrate much better with the system cursor highlight rectangle (bug 1182214).
  4. The TalkBack scroll gesture works as expected. Also, range widgets can be adjusted with the same gesture (bug 1182208).
  5. Improved BrailleBack support (bug 1203697).

That’s it, for now! We hope to prove our commitment to accessibility as Firefox branches out to other platforms. Expect more to come.


Using Pre-release Firefox on Linux

Every committed Mozillian and many enthusiastic end-users will use a pre-release version of Firefox.

On Mac and Windows this is pretty straightforward: you simply download the Firefox Nightly/Aurora/Beta dmg or setup tool and get going. Once it is installed it is a proper desktop application; you can make it your default browser, and life goes on.

On Linux, we rely much more on packagers to prepare an application for the distribution before we can use it. This usually works really well, but sometimes you really just want to use an upstream app without any gatekeepers.

The pre-release versions of Firefox for Linux come in tarballs. You unpack them and can run Firefox straight out of the unpacked directory, but it doesn’t run well: you can’t set it as your default browser, the icon is a generic square, and opening links from other apps is a headache. In short, it’s a less than polished experience.

So here is a small script I wrote; it does a few things:

  1. It downloads the latest Firefox from the channel of your choosing.
  2. It unpacks it into a hidden directory in your $HOME.
  3. It adds a symbolic link to the main executable in ~/.local/bin.
  4. It adds symbolic links for the icon’s various sizes into your icon theme in ~/.local/share/icons.
  5. It adds a desktop file to ~/.local/share/applications (an example of what that file looks like follows this list).
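
Here is roughly what the desktop file from step 5 could look like. The entry name, Exec path, and icon name below are illustrative stand-ins rather than the script’s actual output:

    [Desktop Entry]
    Type=Application
    Name=Firefox Nightly
    # Points at the symlink from step 3; %u lets other apps hand it links to open.
    Exec=/home/you/.local/bin/firefox-nightly %u
    # Resolved against the icons linked into ~/.local/share/icons in step 4.
    Icon=firefox-nightly
    Categories=Network;WebBrowser;
    # Registering these MIME types is what makes "set as default browser" work.
    MimeType=text/html;x-scheme-handler/http;x-scheme-handler/https;
    Terminal=false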

It doesn’t require root privileges, and it is contained to your home directory, so it won’t conflict with the system Firefox installation or touch the system libxul. Typically, you only need to run the script once per channel. After a channel is installed, it will get automatic updates through the app itself.

[Screenshot: Nightly running in Fedora 22. See the nice icon?]

So, here are some commands you can copy into your terminal to get pre-release Firefox installed:


curl  | python - nightly

curl  | python - aurora

curl  | python - beta

curl  | python - release


(re)Introducing eSpeak.js


Look! A flashy demo with buttons!


A long time ago, we were investigating a way to expose text-to-speech functionality on the web. This was long before the Web Speech API was drafted, and it wasn’t yet clear what this kind of feature would look like. Alon Zakai stepped up and proposed porting eSpeak to Javascript with Emscripten. This was a provocative idea: was our platform powerful enough to support speech synthesis purely in JS? Alon got back a few days later with a working demo; the answer was “yes”.

While the speak.js port was very impressive, it didn’t answer many of our practical needs. For example, the latency was not good enough for a responsive UI; you could wait more than a couple of seconds to hear a short phrase. In addition, the longer the text you wanted to synthesize, the longer you had to wait.

It proved a concept, but there were missing pieces we didn’t have four years ago. Today, we live in the future of 2011, and things that were theoretical then are possible now (in the future).


asm.js

Today, Emscripten will compile C/C++ code into a subset of Javascript called asm.js. This subset is optimized on all current browsers, and allows performance within about 2x of native speed. That is really good. eSpeak is a pretty lightweight library already; the extra performance boost of asm.js makes speech instantaneous.
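
To give a sense of what that subset looks like, here is a tiny hand-written asm.js module. This is purely illustrative; Emscripten generates much larger code in this style for you, so you never write it by hand:

    // The "use asm" directive tells the engine this function body is in the
    // asm.js subset, so it can be validated and compiled ahead of time.
    function ToyModule(stdlib, foreign, heap) {
      "use asm";
      function add(a, b) {
        a = a | 0;          // parameter type annotation: 32-bit integer
        b = b | 0;
        return (a + b) | 0; // return type annotation: 32-bit integer
      }
      return { add: add };
    }

    // Link the module against the global object and a heap buffer, then use it.
    var toy = ToyModule(this, {}, new ArrayBuffer(0x10000));
    console.log(toy.add(2, 3)); // 5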

Transferable Objects

Passing data between a web worker and the page that spawned it used to mean a lot of copying, since the worker doesn’t share memory with the page. But today, you can transfer ownership of an ArrayBuffer with zero copying. When the web worker is ready to send audio data back to the page, it can do so while keeping just a single copy of the audio buffer.
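
Here is a minimal sketch of that hand-off, assuming the worker has just filled a Float32Array called samples with synthesized audio (the worker file name is made up):

    // In the worker: transfer the underlying buffer instead of copying it.
    // After this call, samples.buffer is detached on the worker side.
    self.postMessage({ samples: samples }, [samples.buffer]);

    // In the page: receive a Float32Array backed by the very same buffer.
    var worker = new Worker('espeak-worker.js');
    worker.onmessage = function (event) {
      var samples = event.data.samples;
      // ... hand the samples to the Web Audio API (see below) ...
    };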

Web Audio API

We have a slick, full-featured audio API on the web today. When speak.js came out in 2011, it used a prefixed method on an <audio> element to write PCM data to. Today, we have a proper API that enables us to take the audio data and send it through an elaborate pipeline of filters and mixers, or even send it into the ether with WebRTC.
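
For example, here is one way to play a chunk of raw samples coming back from the worker. The 22050 Hz sample rate at the bottom is just an assumption for illustration; eSpeak’s actual output rate depends on how it is configured:

    var audioContext = new AudioContext();

    // Wrap a Float32Array of PCM samples in an AudioBuffer and play it.
    function playChunk(samples, sampleRate) {
      var buffer = audioContext.createBuffer(1, samples.length, sampleRate);
      buffer.getChannelData(0).set(samples);

      var source = audioContext.createBufferSource();
      source.buffer = buffer;
      // A fancier graph of filters, gains, or panners could be inserted here.
      source.connect(audioContext.destination);
      source.start();
    }

    // e.g. in the worker.onmessage handler above:
    // playChunk(event.data.samples, 22050);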

Emscripten Got Fancy

This was my first time playing with Emscripten, so I am not sure what was available in 2011. But, if I had to guess, it was not as powerful and fun to work with. Emscripten’s new WebIDL support makes adding bindings extremely easy. You still get a chance to do some pointer arithmetic, but that’s supposed to be fun. Right?
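
To illustrate how the WebIDL route works (with made-up names, not the actual eSpeak.js bindings): you describe the C++ wrapper class in a small .idl file, run Emscripten’s WebIDL binder over it to generate glue code, and the class then shows up on the Emscripten Module object in JavaScript:

    // A hypothetical espeak.idl might declare:
    //
    //   interface ESpeakEngine {
    //     void ESpeakEngine();
    //     long synth(DOMString text);
    //   };
    //
    // Once the generated glue code is linked in, the wrapped C++ class is
    // reachable from JavaScript through the Module object:
    var engine = new Module.ESpeakEngine();
    engine.synth('Hello, world');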

So here is eSpeak.js!

I wanted to do a real API port, as opposed to simply porting a command-line program that takes input and writes a WAV file. Why? Two main reasons:

  1. eSpeak can progressively synthesize speech. If you provide a callback to espeak_Synth(), it will be called repeatedly with as many samples as you defined in the buffer size. No matter how long the text you want synthesized is, it will fill the buffer and return it to you immediately. This allows for consistent low latency from the moment you call espeak_Synth() until you can start playing audio.
  2. eSpeak supports events. If you use a callback, you get access to a list of events that provide a timestamp in the audio and the type of event that occurs there, such as word or sentence boundaries (a rough usage sketch follows this list).
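
Put together, a progressive, event-aware API could be used from a page roughly like this. The ESpeakWorkerClient name and its options are hypothetical, invented for this sketch; they are not the real eSpeak.js surface:

    // Hypothetical wrapper around a worker running the eSpeak.js module.
    var tts = new ESpeakWorkerClient('espeak-worker.js');

    tts.synth('A fairly long paragraph of text to read aloud.', {
      // Called once per filled buffer, so playback can start as soon as the
      // first chunk arrives instead of after the whole text is synthesized.
      onChunk: function (samples, sampleRate) {
        playChunk(samples, sampleRate); // see the Web Audio sketch above
      },
      // Called with timestamped events such as word or sentence boundaries.
      onEvent: function (speechEvent) {
        console.log(speechEvent.type, 'at', speechEvent.audioTime, 'seconds');
      }
    });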

And, of course, with all the recent-ish platform improvements above, it was really time for a fresh attempt.

Future Work

  • Break up the data files. Right now, eSpeak.js is a download of over 2MB. That’s because I packaged all the eSpeak data files indiscriminately. There may be a few bits that are redundant. On the flip side, you get all 99 voice/language combinations (that’s a good deal for 2MB, eh?). It would be cool to break it up into a few data files and allow the developer to choose which voices to bundle or, even better, just grab them on demand.
  • Make a demo of the speech events. It makes my head hurt to think about how to do something compelling. But it is a neat feature that should somehow be shown.
  • ScriptProcessorNode is apparently deprecated. This is going to need to be ported to an AudioWorker once that is widely implemented.

I’m done apologizing, here is the demo.


Look what I did to fix Mozilla

You’re welcome.
[Image: a heart-shaped box with “Mozilla” embossed on top]

When you have a hammer, every problem looks like a nail. In my case, I have a 3D printer, and every problem is an opportunity to stare at the hypnotic movements of the extruder.

It has a nifty hinge/magnet mechanism; here is a video:

You can download and print this via the Thingiverse page, or you can check out the GitHub repo and hack on it.

Some lucky duck has me as a Secret Dino, and will be getting this in the mail soon!


Am I Vision Impaired? Who Wants to Know?

There has been discussion recently about whether websites should have the ability to detect that a visitor is using a screen reader. This was sparked by the most recent WebAIM survey, which highlights that a clear majority of users would indeed be comfortable divulging that information to sites.

This is not a new topic; there is a spec in the works that attempts to balance privacy, functionality, and user experience. This is also a dilemma we face as implementers, and we have discussed it extensively in bug reports. Even my esteemed colleague Marco put down his thoughts on the topic.

I have mostly felt confusion about this question. Not about the privacy or usability concerns, but really about the semantics. I think the question “do you feel comfortable disclosing your assistive technology to the web” could be phrased in a dozen ways, each time exposing bias and assumptions about the web and computing.

The prevailing assumption is that the World Wide Web is a geo-spatial reality loosely based on the physical world. Just like a geographical site, a site on the Web resides in a specific locality. The user is a “visitor” to the site. The “site” metaphor runs very deep. When I was first shown the Web in 1994, I remember visiting the Louvre, seeing the Mona Lisa, and signing a guest book. In this world, the browser is a vehicle that takes you to distant and exotic locations. Their names suggested it: Internet Explorer, Netscape Navigator, Safari, Galeon, and the imperialistic Konqueror.

[Image: the White House home page, circa 1994. You mean I could visit the White House from my home?? Do I need to wear a tie???]

This paradigm runs deep, even though we use the Web in a very different way today, and a new mental model of the Web is prevailing.

When you check your mail on Gmail, or catch up on Twitter, you are using an application. Your browser is just a shell. In your mind, you are not virtually traveling to Silicon Valley to visit a site. You feel ownership over those applications. It is “my” twitter feed, that is “my” inbox. You will not sign a guest book. Just look at the outcry every time Facebook redesigns its timeline, or after Google does some visual refresh to its apps. Users get irate because they see this as an encroachment on their space. They were happy, and then some ambitious redesign is forcing them to get reacquainted with something they thought was theirs. That is why market-speak invented the “cloud”, which ambiguates the geography of websites and reinforces the perception that the user should stop worrying and love the data centers behind their daily life.

How you see the web at any given moment may change how you view the question of assistive technology detection.

If you are applying for a loan online, you are virtually traveling to a loan office or bank. Whether you have a disability or not is none of their business, and if they took note of it while considering your application for a loan, that would be a big problem (and probably illegal). In other words, you are traveling to a site. Just like you would put on a pair of pants or a skirt before leaving the house, you expect your browser to be a trusty vehicle that will protect you from the dangers and exposure in the Wide World of the Web.

On the other hand, you may use Microsoft’s Office 365 every day for your job or studies. It really is just an office suite not unlike the one you used to install on your computer. In your mind, you are not traveling to Redmond to use it. It is just there, and they don’t want you to think about it any further. The local software you run has the capability to optimize itself for its environment and provide a better experience for screen reader users, and there is no reason why you would not expect that from your new “cloud office”.

But What About User Privacy?

The question of AT detection is really more about perceived privacy than actual privacy. If you had a smartphone in the last 5 years, you probably got frustrated with the mobile version of some website and downloaded the native version from the app store. Guess what? You just waived your privacy and disclosed any kind of AT usage to the app and, in turn, to the website you frequent. This whole “the Web is the platform” thing? It is a two-way street. There is no such thing as an exclusively local app anymore; they are all web-enabled. When you install and run a “native” app, you can go back to that original mental model of the web and consider your actions as visiting a site. You may as well sign their guest book while you’re at it.

In fact, “local” apps today on iOS or Android may politely ask to use your camera or access your address book, but profile your physical impairments? They don’t need special permission for that. If you installed the app, they already know.

In that sense, the proposed IndieUI spec offers more privacy than is currently afforded on “native” platforms by explicitly asking the user whether to disclose that information.


I have no simple answers. Besides being an implementer, I don’t have enough of a stake in this. But I would like to emphasize a cliche that I hear over and over, and have finally embraced: “the Web is the platform”. The web is no longer an excursion and the browser is not a vehicle. If we truly aspire to make the web a first class platform, we need to provide the tools and capabilities that have been taken for granted on legacy platforms. But this time, we can do it better.
