Video Subtitles and Localization

Let’s talk about localization and subtitles – not captions. From Wikipedia:

“Subtitles” assume the viewer can hear but cannot understand the language or accent, or the speech is not entirely clear, so they only transcribe dialogue and some on-screen text. “Captions” aim to describe to the deaf and hard of hearing all significant audio content—spoken dialogue and non-speech information such as the identity of speakers and, occasionally, their manner of speaking – along with any significant music or sound effects using words or symbols.

So far I have worked on two projects that involved subtitles – Web We Want and Firefox: Choose Independent – and this is what I learned in the process.

l10n: String Length and Verbosity across Languages

A few months ago I was discussing the truncation plague on Firefox OS with @kaze, and he came out with a statement that left me doubtful:

according to the desktop metrics I had, French is the least compact locale and Chinese is the most compact one

So I had to check it somehow 😉 (in Italy they would call me Saint Thomas for being skeptical).

The basic idea was simple: use Silme to analyze all locales available in Mozilla l10n repositories, comparing string lengths between English and another language.

Here’s the resulting Python script (beware my slowly improving programming skills) and a table with the results (data can be sorted by clicking on column headers).

Sample and Reference

I’m using mozilla-beta as a reference, and comparing each locale against en-GB. Why not en-US? The reason is simple: en-US strings are scattered across the entire mozilla-central repository, so I would have to resort to tricks, as Transvision does, in order to create a pseudo en-US string-only repository. Using en-GB leads to less precise results (see below), but for the sake of this analysis I considered it an acceptable compromise.

I’m not checking all folders, only the main ones (‘browser’, ‘dom’, ‘mail’, ‘mobile’, ‘netwerk’, ‘security’, ‘services’, ‘suite’, ‘toolkit’, ‘webapprt’). This still generates an archive of almost 18,000 strings for locales translating all products, so it seems a decent sample.

Caveats and Weird Results

String 1: en-GB 2 characters, locale X 4 characters -> +2 characters, +100%
String 2: en-GB 8 characters, locale X 4 characters -> -4 characters, -50%
Average for locale X: -1 characters, +25% (sum of differences divided by total number of items).

Not sure if this is the best choice, but I couldn’t think of an alternative. Note also that I’m ignoring single character strings (access keys, shortcuts).
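
To make the arithmetic concrete, here is a minimal Python sketch of that averaging. It is not the script linked above: the function name `length_stats`, the dictionary layout and the sample strings are made up for illustration, and it assumes that "single character" refers to the length of the en-GB string.

```python
def length_stats(reference, locale):
    """Compare string lengths for keys present in both dictionaries.

    Returns (average character difference, average percentage difference),
    or None if there is nothing to compare.
    """
    char_diffs = []
    pct_diffs = []
    for key, ref_text in reference.items():
        if key not in locale:
            continue  # untranslated string: nothing to compare
        if len(ref_text) <= 1:
            continue  # assumption: skip access keys and shortcuts (also avoids dividing by zero)
        diff = len(locale[key]) - len(ref_text)
        char_diffs.append(diff)
        pct_diffs.append(100.0 * diff / len(ref_text))
    if not char_diffs:
        return None
    count = len(char_diffs)
    # Sum of differences divided by total number of items
    return sum(char_diffs) / count, sum(pct_diffs) / count


# Made-up strings with the same lengths as String 1 and String 2 above
en_gb = {"string1": "OK", "string2": "Settings", "accesskey": "S"}
locale_x = {"string1": "Okay", "string2": "Sets", "accesskey": "I"}

avg_chars, avg_pct = length_stats(en_gb, locale_x)
print(avg_chars, avg_pct)
```

With these two strings the averages are -1 characters and +25%, matching the example. It also shows why the two numbers can point in opposite directions: short strings weigh heavily on the percentage and very little on the character count.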

In the table you’ll see a global column (average results) and “buckets”, with string groups based on en-GB original length. Too bad these groups are often unreliable because of the “concatenation conundrum”, where one string could be created by concatenating 3 different labels.

Typical example to create a sentence with a link (note that concatenation should always be avoided):

sentence.before = Hey, this is a = very interesting link
sentence.after = .

In Italian this could be localized as

sentence.before = Ehi, questo = link
sentence.after = è veramente interessante.

Do you see what just happened here? Length comparison based on groups just became less interesting, both averages and maximum/minimum differences.
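
A rough Python sketch of the effect, under a few assumptions: the name of the middle label (`sentence.link`) is hypothetical, and the leading and trailing spaces were added so that the three labels join into a readable sentence.

```python
# Order in which the three labels are concatenated in the UI.
# "sentence.link" is a hypothetical key name used only for this sketch.
order = ["sentence.before", "sentence.link", "sentence.after"]

en = {
    "sentence.before": "Hey, this is a ",
    "sentence.link": "very interesting link",
    "sentence.after": ".",
}
it = {
    "sentence.before": "Ehi, questo ",
    "sentence.link": "link",
    "sentence.after": " è veramente interessante.",
}


def pct_diff(ref_text, loc_text):
    """Length difference of the localized text, as a percentage of the reference."""
    return 100.0 * (len(loc_text) - len(ref_text)) / len(ref_text)


# Label-by-label comparison: this is what a per-string analysis sees
per_label = {key: round(pct_diff(en[key], it[key])) for key in order}

# Comparison of the full sentence: this is what the user sees
whole = round(pct_diff("".join(en[k] for k in order),
                       "".join(it[k] for k in order)))

print(per_label)
print(whole)
```

Label by label, the link text shrinks by about 80% while the closing label grows from a single character to a couple of dozen. The sentence as a whole differs by less than 20%, which is the only number that matters on screen.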

Anyhow, here’s a good image (graph based on global difference in percentage) that I’d like to call “Why using English as a reference for designing UI may not be a great idea”.

Length Comparison - mozilla-beta
Open link in a new tab/window to see the full image

Why not use Gaia directly?

This sounds like a good idea: we have a real en-US repository, and we don’t have concatenations. But there are some disadvantages as well:

  • Most locales already did at least two rounds of QA, so a lot of strings have already been (heavily) shortened to fit in the UI. So data could be less useful and interesting.
  • Several locales are incomplete on gaia-l10n. For this very reason I excluded all locales with fewer than 1,000 translated strings.

Here’s the same table for Gaia. And, again, a similar graph based on global difference in percentage.

Length Comparison - Gaia
Open link in a new tab/window to see the full image

Fun facts:

  • We know that en-GB is 0.16% longer than en-US, at least on Gaia.
  • A simple word like “OK” (2 characters) can become as long as “Kulungile” (9) in Xhosa, or “Ceart ma-thà” (12) in Scottish Gaelic.

Summit 2013 planning assembly: a wonderful beginning

Note: this is a guest post from Iacopo Benesperi, a fellow Mozillian from the Italian community.

This weekend the Summit 2013 planning assembly took place in Mozilla’s Paris office: a gathering of about 65 people from all around the world, representing all areas of the Mozilla project, both paid staff and volunteers, aimed at planning and shaping the next Summit, which will take place on the first weekend of October in Brussels, Toronto and Santa Clara.

TL;DR: it’s been a great assembly. If at the next Summit we manage to accomplish half of the things we’ve discussed this weekend, it will have been the best Mozilla event ever.

The aim of the assembly was not to define a schedule for the event and fix everything, but to talk about the important topics that animate the Mozilla project these days, start discussing them, and shape them so that we can come up with a good format for the Summit to address them and propose solutions. To do this, every member of the planning committee interviewed fellow Mozillians over the last month, to get a wider view of the temperature of the project these days and to act as a representative for the comments expressed.

I went to Paris without a clear idea of what we would accomplish there, but I’m impressed with the result.
First of all, this assembly was facilitated by people of, who proposed a peculiar way to proceed with it. I was a bit skeptical about the approach, but it turned out that some of their methods are really great (like unpanel) and we will definitely adopt them for the next Summit, while some others still look like rubbish (I may still be proved wrong).
The second important fact is that we talked little about technology and a lot about Mozilla, its community, its communication (internal and external) and the interactions between its components and people. On one hand, as Gandalf pointed out, this is a sign that we implicitly trust our technology and the fact that it will be discussed at the Summit, because this is a big portion of what Mozilla is about. On the other hand, it’s a sign that there’s a general awareness, not only among community members but also (finally) among employees, paid staff and the board of directors, that we have communication problems between the different parts of the project and especially between paid staff and volunteers, and that the time is now ripe to address them and try to solve them. What I’m talking about is not only communication to get things done but also communication related to the decision-making process.

So, it will be interesting to experiment with discussions across different time zones and locations. I will probably post more about the assembly and the planning for the Summit in the next few days, once ideas and thoughts have settled down a bit and I’ve had the time to read all the ideas and documentation we produced during these two days. What I felt was important to communicate immediately is that the next Summit will be a wonderful occasion to talk not only about our technologies but also about who we are, what we want to do and where we want to go. It will be an occasion for the community to teach and mentor the newest community members and, more importantly, all the new employees, to let them understand and feel the power and importance of our community. It will be, in general, an occasion to have our voice finally heard and taken into consideration, not only in the tasks at hand, but in building the new policies and guidelines that will drive the whole project in the future.

I’m sure we’ll try, in the next months, to provide some initial information and documentation about what has been discussed and decided so far, so that you can arrive at the Summit prepared to contribute to the conversation, and we can get the most out of the Summit and make it really matter for our future.

As I said at the beginning: if we manage to discuss and propose solutions to half of the problems and concerns raised during these two days, we will have had the best Mozilla event ever; one that will have strengthened our project and made it more mature.

Mozilla Italia at Fa’ la cosa giusta 2013

Last weekend, March 15-17, Mozilla Italia took part in Fa’ la cosa giusta 2013 in Milan (Fa’ la cosa giusta means Do the right thing!). For our association this was the fourth time at this particular event: we participated from 2007 to 2009, then we moved to Florence for a couple of years (event called “Terra Futura”) and took a break in 2012. In Milan there were six active members from our community, which is quite a gathering considering how spread out we are across Italy, and two guests who helped us during these three days.

Mozilla Italia at Fa' la cosa giusta 2013

Citing from the official site: Fa’ la cosa giusta is a fair about ethical consumerism and sustainable living, with over 700 exhibitors hosted on 29,000 square meters. This year’s edition had more than 72,000 visitors, among them 3,300 students from 17 different schools.
The number of tech-related exhibitors is always quite limited. Near our booth, for example, there was an area where people from Ubuntu and the Document Foundation gave talks about their communities and their products. There were also other initiatives, like a lab that takes in old IT hardware (printers, computers, etc.) and restores it, or even low-cost 3D printers (Waspproject).

Taking part in this kind of event, compared to more tech-oriented exhibitions, has some positive aspects. For example, trying to explain the Open Web, or the importance of web standards and diversity, to people who can’t really tell the difference between a browser and a search engine is quite a challenge (Q: “What software do you use to browse the Internet?” A: “I use Google.”). Over these three days we welcomed a lot of people at our booth, even a couple of puppies: some wanted help with problems they were having with Firefox or Thunderbird, others wanted to know more about Mozilla or just to say hello.

The most frequent questions: what is Mozilla? Why are you here, how do you fit in? In some ways answering this last question was the most interesting part: explaining what Mozilla does, how Firefox and all the other products are created by a non-profit organization and a unique community built equally on employees and volunteers from around the world, what we do as an association in Italy, how our ideals and principles help create and drive initiatives like WebMaker or WebFWD. And then seeing these people agree with us 🙂

Fa' la cosa giusta 2013 - Mozilla Italia

On Saturday and Sunday the main point of interest was this developer phone running Firefox OS. A lot of people stopped by to see and try the phone: some of them knew the project, thanks to the good coverage of the last MWC 2013 in Barcelona, others didn’t know it at all and wanted to understand what Mozilla is working on. Again, some questions were a lot more frequent than others:

  • How and when will Firefox OS be commercialized? Distribution should start soon in some countries (e.g. Spain, Brazil, Poland, etc.), and then cover other areas. In the meantime people, in particular those interested in developing Apps for the new OS, can try Firefox OS with an emulator or desktop builds.
  • When will Firefox OS be available in Italy? Well, we don’t know 😉 Personally I hope at some point during 2014, considering that Telecom Italia is listed among the partners on the official page.
  • Will I be able to install Firefox OS on my phone and replace Android/Windows? It depends, but it can’t be excluded given the open nature of the project.

I posted a set of photos on Flickr, but considering the number of people stopping at our booth I wasn’t really able to shoot many. Enough said: at the end of those two days I almost had no voice left 😉

Once upon a time there was a string freeze… pt.2

Since it probably looks like my favorite hobby is whining without a reason, let’s check what happened so far (always an optimist…) in this cycle.

Broken strings in Mozilla Beta

  • Bug 797036 – Update updater strings and icon
  • Bug 803344 – poor discoverability of the enable/disable menu item for Social API

Landing strings in Beta means that we did something wrong before (the haste to push forward features that probably weren’t ready, “we need this in ESR”, etc.).

Broken strings in Mozilla Aurora

Obviously the two changesets that landed on beta, plus:

  • Bug 795691 – b2g fixes for the web console actors
  • Bug 800373 – Change marketplace strings to ‘Firefox Marketplace’

Consider also the several changes adding or removing strings in both beta and aurora (e.g. Bug 803630 or Bug 760951) and you’ll get the picture.

Bug 797036 is a good example of how badly we have been working on the l10n side lately:

  1. changes land on central on Oct 02 16:34:08 (end of cycle is only 6 days ahead)
  2. the day after I wrote a comment in the bug about the bad review (that’s pure luck, I don’t work on localization every day, and there are very few localizers doing these kinds of checks on central)
  3. nobody reacts, bad strings move to aurora and we need to break string freeze

For starters, a better review process could have avoided all this.

Once upon a time there was a string freeze…

Nine months ago I wrote this post. Are things better now? Not at all, they keep getting worse.

When people asked me “how can you be happy with the rapid release cycle?”, I always answered “because finally I have a clear schedule”. Now imagine how I feel about the rapid release cycle.

I’m not a developer, I’m not an engineer either, but guess what: if you’re breaking things every single cycle, you’re doing it wrong. I think it would be a good time to start thinking about it, maybe before localizers start giving up.

l10n Memo for the Next Meeting

My personal short memo for the next meeting, even if I’m sure Axel is already on this:

  • Aurora is supposed to be string frozen, so that localizers have a full cycle to update their localization, test their work and sign off on the best changeset available for Beta. This worked quite well for 5 releases, so why did everything go wrong this time? We’re just a couple of days away from the end of this cycle (Firefox 10 release, Jan 30th), and a backout on toolkit broke everything [1], then a bug on devtools added even more confusion.
  • Being a Mozilla localizer already requires an awful amount of technical skills, please don’t even think of adding more stuff on top of that (“why can’t we or localizers just retrieve the previous string from hg blame?”).
  • Working on two different repositories is painful (see Native Fennec). I realized that I can’t transplant changesets around because they often change more strings than I need, so I have to move text around manually. I’m scared of seeing what will happen when I merge my work from central to aurora.

[1] Thanks to our l10n logic this is not literally true, since products fall back to the English string. From my point of view, this still means “breaking things”: exposing a partially translated UI means lowering the quality of our work, and that’s not something I like to do.

Native Localized Android Build

Not without difficulties or issues, but it seems we finally have a localized native build on Android. Kudos to all the people involved, I’m pretty sure there’s a lot of work behind this small screenshot (and there’s still a lot more to do).

Having said that, I still wish for better communication toward the project’s “periphery” (see for example how lively is). When I found out that a native build was going to be released as Firefox 11 I was seriously annoyed: what about localized builds? Then I discovered on the Mobile Test Drivers’ mailing list that localization for Firefox 11 would start on aurora instead of central. Good to know, too bad I had absolutely no clue about that 😉

What needs to happen before you open your eyes?

As has already happened in the past on this blog, this is a guest post from my friend Iacopo Benesperi (iacchi), a long-standing member of the Italian Mozilla community. I agree with him that we’re living through hard times, and what’s happening inside our (small) community is probably happening elsewhere. And if it’s not clear, this post is here because we still care about Mozilla and its future 😉

This last year has been a hard one for the Mozilla world, in many ways. What is left of this year, among other things, is a very tense situation; a feeling of estrangement in many people, a sour taste in the mouth.
It’s not hard to notice it: you can see it in a goodbye letter due to a resignation; you can see it in more and more posts on the planet; you can see it in mail exchanges or chats with other community members; you can see it in long-time community members leaving because they don’t believe anymore in what Mozilla is doing; you can see it in all the blog posts of lovers of the free (as in freedom) Web who, for this very reason, care about Mozilla; you can see it in the comments of normal, non-techie users on your national forum, people who chose Mozilla not necessarily because it’s better, but because it’s different; you can see it in the words of an extension writer, saying that he’ll stop updating his extensions because it’s become impossible for him to keep pace; you can see it almost everywhere. All you need to do to notice it is open your eyes and start paying attention to what’s around you.

In this last year Mozilla has lost many people but many, many more are the ones who are just an inch away from leaving; people who stay because they care so much about Mozilla and the Manifesto, who think that they can still fight to push everything back on the right path, on what they believe is the right path.
I won’t say here what’s wrong and what should be done, it’s all written already. What I will say here is that if you keep going this way, when (not if) all the people mentioned above leave Mozilla, or stop believing in it, you may even be able to keep working somehow; but from that moment on, every success (what you consider a success, anyway) you get, if you get any, will be a Mozilla Company success, the Mozilla Project being dead already. You may even be able to win the browser war in the future, going on this way, but you’ll have lost your soul in the process.