TTS is Not a Four Letter Word

To be perfectly clear, this blog is not sanctioned by, endorsed by, or even remotely associated with Oxford University Press, my fantastic employer. What I say here is my opinion and my opinion alone. This is especially true for this article, in which I am in no way representing the views of OUP.

For reasons that aren’t entirely obvious to me, the Text-to-Speech (TTS) debate continues to rage months after Amazon was forced to disable TTS functionality on the Kindle. Unfortunately, as with most things, the debate has splintered into separate business and political camps. The Authors Guild sees TTS as a dilution of rights; publishers see it as undermining audio books; the visually impaired see any limitation of TTS as treading on their legal rights; and the digerati bristle at any limitation on any technology (especially if it allows open access to content).

There are many other opinions that revolve around this issue, but very few seem to rise above self-interest and try to understand TTS for what it is, where it currently exists, and the true meaning of audio rights and how they are exploited. As a publisher, I hear the argument that TTS technology is improving by leaps and bounds and will one day soon rival the human voice. While I agree that TTS is getting much better, I do not think it will ever commercially rival a full-length recorded audio book performance by a human being. I feel strongly that recorded performance is the key, not the mimicking of a human voice.

Before I jump too deeply into the issues, let me quickly review the source of the debate. Amazon released the Kindle 2.0 with a new standard feature: TTS software embedded in the operating system, which enables the Kindle to “read aloud” the words displayed on the screen. However, for reasons known only to Amazon, the feature was never discussed with publishers before the new device was released, setting up an awkward moment soon thereafter.

Within days of the Kindle 2.0 press conference, rumors were swirling about a powerful agent rallying three or four of the biggest audio book publishers to push back on Amazon. A February 25 NY Times op-ed piece by the Authors Guild’s Roy Blount Jr. put the full force of the AG behind an effort to get Amazon to turn off this feature. Two days later Amazon acquiesced to the pressure and agreed to disable the feature (at least for those publishers who did not want it). But this turn by Amazon led to a counter-reaction: in April, a disabled readers’ group staged a noisy protest outside the NYC offices of the Authors Guild, claiming the Guild’s stance harmed the rights of the visually impaired.

Since then I have been reading a steady stream of listserv opinionating, blogosphere explicating, and more op-ed debating, representing every conceivable opinion on the merits or faults of Amazon’s foray into TTS. Yet none of it seems to cut to the core of the practical issues that TTS on the Kindle raises. So here are three questions that I feel anyone debating this issue should tackle.

What is TTS and where is it used?

Text-to-speech is the artificial production of human speech. Voice synthesis software has been around since the early 1980s and has steadily improved along with the personal computer. Today, current Windows and Macintosh operating systems come with fully functioning TTS built in. That’s right: the computer you are reading this on now has all the functionality (and more) that the Authors Guild and publishers have gotten up in arms over. Go ahead, buy an ebook from anywhere and download it onto your computer. Activate the TTS feature and voilà, you are listening to your computer read you the ebook. (Ironically, because of the current limitations of Kindle’s DRM, there is no Kindle reader for the PC or Mac, so one cannot use TTS with any Kindle ebook except on the iPhone.)

What can one do with TTS on the Kindle?

This may seem like an odd question, but it actually cuts to the core of the issue. TTS is software that “reads” the words and, using speech synthesis algorithms, sends audio signals to speakers that produce the sensation that the page is being read aloud. This is done in real time: nothing is recorded, no matter how many times one goes forward and backward over the “page.” As a result, there is no reproduction or distribution of the work that would cause any rights alarms to go off.

However, under the strictest reading of copyright law, one could raise the issue of public performance: a Kindle placed in a room full of people and amplified could arguably be used to create a public performance. But the word performance needs some thought. What exactly is a performance? The New Oxford American Dictionary, 2nd edition, has two entries that are key to this issue:

•    a person’s rendering of a dramatic role, song, or piece of music (bold is mine)
•    LINGUISTIC – an individual’s use of a language, i.e., what a speaker actually says, including hesitations, false starts, and errors.

So how exactly does a machine perform? A machine can play music (a player piano can often sound quite impressive), but we wouldn’t call that a performance. Furthermore, the linguistic definition makes clear that performance is human: the hesitations, false starts, even the errors we make when we read aloud. I don’t think the definition of performance stretches to cover a Kindle reading aloud with TTS.

In contrast, performance is precisely what an audio book brings to the world. Audio books are not made with TTS. In fact, I would venture to say that even if TTS were perfected to the point where it was difficult to tell the difference between TTS and a human voice pronouncing a word, no audio book would ever be recorded using TTS. Why? Because audio books are performances, and the people hired to make the recordings are professional voice actors (or authors performing their own work). People buy audio books to be entertained, not just for the convenience of listening while they drive.

How can the disabled use TTS on the Kindle?

Short of Paul Aiken (the Authors Guild’s executive director) and the heads of the big audio houses reading this blog and seeing the light, so to speak, I am pretty sure the good folks at Amazon will wake up tomorrow with the same problem they went to bed with the night before: how can they help the visually impaired use the Kindle? This is especially important with the release of the Kindle DX, which is being tested in universities, where publishers are legally required to provide digital copies, free of charge, to those with impairments so they can use tools such as TTS. Any testing will fail if the Kindle cannot use TTS.

My suggestion is that Amazon get the Authors Guild and publishers to agree to enable TTS for all books for any Kindle buyer who submits a disabled-use claim form. Amazon could simply provide the form on its website; users who fill it out would then be sent, via the wireless connection, a firmware update that enables TTS for every book. This would be a very simple process to manage and could only help those who use the Kindle.

The Kindle’s implementation of TTS does not tread on any rights, will not threaten audio books, and truly benefits those who need it most, the visually impaired. It’s time to end the debates.


2 Replies to “TTS is Not a Four Letter Word”

  1. That’s less than intelligent (stupid). As an avid audiobook listener, and avid reader, I have heard both TTS and ‘real’ audiobooks. There is absolutely NO comparison between the two, and nobody would willingly listen to a TTS book unless there was no other choice.

  2. I believe one of the main reasons that publishers don’t want their content text-to-speech enabled is that more than likely it will end up online as torrents, freely downloadable by everyone on the net. It’s a shame piracy stops disadvantaged people from using certain types of technology!
