Language is a rich resource. Not only do we have the words we’ve inherited over time, but we can make new ones and use them differently. In professions, we have terms of art as well, creating even more nuanced ways of expressing ideas. It has been funny to me watching technology try to manage that nuance.

I don’t, of course, mean Nuance, the voice recognition company recently acquired by Microsoft as part of its cloud healthcare push. Nuance is also behind Dragon (aka Dragon Dictate), the long-standing voice recognition tool used in the legal profession. It will be interesting to watch, though, whether Microsoft leverages Nuance for a cloud legal effort as well, building on top of Matter Center.

Librarians use a lot of words. We’re responding to reference questions, marking up records to make resources findable, and writing guides to help people access information. I expect we get pretty good at it over time.

I don’t use a lot of automatic tools while I’m writing. But spell and grammar checking tools have been pretty common and I leave those turned on in most applications. In a Microsoft Word document, I’ll also toggle on readability statistics.

This minimal reliance on automation for writing has meant that I am surprised when I see it pop up unexpectedly.

Where Are You From?

We are identified on the internet by things like our browser or operating system language and the country from which our request originates. I’ve talked about this in relation to using country-blocking apps on web sites. Some sites will provide a different experience depending on those flags. For example, Google will usually default to a Google.ca interaction even though that is a different experience from Google.com.

I’m an immigrant so my language often doesn’t align with American English or even Canadian English. When I was 7, I had to attend speech therapy during grade school to speak with a more American accent. But it didn’t really change my word choices, learned from family, from cultural references, and other sources.

I do not tend to code-switch myself, but there are times when I’ll read something or hear someone speak and make slight alterations in my head for understanding. It is understandable that technology might not be able to do that.

For example, I use Google Mail. I’ve noticed recently that, with GMail’s Smart Features turned on, I’m getting suggestions to fix my language. It tends not to be things that I’ve said incorrectly so much as things that I write that don’t conform to whatever standard it is using. I decided to test it by using a very British phrasing, where they drop out the “to” after “agree”. As you can see below, I copied a headline from a UK publication and Google Mail tried to make it conform.

A screenshot of a GMail compose window over a picture from the Guardian quoting a soccer player (see! footballer?) as saying “I thought I had agreed a new contract”, with GMail suggesting that it should be “agreed TO a” new contract

It’s a little thing. But it suggests that Google thinks it knows more about my writing than it does. Even if my browser or OS or IP address or credit card billing address says something about my location or language preference, it’s not always the whole picture. Perhaps I’m multilingual (I dabble in a couple of languages) or I’m writing to someone in their vernacular (like my UK family).

Little words can matter. I might choose to exclude a word like “to” to conform to my reader’s expectations, to fit in. One of my favorite examples is this one, between Gareth Keenan and David Brent in the UK original of The Office:

Gareth: I’m assistant regional manager.
David Brent: Assistant to the regional manager.

Sometimes the automated results get so strange that, to understand the results, I have to learn new words from other cultures. For reasons that are probably known to some of you, I read a lot of Russian media. But I don’t read a lick of Russian without a translation tool. So I’m at the mercy of Bing Translate and Google Translate. They’re not terrible at getting words although they frequently miss on idiom.

Here was a head scratcher from a Russian media translation. The government was angry at metallurgists for price gouging of some sort. It introduced me to the term “snousing” although I’m still not entirely sure which English variant it comes from (UK?).

A Bing Translation (from the Edge browser) of a Russian media headline that says “Peskov on the ‘snousing’ of the state by metallurgists”

As these examples piled up, I realized that technology could end up smoothing off edges and standardizing out some of the flavor that gives language its power. I’ve turned off the Smart Features in GMail now because, except for misspellings, I don’t value the conformity they recommend.

Legal Information Impact

The law is full of words and phrases that are unique to the profession. Shoot, US law libraries have books called “Words and Phrases” to help with that exact issue. It is one of the never-ending challenges for legal information access because it requires insider knowledge.

As technology is brought to bear on language, it will be interesting to watch how it adapts to non-conforming uses. On the one hand, it might make lawyers and other legal professionals more understandable. A lot of legal writing could benefit from running Word readability statistics (mentioned above) just to make it clearer.

But you can’t always simplify language that requires legal phrases or terms of art. Time is of the essence, right? What will machine learning make of word choice? Will it suggest word combinations that conform to general speech but undo legal meaning? A legal professional might notice the change, but not everyone who has legal information needs will. Perhaps a Nuance implementation in Outlook could help flag the need for legal usage, based on the deep repository built on decades of lawyer input.

Although I’d wholly support machine learning that got rid of all modern usage of per se.

Document assembly and forms-focused tools could manage that disconnect. They could lock in thaumaturgic language (yes, I was asked to define that word in first year Property) while allowing someone to solve their legal issue. But that will require an investment on the part of legal publishers and others to move beyond just providing text for legal information access. If the only information that is transmitted is text, leaving the recipient the responsibility to use it, we may find that the final product gets smoothed in ways that are unhelpful to the cause of justice or equity.

Me? I’ll keep grimacing or laughing out loud at the failed wordsmithing attempted by my machines. And I’ll watch as we try to make legal information more accessible, and see which tools hinder or help with that goal.