new language annotation software

regarding a new release of ELAN, language annotation software from The Language Archive (TLA) sponsored by the Max Planck Institute for Psycholinguistics (to which i have absolutely NO affiliation):


by Han Sloetjes

Recently, we have released ELAN 4.7.1. It introduces some important changes to the EAF format (now version 2.8). The XML structure of controlled vocabularies has changed in a way that breaks backward compatibility! Connected to these changes are new features such as multilingual controlled vocabularies and the fact that annotations now store a reference to the CV entry they are based on. Language assignment is now possible for controlled vocabularies but will be extended to the display of Metadata, Data Categories and tiers in future releases.

There is an option in the Preferences to always store in the previous version of EAF, version 2.7, for backward compatibility with previous ELAN releases. But using both ELAN 4.7.x and 4.6.2 or lower on the same files should be done with care.

Other new features are a media player based on VLC for Linux, an option for adding dependent tiers in multiple files, n-gram analysis for corpora and volume sliders for all linked audio tracks for convenient switching between the audio sources.

and a link to the download site here


so, you know a linguist?

friend and colleague jodie martin spends some of her time using social media and creating kick-arse presentations for conferences and the wider public on the web.
here she has made a short little explainer for all of us linguists who get asked what it is that we do…

patterns of interaction recorded

this guy, deb roy, is working on visualising interaction and trying to create machines that will interact in a “human-like” way. well, good luck with that, but the graphic visualisation of language and group dynamic interactions has me thinking maybe i should give up… although there’s always room for the micro as well as the macro patterning….

Short Memory: midnight oil, bin burnin

here’s peter garrett, front man for the oils, back in 1983 doing one of my favourite oils songs. his reference to afghanistan and a watchdog in another’s land was obviously pre the present occupational farce. nowadays peter’s more likely to be seen loping about in a suit and doing battle with deadshit lily-livered labour pollies behind closed doors in caucus. he was minister for the environment (a job made for him) in the last parliament; julia has now put him in her former horrid portfolio: education.
well, indeed, there’s trouble in education. at all levels. but not to go into that. here. now.

meanwhile, over on the SFL-related blog, i’ve posted a draft of a comparative analysis i did last year, using three pieces about australia from three different times and perspectives… in our his-story. the peter garrett/midnight oil reference is apparent there, since one of the pieces i used is the lyrics of a midnight oil song, The Dead Heart.

last night, after the friday SFL presentation, i was down the pub with the usual suspects talking about the poem Australia by A.D. Hope and how much i like it, then went on to talk about the analysis i had done using that piece and one of midnight oil’s. one of the people i was talking to said that the one i had chosen was his favourite midnight oil song, so i felt bound to put up my draft analysis so he could at least read the A.D. Hope piece i’d been talking about….

dan everett on language

here’s an interesting lecture on endangered languages in the anthropological tradition, and of course, it gladdens me heart that dan stands up to the chomskian idea that there is some universal grammatical criterion for distinguishing human language – and thus for distinguishing humans from other species (thus we should be allowed to exploit these ‘lower’ species – this is my extension, not dan’s btw)


Lexicalist

Screen capture of part of the interface. Keyword search: privacy.

ABOUT. Lexicalist reads through millions of words of chatter on the internet to analyze how certain demographics talk and what kinds of things they talk about. We currently break this information down into three kinds of demographics: age, gender, and geography.

METHODOLOGY. Lexicalist works by analyzing rich sources of information online, including blog posts, news sources, and social networking sites like Twitter. Each bit of information is subjected to rigorous natural language processing, which includes a likelihood distribution of being authored over all geographic, age and gender demographics.

All of the statistical results displayed here are then normalized against the volume of information coming from each demographic to see what words are most commonly associated with certain populations. The result is a descriptive snapshot of language as it’s used today.
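The normalization step described above can be sketched roughly as follows. This is not Lexicalist’s code, which isn’t published here; it’s a minimal Python illustration assuming that “normalized against the volume of information coming from each demographic” means dividing each group’s word counts by that group’s total word count, and then ranking words by how over-represented they are in one group relative to the pooled corpus (a simple lift ratio). The function names and the toy counts are hypothetical.

```python
from collections import Counter

def normalized_frequencies(counts_by_group):
    """Relative frequency of each word within each demographic group.

    Dividing by the group's own total keeps a high-volume group from
    dominating just because it produces more text.
    """
    result = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        result[group] = {w: c / total for w, c in counts.items()} if total else {}
    return result

def most_associated(counts_by_group, group, top_n=3):
    """Words most over-represented in `group` relative to the pooled corpus."""
    pooled = Counter()
    for counts in counts_by_group.values():
        pooled.update(counts)  # add this group's counts to the pooled totals
    pooled_total = sum(pooled.values())
    group_freq = normalized_frequencies(counts_by_group)[group]
    # Lift: the word's share in this group divided by its share overall.
    lift = {
        w: f / (pooled[w] / pooled_total)
        for w, f in group_freq.items()
        if pooled[w] > 0
    }
    return sorted(lift, key=lift.get, reverse=True)[:top_n]

# Hypothetical toy counts, invented for illustration only.
counts = {
    "18-24": Counter({"privacy": 5, "lol": 40, "tax": 1}),
    "55+": Counter({"privacy": 20, "lol": 2, "tax": 30}),
}
print(most_associated(counts, "18-24", top_n=1))  # ['lol']
print(most_associated(counts, "55+", top_n=2))    # ['tax', 'privacy']
```

A real system would also have to handle the “likelihood distribution” of authorship that the methodology mentions, since each text only probably belongs to a given demographic; one simple option (again an assumption) would be to add fractional counts weighted by those probabilities instead of whole counts.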

Lexicalist is worth investigating. In fact, it is potentially a time sink/waster for the lexically minded. I wish I knew how a descriptive snapshot relates to language usage. This seems to me to be more a question of the relations between a particular form of methodical description and some particular frame for usage. Presumably there is a relation borne by natural language processing.

Okay, over at Wikipedia, the treatment of NLP includes:

Tasks and limitations

In theory, natural-language processing is a very attractive method of human-computer interaction. Early systems, such as SHRDLU, working in restricted “blocks worlds” with restricted vocabularies, worked extremely well, leading researchers to excessive optimism, which was soon lost when the systems were extended to more realistic situations with real-world ambiguity and complexity.

Natural-language understanding is sometimes referred to as an AI-complete problem, because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it. The definition of “understanding” is one of the major problems in natural-language processing. (source: Wikipedia)

‘Context is to understanding, . . ‘

(I added Lexicalist to our links.)

Old and New Net


Web 3.0 from Kate Ray on Vimeo.

This video from Kate Ray quickly made the rounds.

What a long way the net has come. I suppose it is necessary, if gratuitous, to add: ‘for better and for worse.’

There’s a moment in this interesting mash-up where the speaker implies the following: could we re-render the human brain to think more like a machine? This follows from the difficulty of making a machine think like a human.

I had to look up the use of the term ontologies because I know little about information science, and its use in the video seemed to depart from the philosophical term. Here’s Wikipedia’s treatment of ontologies.

The video says nothing about the problems faced by the varieties of user. I’m a user, and I know the problems I encounter in searching for information: on the internet, in libraries, and on my own computer, in my own archive of documents.

I’ll mention three challenges. I’ll frame this by stating that I wish my computer-based archives and library archives were indexed by Google.

(1) Usually, my searches for information on Google are satisfied. However, because the results are matched against the real-time indexing my own cognition provides, a search on a given topic (usually in the social sciences) is terminated arbitrarily. In other words, I have no conclusive idea that a given result is the optimum result. I’d also characterize my search methods as partly ad hoc heuristics.

(2) Searches in my computer-based archive are brute force and leverage Spotlight’s ability to look into the text of every file, BUT they involve scanning through very long result lists, most of which are not relevant. As a user, I find the labor-intensive task of organizing files on my end ‘too much.’ Compounding this is the ease with which information can be archived versus the labor involved in organizing it. Somewhat: the intuitive’s curse…

(3) The most difficult searches of the web and internet resources are those that are very particular and very local. A good example would be somebody’s address. Searches oriented to topics do not fall into this category.

One other note: I would guess my own search capability falls into the highly capable slice of any bell curve. This guess is based on my understanding of how to use the specific editing features of Google search, and on observing how most other people use search. One of the challenges for the semantic web, given,

The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information on the web is defined, making it possible for machines to process it.

is that any useful, more powerful interface and facilitation has to meet the different modes of differentiated users.

For example, I wouldn’t be skeptical of a machine’s ability to qualify results so that I could be confident I’ve reached the optimum set of results, but I’d like to know beforehand why I needn’t be skeptical. And this would have to be presented to me at my level.


Copyright © NetDynam 2.0 2017 | NetDynam 2.0 is proudly powered by WordPress and Ani World.
