They call it culturomics: the obvious play on the word “genomics” looks at trends in human thought and culture. But scientists say culturomics has been ____1____ by a lack of quantitative data. So researchers at Harvard, along with Google, Encyclopedia Britannica, and the American Heritage Dictionary, have come up with a new tool.
It’s a database of 5.2 million books, published since the year 1500. That’s four percent of all the books ever published, with a total of 500 billion words. The focus is on English language culture, so ____2____ of the books are in English.
Among the first findings of the research, published in the journal Science: about 8,500 new words enter the English language ____3____. But many of them don’t end up in dictionaries. And about ____4____: actors become famous around age 30, writers around 40, and politicians around 50. But the fame of politicians can eventually exceed that of actors.
A Google tool called the Books Ngram Viewer is ____5____ based on this data—users can track the usage and frequency of a word or phrase over the past few centuries. Thus, we can watch the fall and rise of Melville. And soon the rise and fall of Snooki.
[Proudly presented by the Audio-Visual Science Team]
hampered
three quarters
annually
fame
available
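People call it "culturomics", an obvious borrowing from the word "genomics", and it takes a similar approach to exploring trends in human thought and culture. But scientists say a lack of quantitative data has hampered culturomics. So researchers at Harvard, together with Google, Encyclopedia Britannica, and the American Heritage Dictionary, have provided a new tool.
The tool is a database of 5.2 million books published since the year 1500. That is 4 percent of all the books ever published, with a total of 500 billion words. Because the focus is on English-language culture, three quarters of the books are in English.
One of the first findings of the research is that about 8,500 new words enter the English language every year, though many of them never appear in dictionaries. The findings were published in the journal Science. As for when people become famous: actors around age 30, writers around 40, and politicians around 50. In the end, though, the fame of politicians can surpass that of actors.
A Google tool called the Books Ngram Viewer is built on this database. Users can track the usage and frequency of a word or phrase over the past few centuries. That way we can watch Melville fall and then rise, and Snooki rise and then fall.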
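
To make the idea of tracking a word's frequency over time concrete, here is a minimal Python sketch. It is not how the Books Ngram Viewer is implemented. The function name `yearly_frequency`, the `toy_corpus` dictionary, and its contents are all made up for illustration, and the tokenizer is a naive whitespace split.

```python
from collections import Counter


def yearly_frequency(corpus, word):
    """Return {year: share of that year's tokens that equal `word`}.

    `corpus` is a hypothetical mapping of year -> list of text strings.
    Matching is case-insensitive and uses a naive whitespace tokenizer.
    """
    target = word.lower()
    result = {}
    for year, texts in sorted(corpus.items()):
        # Flatten all texts for the year into one list of lowercase tokens.
        tokens = [tok for text in texts for tok in text.lower().split()]
        counts = Counter(tokens)
        # Relative frequency; 0.0 if the year has no tokens at all.
        result[year] = counts[target] / len(tokens) if tokens else 0.0
    return result


# Toy, invented corpus (NOT real data from the book database).
toy_corpus = {
    1850: ["melville wrote of the whale", "the whale and the sea"],
    1900: ["the sea was calm"],
    1950: ["melville melville and the whale"],
}

freqs = yearly_frequency(toy_corpus, "Melville")
for year, share in freqs.items():
    print(year, round(share, 3))
```

Dividing by the total number of tokens per year, rather than reporting raw counts, is what makes years with very different amounts of published text comparable.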