One of my academic interests is corpus linguistics, and so naturally I’d love to find a great Korean corpus. I still haven’t found one I can download and access properly, but there are some web-based ones. One of these is the Sejong Corpus.
A corpus is just a very large collection of texts, typically annotated (generally for part of speech and lemma) so that powerful and specific searches can be performed. The Sejong Corpus is a collection of Korean texts, and there are several ways to search it. However, it’s only available through the website, and its search tools are not nearly as powerful as what I can do with the British National Corpus (which I have on my own computer, and can also access online here).
The main search form on the Sejong Corpus allows you to search for any lemma, and gives examples of that lemma, including forms with particles or endings attached. Unfortunately, it doesn’t allow you to search for specific forms of a word. So I can search for 연구, and it gives sentences with the word 연구 in many forms, like 연구를, 연구와, and 연구이다, but I can’t search for 연구를 specifically, and I can’t search for ~를. This is quite a shame, as such a tool would be really useful, especially for looking up examples of certain grammatical forms. There may be some way to do this, but I haven’t found it yet. The site also seems to lack a way of searching for collocations.
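To make concrete what I mean by searching for a specific form, here is a minimal Python sketch of the kind of search I'd like to run. It has nothing to do with how the Sejong site works internally: the sample sentences and the function name `find_surface_forms` are made up for illustration, and it assumes plain, unannotated text split on spaces.

```python
# Hypothetical sample sentences; a real corpus would be far larger.
SENTENCES = [
    "나는 언어학 연구를 시작했다.",
    "연구와 교육은 다르다.",
    "이것은 중요한 연구이다.",
]

def find_surface_forms(sentences, ending):
    """Return (word, sentence) pairs for words whose surface form ends with `ending`."""
    hits = []
    for sentence in sentences:
        # Split on whitespace, then strip punctuation around each word.
        for raw in sentence.split():
            word = raw.strip(".,!?\"'()")
            if word.endswith(ending):
                hits.append((word, sentence))
    return hits

# Search for an exact form (연구를) or for anything ending in 를.
print(find_surface_forms(SENTENCES, "연구를"))
print(find_surface_forms(SENTENCES, "를"))
```

Because this is plain string matching, a search for 를 would also catch any word that merely happens to end in that syllable. Telling those apart from the object particle is exactly what part-of-speech annotation is for.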
One useful tool is the 전자사전검색 (see the top menu), which allows you to search for words found in the corpus. It differs from regular dictionaries in that it includes all the words found in the corpus, and only those, so it covers many words not found even in the 국어사전, including ones formed by productive word-formation rules. The really neat thing about it is that, after choosing the part of speech you want on the left (choose 용언 if you want verbs or adjectives), you can also choose between 시작하는, 끝나는, and 포함하는; that is, words beginning with, ending with, or including your search term. So if I want to look for examples of words ending in 꾼, I search 체언, 끝나는, “꾼”, and it gives 13 words, including 밀수꾼, 사기꾼, and 소몰이꾼.
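If you have a word list of your own, the three search modes are easy to imitate. The following is only a rough sketch under my own assumptions: the word list is a tiny made-up sample (three of the words come from the search above), and `search_words` is a hypothetical helper, not anything provided by the Sejong site.

```python
# Tiny illustrative word list; a real dictionary export would be much longer.
WORDS = ["밀수꾼", "사기꾼", "소몰이꾼", "사기", "연구", "연구자"]

def search_words(words, term, mode):
    """Filter `words` by `term` using one of the three search modes."""
    if mode == "시작하는":    # words beginning with the term
        return [w for w in words if w.startswith(term)]
    if mode == "끝나는":      # words ending with the term
        return [w for w in words if w.endswith(term)]
    if mode == "포함하는":    # words containing the term anywhere
        return [w for w in words if term in w]
    raise ValueError(f"unknown search mode: {mode}")

print(search_words(WORDS, "꾼", "끝나는"))      # ['밀수꾼', '사기꾼', '소몰이꾼']
print(search_words(WORDS, "연구", "시작하는"))  # ['연구', '연구자']
```

This only filters by spelling. Unlike the real tool, it doesn't know anything about part of speech, so there is no equivalent of choosing 체언 or 용언 first.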
I still haven’t checked out the entire site, but they’ve also got ways of searching for dialectal forms and for field-specific terms (전문 용어), e.g. comparing the frequency of words in different academic fields. Overall it’s a good start, but I can think of many other search tools they could add to make it more useful.