# This is a sample wordlist that can be converted to a binary dictionary
# for use by the Latin IME.
# The file is essentially a CSV file, with indent level denoting nesting.
#
# The file starts with a single CSV line with the header attributes. Whatever
# the content, these are included as is in the binary file. The first attribute
# of the file should be `dictionary'. Usual fields are `locale', `description',
# `date', `version', `options'.
#
# Each word has a `word' entry and at least an `f' argument denoting its
# probability, as an integer between 0 and 255 on a logarithmic scale, with
# 255 meaning 1 and each decrement of 1 dividing the probability by 1.15.
# As a special case, a weight of 0 is taken to mean profanity: words that
# should not be considered a typo, but that should never be suggested
# explicitly. An entry may be marked as not a word by adding a `not_a_word'
# field with a value of `true'. The main reason for putting such entries
# into the dictionary is to add shortcut targets and maybe a whitelist
# replacement.
#
# Each word may have any number of shortcut target lines, each starting
# with a `shortcut' entry and carrying an `f' frequency value between
# 0 and 14, or the special value `whitelist', which is stored as 15 and
# marks that shortcut as the whitelist target of this word.
#
# Each word may also have any number of bigram lines starting with a
# `bigram' entry containing the following word, whose frequency should
# override the unigram frequency when it follows the word this bigram is
# for.
#
dictionary=main:en,locale=en,description=Sample wordlist,date=1351495318,version=1
 word=sample,f=200
  bigram=wordlist,f=243
 word=wordlist,f=180
 word=shortcut,f=176
  shortcut=target,f=10
 word=witelisted,f=10,not_a_word=true
  shortcut=whitelisted,f=whitelist
 word=profanity,f=0
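#
# Worked example for the `f' scale described in the header. This is only a
# sketch that takes the stated rule literally (255 means 1, each decrement
# of 1 divides the probability by 1.15); it is not taken from the converter
# itself, and the name p is introduced here for illustration.
#   p(f)   = 1.15^(f - 255)      for 1 <= f <= 255
#   p(255) = 1
#   p(254) = 1 / 1.15,   about 0.87
#   p(253) = 1 / 1.15^2, about 0.76
# f=0 is left out because it is reserved for profanity rather than mapped
# to a probability.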