Mirror of https://github.com/Helium314/HeliBoard.git, synced 2025-04-24 16:46:35 +00:00.
With Romanian you have to take special care when handling words that contain the "sh" and "tz" characters (read them as the "sh" in "shiver" and the "zz" in "pizza"). There are two sets of characters that look almost the same: `ş` and `ș`, and `ţ` and `ț`. If you look carefully, one has a tail (a cedilla) attached to the body of the letter, while the other has a comma separated from it. The correct ones are those with the separate comma, not the attached tail. If in doubt, switch to a Romanian layout and type `;` and `'`. They will give you the correct characters.

The HTML codes for these characters are:

- `&#x218;` and `&#x219;` for `Ș` and `ș`.
- `&#x21A;` and `&#x21B;` for `Ț` and `ț`.

Reference:

- https://en.wikipedia.org/wiki/S-comma
- https://en.wikipedia.org/wiki/T-comma

While similar in shape, these are different characters, so mixing them up will break autocompletion. I've replaced all of them with the proper ones.

I've also tried creating a new dictionary but ran into issues...

The list of words was downloaded from: https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018/ro/ro_full.txt

This is not a high-quality source, so some cleanup was done to remove mistakes, such as words containing numbers or the `,` and `.` characters. Words separated by `--` were also removed, as there is no such notation in the language.

The tools from here were used to create the dictionary: https://github.com/remi0s/aosp-dictionary-tools

They only take the top 150,000 words out of a total of 1,154,496, effectively skipping words with fewer than 2 occurrences. This is OK, I guess... although it misses a lot of valid ones. A better data source would help, but such data is difficult to find. I may come back to improve this in the future.
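The character fix and the word-list cleanup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the script actually used (the dictionary itself was built with the aosp-dictionary-tools linked above, whose interface is not reproduced here). It assumes each input line is a `word count` pair, as in the FrequencyWords `ro_full.txt` file; `normalize`, `clean_word_list`, `REJECT` and the `limit` parameter are hypothetical names, and `limit` only mimics the 150,000-word cutoff.

```python
"""Hypothetical sketch: fix Romanian S/T-cedilla letters and clean a
frequency word list. Assumes one "word count" pair per line."""

import re

# Cedilla forms (wrong for Romanian) mapped to comma-below forms (correct).
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S-cedilla (capital) -> S-comma (capital)
    "\u015F": "\u0219",  # s-cedilla -> s-comma
    "\u0162": "\u021A",  # T-cedilla (capital) -> T-comma (capital)
    "\u0163": "\u021B",  # t-cedilla -> t-comma
})

# Assumed cleanup rule: reject words containing ASCII digits, ',' or '.',
# or the '--' separator.
REJECT = re.compile(r"[0-9,.]|--")


def normalize(word: str) -> str:
    """Replace every cedilla letter with its comma-below counterpart."""
    return word.translate(CEDILLA_TO_COMMA)


def clean_word_list(lines, limit=150_000):
    """Return up to `limit` (word, count) pairs, most frequent first.

    Words are normalized before counting, so the cedilla and comma
    spellings of the same word are merged and their counts summed.
    Ties are broken alphabetically to keep the output deterministic.
    """
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        try:
            count = int(parts[1])
        except ValueError:
            continue  # the second field is not a number
        word = normalize(parts[0])
        if REJECT.search(word):
            continue  # drop words with digits, ',' or '.', or '--'
        counts[word] = counts.get(word, 0) + count
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

To run it on the downloaded file, pass an open text handle, e.g. `clean_word_list(open("ro_full.txt", encoding="utf-8"))`. Normalizing before counting matters: otherwise the cedilla and comma spellings of one word would compete as two separate entries for the limited slots.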
Files in this directory:

- `bg_wordlist.combined.gz`
- `cs_wordlist.combined.gz`
- `da_wordlist.combined.gz`
- `de_wordlist.combined.gz`
- `el_wordlist.combined.gz`
- `en_AU_wordlist.combined.gz`
- `en_emoji.combined.gz`
- `en_GB_wordlist.combined.gz`
- `en_US_wordlist.combined.gz`
- `en_wordlist.combined.gz`
- `eo_wordlist.combined.gz`
- `es_wordlist.combined.gz`
- `fi_wordlist.combined.gz`
- `fr_emoji.combined.gz`
- `fr_wordlist.combined.gz`
- `hr_wordlist.combined.gz`
- `it_wordlist.combined.gz`
- `iw_wordlist.combined.gz`
- `ka_wordlist.combined.gz`
- `lb_wordlist.combined.gz`
- `lt_wordlist.combined.gz`
- `lv_wordlist.combined.gz`
- `nb_wordlist.combined.gz`
- `nl_wordlist.combined.gz`
- `pl_wordlist.combined.gz`
- `pt_BR_wordlist.combined.gz`
- `pt_PT_wordlist.combined.gz`
- `ro_wordlist.combined.gz`
- `ru_wordlist.combined.gz`
- `sample.combined`
- `sl_wordlist.combined.gz`
- `sr_wordlist.combined.gz`
- `sv_wordlist.combined.gz`
- `tr_wordlist.combined.gz`
- `uk_wordlist.combined.gz`