| Letter | Frequency (%) |
|---|---|
| a | 6.2193% |
| á | 2.2355% |
| b | 1.5582% |
| c | 1.6067% |
| č | 0.9490% |
| d | 3.6019% |
| ď | 0.0222% |
| e | 7.6952% |
| é | 1.3346% |
| ě | 1.6453% |
| f | 0.2732% |
| g | 0.2729% |
| h | 1.2712% |
| ch | 1.1709% |
| i | 4.3528% |
| í | 3.2699% |
| j | 2.1194% |
| k | 3.7367% |
| l | 3.8424% |
| m | 3.2267% |
| n | 6.5353% |
| ň | 0.0814% |
| o | 8.6664% |
| ó | 0.0313% |
| p | 3.4127% |
| q | 0.0013% |
| r | 3.6970% |
| ř | 1.2166% |
| s | 4.5160% |
| š | 0.8052% |
| t | 5.7268% |
| ť | 0.0426% |
| u | 3.1443% |
| ú | 0.1031% |
| ů | 0.6948% |
| v | 4.6616% |
| w | 0.0088% |
| x | 0.0755% |
| y | 1.9093% |
| ý | 1.0721% |
| z | 2.1987% |
| ž | 0.9952% |
Relative letter frequencies (%)
Bigraphs
ST, PR, SK, CH, DN, TR
Trigraphs
PRO, UNI, OST, STA, ANI, OVA, YCH, STI, PRI, PRE, OJE, REN, IST, STR, EHO, TER, RED, ICH
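The source does not say how the bigraph and trigraph lists above were compiled. As a rough sketch of one way such counts could be gathered, the hypothetical helper `NGramCounter.countNGrams` below counts every run of `n` consecutive letters. It assumes that case is ignored and that an n-gram never spans a space or punctuation mark; neither assumption is confirmed by the source.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the class and method names are illustrative only.
final class NGramCounter {

    /**
     * Counts every run of n consecutive letters in the text.
     * Assumptions: case is ignored and n-grams do not cross
     * spaces, digits or punctuation.
     */
    public static Map<String, Integer> countNGrams(String text, int n) {
        Map<String, Integer> counts = new TreeMap<>();
        StringBuilder window = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                window.append(Character.toUpperCase(c));
                if (window.length() > n) {
                    window.deleteCharAt(0); // keep only the last n letters
                }
                if (window.length() == n) {
                    counts.merge(window.toString(), 1, Integer::sum);
                }
            } else {
                window.setLength(0); // a non-letter ends the current run
            }
        }
        return counts;
    }
}
```

Calling `NGramCounter.countNGrams(text, 2)` would yield bigraph counts and `NGramCounter.countNGrams(text, 3)` trigraph counts; sorting the resulting map by value would then give a ranking comparable to the lists above.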
Code
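import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Map;
import java.util.TreeMap;

// The class name is arbitrary; only count() matters.
public class LetterFrequency {
    /**
     * Prints the relative frequency (in percent) of every character in the input file.
     * Line terminators are not counted. Every char is counted on its own, so the
     * digraph "ch" is counted as 'c' and 'h', and upper and lower case are kept apart.
     * @param source input file
     * @param encoding encoding of the file
     */
    public static void count(File source, String encoding) throws IOException {
        TreeMap<Character, Integer> occurrences = new TreeMap<Character, Integer>();
        int counter = 0;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(source), encoding))) {
            String s;
            while ((s = reader.readLine()) != null) {
                for (int i = 0; i < s.length(); i++) {
                    counter++;
                    Character curr = s.charAt(i);
                    Integer seen = occurrences.get(curr);
                    occurrences.put(curr, seen == null ? 1 : seen + 1);
                }
            }
        }
        // TreeMap iterates in Unicode code point order, not Czech alphabetical order
        for (Map.Entry<Character, Integer> e : occurrences.entrySet()) {
            System.out.println(e.getKey() + ": " + (e.getValue() / (double) counter * 100));
        }
    }
}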
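The table lists `ch` as a letter of its own and its percentages add up to 100 over letters alone, whereas the `count` method above tallies every character separately, including spaces and punctuation. The source does not state how the table values were derived. The following is only a sketch of one way to count in the table's terms, under these assumptions: the input is lower-cased, non-letters are skipped, and every `c` directly followed by `h` is treated as the digraph. The names `CzechLetterCounter`, `countLetters` and `toPercent` are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the class and method names are illustrative only.
final class CzechLetterCounter {

    /** Counts letters, lower-cased, treating "ch" as a single letter (assumption). */
    public static Map<String, Integer> countLetters(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        String s = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isLetter(c)) {
                continue; // skip spaces, digits and punctuation
            }
            String letter;
            if (c == 'c' && i + 1 < s.length() && s.charAt(i + 1) == 'h') {
                letter = "ch";
                i++; // consume the 'h' as part of the digraph
            } else {
                letter = String.valueOf(c);
            }
            counts.merge(letter, 1, Integer::sum);
        }
        return counts;
    }

    /** Converts raw counts to percentages of all counted letters. */
    public static Map<String, Double> toPercent(Map<String, Integer> counts) {
        int total = 0;
        for (int v : counts.values()) {
            total += v;
        }
        Map<String, Double> percent = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            percent.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return percent;
    }
}
```

With this approach, `CzechLetterCounter.toPercent(CzechLetterCounter.countLetters(text))` returns one percentage per letter. The `TreeMap` orders keys by Unicode code point, so a `java.text.Collator` for the Czech locale would be needed to print them in the alphabetical order used by the table.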
Sources
- KRÁLÍK, Jan. Czech Alphabet. The Czech Language [online]. 2001 [cit. 2012-09-18]. Available at: http://www.czech-language.cz/alphabet/alph-prehled.html