| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| compact_enc_det/ | 03-May-2024 | - | 18,673 | 15,072 |
| util/ | 03-May-2024 | - | 2,997 | 1,523 |
| CMakeLists.txt | 03-May-2024 | 3.7 KiB | 104 | 86 |
| LICENSE | 03-May-2024 | 11.1 KiB | 203 | 169 |
| README.md | 03-May-2024 | 1 KiB | 47 | 35 |
| autogen.sh | 03-May-2024 | 1.9 KiB | 74 | 36 |

README.md

### Introduction

Compact Encoding Detection (CED for short) is a library written in C++ that
scans given raw bytes and detects the most likely text encoding.

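As a rough illustration of what looking for encoding evidence in raw bytes can mean, the sketch below checks only for a Unicode byte order mark (BOM) at the start of a buffer. It is not CED's algorithm and not part of its API; `SniffBom` is a hypothetical helper invented for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of CED: returns the encoding named by a
// Unicode byte order mark at the start of the buffer, or "" if none is found.
// The 4-byte UTF-32 marks are tested first because the UTF-32LE mark
// (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
std::string SniffBom(const unsigned char* data, std::size_t size) {
  if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
      data[2] == 0x00 && data[3] == 0x00) {
    return "UTF-32LE";
  }
  if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
      data[2] == 0xFE && data[3] == 0xFF) {
    return "UTF-32BE";
  }
  if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
    return "UTF-8";
  }
  if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) return "UTF-16LE";
  if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) return "UTF-16BE";
  return "";
}
```

Most text carries no BOM at all, so a check like this answers only a small part of the question that a detector such as CED addresses.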
Basic usage:

```
#include <cstring>  // strlen

#include "compact_enc_det/compact_enc_det.h"

const char* text = "Input text";
bool is_reliable;
int bytes_consumed;

Encoding encoding = CompactEncDet::DetectEncoding(
        text, strlen(text),
        nullptr, nullptr, nullptr,   // no URL or charset hints
        UNKNOWN_ENCODING,            // no encoding hint
        UNKNOWN_LANGUAGE,            // no language hint
        CompactEncDet::WEB_CORPUS,
        false,
        &bytes_consumed,
        &is_reliable);
```

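The usage above measures the input with `strlen`, which stops at the first NUL byte. Raw input such as UTF-16 text often contains NUL bytes, so it can help to carry an explicit length instead. The following is a minimal sketch of one way to do that; `ReadFileBytes` is a hypothetical helper, not part of CED.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper, NOT part of CED: reads a whole file in binary mode.
// std::string stores its own size, so embedded NUL bytes are preserved.
// Returns an empty string if the file cannot be opened.
std::string ReadFileBytes(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return std::string();
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}
```

With the bytes in a `std::string`, the first two arguments in the usage above would become `bytes.data()` and `static_cast<int>(bytes.size())`, assuming the length parameter is an `int`.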
### How to build

You need [CMake](https://cmake.org/) to build the package. After unzipping
the source code, run `autogen.sh` to build everything automatically.
The script also downloads the [Google Test](https://github.com/google/googletest)
framework needed to build the unit tests.

```
$ cd compact_enc_det
$ ./autogen.sh
...
$ bin/ced_unittest
```

On Windows, run `cmake .` to download the test framework and generate
project files for Visual Studio.

```
D:\packages\compact_enc_det> cmake .
```