Grokking Varnam - Part 1
I wanted to figure out how varnam works (and possibly even rewrite it in a modern language, for fun). So here are the notes.
Varnam is an interesting project. It helps you do transliteration. For the chatbot of mithr, we wanted to convert some names and addresses to various Indian languages without having to manually "translate" them. These are names, after all. So I got interested in Varnam. But then, GSoC is coming up and SMC is applying. Varnam would be a really nice project for people to hack on. But the main developer of Varnam is really busy. So, I decided I would read through the code (and even try to port it to Rust if possible) and learn how it works.
First, I downloaded the codebase to get it working on my computer. Really, the first thing anyone should do when trying to learn a piece of software.
As per the instructions, it uses cmake. The entire project is in C, and I have only rudimentary knowledge of that ecosystem. Perhaps we first have to figure out what cmake does.
CMake is an open-source, cross-platform family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files, and generate native makefiles and workspaces that can be used in the compiler environment of your choice. The suite of CMake tools were created by Kitware in response to the need for a powerful, cross-platform build environment for open-source projects such as ITK and VTK.
That's a really nice introduction, and pretty self-explanatory. The CMake website also has links to courses worth thousands of dollars, but I decided to stick with the documentation. The first impression, on seeing a lot of "build variables", "paths", etc. being used there, is that cmake just helps people deal with the hundred arguments you have to pass to C compilers to get a working binary.
There is also a CMakeLists.txt file in the libvarnam code which talks about dependencies (like sqlite, etc.) and configurations. It sounds like CMakeLists can be used to configure cmake in such a way that cmake will then configure make in such a way that make will produce the right kind of output.
There is a stackoverflow answer about generating CMakeLists.txt from python. That tells a lot.
Anyhow, I went ahead and ran `cmake .` and then `make`, which compiled and added a file named libvarnam.so and some symlinks to the directory. But the next step, `sudo make install`, is something which life tells you never to perform. I tried my luck by skipping that command (even though the CMakeLists file talks clearly about moving things to /usr/share, etc.).
I did `./varnamc` and to my surprise it did run and show me a usage help. But the luck was short-lived when I tried to run `./varnamc -s ml -t varnam`, which errored saying "Varnam initialization failed Failed to find symbols file for: ml". There is a schemes folder and there seems to be a file named ml in it. But varnam is probably looking for those files in the folder where they would have been had I run `sudo make install`.
At this point I had two options: 1) I could build a package for archlinux which would actually install the application, or 2) I could change the installation prefix to a non-system folder and `make install` without corrupting my folders.
Having decided at the end of minidebconf to start contributing to the distro I use daily, I decided to go for the arch package. I was all set with the template PKGBUILD when I found there is already a package for libvarnam in AUR. You see, that is exactly why it is difficult to contribute to Arch :D
The AUR package uses the last release of libvarnam, and I could have built a libvarnam-git version. But never mind.
When I was installing from AUR, I realized I didn't have the `ruby-ffi` dependency. So, once I got `varnamc -s ml -t varnam` working, I went back to the working directory and tried `./varnamc -s ml -t varnam`, and voila, it was working! But then I realized that could be because installing the `libvarnam` package had moved the files to where they had to be. So I uninstalled libvarnam, keeping ruby-ffi, tried it again, and it failed this time. Which means it was not about the dependency, but about the files being in the right place.
But I still needed a way to run my changes (if I make any change) because without that I wouldn’t be able to verify my assumptions about code. So I found a way to run makepkg from a local directory and proceeded with that. I had a folder structure like this:
varnamproject
|- libvarnam (code from github)
|- arch-package
|-- PKGBUILD
|-- src
|--- build
|--- libvarnam-3.2.6 (symlinked to ../../libvarnam)
And the PKGBUILD looked something like this:
build() {
  cd build
  cmake -DCMAKE_INSTALL_PREFIX=/usr ../$pkgname-$pkgver
  make
}

package() {
  cd build
  make DESTDIR="$pkgdir/" install
}
Even though this installed fine, I was still getting the error "Varnam initialization failed Failed to find symbols file for: ml". Then I realized that the release on github actually contained "vst" files which are not present in the source tree. They probably have to be manually created. So, all I had to do was run `make vst` to get all of them compiled.
And once I had them compiled, even without the arch packaging hack, I could get `./varnamc -s ml -t varnam` to work! Yay! Time to look into the actual code.
The command that runs is `varnamc`. And that's a ruby file. And the first comment there clarifies its relation to libvarnam:
‘varnamc’ is a command line client to libvarnam. It allows you to quickly try libvarnam’s features.
`varnamc` is mostly about the command line arguments and argument parsing; it then passes all of that to `varnamruby.rb`. So that's the library that talks to libvarnam. And `varnamruby.rb` is a file that only deals with FFI (Foreign Function Interface), which tells ruby how to call the functions from libvarnam.so (which is compiled from the C).
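To get a feel for what that FFI boundary looks like from the C side, here is a toy with invented names (not a real libvarnam export): a plain C function compiled into a shared object is exactly the kind of symbol ruby-ffi's attach_function latches onto.

#include <stdio.h>

/* toy_hello is invented for illustration.
   Compile with: gcc -shared -fPIC -o libtoy.so toy.c
   A ruby-ffi binding could then attach to toy_hello the same way
   varnamruby.rb attaches to libvarnam's functions. */
int toy_hello(const char *name) {
    printf("hello, %s\n", name);
    return 0;
}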
Now, we might be better off starting from the transliterate feature, although the README talks about the role of scheme files, learnings, etc.
As per varnamc, the `transliterate` action calls `varnam_transliterate` and passes it three arguments: the pointer to an instance of varnam, the word to transliterate, and a pointer to an area to store the results. This `varnam_transliterate` comes from the `transliterate.c` file. Within that function these things happen (among others):
- `vst_tokenize` is called with the input and some metadata. Presumably returns tokens.
- `vwt_get_best_match` is called.
- If there is no best match, `vwt_tokenize_pattern` is called.
- `vwt_get_suggestions` is called (presumably for showing the output).

So we have about 4 more functions to go through.
`vst_tokenize` is in `symbol-table.c`. We know what it does from this statement in the README:
Varnam uses a greedy tokenizer which processes input from left to right. Tokenizer tries all possible to combinations to generate the longest possible tokens for the given input. This token will be generated by utilizing the symbol table which is provided to varnam Generated tokens is assembled and varnam computes all possibilities of these tokens. Assume the input is malayalam, varnam generates tokens like, മ, ല, യാ, ളം ([ma], [la], [ya], [lam]) and many others.
But this wasn't exactly enough to understand what the tokenization step actually is. What would the other tokens be? Would there be [m], [al], [ay], [al], [am], for example? I had to see the output of the tokenization step. But the output is stored in a varray. And a varray here is a custom "dynamically growing array implementation" written in `varray.c` (the things programmers had to do when writing in C). I had to figure out how to print the output. I could add an sprintf or something. But I make lots of syntax errors in C. I need an IDE.
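(As an aside, the core idea behind such a growing array is small. Here is a minimal sketch of the concept; the names are invented and varray's actual API in varray.c differs:)

#include <stdio.h>
#include <stdlib.h>

/* Invented names, for illustration only. */
typedef struct {
    void **memory;
    int used;
    int allocated;
} toy_array;

void toy_push(toy_array *a, void *item) {
    if (a->used == a->allocated) {
        /* double the capacity so repeated pushes stay cheap */
        a->allocated = a->allocated ? a->allocated * 2 : 4;
        a->memory = realloc(a->memory, a->allocated * sizeof(void *));
    }
    a->memory[a->used++] = item;
}

int main() {
    toy_array a = { NULL, 0, 0 };
    int x = 1, y = 2;
    toy_push(&a, &x);
    toy_push(&a, &y);
    /* items come back as void* and have to be cast, a detail that
       will matter when we try to read tokens out of a varray */
    printf("%d %d\n", *(int *)a.memory[0], *(int *)a.memory[1]);
    free(a.memory);
    return 0;
}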
I had Gnome Builder installed a few weeks ago when I was looking for a python IDE. Back then I realized it was better suited for projects related to GTK development. Turns out it works well for C projects too. I opened the folder in it and it immediately told me that I was passing an strbuf where a char was expected and so on. IDEs are cool!
While searching in the IDE for my sprintf, I found that libvarnam uses something called `portable_snprintf` to print things. I might as well use that!
Gnome Builder was for some reason not able to build the project. It detected CMake as the build system but it kept failing. Does it even know that we have to run `make`? While searching online I figured that people like things like ninja these days (I remember android also preferring ninja). Also, there don't seem to be a lot of answers from 2020 about Builder. So, I decided to look for another IDE.
Then I remembered that in my school time (when I had actually written some C++), I used to love Code::Blocks. Back then I was on Windows and didn’t know anything about free software. Turns out Code::Blocks is free software! So I installed that. But turns out it doesn’t really go well in dark mode and I couldn’t figure out how to open an entire folder at once. And the UI reminded me of Eclipse. Bummer.
Then I wondered. Perhaps. VS codium. Can? Maybe? Well, it has auto-completions. Maybe not fancy features. But it was enough for browsing around.
I also installed the `clangd` and `CMake Tools` extensions. And that suddenly made things much more fancy. I could jump around to function definitions, and there was linting/static analysis. Yay. Convenience works for now.
So, back to `portable_snprintf`. It is nothing like `printf`. It takes `char *str, size_t str_m, const char *fmt, /*args*/ ...` as the arguments. At this point I realized that this codebase (probably like other codebases in C) is full of custom things - custom data structures, custom ways to print things, passing around lengths of strings, and so on. That means I probably have to first learn how printing works in C, and then look at the data structures to see how I can simplify things.
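That signature has the same shape as C99's standard snprintf, which is enough to see the pattern: a destination buffer, the buffer's size, then a format string and its arguments.

#include <stdio.h>

int main() {
    char buf[64];
    /* write into buf, never past its 64 bytes */
    snprintf(buf, sizeof buf, "token %s, type %d", "ma", 2);
    puts(buf); /* prints: token ma, type 2 */
    return 0;
}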
So there is nothing called a string in C. It is all char. A sequence of chars is a string for humans. In C, therefore, you can have a char array or a char pointer, which are similar but different. The compiler makes a char array decay into a pointer to its first element if you pass it to a function that expects a char pointer. Okay, I hate pointers. Maybe I should do some small project to get my basics right.
#include <stdio.h>

int main() {
    printf("hello world");
    return 0;
}
Wrote that. Did `gcc hello.c`. Ran `./a.out`. No output! What! Tried lots of shell redirections and I could see output in between sporadically. Turns out, if I did `printf("hello world\n");` (with that `\n`) it comes out fine. And that's because without a trailing newline the output isn't on a line of its own: the shell prompt gets drawn right next to it, and some prompts redraw the line, making it easy to miss. Now on to arrays and pointers.
int main() {
    char greeting[] = "hello world\n";
    printf(greeting);
    return 0;
}
That worked.
int main() {
    char *greeting = "hello world\n";
    printf(greeting);
    return 0;
}
That worked too! Fantastic.
Now let us try passing things into functions.
void print(char *message) {
    printf(message);
}

int main() {
    char *greeting = "hello world through function that takes char pointer\n";
    print(greeting);
    return 0;
}
That worked (and I wrote that without looking it up! Wooh!)
void print(char message[]) {
    printf(message);
}

int main() {
    char greeting[] = "hello world in char array with function\n";
    print(greeting);
    return 0;
}
That also worked. Now let us try mixing matching.
void print(char message[]) {
    printf(message);
}

int main() {
    char *greeting = "hello world in char pointer with function that takes char array\n";
    print(greeting);
    return 0;
}
That worked too! Let us try the other way.
void print(char *message) {
    printf(message);
}

int main() {
    char greeting[] = "hello world in char array with function that takes char pointer\n";
    print(greeting);
    return 0;
}
That worked too. I now seriously suspect that all of these are working because printf deals with things correctly (and not because of anything to do with how I have written print). Anyhow, I now know that I can print strings with printf. Let me try replicating that in varnam. Before that, I need to also figure out how to use `%s`.
void print(char *message) {
    printf("%s\n", message);
}

int main() {
    char greeting[] = "hello world in char array with function that takes char pointer, this time no newline";
    print(greeting);
    return 0;
}
Okay. Now I can print.
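(One footnote on the array-vs-pointer experiments above: the difference does show up if you ask sizeof. A standalone demo, nothing to do with varnam:)

#include <stdio.h>

void take(char *message) {
    /* inside the function only a pointer is left */
    printf("in take: %zu\n", sizeof message);  /* pointer size, e.g. 8 */
}

int main() {
    char greeting[] = "hello";
    printf("in main: %zu\n", sizeof greeting); /* 6: five chars + '\0' */
    take(greeting); /* the array decays to a pointer to its first char */
    return 0;
}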
I added `printf("%s\n", ustring);` after the line `READ_A_UTF8_CHAR (ustring, inputcopy, bytes_read);` in `vst_tokenize`. The output of `./varnamc -s ml -t varnam` now was:
arnam
rnam
nam
nam
am
am
m
വർനം
That's progress. We now have a way to peek into some parts of the system. What we are really interested in at this point is the tokens, though. The tokens are stored as a `varray` of `vtoken`. And `vtoken` is:
typedef struct token {
    int id, type, match_type, priority, accept_condition, flags;
    char tag[VARNAM_SYMBOL_MAX];
    char pattern[VARNAM_SYMBOL_MAX];
    char value1[VARNAM_SYMBOL_MAX];
    char value2[VARNAM_SYMBOL_MAX];
    char value3[VARNAM_SYMBOL_MAX];
} vtoken;
(Okay. I did notice the weird naming: `token` at the beginning and `vtoken` at the bottom. C is fun!)
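(That naming is just how typedef'd structs work: `struct token` is the struct tag and `vtoken` is the typedef name, and both refer to the same type. In miniature:)

#include <stdio.h>

typedef struct token { int id; } vtoken;

int main() {
    vtoken t = { 7 };
    struct token *p = &t;           /* same type, two names */
    printf("%d %d\n", t.id, p->id); /* prints: 7 7 */
    return 0;
}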
So that's what a token is. Lots of flags as ints, and then strings for tag, pattern, value1, value2, value3. Let us see if we can print them. I will probably need to get the char arrays out of the token. I have seen `->` being used elsewhere. Let me try that. I will try and get the first `token` from the `results` array.
There is a handy `varray_get` function. But its return type is `void*`. Turns out that's the way to do a generic return type in C. I would have to cast it to the right type.
I did this then: `printf("%s\n", ((vtoken*) varray_get(result, 0))->value1);` after `varray_push (result, tokens);`. That would take the first token out of the results, cast it to a vtoken pointer, and then get the value1 out of it. The output was some empty lines, though. Meaning the first token OR value1 is useless. I might have to run through the whole array. I quickly tried `tag` and `pattern` and `value2` to see if it made any difference. It didn't.
So I built a loop, like this:
int i;
for (i = 0; i < varray_length(result); i++) {
    printf("tag: %s\n", ((vtoken*) varray_get(result, i))->tag);
    printf("pattern: %s\n", ((vtoken*) varray_get(result, i))->pattern);
    printf("value1: %s\n", ((vtoken*) varray_get(result, i))->value1);
    printf("value2: %s\n", ((vtoken*) varray_get(result, i))->value2);
    printf("value3: %s\n", ((vtoken*) varray_get(result, i))->value3);
}
That gave me some rather strange output that varied each time I ran it. (Perhaps due to some machine learning stuff?) Anyhow, I realized I will have to print these as unicode characters. So this is the point where I get into the complexity of encoding characters in bytes. This article talks about Unicode in C. A char can only hold one byte, right? UTF-8 characters aren't always held in one byte. So somehow I have to convert the char array to an array of unicode characters, which means chunking a few bytes together. mbstowcs (multibyte string to wide character string) seemed to be something that does this. But is it Microsoft-specific? Is there a difference between multibyte and unicode? What every programmer absolutely, positively needs to know about encodings and character sets to work with text seems to be a must-read.
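(To answer my own question: mbstowcs is standard C, not Microsoft-specific. A minimal sketch of using it to widen a UTF-8 string, assuming the environment's locale is a UTF-8 one:)

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    /* mbstowcs decodes according to the current locale, so pick up
       the environment's locale (usually something.UTF-8) first */
    setlocale(LC_ALL, "");
    const char *utf8 = "മലയാളം";
    wchar_t wide[64];
    size_t n = mbstowcs(wide, utf8, 64);
    if (n == (size_t)-1) {
        puts("invalid multibyte sequence for this locale");
        return 1;
    }
    printf("%zu bytes became %zu wide characters\n", strlen(utf8), n);
    return 0;
}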
And reading that made me realize that I cannot progress here without figuring out how varnam is encoding strings. Is it using UTF-8? UTF-16? UTF-32? Or something else?
Oh. There was a `READ_A_UTF8_CHAR` function I saw earlier. Yes, the `util` file probably has all the functions I need to deal with strings. And there is an entire `vutf8.c` file. So, varnam is probably using UTF-8 everywhere. `lang_detection` has a `varnam_detect_lang` function that uses (and demonstrates) the `vutf8.c` functions.
While looking at that I also read through the definitions of `get_pooled_string` (which uses `v_->`, and the `v_` is `#define v_ (handle->internal)` - I know. Crazy how C is) and `vword.c`.
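(To convince myself that this macro does what I think it does, a toy version with made-up names:)

#include <stdio.h>

struct internals { int counter; };
struct toy_handle { struct internals *internal; };

/* same trick as libvarnam's v_: any function with a parameter named
   `handle` can write v_->field for (handle->internal)->field */
#define v_ (handle->internal)

void bump(struct toy_handle *handle) {
    v_->counter++; /* expands to (handle->internal)->counter++ */
}

int main() {
    struct internals in = { 0 };
    struct toy_handle h = { &in };
    bump(&h);
    printf("%d\n", in.counter); /* prints: 1 */
    return 0;
}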
Now why on earth am I trying to convert the tokens to UTF myself? That probably happens somewhere within the code, right? I looked at what is done with the result of `vst_tokenize`. It is sent to `resolve_tokens`. And within that I found a nice line:
#ifdef _VARNAM_VERBOSE
    printf ("Token %s, %d\n", token->pattern, token->type);
#endif
You see, if we can make Varnam verbose, we may not need to add our own debug statements. CMakeLists.txt has this:
if (VARNAM_VERBOSE)
    add_definitions(-D_VARNAM_VERBOSE)
endif (VARNAM_VERBOSE)
As per the documentation, we need to set that variable (`VARNAM_VERBOSE`). I tried an environment variable first. But that failed. Then I did this: `cmake . -DVARNAM_VERBOSE=1`. That worked. Now, here is how I get output:
$ ./varnamc -s ml -t varnam
Token va, 2
Token r, 2
Token na, 2
Token m, 2
Transliterating varnam
വർനം
Yay! We have what we need. The tokenization output.
Look at these for example:
$ ./varnamc -s ml -t malayalam
Token ma, 2
Token la, 2
Token ya, 2
Token la, 2
Token m, 2
Transliterating malayalam
മലയലം
$ ./varnamc -s ml -t thiruvananthapuram
Token thi, 4
Token ru, 4
Token va, 2
Token na, 2
Token ntha, 2
Token pu, 4
Token ra, 2
Token m, 2
Transliterating thiruvananthapuram
തിരുവനന്തപുരം
So, we can now roughly make the estimate that tokenization is the step where most of the transliteration happens (when no learning file is involved). As per the README:
Once these tokens are generated, they are combined and tested against the learning model to get rid of garbage values and come up with most used words. Words are sorted according to the frequency value and returned to the caller function.
Which means for deciding whether “la” in malayalam should be “ല” or “ള” the learning model will come into play.
So we move on to the next function, `vwt_get_best_match`. It starts with this sql statement:
"select word, confidence from words where rowid in "
"(SELECT word_id FROM patterns_content as pc where pc.pattern = lower(?1) and learned = 1 limit 5) "
"order by confidence desc";
I am not exactly sure what counts as confidence here, but this seems to match patterns (presumably, input characters) seen previously to words (presumably, output words). The rest of the function is possibly just using the input to make this sql query and prepare the return.
In our case, since we haven't done any learning yet, this is probably going to return empty. Which takes us to the next function: `vwt_tokenize_pattern`. It starts with a helpful comment:
/* Finds the longest possible match from the words table. Remaining string will be
   converted using symbols tokenization */
Also, later in the function there is a call to `varnam_debug`. That seems like it can output more debug information than the verbose output. But it doesn't seem like the ruby wrapper calls it. Maybe I should add that call. I had to add this to `varnamruby.rb`:
attach_function :varnam_enable_logging, [:pointer, :int, :pointer], :int
And these to `varnamc`:
DebugCallback = FFI::Function.new(:void, [:string]) do |message|
  puts message
end

# Skipped lines

$options[:debug] = false
opts.on('-z', '--debug', 'Enable debugging') do
  $options[:debug] = true
end

# Skipped lines

if ($options[:debug])
  puts "Turning debug on"
  done = VarnamLibrary.varnam_enable_logging($varnam_handle.get_pointer(0), Varnam::VARNAM_LOG_DEBUG, DebugCallback);
  if done != 0
    error_message = VarnamLibrary.varnam_get_last_error($varnam_handle.get_pointer(0))
    puts "Unable to turn debugging on. #{error_message}"
    exit(1)
  end
end
See this commit to see exactly what changed. Anyhow, this turned on debug with the `--debug` flag.
Back to `vwt_tokenize_pattern` then. In our case, there wouldn't be any matches returned from `get_matches` and `can_find_possible_matches` because we don't have any learnings. So, when it says:
/* At this point we will have the longest possible match. If nothing is available,
* there is no words that matches the prefix. In that case, exiting early */
our function will actually return. I confirmed this by adding a debug print there.
Therefore, when there are no learnings, `vwt_tokenize_pattern` is of no use at all, and we fall back to the initial tokens generated. Turns out I had underestimated the function `resolve_tokens`. It seems to be doing some transliteration itself, as it is the fallback used when there is no suggestion from the learning model.
int
resolve_tokens(varnam *handle,
               varray *tokens,
               vword **word)
{
    vtoken *virama, *token = NULL, *previous = NULL;
    strbuf *string;
    vtoken_renderer *r;
    int rc, i;

    assert(handle);

    rc = vst_get_virama (handle, &virama);
    if (rc)
        return rc;

    string = get_pooled_string (handle);
    for(i = 0; i < varray_length(tokens); i++)
    {
        token = varray_get (tokens, i);
        if (token->type == VARNAM_TOKEN_NON_JOINER) {
            previous = NULL;
            continue;
        }

#ifdef _VARNAM_VERBOSE
        printf ("Token %s, %d\n", token->pattern, token->type);
#endif

        r = get_renderer (handle);
        if (r != NULL)
        {
            rc = r->tl (handle, previous, token, string);
            if (rc == VARNAM_ERROR)
                return rc;
            if (rc == VARNAM_SUCCESS)
                continue;
        }

        if (token->type == VARNAM_TOKEN_VIRAMA)
        {
            /* we are resolving a virama. If the output ends with a virama already, add a
               ZWNJ to it, so that following character will not be combined.
               if output not ends with virama, add a virama and ZWNJ */
            if(strbuf_endswith (string, virama->value1)) {
                strbuf_add (string, ZWNJ());
            }
            else {
                strbuf_add (string, virama->value1);
                strbuf_add (string, ZWNJ());
            }
        }
        else if(token->type == VARNAM_TOKEN_VOWEL)
        {
            if(virama && strbuf_endswith(string, virama->value1)) {
                /* removing the virama and adding dependent vowel value */
                strbuf_remove_from_last(string, virama->value1);
                if(token->value2[0] != '\0') {
                    strbuf_add(string, token->value2);
                }
            }
            else if(previous != NULL && previous->type != VARNAM_TOKEN_OTHER) {
                strbuf_add(string, token->value2);
            }
            else {
                strbuf_add(string, token->value1);
            }
        }
        else if (token->type == VARNAM_TOKEN_NUMBER)
        {
            if (v_->config_use_indic_digits)
                strbuf_add (string, token->value1);
            else
                strbuf_add (string, token->pattern);
        }
        else {
            strbuf_add(string, token->value1);
        }

        previous = token;
        varnam_debug (handle, strbuf_to_s (string));
    }

    *word = get_pooled_word (handle, strbuf_to_s (string), 1);
    return VARNAM_SUCCESS;
}
So, in the `resolve_tokens` function, each token is passed through `r->tl`, which is the transliterate function of the renderer. That's where the magic of transliteration happens. But such a renderer is only defined for Malayalam. The rest of the languages just use the default rendering logic in the `resolve_tokens` function, which uses the token's `value1` for the output. This I could confirm by adding `varnam_debug (handle, strbuf_to_s (string));` at the very end of the loop inside `resolve_tokens` (yes, I magically figured out how to solve the unicode encoding by looking at the rest of the code :D). And the output is like this:
$ ./varnamc -s ml -t malayalam --debug
Turning debug on
Token ma, 2
മ
Token la, 2
മല
Token ya, 2
മലയ
Token la, 2
മലയല
Token m, 2
മലയലം
Tokenizing 'malayalam' with words tokenizer
Transliterating malayalam
മലയലം
Beautiful.
What this means is that the tokenization logic is important too. Because how does it figure out how to group chars? Let us go back to `vst_tokenize`. The first thing it does is create a cache key (`strbuf_addf (cacheKey, "%s%d%d", strbuf_to_s (lookup), tokenize_using, match_type);`) and check if there is a result in the cache (`cachedEntry = lru_find_in_cache (&v_->tokens_cache, strbuf_to_s (cacheKey));`). If there isn't, it proceeds to the next part.
rc = read_all_tokens_and_add_to_array (handle,
                                       strbuf_to_s (lookup),
                                       tokenize_using,
                                       match_type,
                                       &tmpTokens, &tokensAvailable);
This seems to be the function that actually gets tokens for the input. From some `varnam_debug` prints, the logic seems to be something like this (pseudocode):
slice = first character of input
tokens = tokenize(slice)
while (ambiguity_in_tokenization(slice)):
    slice = more characters from input
    tokens = tokenize(slice)
output += tokens (there is no ambiguity now)
input = remove_from_left(input, slice)
continue loop with remaining input
See the output from my debugging:
lookup: m
Doing actual tokenization
tokens available
lookup: ma
Doing actual tokenization
tokens available
lookup: mal
Doing actual tokenization
tokens not available
Appending found token
There are now 1 tokens in the result
Moving tokenization forward in the input
lookup: l
Doing actual tokenization
tokens available
lookup: la
Doing actual tokenization
tokens available
lookup: lay
Doing actual tokenization
tokens not available
Appending found token
There are now 2 tokens in the result
Moving tokenization forward in the input
lookup: y
Doing actual tokenization
tokens available
lookup: ya
Doing actual tokenization
tokens available
lookup: yal
Doing actual tokenization
tokens not available
Appending found token
There are now 3 tokens in the result
Moving tokenization forward in the input
lookup: l
Cached token available
tokens available
lookup: la
Cached token available
tokens available
lookup: lam
Doing actual tokenization
tokens not available
Appending found token
There are now 4 tokens in the result
Moving tokenization forward in the input
lookup: m
Cached token available
tokens available
Appending found token
There are now 5 tokens in the result
Moving tokenization forward in the input
Token ma, 2
മ
Token la, 2
മല
Token ya, 2
മലയ
Token la, 2
മലയല
Token m, 2
മലയലം
Tokenizing 'malayalam' with words tokenizer
Transliterating malayalam
മലയലം
For “malayalam”, it started with m, found tokens, tried ma, found tokens, tried mal, found no tokens, settled with the token of “ma”, then continued with “l”, and so on.
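That trace is enough to write a toy version of the greedy loop. The sketch below hardcodes a tiny symbol table and uses exact matches where the real tokenizer consults sqlite and checks whether longer tokens are still possible, but it reproduces the ma/la/ya/la/m split:

#include <stdio.h>
#include <string.h>

/* a tiny stand-in for the symbols table */
static const char *symbols[] = { "m", "ma", "l", "la", "y", "ya", NULL };

static int in_table(const char *s, size_t len) {
    for (int i = 0; symbols[i]; i++)
        if (strlen(symbols[i]) == len && strncmp(symbols[i], s, len) == 0)
            return 1;
    return 0;
}

int main() {
    const char *input = "malayalam";
    size_t pos = 0, n = strlen(input);
    while (pos < n) {
        size_t best = 0;
        /* keep extending the slice while the table still matches */
        for (size_t len = 1; pos + len <= n && in_table(input + pos, len); len++)
            best = len;
        if (best == 0)
            best = 1; /* unknown character: consume it anyway */
        printf("token: %.*s\n", (int)best, input + pos);
        pos += best;
    }
    return 0;
}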
The function that finds the tokens is `read_all_tokens_and_add_to_array`, which uses `prepare_tokenization_stmt`, which is a function that generates sql to look up the `symbols` table. So, if we know how the symbols table is generated, we probably know everything we need to know about tokens.
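Just to poke at it by hand, the lookup can be approximated with the sqlite3 C API. This is a sketch, assuming the symbols table's columns mirror the vtoken fields (pattern, value1, and so on); the actual query prepare_tokenization_stmt builds will differ:

#include <stdio.h>
#include <sqlite3.h>

int main() {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    /* assumed column names based on the vtoken struct; may differ */
    const char *sql = "select value1 from symbols where pattern = ?1";
    if (sqlite3_open("schemes/ml.vst", &db) != SQLITE_OK)
        return 1;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK)
        return 1;
    sqlite3_bind_text(stmt, 1, "ma", -1, SQLITE_STATIC);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        printf("%s\n", (const char *) sqlite3_column_text(stmt, 0));
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}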
Oh. Turns out the file in which all of these functions live is named `symbol-table.c`, which makes a lot of sense now. Where are these saved, though? HOME/.local/share/varnam/suggestions contains sqlite3 databases of learnings. But the README also says this:
Compiled version of Scheme file is called as Varnam Symbol Table (vst). This compilation is done using varnamc command line utility
So, the symbols table is present in the `schemes` folder. Doing `sqlite3 schemes/ml.vst` opens up the ml symbols.
sqlite> .tables
metadata stem_exceptions stemrules symbols
sqlite> select count(*) from symbols;
7410
sqlite> select * from symbols limit 10;
1|9|~|്||||1|0|0|0
2|1|a|അ||||1|0|0|1
3|1|a|ആ|ാ|||2|0|0|1
4|1|aa|ആ|ാ|||1|0|0|0
5|1|A|ആ|ാ|||1|0|0|1
6|1|i|ഇ|ി|||1|0|0|1
7|1|ee|ഈ|ീ|||1|0|0|0
8|1|I|ഈ|ീ|||1|0|0|0
9|1|ii|ഈ|ീ|||1|0|0|0
10|1|i|ഈ|ീ|||2|0|0|1
Aha! So. Tokens are created from the scheme files!
Now, all we have to look at is how the scheme files are written and how they are compiled to sqlite.
The scheme file looks like ruby code with dictionaries and arrays of various mappings. What I failed to notice initially, though, is that these are function calls. Each line in the scheme file calls a function like `vowels` or `consonants`. These are in turn defined in the `varnamc` file. And those functions take the dictionaries, create tokens out of them, and store them in the database. That's the magic.
So, there is a lot of ruby code where this happens. (This is exactly what Joice was warning about).
Maybe I will stop here as part 1 and look at the learning part in another post. And if by the end of that I still have energy, I'll probably think about how to port this. It might make more sense to port it to Kotlin as an android library. Maybe.