ugrapheme
https://github.com/Z4JC/ugrapheme/
Use ugrapheme to make your Python and Cython code see strings as a sequence of grapheme characters, so that the length of 👩🏽‍🔬🏴󠁧󠁢󠁳󠁣󠁴󠁿Hi is 4 instead of 13.
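To see where the 4 and the 13 come from, here is a minimal stdlib-only sketch. It does not use ugrapheme and is not its algorithm: `rough_graphemes` and `extends_previous` are hypothetical helpers that cover only the cases in this example (combining marks, skin-tone modifiers, tag characters, ZWJ joins). The real grapheme rules in Unicode UAX #29 also cover Hangul, Indic conjuncts, regional-indicator flags and more. The string is built from escapes so the invisible code points are explicit.

```python
import unicodedata

# Woman + medium skin tone + ZWJ + microscope (4 code points).
SCIENTIST = "\U0001F469\U0001F3FD\u200D\U0001F52C"
# Black flag + six tag characters + cancel tag (7 code points): flag of Scotland.
SCOTLAND = "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"
TEXT = SCIENTIST + SCOTLAND + "Hi"

ZWJ = "\u200d"


def extends_previous(ch):
    """Rough stand-in for 'this code point attaches to the previous one'."""
    cp = ord(ch)
    return (
        unicodedata.category(ch) in ("Mn", "Me", "Mc")  # combining marks
        or 0x1F3FB <= cp <= 0x1F3FF                     # emoji skin-tone modifiers
        or 0xE0020 <= cp <= 0xE007F                     # tag characters
        or ch == ZWJ
    )


def rough_graphemes(text):
    """Group code points into approximate grapheme clusters (simplified)."""
    clusters = []
    for ch in text:
        # Simplification: anything following a ZWJ is glued on; the real
        # rule only does this for pictographic characters.
        if clusters and (extends_previous(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


clusters = rough_graphemes(TEXT)
print(len(TEXT))      # 13 code points, which is what len() on a str counts
print(len(clusters))  # 4 approximate graphemes
# Reversing by cluster keeps each emoji intact; TEXT[::-1] would scramble them.
print("".join(reversed(clusters)))
```

Reversing, or taking the first and last "character", only behaves sensibly once the string is split on cluster boundaries, which is the job the library takes over with full rule coverage.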
Trivial operations like reversing a string or getting the first and last character become easy not just for Latin text and emoji, but also for Devanagari, Hangul, Tamil, Bengali, Arabic, etc. Centering and justifying emoji and non-Latin text in terminal output becomes easy again, as ugrapheme uses uwcwidth under the hood.
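Why centering breaks without width information can be shown with a small stdlib-only sketch. It is not uwcwidth's algorithm: `rough_width` and `rough_center` are hypothetical helpers that approximate terminal columns from `unicodedata.east_asian_width`, and they deliberately ignore emoji sequences, which they would over-count.

```python
import unicodedata


def rough_width(text):
    """Approximate number of terminal columns (per code point, simplified)."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format characters take no column
        # Wide and fullwidth characters occupy two columns, the rest one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def rough_center(text, columns, fill=" "):
    """Center text within `columns` terminal columns using rough_width."""
    pad = max(columns - rough_width(text), 0)
    left = pad // 2
    return fill * left + text + fill * (pad - left)


word = "日本語"
# str.center pads to 10 code points, so the result overflows 10 columns.
print(word.center(10, "-"))
# Width-aware padding accounts for each character taking two columns.
print(rough_center(word, 10, "-"))
```

A per-code-point estimate like this still miscounts sequences such as a ZWJ emoji, which is why width calculation and grapheme segmentation have to work together.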
ugrapheme exposes an interface that's almost identical to Python's native strings and maintains a similar performance envelope, processing strings at hundreds of megabytes or even gigabytes per second.
I want this for go and javascript! Looks like an impressive piece of work.
Related: uwcwidth, for figuring out how many characters a string takes in a terminal