ugrapheme

last updated: Oct 18, 2024

https://github.com/Z4JC/ugrapheme/

Use ugrapheme to make your Python and Cython code see strings as a sequence of grapheme characters, so that the length of 👩🏽‍🔬🏴󠁧󠁢󠁳󠁣󠁴󠁿Hi is 4 instead of 13.

Trivial operations like reversing a string, getting the first and last character, etc. become easy not just for Latin and Emojis, but Devanagari, Hangul, Tamil, Bengali, Arabic, etc. Centering and justifying Emojis and non-Latin text in terminal output becomes easy again, as ugrapheme uses uwcwidth under the hood.

ugrapheme exposes an interface that's almost identical to Python's native strings and maintains a similar performance envelope, processing strings at hundreds of megabytes or even gigabytes per second

I want this for go and javascript! Looks like an impressive piece of work.

Related: uwcwidth, for figuing out how many characters a string takes in a terminal

↑ up