crosswalker
last updated: Oct 20, 2023
https://github.com/washingtonpost/crosswalker
Crosswalker is a general-purpose tool for joining columns of text data that don't perfectly match. How it works:
- Upload one or two files and configure which columns to join (plus any optional metadata columns to display)
- The tool auto-ranks matches for each data row with a specially designed fast and thorough algorithm
- The tool auto-matches values that are practically identical
- The results are presented in an interactive spreadsheet from which you can manually continue matching
- As you go, the columns are re-sorted to highlight the most probable remaining matches
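
The rank-then-auto-match idea in the steps above can be sketched in a few lines of Python. This is an illustration only, not Crosswalker's actual algorithm: it swaps in the standard library's `difflib.SequenceMatcher` as the similarity measure, and the function names (`normalize`, `similarity`, `rank_candidates`, `auto_match`) and the `0.95` threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two values after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rank_candidates(source_value: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Return all candidates paired with their score, best match first."""
    scored = [(candidate, similarity(source_value, candidate)) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def auto_match(source_values: list[str], candidates: list[str],
               threshold: float = 0.95) -> dict[str, str]:
    """Accept the top-ranked candidate only when it is practically identical.

    The threshold is a hypothetical cutoff chosen for this sketch.
    """
    matches = {}
    for value in source_values:
        ranked = rank_candidates(value, candidates)
        if ranked and ranked[0][1] >= threshold:
            matches[value] = ranked[0][0]
    return matches


if __name__ == "__main__":
    left = ["St. Louis", "New York City"]
    right = ["St Louis", "New York", "Saint Louis"]
    print(auto_match(left, right))
    print(rank_candidates("New York City", right))
```

In this sketch, values that differ only in case or punctuation are matched automatically, while anything below the threshold is left in ranked order for manual review, which mirrors the split between automatic and manual matching described above.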
The application is entirely local, saving nothing to the cloud. Browser-based local storage is used to auto-save your progress (as long as you're not in incognito mode), and you can save/load your progress to/from files at any point. Once you're done matching, you can download the results as a CSV file.
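
As a rough picture of the save/load and CSV export steps, the sketch below serializes a set of matches to JSON and writes them out as CSV. The data shape (a mapping from source value to matched value), the `source` and `match` column names, and the function names are all assumptions for illustration; Crosswalker's own file formats may differ.

```python
import csv
import io
import json


def progress_to_json(matches: dict[str, str]) -> str:
    """Serialize in-progress matches to a JSON string (hypothetical format)."""
    return json.dumps({"matches": matches}, indent=2)


def progress_from_json(text: str) -> dict[str, str]:
    """Restore matches from a string produced by progress_to_json."""
    return json.loads(text)["matches"]


def matches_to_csv(matches: dict[str, str]) -> str:
    """Render matches as CSV text with assumed 'source' and 'match' columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "match"])
    for source_value, matched_value in matches.items():
        writer.writerow([source_value, matched_value])
    return buffer.getvalue()


if __name__ == "__main__":
    matches = {"St. Louis": "St Louis"}
    saved = progress_to_json(matches)
    restored = progress_from_json(saved)
    print(matches_to_csv(restored))
```

Everything here stays in memory or on local disk, in keeping with the local-only design described above.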