Scriptorai: Translating the Most Important Greek Astrology Survey with AI, as a Start
All open source and available to read
tl;dr: You can now read an early, first-pass translation (one that still needs human improvement) of the first volume of the CCAG, the most important modern survey of Greek astrology, online at Scriptorai.
The Catalogus Codicum Astrologorum Graecorum, or CCAG, is a 12-volume catalogue of Greek astrological writings, published from 1898 to 1953, which "edited, described, and excerpted from texts found in libraries throughout Europe, most edited and catalogued for the first time." It is an essential catalogue for Greek scholars and is now in the public domain, and many of the volumes are widely available as PDFs. Despite that, the volumes are not accessible to most of the people who would want to read them today, because they are written in Latin and Ancient Greek.
This corpus is so valuable, and I have long been curious about exactly what is in it. But even understanding and conveying the topics of a single volume would take a human translator a significant amount of time and effort. In my opinion, the main reason these texts haven't been translated already is simply that the number of people with the interest, knowledge, and opportunity to translate texts like these is vanishingly small. Meanwhile, the more context and history we have about how astrology was practiced and considered, the deeper we can take our own practice and understanding.
Since these works are in the public domain, I built a tool called hemiplon, which takes high-quality scans of PDFs in foreign languages and uses large language models (LLMs) to transcribe the page images into text as they are, and then translate that text.[1] The end result is far from perfect, and absolutely requires skilled human translation effort to complete. But by batch-translating the entire text, we can at least get a medium-quality view of what the contents actually are, which helps us identify the most interesting parts, the ones that demand skilled translation. The preface to the first volume of CCAG explains the decision to publish the volumes without every possible source with the Greek phrase Πλέον ἥμισυ παντός, or "half is better than the whole". That is the spirit in which I have approached this project: it's not perfect, but something is better than nothing.
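To make the two-stage flow described above concrete, here is a minimal sketch in Python. It is not hemiplon's actual code or API: the names `PageResult`, `translate_pages`, `fake_transcribe`, and `fake_translate` are all hypothetical, and the two stages are plain callables so that stand-in functions can replace real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PageResult:
    """One page's output (hypothetical structure, not hemiplon's format)."""
    page_number: int
    transcription: str  # text in the source language, as read from the scan
    translation: str    # first-pass English rendering of the transcription


def translate_pages(
    page_images: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> List[PageResult]:
    """Run every page through transcription first, then translation."""
    results = []
    for number, image in enumerate(page_images, start=1):
        text = transcribe(image)  # stage 1: page image -> source-language text
        results.append(PageResult(number, text, translate(text)))  # stage 2
    return results


# Stand-in stages so the sketch runs without any model access.
def fake_transcribe(image: bytes) -> str:
    # Assumption for the demo: the "image" is just UTF-8 encoded text.
    return image.decode("utf-8")


def fake_translate(text: str) -> str:
    # Assumption for the demo: tag the text instead of translating it.
    return f"[EN draft] {text}"


pages = ["Codices Florentini".encode("utf-8"), "Περὶ ἀστέρων".encode("utf-8")]
results = translate_pages(pages, fake_transcribe, fake_translate)
for page in results:
    print(page.page_number, page.transcription, "->", page.translation)
```

Keeping the transcription alongside the translation for each page is what allows a reader to check one against the other later.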
I have built a website called Scriptorai, which hosts the first volume of CCAG. It has a two-pane, side-by-side view of the image, transcription, or translation so you can inspect the results yourself, and anyone with a GitHub account can suggest corrections. There is also full-text search, so you can search the entire book for specific terms. (Also, you can flip pages with the left and right arrow keys, which I personally think is a nice touch.) The transcription and translation data are stored separately from the site in a GitHub repository, at sadalsvvd/scriptorai-ccag-01, so suggestions made there will automatically show up on the website. If this is a project that people are interested in, over time we can crowd-translate a high-quality version of this and other texts, and build more sophisticated collaboration tools.
As the original CCAG volumes are in the public domain, I have also added a public domain license to the CCAG repository, and permissive MIT licenses to the source code of the Scriptorai site itself, at sadalsvvd/scriptorai, and of hemiplon, the tool for transcribing and translating PDFs, at sadalsvvd/hemiplon. This means that you can also use the tool to do first-pass translations of your own texts, and view them locally with the site.[2]
Between hemiplon and Scriptorai, it should be possible to do initial translations of any well-scanned, well-structured foreign-language text, which I hope will eventually make even more knowledge accessible. Going forward, Scriptorai will also host all of the future CCAG volumes that I generate first-pass translations of, along with whatever other texts I (or you!) might be interested in. Currently, a highest-quality-attempt transcription and translation run costs about $40 for CCAG volume 1, which is 192 pages (most of the cost comes from image transcription). That's too expensive for me to run all 16 CCAG PDFs I have at once, but I will run at least one book per month. I am considering accepting crowdfunding of some kind to get them converted faster, or to do more, with communal input. If this is something that interests you, let me know.
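For a rough sense of scale, the figures above work out to roughly $0.21 per page. The short Python sketch below does that arithmetic and extrapolates to all 16 PDFs. The extrapolation rests on an assumption of mine, not a measured figure: that every PDF costs about as much as volume 1. Real costs will vary with page count and page density.

```python
# Figures stated in the post for CCAG volume 1.
COST_VOL1_USD = 40.0
PAGES_VOL1 = 192
PDF_COUNT = 16

# Average cost per page for the volume 1 run.
cost_per_page = COST_VOL1_USD / PAGES_VOL1


def estimate_cost(pages: int) -> float:
    """Estimate a run's cost, assuming the volume 1 per-page rate holds."""
    return pages * cost_per_page


# Assumption: each of the 16 PDFs costs about the same as volume 1.
total_if_similar = PDF_COUNT * COST_VOL1_USD

# At the stated minimum pace of one book per month.
months_at_one_per_month = PDF_COUNT

print(f"Per page: ${cost_per_page:.2f}")
print(f"All {PDF_COUNT} PDFs, if similar to volume 1: ${total_if_similar:.0f}")
print(f"Months at one book per month: {months_at_one_per_month}")
```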
I hope you find some interesting things in the first volume! I have already found several extremely enticing tidbits.
[1] Some have criticized AI for its high energy usage, but these claims are usually exaggerated and under-researched. This is a good resource for understanding the real impact of AI on the environment in terms of its energy usage. In any case, I feel that the benefits of making these texts accessible to English readers far outweigh the one-time energy cost of translating each text.
[2] The documentation for both the site and the tool needs improvement. The tool currently only handles CCAG-style PDFs or well-formatted, individually scanned pages. But if you're interested in using hemiplon and Scriptorai as-is, get in touch and I can help you get set up.