Google Rolls out Optical Character Recognition in over 200 Languages

Improvement in Optical Character Recognition (OCR) technology is one of Google’s lesser-known projects, at least to lay consumers. In reality, many of us have been using OCR for years without knowing what it actually is.

OCR is the technology that lets Google digitize text captured in an image and make it legible to a computer. If you’ve ever uploaded a scanned PDF or other image file to Drive and then chosen “Open with – Google Docs,” you’ve used it: Google runs OCR and opens a new document that shows the original image followed by the extracted text.

The big news today is that OCR has now been rolled out to over 200 languages and 25 writing systems, which is pretty dang awesome. Even if, at the end of the day, Google is a company that harvests our data to sell to third parties in its quest to not be evil™, and even if OCR supports that mission, this is the sort of altruistic endeavor that gets little notice but deserves much.

And because I’m feeling saucy, I’ve provided a complete list of the supported languages below. You’re welcome.

Acehnese, Acholi, Adangme, Afrikaans, Akan, Albanian, Algonquian, Amharic, Ancient Greek, Arabic (Modern Standard), Araucanian/Mapuche, Armenian, Assamese, Asturian, Athabaskan, Aymara, Azerbaijani, Azerbaijani (Cyrillic; old orthography), Balinese, Bambara, Bantu, Bashkir, Basque, Batak, Belorussian, Bemba, Bengali, Bikol, Bislama, Bosnian, Breton, Bulgarian, Burmese, Catalan, Cebuano, Chechen, Cherokee, Chinese (Mandarin; Hong Kong), Chinese (Simplified; Mandarin), Chinese (Traditional; Mandarin), Choctaw, Chuvash, Cree, Creek, Crimean Tatar, Croatian, Czech, Dakota, Danish, Dhivehi, Duala, Dutch, Dzongkha, Efik, English (American), English (British), Esperanto, Estonian, Ewe, Faroese, Fijian, Filipino, Finnish, Fon, French (Canadian), French (European), Fulah, Ga, Galician, Ganda, Gayo, Georgian, German, Gilbertese, Gothic, Greek, Guarani, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Herero, Hiligaynon, Hindi, Hungarian, Iban, Icelandic, Igbo, Iloko, Indonesian, Irish, Italian, Japanese, Javanese, Kabyle, Kachin, Kalaallisut, Kamba, Kannada, Kanuri, Kara-Kalpak, Kazakh, Khasi, Khmer, Kikuyu, Kinyarwanda, Kirghiz, Komi, Kongo, Korean, Kosraean, Kuanyama, Lao, Latin, Latvian, Lingala, Lithuanian, Low German, Lozi, Luba-Katanga, Luo, Macedonian, Madurese, Malagasy, Malay, Malayalam, Maltese, Mandingo, Manx, Maori, Marathi, Marshallese, Mende, Middle English, Middle High German, Minangkabau, Mohawk, Mongo, Mongolian, Nahuatl, Navajo, Ndonga, Nepali, Niuean, North Ndebele, Northern Sotho, Norwegian (Bokmål), Nyanja, Nyankole, Nyasa Tonga, Nzima, Occitan, Ojibwa, Old English, Old French, Old High German, Old Norse, Old Provencal, Oriya, Ossetic, Pampanga, Pangasinan, Papiamento, Pashto, Persian, Polish, Portuguese (Brazilian), Portuguese (European), Punjabi (Gurmukhi), Quechua, Romanian, Romansh, Romany, Rundi, Russian, Russian (Old Orthography), Sakha, Samoan, Sango, Sanskrit, Scots, Scottish Gaelic, Serbian (Cyrillic), Serbian (Latin), Shona, Sinhala, Slovak, Slovenian, Songhai, Southern Sotho, Spanish (European), Spanish (Latin American), Sundanese, Swahili, Swati, Swedish, Tahitian, Tajik, Tamil, Tatar, Telugu, Temne, Thai, Tibetan, Tigrinya, Tongan, Tsonga, Tswana, Turkish, Turkmen, Udmurt, Ukrainian, Urdu, Uzbek, Uzbek (Cyrillic; old orthography), Venda, Vietnamese, Votic, Welsh, Western Frisian, Wolof, Xhosa, Yiddish, Yoruba, Zapotec, and Zulu.

The technical side of this is beyond my pay grade, but if you want to learn more, check out the link below and your dreams will be filled with Hidden Markov Models (HMMs) and Python code.

All in all, the ability to convert what is effectively “background noise,” as Google describes it, to textual content that’s recognized by a computer is hugely useful, especially as the latest language rollout supports more developing countries.

Also, Old High German and Old Norse are supported, as well as Old English. Maybe it’ll turn out we had Beowulf wrong all along.

The update works on the desktop and mobile app versions of Drive.

Source: Google Research Blog

Visit TalkAndroid for Android news, Android guides, and much more!
