
Using Telegram and Cloud Vision API to scan images for text

…to club it with Telegram and you can upload the image to Telegram and get back the text from the image as a reply!

As you might already know, Google’s Cloud Vision API lets developers perform optical character recognition (OCR). In this article, I am going to jot down how to integrate it with a Telegram bot.

At this point, you should already have a Telegram bot that listens for messages containing text or images and replies to them depending on the context.
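
As a rough illustration of "depending on the context", here is a minimal sketch of how such a bot might decide whether an incoming update carries a photo or plain text. The function name `classify_update` and the sample update are hypothetical; only the `message`, `photo`, and `text` fields come from the Telegram update format used later in this article.

```python
def classify_update(data):
    """Return 'photo', 'text', or 'other' for a Telegram update dict."""
    message = data.get("message", {})
    # Photo messages carry a non-empty list of available photo sizes.
    if message.get("photo"):
        return "photo"
    if "text" in message:
        return "text"
    return "other"


# Hypothetical update, trimmed to the fields used in this article.
sample_update = {
    "message": {
        "message_id": 7,
        "from": {"id": 12345},
        "photo": [{"file_id": "small"}, {"file_id": "large"}],
    }
}
print(classify_update(sample_update))
```

An update classified as a photo would then be handed to the photo handler shown in the Code section.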

To set up access to the Cloud Vision API:

  • In the Cloud console, go to the Create service account page.
  • Select your project.
  • In the Service account name field, enter a name. The Cloud console fills in the Service account ID field based on this name.
  • In the Service account description field, enter a description.
  • Click Create and continue.
  • To provide access to your project, grant the following role(s) to your service account: Project > Owner.
  • In the Select a role list, select a role.
  • For additional roles, click Add another role and add each additional role.

    Alternatively, use a custom role that meets your needs.

  • Click Continue.
  • Click Done to finish creating the service account.
  • Do not close your browser window. You will use it in the next step.
  • Create a service account key:
    • In the Cloud console, click the email address for the service account that you created.
    • Click Keys.
    • Click Add key, and then click Create new key.
    • Click Create. A JSON key file is downloaded to your computer.
    • Click Close.
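
Before wiring the downloaded key into the bot, it can help to confirm that the file is a service account key at all. The sketch below uses only the standard library; the helper name `check_key_file` and the choice of which fields to check are assumptions for illustration, not part of any Google tooling.

```python
import json

# A subset of the fields present in a service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in a service account key file."""
    with open(path, "r", encoding="utf-8") as key_file:
        key = json.load(key_file)
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

Calling `check_key_file` with the path of the downloaded JSON file should return an empty list when all the checked fields are present.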

Install the Python library:

$ pip install --upgrade google-cloud-vision

Code

def detect_text():
    """Detects text in the file."""
    import os
    from google.cloud import vision

    # Point the client library at the JSON key downloaded in the previous step.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/location/of/the/json/downloaded/from/previous/step/key.json"
    client = vision.ImageAnnotatorClient()

    # vision_image_path is assumed to be defined elsewhere in the bot;
    # it is the directory where photo_message() saves the image.
    path = f'{vision_image_path}/vision.jpg'

    with open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    response = client.text_detection(image=image)
    # The image is no longer needed once the API has responded.
    os.remove(path)
    if response.error.message:
        return 'ERROR!\nFor more info on error messages, check:\nhttps://cloud.google.com/apis/design/errors'
    elif response.text_annotations:
        # The first annotation holds the full detected text.
        texts = response.text_annotations
        return texts[0].description
    else:
        return "no text was detected."


import requests

# data is the JSON posted by Telegram.
# tg_endpoint, file_endpoint and vision_image_path are assumed to be defined elsewhere in the bot.
def photo_message(data):
    chat_id = data['message']['from']['id']
    message_id = data['message']['message_id']
    parameters = {'chat_id': chat_id,
                  'text': 'Detection in progress', 'reply_to_message_id': message_id}
    requests.post(url=f"{tg_endpoint}/sendMessage", data=parameters)

    # Ask Telegram for the file path of the last (typically largest) photo size, then download it.
    file_link = requests.post(
        url=f"{tg_endpoint}/getFile?file_id={data['message']['photo'][-1]['file_id']}").json()
    downloaded_file = requests.get(
        url=f"{file_endpoint}/{file_link['result']['file_path']}", allow_redirects=True)

    # Since I am the only one using the bot, I set a static filename.
    # You might want to allocate the file name dynamically (hint: use the timestamp, etc.).
    with open(f'{vision_image_path}/vision.jpg', 'wb') as image_file:
        image_file.write(downloaded_file.content)
    detected_text = detect_text()

    # Telegram caps a message at 4096 characters, so send the text in chunks of 4000.
    x = 4000
    for start in range(0, len(detected_text), x):
        parameters = {'chat_id': chat_id,
                      'text': detected_text[start:start + x], 'reply_to_message_id': message_id}
        requests.post(url=f"{tg_endpoint}/sendMessage", data=parameters)
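
The two loose ends in the handler above, the static file name and the splitting of long replies, can be pulled out into small helpers. This is only a sketch: `unique_image_path` and `split_message` are hypothetical names, and the file name pattern is just one way of following the timestamp hint.

```python
import time


def unique_image_path(directory, chat_id, message_id):
    """Build a per-message file name from the IDs and a millisecond timestamp."""
    timestamp = int(time.time() * 1000)
    return f"{directory}/vision_{chat_id}_{message_id}_{timestamp}.jpg"


def split_message(text, limit=4000):
    """Split text into consecutive pieces of at most `limit` characters."""
    return [text[start:start + limit] for start in range(0, len(text), limit)]


chunks = split_message("a" * 9000)
print([len(chunk) for chunk in chunks])  # [4000, 4000, 1000]
```

To use a per-message path, `detect_text()` would have to accept the path as an argument instead of building it from the fixed `vision.jpg` name; that change is not shown here.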

Demo

This post is licensed under CC BY 4.0 by the author.