Scrape Instagram images/videos using Termux

Termux is a cute, efficient little Linux environment for Android.

It has a very good package repo, and you can find and install almost everything. At least in my case, there were no uh-oh moments from a missing package, be it python, ffmpeg, etc. It differs from a regular Linux environment in some ways, but those differences can be easily worked around with proot.

I installed it today at 22:00 and, from a quick glance at the wiki, found that it supports broadcast intents. That means you can snatch the intents from almost any app and parse the data to do anything in a Linux environment on your Android phone! 😉 Now, how cool is that!

I had been thinking of writing a hook to capture Instagram/Facebook/YouTube links and download data according to my requirements (video, audio only, specific images from Instagram carousels, etc.). Instead of relying on third-party websites or apps infested with ads and malware, I could script it and run it in Termux in less than a second!
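As a rough sketch of what the first step of such a hook could look like: the snippet below guesses which site a shared link belongs to. `classify_url` and its labels are names made up for this illustration, and the host patterns are assumptions rather than a complete list.

```shell
# Hypothetical helper: guess which site a shared link belongs to.
classify_url() {
    case "$1" in
        *youtube.com/*|*youtu.be/*)  echo "youtube" ;;
        *instagram.com/*)            echo "instagram" ;;
        *facebook.com/*|*fb.watch/*) echo "facebook" ;;
        *)                           echo "other" ;;
    esac
}

# The shared link is expected as the first argument, like url=$1 in the script.
if [ -n "${1:-}" ]; then
    echo "Shared link looks like: $(classify_url "$1")"
fi
```

If I read the Termux wiki right, a script saved as `~/bin/termux-url-opener` gets the shared link as its first argument, so a check like this could pick a sensible default before asking the user anything.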

Apart from that silly stuff, I noticed that if you acquire a wakelock for the app on Android (RIP battery?), it can run cron jobs, which is super cool! That could even mean setting up a simple web server or a Flask app that listens to webhooks (ahem, Telegram), which would be super productive and helpful.
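To go a bit easier on the battery, one option is to hold the wakelock only while a job is actually running. This is a minimal sketch, not something I have battle-tested: `run_with_wakelock` is a hypothetical wrapper name, and it assumes the `termux-wake-lock` and `termux-wake-unlock` commands are on the PATH (it simply skips them when they are not).

```shell
# Hypothetical wrapper: hold a wakelock only while the given command runs.
run_with_wakelock() {
    # Acquire the lock if the Termux helper is available (skipped elsewhere).
    if command -v termux-wake-lock >/dev/null 2>&1; then
        termux-wake-lock
    fi

    # Run the job exactly as passed in and remember its exit status.
    "$@"
    status=$?

    # Release the lock again so the phone can go back to sleep.
    if command -v termux-wake-unlock >/dev/null 2>&1; then
        termux-wake-unlock
    fi
    return "$status"
}
```

A cron entry could then call something like `run_with_wakelock ~/rclone/sync.sh` instead of keeping the lock around all day.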

More testing/research in the coming days!

Here’s my bash script, attempt number 01 (it’s not a well-thought-out or efficient solution, rather a lazy attempt; I will probably refactor it when I have time, or if it breaks when Instagram updates something in the backend).

Straight to the code

#!/data/data/com.termux/files/usr/bin/bash
url=$1
# Ask what to do with the shared link; termux-dialog prints the answer as JSON.
userChoice=$(termux-dialog radio -v "YouTube Video,YouTube Audio,Instagram Image/Carousel Video,Instagram Reels/Video Story,Facebook Video,Other Files Downloads" -t Select)
userChoiceExists=$(echo "$userChoice" | jq -r '.text // empty')

if [ -z "$userChoiceExists" ]
then
      echo "See ya!"
else
      userChoiceIndex=$(echo "$userChoice" | jq -r '.index')
      case $userChoiceIndex in
            0)
                echo "Will try to download 480p if available or else next available best video 720 or 1080 or 4K"
                yt-dlp -f "bv*[height<=480]+ba/b[height<=480] / wv*+ba/w" -o "/storage/emulated/0/nevins/YouTube/%(title)s.%(ext)s" "$url"
                ;;
            1)
                yt-dlp --extract-audio --audio-format mp3 --embed-thumbnail --output "/storage/emulated/0/nevins/YouTube/Music/%(title)s.%(ext)s" "$url"
                ;;
            2)
                imageUrl=$url
                # Drop the query string, then ask Instagram for the post as JSON.
                cropped=$(echo "$imageUrl" | sed 's/?.*//')
                tempSt="?__a=1&__d=dis"
                cropped="$cropped$tempSt"
                json=$(curl -H 'cookie: mid=; ig_nrcb=1; ig_did=; datr=; =base_domain=.instagram.com; csrftoken=1L; ds_user_id=; rur=""' -X GET "$cropped")
                carouselExists=$(echo "$json" | jq '.items[0] | has("carousel_media_count")')
                videoOnlySinglePost=$(echo "$json" | jq '.items[0] | has("video_versions")')
                # -r prints the username without the surrounding quotes.
                dirSuffix=$(echo "$json" | jq -r '.items[0].user.username')
                if $videoOnlySinglePost; then
                    mkdir -p "/storage/emulated/0/nevins/Instagram/Videos/$dirSuffix"
                else
                    mkdir -p "/storage/emulated/0/nevins/Instagram/Pictures/$dirSuffix"
                fi

                if $carouselExists; then
                    echo "Fuck yeah, let's get all of em!"
                    # Offer one checkbox per carousel item: "0,1,...,n-1".
                    confirmation=''
                    limit=$(echo "$json" | jq '.items[0].carousel_media_count')
                    for (( c=0; c<limit; c++ ))
                    do
                        confirmation="$confirmation$c,"
                    done
                    userPrompt=$(termux-dialog checkbox -v "${confirmation::-1}")
                    userWantsToDownload=$(echo "$userPrompt" | jq '.code')
                    # The script treats code -1 as "user confirmed the dialog".
                    if [[ $userWantsToDownload -eq -1 ]] ; then
                        echo "$userPrompt" | jq -r '.values[].index' | while read -r index; do
                            # Pull the selected carousel item once and reuse it.
                            item=$(echo "$json" | jq --argjson index "$index" '.items[0].carousel_media[$index]')
                            media_type=$(echo "$item" | jq -r '.media_type')
                            rdm=$(openssl rand -hex 12)
                            if [[ $media_type -eq 1 ]] ; then
                                # media_type 1: image
                                fileName="/storage/emulated/0/nevins/Instagram/Pictures/$dirSuffix"
                                mkdir -p "$fileName"
                                mediaUrl=$(echo "$item" | jq -r '.image_versions2.candidates[0].url')
                                curl -o "$fileName/$rdm.jpg" -J -L "$mediaUrl"
                            elif [[ $media_type -eq 2 ]] ; then
                                # media_type 2: video
                                fileName="/storage/emulated/0/nevins/Instagram/Videos/$dirSuffix"
                                mkdir -p "$fileName"
                                mediaUrl=$(echo "$item" | jq -r '.video_versions[0].url')
                                curl -o "$fileName/$rdm.mp4" -J -L "$mediaUrl"
                            else
                                echo "Unable to determine the media type..."
                                termux-toast -g bottom "Unable to determine the media type..."
                            fi
                        done
                    fi
                    confirmation=''

                elif $videoOnlySinglePost; then
                    fileName="/storage/emulated/0/nevins/Instagram/Videos/$dirSuffix"
                    mkdir -p "$fileName"
                    rdm=$(openssl rand -hex 12)
                    # -r prints the URL without the surrounding quotes.
                    singleDownload=$(echo "$json" | jq -r '.items[0].video_versions[0].url')

                    curl -o "$fileName/$rdm.mp4" -J -L "$singleDownload"

                else
                    fileName="/storage/emulated/0/nevins/Instagram/Pictures/$dirSuffix"
                    rdm=$(openssl rand -hex 12)
                    singleDownload=$(echo "$json" | jq -r '.items[0].image_versions2.candidates[0].url')

                    curl -o "$fileName/$rdm.jpg" -J -L "$singleDownload"
                fi
                # Run my rclone sync script once the download is done.
                /data/data/com.termux/files/home/rclone/sync.sh
                ;;
            3)
                yt-dlp --cookies "/data/data/com.termux/files/home/insta/cookies.txt" -P "/storage/emulated/0/nevins/Instagram/Videos/" "$url"
                ;;
            4)
                yt-dlp -o "/storage/emulated/0/nevins/Facebook/Videos/%(id)s.%(ext)s" "$url"
                ;;
            5)
                yt-dlp -o "/storage/emulated/0/nevins/%(id)s.%(ext)s" "$url"
                ;;
      esac
fi
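
Two of the string tricks in the Instagram branch are easier to see in isolation. The sketch below pulls them out into standalone helpers; `to_json_url` and `index_list` are hypothetical names that do not appear in the script, and the `?__a=1&__d=dis` suffix is simply the one the script appends (whether Instagram keeps honoring it is another matter).

```shell
# Strip any query string from a post link and append the JSON suffix.
to_json_url() {
    # ${1%%\?*} removes everything from the first "?" onwards.
    printf '%s%s\n' "${1%%\?*}" '?__a=1&__d=dis'
}

# Build the "0,1,...,n-1" list passed to termux-dialog checkbox -v.
index_list() {
    list=''
    i=0
    while [ "$i" -lt "$1" ]; do
        list="$list$i,"
        i=$((i + 1))
    done
    # Drop the trailing comma.
    printf '%s\n' "${list%,}"
}

to_json_url 'https://www.instagram.com/p/abc123/?igshid=xyz'
# prints https://www.instagram.com/p/abc123/?__a=1&__d=dis
index_list 3
# prints 0,1,2
```

Doing the query-string strip with parameter expansion avoids spawning `sed`, though the `sed` version in the script works just as well.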

Demo

This post is licensed under CC BY 4.0 by the author.