These threats highlight the urgent need to understand how such apps are disseminated at scale — particularly through Telegram's bot-assisted and channel-based infrastructure.
Introduction
In this project, we aim to systematically collect, extract, and identify Telegram URLs associated with underground app distribution through multi-stage automation, providing a foundation for further analysis.
Specifically, we aim to:
- **Discover** underground Telegram channels using search bots and keywords.
- **Extract** all URLs from the historical messages of those channels.
- **Identify** app download pages by analyzing URL content (e.g., download, Android, iOS).
- **Validate** and collect **at least 200 effective app download URLs**.
- **Deliver** a final report including:
  - The discovered channels
  - The validated download URLs
  - Technical workflow and key findings
Background: Telegram Ecosystem
Telegram is a cross-platform encrypted messaging app with **800M+ users**.
- It supports public and private **channels** and **groups**, allowing admins to broadcast content to large audiences.
- Telegram enables **anonymous interaction**, lacks **moderation**, and supports **mass forwarding**, making it ideal for underground distribution.
- It has evolved into a **cybercrime ecosystem** that includes money laundering, financial fraud, piracy, revenge porn, and the trade of personal information.
Background: Bots in Telegram
- Telegram’s Bot API allows developers to create bots for automation.
- "Search bots" offer keyword-triggered channel recommendations.
- Many **underground actors buy keyword slots** (e.g., “VPN”, “Baccarat”) to promote their channels in top ranks.
- Channels listed under paid results are **strategically boosted** to reach wider audiences.
Background: App Package
Android and iOS, the two dominant mobile operating systems, employ distinct app installation package formats and distribution mechanisms.
Android: Open but Vulnerable
- Uses .apk format (Android Package Kit)
- Users can sideload apps from unofficial sources (web links, Telegram, marketplaces)
iOS: Closed but Not Impenetrable
- Uses .ipa format (iOS App Store Package)
- Installation usually limited to App Store
- However, alternative channels exist:
  - TestFlight (for testing)
  - Enterprise Signing
  - WebClip (browser shortcut to apps)
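The packaging and distribution routes listed above can often be recognized from a link alone. The following is a minimal sketch, not the project's actual classifier: the function name and the returned labels are hypothetical, and the rules rely only on well-known conventions (the `.apk` and `.ipa` extensions, the `testflight.apple.com` host, the `itms-services://` scheme used for over-the-air enterprise installs, and `.mobileconfig` configuration profiles, which are one way a WebClip can be delivered).

```python
from urllib.parse import urlparse


def classify_install_link(url: str) -> str:
    """Return a coarse, hypothetical label for how a link would deliver an app."""
    parsed = urlparse(url.strip())
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()

    if scheme == "itms-services":
        return "ios-enterprise"       # over-the-air manifest install
    if host == "testflight.apple.com":
        return "ios-testflight"
    if host == "apps.apple.com":
        return "ios-app-store"
    if path.endswith(".apk"):
        return "android-apk"
    if path.endswith(".ipa"):
        return "ios-ipa"
    if path.endswith(".mobileconfig"):
        return "ios-config-profile"   # configuration profile, may carry a WebClip
    return "unknown"
```

A link that matches none of these rules is not necessarily harmless: many landing pages hide the real package behind a button, which is why the later workflow inspects the rendered page instead of relying on the URL.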
Workflow
Our workflow involves three key stages:
- **Identifying Relevant Channels:** We used a curated list of keywords with Telegram search bots to discover relevant channels.
- **Gathering Possible URLs:** We systematically collected and parsed the message history of the identified channels.
- **App Collection & Validation:** We opened the collected URLs in a headless browser, captured screenshots, and applied Optical Character Recognition (OCR) to identify app download pages.
Step 1: Collect Bots and Channels
- Registered a Telegram developer account and obtained *api_id* and *api_hash*:
  - Applied for API keys at [core.telegram.org](https://core.telegram.org/api/obtaining_api_id) and used [Telethon](https://docs.telethon.dev/en/stable/index.html) to interact with Telegram
- Collected the top 15 Telegram search bots (e.g., *@hao1234bot*)
- Compiled keywords in **Chinese and English**
> e.g., "VPN", "91", "Crack", "porn", "百家乐" (Baccarat)
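```python
import random

# Load the keyword list, dropping blank lines and duplicates
with open('300keywords.txt', 'r', encoding='utf-8') as f:
    keywords = list({line.strip() for line in f if line.strip()})
random.shuffle(keywords)
...
# `client`, `entity`, and `keyword_num` come from the elided code above
async with client.conversation(entity) as conv:
    words = random.sample(keywords, keyword_num)
    for word in words:
        try:
            await conv.send_message(word)
        except Exception as e:
            # Minimal handler so the fragment is complete; the original handling is not shown
            print(f'Failed to send {word!r}: {e}')
```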
Step 2: Extract URLs from Bots
- Used Telethon to collect all messages from each identified channel
- Parsed historical messages (limit: 1000 messages per channel)
- Extracted *t.me/...* links from message entities
- Avoided crawling outdated or inactive links
```python
import re
re.findall(r'https?://t\.me/\w+', message.message or '')  # guard against messages with no text
```
```python
# Adjust limit to control the number of messages to be processed
async for message in client.iter_messages(entity, limit=1000):
    try:
        # Only entities with a `url` attribute (text links) are captured here;
        # plain URLs written in the message text carry offset/length only
        for ent in message.entities or []:
            if hasattr(ent, 'url'):
                urls.append(ent.url)
    except Exception:
        print(f'Error processing message {message.id} in {entity.title}')
```
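The `\w+` pattern shown earlier stops at the first non-word character, so invite links such as `t.me/+HASH` or `t.me/joinchat/HASH` are cut short, and the same channel can appear several times with different casing or post suffixes. The sketch below shows one way to normalize and de-duplicate `t.me` links before resolving them. It is an illustration under assumptions, not the pipeline's actual code: `TME_LINK`, `extract_targets`, and the `invite` / `bot` / `public` labels are hypothetical names, and treating a username that ends in "bot" as a bot is only a heuristic.

```python
import re

# Invite links use t.me/+HASH or t.me/joinchat/HASH; public links use t.me/<username>,
# optionally behind the t.me/s/ web-preview prefix.
TME_LINK = re.compile(
    r"https?://t\.me/(?:(?:\+|joinchat/)(?P<hash>[A-Za-z0-9_-]+)"
    r"|(?:s/)?(?P<name>[A-Za-z0-9_]+))"
)


def extract_targets(text):
    """Return unique (kind, key) pairs for the t.me links found in text, in order."""
    seen = set()
    targets = []
    for match in TME_LINK.finditer(text or ""):
        if match.group("hash"):
            # Invite hashes are case-sensitive, so keep them unchanged
            target = ("invite", match.group("hash"))
        else:
            # Usernames are case-insensitive, so lowercase them for de-duplication
            name = match.group("name").lower()
            kind = "bot" if name.endswith("bot") else "public"
            target = (kind, name)
        if target not in seen:
            seen.add(target)
            targets.append(target)
    return targets
```

The returned `kind` is only a first guess from the URL shape. Whether a public name is a channel, a group, or a user still has to be confirmed after the entity is resolved.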
Step 3: Identify App Pages
- **Filtered** collected URLs to remove irrelevant or repeated domains
- Opened each URL using headless Chrome (Selenium)
- Captured **screenshots** after interacting with typical buttons like "进入" ("Enter") and "Continue"
- Applied OCR using **PaddleOCR**, extracting keywords like:
> download, iOS, apk, android, app
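
The filtering and keyword-matching steps above can be sketched without the browser or OCR engine. The code below is a minimal illustration under assumptions: `ocr_lines` stands for whatever list of recognized text strings the OCR step produces, the function names are hypothetical, and the `min_hits=2` threshold is an arbitrary choice that is not taken from the project. The keyword tuple mirrors the list above.

```python
from urllib.parse import urlparse

# Keywords listed above; matching is case-insensitive substring matching
APP_KEYWORDS = ("download", "ios", "apk", "android", "app")


def dedupe_by_domain(urls):
    """Keep the first URL seen for each host, ignoring a leading 'www.'."""
    seen = set()
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if not host or host in seen:
            continue  # skip unparseable strings and repeated domains
        seen.add(host)
        kept.append(url)
    return kept


def matched_keywords(ocr_lines, keywords=APP_KEYWORDS):
    """Return the keywords that occur anywhere in the recognized text."""
    text = " ".join(ocr_lines).lower()
    return [kw for kw in keywords if kw in text]


def looks_like_app_page(ocr_lines, min_hits=2):
    """Flag a page when at least min_hits distinct keywords are present."""
    return len(matched_keywords(ocr_lines)) >= min_hits
```

Requiring more than one hit is a deliberate choice in this sketch: a short keyword such as "app" also matches inside unrelated words like "happy", so a single match is weak evidence. Pages flagged this way would still need manual confirmation.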
Validity & Distribution
- Only 2.8% of the collected URLs were valid app download pages.
  - Most of the remaining URLs pointed to pornography websites, blockchain news sites, and similar content.
- Android is the dominant target platform (72.8%), possibly because of its openness and ease of sideloading, followed by iOS (16.6%); a small portion of URLs target both platforms.
Categories & Summary
- The majority of valid URLs distribute gambling and pornographic apps
- Pirated software accounts for 17.8%
- AI tools and Web3/blockchain apps make up 13.8%
Summary: Out of the entire dataset, only 2.8% of URLs were deemed valid, totaling 304 confirmed app distribution links. The vast majority of invalid URLs pointed to irrelevant sites, such as adult content aggregators or outdated blockchain portals.
Among the valid subset, distribution is dominated by gambling, pornography, and pirated software. This illustrates a clear trend in the underground ecosystem. Platform targeting remains heavily skewed toward Android, likely due to its permissive installation model.
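The two reported figures, 304 valid links and a 2.8% validity rate, together imply an approximate size for the full URL set, which the summary does not state directly. The short calculation below is a back-of-the-envelope estimate, not a reported result. It assumes that 2.8% is rounded to one decimal place and that both figures refer to the same set of URLs.

```python
import math

valid_urls = 304      # confirmed app distribution links (from the summary)
valid_share = 0.028   # 2.8%, assumed to be rounded to one decimal place

# Point estimate of the total number of URLs examined
point_estimate = round(valid_urls / valid_share)

# A share that rounds to 2.8% lies in [2.75%, 2.85%), which bounds the total
low = math.ceil(valid_urls / 0.0285)
high = math.floor(valid_urls / 0.0275)

print(f"Estimated total URLs: about {point_estimate} (between {low} and {high})")
```

Under these assumptions, the pipeline examined on the order of eleven thousand URLs to obtain the 304 valid links.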
Challenges Encountered
- **Bot Interaction Rate Limiting:** Aggressive keyword probing led to temporary bans due to suspected spamming. We mitigated this by introducing **randomized delays** (asyncio.sleep) and timeouts for unresponsive bots.
- **Entity Type Ambiguity:** A t.me/ link may refer to a Channel, Chat, or User entity, and Telethon APIs require the correct type for each call. We added **heuristic algorithms** to determine the type and ensure safe access.
- **Region Restrictions:** To counter regional restrictions, we employed VPN tunneling and alternate Apple ID environments to simulate compliant locales.
- **Mental Health:** The team was insulted by TUApp users and worn down by the sheer volume of URLs to review.
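The rate-limiting mitigation described above, randomized delays plus timeouts for unresponsive bots, can be sketched as follows. This is a hedged illustration and not the project's code: `probe_bot` and its parameters are hypothetical, the delay and timeout values are arbitrary, and `conv` is assumed to be any object with awaitable `send_message` and `get_response` methods, as a Telethon conversation provides.

```python
import asyncio
import random


async def probe_bot(conv, words, min_delay=3.0, max_delay=8.0, reply_timeout=15.0):
    """Send each keyword, wait for one reply, and pause a random interval between sends."""
    replies = {}
    for word in words:
        await conv.send_message(word)
        try:
            # Give up on this keyword if the bot stays silent for too long
            replies[word] = await asyncio.wait_for(
                conv.get_response(), timeout=reply_timeout
            )
        except asyncio.TimeoutError:
            replies[word] = None
        # A randomized pause makes the request pattern less regular
        await asyncio.sleep(random.uniform(min_delay, max_delay))
    return replies
```

Recording `None` for a timed-out keyword keeps the run going and leaves a trace of which probes failed, so they can be retried later at a slower pace.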
Conclusion
- We implemented an **automated pipeline** to identify Telegram channels promoting underground apps.
- Our method combined **bot-driven discovery**, **regex-based URL extraction**, **headless browser screenshots**, and **OCR-based keyword recognition**.
- This enabled us to extract and filter **over 300 high-signal URLs** linked to illicit distribution.
- These findings help uncover Telegram’s role in **unregulated app dissemination**, particularly in **Chinese-speaking contexts** where censorship drives alternative distribution.
- Future work can build on this pipeline by:
  - Extending coverage to **private channels, group chats, or interactive bots**
  - Incorporating **semantic analysis of message patterns** to detect coordination behavior
  - Leveraging **visual cues from screenshots** for automated content classification and risk assessment
References
- Y. Guo, D. Wang, L. Wang, Y. Fang, C. Wang, M. Yang, T. Liu, H. Wang. Beyond App Markets: Demystifying Underground Mobile App Distribution Via Telegram. ACM PACM IMC, Vol. 8, No. 3, Article 33, 2024.
- Telegram. Telegram Bot API. https://core.telegram.org/bots/api
- Lonami. Telethon: Pure Python MTProto Telegram Client. https://github.com/LonamiWebs/Telethon
- Apple Inc. A Threat Analysis of Sideloading.
- Sixgill. Telegram: A Cybercriminal Hotspot.
- 10Guards. Is Telegram Turning into a Hub for Cybercrime?
- Kaspersky. Trojan.AndroidOS.Piom.bbdw.