Country-specific OSINT can help us decipher the intricate tapestry of societies around the globe. Collecting and analysing information from these unique online ecosystems allows us to uncover insights that foster partnerships and business opportunities, and helps shape assessments of events around the world.
When undertaking country-specific OSINT, it is useful to adjust our methods to cater to specific areas and native language users. Of course, when collecting any information as part of any open source task, we should always consider our attribution, particularly when we are not as familiar with different online environments.
In this blog, which follows a recent webinar, we will look at the Chinese internet. It is shaped by a unique interplay between technological innovation, governance, and distinct cultural characteristics. This interplay shapes how people interact, consume information, and conduct business online, setting it apart from other digital ecosystems around the world.
Whether gathering information on the Chinese internet or other country-specific online sources, it's vital to recognise the forces that shape the information you might encounter. Understanding these forces enables you to make better use of what you’ve collected. By grasping the 'why' behind the information, you can more effectively address the underlying question of 'so what'.
Technological innovation
China started marketing the internet to the public in the mid-1990s. By 2000, Chinese information technology company Sina was listed on the NASDAQ. As early as 2005, a group of Tsinghua University students were working on the prototype of an online social media platform called RenRen – almost a year before Facebook became widely available in the United States.
Technological innovation was also apparent in the realms of e-commerce and online payments. As PayPal made its debut in 1998, Chinese technology company Alibaba was introducing online marketplaces and business-to-business (B2B) e-commerce. Fast forward to 2011, Chinese technology company Tencent released WeChat, an all-encompassing application integrating messaging, social networking, and mobile payments.
Governance
Many governments around the world have struggled to regulate the internet. In 2000, during a televised interview, US President Bill Clinton said that regulating the internet was “sort of like trying to nail Jello to the wall”. However, China was able to regulate the internet before many western countries due, in part, to the centralisation of power within the Chinese Communist Party (CCP). This allowed China to set standards, regulations, and legislation for the internet early on. Most of these remain in place to this day, due to their enduring ideological foundations.
During a speech in 2012, Chinese President Xi Jinping emphasised the importance of China's historical contribution to global civilisations and the role of the CCP in restoring China to global dominance during a period of national rejuvenation (1949-2049). Since then, the CCP has pursued rejuvenation through a variety of means, including propaganda enforced through censorship.
In China, propaganda does not necessarily carry negative connotations like bias or political agenda; it is seen as the spreading of an ideology as widely as possible. Censorship is the mechanism that ensures the CCP’s political agenda is propagated without interference.
The primary instruments of censorship in China include standards, regulations, and legislation. In 2022, a regulation was issued which makes Chinese internet information service providers responsible for moderating all content they host. Chinese companies hosting online content now use technology and people to monitor and moderate internet content. This impacts the type of information we can access from an OSINT perspective.
Infrastructure projects have also played a significant role in censoring the Chinese internet more broadly. The Golden Shield project gave rise to what western countries commonly refer to as the “Great Firewall of China” (防火长城). The Great Firewall describes a collection of tools, services and rules used to block certain internet content and connections.
Over time, the Great Firewall has evolved and can now filter connections, redirect internet traffic, degrade online services and intercept online communications. The Great Firewall makes accessing certain Chinese content impossible from some IP addresses, usually IP addresses that resolve to locations outside mainland China. There are some practical ways of testing whether a website is blocked by the Great Firewall, such as the ViewDNS.info Chinese Firewall Test.
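To make the idea of a blocking test more concrete, here is a minimal sketch of one signal such tests can use: comparing the DNS answers returned for a domain by a resolver outside China with those returned by a resolver inside mainland China. This is our own illustration, not a description of how any particular tool works. The function name and the input lists are hypothetical, the IP addresses are reserved documentation addresses, and the sketch does not perform any lookups itself.

```python
from ipaddress import ip_address


def compare_dns_answers(reference_ips, mainland_ips):
    """Compare two lists of DNS answers for the same domain.

    reference_ips: answers from a resolver outside China (hypothetical input).
    mainland_ips: answers from a resolver inside mainland China (hypothetical input).
    """
    reference = {ip_address(ip) for ip in reference_ips}
    mainland = {ip_address(ip) for ip in mainland_ips}

    if not mainland:
        # The mainland resolver returned nothing for this domain.
        return "no mainland answer"
    if reference & mainland:
        # At least one address is shared, so the answers are consistent.
        return "answers overlap"
    # No shared address: worth a closer look, but not proof of blocking.
    return "answers differ"


print(compare_dns_answers(["192.0.2.10"], ["203.0.113.5"]))
```

Differing answers are only an indicator. Content delivery networks routinely return different addresses in different regions, so a result like this should prompt further checks rather than a conclusion.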
Understanding and using euphemisms
The Chinese language lends itself to euphemism. Mandarin is a tonal language, meaning the same syllable can carry different meanings depending on the tone used, so many words sound alike. In written Simplified Chinese, these similar-sounding words are distinguished by different characters. Over time, many Chinese internet users have adopted euphemisms as a way to circumvent censorship.
For example, the Chinese Government has used its goal of constructing a “harmonious society” (和谐社会) to justify censorship measures. In response, some Chinese netizens started to use the word “harmony” as a euphemism for censorship. They did this by substituting the characters for “harmonious” with those for “river crab”.
These two words sound nearly the same. In Chinese, “river crab” is pronounced héxiè and “harmonious” is héxié. The difference lies in the written characters: “harmonious” is 和谐 and “river crab” is 河蟹, as shown in the example from Chinosity.com.
There are a number of resources which list common Chinese euphemisms used online. We have added our favourites to the OSINT Combine Bookmarks. Knowing about euphemisms is important because it allows us to account for cultural nuance in the information we collect and our analysis of the data.
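As a rough illustration of how euphemisms can be accounted for during collection and analysis, the sketch below expands a search keyword with known substitutes and flags euphemisms that appear in collected text. The lookup table is hypothetical and holds only the “river crab” (河蟹) for “harmonious” (和谐) pair discussed above; in practice it would be populated from a curated euphemism list. The function names are our own.

```python
# Hypothetical lookup table: intended term -> euphemisms used in its place.
# Only the "harmonious" / "river crab" pair comes from this post.
EUPHEMISMS = {
    "和谐": ["河蟹"],
}


def expand_keywords(keyword, table=EUPHEMISMS):
    """Return the keyword followed by any known euphemisms for it."""
    return [keyword] + table.get(keyword, [])


def find_euphemisms(text, table=EUPHEMISMS):
    """Return (euphemism, intended term) pairs for euphemisms found in text."""
    hits = []
    for term, variants in table.items():
        for variant in variants:
            if variant in text:
                hits.append((variant, term))
    return hits


print(expand_keywords("和谐"))
print(find_euphemisms("这个帖子被河蟹了"))
```

Searching on every expanded keyword, and annotating hits with the intended term, helps keep the cultural nuance attached to the data as it moves from collection into analysis.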
Translation tools
Do we need to know Chinese to search the Chinese internet? Today, the answer is "no". Machine translation tools enabled by artificial intelligence are rapidly improving. Many are now capable of understanding context and cultural nuance. Importantly, machine translation offers speed and cost-efficiency, making it ideal for projects with quick turnarounds. We have listed some of our favourites below:
| | DeepL | Yandex | Baidu Fanyi | Google Translate |
| --- | --- | --- | --- | --- |
| HQ | Germany | Russia | China | U.S. |
| Character limit for free text translations | 5,000 | 10,000 | 1,000 | 5,000 |
| Languages supported | 27 | 98 | 200 | 133 |
| What can be translated | Text / files | Various | Limited (free version) | Various |
| Browser / Mobile compatible | Yes | Yes | Yes | Yes |
When using machine translation tools, it is important to understand that each tool is unique. Some offer limited features unless you create an account while others will only translate content if the input matches certain criteria. For example, DeepL has a 5,000-character limit when inputting plain text and Yandex can only translate text from images if the image is over a certain resolution.
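Character limits matter most when we are translating long passages. The sketch below shows one way of splitting collected text into chunks that fit a tool's free-text limit, breaking on sentence-ending punctuation where possible. The limits are the figures from the comparison above and may have changed since publication. The function name is our own, and we assume each tool counts characters the same way Python's `len()` does, which may not hold exactly.

```python
import re

# Free-text character limits from the comparison above (may be out of date).
CHAR_LIMITS = {
    "DeepL": 5000,
    "Yandex": 10000,
    "Baidu Fanyi": 1000,
    "Google Translate": 5000,
}


def split_for_tool(text, tool, limits=CHAR_LIMITS):
    """Split text into chunks no longer than the chosen tool's limit."""
    limit = limits[tool]
    chunks, current = [], ""
    # Split after Chinese or Western sentence-ending punctuation.
    for sentence in re.split(r"(?<=[。！？!?])", text):
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk if this sentence would overflow the current one.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


sample = "你好。" * 400  # 1,200 characters
print([len(chunk) for chunk in split_for_tool(sample, "Baidu Fanyi")])
```

Splitting on sentence boundaries rather than at a fixed offset gives the translation tool complete sentences to work with, which should help it retain context.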
Most importantly, when translating from Chinese into English, machine translation tools will often give you different outputs. It is therefore worth using multiple tools for the same translation. This discrepancy means we should only use machine translation tools to gain a general understanding. If we are going to rely on a translation for any business-critical assessments, it is always worth seeking professional language support. See our previous blog, The Transliteration Problem in OSINT, which discusses this further.
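One simple way to act on this is to compare the outputs from several tools and flag passages where they disagree. The sketch below is an illustration only: it scores each pair of translations with Python's `difflib.SequenceMatcher` and flags the passage for review if any pair falls below a threshold. The tool outputs are made-up strings, the function names are our own, and the 0.8 threshold is an arbitrary assumption that would need tuning.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_agreement(outputs):
    """Return a similarity ratio (0.0 to 1.0) for each pair of tool outputs."""
    scores = {}
    for (tool_a, text_a), (tool_b, text_b) in combinations(outputs.items(), 2):
        matcher = SequenceMatcher(None, text_a.lower(), text_b.lower())
        scores[(tool_a, tool_b)] = matcher.ratio()
    return scores


def needs_review(outputs, threshold=0.8):
    """Flag a passage when any two translations differ substantially."""
    return any(score < threshold for score in pairwise_agreement(outputs).values())


# Made-up outputs for the same source sentence.
outputs = {
    "tool_a": "The post was censored.",
    "tool_b": "River crabs ate the article.",
}
print(needs_review(outputs))
```

String similarity is a blunt measure: two translations can be worded differently and still mean the same thing. A flag here is a prompt for a closer look or for professional language support, not a judgement on which tool is right.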
Typing in Simplified Chinese
As of 2023, China had more than 1 billion internet users. Given the sheer volume of content on the Chinese internet, there is a significant amount of variation in quality across China’s digital domain.
We may be able to directly copy or scrape content; however, this is not always possible. Having the Simplified Chinese language pack installed allows us to type pinyin into a Word document. This will then give us multiple possible Chinese character representations for the keyword we are trying to collect. If we compare these candidates with the word on the website, we can select the representation in our document that matches. In seconds, we have accurately rendered the word from the Chinese website into a text-based output we can use.
See the short video for instructions to download and install the Simplified Chinese language pack on a Windows device:
Where to search for content
Different search engines or archives have different strengths. For example, some index public WeChat conversations, while others give preference to mobile-friendly sites. It’s best to be aware of the unique features of any search tool that you’re using. Google is often our go-to for most (non-China related) OSINT enquiries, but Google isn’t overly popular in China. However, its Advanced Search operators can be useful for country- or region-specific filtering. Below, we have listed some features to be aware of across a range of different Chinese search engines.
Baidu (百度)
Baidu is the most popular search engine in China.
In 2022, over 80% of China’s internet users conducted searches using Baidu.
Boolean searching works.
Only Chinese-language websites rank on Baidu.
Baidu prioritises the indexing of mobile-friendly websites.
Sogou (搜狗)
The only search engine to index content from public WeChat accounts.
Sogou uses Bing to search for English results, then translates back into Chinese.
Company statements indicate Sogou is focused on developing artificial intelligence and natural language processing supporting technology.
Qihoo 360 (奇虎), also known as Haosou or 360
360 has a “trends” tool which allows users to search for trends associated with keywords.
In October 2022, the U.S. Government added 360 to a list of "Chinese military companies" operating in the U.S.
Each search engine referred to above will offer different results for the same keyword or search query. It is therefore worth using a combination of search engines depending on where you want to draw results from. For example, if you are interested in the social media presence of an organisation's website, you may wish to:
Use Google to get western social media results.
Use Baidu to capture social media references from the official website.
Use Sogou to capture publicly available WeChat content.
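The three steps above can be sketched as a small helper that builds one search URL per engine for the same keyword. Treat this as a sketch under stated assumptions: the endpoint paths and parameter names (`q`, `wd`, `query`, `type`) reflect how these public search pages have commonly been addressed and may change, the `site:` filter is applied only to Google and Baidu, and the function and dictionary names are our own.

```python
from urllib.parse import urlencode

# Assumed search endpoints and query parameter names; verify before relying on them.
SEARCH_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q"),
    "baidu": ("https://www.baidu.com/s", "wd"),
    "sogou_wechat": ("https://weixin.sogou.com/weixin", "query"),
}


def build_search_urls(keyword, site=None):
    """Build one search URL per engine for the same keyword."""
    urls = {}
    for name, (base, param) in SEARCH_ENDPOINTS.items():
        query = keyword
        # Restrict Google and Baidu to a single website when one is given.
        if site and name in ("google", "baidu"):
            query = f"{keyword} site:{site}"
        params = {param: query}
        if name == "sogou_wechat":
            params["type"] = "2"  # assumed to select article results
        # urlencode percent-encodes Chinese characters as UTF-8.
        urls[name] = f"{base}?{urlencode(params)}"
    return urls


for engine, url in build_search_urls("河蟹", site="example.com").items():
    print(engine, url)
```

Opening each URL from an appropriately managed browser keeps the workflow manual while making sure the same keyword is used consistently across engines.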
Key Takeaways
Tailoring OSINT methods for specific countries enhances the value of collected data, offering insights into unique online ecosystems.
Grasping the cultural, governance, and technological factors of a region like China enables better interpretation and application of collected data.
Machine translation tools have made it possible to search the Chinese internet without knowing the language. However, they have limitations, such as producing varied outputs. This makes them suitable for gaining a general understanding, but for critical assessments you may require professional language support.
Different Chinese search engines like Baidu, Sogou, and Qihoo 360 have unique features and data access points. For effective information gathering, using a combination of these along with Western search engines is advisable depending on the specific information sought.
If you would like to dive deeper into country-specific OSINT, check out our three-day course, The OSINT Redbook: Investigating with the Chinese internet.