Yandex leak includes source code for popular Russian search engine

Facepalm: As the world’s fourth-largest search engine, Yandex is a true tech giant offering many digital or digitally enhanced services. The company was recently involved in a security incident that is likely to yield interesting results, at least for the SEO market.
Almost 50 gigabytes of stolen data from Yandex services was recently posted online. The company is trying to play down the leak, but the source code distributed via torrent can reveal a lot of useful information about how its services – and, in particular, the search engine – actually work.
The leak occurred on January 25 and involved files that appear to have been stolen in July 2022 from a repository dating back to February 2022, the month Russia launched its full-scale invasion of Ukraine. The torrent does not appear to contain any data (or pre-built binaries) other than the source code of all major Yandex services, including the search engine with its indexing bot, Maps (the Russian counterpart to Google Maps and Street View), the Uber-like Taxi service, Mail, Market (an Amazon alternative), the cloud platform and more.
According to software engineer Arseniy Shestakov, the leak is a big deal. “Imagine one company” that can replace Google, Uber, Amazon, Netflix and Spotify at once, the coder said. The leak also appears to be genuine: Shestakov spoke to various people who worked at the company (or still work there) and said that some of the archives contain “up-to-date source code” for Yandex services, along with documentation pointing to real intranet URLs.
One of the most interesting – and potentially dangerous – aspects of the leak is the source code of the Yandex search engine, namely the ranking factors used by the algorithm to provide results for users’ search queries. The leak lists 1,922 unique ranking factors, most of which are marked as deprecated and have probably been replaced in the latest versions of the Yandex code.
The first ranking factor used by the Russian search engine is “PAGE_RANK”, which is a clear reference to the most important algorithm used by Google to rank web pages. As for Yandex’s own web search, the leaked algorithm seems to favor pages that aren’t too outdated, have a lot of organic traffic (i.e. unique visitors), are code-optimized and hosted on trusted servers, or are Wikipedia pages.
The Yandex leak certainly gives SEOs a lot of insight into how a world-class search engine actually works, although the security implications are unlikely to be as dramatic. Shestakov said no personal data was involved, and the few API keys found were probably used only for testing.
Yandex’s official press release on the incident stated that the leaked code snippets were “outdated and different from the version currently in use” by its services, and that some of the published snippets were “never actually used in operations”.
The company is still investigating the seemingly politically motivated incident and will take all possible steps to improve management oversight to ensure there are no more leaks in the future.