Leaked secrets on GitHub rose by 28% in 2023, report reveals
The 2024 edition of the State of Secrets Sprawl report by GitGuardian has revealed that the number of new secret occurrences leaked publicly on GitHub in 2023 was up 28% from the previous year, reaching 12.8 million. Notably, incidents of publicly exposed secrets have increased fourfold since the company first began reporting in 2021.
The report found that the growing number of code repositories on GitHub, which added 50 million new repositories last year (+22%), has increased the risk of accidental and deliberate exposure of sensitive data. In 2023 alone, GitGuardian detected more than 1 million valid occurrences of Google API secrets, 250,000 Google Cloud secrets, and 140,000 AWS secrets.
While the IT sector was the most affected industry, with 65.9% of the total detected leaks, other sectors like Education, Science & Tech, Retail, Manufacturing, and Finance & Insurance were also impacted, accounting for a collective 30.8% of all leaks.
The research also highlighted a major security gap: 90% of exposed valid secrets remain active for at least five days after the author is notified. This mainly affects API keys and authentication tokens for major service providers such as Cloudflare, AWS, OpenAI, and even GitHub.
"Developers erasing leaky commits or repositories instead of revoking are creating a major security risk for companies, which will remain vulnerable to threat actors mirroring public GitHub activity for as long as the credential remains valid. These zombie leaks are the worst," said Eric Fourrier, CEO and Founder of GitGuardian.
The study showed that only 28.2% of repositories that had hosted erased commits exposing a secret were still accessible at the time of the study, indicating that the rest were likely deleted or made private in reaction to the leak. The report also found that in 2023, 12.4% of 2,050 repositories taken down by GitHub exposed at least one secret, representing a 37.8% increase from 2020.
Fourrier emphasised the importance of proactive measures, stating, "The Toyota breach in 2022, which occurred after a hacker obtained credentials for one of its servers from source code published on GitHub, is proof that even five years after a leak, a compromise can still happen."
The report also shed light on topics such as the potential use of LLMs as an alternative to traditional secret detection tools and revealed that 3.11% of secrets leaked in private repositories were also exposed in public ones. "This dismantles the idea that relying on the privacy of source code as a security layer is a valid strategy," added Fourrier.
The study further analysed the pervasiveness of leaked secrets within PyPI, the official third-party package repository for the Python community, and found that in 2023, 11,054 unique secrets were exposed in package releases.
The report concluded with a set of recommendations for tackling secrets sprawl. It emphasised the need for a balanced mix of awareness, training, and effective automated processes, along with discovery tools and robust controls.