Description
The robots.txt file instructs cooperating web robots which parts of a website they should not crawl. Because the file is publicly readable and compliance is purely voluntary, malicious actors can use it to identify restricted or sensitive areas and target them for unauthorized access, email harvesting, spamming, or vulnerability scanning.
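To illustrate the risk, the sketch below shows how trivially the Disallow paths advertised by a robots.txt file can be enumerated. The file content and paths are invented for demonstration; an attacker would simply fetch `/robots.txt` from the target site instead.

```python
# Hypothetical sample content, as it might be fetched from /robots.txt.
sample_robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /internal-api/
Allow: /public/
"""

def extract_disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path listed in a Disallow directive."""
    paths = []
    for line in robots_txt.splitlines():
        # Strip comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow value means "allow everything"
                paths.append(path)
    return paths

print(extract_disallowed_paths(sample_robots_txt))
# → ['/admin/', '/backup/', '/internal-api/']
```

No special tooling is required: every Disallow entry is a direct pointer to a location the site operator considered worth hiding.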
Recommendation
To mitigate risks associated with robots.txt exposure, avoid listing sensitive paths or directories in the file. Remember that robots.txt is advisory only: protect sensitive resources with authentication and access controls rather than relying on the file to hide them. Regularly review and update the file to ensure it doesn't inadvertently disclose information that could be exploited by malicious actors.
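As an illustration (paths invented), compare a robots.txt that advertises the exact location of a restricted area with one that does not:

```text
# Risky: reveals the location of restricted content
User-agent: *
Disallow: /admin-portal/
Disallow: /db-backups/

# Safer: keep sensitive content behind authentication and out of robots.txt;
# if crawling must be blocked broadly, a blanket rule discloses nothing specific
User-agent: *
Disallow: /
```

Where individual pages must be kept out of search results without naming them in robots.txt, the `X-Robots-Tag: noindex` response header on those pages is a common alternative.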
References
- Is your robots.txt file vulnerable? Here's how to check and secure it
- Wikipedia: Robots exclusion standard
- CWE-200: Exposure of Sensitive Information to an Unauthorized Actor
- CAPEC-118: Collect and Analyze Information
- OWASP Top 10 2021: A05 Security Misconfiguration