Add LibreOffice security advisories importer #2210
NucleiAv wants to merge 5 commits into aboutcode-org:main
Fetches CVE IDs from the LibreOffice advisory listing page and retrieves structured data (CVSS, CWE, references, dates) from the CVE 5.0 JSON API at cveawg.mitre.org. Fixes: aboutcode-org#1898 Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
Remove advisories.html fixture in favour of inline ADVISORY_HTML constant. Drop dead mock attributes and _make_resp helper. Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
Replace local re.findall CVE regex with the shared find_all_cve utility. Normalise to uppercase before dedup to handle IGNORECASE matches from both href and link text. Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
logger = logging.getLogger(__name__)

ADVISORIES_URL = "https://www.libreoffice.org/about-us/security/advisories/"
CVE_API_URL = "https://cveawg.mitre.org/api/cve/{cve_id}"
@NucleiAv This is incorrect. You are using two data sources: https://cveawg.mitre.org and https://www.libreoffice.org/about-us/security/advisories/. We should only use https://www.libreoffice.org/about-us/security/advisories/.
If https://www.libreoffice.org/about-us/security/advisories/ does not provide an API (feel free to do a deep search to confirm this), you should parse the HTML instead. Please take a look at other importers, such as the nginx importer: nginx_importer_v2.NginxImporterPipeline.
@ziadhany Sure, I will incorporate the changes. My concern was that HTML parsing breaks if the website layout changes. Moreover, the LibreOffice website does not provide details such as the CVSS score, CVSS version (2.0, 3.x, 4.0), severity, or CWEs. To populate those fields I chose the API approach. I will check again whether LibreOffice provides an API and, if it does not, I will modify the code to parse the HTML.
(See below: the website gives no details on CVSS, CWE, etc.)

Parse advisory listing and individual advisory pages directly from libreoffice.org instead of calling cveawg.mitre.org. Drop unused JSON fixtures and update tests accordingly. Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
@ziadhany I looked again but could not find any API, so I have switched to HTML parsing with BeautifulSoup, following the same pattern as the nginx importer. The importer now fetches the listing page to extract advisory URLs, then fetches each individual advisory page and parses the available fields: CVE ID, title, announced date, description, and references. As noted above, a few fields are not available on LibreOffice's pages and will remain empty: CVSS versions, CVSS scores, severity ratings, CWE IDs, and affected version ranges (LibreOffice only lists the fixed version). If those are needed in the future, NVD enrichment would be a separate step.
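A minimal sketch of the listing-page step described in the comment above, not the code in this PR. It assumes that advisory links on the listing page contain `/advisories/cve-` in their `href`; the real page layout may differ. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra dependencies. The names `AdvisoryLinkParser`, `extract_advisory_urls`, and `SAMPLE_LISTING_HTML` are hypothetical, and the CVE IDs in the sample are made-up placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

ADVISORIES_URL = "https://www.libreoffice.org/about-us/security/advisories/"

# Hypothetical listing markup with placeholder CVE IDs; not copied from the real site.
SAMPLE_LISTING_HTML = """
<ul>
  <li><a href="/about-us/security/advisories/cve-2099-0001/">CVE-2099-0001</a> Example title</li>
  <li><a href="/about-us/security/advisories/CVE-2099-0002/">CVE-2099-0002</a> Another title</li>
  <li><a href="/about-us/security/advisories/cve-2099-0001/">duplicate link</a></li>
  <li><a href="/download/">Download</a></li>
</ul>
"""


class AdvisoryLinkParser(HTMLParser):
    """Collect hrefs of anchors that look like individual advisory pages."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: advisory pages live under ".../advisories/cve-<id>/".
        if "/advisories/cve-" in href.lower():
            self.hrefs.append(href)


def extract_advisory_urls(listing_html, base_url=ADVISORIES_URL):
    """Return absolute advisory URLs in page order, without duplicates."""
    parser = AdvisoryLinkParser()
    parser.feed(listing_html)
    seen = set()
    urls = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        key = url.lower()  # compare case-insensitively so repeated links collapse
        if key not in seen:
            seen.add(key)
            urls.append(url)
    return urls


if __name__ == "__main__":
    for url in extract_advisory_urls(SAMPLE_LISTING_HTML):
        print(url)
```

Each returned URL would then be fetched and parsed for the fields listed above. That step depends on the markup of the advisory pages, so it is left out here.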
Adds a pipeline importer for LibreOffice security advisories.
Rather than scraping individual advisory pages with BeautifulSoup, which is brittle and breaks whenever the site layout changes, I took a different approach. The importer fetches the advisory listing page, extracts the CVE IDs, then calls the CVE API at https://cveawg.mitre.org/api/cve/{cve_id} for each one. This works because every LibreOffice advisory page links to the CVE record at https://www.cve.org/CVERecord?id={cve_id} in its references section, and the cveawg API returns the full structured CVE 5.0 JSON with CVSS scores, CWE weaknesses, references, and publish dates.
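A minimal sketch of the ID-extraction step in this description, under stated assumptions: it uses a local regular expression (a later commit in this PR switches to the shared `find_all_cve` utility instead), normalises matches to uppercase before deduplicating, and only builds the API URLs without making any network calls. The helper names `extract_cve_ids` and `build_cve_api_urls` are hypothetical, and the CVE IDs in the sample are made-up placeholders.

```python
import re

CVE_API_URL = "https://cveawg.mitre.org/api/cve/{cve_id}"

# Assumption: a simple pattern for CVE IDs; matches both href text and link text.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)

# Hypothetical listing fragment with placeholder IDs.
SAMPLE_HTML = (
    '<a href="/about-us/security/advisories/cve-2099-0001/">CVE-2099-0001</a> '
    '<a href="/about-us/security/advisories/cve-2099-0002/">CVE-2099-0002</a>'
)


def extract_cve_ids(listing_html):
    """Return unique CVE IDs in first-seen order, normalised to uppercase."""
    seen = set()
    cve_ids = []
    for match in CVE_PATTERN.findall(listing_html):
        cve_id = match.upper()  # href and link text differ only in case
        if cve_id not in seen:
            seen.add(cve_id)
            cve_ids.append(cve_id)
    return cve_ids


def build_cve_api_urls(cve_ids):
    """Fill the CVE API URL template for each ID."""
    return [CVE_API_URL.format(cve_id=cve_id) for cve_id in cve_ids]


if __name__ == "__main__":
    for url in build_cve_api_urls(extract_cve_ids(SAMPLE_HTML)):
        print(url)
```

Without the uppercase normalisation, each ID would appear twice, once from the lowercase `href` and once from the link text.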