
Add LibreOffice security advisories importer #2210

Open
NucleiAv wants to merge 5 commits into aboutcode-org:main from NucleiAv:feat/libreoffice-importer-1898



@NucleiAv NucleiAv commented Mar 14, 2026

Adds a pipeline importer for LibreOffice security advisories.

Scraping individual advisory pages with BeautifulSoup is brittle: the selectors are hardcoded and break whenever the site layout changes. This importer takes a different approach. It fetches the advisory listing page, extracts the CVE IDs, then calls the CVE API at https://cveawg.mitre.org/api/cve/{cve_id} for each one. This works because every LibreOffice advisory page links to the CVE record at https://www.cve.org/CVERecord?id={cve_id} in its references section, and the cveawg API returns the full structured CVE 5.0 JSON with CVSS scores, CWE weaknesses, references, and publish dates.
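As a rough illustration of the API-based approach described here, the sketch below pulls the fields mentioned (CVSS, CWE, references, publish date) out of a CVE 5.0 JSON record. The helper name `summarize_cve_record` and the sample record are hypothetical and not taken from the PR; the sample values are made up, and the field paths follow the CVE 5.0 record layout as I understand it.

```python
from typing import Any

CVE_API_URL = "https://cveawg.mitre.org/api/cve/{cve_id}"


def summarize_cve_record(record: dict[str, Any]) -> dict[str, Any]:
    """Collect CVSS, CWE, reference and date fields from a CVE 5.0 record.

    Hypothetical helper for illustration; not the importer's actual code.
    """
    metadata = record.get("cveMetadata", {})
    cna = record.get("containers", {}).get("cna", {})

    # Each metrics entry may carry one CVSS object keyed by its version,
    # e.g. "cvssV3_1" or "cvssV4_0".
    scores = []
    for metric in cna.get("metrics", []):
        for key, value in metric.items():
            if key.startswith("cvssV") and isinstance(value, dict):
                scores.append((value.get("version"), value.get("baseScore")))

    cwes = [
        desc["cweId"]
        for problem in cna.get("problemTypes", [])
        for desc in problem.get("descriptions", [])
        if "cweId" in desc
    ]

    return {
        "cve_id": metadata.get("cveId"),
        "published": metadata.get("datePublished"),
        "cvss": scores,
        "cwes": cwes,
        "references": [r["url"] for r in cna.get("references", []) if "url" in r],
    }


# Made-up record shaped like a CVE 5.0 response; every value is a placeholder.
SAMPLE_RECORD = {
    "cveMetadata": {"cveId": "CVE-0000-0000", "datePublished": "2000-01-01T00:00:00Z"},
    "containers": {
        "cna": {
            "metrics": [{"cvssV3_1": {"version": "3.1", "baseScore": 7.8}}],
            "problemTypes": [{"descriptions": [{"cweId": "CWE-20", "lang": "en"}]}],
            "references": [{"url": "https://example.org/advisory"}],
        }
    },
}

summary = summarize_cve_record(SAMPLE_RECORD)
api_url = CVE_API_URL.format(cve_id=summary["cve_id"])
```

In a real importer the record would come from an HTTP GET on `api_url`; the network call is left out here so the sketch stays self-contained.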

Fetches CVE IDs from the LibreOffice advisory listing page and
retrieves structured data (CVSS, CWE, references, dates) from
the CVE 5.0 JSON API at cveawg.mitre.org.

Fixes: aboutcode-org#1898

Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
Remove advisories.html fixture in favour of inline ADVISORY_HTML
constant. Drop dead mock attributes and _make_resp helper.

Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
Replace local re.findall CVE regex with the shared find_all_cve
utility. Normalise to uppercase before dedup to handle IGNORECASE
matches from both href and link text.

Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
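The normalise-then-dedup step from this commit can be sketched as follows. The shared `find_all_cve` utility is not reproduced here; a local case-insensitive regex stands in for it, and the function name `unique_cve_ids` and the CVE IDs in the sample markup are placeholders chosen for illustration.

```python
import re

# Stand-in for the project's shared CVE matcher (an assumption, not its real code).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def unique_cve_ids(text: str) -> list[str]:
    """Return CVE IDs found in text, uppercased and deduplicated in first-seen order."""
    seen = set()
    ordered = []
    for match in CVE_PATTERN.findall(text):
        cve_id = match.upper()  # "cve-2024-0001" in an href and "CVE-2024-0001" in link text collapse
        if cve_id not in seen:
            seen.add(cve_id)
            ordered.append(cve_id)
    return ordered


# Placeholder markup: each ID appears lowercase in the href and uppercase in the text.
LISTING_SNIPPET = (
    '<a href="/about-us/security/advisories/cve-2024-0001/">CVE-2024-0001</a>'
    '<a href="/about-us/security/advisories/cve-2024-0002/">CVE-2024-0002</a>'
)

cve_ids = unique_cve_ids(LISTING_SNIPPET)
```

Uppercasing before the membership check is what prevents the href and the link text from producing two entries for the same advisory.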
logger = logging.getLogger(__name__)

ADVISORIES_URL = "https://www.libreoffice.org/about-us/security/advisories/"
CVE_API_URL = "https://cveawg.mitre.org/api/cve/{cve_id}"
Collaborator

@NucleiAv This is incorrect. You are using two data sources: https://cveawg.mitre.org and https://www.libreoffice.org/about-us/security/advisories/. We should only use https://www.libreoffice.org/about-us/security/advisories/.

If https://www.libreoffice.org/about-us/security/advisories/ does not provide an API (feel free to do a deep search to confirm this), you should parse the HTML instead. Please take a look at other importers, such as the nginx importer: nginx_importer_v2.NginxImporterPipeline.

Author

@NucleiAv NucleiAv Mar 14, 2026

@ziadhany Sure, I will incorporate the changes. My concern was that HTML parsing breaks if the website layout changes. Moreover, the LibreOffice website does not provide details such as CVSS score, CVSS version (2.0, 3.x, 4.0), severity, or CWEs. To populate those details I chose to use the API approach. I will check again whether LibreOffice provides an API and, if not, modify the code to parse the HTML.

(Screenshot: a LibreOffice advisory page, which shows no CVSS, CWE, or similar details.)

Parse advisory listing and individual advisory pages directly from
libreoffice.org instead of calling cveawg.mitre.org. Drop unused
JSON fixtures and update tests accordingly.

Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
Author

NucleiAv commented Mar 14, 2026

@ziadhany I looked again and could not find any API, so I have switched to HTML parsing with BeautifulSoup, following the same pattern as the nginx importer. The importer now fetches the listing page to extract advisory URLs, then fetches each individual advisory page and parses the available fields: CVE ID, title, announced date, description, and references. As noted above, some fields are not available on LibreOffice's pages and will remain empty: CVSS versions, CVSS scores, severity ratings, CWE IDs, and affected version ranges. LibreOffice only lists the fixed version. If those fields are needed in future, NVD enrichment would be a separate step.
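The first step described above, extracting advisory URLs from the listing page, might look roughly like this. To keep the sketch dependency-free it uses the standard library's `html.parser` rather than BeautifulSoup, which is what the PR actually uses. The class name `AdvisoryLinkParser`, the assumption that advisory links end in a `cve-...` path segment, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

ADVISORIES_URL = "https://www.libreoffice.org/about-us/security/advisories/"


class AdvisoryLinkParser(HTMLParser):
    """Collect absolute URLs of links whose last path segment starts with "cve-".

    The link shape is an assumption about the listing page, not a verified fact.
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative hrefs against the listing URL
        last_segment = url.rstrip("/").rsplit("/", 1)[-1].lower()
        if last_segment.startswith("cve-") and url not in self.urls:
            self.urls.append(url)


# Placeholder markup standing in for the fetched listing page.
LISTING_HTML = """
<ul>
  <li><a href="/about-us/security/advisories/cve-2024-0001/">CVE-2024-0001</a> Example title</li>
  <li><a href="cve-2024-0002/">CVE-2024-0002</a> Another example</li>
  <li><a href="/about-us/">About us</a></li>
</ul>
"""

parser = AdvisoryLinkParser(ADVISORIES_URL)
parser.feed(LISTING_HTML)
advisory_urls = parser.urls
```

Each URL in `advisory_urls` would then be fetched and parsed for the fields listed above. Because the selectors depend on the page layout, a check that logs a warning when no links are found would make layout changes easier to notice.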

@NucleiAv NucleiAv requested a review from ziadhany March 14, 2026 20:36
Development

Successfully merging this pull request may close these issues.

Collect data from https://www.libreoffice.org/about-us/security/advisories/
