[fix] utils: truncated result (#4949)

Make sure to parse everything before returning.

Related: \
```
FAIL: test_html_to_text (tests.unit.test_utils.TestUtils.test_html_to_text)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/searxng/searxng/tests/unit/test_utils.py", line 53, in test_html_to_text
    self.assertEqual(utils.html_to_text(r"regexp: (?<![a-zA-Z]"), "regexp: (?<![a-zA-Z]")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'regexp: (?' != 'regexp: (?<![a-zA-Z]'
- regexp: (?
+ regexp: (?<![a-zA-Z]
```
Ivan Gabaldon 2025-06-27 17:52:12 +02:00 committed by GitHub
parent a76ccba9c5
commit 49fdf4edd9

@@ -161,9 +161,11 @@ def html_to_text(html_str: str) -> str:
     s = _HTMLTextExtractor()
     try:
         s.feed(html_str)
+        s.close()
     except AssertionError:
         s = _HTMLTextExtractor()
         s.feed(escape(html_str, quote=True))
+        s.close()
     except _HTMLTextExtractorException:
         logger.debug("HTMLTextExtractor: invalid HTML\n%s", html_str)
     return s.get_text()
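
The truncation in the failing test is consistent with how the standard library's `html.parser.HTMLParser` buffers input: `feed()` may hold back trailing data it cannot yet classify, and `close()` forces the remainder to be processed. The following is a minimal sketch of that behaviour, not the SearXNG code. `TextCollector` and `extract` are hypothetical names introduced for illustration, and the input `"AT&T"` is chosen because a trailing `&` with no following whitespace or `;` is a case where the parser waits for more text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical stand-in for a text extractor (not SearXNG's _HTMLTextExtractor)."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for every run of text it has fully processed.
        self.chunks.append(data)

    def get_text(self):
        return "".join(self.chunks)


def extract(html_str, call_close):
    """Feed html_str to a fresh parser and optionally flush it with close()."""
    parser = TextCollector()
    parser.feed(html_str)
    if call_close:
        # close() processes any data still sitting in the parser's buffer.
        parser.close()
    return parser.get_text()


without_close = extract("AT&T", call_close=False)
with_close = extract("AT&T", call_close=True)
print(repr(without_close), repr(with_close))
```

Without `close()`, the buffered text never reaches `handle_data`, so the collected result can be shorter than the input. Calling `close()` after each `feed()`, as the diff above does, flushes that buffer before the text is read back.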