lxml honored the document's charset="euc-kr" meta tag and re-encoded HTML
that had already been decoded from UTF-8.

Fix by stripping charset meta tags before parsing and explicitly setting
encoding="utf-8" on the HTMLParser.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
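A minimal sketch of the two-step fix, assuming the HTML arrives as an already-decoded Python str. The regex and the helper names strip_charset_meta and parse_decoded_html are hypothetical and not taken from the patched code. Only etree.HTMLParser(encoding=...) and etree.fromstring are real lxml APIs.

```python
import re

# Hypothetical pattern: matches <meta charset="..."> and
# <meta http-equiv="Content-Type" content="...; charset=..."> tags.
# [^>]* never crosses a '>', so a match cannot spill into the next tag.
_CHARSET_META = re.compile(
    r"""<meta\b[^>]*\bcharset\s*=\s*["']?[\w-]+["']?[^>]*>""",
    re.IGNORECASE,
)


def strip_charset_meta(html: str) -> str:
    """Remove charset-declaring <meta> tags so the parser cannot act on them."""
    return _CHARSET_META.sub("", html)


def parse_decoded_html(html: str):
    """Parse an already-decoded HTML string, forcing UTF-8 in lxml."""
    # Imported lazily so strip_charset_meta works even without lxml installed.
    from lxml import etree

    cleaned = strip_charset_meta(html)
    # Re-encode to UTF-8 bytes and tell the parser that is the encoding,
    # so no in-document declaration can override it.
    parser = etree.HTMLParser(encoding="utf-8")
    return etree.fromstring(cleaned.encode("utf-8"), parser)
```

Encoding the text back to UTF-8 bytes and declaring that same encoding on the parser mirrors both steps of the fix: the meta tag is gone, and the parser is told explicitly how to interpret the bytes it receives.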