HTML Only vs Webpage Complete files

The saved HTML differs depending on whether a page is saved with the "HTML Only" option or the "Webpage, Complete" option in Chrome and Firefox. I modified the regexes based on the differences I found in the HTML and opened a pull request (!1, merged).

For Google files: The parser looks for the pattern /<base href="https:\/\/adssettings.google.com\/">/. In the complete files that text appears elsewhere in the document, and depending on the browser it sometimes has the leading < and trailing > and sometimes doesn't. I propose changing the regex to /base href="https:\/\/adssettings.google.com\/"/ to handle those differences.
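
To illustrate the Google case, here is a minimal Python sketch comparing the current and proposed patterns. The sample strings are made up for illustration (in particular, the entity-escaped form is only a guess at what a "Webpage, Complete" save might contain), and the names STRICT, RELAXED, and classify_google are hypothetical, not taken from the parser.

```python
import re

# Python equivalents of the two patterns from the text. The "\/" escapes are
# dropped because "/" needs no escaping in Python. The dots are left unescaped
# as in the original, so each one matches any character.
STRICT = re.compile(r'<base href="https://adssettings.google.com/">')
RELAXED = re.compile(r'base href="https://adssettings.google.com/"')

# Hypothetical sample inputs, not real saved pages.
GOOGLE_SAMPLES = {
    "html_only": '<head><base href="https://adssettings.google.com/"><meta charset="utf-8">',
    "complete_escaped": '&lt;base href="https://adssettings.google.com/"&gt;',
    "unrelated": '<head><base href="https://example.com/"></head>',
}


def classify_google(html):
    """Report whether each pattern finds the Google marker in the text."""
    return {
        "strict": bool(STRICT.search(html)),
        "relaxed": bool(RELAXED.search(html)),
    }


for name, html in GOOGLE_SAMPLES.items():
    print(name, classify_google(html))
```

Only the relaxed pattern matches the sample without literal angle brackets, and neither pattern matches the unrelated page.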

For Facebook files: The parser looks for the pattern /<html lang="[^"]*" id="facebook" class="[^"]*">/. The complete files order the lang and id attributes differently. I think the most important part is id="facebook", which confirms it is a Facebook file. I propose changing the regex to /<html .*id="facebook" class="[^"]*"/. That way it matches any characters before id="facebook" while still matching the general structure and ensuring id="facebook" is present.
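
The Facebook case can be sketched the same way. The tags below are invented examples of the two attribute orderings, not copied from real saved files, and the names OLD_FB, NEW_FB, and classify_facebook are hypothetical.

```python
import re

# Current pattern: requires lang first, then id, then class, then ">".
OLD_FB = re.compile(r'<html lang="[^"]*" id="facebook" class="[^"]*">')
# Proposed pattern: anything may precede id="facebook", and the tag
# does not have to close right after the class attribute.
NEW_FB = re.compile(r'<html .*id="facebook" class="[^"]*"')

# Hypothetical opening tags showing the two attribute orderings.
FACEBOOK_SAMPLES = {
    "html_only": '<html lang="en" id="facebook" class="no_js">',
    "complete": '<html id="facebook" class="no_js" lang="en">',
    "other_site": '<html lang="en" id="other" class="no_js">',
}


def classify_facebook(html):
    """Report whether each pattern recognises the text as a Facebook file."""
    return {
        "old": bool(OLD_FB.search(html)),
        "new": bool(NEW_FB.search(html)),
    }


for name, html in FACEBOOK_SAMPLES.items():
    print(name, classify_facebook(html))
```

One design note: `.*` is greedy and can run past the end of the `<html ...>` tag on the same line. If that turns out to cause false positives, `[^>]*` in its place would keep the match inside the tag, though that variant is a suggestion and not part of the proposal above.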

Edited Feb 12, 2019 by Freedman, Joe