Going to about:cache in Firefox is a good way to get at the HTML a web site transmits when that site is trying to keep you from viewing its source. (Ctrl-U normally displays the HTML source in Firefox. When this doesn't work, it's time to get mad!)
Click on "List Cache Entries" to go to about:cache?device=disk and see a list of cached items. Hey, there is a lot of noise here, huh? You can trim it down by clearing your cache, immediately visiting the page you wish to scrape, and then going straight back to "List Cache Entries." Clear your cache in Firefox this way:
Tools > Options > Network > Clear Now
Any line item under "List Cache Entries" can be clicked through to a page summary, which will often expose a path to the cache like so:
C:\Users\whatever\AppData\Local\Mozilla\Firefox\Profiles\whatever\Cache
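If clicking through entries one at a time gets tedious, a rough alternative is to search that cache folder directly. The Python sketch below is only an illustration under a few assumptions: you pass it the folder path you found (the "whatever" parts are your own user and profile names), the cached bodies are stored as plain uncompressed bytes (a page served gzip-compressed may sit in the cache compressed and won't match), and `find_html_in_cache` is a hypothetical helper name, not part of any Firefox tool. It flags any file containing `<html`, newest first.

```python
import sys
from pathlib import Path


def find_html_in_cache(cache_dir, limit=10):
    """List cache files containing an <html tag, newest first (rough filter)."""
    hits = []
    for path in Path(cache_dir).rglob("*"):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as f:
                data = f.read()
        except OSError:
            continue  # locked or unreadable while Firefox runs; skip it
        # Case-insensitive check; assumes the body was cached uncompressed.
        if b"<html" in data.lower():
            hits.append(path)
    # Most recently written files first, so a freshly loaded page tops the list.
    hits.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return hits[:limit]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_html.py <path to Cache folder>
    for hit in find_html_in_cache(sys.argv[1]):
        print(hit)
```

Clearing the cache before loading the page you want keeps this list short, for the same reason it quiets the "List Cache Entries" page.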
That said, the page's HTML should show up running down the right-hand side of the details, in the readable-text column beside the hexadecimal bytes. I use headspring.com in the image below as an example; however, it was yfrog.com that inspired me to find a way around being blocked when I attempt to view HTML source.
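The right-hand text column of a hex dump typically shows non-printable bytes, such as line breaks, as dots, so copying it loses the page's formatting. A cleaner route is to copy the dump and rebuild the bytes from the hex column. The sketch below assumes a line layout of an offset, a colon, hex bytes separated by single spaces, then two or more spaces before the text column; that layout and the helper name `html_from_hex_dump` are assumptions for illustration, not a documented Firefox format, so adjust the pattern to match what your about:cache entry page actually shows.

```python
import re

# Assumed dump row layout (not a documented Firefox format):
# "00000000:  3c 21 44 4f  <!DO" -> offset, colon, hex bytes, 2+ spaces, text.
HEX_LINE = re.compile(
    r"^\s*[0-9a-fA-F]+:\s+((?:[0-9a-fA-F]{2} )*[0-9a-fA-F]{2})(?:\s{2,}|\s*$)"
)


def html_from_hex_dump(dump, encoding="utf-8"):
    """Rebuild the original bytes from the hex column and decode them."""
    data = bytearray()
    for line in dump.splitlines():
        match = HEX_LINE.match(line)
        if match:  # header lines and anything else that isn't a dump row are skipped
            data += bytes.fromhex(match.group(1))
    # errors="replace" keeps going if the assumed encoding is wrong for some bytes.
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    sample = "00000000:  3c 68 31 3e 0a  <h1>.\n00000005:  61 62  ab\n"
    print(repr(html_from_hex_dump(sample)))  # '<h1>\nab'
```

Because the bytes come from the hex column, the newline that the text column rendered as a dot comes back as a real line break.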
Addendum 3/25/2014: The path "Tools > Options > Network > Clear Now" above should have "Advanced" between "Options" and "Network." Maybe Firefox was different when I wrote this blog posting years ago. Maybe, alternatively, I just made a mistake.