plugin to scrape website & convert HTML to markdown #923
100ideas posted on GitHub
I'm enjoying boostnote after switching from evernote & quiver.app - thank you to everyone who has contributed to this promising open source tool.
I keep a "code" notebook for technical notes-to-self and today I wanted to add a "clipping" of a blog post to it. I wasn't sure what the best way was (sometimes I try copying-and-pasting directly from the browser, which worked OK in quiver's rich-text note mode... but rtf, gross), so I tried out a few tools for automatically converting from HTML to markdown.
pandoc has a command-line option to fetch content from a URL and can convert to/from HTML, markdown, and many other formats. Install on osx with brew install pandoc, then:
pandoc -f html --normalize --wrap=none -t markdown_github+backtick_code_blocks+autolink_bare_uris -o output.md <URL>
as a handy fish shell function:
❯ function panscrape --description='usage: panscrape [URL] > blog_clipping.md'
      pandoc -f html --normalize --wrap=none -t markdown_github+backtick_code_blocks+autolink_bare_uris $argv
  end
❯ funcsave panscrape
❯ panscrape "https://shapeshed.com/command-line-utilities-with-nodejs/" > clipping.md
# or to copy directly to system clipboard
❯ panscrape "https://shapeshed.com/command-line-utilities-with-nodejs/" | pbcopy
Pandoc does an OK job but is definitely not perfect, so some manual editing of the output may be necessary, for instance deleting header and footer content.
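For anyone not on fish, here is a rough bash/sh counterpart of the function above. It is only a sketch and makes one assumption that is not in the original command: it targets pandoc 2.0 or later, where (as far as I can tell) --normalize was removed and markdown_github was deprecated in favor of gfm, so those two pieces are swapped out.

```shell
# Rough bash/sh counterpart of the fish function above.
# Assumption: pandoc 2.0 or later, so `gfm` stands in for `markdown_github`
# and `--normalize` is left out.
panscrape() {
  if [ "$#" -lt 1 ]; then
    echo 'usage: panscrape URL > blog_clipping.md' >&2
    return 1
  fi
  # pandoc fetches the page itself when the input is a URL.
  pandoc -f html --wrap=none -t gfm "$@"
}
```

Put it in ~/.bashrc (or similar) and call it the same way as the fish version, redirecting to a file or piping to pbcopy.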
If you don't want to install anything, fuckyeahmarkdown.com seems to have an alright hosted converter.
feature request
Add a command (plugin?) to Boostnote that takes a URL as input, scrapes the page, converts the html to markdown, and creates a new note filled with the result.
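To make the requested flow concrete (URL in, markdown note out), here is a minimal shell sketch of the same three steps outside of Boostnote. Everything in it is an assumption for illustration: clip_note is a hypothetical helper, not a Boostnote command; it relies on pandoc 2.0 or later for the gfm writer; and it just writes a plain .md file, leaving open how the result would be wired into Boostnote's own note storage.

```shell
# Hypothetical helper, not a Boostnote API: clip a URL into a markdown file.
# Assumes pandoc 2.0+ (for the `gfm` writer) is on PATH.
clip_note() {
  local url dir slug out body
  url=$1
  dir=${2:-.}
  if [ -z "$url" ]; then
    echo 'usage: clip_note URL [DIR]' >&2
    return 1
  fi
  # Derive a file name from the URL: drop the scheme and any trailing slash,
  # then replace every character outside [A-Za-z0-9._-] with a dash.
  slug=$(printf '%s' "$url" | sed -e 's|^[a-zA-Z]*://||' -e 's|/*$||' -e 's|[^A-Za-z0-9._-]|-|g')
  out="$dir/$slug.md"
  # Convert first, so a failed fetch does not leave a half-written note.
  body=$(pandoc -f html --wrap=none -t gfm "$url") || return 1
  # Keep the source URL at the top of the note, then the converted body.
  printf 'Source: %s\n\n%s\n' "$url" "$body" > "$out"
  # Print the path of the new note so callers can open or import it.
  printf '%s\n' "$out"
}
```

For example, clip_note "https://shapeshed.com/command-line-utilities-with-nodejs/" ~/notes would write the clipping into ~/notes and print the path of the new file. Header and footer cleanup would still be manual.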
Starting points:
node-europa: "is a Node.js module for converting HTML into valid Markdown that uses the Europa Core engine."
scrape-markdown: CLI tool based on node-europa
- note: npm package is out-of-date and does not work; install from source repo with
npm install github:evangoer/scrape-markdown
- run locally
./node_modules/.bin/scrape-markdown [URL]
I would be happy to help with implementation.
#405