BoostIO/Boostnote

plugin to scrape website & convert HTML to markdown #923

100ideas posted on GitHub

I'm enjoying boostnote after switching from evernote & quiver.app - thank you to everyone who has contributed to this promising open source tool.

I keep a "code" notebook for technical notes-to-self and today I wanted to add a "clipping" of a blog post to it. I wasn't sure what the best way was (sometimes I try copying-and-pasting directly from the browser, which worked OK in quiver's rich-text note mode... but rtf, gross), so I tried out a few tools for automatically converting from HTML to markdown.

pandoc has a command-line option to fetch content from URL and can convert to/from HTML, markdown, and many other formats. Install on osx with brew install pandoc, then:

pandoc -f html --normalize --wrap=none -t markdown_github+backtick_code_blocks+autolink_bare_uris -o output.md <URL>

as a handy fish shell function:

❯ function panscrape --description='usage: panscrape [URL] > blog_clipping.md'
      pandoc -f html --normalize --wrap=none -t markdown_github+backtick_code_blocks+autolink_bare_uris $argv
  end
❯ funcsave panscrape
❯ panscrape "https://shapeshed.com/command-line-utilities-with-nodejs/" > clipping.md

# or to copy directly to system clipboard
❯ panscrape "https://shapeshed.com/command-line-utilities-with-nodejs/" | pbcopy

Pandoc does an OK job but is definitely not perfect, so some manual editing of the output may be necessary, for instance deleting header & footer content.
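Some of that cleanup could plausibly be automated before conversion. The sketch below is a rough heuristic under stated assumptions, not part of pandoc or Boostnote: `stripPageChrome` and `CHROME_TAGS` are hypothetical names, and regexes are not a real HTML parser, so nested or unusual markup will slip through.

```javascript
// Hypothetical pre-processing step: drop common "page chrome" elements
// from an HTML string before handing it to an HTML-to-markdown converter.
// Assumption: these tag names cover most headers and footers on blog pages.
const CHROME_TAGS = ['header', 'footer', 'nav', 'aside', 'script', 'style'];

function stripPageChrome(html) {
  return CHROME_TAGS.reduce((out, tag) => {
    // Non-greedy match from the opening tag to the first matching close tag.
    // This does not handle an element nested inside another of the same name.
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?<\\/${tag}>`, 'gi');
    return out.replace(pattern, '');
  }, html);
}
```

The cleaned string could then be written to a temporary file and passed to pandoc, or fed to whichever converter is in use; a proper DOM parser would be the safer choice for anything beyond a quick clipping.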

html-md-conversion-pandoc-boostnote

If you don't want to install anything, fuckyeahmarkdown.com seems to have an alright hosted converter.

feature request

Add a command (plugin?) to Boostnote that takes a URL as input, scrapes the page, converts the html to markdown, and creates a new note filled with the result.

Starting points:

  • node-europa "is a Node.js module for converting HTML into valid Markdown that uses the Europa Core engine."
  • scrape-markdown CLI tool based on node-europa
    • note: npm package is out-of-date and does not work; install from source repo with npm install github:evangoer/scrape-markdown
    • run locally ./node_modules/.bin/scrape-markdown [URL]
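The requested command could be sketched roughly as below. This is only an illustration of the scrape, convert, create-note flow: `clipUrlToNote`, `extractTitle`, and the shape of the returned note object are all hypothetical and do not come from Boostnote's codebase. The fetcher and converter are passed in as parameters so that any of the libraries above (or another one) could be wrapped and plugged in.

```javascript
// Hypothetical sketch: none of these names exist in Boostnote.

// Use the page's <title> as the note title, falling back to a given string.
function extractTitle(html, fallback) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const title = match ? match[1].replace(/\s+/g, ' ').trim() : '';
  return title || fallback;
}

// fetchHtml:      async (url) => HTML string (assumed to be supplied by the caller)
// htmlToMarkdown: (html) => markdown string (assumed wrapper around a converter library)
async function clipUrlToNote(url, { fetchHtml, htmlToMarkdown }) {
  const html = await fetchHtml(url);
  const markdown = htmlToMarkdown(html);
  return {
    title: extractTitle(html, url),
    content: markdown.trim(),
    sourceUrl: url
  };
}
```

Keeping the converter behind a plain function makes it easy to compare converters on the same page without touching the note-creation code.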

I would be happy to help with implementation.

#405


I use the copy as markdown plugin of chrome. I find it very convenient.

posted by kentchiu over 7 years ago

Thanks for sharing! Looks like copy-as-markdown uses reMarked.js internally, another option besides node-europa for the putative Boostnote plugin.

Screenshot comparing reMarked.js vs pandoc - reMarked has trouble parsing the code blocks for some reason: html-md-conversion-pandoc-remarked_vs_pandoc

reMarked code blocks fenced with 'true'?

This is funny. I couldn't figure out why the reMarked demo was fencing code blocks with 'true'. I think it's just a mistake in how the reMarker object is configured on the demo page:

// code blocks will be delimited with the string 'true'
var reMarker = new reMarked({gfm_code: true});

// this is what we want
// try it by pasting into the console at reMarked demo site
var reMarker = new reMarked({gfm_code: "```"});
reMarker.render(document.getElementById('html-inp').value)
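One plausible explanation for the 'true' fences is ordinary JavaScript string coercion: if the option value is concatenated straight into the output, a boolean `true` becomes the text `true`. The snippet below only illustrates that mechanism; `fenceCode` and `normalizeFence` are hypothetical helpers and are NOT taken from the reMarked.js source.

```javascript
// Hypothetical illustration only; this is not how reMarked.js is known to be written.
const BACKTICK_FENCE = '`'.repeat(3);

// Wrap code in whatever value the gfm_code option holds.
// String concatenation coerces a boolean true into the text "true".
function fenceCode(code, opts) {
  const fence = opts.gfm_code;
  return fence + '\n' + code + '\n' + fence;
}

// A defensive variant: treat a boolean true as "use the default backtick fence".
function normalizeFence(value) {
  return value === true ? BACKTICK_FENCE : String(value);
}
```

With this helper, `fenceCode('ls', {gfm_code: true})` wraps the code in lines reading `true`, while passing the three-backtick string gives the expected fenced block.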

example reMarked.js output w/ {gfm_code: "```"}:

The basics
----------

To create an executable Node.js script all you need is a Node.js shebang at the top of the script and then some code to execute.

```
#!/usr/bin/env node

console.log('hello world');
```

Assuming you are on a UNIX like system you can do this to make the script executable

```
chmod u+x yourscript
```

Now you can run it and you should see ‘hello world’ printed.

```
./yourscript
hello world
```

Handling arguments
------------------

As you get beyond basic scripts you’ll want to pass arguments into the script. The arguments passed to a script are available as `process.argv`.

If you pass arguments to the simple example above and add `console.log(process.argv)` you’ll see the arguments are available as an array. For example if you run

conclusion

I think reMarked.js - when properly configured - produces better output compared with pandoc, and possibly node-europa.

posted by 100ideas over 7 years ago

I just found copycat, and am testing it against copy as markdown (no affiliation). Combined with One Tab, my research aka open tabs aka browsing history are becoming useful articles and lists.

posted by tgrrr about 7 years ago

@kazup01 has boosted this issue with $100. Visit this issue on Issuehunt

posted by IssueHuntBot almost 7 years ago

@stormburpee has started working. Visit this issue on Issuehunt

posted by IssueHuntBot almost 7 years ago

@stormburpee has submitted output. Visit this issue on Issuehunt

posted by IssueHuntBot almost 7 years ago

Hey guys, feel free to take a look at the pull request I made for this feature over at #1981. Based on the URL you suggested in the original post, it works great. I've also been testing it with a bunch of other websites that I look at, including ones that you probably wouldn't expect to work.

In the issue I've attached a few example photos for you to see.

posted by StormBurpee almost 7 years ago

@rokt33r has stopped working. Visit this issue on Issuehunt

posted by IssueHuntBot over 6 years ago

@kazup01 cancelled funding, $100, of this issue. Visit this issue on Issuehunt

posted by IssueHuntBot over 6 years ago

@boostio funded this issue with $100. Visit this issue on Issuehunt

posted by IssueHuntBot over 6 years ago

@edokan has started working. Visit this issue on Issuehunt

posted by IssueHuntBot over 6 years ago
posted by liuhoward about 6 years ago

Would a web clipper (like Evernote's browser extension) be a better solution for this?

posted by laike9m almost 6 years ago

@zerox-dg has rewarded $90.00 to @awolf81. See it on IssueHunt

  • :moneybag: Total deposit: $100.00
  • :tada: Repository reward(0%): $0.00
  • :wrench: Service fee(10%): $10.00
posted by issuehunt-app[bot] over 5 years ago

This feature is now available as of 0.13.0, when creating a new note:

image

posted by Flexo013 over 5 years ago

sweet!

posted by 100ideas over 5 years ago
