Add Polars documentation#2685
Merged
Add a scraper for the Polars Python API reference (https://docs.pola.rs/api/python/stable/reference/), pinned to the stable 1.41.0 release.
The site uses the pydata-sphinx-theme, so the scraper reuses the shared sphinx/clean_html filter alongside Polars-specific filters:
- clean_html removes the theme chrome (sidebars, in-page TOC, prev/next navigation, footer) and tags code blocks for Python highlighting.
- entries names each page from its heading and groups entries into types (DataFrame, LazyFrame, Series, Expressions, Functions, Data Types, Input/output, etc.). Top-level members are stored flat under api/ and are classified by their member name.
Latest-version detection uses the most recent GitHub release and strips the py- prefix, since the repo also tags Rust (rs-) releases.
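For illustration only, here is a minimal sketch of the prefix handling described above. It assumes the release tags are available as a list of strings ordered newest first. The helper name latest_python_version and the sample tags are hypothetical and not part of this PR, which instead calls get_latest_github_release and strips the prefix from its result.

```ruby
# Hypothetical helper, not part of the PR: given release tag names ordered
# newest first, return the version of the newest Python (py-) release.
def latest_python_version(tag_names)
  # Skip Rust (rs-) tags and take the first tag that carries the py- prefix.
  tag = tag_names.find { |name| name.start_with?('py-') }
  # Return nil when no Python release is present in the list.
  tag && tag.delete_prefix('py-')
end

# Made-up sample data for illustration.
tags = ['rs-0.53.0', 'py-1.41.0', 'py-1.40.1']
puts latest_python_version(tags)
```

Filtering on the prefix before stripping it is one way to avoid returning a Rust tag when a Rust release happens to be the most recent one. Whether that case needs handling depends on what the release lookup returns.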
Contributor
Author
@simon04, I would appreciate a review. Many thanks!
Contributor
Thanks, the PR looks good, except for one issue: the method signature is rendered poorly. Example: http://localhost:9292/polars/series/api/polars.series.sample
I've spent an hour fighting against the nested dl/dd/dt. Nokogiri seems to change the DOM tree. Giving up for tonight. Related W3C validator error: https://validator.w3.org/nu/?doc=https%3A%2F%2Fdocs.pola.rs%2Fapi%2Fpython%2Fstable%2Freference%2Fseries%2Fapi%2Fpolars.Series.to_numpy.html
Here's a not very successful attempt:
diff --git a/lib/docs/filters/polars/clean_html.rb b/lib/docs/filters/polars/clean_html.rb
index e270c8d9..ed03e690 100644
--- a/lib/docs/filters/polars/clean_html.rb
+++ b/lib/docs/filters/polars/clean_html.rb
@@ -20,6 +20,12 @@ module Docs
css('img').remove if root_page?
# Make sure every code block is tagged so Prism highlights it as Python.
+ css('.sig').each do |node|
+ node.after node.css('.reference.external:contains("[source]")').remove
+ node.name = 'pre'
+ node.content = node.content.strip
+ node['data-language'] = 'python'
+ end
css('.highlight pre').each do |node|
node.content = node.content
node['data-language'] = 'python'
diff --git a/lib/docs/scrapers/polars.rb b/lib/docs/scrapers/polars.rb
index fa332be7..c84f9e15 100644
--- a/lib/docs/scrapers/polars.rb
+++ b/lib/docs/scrapers/polars.rb
@@ -30,5 +30,17 @@ module Docs
def get_latest_version(opts)
get_latest_github_release('pola-rs', 'polars', opts).sub(/\Apy-/, '')
end
+
+ private
+
+ def parse(response) # Hook here because Nokogiri dislikes <dl> nested inside <dt>
+ response.body.gsub! %r{<dd}, '<div'
+ response.body.gsub! %r{<dl}, '<div'
+ response.body.gsub! %r{<dt}, '<div'
+ response.body.gsub! %r{</dd}, '</div'
+ response.body.gsub! %r{</dl}, '</div'
+ response.body.gsub! %r{</dt}, '</div'
+ super
+ end
end
end
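As a sketch of the same workaround in more compact form, the six gsub! calls in the attempt above could be collapsed into a single pattern. The helper name flatten_definition_lists is hypothetical. Unlike the diff, it returns a new string rather than mutating response.body, and it has not been tested against the real Polars pages.

```ruby
# Hypothetical helper: rewrite opening and closing <dl>, <dt> and <dd> tags
# to <div> so the parser no longer has to repair invalid nesting.
def flatten_definition_lists(html)
  # (/?) captures the optional slash of a closing tag; d[ldt] matches
  # dl, dd or dt; \b keeps longer tag names such as <details> untouched.
  html.gsub(%r{<(/?)d[ldt]\b}i) { "<#{Regexp.last_match(1)}div" }
end

sample = '<dl class="py"><dt>sig</dt><dd><p>x</p></dd></dl>'
puts flatten_definition_lists(sample)
```

The trade-off is the same as in the attempt: attributes are preserved, but any later filter that selects on dl, dt or dd would need to target the rewritten div elements instead.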


If you’re adding a new scraper, please ensure that you have:
public/icons/docs/polars/ directory:
- 16.png: a 16×16 pixel icon for the doc
- 16@2x.png: a 32×32 pixel icon for the doc
- SOURCE: a text file containing the URL to the page the image can be found on or the URL of the original image itself
