2017-06-18 01:11:38 UTC
I've mentioned sergeant - <https://github.com/hrbrmstr/sergeant> -
before. It's an R package that provides an RJDBC driver, an R DBI driver,
a dplyr interface (with some custom functions mapped) and a REST
interface client for Apache Drill. Most of the focus/dev has been on
the dplyr interface since it provides the most "modern R-like"
experience for Drill.
If folks are unfamiliar with R's dplyr, you can get a feel for the
dplyr interface at <https://rud.is/rpubs/yelp.html> (it's a
mostly-dplyr port of the official Yelp analysis tutorial on the Drill
site-proper; some bits, such as pulling from nested JSON columns,
can't be 100% dplyr).
I have plans to submit sergeant to CRAN (the official R package
repository) this week and wanted to do a "last call" for anyone using
the package to file any issues they may be encountering or features
they would like implemented before the CRAN release.
CRAN doesn't like more than one update a month, hence my desire to get
everything in that I can on an initial release to CRAN.
Major thx to Edward Visel, who assisted with the dplyr 0.7.0 conversion
(not sure if he's on the list, but his efforts were greatly appreciated).
Most recently, Drill + sergeant & R were used to analyze the results
of 30 TCP port scans of over 160 million internet hosts in one of our
annual cybersecurity research efforts at Rapid7 (ref:
Many thanks, also, to the Drill dev team. It's an awesome tool & ecosystem.