I'm James, an engineer / data scientist from Chicago. My time on GitHub is mostly spent writing Python, R, C++, and shell scripts on projects for data scientists and data engineers. My time off GitHub is spent with family, at hip hop shows, and watching reality TV.
- LightGBM: a lightweight gradient boosting machine
- lightgbm-dask-testing: containerized setup for testing LightGBM's Dask interface locally and on Amazon ECS
- pkgnet: R package for analyzing an R package's dependencies
- pydistcheck: linter that finds portability issues in Python package distributions (wheels, sdists, and conda packages)
- uptasticsearch: an R data frame client for Elasticsearch
- hamilton: a "micro-framework" for feature engineering in Python
- prefect: a workflow management thing in Python that plays nicely with Dask
- xgboost: another gradient boosting machine
The pull requests and non-code contributions below were chosen to showcase the types of software work I've done. This list is not exhaustive.
- adapting lightgbm and xgboost to scikit-learn 1.6
- setting up conda packages for legate-boost, legate-dataframe, and legate-raft: rapidsai/legate-boost#115
- replacing LightGBM's setup.py with scikit-build-core for PEP 517/518 compatibility: lightgbm-org/LightGBM#5759
- upstreaming dask-lightgbm into LightGBM and guiding community discussion with Dask, XGBoost maintainers
- adding Webhook storage to prefect: PrefectHQ/prefect#3000
- adding autoconf-based builds of LightGBM's R package: lightgbm-org/LightGBM#3188
- making snowflake-connector-python compatible with pyjwt 1.x and 2.x: snowflakedb/snowflake-connector-python#604
- allow tight control over ports in LightGBM distributed training with Dask: lightgbm-org/LightGBM#3994
- cut compiled size of {lightgbm} by ignoring CLI-only objects: lightgbm-org/LightGBM#3566
- allow use of multiple image pull secrets in prefect kubernetes agent: PrefectHQ/prefect#3596
- replace single-shot HTTP requests with httr::RETRY() in various R packages
  - project I led at Chi R Collab 2020: chircollab/chircollab20#1
  - {sergeant} (one example): hrbrmstr/sergeant#42
- tutorial on distributed LightGBM training with Dask: lightgbm-org/LightGBM#4030
- early stopping example in XGBoost Dask docs: dmlc/xgboost#6501
- detailed information on how LightGBM parameters affect training speed: lightgbm-org/LightGBM#3628
- guide on how to find valid memory and CPU combinations for ECS / Fargate clusters in dask-cloudprovider: dask/dask-cloudprovider#156
- fixing dtype setting and tests across predict() calls in lightgbm, required deep investigation (lightgbm-org/LightGBM#7140 (comment))
- fixing OpenMP conflicts in lightgbm
- detecting debug symbols in pandas 2.0 wheels: pandas-dev/pandas#51900
- prevent conda from "downgrading" Python from CPython to PyPy, while also reducing the risk of a subtle networking error made worse by unpredictability in when Dask garbage collects objects (lightgbm-org/LightGBM#5510)
- create a reproducible example for lightgbm loading failing with GLIBCXX compatibility errors: lightgbm-org/LightGBM#5106 (comment)
- fix jupyter_server conda-forge feedstock recipe to prevent broken environments: conda-forge/jupyter_server-feedstock#84
- make multioutput behavior of dask-ml regression metrics consistent with scikit-learn: dask/dask-ml#820
- fix saving Dask Random Forest models in cuml: rapidsai/cuml#3388
- fix checks for availability of mm_malloc in {lightgbm} autoconf-based builds: lightgbm-org/LightGBM#3510
- fix broken plots in {lightgbm}'s docs site: lightgbm-org/LightGBM#3508
- factor out dependency on gendef.exe for compiling XGBoost and LightGBM R packages with Visual Studio compilers and R 4.0
  - {xgboost}: dmlc/xgboost#5764
  - {lightgbm}: lightgbm-org/LightGBM#3065
- helping with various migrations for all of the RAPIDS libraries:
  - updating to newer fmt/spdlog: rapidsai/build-planning#56
  - dropping Python 3.9: rapidsai/build-planning#88
  - CUDA 12.5: rapidsai/build-planning#73
  - adding Python 3.12: rapidsai/build-planning#40
  - adding Python 3.11: rapidsai/build-planning#3
- switching LightGBM's Python package jobs to manylinux_2_28: lightgbm-org/LightGBM#5580
- automatically publish prefect-saturn to PyPI when a new release is created: saturncloud/prefect-saturn#7
- moving LightGBM CI jobs from Travis to GitHub Actions
- move {uptasticsearch} CI to GitHub Actions: uptake/uptasticsearch#217
- add CI job testing {lightgbm} within ASAN and UBSAN sanitizers: lightgbm-org/LightGBM#3439
- reduce data loading work in LightGBM tests by caching data loading calls: lightgbm-org/LightGBM#3486
- add Dockerfile to build an image for testing the Apache Arrow R package: apache/arrow#2770
- Sr. Software Engineer at NVIDIA, working on RAPIDS (https://github.com/rapidsai)
- adjunct instructor at Marquette University, where I teach "Intro to R Programming" (https://github.com/jameslamb/intro-to-r)
I've given talks on Dask, LightGBM, R, Python packaging, and other random stuff. For a full list and links to videos, see https://github.com/jameslamb/talks#gallery.
My DMs are open if you want to talk about open source, data science careers, Bravo shows, or anything else.
- LinkedIn: https://www.linkedin.com/in/jameslamb1/
- Bluesky: https://bsky.app/profile/jameslamb.bsky.social







