The Best Machine Learning Frameworks & Extensions for Scikit-learn

Data formats

In this section, we’ll explore libraries that can be used to process and transform data.


Sklearn-pandas

You can use this package to map ‘DataFrame’ columns to Scikit-learn transformations, and then combine those columns into features.

import pandas as pd
import sklearn.preprocessing
from sklearn_pandas import DataFrameMapper

# Sample data (illustrative values)
data = pd.DataFrame({
    'Uni': ['MIT', 'Stanford', 'MIT'],
    'Age': [22, 25, 24],
})
mapper = DataFrameMapper([
    ('Uni', sklearn.preprocessing.LabelBinarizer()),
    (['Age'], sklearn.preprocessing.StandardScaler()),
])
features = mapper.fit_transform(data)
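For comparison, scikit-learn's built-in ‘ColumnTransformer’ can express the same column-to-transformer mapping without the extra dependency. A minimal sketch, with made-up sample values:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative data: one categorical and one numeric column.
data = pd.DataFrame({'Uni': ['MIT', 'Stanford', 'MIT'],
                     'Age': [22, 25, 24]})

ct = ColumnTransformer([
    ('uni', OneHotEncoder(), ['Uni']),    # binarize the categorical column
    ('age', StandardScaler(), ['Age']),   # scale the numeric column
])
features = ct.fit_transform(data)
print(features.shape)  # one row per sample, one column per generated feature
```

The difference is mostly ergonomic: ‘DataFrameMapper’ is DataFrame-centric, while ‘ColumnTransformer’ ships with scikit-learn itself.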


Sklearn-xarray

This package combines n-dimensional labeled arrays from xarray with Scikit-learn tools. Its wrappers:

  • ensure that Sklearn estimators are compatible with xarray DataArrays and Datasets,
  • enable estimators to change the number of samples,
  • provide pre-processing transformers.
import numpy as np
import xarray as xr
from sklearn_xarray import wrap

data = np.random.rand(16, 4)
X = xr.DataArray(data, dims=('sample', 'feature'))

from sklearn.preprocessing import StandardScaler
Xt = wrap(StandardScaler()).fit_transform(X)

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
pipeline = Pipeline([
    ('pca', wrap(PCA(n_components=50), reshapes='feature')),
    ('cls', wrap(LogisticRegression(), reshapes='feature')),
])

from sklearn_xarray.model_selection import CrossValidatorWrapper
from sklearn.model_selection import GridSearchCV, KFold
cv = CrossValidatorWrapper(KFold())
pipeline = Pipeline([
    ('pca', wrap(PCA(), reshapes='feature')),
    ('cls', wrap(LogisticRegression(), reshapes='feature')),
])
gridsearch = GridSearchCV(
    pipeline, cv=cv, param_grid={'pca__n_components': [20, 40, 60]}
)
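The ‘pca__n_components’ grid key follows scikit-learn's step-name double-underscore convention, which works the same with or without the xarray wrappers. A minimal sketch on plain NumPy arrays (random illustrative data, no xarray):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))          # 60 samples, 8 features
y = rng.integers(0, 2, size=60)       # binary labels

pipeline = Pipeline([
    ('pca', PCA()),
    ('cls', LogisticRegression()),
])
# '<step>__<param>' addresses a parameter of a named pipeline step.
gridsearch = GridSearchCV(
    pipeline, cv=KFold(n_splits=3),
    param_grid={'pca__n_components': [2, 4, 6]},
)
gridsearch.fit(X, y)
print(gridsearch.best_params_['pca__n_components'])
```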


Auto-ML

Are there tools and libraries that integrate Scikit-learn for better AutoML? Yes, and here are some examples.


Auto-Sklearn

With auto-sklearn, you can perform automated machine learning with Scikit-learn. For the setup, you need to install some dependencies manually.

$ curl | xargs -n 1 -L 1 pip install
from autosklearn.classification import AutoSklearnClassifier

cls = AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)
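Conceptually, auto-sklearn automates the kind of model-selection loop you would otherwise write by hand. A rough sketch of that loop with plain scikit-learn (the candidate models here are illustrative, not what auto-sklearn actually searches):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hand-picked candidates; auto-sklearn explores model families
# and their hyperparameters automatically instead.
candidates = {
    'logreg': LogisticRegression(max_iter=1000),
    'tree': DecisionTreeClassifier(random_state=0),
    'knn': KNeighborsClassifier(),
}
# Score each candidate with cross-validation and keep the best.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```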

Auto_ViML — “Automatic Variant Interpretable Machine Learning” (pronounced “Auto_Vimal”)

Given a certain dataset, Auto_ViML tries out different models with varying features. It eventually settles on the best performing model.

  • helps you clean data by suggesting changes to missing values, formatting, and variables to add;
  • classifies variables automatically, whether text, date, or numerical;
  • generates model performance graphs automatically when verbose is set to 1 or 2;
  • lets you use ‘featuretools’ for feature engineering;
  • handles imbalanced data when ‘Imbalanced_Flag’ is set to ‘True’.
from autoviml.Auto_ViML import Auto_ViML
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=54)
train, test = X_train.join(y_train), X_val.join(y_val)
model, features, train, test = Auto_ViML(train, "target", test, verbose=2)

TPOT — Tree-based Pipeline Optimization Tool

This is a Python-based AutoML tool. It uses genetic programming to optimize machine learning pipelines.

from tpot import TPOTClassifier 
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25, random_state=42)
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
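TPOT evolves whole pipelines (preprocessing plus model plus hyperparameters) over generations. Stripped of the genetics, the core idea is searching over pipeline configurations; a hand-rolled sketch using only scikit-learn, which samples random pipelines instead of evolving them:

```python
import random
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # small subset keeps the sketch fast
random.seed(42)

def random_pipeline():
    # Randomly pick a scaler and a classifier, like one candidate
    # individual in TPOT's population.
    scaler = random.choice([StandardScaler(), MinMaxScaler()])
    clf = random.choice([LogisticRegression(max_iter=2000),
                         DecisionTreeClassifier(random_state=0)])
    return Pipeline([('scale', scaler), ('clf', clf)])

best_score, best_pipe = -1.0, None
for _ in range(4):  # TPOT evolves candidates across generations; this only samples
    pipe = random_pipeline()
    score = cross_val_score(pipe, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_pipe = score, pipe
print(round(best_score, 3))
```

TPOT's genetic programming additionally mutates and recombines the best candidates rather than sampling blindly, which is why it tends to find stronger pipelines than a naive random search.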

Featuretools

This is a tool for automated feature engineering. It works by transforming temporal and relational datasets into feature matrices.

import featuretools as ft

entities = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
    "transactions": (transactions_df, "transaction_id", "transaction_time"),
}
relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]
feature_matrix, features_defs = ft.dfs(entities=entities,
                                       relationships=relationships,
                                       target_entity="customers")
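Deep feature synthesis essentially automates aggregations across related tables. As a rough illustration of one feature it could generate, here is a hand-rolled pandas version (made-up tables and a hypothetical ‘amount’ column):

```python
import pandas as pd

# Illustrative tables mirroring the entities above.
customers_df = pd.DataFrame({'customer_id': [1, 2]})
sessions_df = pd.DataFrame({'session_id': [10, 11, 12],
                            'customer_id': [1, 1, 2]})
transactions_df = pd.DataFrame({'transaction_id': [100, 101, 102, 103],
                                'session_id': [10, 10, 11, 12],
                                'amount': [5.0, 7.0, 3.0, 9.0]})

# One DFS-style feature: total transaction amount per customer,
# aggregated through the sessions -> transactions relationship.
per_session = transactions_df.groupby('session_id')['amount'].sum().rename('session_total')
sessions = sessions_df.join(per_session, on='session_id')
feature = sessions.groupby('customer_id')['session_total'].sum()
print(feature.to_dict())
```

Featuretools generates many such stacked aggregations (sums, means, counts, and so on) automatically rather than one at a time.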


Neuraxle

You can use Neuraxle for hyperparameter tuning and AutoML. Install ‘neuraxle’ via pip to start using it. Among other things, Neuraxle offers:

  • parallel computation and serialization,
  • abstractions key to time series processing.

To run an AutoML loop in Neuraxle, you need:

  • a defined pipeline
  • a validation splitter
  • a scoring metric defined via the ‘ScoringCallback’
  • a selected ‘hyperparams’ repository
  • a selected ‘hyperparams’ optimizer
  • an ‘AutoML’ loop
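The ingredients listed above have close analogues in plain scikit-learn, which may help make them concrete. A minimal sketch pairing a pipeline, a validation splitter, a scoring metric, and a hyperparameter optimizer (this is the sklearn analogue, not Neuraxle's actual API):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.metrics import make_scorer, accuracy_score

X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([('scale', StandardScaler()),          # a defined pipeline
                     ('clf', LogisticRegression(max_iter=1000))])
splitter = KFold(n_splits=3)                               # a validation splitter
scorer = make_scorer(accuracy_score)                       # a scoring metric
search = RandomizedSearchCV(                               # a hyperparams optimizer
    pipeline, param_distributions={'clf__C': [0.01, 0.1, 1.0, 10.0]},
    n_iter=3, cv=splitter, scoring=scorer, random_state=0)
search.fit(X, y)                                           # the search loop itself
print(round(search.best_score_, 3))
```

Neuraxle packages the same roles into its own ‘AutoML’ object, adding things like persistent hyperparameter repositories that vanilla scikit-learn search objects lack.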

Experimentation frameworks

Now it’s time for a couple of Scikit-learn tools that you can use for machine learning experimentation.

SciKit-Learn Laboratory

SciKit-Learn Laboratory is a command-line tool you can use to run machine learning experiments. To start using it, install `skll` via pip.

$ run_experiment experiment.cfg


Neptune

The Scikit-learn integration of Neptune lets you log your experiments to Neptune. For instance, you can log the summary of your Scikit-learn regressor.

from neptunecontrib.monitoring.sklearn import log_regressor_summary

log_regressor_summary(rfr, X_train, X_test, y_train, y_test)



Priya Reddy

Hey, this is Priya Reddy. I am a tech writer.