Getting Started

Installation

MindLab is a toolbox for high-quality, efficient data science work. You can install it from PyPI:

pip install mindlab

Plotting

Drawing professional-looking plots takes only a few lines of code:

from mindlab import Figure, mock_data

figure = Figure(title='Mock Stock Prices', xtics='month', ylabel='Stock Price [USD]')
figure.line(mock_data.stock_prices())
[Figure: Mock Stock Prices]

The mindlab.Figure class provides a convenient yet powerful interface to Matplotlib figures. It takes care of legends, tick locators, font sizes, and tick labels, so you no longer have to fiddle with them to get a readable plot. See the Plotting section of the documentation for a complete reference.
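For comparison, here is a sketch of the kind of boilerplate a similar chart needs in plain Matplotlib. It uses only Matplotlib, not MindLab, and replaces mock_data.stock_prices() with made-up daily prices; it illustrates the chores listed above and is not how mindlab.Figure is implemented.

```python
import datetime as dt
import io

import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up data standing in for mock_data.stock_prices(): one price per day
days = [dt.date(2023, 1, 1) + dt.timedelta(days=i) for i in range(181)]
prices = [100 + 0.1 * i for i in range(len(days))]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(days, prices, label='Mock stock')

# Readable font sizes have to be set element by element
ax.set_title('Mock Stock Prices', fontsize=16)
ax.set_ylabel('Stock Price [USD]', fontsize=14)
ax.tick_params(labelsize=12)

# One major tick per month, labelled with the abbreviated month name
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))

ax.legend(fontsize=12)
fig.tight_layout()

# Write the figure as SVG into memory instead of a file
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format='svg')
```

Every line after plt.subplots() is configuration that the one-line Figure(...) call above is meant to spare you.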

Jupyter

You can install MindLab with Jupyter support too:

pip install mindlab[jupyter]

This provides the lab command, which starts a pre-configured JupyterLab session with sensible defaults and auto-completion:

$ lab
...
[I ... ServerApp] Jupyter Server is running at:
[I ... ServerApp] http://localhost:8888/lab?token={token}
[I ... ServerApp]  or http://127.0.0.1:8888/lab?token={token}

After authenticating, you can run queries against various data sources using the provided MindLab magics:

%%bigquery
SELECT title, `by` AS author, DATETIME(time_ts) AS posted_at, score
FROM bigquery-public-data.hacker_news.stories
ORDER BY time_ts NULLS LAST
LIMIT 3

   title                                   author   posted_at            score
0  Y Combinator                            pg       2006-10-09 18:21:51  61
1  A Student's Guide to Startups           phyllis  2006-10-09 18:30:28  16
2  Woz Interview: the early days of Apple  phyllis  2006-10-09 18:40:33  7

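The real power lies in combining the magics with MindLab's plotting capabilities. Here, naming scores after %%bigquery stores the query result in a variable of that name, which the Python code that follows then plots: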

%%bigquery scores
SELECT score, COUNT(*) AS frequency, EXTRACT(YEAR FROM time_ts) AS `year`
FROM bigquery-public-data.hacker_news.stories
WHERE time_ts >= TIMESTAMP '2007-01-01' AND time_ts < TIMESTAMP '2015-01-01'
      AND score IS NOT NULL
GROUP BY score, `year`

from mindlab import Figure

figure = Figure(
    title='Hacker News Score Distribution',
    xlabel='Score', ylabel='Frequency', xscale='log', yscale='log',
)
figure.scatter(scores.groupby('year'), cmap='viridis', alpha=0.5)
[Figure: Hacker News Score Distribution]
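Here figure.scatter receives a GroupBy object rather than a single DataFrame. Assuming scores is a pandas DataFrame and that Figure.scatter draws one series per group, coloured from the colormap (an assumption based on the call above, not on MindLab's documentation), a rough plain-Matplotlib equivalent looks like this. The rows in scores are made up.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows shaped like the query result: score, frequency, year
scores = pd.DataFrame({
    'score': [1, 2, 10, 1, 2, 10, 1, 2, 10],
    'frequency': [900, 300, 40, 1200, 450, 60, 1500, 500, 80],
    'year': [2007, 2007, 2007, 2008, 2008, 2008, 2009, 2009, 2009],
})

groups = scores.groupby('year')
cmap = plt.get_cmap('viridis')

fig, ax = plt.subplots()
ax.set_xscale('log')
ax.set_yscale('log')

# Assumption: one scatter series per group, colours spread evenly over the colormap
for i, (year, group) in enumerate(groups):
    color = cmap(i / max(groups.ngroups - 1, 1))
    ax.scatter(group['score'], group['frequency'], color=color, alpha=0.5, label=str(year))

ax.set(title='Hacker News Score Distribution', xlabel='Score', ylabel='Frequency')
ax.legend(title='Year')
```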

Spark

MindLab also provides Spark support (via PySpark) when installed with the spark extra:

pip install mindlab[spark]

This allows you to execute Spark queries locally, even if your data set is located in cloud storage:

from mindlab import spark_session

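with spark_session() as spark:
    path = 'gs://test-data-mindlab-logikal-io/order_line_items.csv'
    # Read the CSV with a header row, infer column types, and collect it into pandas
    data = spark.read.csv(path, inferSchema=True, header=True).toPandas()
# The pandas DataFrame is a local copy, so it stays usable after the with block
data.head()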

   order_id  sku                  quantity  unit_price  shipping
0  4321      starliner-spaceship  2         1500.0      0.0
1  4321      teal-t-shirt         1         25.0        0.0
2  4322      the-book-of-chaos    1         10.0        5.0

For more information, see the Spark section of the documentation.