Getting Started

Installation
Plotting
Jupyter
Spark

Installation 

MindLab is a toolbox for high-quality, efficient data science work. You can install it from PyPI:

pip install mindlab
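
To confirm the installation from Python, you can ask the standard library's importlib.metadata for the installed version. This is a generic sketch, not a MindLab feature. It assumes the distribution name is mindlab, as in the pip command above, and the installed_version helper is a hypothetical name used only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Prints MindLab's version string, or None if it is not installed.
    print(installed_version("mindlab"))
```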

Plotting 

Drawing professional plots is now trivial:

from mindlab import Figure, mock_data

figure = Figure(title='Mock Stock Prices', xtics='month', ylabel='Stock Price [USD]')
figure.line(mock_data.stock_prices())

The mindlab.Figure class provides a convenient but powerful interface to Matplotlib figures. No more fiddling with legends and tick locators, and no more plots with tiny fonts or unreadable tick labels! Make sure to check out the Plotting section of the documentation for a complete reference.
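
For comparison, the sketch below does by hand, in plain Matplotlib, some of the work that Figure is described as handling: a monthly tick locator and labels, a legend, and a larger default font size. It is not MindLab's implementation. The random-walk prices are synthetic data generated only for illustration, because the format returned by mock_data.stock_prices() is not described here, and plot_prices_manually is a hypothetical name.

```python
import datetime as dt
import random

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def plot_prices_manually():
    """Build a monthly-ticked line plot of synthetic prices without MindLab."""
    # Synthetic daily random-walk prices for one year (illustrative only).
    rng = random.Random(0)
    start = dt.date(2023, 1, 1)
    days = [start + dt.timedelta(days=i) for i in range(365)]
    prices, price = [], 100.0
    for _ in days:
        price += rng.gauss(0, 1)
        prices.append(price)

    plt.rcParams.update({"font.size": 12})  # avoid tiny default fonts
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(days, prices, label="Mock stock")
    ax.set_title("Mock Stock Prices")
    ax.set_ylabel("Stock Price [USD]")

    # One major tick per month, labelled with the abbreviated month name.
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
    ax.legend(loc="upper left")
    fig.tight_layout()
    return fig, ax


if __name__ == "__main__":
    figure, axes = plot_prices_manually()
    figure.savefig("mock_stock_prices.png")
```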

Jupyter 

You can install MindLab with Jupyter support too:

pip install mindlab[jupyter]

This provides the lab command, which starts a preconfigured JupyterLab session with sensible defaults and auto-completion:

$ lab
...
[I ... ServerApp] Jupyter Server is running at:
[I ... ServerApp] http://localhost:8888/lab?token={token}
[I ... ServerApp]  or http://127.0.0.1:8888/lab?token={token}

The provided MindLab magics make it simple to execute queries against various data sources:

%%bigquery
SELECT title, `by` AS author, DATETIME(time_ts) AS posted_at, score
FROM bigquery-public-data.hacker_news.stories
ORDER BY time_ts NULLS LAST
LIMIT 3

	title	author	posted_at	score
0	Y Combinator	pg	2006-10-09 18:21:51	61
1	A Student's Guide to Startups	phyllis	2006-10-09 18:30:28	16
2	Woz Interview: the early days of Apple	phyllis	2006-10-09 18:40:33	7
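
Conceptually, a cell magic such as %%bigquery receives the rest of its first line (for example a destination variable name, as in %%bigquery scores below) together with the cell body, runs the query, and then either displays the result or binds it to a variable in the notebook namespace. The sketch below illustrates that flow in plain Python. It is not MindLab's implementation: run_sql_cell is a hypothetical name, and an in-memory SQLite database stands in for BigQuery so the example runs without credentials.

```python
import sqlite3
from typing import Any, Dict, List, Optional, Tuple


def run_sql_cell(
    line: str,
    cell: str,
    connection: sqlite3.Connection,
    namespace: Dict[str, Any],
) -> Optional[List[Tuple[Any, ...]]]:
    """Hypothetical cell-magic body: run `cell` as SQL and handle the result.

    If `line` names a variable, the rows are stored in `namespace` under that
    name and nothing is returned; otherwise the rows are returned for display.
    """
    rows = connection.execute(cell).fetchall()
    target = line.strip()
    if target:
        namespace[target] = rows
        return None
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stories (title TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO stories VALUES (?, ?)",
        [("first", 10), ("second", 3)],  # made-up rows for illustration
    )
    user_ns: Dict[str, Any] = {}
    run_sql_cell("top", "SELECT title FROM stories ORDER BY score DESC", conn, user_ns)
    print(user_ns["top"])  # [('first',), ('second',)]
```

In a real IPython session, logic like this would sit inside a function registered with IPython.core.magic.register_cell_magic, which receives exactly the line and cell strings.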

The real power lies in combining the magics with MindLab’s plotting capabilities. Here the query result is stored in the scores variable, which a second cell then plots:

%%bigquery scores
SELECT score, COUNT(*) AS frequency, EXTRACT(YEAR FROM time_ts) AS `year`
FROM bigquery-public-data.hacker_news.stories
WHERE time_ts >= TIMESTAMP '2007-01-01' AND time_ts < TIMESTAMP '2015-01-01'
      AND score IS NOT NULL
GROUP BY score, `year`

from mindlab import Figure

figure = Figure(
    title='Hacker News Score Distribution',
    xlabel='Score', ylabel='Frequency', xscale='log', yscale='log',
)
figure.scatter(scores.groupby('year'), cmap='viridis', alpha=0.5)
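
Passing the result grouped by year presumably yields one scatter series per year, coloured from the viridis colormap. The plain-Matplotlib sketch below shows one way to build such a plot: it splits (score, frequency, year) rows into per-year groups, gives each year an evenly spaced viridis colour, and draws them on log-log axes. This is an assumption about what Figure.scatter does, not its actual implementation; scatter_by_year is a hypothetical name and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def scatter_by_year(rows: Iterable[Tuple[int, int, int]]):
    """Plot (score, frequency, year) rows as one log-log scatter series per year."""
    groups: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for score, frequency, year in rows:
        groups[year].append((score, frequency))

    years = sorted(groups)
    cmap = plt.get_cmap("viridis")
    fig, ax = plt.subplots()
    for i, year in enumerate(years):
        scores, frequencies = zip(*groups[year])
        # Spread the years evenly across the colormap range 0.0 to 1.0.
        color = cmap(i / max(len(years) - 1, 1))
        ax.scatter(scores, frequencies, color=color, alpha=0.5, label=str(year))

    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.set_title("Hacker News Score Distribution")
    ax.set_xlabel("Score")
    ax.set_ylabel("Frequency")
    ax.legend(title="Year")
    return fig, ax


if __name__ == "__main__":
    sample = [(1, 50, 2007), (10, 5, 2007), (1, 80, 2008), (20, 2, 2008)]  # made-up rows
    scatter_by_year(sample)[0].savefig("scores.png")
```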

Spark 

MindLab also provides Spark support (via PySpark) when installed with the spark extra:

pip install mindlab[spark]

This allows you to execute Spark queries locally, even if your data set is located in cloud storage:

from mindlab import spark_session

with spark_session() as spark:
    path = 'gs://test-data-mindlab-logikal-io/order_line_items.csv'
    data = spark.read.csv(path, inferSchema=True, header=True).toPandas()

data.head()

	order_id	sku	quantity	unit_price	shipping
0	4321	starliner-spaceship	2	1500.0	0.0
1	4321	teal-t-shirt	1	25.0	0.0
2	4322	the-book-of-chaos	1	10.0	5.0
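
Using spark_session() as a context manager suggests that the session is created on entry and shut down on exit, even if the block raises. The sketch below shows that pattern with the standard library's contextlib. It is an assumption about how such a helper can be structured, not MindLab's implementation, and managed_session is a hypothetical name. In PySpark the factory would typically be SparkSession.builder.getOrCreate and the cleanup spark.stop(); a stand-in session class is used here so the sketch runs without Spark.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol, TypeVar


class Stoppable(Protocol):
    def stop(self) -> None: ...


S = TypeVar("S", bound=Stoppable)


@contextmanager
def managed_session(factory: Callable[[], S]) -> Iterator[S]:
    """Create a session with `factory`, yield it, and always call its stop()."""
    session = factory()
    try:
        yield session
    finally:
        # Runs both on normal exit and when the with-block raises.
        session.stop()


class FakeSession:
    """Stand-in for a SparkSession so the sketch runs without Spark."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


if __name__ == "__main__":
    with managed_session(FakeSession) as session:
        print(session.stopped)  # False
    print(session.stopped)  # True
    # With PySpark installed, the equivalent would be roughly:
    # from pyspark.sql import SparkSession
    # with managed_session(SparkSession.builder.getOrCreate) as spark: ...
```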

For more information, check out the Spark section of the documentation.
