Getting Started
Installation
MindLab is the ultimate toolbox for high-quality, efficient data science work. You can install it from PyPI:
pip install mindlab
Plotting
Drawing professional plots is now trivial:
from mindlab import Figure, mock_data
figure = Figure(title='Mock Stock Prices', xtics='month', ylabel='Stock Price [USD]')
figure.line(mock_data.stock_prices())
The mindlab.Figure class provides a convenient but powerful interface to Matplotlib figures. No more fiddling with legends and tick locators, and no more
plots with tiny fonts or unreadable tick labels! Make sure to check out the
Plotting section of the documentation for a complete reference.
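For comparison, here is a rough sketch of what a similar plot takes in plain Matplotlib. It does not use MindLab and says nothing about how mindlab.Figure is implemented. The random-walk prices are a hypothetical stand-in for mock_data.stock_prices(), and the font sizes are arbitrary choices.

```python
import datetime
import random

import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical stand-in for mock_data.stock_prices(): a seeded random walk.
random.seed(0)
start = datetime.date(2023, 1, 2)
dates = [start + datetime.timedelta(days=i) for i in range(180)]
prices = [100.0]
for _ in dates[1:]:
    prices.append(prices[-1] * (1 + random.gauss(0, 0.01)))

fig, ax = plt.subplots(figsize=(8, 4.5))
ax.plot(dates, prices, label='Mock stock')
ax.set_title('Mock Stock Prices', fontsize=16)
ax.set_ylabel('Stock Price [USD]', fontsize=12)

# One major tick per month, labelled with the abbreviated month name.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.tick_params(labelsize=11)

ax.legend(loc='upper left')
fig.tight_layout()
```

The locator, formatter, font size, and legend calls are the kind of manual setup that the Figure arguments shown earlier (title, xtics, ylabel) are meant to replace.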
Jupyter
You can install MindLab with Jupyter support too:
pip install mindlab[jupyter]
This provides the lab command, which starts a pre-configured JupyterLab session with sensible
defaults and auto-completion:
$ lab
...
[I ... ServerApp] Jupyter Server is running at:
[I ... ServerApp] http://localhost:8888/lab?token={token}
[I ... ServerApp] or http://127.0.0.1:8888/lab?token={token}
Executing queries against various data sources is extremely simple using the provided MindLab Magics:
%%bigquery
SELECT title, `by` AS author, DATETIME(time_ts) AS posted_at, score
FROM bigquery-public-data.hacker_news.stories
ORDER BY time_ts NULLS LAST
LIMIT 3
| | title | author | posted_at | score |
|---|---|---|---|---|
| 0 | Y Combinator | pg | 2006-10-09 18:21:51 | 61 |
| 1 | A Student's Guide to Startups | phyllis | 2006-10-09 18:30:28 | 16 |
| 2 | Woz Interview: the early days of Apple | phyllis | 2006-10-09 18:40:33 | 7 |
Of course, true power lies in combining the magics with MindLab’s plotting capabilities:
%%bigquery scores
SELECT score, COUNT(*) AS frequency, EXTRACT(YEAR FROM time_ts) as `year`
FROM bigquery-public-data.hacker_news.stories
WHERE time_ts >= TIMESTAMP '2007-01-01' AND time_ts < TIMESTAMP '2015-01-01'
AND score IS NOT NULL
GROUP BY score, `year`
# In a separate notebook cell (the %%bigquery cell magic consumes its whole cell):
figure = Figure(
    title='Hacker News Score Distribution',
    xlabel='Score', ylabel='Frequency', xscale='log', yscale='log',
)
figure.scatter(scores.groupby('year'), cmap='viridis', alpha=0.5)
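To make the shape of the scores result concrete, the aggregation in the query above can be sketched in plain Python. The (year, score) pairs below are hypothetical sample rows, not data from the Hacker News table, and the sketch leaves out the date-range filter.

```python
from collections import Counter

# Hypothetical (year, score) pairs standing in for rows of the stories table.
stories = [
    (2007, 1), (2007, 1), (2007, 5),
    (2008, 1), (2008, None), (2008, 5), (2008, 5),
]

# Mirrors: WHERE score IS NOT NULL ... GROUP BY score, year with COUNT(*).
frequency = Counter(
    (score, year) for year, score in stories if score is not None
)

# One output row per (score, year) pair, as in the query result.
for (score, year), count in sorted(frequency.items()):
    print(f'score={score} year={year} frequency={count}')
```

Each (score, year) pair becomes one point in the scatter plot, and grouping by year gives one colored series per year.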
Spark
MindLab also provides Spark support (via PySpark) when installed with the spark extra:
pip install mindlab[spark]
This allows you to execute Spark queries locally, even if your data set is located in cloud storage:
from mindlab import spark_session
with spark_session() as spark:
    path = 'gs://test-data-mindlab-logikal-io/order_line_items.csv'
    data = spark.read.csv(path, inferSchema=True, header=True).toPandas()

data.head()
| | order_id | sku | quantity | unit_price | shipping |
|---|---|---|---|---|---|
| 0 | 4321 | starliner-spaceship | 2 | 1500.0 | 0.0 |
| 1 | 4321 | teal-t-shirt | 1 | 25.0 | 0.0 |
| 2 | 4322 | the-book-of-chaos | 1 | 10.0 | 5.0 |
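In the read.csv call above, header=True takes the column names from the first line of the file and inferSchema=True asks Spark to detect column types instead of reading everything as strings. The standard-library sketch below imitates that effect on the three rows shown. The CSV text is an assumed reconstruction of the file's layout, and the per-value casting is a simplification, not Spark's actual per-column inference.

```python
import csv
import io

# Assumed CSV layout, rebuilt from the rows in the table above.
raw = """order_id,sku,quantity,unit_price,shipping
4321,starliner-spaceship,2,1500.0,0.0
4321,teal-t-shirt,1,25.0,0.0
4322,the-book-of-chaos,1,10.0,5.0
"""


def infer(value):
    """Try int, then float, and fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# DictReader takes the column names from the first line, like header=True.
rows = [
    {name: infer(value) for name, value in row.items()}
    for row in csv.DictReader(io.StringIO(raw))
]
print(rows[0])
```

Without type inference every field would stay a string, so arithmetic such as quantity * unit_price would need explicit casts first.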
For more information, check out the Spark section of the documentation.