Data is not always useful, no matter how much of it you have.
There’s no mathematical tool to tell you if your hypothesis is true; you can only see whether it is consistent with the data, and if the data is sparse or unclear, your conclusions are uncertain.
Pachyderm - Reproducible Data Science at Scale.
Virgilio - Mentor for Data Science E-Learning.
Awesome Data Science with Python - Curated list of Python resources for data science.
nteract - Interactive computing suite for you.
Pandas - Powerful Python data analysis toolkit.
Datasette - Tool for exploring and publishing data.
Weld - High-performance runtime for data analytics applications.
Vaex - Out-of-core DataFrames for Python; visualize and explore big tabular data at a billion rows per second.
Ibis - Python data analysis framework for Hadoop and SQL engines.
Kyso - Data analytics knowledge hub.
Feather - Fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow.
ROOT system - Provides a set of object-oriented frameworks with all the functionality needed to handle and analyze large amounts of data efficiently.
Redash - Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Prefect - New workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
dbt - Data build tool. Analytics engineering workflow.
Apache Zeppelin - Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Apache NiFi - Easy to use, powerful, and reliable system to process and distribute data.
Koalas - Pandas API on Apache Spark.
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Dagster - Python library for building data applications: ETL, ML, Data Pipelines, and more.
CuPy - NumPy-like API accelerated with CUDA.
SaturnCloud - Manage Data Science applications so Data Scientists don't have to do DevOps.
Falcon - Interactive Visual Analysis for Big Data.
Dask Gateway - Provides a secure, multi-tenant server for managing Dask clusters.
Google Cloud Datalab - Interactive tools and developer experiences for Big Data on Google Cloud Platform.
Apache Kudu - Completes Hadoop's storage layer to enable fast analytics on fast data.