Data is not always useful, no matter how much of it you have.
There’s no mathematical tool to tell you if your hypothesis is true; you can only see whether it is consistent with the data, and if the data is sparse or unclear, your conclusions are uncertain.
Virgilio - Mentor for Data Science E-Learning.
Awesome Data Science with Python - Curated list of Python resources for data science.
nteract - Interactive computing suite for you.
Weld - High-performance runtime for data analytics applications.
Vaex - Out-of-Core DataFrames for Python; visualize and explore big tabular data at a billion rows per second.
Ibis - Python data analysis framework for Hadoop and SQL engines.
Kyso - Data analytics knowledge hub.
Feather - Fast, interoperable binary data frame storage for Python, R, and more, powered by Apache Arrow.
ROOT system - Provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data efficiently.
Prefect - New workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
Monument - High-productivity toolkit for predictions. AutoML for time series on any desktop, laptop or server.
Apache Zeppelin - Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Apache NiFi - Easy-to-use, powerful, and reliable system to process and distribute data.
Koalas - Pandas API on Apache Spark.
Prophet - Tool for producing high-quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
CuPy - NumPy-like API accelerated with CUDA.
SaturnCloud - Manage Data Science applications so Data Scientists don't have to do DevOps.
Falcon - Interactive Visual Analysis for Big Data.
Google Cloud DataLab - Interactive tools and developer experiences for Big Data on Google Cloud Platform.
Apache Kudu - Completes Hadoop's storage layer to enable fast analytics on fast data.
Jigsaw Labs - Learn Data Science part-time.
Data Science Ontology - Knowledge base about data science.
Data Engineering Project - Implementation of a data pipeline that consumes the latest news from RSS feeds and makes it available to users via a handy API.
Hex Technologies - Turn your notebooks into collaborative, sharable data apps and stories. No more loose CSVs, chart screenshots, or stale decks.
Amundsen by Lyft - Open source data discovery and metadata engine.
PandasGUI - GUI for analyzing Pandas DataFrames.
Holistics - Data Modeling & Self-Service BI Platform.
Neptune Python Client - Integrate your Python scripts with Neptune.
Synerise - Powerful ecosystem driven by Artificial Intelligence with real-time data orchestration created to drive business growth.
Dataquest - Learn R, Python and SQL for Data Science.
Carpentries - Teach foundational coding and data science skills to researchers worldwide.
Data Engineering Book - Accumulated knowledge and experience in the field of Data Engineering.
Data Science Lifecycle Process - Set of prescriptive steps and best practices to enable data science teams to consistently deliver value.
Data Science Lifecycle Base Repo - Template repository for data science projects using the Data Science Lifecycle Process.
Data Carpentry - Develops and teaches workshops on the fundamental data skills needed to conduct research.
Data Together Research - Research for tackling the general problem of data resilience & interactivity in all its forms.
ZeroCostDL4Mic - Google Colab-based, no-cost toolbox for exploring deep learning in microscopy.
Dataproofer - Proofreader for your data.
Wildland - Open data management protocol.
JetBrains DataSpell - IDE for Data Scientists.
node-rapids - GPU-accelerated data science and visualization in Node.js.
[Moving beyond “algorithmic bias is a data problem” (2021)](https://www.cell.com/patterns/fulltext/S2666-3899(21)00061-1) (HN)
Free data science resources - Thematic list of high-quality data science resources.