pandas holds data projects together like glue

Leverage the most used library for data analysis and manipulation in Python

The pandas DataFrame

Python pandas is the second-most popular Python framework, according to Stack Overflow's 2021 survey. It provides a versatile table construct, the pandas DataFrame, which is a higher-order sibling of NumPy's array object.

Deephaven is a compelling complement to pandas because it shares the same fundamental "table-first" development style. Deephaven's "StreamingTables", however, are designed to be dynamic (i.e., they can change in real time). Because both are Python libraries, users can move easily between pandas and Deephaven: DataFrames can serve as static sources for StreamingTables, and StreamingTables can be snapshotted and exported to DataFrames once or on a periodic basis.

pandas and NumPy

pandas DataFrames are backed by NumPy arrays, but they let users access rows and columns by name (label) rather than only by position. This makes DataFrames easier to work with than alternatives such as NumPy ndarrays, lists, or other objects. DataFrames also come with many built-in methods for moving-window operations, finding specific data, separating columns and rows from one another, and more.
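To make the label-based access concrete, here is a minimal sketch that uses only pandas. The trades table, its column names, and its index labels are hypothetical sample data chosen for illustration.

```python
import pandas as pd

# Hypothetical table of trades (illustrative data only)
trades = pd.DataFrame(
    {
        "Sym": ["AAPL", "MSFT", "AAPL", "GOOG", "MSFT"],
        "Price": [150.0, 300.0, 152.0, 2800.0, 305.0],
        "Size": [10, 5, 20, 1, 8],
    },
    index=["t1", "t2", "t3", "t4", "t5"],
)

# Select a column by its name instead of a positional index
prices = trades["Price"]

# Select a row by its index label, and one cell by row and column label
row_t3 = trades.loc["t3"]
size_t3 = trades.loc["t3", "Size"]

# Find specific data with a boolean mask
aapl = trades[trades["Sym"] == "AAPL"]

# Separate columns and rows from one another
sym_and_size = trades[["Sym", "Size"]]
first_two_rows = trades.iloc[:2]

print(size_t3)
print(aapl)
```

With a plain NumPy array, the same lookups would require remembering that "Size" is column 2 and that "t3" is row 2.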

pandas DataFrames, Deephaven Tables and AI/ML

The pandas Python library supports operations on data structures that map directly to Deephaven tables. pandas can also be used alongside Deephaven tables for artificial intelligence and machine learning (AI/ML) work: models can be trained on pandas DataFrames, and the trained models can then be applied in real time to Deephaven tables. Applying a model to a Deephaven table is usually done with functions that convert pandas DataFrames to and from Deephaven tables (see supporting documentation below).
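The train-then-apply pattern can be sketched with pandas and NumPy alone. This is a minimal illustration under stated assumptions, not Deephaven's method: the model is a simple least-squares line fit, the helper names fit_linear and predict are hypothetical, and the new_rows DataFrame stands in for data that, in a real Deephaven session, would come from converting a table to a DataFrame with Deephaven's pandas conversion functions (for example deephaven.pandas.to_pandas).

```python
import numpy as np
import pandas as pd

# Hypothetical historical training data: y depends linearly on x
train = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0],
                      "y": [1.0, 3.0, 5.0, 7.0, 9.0]})

def fit_linear(df: pd.DataFrame) -> np.ndarray:
    """Fit y = slope * x + intercept by ordinary least squares."""
    # Design matrix: the x column plus a column of ones for the intercept
    X = np.column_stack([df["x"].to_numpy(), np.ones(len(df))])
    coef, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
    return coef  # [slope, intercept]

def predict(df: pd.DataFrame, coef: np.ndarray) -> pd.DataFrame:
    """Return a copy of df with a 'y_pred' column from the fitted model."""
    out = df.copy()
    out["y_pred"] = coef[0] * out["x"] + coef[1]
    return out

coef = fit_linear(train)

# Stand-in for fresh rows; with Deephaven these would come from a
# (snapshotted) table converted to a DataFrame
new_rows = pd.DataFrame({"x": [5.0, 10.0]})
scored = predict(new_rows, coef)
print(scored)
```

Keeping training and scoring as separate functions means the same predict step can be rerun each time a new snapshot arrives.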

To further streamline real-time AI calculations within Python, converting a pandas DataFrame to a NumPy ndarray (for example, with DataFrame.to_numpy()) is fast, because DataFrames are themselves backed by NumPy ndarrays.
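A short sketch of the conversion follows. The features and mixed tables are hypothetical; the point is that a DataFrame with one numeric dtype converts to a 2-D array of that dtype, while mixed column types fall back to a common dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table with a single numeric dtype
features = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# DataFrame -> 2-D ndarray (rows x columns), ready for numeric libraries
arr = features.to_numpy()
print(arr.shape, arr.dtype)  # (3, 2) float64

# A single column (Series) converts to a 1-D ndarray the same way
col_a = features["a"].to_numpy()

# Mixed string and float columns fall back to a common dtype (object)
mixed = pd.DataFrame({"sym": ["X", "Y"], "px": [1.5, 2.5]})
print(mixed.to_numpy().dtype)
```

For model inputs, it is usually worth selecting only the numeric columns before calling to_numpy() so the result keeps a numeric dtype.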

Real-time calculations in Python

Together with the Deephaven Core query engine, pandas can be used to train AI/ML models, among many other practical data applications. In particular, pandas and Deephaven work well together for:

  • Training an AI/ML model on a DataFrame, which the user can then test on a real-time Deephaven table
  • Reordering and splitting table data with a pandas DataFrame
  • Moving window operations in pandas and Deephaven
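The reordering, splitting, and moving-window items in the list above can be sketched on the pandas side as follows. Deephaven has its own table operations for these tasks, which this sketch does not use; the ticks table and the AvgPrice2 column name are hypothetical.

```python
import pandas as pd

# Hypothetical price ticks for two symbols
ticks = pd.DataFrame({
    "Sym": ["A", "B", "A", "B", "A", "B"],
    "Price": [10.0, 20.0, 12.0, 22.0, 14.0, 24.0],
})

# Reordering: choose the column order and sort rows by a key
reordered = ticks[["Price", "Sym"]].sort_values("Price", ascending=False)

# Splitting: one DataFrame per symbol, keyed by symbol
by_sym = {sym: part.reset_index(drop=True) for sym, part in ticks.groupby("Sym")}

# Moving window: 2-row rolling mean of Price, computed within each symbol.
# groupby(...).rolling(...) returns a (Sym, original index) MultiIndex, so the
# Sym level is dropped to align the result back onto the original rows.
ticks["AvgPrice2"] = (
    ticks.groupby("Sym")["Price"]
    .rolling(window=2)
    .mean()
    .reset_index(level=0, drop=True)
)
print(ticks)
```

The first row of each symbol has no full 2-row window, so its rolling mean is NaN.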

Video: Interop with Python

Starter projects with pandas

Cool things you can do with pandas + Deephaven in real time:

  1. Combine, group and clean data
  2. Work with metadata
  3. Aggregate data

Deephaven examples

In addition to our Core code on GitHub, Deephaven features example data projects. Here are some of the best:

Develop with Deephaven Core

Need help? Ask a question on GitHub Discussions or Slack.

pandas docs: