Discover 3 tools tagged with Scala for Python data engineering.
Distributed Columnar Streaming Database
A distributed, columnar, versioned, and streaming database designed for real-time and batch analytics. FiloDB combines the benefits of columnar storage with streaming ingestion, making it suitable for time-series and event data workloads.
Distributed Machine Learning
An environment for quickly creating scalable, performant machine learning applications. Mahout provides a mathematically expressive Scala DSL and supports Apache Spark and Apache Flink backends for distributed linear algebra operations.
Spark's Graph Processing API
Apache Spark's API for graphs and graph-parallel computation. GraphX extends the Spark RDD with a graph abstraction, providing a set of fundamental operators and optimized algorithms for graph analytics like PageRank and connected components.
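To give a feel for the two algorithms named above, here is a minimal single-machine sketch in plain Python. It is not GraphX code and does not use any Spark API (GraphX itself is exposed through Scala on the JVM); the function names `pagerank` and `connected_components`, the sample edge list, and the damping factor of 0.85 are all assumptions chosen for illustration. The rank normalization here (ranks sum to 1) may also differ from what GraphX reports.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no out-links is spread evenly over all vertices.
        dangling = sum(ranks[v] for v in vertices if not out_links[v])
        new_ranks = {
            v: (1.0 - damping) / n + damping * dangling / n for v in vertices
        }
        # Each vertex splits its current rank equally among its out-neighbours.
        for src in vertices:
            for dst in out_links[src]:
                new_ranks[dst] += damping * ranks[src] / len(out_links[src])
        ranks = new_ranks
    return ranks


def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Edges are treated as undirected; labels are propagated until nothing changes.
    """
    labels = {v: v for edge in edges for v in edge}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            low = min(labels[a], labels[b])
            if labels[a] != low or labels[b] != low:
                labels[a] = labels[b] = low
                changed = True
    return labels


# Hypothetical sample graph: a 3-cycle (1 -> 2 -> 3 -> 1) plus a separate edge 4 -> 5.
edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
ranks = pagerank(edges)
components = connected_components(edges)
```

In this sketch `components` maps each vertex to the lowest vertex id in its component, and `ranks` maps each vertex to a score. A distributed engine partitions the same vertex and edge data across a cluster instead of holding it in local dictionaries.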