193 Useful Java Links for Databases, Search Engines, Big Data And Machine Learning



In this article, I’ll continue the series of useful Java links. This installment covers databases, search engines, big data, and machine learning.

In the previous article, I shared useful Java links for development.

 


 

1. Databases and storage:

Everything that simplifies interactions with a database.

| # | Name | Description | License |
|---|------|-------------|---------|
| 1 | Thinkaurelius Titan | Titan is a highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users, complex traversals, and analytic graph queries. | Apache 2 (business-friendly) |
| 2 | Apache Cassandra | Cassandra is a partitioned row store. Rows are organized into tables with a required primary key. | Apache 2 (business-friendly) |
| 3 | OrientDB | OrientDB is the first multi-model DBMS with document and graph engines. It can run distributed (multi-master), supports SQL, ACID transactions, full-text indexing, and reactive queries, and has a small memory footprint. OrientDB is licensed under the Apache 2 license, and development is driven by OrientDB LTD and a worldwide open-source community. | Apache 2 / CDDL 1 / Eclipse Distribution 1.0 (business-friendly) |
| 4 | Neo4j | Neo4j is the world’s leading graph database. It is a high-performance graph store with all the features expected of a mature and robust database, like a friendly query language and ACID transactions. | GNU 3 / GNU AGPLv3 (no linking with proprietary code) |
| 5 | MapDB | MapDB provides concurrent Maps, Sets, and Queues backed by disk storage or off-heap memory. It is a fast and easy-to-use embedded Java database engine. http://www.mapdb.org/ | Apache 2 (business-friendly) |
| 6 | Voldemort | An open-source clone of Amazon’s Dynamo. Voldemort is a distributed key-value storage system. | Apache 2 (business-friendly) |
| 7 | Alluxio (formerly Tachyon) | Memory-speed virtual distributed storage system. | Apache 2 (business-friendly) |
| 8 | OpenTSDB | A scalable, distributed time series database. | GNU 3 (no linking with proprietary code) |
| 9 | Hazelcast | Open-source in-memory data grid. | Apache 2 (business-friendly) |
| 10 | TinkerPop Blueprints | A property graph model interface. It provides implementations, test suites, and supporting extensions. Graph databases and frameworks that implement the Blueprints interfaces automatically support Blueprints-enabled applications. Likewise, Blueprints-enabled applications can plug-and-play different Blueprints-enabled graph backends. | BSD 3 (business-friendly) |
| 11 | Apache Lucene/Solr | Lucene is a search engine library; Solr is a search engine server that uses Lucene. | Apache 2 (business-friendly) |
| 12 | Java Chronicle | Java Indexed Record Chronicle: an ultra-low-latency, high-throughput, persisted, messaging and event-driven in-memory database. | Apache 2 (business-friendly) |
| 13 | ToroDB | Open-source NoSQL database that runs on top of an RDBMS. Compatible with the MongoDB protocol and APIs, but with support for native SQL, atomic operations, and reliable, durable backends like PostgreSQL. | GNU AGPLv3 (no linking with proprietary code) |
| 14 | Crate | Crate.IO: the fast, scalable, easy-to-use SQL database with native full-text search. https://crate.io/ | Apache 2 (business-friendly) |
| 15 | LinkedIn Pinot | Pinot is a realtime distributed OLAP datastore, used at LinkedIn to deliver scalable real-time analytics with low latency. | Apache 2 (business-friendly) |
| 16 | Solandra | Solandra is a real-time distributed search engine built on Apache Solr and Apache Cassandra. | Apache 2 (business-friendly) |
| 17 | VoltDB | VoltDB is a horizontally scalable, in-memory SQL RDBMS designed for applications that have extremely high read and write throughput requirements. | GNU AGPLv3 (no linking with proprietary code) |
| 18 | LevelDB | A rewrite (port) of LevelDB in Java. The goal is to have a feature-complete implementation that is within 10% of the performance of the C++ original and produces byte-for-byte exact copies of the C++ code… | Apache 2 (business-friendly) |
| 19 | KairosDB | KairosDB is a fast, distributed, scalable time series database written on top of Cassandra. | Apache 2 (business-friendly) |
| 20 | LinkedIn Sensei | Sensei is a distributed, elastic, real-time searchable database. | Apache 2 (business-friendly) |
| 21 | ElephantDB | Distributed database specialized in exporting key/value data from Hadoop. | BSD 3 (business-friendly) |
| 22 | Apache Drill | Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. | Apache 2 (business-friendly) |
| 23 | TinkerPop Rexster | Rexster is a graph server that exposes any Blueprints graph through HTTP/REST and a binary protocol called RexPro. Extensions support standard traversal goals such as search, score, rank, and, in concert, recommendation. | BSD 3 (business-friendly) |
| 24 | Tomcat Redis session manager | Redis-backed non-sticky session store for Apache Tomcat. | MIT (business-friendly) |
| 25 | Embulk | Embulk is a parallel bulk data loader that helps transfer data between various storage systems, databases, NoSQL stores, and cloud services. | Apache 2 (business-friendly) |
| 26 | H2 | H2 is the Java SQL database. Main features: very fast, open source, JDBC API; embedded and server modes; in-memory databases; browser-based console application; small footprint (around 1.5 MB jar file size). | Mozilla Public License 1.1 and Eclipse Public License v1.0 (business-friendly) |
| 27 | Apache Derby | Apache Derby, an Apache DB subproject, is an open-source relational database implemented entirely in Java. Derby provides an embedded JDBC driver that lets you embed Derby in any Java-based solution. | Apache 2 (business-friendly) |
| 28 | Apache Empire-db | Apache Empire-db is a lightweight relational database abstraction layer and data persistence component. | Apache 2 (business-friendly) |
| 29 | Apache Ignite | Apache Ignite is an in-memory data fabric providing in-memory data caching, partitioning, processing, and querying components. | Apache 2 (business-friendly) |
| 30 | Tarantool | Tarantool is an open-source NoSQL database management system and Lua application server. It maintains databases in memory and ensures crash resistance with write-ahead logging. It includes a Lua interpreter and interactive console but also accepts connections from programs in several other languages. | BSD licenses (business-friendly) |
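
Several entries above, Tarantool among them, keep data in memory and rely on write-ahead logging to survive a crash. The sketch below illustrates that general idea in plain Java; it is not Tarantool's implementation or API. The class name `WalStore`, the tab-separated record format, and the one-record-per-line log file are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory key-value store that logs every write before applying it.
 * Assumes keys and values contain no tabs or newlines.
 */
final class WalStore {
    private final Path log;
    private final Map<String, String> data = new HashMap<>();

    WalStore(Path log) throws IOException {
        this.log = log;
        if (Files.exists(log)) {
            // Recovery: replay the log from the start to rebuild the in-memory state.
            for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
                apply(line);
            }
        }
    }

    void put(String key, String value) throws IOException {
        String record = "PUT\t" + key + "\t" + value;
        append(record); // 1. write the change to the log first
        apply(record);  // 2. only then change the in-memory map
    }

    void delete(String key) throws IOException {
        String record = "DEL\t" + key;
        append(record);
        apply(record);
    }

    String get(String key) {
        return data.get(key);
    }

    private void append(String record) throws IOException {
        // SYNC asks the OS to flush each write to the storage device.
        Files.write(log, List.of(record), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
    }

    private void apply(String record) {
        String[] parts = record.split("\t", 3);
        if (parts[0].equals("PUT") && parts.length == 3) {
            data.put(parts[1], parts[2]);
        } else if (parts[0].equals("DEL") && parts.length == 2) {
            data.remove(parts[1]);
        }
    }
}
```

Creating a second `WalStore` on the same log file stands in for a restart: the constructor replays the log and ends up with the same map contents. A production system would also compact or snapshot the log, which this sketch leaves out.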

 

Distributed Databases:

Databases in a distributed system that appear to applications as a single data source.

| # | Name | Description | License |
|---|------|-------------|---------|
| 1 | Apache Cassandra | The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. | Apache 2 (business-friendly) |
| 2 | Apache HBase | Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. | Apache 2 (business-friendly) |
| 3 | Druid | Druid is a fast column-oriented distributed data store. http://druid.io | Apache 2 (business-friendly) |
| 4 | Infinispan | Infinispan is a distributed in-memory key/value data store with an optional schema. It can be used both as an embedded Java library and as a language-independent service accessed remotely over a variety of protocols (HotRod, REST, Memcached, and WebSockets). | Apache 2 (business-friendly) |
| 5 | OpenTSDB | The scalable time series database. Store and serve massive amounts of time series data without losing granularity. http://opentsdb.net | GNU 3 (no linking with proprietary code) |
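
The definition at the top of this section, several nodes that look like a single data source to the application, can be illustrated with a small Java sketch. It is a simplification and not how any of the listed systems is implemented: the class name `ShardedStore` is hypothetical, the "nodes" are plain in-process maps, and keys are placed by simple hash-modulo rather than the consistent hashing and replication that real clusters typically use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical facade: callers see one key-value store, data lives on several nodes. */
final class ShardedStore {
    // Each map stands in for a separate machine in this sketch.
    private final List<Map<String, String>> nodes = new ArrayList<>();

    ShardedStore(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("need at least one node");
        }
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    /** Picks the node for a key; floorMod keeps the index non-negative for negative hash codes. */
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }

    /** Number of entries held by one node, useful for checking how keys are spread. */
    int sizeOfNode(int index) {
        return nodes.get(index).size();
    }
}
```

The application only calls `put` and `get`; which node holds a key is decided inside the facade. Hash-modulo placement is the simplest choice, but it moves most keys whenever the node count changes, which is one reason real systems use more elaborate schemes.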

 
