Apache Solr

Self-Hosted

Fast, scalable search engine for full‑text and vector queries

Overview

Apache Solr is a high-performance, open-source search platform built on the Apache Lucene search library. It exposes an HTTP API for indexing, querying, and managing data, and provides faceting, filtering, full-text search with configurable analyzers, vector similarity search, and geospatial queries. Solr is written in Java and ships as a standalone server that runs on the JVM. It is usually deployed as its own service, often behind a reverse proxy, and Java applications can also embed it through EmbeddedSolrServer.

Architecture & Technical Stack

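  • Language & Runtime: Java on the JVM. Solr 9.x requires Java 11 or later, and newer major releases may raise that minimum, so check the release notes for your version. Clients in languages such as Python or Node.js integrate over HTTP.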
  • Core Engine: Apache Lucene for indexing and query execution, with a thin Solr wrapper that adds schema management, distributed coordination, and REST endpoints.
  • Storage: Uses Lucene’s file-based index format on local or network-attached storage (e.g., EBS, NFS). HDFS is supported through an optional module, and object stores such as S3 or GCS can serve as backup repositories.
  • Coordination & Cluster Management: In SolrCloud mode, Apache ZooKeeper stores cluster state and configuration and drives leader election. ZooKeeper holds metadata only; index data stays on each node’s storage.
  • Configuration: XML files (solrconfig.xml, plus managed-schema.xml or a hand-edited schema.xml), the JSON-based Schema API and Config API, or schemaless mode, where Solr guesses field types from incoming documents.
  • Cluster API: Admin REST endpoints expose cluster health, shard allocation, replica status, and configuration updates.
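The cluster endpoints mentioned above are plain HTTP calls to the Collections API. A minimal sketch of how such URLs could be assembled, assuming a node at localhost:8983 and a hypothetical collection named products (both are illustrative, not taken from this page):

```python
from urllib.parse import urlencode

# Assumed local node; adjust host and port for a real cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, **params}
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


# Create a collection split into 2 shards, each with 2 replicas.
create_url = collections_api_url(
    "CREATE", name="products", numShards=2, replicationFactor=2
)

# Ask for the shard and replica state of that collection.
status_url = collections_api_url("CLUSTERSTATUS", collection="products")

print(create_url)
print(status_url)
```

The sketch only builds the URLs; issuing them (for example with urllib.request.urlopen) requires a running SolrCloud cluster.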

Core Capabilities & APIs

  • Query API: Supports the standard Lucene query syntax, function queries, and the JSON Request API with its JSON Query DSL.
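  • Indexing API: Batch add, update, and delete through the /update handler, which accepts JSON, XML, or CSV. Hard commits make changes durable, while soft commits and commitWithin make them searchable in near real time.
  • Faceting & Aggregation: Field, range, and pivot facets, the JSON Facet API for nested aggregations, and the Stats component for numeric summaries such as min, max, mean, and percentiles.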
  • Vector Search: Approximate nearest neighbor (ANN) search over dense vector fields, backed by Lucene’s HNSW graph index and queried through the knn query parser, enabling semantic search on embeddings.
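  • Geospatial: Radius and bounding-box filtering with the geofilt and bbox query parsers, distance sorting and scoring with the geodist() function, and polygon queries on spatial fields that support shapes.
  • Plugins & Extensions: Custom query parsers, similarity models, and request handlers can be added as JARs. Solr Cell uses Apache Tika to extract text and metadata from rich documents such as PDF and Office files.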
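To make the query, faceting, and vector capabilities above more concrete, here is a small sketch that builds a JSON Request API body and a knn parser query string. The field names title, category, and embedding are hypothetical, and the text is inserted without escaping, so treat this as an illustration rather than a hardened client.

```python
import json


def knn_query(field, vector, top_k=10):
    """Format a query string for Solr's knn query parser."""
    values = ", ".join(str(v) for v in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


def search_body(text, category=None, limit=10):
    """Build a JSON Request API body: text query, optional filter, terms facet."""
    body = {
        # Hypothetical field "title"; special characters in text are not escaped.
        "query": f"title:({text})",
        "limit": limit,
        # JSON Facet API: bucket matching documents by the "category" field.
        "facet": {"categories": {"type": "terms", "field": "category"}},
    }
    if category is not None:
        body["filter"] = [f'category:"{category}"']
    return body


print(json.dumps(search_body("wireless headphones", category="audio"), indent=2))
print(knn_query("embedding", [0.12, 0.5, 0.33], top_k=5))
```

The body would be POSTed as JSON to a collection's /select or /query handler; the knn string can be used as the main query or combined with filters.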

Deployment & Infrastructure

  • Self‑Hosting: Runs as a single JVM process; requires Java, optional ZooKeeper for clustering.
  • Scalability: SolrCloud partitions data into shards and replicates each shard across nodes, allowing horizontal scaling. Load balancing is achieved via client‑side routing or external proxies (NGINX, Envoy).
  • Containerization: Official Docker images expose port 8983. ZooKeeper runs separately, and its address is passed to Solr through the ZK_HOST environment variable.
  • Kubernetes: The Solr Operator automates StatefulSet creation, persistent volume provisioning, and rolling upgrades. It defines CRDs such as SolrCloud, SolrBackup, and SolrPrometheusExporter.
  • High Availability: In SolrCloud, leader election, automatic failover to healthy replicas, and replica recovery are built in. Backups can be taken through the Collections API (BACKUP and RESTORE actions) or with volume-level tools like Velero.
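Two of the deployment details above lend themselves to a short sketch: the ZooKeeper connection string a SolrCloud node is started with, and a Collections API backup call. The host names, the /solr chroot, the collection name, and the backup path below are all assumptions chosen for illustration.

```python
from urllib.parse import urlencode


def zk_host(hosts, chroot="/solr"):
    """Join ZooKeeper host:port pairs into a ZK_HOST-style connection string."""
    return ",".join(hosts) + chroot


def backup_url(base, collection, backup_name, location):
    """Build a Collections API BACKUP URL (not sent here)."""
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,  # must be a path every node can reach
    }
    return f"{base}/admin/collections?{urlencode(params)}"


# Hypothetical three-node ZooKeeper ensemble.
print(zk_host(["zk1:2181", "zk2:2181", "zk3:2181"]))

# Hypothetical backup of a "products" collection to shared storage.
print(backup_url("http://localhost:8983/solr", "products", "nightly", "/var/solr-backups"))
```

The connection string would typically be supplied through the ZK_HOST environment variable of each Solr container.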

Integration & Extensibility

  • Client Libraries: Official Java client (solrj), as well as community clients for Python, Go, Ruby, and JavaScript.
  • Webhooks & Callbacks: Solr has no dedicated webhook component. Event listeners registered in solrconfig.xml (e.g., postCommit, newSearcher) or custom plugins can notify external systems.
  • Custom Plugins: Developers can write Java classes that implement SolrRequestHandler or extend UpdateRequestProcessorFactory or Lucene’s Similarity, then deploy them as JARs.
  • RESTful API: Enables integration with CI/CD pipelines, monitoring (Prometheus exporter), and custom dashboards.
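  • External Data Sources: The Data Import Handler, which pulled from JDBC and other sources, was deprecated in Solr 8.6 and removed from the main distribution in Solr 9; it continues as a community-maintained package. Current setups typically push documents to Solr from external ETL jobs or client code.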
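Because every integration point above goes through HTTP, a client needs little more than the standard library. The following is a minimal sketch, not an official client: it builds (without sending) an indexing request and a search request, and the base URL, collection name, and document fields are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def update_request(base, collection, docs, commit_within_ms=1000):
    """Build a POST that indexes a list of documents via the /update handler."""
    params = urlencode({"commitWithin": commit_within_ms})
    return Request(
        f"{base}/{collection}/update?{params}",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def select_request(base, collection, q, rows=10):
    """Build a GET against the /select handler."""
    params = urlencode({"q": q, "rows": rows})
    return Request(f"{base}/{collection}/select?{params}", method="GET")


base = "http://localhost:8983/solr"  # assumed local node
docs = [{"id": "1", "title": "Solr in practice"}]  # hypothetical documents

index_req = update_request(base, "products", docs)
search_req = select_request(base, "products", "title:solr")

print(index_req.get_method(), index_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

Passing either object to urllib.request.urlopen would send it to a running Solr node; SolrJ or a community client is usually a better fit for production code.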

Developer Experience

  • Configuration: While the XML files are verbose, schemaless mode reduces boilerplate, and the Schema API and Config API accept JSON for programmatic changes.
  • Documentation: The Solr Reference Guide covers architecture, API usage, and best practices.
  • Community & Support: Active mailing lists, Slack channel, and a robust JIRA issue tracker.
  • Testing: Built-in unit tests and integration tests for core components. Since Solr 9 the project builds with Gradle, so the test suite runs with ./gradlew test.
  • Licensing: Apache License 2.0 allows commercial use without royalties, encouraging enterprise adoption.

Use Cases

  1. Enterprise Search: Index corporate documents (PDF, Office) with Solr Cell and expose a faceted search UI for employees.
  2. E‑commerce Product Search: Combine full‑text, faceting, and vector embeddings for product recommendations.
  3. Geospatial Analytics: Store location data (lat/long) and perform proximity searches for logistics or real‑estate platforms.
  4. Log & Event Search: Ingest log streams via Logstash, index into SolrCloud, and provide Kibana-style dashboards.
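For the geospatial use case, a proximity search comes down to a geofilt filter plus a geodist() sort. The sketch below assumes a hypothetical spatial field named location and also includes a haversine helper for sanity-checking distances on the client side; it is an illustration, not Solr's internal distance code.

```python
import math


def proximity_params(field, lat, lon, radius_km):
    """Build select parameters: filter to a radius, sort nearest first."""
    return {
        "q": "*:*",
        "fq": f"{{!geofilt sfield={field} pt={lat},{lon} d={radius_km}}}",
        "sfield": field,
        "pt": f"{lat},{lon}",
        "sort": "geodist() asc",
    }


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


params = proximity_params("location", 45.15, -93.85, 5)
print(params["fq"])
print(round(haversine_km(45.15, -93.85, 45.17, -93.87), 2))
```

The geofilt distance d is expressed in kilometers, so the haversine helper uses the same unit.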

Information

  • Category: other
  • License: Apache-2.0
  • Stars: 1.5k
  • Author: apache
  • Last Updated: 15 hours ago

Technical Specs

  • Pricing: Open Source
  • Database: None
  • Docker: Official
  • Supported OS: Linux, Windows, macOS, Docker