12 changes: 12 additions & 0 deletions ai/select-algorithm-java/.gitignore
@@ -0,0 +1,12 @@
# Build output
target/

# IDE
.idea/
*.iml
.classpath
.project
.settings/

# Environment
.env
107 changes: 107 additions & 0 deletions ai/select-algorithm-java/README.md
@@ -0,0 +1,107 @@
# DocumentDB Vector Index Algorithm Comparison (Java)

This sample compares DocumentDB vector index algorithms (DiskANN, HNSW, IVF) across similarity functions (COS, L2, IP) to help you choose the best configuration for your use case.
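For reference, the three similarity functions named above can be written out directly. The following is a minimal, stdlib-only sketch; the class and method names are illustrative and are not taken from the sample's source.

```java
public class SimilarityFunctions {

    /** Inner product (IP): sum of element-wise products; larger means more similar. */
    static double innerProduct(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Euclidean (L2) distance: straight-line distance; smaller means more similar. */
    static double l2Distance(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity (COS): inner product divided by the product of the vector lengths. */
    static double cosineSimilarity(double[] a, double[] b) {
        double norms = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return innerProduct(a, b) / norms;
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.printf("IP=%.4f L2=%.4f COS=%.4f%n",
                innerProduct(a, b), l2Distance(a, b), cosineSimilarity(a, b));
    }
}
```

Cosine similarity ignores vector magnitude, while L2 and IP are sensitive to it, which is one reason the same query can rank documents differently under each function.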

## Overview

Vector indexes improve search performance by organizing vectors for efficient similarity searches. This sample:

- Creates collections per algorithm/similarity combination
- Configures algorithm-specific index parameters
- Measures query latency for each configuration
- Displays a comparison table to guide your selection
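The latency measurement step can be sketched as below. This diff does not show how the sample times its queries, so the helper here is an assumption: it runs a few untimed warm-up calls, then averages wall-clock time over several timed calls. `LatencyTimer` and `averageMillis` are hypothetical names, and the workload in `main` is a stand-in for a real vector query.

```java
public class LatencyTimer {

    /**
     * Runs the query warmupRuns times without timing, then timedRuns times
     * with timing, and returns the average latency per timed run in milliseconds.
     */
    static double averageMillis(Runnable query, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            query.run(); // warm-up: lets connections and JIT compilation settle
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            query.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / timedRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the sample this would be a vector search call.
        Runnable fakeQuery = () -> {
            double sum = 0.0;
            for (int i = 0; i < 100_000; i++) {
                sum += Math.sqrt(i);
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        System.out.printf("Average latency: %.3f ms%n", averageMillis(fakeQuery, 3, 10));
    }
}
```

Using `System.nanoTime()` rather than `System.currentTimeMillis()` avoids jumps from wall-clock adjustments when measuring short intervals.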

## Prerequisites

- [Java 21 or higher](https://learn.microsoft.com/java/openjdk/download)
- [Maven 3.6 or higher](https://maven.apache.org/download.cgi)
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli)
- Azure subscription with:
  - Azure DocumentDB (MongoDB vCore) cluster
  - Azure OpenAI with text-embedding-3-small model
  - Managed identity configured for passwordless auth

## Setup

1. Copy `.env.example` to `.env`:
   ```bash
   cp .env.example .env
   ```

2. Update `.env` with your Azure resource values. The sample uses the
   [dotenv-java](https://github.com/cdimascio/dotenv-java) library to load
   variables from `.env` at startup, falling back to system environment
   variables when the file is absent.

3. Sign in to Azure for passwordless authentication:
   ```bash
   az login
   ```

4. Compile the project:
   ```bash
   mvn clean compile
   ```

> **Note:** This sample does not include a Maven Wrapper (`mvnw`). Install
> Maven 3.6+ from <https://maven.apache.org/download.cgi> and ensure `mvn` is
> on your PATH.
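The configuration lookup described in step 2 is handled by dotenv-java in the sample. The stdlib-only sketch below only illustrates the idea of combining a `.env` file with process environment variables; it does not use or reproduce the dotenv-java API. The class and method names are hypothetical, and the precedence shown (a process environment variable wins over the file value) is an assumption that should be checked against the library's documentation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EnvLookup {

    /** Parses KEY=VALUE lines, skipping blank lines and lines starting with '#'. */
    static Map<String, String> parseDotenv(List<String> lines) {
        Map<String, String> values = new HashMap<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // not a KEY=VALUE line
            }
            values.put(line.substring(0, eq).strip(), line.substring(eq + 1).strip());
        }
        return values;
    }

    /**
     * Assumed precedence: process environment first, then the .env file,
     * then the supplied default.
     */
    static String resolve(String key, Map<String, String> systemEnv,
                          Map<String, String> fileValues, String defaultValue) {
        String value = systemEnv.get(key);
        if (value == null) {
            value = fileValues.get(key);
        }
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        // An empty map models the case where the .env file is absent.
        Map<String, String> fileValues = parseDotenv(List.of("# sample", "ALGORITHM=hnsw"));
        System.out.println(resolve("ALGORITHM", System.getenv(), fileValues, "all"));
        System.out.println(resolve("SIMILARITY", System.getenv(), fileValues, "COS"));
    }
}
```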

## Usage

Run the comparison for a single algorithm or all of them, with a single similarity function or all of them:

```bash
# Compare all algorithms with cosine similarity
mvn exec:java -Dexec.mainClass="com.azure.documentdb.selectalgorithm.SelectAlgorithm"

# Compare only DiskANN with all similarity functions
ALGORITHM=diskann SIMILARITY=all mvn exec:java -Dexec.mainClass="com.azure.documentdb.selectalgorithm.SelectAlgorithm"

# Compare HNSW with L2 (Euclidean) distance
ALGORITHM=hnsw SIMILARITY=L2 mvn exec:java -Dexec.mainClass="com.azure.documentdb.selectalgorithm.SelectAlgorithm"

# Compare all algorithms and all similarity functions
ALGORITHM=all SIMILARITY=all mvn exec:java -Dexec.mainClass="com.azure.documentdb.selectalgorithm.SelectAlgorithm"
```

### Environment Variables

- `ALGORITHM`: Which algorithm(s) to test
  - `all` (default): Test DiskANN, HNSW, and IVF
  - `diskann`: Test only DiskANN
  - `hnsw`: Test only HNSW
  - `ivf`: Test only IVF

- `SIMILARITY`: Which similarity function(s) to test
  - `COS` (default): Cosine similarity
  - `L2`: Euclidean distance
  - `IP`: Inner product
  - `all`: Test all similarity functions
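The selection rules above can be sketched as a small parser that expands `all` and applies the documented defaults. This is not the sample's own code: the class and method names are hypothetical, and the case-insensitive matching and the exception on unknown values are assumptions.

```java
import java.util.List;
import java.util.Locale;

public class SelectionParser {

    static final List<String> ALGORITHMS = List.of("diskann", "hnsw", "ivf");
    static final List<String> SIMILARITIES = List.of("COS", "L2", "IP");

    /** Expands the ALGORITHM setting; a missing or blank value defaults to "all". */
    static List<String> algorithms(String value) {
        String v = (value == null || value.isBlank())
                ? "all"
                : value.strip().toLowerCase(Locale.ROOT);
        if (v.equals("all")) {
            return ALGORITHMS;
        }
        if (ALGORITHMS.contains(v)) {
            return List.of(v);
        }
        throw new IllegalArgumentException("Unknown ALGORITHM: " + value);
    }

    /** Expands the SIMILARITY setting; a missing or blank value defaults to "COS". */
    static List<String> similarities(String value) {
        String v = (value == null || value.isBlank())
                ? "COS"
                : value.strip().toUpperCase(Locale.ROOT);
        if (v.equals("ALL")) {
            return SIMILARITIES;
        }
        if (SIMILARITIES.contains(v)) {
            return List.of(v);
        }
        throw new IllegalArgumentException("Unknown SIMILARITY: " + value);
    }

    public static void main(String[] args) {
        // Every algorithm/similarity pair that would be tested for the current environment.
        for (String algorithm : algorithms(System.getenv("ALGORITHM"))) {
            for (String similarity : similarities(System.getenv("SIMILARITY"))) {
                System.out.println(algorithm + " / " + similarity);
            }
        }
    }
}
```

With both variables unset, the loop yields three combinations (each algorithm with `COS`); with both set to `all`, it yields nine.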

## Algorithm Characteristics

### DiskANN
- Disk-based for large datasets
- Good balance of speed and accuracy
- Parameters: maxDegree=32, lBuild=50, lSearch=100

### HNSW
- Memory-based hierarchical graph
- Excellent for real-time applications
- Parameters: m=16, efConstruction=64, efSearch=80

### IVF
- Cluster-based partitioning
- Fast search via centroids
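- Parameters: numLists=1, nProbes=1 (with a single list, every vector falls in the one cluster that is probed, so the search is effectively exhaustive; this suits only a small sample dataset)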
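The parameters listed for the three algorithms fall into two groups: values fixed when the index is built (`maxDegree`, `lBuild`, `m`, `efConstruction`, `numLists`) and values supplied per query (`lSearch`, `efSearch`, `nProbes`). The sketch below builds both groups as plain maps. The field names `kind`, `similarity`, and `dimensions`, and the `vector-diskann` / `vector-hnsw` / `vector-ivf` kind values, are assumed to mirror the `cosmosSearchOptions` shape from the vector search documentation and should be verified there. The class and method names are hypothetical, and 1536 is the default output size of text-embedding-3-small.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class VectorIndexOptions {

    // Default embedding size of text-embedding-3-small.
    static final int DIMENSIONS = 1536;

    /** Build-time options for the given algorithm (field names are assumptions). */
    static Map<String, Object> indexOptions(String algorithm, String similarity) {
        Map<String, Object> options = new LinkedHashMap<>();
        switch (algorithm.toLowerCase(Locale.ROOT)) {
            case "diskann" -> {
                options.put("kind", "vector-diskann");
                options.put("maxDegree", 32);
                options.put("lBuild", 50);
            }
            case "hnsw" -> {
                options.put("kind", "vector-hnsw");
                options.put("m", 16);
                options.put("efConstruction", 64);
            }
            case "ivf" -> {
                options.put("kind", "vector-ivf");
                options.put("numLists", 1);
            }
            default -> throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
        options.put("similarity", similarity);
        options.put("dimensions", DIMENSIONS);
        return options;
    }

    /** Query-time tuning parameter for the given algorithm. */
    static Map<String, Object> searchOptions(String algorithm) {
        return switch (algorithm.toLowerCase(Locale.ROOT)) {
            case "diskann" -> Map.of("lSearch", 100);
            case "hnsw" -> Map.of("efSearch", 80);
            case "ivf" -> Map.of("nProbes", 1);
            default -> throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        };
    }

    public static void main(String[] args) {
        for (String algorithm : new String[] {"diskann", "hnsw", "ivf"}) {
            System.out.println(indexOptions(algorithm, "COS") + " search=" + searchOptions(algorithm));
        }
    }
}
```

Keeping the two groups separate reflects that changing a build-time value requires rebuilding the index, while a query-time value can vary per search.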

## Output

The sample prints a comparison table showing the query latency of each algorithm/similarity combination, so you can compare configurations side by side before choosing one.
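A table of this kind can be produced with fixed-width formatting, as in the sketch below. The column layout and the `ComparisonTable` and `Row` names are assumptions rather than the sample's actual output format, and the latencies in `main` are placeholder values chosen only to show the layout, not measurements.

```java
import java.util.List;
import java.util.Locale;

public class ComparisonTable {

    /** One measured configuration. */
    record Row(String algorithm, String similarity, double latencyMs) {}

    /** Formats the rows as a fixed-width text table with a header line. */
    static String format(List<Row> rows) {
        StringBuilder table = new StringBuilder();
        table.append(String.format(Locale.ROOT, "%-10s %-10s %12s%n",
                "Algorithm", "Similarity", "Latency (ms)"));
        for (Row row : rows) {
            table.append(String.format(Locale.ROOT, "%-10s %-10s %12.2f%n",
                    row.algorithm(), row.similarity(), row.latencyMs()));
        }
        return table.toString();
    }

    public static void main(String[] args) {
        // Placeholder latencies for layout only; these are not real results.
        List<Row> rows = List.of(
                new Row("diskann", "COS", 0.0),
                new Row("hnsw", "COS", 0.0),
                new Row("ivf", "COS", 0.0));
        System.out.print(format(rows));
    }
}
```

Passing `Locale.ROOT` keeps the decimal separator a period regardless of the machine's locale, which makes the output easier to diff across environments.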

## Further Resources

- [Azure DocumentDB Documentation](https://learn.microsoft.com/azure/documentdb/)
- [Vector Search in DocumentDB](https://learn.microsoft.com/azure/documentdb/vector-search)
- [MongoDB Java Driver Documentation](https://mongodb.github.io/mongo-java-driver/)
78 changes: 78 additions & 0 deletions ai/select-algorithm-java/pom.xml
@@ -0,0 +1,78 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.azure.documentdb.samples</groupId>
    <artifactId>select-algorithm-java</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>Azure DocumentDB Vector Algorithm Comparison</name>
    <description>Compare DocumentDB vector index algorithms (DiskANN, HNSW, IVF) using Java SDK</description>

    <properties>
        <maven.compiler.source>21</maven.compiler.source>
        <maven.compiler.target>21</maven.compiler.target>
        <maven.compiler.release>21</maven.compiler.release>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.azure</groupId>
                <artifactId>azure-sdk-bom</artifactId>
                <version>1.2.29</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongodb-driver-sync</artifactId>
            <version>5.6.2</version>
        </dependency>
        <dependency>
            <groupId>com.azure</groupId>
            <artifactId>azure-identity</artifactId>
        </dependency>
        <dependency>
            <groupId>com.azure</groupId>
            <artifactId>azure-ai-openai</artifactId>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.18.2</version>
        </dependency>
        <dependency>
            <groupId>io.github.cdimascio</groupId>
            <artifactId>dotenv-java</artifactId>
            <version>3.0.2</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>2.0.17</version>
            <scope>runtime</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.13.0</version>
                <configuration>
                    <release>21</release>
                    <compilerArgs>
                        <arg>-Xlint:all</arg>
                    </compilerArgs>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>