Semantic Search¶

Search conversations by meaning, not just keywords. Powered by local embeddings via Ollama — your data never leaves your machine.

Why Semantic Search?¶

When searching past conversations, you rarely remember exact keywords — you remember the subject or concept you discussed. Keyword search (conv search) requires exact text matches, which often misses what you're looking for.

Semantic search embeds your conversations into vector space and finds matches by meaning. Searching for "auth token refresh" will find conversations about authentication and OAuth even if those exact words weren't used.

Prerequisites¶

Semantic search requires Ollama running locally with an embedding model.

1. Install Ollama¶

macOS (Homebrew)Linux / WSL

brew install ollama

curl -fsSL https://ollama.com/install.sh | sh

2. Start Ollama¶

# Start as a background service (recommended — auto-starts on login)
brew services start ollama

# Or run manually in foreground
ollama serve

3. Pull an embedding model¶

# Default — best quality, 32K context, 100+ languages
ollama pull qwen3-embedding:0.6b

# Alternative — faster indexing, smaller download
ollama pull nomic-embed-text

Model	Size	Dimensions	Context	Indexing speed	Languages
`qwen3-embedding:0.6b` (default)	639 MB	1024	32K tokens	~1.5s/conv	100+
`nomic-embed-text`	274 MB	768	8K tokens	~0.1s/conv	English-focused

qwen3-embedding:0.6b produces higher-quality embeddings with a much larger context window. nomic-embed-text is significantly faster to index and uses less disk/memory — a good choice if you have many conversations or limited hardware.

To use nomic instead of the default:

export TCLAUDE_EMBED_MODEL=nomic-embed-text

Or pass --model nomic-embed-text to individual commands.

4. Verify¶

curl -s http://localhost:11434/api/embed \
  -d '{"model": "qwen3-embedding:0.6b", "input": "hello world"}' \
  | python3 -c "import sys,json; e=json.load(sys.stdin)['embeddings'][0]; print(f'{len(e)} dimensions')"
# Expected: 1024 dimensions

Getting Started¶

Build the index¶

Before searching, you need to index your conversations:

# Index conversations for the current project
tclaude conv index-embeddings

# Index all projects
tclaude conv index-embeddings -g

Indexing chunks each conversation into pieces (metadata + content at turn boundaries), embeds them via Ollama, and stores the vectors in SQLite. It's incremental — only new or modified conversations are re-indexed on subsequent runs.

Indexing 48 conversations (15 already indexed)...
  [1/48] a9a330a8 please unzip all the zip files - 1 chunks (66ms)
  [2/48] 1211af48 hello - 2 chunks (44ms)
  ...
Done in 5.6s: 48 conversations, 78 chunks, 0 errors

Search¶

# Search by meaning
tclaude conv search-embeddings "auth token refresh"

# Search all projects
tclaude conv search-embeddings -g "kubernetes deployment"

# Show matching chunk text
tclaude conv search-embeddings -l "database migration"

# JSON output
tclaude conv search-embeddings --json "API rate limiting"

Aliases: sem, semantic, ai, ask, ai-search

Interactive Watch Mode¶

Semantic search is integrated into the interactive conversation browser (tclaude conv ls -w).

Usage¶

Press s to start a semantic search
If Ollama is running, tclaude checks how many conversations need indexing
If unindexed conversations exist, you'll be prompted: "N of M conversations not indexed. Index now? [y/n]"
- y — indexes with a progress bar, then shows the search input
- n — skips indexing and searches what's already indexed
Type your query in the cyan Semantic: [query_] input and press Enter
Results are displayed sorted by similarity, with a SCORE column added to the table
You can select and attach to any result with Enter, just like normal mode

Exiting Semantic Mode¶

Key	Action
`Esc`	Exit semantic results, return to normal listing
`/`	Switch to text search
`1`–`5`	Sort by column (exits semantic mode)
`s`	Start a new semantic search

CLI Reference¶

`tclaude conv index-embeddings`¶

Build or update the semantic search index.

tclaude conv index-embeddings          # index current project
tclaude conv index-embeddings -g       # index all projects
tclaude conv index-embeddings --reindex  # force full rebuild
tclaude conv index-embeddings --model mxbai-embed-large  # use a different model

Flag	Description
`-g, --global`	Index conversations from all projects
`--reindex`	Wipe and rebuild all embeddings
`--model`	Embedding model name (default: `qwen3-embedding:0.6b`)
`--url`	Ollama API base URL (default: `http://localhost:11434`)

`tclaude conv search-embeddings`¶

Search conversations by meaning.

tclaude conv search-embeddings "query"
tclaude conv sem "query"          # shorthand alias

Flag	Description
`-g, --global`	Search across all projects
`-n`	Number of results (default: 10)
`-l, --long`	Show matching chunk text
`--json`	JSON output
`--model`	Embedding model name (default: `qwen3-embedding:0.6b`)
`--url`	Ollama API base URL (default: `http://localhost:11434`)

How It Works¶

Chunking¶

Each conversation is split into chunks for embedding:

Metadata chunk (always generated) — title, summary, first prompt, and project path. This is the "what was this conversation about" anchor.
Content chunks — user + assistant message pairs, grouped at turn boundaries. Chunks target ~24K characters (~8K tokens). Short conversations produce a single content chunk.

Ranking¶

Conversations are ranked by the maximum similarity across all their chunks, not the sum or average. This prevents long conversations (many chunks) from being artificially boosted over short ones.

Storage¶

Embeddings are stored in ~/.tclaude/db.sqlite (table conv_embeddings). Each chunk stores its text, embedding vector (raw float32 bytes — 1024 dims for qwen3-embedding, 768 for nomic), model name, and creation timestamp.

Invalidation¶

Indexing is incremental using file modification times:

If a conversation file is newer than its stored embeddings, it gets re-indexed
Orphaned embeddings (deleted conversations) are cleaned up automatically
Use --reindex to force a full rebuild (useful when switching models)

Context length handling¶

If a chunk exceeds the model's context window, tclaude automatically reduces it by 1/3 and retries, repeating until it fits. In practice, this rarely triggers — especially with qwen3-embedding's 32K token context.

Switching models

Embeddings are model-specific — you can't search with a different model than you indexed with. If you switch models, rebuild the index with tclaude conv index-embeddings --reindex.

Code Layout¶

File	Purpose
`pkg/claude/conv/embeddings.go`	Ollama client, chunking, cosine similarity
`pkg/claude/conv/embed_search.go`	CLI commands (`index-embeddings`, `search-embeddings`)
`pkg/claude/conv/watch.go`	Interactive watch mode integration
`pkg/claude/common/db/embeddings.go`	SQLite storage for embeddings
`pkg/claude/conv/embeddings_test.go`	Unit tests with mocked Ollama server