Graph Databases in Practice: Building Recommendation Systems with Neo4j
A practical guide to using Neo4j for building recommendation engines - when to use graph databases, data modeling patterns, and Cypher query optimization.
Dibyank Padhy
Engineering Manager & Full Stack Developer
When Relational Databases Are Not Enough
I have been a PostgreSQL enthusiast for years. It is an incredible database that can handle 90% of use cases. But when I was building a social features module for LivingTree - a social network for K-12 schools with 20,000+ users - I hit a wall that no amount of SQL optimization could solve.
The feature was "People You May Know" - a recommendation system that suggests connections based on shared schools, classes, interests, and mutual friends. In PostgreSQL, this required multiple self-joins on large tables, and the queries were taking 3-5 seconds even with careful indexing. In Neo4j, the same query took 15 milliseconds.
Graph databases are not a replacement for relational databases. They are a specialized tool for specific problems - primarily those involving relationships between entities where the relationships themselves carry meaning and need to be traversed efficiently.
Graph Data Modeling: Think in Nodes and Edges
The fundamental mental shift with graph databases is that relationships are first-class citizens, not join tables. In Neo4j, both nodes (entities) and relationships (edges) can have properties.
// Creating the school social graph
// The recommendation query looks users up by id; these id values are illustrative
CREATE (alice:User {id: 'u1', name: 'Alice', role: 'teacher', school_id: 'ps101'})
CREATE (bob:User {id: 'u2', name: 'Bob', role: 'parent', school_id: 'ps101'})
CREATE (carol:User {id: 'u3', name: 'Carol', role: 'teacher', school_id: 'ps101'})
CREATE (dave:User {id: 'u4', name: 'Dave', role: 'parent', school_id: 'ps102'})
CREATE (ps101:School {name: 'PS 101', district: 'NYC'})
CREATE (ps102:School {name: 'PS 102', district: 'NYC'})
CREATE (math3:Class {name: '3rd Grade Math', year: 2026})
// Relationships with properties
CREATE (alice)-[:TEACHES {since: 2024}]->(math3)
CREATE (bob)-[:HAS_CHILD_IN]->(math3)
CREATE (alice)-[:WORKS_AT]->(ps101)
CREATE (bob)-[:CONNECTED_TO {since: 2025}]->(carol)
CREATE (alice)-[:CONNECTED_TO {since: 2024}]->(carol)
Building the Recommendation Engine
The power of graph databases for recommendations lies in their ability to traverse relationships efficiently. Here is the Cypher query that powers "People You May Know":
// Find recommendation candidates for a user
// Based on: mutual connections, shared school, shared classes
MATCH (me:User {id: $userId})
// Find friends-of-friends (2 hops away)
OPTIONAL MATCH (me)-[:CONNECTED_TO]-(friend)-[:CONNECTED_TO]-(fof:User)
WHERE fof <> me AND NOT (me)-[:CONNECTED_TO]-(fof)
WITH me, fof, COUNT(DISTINCT friend) AS mutualFriends
// Find users in the same school
OPTIONAL MATCH (me)-[:WORKS_AT|HAS_CHILD_IN]->()<-[:WORKS_AT|HAS_CHILD_IN]-(schoolmate:User)
WHERE schoolmate <> me AND NOT (me)-[:CONNECTED_TO]-(schoolmate)
// Find users in the same classes
OPTIONAL MATCH (me)-[:TEACHES|HAS_CHILD_IN]->(class)<-[:TEACHES|HAS_CHILD_IN]-(classmate:User)
WHERE classmate <> me AND NOT (me)-[:CONNECTED_TO]-(classmate)
// Score and rank recommendations
WITH COLLECT(DISTINCT {
user: fof,
score: mutualFriends * 3.0,
reason: 'mutual connections'
}) + COLLECT(DISTINCT {
user: schoolmate,
score: 2.0,
reason: 'same school'
}) + COLLECT(DISTINCT {
user: classmate,
score: 4.0,
reason: 'same class'
}) AS candidates
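UNWIND candidates AS c
// Drop placeholder entries left by OPTIONAL MATCH clauses that found nothing
WITH c WHERE c.user IS NOT NULL
WITH c.user AS recommended, SUM(c.score) AS totalScore,
COLLECT(DISTINCT c.reason) AS reasons
ORDER BY totalScore DESC
LIMIT $limit
// Return the whole node so the application can read its properties
RETURN recommended, totalScore, reasons
Performance Optimization Tips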
Create indexes on node properties used in MATCH clauses - this is the single most impactful optimization
Use PROFILE to analyze query execution plans - Neo4j's visual profiler is excellent
Limit traversal depth - unbounded traversals can explode in dense graphs
Specify relationship direction when the model allows it - a directed pattern only expands relationships in one direction, so it usually does less work than an undirected one
Batch writes with UNWIND instead of individual CREATE statements
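As a rough illustration of the first and last tips, the sketch below builds batched UNWIND statements for creating connections. It is a minimal sketch under stated assumptions, not the code used at LivingTree: the `ConnectionRow` shape, the `buildBatchStatements` helper, the index name `user_id`, and the default batch size of 1000 are all hypothetical. It deliberately avoids importing the driver so that the batching logic stands on its own.

```typescript
// Hypothetical row shape for a bulk import of connections.
interface ConnectionRow {
  fromId: string;
  toId: string;
  since: number;
}

// A query plus its parameters, ready to hand to the driver.
interface Statement {
  query: string;
  params: { rows: ConnectionRow[] };
}

// Run once so the MATCH clauses below can use an index lookup on User.id.
// The index name user_id is an assumption.
const USER_ID_INDEX_QUERY =
  'CREATE INDEX user_id IF NOT EXISTS FOR (u:User) ON (u.id)';

// UNWIND expands the $rows list, so a single query creates a whole batch.
// toInteger() is used because plain JS numbers reach Neo4j as floats.
const BATCH_CONNECT_QUERY = [
  'UNWIND $rows AS row',
  'MATCH (a:User {id: row.fromId})',
  'MATCH (b:User {id: row.toId})',
  'CREATE (a)-[:CONNECTED_TO {since: toInteger(row.since)}]->(b)',
].join('\n');

// Split rows into statements of at most batchSize rows each.
function buildBatchStatements(
  rows: ConnectionRow[],
  batchSize: number = 1000
): Statement[] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error('batchSize must be a positive integer');
  }
  const statements: Statement[] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    statements.push({
      query: BATCH_CONNECT_QUERY,
      params: { rows: rows.slice(i, i + batchSize) },
    });
  }
  return statements;
}
```

Each statement could then be passed to `session.run(statement.query, statement.params)`, so 5,000 rows become five round trips instead of 5,000. The right batch size depends on row size and available memory, so treat 1000 as a starting point to measure against.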
Integrating Neo4j with Your Application
import neo4j, { Driver } from 'neo4j-driver';

// RECOMMENDATION_QUERY holds the Cypher query from the previous section as a string
class RecommendationService {
  private driver: Driver;

  constructor() {
    // Assumes NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD are set in the environment
    this.driver = neo4j.driver(
      process.env.NEO4J_URI!,
      neo4j.auth.basic(process.env.NEO4J_USER!, process.env.NEO4J_PASSWORD!)
    );
  }

  async getRecommendations(userId: string, limit: number = 10) {
    const session = this.driver.session({ defaultAccessMode: 'READ' });
    try {
      const result = await session.run(
        RECOMMENDATION_QUERY,
        // LIMIT needs an integer; plain JS numbers are sent as floats
        { userId, limit: neo4j.int(limit) }
      );
      return result.records.map(record => ({
        user: record.get('recommended').properties,
        score: record.get('totalScore'),
        reasons: record.get('reasons'),
      }));
    } finally {
      // Always release the session, even if the query throws
      await session.close();
    }
  }

  async close() {
    await this.driver.close();
  }
}
When to Use Neo4j vs PostgreSQL
Use Neo4j when:
Your queries primarily involve traversing relationships (social networks, recommendation engines, fraud detection)
Relationship depth is variable - "find all paths between A and B up to 6 hops"
The schema evolves frequently - graph databases are naturally schema-flexible
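To make the variable-depth case concrete, here is a minimal in-memory sketch of a hop-limited path search in plain TypeScript. It is not how Neo4j implements traversal; it only illustrates what "all paths up to N hops" means and why a hop limit keeps the search bounded. In Cypher the same idea is written as a variable-length pattern such as `-[:CONNECTED_TO*..6]-`. The `Graph` type and the `addEdge` and `pathsUpTo` helpers are hypothetical names, and the sketch never revisits a node within a path, which is stricter than Cypher's rule of not reusing a relationship.

```typescript
// Undirected graph as an adjacency list keyed by node id.
type Graph = Record<string, string[]>;

// Register an undirected edge between a and b.
function addEdge(graph: Graph, a: string, b: string): void {
  (graph[a] = graph[a] || []).push(b);
  (graph[b] = graph[b] || []).push(a);
}

// Enumerate every path from start to goal that uses at most maxHops edges
// and never visits the same node twice.
function pathsUpTo(
  graph: Graph,
  start: string,
  goal: string,
  maxHops: number
): string[][] {
  const results: string[][] = [];
  const path: string[] = [start];

  const visit = (node: string): void => {
    if (node === goal && path.length > 1) {
      results.push(path.slice()); // copy, since path is mutated below
      return;
    }
    // path.length - 1 is the number of hops taken so far
    if (path.length - 1 >= maxHops) return;
    for (const next of graph[node] || []) {
      if (path.indexOf(next) !== -1) continue; // already on this path
      path.push(next);
      visit(next);
      path.pop();
    }
  };

  visit(start);
  return results;
}
```

Without the `maxHops` check, the number of paths explored can grow very quickly in a densely connected graph, which is the same reason the performance tips recommend bounding traversal depth.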
Stick with PostgreSQL when:
Your data is primarily tabular with predictable queries
You need complex transactions across many entities
Aggregation queries (SUM, AVG, GROUP BY) are the primary workload
In most production systems, the answer is to use both: Neo4j for relationship-heavy queries that would otherwise require expensive joins, and PostgreSQL for everything else. That is exactly what we did at LivingTree, and it worked beautifully.