Graph Databases in Practice: Building Recommendation Systems with Neo4j

When Relational Databases Are Not Enough

I have been a PostgreSQL enthusiast for years. It is an incredible database that can handle 90% of use cases. But when I was building a social features module for LivingTree - a social network for K-12 schools with 20,000+ users - I hit a wall that no amount of SQL optimization could solve.

The feature was "People You May Know" - a recommendation system that suggests connections based on shared schools, classes, interests, and mutual friends. In PostgreSQL, this required multiple self-joins on large tables, and the queries were taking 3-5 seconds even with careful indexing. In Neo4j, the same query took 15 milliseconds.

Graph databases are not a replacement for relational databases. They are a specialized tool for specific problems - primarily those involving relationships between entities where the relationships themselves carry meaning and need to be traversed efficiently.

Graph Data Modeling: Think in Nodes and Edges

The fundamental mental shift with graph databases is that relationships are first-class citizens, not join tables. In Neo4j, both nodes (entities) and relationships (edges) can have properties.

cypher

// Creating the school social graph
// Each user carries an id, which the recommendation query matches on
CREATE (alice:User {id: 'alice', name: 'Alice', role: 'teacher', school_id: 'ps101'})
CREATE (bob:User {id: 'bob', name: 'Bob', role: 'parent', school_id: 'ps101'})
CREATE (carol:User {id: 'carol', name: 'Carol', role: 'teacher', school_id: 'ps101'})
CREATE (dave:User {id: 'dave', name: 'Dave', role: 'parent', school_id: 'ps102'})

CREATE (ps101:School {name: 'PS 101', district: 'NYC'})
CREATE (ps102:School {name: 'PS 102', district: 'NYC'})

CREATE (math3:Class {name: '3rd Grade Math', year: 2026})

// Relationships with properties
CREATE (alice)-[:TEACHES {since: 2024}]->(math3)
CREATE (bob)-[:HAS_CHILD_IN]->(math3)
CREATE (alice)-[:WORKS_AT]->(ps101)
CREATE (bob)-[:CONNECTED_TO {since: 2025}]->(carol)
CREATE (alice)-[:CONNECTED_TO {since: 2024}]->(carol)

Building the Recommendation Engine

The power of graph databases for recommendations lies in their ability to traverse relationships efficiently. Here is the Cypher query that powers "People You May Know":

cypher

// Find recommendation candidates for a user
// Based on: mutual connections, shared school, shared classes
MATCH (me:User {id: $userId})

// Find friends-of-friends (2 hops away), scored by number of mutual friends
OPTIONAL MATCH (me)-[:CONNECTED_TO]-(friend:User)-[:CONNECTED_TO]-(fof:User)
WHERE fof <> me AND NOT (me)-[:CONNECTED_TO]-(fof)
WITH me, fof, COUNT(DISTINCT friend) AS mutualFriends
WITH me,
     COLLECT({user: fof, score: mutualFriends * 3.0, reason: 'mutual connections'}) AS fofCandidates

// Find users in the same school
OPTIONAL MATCH (me)-[:WORKS_AT|HAS_CHILD_IN]->(:School)<-[:WORKS_AT|HAS_CHILD_IN]-(schoolmate:User)
WHERE schoolmate <> me AND NOT (me)-[:CONNECTED_TO]-(schoolmate)
WITH me, fofCandidates,
     COLLECT(DISTINCT {user: schoolmate, score: 2.0, reason: 'same school'}) AS schoolCandidates

// Find users in the same classes
OPTIONAL MATCH (me)-[:TEACHES|HAS_CHILD_IN]->(:Class)<-[:TEACHES|HAS_CHILD_IN]-(classmate:User)
WHERE classmate <> me AND NOT (me)-[:CONNECTED_TO]-(classmate)
WITH fofCandidates, schoolCandidates,
     COLLECT(DISTINCT {user: classmate, score: 4.0, reason: 'same class'}) AS classCandidates

// Score and rank recommendations
UNWIND fofCandidates + schoolCandidates + classCandidates AS c
// OPTIONAL MATCH leaves a null user when a step finds nothing; drop those rows
WITH c WHERE c.user IS NOT NULL
WITH c.user AS recommended, SUM(c.score) AS totalScore,
     COLLECT(DISTINCT c.reason) AS reasons
ORDER BY totalScore DESC
LIMIT $limit
// Return the whole node so the application can read its properties
RETURN recommended, totalScore, reasons
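To make the weighting concrete, here is a minimal in-memory sketch of the same scoring step: 3 points per mutual friend, 2 for a shared school, 4 for a shared class, summed per candidate. The `Signal` type and the `rankCandidates` function are hypothetical names used only for illustration; in the real system this work happens inside the Cypher query.

```typescript
// Hypothetical in-memory mirror of the Cypher scoring step; names are illustrative.
type Reason = 'mutual connections' | 'same school' | 'same class';

interface Signal {
  userId: string;
  reason: Reason;
  mutualFriends?: number; // only meaningful for 'mutual connections'
}

interface Recommendation {
  userId: string;
  totalScore: number;
  reasons: Reason[];
}

// Same weights as the query: 3.0 per mutual friend, 2.0 same school, 4.0 same class.
function scoreOf(signal: Signal): number {
  switch (signal.reason) {
    case 'mutual connections':
      return (signal.mutualFriends ?? 0) * 3.0;
    case 'same school':
      return 2.0;
    case 'same class':
      return 4.0;
  }
}

// Sum the scores per candidate, collect distinct reasons, sort descending, cut to limit.
function rankCandidates(signals: Signal[], limit = 10): Recommendation[] {
  const byUser = new Map<string, Recommendation>();
  for (const signal of signals) {
    const entry: Recommendation =
      byUser.get(signal.userId) ?? { userId: signal.userId, totalScore: 0, reasons: [] };
    entry.totalScore += scoreOf(signal);
    if (entry.reasons.indexOf(signal.reason) === -1) {
      entry.reasons.push(signal.reason);
    }
    byUser.set(signal.userId, entry);
  }
  return Array.from(byUser.values())
    .sort((a, b) => b.totalScore - a.totalScore)
    .slice(0, limit);
}

const ranked = rankCandidates([
  { userId: 'carol', reason: 'mutual connections', mutualFriends: 2 },
  { userId: 'carol', reason: 'same school' },
  { userId: 'dave', reason: 'same class' },
]);
// carol: 2 * 3.0 + 2.0 = 8.0; dave: 4.0
```

A candidate who matches on several signals accumulates all of them, which is why someone with two mutual friends at the same school outranks someone who only shares a class.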

Performance Optimization Tips

Create indexes on node properties used in MATCH clauses - this is the single most impactful optimization

Use PROFILE to analyze query execution plans - Neo4j's visual profiler is excellent

Limit traversal depth - unbounded traversals can explode in dense graphs

Specify relationship direction when the model allows it - a directed pattern expands fewer relationships than an undirected one

Batch writes with UNWIND instead of individual CREATE statements
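The indexing and batching tips translate fairly directly into code. The sketch below assumes a recent Neo4j release (4.x or 5.x) for the `CREATE INDEX ... IF NOT EXISTS` syntax, and assumes users are looked up by an `id` property as in the recommendation query. The index name `user_id`, the helpers `chunk` and `buildUserBatches`, and the batch size of 1000 are all hypothetical; each returned `{ query, params }` pair would be passed to `session.run`.

```typescript
// Assumed index backing lookups such as MATCH (me:User {id: $userId}).
const CREATE_USER_ID_INDEX =
  'CREATE INDEX user_id IF NOT EXISTS FOR (u:User) ON (u.id)';

// One statement upserts a whole batch of rows via UNWIND,
// instead of one round trip per user.
const UPSERT_USERS = `
UNWIND $rows AS row
MERGE (u:User {id: row.id})
SET u.name = row.name, u.role = row.role
`;

interface UserRow {
  id: string;
  name: string;
  role: string;
}

interface Statement {
  query: string;
  params: { rows: UserRow[] };
}

// Split rows into fixed-size batches so each transaction stays small.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one parameterized UNWIND statement per batch (batch size is an assumption).
function buildUserBatches(rows: UserRow[], batchSize = 1000): Statement[] {
  return chunk(rows, batchSize).map(batch => ({
    query: UPSERT_USERS,
    params: { rows: batch },
  }));
}
```

Prefixing the recommendation query with `PROFILE` in Neo4j Browser is a quick way to check that the `User.id` lookup shows up as an index seek rather than a label scan.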

Integrating Neo4j with Your Application

typescript

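import neo4j, { Driver } from 'neo4j-driver';

// RECOMMENDATION_QUERY is the Cypher query from the previous section;
// it is assumed to be defined or imported elsewhere in this module.

class RecommendationService {
  private readonly driver: Driver;

  constructor() {
    const { NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD } = process.env;
    if (!NEO4J_URI || !NEO4J_USER || !NEO4J_PASSWORD) {
      throw new Error('NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD must be set');
    }
    this.driver = neo4j.driver(
      NEO4J_URI,
      neo4j.auth.basic(NEO4J_USER, NEO4J_PASSWORD)
    );
  }

  async getRecommendations(userId: string, limit: number = 10) {
    // Read-only session: in a cluster, reads can be routed away from the leader
    const session = this.driver.session({ defaultAccessMode: neo4j.session.READ });

    try {
      const result = await session.run(
        RECOMMENDATION_QUERY,
        // LIMIT needs an integer; plain JS numbers are sent as floats
        { userId, limit: neo4j.int(limit) }
      );

      return result.records.map(record => ({
        user: record.get('recommended').properties,
        score: record.get('totalScore'),
        reasons: record.get('reasons'),
      }));
    } finally {
      // Always release the session, even if the query throws
      await session.close();
    }
  }

  async close() {
    await this.driver.close();
  }
}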

When to Use Neo4j vs PostgreSQL

Use Neo4j when:

Your queries primarily involve traversing relationships (social networks, recommendation engines, fraud detection)

Relationship depth is variable - "find all paths between A and B up to 6 hops"

The schema evolves frequently - graph databases are naturally schema-flexible
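For the variable-depth case, Cypher expresses the hop limit with a variable-length pattern such as `[:CONNECTED_TO*1..6]`. The upper bound has traditionally had to be a literal in the query text rather than a query parameter, so one option is a small builder that validates the depth before interpolating it. This is a sketch under assumptions: `buildPathQuery`, the `$fromId` and `$toId` parameter names, the cap of 6 hops, and the `LIMIT 25` are all hypothetical choices.

```typescript
// Hypothetical helper: builds a depth-limited path query between two users.
function buildPathQuery(maxHops: number): string {
  // Validate before interpolating: the hop bound is written into the query text,
  // so only a small whole number is allowed through.
  if (!Number.isInteger(maxHops) || maxHops < 1 || maxHops > 6) {
    throw new RangeError('maxHops must be an integer between 1 and 6');
  }
  return [
    'MATCH (a:User {id: $fromId}), (b:User {id: $toId})',
    `MATCH path = (a)-[:CONNECTED_TO*1..${maxHops}]-(b)`,
    'RETURN path',
    'LIMIT 25', // assumed cap so dense graphs do not return every possible path
  ].join('\n');
}

const pathQuery = buildPathQuery(3);
```

The user IDs still travel as ordinary query parameters; only the validated hop count is interpolated into the string.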

Stick with PostgreSQL when:

Your data is primarily tabular with predictable queries

You need complex transactions across many entities

Aggregation queries (SUM, AVG, GROUP BY) are the primary workload

In most production systems, the answer is to use both: Neo4j for relationship-heavy queries that would otherwise require expensive joins, and PostgreSQL for everything else. That is exactly what we did at LivingTree, and it worked beautifully.
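One way to split the work between the two stores, sketched here under assumptions rather than as the actual LivingTree code: ask the graph for ranked user IDs, then load the full profiles from the relational side and merge them in the graph's order. `GraphRanker`, `ProfileLoader`, `mergeRanked`, and `recommendWithProfiles` are hypothetical names; in practice the two function types would wrap a Neo4j session and a PostgreSQL client.

```typescript
interface RankedId {
  userId: string;
  totalScore: number;
}

interface Profile {
  id: string;
  name: string;
}

type ScoredProfile = Profile & { totalScore: number };

// Hypothetical ports for the two databases.
type GraphRanker = (userId: string, limit: number) => Promise<RankedId[]>;
type ProfileLoader = (ids: string[]) => Promise<Profile[]>;

// Keep the graph's ordering and drop IDs that have no relational row.
function mergeRanked(ranked: RankedId[], profiles: Profile[]): ScoredProfile[] {
  const byId = new Map<string, Profile>();
  for (const profile of profiles) {
    byId.set(profile.id, profile);
  }
  const merged: ScoredProfile[] = [];
  for (const r of ranked) {
    const profile = byId.get(r.userId);
    if (profile) {
      merged.push({ ...profile, totalScore: r.totalScore });
    }
  }
  return merged;
}

async function recommendWithProfiles(
  userId: string,
  rank: GraphRanker,
  loadProfiles: ProfileLoader,
  limit = 10,
): Promise<ScoredProfile[]> {
  const ranked = await rank(userId, limit); // relationship traversal: graph side
  if (ranked.length === 0) return [];
  const profiles = await loadProfiles(ranked.map(r => r.userId)); // tabular data: relational side
  return mergeRanked(ranked, profiles);
}
```

With this split, the graph only needs the properties that drive traversal and scoring, while names, settings, and other tabular data stay in PostgreSQL.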
