Data Connectors: 30+ Sources for Unified Search

Data connectors sync content from external sources into Zine (powered by Graphlit). Instead of uploading files by hand, you create connectors that continuously monitor Slack channels, Gmail inboxes, Google Drive folders, GitHub repos, and other sources (30+ in total), keeping your knowledge base up to date.

This guide covers feed architecture, OAuth vs API key authentication, all connector types, polling strategies, and production patterns. By the end, you'll know how to connect any data source and build automated content pipelines.

What You'll Learn

Feed architecture and lifecycle
OAuth flows vs API key authentication
Connector patterns by category (messaging, cloud storage, project management)
Feed configuration options (readLimit, schedules, filters)
Polling vs webhook patterns
Production feed management
Error handling and retry strategies

Prerequisites:

A Graphlit project - Sign up (2 min)
SDK installed: npm install graphlit-client (30 sec)
OAuth apps set up for connectors you want to use (we'll show you how)

Time to complete: 80 minutes
Difficulty: Intermediate

Developer Note: All Graphlit IDs are GUIDs. Example outputs show realistic GUID format.

Table of Contents

Feed Architecture
Authentication Methods
Messaging Connectors
Cloud Storage Connectors
Project Management Connectors
Social Media & Web Connectors
Feed Management
Advanced Patterns
Production Patterns

Part 1: Feed Architecture

What is a Feed?

A feed is a continuous sync between an external data source and Graphlit. Once created, it:

Initial sync: Fetches existing content (e.g., last 100 Slack messages)
Continuous monitoring: Polls for new content (e.g., every 15 minutes)
Auto-ingestion: New content automatically appears in Graphlit

Key insight: Feeds are "set it and forget it"—no manual re-triggering needed.

✅ Quick Win: Once a feed is created, new content automatically appears in your search results and RAG responses—no additional code needed.

All 30+ Supported Connectors

Zine supports 30+ connector types across 8 categories:

Messaging & Collaboration (6):

Slack - Channels, threads, DMs
Microsoft Teams - Team channels and conversations
Discord - Server channels
Gmail - Email inbox (labels, folders)
Outlook Email - Microsoft email
Intercom - Support articles and tickets

Cloud Storage (8):

Google Drive - Docs, Sheets, Slides, PDFs
Microsoft OneDrive - Personal cloud storage
SharePoint - Enterprise document management
Dropbox - Files and folders
Box - Enterprise file storage
Amazon S3 - Object storage buckets
Azure Blob Storage - Cloud file storage
FTP/SFTP - File servers

Source Control & Development (5):

GitHub Code - Repository contents
GitHub Issues - Bug tracking and discussions
GitHub Pull Requests - Code reviews
GitHub Commits - Change history
GitLab - Code and issues

Project Management (4):

Jira - Issue tracking
Linear - Modern project management
Trello - Kanban boards
Asana - Task management

Knowledge Management (2):

Notion - Pages and databases
Confluence - Wiki pages

Social Media & Web (6):

Reddit - Posts and comments
Twitter/X - Tweets and threads
YouTube - Video transcripts
RSS Feeds - Blog feeds
Web Crawling - Website content
Web Search - Tavily, Exa, Perplexity

Calendars & Meetings (3):

Google Calendar - Events and meetings
Outlook Calendar - Microsoft calendar
Zoom - Meeting recordings (transcribed)

Customer & Sales (2):

Zendesk - Support tickets
Salesforce - CRM data (custom integration)

Feed Lifecycle

CREATE → ENABLED → SYNCING → INDEXED
    ↓
DISABLED (if paused)
    ↓
DELETED (if removed)
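
The diagram can be read as a small state machine. The sketch below encodes one interpretation of it; the state names come from the diagram, but the transition table (for example, that an indexed feed returns to SYNCING on the next poll) is an assumption, not a documented Graphlit contract.

```typescript
// Feed states as named in the lifecycle diagram.
type FeedState = 'CREATE' | 'ENABLED' | 'SYNCING' | 'INDEXED' | 'DISABLED' | 'DELETED';

// Assumed transition table: one reading of the diagram, not an official spec.
const TRANSITIONS: Record<FeedState, FeedState[]> = {
  CREATE: ['ENABLED'],
  ENABLED: ['SYNCING', 'DISABLED', 'DELETED'],
  SYNCING: ['INDEXED', 'DISABLED', 'DELETED'],
  INDEXED: ['SYNCING', 'DISABLED', 'DELETED'], // the next poll starts another sync
  DISABLED: ['ENABLED', 'DELETED'],            // resume or remove
  DELETED: [],                                 // terminal state
};

// Returns true when the move from `from` to `to` is allowed by the table above.
function canTransition(from: FeedState, to: FeedState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

A table like this is mostly useful in your own monitoring code, for flagging state changes you did not expect.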

Part 2: Authentication Methods

OAuth (Recommended for Most Connectors)

OAuth lets users authorize access without sharing passwords. Graphlit manages the OAuth flow.

Connectors using OAuth:

Slack
Gmail / Google Drive / Google Calendar
Microsoft (Outlook, OneDrive, SharePoint, Teams)
GitHub
Notion
Jira
Linear
Reddit
Twitter

OAuth flow:

User clicks "Connect Slack"
Redirected to Slack OAuth
User authorizes
Graphlit receives OAuth token
Create feed with token

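// Example: Slack OAuth
// Build the authorization URL; URLSearchParams percent-encodes the scope list and redirect URI
const authParams = new URLSearchParams({
  client_id: SLACK_CLIENT_ID,
  scope: 'channels:read,channels:history',
  redirect_uri: REDIRECT_URI
});
const authUrl = `https://slack.com/oauth/v2/authorize?${authParams}`;

// User visits authUrl, authorizes
// Slack redirects back to REDIRECT_URI with ?code=...

// Exchange code for token (redirect_uri must match the one sent above)
const tokenResponse = await fetch('https://slack.com/api/oauth.v2.access', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({
    code,
    client_id: SLACK_CLIENT_ID,
    client_secret: SLACK_CLIENT_SECRET,
    redirect_uri: REDIRECT_URI
  })
});

// Slack returns HTTP 200 even on failure, so check the `ok` flag
const { ok, error, access_token } = await tokenResponse.json();
if (!ok) throw new Error(`Slack OAuth failed: ${error}`);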

// Create feed with token
const feed = await graphlit.createFeed({
  name: 'My Slack Feed',
  type: FeedTypes.Slack,
  slack: {
    token: access_token,
    channel: 'general',  // Single channel per feed
    readLimit: 100
  }
});
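
In a real deployment the redirect step usually carries a `state` value so the callback can be tied to the session that started it. The following is a minimal sketch of that idea; `buildSlackAuthUrl` and `readCallback` are hypothetical helper names, and how you generate and store the state value is left to your application.

```typescript
const SLACK_AUTHORIZE_URL = 'https://slack.com/oauth/v2/authorize';

// Builds the authorization URL, including an opaque `state` value for CSRF protection.
function buildSlackAuthUrl(
  clientId: string,
  redirectUri: string,
  scopes: string[],
  state: string,
): string {
  const params = new URLSearchParams({
    client_id: clientId,
    scope: scopes.join(','),
    redirect_uri: redirectUri,
    state,
  });
  return `${SLACK_AUTHORIZE_URL}?${params.toString()}`;
}

// Validates the redirect back from Slack and returns the authorization code.
function readCallback(callbackUrl: string, expectedState: string): string {
  const params = new URL(callbackUrl).searchParams;
  const error = params.get('error');
  if (error) throw new Error(`Authorization failed: ${error}`);
  if (params.get('state') !== expectedState) throw new Error('State mismatch');
  const code = params.get('code');
  if (!code) throw new Error('Missing authorization code');
  return code;
}
```

The returned code is what gets exchanged for a token at `oauth.v2.access`.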

API Keys (For Services Without OAuth)

Some connectors take static credentials, and public sources need no authentication at all:

RSS feeds (no auth)
Web crawling (no auth)
S3 (access key + secret)
Azure Storage (connection string)

import { FeedTypes, FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

// Example: S3 feed with API keys
const s3Feed = await graphlit.createFeed({
  name: 'Company S3 Bucket',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.S3Blob,
    s3: {
      bucketName: 'documents',
      region: 'us-east-1',
      accessKey: process.env.AWS_ACCESS_KEY,
      secretAccessKey: process.env.AWS_SECRET_KEY,
      prefix: 'pdfs/'  // Optional: filter by folder
    },
    isRecursive: true,
    readLimit: 1000
  }
});
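
The S3 example reads its credentials from environment variables, and an unset variable would be passed along as `undefined`. A small fail-fast check can catch that before the feed is created. This is a sketch only; `requireEnv` is a hypothetical helper, not part of the Graphlit SDK.

```typescript
// Returns the requested variables, or throws listing every one that is missing or empty.
function requireEnv(
  names: string[],
  env: Record<string, string | undefined>,
): Record<string, string> {
  const missing = names.filter((n) => !env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const values: Record<string, string> = {};
  for (const n of names) {
    values[n] = env[n] as string;
  }
  return values;
}

// Typical call site (Node.js):
//   const creds = requireEnv(['AWS_ACCESS_KEY', 'AWS_SECRET_KEY'], process.env);
//   ...then pass creds.AWS_ACCESS_KEY and creds.AWS_SECRET_KEY into the feed config.
```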

Part 3: Messaging Connectors

Slack

Use case: Search team conversations, RAG over chat history, entity extraction from messages.

import { Graphlit } from 'graphlit-client';
import { FeedTypes, FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Create Slack feed
const slackFeed = await graphlit.createFeed({
  name: 'Engineering Slack',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'engineering',  // Single channel
    readLimit: 500  // Last 500 messages
  }
});

console.log('Slack feed created:', slackFeed.createFeed.id);

// Wait for initial sync
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(slackFeed.createFeed.id);
  isDone = status.isFeedDone.result;
  if (!isDone) {
    await new Promise(r => setTimeout(r, 10000));  // Check again in 10s
  }
}

console.log('✓ Slack history synced');

OAuth scopes needed:

channels:read - List channels
channels:history - Read messages
groups:read - List private channels (optional)
groups:history - Read messages in private channels (optional)
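
Slack's `oauth.v2.access` response includes a comma-separated `scope` string listing what was actually granted. A quick comparison against the scopes above can surface a misconfigured app before the feed fails. The helper name `missingScopes` is an illustration, not an SDK function.

```typescript
// Scopes the Slack connector needs for public channels, per the list above.
const REQUIRED_SCOPES = ['channels:read', 'channels:history'];

// Returns the required scopes that are absent from Slack's comma-separated `scope` string.
function missingScopes(granted: string, required: string[] = REQUIRED_SCOPES): string[] {
  const have = new Set(
    granted
      .split(',')
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
  );
  return required.filter((s) => !have.has(s));
}
```

An empty result means the token covers everything the feed needs; for private channels, pass the `groups:*` scopes as the second argument.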

What gets synced:

Messages in the specified channel
Threaded replies
User mentions
Files/images attached to messages
Reactions (optional)

💡 Pro Tip: Combine Slack feeds with entity extraction to automatically identify who's working on which projects from Slack conversations.

Gmail

Use case: Search emails, extract contacts/companies, email-based RAG.

const gmailFeed = await graphlit.createFeed({
  name: 'My Gmail',
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.GoogleEmail,
    google: {
      refreshToken: process.env.GMAIL_OAUTH_TOKEN
    },
    includeAttachments: true,
    readLimit: 100  // Last 100 emails
  }
});

OAuth scopes needed:

https://www.googleapis.com/auth/gmail.readonly

What gets synced:

Email subject, body, sender, recipients
Attachments (PDFs, images, etc.)
Timestamps
Email threads

Microsoft Teams

const teamsFeed = await graphlit.createFeed({
  name: 'Engineering Team',
  type: FeedTypes.Message,
  message: {
    type: FeedServiceTypes.MicrosoftTeams,
    microsoft: {
      refreshToken: process.env.TEAMS_OAUTH_TOKEN
    },
    teamId: 'team-guid',
    channel: 'channel-guid',  // Single channel
    readLimit: 100
  }
});

Discord

const discordFeed = await graphlit.createFeed({
  name: 'Community Discord',
  type: FeedTypes.Message,
  message: {
    type: FeedServiceTypes.Discord,
    discord: {
      token: process.env.DISCORD_BOT_TOKEN,
      serverId: 'guild-id'
    },
    channel: 'channel-id',  // Single channel
    readLimit: 500
  }
});

Part 4: Cloud Storage Connectors

Google Drive

Use case: Sync company documents, collaborative files, shared folders.

const driveFeed = await graphlit.createFeed({
  name: 'Company Drive',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.GoogleDrive,
    googleDrive: {
      refreshToken: process.env.GOOGLE_OAUTH_TOKEN,
      folderId: 'folder-id'  // Optional: sync specific folder
    },
    isRecursive: true,
    readLimit: 1000
  }
});

What gets synced:

Google Docs (converted to markdown)
Google Sheets (tables extracted)
Google Slides (text extracted)
PDFs, images, videos
Files in subfolders

OAuth scopes needed:

https://www.googleapis.com/auth/drive.readonly

OneDrive / SharePoint

// OneDrive personal
const oneDriveFeed = await graphlit.createFeed({
  name: 'My OneDrive',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.OneDrive,
    oneDrive: {
      refreshToken: process.env.MICROSOFT_OAUTH_TOKEN,
      folderId: 'folder-id'  // Optional
    },
    isRecursive: true,
    readLimit: 500
  }
});

// SharePoint (team sites)
const sharePointFeed = await graphlit.createFeed({
  name: 'Company SharePoint',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.SharePoint,
    sharePoint: {
      refreshToken: process.env.MICROSOFT_OAUTH_TOKEN,
      siteId: 'site-id',
      driveId: 'drive-id'
    },
    isRecursive: true,
    readLimit: 1000
  }
});

GitHub

Use case: Sync code repos, documentation, READMEs.

const githubFeed = await graphlit.createFeed({
  name: 'Company Repo',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.GitHub,
    github: {
      personalAccessToken: process.env.GITHUB_PAT,
      repositoryOwner: 'my-company',
      repositoryName: 'main-repo'
    },
    isRecursive: true
  }
});

What gets synced:

Source code files
README.md files
Documentation
Commit messages (optional)

Amazon S3

const s3Feed = await graphlit.createFeed({
  name: 'Documents S3 Bucket',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.S3Blob,
    s3: {
      bucketName: 'company-documents',
      region: 'us-east-1',
      accessKey: process.env.AWS_ACCESS_KEY,
      secretAccessKey: process.env.AWS_SECRET_KEY,
      prefix: 'public/'  // Optional: sync specific folder
    },
    isRecursive: true
  }
});

Part 5: Project Management Connectors

Jira

Use case: Search issues, track project status, entity extraction from tickets.

const jiraFeed = await graphlit.createFeed({
  name: 'Engineering Jira',
  type: FeedTypes.Issue,
  issue: {
    type: FeedServiceTypes.AtlassianJira,
    jira: {
      email: 'user@company.com',
      token: process.env.JIRA_API_TOKEN,
      uri: 'https://yourcompany.atlassian.net',
      project: 'PROJ'  // Project key
    }
  },
  readLimit: 500
});

What gets synced:

Issue title, description, comments
Status, assignee, reporter
Attachments
Custom fields

Linear

const linearFeed = await graphlit.createFeed({
  name: 'Product Linear',
  type: FeedTypes.Issue,
  issue: {
    type: FeedServiceTypes.Linear,
    linear: {
      token: process.env.LINEAR_API_KEY,
      teamId: 'team-id'
    }
  },
  readLimit: 500
});

Notion

const notionFeed = await graphlit.createFeed({
  name: 'Company Wiki',
  type: FeedTypes.Notion,
  notion: {
    token: process.env.NOTION_INTEGRATION_TOKEN
  },
  readLimit: 1000
});

What gets synced:

Pages and sub-pages
Databases and records
Embedded content
Inline comments

GitHub Issues & Pull Requests

// Issues
const issuesFeed = await graphlit.createFeed({
  name: 'Repo Issues',
  type: FeedTypes.Issue,
  issue: {
    type: FeedServiceTypes.GitHub,
    github: {
      personalAccessToken: process.env.GITHUB_PAT,
      repositoryOwner: 'my-company',
      repositoryName: 'main-repo',
      includeIssues: true
    }
  },
  readLimit: 500
});

// Pull Requests
const prFeed = await graphlit.createFeed({
  name: 'Repo PRs',
  type: FeedTypes.PullRequest,
  pullRequest: {
    type: FeedServiceTypes.GitHub,
    github: {
      personalAccessToken: process.env.GITHUB_PAT,
      repositoryOwner: 'my-company',
      repositoryName: 'main-repo'
    }
  },
  readLimit: 100
});

Part 6: Social Media & Web Connectors

Reddit

const redditFeed = await graphlit.createFeed({
  name: 'Tech Subreddit',
  type: FeedTypes.Reddit,
  reddit: {
    subredditName: 'MachineLearning'
  },
  readLimit: 100
});

RSS Feeds

const rssFeed = await graphlit.createFeed({
  name: 'Tech News RSS',
  type: FeedTypes.Rss,
  rss: {
    uri: 'https://techcrunch.com/feed/'
  },
  readLimit: 50
});

Web Crawling

Use case: Scrape documentation sites, competitor analysis, content aggregation.

const webCrawl = await graphlit.createFeed({
  name: 'Documentation Crawler',
  type: FeedTypes.Web,
  web: {
    uri: 'https://docs.example.com',
    allowedPaths: ['^https://docs\\.example\\.com/.*'],  // Regex patterns
    excludedPaths: ['/api/.*', '/archive/.*']
  },
  readLimit: 500
});
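
To preview which URLs a pair of `allowedPaths` and `excludedPaths` patterns would admit, the filter can be approximated locally. The matching rule here (a URL passes if it matches at least one allowed pattern and no excluded pattern) is an assumption about the semantics; the crawler's actual behavior may differ.

```typescript
// Approximates allow/exclude filtering with unanchored regex matching against the full URL.
function shouldCrawl(url: string, allowed: string[], excluded: string[]): boolean {
  const matchesAny = (patterns: string[]): boolean =>
    patterns.some((p) => new RegExp(p).test(url));
  // An empty allow list is treated as "allow everything" (assumption).
  const isAllowed = allowed.length === 0 || matchesAny(allowed);
  return isAllowed && !matchesAny(excluded);
}

// The patterns from the feed configuration above.
const ALLOWED_PATHS = ['^https://docs\\.example\\.com/.*'];
const EXCLUDED_PATHS = ['/api/.*', '/archive/.*'];
```

Running a handful of representative URLs through a check like this is a cheap way to catch an overly broad pattern before a 500-page crawl.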

What gets scraped:

Page HTML (converted to markdown)
Links (follows to crawl more pages)
Images (optional)
Metadata (title, description)

YouTube

const youtubeFeed = await graphlit.createFeed({
  name: 'Channel Videos',
  type: FeedTypes.YouTube,
  youtube: {
    channelIdentifier: 'channel-id'
  },
  readLimit: 50
});

What gets synced:

Video transcripts (auto-generated or manual)
Titles, descriptions
Thumbnails
Comments (optional)

Part 7: Feed Management

Query Feeds

// Get all feeds
const feeds = await graphlit.queryFeeds();

feeds.feeds.results.forEach(feed => {
  console.log(`${feed.name} (${feed.type})`);
  console.log(`  State: ${feed.state}`);
  console.log(`  Last sync: ${feed.lastSyncDateTime}`);
});

Update Feed

// Change feed configuration
await graphlit.updateFeed(feedId, {
  name: 'Updated Name',
  slack: {
    readLimit: 1000  // Increase sync limit
  }
});

Disable/Enable Feed

// Pause syncing
await graphlit.disableFeed(feedId);

// Resume syncing
await graphlit.enableFeed(feedId);

Delete Feed

// Delete feed (and optionally its content)
await graphlit.deleteFeed(feedId);

// Delete feed but keep synced content
await graphlit.deleteFeed(feedId, false);

Trigger Manual Sync

// Force immediate sync (useful for testing)
await graphlit.triggerFeedSync(feedId);

// Wait for sync to complete
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(feedId);
  isDone = status.isFeedDone.result;
  if (!isDone) {
    await new Promise(r => setTimeout(r, 5000));  // Check again in 5s
  }
}

Part 8: Advanced Patterns

Pattern 1: Feed with Workflow

Apply processing to synced content:

// Create workflow first
const workflow = await graphlit.createWorkflow({
  name: "Extract Entities",
  extraction: { /* ... */ }
});

// Create feed with workflow
const feed = await graphlit.createFeed({
  name: 'Slack with Entities',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'general'
  },
  workflow: { id: workflow.createWorkflow.id }
});

// All synced messages will have entities extracted

Pattern 2: Feed with Collections

Auto-organize synced content:

// Create collection
const collection = await graphlit.createCollection('Slack Messages');

// Create feed that adds to collection
const feed = await graphlit.createFeed({
  name: 'Slack Feed',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'general'
  },
  collections: [{ id: collection.createCollection.id }]
});

Pattern 3: Multi-Feed Strategy

Sync from multiple sources into unified knowledge base:

// Feed 1: Slack
const slackFeed = await graphlit.createFeed({
  name: 'Slack',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'general'
  }
});

// Feed 2: Gmail
const gmailFeed = await graphlit.createFeed({
  name: 'Gmail',
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.GoogleEmail,
    google: {
      refreshToken: process.env.GMAIL_OAUTH_TOKEN
    },
    readLimit: 100
  }
});

// Feed 3: Google Drive
const driveFeed = await graphlit.createFeed({
  name: 'Drive',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.GoogleDrive,
    googleDrive: {
      refreshToken: process.env.GOOGLE_OAUTH_TOKEN
    },
    isRecursive: true
  }
});

// Now search across all sources
const results = await graphlit.queryContents({
  search: "project update"
});
// Returns results from Slack, Gmail, AND Drive

Pattern 4: Scheduled Feeds

Control sync frequency:

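import { FeedTypes, TimedPolicyRecurrenceTypes } from 'graphlit-client/dist/generated/graphql-types';

const feed = await graphlit.createFeed({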
  name: 'Daily News Feed',
  type: FeedTypes.Rss,
  rss: {
    uri: 'https://news.com/feed'
  },
  readLimit: 50,
  schedulePolicy: {
    recurrenceType: TimedPolicyRecurrenceTypes.Repeat,
    repeatInterval: 'P1D'  // ISO 8601: 1 day
  }
});
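
The `repeatInterval` value is an ISO 8601 duration, so `P1D` means one day and `PT15M` means fifteen minutes. For sanity-checking a schedule in your own code, a small converter is often enough. This sketch handles only days, hours, minutes, and whole seconds; `durationToMs` is a hypothetical helper, not an SDK function.

```typescript
// Converts a simple ISO 8601 duration (days/hours/minutes/seconds) to milliseconds.
// Years, months, weeks, and fractional values are deliberately rejected.
function durationToMs(iso: string): number {
  const match = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(iso);
  const lastChar = iso.charAt(iso.length - 1);
  if (!match || iso === 'P' || lastChar === 'T') {
    throw new Error(`Unsupported duration: ${iso}`);
  }
  const days = Number(match[1] || 0);
  const hours = Number(match[2] || 0);
  const minutes = Number(match[3] || 0);
  const seconds = Number(match[4] || 0);
  return ((days * 24 + hours) * 60 + minutes) * 60000 + seconds * 1000;
}
```

For instance, `durationToMs('P1D')` yields 86400000, which you can compare against the age of the last sync when deciding whether a feed is overdue.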

Part 9: Production Patterns

Pattern 1: OAuth Token Refresh

OAuth tokens expire—handle refresh:

// Store refresh token when user authorizes
const oauthData = {
  accessToken: '...',
  refreshToken: '...',
  expiresAt: Date.now() + 3600000
};

// Before creating feed, check if token is expired
// refreshOAuthToken is a helper you implement against your provider's token endpoint
async function getValidToken() {
  if (Date.now() > oauthData.expiresAt) {
    // Refresh token
    const newTokens = await refreshOAuthToken(oauthData.refreshToken);
    oauthData.accessToken = newTokens.accessToken;
    oauthData.expiresAt = Date.now() + 3600000;  // Assumes a 1-hour token lifetime
  }
  return oauthData.accessToken;
}

// Use refreshed token
const token = await getValidToken();
const feed = await graphlit.createFeed({
  name: 'Slack Feed',
  type: FeedTypes.Slack,
  slack: {
    token,
    channel: 'general'
  }
});
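
Under concurrent requests, the expiry check can fire several refreshes at once, and a token that expires mid-request still slips through. One way to tighten this, sketched under the assumption that your provider returns a lifetime in seconds and may rotate the refresh token: refresh slightly early and share a single in-flight refresh. `createTokenProvider` and the `Refresher` shape are illustrative names.

```typescript
interface TokenSet { accessToken: string; refreshToken: string; expiresAt: number; }

// Your own function that calls the provider's token endpoint (assumed response shape).
type Refresher = (refreshToken: string) =>
  Promise<{ accessToken: string; refreshToken?: string; expiresInSec: number }>;

function createTokenProvider(
  initial: TokenSet,
  refresh: Refresher,
  marginMs: number = 60000,            // refresh one minute before expiry
  now: () => number = () => Date.now(),
): () => Promise<string> {
  let tokens: TokenSet = { ...initial };
  let inFlight: Promise<string> | null = null;

  return function getValidToken(): Promise<string> {
    if (now() < tokens.expiresAt - marginMs) {
      return Promise.resolve(tokens.accessToken);
    }
    if (inFlight) return inFlight;     // share the refresh already under way

    const pending = refresh(tokens.refreshToken).then((r) => {
      tokens = {
        accessToken: r.accessToken,
        refreshToken: r.refreshToken || tokens.refreshToken, // keep old one if not rotated
        expiresAt: now() + r.expiresInSec * 1000,
      };
      return tokens.accessToken;
    });
    inFlight = pending;
    const clear = () => { if (inFlight === pending) inFlight = null; };
    pending.then(clear, clear);        // reset after success or failure
    return pending;
  };
}
```

Injecting the clock and the refresher keeps the logic testable without a real OAuth provider.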

Pattern 2: Feed Health Monitoring

Monitor feed status:

// Check all feeds
const feeds = await graphlit.queryFeeds();

feeds.feeds.results.forEach(feed => {
  if (feed.state === 'FAILED') {
    console.error(`Feed ${feed.name} failed`);
    // Alert ops team
  }
  
  if (feed.lastSyncDateTime) {
    const hoursSinceSync = (Date.now() - new Date(feed.lastSyncDateTime).getTime()) / 3600000;
    if (hoursSinceSync > 24) {
      console.warn(`Feed ${feed.name} hasn't synced in ${hoursSinceSync.toFixed(1)}h`);
    }
  }
});
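
The same checks can be factored into a pure function, which makes the thresholds easy to unit-test and reuse in an alerting job. This is a sketch: the `state` and `lastSyncDateTime` field names follow the snippet above, while the `FeedHealth` labels and the 24-hour default are illustrative choices.

```typescript
type FeedHealth = 'failed' | 'never-synced' | 'stale' | 'ok';

// Only the fields this check needs, named as in the monitoring snippet.
interface FeedSummary {
  name: string;
  state: string;
  lastSyncDateTime?: string | null;
}

// Classifies a feed using its state and the age of its last sync.
function classifyFeed(feed: FeedSummary, nowMs: number, maxAgeHours: number = 24): FeedHealth {
  if (feed.state === 'FAILED') return 'failed';
  if (!feed.lastSyncDateTime) return 'never-synced';
  const ageHours = (nowMs - new Date(feed.lastSyncDateTime).getTime()) / 3600000;
  return ageHours > maxAgeHours ? 'stale' : 'ok';
}
```

Call it with `Date.now()` for each entry in `feeds.feeds.results` and route anything other than `'ok'` to your alerting channel.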

Pattern 3: Rate Limiting

Avoid overwhelming external APIs:

// Create feeds with delays
const urls = ['url1', 'url2', 'url3'];

for (const url of urls) {
  const feed = await graphlit.createFeed({
    name: `RSS Feed ${url}`,
    type: FeedTypes.Rss,
    rss: { uri: url }
  });
  
  // Wait 5 seconds between feed creations
  await new Promise(r => setTimeout(r, 5000));
}
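
Fixed delays help with bursts, but transient failures (timeouts, HTTP 429 responses) usually call for retries with growing delays. Below is a minimal exponential-backoff sketch. `withRetry` and `backoffDelayMs` are hypothetical helpers, and the defaults (four attempts, 1 s base, 30 s cap) are arbitrary starting points, not recommended values from Graphlit.

```typescript
// Delay before the retry that follows attempt number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Runs `op`, retrying on any thrown error up to `maxAttempts` times in total.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts: number = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown = new Error('maxAttempts must be at least 1');
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Sleep only if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}

// Typical call site:
//   const feed = await withRetry(() => graphlit.createFeed(config));
```

In production you would likely retry only on errors known to be transient and let authentication or validation errors fail immediately.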

Common Issues & Solutions

Issue: OAuth Token Invalid

Problem: "Invalid token" error when creating feed.

Solution: Refresh OAuth token or re-authorize:

try {
  const feed = await graphlit.createFeed(config);
} catch (error: any) {
  // Match case-insensitively: the message may read "Invalid token"
  if (String(error?.message).toLowerCase().includes('invalid token')) {
    // Redirect user to re-authorize
    window.location.href = getOAuthUrl();
  }
}

Issue: Feed Not Syncing

Problem: Feed created but no content appears.

Solutions:

Check feed state:

const feed = await graphlit.getFeed(feedId);
console.log('State:', feed.feed.state);

Wait for initial sync:

await waitForFeedCompletion(feedId);
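
`waitForFeedCompletion` is not an SDK method; it stands for a polling helper you write yourself. One possible definition is sketched below, with the status check injected so the helper stays independent of the client and adds a timeout. The factory name `makeWaitForFeedCompletion` and the default interval and timeout are assumptions.

```typescript
type FeedDoneCheck = (feedId: string) => Promise<boolean>;

// Builds a waitForFeedCompletion(feedId) function around a status check you supply.
function makeWaitForFeedCompletion(
  check: FeedDoneCheck,
  intervalMs: number = 5000,
  timeoutMs: number = 10 * 60 * 1000,
): (feedId: string) => Promise<void> {
  return async function waitForFeedCompletion(feedId: string): Promise<void> {
    const deadline = Date.now() + timeoutMs;
    while (true) {
      if (await check(feedId)) return;
      // Give up if the next poll would land after the deadline.
      if (Date.now() + intervalMs > deadline) {
        throw new Error(`Feed ${feedId} did not finish within ${timeoutMs} ms`);
      }
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  };
}

// Wiring it to the SDK call used earlier in this guide:
//   const waitForFeedCompletion = makeWaitForFeedCompletion(
//     async (id) => Boolean((await graphlit.isFeedDone(id)).isFeedDone?.result),
//   );
```

With that binding in place, the one-argument call shown above works as written.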

Trigger manual sync:

await graphlit.triggerFeedSync(feedId);

Issue: Too Much Content

Problem: Feed syncs thousands of items, overwhelming system.

Solution: Use readLimit:

const feed = await graphlit.createFeed({
  name: 'Limited Slack Feed',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'general',
    readLimit: 100  // Only last 100 messages
  }
});

What's Next?

You now know how feeds are created, authenticated, managed, and monitored. Next steps:

Set up OAuth apps for connectors you need
Create feeds for key data sources
Apply workflows to customize processing
Monitor feed health in production

Related guides:

Content Ingestion - Manual ingestion vs feeds
Workflows and Processing - Process feed content
Building Knowledge Graphs - Extract entities from feeds
Production Architecture - Monitor feed health

Happy connecting! 🔌
