Guide · 24 min read

Data Connectors: 30+ Sources for Unified Search

Connect 30+ data sources to Zine. Comprehensive guide covering OAuth flows, feed management, and continuous sync for Slack, Gmail, GitHub, Notion, and more.

Data connectors sync content from external sources into Zine (powered by Graphlit). Instead of uploading files manually, you create connectors that continuously monitor Slack channels, Gmail inboxes, Google Drive folders, GitHub repos, and 30+ other sources, so your knowledge base stays up to date.

This guide covers feed architecture, OAuth vs API key authentication, the main connector types, feed management, and production patterns. By the end, you'll know how to connect a supported data source and build automated content pipelines.

What You'll Learn

  • Feed architecture and lifecycle
  • OAuth flows vs API key authentication
  • Connector patterns by category (messaging, cloud storage, project management)
  • Feed configuration options (readLimit, schedules, filters)
  • Polling vs webhook patterns
  • Production feed management
  • Error handling and retry strategies

Prerequisites:

  • A Graphlit project - Sign up (2 min)
  • SDK installed: npm install graphlit-client (30 sec)
  • OAuth apps set up for connectors you want to use (we'll show you how)

Time to complete: 80 minutes
Difficulty: Intermediate

Developer Note: All Graphlit IDs are GUIDs. Example outputs show realistic GUID format.


Table of Contents

  1. Feed Architecture
  2. Authentication Methods
  3. Messaging Connectors
  4. Cloud Storage Connectors
  5. Project Management Connectors
  6. Social Media & Web Connectors
  7. Feed Management
  8. Advanced Patterns
  9. Production Patterns

Part 1: Feed Architecture

What is a Feed?

A feed is a continuous sync between an external data source and Graphlit. Once created, it:

  1. Initial sync: Fetches existing content (e.g., last 100 Slack messages)
  2. Continuous monitoring: Polls for new content (e.g., every 15 minutes)
  3. Auto-ingestion: New content automatically appears in Graphlit

Key insight: Feeds are "set it and forget it"—no manual re-triggering needed.

✅ Quick Win: Once a feed is created, new content automatically appears in your search results and RAG responses—no additional code needed.

All 30+ Supported Connectors

Zine supports 30+ connector types across 8 categories:

Messaging & Collaboration (6):

  • Slack - Channels, threads, DMs
  • Microsoft Teams - Team channels and conversations
  • Discord - Server channels
  • Gmail - Email inbox (labels, folders)
  • Outlook Email - Microsoft email
  • Intercom - Support articles and tickets

Cloud Storage (8):

  • Google Drive - Docs, Sheets, Slides, PDFs
  • Microsoft OneDrive - Personal cloud storage
  • SharePoint - Enterprise document management
  • Dropbox - Files and folders
  • Box - Enterprise file storage
  • Amazon S3 - Object storage buckets
  • Azure Blob Storage - Cloud file storage
  • FTP/SFTP - File servers

Source Control & Development (5):

  • GitHub Code - Repository contents
  • GitHub Issues - Bug tracking and discussions
  • GitHub Pull Requests - Code reviews
  • GitHub Commits - Change history
  • GitLab - Code and issues

Project Management (4):

  • Jira - Issue tracking
  • Linear - Modern project management
  • Trello - Kanban boards
  • Asana - Task management

Knowledge Management (2):

  • Notion - Pages and databases
  • Confluence - Wiki pages

Social Media & Web (6):

  • Reddit - Posts and comments
  • Twitter/X - Tweets and threads
  • YouTube - Video transcripts
  • RSS Feeds - Blog feeds
  • Web Crawling - Website content
  • Web Search - Tavily, Exa, Perplexity

Calendars & Meetings (3):

  • Google Calendar - Events and meetings
  • Outlook Calendar - Microsoft calendar
  • Zoom - Meeting recordings (transcribed)

Customer & Sales (2):

  • Zendesk - Support tickets
  • Salesforce - CRM data (custom integration)

Feed Lifecycle

CREATE → ENABLED → SYNCING → INDEXED
    ↓
DISABLED (if paused)
    ↓
DELETED (if removed)
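
One way to read the diagram above is as a small state machine. The sketch below encodes that reading in TypeScript. The state names come from the diagram, but the set of allowed transitions (for example, that a disabled feed can be re-enabled, or that an indexed feed returns to syncing on the next poll) is an assumption made for illustration, not documented Graphlit behavior.

```typescript
// Hypothetical model of the feed lifecycle diagram; not part of the Graphlit SDK.
type FeedLifecycleState =
  | 'CREATE'
  | 'ENABLED'
  | 'SYNCING'
  | 'INDEXED'
  | 'DISABLED'
  | 'DELETED';

// Assumed transitions: the happy path from the diagram, plus pause/resume and delete.
const transitions: Record<FeedLifecycleState, FeedLifecycleState[]> = {
  CREATE: ['ENABLED'],
  ENABLED: ['SYNCING', 'DISABLED', 'DELETED'],
  SYNCING: ['INDEXED', 'DISABLED', 'DELETED'],
  INDEXED: ['SYNCING', 'DISABLED', 'DELETED'], // next poll starts another sync
  DISABLED: ['ENABLED', 'DELETED'],            // resume or remove
  DELETED: [],                                 // terminal
};

function canTransition(from: FeedLifecycleState, to: FeedLifecycleState): boolean {
  return transitions[from].includes(to);
}
```

A table like this can guard UI actions, for instance hiding a "Resume" button unless `canTransition(current, 'ENABLED')` is true.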

Part 2: Authentication Methods

OAuth (Recommended for Most Connectors)

OAuth lets users authorize access without sharing passwords. Graphlit manages the OAuth flow.

Connectors using OAuth:

  • Slack
  • Gmail / Google Drive / Google Calendar
  • Microsoft (Outlook, OneDrive, SharePoint, Teams)
  • GitHub
  • Notion
  • Jira
  • Linear
  • Reddit
  • Twitter

OAuth flow:

  1. User clicks "Connect Slack"
  2. Redirected to Slack OAuth
  3. User authorizes
  4. Graphlit receives OAuth token
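  5. Create feed with token

// Example: Slack OAuth
// URLSearchParams percent-encodes each value, including the redirect URI
const authParams = new URLSearchParams({
  client_id: SLACK_CLIENT_ID,
  scope: 'channels:read,channels:history',
  redirect_uri: REDIRECT_URI
});
const authUrl = `https://slack.com/oauth/v2/authorize?${authParams}`;

// User visits authUrl, authorizes
// Slack redirects back with code

// Exchange code for token
const tokenResponse = await fetch('https://slack.com/api/oauth.v2.access', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({
    code,
    client_id: SLACK_CLIENT_ID,
    client_secret: SLACK_CLIENT_SECRET,
    redirect_uri: REDIRECT_URI  // Must match the value sent in the authorize step
  })
});

// Slack reports failures in the JSON body ({ ok: false, error }), even with HTTP 200
const { ok, error, access_token } = await tokenResponse.json();
if (!ok) throw new Error(`Slack OAuth failed: ${error}`);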

// Create feed with token
const feed = await graphlit.createFeed({
  name: 'My Slack Feed',
  type: FeedTypes.Slack,
  slack: {
    token: access_token,
    channel: 'general',  // Single channel per feed
    readLimit: 100
  }
});
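
The flow above does not show the `state` parameter, which OAuth 2.0 uses to tie the callback to the browser session that started the flow. The following is a minimal sketch of building the authorize URL with `state` and checking it on return. The helper names `buildSlackAuthUrl` and `isValidCallback` are hypothetical; the endpoint and scopes are the ones from the example above, and generating and storing the random `state` value is left to your application.

```typescript
// Hypothetical helpers; the endpoint and scopes match the Slack example above.
interface SlackAuthOptions {
  clientId: string;
  redirectUri: string;
  scopes: string[];
  state: string; // random, per-session value that you generate and store server-side
}

function buildSlackAuthUrl(opts: SlackAuthOptions): string {
  const url = new URL('https://slack.com/oauth/v2/authorize');
  // searchParams.set percent-encodes each value
  url.searchParams.set('client_id', opts.clientId);
  url.searchParams.set('scope', opts.scopes.join(','));
  url.searchParams.set('redirect_uri', opts.redirectUri);
  url.searchParams.set('state', opts.state);
  return url.toString();
}

// On the callback, reject the request unless the returned state matches the stored one.
function isValidCallback(returnedState: string | null, storedState: string): boolean {
  return returnedState !== null && returnedState === storedState;
}
```

Only exchange the `code` for a token after `isValidCallback` returns true; otherwise treat the request as forged and discard it.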

API Keys (For Services Without OAuth)

Some connectors use static credentials instead of OAuth, and a few need no authentication at all:

  • RSS feeds (no auth)
  • Web crawling (no auth)
  • S3 (access key + secret)
  • Azure Storage (connection string)

import { FeedTypes, FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

// Example: S3 feed with API keys
const s3Feed = await graphlit.createFeed({
  name: 'Company S3 Bucket',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.S3Blob,
    s3: {
      bucketName: 'documents',
      region: 'us-east-1',
      accessKey: process.env.AWS_ACCESS_KEY,
      secretAccessKey: process.env.AWS_SECRET_KEY,
      prefix: 'pdfs/'  // Optional: filter by folder
    },
    isRecursive: true,
    readLimit: 1000
  }
});
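
The example above reads credentials from `process.env`, which yields `undefined` when a variable is not set. A small pre-flight check can surface that before the feed is created. The sketch below is one possible approach; `requireEnv` is a hypothetical helper, not part of the Graphlit SDK, and it takes the environment as an argument so it can be tested without touching real variables.

```typescript
// Hypothetical helper: fail fast when a credential is missing instead of
// passing undefined into createFeed.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, names: string[]): Record<string, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const values: Record<string, string> = {};
  for (const name of names) {
    values[name] = env[name] as string;
  }
  return values;
}

// Usage in Node (pass process.env):
// const { AWS_ACCESS_KEY, AWS_SECRET_KEY } =
//   requireEnv(process.env, ['AWS_ACCESS_KEY', 'AWS_SECRET_KEY']);
```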

Part 3: Messaging Connectors

Slack

Use case: Search team conversations, RAG over chat history, entity extraction from messages.

import { Graphlit } from 'graphlit-client';
import { FeedTypes, FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Create Slack feed
const slackFeed = await graphlit.createFeed({
  name: 'Engineering Slack',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'engineering',  // Single channel
    readLimit: 500  // Last 500 messages
  }
});

console.log('Slack feed created:', slackFeed.createFeed.id);

// Wait for initial sync
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(slackFeed.createFeed.id);
  isDone = Boolean(status.isFeedDone?.result);
  if (!isDone) {
    await new Promise(r => setTimeout(r, 10000));  // Check again in 10s
  }
}

console.log('✓ Slack history synced');

OAuth scopes needed:

  • channels:read - List channels
  • channels:history - Read messages
  • groups:read - List private channels (optional)
  • groups:history - Read messages in private channels (optional)

What gets synced:

  • All messages in specified channels
  • Threaded replies
  • User mentions
  • Files/images attached to messages
  • Reactions (optional)

💡 Pro Tip: Combine Slack feeds with entity extraction to automatically identify who's working on which projects from Slack conversations.

Gmail

Use case: Search emails, extract contacts/companies, email-based RAG.

const gmailFeed = await graphlit.createFeed({
  name: 'My Gmail',
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.GoogleEmail,
    google: {
      refreshToken: process.env.GMAIL_OAUTH_TOKEN
    },
    includeAttachments: true,
    readLimit: 100  // Last 100 emails
  }
});

OAuth scopes needed:

  • https://www.googleapis.com/auth/gmail.readonly

What gets synced:

  • Email subject, body, sender, recipients
  • Attachments (PDFs, images, etc.)
  • Timestamps
  • Email threads

Microsoft Teams

const teamsFeed = await graphlit.createFeed({
  name: 'Engineering Team',
  type: FeedTypes.Message,
  message: {
    type: FeedServiceTypes.MicrosoftTeams,
    microsoft: {
      refreshToken: process.env.TEAMS_OAUTH_TOKEN
    },
    teamId: 'team-guid',
    channel: 'channel-guid',  // Single channel
    readLimit: 100
  }
});

Discord

const discordFeed = await graphlit.createFeed({
  name: 'Community Discord',
  type: FeedTypes.Message,
  message: {
    type: FeedServiceTypes.Discord,
    discord: {
      token: process.env.DISCORD_BOT_TOKEN,
      serverId: 'guild-id'
    },
    channel: 'channel-id',  // Single channel
    readLimit: 500
  }
});

Part 4: Cloud Storage Connectors

Google Drive

Use case: Sync company documents, collaborative files, shared folders.

const driveFeed = await graphlit.createFeed({
  name: 'Company Drive',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.GoogleDrive,
    googleDrive: {
      refreshToken: process.env.GOOGLE_OAUTH_TOKEN,
      folderId: 'folder-id'  // Optional: sync specific folder
    },
    isRecursive: true,
    readLimit: 1000
  }
});

What gets synced:

  • Google Docs (converted to markdown)
  • Google Sheets (tables extracted)
  • Google Slides (text extracted)
  • PDFs, images, videos
  • Files in subfolders

OAuth scopes needed:

  • https://www.googleapis.com/auth/drive.readonly

OneDrive / SharePoint

// OneDrive personal
const oneDriveFeed = await graphlit.createFeed({
  name: 'My OneDrive',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.OneDrive,
    oneDrive: {
      refreshToken: process.env.MICROSOFT_OAUTH_TOKEN,
      folderId: 'folder-id'  // Optional
    },
    isRecursive: true,
    readLimit: 500
  }
});

// SharePoint (team sites)
const sharePointFeed = await graphlit.createFeed({
  name: 'Company SharePoint',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.SharePoint,
    sharePoint: {
      refreshToken: process.env.MICROSOFT_OAUTH_TOKEN,
      siteId: 'site-id',
      driveId: 'drive-id'
    },
    isRecursive: true,
    readLimit: 1000
  }
});

GitHub

Use case: Sync code repos, documentation, READMEs.

const githubFeed = await graphlit.createFeed({
  name: 'Company Repo',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.GitHub,
    github: {
      personalAccessToken: process.env.GITHUB_PAT,
      repositoryOwner: 'my-company',
      repositoryName: 'main-repo'
    },
    isRecursive: true
  }
});

What gets synced:

  • Source code files
  • README.md files
  • Documentation
  • Commit messages (optional)

Amazon S3

const s3Feed = await graphlit.createFeed({
  name: 'Documents S3 Bucket',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.S3Blob,
    s3: {
      bucketName: 'company-documents',
      region: 'us-east-1',
      accessKey: process.env.AWS_ACCESS_KEY,
      secretAccessKey: process.env.AWS_SECRET_KEY,
      prefix: 'public/'  // Optional: sync specific folder
    },
    isRecursive: true
  }
});

Part 5: Project Management Connectors

Jira

Use case: Search issues, track project status, entity extraction from tickets.

const jiraFeed = await graphlit.createFeed({
  name: 'Engineering Jira',
  type: FeedTypes.Issue,
  issue: {
    type: FeedServiceTypes.AtlassianJira,
    jira: {
      email: 'user@company.com',
      token: process.env.JIRA_API_TOKEN,
      uri: 'https://yourcompany.atlassian.net',
      project: 'PROJ'  // Project key
    }
  },
  readLimit: 500
});

What gets synced:

  • Issue title, description, comments
  • Status, assignee, reporter
  • Attachments
  • Custom fields

Linear

const linearFeed = await graphlit.createFeed({
  name: 'Product Linear',
  type: FeedTypes.Issue,
  issue: {
    type: FeedServiceTypes.Linear,
    linear: {
      token: process.env.LINEAR_API_KEY,
      teamId: 'team-id'
    }
  },
  readLimit: 500
});

Notion

const notionFeed = await graphlit.createFeed({
  name: 'Company Wiki',
  type: FeedTypes.Notion,
  notion: {
    token: process.env.NOTION_INTEGRATION_TOKEN
  },
  readLimit: 1000
});

What gets synced:

  • Pages and sub-pages
  • Databases and records
  • Embedded content
  • Inline comments

GitHub Issues & Pull Requests

// Issues
const issuesFeed = await graphlit.createFeed({
  name: 'Repo Issues',
  type: FeedTypes.Issue,
  issue: {
    type: FeedServiceTypes.GitHub,
    github: {
      personalAccessToken: process.env.GITHUB_PAT,
      repositoryOwner: 'my-company',
      repositoryName: 'main-repo',
      includeIssues: true
    }
  },
  readLimit: 500
});

// Pull Requests
const prFeed = await graphlit.createFeed({
  name: 'Repo PRs',
  type: FeedTypes.PullRequest,
  pullRequest: {
    type: FeedServiceTypes.GitHub,
    github: {
      personalAccessToken: process.env.GITHUB_PAT,
      repositoryOwner: 'my-company',
      repositoryName: 'main-repo'
    }
  },
  readLimit: 100
});

Part 6: Social Media & Web Connectors

Reddit

const redditFeed = await graphlit.createFeed({
  name: 'Tech Subreddit',
  type: FeedTypes.Reddit,
  reddit: {
    subredditName: 'MachineLearning'
  },
  readLimit: 100
});

RSS Feeds

const rssFeed = await graphlit.createFeed({
  name: 'Tech News RSS',
  type: FeedTypes.Rss,
  rss: {
    uri: 'https://techcrunch.com/feed/'
  },
  readLimit: 50
});

Web Crawling

Use case: Scrape documentation sites, competitor analysis, content aggregation.

const webCrawl = await graphlit.createFeed({
  name: 'Documentation Crawler',
  type: FeedTypes.Web,
  web: {
    uri: 'https://docs.example.com',
    allowedPaths: ['^https://docs\\.example\\.com/.*'],  // Regex patterns
    excludedPaths: ['/api/.*', '/archive/.*']
  },
  readLimit: 500
});

What gets scraped:

  • Page HTML (converted to markdown)
  • Links (follows to crawl more pages)
  • Images (optional)
  • Metadata (title, description)
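
This guide does not describe how Graphlit evaluates `allowedPaths` and `excludedPaths` internally. The sketch below shows one plausible interpretation: a URL is crawled if it matches at least one allowed pattern (or no allowed patterns are given) and matches no excluded pattern. `shouldCrawl` is a hypothetical helper, useful mainly for sanity-checking your regular expressions before creating the feed.

```typescript
// Hypothetical pre-flight check for crawl filters; Graphlit's own matching rules may differ.
function shouldCrawl(url: string, allowedPaths: string[], excludedPaths: string[]): boolean {
  const allowed =
    allowedPaths.length === 0 || allowedPaths.some((pattern) => new RegExp(pattern).test(url));
  const excluded = excludedPaths.some((pattern) => new RegExp(pattern).test(url));
  return allowed && !excluded;
}

// Same patterns as the feed configuration above
const allowedPaths = ['^https://docs\\.example\\.com/.*'];
const excludedPaths = ['/api/.*', '/archive/.*'];

const guidePage = shouldCrawl('https://docs.example.com/guides/intro', allowedPaths, excludedPaths);
const apiPage = shouldCrawl('https://docs.example.com/api/v1/users', allowedPaths, excludedPaths);
const otherSite = shouldCrawl('https://blog.example.com/post', allowedPaths, excludedPaths);
```

Under this interpretation `guidePage` is `true`, while `apiPage` (excluded path) and `otherSite` (outside the allowed host) are `false`.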

YouTube

const youtubeFeed = await graphlit.createFeed({
  name: 'Channel Videos',
  type: FeedTypes.YouTube,
  youtube: {
    channelIdentifier: 'channel-id'
  },
  readLimit: 50
});

What gets synced:

  • Video transcripts (auto-generated or manual)
  • Titles, descriptions
  • Thumbnails
  • Comments (optional)

Part 7: Feed Management

Query Feeds

// Get all feeds
const feeds = await graphlit.queryFeeds();

feeds.feeds.results.forEach(feed => {
  console.log(`${feed.name} (${feed.type})`);
  console.log(`  State: ${feed.state}`);
  console.log(`  Last sync: ${feed.lastSyncDateTime}`);
});

Update Feed

// Change feed configuration
await graphlit.updateFeed(feedId, {
  name: 'Updated Name',
  slack: {
    readLimit: 1000  // Increase sync limit
  }
});

Disable/Enable Feed

// Pause syncing
await graphlit.disableFeed(feedId);

// Resume syncing
await graphlit.enableFeed(feedId);

Delete Feed

// Delete feed (and optionally its content)
await graphlit.deleteFeed(feedId);

// Delete feed but keep synced content
await graphlit.deleteFeed(feedId, false);

Trigger Manual Sync

// Force immediate sync (useful for testing)
await graphlit.triggerFeedSync(feedId);

// Wait for sync to complete
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(feedId);
  isDone = Boolean(status.isFeedDone?.result);
  if (!isDone) {
    await new Promise(r => setTimeout(r, 5000));  // Check again in 5s
  }
}

Part 8: Advanced Patterns

Pattern 1: Feed with Workflow

Apply processing to synced content:

// Create workflow first
const workflow = await graphlit.createWorkflow({
  name: "Extract Entities",
  extraction: { /* ... */ }
});

// Create feed with workflow
const feed = await graphlit.createFeed({
  name: 'Slack with Entities',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'general'
  },
  workflow: { id: workflow.createWorkflow.id }
});

// All synced messages will have entities extracted

Pattern 2: Feed with Collections

Auto-organize synced content:

// Create collection
const collection = await graphlit.createCollection('Slack Messages');

// Create feed that adds to collection
const feed = await graphlit.createFeed({
  name: 'Slack Feed',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'general'
  },
  collections: [{ id: collection.createCollection.id }]
});

Pattern 3: Multi-Feed Strategy

Sync from multiple sources into unified knowledge base:

// Feed 1: Slack
const slackFeed = await graphlit.createFeed({
  name: 'Slack',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'general'
  }
});

// Feed 2: Gmail
const gmailFeed = await graphlit.createFeed({
  name: 'Gmail',
  type: FeedTypes.Email,
  email: {
    type: FeedServiceTypes.GoogleEmail,
    google: {
      refreshToken: process.env.GMAIL_OAUTH_TOKEN
    },
    readLimit: 100
  }
});

// Feed 3: Google Drive
const driveFeed = await graphlit.createFeed({
  name: 'Drive',
  type: FeedTypes.Site,
  site: {
    type: FeedServiceTypes.GoogleDrive,
    googleDrive: {
      refreshToken: process.env.GOOGLE_OAUTH_TOKEN
    },
    isRecursive: true
  }
});

// Now search across all sources
const results = await graphlit.queryContents({
  search: "project update"
});
// Returns results from Slack, Gmail, AND Drive

Pattern 4: Scheduled Feeds

Control sync frequency:

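import { FeedTypes, TimedPolicyRecurrenceTypes } from 'graphlit-client/dist/generated/graphql-types';

const feed = await graphlit.createFeed({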
  name: 'Daily News Feed',
  type: FeedTypes.Rss,
  rss: {
    uri: 'https://news.com/feed'
  },
  readLimit: 50,
  schedulePolicy: {
    recurrenceType: TimedPolicyRecurrenceTypes.Repeat,
    repeatInterval: 'P1D'  // ISO 8601: 1 day
  }
});
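
The `repeatInterval` value above is an ISO 8601 duration: `P` starts the period, and a `T` separates date components from time components, so 15 minutes is `PT15M` rather than `P15M` (which would mean 15 months). The helper below is a small sketch for building such strings; `toIsoDuration` is a hypothetical name, and whether Graphlit accepts every duration it can produce is an assumption to verify.

```typescript
// Hypothetical helper for building ISO 8601 duration strings such as 'P1D' or 'PT15M'.
interface DurationParts {
  days?: number;
  hours?: number;
  minutes?: number;
}

function toIsoDuration({ days = 0, hours = 0, minutes = 0 }: DurationParts): string {
  for (const n of [days, hours, minutes]) {
    if (!Number.isInteger(n) || n < 0) {
      throw new RangeError('Duration parts must be non-negative integers');
    }
  }
  if (days + hours + minutes === 0) {
    throw new RangeError('Duration must be greater than zero');
  }
  const datePart = days > 0 ? `${days}D` : '';
  const timePart = (hours > 0 ? `${hours}H` : '') + (minutes > 0 ? `${minutes}M` : '');
  // The 'T' separator is required before any time components.
  const timeSection = timePart.length > 0 ? 'T' + timePart : '';
  return 'P' + datePart + timeSection;
}
```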

Part 9: Production Patterns

Pattern 1: OAuth Token Refresh

OAuth tokens expire—handle refresh:

// Store tokens when user authorizes
const oauthData = {
  accessToken: '...',
  refreshToken: '...',
  expiresAt: Date.now() + 3600000  // Assumes a 1-hour access token lifetime
};

// Before creating feed, check if token is expired
async function getValidToken() {
  if (Date.now() >= oauthData.expiresAt) {
    // refreshOAuthToken is your own helper that calls the provider's token endpoint
    const newTokens = await refreshOAuthToken(oauthData.refreshToken);
    oauthData.accessToken = newTokens.accessToken;
    oauthData.expiresAt = Date.now() + 3600000;
  }
  return oauthData.accessToken;
}

// Use refreshed token
const token = await getValidToken();
const feed = await graphlit.createFeed({
  name: 'Slack Feed',
  type: FeedTypes.Slack,
  slack: {
    token,
    channel: 'general'
  }
});
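
Comparing `Date.now()` directly against the expiry time leaves a window in which a token can expire between the check and the request that uses it. A common mitigation is to refresh slightly early and to derive the expiry from the provider's `expires_in` value rather than a hard-coded hour. The sketch below illustrates both; the helper names and the 60-second margin are assumptions chosen for illustration.

```typescript
// Hypothetical helpers; the 60-second safety margin is an arbitrary choice.
const DEFAULT_SKEW_MS = 60000;

// Convert an OAuth `expires_in` value (seconds) into an absolute timestamp (ms).
function expiryFromResponse(expiresInSeconds: number, now: number): number {
  return now + expiresInSeconds * 1000;
}

// True when the token is expired or will expire within the safety margin.
function needsRefresh(expiresAt: number, now: number, skewMs: number = DEFAULT_SKEW_MS): boolean {
  return now >= expiresAt - skewMs;
}
```

With these in place, the expiry check in a function like `getValidToken` could become `if (needsRefresh(oauthData.expiresAt, Date.now()))`.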

Pattern 2: Feed Health Monitoring

Monitor feed status:

// Check all feeds
const feeds = await graphlit.queryFeeds();

feeds.feeds.results.forEach(feed => {
  if (feed.state === 'FAILED') {
    console.error(`Feed ${feed.name} failed`);
    // Alert ops team
  }
  
  if (feed.lastSyncDateTime) {
    const hoursSinceSync = (Date.now() - new Date(feed.lastSyncDateTime).getTime()) / 3600000;
    if (hoursSinceSync > 24) {
      console.warn(`Feed ${feed.name} hasn't synced in ${hoursSinceSync.toFixed(1)}h`);
    }
  }
});
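
The checks above can be factored into a pure function, which makes the thresholds easy to test and lets one alert summarize every unhealthy feed. This is a sketch only: `classifyFeedHealth` and `summarizeHealth` are hypothetical helpers, and they assume the `state` and `lastSyncDateTime` fields and the `'FAILED'` value used in the loop above.

```typescript
// Hypothetical health classifier built on the fields used in the loop above.
interface FeedSummary {
  name: string;
  state?: string | null;
  lastSyncDateTime?: string | null;
}

type FeedHealth = 'failed' | 'stale' | 'never-synced' | 'ok';

function classifyFeedHealth(feed: FeedSummary, now: number, maxAgeHours = 24): FeedHealth {
  if (feed.state === 'FAILED') return 'failed';
  if (!feed.lastSyncDateTime) return 'never-synced';
  const ageHours = (now - new Date(feed.lastSyncDateTime).getTime()) / 3600000;
  return ageHours > maxAgeHours ? 'stale' : 'ok';
}

// Group feed names by health so a single alert can list all problems.
function summarizeHealth(feeds: FeedSummary[], now: number): Record<FeedHealth, string[]> {
  const summary: Record<FeedHealth, string[]> = {
    failed: [],
    stale: [],
    'never-synced': [],
    ok: [],
  };
  for (const feed of feeds) {
    summary[classifyFeedHealth(feed, now)].push(feed.name);
  }
  return summary;
}
```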

Pattern 3: Rate Limiting

Avoid overwhelming external APIs:

// Create feeds with delays
const urls = ['url1', 'url2', 'url3'];

for (const url of urls) {
  const feed = await graphlit.createFeed({
    name: `RSS Feed ${url}`,
    type: FeedTypes.Rss,
    rss: { uri: url }
  });
  
  // Wait 5 seconds between feed creations
  await new Promise(r => setTimeout(r, 5000));
}
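
A fixed delay spaces requests out, but it does not help when a request fails transiently. A common complement is to retry with exponential backoff. The sketch below is one way to do that; `backoffDelay` and `withRetry` are hypothetical helpers, the delays are arbitrary defaults, and it retries on every error, whereas in practice you would likely retry only on rate-limit or network failures.

```typescript
// Hypothetical retry helpers; tune the numbers for the API you are calling.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  // attempt 0 -> baseMs, attempt 1 -> 2x, attempt 2 -> 4x, capped at maxMs
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  // sleep is injectable so tests can run without real delays
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt));
      }
    }
  }
  throw lastError;
}
```

Inside the loop above, the feed creation could then be wrapped as `await withRetry(() => graphlit.createFeed({ ... }))`.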

Common Issues & Solutions

Issue: OAuth Token Invalid

Problem: "Invalid token" error when creating feed.

Solution: Refresh OAuth token or re-authorize:

try {
  const feed = await graphlit.createFeed(config);
} catch (error: any) {
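  // Compare case-insensitively: the error may be reported as "Invalid token"
  if (String(error?.message ?? '').toLowerCase().includes('invalid token')) {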
    // Redirect user to re-authorize
    window.location.href = getOAuthUrl();
  }
}

Issue: Feed Not Syncing

Problem: Feed created but no content appears.

Solutions:

  1. Check feed state:

const feed = await graphlit.getFeed(feedId);
console.log('State:', feed.feed.state);

  2. Wait for initial sync (waitForFeedCompletion is your own polling helper around isFeedDone):

await waitForFeedCompletion(feedId);

  3. Trigger manual sync:

await graphlit.triggerFeedSync(feedId);
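
`waitForFeedCompletion` is not defined anywhere in this guide. The following is a minimal sketch of what such a helper might look like, with a timeout so a stuck feed cannot block forever. The completion check is injected as `checkDone`, an assumption made so the helper does not depend on a particular client; to get the one-argument form used above, wrap it in a small function that supplies `checkDone`.

```typescript
// Hypothetical implementation. With the SDK calls shown in this guide, checkDone could be:
//   async (id) => Boolean((await graphlit.isFeedDone(id)).isFeedDone?.result)
async function waitForFeedCompletion(
  feedId: string,
  checkDone: (feedId: string) => Promise<boolean>,
  intervalMs = 10000,
  timeoutMs = 10 * 60 * 1000,
  // sleep is injectable so tests can run without real delays
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<boolean> {
  let waitedMs = 0;
  while (true) {
    if (await checkDone(feedId)) return true; // sync finished
    if (waitedMs >= timeoutMs) return false;  // give up instead of looping forever
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}
```

A `false` result means the timeout elapsed before the feed reported completion, which is a reasonable point to log the feed state or raise an alert.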

Issue: Too Much Content

Problem: Feed syncs thousands of items, overwhelming system.

Solution: Use readLimit:

const feed = await graphlit.createFeed({
  name: 'Limited Slack Feed',
  type: FeedTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channel: 'general',
    readLimit: 100  // Only last 100 messages
  }
});

What's Next?

You now know how feeds work and how to configure the main connector types. Next steps:

  1. Set up OAuth apps for connectors you need
  2. Create feeds for key data sources
  3. Apply workflows to customize processing
  4. Monitor feed health in production

Happy connecting! 🔌

