25 Commits (80aa9aa8b053655683cbdae1aeccb083166bc714)

Author SHA1 Message Date
Neil Alexander 56b5847c74
Add prometheus metrics for destination queues, sync requests
Squashed commit of the following:

commit 7ed1c6cfe67429dbe378a763d832c150eb0f781d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:53:27 2020 +0000

    Updates

commit 8442099d08760b8d086e6d58f9f30284e378a2cd
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:43:18 2020 +0000

    Add some sync statistics

commit ffe2a11644ed3d5297d1775a680886c574143fdb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:37:00 2020 +0000

    Fix backing off display

commit 27443a93855aa60a49806ecabbf9b09f818301bd
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:28:43 2020 +0000

    Add some destination queue metrics
2020-12-16 15:02:39 +00:00
Neil Alexander f64c8822bc
Federation sender refactor (#1621)
* Refactor federation sender, again

* Clean up better

* Missing operators

* Try to get overflowed events from database

* Fix queries

* Log less

* Comments

* nil PDUs/EDUs shouldn't happen but guard against them for safety

* Tweak logging

* Fix transaction coalescing

* Update comments

* Check nils more

* Remove channels as they add extra complexity and possibly will deadlock

* Don't hold lock while sending transaction

* Less spam about sleeping queues

* Comments

* Bug-fixing

* Don't try to rehydrate twice

* Don't queue in memory for blacklisted destinations

* Don't queue in memory for blacklisted destinations

* Fix a couple of bugs

* Check for duplicates when pulling things out of the database

* Durable transactions, some more refactoring

* Revert "Durable transactions, some more refactoring"

This reverts commit 5daf924eaaefec5e4f7c12c16ca24e898de4adbb.

* Fix deadlock
2020-12-09 10:03:22 +00:00
Ronnie Ebrin a677a288bd
federationsender/roomserver: don't panic while federation is disabled (#1615) 2020-12-04 14:08:17 +00:00
Neil Alexander bdf6490375
Add ability to disable federation (#1604)
* Allow disabling federation

* Don't start federation queues if disabled

* Fix for Go 1.13
2020-12-02 15:10:03 +00:00
Kegsay 2570418f42
Remove ServerACLs from the current state server (#1390)
* Remove ServerACLs from the current state server

Functionality moved to roomserver

* Nothing to see here, move along
2020-09-04 10:40:58 +01:00
Neil Alexander 04bc09f591
Defer keyserver and federationsender wakeups to give HTTP listeners time to start (#1389) 2020-09-03 21:17:55 +01:00
Neil Alexander 6cb1a65809
Synchronous invites (#1273)
* Refactor invites to be synchronous

* Fix synchronous invites

* Fix client API return type for send invite error

* Linter

* Restore PerformError on rsAPI.PerformInvite

* Update sytest-whitelist

* Don't override PerformError with normal errors

* Fix error passing

* Un-whitelist a couple of tests

* Update sytest-whitelist

* Try to handle multiple invite rejections better

* nolint

* Update gomatrixserverlib

* Fix /v1/invite test

* Remove replace from go.mod
2020-08-17 11:40:49 +01:00
Neil Alexander 4c4732a9c9
Don't send to ACL'd servers (#1267)
* Don't send to ACL'd servers

* Use gjson to look for room_id in EDU
2020-08-13 14:23:37 +01:00
Neil Alexander 5dd5a41119
Tweak log levels of some federation logging (#1248)
* Tweak log levels of some federation logging

* Update go.mod/go.sum for matrix-org/util#22 and matrix-org/gomatrixserverlib#215
2020-08-07 15:00:23 +01:00
Neil Alexander 1e71fd645e
Persistent federation sender blacklist (#1214)
* Initial persistence of blacklists

* Move statistics folder

* Make MaxFederationRetries configurable

* Set lower failure thresholds for Yggdrasil demos

* Still write events into database for blacklisted hosts (they can be tidied up later)

* Review comments
2020-07-22 17:01:29 +01:00
Neil Alexander 11a39fe3b5
Deduplicate FS database, EDU persistence table (#1207)
* Deduplicate FS database, add some EDU persistence groundwork

* Extend TransactionWriter to use optional existing transaction, use that for FS SQLite database writes

* Fix build due to bad keyserver import

* Working EDU persistence

* gocyclo, unsurprisingly

* Remove unused

* Update copyright notices
2020-07-20 16:55:20 +01:00
Neil Alexander 72b3160776
Send-to-device messages over federation (#1198)
* Initial work to send send-to-device messages over federation

* Wire up send-to-device consumer, message formatting

* Generate random message ID

* Review comments, update sytest whitelist
2020-07-14 12:33:37 +01:00
Kegsay 8e9580852d
bugfix: continue sending PDUs if ones are added whilst sending another PDU (#1187)
* Add a bit more logging to the fedsender

* bugfix: continue sending PDUs if ones are added whilst sending another PDU

Without this, the queue goes back to sleep on `<-oq.notifyPDUs` which won't
fire because `pendingPDUs` is already > 0. This should fix a flakey sytest.

* Break if no txn is sent

* Tweak federation sender wake-ups

* Update comments

* Remove break or that'll kill the parent loop

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-07-07 16:36:10 +01:00
Neil Alexander 46dbc46f84
Wake up destination queues more aggressively (#1183)
* Wake up destination queues more aggressively

* We don't really need Ch here do we
2020-07-03 16:31:56 +01:00
Neil Alexander 1773fd84b7
Hydrate destination queues at startup (#1179)
* Hydrate destination queues at startup

* Review comments
2020-07-03 11:49:49 +01:00
Neil Alexander 42dd962425
Persistent federation sender queues (PDUs) (#1173)
* Initial work on persistent queues

* Update index for event ID and server name

* Put things into database (postgres for now)

* Duplicate postgres code into sqlite for now just to stop build errors, will fix SQLite soon

* Fix table name

* Fix index

* Fix table name

* Use RETURNING because LastInsertID is not supported by postgres

* Use functions

* Marshal headered event

* Don't error on no rows

* Don't block if there are PDUs waiting

* Try to tidy up JSON

* Debug logging

* Fix query, use transactions in postgres

* Clean up

* Rehydrate more opportunistically

* Fix SQLite

* remove unused types

* Review comments

* Shuffle things around a bit

* Clean up transaction properly

* Don't send empty transactions

* Reduce unnecessary retries

* Count PDUs to make more resilient

* Don't stop when there is work to be done

* Try to limit wakeups

* well this is tedious

* Fix race in incomplete transactions

* Thread safety on transaction ID/count
2020-07-01 11:46:38 +01:00
Kegsay 0dc4ceaa2d
Minor perf/debugging improvements (#1121)
* Minor perf/debugging improvements

- publicroomsapi: Don't call QueryEventsByID with no event IDs
- appservice: Consume only if there are 1 or more ASes
- roomserver: don't keep a copy of the request "for debugging" - we trace now

* fedsender: return early if we have no destinations

* Unbreak tests
2020-06-12 15:11:33 +01:00
Kegsay 399b6ae334
Remove federationsender producer, which in fact was not a producer (#1115)
* Remove federationsender producer, which in fact was not a producer

* Set the signing struct
2020-06-10 16:54:43 +01:00
Kegsay cfc137652e
Add a way to force federationsender to retry sending transactions (#1077)
* Add a way to force federationsender to retry sending transactions

And use it in P2P mode when we pick up new nodes.

* Linting

* Use atomic bool to stop us blocking on the channel
2020-06-01 18:34:08 +01:00
Neil Alexander a16db1c408
Improve federation sender performance, implement backoff and blacklisting, fix up invites a bit (#1007)
* Improve federation sender performance and behaviour, add backoff

* Tweaks

* Tweaks

* Tweaks

* Take copies of events before passing to destination queues

* Don't accidentally drop queued messages

* Don't take copies again

* Tidy up a bit

* Break out statistics (tracked component-wide), report success and failures from Perform actions

* Fix comment, use atomic add

* Improve logic a bit, don't block on wakeup, move idle check

* Don't retry successful invites, don't dispatch sendEvent, sendInvite etc

* Dedupe destinations, fix other bug hopefully

* Dispatch sends again

* Federation sender to ignore invites that are destined locally

* Loopback invite events

* Remodel a bit with channels

* Linter

* Only loopback invite event if we know the room

* We should tell other resident servers about the invite if we know about the room

* Correct invite signing

* Fix invite loopback

* Check HTTP response codes, push new invites to front of queue

* Review comments
2020-05-07 12:42:06 +01:00
Neil Alexander 3a858afca2
Loopback event from invite response (#982)
* Working invite v2 support

* Fix copyright notice

* Update gomatrixserverlib

* Add fallthrough

* Add missing continue

* Update sytest-whitelist, gomatrixserverlib

* Update gomatrixserverlib to test matrix-org/gomatrixserverlib#181

* Update gomatrixserverlib
2020-04-28 10:53:07 +01:00
Neil Alexander 067b875063
Invites v2 endpoint (#952)
* Start converting v1 invite endpoint to v2

* Update gomatrixserverlib

* Early federationsender code for sending invites

* Sending invites sorta happens now

* Populate invite request with stripped state

* Remodel a bit, don't reflect received invites

* Handle invite_room_state

* Handle room versions a bit better

* Update gomatrixserverlib

* Tweak order in destinationQueue.next

* Revert check in processMessage

* Tweak federation sender destination queue code a bit

* Add comments
2020-04-03 14:29:06 +01:00
Neil Alexander 05e1ae8745
Further room version wiring (#936)
* Room version 2 by default, other wiring updates, update gomatrixserverlib

* Fix nil pointer exception

* Fix some more nil pointer exceptions hopefully

* Update gomatrixserverlib

* Send all room versions when joining, not just stable ones

* Remove room version cquery

* Get room version when getting events from the roomserver database

* Reset default back to room version 2

* Don't generate event IDs unless needed

* Revert "Remove room version cquery"

This reverts commit a170d5873360dd059614460acc8b21ab2cda9767.

* Query room version in federation API, client API as needed

* Improvements to make_join send_join dance

* Make room server producers use headered events

* Lint tweaks

* Update gomatrixserverlib

* Versioned SendJoin

* Query room version in syncapi backfill

* Handle transaction marshalling/unmarshalling within Dendrite

* Sorta fix federation (kinda)

* whoops commit federation API too

* Use NewEventFromTrustedJSON when getting events from the database

* Update gomatrixserverlib

* Strip headers on federationapi endpoints

* Fix bug in clientapi profile room version query

* Update gomatrixserverlib

* Return more useful error if room version query doesn't find the room

* Update gomatrixserverlib

* Update gomatrixserverlib

* Maybe fix federation

* Fix formatting directive

* Update sytest whitelist and blacklist

* Temporarily disable room versions 3 and 4 until gmsl is fixed

* Fix count of EDUs in logging

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update gomatrixserverlib

* Rely on EventBuilder in gmsl to generate the event IDs for us

* Some review comments fixed

* Move function out of common and into gmsl

* Comment in federationsender destinationqueue

* Update gomatrixserverlib
2020-03-27 16:28:22 +00:00
Andrew Morgan 1eb77b8161
Don't print Sending EDU if there is no one to send to (#721)
The logs had a lot of:

```
Sending EDU event                             destinations="[]" edu_type=m.typing
```

Which is useless if it isn't actually sending the event anywhere (destinations is empty).
2019-07-01 16:04:49 +01:00
ruben 74827428bd use go module for dependencies (#594) 2019-05-21 21:56:55 +01:00