Commit Graph

36 Commits (d15836e260130f85edd5d9a104e5304f001d2681)

Author SHA1 Message Date
Neil Alexander d15836e260
Increase gocyclo complexity to 25 (and remove all but 2 golint directives related to it) (#1783) 2021-03-03 14:35:57 +00:00
Neil Alexander 8b5cd256cb
Don't hold destination queues in memory forever (#1769)
* Don't hold destination queues in memory forever

* Close channels

* Fix ordering

* Clear more aggressively

* clearQueue only called by defer so should be safe to delete queue in any case

* Wake queue when created, otherwise cleanup doesn't get called in all cases

* Clean up periodically, we hit a race condition otherwise

* Tweaks

* Don't create queues for blacklisted hosts

* Check blacklist properly
2021-02-17 15:16:35 +00:00
Neil Alexander bd72ed50d4
Reduce log level of 'Failed to send transaction' log line, since quite often it is flooding logs for dead servers 2021-02-04 12:25:31 +00:00
Neil Alexander 9f443317bc
Graceful shutdowns (#1734)
* Initial graceful stop

* Fix dendritejs

* Use process context for outbound federation requests in destination queues

* Reduce logging

* Fix log level
2021-01-26 12:56:20 +00:00
Neil Alexander 56b5847c74
Add prometheus metrics for destination queues, sync requests
Squashed commit of the following:

commit 7ed1c6cfe67429dbe378a763d832c150eb0f781d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:53:27 2020 +0000

    Updates

commit 8442099d08760b8d086e6d58f9f30284e378a2cd
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:43:18 2020 +0000

    Add some sync statistics

commit ffe2a11644ed3d5297d1775a680886c574143fdb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:37:00 2020 +0000

    Fix backing off display

commit 27443a93855aa60a49806ecabbf9b09f818301bd
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:28:43 2020 +0000

    Add some destination queue metrics
2020-12-16 15:02:39 +00:00
Neil Alexander f64c8822bc
Federation sender refactor (#1621)
* Refactor federation sender, again

* Clean up better

* Missing operators

* Try to get overflowed events from database

* Fix queries

* Log less

* Comments

* nil PDUs/EDUs shouldn't happen but guard against them for safety

* Tweak logging

* Fix transaction coalescing

* Update comments

* Check nils more

* Remove channels as they add extra complexity and possibly will deadlock

* Don't hold lock while sending transaction

* Less spam about sleeping queues

* Comments

* Bug-fixing

* Don't try to rehydrate twice

* Don't queue in memory for blacklisted destinations

* Don't queue in memory for blacklisted destinations

* Fix a couple of bugs

* Check for duplicates when pulling things out of the database

* Durable transactions, some more refactoring

* Revert "Durable transactions, some more refactoring"

This reverts commit 5daf924eaaefec5e4f7c12c16ca24e898de4adbb.

* Fix deadlock
2020-12-09 10:03:22 +00:00
Neil Alexander fe5d1400bf
Update federation timeouts (#1504)
* Update to matrix-org/gomatrixserverlib#234

* Update gomatrixserverlib

* Update federation timeouts

* Fix dendritejs

* Increase /send context time in destination queue
2020-10-09 17:08:32 +01:00
Neil Alexander d63d7c5640
Tweak log level of a fairly common log line 2020-09-29 17:08:47 +01:00
Neil Alexander a854e3aa18
Fix backoff bug 2020-09-22 14:53:36 +01:00
Neil Alexander 880b164490
Refactor backoff again (#1431)
* Tweak backoffs

* Refactor backoff some more, remove BackoffIfRequired as it adds unnecessary complexity

* Ignore 404s
2020-09-21 13:30:37 +01:00
Neil Alexander 6cb1a65809
Synchronous invites (#1273)
* Refactor invites to be synchronous

* Fix synchronous invites

* Fix client API return type for send invite error

* Linter

* Restore PerformError on rsAPI.PerformInvite

* Update sytest-whitelist

* Don't override PerformError with normal errors

* Fix error passing

* Un-whitelist a couple of tests

* Update sytest-whitelist

* Try to handle multiple invite rejections better

* nolint

* Update gomatrixserverlib

* Fix /v1/invite test

* Remove replace from go.mod
2020-08-17 11:40:49 +01:00
Neil Alexander 58998e9874
Backoff fixes (#1250)
* Backoff fixes

* Update comments

* Fix destination queue

* Log why we're blacklisting

* Fix logic fail

* Logging level

* Fix bug

* Maybe fix that bug after all

* Fix debug output

* Fix tests
2020-08-07 18:50:29 +01:00
Neil Alexander 5dd5a41119
Tweak log levels of some federation logging (#1248)
* Tweak log levels of some federation logging

* Update go.mod/go.sum for matrix-org/util#22 and matrix-org/gomatrixserverlib#215
2020-08-07 15:00:23 +01:00
Neil Alexander 1e71fd645e
Persistent federation sender blacklist (#1214)
* Initial persistence of blacklists

* Move statistics folder

* Make MaxFederationRetries configurable

* Set lower failure thresholds for Yggdrasil demos

* Still write events into database for blacklisted hosts (they can be tidied up later)

* Review comments
2020-07-22 17:01:29 +01:00
Neil Alexander 11a39fe3b5
Deduplicate FS database, EDU persistence table (#1207)
* Deduplicate FS database, add some EDU persistence groundwork

* Extend TransactionWriter to use optional existing transaction, use that for FS SQLite database writes

* Fix build due to bad keyserver import

* Working EDU persistence

* gocyclo, unsurprisingly

* Remove unused

* Update copyright notices
2020-07-20 16:55:20 +01:00
Neil Alexander 08e9d996b6
Yggdrasil demo updates
Squashed commit of the following:

commit 6c2c48f862c1b6f8e741c57804282eceffe02487
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 16:28:09 2020 +0100

    Add README.md

commit 5eeefdadf8e3881dd7a32559a92be49bd7ddaf47
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 10:18:50 2020 +0100

    Fix wedge in federation sender

commit e2ebffbfba25cf82378393940a613ec32bfb909f
Merge: 0883ef88 abf26c12
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 09:51:23 2020 +0100

    Merge branch 'master' into neilalexander/yggdrasil

commit 0883ef8870e340f2ae9a0c37ed939dc2ab9911f6
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 09:51:06 2020 +0100

    Adjust timeouts

commit ba2d53199910f13b60cc892debe96a962e8c9acb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 16:34:40 2020 +0100

    Try to wake up from peers/sessions properly

commit 73f42eb494741ba5b0e0cef43654708e3c8eb399
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 15:43:38 2020 +0100

    Use TransactionWriter to reduce database lock issues on SQLite

commit 08bfe63241a18c58c539c91b9f52edccda63a611
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 12:38:02 2020 +0100

    Un-wedge federation

    Squashed commit of the following:

    commit aee933f8785e7a7998105f6090f514d18051a1bd
    Author: Neil Alexander <neilalexander@users.noreply.github.com>
    Date:   Thu Jul 9 12:22:41 2020 +0100

        Un-goroutine the goroutines

    commit 478374e5d18a3056cac6682ef9095d41352d1295
    Author: Neil Alexander <neilalexander@users.noreply.github.com>
    Date:   Thu Jul 9 12:09:31 2020 +0100

        Reduce federation sender wedges

commit 40cc62c54d9e3a863868214c48b7c18e522a4772
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 10:02:52 2020 +0100

    Handle switching in/out background more reliably
2020-07-10 16:28:18 +01:00
Neil Alexander 99b50f30a0
Reduce federation sender wedges (#1191)
* Reduce federation sender wedges

* Un-goroutine the goroutines
2020-07-09 15:39:35 +01:00
Neil Alexander 2bb580c1b0
Handle case where pendingPDUs might get out of sync for some reason 2020-07-08 15:42:36 +01:00
Neil Alexander af6bc47f16
Squashed commit of the following:
commit b4cb47aa1329d2ada10ae6426fd9d2a69f47536a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 8 14:13:27 2020 +0100

    Restrict transaction send context time

commit 7c28205cdb5d842071d46b1ec599d09cca708e57
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 8 14:00:06 2020 +0100

    Add to gobind build

commit d9e2c72e0576a2eb0ce6ac48eed6cc9d4761a0ea
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 8 13:43:21 2020 +0100

    Wake up destination queues for new sessions/links

commit 21766c6c52bd00511d28981457e9034358c32a8d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 8 13:17:18 2020 +0100

    Tweak QUIC parameters
2020-07-08 14:52:48 +01:00
Neil Alexander de0f427ddc Fix build 2020-07-07 16:54:14 +01:00
Neil Alexander 51fd532940 Fix error handling in federationsender 2020-07-07 16:53:10 +01:00
Kegsay 8e9580852d
bugfix: continue sending PDUs if ones are added whilst sending another PDU (#1187)
* Add a bit more logging to the fedsender

* bugfix: continue sending PDUs if ones are added whilst sending another PDU

Without this, the queue goes back to sleep on `<-oq.notifyPDUs` which won't
fire because `pendingPDUs` is already > 0. This should fix a flakey sytest.

* Break if no txn is sent

* Tweak federation sender wake-ups

* Update comments

* Remove break or that'll kill the parent loop

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-07-07 16:36:10 +01:00
Neil Alexander 46dbc46f84
Wake up destination queues more aggressively (#1183)
* Wake up destination queues more aggressively

* We don't really need Ch here do we
2020-07-03 16:31:56 +01:00
Neil Alexander 1773fd84b7
Hydrate destination queues at startup (#1179)
* Hydrate destination queues at startup

* Review comments
2020-07-03 11:49:49 +01:00
Neil Alexander 38caf8e5b7
Yggdrasil+QUIC demo, federation sender tweaks (#1177)
* Initial QUIC work

* Update Yggdrasil demo

* Make sure that the federation sender knows how many pending events are in the database when the worker starts

* QUIC tunables

* pprof

* Don't spin

* Set build info for Yggdrasil
2020-07-02 17:43:07 +01:00
Neil Alexander 42dd962425
Persistent federation sender queues (PDUs) (#1173)
* Initial work on persistent queues

* Update index for event ID and server name

* Put things into database (postgres for now)

* Duplicate postgres code into sqlite for now just to stop build errors, will fix SQLite soon

* Fix table name

* Fix index

* Fix table name

* Use RETURNING because LastInsertID is not supported by postgres

* Use functions

* Marshal headered event

* Don't error on now rows

* Don't block if there are PDUs waiting

* Try to tidy up JSON

* Debug logging

* Fix query, use transactions in postgres

* Clean up

* Rehydrate more opportunistically

* Fix SQLite

* remove unused types

* Review comments

* Shuffle things around a bit

* Clean up transaction properly

* Don't send empty transactions

* Reduce unnecessary retries

* Count PDUs to make more resilient

* Don't stop when there is work to be done

* Try to limit wakeups

* well this is tedious

* Fix race in incomplete transactions

* Thread safety on transaction ID/count
2020-07-01 11:46:38 +01:00
Kegsay 399b6ae334
Remove federationsender producer, which in fact was not a producer (#1115)
* Remove federationsender producer, which in fact was not a producer

* Set the signing struct
2020-06-10 16:54:43 +01:00
Neil Alexander 225b72bd42
Don't reset counters before successful outgoing federation request (#1089)
* Don't reset counters before successful outgoing federation request on incoming federation request

* Comments
2020-06-04 10:54:10 +01:00
Kegsay cfc137652e
Add a way to force federationsender to retry sending transactions (#1077)
* Add a way to force federationsender to retry sending transactions

And use it in P2P mode when we pick up new nodes.

* Linting

* Use atomic bool to stop us blocking on the channel
2020-06-01 18:34:08 +01:00
Neil Alexander 57841fc35e
Read batches from incoming channels (#1067) 2020-05-27 12:16:53 +01:00
Neil Alexander a16db1c408
Improve federation sender performance, implement backoff and blacklisting, fix up invites a bit (#1007)
* Improve federation sender performance and behaviour, add backoff

* Tweaks

* Tweaks

* Tweaks

* Take copies of events before passing to destination queues

* Don't accidentally drop queued messages

* Don't take copies again

* Tidy up a bit

* Break out statistics (tracked component-wide), report success and failures from Perform actions

* Fix comment, use atomic add

* Improve logic a bit, don't block on wakeup, move idle check

* Don't retry sucessful invites, don't dispatch sendEvent, sendInvite etc

* Dedupe destinations, fix other bug hopefully

* Dispatch sends again

* Federation sender to ignore invites that are destined locally

* Loopback invite events

* Remodel a bit with channels

* Linter

* Only loopback invite event if we know the room

* We should tell other resident servers about the invite if we know about the room

* Correct invite signing

* Fix invite loopback

* Check HTTP response codes, push new invites to front of queue

* Review comments
2020-05-07 12:42:06 +01:00
Neil Alexander 3a858afca2
Loopback event from invite response (#982)
* Working invite v2 support

* Fix copyright notice

* Update gomatrixserverlib

* Add fallthrough

* Add missing continue

* Update sytest-whitelist, gomatrixserverlib

* Update gomatrixserverlib to test matrix-org/gomatrixserverlib#181

* Update gomatrixserverlib
2020-04-28 10:53:07 +01:00
Neil Alexander 067b875063
Invites v2 endpoint (#952)
* Start converting v1 invite endpoint to v2

* Update gomatrixserverlib

* Early federationsender code for sending invites

* Sending invites sorta happens now

* Populate invite request with stripped state

* Remodel a bit, don't reflect received invites

* Handle invite_room_state

* Handle room versions a bit better

* Update gomatrixserverlib

* Tweak order in destinationQueue.next

* Revert check in processMessage

* Tweak federation sender destination queue code a bit

* Add comments
2020-04-03 14:29:06 +01:00
Neil Alexander 05e1ae8745
Further room version wiring (#936)
* Room version 2 by default, other wiring updates, update gomatrixserverlib

* Fix nil pointer exception

* Fix some more nil pointer exceptions hopefully

* Update gomatrixserverlib

* Send all room versions when joining, not just stable ones

* Remove room version cquery

* Get room version when getting events from the roomserver database

* Reset default back to room version 2

* Don't generate event IDs unless needed

* Revert "Remove room version cquery"

This reverts commit a170d5873360dd059614460acc8b21ab2cda9767.

* Query room version in federation API, client API as needed

* Improvements to make_join send_join dance

* Make room server producers use headered events

* Lint tweaks

* Update gomatrixserverlib

* Versioned SendJoin

* Query room version in syncapi backfill

* Handle transaction marshalling/unmarshalling within Dendrite

* Sorta fix federation (kinda)

* whoops commit federation API too

* Use NewEventFromTrustedJSON when getting events from the database

* Update gomatrixserverlib

* Strip headers on federationapi endpoints

* Fix bug in clientapi profile room version query

* Update gomatrixserverlib

* Return more useful error if room version query doesn't find the room

* Update gomatrixserverlib

* Update gomatrixserverlib

* Maybe fix federation

* Fix formatting directive

* Update sytest whitelist and blacklist

* Temporarily disable room versions 3 and 4 until gmsl is fixed

* Fix count of EDUs in logging

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update gomatrixserverlib

* Rely on EventBuilder in gmsl to generate the event IDs for us

* Some review comments fixed

* Move function out of common and into gmsl

* Comment in federationsender destinationqueue

* Update gomatrixserverlib
2020-03-27 16:28:22 +00:00
Neil Alexander 6460b3725d
Make sure PDUs and EDUs in transaction don't marshal to null (#876) 2020-02-28 14:54:51 +00:00
ruben 74827428bd use go module for dependencies (#594) 2019-05-21 21:56:55 +01:00