Commit graph

93 commits

Author SHA1 Message Date
Neil Alexander
89c772fb78
Report which component failed to consume (#1375) 2020-09-01 16:53:38 +01:00
Neil Alexander
c0f28845f8
Try to protect GetNextTransactionPDUs (#1350) 2020-08-27 15:27:12 +01:00
Neil Alexander
7466e6b718
Fix lock errors in federation sender (#1347)
* Fix lock errors in federation sender

* Additional fix to writers
2020-08-27 11:05:41 +01:00
Neil Alexander
9d53351dc2
Component-wide TransactionWriters (#1290)
* Offset updates take place using TransactionWriter

* Refactor TransactionWriter in current state server

* Refactor TransactionWriter in federation sender

* Refactor TransactionWriter in key server

* Refactor TransactionWriter in media API

* Refactor TransactionWriter in server key API

* Refactor TransactionWriter in sync API

* Refactor TransactionWriter in user API

* Fix deadlocking Sync API tests

* Un-deadlock device database

* Fix appservice API

* Rename TransactionWriters to Writers

* Move writers up a layer in sync API

* Document sqlutil.Writer interface

* Add note to Writer documentation
2020-08-21 10:42:08 +01:00
Kegsay
6d6bb75137
Add FederationClient interface to federationsender (#1284)
* Add FederationClient interface to federationsender

- Use a shim struct in HTTP mode to keep the same API as `FederationClient`.
- Use `federationsender` instead of `FederationClient` in `keyserver`.

* Pointers not values

* Review comments

* Fix unit tests

* Rejig backoff

* Unbreak test

* Remove debug logs

* Review comments and linting
2020-08-20 17:03:07 +01:00
Neil Alexander
0fea056db4
Change backoff behaviour so that Failure returns planned end time (#1288) 2020-08-20 14:58:53 +01:00
Neil Alexander
b24747b305
Transaction writer changes, move roomserver writers (#1285)
* Updated TransactionWriters, moved locks in roomserver, various other tweaks

* Fix redaction deadlocks

* Fix lint issue

* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter

* Fix us not sending transactions through in latest events updater
2020-08-19 15:38:27 +01:00
Neil Alexander
6cb1a65809
Synchronous invites (#1273)
* Refactor invites to be synchronous

* Fix synchronous invites

* Fix client API return type for send invite error

* Linter

* Restore PerformError on rsAPI.PerformInvite

* Update sytest-whitelist

* Don't override PerformError with normal errors

* Fix error passing

* Un-whitelist a couple of tests

* Update sytest-whitelist

* Try to handle multiple invite rejections better

* nolint

* Update gomatrixserverlib

* Fix /v1/invite test

* Remove replace from go.mod
2020-08-17 11:40:49 +01:00
Neil Alexander
4c4732a9c9
Don't send to ACL'd servers (#1267)
* Don't send to ACL'd servers

* Use gjson to look for room_id in EDU
2020-08-13 14:23:37 +01:00
Neil Alexander
9677a95afc
API setup refactoring (#1266)
* Start HTTP endpoint refactoring

* Update SetupAndServeHTTP

* Fix builds

* Don't set up external listener if no address configured

* TLS HTTP setup

* Break apart client/federation/key/media muxes

* Tweaks

* Fix P2P demos

* Fix media API routing

* Review comments @Kegsay

* Update sample config

* Fix gobind build

* Fix External -> Public in federation API test
2020-08-13 12:16:37 +01:00
Neil Alexander
52eeeb1627
Prefix-defined Kafka topics (#1254)
* Prefix-defined Kafka topics

* Fix current state server test
2020-08-10 15:18:37 +01:00
Neil Alexander
4b09f445c9
Configuration format v1 (#1230)
* Initial pass at refactoring config (not finished)

* Don't forget current state and EDU servers

* More shifting around

* Update server key API tests

* Fix roomserver test

* Fix more tests

* Further tweaks

* Fix current state server test (sort of)

* Maybe fix appservices

* Fix client API test

* Include database connection string in database options

* Fix sync API build

* Update config test

* Fix unit tests

* Fix federation sender build

* Fix gobind build

* Set Listen address for all services in HTTP monolith mode

* Validate config, reinstate appservice derived in directory, tweaks

* Tweak federation API test

* Set MaxOpenConnections/MaxIdleConnections to previous values

* Update generate-config
2020-08-10 14:18:04 +01:00
Neil Alexander
58998e9874
Backoff fixes (#1250)
* Backoff fixes

* Update comments

* Fix destination queue

* Log why we're blacklisting

* Fix logic fail

* Logging level

* Fix bug

* Maybe fix that bug after all

* Fix debug output

* Fix tests
2020-08-07 18:50:29 +01:00
Neil Alexander
5dd5a41119
Tweak log levels of some federation logging (#1248)
* Tweak log levels of some federation logging

* Update go.mod/go.sum for matrix-org/util#22 and matrix-org/gomatrixserverlib#215
2020-08-07 15:00:23 +01:00
Neil Alexander
b7491aae03
Yggdrasil demo updates (#1241)
* PerformServersAlive in PerformBroadcastEDU

* Don't double-pointer

* More reliable QUIC session handling

* Direct peer lookup, other tweaks

* Tweaks

* Try to wake up queues on incoming QUIC session

* Set session callbak on gobind build

* Fix incoming session storage

* Stateless reset, other tweaks

* Reset sessions when coordinates change

* Disable HTTP connection reuse, tweak timeouts
2020-08-06 16:00:42 +01:00
Neil Alexander
22f028e141
SelectJoinedHostsForRooms should use QueryVariadic on SQLite (#1238)
* SelectJoinedHostsForRooms should use QueryVariadic on SQLite

* Fix strings.Replace

* Fix statement
2020-08-05 10:00:35 +01:00
Kegan Dougal
78ab33f91f Unbreak postgres 2020-08-04 11:41:48 +01:00
Kegsay
0c4e8f6d4f
Send device list updates to servers (outbound only) (#1237)
* Add QueryDeviceMessages to serve up device keys and stream IDs

* Consume key change events in fedsender

Don't yet send them to destinations as we haven't worked them out yet

* Send device list updates to all required servers

* Glue it all together
2020-08-04 11:32:14 +01:00
Neil Alexander
cfeb1b2f42
Add UNIQUE constraint to blacklist table (#1216) 2020-07-23 10:22:23 +01:00
Neil Alexander
1e71fd645e
Persistent federation sender blacklist (#1214)
* Initial persistence of blacklists

* Move statistics folder

* Make MaxFederationRetries configurable

* Set lower failure thresholds for Yggdrasil demos

* Still write events into database for blacklisted hosts (they can be tidied up later)

* Review comments
2020-07-22 17:01:29 +01:00
Neil Alexander
489f34fed7
Remove debug lines 2020-07-20 17:03:20 +01:00
Neil Alexander
11a39fe3b5
Deduplicate FS database, EDU persistence table (#1207)
* Deduplicate FS database, add some EDU persistence groundwork

* Extend TransactionWriter to use optional existing transaction, use that for FS SQLite database writes

* Fix build due to bad keyserver import

* Working EDU persistence

* gocyclo, unsurprisingly

* Remove unused

* Update copyright notices
2020-07-20 16:55:20 +01:00
Neil Alexander
e5208c2ec9
Yggdrasil demo updates ("Bare QUIC")
Squashed commit of the following:

commit 86c2388e13ffdbabdd50cea205652dccc40e1860
Merge: b0a3ee6c f5e7e751
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 16 13:47:10 2020 +0100

    Merge branch 'master' into neilalexander/yggbarequic

commit b0a3ee6c5c063962384bb91c59ec753ddc8cfe5f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 16 13:42:22 2020 +0100

    Add support for broadcasting wake-up EDUs to known hosts

commit 8a5c2020b3a4b705b5d5686a9e71990a49e6d471
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 16 13:42:10 2020 +0100

    Bare QUIC demo working

commit d3939b3d6568cf4262c0391486a5203873b68bfc
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 15 11:42:43 2020 +0100

    Support bare Yggdrasil sessions with encrypted QUIC
2020-07-16 13:52:08 +01:00
Neil Alexander
72b3160776
Send-to-device messages over federation (#1198)
* Initial work to send send-to-device messages over federation

* Wire up send-to-device consumer, message formatting

* Generate random message ID

* Review comments, update sytest whitelist
2020-07-14 12:33:37 +01:00
Neil Alexander
08e9d996b6
Yggdrasil demo updates
Squashed commit of the following:

commit 6c2c48f862c1b6f8e741c57804282eceffe02487
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 16:28:09 2020 +0100

    Add README.md

commit 5eeefdadf8e3881dd7a32559a92be49bd7ddaf47
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 10:18:50 2020 +0100

    Fix wedge in federation sender

commit e2ebffbfba25cf82378393940a613ec32bfb909f
Merge: 0883ef88 abf26c12
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 09:51:23 2020 +0100

    Merge branch 'master' into neilalexander/yggdrasil

commit 0883ef8870e340f2ae9a0c37ed939dc2ab9911f6
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 09:51:06 2020 +0100

    Adjust timeouts

commit ba2d53199910f13b60cc892debe96a962e8c9acb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 16:34:40 2020 +0100

    Try to wake up from peers/sessions properly

commit 73f42eb494741ba5b0e0cef43654708e3c8eb399
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 15:43:38 2020 +0100

    Use TransactionWriter to reduce database lock issues on SQLite

commit 08bfe63241a18c58c539c91b9f52edccda63a611
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 12:38:02 2020 +0100

    Un-wedge federation

    Squashed commit of the following:

    commit aee933f8785e7a7998105f6090f514d18051a1bd
    Author: Neil Alexander <neilalexander@users.noreply.github.com>
    Date:   Thu Jul 9 12:22:41 2020 +0100

        Un-goroutine the goroutines

    commit 478374e5d18a3056cac6682ef9095d41352d1295
    Author: Neil Alexander <neilalexander@users.noreply.github.com>
    Date:   Thu Jul 9 12:09:31 2020 +0100

        Reduce federation sender wedges

commit 40cc62c54d9e3a863868214c48b7c18e522a4772
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 10:02:52 2020 +0100

    Handle switching in/out background more reliably
2020-07-10 16:28:18 +01:00
Neil Alexander
9cc52f47f3
Use TransactionWriter to reduce database lock issues on SQLite (#1192) 2020-07-09 17:48:56 +01:00
Neil Alexander
99b50f30a0
Reduce federation sender wedges (#1191)
* Reduce federation sender wedges

* Un-goroutine the goroutines
2020-07-09 15:39:35 +01:00
Neil Alexander
2bb580c1b0
Handle case where pendingPDUs might get out of sync for some reason 2020-07-08 15:42:36 +01:00
Neil Alexander
af6bc47f16
Squashed commit of the following:
commit b4cb47aa1329d2ada10ae6426fd9d2a69f47536a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 8 14:13:27 2020 +0100

    Restrict transaction send context time

commit 7c28205cdb5d842071d46b1ec599d09cca708e57
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 8 14:00:06 2020 +0100

    Add to gobind build

commit d9e2c72e0576a2eb0ce6ac48eed6cc9d4761a0ea
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 8 13:43:21 2020 +0100

    Wake up destination queues for new sessions/links

commit 21766c6c52bd00511d28981457e9034358c32a8d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 8 13:17:18 2020 +0100

    Tweak QUIC parameters
2020-07-08 14:52:48 +01:00
Neil Alexander
de0f427ddc Fix build 2020-07-07 16:54:14 +01:00
Neil Alexander
51fd532940 Fix error handling in federationsender 2020-07-07 16:53:10 +01:00
Kegsay
8e9580852d
bugfix: continue sending PDUs if ones are added whilst sending another PDU (#1187)
* Add a bit more logging to the fedsender

* bugfix: continue sending PDUs if ones are added whilst sending another PDU

Without this, the queue goes back to sleep on `<-oq.notifyPDUs` which won't
fire because `pendingPDUs` is already > 0. This should fix a flakey sytest.

* Break if no txn is sent

* Tweak federation sender wake-ups

* Update comments

* Remove break or that'll kill the parent loop

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-07-07 16:36:10 +01:00
Neil Alexander
46dbc46f84
Wake up destination queues more aggressively (#1183)
* Wake up destination queues more aggressively

* We don't really need Ch here do we
2020-07-03 16:31:56 +01:00
Neil Alexander
1773fd84b7
Hydrate destination queues at startup (#1179)
* Hydrate destination queues at startup

* Review comments
2020-07-03 11:49:49 +01:00
Neil Alexander
38caf8e5b7
Yggdrasil+QUIC demo, federation sender tweaks (#1177)
* Initial QUIC work

* Update Yggdrasil demo

* Make sure that the federation sender knows how many pending events are in the database when the worker starts

* QUIC tunables

* pprof

* Don't spin

* Set build info for Yggdrasil
2020-07-02 17:43:07 +01:00
Neil Alexander
42dd962425
Persistent federation sender queues (PDUs) (#1173)
* Initial work on persistent queues

* Update index for event ID and server name

* Put things into database (postgres for now)

* Duplicate postgres code into sqlite for now just to stop build errors, will fix SQLite soon

* Fix table name

* Fix index

* Fix table name

* Use RETURNING because LastInsertID is not supported by postgres

* Use functions

* Marshal headered event

* Don't error on now rows

* Don't block if there are PDUs waiting

* Try to tidy up JSON

* Debug logging

* Fix query, use transactions in postgres

* Clean up

* Rehydrate more opportunistically

* Fix SQLite

* remove unused types

* Review comments

* Shuffle things around a bit

* Clean up transaction properly

* Don't send empty transactions

* Reduce unnecessary retries

* Count PDUs to make more resilient

* Don't stop when there is work to be done

* Try to limit wakeups

* well this is tedious

* Fix race in incomplete transactions

* Thread safety on transaction ID/count
2020-07-01 11:46:38 +01:00
Kegsay
43cddfe00f
Return remote errors from FS.PerformJoin (#1164)
* Return remote errors from FS.PerformJoin

Follows the same pattern as PerformJoin on roomserver (no error return).

Also return the right format for incompatible room version errors.

Makes a bunch of tests pass!

* Handle network errors better when returning remote HTTP errors

* Linting

* Fix tests

* Update whitelist, pass network errors through in API=1 mode
2020-06-25 15:04:48 +01:00
Kegsay
0dc4ceaa2d
Minor perf/debugging improvements (#1121)
* Minor perf/debugging improvements

- publicroomsapi: Don't call QueryEventsByID with no event IDs
- appservice: Consume only if there are 1 or more ASes
- roomserver: don't keep a copy of the request "for debugging" - we trace now

* fedsender: return early if we have no destinations

* Unbreak tests
2020-06-12 15:11:33 +01:00
Kegsay
ecd7accbad
Rehuffle where things are in the internal package (#1122)
renamed:    internal/eventcontent.go -> internal/eventutil/eventcontent.go
	renamed:    internal/events.go -> internal/eventutil/events.go
	renamed:    internal/types.go -> internal/eventutil/types.go
	renamed:    internal/http/http.go -> internal/httputil/http.go
	renamed:    internal/httpapi.go -> internal/httputil/httpapi.go
	renamed:    internal/httpapi_test.go -> internal/httputil/httpapi_test.go
	renamed:    internal/httpapis/paths.go -> internal/httputil/paths.go
	renamed:    internal/routing.go -> internal/httputil/routing.go
	renamed:    internal/basecomponent/base.go -> internal/setup/base.go
	renamed:    internal/basecomponent/flags.go -> internal/setup/flags.go
	renamed:    internal/partition_offset_table.go -> internal/sqlutil/partition_offset_table.go
	renamed:    internal/postgres.go -> internal/sqlutil/postgres.go
	renamed:    internal/postgres_wasm.go -> internal/sqlutil/postgres_wasm.go
	renamed:    internal/sql.go -> internal/sqlutil/sql.go
2020-06-12 14:55:57 +01:00
Kegsay
4675e1ddb6
Add trace logging to RoomserverInternalAPI (#1120)
This is a wrapper around whatever impl we have which then logs
the function name/request/response/error.

Also tweak when we log on kafka streams: only log on the producer
side not the consumer side: we've never had issues with comms and
having 1 message rather than N would be nice.
2020-06-12 12:10:08 +01:00
Kegsay
ec7718e7f8
Roomserver API changes (#1118)
* s/QueryBackfill/PerformBackfill/g

* OutputEvent now includes AddStateEvents which contain the full event of extra state events

* Only include adds not the current event

* Get adding state right
2020-06-11 19:50:40 +01:00
Kegsay
25cd2dd1c9
Remove unused internal APIs (#1117) 2020-06-11 15:07:16 +01:00
Kegsay
399b6ae334
Remove federationsender producer, which in fact was not a producer (#1115)
* Remove federationsender producer, which in fact was not a producer

* Set the signing struct
2020-06-10 16:54:43 +01:00
Kegsay
4f171c56a8
Split out SetupFooComponent (#1106)
* Split out adding HTTP routes from making internal APIs for clarity

* Split out more components

* Split out more things

* Finish converting

* internal mux for internal routes
2020-06-08 15:51:07 +01:00
Neil Alexander
76ff47c052
Use AuthChainProvider to try and speed up federated joins (#1100)
* Use MissingAuthEventHandler on performjoin to try and speed up cases where we have missing events

* Update gomatrixserverlib

* Use supplied room version

* Use AuthChainProvider

* Tweaks

* Update gomatrixserverlib

* Signature checks
2020-06-05 11:48:52 +01:00
Kegsay
f4c676ccdd
Refactor how federationsender gets created (#1095)
* Refactor how federationsender gets created

* s/httpint/inthttp/ for alphabetical niceness with internal package
2020-06-04 14:27:10 +01:00
Kegsay
e7d1ac84c3
Add ParseFileURI and use it when dealing with file URIs (#1088)
* Add ParseFileURI and use it when dealing with file URIs

Fixes #1059

* Missing file

* Linting
2020-06-04 11:18:08 +01:00
Neil Alexander
225b72bd42
Don't reset counters before successful outgoing federation request (#1089)
* Don't reset counters before successful outgoing federation request on incoming federation request

* Comments
2020-06-04 10:54:10 +01:00
Kegsay
cfc137652e
Add a way to force federationsender to retry sending transactions (#1077)
* Add a way to force federationsender to retry sending transactions

And use it in P2P mode when we pick up new nodes.

* Linting

* Use atomic bool to stop us blocking on the channel
2020-06-01 18:34:08 +01:00
Kegsay
fe5cf6f880
fedsender: de-duplicate without sorting server names (#1073) 2020-05-29 13:50:06 +01:00