Commit Graph

123 Commits (e30c52308963a9807ddd7fbd86204deb02d22bd0)

Author SHA1 Message Date
Neil Alexander 56b5847c74
Add prometheus metrics for destination queues, sync requests
Squashed commit of the following:

commit 7ed1c6cfe67429dbe378a763d832c150eb0f781d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:53:27 2020 +0000

    Updates

commit 8442099d08760b8d086e6d58f9f30284e378a2cd
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:43:18 2020 +0000

    Add some sync statistics

commit ffe2a11644ed3d5297d1775a680886c574143fdb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:37:00 2020 +0000

    Fix backing off display

commit 27443a93855aa60a49806ecabbf9b09f818301bd
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Dec 16 14:28:43 2020 +0000

    Add some destination queue metrics
2020-12-16 15:02:39 +00:00
Neil Alexander f64c8822bc
Federation sender refactor (#1621)
* Refactor federation sender, again

* Clean up better

* Missing operators

* Try to get overflowed events from database

* Fix queries

* Log less

* Comments

* nil PDUs/EDUs shouldn't happen but guard against them for safety

* Tweak logging

* Fix transaction coalescing

* Update comments

* Check nils more

* Remove channels as they add extra complexity and possibly will deadlock

* Don't hold lock while sending transaction

* Less spam about sleeping queues

* Comments

* Bug-fixing

* Don't try to rehydrate twice

* Don't queue in memory for blacklisted destinations

* Don't queue in memory for blacklisted destinations

* Fix a couple of bugs

* Check for duplicates when pulling things out of the database

* Durable transactions, some more refactoring

* Revert "Durable transactions, some more refactoring"

This reverts commit 5daf924eaaefec5e4f7c12c16ca24e898de4adbb.

* Fix deadlock
2020-12-09 10:03:22 +00:00
Neil Alexander 5d65a879a5
Federation sender event cache (#1614)
* Cache federation sender events

* Store in the correct cache

* Update federation event cache

* Fix Unset

* Give EDUs same caching treatment as PDUs

* Make federationsender_cache_size configurable

* Default caches configuration

* Fix unit tests

* Revert "Fix unit tests"

This reverts commit 24eb5d22524f20e1024b1475debe61ae20538a5a.

* Revert "Default caches configuration"

This reverts commit 464ecd1e64b9d2983f6fd5430e9607519d543cb3.

* Revert "Make federationsender_cache_size configurable"

This reverts commit 4631f5324151e006a15d6f19008f06361b994607.
2020-12-04 14:52:10 +00:00
Kegsay b507312d4c
MSC2836 threading: part 2 (#1596)
* Update GMSL

* Add MSC2836EventRelationships to fedsender

* Call MSC2836EventRelationships in reqCtx

* auth remote servers

* Extract room ID and servers from previous events; refactor a bit

* initial cut of federated threading

* Use the right client/fed struct in the response

* Add QueryAuthChain for use with MSC2836

* Add auth chain to federated response

* Fix pointers

* under CI: more logging and enable mscs, nil fix

* Handle direction: up

* Actually send message events to the roomserver..

* Add children and children_hash to unsigned, with tests

* Add logic for exploring threads and tracking children; missing storage functions

* Implement storage functions for children

* Add fetchUnknownEvent

* Do federated hits for include_children if we have unexplored children

* Use /ev_rel rather than /event as the former includes child metadata

* Remove cross-room threading impl

* Enable MSC2836 in the p2p demo

* Namespace mscs db

* Enable msc2836 for ygg

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-12-04 14:11:01 +00:00
Ronnie Ebrin a677a288bd
federationsender/roomserver: don't panic while federation is disabled (#1615) 2020-12-04 14:08:17 +00:00
Neil Alexander b5aa7ca3ab
Top-level setup package (#1605)
* Move config, setup, mscs into "setup" top-level folder

* oops, forgot the EDU server

* Add setup

* goimports
2020-12-02 17:41:00 +00:00
Neil Alexander bdf6490375
Add ability to disable federation (#1604)
* Allow disabling federation

* Don't start federation queues if disabled

* Fix for Go 1.13
2020-12-02 15:10:03 +00:00
Kegsay 6353b0b7e4
MSC2836: Threading - part one (#1589)
* Add mscs/hooks package, begin work for msc2836

* Flesh out hooks and add SQL schema

* Begin implementing core msc2836 logic

* Add test harness

* Linting

* Implement visibility checks; stub out APIs for tests

* Flesh out testing

* Flesh out walkThread a bit

* Persist the origin_server_ts as well

* Edges table instead of relationships

* Add nodes table for event metadata

* LEFT JOIN to extract origin_server_ts for children

* Add graph walking structs

* Implement walking algorithm

* Add more graph walking tests

* Add auto_join for local rooms

* Fix create table syntax on postgres

* Add relationship_room_id|servers to the unsigned section of events

* Persist the parent room_id/servers in edge metadata

Other events cannot assert the true room_id/servers for the
parent event, only make claims to them, hence why this is
edge metadata.

* guts to pass through room_id/servers

* Refactor msc2836 to allow handling from federation

* Add JoinedVia to PerformJoin responses

* Fix tests; review comments
2020-11-19 11:34:59 +00:00
Neil Alexander 20a01bceb2
Pass pointers to events — reloaded (#1583)
* Pass events as pointers

* Fix lint errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update to matrix-org/gomatrixserverlib#240
2020-11-16 15:44:53 +00:00
S7evinK bcb89ada5e
Implement read receipts (#1528)
* Fix "conversion from int to string yields a string of one rune, not a string of digits"

* Add receipts table to syncapi

* Use StreamingToken as the since value

* Add required method to testEDUProducer

* Make receipt json creation "easier" to read

* Add receipts api to the eduserver

* Add receipts endpoint

* Add eduserver kafka consumer

* Add missing kafka config

* Add passing tests to whitelist

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Fix copy & paste error

* Fix column count error

* Make outbound federation receipts pass

* Make "Inbound federation rejects receipts from wrong remote" pass

* Don't use errors package

* - Add TODO for batching requests
- Rename variable

* Return a better error message

* - Use OutputReceiptEvent instead of InputReceiptEvent as result
- Don't use the errors package for errors
- Defer CloseAndLogIfError to close rows
- Fix Copyright

* Better creation/usage of JoinResponse

* Query all joined rooms instead of just one

* Update gomatrixserverlib

* Add sqlite3 migration

* Add postgres migration

* Ensure required sequence exists before running migrations

* Clarification on comment

* - Fix a bug when creating client receipts
- Use concrete types instead of interface{}

* Remove dead code
Use key for timestamp

* Fix postgres query...

* Remove single purpose struct

* Use key/value directly

* Only apply receipts on initial sync or if edu positions differ,
otherwise we'll be sending the same receipts over and over again.

* Actually update the id, so it is correctly sent in syncs

* Set receipt on request to /read_markers

* Fix issue with receipts getting overwritten

* Use fmt.Errorf instead of pkg/errors

* Revert "Add postgres migration"

This reverts commit 722fe5a04628882b787d096942459961db159b06.

* Revert "Add sqlite3 migration"

This reverts commit d113b03f6495a4b8f8bcf158a3d00b510b4240cc.

* Fix selectRoomReceipts query

* Make golangci-lint happy

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-11-09 18:46:11 +00:00
Neil Alexander 3afc623098
Fix RewritesState bug (#1557)
* Set RewritesState once

* Check if any new state provided

* Obey rewritesState

* Don't nuke everything the sync API knows when purging state

* Fix panic from duplicate insert

* Consistency

* Use HasState

* Remove nolint

* Clean up joined rooms on state rewrite
2020-10-22 10:39:16 +01:00
Neil Alexander 6e63df1d9a
KindOld (#1531)
* Add KindOld

* Don't process latest events/memberships for old events

* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events

* Signal to downstream components if an event has become a forward extremity

* Don't exclude from sync

* Soft-fail checks on KindNew

* Don't run the latest events updater at all for KindOld

* Don't make federation sender change after all

* Kind in federation sender join

* Don't send isForwardExtremity

* Fix syncapi

* Update comments

* Fix SendEventWithState

* Update sytest-whitelist

* Generate old output events

* Sync API consumes old room events

* Update comments
2020-10-19 14:59:13 +01:00
Neil Alexander 49abe359e6
Start Kafka connections for each component that needs them (#1527)
* Start Kafka connection for each component that needs one

* Fix roomserver unit tests

* Rename to naffkaInstance (@Kegsay review comment)

* Fix import cycle
2020-10-15 13:27:13 +01:00
Neil Alexander 9d6b77c58a
Try to retrieve missing auth events from multiple servers (#1516)
* Recursively fetch auth events if needed

* Fix processEvent call

* Ask more servers in lookupEvent

* Don't panic!

* Panic at the Disco

* Find servers more aggressively

* Add getServers

* Fix number of servers to 5, don't bail making RespState if auth events missing

* Fix panic

* Ignore missing state events too

* Report number of servers correctly

* Don't reuse request context for /send_join

* Update federation API tests

* Don't recurse processEvents

* Implement getEvents differently
2020-10-13 11:53:20 +01:00
Kegsay 9096bfcee8
Validate m.room.create events in send_join responses (#1505)
* Validate m.room.create events in send_join responses

For sytest compliance, refs #1315 and #1317

Fixes #1317

* Linting
2020-10-10 00:21:15 +01:00
Neil Alexander fe5d1400bf
Update federation timeouts (#1504)
* Update to matrix-org/gomatrixserverlib#234

* Update gomatrixserverlib

* Update federation timeouts

* Fix dendritejs

* Increase /send context time in destination queue
2020-10-09 17:08:32 +01:00
Neil Alexander bf90db5b60
Remove KindRewrite (#1481)
* Don't send rewrite events

* Remove final traces of rewrite events

* Remove test that is no longer needed

* Revert "Remove test that is no longer needed"

This reverts commit 9a45babff690480acd656a52f2c2950a5f7e9ada.

* Update test to use KindOutlier
2020-10-06 11:05:00 +01:00
Neil Alexander d63d7c5640
Tweak log level of a fairly common log line 2020-09-29 17:08:47 +01:00
Neil Alexander a854e3aa18
Fix backoff bug 2020-09-22 14:53:36 +01:00
Neil Alexander a14b29b526
Initial notary support (#1436)
* Initial work on notary support

* Somewhat working (but not properly filtered) notary support, other tweaks

* Update gomatrixserverlib
2020-09-22 14:40:54 +01:00
Neil Alexander a7563ede3d
Process federated joins in background context (#1434)
* Return early from federated room join

* Synchronous perform-join as long as possible

* Don't allow multiple federated joins to the same room by the same user
2020-09-22 11:05:45 +01:00
Neil Alexander 880b164490
Refactor backoff again (#1431)
* Tweak backoffs

* Refactor backoff some more, remove BackoffIfRequired as it adds unnecessary complexity

* Ignore 404s
2020-09-21 13:30:37 +01:00
Neil Alexander 965f068d1a
Handle state with input event as new events (#1415)
* SendEventWithState events as new

* Use cumulative state IDs for final event

* Error wrapping in calculateAndSetState

* Handle overwriting same event type and state key

* Hacky way to spot historical events

* Don't exclude from sync

* Don't generate output events when rewriting forward extremities

* Update output event check

* Historical output events

* Define output room event type

* Notify key changes on state

* Don't send our membership event twice

* Deduplicate state entries

* Tweaks

* Remove unnecessary nolint

* Fix current state upsert in sync API

* Send auth events as outliers, state events as rewrite

* Sync API don't consume state events

* Process events actually

* Improve outlier check

* Fix local room check

* Remove extra room check, it seems to break the whole damn world

* Fix federated join check

* Fix nil pointer exception

* Better comments on DeduplicateStateEntries

* Reflow forced federated joins

* Don't force federated join for possibly even local invites

* Comment SendEventWithState better

* Rewrite room state in sync API storage

* Add TODO

* Clean up all room data when receiving create event

* Don't generate output events for rewrites, but instead notify that state is rewritten on the final new event

* Rename to PurgeRoom

* Exclude backfilled messages from /sync

* Split out rewriting state from updating state from state res

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2020-09-15 11:17:46 +01:00
Matthew Hodgson 39507bacc3
Peeking via MSC2753 (#1370)
Initial implementation of MSC2753, as tested by https://github.com/matrix-org/sytest/pull/944.
Doesn't yet handle unpeeks, peeked EDUs, or history viz changing during a peek - these will follow.
https://github.com/matrix-org/dendrite/pull/1370 has full details.
2020-09-10 14:39:18 +01:00
Neil Alexander 668a722ee0
Backoff for 401s (#1410)
* Backoff for 401s

* Human-readable retry_after in logs
2020-09-08 13:41:08 +01:00
Neil Alexander 1602df8752
Ignore state events with invalid signatures when joining rooms (#1407)
* Use state from RespSendJoin post-check

* Don't create input events for invalid events

* Let's try this again

* Update gomatrixserverlib

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@38f437f
2020-09-07 16:54:51 +01:00
Kegsay 088294ee65
Remove QueryRoomsForUser from current state server (#1398) 2020-09-04 15:58:30 +01:00
Kegsay 2570418f42
Remove ServerACLs from the current state server (#1390)
* Remove ServerACLs from the current state server

Functionality moved to roomserver

* Nothing to see here, move along
2020-09-04 10:40:58 +01:00
Neil Alexander 04bc09f591
Defer keyserver and federationsender wakeups to give HTTP listeners time to start (#1389) 2020-09-03 21:17:55 +01:00
Neil Alexander 096191ca24
Use federation sender for backfill/getting missing events (#1379)
* Use federation sender for backfill and getting missing events

* Fix internal URL paths

* Update go.mod/go.sum for matrix-org/gomatrixserverlib#218

* Add missing server implementations in HTTP interface
2020-09-02 15:26:30 +01:00
Neil Alexander 89c772fb78
Report which component failed to consume (#1375) 2020-09-01 16:53:38 +01:00
Neil Alexander c0f28845f8
Try to protect GetNextTransactionPDUs (#1350) 2020-08-27 15:27:12 +01:00
Neil Alexander 7466e6b718
Fix lock errors in federation sender (#1347)
* Fix lock errors in federation sender

* Additional fix to writers
2020-08-27 11:05:41 +01:00
Neil Alexander 9d53351dc2
Component-wide TransactionWriters (#1290)
* Offset updates take place using TransactionWriter

* Refactor TransactionWriter in current state server

* Refactor TransactionWriter in federation sender

* Refactor TransactionWriter in key server

* Refactor TransactionWriter in media API

* Refactor TransactionWriter in server key API

* Refactor TransactionWriter in sync API

* Refactor TransactionWriter in user API

* Fix deadlocking Sync API tests

* Un-deadlock device database

* Fix appservice API

* Rename TransactionWriters to Writers

* Move writers up a layer in sync API

* Document sqlutil.Writer interface

* Add note to Writer documentation
2020-08-21 10:42:08 +01:00
Kegsay 6d6bb75137
Add FederationClient interface to federationsender (#1284)
* Add FederationClient interface to federationsender

- Use a shim struct in HTTP mode to keep the same API as `FederationClient`.
- Use `federationsender` instead of `FederationClient` in `keyserver`.

* Pointers not values

* Review comments

* Fix unit tests

* Rejig backoff

* Unbreak test

* Remove debug logs

* Review comments and linting
2020-08-20 17:03:07 +01:00
Neil Alexander 0fea056db4
Change backoff behaviour so that Failure returns planned end time (#1288) 2020-08-20 14:58:53 +01:00
Neil Alexander b24747b305
Transaction writer changes, move roomserver writers (#1285)
* Updated TransactionWriters, moved locks in roomserver, various other tweaks

* Fix redaction deadlocks

* Fix lint issue

* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter

* Fix us not sending transactions through in latest events updater
2020-08-19 15:38:27 +01:00
Neil Alexander 6cb1a65809
Synchronous invites (#1273)
* Refactor invites to be synchronous

* Fix synchronous invites

* Fix client API return type for send invite error

* Linter

* Restore PerformError on rsAPI.PerformInvite

* Update sytest-whitelist

* Don't override PerformError with normal errors

* Fix error passing

* Un-whitelist a couple of tests

* Update sytest-whitelist

* Try to handle multiple invite rejections better

* nolint

* Update gomatrixserverlib

* Fix /v1/invite test

* Remove replace from go.mod
2020-08-17 11:40:49 +01:00
Neil Alexander 4c4732a9c9
Don't send to ACL'd servers (#1267)
* Don't send to ACL'd servers

* Use gjson to look for room_id in EDU
2020-08-13 14:23:37 +01:00
Neil Alexander 9677a95afc
API setup refactoring (#1266)
* Start HTTP endpoint refactoring

* Update SetupAndServeHTTP

* Fix builds

* Don't set up external listener if no address configured

* TLS HTTP setup

* Break apart client/federation/key/media muxes

* Tweaks

* Fix P2P demos

* Fix media API routing

* Review comments @Kegsay

* Update sample config

* Fix gobind build

* Fix External -> Public in federation API test
2020-08-13 12:16:37 +01:00
Neil Alexander 52eeeb1627
Prefix-defined Kafka topics (#1254)
* Prefix-defined Kafka topics

* Fix current state server test
2020-08-10 15:18:37 +01:00
Neil Alexander 4b09f445c9
Configuration format v1 (#1230)
* Initial pass at refactoring config (not finished)

* Don't forget current state and EDU servers

* More shifting around

* Update server key API tests

* Fix roomserver test

* Fix more tests

* Further tweaks

* Fix current state server test (sort of)

* Maybe fix appservices

* Fix client API test

* Include database connection string in database options

* Fix sync API build

* Update config test

* Fix unit tests

* Fix federation sender build

* Fix gobind build

* Set Listen address for all services in HTTP monolith mode

* Validate config, reinstate appservice derived in directory, tweaks

* Tweak federation API test

* Set MaxOpenConnections/MaxIdleConnections to previous values

* Update generate-config
2020-08-10 14:18:04 +01:00
Neil Alexander 58998e9874
Backoff fixes (#1250)
* Backoff fixes

* Update comments

* Fix destination queue

* Log why we're blacklisting

* Fix logic fail

* Logging level

* Fix bug

* Maybe fix that bug after all

* Fix debug output

* Fix tests
2020-08-07 18:50:29 +01:00
Neil Alexander 5dd5a41119
Tweak log levels of some federation logging (#1248)
* Tweak log levels of some federation logging

* Update go.mod/go.sum for matrix-org/util#22 and matrix-org/gomatrixserverlib#215
2020-08-07 15:00:23 +01:00
Neil Alexander b7491aae03
Yggdrasil demo updates (#1241)
* PerformServersAlive in PerformBroadcastEDU

* Don't double-pointer

* More reliable QUIC session handling

* Direct peer lookup, other tweaks

* Tweaks

* Try to wake up queues on incoming QUIC session

* Set session callback on gobind build

* Fix incoming session storage

* Stateless reset, other tweaks

* Reset sessions when coordinates change

* Disable HTTP connection reuse, tweak timeouts
2020-08-06 16:00:42 +01:00
Neil Alexander 22f028e141
SelectJoinedHostsForRooms should use QueryVariadic on SQLite (#1238)
* SelectJoinedHostsForRooms should use QueryVariadic on SQLite

* Fix strings.Replace

* Fix statement
2020-08-05 10:00:35 +01:00
Kegan Dougal 78ab33f91f
Unbreak postgres 2020-08-04 11:41:48 +01:00
Kegsay 0c4e8f6d4f
Send device list updates to servers (outbound only) (#1237)
* Add QueryDeviceMessages to serve up device keys and stream IDs

* Consume key change events in fedsender

Don't yet send them to destinations as we haven't worked them out yet

* Send device list updates to all required servers

* Glue it all together
2020-08-04 11:32:14 +01:00
Neil Alexander cfeb1b2f42
Add UNIQUE constraint to blacklist table (#1216) 2020-07-23 10:22:23 +01:00
Neil Alexander 1e71fd645e
Persistent federation sender blacklist (#1214)
* Initial persistence of blacklists

* Move statistics folder

* Make MaxFederationRetries configurable

* Set lower failure thresholds for Yggdrasil demos

* Still write events into database for blacklisted hosts (they can be tidied up later)

* Review comments
2020-07-22 17:01:29 +01:00