58 Commits (5ade348d142012367e6cf4b8c2c65d6fbf357af6)

Author SHA1 Message Date
Kegsay f8d3a762c4
Add a per-room mutex to federationapi when processing transactions (#1810)
* Add a per-room mutex to federationapi when processing transactions

This has numerous benefits:
 - Prevents us doing lots of state resolutions in busy rooms. Previously, room forks would always result
   in a state resolution being performed immediately, without checking if we were already doing this in
   a different transaction. Now they will queue up, resulting in fewer calls to `/state_ids`, `/g_m_e`, etc.
 - Prevents memory usage from growing too large as a result and potentially OOMing.

And costs:
 - High traffic rooms will be slightly slower due to head-of-line blocking from other servers,
   though this has always been an issue as roomserver has a per-room mutex already.

* Fix unit tests

* Correct mutex lock ordering
2021-03-30 10:01:32 +01:00
Kegsay af41f6d454
Add Sentry support (#1803)
* Add Sentry support

* Use HTTP Sentry properly maybe

* Capture panics

* Log fed Sentry stuff correctly

* British english linter
2021-03-24 10:25:24 +00:00
Kegsay 802f1c96f8
Add more metrics (#1802)
* Add more metrics

* Linting
2021-03-23 15:22:00 +00:00
Kegsay a1b7e4ef3f
log less for failed key queries, add counters for incoming pdus/edus (#1801)
* log less for failed key queries, add counters for incoming pdus/edus

* use labels

* Blacklist flakey test

* Fix metrics
2021-03-23 11:33:36 +00:00
Neil Alexander d15836e260
Increase gocyclo complexity to 25 (and remove all but 2 golint directives related to it) (#1783)
2021-03-03 14:35:57 +00:00
Neil Alexander 5d74a1757f
Don't query for servers so often in /send (#1766)
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events

* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks

* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254

* Fix resolve-state

* Initialise t.servers on first use
2021-02-16 17:12:17 +00:00
Neil Alexander 05324b6861
Send/state tweaks (#1681)
* Check missing event count

* Don't use request context for /send
2021-01-04 13:47:48 +00:00
Neil Alexander b5aa7ca3ab
Top-level setup package (#1605)
* Move config, setup, mscs into "setup" top-level folder

* oops, forgot the EDU server

* Add setup

* goimports
2020-12-02 17:41:00 +00:00
Neil Alexander 265cf5e835
Protect txnReq.newEvents with mutex (#1587)
* Protect txnReq.newEvents and txnReq.haveEvents with mutex

* Missing defer

* Remove t.haveEventsMutex
2020-11-18 11:31:58 +00:00
Neil Alexander 20a01bceb2
Pass pointers to events — reloaded (#1583)
* Pass events as pointers

* Fix lint errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update to matrix-org/gomatrixserverlib#240
2020-11-16 15:44:53 +00:00
S7evinK bcb89ada5e
Implement read receipts (#1528)
* fix conversion from int to string yields a string of one rune, not a string of digits

* Add receipts table to syncapi

* Use StreamingToken as the since value

* Add required method to testEDUProducer

* Make receipt json creation "easier" to read

* Add receipts api to the eduserver

* Add receipts endpoint

* Add eduserver kafka consumer

* Add missing kafka config

* Add passing tests to whitelist

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Fix copy & paste error

* Fix column count error

* Make outbound federation receipts pass

* Make "Inbound federation rejects receipts from wrong remote" pass

* Don't use errors package

* - Add TODO for batching requests
- Rename variable

* Return a better error message

* - Use OutputReceiptEvent instead of InputReceiptEvent as result
- Don't use the errors package for errors
- Defer CloseAndLogIfError to close rows
- Fix Copyright

* Better creation/usage of JoinResponse

* Query all joined rooms instead of just one

* Update gomatrixserverlib

* Add sqlite3 migration

* Add postgres migration

* Ensure required sequence exists before running migrations

* Clarification on comment

* - Fix a bug when creating client receipts
- Use concrete types instead of interface{}

* Remove dead code
Use key for timestamp

* Fix postgres query...

* Remove single purpose struct

* Use key/value directly

* Only apply receipts on initial sync or if edu positions differ,
otherwise we'll be sending the same receipts over and over again.

* Actually update the id, so it is correctly sent in syncs

* Set receipt on request to /read_markers

* Fix issue with receipts getting overwritten

* Use fmt.Errorf instead of pkg/errors

* Revert "Add postgres migration"

This reverts commit 722fe5a04628882b787d096942459961db159b06.

* Revert "Add sqlite3 migration"

This reverts commit d113b03f6495a4b8f8bcf158a3d00b510b4240cc.

* Fix selectRoomReceipts query

* Make golangci-lint happy

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-11-09 18:46:11 +00:00
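
Note on bcb89ada5e: the first bullet of this commit quotes a Go pitfall: converting an integer directly to `string` produces the character with that code point, not the decimal digits. The two helper names below are hypothetical and exist only to show the difference.

```go
package main

import "strconv"

// runeString shows the pitfall: the integer is treated as a Unicode code
// point, so the result is a single character.
func runeString(n int) string {
	return string(rune(n))
}

// digitString returns the decimal representation, which is usually what was
// intended.
func digitString(n int) string {
	return strconv.Itoa(n)
}
```

For example, `runeString(65)` is `"A"` while `digitString(65)` is `"65"`.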
Neil Alexander 6e63df1d9a
KindOld (#1531)
* Add KindOld

* Don't process latest events/memberships for old events

* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events

* Signal to downstream components if an event has become a forward extremity

* Don't exclude from sync

* Soft-fail checks on KindNew

* Don't run the latest events updater at all for KindOld

* Don't make federation sender change after all

* Kind in federation sender join

* Don't send isForwardExtremity

* Fix syncapi

* Update comments

* Fix SendEventWithState

* Update sytest-whitelist

* Generate old output events

* Sync API consumes old room events

* Update comments
2020-10-19 14:59:13 +01:00
Neil Alexander 10f1beb0de
Don't re-run state resolution on a single trusted state snapshot (#1526)
* Don't re-run state resolution on a single trusted state snapshot

* Lint

* Check if backward extremity is create event before checking missing state
2020-10-15 12:08:49 +01:00
Neil Alexander 6f12b8f85c
Ignore typing events where sender doesn't match origin (#1523)
* Ignore typing notifications where the sender doesn't match the origin

* Update sytest-whitelist

* Fix formatting directives
2020-10-14 16:49:25 +01:00
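
Note on 6f12b8f85c: the check described above can be sketched as comparing the server part of the sender's user ID with the origin of the transaction. `senderMatchesOrigin` is a hypothetical helper, not the function used in the commit, and it does no validation beyond the leading `@` and the first colon.

```go
package main

import "strings"

// senderMatchesOrigin reports whether the server part of userID equals origin.
// The server part is everything after the first colon, so a server name that
// carries a port (and therefore another colon) is kept whole.
func senderMatchesOrigin(userID, origin string) bool {
	if !strings.HasPrefix(userID, "@") {
		return false
	}
	i := strings.IndexByte(userID, ':')
	if i < 0 {
		return false
	}
	return userID[i+1:] == origin
}
```

A typing notification for `@alice:example.org` arriving in a transaction from `evil.example.com` would fail this check and be ignored.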
Neil Alexander 7a1fd123de
Improved state handling in /send (#1521)
* Capture errors

* Don't request only state key tuples needed for auth (we end up discarding room state this way)

* QueryStateAfterEvent returns all state when no tuples supplied

* Resolve state

* Comments
2020-10-14 12:39:37 +01:00
Neil Alexander 9d6b77c58a
Try to retrieve missing auth events from multiple servers (#1516)
* Recursively fetch auth events if needed

* Fix processEvent call

* Ask more servers in lookupEvent

* Don't panic!

* Panic at the Disco

* Find servers more aggressively

* Add getServers

* Fix number of servers to 5, don't bail making RespState if auth events missing

* Fix panic

* Ignore missing state events too

* Report number of servers correctly

* Don't reuse request context for /send_join

* Update federation API tests

* Don't recurse processEvents

* Implement getEvents differently
2020-10-13 11:53:20 +01:00
Neil Alexander 8001627cfc
Get missing event tweaks (#1514)
* Adjust backfill to send backward extremity with state before other backfilled events, include prev_events with no state amongst missing events

* Not finished refactor

* Fix test

* Remove isInboundTxn

* Remove debug logging
2020-10-12 15:56:15 +01:00
Neil Alexander 4df7e345bb
Only return 500 on /send if a database error occurs (#1503)
2020-10-09 15:06:43 +01:00
Neil Alexander 28454d6fb7
Log origin in /send
2020-10-02 11:38:35 +01:00
Neil Alexander 135b5e264f
Fix panic on verifySigError in fetching missing events
2020-09-30 13:51:54 +01:00
Neil Alexander 43cdba9a69
Ignore depth in federation API (#1451)
2020-09-29 14:07:59 +01:00
Neil Alexander 738b829a23
Fetch missing auth events, implement QueryMissingAuthPrevEvents, try other servers in room for /event and /get_missing_events (#1450)
* Try to ask other servers in the room for missing events if the origin won't provide them

* Logging

* More logging

* Implement QueryMissingAuthPrevEvents

* Try to get missing auth events badly

* Use processEvent

* Logging

* Update QueryMissingAuthPrevEvents

* Try to find missing auth events

* Patchy fix for test

* Logging tweaks

* Send auth events as outliers

* Update check in QueryMissingAuthPrevEvents

* Error responses

* More return codes

* Don't return error on reject/soft-fail since it was ultimately handled

* More tweaks

* More error tweaks
2020-09-29 13:40:29 +01:00
Neil Alexander ce318f53bc
Use workers when fetching events from /state_ids, use /state only if significant portion of events missing (#1447)
* Don't fall back to /state on incoming /send

* Event workers for /state_ids, use /state only if significant percentage of events are missing
2020-09-28 11:32:59 +01:00
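
Note on ce318f53bc: a rough sketch of the two ideas in this commit, a pool of workers that fetches events by ID and a ratio check that decides when to ask for the full state instead. The function names, the `fetch` callback and the threshold parameter are all assumptions; the commit message does not state the worker count or what counts as a "significant" portion.

```go
package main

import "sync"

// fetchAll retrieves every ID using a fixed number of workers and returns the
// events that were fetched successfully, keyed by ID. Failed fetches are
// skipped in this sketch.
func fetchAll(ids []string, workers int, fetch func(id string) (string, error)) map[string]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]string, len(ids))
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				ev, err := fetch(id)
				if err != nil {
					continue
				}
				mu.Lock()
				results[id] = ev
				mu.Unlock()
			}
		}()
	}
	for _, id := range ids {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return results
}

// shouldUseFullState reports whether the share of missing events exceeds
// threshold (a hypothetical tuning value between 0 and 1).
func shouldUseFullState(missing, total int, threshold float64) bool {
	if total == 0 {
		return false
	}
	return float64(missing)/float64(total) > threshold
}
```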
Neil Alexander 40dd16a6e6
Don't fall back to /state on incoming /send (#1446)
2020-09-28 10:03:18 +01:00
Kegsay 18231f25b4
Implement rejected events (#1426)
* WIP Event rejection

* Still send back errors for rejected events

Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.

* Implement rejected events

Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.

* Update test to match reality

* Modify InputRoomEvents to no longer return an error

Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.

* Remove redundant returns; linting

* Update blacklist
2020-09-16 13:00:52 +01:00
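
Note on 18231f25b4: the "set fields on the response, add `Err()`" approach can be sketched as below. `inputResponse` is a hypothetical stand-in for `InputRoomEventsResponse`; its field names and JSON keys are assumptions, and `roundTrip` only imitates an HTTP boundary by encoding and decoding JSON.

```go
package main

import (
	"encoding/json"
	"errors"
)

// inputResponse carries the failure as plain, serialisable fields.
// Hypothetical shape: not the real InputRoomEventsResponse.
type inputResponse struct {
	ErrMsg     string `json:"error_msg,omitempty"`
	NotAllowed bool   `json:"not_allowed,omitempty"`
}

// Err rebuilds an error from the serialisable fields, or returns nil when the
// call succeeded, so callers can keep the usual `if err != nil` shape.
func (r *inputResponse) Err() error {
	if r.ErrMsg == "" {
		return nil
	}
	return errors.New(r.ErrMsg)
}

// roundTrip encodes and decodes the response the way an HTTP API boundary
// between components would.
func roundTrip(in inputResponse) (inputResponse, error) {
	var out inputResponse
	raw, err := json.Marshal(in)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(raw, &out)
	return out, err
}
```

The point of the design is that the message and the `NotAllowed` flag survive the round trip, whereas a Go `error` value returned directly would not be reconstructed on the other side.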
Neil Alexander 726ad6ce2e
Backoff ignore invalid signatures (#1408)
2020-09-08 10:28:13 +01:00
Neil Alexander 895ead8048
Use background context when processing event with missing state (#1403)
* Use background context when processing event with missing state

* Five minute timeout

* Remove context from txnreq, thread through instead

* Fix unit tests
2020-09-07 12:32:40 +01:00
Kegsay 2570418f42
Remove ServerACLs from the current state server (#1390)
* Remove ServerACLs from the current state server

Functionality moved to roomserver

* Nothing to see here, move along
2020-09-04 10:40:58 +01:00
Neil Alexander 6150de6cb3
FIFO ordering of input events (#1386)
* Initial FIFOing of roomserver inputs

* Remove EventID response from api.InputRoomEventsResponse

* Don't send back event ID unnecessarily

* Fix ordering hopefully

* Reduce copies, use buffered task channel to reduce contention on other rooms

* Fix error handling
2020-09-03 15:22:16 +01:00
Neil Alexander bcdf9577a3
Support for server ACLs (#1261)
* First pass at server ACLs (not efficient)

* Use transaction origin, update whitelist

* Fix federation API test

It's sufficient for us to return nothing in response to current state, so that the server ACL check returns no ACLs.

* More efficient server ACLs - hopefully

* Fix queries

* Fix queries

* Avoid panics by nil pointers

* Bug fixes

* Fix state event type

* Fix mutex

* Update logging

* Ignore port when matching servername

* Use read mutex

* Fix bugs

* Fix sync API test

* Comments

* Add tests, tweaks to behaviour

* Fix test output
2020-08-11 18:19:11 +01:00
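
Note on bcdf9577a3: the "Ignore port when matching servername" bullet can be illustrated with the standard library. `stripPort` is a hypothetical helper, not the function from the commit, and the sketch covers only the port handling, not the ACL pattern matching itself.

```go
package main

import "net"

// stripPort returns the host part of a server name, dropping any ":port"
// suffix. Names without a port are returned unchanged.
func stripPort(serverName string) string {
	host, _, err := net.SplitHostPort(serverName)
	if err != nil {
		// No port present (or not in host:port form): use the name as given.
		return serverName
	}
	return host
}
```

With this, `example.org:8448` and `example.org` both reduce to `example.org` before being compared against the ACL entries.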
Neil Alexander 4b09f445c9
Configuration format v1 (#1230)
* Initial pass at refactoring config (not finished)

* Don't forget current state and EDU servers

* More shifting around

* Update server key API tests

* Fix roomserver test

* Fix more tests

* Further tweaks

* Fix current state server test (sort of)

* Maybe fix appservices

* Fix client API test

* Include database connection string in database options

* Fix sync API build

* Update config test

* Fix unit tests

* Fix federation sender build

* Fix gobind build

* Set Listen address for all services in HTTP monolith mode

* Validate config, reinstate appservice derived in directory, tweaks

* Tweak federation API test

* Set MaxOpenConnections/MaxIdleConnections to previous values

* Update generate-config
2020-08-10 14:18:04 +01:00
Neil Alexander 5dd5a41119
Tweak log levels of some federation logging (#1248)
* Tweak log levels of some federation logging

* Update go.mod/go.sum for matrix-org/util#22 and matrix-org/gomatrixserverlib#215
2020-08-07 15:00:23 +01:00
Kegsay 642f9cb964
Process inbound device list updates from federation (#1240)
* Add InputDeviceListUpdate

* Unbreak unit tests

* Process inbound device list updates from federation

- Persist the keys in the keyserver and produce key changes
- Does not currently fetch keys from the remote server if the prev IDs are missing

* Linting
2020-08-05 13:41:16 +01:00
Neil Alexander 4cf45d1ce9
Don't include current state in processEventWithMissingState (#1126)
* Don't include current state in processEventWithMissingState

* Remove lookupCurrentState as not needed

Co-authored-by: Kegsay <kegan@matrix.org>
2020-06-29 14:39:21 +01:00
Kegsay 914f6cadce
Add /send restrictions and return correct error codes (#1156)
* Add /send restrictions and return correct error codes

- Max 50 PDUs / 100 EDUs
- Fail the transaction when PDUs contain bad JSON

* Update whitelist

* Unbreak test

* Linting
2020-06-23 13:15:15 +01:00
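
Note on 914f6cadce: a minimal sketch of the two restrictions listed in this commit, the 50 PDU / 100 EDU caps and failing the whole transaction on bad JSON. The `transaction` type and `validateTransaction` are hypothetical and much simpler than a real federation transaction; the error strings are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	maxPDUs = 50  // limit stated in the commit message
	maxEDUs = 100 // limit stated in the commit message
)

// transaction is a trimmed-down, hypothetical shape of a /send request body.
type transaction struct {
	PDUs []json.RawMessage `json:"pdus"`
	EDUs []json.RawMessage `json:"edus"`
}

// validateTransaction rejects bodies that are not valid JSON (including
// malformed PDUs) and bodies that exceed the PDU or EDU limits.
func validateTransaction(body []byte) (*transaction, error) {
	var txn transaction
	if err := json.Unmarshal(body, &txn); err != nil {
		return nil, fmt.Errorf("bad JSON in transaction: %w", err)
	}
	if len(txn.PDUs) > maxPDUs {
		return nil, fmt.Errorf("too many PDUs: %d > %d", len(txn.PDUs), maxPDUs)
	}
	if len(txn.EDUs) > maxEDUs {
		return nil, fmt.Errorf("too many EDUs: %d > %d", len(txn.EDUs), maxEDUs)
	}
	return &txn, nil
}
```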
Kegsay 02565c37aa
/send auth errors are silent (#1149)
* /send auth errors are silent

* Fix test
2020-06-23 10:31:17 +01:00
Kegsay 7c36fb78a7
Fix rooms v3 url paths for good - with tests (#1130)
* Fix rooms v3 url paths for good - with tests

- Add a test rig around `federationapi` to test routing.
- Use `JSONVerifier` over `KeyRing` so we can stub things out more easily.
- Add `test.NopJSONVerifier` which verifies nothing.
- Add `base.BaseMux` which is the original `mux.Router` used to spawn public/internal routers.
- Listen on `base.BaseMux` and not the default serve mux as it cleans paths which we don't want.
- Factor out `ListenAndServe` to `test.ListenAndServe` and add flag for listening on TLS.

* Fix comments

* Linting
2020-06-15 16:57:59 +01:00
Kegsay b7187a9a35
Remove clientapi producers which aren't actually producers (#1111)
* Remove clientapi producers which aren't actually producers

They are actually just convenience wrappers around the internal APIs
for roomserver/eduserver. Move their logic to their respective `api`
packages and call them directly.

* Remove TODO

* unbreak ygg
2020-06-10 12:17:54 +01:00
Neil Alexander 76ff47c052
Use AuthChainProvider to try and speed up federated joins (#1100)
* Use MissingAuthEventHandler on performjoin to try and speed up cases where we have missing events

* Update gomatrixserverlib

* Use supplied room version

* Use AuthChainProvider

* Tweaks

* Update gomatrixserverlib

* Signature checks
2020-06-05 11:48:52 +01:00
Neil Alexander a5d822004d
Send-to-device support (#1072)
* Groundwork for send-to-device messaging

* Update sample config

* Add unstable routing for now

* Send to device consumer in sync API

* Start the send-to-device consumer

* fix indentation in dendrite-config.yaml

* Create send-to-device database tables, other tweaks

* Add some logic for send-to-device messages, add them into sync stream

* Handle incoming send-to-device messages, count them with EDU stream pos

* Undo changes to test

* pq.Array

* Fix sync

* Logging

* Fix a couple of transaction things, fix client API

* Add send-to-device test, hopefully fix bugs

* Comments

* Refactor a bit

* Fix schema

* Fix queries

* Debug logging

* Fix storing and retrieving of send-to-device messages

* Try to avoid database locks

* Update sync position

* Use latest sync position

* Jiggle about sync a bit

* Fix tests

* Break out the retrieval from the update/delete behaviour

* Comments

* nolint on getResponseWithPDUsForCompleteSync

* Try to line up sync tokens again

* Implement wildcard

* Add all send-to-device tests to whitelist, what could possibly go wrong?

* Only care about wildcard when targeted locally

* Deduplicate transactions

* Handle tokens properly, return immediately if waiting send-to-device messages

* Fix sync

* Update sytest-whitelist

* Fix copyright notice (need to do more of this)

* Comments, copyrights

* Return errors from Do, fix dendritejs

* Review comments

* Comments

* Constructor for TransactionWriter

* defletions

* Update gomatrixserverlib, sytest-blacklist
2020-06-01 17:50:19 +01:00
Neil Alexander 406b47267e
Return 500 when processing a transaction fails fatally (#1066)
2020-05-27 11:16:27 +01:00
Kegsay 24d8df664c
Fix #897 and shuffle directory around (#1054)
* Fix #897 and shuffle directory around

* Update find-lint

* goimports

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-05-21 14:40:13 +01:00
Neil Alexander ee140c9d6a
Reduce 500s (#1017)
* Try to avoid returning 500s on /send

* Don't return 500s from media API download requests

* Don't 500 on context errors

* Update sytest-whitelist

* Fix lint, add comments
2020-05-13 13:01:45 +01:00
Kegsay ce5dfbebf9
Implement /get_missing_events (#1022)
* WIP get_missing_events work

* More WIP get_missing_events work

* First working /get_missing_events implementation

Flakey currently due to racing between /sync and /send

* Final tweaks

* Remove log lines

* Linting

* go mod tidy

* Clamp min depth to 0

* sort events by depth because sytest makes me sad

Specifically I think it's
4172585c25/lib/SyTest/Federation/Client.pm (L265)
to blame here.
2020-05-12 16:24:28 +01:00
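
Note on ce5dfbebf9: the "Clamp min depth to 0" and "sort events by depth" bullets can be sketched as follows. The `event` type and both function names are hypothetical; real events carry far more than an ID and a depth.

```go
package main

import "sort"

// event is a hypothetical, minimal stand-in for a room event.
type event struct {
	ID    string
	Depth int64
}

// clampMinDepth stops a computed minimum depth from going negative.
func clampMinDepth(depth int64) int64 {
	if depth < 0 {
		return 0
	}
	return depth
}

// sortByDepth orders events from shallowest to deepest. The sort is stable,
// so events with equal depth keep their original relative order.
func sortByDepth(events []event) {
	sort.SliceStable(events, func(i, j int) bool {
		return events[i].Depth < events[j].Depth
	})
}
```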
Kegsay 3b98535dc5
only send new events to RS; add tests for /state_ids and /event (#1011)
* only send new events to RS; add tests for /state_ids and /event

* Review comments: send in auth event order

* Ignore order of state events for this test as RespState.Events is non-deterministic
2020-05-06 18:03:25 +01:00
Kegsay 1294852270
Add tests around federationapi's txnReq (#1010)
* Add necessary stubs for testing txnReq

* Add basic tests
2020-05-06 14:27:02 +01:00
Kegsay 1db5dfe4d0
Fetch events by ID rather than use current state as this includes auth events (#1009)
2020-05-05 16:46:22 +01:00
Kegsay 31d3b0d4a5
Prefer /state_ids when missing state across federation (#1008)
* Prefer /state_ids when missing state across federation

* Linting

* Better logging
2020-05-05 15:48:37 +01:00
Neil Alexander e15f6676ac
Consolidation of roomserver APIs (#994)
* Consolidation of roomserver APIs

* Comment out alias tests for now, they are broken

* Wire AS API into roomserver again

* Roomserver didn't take asAPI param before so return to that

* Prevent roomserver asking AS API for alias info

* Rename some files

* Remove alias_test, incoherent tests and unwanted appservice integration

* Remove FS API inject on syncapi component
2020-05-01 10:48:17 +01:00
Neil Alexander 3c2e6f967b
Federation fixes and error handling (#970)
* Improve error handling in federation /send endpoint a bit

* Remove unknownRoomError, use unmarshalError when unable to get room ID

* Swap out a couple more internal server errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update gomatrixserverlib

* Return bad limit in error

* Same with domain/userid
2020-04-16 17:59:55 +01:00