Commit graph

31 commits

Author SHA1 Message Date
Neil Alexander
79c5485c8d
Allow clearing federation blacklist at startup for P2P demos 2021-05-24 11:43:24 +01:00
Kegsay
656d11ec90
fedsender: tolerate dupe membership events (#1824)
* fedsender: tolerate dupe membership events

Previously if the fedsender got a duplicate membership event it would cause
the entire process to crash. Now it doesn't. This masks an issue with the
roomserver where it can emit duplicate membership events.

* Update joined_hosts_table.go
2021-04-14 11:11:54 +01:00
Neil Alexander
6099379ea4
Remove rooms table from federation sender (#1751)
* Remove last sent event ID column from federation sender

* Remove EventIDMismatchError

* Remove the federationsender rooms table altogether, it's useless

* Add migration

* Fix migrations

* Fix migrations
2021-02-04 11:52:49 +00:00
Matthew Hodgson
0571d395b5
Peeking over federation via MSC2444 (#1391)
* a very very WIP first cut of peeking via MSC2753.

doesn't yet compile or work.
needs to actually add the peeking block into the sync response.
checking in now before it gets any bigger, and to gather any initial feedback on the vague shape of it.

* make PeekingDeviceSet private

* add server_name param

* blind stab at adding a `peek` section to /sync

* make it build

* make it launch

* add peeking to getResponseWithPDUsForCompleteSync

* cancel any peeks when we join a room

* spell out how to runoutside of docker if you want speed

* fix SQL

* remove unnecessary txn for SelectPeeks

* fix s/join/peek/ cargocult fail

* HACK: Track goroutine IDs to determine when we write by the wrong thread

To use: set `DENDRITE_TRACE_SQL=1` then grep for `unsafe`

* Track partition offsets and only log unsafe for non-selects

* Put redactions in the writer goroutine

* Update filters on writer goroutine

* wrap peek storage in goid hack

* use exclusive writer, and MarkPeeksAsOld more efficiently

* don't log ascii in binary at sql trace...

* strip out empty roomd deltas

* re-add txn to SelectPeeks

* re-add accidentally deleted field

* reject peeks for non-worldreadable rooms

* move perform_peek

* fix package

* correctly refactor perform_peek

* WIP of implementing MSC2444

* typo

* Revert "Merge branch 'kegan/HACK-goid-sqlite-db-is-locked' into matthew/peeking"

This reverts commit 3cebd8dbfbccdf82b7930b7b6eda92095ca6ef41, reversing
changes made to ed4b3a58a7855acc43530693cc855b439edf9c7c.

* (almost) make it build

* clean up bad merge

* support SendEventWithState with optional event

* fix build & lint

* fix build & lint

* reinstate federated peeks in the roomserver (doh)

* fix sql thinko

* todo for authenticating state returned by /peek

* support returning current state from QueryStateAndAuthChain

* handle SS /peek

* reimplement SS /peek to prod the RS to tell the FS about the peek

* rename RemotePeeks as OutboundPeeks

* rename remote_peeks_table as outbound_peeks_table

* add perform_handle_remote_peek.go

* flesh out federation doc

* add inbound peeks table and hook it up

* rename ambiguous RemotePeek as InboundPeek

* rename FSAPI's PerformPeek as PerformOutboundPeek

* setup inbound peeks db correctly

* fix api.SendEventWithState with no event

* track latestevent on /peek

* go fmt

* document the peek send stream race better

* fix SendEventWithRewrite not to bail if handed a non-state event

* add fixme

* switch SS /peek to use SendEventWithRewrite

* fix comment

* use reverse topo ordering to find latest extrem

* support postgres for federated peeking

* go fmt

* back out bogus go.mod change

* Fix performOutboundPeekUsingServer

* Fix getAuthChain -> GetAuthChain

* Fix build issues

* Fix build again

* Fix getAuthChain -> GetAuthChain

* Don't repeat outbound peeks for the same room ID to the same servers

* Fix lint

* Don't omitempty to appease sytest

Co-authored-by: Kegan Dougal <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2021-01-22 14:55:08 +00:00
Neil Alexander
f64c8822bc
Federation sender refactor (#1621)
* Refactor federation sender, again

* Clean up better

* Missing operators

* Try to get overflowed events from database

* Fix queries

* Log less

* Comments

* nil PDUs/EDUs shouldn't happen but guard against them for safety

* Tweak logging

* Fix transaction coalescing

* Update comments

* Check nils more

* Remove channels as they add extra complexity and possibly will deadlock

* Don't hold lock while sending transaction

* Less spam about sleeping queues

* Comments

* Bug-fixing

* Don't try to rehydrate twice

* Don't queue in memory for blacklisted destinations

* Don't queue in memory for blacklisted destinations

* Fix a couple of bugs

* Check for duplicates when pulling things out of the database

* Durable transactions, some more refactoring

* Revert "Durable transactions, some more refactoring"

This reverts commit 5daf924eaaefec5e4f7c12c16ca24e898de4adbb.

* Fix deadlock
2020-12-09 10:03:22 +00:00
Neil Alexander
5d65a879a5
Federation sender event cache (#1614)
* Cache federation sender events

* Store in the correct cache

* Update federation event cache

* Fix Unset

* Give EDUs same caching treatment as PDUs

* Make federationsender_cache_size configurable

* Default caches configuration

* Fix unit tests

* Revert "Fix unit tests"

This reverts commit 24eb5d22524f20e1024b1475debe61ae20538a5a.

* Revert "Default caches configuration"

This reverts commit 464ecd1e64b9d2983f6fd5430e9607519d543cb3.

* Revert "Make federationsender_cache_size configurable"

This reverts commit 4631f5324151e006a15d6f19008f06361b994607.
2020-12-04 14:52:10 +00:00
Neil Alexander
b5aa7ca3ab
Top-level setup package (#1605)
* Move config, setup, mscs into "setup" top-level folder

* oops, forgot the EDU server

* Add setup

* goimports
2020-12-02 17:41:00 +00:00
Neil Alexander
3afc623098
Fix RewritesState bug (#1557)
* Set RewritesState once

* Check if any new state provided

* Obey rewritesState

* Don't nuke everything the sync API knows when purging state

* Fix panic from duplicate insert

* Consistency

* Use HasState

* Remove nolint

* Clean up joined rooms on state rewrite
2020-10-22 10:39:16 +01:00
Neil Alexander
9d53351dc2
Component-wide TransactionWriters (#1290)
* Offset updates take place using TransactionWriter

* Refactor TransactionWriter in current state server

* Refactor TransactionWriter in federation sender

* Refactor TransactionWriter in key server

* Refactor TransactionWriter in media API

* Refactor TransactionWriter in server key API

* Refactor TransactionWriter in sync API

* Refactor TransactionWriter in user API

* Fix deadlocking Sync API tests

* Un-deadlock device database

* Fix appservice API

* Rename TransactionWriters to Writers

* Move writers up a layer in sync API

* Document sqlutil.Writer interface

* Add note to Writer documentation
2020-08-21 10:42:08 +01:00
Neil Alexander
b24747b305
Transaction writer changes, move roomserver writers (#1285)
* Updated TransactionWriters, moved locks in roomserver, various other tweaks

* Fix redaction deadlocks

* Fix lint issue

* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter

* Fix us not sending transactions through in latest events updater
2020-08-19 15:38:27 +01:00
Neil Alexander
4b09f445c9
Configuration format v1 (#1230)
* Initial pass at refactoring config (not finished)

* Don't forget current state and EDU servers

* More shifting around

* Update server key API tests

* Fix roomserver test

* Fix more tests

* Further tweaks

* Fix current state server test (sort of)

* Maybe fix appservices

* Fix client API test

* Include database connection string in database options

* Fix sync API build

* Update config test

* Fix unit tests

* Fix federation sender build

* Fix gobind build

* Set Listen address for all services in HTTP monolith mode

* Validate config, reinstate appservice derived in directory, tweaks

* Tweak federation API test

* Set MaxOpenConnections/MaxIdleConnections to previous values

* Update generate-config
2020-08-10 14:18:04 +01:00
Neil Alexander
22f028e141
SelectJoinedHostsForRooms should use QueryVariadic on SQLite (#1238)
* SelectJoinedHostsForRooms should use QueryVariadic on SQLite

* Fix strings.Replace

* Fix statement
2020-08-05 10:00:35 +01:00
Kegsay
0c4e8f6d4f
Send device list updates to servers (outbound only) (#1237)
* Add QueryDeviceMessages to serve up device keys and stream IDs

* Consume key change events in fedsender

Don't yet send them to destinations as we haven't worked them out yet

* Send device list updates to all required servers

* Glue it all together
2020-08-04 11:32:14 +01:00
Neil Alexander
cfeb1b2f42
Add UNIQUE constraint to blacklist table (#1216) 2020-07-23 10:22:23 +01:00
Neil Alexander
1e71fd645e
Persistent federation sender blacklist (#1214)
* Initial persistence of blacklists

* Move statistics folder

* Make MaxFederationRetries configurable

* Set lower failure thresholds for Yggdrasil demos

* Still write events into database for blacklisted hosts (they can be tidied up later)

* Review comments
2020-07-22 17:01:29 +01:00
Neil Alexander
489f34fed7
Remove debug lines 2020-07-20 17:03:20 +01:00
Neil Alexander
11a39fe3b5
Deduplicate FS database, EDU persistence table (#1207)
* Deduplicate FS database, add some EDU persistence groundwork

* Extend TransactionWriter to use optional existing transaction, use that for FS SQLite database writes

* Fix build due to bad keyserver import

* Working EDU persistence

* gocyclo, unsurprisingly

* Remove unused

* Update copyright notices
2020-07-20 16:55:20 +01:00
Neil Alexander
e5208c2ec9
Yggdrasil demo updates ("Bare QUIC")
Squashed commit of the following:

commit 86c2388e13ffdbabdd50cea205652dccc40e1860
Merge: b0a3ee6c f5e7e751
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 16 13:47:10 2020 +0100

    Merge branch 'master' into neilalexander/yggbarequic

commit b0a3ee6c5c063962384bb91c59ec753ddc8cfe5f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 16 13:42:22 2020 +0100

    Add support for broadcasting wake-up EDUs to known hosts

commit 8a5c2020b3a4b705b5d5686a9e71990a49e6d471
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 16 13:42:10 2020 +0100

    Bare QUIC demo working

commit d3939b3d6568cf4262c0391486a5203873b68bfc
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jul 15 11:42:43 2020 +0100

    Support bare Yggdrasil sessions with encrypted QUIC
2020-07-16 13:52:08 +01:00
Neil Alexander
08e9d996b6
Yggdrasil demo updates
Squashed commit of the following:

commit 6c2c48f862c1b6f8e741c57804282eceffe02487
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 16:28:09 2020 +0100

    Add README.md

commit 5eeefdadf8e3881dd7a32559a92be49bd7ddaf47
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 10:18:50 2020 +0100

    Fix wedge in federation sender

commit e2ebffbfba25cf82378393940a613ec32bfb909f
Merge: 0883ef88 abf26c12
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 09:51:23 2020 +0100

    Merge branch 'master' into neilalexander/yggdrasil

commit 0883ef8870e340f2ae9a0c37ed939dc2ab9911f6
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 10 09:51:06 2020 +0100

    Adjust timeouts

commit ba2d53199910f13b60cc892debe96a962e8c9acb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 16:34:40 2020 +0100

    Try to wake up from peers/sessions properly

commit 73f42eb494741ba5b0e0cef43654708e3c8eb399
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 15:43:38 2020 +0100

    Use TransactionWriter to reduce database lock issues on SQLite

commit 08bfe63241a18c58c539c91b9f52edccda63a611
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 12:38:02 2020 +0100

    Un-wedge federation

    Squashed commit of the following:

    commit aee933f8785e7a7998105f6090f514d18051a1bd
    Author: Neil Alexander <neilalexander@users.noreply.github.com>
    Date:   Thu Jul 9 12:22:41 2020 +0100

        Un-goroutine the goroutines

    commit 478374e5d18a3056cac6682ef9095d41352d1295
    Author: Neil Alexander <neilalexander@users.noreply.github.com>
    Date:   Thu Jul 9 12:09:31 2020 +0100

        Reduce federation sender wedges

commit 40cc62c54d9e3a863868214c48b7c18e522a4772
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Thu Jul 9 10:02:52 2020 +0100

    Handle switching in/out background more reliably
2020-07-10 16:28:18 +01:00
Neil Alexander
9cc52f47f3
Use TransactionWriter to reduce database lock issues on SQLite (#1192) 2020-07-09 17:48:56 +01:00
Neil Alexander
1773fd84b7
Hydrate destination queues at startup (#1179)
* Hydrate destination queues at startup

* Review comments
2020-07-03 11:49:49 +01:00
Neil Alexander
38caf8e5b7
Yggdrasil+QUIC demo, federation sender tweaks (#1177)
* Initial QUIC work

* Update Yggdrasil demo

* Make sure that the federation sender knows how many pending events are in the database when the worker starts

* QUIC tunables

* pprof

* Don't spin

* Set build info for Yggdrasil
2020-07-02 17:43:07 +01:00
Neil Alexander
42dd962425
Persistent federation sender queues (PDUs) (#1173)
* Initial work on persistent queues

* Update index for event ID and server name

* Put things into database (postgres for now)

* Duplicate postgres code into sqlite for now just to stop build errors, will fix SQLite soon

* Fix table name

* Fix index

* Fix table name

* Use RETURNING because LastInsertID is not supported by postgres

* Use functions

* Marshal headered event

* Don't error on now rows

* Don't block if there are PDUs waiting

* Try to tidy up JSON

* Debug logging

* Fix query, use transactions in postgres

* Clean up

* Rehydrate more opportunistically

* Fix SQLite

* remove unused types

* Review comments

* Shuffle things around a bit

* Clean up transaction properly

* Don't send empty transactions

* Reduce unnecessary retries

* Count PDUs to make more resilient

* Don't stop when there is work to be done

* Try to limit wakeups

* well this is tedious

* Fix race in incomplete transactions

* Thread safety on transaction ID/count
2020-07-01 11:46:38 +01:00
Kegsay
ecd7accbad
Rehuffle where things are in the internal package (#1122)
renamed:    internal/eventcontent.go -> internal/eventutil/eventcontent.go
	renamed:    internal/events.go -> internal/eventutil/events.go
	renamed:    internal/types.go -> internal/eventutil/types.go
	renamed:    internal/http/http.go -> internal/httputil/http.go
	renamed:    internal/httpapi.go -> internal/httputil/httpapi.go
	renamed:    internal/httpapi_test.go -> internal/httputil/httpapi_test.go
	renamed:    internal/httpapis/paths.go -> internal/httputil/paths.go
	renamed:    internal/routing.go -> internal/httputil/routing.go
	renamed:    internal/basecomponent/base.go -> internal/setup/base.go
	renamed:    internal/basecomponent/flags.go -> internal/setup/flags.go
	renamed:    internal/partition_offset_table.go -> internal/sqlutil/partition_offset_table.go
	renamed:    internal/postgres.go -> internal/sqlutil/postgres.go
	renamed:    internal/postgres_wasm.go -> internal/sqlutil/postgres_wasm.go
	renamed:    internal/sql.go -> internal/sqlutil/sql.go
2020-06-12 14:55:57 +01:00
Kegsay
e7d1ac84c3
Add ParseFileURI and use it when dealing with file URIs (#1088)
* Add ParseFileURI and use it when dealing with file URIs

Fixes #1059

* Missing file

* Linting
2020-06-04 11:18:08 +01:00
Kegsay
24d8df664c
Fix #897 and shuffle directory around (#1054)
* Fix #897 and shuffle directory around

* Update find-lint

* goimports

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-05-21 14:40:13 +01:00
Neil Alexander
f7cfa75886
Limit database connections (#980, #564) (#998)
* Limit database connections (#564)

- Add new options to the config file database:
      max_open_conns: 100
      max_idle_conns: 2
      conn_max_lifetime: -1
- Implement connection parameter setup on the *DB (database/sql) in internal/sqlutil/trace.go:Open()
- Propagate the values in the form of DbProperties interface via all the
  Open() and NewDatabase() functions

Signed-off-by: Tomas Jirka <tomas.jirka@email.cz>

* Fix wasm builds

* Remove file accidentally added from working tree

Co-authored-by: Tomas Jirka <tomas.jirka@email.cz>
2020-05-01 13:34:53 +01:00
Kegsay
c1bca95adb
Add SQL tracing via DENDRITE_TRACE_SQL (#968)
* Add SQL tracing via DENDRITE_TRACE_SQL

Add this to `internal/sqlutil` in preparation for #897

* Not entirely
2020-04-16 10:06:55 +01:00
Prateek Sachan
c019ad7086
Log errors from rows.Close (#920)
* Log errors from rows.Close

* fixed imports

* Added contextual messages

* fixed review changes
2020-03-18 10:17:18 +00:00
Kegsay
a97b8eafd4
Add peer-to-peer support into Dendrite via libp2p and fetch (#880)
* Use a fork of pq which supports userCurrent on wasm

* Use sqlite3_js driver when running in JS

* Add cmd/dendritejs to pull in sqlite3_js driver for wasm only

* Update to latest go-sqlite-js version

* Replace prometheus with a stub. sigh

* Hard-code a config and don't use opentracing

* Latest go-sqlite3-js version

* Generate a key for now

* Listen for fetch traffic rather than HTTP

* Latest hacks for js

* libp2p support

* More libp2p

* Fork gjson to allow us to enforce auth checks as before

Previously, all events would come down redacted because the hash
checks would fail. They would fail because sjson.DeleteBytes didn't
remove keys not used for hashing. This didn't work because of a build
tag which included a file which no-oped the index returned.

See https://github.com/tidwall/gjson/issues/157

When it's resolved, let's go back to mainline.

* Use gjson@1.6.0 as it fixes https://github.com/tidwall/gjson/issues/157

* Use latest gomatrixserverlib for sig checks

* Fix a bug which could cause exclude_from_sync to not be set

Caused when sending events over federation.

* Use query variadic to make lookups actually work!

* Latest gomatrixserverlib

* Add notes on getting p2p up and running

Partly so I don't forget myself!

* refactor: Move p2p specific stuff to cmd/dendritejs

This is important or else the normal build of dendrite will fail
because the p2p libraries depend on syscall/js which doesn't work
on normal builds.

Also, clean up main.go to read a bit better.

* Update ho-http-js-libp2p to return errors from RoundTrip

* Add an LRU cache around the key DB

We actually need this for P2P because otherwise we can *segfault*
with things like: "runtime: unexpected return pc for runtime.handleEvent"
where the event is a `syscall/js` event, caused by spamming sql.js
caused by "Checking event signatures for 14 events of room state" which
hammers the key DB repeatedly in quick succession.

Using a cache fixes this, though the underlying cause is probably a bug
in the version of Go I'm on (1.13.7)

* breaking: Add Tracing.Enabled to toggle whether we do opentracing

Defaults to false, which is why this is a breaking change. We need
this flag because WASM builds cannot do opentracing.

* Start adding conditional builds for wasm to handle lib/pq

The general idea here is to have the wasm build have a `NewXXXDatabase`
that doesn't import any postgres package and hence we never import
`lib/pq`, which doesn't work under WASM (undefined `userCurrent`).

* Remove lib/pq for wasm for syncapi

* Add conditional building to remaining storage APIs

* Update build script to set env vars correctly for dendritejs

* sqlite bug fixes

* Docs

* Add a no-op main for dendritejs when not building under wasm

* Use the real prometheus, even for WASM

Instead, the dendrite-sw.js must mock out `process.pid` and
`fs.stat` - which must invoke the callback with an error (e.g `EINVAL`)
in order for it to work:

```
    global.process = {
        pid: 1,
    };
    global.fs.stat = function(path, cb) {
        cb({
            code: "EINVAL",
        });
    }
```

* Linting
2020-03-06 10:23:55 +00:00
Kegsay
b6ea1bc67a
Support sqlite in addition to postgres (#869)
* Move current work into single branch

* Initial massaging of clientapi etc (not working yet)

* Interfaces for accounts/devices databases

* Duplicate postgres package for sqlite3 (no changes made to it yet)

* Some keydb, accountdb, devicedb, common partition fixes, some more syncapi tweaking

* Fix accounts DB, device DB

* Update naffka dependency for SQLite

* Naffka SQLite

* Update naffka to latest master

* SQLite support for federationsender

* Mostly not-bad support for SQLite in syncapi (although there are problems where lots of events get classed incorrectly as backward extremities, probably because of IN/ANY clauses that are badly supported)

* Update Dockerfile -> Go 1.13.7, add build-base (as gcc and friends are needed for SQLite)

* Implement GET endpoints for account_data in clientapi

* Nuke filtering for now...

* Revert "Implement GET endpoints for account_data in clientapi"

This reverts commit 4d80dff4583d278620d9b3ed437e9fcd8d4674ee.

* Implement GET endpoints for account_data in clientapi (#861)

* Implement GET endpoints for account_data in clientapi

* Fix accountDB parameter

* Remove fmt.Println

* Fix insertAccountData SQLite query

* Fix accountDB storage interfaces

* Add empty push rules into account data on account creation (#862)

* Put SaveAccountData into the right function this time

* Not sure if roomserver is better or worse now

* sqlite work

* Allow empty last sent ID for the first event

* sqlite: room creation works

* Support sending messages

* Nuke fmt.println

* Move QueryVariadic etc into common, other device fixes

* Fix some linter issues

* Fix bugs

* Fix some linting errors

* Fix errcheck lint errors

* Make naffka use postgres as fallback, fix couple of compile errors

* What on earth happened to the /rooms/{roomID}/send/{eventType} routing

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-02-13 17:27:33 +00:00