Conversation
force-pushed from f8ca4e2 to e55ac11 (Compare)
Added cache layer. Removed redundant calls.
force-pushed from e55ac11 to 04f6520 (Compare)
t.Helper()
var expSegments [][]byte
expSegments := make([][]byte, 0, len(exp))
These are fine but feel like they belong in a separate PR.
Yes, but the linter was failing, so I decided to fix everything. Next time I will create a separate PR.
Suggestion: Instead of a fixed TTL (…) How it works: (…)
I don't see how this cache meaningfully reduces RPC calls. The BlockNumber callers (postage listener, storage incentives, tx confirmation, API) run on independent staggered timers and rarely overlap within the same block window. With the TTL at 85% of block time, each caller almost always finds the cache expired by its next poll, resulting in a fresh RPC call anyway. The singleflight deduplication only helps when callers hit BlockNumber at the exact same instant, which is uncommon given the staggered schedules. On Gnosis mainnet the block time is only 5s, so the cache TTL would be ~4.25s — hardly enough to span multiple independent caller cycles.

If we do want to reduce BlockNumber RPC calls, a more effective approach would be to use a single HeaderByNumber(nil) call periodically (e.g. every ~20 blocks, or more) to get both the block number and timestamp, then extrapolate between syncs: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime (mentioned here).

The filterPendingTransactions refactor (eliminating redundant TransactionByHash calls in nextNonce) is a genuine improvement and worth keeping. @janos wdyt?
Yes, given that block numbers change very frequently, every 5 to 12s depending on the network (Gnosis and Sepolia), but not with exactly the same period every time. It is a very good suggestion to calculate the block number as you described, to reduce the frequency of RPC calls for the block number even further and to estimate the block time via HeaderByNumber. In that case even singleflight is not needed, as internally block numbers will always be returned from their specific cache. I would even go further and use the block time value calculated from HeaderByNumber instead of specifying it statically via options. The consequence could be that it becomes required to get the block number at node startup, as both the block number and the block time are needed as known values. Maybe that is even good to do, and to exit the application in case there are problems with the RPC endpoint when getting the block number on start. Just thinking.
To me it sounds a little like overengineering on one side, and like a not very flexible implementation on the other (maybe I'm missing something?). I thought about what happens if we would like to cache another type of value. Then …
I have a couple of concerns:
I think there can be other corner cases, which would be more difficult to catch, since debugging will become more complex.
janos
left a comment
@sbackend123 thank you for addressing my comments. As far as they are concerned, all is fine.
Given that some light has been shed on different approaches, and given the responses on how sensitive storage incentivisation is to block number precision, I am still not approving it, as there is a possibility to reduce the frequency of the RPC call for the block number even more.
akrem-chabchoub
left a comment
There was a problem hiding this comment.
There are some git conflicts that need to be resolved.
chunks = make([]swarm.Chunk, 0, int(chunksPerPO)*2)
putter = storer.ReservePutter()
)
chunks = make([]swarm.Chunk, 0, chunksPerPO*2)
It is a duplicate, there is already an allocation on line 543.
that's also why the linter is complaining
"github.com/ethersphere/bee/v2/pkg/transaction/wrapped/cache"
)
const defaultBlockSyncInterval = 10
If the value is always this one, I think it makes more sense to have the CLI flag default to 10 and get rid of this const. The const just complicates configuration and adjustment, and defaulting the flag removes the necessity of using a sentinel value.
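Defaulting directly on the flag could look like the sketch below. It uses the stdlib `flag` package purely for illustration; the project wires its CLI flags differently, and the flag name is taken from the diff above.

```go
package main

import (
	"flag"
	"fmt"
)

// newFlags registers block-sync-interval with 10 as its default value,
// so no separate const or sentinel is needed (illustrative only).
func newFlags() (*flag.FlagSet, *uint64) {
	fs := flag.NewFlagSet("bee", flag.ContinueOnError)
	interval := fs.Uint64("block-sync-interval", 10,
		"number of blocks between block number syncs")
	return fs, interval
}

func main() {
	fs, interval := newFlags()
	_ = fs.Parse(nil) // no arguments: the default of 10 applies
	fmt.Println(*interval)
}
```

A user-supplied `-block-sync-interval=20` would simply override the default, with no sentinel check anywhere in the code.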
c.mu.RLock()
defer c.mu.RUnlock()
if c.valid {
If this generalized cache object is already aware of the expiresAt time, I think it can already be "smart enough" to know that something has expired. You can determine here whether the value is still valid, and in fact you can do away with the valid field on the type altogether and just check whether now() >= expiresAt, since any subsequent usage of this value has to compare now to expiresAt anyway (otherwise you're blindly using that value).
Also a note that generics increase the complexity here, with no obvious place where the cache can be reused.
}
}
func (c *ExpiringSingleFlightCache[T]) Peek() (T, time.Time, bool) {
This method does not need to be exported except for unit tests. Can we not test it by just checking the callbacks?
}
func (c *ExpiringSingleFlightCache[T]) PeekOrLoad(ctx context.Context, now time.Time, canReuse ReuseEvaluator[T], loader Loader[T]) (T, error) {
if value, expiresAt, ok := c.Peek(); ok {
I am not 100% sure that Peek is needed. Since the cache is expiring and already has a notion of expiration time, why do you need to both call Peek and pass now (provided by the caller) back to the caller, using data they have already provided, just in order to check whether now() >= expirationTime?
Would it not be easier to inline the mutex locking and the now > expirationTime check?
Maybe I'm going too much into the details, but PeekOrLoad is already not the most trivial function, and I would keep Peek at least to avoid expanding PeekOrLoad(). Especially since there is also Set, which works with the mutex as well, so we end up with three methods, each responsible for a specific task. Does it make sense to you?
Yes, I agree it is not trivial, and there's already a bit too much indirection for what it is supposed to do. I just find it odd that the caller passes now to PeekOrLoad... just to end up getting it back in the callback (plus shadowing) and doing the relevance check in the callback with:

if now.Before(expiresAt) {
	return true, expiresAt
}

So why pass now at all? You can have it cleaner by just passing c.value and c.expiresAt into the callback, telling the callback to do the check, and calling it a day. In both cases Peek is not needed.
number: header.Number.Uint64(),
timestamp: time.Unix(int64(header.Time), 0).UTC(),
}
return anchor, b.nextExpectedBlockTime(anchor, 0), nil
I'm not sure I understand this very well. Intuitively I'm interpreting this as: the fetched block value will expire at the next upcoming block tick, but the whole change is about "caching" (or trusting the guessing of block numbers) in chunks, i.e. the expiration date is more like a next sync date.
Also, the passing of 0 here as a time value isn't clear - why?
After a brief discussion we decided to remove expiresAt, so nextExpectedBlockTime is not needed anymore and we can simplify the implementation.
minimumGasTipCap: int64(minimumGasTipCap),
blockTime: blockTime,
metrics: newMetrics(),
blockSyncInterval: blockSyncInterval,
We should prevent blockSyncInterval from being 0. Maybe set it to 1 in that case?
In pkg/transaction/wrapped/wrapped.go#L135 a division by zero can happen.
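The suggested guard could be as small as the sketch below. The function name is hypothetical; it only illustrates clamping a zero interval to 1 so the arithmetic in wrapped.go cannot divide by zero.

```go
package main

import "fmt"

// sanitizeBlockSyncInterval clamps a zero interval to 1, preventing a
// later division (or modulo) by zero on the sync-interval value.
func sanitizeBlockSyncInterval(interval uint64) uint64 {
	if interval == 0 {
		return 1
	}
	return interval
}

func main() {
	fmt.Println(sanitizeBlockSyncInterval(0))  // clamped
	fmt.Println(sanitizeBlockSyncInterval(10)) // passed through
}
```

Applying this once in the constructor keeps every later use of the field safe without scattering zero checks through the code.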
acud
left a comment
LGTM. As general feedback, I'd find it helpful to have a bit more comments in the code, as that makes reviews easier to do. They don't have to be long, but the general direction of what the code does or should do (method-level documentation) is generally good and gives the reader a bit more info while going through the code.
optionNameRedistributionAddress = "redistribution-address"
optionNameStakingAddress = "staking-address"
optionNameBlockTime = "block-time"
optionNameBlockSyncInterval = "block-sync-interval"
Please update the packaging folder with the new flag (in docker-compose,...)
# Conflicts:
#	packaging/docker/docker-compose.yml
#	packaging/docker/env
Checklist
Description
Removed redundant RPC patterns in the transaction flow. Added a cache layer for the BlockNumber RPC call.
Fixed lint issues in the new cache package and cleaned up minor lint findings in adjacent test/helper code.
Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Related Issue (Optional)
#5388
Screenshots (if appropriate):
Highly likely the hit ratio is low because of high-load errors (EOF), which is fixed by the PR.