New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Brotli Compression to CoreFX #24389
Comments
I don't think the non-streaming samples are illustrating what we are after. They will never go through more than one itteration of the loop. The right sample would be something like the following: public interface IOutput
{
Span<byte> Buffer { get; };
void Commit(int bytes);
void Resize(int minimumSize);
}
// This code is very naive, but it does illustrate a pipe scenario
public static void PipeCompress(ReadOnlyMemory<byte>[] inputs, IOutput output)
{
BrotliEncoder encoder;
for(int i=0; i<inputes.Length; i++) {
var input = inputs[i];
while (!input.IsEmpty)
{
var buffer = output.Buffer;
encoder.Compress(input, buffer, out int bytesConsumed, out int written);
output.Commit(written);
input = input.Slice(bytesConsumed);
}
}
encoder.Flush(output, out int bytesWritten, isFinished: true);
} |
The BrotliResult enum is very similar (if not identical) to the OperationStatus enum in the System.Buffers namespace that we use for our Base64 Encoder and Decoder. Can we use OperationStatus as the returned enum in Brotli as well? https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Buffers/OperationStatus.cs We should make sure we agree on the enum as part of this API review (leftover from previous API review: https://github.com/dotnet/corefx/issues/22412). |
@ahsonkhan where was the OperationStatus approved as API? I asked the question on #22845 but didn't get answer :) |
Oh, yes, I didn't see that OperationStatus was in CoreFX already. It is interchangeable with BrotliStatus. I'll update the proposal. |
Updated the spec with the more accurate use-case examples. |
I'm not convinced of the value of a struct-based encoder / decoder as opposed to a class-based encoder / decoder. At its core due to the interop you have a native handle under the covers. These handles need to have definite lifetimes, which means that somebody needs to make sure that handles are closed properly and that they're no longer used after they're closed. It's very difficult to make these guarantees with structs, and I'm afraid that we'll lead developers into a pit of failure if we go this direction. Indeed, this is why |
@GrabYourPitchforks I tend to agree. I'd prefer to see the internal handle represented by a That said, I do see the value in being allocation free and that's why I made it a struct. I tried to lessen the potential for erroneous code by making it an IDisposable struct, though it's still on the developer to actually do the disposal themselves. I'd be interested to hear other people's opinions on this. Historically, using structs in this manner has been a big no-no and is warned against in almost every article I could find, so whether the minor perf gain is worth the cost is questionable. I'll run some tests and see how much of a hit we take if we make it into a class instead. |
Even if there is a measurable cost to switching to class (and that would honestly surprise me), callers can work around it by pooling encoder / decoder instances. There's nothing that requires the type be a struct in order to get high performance. As an aside, I'm seeing that with this and other proposals we're trying to invent concepts analogous to C++'s deterministic destruction and non-copyable types. If this is something we think might be valuable to C# developers, we should propose these features directly, and then we can create types that are properly implemented on top of these features. |
It doesn't appear to have a significant impact that I could see in my testing which isn't too surprising. I'll update the proposal to be class based, unless someone feels strongly that it should be a struct. @KrzysztofCwalina ? |
It was implicitly approved as a placeholder to unblock the Base64 APIs and the final review was deferred, as far as I can recall, until we have more APIs (like the text encoders APIs, or in this case, compression APIs) to get a better understanding of the use cases. That is why I brought it up here since now we have more scenarios to confirm the usefulness/correctness of the enum. |
We discussed the struct/class and non-allocating/allocating concern in-person and the ending compromise is to leave it as an IDiposable struct, but use a Safehandle for the state pointer. Does anyone else have any more feedback before I mark this as ready for review? |
cc: @terrajobst |
Regarding OperationStatus, something to consider is that we already have an enum with a similar name in System.Net.NetworkInformation - OperationalStatus. |
@ahsonkhan, good point. Maybe we should call it OperationResult |
FYI: First part of the API review discussion on 2017/12/19 was recorded - see https://youtu.be/ZrT0uOsqQlI?t=6541 (5 min duration) |
We reviewed the API and made some minor tweaks: public partial class BrotliStream : Stream
{
public BrotliStream(Stream stream, CompressionLevel compressionLevel);
public BrotliStream(Stream stream, CompressionLevel compressionLevel, bool leaveOpen);
public BrotliStream(Stream stream, CompressionMode mode);
public BrotliStream(Stream stream, CompressionMode mode, bool leaveOpen);
public Stream BaseStream { get; }
// We don't think we need those for now. Not having aligns the type with DeflateStream/GzipStream
// public BrotliStream(Stream stream, CompressionLevel compressionLevel, bool leaveOpen, int bufferSize);
// public BrotliStream(Stream stream, CompressionMode mode, bool leaveOpen, int bufferSize);
// Overrides omitted for clarity
}
// We discussed making these guys classes and derive from CFO ourselves instead of wrapping a SafeHandle
// but we decided against it due to complexity and only minor savings (like when these are boxed)
public struct BrotliDecoder : IDisposable
{
public OperationStatus Decompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten);
public void Dispose();
public static bool TryDecompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
}
public struct BrotliEncoder : IDisposable
{
BrotliEncoder(int quality, int window);
public OperationStatus Compress(ReadOnlySpan<byte> source,
Span<byte> destination,
out int bytesConsumed,
out int bytesWritten,
bool isFinalBlock);
public OperationStatus Flush(Span<byte> destination, out int bytesWritten);
public void Dispose();
public static bool TryCompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
public static bool TryCompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, int quality, int window);
public static int GetMaxCompressedLength(int length);
} |
FYI: Second part of API review discussion on 2017/12/19 was recorded - see https://www.youtube.com/watch?v=IIKqOagRdWA (1h 43min duration) (video is mostly frozen) |
which version of System.IO.Compression has BrotliStream? because I could not find it @ianhays |
@guylando BrotliStream is in its own assembly (System.IO.Compression.Brotli.dll) that is included in framework versions of 2.1 preview1 forward. For example, |
@ianhays what is the minimal nugget package to reference to get it? |
Essentially, yeah. A few versions back it was a bundling of packages or a meta-package with a large dependency list, but now it's a primary shipping vehicle that contains most of the framework. We don't ship Brotli as a standalone package because it's included in the framework package and we don't ship it downlevel either. cc: @weshaggard |
@ianhays Do I understand correctly that its a best practice to include Microsoft.NETCore.App instead of separate packages? Our projects are (3 years old) from before the existence of Microsoft.NETCore.App so when upgrading in the past I left them as is and never moved to Microsoft.NETCore.App because sounded better to leave as is. |
@guylando you will be on supported & tested combination of packages, which makes it actually less likely for you to hit issues. |
And does Microsoft.AspNetCore.App include it also? or only Microsoft.NetCore.App? thanks! |
If you are targeting .NET Core then yes that is the correct thing to do. You don't actually need to reference the package directly in your application, the reference you get is controlled by the TargetFramework you are using. You will need to target netcoreapp2.1 to get these new APIs.
Microsoft.AspNetCore.App depends on Microsoft.NetCore.App so it will be part of the closure. |
And is it possible to get Brotli if I target net461? |
No we don't ship it as a library package it is only available for .NET Core 2.1+ |
thanks for the answers! |
@weshaggard, @ianhays so brotli is not netstandard? and never will be? it is only on .net core? If I would like to include it into library that is available for netstandard, netcore, full framework it won't be possible? Could you explain decision behind that? the corefx-lab version works with netstandard, why final (preview) release not? Because of that decision I'm not able to use Brotli in Asp NET Core targeted full framework. |
That is correct. It is only supported on .NET Core today. That isn't to say it will never be part of the .NET Standard or built as a netstandard library but there are various technical issues that come when you ship a OOB (Out-of-Band) library like this when it exists inbox for a platform like .NET Core or .NET Framework, which is why we don't currently support that scenario. If using this Brotli library on .NET Framework is a scenario you would like to see supported I suggest filing a issue for it and the owners of the library will take it under consideration. |
@weshaggard Thank you for answer, I'm a bit disappointed because corefx lab version is targeted netstandard1.1. |
@msmolka I understand your frustration but there are reasons for our choices. Keep in mind that corefxlab is for prototypes and experimenting so they do what is the quickest to get the job done and then once we decide to actually ship a library it moves to corefx and gets designed with all the different contexts in mind. In this case we decided it is best for Brotli to ship as part of the platform so that we can take advantage of it in our lower networking layer which is also part of the platform. Given that we made it part of the platform we decided not to ship it as an OOB library because of issues that occur with OOBs (see https://github.com/dotnet/corefx/issues/17522 for a hint of some similar troubles we had with System.Net.Http). Please do file an separate issue if you'd like to see Brotli supported on .NET Framework and we will consider supporting it. |
Hello,
The return is false and destination is empty. Thanks |
@ricardobeckssa The |
@khellang how can I know the space before compress? what's space should I to inform? |
|
@khellang thanks, I did:
|
FYI, you can abbreviate the following: To the following, relying on the implicit cast: ReadOnlySpan<byte> source = data; // or data.AsSpan(); |
System.IO.Compression.Brotli
Introduction
Brotli is a generic-purpose lossless compression algorithm that compresses data
using a combination of a modern variant of the LZ77 algorithm, Huffman coding
and 2nd order context modeling, with a compression ratio comparable to the best
currently available general-purpose compression methods. It is similar in speed
to deflate but offers more dense compression.
The specification of the Brotli Compressed Data Format is defined in RFC 7932.
Brotli encoding is supported by most web browsers, major web servers, and some CDNs (Content Delivery Networks).
BrotliStream
Proposed API
The API surface area for BrotliStream is identical to that of DeflateStream but with added
bufferSize
constructors.Example Usage
The BrotliStream behavior is the same as that of DeflateStream or GZipStream to allow easily converting DeflateStream/GZipStream code to use BrotliStream.
BrotliEncoder & BrotliDecoder
Proposed API
The goal of the streamless implementation is to provide a non-allocating, performant Brotli implementation free from Streams. It contains simple Compress/Decompress operations that return an enum indicating the success of the operation as well as static CompressFully/DecompressFully operations that allow single-pass compression/decompression without the need for a BrotliEncoder/BrotliDecoder instance.
Design Questions
Should we allow setting the Quality/Window via
Set_
functions of make them constructor variables? They must be set before encoding either way.BrotliEncoder SetQuality/SetWindows vs constructor overloads:
Flush vs Finalize
Should there be an option for intermediate flushes or only for finalize? The main use case of an intermediate Flush is if you want to get more of the outputted bytes but aren’t yet done supplying input to the compressor.
Allow input to Flush/Finalize?
I prefer the simpler Flush/Finalize that don’t take input, but the underlying call allows input if we decide that’s more usable. If we go that route then we could potentially just condense the API down to one function.
Naming
Example Usage
Implementation
The implementation will be based around the c code provided by Google that will be inserted into our existing native Compression libraries (clrcompression (Windows) and System.IO.Compression.Native (Unix). In CoreFX we'll have a managed wrapper to pinvoke into the native brotli implementation and provide the above API around it, same as we do for zlib. See dotnet/corefxlab#1673 for a discussion on the pros of cons of a fully managed implementation and my justification for using the native approach (at least for now). Performance testing to come later, with the implementation PR.
This proposal is an evolution of the CoreFXLab implementation of Brotli.
This is a component of https://github.com/dotnet/corefx/issues/24826
PTAL: @joshfree @KrzysztofCwalina @GrabYourPitchforks @ViktorHofer @stephentoub @terrajobst @ahsonkhan @JeremyKuhne
The text was updated successfully, but these errors were encountered: