void09: btw, this isn't to say there's never a reason to want to know exactly how a file was turned into a DAG (i.e. chunker, layout, etc.). However, those use cases tend to be the exception rather than the rule.
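To make the chunker point concrete, here is a rough sketch of why the same bytes can end up with different root hashes. It is a toy, not how IPFS/UnixFS actually builds a DAG or computes a CID: `toy_root_hash` and the chunk sizes are made up for illustration, and it just hashes fixed-size chunks and then hashes the concatenated chunk digests.

```python
import hashlib


def toy_root_hash(data: bytes, chunk_size: int) -> str:
    """Hash each fixed-size chunk, then hash the concatenated chunk digests.

    Toy stand-in for a DAG root. This is NOT the real UnixFS/CID algorithm.
    """
    leaves = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(leaves)).hexdigest()


data = b"hello world" * 1000

# Same input bytes, two different (arbitrary) chunk sizes.
small = toy_root_hash(data, 256)
large = toy_root_hash(data, 1024)

print(small)
print(large)
print("same root:", small == large)
```

The input never changes, only the chunking does, which is why two people adding the same file with different settings can legitimately get different identifiers.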

There are certainly some interesting questions around how to deal with existing data where the publisher has not published a CID (e.g. Canonical doesn't publish a CID for Ubuntu ISOs). However, the answers are generally non-trivial.

For example, a simple text file might look the same on Windows and Unix even though its line endings differ. While the "essence" of the content is the same, the files are legitimately not the same. Should these two text files be canonically encoded before being added to IPFS? Do we remove all the Windows line endings? At some point these questions become very case-specific, which makes it clearer why a lower-level protocol like IPFS would try to leave things up to application developers and users instead of prescribing things like "all text files must have Unix line endings".
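A quick sketch of the line-ending case, using plain SHA-256 as a stand-in for content addressing (this is not the IPFS CID computation). The `normalize_newlines` helper is a hypothetical application-level policy, not something IPFS does for you.

```python
import hashlib

# The "same" two-line text file as saved on Unix and on Windows.
unix_text = b"line one\nline two\n"
windows_text = b"line one\r\nline two\r\n"


def digest(data: bytes) -> str:
    """SHA-256 of the raw bytes, standing in for a content address."""
    return hashlib.sha256(data).hexdigest()


def normalize_newlines(data: bytes) -> bytes:
    """Hypothetical application policy: convert CRLF line endings to LF."""
    return data.replace(b"\r\n", b"\n")


# The raw bytes differ, so the digests differ.
print(digest(unix_text) == digest(windows_text))  # False

# Only after the application opts into normalization do they match.
print(digest(unix_text) == digest(normalize_newlines(windows_text)))  # True
```

Whether that normalization step is appropriate depends entirely on the application, which is the reason it sits above the protocol rather than inside it.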