tacker

a simple web bundler
git clone https://tongong.net/git/tacker.git

commit b72c2c33a1df9d77d67fb77d52ea7c67a02e379f
parent 17839623927e816114d4f6858520958748d0ede2
Author: tongong <tongong@gmx.net>
Date:   Sat,  2 Jul 2022 12:19:46 +0200

readme update & bugfixes

Diffstat:
M README.md | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
M bundle_html.ha | 4 ++--
M bundle_js.ha | 6 +++++-
M main.ha | 4 ++--
M path_helpers.ha | 13 +++++++++----
M test-page/a.js | 6 ++++++
A test-page/b.js | 4 ++++
A test-page/c.js | 2 ++
M test-page/index.html | 2 ++
9 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/README.md b/README.md
@@ -2,15 +2,20 @@
 
 `tacker` takes your files and staples them together. The goal of this project
 is to be a simple web bundler independent of the disaster that is the modern
-npm ecosystem. Advanced bundling and optimization techniques are not in scope
+npm ecosystem. The main use case of `tacker` is bundling single page
+applications into a single `.html` file for easier distribution. You can of
+course also use it to quickly get access to modularity when developing
+userscripts, to inline images into your static page, etc.
+
+Advanced bundling and optimization techniques are not in scope
 of `tacker` - try one of the bloated mainstream bundlers instead:
 - webpack (75 dependencies)
 - parcel (184 dependencies)
 - browserify (175 dependencies)
 - ...
 
-`tacker` was written as an experimental project in the new `hare` programming
-language.
+`tacker` was written as an experimental project in the new
+[hare programming language](https://harelang.org/).
 
 ## features
 - entrypoints:
@@ -23,11 +28,25 @@ language.
     - other style sheets (`@import url(...)`)
     - binary data as base64 (e.g. `background-image: url(...)`)
 - JS
-    - a subset of CommonJS modules
+    - a subset of CommonJS modules (see below for important drawbacks)
         - `require(...)`
         - `module.exports` and `exports`
-    - binary data as base64 through custom `requireBinary(...)` function
 
+The "conceptual module name space root" is the working directory. This means
+that required paths which are not relative are resolved from the cwd. For
+security reasons only files in the cwd can be bundled. This can be changed with
+the `-p` option. Input and output file name stay relative to the cwd. The
+`.js` in `require()` imports is optional.
+
+`tacker` does not aim to be 100% spec-compliant. The goal is to work in all
+common scenarios without laying to much emphasis on obscure edge cases. It is a
+tacker after all - not an industrial robot. Though unlike a real-world tacker
+your security should not be at hazard. Malicious source files can obviously
+take over your bundled page but they can never take over your system.
+
+## known bugs & missing features
+
+### require()
 CommonJS was chosen out of personal preference and its simplicity compared to
 ES Modules (tree-shaking optimizations enabled by ES Modules would not be
 implemented either way). The parser is rather simple though. To confirm to the
@@ -37,16 +56,42 @@ value and not special syntax the same is the case for the whole program as
 every function could be possibly rebound to `require`. This requires the
 complete execution of the program at bundle-time to be able to reason about
 possible aliases to `require`. This is impossible and thus `require()` will be
-treated as special syntax. This implementation is thus wrong but should work
-for every sane usage of `require()`.
+treated as special syntax. This implementation (and in fact every CommonJS
+bundler) is thus wrong but should work for every sane usage of `require()`.
 
-The "conceptual module name space root" is the working directory. This means
-that required paths which are not relative are resolved from the cwd. For
-security reasons only files in the cwd can be bundled. This can be changed with
-the `-p` option. Input and output file name stay relative to the cwd.
+The `require()` macro expects a string literal with single or double quotes as
+single argument. Whitespace between `(` and `"`/`'` or between `"`/`'` and `)`
+is forbidden. Currently no escape sequences are allowed as this would add a lot
+of complexity and is not needed for sane file names. This feature may be added
+in the future.
 
-`tacker` does not aim to be 100% spec-compliant. The goal is to work in all
-common scenarios without laying to much emphasis on obscure edge cases. It is a
-tacker after all - not an industrial robot. Though unlike a real-world tacker
-your security should not be at hazard. Malicious source files can obviously
-take over your bundled page but they can never take over your system.
+Correctly expanding the `require()` macro requires recognizing string
+literals (to not cause bugs by changing string content). This in turn requires
+correctly recognizing regex literals as they could contain quote characters and
+as far as I know this requires parsing the whole AST (how to decide if `/5/` is
+a regex or part of an arithmetic expression?). A similar problem arises for
+template literals. To avoid this complexity `tacker` only reads until reaching
+the first string, regex or template literal. This means that module imports
+have to be at the top of each source file which is the case already for most
+projects. All potentially skipped `require()` calls will be announced as a
+warning.
+
+### script end tags & regex literals
+When inlining javascript in html, the script cannot contain script end tags
+(`</script>`). To handle this all occurrences of `</script` will be replaced by
+`<\/script`. This works in string literals and comments and should never occur
+in normal code. I am however not sure about regex literals - there could be
+very rare edge cases where these break.
+
+### external resources
+Bundling of external scripts, images, etc. is currently forbidden and `tacker`
+will throw an error. There are two alternative behaviors:
+
+1. Bundling the external resource: I think it is a bad idea to bundle random
+   assets from the internet.
+2. Allowing references to external resources without bundling them: This would
+   be a better way of handling external resources but it creates a runtime
+   dependency which is not very sustainable considering link rot.
+
+It would be possible to enable behavior (2) via a command argument flag but I
+currently do not see the point in implementing this feature.
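
The argument form that the README text in this diff requires of `require()` (one string literal in single or double quotes, no whitespace inside the parentheses, no escape sequences) can be sketched as a regular expression. This is only an illustration in JavaScript, not tacker's Hare parser: the names `REQUIRE_CALL` and `findRequires` are hypothetical, and the sketch deliberately ignores the rule that scanning stops at the first string, regex or template literal.

```javascript
// Hypothetical sketch, not tacker's actual implementation.
// Matches require("...") or require('...') where:
//   - the quote follows "(" directly and ")" follows the closing quote directly,
//   - opening and closing quote are the same character,
//   - the literal contains no quotes, backslashes (escape sequences) or newlines.
const REQUIRE_CALL = /\brequire\((["'])([^"'\\\n]*)\1\)/g;

// Returns the module paths of all calls that have the accepted form.
// Assumption: unlike tacker, this scans the whole source and would also
// match inside string literals and comments.
function findRequires(source) {
    const found = [];
    for (const match of source.matchAll(REQUIRE_CALL)) {
        found.push(match[2]);
    }
    return found;
}

const sample = [
    'const b = require("./b");',
    "const c = require('./c.js');",
    'const d = require( "./d" );', // whitespace inside the parentheses: rejected
].join("\n");
console.log(findRequires(sample)); // [ './b', './c.js' ]
```

A call that does not have this form is simply not matched here; according to the README text, tacker reports potentially skipped `require()` calls as warnings instead.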
diff --git a/bundle_html.ha b/bundle_html.ha
@@ -55,7 +55,7 @@ fn tacker_html(inputpath: str, ofile: io::handle) void = {
 
 					const src = resolve_path(src, inputpath);
 					defer free(src);
-					tacker_js(src, ofile);
+					tacker_js(src, ofile, true);
 					fmt::fprint(ofile, "</script>")!;
 				};
 			} else if (m == 2) {
@@ -86,7 +86,7 @@ fn tacker_html(inputpath: str, ofile: io::handle) void = {
 
 					const href = resolve_path(href, inputpath);
 					defer free(href);
-					tacker_js(href, ofile);
+					tacker_css(href, ofile);
 					fmt::fprint(ofile, "</style>")!;
 				};
 			} else {
diff --git a/bundle_js.ha b/bundle_js.ha
@@ -2,7 +2,11 @@ use fmt;
 use io;
 use os;
 
-fn tacker_js(inputpath: str, ofile: io::handle) void = {
+// html: true if the output can be inlined in a html script tag. This is
+// important because code like e.g.
+//     let tag = "</script>";
+// has to be escaped.
+fn tacker_js(inputpath: str, ofile: io::handle, html: bool) void = {
 	const ifile = os::open(inputpath)!;
 	defer io::close(ifile)!;
 	// TODO
diff --git a/main.ha b/main.ha
@@ -51,12 +51,12 @@ export fn main() void = {
 
 	let extstart = lastdotindex(ifile);
 	if (extstart == -1)
-		fmt::fatalf("file \"{}\" has broken filetype.", ifile);
+		fixed_fatalf("file \"{}\" has broken filetype.", ifile);
 	let ext = strings::fromutf8(strings::toutf8(ifile)[(extstart + 1)..]);
 
 	switch (ext) {
 	case "html" => tacker_html(ifile, ofile);
-	case "js" => tacker_js(ifile, ofile);
+	case "js" => tacker_js(ifile, ofile, false);
 	case "css" => tacker_css(ifile, ofile);
 	case => fixed_fatalf("unknown filetype: \"{}\".", ifile);
 	};
diff --git a/path_helpers.ha b/path_helpers.ha
@@ -1,4 +1,3 @@
-use fmt;
 use fs;
 use os;
 use slices;
@@ -23,7 +22,8 @@ fn realpath_resolve(path: str) str = {
 	const p = match (os::realpath(path)) {
 	case let p: str => yield p;
 	case let p: fs::error =>
-		fmt::fatalf("path \"{}\" does not exist.", path);
+		fixed_fatalf("path \"{}\" does not exist.", path);
+		yield ""; // unreachable
 	};
 	return os::resolve(p);
 };
@@ -32,7 +32,12 @@ fn realpath_resolve(path: str) str = {
 // from: path to the file (or directory) where the reference was found.
 // Return value has to be freed.
 fn resolve_path(path: str, from: str) str = {
-	// directory path is relativ to
+	if (strings::hasprefix(path, "http://") ||
+			strings::hasprefix(path, "https://")) {
+		fixed_fatalf("bundling of external resources is not allowed: \"{}\".",
+			path);
+	};
+	// directory path is relativ to base
 	// ends with "/"
 	const base = if (strings::hasprefix(path, "./") ||
 			strings::hasprefix(path, "../")) {
@@ -44,7 +49,7 @@ fn resolve_path(path: str, from: str) str = {
 	defer free(r);
 	const r = strings::dup(realpath_resolve(r));
 	if (!strings::hasprefix(r, basepath))
-		fmt::fatalf("file path \"{}\" violates the base path \"{}\".",
+		fixed_fatalf("file path \"{}\" violates the base path \"{}\".",
 			r, basepath);
 	return r;
 };
diff --git a/test-page/a.js b/test-page/a.js
@@ -1,4 +1,10 @@
 // let testm = require("./b.js")
 // console.log(testm.hello());
 
+let r = "this require('b.js') will not be macro-expanded.";
 console.log("hi from an imported script!");
+
+function a() {
+    // this should throw a warning
+    console.log(require("test"));
+};
diff --git a/test-page/b.js b/test-page/b.js
@@ -0,0 +1,4 @@
+module.exports = {
+    hello: () => ":)",
+    c: require("./c"),
+}
diff --git a/test-page/c.js b/test-page/c.js
@@ -0,0 +1,2 @@
+console.log(require("./a.js"));
+exports.msg = ":)";
diff --git a/test-page/index.html b/test-page/index.html
@@ -13,5 +13,7 @@
     <h1>test page</h1>
     a nice image:
     <img src=./example.png alt="example image"/>
+    <!-- uncomment to test external resources ban -->
+    <!-- <img src="https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2.png"/> -->
 </body>
 </html>
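
The two checks that this commit touches in `resolve_path` (rejecting `http://`/`https://` references and refusing paths outside the base path) can be illustrated with a rough JavaScript sketch. It is not a translation of the Hare code: `checkBundlePath` and its parameters are hypothetical names, the error handling via exceptions is an assumption, and the resolution is purely lexical, whereas the original calls `os::realpath` and therefore also fails on non-existent paths.

```javascript
// Hypothetical sketch using Node's POSIX path helpers; not tacker's code.
const path = require("node:path").posix;

// ref:      the path as written in the source file (e.g. in require() or src=)
// fromFile: absolute path of the file that contains the reference
// basePath: absolute directory that bundled files must stay inside
function checkBundlePath(ref, fromFile, basePath) {
    // External resources are rejected outright, mirroring the new check.
    if (ref.startsWith("http://") || ref.startsWith("https://")) {
        throw new Error(`bundling of external resources is not allowed: "${ref}".`);
    }
    // "./" and "../" references are relative to the referencing file's
    // directory; everything else is resolved from the base path.
    const isRelative = ref.startsWith("./") || ref.startsWith("../");
    const start = isRelative ? path.dirname(fromFile) : basePath;
    const resolved = path.resolve(start, ref);

    // Compare against the base path plus a separator so that a sibling
    // directory such as "/srv/site-old" does not pass for "/srv/site".
    const root = path.resolve(basePath);
    if (resolved !== root && !resolved.startsWith(root + path.sep)) {
        throw new Error(`file path "${resolved}" violates the base path "${root}".`);
    }
    return resolved;
}
```

For example, `checkBundlePath("./b.js", "/srv/site/js/a.js", "/srv/site")` returns `/srv/site/js/b.js`, while `../../etc/passwd` from the same file resolves outside `/srv/site` and throws.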