Byte Streams and Character Streams
Shen streams are byte (usually 8 bit) streams, and the fundamental operations are 'read-byte' and 'write-byte', which read and write bytes from and to streams. Shen reads and writes bytes by their decimal representations, so for an 8 bit byte the legitimate values for these functions lie in the range 0 to 255.
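The byte-level view can be modelled outside Shen. The following is a minimal Python sketch, not Shen kernel code: the names write_byte and read_byte are hypothetical stand-ins for 'write-byte' and 'read-byte', and returning -1 at the end of the stream is a convention assumed by this sketch.

```python
import io

def write_byte(byte, stream):
    """Write one byte, given as an integer in 0..255, and return it."""
    if not 0 <= byte <= 255:
        raise ValueError(f"not an 8 bit byte: {byte}")
    stream.write(bytes([byte]))
    return byte

def read_byte(stream):
    """Read one byte as an integer; -1 marks end of stream (sketch convention)."""
    chunk = stream.read(1)
    return chunk[0] if chunk else -1

stream = io.BytesIO()            # an in-memory byte stream
for b in (0, 65, 255):           # the extremes of the range plus the code for "A"
    write_byte(b, stream)
stream.seek(0)                   # rewind so the bytes can be read back
values = [read_byte(stream) for _ in range(4)]
```

Reading four times from a stream that holds three bytes shows both the integer values coming back unchanged and the end-of-stream marker.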
The relation between the bytes traded and the characters represented is called an encoding. Extended
ASCII (256 characters) encodes each character as a single byte. ASCII (128 characters) is the character set
representable in 7 bits, and it is the minimum range of characters that any Shen platform must support.
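As an illustration of a one-byte-per-character encoding, here is a small Python sketch. It assumes Latin-1 (ISO 8859-1) as a stand-in for "extended ASCII", since that term covers several 8 bit encodings; the function names are hypothetical.

```python
def byte_to_char(byte):
    """Map a byte (0..255) to its one-character string under Latin-1."""
    return bytes([byte]).decode("latin-1")

def char_to_byte(char):
    """Map a one-character string to its byte under Latin-1."""
    return char.encode("latin-1")[0]

def is_ascii(byte):
    """True if the byte fits in 7 bits, i.e. lies in the ASCII range."""
    return 0 <= byte < 128

# Every 8 bit value survives a round trip through the encoding.
round_trip_ok = all(char_to_byte(byte_to_char(b)) == b for b in range(256))
```

Under this assumption byte 65 corresponds to "A", which is ASCII, while byte 233 corresponds to "é", which lies outside the 7 bit range.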
Ultimately all streams traffic in bytes, because this is the lingua franca of the computer. Some implementations
(SBCL being one) allow the programmer to read a stream with both character and byte operations, but this cannot be assumed.
In many languages (e.g. C, Java, Python, CLisp) the standard input and standard output streams are character streams; that is, the programmer may direct only characters or strings to be read from or sent to them and not bytes. Since Shen works in bytes, some adaptation is needed on these character-oriented platforms to glue the Shen reader and writer to the standard input and output. Prior to 2021, this was achieved by ad hoc adjustments to the kernel code. Shen does not incorporate characters as a data type; their place is taken by unit strings.
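The mismatch can be seen in Python, where a text stream accepts strings but rejects bytes. The sketch below wraps an in-memory byte stream in a character view of the same kind that normally backs sys.stdout; the Latin-1 encoding is an assumption made for illustration.

```python
import io

raw = io.BytesIO()                                  # the underlying byte stream
text = io.TextIOWrapper(raw, encoding="latin-1")    # a character view on top of it

text.write("A")                  # strings are accepted by the character view
try:
    text.write(b"A")             # bytes are not
    rejected = False
except TypeError:
    rejected = True

text.flush()                     # push buffered characters down to the byte stream
written = raw.getvalue()         # only the string made it through, as one byte
```

A byte-oriented writer therefore cannot be pointed directly at such a stream, which is the gap the gluing code has to bridge.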
In 2021, a more structured approach to this problem was introduced in the form of four primitives directed towards languages which support character based standard streams. These primitives are internal to the Shen package (they are not designed for the user), so their names carry the 'shen.' prefix. They are used in the kernel code to glue byte-oriented operations to character streams. They are as follows.
1. A 1-place primitive function 'shen.char-stinput?' which takes a stream as an argument and returns true if it is a character based standard input.
2. A corresponding 1-place function 'shen.char-stoutput?' which takes a stream as an argument and returns true if it is a character based standard output.
3. A 1-place function 'shen.write-string' which prints a string to the standard output.
4. A 1-place function 'shen.read-unit-string' that reads a unit string from the standard input.
These functions are used to define 'pr' and 'read-byte' as follows.
procedure pr (String, Stream)
1. If Stream is a character based standard output stream
   a. write String to Stream using 'shen.write-string'.
2. Else
   a. convert String to a list L of unit strings,
   b. map each element of L to a list of bytes B based on extended ASCII encoding, and
   c. write each element of B to Stream using 'shen.write-byte'.
3. Return String.
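The pr procedure can be sketched in Python as follows. This is a model under stated assumptions, not the kernel definition: char_stoutput and write_string are hypothetical stand-ins for 'shen.char-stoutput?' and 'shen.write-string', the character test is simplified to "is this a Python text stream", write_string takes the stream explicitly so that the sketch can be exercised in memory, and Latin-1 stands in for extended ASCII.

```python
import io

def char_stoutput(stream):
    """Stand-in for shen.char-stoutput? (simplified: any text stream counts)."""
    return isinstance(stream, io.TextIOBase)

def write_string(string, stream):
    """Stand-in for shen.write-string; the stream argument is an assumption."""
    stream.write(string)

def write_byte(byte, stream):
    """Write one byte, given as an integer, to a byte stream."""
    stream.write(bytes([byte]))
    return byte

def pr(string, stream):
    if char_stoutput(stream):
        # Step 1: a character based stream takes the string as it is.
        write_string(string, stream)
    else:
        # Step 2a: split the string into unit strings.
        unit_strings = list(string)
        # Step 2b: map each unit string to a byte (Latin-1 assumed).
        byte_values = [unit.encode("latin-1")[0] for unit in unit_strings]
        # Step 2c: write the bytes one at a time.
        for byte in byte_values:
            write_byte(byte, stream)
    # Step 3: return the string.
    return string
```

Calling pr("hi", io.StringIO()) takes the first branch, while pr("hi", io.BytesIO()) takes the second and leaves the bytes 104 and 105 in the stream; both calls return "hi".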
procedure read-byte (Stream)
1. If Stream is a character based standard input stream
   a. read one unit string off the Stream using 'shen.read-unit-string' and map it to a byte.
2. Else just read the byte.
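A matching Python sketch of the read-byte procedure is given below, under the same kind of assumptions: char_stinput and read_unit_string are hypothetical stand-ins for 'shen.char-stinput?' and 'shen.read-unit-string', the stream is passed explicitly, Latin-1 stands in for extended ASCII, and -1 at end of stream is a convention of the sketch.

```python
import io

def char_stinput(stream):
    """Stand-in for shen.char-stinput? (simplified: any text stream counts)."""
    return isinstance(stream, io.TextIOBase)

def read_unit_string(stream):
    """Stand-in for shen.read-unit-string; returns "" at end of stream."""
    return stream.read(1)

def read_byte(stream):
    if char_stinput(stream):
        # Step 1a: read one unit string and map it to a byte (Latin-1 assumed).
        unit = read_unit_string(stream)
        return unit.encode("latin-1")[0] if unit else -1
    # Step 2: a byte stream yields the byte directly.
    chunk = stream.read(1)
    return chunk[0] if chunk else -1
```

With these definitions read_byte(io.StringIO("A")) and read_byte(io.BytesIO(b"A")) both give 65, so callers see the same integers whichever kind of stream sits underneath.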
Since reading a byte is a very low level operation which is used intensively in reading
files, testing for every byte read whether the input stream is character based is
unacceptably slow. Therefore the kernel code tests the stream once to determine if it is character
based, and then uses step 1a if it is and step 2 if it is not.
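One way to picture the test-once strategy is to choose the reading function when the stream is first inspected and to reuse it for every byte after that. The Python sketch below illustrates the idea only; it does not reproduce the kernel code, and byte_reader, read_all_bytes, the Latin-1 mapping and the -1 end marker are all assumptions of the sketch.

```python
import io

def byte_reader(stream):
    """Test the stream once and return a no-argument function that reads a byte."""
    if isinstance(stream, io.TextIOBase):
        def read():
            # Character based: read a unit string and map it to a byte.
            unit = stream.read(1)
            return unit.encode("latin-1")[0] if unit else -1
    else:
        def read():
            # Byte based: just read the byte.
            chunk = stream.read(1)
            return chunk[0] if chunk else -1
    return read

def read_all_bytes(stream):
    """Collect every byte on the stream as a list of integers."""
    read = byte_reader(stream)   # the only place the stream type is tested
    result = []
    byte = read()
    while byte != -1:
        result.append(byte)
        byte = read()
    return result
```

The loop in read_all_bytes performs no type test at all, so the per-byte cost is the same as on a platform that has only byte streams.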
In Shen implementations which run purely off byte streams, these four intermediary functions
are trivially definable or redundant. The functions 'shen.char-stinput?' and 'shen.char-stoutput?'
need to be defined for byte stream platforms unless the kernel code is adjusted to remove them, but
their definition is trivial.
(define shen.char-stinput?
  _ -> false)

(define shen.char-stoutput?
  _ -> false)
For purely byte stream oriented implementations 'shen.write-string' and 'shen.read-unit-string' need not be defined since the control paths containing calls to these functions can never be activated. Thus not all implementations of Shen will support the internal functions described in this section, since their utility is internal to the configuration of the platform.