ShenDoc 30


 

 

Byte Streams and Character Streams

Shen streams are byte streams (a byte usually being 8 bits), and the fundamental operations are 'read-byte' and 'write-byte', which read bytes from and write bytes to streams. Since Shen reads and writes bytes by their decimal representations, the legitimate values for these functions with an 8 bit byte lie in the range 0 to 255.
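As a minimal sketch of the two operations, the following function writes three bytes to a file and reads the first one back. The function name 'byte-round-trip' and the file name are hypothetical, chosen only for illustration.

```shen
\\ Hypothetical helper: write the bytes 72, 105 and 10 to File,
\\ then reopen File and return the first byte read from it.
(define byte-round-trip
  File -> (let Out (open File out)
               W1 (write-byte 72 Out)
               W2 (write-byte 105 Out)
               W3 (write-byte 10 Out)
               C1 (close Out)
               In (open File in)
               Byte (read-byte In)
               C2 (close In)
            Byte))
```

Calling (byte-round-trip "bytes.tmp") should return 72, the decimal representation of the first byte written.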

The relation between the bytes traded and the characters represented is called an encoding. Extended ASCII (256 characters) is an encoding which maps each byte to one character. ASCII (128 characters) is the character set representable in 7 bits, and it is the minimum range of characters that any Shen platform must support.
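The mapping between unit strings and their numeric codes can be illustrated with the primitives 'string->n' and 'n->string'. The sketch below assumes a string whose characters all lie in the ASCII range, so that each code is a legitimate byte; the name 'sketch-string->bytes' is hypothetical.

```shen
\\ Hypothetical helper: map a string to the list of codes of its unit strings.
\\ For characters outside the range 0 to 255 the result is not a list of bytes.
(define sketch-string->bytes
  String -> (map (/. Unit (string->n Unit)) (explode String)))
```

Here (sketch-string->bytes "Shen") returns [83 104 101 110], and (n->string 83) returns "S".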

Ultimately all streams traffic in bytes, because this is the lingua franca of the computer. Some implementations (SBCL being one) allow the programmer to read a stream with both character and byte operations, but this cannot be assumed.

In many languages (e.g. C, Java, Python, CLisp) the standard input and standard output streams are character streams; that is, the programmer may direct only characters or strings to be read from or sent to them and not bytes. Since Shen works in bytes, for these character-oriented platforms some adaptation is needed to glue the Shen reader and writer to the standard input and output. Prior to 2021, this was achieved by ad hoc adjustments to the kernel code. Shen does not incorporate characters as a data type; their place is taken by unit strings.

In 2021, a more structured approach to this problem was introduced, based on four primitives directed towards languages which support character based standard streams. These primitives are internal to the Shen package (they are not designed for the user) and are prefixed by 'shen.'. They are used in the kernel code to glue byte-oriented operations to character streams. They are as follows.

1. A 1-place primitive function 'shen.char-stinput?' which takes a stream as an argument and returns true if it is a character based standard input.
2. An equivalent function 'shen.char-stoutput?' which takes a stream as an argument and returns true if it is a character based standard output.
3. A 1-place function 'shen.write-string' which prints a string to the standard output.
4. A 1-place function 'shen.read-unit-string' that reads a unit string from the standard input.

These functions operate to define 'pr' and 'read-byte' as follows.

procedure pr (String, Stream)

1.	If Stream is a character based standard output stream 
	a.	write String to Stream using 'shen.write-string'.
2.	Else 
	a.	convert String to a list L of unit strings.
	b.	map each element of L to a byte, giving a list of bytes B based on extended ASCII encoding.
	c.	write each element of B to Stream using 'shen.write-byte'.
3.	Return String.

procedure read-byte (Stream)

1.	If Stream is a character based standard input stream
	a.	read one unit string off the Stream using 'shen.read-unit-string' and map it to a byte.
2.	Else just read the byte.
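The two procedures can be sketched in Shen as follows. This is an illustration under stated assumptions and not the kernel source: the names 'sketch-pr', 'sketch-write-units' and 'sketch-read-byte' are hypothetical, the byte branch uses the 'write-byte' operation described at the start of this section, the characters are assumed to lie within the extended ASCII range, and the four 'shen.' primitives are assumed to be defined as the branches require on the platform in use.

```shen
\\ Hypothetical sketch of procedure pr.
\\ Step 1: a character based standard output receives the whole string.
\\ Step 2: otherwise the string is split into unit strings and each
\\ unit string is mapped to a byte and written to the stream.
\\ Step 3: the string is returned in both cases.
(define sketch-pr
  String Stream -> (do (shen.write-string String) String)
                     where (shen.char-stoutput? Stream)
  String Stream -> (do (sketch-write-units (explode String) Stream) String))

(define sketch-write-units
  [] _ -> []
  [Unit | Units] Stream -> (do (write-byte (string->n Unit) Stream)
                               (sketch-write-units Units Stream)))

\\ Hypothetical sketch of procedure read-byte.
\\ Step 1: a character based standard input yields a unit string,
\\ which is mapped to a byte. Step 2: otherwise the byte is read directly.
(define sketch-read-byte
  Stream -> (string->n (shen.read-unit-string))
              where (shen.char-stinput? Stream)
  Stream -> (read-byte Stream))
```

On a file stream, which is not a standard stream, both functions take their second clause, so (sketch-pr "Hi" Out) writes the bytes 72 and 105 to Out and returns "Hi".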

Since reading a byte is a very low level operation which is used intensively in reading files, testing on every byte read whether the input stream is character based would be unacceptably slow. The kernel code therefore tests the stream once to determine if it is character based, and then uses step 1a of 'read-byte' if it is and step 2 if it is not.
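One way to picture this single test is to choose a reading function once and reuse it for every byte. The sketch below is an assumption about how this could be arranged, not the kernel code, and all three function names are hypothetical. It relies on 'read-byte' returning -1 at the end of a byte stream; the end-of-input behaviour of 'shen.read-unit-string' is not specified here, so the loop is only shown terminating for byte streams.

```shen
\\ Hypothetical: test the stream once and return the matching reader.
(define sketch-choose-reader
  Stream -> (/. S (string->n (shen.read-unit-string)))
              where (shen.char-stinput? Stream)
  _ -> (/. S (read-byte S)))

\\ Hypothetical: read every byte using the reader chosen above.
(define sketch-read-bytes
  Stream -> (sketch-read-loop (sketch-choose-reader Stream) Stream []))

\\ Accumulate bytes until the reader signals the end of the stream with -1.
(define sketch-read-loop
  Reader Stream Acc -> (let Byte (Reader Stream)
                          (if (= Byte -1)
                              (reverse Acc)
                              (sketch-read-loop Reader Stream [Byte | Acc]))))
```

The test for a character based stream happens once, in 'sketch-choose-reader'; the loop itself only applies the chosen function.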

In Shen implementations which run purely off byte streams, these four intermediary functions are trivially definable or redundant. The functions 'shen.char-stinput?' and 'shen.char-stoutput?' need to be defined for byte stream platforms unless the kernel code is adjusted to remove them, but their definition is trivial.

(define shen.char-stinput?
   _ -> false)

(define shen.char-stoutput?
   _ -> false)

For purely byte stream oriented implementations 'shen.write-string' and 'shen.read-unit-string' need not be defined since the control paths containing calls to these functions can never be activated. Thus not all implementations of Shen will support the internal functions described in this section, since their utility is internal to the configuration of the platform.


Built by Shen Technology (c) Mark Tarver, September 2021