Pascal Programming/Strings

imported>Kai Burghardt
Rejected the last 2 text changes (by 188.252.186.179) and restored revision 4247376 by ShakespeareFan00: This is no improvement. One edit will even introduce a programming error.
 

Latest revision as of 19:15, 18 June 2023

The data type string() is used to store a finite sequence of char values. It is a special case of an [[../Arrays|array]], but unlike a plain array of char, the data type string() has some advantages that facilitate its effective use.

The data type string() as presented here is an Extended Pascal extension, as defined in the ISO standard 10206. Due to its high relevance in practice, this topic has been put into the Standard Pascal part of this Wikibook, right after the chapter on [[../Arrays|arrays]].

Properties

Capacity

Definition

The declaration of a string data type always entails a maximum capacity:

program stringDemo(output);
type
	address = string(60);
var
	houseAndStreet: address;
begin
	houseAndStreet := '742 Evergreen Trc.';
	writeLn('Send complaints to:');
	writeLn(houseAndStreet);
end.

After the word string follows a positive integer number surrounded by parentheses. This is not a function call.[fn 1]

Implications

Variables of the data type address as defined above will only be able to store up to 60 independent char values. Of course it is possible to store fewer, or even 0, but once this limit is set it cannot be expanded.

Inquiry

String variables “know” about their own maximum capacity: If you use writeLn(houseAndStreet.capacity), this will print 60. Every string variable automatically has a “field” called capacity. This field is accessed by writing the respective string variable’s name and the word capacity joined by a dot (.). This field is read-only: You cannot assign values to it. It can only appear in expressions.

Length

All string variables have a current length. This is the total number of valid char values a string variable currently contains. To query this number, the EP (Extended Pascal) standard defines a new function called length:

program lengthDemo(output);
type
	domain = string(42);
var
	alphabet: domain;
begin
	alphabet := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
	writeLn(length(alphabet));
end.

The length function returns a non-negative integer value denoting the supplied string’s length. It also accepts char values.[fn 2] A char value has by definition a length of 1.

It is guaranteed that the length of a string variable will always be less than or equal to its corresponding capacity.
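The relationship between capacity and length can be tried out with a small program. The following is only a minimal sketch, assuming an Extended Pascal compliant compiler (for instance the GPC invoked with --extended-pascal); the program name is made up for illustration.

```pascal
program capacityAndLengthDemo(output);
type
	address = string(60);
var
	houseAndStreet: address;
begin
	houseAndStreet := '742 Evergreen Trc.';
	{ The capacity is fixed by the data type: 60. }
	writeLn('capacity: ', houseAndStreet.capacity:1);
	{ The length depends on the current value: 18 characters. }
	writeLn('length:   ', length(houseAndStreet):1);
	{ Assigning a shorter value changes the length, not the capacity. }
	houseAndStreet := '';
	writeLn('capacity: ', houseAndStreet.capacity:1);
	writeLn('length:   ', length(houseAndStreet):1);
end.
```

After the second assignment the length drops to 0, while the capacity remains 60.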

Compatibility

You can copy entire string values using the := operator provided the variable on the left-hand side (LHS) has the same or a greater capacity than the string expression on the right-hand side (RHS). This is different from a regular array’s behavior, which would require dimensions and size to match exactly.

program stringAssignmentDemo;
type
	zipcode = string(5);
	stateCode = string(2);
var
	zip: zipcode;
	state: stateCode;
begin
	zip := '12345';
	state := 'QQ';
	
	zip := state; // ✔
	// zip.capacity > state.capacity
	// ↯ state := zip; ✘
end.

As long as no clipping occurs, i. e. the omission of values because of a too short capacity, the assignment is fine.

Index

It is worth noting that otherwise strings are internally regarded as arrays.[fn 3] Like a [[../Arrays#Character array|character array]], you can access (and alter) every array element independently by specifying a valid index surrounded by brackets. However, there is a big difference with respect to the validity of an index. At any time, you are only allowed to specify indices that are within the range 1..length. This range may be empty, specifically if length is currently 0.
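A way to stay within the valid index range is to derive every index from length. The following is a minimal sketch under the assumption of an Extended Pascal compliant compiler; the identifiers are chosen for illustration only.

```pascal
program stringIndexDemo(output);
var
	language: string(20);
	i: integer;
begin
	language := 'pascal';
	{ Only indices in 1..length(language) are valid, so guard the access. }
	if length(language) >= 1 then
	begin
		language[1] := 'P';
	end;
	{ If the string were empty, this loop would not be entered at all. }
	for i := 1 to length(language) do
	begin
		writeLn(i:2, ': ', language[i]);
	end;
end.
```

Note that an index such as 7 would be invalid here, even though the capacity of language is 20, because the current length is only 6.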

Standard routines

In addition to the length function, EP also defines a few other standard functions operating on strings.

Manipulation

The following functions return strings.

Substring

In order to obtain just a part of a string (or char) expression, the function subStr(stringOrCharacter, firstCharacter, count) returns a sub-string of stringOrCharacter having the non-negative length count, starting at the positive index firstCharacter. It is important that firstCharacter + count - 1 is a valid character index in stringOrCharacter, otherwise the function causes an error.[fn 4]

For string variables, the subStr function is the same as specifying myString[firstCharacter..firstCharacter+count-1].[fn 5] Evidently, if the firstCharacter value is some complicated expression, the subStr function should be preferred to prevent any programming mistakes.

Furthermore, the third parameter to subStr can be omitted: This will simply return the rest of the given string starting at the position indicated by the second parameter.[fn 6]
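Both the three-parameter and the two-parameter form of subStr can be illustrated as follows. This is a sketch assuming an Extended Pascal compliant compiler; the program and variable names are not taken from any standard.

```pascal
program subStrDemo(output);
var
	alphabet: string(26);
begin
	alphabet := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
	{ Four characters starting at index 3: CDEF }
	writeLn(subStr(alphabet, 3, 4));
	{ A count of zero is allowed and yields an empty string. }
	writeLn('[', subStr(alphabet, 3, 0), ']');
	{ Third parameter omitted: the rest starting at index 24, i. e. XYZ }
	writeLn(subStr(alphabet, 24));
end.
```

A call such as subStr(alphabet, 25, 4) would be an error, because it reaches beyond the last character.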

Remove trailing spaces

The trim(source) function returns a copy of source without any trailing space characters, i. e. ' '. In left-to-right (LTR) scripts any blanks to the right are considered insignificant, yet in computing they take up (memory) space. It is advisable to prune strings before writing them, for example, to a disk or other long-term storage media, or before transmitting them via networks. Admittedly, memory requirements were a more relevant issue prior to the 21st century.
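The effect of trim is easiest to observe through length. The following sketch assumes an Extended Pascal compliant compiler; the string literal contains exactly five trailing spaces.

```pascal
program trimDemo(output);
var
	padded: string(20);
begin
	{ 'Pascal' followed by five space characters }
	padded := 'Pascal     ';
	{ Length including the trailing spaces: 11 }
	writeLn(length(padded):1);
	{ Length of the trimmed copy: 6 }
	writeLn(length(trim(padded)):1);
	{ The brackets make the absence of trailing spaces visible. }
	writeLn('[', trim(padded), ']');
	{ trim returns a copy; padded itself is unchanged. }
	writeLn('[', padded, ']');
end.
```

To actually shorten the variable, the result needs to be assigned back, as in padded := trim(padded).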

First occurrence of substring

The function index(source, pattern) finds the first occurrence of pattern in source and returns the starting index. All characters from pattern match the characters in source at the returned offset:

index    1 2 3 4 5 6 7
source   Z Y X Y X Y X
pattern      X Y X
index('ZYXYXYX', 'XYX') returns 3

Note, to obtain the second or any subsequent occurrence, you need to use a proper substring of the source.

Because the “empty string” is, mathematically speaking, present everywhere, index(source, '') always returns 1. Conversely, because a non-empty string cannot occur in an empty string, index('', pattern) with a non-empty pattern always returns 0, in the context of strings an otherwise invalid index. The value zero is returned whenever pattern does not occur in source. This will always be the case if pattern is longer than source.
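Finding a subsequent occurrence by searching a proper substring could look like the following. This is a sketch only, assuming an Extended Pascal compliant compiler; the variable names first, offset and second are hypothetical helpers.

```pascal
program occurrenceDemo(output);
var
	source: string(20);
	first, offset, second: integer;
begin
	source := 'ZYXYXYX';
	{ First occurrence: 3 }
	first := index(source, 'XYX');
	second := 0;
	{ subStr needs a valid start index, so check before searching the rest. }
	if (first > 0) and (first < length(source)) then
	begin
		{ Search the remainder that starts right after the first hit. }
		offset := index(subStr(source, first + 1), 'XYX');
		if offset > 0 then
		begin
			{ Translate the offset back to an index in source: 3 + 2 = 5 }
			second := first + offset;
		end;
	end;
	writeLn('first:  ', first:1);
	writeLn('second: ', second:1);
end.
```

Starting the second search at first + 1 also finds overlapping occurrences, as is the case here.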

Operators

The EP standard introduced an additional operator for strings of any length, including single characters. The + operator concatenates two strings or characters, or any combination thereof. Unlike the arithmetic +, this operator is non-commutative, which means the order of the operands matters.

expression result
Template:Nowrap 'Foobar'
Template:Nowrap ''
Template:Nowrap Template:Nowrap
concatenation samples

Concatenation is useful if you intend to save the data somewhere. Supplying concatenated strings to routines such as write/writeLn, however, may be disadvantageous: The concatenation, especially of long strings, first requires allocating enough memory to accommodate the entire resulting string. Then, all the operands are copied to their respective location. This takes time. Hence, in the case of write/writeLn it is advisable (for very long strings) to use their capability of accepting an arbitrary number of (comma-separated) parameters.

The FPC, the GPC and Delphi are also shipped with a function concat performing the very same task. Read the respective compiler’s documentation before using it, because there are some differences, or just stick to the standardized + operator.
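The two approaches, concatenating first versus passing several parameters to writeLn, can be compared side by side. The following sketch assumes an Extended Pascal compliant compiler; the variable name greeting is purely illustrative.

```pascal
program concatenationDemo(output);
var
	greeting: string(40);
begin
	{ Concatenation is useful when the result is stored. }
	greeting := 'Hello' + ',' + ' ' + 'world!';
	writeLn(greeting);
	{ For mere output, several parameters avoid building a temporary string. }
	writeLn('Hello', ',', ' ', 'world!');
	{ The order of the operands matters. }
	writeLn('Foo' + 'bar');
	writeLn('bar' + 'Foo');
end.
```

The first two writeLn statements produce the same line; the last two do not.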

Sophisticated comparison

All functions presented in this subsection return a Boolean value.

Order

Since every character in a string has an ordinal value, we can think of a method to sort them. There are two flavors of comparing strings:

  • One uses the [[../Expressions and Branches#Comparisons|relational operators]] already introduced, such as =, > or <=.
  • The other one is to use dedicated functions such as LT or GT.

The difference lies in their treatment of strings that vary in length. While the former will bring both strings to the same length by [[../Arrays#Character array|padding]] them with space characters (' '), the latter simply clips them to the shorter length, but takes into account which one was longer (if necessary).

string comparison functions and operators
function name   meaning                    operator
EQ              equal                      =
NE              not equal                  <>
LT              less than                  <
LE              less than or equal to      <=
GT              greater than               >
GE              greater than or equal to   >=

All these functions and operators are binary, that means they expect and accept only exactly two parameters or operands respectively. They can produce different results if supplied with the same input, as you will see in the next two sub-subsections.

Equality

Let’s start with equality.

  • Two strings (of any length) are considered equal by the EQ function if both operands are of the same length and the values, i. e. the character sequences that actually make up the strings, are the same.
  • An =‑comparison, on the other hand, augments any “missing” characters in the shorter string by using the padding character space (' ').[fn 7]

To put this relationship in Pascal terms you already know:

(foo = bar)  =  EQ(trim(foo), trim(bar))

The actual implementation is usually different, because trim can be, especially for long strings, quite resource-consuming (time, as well as memory).

As a consequence, an =‑comparison is usually used if trailing spaces are insignificant, but are still there for technical reasons (e. g. because you are using a fixed-length array of char). Only EQ ensures both strings are lexicographically the same. Note that the capacity of either string is irrelevant. The function NE, short for not equal, behaves accordingly.
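The difference between = and EQ shows up as soon as one operand carries trailing spaces. The following is a minimal sketch, assuming a fully Extended Pascal compliant mode (with the GPC that means --extended-pascal, otherwise no padding occurs); the variable names are made up.

```pascal
program equalityDemo(output);
var
	bare, padded: string(10);
begin
	bare := 'abc';
	{ The same characters followed by two spaces }
	padded := 'abc  ';
	{ true: the shorter operand is padded with spaces before comparing }
	writeLn(bare = padded);
	{ false: the lengths differ }
	writeLn(EQ(bare, padded));
	{ true: after trimming, both values are identical }
	writeLn(EQ(trim(bare), trim(padded)));
end.
```

Both variables have the same capacity here, but the results would be no different if their capacities differed.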

Less than

A string is determined to be “less than” another one by sequentially reading both strings simultaneously from left to right and comparing corresponding characters. If all characters match, the strings are said to be equal to each other. However, if we encounter a differing character pair, processing is aborted and the relation of the current characters determines the overall string’s relation.

first operand         'A'  'B'  'C'  'D'
second operand        'A'  'B'  'E'  'A'
determined relation    =    =    <   n/a
'ABCD' < 'ABEA' evaluates to true

If both strings are of equal length, the LT function and the <‑operator behave the same. LT actually even builds on top of <. Things get interesting if the supplied strings differ in length.

  1. The LT function first cuts both strings to the same (shorter) length. (substring)
  2. Then a regular comparison is performed as demonstrated above. If the shortened, common-length versions turn out to be equal, the (originally) longer string is said to be greater than the other one.

A situation in which LT and < disagree usually has to be provoked artificially, but this can still become an issue if you are frequently using characters that are “smaller” than the regular space character, for instance if you are programming on a 1980s 8‑bit Atari computer using [[w:ATASCII|ATASCII]]. The LE, GT, and GE functions act accordingly.
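The two steps of LT can be contrasted with the padding behavior of < using strings that differ only in trailing spaces. This is a sketch assuming a fully Extended Pascal compliant mode; it does not depend on any particular character set.

```pascal
program lessThanDemo(output);
var
	bare, padded: string(10);
begin
	bare := 'abc';
	{ The same characters followed by two spaces }
	padded := 'abc  ';
	{ false: after padding bare with spaces, both operands are equal }
	writeLn(bare < padded);
	{ true: equal up to the common length, and bare is the shorter one }
	writeLn(LT(bare, padded));
	{ With operands of equal length both variants agree: true and true }
	writeLn('ABCD' < 'ABEA');
	writeLn(LT('ABCD', 'ABEA'));
end.
```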

Details on string literals

Inclusion of delimiter

In Pascal string literals start with and are terminated by the same character. Usually this is a straight (typewriter’s) apostrophe ('). Troubles arise if you want to actually include that character in a string literal, because the character you want to include into your string is already understood as the terminating delimiter. Conventionally, two straight typewriter’s apostrophes back-to-back are regarded as an apostrophe image. In the produced computer program, they are replaced by a single apostrophe.

program apostropheDemo(output);
var
	c: char;
begin
	for c := '0' to '9' do
	begin
		writeLn('ord(''', c, ''') = ', ord(c));
	end;
end.

Each double-apostrophe is replaced by a single apostrophe. The string still needs delimiting apostrophes, so you might end up with three consecutive apostrophes like in the example above, or even four consecutive apostrophes ('''') if you want a char-value consisting of a single apostrophe.

Non-permissible characters

A string is a linear sequence of characters, i. e. along a single dimension. Consequently, a string literal must not contain a line break. You are nevertheless allowed to use the operating-system-specific (OS-specific) code indicating ends of line, yet the only cross-platform (i. e. guaranteed to work regardless of the used OS) procedure is writeLn. Although not standardized, many compilers provide a constant representing the environment’s character/string necessary to produce line breaks. In the FPC it is called lineEnding. Delphi has sLineBreak, which is also understood by the FPC for compatibility reasons. The GPC’s standard module GPC supplies the constant lineBreak. You will first need to import this module before you can use that identifier.

Remainder operator

The final Standard Pascal arithmetic operator you are introduced to, after learning to [[../Arrays#Division|divide]], is the remainder operator mod (short for modulo). Every integer division (div) may yield a remainder. This operator evaluates to this value.

mod operation sample values
i         -3  -2  -1   0   1   2   3
i mod 2    1   0   1   0   1   0   1
i mod 3    0   1   2   0   1   2   0

Similar to all other division operations, the mod operator does not accept a zero value as the second operand. Moreover, the second operand to mod must be positive. There are many definitions, among computer scientists and mathematicians, as regards the result if the divisor is negative. Pascal avoids any confusion by simply declaring negative divisors illegal.

The mod operator is frequently used to ensure a certain value remains in a specific range starting at zero (0..n‑1 for a divisor n). Furthermore, you will find modulo in number theory. For example, the definition of prime numbers says “not divisible by any other number”. This expression can be translated into Pascal like this:

mathematical relation of mod
expression              x is divisible by d
mathematical notation   d ∣ x
Pascal expression       x mod d = 0
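Both uses of mod, keeping a value in a range and testing divisibility, fit into a short program. The following is a sketch in Standard Pascal; the constant slots and the sample numbers are arbitrary choices for illustration.

```pascal
program modDemo(output);
const
	slots = 4;
var
	i: integer;
begin
	{ i mod slots always stays within 0..slots - 1, here 0..3. }
	for i := 0 to 9 do
	begin
		writeLn(i:2, ' -> slot ', i mod slots:1);
	end;
	{ Divisibility test: 91 = 7 * 13, so this is true. }
	writeLn(91 mod 7 = 0);
	{ 91 is not divisible by 2, so this is false. }
	writeLn(91 mod 2 = 0);
end.
```

The second operand is a positive constant in all cases, so none of the restrictions on the divisor are violated.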



Notes:

  1. In fact this is a discrimination of what EP calls a “schema”. [[../Schemas|Schemata]] will be explained in detail in the Extensions Part of this Wikibook.
  2. This functionality is useful if you are handling constants you or someone else might change at some point. By definition a literal consisting of a single character, such as 'X', is a char value, whereas '' (“null-string”) or '42' are string literals. In order to write generic code, length accepts all kinds of values that could denote a finite sequence of char values.
  3. In fact the definition essentially is packed array[1..capacity] of char.
  4. This means, in the case of empty strings, only the following function call could be legal: subStr('', 1, 0). It goes without saying that such a function call is useless.
  5. The string variable may not be bindable when using this notation.
  6. Omitting the third parameter in the case of empty strings or characters is not allowed. subStr('', 1) is illegal, because there is no “character 1” in an empty string. Also, subStr('Z', 1) is not allowed, because 'Z' is a char-expression and as such always has a length of 1, rendering any need for a “give me the rest of/subsequent characters of” function obsolete.
  7. If you are a GPC user, you will need to ensure you are in a fully EP-compliant mode, for example by specifying --extended-pascal on the command line. Otherwise, no padding occurs. The Standard (unextended) Pascal, as per ISO standard 7185, does not define any padding algorithm.
