Screen Scraping

Project: A host application displays formatted data for a specified keyword, such as an account number, user ID, or filename. This data needs to be retrieved from the screen and either processed within the script or passed on to a different PC-based application.

Algorithm: The host application defines the sequence of screens and the keystrokes that advance from one to the next. The script uses these to proceed from a known starting point, such as the login banner, to the host screen where the data can be retrieved. This part works much like the login script example: wait for a prompt, then send the command that goes to the next screen. When the host screen with the desired data is displayed, a script function retrieves the data from known coordinates on the screen and stores it in a string variable or table. String functions and commands can then be used to validate or process the data.
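The wait-then-send pattern described above can be sketched outside the script language. The following Python fragment is only an illustration, not DCS code: `navigate`, `FakeHost`, the prompt texts, and the "guest"/"secret" keystrokes are all hypothetical names and values invented for the sketch. It shows a fixed list of (expected prompt, keystrokes) pairs being walked in order, with a failure raised if a prompt never appears.

```python
from typing import Callable, Iterable, List, Tuple


def navigate(steps: Iterable[Tuple[str, str]],
             read_until: Callable[[str], bool],
             send: Callable[[str], None]) -> None:
    """Walk a fixed sequence of (expected_prompt, keystrokes) pairs."""
    for prompt, keys in steps:
        # Equivalent in spirit to WAIT STRING: block until the prompt shows up.
        if not read_until(prompt):
            raise TimeoutError(f"prompt {prompt!r} never appeared")
        # Equivalent in spirit to SEND: advance to the next host screen.
        send(keys)


class FakeHost:
    """Hypothetical stand-in for a host session, used only to exercise navigate()."""

    def __init__(self, prompts: List[str]) -> None:
        self._prompts = list(prompts)  # prompts the host will show, in order
        self.sent: List[str] = []      # keystrokes received from the script

    def read_until(self, text: str) -> bool:
        # Succeeds only if the next prompt the host shows contains the text.
        if self._prompts and text in self._prompts[0]:
            self._prompts.pop(0)
            return True
        return False

    def send(self, keys: str) -> None:
        self.sent.append(keys)


host = FakeHost(["login:", "password:", "(? for menu, X to exit):"])
navigate([(":", "guest"), ("password", "secret"), ("exit):", "f")],
         host.read_until, host.send)
print(host.sent)  # ['guest', 'secret', 'f']
```

Keeping the screen sequence in a single list makes it easy to see, and to change, the path the script takes through the host application.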

Relevant Commands and Functions:

SCREEN( ) — Retrieves a line of screen data at the position specified
SEND — Sends the specified data to the remote system
WAIT STRING — Pauses execution until the specified string is received
TRIM( ) — Removes a given string from the beginning/end of the specified string
SUBSTR( ) — Returns a portion of the specified string
POS( ) — Returns the position of a given string within the specified string
TABLE DEFINE — Prepares a table for use
@R — Record buffer variables used to read/write to a table
RECORD WRITE — Writes a record from the record buffer variable to the table
SHOW — Displays each script command as it is executed

See Also:

PARSE — Locates a substring, storing the preceding and succeeding characters to string variables
SEARCH( ) — Returns string starting position in Session window
SEARCHINRECT( ) — Returns string starting position within rectangular area of Session window

A Brief Example

Below is a picture of the screen we wish to scrape:

In this example, note the use of commas to include multiple commands on a line, and the use of the backslash character to continue a command on a second line.

SHOW
CONNECT "BBS.Ses" WINDOW %wh
WAIT STRING ":" WINDOW %wh
SEND "" WINDOW %wh
WAIT STRING "password" WINDOW %wh
SEND "" WINDOW %wh
WAIT DELAY "1"
SEND nocr "c" WINDOW %wh, WAIT STRING "exit):" WINDOW %wh
SEND "f" WINDOW %wh, WAIT STRING "exit):" WINDOW %wh
SEND "n" WINDOW %wh, WAIT STRING "exit):" WINDOW %wh
SEND "s" WINDOW %wh, WAIT STRING "exit):" WINDOW %wh
SEND "a" WINDOW %wh, WAIT DELAY "1"
SEND "b" WINDOW %wh, WAIT DELAY "1"
$filename = TRIM(SCREEN(3,15,12,%wh))
$library = TRIM(SCREEN(4,15,12,%wh))
$filedate = SCREEN(6,15,8,%wh)
$filetime = SCREEN(7,15,8,%wh)
$filesize = TRIM(SCREEN(8,15,12,%wh))
$uploadedby = TRIM(SCREEN(19,43,28,%wh))
%pos = POS($filesize, " bytes")
$filesize = SUBSTR($filesize,1,%pos-1)
TABLE DEFINE 0 FIELDS CHAR 12 CHAR 12 CHAR 8 \
CHAR 8 CHAR 12 CHAR 28
@R0.1 = $filename
@R0.2 = $library
@R0.3 = $filedate
@R0.4 = $filetime
@R0.5 = $filesize
@R0.6 = $uploadedby
RECORD WRITE 0
TABLE SAVE 0 TO "scr.txt" AS TEXT

This example retrieves information on a particular file on the FutureSoft BBS. The first part of the script handles connecting to the BBS, assuming a session (SES) file has already been created, and getting to the screen with the desired information. This essentially involves waiting for the expected prompt and sending the appropriate keystrokes to move within the host application. The FutureSoft BBS is used here because it is a publicly available host application.

Once we are at the correct screen, several fields of data are retrieved using the SCREEN( ) function and stored to string variables. The TRIM( ) function is used to remove trailing space characters from the strings. On this example screen, the host presents the file size followed by the word “bytes”. We want to retrieve the numeric value, which can be a variable number of characters, but not keep the word “bytes” since we want to eventually use the filesize as an integer. To do this, we screen scrape the whole field, use the POS( ) function to find the location of the word “bytes” in the string, and use the SUBSTR( ) function to keep only the characters preceding it. The PARSE command could also have been used to accomplish this step.
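To make the field-extraction step concrete, here is a minimal Python sketch of the same idea. It is an illustration rather than DCS code: the `screen` and `strip_unit` helpers and the mock screen contents are hypothetical, and the screen is modeled as a plain list of text lines with zero-based rows and columns. Note that Python's `find` is zero-based and returns -1 when the text is absent, which may differ from how POS( ) reports positions.

```python
def screen(lines, row, col, length):
    """Return `length` characters starting at zero-based (row, col)."""
    # Pad short lines so the slice always has the full field width.
    return lines[row].ljust(col + length)[col:col + length]


def strip_unit(field, unit=" bytes"):
    """Keep only the text before `unit`; return the trimmed field if it is absent."""
    field = field.strip()
    pos = field.find(unit)
    return field if pos == -1 else field[:pos]


# Mock 24-line screen (hypothetical contents) with the file size on row 8.
mock_screen = [""] * 24
mock_screen[8] = "File size:".ljust(15) + "48213 bytes"

raw = screen(mock_screen, 8, 15, 12)   # whole 12-character field
size = int(strip_unit(raw))            # numeric part only
print(repr(raw), size)                 # '48213 bytes ' 48213
```

Scraping the whole field first and trimming afterwards means the same code works whether the number has one digit or several.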

In the final part of this example, we create a structured table to store the data scraped from the screen. The TABLE DEFINE line is used to define the table structure, including the number of fields and their lengths. This also creates a record buffer variable, @R0 in this case, which has the same field definitions. Variable assignment is used to load the values of the respective fields into the record buffer, and it is written to the table using the RECORD WRITE command. The TABLE SAVE command is used to save it to a file so it can be viewed later. Usually, more than one record would be written to the table, in which case a loop would be added and the TABLE DEFINE command would be moved to ensure the table is initialized only once. See the Data Tables example for further illustration on the use of tables.
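The point about defining the table once and writing records inside a loop can be sketched as follows. This is a Python illustration under stated assumptions, not DCS code: the field widths mirror the TABLE DEFINE line above, but the fixed-width text layout, the helper names, and the sample records are hypothetical, and the actual format produced by TABLE SAVE ... AS TEXT is not described here.

```python
import io

# Field widths taken from the TABLE DEFINE line; defined once, outside the loop.
FIELD_WIDTHS = (12, 12, 8, 8, 12, 28)


def format_record(values, widths=FIELD_WIDTHS):
    """Pad or truncate each value to its field width and join them into one line."""
    if len(values) != len(widths):
        raise ValueError("record has the wrong number of fields")
    return "".join(str(v)[:w].ljust(w) for v, w in zip(values, widths))


def save_table(records, out):
    """Write one fixed-width line per record to a text stream."""
    for rec in records:
        out.write(format_record(rec) + "\n")


# Hypothetical scraped records, one per host screen visited.
records = [
    ("README.TXT", "UTILS", "01/02/98", "10:15:00", "48213", "Sysop"),
    ("SETUP.EXE", "UTILS", "03/04/98", "11:30:00", "1024", "Sysop"),
]

buffer = io.StringIO()          # stands in for the output file
save_table(records, buffer)
print(buffer.getvalue(), end="")
```

Because the widths are declared a single time, every record written in the loop shares the same layout, which is the reason for initializing the table only once.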

Further Development:

  • If the amount of data in a field varies, it is a good idea to use the SCREEN( ) function to retrieve the contents of the whole field and then use the TRIM( ) function to remove trailing space characters.
  • In DCS 9, the cursor position in the session window is given at the bottom right of the application window, which can be useful to determine the locations of fields on the screen. The SCREEN( ) function has the cursor home position as (0,0) so remember to subtract 1 from the row and column values you read in the cursor location field before using them with the SCREEN( ) function.
  • Tip: A good way to debug screen scraping code is to pop up a dialog box which shows the contents and size of the string. For example:

$field1 = SCREEN(10,10,10)
DIALOG
MESSAGE $field1
MESSAGE STR(LENGTH($field1))
BUTTON "OK" RESUME
DIALOG END
WAIT RESUME
DIALOG CANCEL

  • If unique screen identifiers are used by the host application, such as a screen number, the current host screen may be recognized from these, allowing you to initiate the process from any point in the host application.
  • If the desired information is not displayed at a static location, it can be dynamically located based on its position relative to its associated prompt or label. See the SEARCH( ) and SEARCHINRECT( ) functions.
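
Two of the points above, converting the displayed cursor position to zero-based coordinates and locating a field relative to its label, can be sketched in Python. This is an illustration only, not DCS code: the helper names, the one-character gap after the label, and the mock screen are hypothetical, and the real SEARCH( ) and SEARCHINRECT( ) functions may take different arguments and return values.

```python
def to_zero_based(status_row, status_col):
    """Convert the one-based cursor position shown in the status area
    to the zero-based coordinates used when reading the screen."""
    return status_row - 1, status_col - 1


def find_text(lines, text):
    """Return the zero-based (row, col) of the first match, or None."""
    for row, line in enumerate(lines):
        col = line.find(text)
        if col != -1:
            return row, col
    return None


def field_after_label(lines, label, length, gap=1):
    """Read `length` characters that follow `label`, skipping `gap` characters."""
    hit = find_text(lines, label)
    if hit is None:
        return None
    row, col = hit
    start = col + len(label) + gap
    return lines[row][start:start + length].strip()


# Mock screen with hypothetical contents.
mock_screen = [
    "",
    "  Library: UTILS",
    "  Uploaded by: Sysop",
]

print(to_zero_based(9, 16))                                  # (8, 15)
print(find_text(mock_screen, "Uploaded by:"))                # (2, 2)
print(field_after_label(mock_screen, "Uploaded by:", 28))    # Sysop
```

Searching for the label first keeps the script working even if the host moves the field to a different row.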