Testing
GnuCash verification - state testing
Some time ago I found a bug in GnuCash which would cause a segfault in the Transfer dialog, which a user uses to transfer money from one account to another: https://bugs.gnucash.org/show_bug.cgi?id=787439.
It was speculated that the segfault occurs when, after Esc is pressed to dismiss the dialog, GnuCash attempts to update and destroy the same variable at the same time.
The fix is to stop calling the function that updates the variable (gnc_xfer_date_changed_cb()) when Esc is pressed.
Some questions I can think of:
1. Is it never called when it should not be?
2. Is it always called when it should be?
3. Is the function called from anywhere else?
4. Is the data it manipulates modified anywhere else?
5. Might versions of this bug (for different dialogs or dialog elements) occur elsewhere?
Questions 4 and 5 seem somewhat linked. They are probably rather large in scope, but it might be worth checking how feasible they are to answer.
In answer to question 3, it appears (from inspecting the code and from experience) that the function is tied to some sort of event listener, so it could potentially be called at any time.
It seemed profitable to use some of the ideas from state testing in Hendrickson's "Explore It!".
My understanding is that, with the fix, we have gone from these possible outcomes:
Dialog change -> user action -> updating +-> exit
                                         |-> not exit
To these:
Dialog change -> user action +-> not updating -> exit
                             |-> updating -> not exit
(N.B. I have decided to concentrate on a small slice of the transfer dialog's behaviour so as not to be overwhelmed. There is more to be done, but that is outside the scope of this verification.)
User actions include:
- Esc
- Enter
- Clicking Cancel
- Clicking OK
- Dialog element loses focus
- Tabbing
- Arrow keys
- Mouse
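The before/after outcomes and the list of user actions above can be turned into a small model to drive exploration. This is a minimal sketch, assuming the outcome labels from the diagrams (my own labels, not GnuCash identifiers) and that only the date field is changed before each action. It enumerates the combinations to try and flags any observed outcome that falls outside the post-fix model.

```python
from itertools import product

# Post-fix outcomes from the state diagram, as (callback, dialog) pairs.
# These labels are hypothetical; they do not come from the GnuCash source.
POST_FIX_MODEL = {
    ("not updating", "exit"),
    ("updating", "not exit"),
}

USER_ACTIONS = [
    "Esc", "Enter", "Clicking Cancel", "Clicking OK",
    "Dialog element loses focus", "Tabbing", "Arrow keys", "Mouse",
]

# Assumption: only the date field is edited before each user action.
DIALOG_CHANGES = ["edit date field"]


def checklist():
    """Every (dialog change, user action) combination to explore."""
    return list(product(DIALOG_CHANGES, USER_ACTIONS))


def in_post_fix_model(callback_called: bool, dialog_exited: bool) -> bool:
    """Is an observed outcome one that the post-fix model allows?

    Updating and exiting together is the combination linked to the
    original segfault, so it should never be observed after the fix.
    """
    outcome = (
        "updating" if callback_called else "not updating",
        "exit" if dialog_exited else "not exit",
    )
    return outcome in POST_FIX_MODEL


if __name__ == "__main__":
    for change, action in checklist():
        print(f"{change} -> {action}: breakpoint hit? dialog closed?")
```

An outcome the model does not cover (for example, neither updating nor exiting) is not necessarily a bug, but it is a prompt to look more closely.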
I explored the range of user actions with a breakpoint set on the gnc_xfer_date_changed_cb() function, so that I would know whenever it got called, and a watchpoint on the xferData variable, so that I would know whenever it got modified. The latter was of limited use: xferData is not a global variable, so it only exists within the scope of specific functions.
In fact, I notice that one can press Enter with the date field set to any value (including blank) and gnc_xfer_date_changed_cb() does not get called. However, when I inspect the transfer that is generated, the date appears to have been appropriately normalised. I assume that the date is being validated/normalised some other way, perhaps in the same step as the transfer is submitted. So it does not seem imperative that the function gets called.
Still, from a UX perspective it might be better if this were handled with a notification to the user, rather than in the background, to prevent accidental, incorrect submissions.
While investigating what turned out to be a spurious bug, it occurred to me that one source of events is interrupts/error handling, which might lead us to an unanticipated state with different behaviour:
- Validation/normalisation
- Error/warning messages
I notice that the form will not accept values of 10 (US) trillion or above in the "Amount" field, and instead shows the notification "You must enter an amount to transfer". Nor will it normalise the value (adding decimal points and commas where necessary).
This may be a bug in itself (what about users of rupees, yen, etc.?), but it also suggests that this "error" condition is being mishandled or miscategorised. Is there a vulnerability here I can exploit?
(This can cause a problem where a transfer from a low currency to a high currency causes a withdrawal in the low currency that does not appear in the high currency. In the log I see WARN: "Value too large to represent as int64_t", which looks like a likely cause.)
I somewhat systematically went through all the user actions after changing each of the fields in the dialog apart from "Fetch Rate". (N.B. I was unable to see Memo in the list of transactions so could not tell if it was correctly recorded.)
It is possible to set an exchange rate of 0 (or blank, which I believe is treated the same). This leads to divide-by-zero errors in the logs and needs following up.
Apart from that, I did not find anything interesting this way and found myself losing focus, so I decided to refocus on the original bug.
Do any of the other fields in the dialog cause gnc_xfer_date_changed_cb() or gnc_xfer_price_update_cb() to be called? The "Exchange Rate" field calls the latter, and I notice that it may or may not be called if you press "Cancel". But I could not provoke any misbehaviour.
I tried to set breakpoints in the main functions gnc_xfer_dialog_run_exchange_dialog() and gnc_xfer_dialog_run_until_done(), in particular to watch the xferData variable. But these breakpoints never seemed to be hit, and I am not sure why.
LibreOffice Signature Lines in Calc
For LibreOffice 6.2 (still in development) I decided to test signature lines in Calc (see the release note and screenshots of the feature).
These are similar to signatures on physical documents. With a signature line, someone receiving a document would add their name and sign the document using a digital certificate. The signature line would record this, optionally including a date/timestamp when it happened.
Digitally signing documents generally has three aims: to check that the person who sent the document is who you think they are (authentication), to check that the document has not changed since they signed it off (integrity), and to prevent them from denying that they signed the document (non-repudiation).
I loosely followed a risk-based testing strategy. For each of the main aims listed above, I tried to think of circumstances which might threaten them.
Although it is easy to see whether a signature is valid (i.e. trusted) or not, I could not work out how to identify an individual's certificate definitively. LibreOffice's certificate viewer has the fields "Thumbprint MD5" and "Thumbprint SHA-1", but I could not work out how to match these to certificates I had in my keyring. They did not bear any resemblance to the fingerprint the gpg program prints out (I did not try any other key generation program). This could at least be documented (as someone has already pointed out here: https://bugs.documentfoundation.org/show_bug.cgi?id=113192).
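One plausible reason the values did not match: for an X.509 certificate, a "thumbprint" is conventionally a hash over the certificate's entire DER encoding, whereas the fingerprint gpg prints for a v4 OpenPGP key is a SHA-1 hash over the public-key packet (RFC 4880), so the two are computed over different data. I have not confirmed that this is how LibreOffice computes its thumbprints. Assuming it is, this minimal sketch computes both thumbprints for a certificate exported as DER or PEM, so they can be compared with what the viewer shows.

```python
import hashlib
import ssl


def thumbprints(der: bytes) -> dict:
    """MD5 and SHA-1 thumbprints of a DER-encoded certificate.

    Assumption: the viewer hashes the whole DER encoding (the usual
    X.509 convention); not confirmed for LibreOffice.
    """
    def fmt(digest: bytes) -> str:
        # Colon-separated uppercase hex, as certificate viewers often show.
        return ":".join(f"{b:02X}" for b in digest)

    return {
        "MD5": fmt(hashlib.md5(der).digest()),
        "SHA-1": fmt(hashlib.sha1(der).digest()),
    }


def thumbprints_from_pem(pem: str) -> dict:
    """Same, for a certificate in PEM form."""
    return thumbprints(ssl.PEM_cert_to_DER_cert(pem))
```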
I recorded what I did in a journal (also in org-mode) and as a mindmap (also as an image).
Testing LibreOffice Mail Merge improvements
I have recently been testing improvements made to LibreOffice's Mail Merge functionality for the 6.1 release (see release notes).
The Mail Merge functionality allows users to create a document with fields which refer to a database (which could be a LibreOffice Base database, a spreadsheet, MySQL, etc.). The user can then create multiple versions of the same document, one for each row in the database, with the fields replaced by the corresponding values in the database.
For example, if you have a database of people's addresses you could use this to create a label for each person which contains their address.
Previously, one problem you would have is that if a person's database entry had some information missing (e.g. you had not recorded the county they live in) and you included that as a field in your document, it would appear as a blank line in that person's copy of the document. This could look unprofessional.
Figure 1: Before the change. The blank line is where the person's county should be, but they do not have one recorded in the database.
In 6.1, LibreOffice fixed this by giving users the option to hide a paragraph if all the database fields in that paragraph were empty.
Figure 2: After the change. The person's missing information is ignored.
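The new option can be pictured with a small model. This is a minimal sketch of the behaviour as described above, not LibreOffice's implementation; the <Field> placeholder notation and the record layout are assumptions for illustration.

```python
import re

# Hypothetical placeholder notation: a field is written as <FieldName>.
FIELD = re.compile(r"<(\w+)>")


def merge(paragraphs, record, hide_empty=True):
    """Fill the fields in each paragraph from one database record.

    With hide_empty=True, a paragraph whose fields are all empty is left
    out entirely; with hide_empty=False it stays, producing the blank
    line seen before the change. Paragraphs without fields are kept.
    """
    def value(name):
        return record.get(name) or ""  # missing or NULL counts as empty

    merged = []
    for para in paragraphs:
        names = FIELD.findall(para)
        if hide_empty and names and not any(value(n) for n in names):
            continue
        merged.append(FIELD.sub(lambda m: value(m.group(1)), para))
    return merged
```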
I decided my strategy would be to create documents which used Mail Merge in an older version of LibreOffice, then open them in the latest version and see how they look compared to the older version.
I thought this represented at least two compelling scenarios of use. Some users may upgrade to the latest version to take advantage of the new functionality. I wanted to see if they would be disappointed in any way if they turned the new option on.
Other users may find themselves on the latest version and hope that their documents still appear the way they had previously, even if it means turning the option off. Would previous workarounds users may have used for the above problem (e.g. using conditional fields) still work?
The strategy had other advantages. Being able to compare the old and new versions of LibreOffice provided a good oracle.
Trying to create complex and realistic documents in the old version was an opportunity to learn more about the Mail Merge functionality as well as other parts of LibreOffice which might be relevant to its operation. I filled out a reasonably large mindmap (below) of ideas and factors.
It also allowed for compatibility testing of the new version of LibreOffice.
I worked on this over a few weeks, roughly recording ideas and progress in a mindmap (as image) and raising a few bugs.
Some issues blocked my testing. Mail Merge's print functionality does not (as far as I could tell) allow printing to a file. I did not want to waste paper and ink, so I did not test this. Since then, at least one print-related bug has been raised by someone else in this area.
There is a lot of functionality available in LibreOffice, so my testing was far from exhaustive. For example, I missed this.
I saw one or two crashes which only happened in very particular circumstances (where several factors interacted). I have already (just before writing this) found another crash involving the Mail Merge functionality in the latest daily build of 6.2. There may be other similar peculiar circumstances I have not tested.
Quick testing recutils
I did some testing of recutils using ideas from "quick testing" techniques (e.g. as in http://testingeducation.org/BBST/testdesign/).
Free Software Directory - Note Taking Session
I recently attended the Software Testing Clinic on Note Taking, hosted by Ministry of Testing Cambridge.
As homework, we were set the task of practicing our note taking skills while testing a website. I chose the Free Software Directory, a wiki where people can find information about free software packages.
Like many wikis, some of its articles can go out of date, particularly if no one has updated them in a while. So, I set myself the broad mission of finding information that had gone out of date for packages in the Directory.
However, with over 16,000 packages listed, I needed some way of narrowing the scope. I reasoned that packages that have not been updated in a long time are more likely to contain out of date information.
Fortunately, the Directory has a page of packages last reviewed before 2010, which looked like a pretty good start.
I would work my way through that list, for each package noting information that was out of date (e.g. links broken, version numbers incorrect) and also testing whether the package conforms to the requirements necessary for inclusion in the directory.
To take notes, I decided to use GIMP to take and annotate screenshots. I took a screenshot of each package's page, scribbled notes in appropriate places, circled things, etc. (Incidentally, as of version 2.10, GIMP can create a screenshot of an entire webpage if you pass it a URL.)
When I had finished with a page I marked the top of the screenshot with a banner in a colour indicating some sort of overall assessment. For example, red was for pages which should be removed, yellow for pages which needed updating, green for pages which had no or only trivial issues.
When the directory of screenshots is looked at in thumbnail view (e.g. using Emacs' image-dired) I can quickly see which ones need further action.
Here's a sample. Find the gzip of the entire directory of screenshots here.
State testing PIA Chromium Extension
I was testing Private Internet Access' extension for Chromium browsers: https://github.com/pia-foss/extension-chrome
This, among other things, modifies Chromium's proxy settings to make it connect to one of their VPN servers.
I thought it would be interesting to explore the different states of the proxy connection and how they are triggered.
I made some modifications to the extension. I made it connect to a local http server when checking for available proxies (and for pinging), so I could control what proxies it used. For my exploration I gave it the credentials of a local OWASP ZAP proxy and I made requests to a local website.
I could therefore see the requests in three places: in Chromium's DevTools, as they were intercepted by the proxy, and at their final destination (by viewing the website's logs).
I also had more control over the environment. For example, I could remove proxies from the list of those available, or shut the proxies down. I used this to explore what the extension might do in unusual circumstances (such as losing connection to the proxy it was connected to).
I only spent a total of a couple of hours on this. It would be interesting to explore further and in more detail.
I recorded some ideas and observations as I went in this mindmap.
Respects Your Freedom claims testing
Testing the TPE-R1100 Wireless-N Mini Router sold by ThinkPenguin against the claims implied by the Respects Your Freedom (RYF) certification.
I started with a brief tour of the product and everything it came with (that was still in my possession).
1. The router itself, which includes:
   - SOC?
   - Network interface(s)
   - USB
   - Power
2. OS
   - libreCMC
   - bootloader?
3. Setup instructions
4. DVD with source code
5. Sales brochure
   - Which I no longer have
6. Several cables
   - Power
   - Ethernet
I cannot test 5, and there is nothing to test for 6 (they are standard power and Ethernet cables).
My testing will focus on components 1-4.
I did not open the router's case to see what was inside. I am basing what I know about its internals on inspecting the outside of it and on the claims from ThinkPenguin. I could perhaps also use software to probe things like the network card and wireless chipset.
Setup Instructions
I read through the first three pages of the setup instructions in detail.
The language conforms strongly to FSF terminology. In the opening paragraph "free software" is mentioned twice and "GNU/Linux" once. The terms "freedom" and "freedom respecting" are also mentioned several times. I did not see any use of the term "open source" nor the term "Linux" on its own.
I saw no mention of proprietary software (apart from one potential exception which I talk about below).
Pages 4-7 include detailed instructions for using the product with specific third-party VPN providers. I wondered a) whether the instructions steer users towards non-free software and b) whether these third-party providers do.
As for a), I read one of the instructions in detail and skimmed the others. The only software mentioned by name is gedit, which is free software (https://git.gnome.org/browse/gedit/tree/COPYING).
I have not investigated b) and I wonder whether this would represent a problem for RYF compliance anyway.
I had a few other questions.
Foremost is that the instructions do not have any license statement associated with them. According to the RYF criteria: "Any generally useful technical documentation about the product…must be released under a free license." Does this count as "generally useful technical documentation"?
The instructions suggest the website "www.infosniper.net" to check your external IP address to see if the VPN is working. I assume that website uses non-free JavaScript (I have not verified that yet).
Visiting that website on 2017/12/05 I was able to see my IP address and location without having to enable JavaScript in my browser, so it can be used without non-free software (although future changes to the website could make this no longer true). But, if a user has not turned off JavaScript they may unintentionally run non-free JavaScript when visiting that website.
Finally, I wonder whether having no root password set counts as a backdoor. The user is not instructed to change it.
Operating System and Source Code DVD
I SSH'd into the device and started to explore the operating system. I thought it might help to see what software was included on the source code DVD, so I inserted it into my disk drive and mounted it. I thought perhaps I could compare what was on the device with what was on the DVD, to see whether they include all the relevant source code for the software they distribute.
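That comparison can be partly scripted. A minimal sketch, assuming the router has opkg (usual for OpenWrt-derived systems such as libreCMC, but not something checked here), that the output of "opkg list-installed" has been copied off the device, and that package directories in the source tree are named after the packages they build. The file name installed.txt in the usage comment is hypothetical. Any hits are leads to check by hand, not proof that source is missing.

```python
from pathlib import Path


def installed_packages(opkg_output: str) -> set:
    """Package names from `opkg list-installed` output ("name - version" lines)."""
    names = set()
    for line in opkg_output.splitlines():
        name, sep, _version = line.partition(" - ")
        if sep:
            names.add(name.strip())
    return names


def source_packages(makefile_paths) -> set:
    """Guess package names from the directories holding each Makefile.

    Assumption: each package lives in a directory named after it. A single
    Makefile can define several packages, so expect false "missing" hits.
    """
    return {Path(p).parent.name for p in makefile_paths}


def without_source(opkg_output: str, makefile_paths) -> set:
    """Installed packages with no obviously matching source directory."""
    return installed_packages(opkg_output) - source_packages(makefile_paths)


# Usage (hypothetical paths):
#   tree = Path("librecmc/trunk/package")
#   text = Path("installed.txt").read_text()
#   print(sorted(without_source(text, tree.rglob("Makefile"))))
```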
I noticed that I could not find any license files on the source code DVD. I noted that down and will return to that later.
I am considering a number of different risks:
- There is non-free software on the device
- The way the software is distributed does not comply with their respective licenses
- A user cannot install modified/replacement software
- Non-free software is required for maintenance
- There is spyware or backdoors
Navigating through the OS, I see a few shell scripts in /bin which do not have license notices in their code.
From the DVD I untar the file librecmc-v1.3.4.tar.xz. This does include a license (GPLv2) which at least partly answers the concern I noted above. Looking at the files in librecmc/trunk/package/base-files/files/bin/ none of them have a license notice. I don't know if one is required or whether they are considered implicitly licensed under the GPL.
Also in librecmc/trunk/scripts/ are many scripts (perl, shell, python) which do not have license notices.
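Checking for license notices by eye gets tedious across directories like these. A rough sketch that sorts files by what their opening lines mention; the patterns are a heuristic of my own, and a file can be validly licensed without any notice in it (for example through a top-level license file), so the output is only a list of candidates to look at. Files that carry a copyright line but no license get their own category.

```python
import re
from pathlib import Path

# Heuristic patterns; the absence of a match does not prove a file is unlicensed.
LICENSE = re.compile(r"SPDX-License-Identifier|licen[cs]e|GNU General Public", re.IGNORECASE)
COPYRIGHT = re.compile(r"copyright", re.IGNORECASE)


def classify(text: str) -> str:
    """Label a file's opening text as 'license', 'copyright only' or 'none'."""
    if LICENSE.search(text):
        return "license"
    if COPYRIGHT.search(text):
        return "copyright only"
    return "none"


def survey(root: Path, head_lines: int = 30) -> dict:
    """Classify the first head_lines lines of every regular file under root."""
    result = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            head = "".join(line for _, line in zip(range(head_lines), f))
        result[path.relative_to(root).as_posix()] = classify(head)
    return result


# Usage:
#   for name, label in survey(Path("librecmc/trunk/scripts")).items():
#       if label != "license":
#           print(label, name)
```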
I find one init script (/etc/board.d/01_leds) with the opening comment:
#
# Copyright (C) 2011 OpenWrt.org
#
No other licensing information is included. This is important enough that I consider stopping and reporting this and some of the other problems to the FSF.
Boundary testing Savane
I applied some boundary testing techniques to Savane's temporary upload functionality. I was hoping to discover some hidden or unanticipated boundaries!
I found the "Creep and Leap" heuristic useful for discovery and the strace tool let me observe in more detail what the PHP process was doing (e.g. where it was writing stuff to the filesystem).
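The "Creep and Leap" idea can be partly automated when the thing being probed can be driven from a script. A minimal sketch, assuming acceptance is monotonic (everything below the boundary is accepted, everything from it upward rejected) and that accepts is a hypothetical function you supply, for example one that uploads a file of n bytes and reports whether Savane took it. It leaps by doubling until something is rejected, then narrows the gap by bisection, a faster stand-in for creeping up one step at a time.

```python
def find_boundary(accepts, start=1, limit=2**64):
    """Return the smallest value that accepts() rejects, or None.

    Assumes acceptance is monotonic and start >= 1. Leap: double the
    input until it is rejected. Then bisect between the last accepted
    and first rejected values (instead of creeping one step at a time).
    Gives up (returns None) once an accepted value reaches limit.
    """
    if start < 1:
        raise ValueError("start must be at least 1 so that doubling grows")
    if not accepts(start):
        return start
    low, high = start, start * 2
    while accepts(high):
        if high >= limit:
            return None
        low, high = high, high * 2
    # Invariant: accepts(low) is True and accepts(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if accepts(mid):
            low = mid
        else:
            high = mid
    return high
```

Each probe is one real request, so for a boundary near 2^k this costs roughly 2k calls. Any non-monotonic behaviour it skips over is exactly the kind of hidden boundary worth creeping around by hand.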
I recorded what I found in this mindmap.
Function testing Savane
Savane is a PHP+MySQL web application which many free software projects use to manage their projects.
I was interested in applying some function testing techniques to this product. In particular, studying the outcomes of an action in a broad sense and considering not just what should happen but what should not (i.e. looking for unanticipated side effects). For my inspiration, see Bolton's article.
After briefly touring the application, I decided to test the login functionality.
I kept an eye on the application's logs, HTTP requests and the contents of the MySQL database.
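Watching the database for side effects can be made more systematic by snapshotting every table before and after an action such as logging in, then diffing. A minimal sketch: it uses Python's built-in sqlite3 as a stand-in so that it runs anywhere; against Savane's MySQL database the table list would come from SHOW TABLES via whatever driver you use. The tables in the example are hypothetical, not Savane's schema.

```python
import sqlite3


def snapshot(conn):
    """Capture every table's rows as a set of tuples (duplicate rows collapse)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {t: set(conn.execute(f'SELECT * FROM "{t}"')) for t in tables}


def diff(before, after):
    """Rows added and removed per table; an updated row appears as both."""
    changes = {}
    for table in before.keys() | after.keys():
        old, new = before.get(table, set()), after.get(table, set())
        if old != new:
            changes[table] = {"added": new - old, "removed": old - new}
    return changes


if __name__ == "__main__":
    # Hypothetical tables standing in for a real schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (id INTEGER, name TEXT, last_login TEXT)")
    conn.execute("CREATE TABLE session (id INTEGER, user_id INTEGER)")
    conn.execute("INSERT INTO user VALUES (1, 'alice', NULL)")
    before = snapshot(conn)
    # Simulated login: the kind of changes a login might make.
    conn.execute("UPDATE user SET last_login = '2018-01-01' WHERE id = 1")
    conn.execute("INSERT INTO session VALUES (10, 1)")
    print(diff(before, snapshot(conn)))
```

Anything in the diff that the action should not have touched is a candidate side effect.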
Here is a mindmap I created as a result. Anything marked with an orange "!" was something I considered important but didn't get around to testing.
User testing diction
Charter
Simulate a proofreader of an online publication using diction to help proofread a large number of submissions for the next article. The submissions will come from people working all over the world and be in many different formats.
It is your goal to efficiently proofread these submissions and, where necessary, give useful feedback to the writers.
You are relatively tool-savvy, so use any method to automate steps.
Setup
Gathered a sample of online articles as PDF, HTML and word processed documents.
These are read into diction.
Some of the text is taken out and put into a text file, so that it can be read into diction.
Session
Immediately I notice that diction does not print the entire input file. Only the sentences where it identifies a possible mistake are printed. This is not the end of the world, as I can have the original file open.
I notice that in the output there are what appear to be numbers of the form "3.18-5.13". I assume they are line numbers of some sort. These could come in handy if I could figure out what they refer to: I could write a script to parse them and somehow cross-reference them to the original file. EDIT: the format appears to be <line>.<column> of the start and end of the sentence.
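Following up the idea of parsing these numbers: a minimal sketch, assuming the <line>.<column> reading in the EDIT is right and that lines and columns count from 1 with the end column inclusive. I have not confirmed the numbering base, so check it against a known sentence and adjust the offsets. It pulls the location out of an output line and extracts the matching text from the original file.

```python
import re

# Assumed format: <line>.<column>-<line>.<column> for sentence start and end.
# Note: a file name containing a similar pattern would be matched first.
LOCATION = re.compile(r"(\d+)\.(\d+)-(\d+)\.(\d+)")


def parse_location(output_line):
    """Return ((start_line, start_col), (end_line, end_col)) or None."""
    m = LOCATION.search(output_line)
    if m is None:
        return None
    l1, c1, l2, c2 = map(int, m.groups())
    return (l1, c1), (l2, c2)


def extract(source, start, end):
    """Text between start and end (assumed 1-based, end column inclusive)."""
    lines = source.splitlines()
    (l1, c1), (l2, c2) = start, end
    if l1 == l2:
        return lines[l1 - 1][c1 - 1:c2]
    parts = [lines[l1 - 1][c1 - 1:]]
    parts.extend(lines[l1:l2 - 1])
    parts.append(lines[l2 - 1][:c2])
    return "\n".join(parts)
```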
I also notice a few words being highlighted which I would rather were not (words likely to come up a lot and which are usually OK to use). I can use my own custom suggestion file and remove these. This is a bit time consuming (interaction with this could be quicker).
Long suggestion text breaks up the original text to the extent that I am concerned that I am missing things.
It would also be nice to mix this with other tools like spelling and grammar checkers.
Reading in an HTML document does not always work well as it does not print the entire file, which is a problem for HTML tags. Might need to extract just the text or make sure the HTML is quite clean (e.g. no tags mid-sentence).
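One way to get clean input is to strip the markup before running diction. A minimal sketch using Python's standard html.parser module: it drops tags and script/style contents, collapses whitespace so that a sentence interrupted by a tag reads as one line, and starts a new paragraph at common block-level tags (the tag list is my own, non-exhaustive choice).

```python
from html.parser import HTMLParser

# Tags treated as paragraph breaks (my own, non-exhaustive choice).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect visible text as a list of paragraphs."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = []
        self._skip_depth = 0

    def _flush(self):
        # Collapse runs of whitespace, including newlines inside a sentence.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._current.append(data)

    def close(self):
        super().close()  # processes any buffered text first
        self._flush()


def html_to_text(html):
    """Plain text with a blank line between paragraphs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

The result can be written to a text file and passed to diction in place of the original HTML.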
Summary
- Allowed me to get a new perspective on the thing I was testing
- I had not previously considered the effect of the original text being broken up too much
- Although some things I already knew
- I have some new ideas for further testing
- For example, trying to parse the line and column numbers
- I don't feel like I got into the habits and personality of the user
- Was more like a use-case (not necessarily a bad thing)
- Perhaps personas would bring out the individual character more
- I did not complete "proofreading" all the articles I had collected
- But, I feel I had got most of the value out of the session I was going to get; felt no need to continue
- Perhaps this is a sign that diction could be more efficient
- On the other hand, I was taking lots of notes
- The session was split over two evenings; in the second evening I had lost a bit of the flow I had in the first evening
Bugs
Bug 1
Its sentence segmentation method could be better. For example, where there is a page break with a footer:
are dependent on encrypted keys TESTING TRAPEZE | AUGUST 2017 In all of the above situations, a solution is to
This is output as:
TESTING TRAPEZE | AUGUST 2017 In all of the above situations
Bug 2
would -> (use "should" if used as conditional statement in the first person or for "shall" in indirect quotation after a verb in past tense.
-> 'or "shall"'?
Bug 3
[situation -> (rewrite)]
Needs to be specific.
Bug 4?
Tokenises on dash: inside-out becomes [inside -> suggestion…]-out.
Bug 5
Used mainly when [a large number of -> many] the dependent systems...
It would be good to distinguish the case when "a large number of" should be replaced entirely from the case when just "a large number" should be (as in the example above).
Not a trivial fix, though.
Bug 6
[consider -> Not followed by "as" when it means "believe to be".]
Could the suggestion file not be:
consider as Not followed by "as" when it means "believe to be".
so that it is only suggested in the case it mentions?
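Bugs 4 and 6 both come down to how a phrase is matched against the text. I do not know how diction implements its matching, so this is only an illustration of the behaviour I would expect: a phrase matches as whole words, a hyphenated compound counts as one word, and a multi-word entry such as "consider as" only matches when both words appear together.

```python
import re


def phrase_pattern(phrase):
    """Match phrase as whole words, treating hyphenated compounds as one word.

    The lookarounds stop "inside" from matching within "inside-out".
    This illustrates the desired behaviour; it is not diction's matcher.
    """
    body = r"\s+".join(map(re.escape, phrase.split()))
    return re.compile(rf"(?<![\w-]){body}(?![\w-])", re.IGNORECASE)


def find(phrase, text):
    """All occurrences of phrase in text, as they appear in the text."""
    return [m.group(0) for m in phrase_pattern(phrase).finditer(text)]
```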
Huntin', Fightin', Rustlin'
I found hunting very risky at low levels. On a number of occasions I was attacked by brigands who, because they fought at range, could do huge amounts of damage before I could even reach them.
Figure 3: Four at once seems a bit excessive!
Considering that at low level you have very little money, and travelling around to find winnable fights to level up means you get through food very quickly, it seems a little unfair not to give players a cheap way of getting food.
After dying and having to restore a couple of times, I gave up.
However, dying did have its advantages. The game prints a plain text file with all your stats, achievements and alignment.
Thinking about alignment, I considered less honourable ways of getting food. Several towns have pens with sheep in them, which are not generally guarded.
Figure 4: They won't notice if I take a few, right?
The sheep were no match for my sword. Killing two gave me one shoulder of mutton (by the way, I wonder whether some foods are more filling than others). I wondered what effect this might have on my alignment. I assumed it would be considered a chaotic act. However, according to the code coverage recorded, attacking unprovoked has no effect on your alignment.
Figure 5: The red "-" on the left means the if branch was not taken
As far as I have seen there are only a handful of sheep in each village, so this is perhaps not the most fruitful way to get food (unless they respawn quickly…)
As fun as this all was, I didn't feel like I was achieving much with my character, I was running out of money to buy food, and I was getting bored.
I tried to see if I could accelerate my progress in the game. Reading through the in-game doc:
There are a number of dungeons hidden about the countryside…The easiest are the Caves of the Goblin King…
I bought some more expensive gear, made my way to the caves and very quickly got killed. Fighting feels clunky and I am not sure you get enough feedback. You have a choice of combat manoeuvre and you can either let the game decide for you (is there some sort of default?) or pick yourself. To do the latter requires going through a couple of menus. It feels like this could be streamlined.
Summary
- I explored a couple of possible scenarios players might follow
- Got a sense of some of the frustration players might feel at lower level
- Got to know some of the game's internal logic
- Added areas of interest to my mind map
- Found a few bugs
Code Coverage
Bugs
- What can I do against "sneak-thieves"? There does not seem to be anything you can do against them.
- I cannot find any ranged weapons. Do they exist?
- Units of currency are not always consistently named.
Random Session
My character had travelled some distance, was low on supplies and had lots of random loot. I decided he needed to find a shop. There were none in the nearby towns so I needed to go back to Rampart (but note, my character could also hunt for more food, need to explore this later).
On the way to the city, I had a couple of encounters and an arbitrary event.
Figure 6: Would this ever kill me? That would seem very harsh and spoil the game. But, sometimes good things happen as well!
My character got hurt during the encounters, but healed very quickly and was at full HP well before getting back to Rampart. There are healers in this game but, considering how quickly you heal, are they ever worth using? Perhaps there are ways in which you can get more permanently injured.
Code coverage
Bugs
- After giving money to a Mendicant priest, my total money does not appear to have gone down. EDIT: The money went down only after I had left the Encounter map.
- Cannot always reach everything in Encounter maps.
Figure 7: Am I supposed to be able to reach him?
- What I can put on my shoulders doesn't make sense.
Figure 8: Why can't I put a dagger in my boot?
- I can see more enemies when I am at the edge of the Encounter map.
Figure 9: They weren't there before
Scenario Testing Omega-rpg
Thinking about scenarios in this context is a little unusual, as there is no wider context (such as you would have using software in a business) in which this software is used.
On the other hand, there are the scenarios played out in the game universe itself. These could be a very rich source of ideas.
There are scenarios suggested by the game itself. For example, advancement in one or more of the guilds.
There are scenarios the player might choose for themselves, perhaps things not taken account of by the game.
Quality Criteria and Oracles
There is a raft of qualities related to entertainment value that will need to be evaluated, and which I haven't yet considered explicitly.
I will also need to consider oracles for these things. A large factor will be my own experiences and feelings playing this game.
Code coverage
Bugs
At the very edge of the map you are asked if you want to leave, even if you're just skirting along the edge (which you sometimes need to do).
Figure 10: This could definitely get annoying
Testing Omega-rpg
Omega-rpg is a free software rogue-like game in which players have a certain amount of freedom of choice.
I thought it would be interesting to test this because I am used to testing software used in business. The context of a game is completely different.
Moreover, some of the qualities which make software more testable, such as the amount of feedback you get, are not necessarily appropriate for games. You don't always want players to know all the consequences of their actions (which would allow for metagaming).
Similarly, the game is going to prevent players from doing exactly what they want to do; otherwise there would be no challenge.
Therefore, I needed to find ways to increase testability without compromising what the developer(s) had intended.
I decided to experiment with the excellent debugging and code coverage features of GCC, to allow me to see the internal state of the application as I used it and to see whether I could get any value out of code coverage data.
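For reference, with GCC the usual route is to compile and link with --coverage, run the program, then run gcov on a source file to produce an annotated .gcov text file. In that format each line is prefixed with an execution count, with "#####" (or "=====" for lines reachable only through exceptional paths) marking executable lines that never ran and "-" marking lines with no code. A minimal sketch that lists the never-executed lines, a quick way to see which parts of the game a session has not reached; it handles only the basic line format and skips the branch and call summary lines that gcov -b adds.

```python
import re

# A .gcov source line: "<count>:<line number>:<source text>".
GCOV_LINE = re.compile(r"^\s*([^:]+):\s*(\d+):(.*)$")
NEVER_EXECUTED = {"#####", "====="}


def unexecuted_lines(gcov_text: str):
    """Return (line number, source) for each line gcov marks as never run."""
    result = []
    for raw in gcov_text.splitlines():
        m = GCOV_LINE.match(raw)
        if m is None:
            continue  # e.g. "branch", "call" or "function" summary lines
        count, lineno, source = m.group(1).strip(), int(m.group(2)), m.group(3)
        if count in NEVER_EXECUTED:
            result.append((lineno, source))
    return result
```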
For the first few sessions of testing I mainly got used to the game, mapped out some qualities that would matter to players of the game, and considered some possible risks to those qualities.
Some of these ideas are recorded in this mind map.
I was concerned in particular with whether the game was balanced. For example, is each of the possible character classes equally powerful? Are the guild rewards equal? Are player actions fairly rewarded or punished?
As far as I could see from the code, being in the thieves guild did not affect how other guilds react to the player. This might be thematically appropriate (only other thieves would know you were in the guild), but it might allow you to join two guilds at the same time, which might be an advantage.
Code coverage
I tested for some time before deciding to turn it into a blog post, so the data here is the combination of at least several evenings' testing.
Possible Vulnerabilities
- The save file is consumed by the application on start and is only written out again when the user saves and quits the game. There are good reasons to want this behaviour, and other games behave like this too. But it can result in losing the save file if, for example, the game crashes during play.