Compare commits

..

61 Commits

SHA1 Message Date
cc4d700028 code clean-up + minor improvements 2025-02-21 22:06:35 +00:00
5cad017a83 1) filter multiple attempts and keep only latest 2) new way to get any submission comment - compatible with BB ultra 2025-02-21 17:49:03 +00:00
5a2d03db7d skip "empty" (based on MIN FILE SIZE) files from inspection 2025-02-21 17:47:55 +00:00
5f91e08b00 added dir name to move multiple submissions / attempts (except for the latest), and minimum size for files to inspect (and skip empty files) 2025-02-21 17:47:07 +00:00
f25688dc9f use relative path (instead of full path) for csv HYPERLINKs - allows moving/sharing generated files w/ submission files 2024-11-05 23:20:19 +00:00
beefb025d6 tracked file extensions moved to settings.py + encoding added when reading comments 2024-11-05 23:13:13 +00:00
b7f9db0efc try/except when splitting the BB generated filename 2024-10-24 22:58:33 +01:00
3d86409f75 fix encoding error when assignment name has an emoji (or unicode, in general) 2024-10-24 22:57:20 +01:00
6a2144517b Restructure documentation - separate Inspect by hash 2024-10-24 22:55:38 +01:00
9ca32f1e48 added maxsplit limit = 1 when splitting the submitted files (fix for breaking when the student's file name included 'attempt') 2024-10-04 15:03:44 +01:00
d3767b54a5 docs: layout / structure changes 2024-04-26 21:00:50 +01:00
de7dc817aa update gitignore re: requirements 2024-04-26 12:39:20 +01:00
ebc7a2599d update docs for added default ignored dir '.git' 2024-04-26 12:38:45 +01:00
71092daee0 added '.git' to IGNORE_DIRS 2024-04-26 12:37:49 +01:00
c5ad6ed5f0 updated changelog with default settings customisation 2024-03-02 19:50:12 +00:00
51024deac4 move sections around for improved reading 'flow' 2024-03-02 19:49:07 +00:00
d04dac9b97 fixed list indentation for docs 2024-03-02 02:39:02 +00:00
c92a77ae5e fix number of _: should be 3 2024-03-02 02:28:41 +00:00
dd350e5190 updated docs: features (ignored dirs) + instructions (requirements, edit defaults) 2024-03-02 02:15:30 +00:00
0385e13da7 IGNORE_DIRS (as in previous commit) + get student name as well when extracting comment + terminal output increase spacing for [Info] 2024-03-01 15:47:33 +00:00
7577148f83 added IGNORE_DIRS in settings.py and allow for multiple dir names to be ignored from extracting 2024-03-01 15:45:34 +00:00
8a4dee8e73 rephrasing msg to be more precise / clear 2024-02-26 21:09:01 +00:00
08ffefa798 add/remove blank lines for PEP + add BB_GRADEBOOKS_DIR, BB_SUBMISSIONS_DIR in settings.py to allow easy changing of dir names 2024-02-26 20:48:45 +00:00
bf7aaa12f2 terminal output improvements + flush=True for print() 2024-02-23 21:23:23 +00:00
08281194c2 #vangef section in gitignore 2024-02-23 21:21:42 +00:00
81fe02e9df added requirements.txt 2024-02-23 21:20:43 +00:00
2381b26cca re-format terminal output information 2024-02-23 18:44:11 +00:00
2217988f96 (again) fix OSError: [WinError 6] 'The handle is invalid' - add try/except in rar file extractall() - possible issue with rar5 files 2024-02-23 18:06:04 +00:00
0841a1a478 fix OSError: [WinError 6] 'The handle is invalid' - add try/except in rar file extractall() - possible issue with rar5 files 2024-02-23 18:03:13 +00:00
196e215133 extra check for comment existence 2023-12-08 14:02:44 +00:00
f011cdcda0 docs edits 2023-12-08 14:01:48 +00:00
b6c52ac26f fixed list indentation for docs 2023-07-23 22:23:21 +01:00
d2d74754f4 docs updated 2023-07-17 11:59:01 +01:00
ea81d185eb remove favicon 2023-07-17 03:17:43 +01:00
1f02083119 update README-inspect for hyperlinks in CSV for file names/paths 2023-03-16 21:04:22 +00:00
e5b06c9baf HYPERLINKS for file paths and names in generated csv 2023-03-16 20:50:46 +00:00
9520e39590 check if gradebook/submissions dir exists and has files to inspect 2023-03-16 20:40:20 +00:00
80aaf3ee5d deleted __init__.py from root dir - not needed 2023-03-14 18:15:55 +00:00
e61940a7dc edit in Note for inspect gradebook section 2023-03-10 22:17:39 +00:00
0b49923aa8 updates in README-inspect 2023-03-10 17:40:37 +00:00
5efa7e72e7 minor change in README-inspect 2023-03-10 17:05:34 +00:00
8568e96a09 separate README for 'inspect by hash' & changes for 'inspect gradebook' 2023-03-10 16:52:34 +00:00
91d05e9e88 fix for gradebook_dir_name value 2023-03-10 12:57:37 +00:00
9ab782dcc9 added inspect_gradebook & code restructure for 'inspect by hash' feature 2023-03-10 12:50:14 +00:00
aff2644676 constants to settings.py 2023-03-10 12:48:29 +00:00
e1ef82fa18 moved constants to settings.py 2023-03-10 12:47:15 +00:00
e4c15291db changed dir for output csv files 2023-03-10 12:43:28 +00:00
bd34e0f5cc README edit 2023-03-07 20:15:03 +00:00
4bdace6683 README edits & remove requirements.txt 2023-03-07 20:02:56 +00:00
57573f8d03 more READMEs re-formatting & edits 2023-03-05 17:26:37 +00:00
ecce8de13c don't remove zip file if unknown Exception while extracting 2023-03-05 14:56:10 +00:00
c0b674311c major re-formatting and other edits in README 2023-03-05 13:37:07 +00:00
e7f451d4f6 check empty file hash - do not include empty files in hash list 2023-03-03 13:48:56 +00:00
3e677d9ddd added typing & more code cleanup 2023-03-03 13:13:28 +00:00
2b6fe45b42 READMEs update 2023-03-03 13:11:17 +00:00
964a1f6abb added TODO to .gitignore 2023-03-03 13:09:49 +00:00
7ae6929ffe markdown violations fix 2023-03-03 00:07:26 +00:00
cd66bf345d backtick didn't work 2023-03-02 23:55:19 +00:00
2a7298943e 2 text style changes 2023-03-02 23:53:31 +00:00
39606569f9 more on usage about 'exclude files from hashing' 2023-03-02 23:48:55 +00:00
42bbfe2ba1 added info+usage about 'exclude files from hashing' 2023-03-02 23:34:54 +00:00
22 changed files with 580 additions and 217 deletions

.gitignore vendored

@@ -117,9 +117,6 @@ venv.bak/
 # Rope project settings
 .ropeproject
-# mkdocs documentation
-/site
 # mypy
 .mypy_cache/
 .dmypy.json
@@ -129,10 +126,24 @@ dmypy.json
 .pyre/
 # BBGradebookOrganiser
 BB_gradebooks/
 BB_submissions/
-csv/
+csv-inspect/
 !BB_gradebooks/README.md
 !BB_submissions/README.md
+# mkdocs
+mkdocs.yml
+/site
+# vangef
+requirements.*.txt
+!requirements.txt
+___*.py
+venv*
+.TODO
+.NOTES

BB_gradebooks/README.md

@@ -1,6 +1,7 @@
 # BBGradebookOrganiser
 Blackboard Gradebook Organiser
-### Blackboard gradebooks directory: *BB_gradebooks*
+## Blackboard gradebooks directory: [***BB_gradebooks***](.)
-Create a directory with an appropriate name for the gradebook / assignment in this directory, and extract the downloaded gradebook .zip file in it.
+Create a directory with an appropriate name for the gradebook / assignment in this directory, and extract the downloaded gradebook *.zip* file in it.

BB_submissions/README.md

@@ -1,9 +1,14 @@
 # BBGradebookOrganiser
 Blackboard Gradebook Organiser
-### Blackboard submissions directory: *BB_submissions*
+## Blackboard submissions directory: [***BB_submissions***](.)
-- Gradebooks from directory *BB_gradebooks* will be organised into this directory, in a subdirectory with the same name
-- e.g. gradebook directory *AssignmentX* in *BB_gradebooks* will be organised into directory *AssignmentX* in *BB_submissions*
-- Also, a text file with all submission comments will be created in this directory, with the gradebook name as prefix
-- e.g. *AssignmentX_comments.txt* will be created for gradebook *AssignmentX*
+Gradebooks will be organised into this directory, in a subdirectory with the same name:
+- Gradebook directory `AssignmentX` in [*BB_gradebooks*](../BB_gradebooks/) will be organised into directory `AssignmentX` in [*BB_submissions*](.)
+Also, a text file with all submission comments will be created in this directory, with the gradebook name as prefix:
+- `AssignmentX_comments.txt` will be created for gradebook `AssignmentX`

README.md (deleted)

@@ -1,71 +0,0 @@
# BBGradebookOrganiser
Blackboard Gradebook Organiser
## Description
**Blackboard Gradebook Organiser** is a tool for organising a downloaded gradebook with assignment submissions from [Blackboard Learn](https://en.wikipedia.org/wiki/Blackboard_Learn).
The submission files are organised per student, by extracting the student number from the submission file names and creating a directory per student. Any compressed files (.zip, .rar, .7z) are extracted into the student's directory, with any remaining files submitted individually also moved into the student's directory. Student comments from submissions are also extracted into a single text file for convenient access and review.
Additionally, after organising submissions, you can inspect all submitted files to detect duplicated files from different submissions/students by generating and comparing SHA256 hashes. See section [Inspect submissions](#inspect-submissions-mag) for details.
### Features
- Extracts, and organises per student, the content of submitted compressed files with extensions: .zip, .rar, .7z
- Detects invalid/corrupt files
- Doesn't extract macOS system generated files (ignores directory *__MACOSX* inside the compressed file)
- Deletes each compressed file after successful extraction into student directory
- Organises per student any remaining individually submitted files
- Checks and extracts any comments from the student submission generated text files
- Checks if any compressed files (from the contents of the submitted compressed files) have been extracted and organised per student
- The path of any extracted and organised compressed files will be displayed on the terminal - they need to be extracted manually
- [Inspect submissions](#inspect-submissions-mag) by SHA256 hash :new:
## Instructions
### Download gradebook
- Go to the course page on Blackboard
- Go to *Grade Centre -> Full Grade Centre*
- Find assignment and click on the arrow for more options, and select *Assignment File Download*
- Select all (click *Show All* at the bottom first, to display all users) and click submit to generate the gradebook zip file
- Wait for the generated download link to appear, and click to download
### Extract gradebook
- Extract the downloaded gradebook in a new directory inside *BB_gradebooks*
### Run script
- Before running the script for the first time, install the required packages
- `python -m pip install -r requirements.txt`
- If running on Linux/Mac, you also need to have *unrar* installed in order to be able to extract .rar files
- `sudo apt install unrar` for Linux
- `brew install rar` for Mac
- Provide the name of the directory (from section *Extract gradebook* above) as an argument when running the script
- `python organise_gradebook.py GRADEBOOK_DIR_NAME`
- While running, the script displays on the terminal information and stats about the gradebook submissions and files
### Post-run
- All submission files can be found - organised in directories per student number - in directory *BB_submissions* under the sub-directory named after the gradebook name provided when running the script
- e.g. `python organise_gradebook.py GRADEBOOK_DIR_NAME` creates the directory *GRADEBOOK_DIR_NAME* inside *BB_submissions*
- Each student directory contains the student's extracted and individually submitted files, and the text file generated by Blackboard with the submission (which also contains any comments left by the student)
- All comments found in the gradebook are extracted in a text file in *BB_submissions*, with the gradebook name as prefix
- e.g. *AssignmentX_comments.txt* will be created for gradebook *AssignmentX*
- Compressed files are deleted after successfully extracting and organising the contents
- any invalid/corrupt compressed files are moved into folder *\_\_BAD\_\_* inside the gradebook directory
## Inspect submissions :mag:
### Description
- Generates SHA256 hashes for each submitted file, and outputs list to CSV file
- Compares the generated hashes and finds any duplicate hashes - ignores duplicates if they are by the same student/submission
- Finds all files with a duplicated hash and outputs them to CSV file with the following information: Student ID, file path, file name (without path), SHA256 hash
- Further inspection and filtering needs to be done manually, depending on the submission files
### Usage
- For this feature you also need to install the pandas package
- `python -m pip install pandas`
- Usage: `python inspect_submissions.py GRADEBOOK_DIR_NAME`
- Note: run *after* organising a gradebook with `organise_gradebook.py`
- Generated CSV files can be found in directory *csv*, with *GRADEBOOK_DIR_NAME* as file name prefix
- e.g. inspecting submissions for *AssignmentX* will create 2 csv files:
- AssignmentX_file_hashes_[datetime].csv
- AssignmentX_suspicious_[datetime].csv
## Notes
The Blackboard generated name for submission files must follow the pattern:
> ANYTHING_STUDENTNUMBER_attempt_DATETIME_FILENAME

__init__.py (deleted)

@@ -1 +0,0 @@

docs/CHANGELOG.md Normal file

@@ -0,0 +1,19 @@
# **CHANGELOG**
Blackboard Gradebook Organiser - main (functional) changes and new features log
## **Notable updates**
2024-04-30 Restructure documentation - separate *Inspect by hash*
2024-03-01 Allow customisation of default settings - most useful default to edit is `IGNORE_DIRS`: the list of names for directories, or files, to ignore when extracting from compressed files
2023-07-17 Documentation updated and web docs added at [docs.vangef.net/BBGradebookOrganiser](https://docs.vangef.net/BBGradebookOrganiser)
2023-03-16 Hyperlinks for file paths and names listed in generated CSV files by *inspect by hash*
2023-03-10 Added *inspect gradebook* and merged with *inspect submission* to make [***inspect by hash***](inspect/about.md)
2023-03-02 Added *exclude files from hashing*
2023-02-28 Added *inspect submission files by hash*

docs/README.md Normal file

@@ -0,0 +1,58 @@
# **BBGradebookOrganiser**
Blackboard Gradebook Organiser
**Documentation**: [docs.vangef.net/BBGradebookOrganiser](https://docs.vangef.net/BBGradebookOrganiser)
**Source Code**: [github.com/vangef/BBGradebookOrganiser](https://github.com/vangef/BBGradebookOrganiser)
## **Description**
**Blackboard Gradebook Organiser** is a tool for organising a downloaded gradebook with assignment submissions from [Blackboard Learn ⧉](https://en.wikipedia.org/wiki/Blackboard_Learn).
The submission files are organised per student, by extracting the student number from the submission file names and creating a directory per student. Compressed files are extracted into the student's directory, and any remaining individually submitted files are also moved into the student's directory. Student comments from the submissions are also extracted into a single text file for convenient access and review.
Optionally, you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. See [Inspect by hash](inspect/about.md) for more information.
## **Features**
- Extracts, and organises per student, the content of submitted compressed files with extensions: `.zip`, `.rar`, `.7z`
- Detects invalid/corrupt files
- Skips extracting files and directories if their path contains any of the *ignored dirs*, as set in *settings.py* - ignored directories by default:
- `__MACOSX` (macOS system generated files)
- `.git` (git repo files)
- `node_modules` (npm)
- `vendor` (composer / laravel)
- Deletes each compressed file after successful extraction into student directory
- Organises per student any remaining individually submitted files
- Checks and extracts any comments from the student submission generated text files
- Checks if any compressed files (from the contents of the submitted compressed files) have been extracted and organised per student
- The path of any extracted and organised compressed files will be displayed on the terminal - they need to be extracted manually
- [Inspect by hash](inspect/about.md) generates and compares SHA256 hashes of all the submitted files, and detects files that are identical and have been submitted by multiple students. Two ways to inspect:
- Inspect gradebook: Before organising a gradebook - for identical files in the files submitted to *Blackboard*
- Inspect submissions: After organising a gradebook - for identical files in the files extracted from any submitted *compressed* files
## **Instructions**
See the documentation for [Requirements & Settings](instructions/requirements-settings.md) and [Usage](instructions/usage.md) instructions, and more information & details about [***Inspect by hash***](inspect/about.md).
## **General notes**
The Blackboard generated name for submission files must follow the pattern:
> ANYTHING_STUDENTNUMBER_attempt_DATETIME_FILENAME
## **Changelog**
See [***Changelog***](CHANGELOG.md) for notable changes and updates.

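The filename pattern noted above is what the organiser and the inspect scripts rely on to recover the student number. As a minimal sketch (the example filename and its values are hypothetical, but the regular expression mirrors the `_parse_filename` helper added to *utils/organiser.py* further down):

```python
import re

# Hypothetical Blackboard-generated filename: ANYTHING_STUDENTNUMBER_attempt_DATETIME_FILENAME
filename = 'AssignmentX_12345678_attempt_2024-10-24-22-57-20_report.pdf'

# STUDENTNUMBER is the digit run before '_attempt_'; DATETIME follows it
match = re.match(r'^(.*?)_(\d+)_attempt_(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})', filename)
if match:
    student_number, timestamp = match.group(2), match.group(3)
    print(student_number, timestamp)  # 12345678 2024-10-24-22-57-20
```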
docs/img/favicon.ico Normal file (binary, 15 KiB)

Binary file not shown.

docs/inspect/about.md Normal file

@@ -0,0 +1,29 @@
# **Inspect by hash** :mag:
Blackboard Gradebook Organiser - Inspect gradebook & submissions by hash
## **Description**
With **Inspect by hash** you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. The tool has two variations:
[*Inspect gradebook*](usage.md#inspect-gradebook): Before organising a gradebook - for identical files in the files submitted to *Blackboard*
[*Inspect submissions*](usage.md#inspect-submissions): After organising a gradebook - for identical files in the files extracted from any submitted compressed files
## **Features**
- Generates SHA256 hashes for each submitted file, and outputs the list to a CSV file
- Can exclude files from hashing, if provided with a CSV file listing the file names (only applicable for *Inspect submissions*)
- Compares the generated hashes and finds any duplicates - ignores duplicates if they are by the same student/submission
- Finds all files with the same hash and outputs the list to a CSV file with the following information:
- *Inspect gradebook*: `Student ID`, `file name`, `SHA256 hash`
- *Inspect submissions*: `Student ID`, `file path`, `file name`, `SHA256 hash`
- File names and paths listed in the generated CSV files have hyperlinks to the actual files for a quick inspection of the file contents (or running the files, if executable)
*Note:* Further analysis needs to be done manually by inspecting and filtering the generated output, depending on the submission and its files.

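The duplicate detection described above reduces to a pandas groupby over the hash column. A condensed sketch of the same idea, with `nunique()` standing in for the repository's `agg(lambda ...)` and a hypothetical DataFrame shaped like the generated CSV:

```python
import pandas as pd

# Hypothetical hash records, shaped like the generated CSV
df = pd.DataFrame([
    {'Student ID': 's1', 'filename': 'a.py', 'sha256 hash': 'h1'},
    {'Student ID': 's2', 'filename': 'b.py', 'sha256 hash': 'h1'},  # same hash, different student -> flagged
    {'Student ID': 's3', 'filename': 'c.py', 'sha256 hash': 'h2'},
    {'Student ID': 's3', 'filename': 'd.py', 'sha256 hash': 'h2'},  # same hash, same student -> ignored
])
dupes = df[df.duplicated(subset=['sha256 hash'], keep=False)]      # hashes appearing more than once
multi = dupes.groupby('sha256 hash')['Student ID'].nunique() > 1   # True only when several students share a hash
print(df[df['sha256 hash'].isin(multi[multi].index)])              # only the 'h1' rows survive
```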

@@ -0,0 +1,9 @@
# **Inspect by hash** :mag:
## **Requirements**
The ***inspect*** scripts require the `pandas` package - if it's not already installed, run:
```console
python -m pip install pandas
```

docs/inspect/usage.md Normal file

@@ -0,0 +1,45 @@
# **Using Inspect by hash** :mag:
## **Inspect gradebook**
If you haven't already, extract the gradebook downloaded from *Blackboard* into a new directory inside *BB_gradebooks*
- e.g. for `AssignmentX` extract the gradebook in *BB_gradebooks*/`AssignmentX`
To inspect a *gradebook* run **`inspect_gradebook.py`** and provide the name of the gradebook directory as an argument, e.g. for the gradebook `AssignmentX` run:
```console
python inspect_gradebook.py AssignmentX
```
**Note:** run ***before*** organising a gradebook with *organise_gradebook.py* (or extract the downloaded gradebook again if you want to inspect it after organising its submissions)
Generated CSV files can be found in directory `csv-inspect`, with the inspected gradebook's name as file name prefix - e.g. inspecting gradebook `AssignmentX` will create 2 CSV files:
- `AssignmentX_gradebook_file_hashes_[datetime].csv` - all files and their hashes
- `AssignmentX_gradebook_duplicate_[datetime].csv` - files with duplicate hashes
## **Inspect submissions**
To inspect *submissions* run **`inspect_submissions.py`** and provide the name of the directory with the *organised* gradebook submissions as an argument.
- e.g. for the organised gradebook `AssignmentX` (in *BB_submissions*/`AssignmentX`) run:
```console
python inspect_submissions.py AssignmentX
```
**Note:** run ***after*** organising a gradebook with *organise_gradebook.py*
Generated CSV files can be found in directory `csv-inspect`, with the inspected submission's name as file name prefix - e.g. inspecting submissions for `AssignmentX` will create 2 CSV files:
- `AssignmentX_submissions_file_hashes_[datetime].csv` - all files and their hashes
- `AssignmentX_submissions_duplicate_[datetime].csv` - files with duplicate hashes
*(Optional)* In order to exclude submission files from hashing, create a CSV file in directory `csv-inspect` to provide the file names to be excluded - e.g. for `AssignmentX` create:
- `AssignmentX_excluded.csv` with a column named `exclude_filename` and list the file names
**Note:** the directory *csv-inspect* is automatically created when you run *inspect_gradebook.py* or *inspect_submissions.py* - if you want to exclude files before the first run, you need to create it manually.

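For reference, the exclusion file is a one-column CSV. A minimal sketch that creates one before the first run (the file names listed are examples; the `exclude_filename` header is the one the inspector expects):

```python
import csv, os

os.makedirs('csv-inspect', exist_ok=True)  # normally created by the inspect scripts themselves
with open(os.path.join('csv-inspect', 'AssignmentX_excluded.csv'), 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['exclude_filename'])  # required column name
    writer.writerow(['readme.md'])         # example file names to skip when hashing
    writer.writerow(['makefile'])
```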
docs/instructions/requirements-settings.md Normal file

@@ -0,0 +1,37 @@
# **Requirements & Settings**
## **Install requirements**
Before running the script for the first time, install the required python packages:
Option 1 - Install `py7zr`, `rarfile`
```console
python -m pip install py7zr rarfile
```
Option 2 - Install all packages, including `pandas` which is used in [Inspect by hash](../inspect/about.md), using the requirements file
```console
python -m pip install -r requirements.txt
```
**Note**: If running on Linux/Mac, you also need to have `unrar` installed in order to be able to extract `.rar` files (applies for both options 1 and 2)
- `sudo apt install unrar` for Linux
- `brew install rar` for Mac
## (Optional) **Edit settings**
You can change the default settings by editing *utils/settings.py*. The main setting you might want to edit is `IGNORE_DIRS` - the list of names for directories, or files, to ignore when extracting from compressed files.
Ignored directories by default:
- `__MACOSX` (macOS system generated files)
- `.git` (git repo files)
- `node_modules` (npm)
- `vendor` (composer / laravel)

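The identifiers above all come from *utils/settings.py*. That file itself is not shown in this compare, so the following is only a plausible sketch reconstructed from the names used in the diffs; the values of `MULTIPLE_DIR_NAME` and `MIN_FILESIZE_IN_BYTES` in particular are assumptions:

```python
# utils/settings.py - sketch; names taken from the diffs, some values assumed
BB_GRADEBOOKS_DIR = 'BB_gradebooks'    # downloaded gradebooks
BB_SUBMISSIONS_DIR = 'BB_submissions'  # organised submissions
CSV_DIR = 'csv-inspect'                # output directory for the inspect scripts
BAD_DIR_NAME = '__BAD__'               # invalid/corrupt compressed files
MULTIPLE_DIR_NAME = '__MULTIPLE__'     # assumed name for the older-attempts directory
MIN_FILESIZE_IN_BYTES = 0              # assumed default; files at or below this size are skipped
IGNORE_DIRS = ['__MACOSX', '.git', 'node_modules', 'vendor']  # skipped when extracting
TRACKED_FILE_EXT = ['.zip', '.rar', '.7z', '.txt']            # extensions counted in the stats
```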
docs/instructions/usage.md Normal file

@@ -0,0 +1,57 @@
# **Using BBGradebookOrganiser**
## **Download gradebook**
1. Go to the course page on Blackboard
2. Go to *Grade Centre -> Full Grade Centre*
3. Find the assignment and click on the arrow for more options, and select *Assignment File Download*
4. Select all (click *Show All* at the bottom first, to display all users) and click submit to generate the gradebook zip file
5. Wait for the generated download link to appear, and click to download
## **Extract gradebook**
Extract the downloaded gradebook in a new directory inside *BB_gradebooks*.
- e.g. for `AssignmentX` extract the gradebook in *BB_gradebooks*/`AssignmentX`
## **Organise gradebook**
To organise the gradebook run **`organise_gradebook.py`** and provide the name of the directory with the *extracted* gradebook (from section *Extract gradebook* above) as an argument.
- e.g. for gradebook `AssignmentX` (in *BB_gradebooks*/`AssignmentX`) run:
```console
python organise_gradebook.py AssignmentX
```
While running, the script displays on the terminal information and stats about the gradebook submissions and files.
## **Post-run**
All submission files can be found - organised in directories per student number - in directory *BB_submissions*, under the sub-directory named after the gradebook name provided when running the script.
- e.g. `organise_gradebook.py AssignmentX` creates the directory `AssignmentX` inside *BB_submissions*
Each student directory contains:
- the extracted files from the submitted `.zip`, `.rar`, `.7z`
- the individually submitted files
- the text file generated by Blackboard for the submission (which also contains any comments left by the student)
All comments found in the gradebook are extracted in a text file in *BB_submissions*, with the gradebook name as prefix.
- e.g. `AssignmentX_comments.txt` will be created for gradebook `AssignmentX`
Compressed files are deleted after successfully extracting and organising the contents.
- Any invalid/corrupt compressed files are moved into folder `__BAD__` inside the gradebook directory
## **Inspect by hash** :mag:
See [***Inspect by hash***](../inspect/about.md) for more information & details.

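To make the comments file concrete: based on the write in *utils/organiser.py* below, each entry in `AssignmentX_comments.txt` has roughly this shape (all values hypothetical):

```python
# One entry in AssignmentX_comments.txt, mirroring the f-string in _organise_file_per_student()
entry = (
    '\nStudent number: 12345678 - Student name: Jane Doe\n'
    'File: BB_gradebooks/AssignmentX/AssignmentX_12345678_attempt_2024-10-24-22-57-20.txt\n'
    'Comment: Please mark my latest attempt.\n'
)
print(entry)
```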

@@ -0,0 +1,5 @@
.md-header__source {
min-width: 12.5rem!important;
}

inspect_gradebook.py Normal file

@@ -0,0 +1,20 @@
import os, sys
from utils.inspector import generate_hashes_gradebook, generate_duplicate_hashes_gradebook
from utils.settings import BB_GRADEBOOKS_DIR
def main():
    gradebook_dir_name = ' '.join(sys.argv[1:]) if len(sys.argv) > 1 else exit(f'\nNo gradebook directory name given. Provide the name as an argument.\n\nUsage: python {sys.argv[0]} [gradebook dir name]\nExample: python {sys.argv[0]} AssignmentX\n')
    gradebook_dir_path = os.path.join(BB_GRADEBOOKS_DIR, gradebook_dir_name)
    if not os.path.exists(gradebook_dir_path):
        exit('[Info] Gradebook directory does not exist - nothing to inspect')
    if not os.listdir(gradebook_dir_path): # if no files in gradebook dir
        exit(f'[Info] No files found in this gradebook - nothing to inspect')
    hashes_csv_file_path = generate_hashes_gradebook(gradebook_dir_path) # generate CSV file with hashes for all files in gradebook & return path to CSV file for finding duplicate hashes
    generate_duplicate_hashes_gradebook(hashes_csv_file_path) # generate CSV file with files having duplicate hashes
if __name__ == '__main__':
    main()

inspect_submissions.py

@@ -1,16 +1,19 @@
 import os, sys
-from utils.inspector import hash_submissions, inspect_for_duplicate_hashes
-CSV_DIR = os.path.join(os.getcwd(), 'csv')
+from utils.inspector import generate_hashes_submissions, generate_duplicate_hashes_submissions
+from utils.settings import BB_SUBMISSIONS_DIR
 def main():
-    submissions_dir_name = ' '.join(sys.argv[1:]) if len(sys.argv) > 1 else exit(f'\nNo submissions dir name given. Provide the name as an argument.\n\nUsage: python {sys.argv[0]} [submissions dir name]\nExample: python {sys.argv[0]} AssignmentX\n')
+    submissions_dir_name = ' '.join(sys.argv[1:]) if len(sys.argv) > 1 else exit(f'\nNo submissions directory name given. Provide the name as an argument.\n\nUsage: python {sys.argv[0]} [submissions dir name]\nExample: python {sys.argv[0]} AssignmentX\n')
-    submissions_dir_path = os.path.join('BB_submissions', submissions_dir_name)
-    if not os.path.isdir(submissions_dir_path):
-        exit(f'Directory {submissions_dir_path} does not exist.\nMake sure "{submissions_dir_name}" exists in "BB_submissions".\n')
-    else:
-        hashes_csv_file_path = hash_submissions(submissions_dir_path) # generate CSV file with hashes for all files (except for any 'excluded') & return path to CSV file for finding duplicate/suspicious hashes
-        inspect_for_duplicate_hashes(hashes_csv_file_path) # generate CSV file with files having duplicate/suspicious hashes
+    submissions_dir_path = os.path.join(BB_SUBMISSIONS_DIR, submissions_dir_name)
+    if not os.path.exists(submissions_dir_path):
+        exit('[Info] Directory does not exist - nothing to inspect')
+    if not os.listdir(submissions_dir_path): # if no files in dir
+        exit(f'[Info] No files found in this submissions directory - nothing to inspect')
+    hashes_csv_file_path = generate_hashes_submissions(submissions_dir_path) # generate CSV file with hashes for all files in submissions (except for any 'excluded') & return path to CSV file for finding duplicate hashes
+    generate_duplicate_hashes_submissions(hashes_csv_file_path) # generate CSV file with files having duplicate hashes
 if __name__ == '__main__':

organise_gradebook.py

@@ -1,18 +1,20 @@
 import os, sys
 from utils.organiser import organise_gradebook, check_submissions_dir_for_compressed
+from utils.settings import BB_GRADEBOOKS_DIR, BB_SUBMISSIONS_DIR
 def main():
     gradebook_name = ' '.join(sys.argv[1:]) if len(sys.argv) > 1 else exit(f'\nNo gradebook name given. Provide the name as an argument.\n\nUsage: python {sys.argv[0]} [gradebook dir name]\n')
-    gradebook_dir = os.path.join('BB_gradebooks', gradebook_name) # gradebook from Blackboard with all submissions
-    submissions_dir = os.path.join('BB_submissions', gradebook_name) # target dir for extracted submissions
+    gradebook_dir = os.path.join(BB_GRADEBOOKS_DIR, gradebook_name) # gradebook from Blackboard with all submissions
+    submissions_dir = os.path.join(BB_SUBMISSIONS_DIR, gradebook_name) # target dir for extracted submissions
     abs_path = os.getcwd() # absolute path of main/this script
-    print(f'\nGradebook directory to organise: {os.path.join(abs_path, gradebook_dir)}')
+    print(f'\nGradebook directory to organise:\n{os.path.join(abs_path, gradebook_dir)}', flush=True)
     organise_gradebook(gradebook_dir, submissions_dir)
     check_submissions_dir_for_compressed(submissions_dir)
 if __name__ == '__main__':
     main()

requirements.txt

@@ -1,4 +1,5 @@
-# py7zr==0.20.2
-# rarfile==4.0
+# for organise gradebook script
 py7zr
 rarfile
+# for inspect gradebook/submissions scripts
+pandas

utils/extractor.py

@@ -2,33 +2,33 @@ import os, shutil, platform
 import zipfile, rarfile
 from py7zr import SevenZipFile, exceptions
-BAD_DIR_NAME = '__BAD__'
+from utils.settings import BAD_DIR_NAME, IGNORE_DIRS
-def mark_file_as_BAD(file, bad_exception):
+def mark_file_as_BAD(file: str, bad_exception: Exception) -> None:
     try:
         filename = os.path.basename(file)
         bad_dir = os.path.join(os.path.dirname(file), BAD_DIR_NAME)
         os.makedirs(bad_dir, exist_ok=True)
         bad_file_path = os.path.join(bad_dir, filename)
         shutil.move(file, bad_file_path)
-        print(f'[Warning] Found BAD compressed file: {filename}\nMoved to: {bad_file_path}\nError message: {bad_exception}')
+        print(f'\n[Warning] Found BAD compressed file: {filename}\nMoved to: {bad_file_path}\nError message: {bad_exception}\n', flush=True)
     except Exception as e:
-        print(f'[Error] {e}')
+        print(f'\n[ERROR] {e}\n', flush=True)
-def extract_zip(zip_file, target_dir):
+def extract_zip(zip_file: str, target_dir: str) -> None | Exception:
     try:
         with zipfile.ZipFile(zip_file, 'r') as zip_ref:
-            members = [ m for m in zip_ref.infolist() if "__MACOSX" not in m.filename ]
-            zip_ref.extractall(target_dir, members=members) # extract all files, ignoring those with the "__MACOSX" string in the name
+            members = [ m for m in zip_ref.infolist() if not any(dir_name in m.filename for dir_name in IGNORE_DIRS) ] # filter out files/dirs using IGNORE_DIRS
+            zip_ref.extractall(target_dir, members=members) # extract remaining files
             zip_ref.close()
     except zipfile.BadZipfile as e:
         mark_file_as_BAD(zip_file, e)
     except Exception as e:
-        print(f'[ERROR] Something went wrong while extracting zip contents. Check the error message, get student id and download / organise manually\nError message: {e}')
+        print(f'\n[ERROR] Something went wrong while extracting the contents of a submitted zip file. Check the error message, get student id and download / organise manually\n\nError message: {e}\n', flush=True)
+        return e
-def extract_rar(rar_file, target_dir):
+def extract_rar(rar_file: str, target_dir: str) -> None:
     try:
         with rarfile.RarFile(rar_file, 'r') as rar_ref:
             if platform.system() == 'Windows':
@@ -36,26 +36,27 @@ def extract_rar(rar_file, target_dir):
             else: # if Linux or Mac
                 rarfile.UNRAR_TOOL = 'unrar'
             files = rar_ref.namelist()
-            files = [ f for f in files if "__MACOSX" not in f ] # filter out files with "__MACOSX" in the name
+            files = [ f for f in files if not any(dir_name in f for dir_name in IGNORE_DIRS) ] # filter out files/dirs using IGNORE_DIRS
             rar_ref.extractall(target_dir, files) # extract the remaining files
             rar_ref.close()
+    except OSError as e:
+        mark_file_as_BAD(rar_file, e)
     except rarfile.BadRarFile as e:
         mark_file_as_BAD(rar_file, e)
     except rarfile.NotRarFile as e:
         mark_file_as_BAD(rar_file, e)
     except rarfile.RarCannotExec as e:
-        print('[Error] Missing unrar tool\nfor Windows: make sure file UnRAR.exe exists in directory \'utils\'\nfor Linux/Mac: need to install unrar (check README)')
+        print('\n[ERROR] Missing unrar tool\nfor Windows: make sure file UnRAR.exe exists in directory \'utils\'\nfor Linux/Mac: need to install unrar (check README)\n', flush=True)
         exit()
-def extract_7z(seven_zip_file, target_dir):
+def extract_7z(seven_zip_file: str, target_dir: str) -> None:
     try: # extract the 7z file using py7zr
         with open(seven_zip_file, 'rb') as f:
             seven_zip = SevenZipFile(seven_zip_file, mode='r')
             if not seven_zip.getnames():
                 raise exceptions.Bad7zFile
             files = seven_zip.getnames()
-            files = [ f for f in files if "__MACOSX" not in f ] # filter out files with "__MACOSX" in the name
+            files = [ f for f in files if not any(dir_name in f for dir_name in IGNORE_DIRS) ] # filter out files/dirs using IGNORE_DIRS
             seven_zip.extract(target_dir, targets=files) # extract the remaining files
             seven_zip.close()
     except exceptions.Bad7zFile as e:
@@ -63,15 +64,14 @@ def extract_7z(seven_zip_file, target_dir):
     except Exception as e:
         mark_file_as_BAD(seven_zip_file, e)
-def extract_file_to_dir(file_path, student_dir):
+def extract_file_to_dir(file_path: str, student_dir: str) -> None | Exception:
     os.makedirs(student_dir, exist_ok=True) # create the subdirectory for student
     if file_path.lower().endswith('.zip'):
-        extract_zip(file_path, student_dir)
+        return extract_zip(file_path, student_dir)
     elif file_path.lower().endswith('.rar'):
         extract_rar(file_path, student_dir)
     elif file_path.lower().endswith('.7z'):
         extract_7z(file_path, student_dir)
     else:
-        print(f"[Error] unknown file type: {file_path}")
+        print(f'\n[ERROR] unknown file type: {file_path}\n', flush=True)

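One subtlety in the `IGNORE_DIRS` filtering introduced above: the check is a plain substring match on each archive member's path, so it also skips entries that merely contain an ignored name. A small standalone demonstration (paths hypothetical, defaults as documented):

```python
IGNORE_DIRS = ['__MACOSX', '.git', 'node_modules', 'vendor']  # documented defaults

names = [
    'src/app.py',
    '__MACOSX/._app.py',   # skipped: macOS metadata directory
    'project/.gitignore',  # also skipped: contains '.git' as a substring
]
kept = [n for n in names if not any(dir_name in n for dir_name in IGNORE_DIRS)]
print(kept)  # ['src/app.py']
```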
utils/inspector.py

@@ -3,77 +3,125 @@ from datetime import datetime
 import csv
 import hashlib
 import pandas as pd
+from functools import partial
-CSV_DIR = os.path.join(os.getcwd(), 'csv')
+from utils.settings import CSV_DIR, BB_GRADEBOOKS_DIR, BB_SUBMISSIONS_DIR, MIN_FILESIZE_IN_BYTES
 def load_excluded_filenames(submissions_dir_name: str) -> list[str]: # helper function for hashing all files
     csv_file_path = os.path.join(CSV_DIR, f'{submissions_dir_name}_excluded.csv')
     if not os.path.exists(csv_file_path): # if csv file with excluded file names for submission does not exist
-        print(f'[WARNING] Cannot find CSV file with list of excluded file names: {csv_file_path}\n[INFO] All files will be hashed & inspected')
+        print(f'[WARNING] Cannot find CSV file with list of excluded file names: {csv_file_path}\n[INFO] All files will be hashed & inspected', flush=True)
         return [] # return empty list to continue without any excluded file names
     else: # if csv file with excluded file names for submission exists
         try:
             df = pd.read_csv(csv_file_path)
             filename_list = df['exclude_filename'].tolist() # get the values of the 'filename' column as a list
-            print(f'[INFO] Using CSV file with list of excluded file names: {csv_file_path}')
+            filename_list = [ f.lower() for f in filename_list ] # convert to lowercase for comparison with submission files
+            print(f'[INFO] Using CSV file with list of excluded file names: {csv_file_path}', flush=True)
             return filename_list
         except Exception as e: # any exception, print error and return empty list to continue without any excluded file names
-            print(f'[WARNING] Unable to load / read CSV file with list of excluded file names: {csv_file_path}\n[INFO] All files will be hashed & inspected')
-            print(f'[INFO] Error message: {e}')
+            print(f'[WARNING] Unable to load / read CSV file with list of excluded file names: {csv_file_path}\n[INFO] All files will be hashed & inspected', flush=True)
+            print(f'[INFO] Error message: {e}', flush=True)
             return []
 def get_hashes_in_dir(dir_path: str, excluded_filenames: list = []) -> list: # helper function for hashing all files
     hash_list = []
     for subdir, dirs, files in os.walk(dir_path): # loop through all files in the directory and generate hashes
         for filename in files:
-            if filename not in excluded_filenames: # do not hash for inspection file names in the excluded list
+            if filename.lower() not in excluded_filenames: # convert to lowercase for comparison with excluded files & do not hash if in the excluded list
                 filepath = os.path.join(subdir, filename)
-                with open(filepath, 'rb') as f:
-                    filehash = hashlib.sha256(f.read()).hexdigest()
-                hash_list.append({ 'filepath': filepath, 'filename': filename, 'sha256 hash': filehash})
+                if os.path.getsize(filepath) > MIN_FILESIZE_IN_BYTES: # file size more than MIN_FILESIZE_IN_BYTES (as set in settings.py)
+                    with open(filepath, 'rb') as f:
+                        filehash = hashlib.sha256(f.read()).hexdigest()
+                    #if filehash != 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855': # do not include hashes of empty files
+                    hash_list.append({ 'filepath': filepath, 'filename': filename, 'sha256 hash': filehash})
+                # else:
+                #     print(f'size: {os.path.getsize(filepath)}B, {filepath}')
     return hash_list
+def generate_hashes_gradebook(gradebook_dir_path: str) -> str: # main function for hashing all files in gradebook
+    gradebook_dir_name = os.path.abspath(gradebook_dir_path).split(os.path.sep)[-1] # get name of gradebook by separating path and use rightmost part
+    if not os.path.isdir(gradebook_dir_path):
+        exit(f'Directory {gradebook_dir_path} does not exist.\nMake sure "{gradebook_dir_name}" exists in "{BB_GRADEBOOKS_DIR}".\n')
+    dicts_with_hashes_list = get_hashes_in_dir(gradebook_dir_path)
+    for hash_dict in dicts_with_hashes_list:
+        student_id = hash_dict['filename'].split('_attempt_')[0].split('_')[-1]
+        relative_path = os.path.join('..', hash_dict["filepath"])
+        hash_dict['filename'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filename"]}")'
+        del hash_dict['filepath']
+        hash_dict.update({'Student ID': student_id})
-def hash_submissions(submissions_dir_path: str) -> str: # main function for hashing all files
     os.makedirs(CSV_DIR, exist_ok=True)
-    submissions_dir_name = os.path.abspath(submissions_dir_path).split(os.path.sep)[-1] # get name of submission/assignment by separating path and use rightmost part
-    excluded_filenames = load_excluded_filenames(submissions_dir_name)
-    csv_file_name = f'{submissions_dir_name}_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv'
+    csv_file_name = f'{gradebook_dir_name}_gradebook_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv'
     csv_file_path = os.path.join(CSV_DIR, csv_file_name)
-    with open(csv_file_path, 'w', newline='') as csvfile: # open the output CSV file for writing
+    with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile: # open the output CSV file for writing
+        fieldnames = ['Student ID', 'filename', 'sha256 hash']
+        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+        writer.writeheader()
+        writer.writerows(dicts_with_hashes_list)
+    print(f'[INFO] Created CSV file with all files & hashes in gradebook: {gradebook_dir_name}\nCSV file: {csv_file_path}', flush=True)
+    return csv_file_path
+def generate_hashes_submissions(submissions_dir_path: str) -> str: # main function for hashing all files in submissions
+    submissions_dir_name = os.path.abspath(submissions_dir_path).split(os.path.sep)[-1] # get name of submission/assignment by separating path and use rightmost part
+    if not os.path.isdir(submissions_dir_path):
+        exit(f'Directory {submissions_dir_path} does not exist.\nMake sure "{submissions_dir_name}" exists in "{BB_SUBMISSIONS_DIR}".\n')
+    excluded_filenames = load_excluded_filenames(submissions_dir_name)
+    dicts_with_hashes_list = []
+    for student_dir_name in os.listdir(submissions_dir_path): # loop through each student dir to get hashes for all files per student
+        student_dir_path = os.path.join(submissions_dir_path, student_dir_name)
+        student_dicts_with_hashes_list = get_hashes_in_dir(student_dir_path, excluded_filenames) # dict with hashes for all student files - except for 'excluded' file names
+        student_dicts_list = []
+        for hash_dict in student_dicts_with_hashes_list:
+            hash_dict.update({'Student ID': student_dir_name}) # update hash records with student id
+            relative_path = os.path.join('..', hash_dict["filepath"])
+            hash_dict['filepath'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filepath"]}")'
+            hash_dict['filename'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filename"]}")'
+            student_dicts_list.append(hash_dict) # append file dict to student list of dict for csv export
+        dicts_with_hashes_list.append(student_dicts_list) # append student hashes to main list with all submissions
+    os.makedirs(CSV_DIR, exist_ok=True)
+    csv_file_name = f'{submissions_dir_name}_submissions_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv'
+    csv_file_path = os.path.join(CSV_DIR, csv_file_name)
+    with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile: # open the output CSV file for writing
         fieldnames = ['Student ID', 'filepath', 'filename', 'sha256 hash']
         writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
         writer.writeheader()
-        for student_dir_name in os.listdir(submissions_dir_path): # loop through each student dir to get hashes for all files per student
-            student_dir_path = os.path.join(submissions_dir_path, student_dir_name)
-            hashes_dict = get_hashes_in_dir(student_dir_path, excluded_filenames) # dict with hashes for all student files - except for 'excluded' file names
-            for d in hashes_dict:
-                d.update({'Student ID': student_dir_name}) # update hash records with student id
-            writer.writerows(hashes_dict)
-    print(f'[INFO] Created CSV file with all files & hashes in {submissions_dir_name}\nCSV file: {csv_file_path}')
+        for student_dict in dicts_with_hashes_list:
+            writer.writerows(student_dict)
+    print(f'[INFO] Created CSV file with all files & hashes for submissions in: {submissions_dir_name}\nCSV file: {csv_file_path}', flush=True)
     return csv_file_path
-def inspect_for_duplicate_hashes(hashes_csv_file_path: str): # main function for finding duplicate / suspicious hashes
+def generate_duplicate_hashes_generic(hashes_csv_file_path: str, drop_columns: list[str]):
     csv = pd.read_csv(hashes_csv_file_path)
     df = pd.DataFrame(csv) # df with all files and their hashes
-    drop_columns = ['filepath', 'filename'] # only need to keep 'student id' and 'sha256 hash' for groupby later
-    df = df.drop(columns=drop_columns) # clear not needed columns
-    duplicate_hash = df.loc[df.duplicated(subset=['sha256 hash'], keep=False), :] # all files with duplicate hash - incl. files from the same student id
-    hash_with_multiple_student_ids = duplicate_hash.groupby('sha256 hash').agg(lambda x: len(x.unique())>1) # true if more than 1 unique student ids (= files with the same hash by multiple student ids), false if unique student id (= files from the same student id with the same hash)
-    suspicious_hashes_list = hash_with_multiple_student_ids[hash_with_multiple_student_ids['Student ID']==True].index.to_list() # list with duplicate hashes - only if different student id (doesn't include files from same student id)
-    files_with_suspicious_hash = df[df['sha256 hash'].isin(suspicious_hashes_list)] # df with all files with duplicate/suspicious hash, excludes files from the same student id
-    df_suspicious = files_with_suspicious_hash.sort_values(['sha256 hash', 'Student ID']) # sort before output to csv
+    df_clean = df.drop(columns=drop_columns) # clear not needed columns
+    duplicate_hash = df_clean.loc[df_clean.duplicated(subset=['sha256 hash'], keep=False), :] # all files with duplicate hash - incl. files from the same student id
+    # agg() for 'Student ID' True if more than 1 in groupby (= files with the same hash by multiple student ids)
+    # False if unique (= files from the same student id with the same hash)
+    hash_with_multiple_student_ids = duplicate_hash.groupby('sha256 hash').agg(lambda x: len(x.unique())>1)
+    # list with duplicate hashes - only if different student id (doesn't include files from same student id)
+    duplicate_hashes_list = hash_with_multiple_student_ids[hash_with_multiple_student_ids['Student ID']==True].index.to_list()
+    files_with_duplicate_hash = df[df['sha256 hash'].isin(duplicate_hashes_list)] # df with all files with duplicate hash, excludes files from the same student id
+    df_duplicate = files_with_duplicate_hash.sort_values(['sha256 hash', 'Student ID']) # sort before output to csv
+    gradebook_or_submissions_str = os.path.basename(hashes_csv_file_path).split('_file_hashes_')[0].split('_')[-1] # 'gradebook' or 'submissions' depending on which files hashes csv is read
+    assignment_name = os.path.basename(hashes_csv_file_path).split(f'_{gradebook_or_submissions_str}_')[0]
+    csv_out = hashes_csv_file_path.rsplit('_', 1)[0].replace('file_hashes', 'duplicate_') + datetime.now().strftime("%Y%m%d-%H%M%S") + '.csv'
     try:
-        submissions_dir_name = os.path.basename(hashes_csv_file_path).split('_file_hashes_')[0]
-        csv_out = hashes_csv_file_path.rsplit('_', 1)[0].replace('file_hashes', 'suspicious_') + datetime.now().strftime("%Y%m%d-%H%M%S") + '.csv'
-        df_suspicious.to_csv(csv_out, index=False)
-        print(f'[INFO] Created CSV file with duplicate/suspicious hashes in {submissions_dir_name}\nCSV file: {csv_out}')
+        df_duplicate.to_csv(csv_out, index=False)
+        print(f'[INFO] Created CSV file with duplicate hashes in {gradebook_or_submissions_str}: {assignment_name}\nCSV file: {csv_out}', flush=True)
     except Exception as e:
-        exit(f'[ERROR] Something went wrong while trying to save csv file with suspicious hashes\nError message: {e}')
+        exit(f'[ERROR] Something went wrong while trying to save csv file with duplicate hashes\nError message: {e}')
+# partials for generate_duplicate_hashes_generic(), setting the appropriate drop_columns for gradebook / submissions
+generate_duplicate_hashes_gradebook = partial(generate_duplicate_hashes_generic, drop_columns=['filename'])
+generate_duplicate_hashes_submissions = partial(generate_duplicate_hashes_generic, drop_columns=['filepath', 'filename'])

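The `functools.partial` lines at the end of the inspector diff simply pre-fill `drop_columns`, so the two public names remain drop-in replacements for the old per-mode functions. A tiny standalone sketch of the idiom (the function and names here are illustrative only):

```python
from functools import partial

def find_duplicates(csv_path: str, drop_columns: list[str]) -> None:
    print(f'inspecting {csv_path}, dropping columns {drop_columns}')

# one implementation, two specialised entry points - as in utils/inspector.py
find_duplicates_gradebook = partial(find_duplicates, drop_columns=['filename'])
find_duplicates_submissions = partial(find_duplicates, drop_columns=['filepath', 'filename'])

find_duplicates_gradebook('hashes.csv')  # inspecting hashes.csv, dropping columns ['filename']
```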
utils/organiser.py

@@ -1,12 +1,47 @@
import os, shutil, re import os, shutil, re
from collections import defaultdict
from utils.extractor import extract_file_to_dir from utils.extractor import extract_file_to_dir
from utils.settings import BAD_DIR_NAME, MULTIPLE_DIR_NAME, BB_GRADEBOOKS_DIR, IGNORE_DIRS, TRACKED_FILE_EXT
BAD_DIR_NAME = '__BAD__'
def validate_gradebook_dir_name(src_dir): def _parse_filename(file_path: str) -> tuple[str, str] | None:
"""Extract STUDENTNUMBER and DATETIME from the filename."""
pattern = r'^(.*?)_(\d+)_attempt_(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})(?:_.*)?(?:\..+)?$'
match = re.match(pattern, file_path)
if match:
return match.group(2), match.group(3) # STUDENTNUMBER, DATETIME
return None, None
def _filter_multiple_attempts(directory: str) -> None:
"""Keep only the latest attempt for each student and move older attempts to MULTIPLE_DIR_NAME."""
submissions = defaultdict(list)
multiple_folder = os.path.join(directory, MULTIPLE_DIR_NAME)
os.makedirs(multiple_folder, exist_ok=True)
# collect all valid files
for filename in os.listdir(directory):
filepath = os.path.join(directory, filename)
if os.path.isfile(filepath):
student_number, timestamp = _parse_filename(filename)
if student_number and timestamp:
submissions[student_number].append((timestamp, filepath))
# process submissions
for student, files in submissions.items():
files.sort(reverse=True, key=lambda x: x[0]) # sort by timestamp (most recent first)
latest_timestamp = files[0][0] # get the most recent timestamp
# keep all files from the latest attempt, move older ones
for timestamp, filepath in files:
if timestamp != latest_timestamp:
shutil.move(filepath, os.path.join(multiple_folder, os.path.basename(filepath)))
print(f"\n[Info] Multiple submission attempts filtering completed.\nOlder submissions moved to folder: {MULTIPLE_DIR_NAME}")
def _validate_gradebook_dir_name(src_dir: str) -> None:
if not os.path.isdir(src_dir): # check if it exists and is a directory if not os.path.isdir(src_dir): # check if it exists and is a directory
print(f"\n[Error] Incorrect directory: {src_dir}\n[Info] Make sure the directory exists in 'BB_gradebooks'") print(f'\n[ERROR] Incorrect directory: {src_dir}\n[Info] Make sure the directory exists in "{BB_GRADEBOOKS_DIR}"')
exit() exit()
if not os.listdir(src_dir): # check if there are any files in the directory if not os.listdir(src_dir): # check if there are any files in the directory
print(f'\n[Info] No files found in this gradebook - nothing to organise') print(f'\n[Info] No files found in this gradebook - nothing to organise')
@@ -15,34 +50,62 @@ def validate_gradebook_dir_name(src_dir):
print(f'\n[Info] Gradebook has only invalid compressed files in: {os.path.join(src_dir, BAD_DIR_NAME)}\n[Info] Nothing to organise') print(f'\n[Info] Gradebook has only invalid compressed files in: {os.path.join(src_dir, BAD_DIR_NAME)}\n[Info] Nothing to organise')
exit() exit()
def _get_comment_from_submission_txt(file_path: str) -> tuple[str, str] | None:
no_comment_regex = f'Comments:\nThere are no student comments for this assignment.'
no_comment_pattern = re.compile(no_comment_regex)
def get_comment_from_submission_txt(file_path): with open(file_path, encoding='utf-8') as f:
no_comment_text = f'Comments:\nThere are no student comments for this assignment.'
no_comment_text_regex = no_comment_text
no_comment_regex_compile = re.compile(no_comment_text_regex)
with open(file_path) as f:
file_contents = f.read() file_contents = f.read()
if not no_comment_regex_compile.findall(file_contents): if not no_comment_pattern.findall(file_contents):
regular_expression = f'Comments:\n.*' comment_regex = f'Comments:\n.*'
regex_compile = re.compile(regular_expression) name_regex = f'^Name:\s*.*'
match = regex_compile.findall(file_contents) comment_pattern = re.compile(comment_regex)
match = str(match).replace('\\n', '').replace('[','').replace(']','').replace('"','') name_pattern = re.compile(name_regex)
match = str(match).split('Comments:')[-1] if comment_pattern.findall(file_contents):
return match comment_match = comment_pattern.findall(file_contents)[0]
comment = comment_match.split('\n')[1]
name_match = name_pattern.findall(file_contents)[0]
name = name_match.split('Name:')[1].split('(')[0].strip() or ''
return comment, name
return None, None
def _get_comment_from_submission_txt_BB_ultra(file_path: str) -> tuple[str, str] | None:
with open(file_path, encoding='utf-8') as f:
file_contents = f.read()
match = re.search(r'Submission Field:\s*<br>(.*)', file_contents, re.DOTALL) # find the section starting with "Submission Field: <br>"
if not match:
return None, None
section = match.group(1)
section = re.sub(r'\s*<p><a href.*?</a>', '', section, flags=re.DOTALL) # remove the part starting with "<p><a href" and ending with "</a></p>"
paragraphs = re.findall(r'<p>(.*?)</p>', section, re.DOTALL) or None # extract text inside <p> tags
if not paragraphs:
return None, None
cleaned_text = '\n'.join(p.replace('<br>', '\n') for p in paragraphs) # replace <br> with new lines within paragraphs
if not cleaned_text:
return None, None
name_regex = f'^Name:\s*.*'
name_pattern = re.compile(name_regex)
name_match = name_pattern.findall(file_contents)[0]
name = name_match.split('Name:')[1].split('(')[0].strip() or ''
def get_gradebook_stats(src_dir): return cleaned_text.strip(), name # comment, name
def _get_gradebook_stats(src_dir: str) -> dict[str, int]:
    all_files = [ os.path.join(src_dir, f) for f in os.listdir(src_dir) if BAD_DIR_NAME not in f and MULTIPLE_DIR_NAME not in f ]
    dirs = [ f for f in all_files if os.path.isdir(f) and BAD_DIR_NAME not in f and MULTIPLE_DIR_NAME not in f ]
    normal_files = [ f for f in all_files if os.path.isfile(f) ]
    files_counter = {}
    files_counter['all'], files_counter['dirs'], files_counter['normal'] = len(all_files), len(dirs), len(normal_files)
    tracked_files_counter = 0
    for ext in TRACKED_FILE_EXT:
        files_counter[ext] = len([ f for f in normal_files if f.lower().endswith(ext) ])
        tracked_files_counter += files_counter[ext]
    files_counter['untracked'] = files_counter['normal'] - tracked_files_counter
    dirs_msg = f'. Also found {len(dirs)} dir(s), wasn\'t expecting any!' if len(dirs) else ''
    tracked_files_list = [ f'{files_counter[ext]} {ext}' for ext in TRACKED_FILE_EXT ]
    tracked_msg = ', '.join(tracked_files_list)
    msg = (f'\n[Stats] Gradebook contains {files_counter["all"]} file(s){dirs_msg}'
           f'\n[Stats] Tracking {len(TRACKED_FILE_EXT)} file extension(s), files found: {tracked_msg}'
           f'\n[Stats] Files with untracked extension: {files_counter["untracked"]}')
    print(msg, flush=True)
    return files_counter
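
Illustratively, for a gradebook with the default TRACKED_FILE_EXT from utils/settings.py, the returned dict has this shape (the counts here are made up):

# hypothetical counts - shape of the dict returned by _get_gradebook_stats
{
    'all': 120, 'dirs': 0, 'normal': 120,
    '.zip': 55, '.rar': 3, '.7z': 1, '.txt': 58, '.pde': 0,
    'untracked': 3,
}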
def _organise_file_per_student(src_dir: str, dest_dir: str, file_name: str, student_no: str) -> None:
    student_dir = os.path.join(dest_dir, student_no)
    os.makedirs(student_dir, exist_ok=True)  # create student directory if it doesn't exist
    file_path = os.path.join(src_dir, file_name)
    if os.path.isfile(file_path):
        file_path_lowercase = file_path.lower()
        if file_path_lowercase.endswith(('.zip', '.rar', '.7z')):
            exception_flag = extract_file_to_dir(file_path, student_dir)  # extract the file to student directory
            # check if compressed file exists (or it was BAD and moved), and no exception was returned from extracting - remove if both true
            if os.path.exists(file_path) and exception_flag is None:
                os.remove(file_path)  # delete compressed file after successful extraction
        elif file_path_lowercase.endswith('.txt'):
            comment, name = _get_comment_from_submission_txt_BB_ultra(file_path)  # get student comment (if any), and name, from submission txt file
            if comment and name:
                comments_filename = f'{dest_dir}_comments.txt'
                with open(comments_filename, 'a', encoding='utf-8') as f:  # utf-8 so comments with unicode/emoji don't fail on write
                    f.write(f'\nStudent number: {student_no} - Student name: {name}\nFile: {file_path}\nComment: {comment}\n')
        else:
            try:
                file_name = file_name.split('_attempt_', 1)[1].split('_', 1)[1]  # rename any remaining files before moving - remove the BB generated info added to the original file name
            except IndexError:
                print(f'Cannot process file {file_name} - possible incorrect format of filename', flush=True)
            new_file_path = os.path.join(student_dir, os.path.basename(file_name))
            shutil.move(file_path, new_file_path)  # move the file to student directory
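
To illustrate the rename and the student-number extraction, a worked example on a hypothetical Blackboard filename (the exact BB naming scheme is assumed from the split logic, not documented here). The maxsplit=1 arguments keep the splits stable even if the student's original file name itself contains '_attempt_' or underscores:

bb_name = 'CS101 Assignment 1_s1234567_attempt_2025-02-21-17-49-03_report.pdf'  # hypothetical
student_no = bb_name.split('_attempt_', 1)[0].split('_')[-1]   # -> 's1234567'
original = bb_name.split('_attempt_', 1)[1].split('_', 1)[1]   # -> 'report.pdf'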
def organise_gradebook(src_dir: str, dest_dir: str) -> None:
    """1) extracts .zip, .rar, .7z files, organises contents into directories per student number, and deletes compressed files after successful extraction
    2) organises all other files in gradebook into directories per student number
    3) checks if there are any comments in submission text files and extracts them into a file
    """
    _validate_gradebook_dir_name(src_dir)  # check if dir exists, and has files in it - exits if not
    os.makedirs(dest_dir, exist_ok=True)  # create the destination directory if it doesn't exist
    _filter_multiple_attempts(src_dir)
    print('\nGetting gradebook stats...', flush=True)
    files_counter = _get_gradebook_stats(src_dir)  # print stats about the files in gradebook and get files_counter dict to use later
    students_numbers: list[str] = []  # list to add and count unique student numbers from all files in gradebook
    print('\nStart organising... (this may take a while depending on the number -and size- of submissions)\n', flush=True)
    for file_name in os.listdir(src_dir):  # iterate through all files in the directory
        if BAD_DIR_NAME not in file_name and MULTIPLE_DIR_NAME not in file_name:  # ignore dirs BAD_DIR_NAME (created after first run if corrupt compressed files found) and MULTIPLE_DIR_NAME (dir with older attempts)
            student_no = file_name.split('_attempt_', 1)[0].split('_')[-1]  # get student number from file name !! pattern might need adjusting if file name format from blackboard changes !!
            students_numbers.append(student_no)
            _organise_file_per_student(src_dir, dest_dir, file_name, student_no)
    ignored_str = ', '.join(IGNORE_DIRS)
    print(f'[Info] Skipped extracting files in dirs whose name includes any of the following strings: {ignored_str}\n', flush=True)
    abs_path = os.getcwd()  # absolute path of main script
    print(f'[Info] Submissions organised into directory: {os.path.join(abs_path, dest_dir)}\n', flush=True)
    print(f'[Info] Unique student numbers in gradebook files: {len(set(students_numbers))}\n', flush=True)
    if files_counter['.txt'] == 0:
        print('[Info] No submission text files found, file with comments not created\n', flush=True)
    else:
        print(f'[Info] Comments in file: {dest_dir}_comments.txt\n', flush=True)
    print('[Info] Compressed files (.zip, .rar, .7z) are automatically deleted from the gradebook directory after successful extraction\n', flush=True)
def check_submissions_dir_for_compressed(submissions_dir: str) -> None:
    """checks if any submitted compressed files contain more compressed files inside (they are not recursively extracted)
    \nprints the location of any compressed files that need to be extracted manually
    """
    compressed_files: list[str] = []
    abs_path = os.getcwd()
    for the_path, dirc, files in os.walk(submissions_dir):
        for fname in files:
            ...  # extension check elided in this diff hunk (context lines not shown)
    if compressed_files:
        compressed_files_str = '\n'.join(compressed_files)
        print(f'\n[Warning] One or more compressed files found in the extracted and organised submission files ({len(compressed_files)} found in total)')
        print('\n[Info] See below the list of compressed files, organised per student, and extract them manually if necessary:\n')
        print(compressed_files_str)
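
A minimal usage sketch (assumed - the actual entry-point script is not shown in this diff) wiring the two functions together with the defaults from utils/settings.py; 'assignment1' is a hypothetical gradebook name:

import os
from utils.settings import BB_GRADEBOOKS_DIR, BB_SUBMISSIONS_DIR

gradebook_dir = os.path.join(BB_GRADEBOOKS_DIR, 'assignment1')
submissions_dir = os.path.join(BB_SUBMISSIONS_DIR, 'assignment1')
organise_gradebook(gradebook_dir, submissions_dir)
check_submissions_dir_for_compressed(submissions_dir)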

utils/settings.py (new file, 16 lines)
import os

BB_GRADEBOOKS_DIR = 'BB_gradebooks'  # directory with extracted gradebooks downloaded from Blackboard
BB_SUBMISSIONS_DIR = 'BB_submissions'  # directory with organised gradebook submissions
BAD_DIR_NAME = '__BAD__'  # for organise_gradebook.py - directory with corrupt/invalid compressed files
MULTIPLE_DIR_NAME = '__multiple__'  # for organise_gradebook.py - directory with older attempts/submissions when there is more than one; the script organises only the most recent
CSV_DIR = os.path.join(os.getcwd(), 'csv-inspect')  # for inspect_gradebook.py and inspect_submissions.py - output dir for generated CSV files
IGNORE_DIRS = [ '__MACOSX', '.git', 'node_modules', 'vendor' ]  # list of dir names to ignore from extracting
TRACKED_FILE_EXT = [ '.zip', '.rar', '.7z', '.txt', '.pde' ]  # add extensions to this list to track stats for more file types

# inspect
MIN_FILESIZE_IN_BYTES = 10  # files this small are treated as empty and skipped from inspection
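
Since these are plain module-level constants, customising the defaults is just a matter of editing this file - e.g., a sketch (values are illustrative) that also tracks Python and Java submissions in the stats and skips virtualenv dirs when extracting:

TRACKED_FILE_EXT = [ '.zip', '.rar', '.7z', '.txt', '.pde', '.py', '.java' ]
IGNORE_DIRS = [ '__MACOSX', '.git', 'node_modules', 'vendor', '.venv' ]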