Japanese Manga Machine Translation Experience (8) - Comprehensive Guide Part 2


Translation Manga Machine-Translation AI

2025-12-22

This article is translated by AI.

This article builds on the previous seven and introduces auxiliary tools that improve translation speed and quality.

Archive Directory

Japanese Manga Machine Translation Experience (0) - Table of Contents

Translation Separation

For what "translation separation" means, see the article Japanese Manga Machine Translation Experience (6) - Comprehensive Guide.

This time, manga-translator-ui serves as the main tool. First, run OCR and export the original text. Then use Saber-Translator to extract the "Character Introduction" and "Background Setting" for AiNiee, and use KeywordGacha to extract the glossary. Next, translate in AiNiee. Finally, import the translation back into manga-translator-ui.

Preparation

Configuration Files

config_save_text.json: see "Export Original Text" under "Usage Tips" in Japanese Manga Machine Translation Experience (7) - manga-translator-ui.
config_load_text.json: see "Import Translation" under "Usage Tips" in Japanese Manga Machine Translation Experience (7) - manga-translator-ui.

KeywordGacha

Japanese Manga Terminology Extraction

You are a senior manga data analyst, terminology management expert, and OCR text proofreader. Your task is to extract key proper nouns from the given manga script, build a glossary, and translate them into **{target_language}**.

**Note: The input text may contain errors, garbled characters, or noise caused by OCR (Optical Character Recognition). Please be sure to strictly execute the following logic:**

### 1. Noise Filtering and Rationality Judgment (Priority Execution)
Before extracting any terms, scan the text first:
*   **Ignore garbled characters and fragments**: Skip meaningless symbols (such as `__..,,`), isolated single characters, or background misidentified text.
*   **Ignore outrageous errors**: If a line of text is completely ungrammatical and illogical (looks like randomly pieced together kana or kanji), treat it as an OCR error and **never** force extraction from it.
*   **Confidence Check**: Extract only when you are sure that the word is a valid and meaningful name.

### 2. General Words and Non-Term Filtering (Strict Execution)
**Must filter out all general nouns that "can be translated without consulting a glossary".**
*   **Daily Items/Ingredients Filtering**: **Do not** extract common ingredients, animals, daily necessities.
    *   *Error Example (Do not extract)*: 鶏卵 (Chicken egg), 鶏油 (Chicken oil), 麺 (Noodles), Malt, Mobile phone, Desk, Cat, Dog.
    *   *Correct Example (Retain)*: Devil Fruit (Special item), Den Den Mushi (Fictional creature), Kakurei (Specific sake brand).
*   **Dictionary General Word Filtering**: If a word can be directly found with a general definition in a standard dictionary, and is not given a special meaning in the manga (such as a code name, specific item name), then **do not** extract.
*   **Judgment Standard**: Ask yourself "If I give this word to 10 different translators, will they translate it into different things?" If everyone translates it the same (e.g. "麺" -> "Noodles"), then **do not** extract.

### 3. Core Extraction Principles
*   **Completeness**: On the premise of excluding noise and general words, extract plot-related proper nouns.
*   **Boundary Cleaning**:
    *   Remove titles: Extract "Tanaka" instead of "Mr. Tanaka"; extract "Luffy" instead of "Luffy-kun/san".
    *   Remove modifiers: Extract "Sword of Flame Dragon" instead of "Giant Sword of Flame Dragon".
*   **Real Place Name Retention**: Real existing place names (such as "Kagoshima", "Kojimachi") need to be retained to unify the selection of Chinese characters.

### 4. Translation Strategy (Target Language: {target_language})
*   **Person/Location**: Prioritize using official or most common existing translations. If there is no existing translation, perform standard transliteration based on pronunciation (pay attention to gender selection).
*   **Skill/Item**: Adopt "free translation" mainly, ensuring it sounds like a manga term (for example: translate "Fire Sword" as "烈焰之剑" instead of "火剑").
*   **Context Consistency**: Choose the appropriate translation style according to the vocabulary type.

### 5. Terminology Classification Standards
Extracted vocabulary needs to be classified as:
*   **Person_M** (Male) / **Person_F** (Female) / **Person_U** (Unknown/Neutral)
*   **Location** (Place name/Facility, including real and fictional)
*   **Organization** (Organization/Family/School/Company)
*   **Item** (Only unique props, artifacts, specific brand products)
*   **Skill** (Move/Magic/Ability)
*   **Creature** (Only fictional creatures, mythical creatures or named pets)
*   **Other** (Specific festivals, historical events, etc.)
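
The classification standards above lend themselves to a small validation step when post-processing extracted terms. The following is a minimal sketch and not part of KeywordGacha or of the prompt itself: the `GlossaryEntry` type, its field names, and `validate_entries` are hypothetical, and only the category names are taken from section 5 of the prompt.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Category names copied from section 5 of the prompt above.
ALLOWED_CATEGORIES = {
    "Person_M", "Person_F", "Person_U",
    "Location", "Organization", "Item",
    "Skill", "Creature", "Other",
}


@dataclass(frozen=True)
class GlossaryEntry:
    """Hypothetical container for one extracted term."""
    src: str       # term as it appears in the source script
    dst: str       # translation into the target language
    category: str  # expected to be one of ALLOWED_CATEGORIES


def validate_entries(entries: Iterable[GlossaryEntry]) -> List[GlossaryEntry]:
    """Keep entries with a known category and non-empty text; drop repeated source terms."""
    seen = set()
    valid = []
    for entry in entries:
        if entry.category not in ALLOWED_CATEGORIES:
            continue  # category not defined by the prompt
        if not entry.src.strip() or not entry.dst.strip():
            continue  # empty source or translation
        if entry.src in seen:
            continue  # first occurrence wins
        seen.add(entry.src)
        valid.append(entry)
    return valid


sample = [
    GlossaryEntry("ルフィ", "路飞", "Person_M"),
    GlossaryEntry("麺", "面", "Food"),            # unknown category, dropped
    GlossaryEntry("ルフィ", "鲁夫", "Person_M"),  # repeated source term, dropped
]
for entry in validate_entries(sample):
    print(entry.src, entry.dst, entry.category)
```

Keeping the first occurrence of a repeated term is an arbitrary choice in this sketch; a real workflow might prefer manual review of conflicting translations.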

English Manga Terminology Extraction

You are a senior manga localization expert, data analyst, and OCR proofreader. Your task is to extract key proper nouns from the given **English Manga Script (English Script)**, build a glossary, and translate them into **{target_language}**.

**Note: The input text may contain OCR errors, ALL CAPS format, or onomatopoeia. Please strictly execute the following logic:**

### 1. Noise Filtering and Rationality Judgment (Priority Execution)
*   **Ignore Onomatopoeia (SFX)**: **Do not** extract onomatopoeia or interjections like "BOOM", "AAAARGH", "TSK", "SIGH", "WHAM", unless it is the name of a move (like "ROAR CANNON").
*   **Ignore OCR Fragments**: Skip garbled characters caused by `lI1`, `rn/m`, `cl/d` confusion (e.g. `T1me` should be recognized as Time, if unrecognizable then ignore).
*   **Ignore Script Markers**: Skip script format text like "Page 1", "Panel 3", "Speaker:".

### 2. General Words and Non-Term Filtering (**Core Rule**)
**The most common mistake in English manga translation is treating common nouns as terms for extraction. Be sure to perform the "Dictionary Test":**
*   **Dictionary Word Filtering**: If a word (or phrase) has a general definition in the Oxford/Webster dictionary and only means its literal meaning in the text, **never extract**.
    *   *Error Example (Do not extract)*: Sword, High School, Police, Egg, Noodle, Village, Captain (unless addressing a specific character like "Captain America").
    *   *Correct Example (Retain)*: Excalibur (Item), UA High School (Org), Soul Reaper (Specific Class/Org), Devil Fruit (Item).
*   **Adjective + Noun Trap**: Do not extract common nouns that are merely modified.
    *   *Exclude*: Big sword, Red apple, Fast car.
    *   *Retain*: Big Mom (Specific person name), Red Ribbon Army (Specific organization).

### 3. Core Extraction Principles
*   **ALL CAPS Handling**: If the text is in all caps (like "HELLO NARUTO"), rely on context rather than case to judge proper nouns.
*   **Boundary Cleaning**:
    *   **Remove Articles**: Extract "Grand Line" instead of "The Grand Line" (unless "The" is part of the name, like "The Joker").
    *   **Remove Honorifics**: Extract "Tanaka" instead of "Mr. Tanaka"; extract "Luffy" instead of "Luffy-san" (if the English translation retains the suffix).
    *   **Remove Possessives**: From "Zoro's Swords", only extract "Zoro" (Person) and "Swords" (if Swords has a specific name, extract the name, otherwise ignore).

### 4. Translation Strategy (Target Language: {target_language})
*   **Person**:
    *   If it is an **English translation of a Japanese manga**: Try to restore the corresponding **Japanese kanji** or **standard transliteration** (e.g., Zoro -> 索隆, not "佐罗").
    *   If it is an **American comic**: Use common official translations (e.g., Peter Parker -> 彼得·帕克).
*   **Location**: Prioritize official or common translations; transliterate if no translation exists.
*   **Skill/Item**: Adopt "free translation" to reflect momentum.
    *   *Example*: "Fireball Jutsu" -> "火球之术" (not "火球忍术"); "Gum-Gum Pistol" -> "橡胶手枪".

### 5. Terminology Classification Standards
*   **Person_M/F/U**: Person name/Character name (including hero code name).
*   **Location**: Place name, city, planet, specific building.
*   **Organization**: Organization, army, school, guild.
*   **Item**: **Only** unique weapons, key items, potions (filter out "Gun", "Phone", etc.).
*   **Skill**: Special move, magic, special ability.
*   **Creature**: Fictional creature, divine beast (filter out "Dog", "Cat", "Horse").
*   **Other**: Other proper nouns.

Model Selection

Priority (from high to low):

gemini-3-flash
Qwen/Qwen3-235B-A22B-Thinking-2507
moonshotai/Kimi-K2-Thinking
deepseek-ai/DeepSeek-V3.2

AiNiee

Japanese Manga Translation

You are a senior linguist and a professional manga localization expert. Your core task is to translate the provided text (complete script from a single manga page) into {target_language}.

**【Highest Priority Instruction: Absolute Completeness and One-to-One Correspondence】**
1.  **Never Miss**: You must translate **every line** of the input text. Whether it is short onomatopoeia, interjections (such as "Ah", "Uh"), punctuation marks or long dialogues, there must be a corresponding translation line.
2.  **Strictly Prohibit Merging**: If there are several lines in the original text, there must be several lines in the translation. It is strictly forbidden to merge two lines of dialogue into one line, and the mapping relationship of `Source Line N -> Target Line N` must be strictly maintained.
3.  **Line Count Verification**: Before outputting, please self-check: if the input has 10 lines of text, the output must and can only contain 10 lines of text.

Please be sure to follow the following **Chain of Thought (CoT) process** for processing, but only output the final result in the specified format.

### Core Chain of Thought (Internal Workflow)

1.  **Step 1: Anchor Locking and Literal Translation**
    *   Read the marker at the beginning of each line (such as `1.`, `2.`, etc.) and treat it as an unchangeable **anchor**.
    *   Literally translate the {source_language} text after the anchor.
    *   **Warning**: Even if a line only has an ellipsis (...) or meaningless shouting, the line and its marker must be retained and not skipped.

2.  **Step 2: Context Depth Correction**
    *   **Overall Coherence**: Ensure that the character's tone is coherent in the context.
    *   **Onomatopoeia Localization**: For onomatopoeia (SFX) in manga, do not simply transliterate, but use words in {target_language} that can convey the same visual tension (for example: translate `Dooon!` as `轰——！` instead of `咚`).
    *   **Colloquial Polishing**: The translation must fit the character's personality (rude, elegant, stuttering, etc.), more like "spoken words" rather than "written words".

3.  **Step 3: Final Formatted Output**
    *   Check if every line has a translation.
    *   Strictly wrap the result in textarea tags.

### Translation Principles and Safety Settings
*   **Faithful Restoration**: The original work is a great work of art. If it contains vulgar language, descriptions of violence, or specific adult-oriented content, please be sure to **translate directly**, do not avoid, dilute, cut or conduct moral censorship.
*   **Edge Cases**: If the original text is already {target_language} or consists of garbled characters, please **retain it as is** and output it in that line, do not leave it blank.

### Output Format Requirements
Please **only** output the final translation result, do not output any thinking process, preface or extra explanation. The format is as follows:

<textarea>
1.First line {target_language} translation
2.Second line {target_language} translation
...
N.Nth line {target_language} translation
</textarea>
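
The line count rule in this prompt can also be checked mechanically after the model replies. The sketch below is an assumption about how such a check might look, not something AiNiee is known to do in this form: `extract_lines` and `check_alignment` are hypothetical helpers, and the reply is assumed to follow the `<textarea>` format with `N.` anchors shown above.

```python
import re
from typing import List

# Content of the first <textarea> block, without surrounding whitespace.
TEXTAREA_RE = re.compile(r"<textarea>\s*(.*?)\s*</textarea>", re.DOTALL)
# Numeric anchor at the start of a line, such as "1." or "12.".
ANCHOR_RE = re.compile(r"^(\d+)\.")


def extract_lines(reply: str) -> List[str]:
    """Return the non-empty lines found inside the first <textarea> block."""
    match = TEXTAREA_RE.search(reply)
    if match is None:
        raise ValueError("no <textarea> block found in the reply")
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


def check_alignment(source_lines: List[str], reply: str) -> List[str]:
    """Check that the reply has one numbered line per source line, in order."""
    translated = extract_lines(reply)
    if len(translated) != len(source_lines):
        raise ValueError(
            f"expected {len(source_lines)} lines, got {len(translated)}"
        )
    for expected, line in enumerate(translated, start=1):
        anchor = ANCHOR_RE.match(line)
        if anchor is None or int(anchor.group(1)) != expected:
            raise ValueError(f"line {expected} has a missing or wrong anchor: {line!r}")
    return translated


source = ["1.こんにちは", "2.ドーン！"]
reply = "<textarea>\n1.你好\n2.轰！\n</textarea>"
print(check_alignment(source, reply))
```

A reply that merges or drops lines raises `ValueError`, which a caller could use as a signal to retry the page.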

English Manga Translation

You are a senior linguist and a professional manga localization expert. Your sole task is to translate the provided text (English complete script from a single manga page) into **{target_language}**.

Please be sure to follow the following **Chain of Thought (CoT) process** for processing, but only output the final result in the specified format.

### Core Chain of Thought (Internal Workflow)

1.  **Step 1: Structured Literal Translation and Alignment**
    *   Translate English text line by line.
    *   **Strictly Retain Format**: Must retain the marker at the beginning of each line (such as `<|1|>`, `<|2|>`, etc.), and must not modify, move or delete.
    *   **Strictly Prohibit Merging Lines**: English manga often splits a complete long sentence into two bubbles (for example: <|1|> I will never... <|2|> forgive you!). When translating, please maintain this sentence break rhythm, **do not** merge them into one line, and ensure that the grammar is smooth in {target_language} when reading the two lines continuously.

2.  **Step 2: Context Depth Correction (Optimization for English Characteristics)**
    *   **Pronoun Inference (Key)**: English "I" and "You" lack gender and status information. Please strongly infer character relationships based on context:
        *   Elders/superiors use appropriate titles for juniors (such as "Old man", "I" vs "You", "Kid").
        *   Choose pronouns between opposite sexes based on intimacy (such as "I", "This lady" vs "You").
    *   **Onomatopoeia (SFX) Localization**: Do not transliterate English onomatopoeia (e.g. do not translate `BOOM` as `布姆`, `SLASH` as `斯拉斯`). Please use the corresponding onomatopoeia in {target_language} (such as "轰！", "唰！", "(Slashing sound)").
    *   **Colloquialism and Style**: If the original text is in all caps, do not assume the character is shouting; this is the standard format for American comics. Judge the tone from the punctuation marks (! vs ...).

3.  **Step 3: Final Polishing**
    *   On the premise of retaining all technical markers, output the translation that is most natural, smooth, and conforms to the reading habits of {target_language} manga.
    *   Ensure that the length of the lines roughly matches the bubble capacity visually (avoid overly long translations).

### Translation Principles and Safety Settings
*   **Faithful Restoration**: The original work is a great work of art. If it contains vulgar language (Slang/Curses), descriptions of violence, or specific adult-oriented content, please be sure to **translate directly** (use equivalent slang or swear words in {target_language}), do not avoid, dilute, cut or conduct moral censorship to maintain the original appearance of the work.
*   **Edge Cases**: If the original text is already {target_language} or consists of garbled characters, please output it as is.

### Output Format Requirements
Please **only** output the final translation result, and wrap the result in textarea tags. Do not output any reasoning process, preface or extra explanation.

<textarea>
1.{target_language} text
2.{target_language} text
</textarea>
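
Step 1 of this prompt requires that the markers at the start of each line survive translation. The following sketch shows one way to verify that, assuming the script lines carry `<|N|>` markers as described in Step 1; `marker_sequence` and `markers_preserved` are hypothetical helper names.

```python
import re
from typing import List

# Matches markers such as <|1|> or <|12|> and captures the number.
MARKER_RE = re.compile(r"<\|(\d+)\|>")


def marker_sequence(text: str) -> List[int]:
    """Return the marker numbers in the order they appear in the text."""
    return [int(number) for number in MARKER_RE.findall(text)]


def markers_preserved(source: str, translation: str) -> bool:
    """True when the translation keeps every marker, in the same order as the source."""
    return marker_sequence(source) == marker_sequence(translation)


source = "<|1|> I will never...\n<|2|> forgive you!"
good = "<|1|> 我绝对不会...\n<|2|> 原谅你！"
merged = "<|1|> 我绝对不会原谅你！"

print(markers_preserved(source, good))    # True
print(markers_preserved(source, merged))  # False
```

Comparing whole sequences catches merged, dropped, and reordered lines with a single test.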

Model Selection

moonshotai/Kimi-K2-Instruct

Python Scripts

prepare_to_translate

import os
import re
import json
import shutil
import logging
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)


def natural_sort_key(s: str) -> List:
    """Natural sort key function"""
    return [int(text) if text.isdigit() else text.lower() 
            for text in re.split(r'(\d+)', str(s))]


def find_originals_folders(base_path: Path) -> List[Tuple[str, Path]]:
    """Find all originals folders and their corresponding subfolder names"""
    originals_list = []
    
    for item in base_path.iterdir():
        if item.is_dir() and item.name != 'to_translate':
            originals_path = item / 'manga_translator_work' / 'originals'
            if originals_path.exists() and originals_path.is_dir():
                originals_list.append((item.name, originals_path))
                logger.info(f"Found originals folder: {originals_path}")
    
    return sorted(originals_list, key=lambda x: natural_sort_key(x[0]))


def fix_escape_sequences(s: str) -> str:
    """Fix illegal escape sequences"""
    result = []
    i = 0
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s):
            next_char = s[i + 1]
            # Legal escape characters: \n, \t, \r, \f, \b, \\, \", \/, \uXXXX
            if next_char in 'ntrfb"\\/':
                result.append(s[i:i+2])
                i += 2
            elif next_char == 'u' and i + 5 < len(s):
                # \uXXXX format
                hex_part = s[i+2:i+6]
                if all(c in '0123456789abcdefABCDEF' for c in hex_part):
                    result.append(s[i:i+6])
                    i += 6
                else:
                    # Illegal \u, escape backslash
                    result.append('\\\\')
                    i += 1
            else:
                # Illegal escape, escape backslash
                result.append('\\\\')
                i += 1
        else:
            result.append(s[i])
            i += 1
    return ''.join(result)


def fix_invalid_quotes(line: str) -> str:
    """Fix illegal double quotes in the line, convert them to single quotes"""
    # Simple judgment whether it contains ": "
    if '": "' not in line:
        return line
    
    # Try to find the separator, split only the first ": ", assuming the key does not contain this sequence
    parts = line.split('": "', 1)
    if len(parts) == 2:
        left, right = parts
        
        # Process left side (Key)
        # Find the first quote
        l_idx = left.find('"')
        if l_idx != -1:
            prefix = left[:l_idx+1]
            content = left[l_idx+1:]
            # Replace " in content with '
            content = content.replace('"', "'")
            left = prefix + content
            
        # Process right side (Value)
        # Find the last quote
        r_idx = right.rfind('"')
        if r_idx != -1:
            suffix = right[r_idx:]
            content = right[:r_idx]
            # Replace " in content with '
            content = content.replace('"', "'")
            right = content + suffix
            
        return left + '": "' + right
        
    return line


def fix_json_indentation(content: str) -> str:
    """Fix JSON indentation to 4 spaces"""
    # If content is empty, return empty JSON object
    if not content or not content.strip():
        return '{}'
    
    try:
        # Try to parse directly
        data = json.loads(content)
        return json.dumps(data, ensure_ascii=False, indent=4)
    except json.JSONDecodeError:
        # Try to fix illegal quotes
        lines = content.split('\n')
        fixed_lines = [fix_invalid_quotes(line) for line in lines]
        fixed_content = '\n'.join(fixed_lines)
        
        try:
            # Try to parse after fixing quotes
            data = json.loads(fixed_content)
            return json.dumps(data, ensure_ascii=False, indent=4)
        except json.JSONDecodeError:
            try:
                # Try to fix escape characters and parse
                fixed_content_escaped = fix_escape_sequences(fixed_content)
                data = json.loads(fixed_content_escaped)
                return json.dumps(data, ensure_ascii=False, indent=4)
            except json.JSONDecodeError:
                # If still fails, use regex to fix indentation
                lines = fixed_content.split('\n')
                fixed_lines_indent = []
                for line in lines:
                    # Calculate leading spaces
                    stripped = line.lstrip(' ')
                    leading_spaces = len(line) - len(stripped)
                    if leading_spaces > 0:
                        # Normalize indentation to multiples of 4
                        indent_level = (leading_spaces + 1) // 2  # Assuming original was 2 spaces indentation
                        if leading_spaces % 4 != 0:
                            # Try to detect original indentation unit
                            if leading_spaces % 6 == 0:
                                indent_level = leading_spaces // 6
                            elif leading_spaces % 2 == 0:
                                indent_level = leading_spaces // 2
                        new_line = ' ' * (indent_level * 4) + stripped
                        fixed_lines_indent.append(new_line)
                    else:
                        fixed_lines_indent.append(line)
                return '\n'.join(fixed_lines_indent)


def backup_originals(originals_list: List[Tuple[str, Path]], backup_base: Path) -> None:
    """Backup all originals folders"""
    logger.info("Starting to backup originals folders...")
    
    for folder_name, originals_path in originals_list:
        backup_path = backup_base / folder_name / 'manga_translator_work' / 'originals'
        backup_path.parent.mkdir(parents=True, exist_ok=True)
        
        if backup_path.exists():
            shutil.rmtree(backup_path)
        
        shutil.copytree(originals_path, backup_path)
        logger.info(f"Backed up: {originals_path} -> {backup_path}")


def process_single_file(args: Tuple[Path, Path]) -> Tuple[str, bool]:
    """Process single txt file to convert to json"""
    txt_file, output_path = args
    try:
        with open(txt_file, 'r', encoding='utf-8') as f:
            content = f.read()
        
        # Fix indentation
        fixed_content = fix_json_indentation(content)
        
        # Write json file
        json_file = output_path / (txt_file.stem + '.json')
        with open(json_file, 'w', encoding='utf-8') as f:
            f.write(fixed_content)
        
        return str(txt_file), True
    except Exception as e:
        logger.error(f"Failed to process file {txt_file}: {e}")
        return str(txt_file), False


def convert_txt_to_json(originals_list: List[Tuple[str, Path]], json2translate_path: Path) -> None:
    """Convert txt files to json files and copy to target directory"""
    logger.info("Starting to convert txt files to json files...")
    
    tasks = []
    
    for folder_name, originals_path in originals_list:
        output_folder = json2translate_path / f"{folder_name}_originals_json"
        output_folder.mkdir(parents=True, exist_ok=True)
        
        txt_files = list(originals_path.glob('*.txt'))
        logger.info(f"Folder {folder_name}: Found {len(txt_files)} txt files")
        
        for txt_file in txt_files:
            tasks.append((txt_file, output_folder))
    
    # Use thread pool for parallel processing
    success_count = 0
    fail_count = 0
    
    max_workers = (os.cpu_count() or 4) * 2
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(process_single_file, task): task for task in tasks}
        
        for future in as_completed(futures):
            file_path, success = future.result()
            if success:
                success_count += 1
                logger.debug(f"Converted: {file_path}")
            else:
                fail_count += 1
    
    logger.info(f"Conversion complete: Success {success_count}, Fail {fail_count}")


def merge_json_files(json2translate_path: Path, glossary_path: Path) -> None:
    """Merge all json files into one large json file"""
    logger.info("Starting to merge json files...")
    
    merged_data = {}
    file_count = 0
    skip_count = 0
    
    # Get all subfolders and sort naturally
    json_folders = sorted(
        [f for f in json2translate_path.iterdir() if f.is_dir()],
        key=lambda x: natural_sort_key(x.name)
    )
    
    for folder in json_folders:
        json_files = sorted(folder.glob('*.json'), key=lambda x: natural_sort_key(x.name))
        
        for json_file in json_files:
            try:
                with open(json_file, 'r', encoding='utf-8') as f:
                    content = f.read()
                
                # Skip empty files
                if not content or not content.strip():
                    logger.warning(f"Skipping empty file: {json_file}")
                    skip_count += 1
                    continue
                
                # Try to fix and parse JSON
                try:
                    data = json.loads(content)
                except json.JSONDecodeError:
                    # Try to fix escape characters
                    fixed_content = fix_escape_sequences(content)
                    try:
                        data = json.loads(fixed_content)
                        logger.info(f"Fixed escape characters: {json_file}")
                    except json.JSONDecodeError as e:
                        # Log detailed error information for debugging
                        logger.error(f"JSON parse error {json_file}: {e}")
                        logger.error(f"Problematic content snippet: {content[max(0,e.pos-50):e.pos+50]}")
                        skip_count += 1
                        continue
                
                # Skip empty JSON objects
                if not data or (isinstance(data, dict) and len(data) == 0):
                    logger.warning(f"Skipping empty JSON object: {json_file}")
                    skip_count += 1
                    continue
                
                # Merge data directly, values with same key will be overwritten by later ones
                if isinstance(data, dict):
                    for key, value in data.items():
                        merged_data[key] = value
                
                file_count += 1
                logger.debug(f"Merged: {json_file}")
                
            except Exception as e:
                logger.error(f"Failed to process file {json_file}: {e}")
                skip_count += 1
    
    # Write merged file
    output_file = glossary_path / 'json2glossary.json'
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(merged_data, ensure_ascii=False, indent=4, fp=f)
    
    logger.info(f"Merge complete: Processed {file_count} files, Skipped {skip_count} files, Output to {output_file}")


def main():
    # Get script directory as base path
    base_path = Path(__file__).parent.resolve()
    logger.info(f"Base path: {base_path}")
    
    # 1. Find all originals folders
    originals_list = find_originals_folders(base_path)
    
    if not originals_list:
        logger.warning("No originals folders found!")
        return
    
    logger.info(f"Found {len(originals_list)} originals folders")
    
    # 2. Create to_translate folder
    to_translate_path = base_path / 'to_translate'
    to_translate_path.mkdir(exist_ok=True)
    logger.info(f"Created to_translate folder: {to_translate_path}")
    
    # 3. Create originals_backup and backup
    originals_backup_path = to_translate_path / 'originals_backup'
    originals_backup_path.mkdir(exist_ok=True)
    backup_originals(originals_list, originals_backup_path)
    
    # 4. Create json2translate folder
    json2translate_path = to_translate_path / 'json2translate'
    json2translate_path.mkdir(exist_ok=True)
    logger.info(f"Created json2translate folder: {json2translate_path}")

    # Create glossary folder
    glossary_folder_path = to_translate_path / 'glossary'
    glossary_folder_path.mkdir(exist_ok=True)
    logger.info(f"Created glossary folder: {glossary_folder_path}")
    
    # 5. Convert txt to json and fix indentation
    convert_txt_to_json(originals_list, json2translate_path)
    
    # 6. Create 2glossary folder
    glossary_path = to_translate_path / '2glossary'
    glossary_path.mkdir(exist_ok=True)
    logger.info(f"Created 2glossary folder: {glossary_path}")
    
    # 7. Merge json files
    merge_json_files(json2translate_path, glossary_path)
    
    logger.info("All operations completed!")


if __name__ == '__main__':
    main()
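
The script sorts folders and files with `natural_sort_key` before merging. Because later files overwrite earlier values for the same key, the sort order decides which duplicate wins. The short example below repeats the key function from the script to show how it differs from plain string sorting; the folder names are made up for illustration.

```python
import re
from typing import List


def natural_sort_key(s: str) -> List:
    """Same key function as in the script: digit runs compare as integers."""
    return [int(text) if text.isdigit() else text.lower()
            for text in re.split(r'(\d+)', str(s))]


folders = ["v10", "v2", "v1"]  # hypothetical volume folder names

print(sorted(folders))                        # ['v1', 'v10', 'v2']
print(sorted(folders, key=natural_sort_key))  # ['v1', 'v2', 'v10']
```

With plain sorting, `v10` would be merged before `v2`, so the natural key keeps volumes in reading order.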

apply_translation

import os
import re
import shutil
import logging
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)


def natural_sort_key(s: str) -> List:
    """Natural sort key function"""
    return [int(text) if text.isdigit() else text.lower() 
            for text in re.split(r'(\d+)', str(s))]


def extract_folder_prefix(folder_name: str) -> str:
    """Extract prefix from folder name (e.g. v01_originals_json -> v01)"""
    match = re.match(r'^(.+?)_originals_json$', folder_name)
    if match:
        return match.group(1)
    return None


def find_ainiee_output_folders(ainiee_output_path: Path) -> List[Tuple[str, Path]]:
    """Find all json folders under AiNieeOutput"""
    folders = []
    
    if not ainiee_output_path.exists():
        logger.error(f"AiNieeOutput folder does not exist: {ainiee_output_path}")
        return folders
    
    for item in ainiee_output_path.iterdir():
        if item.is_dir() and item.name.endswith('_originals_json'):
            prefix = extract_folder_prefix(item.name)
            if prefix:
                folders.append((prefix, item))
                logger.info(f"Found output folder: {item.name} -> Target prefix: {prefix}")
    
    return sorted(folders, key=lambda x: natural_sort_key(x[0]))


def process_single_file(args: Tuple[Path, Path]) -> Tuple[str, bool, str]:
    """Process single file: copy and rename"""
    src_file, dest_file = args
    try:
        # Read source file content
        with open(src_file, 'r', encoding='utf-8') as f:
            content = f.read()
        
        # Write target file
        with open(dest_file, 'w', encoding='utf-8') as f:
            f.write(content)
        
        return str(src_file), True, str(dest_file)
    except Exception as e:
        logger.error(f"Failed to process file {src_file}: {e}")
        return str(src_file), False, str(e)


def copy_translated_files(ainiee_folders: List[Tuple[str, Path]], base_path: Path) -> None:
    """Copy translated files to target directory"""
    logger.info("Starting to copy translated files...")
    
    tasks = []
    skipped_folders = []
    
    for prefix, src_folder in ainiee_folders:
        # Build target path
        dest_folder = base_path / prefix / 'manga_translator_work' / 'originals'
        
        if not dest_folder.exists():
            logger.warning(f"Target folder does not exist, skipping: {dest_folder}")
            skipped_folders.append(prefix)
            continue
        
        # Find all json files
        json_files = list(src_folder.glob('*.json'))
        logger.info(f"Folder {prefix}: Found {len(json_files)} json files")
        
        for json_file in json_files:
            # Build target file name: remove _translated suffix, change to .txt
            new_name = json_file.name
            if new_name.endswith('_translated.json'):
                new_name = new_name[:-len('_translated.json')] + '.txt'
            elif new_name.endswith('.json'):
                new_name = new_name[:-len('.json')] + '.txt'
            
            dest_file = dest_folder / new_name
            tasks.append((json_file, dest_file))
    
    if skipped_folders:
        logger.warning(f"Skipped folder prefixes: {', '.join(skipped_folders)}")
    
    if not tasks:
        logger.warning("No files to process!")
        return
    
    # Use thread pool for parallel processing
    success_count = 0
    fail_count = 0
    
    # os.cpu_count() may return None, so fall back to 4 as in prepare_to_translate
    with ThreadPoolExecutor(max_workers=(os.cpu_count() or 4) * 2) as executor:
        futures = {executor.submit(process_single_file, task): task for task in tasks}
        
        for future in as_completed(futures):
            src_path, success, dest_or_error = future.result()
            if success:
                success_count += 1
                logger.debug(f"Copied: {src_path} -> {dest_or_error}")
            else:
                fail_count += 1
                logger.error(f"Failed: {src_path}, Error: {dest_or_error}")
    
    logger.info(f"Copy complete: Success {success_count}, Fail {fail_count}")


def main():
    # Get script directory as base path
    base_path = Path(__file__).parent.resolve()
    logger.info(f"Base path: {base_path}")
    
    # AiNieeOutput folder path
    ainiee_output_path = base_path / 'to_translate' / 'AiNieeOutput'
    
    # 1. Check if AiNieeOutput folder exists
    if not ainiee_output_path.exists():
        logger.error(f"AiNieeOutput folder does not exist: {ainiee_output_path}")
        logger.info("Please run prepare_to_translate.py first, then use AiNiee to translate files in json2translate")
        return
    
    logger.info(f"AiNieeOutput path: {ainiee_output_path}")
    
    # 2. Find all output folders
    ainiee_folders = find_ainiee_output_folders(ainiee_output_path)
    
    if not ainiee_folders:
        logger.warning("No translation output folders found!")
        logger.info("Please ensure there are folders like 'v01_originals_json' in AiNieeOutput")
        return
    
    logger.info(f"Found {len(ainiee_folders)} output folders")
    
    # 3. Copy translated files
    copy_translated_files(ainiee_folders, base_path)
    
    logger.info("All operations completed!")


if __name__ == '__main__':
    main()
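
The renaming rule inside copy_translated_files can be tried in isolation. The snippet below is a minimal sketch of that rule only; the helper name translated_name_to_txt is hypothetical and is not part of the script above.

```python
def translated_name_to_txt(filename: str) -> str:
    """Map an AiNiee output file name to the .txt name used in 'originals'."""
    # Check the longer suffix first so "_translated" is stripped as well.
    for suffix in ("_translated.json", ".json"):
        if filename.endswith(suffix):
            return filename[: -len(suffix)] + ".txt"
    return filename  # names without a .json suffix are left unchanged


for name in ("001_translated.json", "002.json", "notes.md"):
    print(name, "->", translated_name_to_txt(name))
```

Because the script only globs *.json files, the last case never occurs there; it is included only to show the fallback.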

Process

0. Activate the conda environment in the root directory

conda activate manga-env

1. manga-translator-ui OCR

python -m manga_translator local -i "J:\漫画\RAW\[藤本タツキ] ルックバック" --config "D:\Tools\manga-translator-ui\examples\config_save_text.json" --output "D:\My_Documents\My Library\漫画\translated\[藤本タツキ][蓦然回首][ルックバック]" --memory-percent 96

After running, a manga_translator_work folder will appear, as shown in the figure below.

[Figure: the generated manga_translator_work folder]

2. Run prepare_to_translate script

python prepare_to_translate.py

A to_translate folder will be generated in the root directory.

[Figure: the generated to_translate folder]

3. Use KeywordGacha to extract glossary

Input folder: /to_translate/2glossary
Output folder: /to_translate/glossary

4. Use Saber-Translator to analyze manga

[Figure: manga analysis in Saber-Translator]

This produces the "Story Background" and the "Character Guide". The "Character Guide" must then be converted to the JSON format that AiNiee supports.

## 📖 Story Background
The story revolves around the growth of Fujino Kyo. She is a student who loves manga. She has been studying painting skills on her own since childhood and dreams of becoming a professional manga artist. In high school, she met Kyomoto, who also loves creation. The two became close friends because of their common interests and began to cooperate in creating manga. The story background is set in the real world, focusing on the competition and creative pressure of the Japanese manga industry, as well as the conflict between personal emotions and dreams.

🎬 Plot Development
Beginning
Fujino Kyo has been introverted since childhood and is addicted to manga creation. She often practices painting alone in the classroom or library. Her talent was gradually noticed by her classmates, but what really changed her destiny was meeting Kyomoto. Kyomoto is cheerful and outgoing, good at screenwriting. The two hit it off and decided to cooperate in the manga competition. Their work "Promise Under the Starry Sky" won the newcomer award with delicate emotions and unique painting style, becoming the focus of the campus.

Development
After winning the award, Fujino and Kyomoto attracted the attention of publishers and began to serialize commercial manga. Fujino was responsible for painting and Kyomoto was responsible for the script. The two cooperated seamlessly and the popularity of their works soared. However, as the pressure increased, Fujino gradually felt constrained in creation, while Kyomoto paid more attention to commercial success, and the two began to have differences. At the same time, Fujino learned that Kyomoto's family background was complicated and her parents died in an accident, which made her feel deeper emotional dependence on Kyomoto.

Turning Point/Climax
A social tragedy completely changed the relationship between the two. Kyomoto was involved in a campus violence incident, and the victim was Fujino's junior sister. After learning the truth, Fujino fell into a moral dilemma. She wanted to protect Kyomoto, but could not forgive her indifference. In the end, Kyomoto chose to escape, while Fujino interrupted her creation due to guilt and anger. The friendship between the two broke down and the manga serialization was forced to stop. Fujino fell into depression and even gave up painting for a time.

Ending
Many years later, encouraged by an editor, Fujino picked up the brush again and began to create independent works. Based on her own experience, she drew "Pen of Rebirth", telling the story of a creator finding herself in setbacks. The work was well received, and Fujino finally walked out of the shadow. Kyomoto saw Fujino's success in the distance and blessed her silently. Although the two did not reconcile, Fujino completed self-salvation through creation.

👥 Main Characters
- Fujino Kyo: Protagonist, genius manga artist, introverted and sensitive, good at painting. In the process of growing from a student to a professional manga artist, she experienced conflicts of friendship, dreams and reality, and finally achieved self-salvation through creation.
- Kyomoto: Fujino's close friend and partner, good at screenwriting, cheerful but complex inside. The breakdown of the relationship with Fujino due to family tragedy and moral choices is a key figure driving Fujino's growth in the story.

📌 Key Events
- Fujino and Kyomoto cooperated to create "Promise Under the Starry Sky" and won an award
- The two began commercial serialization and had differences due to creative concepts
- Kyomoto was involved in a campus violence incident, and Fujino fell into a moral dilemma
- Fujino interrupted creation and fell into depression
- Fujino restarted independent creation and completed "Pen of Rebirth"

[
    {
        "original_name": "藤野キョウ",
        "translated_name": "Fujino Kyo",
        "gender": "Female",
        "age": "Girl",
        "personality": "Tough, sensitive, pursuing perfection, full of contradictions and struggles inside",
        "speech_style": "Confident, straightforward, persistent",
        "additional_info": "Showed extraordinary painting talent since student days, extremely devoted when creating. As the core protagonist of the story, her growth process shows the loneliness, brilliance and rebirth of the creator. She regards creation as a dream and a shackle."
    },
    {
        "original_name": "京本",
        "translated_name": "Kyomoto",
        "gender": "Female",
        "age": "Girl",
        "personality": "Gentle, supportive, empathetic",
        "speech_style": "Soft, sincere, slightly admiring",
        "additional_info": "Fujino's close friend and creative partner, also an important spiritual pillar on Fujino's creative path. Her personality complements Fujino's. She is a key figure driving Fujino's growth. The change in the relationship between the two constitutes an important emotional line of the story."
    }
]
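
Before pasting the character list into AiNiee, it can help to check that every entry is complete. The following is a minimal sketch; the key set is copied from the example above, and whether AiNiee actually requires all seven keys is an assumption. The function name validate_characters is hypothetical.

```python
import json

# Keys taken from the example entries above (assumed to be the full set).
REQUIRED_KEYS = (
    "original_name", "translated_name", "gender", "age",
    "personality", "speech_style", "additional_info",
)


def validate_characters(entries):
    """Return a list of problems; an empty list means every entry is complete."""
    problems = []
    for i, entry in enumerate(entries):
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
    return problems


characters = [
    {
        "original_name": "京本",
        "translated_name": "Kyomoto",
        "gender": "Female",
        "age": "Girl",
        "personality": "Gentle, supportive, empathetic",
        "speech_style": "Soft, sincere, slightly admiring",
        "additional_info": "Fujino's close friend and creative partner.",
    }
]

print(validate_characters(characters))
# ensure_ascii=False keeps Japanese names readable instead of \uXXXX escapes
print(json.dumps(characters, ensure_ascii=False, indent=4))
```

Writing the JSON with ensure_ascii=False keeps the original names legible, which makes it easier to compare them against the OCR text by eye.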

5. Complete translation in AiNiee

Fill in the glossary, character introductions, background setting, and translation prompt obtained in the previous steps.

When the translation finishes, the /to_translate/AiNieeOutput folder should exist.
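
A quick way to confirm that AiNiee wrote its results before moving on is to count the JSON files in each output sub-folder. This is a minimal sketch, assuming the script is run from the root directory; summarize_ainiee_output is a hypothetical helper and not part of apply_translation.py.

```python
from pathlib import Path


def summarize_ainiee_output(output_root: Path) -> dict:
    """Count the .json files in each sub-folder of the AiNieeOutput folder."""
    if not output_root.is_dir():
        raise FileNotFoundError(f"AiNieeOutput folder not found: {output_root}")
    return {
        folder.name: len(list(folder.glob("*.json")))
        for folder in sorted(output_root.iterdir())
        if folder.is_dir()
    }


if __name__ == "__main__":
    root = Path("to_translate") / "AiNieeOutput"
    try:
        for name, count in summarize_ainiee_output(root).items():
            print(f"{name}: {count} json files")
    except FileNotFoundError as err:
        print(err)
```

A sub-folder that reports zero files usually means that volume was not translated, so it is worth checking before running the next script.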

6. Run apply_translation script

python apply_translation.py

7. Import translation in manga-translator-ui

python -m manga_translator local -i "J:\漫画\RAW\[藤本タツキ] ルックバック" --config "D:\Tools\manga-translator-ui\examples\config_load_text.json" --output "D:\My_Documents\My Library\漫画\translated\[藤本タツキ][蓦然回首][ルックバック]" --memory-percent 96
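
Steps 1 and 7 run the same command and differ only in the configuration file. The sketch below wraps that command in Python, using only the arguments that appear in the commands above; build_command and run_step are hypothetical helper names, and the paths passed to them are placeholders to be replaced with your own.

```python
import subprocess
import sys


def build_command(input_dir, config_path, output_dir, memory_percent=96):
    """Assemble the manga_translator invocation used in steps 1 and 7."""
    return [
        sys.executable, "-m", "manga_translator", "local",
        "-i", str(input_dir),
        "--config", str(config_path),
        "--output", str(output_dir),
        "--memory-percent", str(memory_percent),
    ]


def run_step(input_dir, config_path, output_dir):
    """Run one pass; config_path selects export (step 1) or import (step 7)."""
    # check=True raises CalledProcessError if the tool exits with an error
    subprocess.run(build_command(input_dir, config_path, output_dir), check=True)


# Printing the argument list does not start the tool.
print(build_command("raw_dir", "config_save_text.json", "out_dir"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces or brackets, as the paths in this article do. Calling run_step once with config_save_text.json and once with config_load_text.json corresponds to steps 1 and 7.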